./textproc/cmark, CommonMark parsing and rendering library and program in C


Branch: CURRENT, Version: 0.30.2nb1, Package name: cmark-0.30.2nb1, Maintainer: pkgsrc-users

cmark is the C reference implementation of CommonMark, a rationalized version
of Markdown syntax with a spec.

It provides a shared library (libcmark) with functions for parsing CommonMark
documents to an abstract syntax tree (AST), manipulating the AST, and rendering
the document to HTML, groff man, LaTeX, CommonMark, or an XML representation of
the AST. It also provides a command-line program (cmark) for parsing and
rendering CommonMark documents.


Required to build:
[pkgtools/cwrappers] [lang/python37]

Filesize: 240.267 KB


CVS history:


   2021-11-29 11:44:16 by Dan Cirnat | Files touched by this commit (3)
Log message:
cmark: Fix building dependencies with strict function prototypes
   2021-10-26 13:23:42 by Nia Alarie | Files touched by this commit (1161)
Log message:
textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):
./textproc/convertlit/distinfo clit18src.zip
   2021-10-09 21:20:08 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
cmark: updated to 0.30.2

0.30.2

* Fix parsing of emphasis before links.
  Fixes a regression introduced with commit ed0a4bf.

* Update to Unicode 14.0 (data-man).

* Add `~` to safe href character set.

* Update CMakeLists.txt.  Bump the minimum required
  CMake to 3.7. Imperatively define output name for static library.

* Fix install paths in libcmark.pc.
  `CMAKE_INSTALL_<dir>` can be relative or absolute path, so it is wrong to
  prefix CMAKE_INSTALL_PREFIX because if CMAKE_INSTALL_<dir> is set to an
  absolute path it will result in a malformed path with two absolute paths
  joined together.  Instead, use `CMAKE_INSTALL_FULL_<dir>` from
  GNUInstallDirs.
   2021-10-07 17:02:49 by Nia Alarie | Files touched by this commit (1162)
Log message:
textproc: Remove SHA1 hashes for distfiles
   2021-07-17 18:29:31 by Adam Ciarcinski | Files touched by this commit (4) | Package updated
Log message:
cmark: updated to 0.30.1

[0.30.1]

  * Properly indent block-level contents of list items in man.
    This handles nested lists as well as items with multiple paragraphs.
    The change requires addition of a new field block_number_in_list_item
    to cmark_renderer, but this does not change the public API.
  * Fix quadratic behavior when parsing emphasis (Nick
    Wellnhofer).  Delimiters can be deleted, so store delimiter positions
    instead of pointers in `openers_bottom`. Besides causing undefined
    behavior when reading a dangling pointer, this could also result
    in quadratic behavior when parsing emphasis.
  * Fix quadratic behavior when parsing smart quotes (Nick Wellnhofer).
    Remove matching smart quote delimiters.  Otherwise, the same opener
    could be found over and over, preventing the `openers_bottom`
    optimization from kicking in and leading to quadratic behavior when
    processing lots of quotes.
  * Modify CMake configuration so that the project can be built with
    older versions of CMake (Saleem Abdulrasool).  (In 0.30.0,
    some features were used that require CMake >= 3.3.) The cost of this
    backwards compatibility is that developers must now explicitly invoke
    `cmark_add_compile_options` when a new compilation target is added.
  * Remove a comma at the end of an enumerator list, which was flagged
    by clang as a C++11 extension.
  * make_man_page.py: use absolute path with CDLL. This avoids the error
    "file system relative paths not allowed in hardened programs."
  * Include cmark version in cmark(3) man page (instead of LOCAL).

[0.30.0]

  * Use official 0.30 spec.txt.
  * Add `cmark_get_default_mem_allocator()`.  API change: this
    adds a new exported function in cmark.h.
  * An optimization we used for emphasis parsing was
    too aggressive, causing us to miss some emphasis that was legal
    according to the spec.  We fix this by indexing the `openers_bottom`
    table not just by the type of delimiter and the length of the
    closing delimiter mod 3, but by whether the closing delimiter
    can also be an opener.  (The algorithm for determining emphasis
    matching depends on all these factors.)  Add regression test.
  * Fix quadratic behavior with inline HTML (Nick Wellnhofer).
    Repeated starting sequences like `<?`, `<!DECL ` or `<![CDATA[` could
    lead to quadratic behavior if no matching ending sequence was found.
    Separate the inline HTML scanners. Remember if scanning the whole input
    for a specific ending sequence failed and skip subsequent scans.
  * Speed up hierarchy check in tree manipulation API (Nick Wellnhofer).
    Skip hierarchy check in the common case that the inserted child has
    no children.
  * Fix quadratic behavior when parsing inlines (Nick Wellnhofer).
    The inline parsing code would call `cmark_node_append_child` to append
    nodes. This public function has a sanity check which is linear in the
    depth of the tree. Repeated calls could show quadratic behavior in
    degenerate trees. Use a special function to append nodes without this
    check.  (Issue found by OSS-Fuzz.)
  * Replace invalid characters in XML output (Nick Wellnhofer).
    Control characters, U+FFFE and U+FFFF aren't allowed in XML 1.0, so
    replace them with U+FFFD (replacement character). This doesn't solve
    the problem of how to roundtrip these characters, but at least we don't
    produce invalid XML.
  * Avoid quadratic output growth with reference links (Nick Wellnhofer).
    Keep track of the number of bytes added through expansion of reference
    links and limit the total to the size of the input document. Always
    allow a minimum of 100KB.  Unfortunately, cmark has no error handling,
    so all we can do is to stop expanding reference links without returning
    an error. This should never be an issue in practice though. The 100KB
    minimum alone should cover all real-world cases.
  * Fix issue with type-7 HTML blocks interrupting paragraphs
    (see commonmark/commonmark.js).
  * Treat `textarea` like `script`, `style`, `pre` (type 1 HTML block),
    in accordance with spec change.
  * Define whitespace per spec (Asherah Connor).
  * Add `MAX_INDENT` for xml.  Otherwise we can get quadratic
    increase in size with deeply nested structures.
  * Fix handling of empty strings when creating XML/HTML output
    (Steffen Kieß).
  * Commonmark renderer: always use fences for code.
    This solves problems with adjacent code blocks being merged.
  * Improve rendering of commonmark code spans with spaces.
  * Cleaner approach to max digits for numeric entities.
    This modifies unescaping in `houdini_html_u.c` rather than
    the entity handling in `inlines.c`.  Unlike the other,
    this approach works also in e.g. link titles.
  * Fix entity parser (and api test) to respect length limit on
    numeric entities.
  * Don't allow link destinations with unbalanced unescaped parentheses.
    See commonmark/commonmark.js.
  * `print_usage()`: Minor grammar fix, swap two words (Øyvind A. Holm).
  * Don't call `memcpy` with `NULL` as first parameter.
    This is illegal according to the C standard, sec. 7.1.4.
    See <https://www.imperialviolet.org/2016/06/26/nonnull.html>.
  * Add needed include in `blocks.c`.
  * Fix unnecessary variable assignment.
  * Skip UTF-8 BOM if present at beginning of buffer.
  * Fix URL check in `is_autolink` (Nick Wellnhofer).  In a recent commit,
    the check was changed to `strcmp`, but we really have to use `strncmp`.
  * Fix null pointer deref in `is_autolink` (Nick Wellnhofer).
    Introduced by a recent commit. Found by OSS-Fuzz.
  * Rearrange struct cmark_node (Nick Wellnhofer).  Introduce multi-purpose
    data/len members in struct cmark_node. This is mainly used to store
    literal text for inlines, code and HTML blocks.
    Move the content strbuf for blocks from `cmark_node` to `cmark_parser`.
    When finalizing nodes that allow inlines (paragraphs and headings),
    detach the strbuf and store the block content in the node's data/len
    members. Free the block content after processing inlines.
    Reduces size of struct `cmark_node` by 8 bytes.
  * Improve packing of `struct cmark_list` (Nick Wellnhofer).
  * Use C string instead of chunk in a number of contexts (Nick Wellnhofer).
    The node struct never references memory of other nodes now.
    Node accessors don't have to check for delayed creation of C strings,
    so parsing and iterating all literals using the public API should
    actually be faster than before.  These changes also reduce the size
    of `struct cmark_node`.
  * Add casts for MSVC10 (from kivikakk in cmark-gfm).
  * commonmark renderer:  better escaping in smart mode.  When
    `CMARK_OPT_SMART` is enabled, we escape literal `-`, `.`, and quote
    characters when needed to avoid their being "smartified."
  * Add options field to `cmark_renderer`.
  * commonmark.c - use `size_t` instead of `int`.
  * Include `string.h` in `cmark-fuzz.c`.
  * Fix hash collisions for references (Vicent Marti via cmark-gfm).
    Reimplemented reference storage as follows:
    1. New references are always inserted at the end of a linked list. This
    is an O(1) operation, and does not check whether an existing (duplicate)
    reference with the same label already exists in the document.
    2. Upon the first call to `cmark_reference_lookup` (when it is expected
    that no further references will be added to the reference map), the
    linked list of references is written into a fixed-size array.
    3. The fixed size array can then be efficiently sorted in-place in O(n
    log n). This operation only happens once. We perform this sort in a
    _stable_ manner to ensure that the earliest link reference in the
    document always has preference, as the spec dictates. To accomplish
    this, every reference is tagged with a generation number when initially
    inserted in the linked list.
    4. The sorted array is then compacted in O(n). Since it was sorted in a
    stable way, the first reference for each label is preserved and the
    duplicates are removed, matching the spec.
    5. We can now simply perform a binary search for the current
    `cmark_reference_lookup` query in O(log n). Any further lookup calls
    will also be O(log n), since the sorted references table only needs to
    be generated once.
    The resulting implementation is notably simple (as it uses standard
    library builtins `qsort` and `bsearch`), whilst performing better than
    the fixed size hash table in documents that have a high number of
    references and never becoming pathological regardless of the input.
  * Comment out unused function `cmark_strbuf_cstr` in `buffer.h`.
  * Re-add `--safe` command-line option as a no-op, for backwards
    compatibility.
  * Update to Unicode 13.0
  * Generate and install cmake-config file (Reinhold Gschweicher).
    Add full cmake support. The project can either be used with
    `add_subdirectory` or be installed into the system (or some other
    directory) and be found with `find_package(cmark)`. In both cases the
    cmake target `cmark::cmark` and/or `cmark::cmark_static` is all that
    is needed to be linked.  Previously the `cmarkConfig.cmake` file
    was generated, but not installed.  As an additional bonus of generation
    by cmake we get a generated `cmake-config-version.cmake` file for
    `find_package()` to search for the same major version.
    The generated config file is position independent, allowing the
    installed directory to be copied or moved and still work.
    The following four files are generated and installed:
    `lib/cmake/cmark/cmark-config.cmake`,
    `lib/cmake/cmark/cmark-config-version.cmake`,
    `lib/cmake/cmark/cmark-targets.cmake`,
    `lib/cmake/cmark/cmark-targets-release.cmake`.
  * Adjust the MinGW paths for MinGW64 (Daniil Baturin).
  * Fix CMake generator expression checking for MSVC (Nick Wellnhofer).
  * Fix `-Wconst-qual` warning (Saleem Abdulrasool).  This enables building
    with `/Zc:strictString` with MSVC as well.
  * Improve and modernize cmake build (Saleem Abdulrasool).
    + Build: add exports targets for build tree usage.
    + Use target properties for include paths.
    + Remove the unnecessary execute permission on CMakeLists.txt.
    + Reduce property computation in CMake.
    + Use `CMAKE_INCLUDE_CURRENT_DIRECTORY`.
    + Improve man page installation.
    + Only include `GNUInstallDirs` once.
    + Replace `add_compile_definitions` with `add_compile_options`
      since the former was introduced in 3.12.
    + Cleanup CMake.
    + Inline a variable.
    + Use `LINKER_LANGUAGE` property for C++ runtime.
    + Use CMake to control C standard.
    + Use the correct variable.
    + Loosen the compiler check.
    + Hoist shared flags to top-level CMakeLists.
    + Remove duplicated flags.
    + Use `add_compile_options` rather than modify `CMAKE_C_FLAGS`.
    + Hoist sanitizer flags to global state.
    + Hoist `-fvisibility` flags to top-level.
    + Hoist the debug flag handling.
    + Hoist the profile flag handling.
    + Remove incorrect variable handling.
    + Remove unused CMake includes.
  * Remove "-rdynamic" flag for static builds (Eric Pruitt).
  * Fixed installation on GNU/Linux distributions other than Ubuntu
    (Vitaly Zaitsev).
  * Link executable with static or shared library (Nick Wellnhofer).
    If `CMARK_STATIC` is on (default), link the executable with the static
    library. This produces exactly the same result as compiling the library
    sources again and linking with the object files.
    If `CMARK_STATIC` is off, link the executable with the shared library.
    This wasn't supported before and should be the preferred way to
    package cmark on Linux distros.
    Building only a shared library and a statically linked executable
    isn't supported anymore but this doesn't seem useful.
  * Reintroduce version check for MSVC /TP flag (Nick Wellnhofer).
    The flag is only required for old MSVC versions.
  * normalize.py: use `html.escape` instead of `cgi.escape`.
  * Fix pathological_tests.py on Windows (Nick Wellnhofer).
    When using multiprocessing on Windows, the main program must be
    guarded with a `__name__` check.
  * Remove useless `__name__` check in test scripts (Nick Wellnhofer).
  * Add CIFuzz (Leo Neat).
  * cmark.1 - Document --unsafe instead of --safe.
  * cmark.1: remove docs for `--normalize` which no longer exists.
  * Add lint target to Makefile.
  * Add uninstall target to Makefile.
  * Update benchmarks.
  * Fix typo in documentation (Tim Gates).
  * Increase timeout for pathological tests to avoid CI failure.
  * Update the Racket wrapper with the safe -> unsafe flag change
   2020-04-11 12:55:05 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
cmark: dynamically link the executable
   2019-04-25 09:33:32 by Maya Rashish | Files touched by this commit (620)
Log message:
PKGREVISION bump for anything using python without a PYPKGPREFIX.

This is a semi-manual PKGREVISION bump.
   2019-04-09 08:04:13 by Adam Ciarcinski | Files touched by this commit (2) | Package updated
Log message:
cmark: updated to 0.29.0

0.29.0:
Update spec to 0.29.
Make rendering safe by default. Adds CMARK_OPT_UNSAFE and makes CMARK_OPT_SAFE a no-op (for API compatibility). The new default behavior is to suppress raw HTML and potentially dangerous links. The CMARK_OPT_UNSAFE option has to be set explicitly to prevent this. NOTE: This change will require modifications in bindings for cmark and in most libraries and programs that use cmark.
Add sourcepos info for inlines (Yuki Izumi).
Disallow more than 32 nested balanced parens in a link (Yuki Izumi).
Resolve link references before creating setext header. A setext header line after a link reference should not create a header, according to the spec.
commonmark renderer: improve escaping. URL-escape special characters when escape mode is URL, and not otherwise. Entity-escape control characters (< 0x20) in non-literal escape modes.
render: only emit actual newline when escape mode is LITERAL. For markdown content, e.g., in other contexts we want some kind of escaping, not a literal newline.
Update code span normalization to conform with spec change.
Allow empty <> link destination in reference link.
Remove leftover includes of memory.h.
A link destination can't start with < unless it is an angle-bracket link that also ends with >. (If your URL really starts with <, URL-escape it.)
Allow internal delimiter runs to match if both have lengths that are multiples of 3.
Include references.h in parser.h.
Fix [link](<foo\>).
Use hand-rolled scanner for thematic break. Keep track of the last position where a thematic break failed to match on a line, to avoid rescanning unnecessarily.
Rename ends_with_blank_line with S_ prefix.
Add CMARK_NODE__LAST_LINE_CHECKED flag. Use this to avoid unnecessary recursion in ends_with_blank_line.
In ends_with_blank_line, call S_set_last_line_blank to avoid unnecessary repetition. Once we settle whether a list item ends in a blank line, we don't need to revisit this in considering parent list items.
Disallow unescaped ( in parenthesized link title.
Copy line/col info straight from opener/closer (Ashe Connor). We can't rely on anything in subj since it's been modified while parsing the subject and could represent line info from a future line. This is simple and works.
render.c: reset last_breakable after cr.
Fix a typo in houdini_href_e.c (Felix Yan).
commonmark writer: use ~~~ fences if info string contains backtick. This is needed for round-trip tests.
Update scanners for new info string rules.
Add XSLT stylesheet to convert cmark XML back to Commonmark. Initial version of an XSLT stylesheet that converts the XML format produced by cmark -t xml back to Commonmark.
Check for whitespace before reference title.
Bump CMake to version 3 (Jonathan Müller).
Build: Remove deprecated call to add_compiler_export_flags() (Jonathan Müller). It is deprecated in CMake 3.0, the replacement is to set the CXX_VISIBILITY_PRESET (or in our case C_VISIBILITY_PRESET) and VISIBILITY_INLINES_HIDDEN properties of the target. We're already setting them by setting the CMake variables anyway, so the call can be removed.
Build: only attempt to install MSVC system libraries on Windows (Saleem Abdulrasool). Newer versions of CMake attempt to query the system for information about the VS 2017 installation. Unfortunately, this query fails on non-Windows systems when cross-compiling: cmake_host_system_information does not recognize <key> VS_15_DIR. CMake will not find these system libraries on non-Windows hosts anyways, and we were silencing the warnings, so simply omit the installation when cross-compiling to Windows.
Simplify code normalization, in line with spec change.
Implement code span spec changes. These affect both parsing and writing commonmark.
Add link parsing corner cases to regressions (Ashe Connor).
Add xml:space="preserve" in XML output when appropriate (Nguyễn Thái Ngọc Duy). (For text, code, code_block, html_inline and html_block tags.)
Removed meta from list of block tags. Added regression test.
entity_tests.py - omit noisy success output.
pathological_tests.py: make tests run faster. Commented out the (already ignored) "many references" test, which times out. Reduced the iterations for a couple other tests.
pathological_tests.py: added test for deeply nested lists.
Optimize S_find_first_nonspace. We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists. For N = 3000, the time goes from over 17s to about 0.7s. Thanks to Martin Mitas for diagnosing the problem.
Allow spaces in link destination delimited with pointy brackets.
Adjust max length of decimal/numeric entities.
Fix inline raw HTML parsing. This fixes a recently added failing spec test case. Previously spaces were being allowed in unquoted attribute values; now we forbid them.
Don't allow list markers to be indented >= 4 spaces.
Check for empty buffer when rendering (Phil Turnbull). For empty documents, ->size is zero so renderer.buffer->ptr[renderer.buffer->size - 1] will cause an out-of-bounds read. Empty buffers always point to the global cmark_strbuf__initbuf buffer so we read cmark_strbuf__initbuf[-1].
Also run API tests with CMARK_SHARED=OFF (Nick Wellnhofer).
Rename roundtrip and entity tests (Nick Wellnhofer). Rename the tests to reflect that they use the library, not the executable.
Generate export header for static-only build.
Fuzz width parameter too (Phil Turnbull). Allow the width parameter to be generated too so we get better fuzz-coverage.
Don't discard empty fuzz test-cases (Phil Turnbull). We currently discard fuzz test-cases that are empty but empty inputs are valid markdown. This improves the fuzzing coverage slightly.
Fixed exit code for pathological tests.
Add allowed failures to pathological_tests.py. This allows us to include tests that we don't yet know how to pass.
Add timeout to pathological_tests.py. Tests must complete in 8 seconds or are errors.
Add more pathological tests.
Use pledge(2) on OpenBSD (Ashe Connor).
Update the Racket wrapper (Eli Barzilay).
Makefile: For afl target, don't build tests.