./www/htmlcxx, Simple non-validating CSS1 and HTML parser for C++



Branch: CURRENT, Version: 0.85, Package name: htmlcxx-0.85, Maintainer: pkgsrc-users

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
Although there are several other HTML parsers available, htmlcxx
has some characteristics that make it unique:

* STL-like navigation of the DOM tree, using the excellent tree.hh library
  from Kasper Peeters
* It is possible to reproduce exactly, character by character, the
  original document from the parse tree
* Bundled CSS parser
* Optional parsing of attributes
* C++ code that looks like C++ (not so true anymore)
* Offsets of tags/elements in the original document are stored in
  the nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic Mozilla
Firefox behavior, so you should expect parse trees similar to those
created by Firefox. Unlike Firefox, however, htmlcxx does not insert
elements that are missing from your HTML. Therefore, serializing
the DOM tree gives exactly the same bytes contained in the original
HTML document.
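
The round-trip property described above can be illustrated with a small
sketch. This is not htmlcxx's API: the `Token` type and the `tokenize` and
`serialize` functions are hypothetical names introduced here, and the
tokenizer is a deliberately naive stand-in for a real HTML parser. It only
shows the idea that if every node records its offset and length in the
original document, and nothing is inserted or dropped, then concatenating
the recorded spans reproduces the input byte for byte, even for malformed
markup.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for illustration only; htmlcxx's real node class differs.
struct Token {
    std::size_t offset;  // byte offset of the token in the original document
    std::size_t length;  // number of bytes the token covers
    bool is_tag;         // true for "<...>" spans, false for text runs
};

// Split the input into tag and text tokens that cover every byte exactly once.
// Nothing is added or repaired: an unterminated '<' is simply kept as text.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t end = input.size();
        bool is_tag = false;
        if (input[pos] == '<') {
            const std::size_t close = input.find('>', pos);
            if (close != std::string::npos) {
                is_tag = true;
                end = close + 1;  // include the closing '>'
            }
        } else {
            const std::size_t next = input.find('<', pos);
            if (next != std::string::npos) {
                end = next;  // text runs up to the next '<'
            }
        }
        assert(end > pos);  // every token consumes at least one byte
        tokens.push_back({pos, end - pos, is_tag});
        pos = end;
    }
    return tokens;
}

// Rebuild the document purely from the stored offsets and lengths.
std::string serialize(const std::string& input, const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        out.append(input, t.offset, t.length);
    }
    return out;
}
```

Calling `serialize(html, tokenize(html))` on any string `html` returns a
string equal to `html`, because the tokens are contiguous and
non-overlapping. A parser that repaired the markup, for example by
inserting a missing closing tag, would lose this guarantee.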


Required to build:
[pkgtools/cwrappers]

Master sites:

SHA1: e56fef830db51041fd297d269d24379b2dccb928
RMD160: d357b4c29127aae7f06da666c004c9db26ef29a4
Filesize: 404.906 KB


CVS history:


   2017-05-12 13:35:49 by Jonathan Perkin | Files touched by this commit (1)
Log message:
Requires libiconv.

   2015-11-04 03:47:43 by Alistair G. Crooks | Files touched by this commit (758)
Log message:
Add SHA512 digests for distfiles for www category

Problems found locating distfiles:
	Package haskell-cgi: missing distfile haskell-cgi-20001206.tar.gz
	Package nginx: missing distfile array-var-nginx-module-0.04.tar.gz
	Package nginx: missing distfile encrypted-session-nginx-module-0.04.tar.gz
	Package nginx: missing distfile headers-more-nginx-module-0.261.tar.gz
	Package nginx: missing distfile nginx_http_push_module-0.692.tar.gz
	Package nginx: missing distfile set-misc-nginx-module-0.29.tar.gz
	Package nginx-devel: missing distfile echo-nginx-module-0.58.tar.gz
	Package nginx-devel: missing distfile form-input-nginx-module-0.11.tar.gz
	Package nginx-devel: missing distfile lua-nginx-module-0.9.16.tar.gz
	Package nginx-devel: missing distfile nginx_http_push_module-0.692.tar.gz
	Package nginx-devel: missing distfile set-misc-nginx-module-0.29.tar.gz
	Package php-owncloud: missing distfile owncloud-8.2.0.tar.bz2

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.

   2014-02-16 23:58:51 by Thomas Klausner | Files touched by this commit (7)
Log message:
Import htmlcxx-0.85 as www/htmlcxx.
