Update to 0.999999999
* Use upstream filename as DISTNAME
* The latest version for Chromium build
Released on July 15, 2016
Fix attribute order going to the tree builder to be document order instead \
of reverse document order(!).
Released on July 14, 2016
Added ordereddict as a mandatory dependency on Python 2.6.
Added lxml, genshi, datrie, charade, and all extras that will do the right \
thing based on the specific interpreter implementation.
Now requires the mock package for the testsuite.
Cease supporting DATrie under PyPy.
Remove ``PullDOM`` support, as this hasn't ever been properly tested, \
doesn't entirely work, and as far as I can tell is completely unused by anyone.
Move testsuite to py.test.
Fix #124: move to webencodings for decoding the input byte stream; this \
makes html5lib compliant with the Encoding Standard, and introduces a required \
dependency on webencodings.
Cease supporting Python 3.2 (in both CPython and PyPy forms).
Fix comments containing double-dash with lxml 3.5 and above.
Use scripting disabled by default (as we don't implement scripting).
Fix #11, avoiding the XSS bug potentially caused by serializer allowing \
attribute values to be escaped out of in old browser versions, changing the \
quote_attr_values option on serializer to take one of three values, \
"always" (the old True value), "legacy" (the new option, and \
the new default), and "spec" (the old False value, and the old \
Fix #72 by rewriting the sanitizer to apply only to treewalkers (instead of \
the tokenizer); as such, this will require amending all callers of it to use it \
via the treewalker API.
Drop support of charade, now that chardet is supported once more.
Replace the charset keyword argument on parse and related methods with a set \
of keyword arguments: override_encoding, transport_encoding, \
same_origin_parent_encoding, likely_encoding, and default_encoding.
Move filters._base, treebuilder._base, and treewalkers._base to .base to \
clarify their status as public.
Get rid of the sanitizer package. Merge sanitizer.sanitize into the \
sanitizer.htmlsanitizer module and move that to sanitizer. This means anyone who \
used sanitizer.sanitize or sanitizer.HTMLSanitizer needs no code changes.
Rename treewalkers.lxmletree to .etree_lxml and treewalkers.genshistream to \
.genshi to have a consistent API.
Move a whole load of stuff (inputstream, ihatexml, trie, tokenizer, utils) \
to be underscore prefixed to clarify their status as private.
Released on September 10, 2015
Fix #195: fix the sanitizer to drop broken URLs (it threw an exception \
between 0.9999 and 0.999999).
Released on July 7, 2015
Fix #189: fix the sanitizer to allow relative URLs again (as it did prior to \
Released on April 30, 2015
Fix #188: fix the sanitizer to not throw an exception when sanitizing bogus \
Released on April 29, 2015
Fix #153: Sanitizer fails to treat some attributes as URLs. Despite how this \
sounds, this has no known security implications. No known version of IE (5.5 to \
current), Firefox (3 to current), Safari (6 to current), Chrome (1 to current), \
or Opera (12 to current) will run any script provided in these attributes.
Pass error message to the ParseError exception in strict parsing mode.
Allow data URIs in the sanitizer, with a whitelist of content-types.
Add support for Python implementations that don't support lone surrogates \
(read: Jython). Fixes #2.
Remove localization of error messages. This functionality was totally unused \
(and it was never tested that everything was localizable), so we may as well \
follow numerous browsers in not supporting translating technical strings.
Expose treewalkers.pprint as a public API.
Add a documentEncoding property to HTML5Parser, fix #121.
Update to 0.999:
Released on December 23, 2013
* Fix #127: add work-around for CPython issue #20007: .read(0) on
http.client.HTTPResponse drops the rest of the content.
* Fix #115: lxml treewalker can now deal with fragments containing, at
their root level, text nodes with non-ASCII characters on Python 2.
Released on September 10, 2013
* No library changes from 1.0b3; released as 0.99 as pip has changed
behaviour from 1.4 to avoid installing pre-release versions per
Released on July 24, 2013
* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
implementation using it should be moved to
``NonRecursiveTreeWalker``, as everything bundled with html5lib has
* Fix #67 so that ``BufferedStream`` correctly returns a bytes
object, thereby fixing any case where html5lib is passed a
non-seekable RawIOBase-like object.
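The idea behind buffering a non-seekable stream can be sketched as follows.
This is an illustration only, not html5lib's ``BufferedStream``: the class
names ``RewindableStream`` and ``OnlyRead`` are hypothetical, and the sketch
assumes the wrapped object offers nothing beyond ``read(n)`` returning bytes.

```python
import io


class RewindableStream(object):
    """Hypothetical wrapper: remembers every byte read so callers can rewind."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = io.BytesIO()  # holds everything read so far

    def tell(self):
        return self._buffer.tell()

    def seek(self, pos):
        # Only positions that have already been buffered can be revisited.
        if pos < 0 or pos > len(self._buffer.getvalue()):
            raise ValueError("position not buffered yet")
        self._buffer.seek(pos)

    def read(self, size):
        # Serve buffered bytes first, then pull the remainder from the source.
        data = self._buffer.read(size)
        missing = size - len(data)
        if missing > 0:
            fresh = self._stream.read(missing)
            self._buffer.write(fresh)  # buffer position is at its end here
            data += fresh
        return data  # always a bytes object


class OnlyRead(object):
    """Hypothetical non-seekable source exposing nothing but read()."""

    def __init__(self, data):
        self._data = data

    def read(self, size):
        chunk, self._data = self._data[:size], self._data[size:]
        return chunk


stream = RewindableStream(OnlyRead(b"<!doctype html>"))
first = stream.read(5)
stream.seek(0)
again = stream.read(9)
rest = stream.read(100)
```

Rewinding matters for encoding detection, where the start of the input may
need to be inspected before decoding begins.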
Released on June 27, 2013
* Removed reordering of attributes within the serializer. There is now
an ``alphabetical_attributes`` option which preserves the previous
behaviour through a new filter. This allows attribute order to be
preserved through html5lib if the tree builder preserves order.
* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
``treeadapters.sax.to_sax`` which is generic and supports any
treewalker; it also resolves all known bugs with ``dom2sax``.
* Fix treewalker assertions on hitting byte strings on
Python 2. Prior to 1.0b1, treewalkers coped with mixed
bytes/unicode data on Python 2; this reintroduces that earlier
behaviour on Python 2. Behaviour is unchanged on Python 3.
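The attribute-sorting filter mentioned above can be approximated with a small
generator over a token stream. This is a simplified sketch rather than the
filter shipped with html5lib: the function name and the token dictionaries
below are assumptions for illustration, and attribute keys are plain names
here rather than namespaced keys.

```python
from collections import OrderedDict


def alphabetical_attributes(tokens):
    """Yield tokens unchanged, except that tag attributes are sorted by name."""
    for token in tokens:
        if token["type"] in ("StartTag", "EmptyTag"):
            # Copy the token so the caller's stream is left untouched.
            token = dict(token, data=OrderedDict(sorted(token["data"].items())))
        yield token


# Hypothetical token stream in document order: href comes before class.
stream = [
    {"type": "StartTag", "name": "a",
     "data": OrderedDict([("href", "/x"), ("class", "y")])},
    {"type": "Characters", "data": "link"},
    {"type": "EndTag", "name": "a"},
]

sorted_stream = list(alphabetical_attributes(stream))
```

Without such a filter the attributes stay in whatever order the tree builder
produced, which is what allows source order to survive a round trip.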
Released on May 17, 2013
* Implementation updated to implement the `HTML specification
<http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
* Python 3.2+ supported in a single codebase using the ``six`` library.
* Removed support for Python 2.5 and older.
* Removed the deprecated Beautiful Soup 3 treebuilder.
``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
since it doesn't support namespaces, foreign content like SVG and
MathML is parsed incorrectly.
* Removed ``simpletree`` from the package. The default tree builder is
now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
available, and ``xml.etree.ElementTree`` otherwise).
* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
output was well-formed XML, and hence provided little of use.
* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
return the default DOM treebuilder, which uses ``xml.dom.minidom``.
* Optional heuristic character encoding detection now based on
``charade`` for Python 2.6 - 3.3 compatibility.
* Optional ``Genshi`` treewalker support fixed.
* Many bugfixes, including:
* #33: null in attribute value breaks XML AttValue;
* #4: nested, indirect descendant, <button> causes infinite loop;
* Google Code 215: detect seekable streams;
* Google Code 206: support for <video preload=...>, <audio preload=...>;
* Google Code 205: support for <video poster=...>;
* Google Code 202: file breaks InputStream.
* Source code is now mostly PEP 8 compliant.
* Test harness has been improved and now depends on ``nose``.
* Documentation updated and moved to http://html5lib.readthedocs.org/.
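The ``etree`` fallback described above (prefer ``xml.etree.cElementTree``,
otherwise use ``xml.etree.ElementTree``) follows a common import pattern. A
minimal sketch, using only the standard library and not taken from html5lib's
source:

```python
try:
    # C accelerated module; absent on newer Python versions.
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Pure-Python module name, always available.
    import xml.etree.ElementTree as ElementTree

# Either module exposes the same API, so the rest of the code need not care.
root = ElementTree.fromstring("<p>Hello <b>world</b></p>")
bold_text = root.find("b").text
```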