./textproc/ruby-nokogiri, HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Branch: CURRENT, Version: 1.12.4, Package name: ruby27-nokogiri-1.12.4, Maintainer: tsutsui

Nokogiri parses and searches XML/HTML very quickly, and also has correctly
implemented CSS3 selector support as well as XPath support.

Features:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop-in replacement for Hpricot (though not bug for bug)


Required to run:
[textproc/libxml2] [textproc/libxslt] [misc/ruby-mini_portile2] [lang/ruby26-base]

Required to build:
[devel/ruby-pkg-config] [pkgtools/cwrappers]

Master sites:

SHA1: 0eb79e072c21a25d0131e143b69a963944306438
RMD160: cc3550259ed66d0a2871622c102050eca6d497c2
Filesize: 9364 KB


CVS history:


   2021-09-21 11:36:04 by Jonathan Perkin | Files touched by this commit (1)
Log message:
ruby-nokogiri: Work around ARFLAGS bug on SunOS.

This can be removed once bootstrap-mk-files is fixed after the freeze and when
it's likely that most users have switched to a newer bootstrap.
   2021-09-18 10:05:09 by Takahiro Kambe | Files touched by this commit (3)
Log message:
textproc/ruby-nokogiri: fix dependency

Allow depending on misc/ruby-mini_portile2 2.7.0 and later.

* Override gemspec.
* There is a dependency on mini_portile2 in ext/nokogiri/extconf.rb.  :(

No PKGREVISION bump since this change fixes a broken package.
   2021-09-11 11:54:46 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.12.4.

Upstream changes
https://github.com/sparklemotion/nokogi … ag/v1.12.4

1.12.4 / 2021-08-29

Notable fix: Namespace inheritance

Namespace behavior when reparenting nodes has historically been poorly
specified and the behavior diverged between CRuby and JRuby. As a result,
making this behavior consistent in v1.12.0 introduced a breaking change.

This patch release reverts the Builder behavior present in v1.12.0..v1.12.3 but
keeps the Document behavior. This release also introduces a Document attribute
to allow affected users to easily change this behavior for their legacy code
without invasive changes.

Compensating Feature in XML::Document

This release of Nokogiri introduces a new Document boolean attribute,
namespace_inheritance, which controls whether children should inherit a
namespace when they are reparented. Nokogiri::XML::Document defaults this
attribute to false meaning "do not inherit," thereby making explicit the
behavior change introduced in v1.12.0.

CRuby users who desire the pre-v1.12.0 behavior may set
document.namespace_inheritance = true before reparenting nodes.

See https://nokogiri.org/rdoc/Nokogiri/XML/Document.html#namespace_inheritance-instance_method
for example usage.

Fix for XML::Builder

However, recognizing that we want Builder-created children to inherit
namespaces, Builder now will set namespace_inheritance=true on the underlying
document for both JRuby and CRuby. This means that, on CRuby, the pre-v1.12.0
behavior is restored.

Users who want to turn this behavior off may pass a keyword argument to the
Builder constructor like so:

Nokogiri::XML::Builder.new(namespace_inheritance: false)

See https://nokogiri.org/rdoc/Nokogiri/XML/Builder.html#label-Namespace+inheritance
for example usage.

Downstream gem maintainers

Note that any downstream gems may want to specifically omit Nokogiri
v1.12.0--v1.12.3 from their dependency specification if they rely on child
namespace inheritance:

Gem::Specification.new do |gem|
  # ...
  gem.add_runtime_dependency 'nokogiri', '!=1.12.3', '!=1.12.2', '!=1.12.1',
    '!=1.12.0'
  # ...
end
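
To see which releases such a specification admits, the same exclusions can be
checked with RubyGems' own requirement class. This is only a sketch; the
constant name EXCLUDE_AFFECTED and the list of versions tested are chosen for
illustration.

```ruby
require "rubygems"

# The same four exclusions as in the gemspec snippet above.
EXCLUDE_AFFECTED = Gem::Requirement.new(
  "!= 1.12.3", "!= 1.12.2", "!= 1.12.1", "!= 1.12.0"
)

# Report whether each version would be accepted by the requirement.
%w[1.11.2 1.12.0 1.12.3 1.12.4].each do |version|
  allowed = EXCLUDE_AFFECTED.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'excluded'}"
end
```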

Fixed

  * [JRuby] Fix NPE in Schema parsing when an imported resource doesn't have a
    systemId. [#2296] (Thanks, @pepijnve!)
   2021-04-21 13:43:04 by Adam Ciarcinski | Files touched by this commit (1822)
Log message:
revbump for textproc/icu
   2021-03-19 17:52:25 by Izumi Tsutsui | Files touched by this commit (3) | Package updated
Log message:
ruby-nokogiri: update to 1.11.2.

Upstream changelog (from CHANGELOG.md):

1.11.2 / 2021-03-11

Fixed

  * [CRuby] NodeSet may now safely contain Node objects from multiple
    documents. Previously the GC lifecycle of the parent Document objects could
    lead to nodes being GCed while still in scope. [#1952]
  * [CRuby] Patch libxml2 to avoid "huge input lookup" errors on large CDATA
    elements. (See upstream GNOME/libxml2#200 and GNOME/libxml2!100.) [#2132].
  * [CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and
    link against nokogiri.so by including LDFLAGS in Nokogiri::VERSION_INFO.
    [#2167]
  * [CRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once.
    Previously #initialize was invoked twice on each object.
  * [JRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once.
    Previously #initialize was not called, which was a problem for subclassing
    such as done by Loofah.

Improved

  * Reduce the number of object allocations needed when parsing an
    HTML::DocumentFragment. [#2087] (Thanks, @ashmaroli!)
  * [JRuby] Update the algorithm used to calculate Node#line to be wrong
    less-often. The underlying parser, Xerces, does not track line numbers, and
    so we've always used a hacky solution for this method. [#1223, #2177]
  * Introduce --enable-system-libraries and --disable-system-libraries flags to
    extconf.rb. These flags provide the same functionality as
    --use-system-libraries and the NOKOGIRI_USE_SYSTEM_LIBRARIES environment
    variable, but are more idiomatic. [#2193] (Thanks, @eregon!)
  * [TruffleRuby] --disable-static is now the default on TruffleRuby when the
    packaged libraries are used. This is more flexible and compiles faster.
    (Note, though, that the default on TR is still to use system libraries.)
    [#2191, #2193] (Thanks, @eregon!)

Changed

  * Nokogiri::XML::Path is now a Module (previously it has been a Class). It
    has been acting solely as a Module since v1.0.0. See 8461c74.
   2021-01-08 18:09:42 by Izumi Tsutsui | Files touched by this commit (3) | Package updated
Log message:
ruby-nokogiri: update to 1.11.1.

Upstream changelog (from CHANGELOG.md):

v1.11.1 / 2021-01-06

 Fixed

  * [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers
    no longer call libxml-ruby's handlers. Instead, they defensively override
    the libxml2 global handler before parsing. [#2168]

v1.11.0 / 2021-01-03

 Notes

 Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

"Native gems" contain pre-compiled libraries for a specific machine
architecture. On supported platforms, this removes the need for compiling the C
extension and the packaged libraries. This results in much faster installation
and more reliable installation, which as you probably know are the biggest
headaches for Nokogiri users.

We've been shipping native Windows gems since 2009, but starting in v1.11.0 we
are also shipping native gems for these platforms:

  * Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
  * OSX/Darwin: x86_64-darwin and arm64-darwin

We'd appreciate your thoughts and feedback on this work at #2075.

 Dependencies

 Ruby

This release introduces support for Ruby 2.7 and 3.0 in the precompiled native
gems.

This release ends support for:

  * Ruby 2.3, for which official support ended on 2019-03-31 [#1886] (Thanks
    @ashmaroli!)
  * Ruby 2.4, for which official support ended on 2020-04-05
  * JRuby 9.1, which is the Ruby 2.3-compatible release.

 Gems

  * Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
  * [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005]
    (Thanks, @alejandroperea!)

 Security

See note below about CVE-2020-26247 in the "Changed" subsection entitled
"XML::Schema parsing treats input as untrusted by default".

 Added

  * Add Node methods for manipulating "keyword attributes" (for example, class
    and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove.
    [#2000]
  * Add support for CSS queries a:has(> b), a:has(~ b), and a:has(+ b). [#688]
    (Thanks, @jonathanhefner!)
  * Add Node#value? to better match expected semantics of a Hash-like object.
    [#1838, #1840] (Thanks, @MatzFan!)
  * [CRuby] Add Nokogiri::XML::Node#line= for use by downstream libs like
    nokogumbo. [#1918] (Thanks, @stevecheckoway!)
  * nokogiri.gemspec is back after a 10-year hiatus. We still prefer you use
    the official releases, but master is pretty stable these days, and YOLO.

 Performance

  * [CRuby] The CSS ~= operator and class selector . are about 2x faster.
    [#2137, #2135]
  * [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive
    implementation, because strlen is generally optimized for the architecture.
    [#2144] (Thanks, @ilyazub!)
  * Improve performance of some namespace operations. [#1916] (Thanks,
    @ashmaroli!)
  * Remove unnecessary array allocations from Node serialization methods
    [#1911] (Thanks, @ashmaroli!)
  * Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks,
    @ashmaroli!)
  * Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks,
    @ilyazub!)
  * [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks,
    @kares!)
  * [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]

 Improved

  * [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for
    browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
  * {HTML,XML}::Document#parse now accept Pathname objects. Previously this
    worked only if the referenced file was less than 4096 bytes long; longer
    files resulted in undefined behavior because the read method would be
    repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
  * [CRuby] Nokogumbo builds faster because it can now use header files
    provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
  * Add frozen_string_literal: true magic comment to all lib files. [#1745]
    (Thanks, @oniofchaos!)
  * [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)

 Fixed

  * HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now
    correctly raises a XML::SyntaxError exception. Previously the value of the
    RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.
    [#2130]
  * The CSS ~= operator now correctly handles non-space whitespace in the class
    attribute. commit e45dedd
  * The switch to turn off the CSS-to-XPath cache is now thread-local, rather
    than being shared mutable state. [#1935]
  * The Node methods add_previous_sibling, previous=, before, add_next_sibling,
    next=, after, replace, and swap now correctly use their parent as the
    context node for parsing markup. These methods now also raise a
    RuntimeError if they are called on a node with no parent. [nokogumbo#160]
  * [JRuby] XML::Schema XSD validation errors are captured in
    XML::Schema#errors. These errors were previously ignored.
  * [JRuby] Standardize reading from IO like objects, including StringIO.
    [#1888, #1897]
  * [JRuby] Fix how custom XPath function namespaces are inferred to be less
    naive. [#1890, #2148]
  * [JRuby] Clarify exception message when custom XPath functions can't be
    resolved.
  * [JRuby] Comparison of Node to Document with Node#<=> now matches
    CRuby/libxml2 behavior.
  * [CRuby] Syntax errors are now correctly captured in Document#errors for
    short HTML documents. Previously the SAX parser used for encoding detection
    was clobbering libxml2's global error handler.
  * [CRuby] Fixed installation on AIX with respect to vasprintf. [#1908]
  * [CRuby] On some platforms, avoid symbol name collision with glibc's
    canonicalize. [#2105]
  * [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
  * [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release
    candidates. [#1788] (Thanks, @stevecheckoway!)
  * [JRuby] Fixed document encoding regression in v1.11.0 release candidates.
    [#2080, #2083] (Thanks, @thbar!)

 Removed

  * The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use
    .set_cache if you need to muck with the cache internals.
  * The class method Nokogiri::CSS::Parser.parse has been removed. This was
    originally deprecated in 2009 in 13db61b. Use Nokogiri::CSS.parse instead.

 Changed

 XML::Schema input is now "untrusted" by default

Address CVE-2020-26247.

In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema
were trusted by default, allowing external resources to be accessed over the
network, potentially enabling XXE or SSRF attacks.

This behavior is counter to the security policy intended by Nokogiri
maintainers, which is to treat all input as untrusted by default whenever
possible.

Please note that this security fix was pushed into a new minor version, 1.11.x,
rather than a patch release to the 1.10.x branch, because it is a breaking
change for some schemas and the risk was assessed to be "Low Severity".

More information and instructions for enabling "trusted input" behavior in
v1.11.0.rc4 and later is available at the public advisory.

 HTML parser now obeys the strict or norecover parsing option

(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e.,
the RECOVER parse option not set) now correctly raises a XML::SyntaxError
exception. Previously the value of the RECOVER bit was being ignored by CRuby
and was misinterpreted by JRuby.

If you're using the default parser options, you will be unaffected by this fix.
If you're passing strict or norecover to your HTML parser call, you may be
surprised to see that the parser now fails to recover and raises a
XML::SyntaxError exception. Given the number of HTML documents on the internet
that libxml2 would consider to be ill-formed, this is probably not what you
want, and you can omit setting that parse option to restore the behavior that
you have been relying upon.

Apologies to anyone inconvenienced by this breaking bugfix being present in a
minor release, but I felt it was appropriate to introduce this fix because it's
straightforward to fix any code that has been relying on this buggy behavior.

 VersionInfo, the output of nokogiri -v, and related constants

This release changes the metadata provided in Nokogiri::VersionInfo which also
affects the output of nokogiri -v. Some related constants have also been
changed. If you're using VersionInfo programmatically, or relying on constants
related to underlying library versions, please read the detailed changes for
Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.
   2020-11-05 10:09:30 by Ryo ONODERA | Files touched by this commit (1814)
Log message:
*: Recursive revbump from textproc/icu-68.1
   2020-10-03 16:27:32 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.10.10.

Upstream changes (from CHANGELOG.md):

1.10.10 / 2020-07-06

Features

* [MRI] Cross-built Windows gems now support Ruby 2.7 [#2029]. Note that
  prior to this release, the v1.11.x prereleases provided this support.