./textproc/ruby-nokogiri, HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Branch: CURRENT, Version: 1.16.3, Package name: ruby32-nokogiri-1.16.3, Maintainer: tsutsui

Nokogiri parses and searches XML and HTML quickly, and provides correctly
implemented CSS3 selector support as well as XPath support.

Features:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder
* Drop-in replacement for Hpricot (though not bug for bug)


Required to run:
[textproc/libxml2] [textproc/libxslt] [misc/ruby-mini_portile2] [lang/ruby31-base]

Required to build:
[devel/ruby-pkg-config]

Master sites:

Filesize: 4518.5 KB


CVS history:


   2024-03-16 17:33:40 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.16.3.

Upstream changes:
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.16.3

Dependencies

 * [CRuby] Vendored libxml2 is updated to v2.12.6 from v2.12.5. (@flavorjones)

Changed

 * [CRuby] XML::Reader sets the @encoding instance variable during reading if
   it is not passed into the initializer. Previously, it would remain nil. The
   behavior of Reader#encoding has not changed. This works around changes to
   how libxml2 reports the encoding used in v2.12.6.
   2024-02-16 15:32:28 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.16.2.

Upstream changes:
 https://github.com/sparklemotion/nokogiri/releases

v1.16.2 / 2024-02-04

Security

  * [CRuby] Vendored libxml2 is updated to address CVE-2024-25062. See
    GHSA-xc9x-jj77-9p9j for more information.

Dependencies

  * [CRuby] Vendored libxml2 is updated to v2.12.5 from v2.12.4. (@flavorjones)

v1.16.1 / 2024-02-03

Dependencies

  * [CRuby] Vendored libxml2 is updated to v2.12.4 from v2.12.3. (@flavorjones)

Fixed

  * [CRuby] XML::Reader defaults the encoding to UTF-8 if it's not specified in
    either the document or as a method parameter. Previously non-ASCII
    characters were serialized as NCRs in this case. [#2891] (@flavorjones)
  * [CRuby] Restored support for compilation by GCC versions earlier than 4.6,
    which was broken in v1.15.0 (540e9ae). [#3090] (@adfoster-r7)
  * [CRuby] Patched upstream libxml2 to allow parsing HTML5 in the context of a
    namespaced node (e.g., foreign content like MathML). [#3112, #3116]
    (@flavorjones)
  * [CRuby] Fixed a small memory leak in libgumbo (HTML5 parser) when the
    maximum tree depth limit is hit. [#3098, #3100] (@stevecheckoway)

v1.16.0 / 2023-12-27

Notable Changes

Ruby

This release introduces native gem support for Ruby 3.3.

This release ends support for Ruby 2.7, for which upstream support ended
2023-03-31.

Pattern matching

This version marks official support for the pattern matching API in XML::Attr,
XML::Document, XML::DocumentFragment, XML::Namespace, XML::Node, and
XML::NodeSet (and their subclasses), originally introduced as an experimental
feature in v1.14.0. (@flavorjones)

Documentation on what can be matched:

  * XML::Attr#deconstruct_keys
  * XML::Document#deconstruct_keys
  * XML::Namespace#deconstruct_keys
  * XML::Node#deconstruct_keys
  * XML::DocumentFragment#deconstruct
  * XML::NodeSet#deconstruct

Dependencies

  * [CRuby] Vendored libxml2 is updated to v2.12.3 from v2.11.6. (@flavorjones)
      + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.0
      + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.1
      + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.2
      + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.3

Fixed

  * CSS nth pseudo-classes now handle spaces, e.g. "2n + 1". [#3018]
    (@fusion2004)
  * [CRuby] libgumbo no longer leaks memory when an incomplete tag is abandoned
    by the HTML5 parser. [#3036] (@flavorjones)

Removed

  * Removed Nokogiri::HTML5.get which was deprecated in v1.12.0. [#2278]
    (@flavorjones)
  * Removed the CSS-to-XPath utility modules XPathVisitorAlwaysUseBuiltins and
    XPathVisitorOptimallyUseBuiltins, which were deprecated in v1.13.0 in favor
    of XPathVisitor constructor args. [#2403] (@flavorjones)
  * Removed XML::Reader#attribute_nodes which was deprecated in v1.13.8 in
    favor of #attribute_hash. [#2598, #2599] (@flavorjones)
  * [CRuby] Removed the libxml/libxml2_path key from VersionInfo, used in the
    past for third-party library integration, in favor of the nokogiri/cppflags
    and nokogiri/ldflags keys. Please note that third-party library integration
    is not fully supported and may be deprecated soon, see #2746 for more
    context. [#2143] (@flavorjones)
   2023-12-26 19:56:09 by Thomas Klausner | Files touched by this commit (1)
Log message:
ruby-nokogiri: make sure libxml2 is in compiler search path
   2023-11-18 16:39:16 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.15.5

Upstream changes:
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.5

1.15.5 / 2023-11-17

Dependencies

* [CRuby] Vendored libxml2 is updated to v2.11.6 from v2.11.5. For details
  please see https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.6
* [CRuby] Vendored libxslt is updated to v1.1.39 from v1.1.38. For details
  please see https://gitlab.gnome.org/GNOME/libxslt/-/releases/v1.1.39
   2023-11-08 14:21:43 by Thomas Klausner | Files touched by this commit (2377)
Log message:
*: recursive bump for icu 74.1
   2023-08-21 20:38:16 by Amitai Schleier | Files touched by this commit (1)
Log message:
ruby-nokogiri: strip --no-as-needed on macOS to fix build.
   2023-08-12 10:43:14 by Izumi Tsutsui | Files touched by this commit (2) | Package updated
Log message:
ruby-nokogiri: update to 1.15.4.

Upstream changes:
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.4

1.15.4 / 2023-08-11

Dependencies

  * [CRuby] Vendored libxml2 is updated to v2.11.5 from v2.11.4. For details
    please see https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.5

Fixed

  * Fixed a typo in a HTML5 parser error message. [#2927] (Thanks,
    @anishathalye!)
  * [CRuby] ObjectSpace.memsize_of is now safe to call on Documents with
    complex DTDs. In previous versions, this debugging method could result in a
    segfault. [#2923, #2924]
   2023-08-06 05:07:59 by Izumi Tsutsui | Files touched by this commit (3) | Package updated
Log message:
ruby-nokogiri: update to 1.15.3.

Upstream changes:
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.3
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.2
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.1
 https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.0

1.15.3 / 2023-07-05

Fixed

 * Passing an object that is not a kind of XML::Node as the first parameter to
   CDATA.new now raises a TypeError. Previously this would result in either a
   segfault (CRuby) or a Java exception (JRuby). [#2920]
 * Passing an object that is not a kind of XML::Node as the first parameter to
   Schema.from_document now raises a TypeError. Previously this would result
   in either a segfault (CRuby) or a Java exception (JRuby). [#2920]
 * [CRuby] Passing an object that is not a kind of XML::Node as the second
   parameter to Text.new now raises a TypeError. Previously this would result
   in a segfault. [#2920]
 * [CRuby] Replacing a node's children via methods like Node#inner_html=,
   #children=, and #replace no longer defensively dups the node's next sibling
   if it is a Text node. This behavior was originally adopted to work around
   libxml2's memory management (see #283 and #595) but should not have
   included operations involving xmlAddChild(). [#2916]
 * [JRuby] Fixed NPE when serializing an unparented HTML node. [#2559, #2895]
   (Thanks, @cbasguti!)

1.15.2 / 2023-05-24

Dependencies

 * [JRuby] Vendored org.nokogiri:nekodtd is updated to v0.1.11.noko2. This is
   functionally equivalent to v0.1.11.noko1 but restores support for Java 8.

Fixed

 * [JRuby] Java 8 support is restored, fixing a regression present in
   v1.14.0..v1.14.4 and v1.15.0..v1.15.1. [#2887]

1.15.1 / 2023-05-19

Dependencies

 * [CRuby] Vendored libxml2 is updated to v2.11.4 from v2.11.3. For details
   please see https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.4

Fixed

 * [CRuby] The libxml2 update fixes an encoding regression when push-parsing
   UTF-8 sequences. [#2882, upstream issue and commit]

1.15.0 / 2023-05-15

Notes

Ability to opt into system malloc and free

Since 2009, Nokogiri has configured libxml2 to use ruby_xmalloc et al for
memory management. This has provided benefits for memory management, but comes
with a performance penalty.

Users can now opt into using system malloc for libxml2 memory management by
setting an environment variable:

# "default" here means "libxml2's default" which is system malloc
NOKOGIRI_LIBXML_MEMORY_MANAGEMENT=default

Benchmarks show that this setting will significantly improve performance, but
be aware that the tradeoff may involve poorer memory management including
bloated heap sizes and/or OOM conditions.

You can read more about this in the decision record at
adr/2023-04-libxml-memory-management.md.

Dependencies

 * [CRuby] Vendored libxml2 is updated to v2.11.3 from v2.10.4. For details
   please see:
    + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.0
    + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.1
    + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.2
    + https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.3
 * [CRuby] Vendored libxslt is updated to v1.1.38 from v1.1.37. For details
   please see:
    + https://gitlab.gnome.org/GNOME/libxslt/-/releases/v1.1.38

Added

 * Encoding objects may now be passed to serialization methods like #to_xml,
   #to_html, #serialize, and #write_to to specify the output encoding.
   Previously only encoding names (strings) were accepted. [#2774, #2798]
   (Thanks, @ellaklara!)
 * [CRuby] Users may opt into using system malloc for libxml2 memory
   management. For more detail, see note above or
   adr/2023-04-libxml-memory-management.md.

Changed

 * [CRuby] Schema.from_document now makes a defensive copy of the document if
   it has blank text nodes with Ruby objects instantiated for them. This
   prevents unsafe behavior in libxml2 from causing a segfault. There is a
   small performance cost, but we think this has the virtue of being "what the
   user meant" since modifying the original is surprising behavior for most
   users. Previously this was addressed in v1.10.9 by raising an exception.

Fixed

 * [CRuby] XSLT.transform now makes a defensive copy of the document if it has
   blank text nodes with Ruby objects instantiated for them and the template
   uses xsl:strip-space. This prevents unsafe behavior in libxslt from
   causing a segfault. There is a small performance cost, but we think this
   has the virtue of being "what the user meant" since modifying the original
   is surprising behavior for most users. Previously this would allow unsafe
   memory access and potentially segfault. [#2800]

Improved

 * Nokogiri::XML::Node::SaveOptions#inspect now shows the names of the options
   set in the bitmask, similar to ParseOptions. [#2767]
 * #inspect and pretty-printing are improved for AttributeDecl,
   ElementContent, ElementDecl, and EntityDecl.
 * [CRuby] The C extension now uses Ruby's TypedData API for managing all the
   libxml2 structs. Write barriers may improve GC performance in some extreme
   cases. [#2808] (Thanks, @etiennebarrie and @byroot!)
 * [CRuby] ObjectSpace.memsize_of reports a pretty good guess of memory usage
   when called on Nokogiri::XML::Document objects. [#2807] (Thanks,
   @etiennebarrie and @byroot!)
 * [CRuby] Users installing the "ruby" platform gem and compiling libxml2 and
   libxslt from source will now be using a modern config.guess and config.sub
   that supports new architectures like loongarch64. [#2831] (Thanks,
   @zhangwenlong8911!)
 * [CRuby] HTML5 parser:
    + adjusts the specified attributes, adding xlink:arcrole and removing
      xml:base [#2841, #2842]
    + allows <hr> in <select> [whatwg/html#3410, whatwg/html#9124]
 * [JRuby] Node#first_element_child now returns nil if there are only
   non-element children. Previously a null pointer exception was raised.
   [#2808, #2844]
 * Documentation for Nokogiri::XSLT now has usage examples including custom
   function handlers.

Deprecated

 * Passing a Nokogiri::XML::Node as the first parameter to CDATA.new is
   deprecated and will generate a warning. This parameter should be a kind of
   Nokogiri::XML::Document. This will become an error in a future version of
   Nokogiri.
 * Passing a Nokogiri::XML::Node as the first parameter to
   Schema.from_document is deprecated and will generate a warning. This
   parameter should be a kind of Nokogiri::XML::Document. This will become an
   error in a future version of Nokogiri.
 * Passing a Nokogiri::XML::Node as the second parameter to Text.new is
   deprecated and will generate a warning. This parameter should be a kind of
   Nokogiri::XML::Document. This will become an error in a future version of
   Nokogiri.
 * [CRuby] Calling a custom XPath function without the nokogiri namespace is
   deprecated and will generate a warning. Support for non-namespaced
   functions will be removed in a future version of Nokogiri. (Note that JRuby
   has never supported non-namespaced custom XPath functions.)