Subject: CVS commit: pkgsrc/www/py-scrapy
From: Adam Ciarcinski
Date: 2023-04-27 11:33:44
Message id: 20230427093344.D8741FA87@cvs.NetBSD.org

Log Message:
py-scrapy: updated to 2.8.0

Scrapy 2.8.0 (2023-02-02)
-------------------------

This is a maintenance release, with minor features, bug fixes, and cleanups.

Deprecation removals
~~~~~~~~~~~~~~~~~~~~
-   The ``scrapy.utils.gz.read1`` function, deprecated in Scrapy 2.0, has now
    been removed. Use the :meth:`~io.BufferedIOBase.read1` method of
    :class:`~gzip.GzipFile` instead.
-   The ``scrapy.utils.python.to_native_str`` function, deprecated in Scrapy
    2.0, has now been removed. Use :func:`scrapy.utils.python.to_unicode`
    instead.
-   The ``scrapy.utils.python.MutableChain.next`` method, deprecated in Scrapy
    2.0, has now been removed. Use
    :meth:`~scrapy.utils.python.MutableChain.__next__` instead.
-   The ``scrapy.linkextractors.FilteringLinkExtractor`` class, deprecated
    in Scrapy 2.0, has now been removed. Use
    :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
    instead.
-   Support for using environment variables prefixed with ``SCRAPY_`` to
    override settings, deprecated in Scrapy 2.0, has now been removed.
-   Support for the ``noconnect`` query string argument in proxy URLs,
    deprecated in Scrapy 2.0, has now been removed. We expect proxies that used
    to need it to work fine without it.
-   The ``scrapy.utils.python.retry_on_eintr`` function, deprecated in Scrapy
    2.3, has now been removed.
-   The ``scrapy.utils.python.WeakKeyCache`` class, deprecated in Scrapy 2.4,
    has now been removed.
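
As an illustration of the first replacement above, the following is a minimal sketch that reads a gzip stream incrementally with the standard library's ``GzipFile.read1`` method instead of the removed ``scrapy.utils.gz.read1`` helper. The function name ``read_gzip_in_chunks``, the chunk size, and the sample payload are assumptions made for this example and are not part of Scrapy.

```python
import gzip
import io


def read_gzip_in_chunks(data: bytes, chunk_size: int = 8192) -> bytes:
    """Decompress gzip-compressed bytes incrementally with GzipFile.read1.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        while True:
            # read1() returns up to chunk_size bytes, and b"" at end of stream.
            chunk = f.read1(chunk_size)
            if not chunk:
                break
            out += chunk
    return bytes(out)


original = b"hello scrapy " * 1000
payload = gzip.compress(original)
restored = read_gzip_in_chunks(payload)
```

Code that called ``to_native_str`` can usually switch to ``scrapy.utils.python.to_unicode`` directly, as the notes above suggest.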

Deprecations
~~~~~~~~~~~~
-   :exc:`scrapy.pipelines.images.NoimagesDrop` is now deprecated.
-   :meth:`ImagesPipeline.convert_image
    <scrapy.pipelines.images.ImagesPipeline.convert_image>` must now accept a
    ``response_body`` parameter.

New features
~~~~~~~~~~~~
-   Applied black_ coding style to files generated with the
    :command:`genspider` and :command:`startproject` commands.

    .. _black: https://black.readthedocs.io/en/stable/

-   :setting:`FEED_EXPORT_ENCODING` is now set to ``"utf-8"`` in the
    ``settings.py`` file that the :command:`startproject` command generates.
    With this value, JSON exports won’t force the use of escape sequences for
    non-ASCII characters.
-   The :class:`~scrapy.extensions.memusage.MemoryUsage` extension now logs the
    peak memory usage during checks, and the binary unit MiB is now used to
    avoid confusion.
-   The ``callback`` parameter of :class:`~scrapy.http.Request` can now be set
    to :func:`scrapy.http.request.NO_CALLBACK`, to distinguish it from
    ``None``, as the latter indicates that the default spider callback
    (:meth:`~scrapy.Spider.parse`) is to be used.
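
The effect of the ``FEED_EXPORT_ENCODING`` change can be approximated with the standard library's ``json`` module. This is only a sketch of the escaping behaviour, assuming the JSON exporter behaves roughly like ``json.dumps`` with and without ``ensure_ascii``. The sample item is made up for the example.

```python
import json

# Hypothetical item with non-ASCII characters.
item = {"name": "café", "city": "Kraków"}

# Without an explicit encoding, non-ASCII characters are written as
# \uXXXX escape sequences (json.dumps defaults to ensure_ascii=True).
escaped = json.dumps(item)

# With FEED_EXPORT_ENCODING = "utf-8", the characters are kept as-is
# and the output is encoded as UTF-8.
readable = json.dumps(item, ensure_ascii=False)
utf8_bytes = readable.encode("utf-8")

print(escaped)
print(readable)
```

Both forms decode to the same data. The UTF-8 form is simply easier to read and usually smaller.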

Bug fixes
~~~~~~~~~
-   Enabled unsafe legacy SSL renegotiation to fix access to some outdated
    websites.
-   Fixed STARTTLS-based email delivery not working with Twisted 21.2.0 and
    later.
-   Fixed the :meth:`finish_exporting` method of :ref:`item exporters
    <topics-exporters>` not being called for empty files.
-   Fixed HTTP/2 responses getting only the last value for a header when
    multiple headers with the same name are received.
-   Fixed an exception raised by the :command:`shell` command in some cases
    when :ref:`using asyncio <using-asyncio>`.
-   When using :class:`~scrapy.spiders.CrawlSpider`, callback keyword arguments
    (``cb_kwargs``) added to a request in the ``process_request`` callback of a
    :class:`~scrapy.spiders.Rule` will no longer be ignored.
-   The :ref:`images pipeline <images-pipeline>` no longer re-encodes JPEG
    files.
-   Fixed the handling of transparent WebP images by the :ref:`images pipeline
    <images-pipeline>`.
-   :func:`scrapy.shell.inspect_response` no longer inhibits ``SIGINT``
    (Ctrl+C).
-   :class:`LinkExtractor <scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor>`
    with ``unique=False`` no longer filters out links that have identical URL
    *and* text.
-   :class:`~scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware` now
    ignores URL protocols that do not support ``robots.txt`` (``data://``,
    ``file://``).
-   Silenced the ``filelock`` debug log messages introduced in Scrapy 2.6.
-   Fixed the output of ``scrapy -h`` showing an unintended ``**commands**``
    line.
-   Made the active project indication in the output of :ref:`commands
    <topics-commands>` clearer.
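
The HTTP/2 header fix above concerns a common pitfall: storing repeated headers in a plain mapping keeps only the last value. The sketch below illustrates the general problem and one way around it. It is not Scrapy's actual implementation, and the ``collect_headers`` helper and the sample header pairs are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical header pairs, as a response might deliver them.
received = [
    (b"set-cookie", b"a=1"),
    (b"set-cookie", b"b=2"),
    (b"content-type", b"text/html"),
]

# Naive approach: later pairs overwrite earlier ones with the same name.
last_only = dict(received)


def collect_headers(pairs):
    """Group header values by name, preserving every value in order."""
    headers = defaultdict(list)
    for name, value in pairs:
        headers[name].append(value)
    return dict(headers)


all_values = collect_headers(received)
```

Here ``last_only`` loses the first ``set-cookie`` value, while ``all_values`` keeps both.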

Documentation
~~~~~~~~~~~~~
-   Documented how to :ref:`debug spiders from Visual Studio Code
    <debug-vscode>`.
-   Documented how :setting:`DOWNLOAD_DELAY` affects per-domain concurrency.
-   Improved consistency.
-   Fixed typos.

Quality assurance
~~~~~~~~~~~~~~~~~
-   Applied :ref:`black coding style <coding-style>`, sorted import statements,
    and introduced :ref:`pre-commit <scrapy-pre-commit>`.
-   Switched from :mod:`os.path` to :mod:`pathlib`.
-   Addressed many issues reported by Pylint.
-   Improved code readability.
-   Improved package metadata.
-   Removed direct invocations of ``setup.py``.
-   Removed unnecessary :class:`~collections.OrderedDict` usages.
-   Removed unnecessary ``__str__`` definitions.
-   Removed obsolete code and comments.
-   Fixed test and CI issues.
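
For readers unfamiliar with the :mod:`os.path` to :mod:`pathlib` switch mentioned above, the following sketch compares the two styles. The paths are hypothetical and the snippet is not taken from the Scrapy code base.

```python
import os.path
from pathlib import Path

# Hypothetical project location, used only for illustration.
project_dir = "/tmp/myproject"

# os.path style: paths are plain strings handled by module-level functions.
old_style = os.path.join(project_dir, "spiders", "example.py")
old_stem, old_ext = os.path.splitext(os.path.basename(old_style))

# pathlib style: paths are objects, joined with "/" and queried via attributes.
new_style = Path(project_dir) / "spiders" / "example.py"
new_stem, new_ext = new_style.stem, new_style.suffix
```

Both approaches describe the same path. The ``pathlib`` version tends to be shorter and avoids nested function calls.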

Files:
Revision  Action  File
1.17      modify  pkgsrc/www/py-scrapy/Makefile
1.9       modify  pkgsrc/www/py-scrapy/PLIST
1.13      modify  pkgsrc/www/py-scrapy/distinfo