./www/py-scrapy, High-level Web Crawling and Web Scraping framework



Branch: CURRENT, Version: 1.4.0, Package name: py27-scrapy-1.4.0, Maintainer: pkgsrc-users

Scrapy is a fast high-level web crawling and web scraping framework, used to
crawl websites and extract structured data from their pages. It can be used for
a wide range of purposes, from data mining to monitoring and automated testing.


Required to run:
[net/py-twisted] [devel/py-setuptools] [lang/python27] [www/py-parsel] [devel/py-pydispatcher] [devel/py-queuelib]

Required to build:
[pkgtools/cwrappers]

Master sites:

SHA1: 24222debf2e6b9220a91a56c476c208ac5ecb8e5
RMD160: ef20b9288851962fb552c1045e297c8917a74d17
Filesize: 877.108 KB



CVS history:


   2017-09-04 20:08:31 by Thomas Klausner | Files touched by this commit (163)
Log message:
Follow some redirects.
   2017-05-20 08:25:36 by Adam Ciarcinski | Files touched by this commit (4)
Log message:
Scrapy 1.4 does not bring that many breathtaking new features
but quite a few handy improvements nonetheless.

Scrapy now supports anonymous FTP sessions with customizable user and
password via the new :setting:`FTP_USER` and :setting:`FTP_PASSWORD` settings.
And if you're using Twisted version 17.1.0 or above, FTP is now available
with Python 3.
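As settings, these would go in a project's settings.py; a minimal fragment might look like this (the credential values below are placeholders, not defaults from the source):

```python
# settings.py fragment: credentials for FTP downloads.
# Placeholder values; replace with the credentials your FTP server expects.
FTP_USER = "anonymous"
FTP_PASSWORD = "guest@example.com"
```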

There's a new :meth:`response.follow <scrapy.http.TextResponse.follow>` method
for creating requests; **it is now the recommended way to create Requests
in Scrapy spiders**. This method makes it easier to write correct
spiders; ``response.follow`` has several advantages over creating
``scrapy.Request`` objects directly:

* it handles relative URLs;
* it works properly with non-ASCII URLs on non-UTF-8 pages;
* in addition to absolute and relative URLs it supports Selectors;
  for ``<a>`` elements it can also extract their href values.
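The relative-URL handling is essentially resolution against the URL of the page the link was found on, as `urllib.parse.urljoin` does; a standalone sketch of the idea (an illustration, not Scrapy's actual implementation):

```python
from urllib.parse import urljoin

def resolve_follow_url(response_url, href):
    """Resolve an href the way response.follow resolves relative URLs:
    join it against the URL of the page it was found on."""
    return urljoin(response_url, href)

# A relative href resolves against the containing page's URL.
print(resolve_follow_url("http://example.com/catalog/page1.html", "page2.html"))
# -> http://example.com/catalog/page2.html
```

With plain `scrapy.Request` the spider author had to do this join by hand; `response.follow` does it (and Selector/`<a>` extraction) internally.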
   2017-03-19 23:59:11 by Adam Ciarcinski | Files touched by this commit (2)
Log message:
Changes 1.3.3:
Bug fixes
- Make ``SpiderLoader`` raise ``ImportError`` again by default for missing
  dependencies and wrong :setting:`SPIDER_MODULES`.
  These exceptions were silenced as warnings since 1.3.0.
  A new setting is introduced to toggle between warning or exception if needed;
  see :setting:`SPIDER_LOADER_WARN_ONLY` for details.
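Restoring the pre-1.3.3 warn-only behaviour is a one-line settings.py toggle; a sketch:

```python
# settings.py fragment: downgrade SpiderLoader import failures from
# exceptions back to warnings (the behaviour silently used in 1.3.0-1.3.2).
SPIDER_LOADER_WARN_ONLY = True
```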
   2017-02-13 22:25:33 by Adam Ciarcinski | Files touched by this commit (4)
Log message:
Added www/py-scrapy version 1.3.2

Scrapy is a fast high-level web crawling and web scraping framework, used to
crawl websites and extract structured data from their pages. It can be used for
a wide range of purposes, from data mining to monitoring and automated testing.