./www/seaward, Crawler which searches for links or a specified word in a website

Branch: CURRENT, Version: 1.0.3, Package name: seaward-1.0.3, Maintainer: pin

Seaward is a crawler used to discover every link on a web page and its linked
pages, without duplicates, or to search for a word, starting from the given
URL.

If you want to save the links to a file, you can run
'seaward <URL> --silent > file.txt'; if you experience many timeout errors,
try a higher timeout with '-t'.
With '-d 0' you crawl only the web page passed as the URL parameter, with
'-d 1' also the pages linked to it (always within the same web site), and so
on.
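
As a combined example (the URL and the option values below are illustrative
only, and '-t' is assumed here to take a timeout in seconds):

seaward https://www.example.org -d 1 -t 10 --silent > links.txt

This crawls the given page plus the pages it links to on the same site, allows
more time for slow responses, and writes the discovered links to links.txt.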


Master sites:

Filesize: 429.992 KB

Version history:


CVS history:


   2024-02-23 18:25:27 by pin | Files touched by this commit (3) | Package updated
Log message:
www/seaward: update to 1.0.3

 - It is now possible to "strictly" crawl a URL; run --help to get more
   information
 - The code now handles more anomalies (less panics! yay)
 - Link fragments (#) are now removed to avoid revisiting the same page with
   different tags
 - Printouts used for logging information will now be in the following
   format: "[logging level] ..."
 - The crate "ctrlc" has been replaced by tokio::signal

   2024-02-09 14:58:23 by pin | Files touched by this commit (3) | Package updated
Log message:
www/seaward: update to 1.0.2

 - The heuristic used to find the optimum timeout has been modified: instead
   of using an average, the maximum time registered in the request samples,
   plus a margin, is now used (see the illustration below).

 - Request sample timings are shown during execution (disable them with --silent).
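
   For illustration only, with made-up numbers: if the request samples measured
   120 ms, 340 ms, and 95 ms, and the margin were 200 ms, the chosen timeout
   would be 340 ms + 200 ms = 540 ms, i.e. the slowest sample plus the margin.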

   2023-10-25 00:11:51 by Thomas Klausner | Files touched by this commit (2298)
Log message:
*: bump for openssl 3

   2023-07-13 22:42:35 by pin | Files touched by this commit (5)
Log message:
www/seaward: import package

Seaward is a crawler used to discover every link on a web page and its linked
pages without duplicates or to search for a word starting from the given URL.

If you want to save the links inside a file, you can run
'seaward <URL> --silent > file.txt', and if you experience many timeout errors
try using a higher timeout with '-t'.
With the '-d 0' option you crawl only the web page passed in the URL parameter,
with '-d 1' also the pages linked to it (always within the same web site) and
so on.