./www/anubis, Proof-of-work check to stop AI bots



Branch: CURRENT, Version: 1.18.0, Package name: anubis-1.18.0, Maintainer: bsiegert

Anubis weighs the soul of your connection using a sha256 proof-of-work
challenge in order to protect upstream resources from scraper bots.

Installing and using this will likely result in your website not being
indexed by some search engines. This is considered a feature of Anubis,
not a bug.

This is a bit of a nuclear response, but AI scraper bots scraping so
aggressively have forced my hand. I hate that I have to do this, but
this is what we get for the modern Internet because bots don't conform
to standards like robots.txt, even when they claim to.

In most cases, you should not need this and can probably get by using
Cloudflare to protect a given origin. However, for circumstances where
you can't or won't use Cloudflare, Anubis is there for you.


Filesize: 770.734 KB


CVS history:


   2025-05-13 19:33:14 by Benny Siegert | Files touched by this commit (3) | Package updated
Log message:
anubis: update to 1.18.0

v1.18.0: Varis zos Galvus

The big ticket feature in this release is CEL expression matching
support. This allows you to tailor your approach for the individual
services you are protecting.

These can be as simple as:

- name: allow-api-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'

Or as complicated as:

- name: allow-git-clients
  action: ALLOW
  expression:
    all:
      - >-
        (
          userAgent.startsWith("git/") ||
          userAgent.contains("libgit") ||
          userAgent.startsWith("go-git") ||
          userAgent.startsWith("JGit/") ||
          userAgent.startsWith("JGit-")
        )
      - '"Git-Protocol" in headers'
      - 'headers["Git-Protocol"] == "version=2"'

The docs have more information.  This is a simple, lovable, and complete
implementation of this feature so that administrators can get hacking
ASAP.

Other changes:

-   Use CSS variables to deduplicate styles
-   Fixed native packages not containing the stdlib and botPolicies.yaml
-   Change import syntax to allow multi-level imports (sketch below)
-   Changed the startup logging to use JSON formatting as all the other
    logs do.
-   Added the ability to do expression matching with CEL
-   Add a warning for clients that don't store cookies
-   Disable Open Graph passthrough by default
-   Clarify the license of the mascot images
-   Started suppressing 'Context canceled' errors from http in the logs
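
As a quick sketch of how the import mechanism can look in
botPolicies.yaml: the "(data)" prefix refers to the bundled snippet
library mentioned above, and the file paths shown here are illustrative
rather than verbatim, so check the Anubis docs for the exact names.

bots:
  # import a snippet from the library bundled with Anubis
  - import: (data)/bots/ai-robots-txt.yaml
  # imports can also point at local files, and an imported file may
  # itself contain further imports (multi-level)
  - import: /usr/pkg/etc/anubis/site-overrides.yaml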

v1.17.0: Asahi sas Brutus

v1.17.0 is a rather large release. A feature release of this size will
not happen again, as it has caused significant problems with testing in
various configurations. Automated testing is being worked on, but I
have nothing to report yet.

Big-ticket features include but are not limited to:

-   Configuration can be in YAML or JSON
-   Configuration snippets can be imported from the default library or
    anywhere on the filesystem
-   Default rules now flag the "Opera" user agent, after an attack
    using it was seen in the wild
-   Many documentation and build script fixes
-   AI-robots.txt rules are added to the default config to stop the
    worst offenders that care to identify themselves
-   Apache, Nginx, and Traefik have gotten documentation
-   Users can match by headers as well as user agents or paths (see
    the sketch after this list)
-   Internal refactoring to make Anubis faster and easier to maintain
-   "Secondary screening" has been removed to give a more consistent
    user experience
-   The Internet Archive is allowlisted by default
-   X-Forwarded-For header calculation should be a bit better
-   Subpath support (run anubis on /git)
-   Many implicit things have been documented
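
To illustrate header matching, a hedged sketch in the same
botPolicies.yaml format; the headers_regex key name and the
AND-combination of matchers follow my reading of the upstream docs, so
treat them as assumptions and verify against the current documentation.

bots:
  - name: deny-fake-curl
    action: DENY
    # both matchers must hold for the rule to apply
    user_agent_regex: "^curl/"
    headers_regex:
      # real curl does not send browser fetch-metadata headers, so this
      # combination looks suspicious (illustrative heuristic only)
      Sec-Fetch-Mode: ".*"
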
   2025-05-09 21:16:40 by Benny Siegert | Files touched by this commit (5)
Log message:
New package, www/anubis.

Anubis weighs the soul of your connection using a sha256 proof-of-work
challenge in order to protect upstream resources from scraper bots.

Installing and using this will likely result in your website not being
indexed by some search engines. This is considered a feature of Anubis,
not a bug.

This is a bit of a nuclear response, but AI scraper bots scraping so
aggressively have forced my hand. I hate that I have to do this, but
this is what we get for the modern Internet because bots don't conform
to standards like robots.txt, even when they claim to.

In most cases, you should not need this and can probably get by using
Cloudflare to protect a given origin. However, for circumstances where
you can't or won't use Cloudflare, Anubis is there for you.