Subject: CVS commit: pkgsrc/mail/spamassassin
From: Greg Troxel
Date: 2022-12-17 15:29:34
Message id: 20221217142934.2615DFA90@cvs.NetBSD.org

Log Message:
mail/spamassassin: Update to 4.0.0

Tested on NetBSD 9 amd64 with postfix and spamass-milter.  NB the rule
renaming below and the modified init.pre, which will not be merged by
updating if init.pre is locally modifed.

Upstream Release Notes:

Introduction
------------

Apache SpamAssassin 4.0.0 contains numerous tweaks and bug fixes over
the past releases. In particular, it includes major changes that
significantly improve the handling of text in international language.

As with any major release, there are countless functional patches and
improvements to upgrade to 4.0.0. Apache SpamAssassin 4.0.0 includes
several years of fixes that significantly improve classification and
performance. It has been thoroughly tested in production systems. We
strongly recommend upgrading as soon as possible.

Notable features:
=================

New plugins
-----------

There are three new plugins added with this release:

#1 Mail::SpamAssassin::Plugin::ExtractText

This plugin uses external tools to extract text from message parts,
and then sets the text as the rendered part. All SpamAssassin rules
that apply to the rendered part will run on the extracted text as
well.

#2 Mail::SpamAssassin::Plugin::DMARC

This plugin checks if emails match DMARC policy after parsing DKIM and
SPF results.

#3 Mail::SpamAssassin::Plugin::DecodeShortURLs

This plugin looks for URLs shortened by a list of URL shortening
services. Upon finding a matching URL, plugin will send a HTTP request
to the shortening service and retrieve the Location-header which
points to the actual shortened URL. It then adds this URL to the list
of URIs extracted by SpamAssassin which can then be accessed by uri
rules and plugins such as URIDNSBL.

Removed plugin
--------------

HashCash module, formerly deprecated, has now been removed completely

Notable changes
---------------

This release includes fixes for the following:

  - Support for international text such as UTF-8 rules has been
     completed and significantly improved to include native UTF-8
     processing

  - Bayes plugin has been improved to skip common words aka noise
     words written in languages other than English

  - OLEVBMacro plugin has been improved in order to detect more
     Microsoft Office macros and dangerous content. It has also been
     improved to extract URIs from Office documents for automatic
     inclusion in rules such as RBL lookups.

  - You can now use Captured Tags to use tags “captured” in one rule
     inside other rules

  - sa-update(1) tool has been improved with three new options:

    #1 forcemirror: forces sa-update to use a specific mirror server,

    #2 score-multiplier: adjust all scores from update channel by a
      given multiplier to quickly level set scores to match your
      preferred threshold

    #3 score-limit adjusts all scores from update channel over a
      specified limit to a new limit

* SSL client certificate support has been improved and made easier to
   implement with spamc/spamd

* DKIM plugin can now detect ARC signatures

* More work on improving the configuration and internal coding to use
  more inclusive and less divisive language

* spamc(1) speed has been improved when both SSL and compression are
  used

* The normalize_charset option is now enabled by default. NOTE: Rules
  should not expect specific non-UTF-8 or UTF-8 encoding in the body.
  Matching is done against the raw body, which may vary depending on
  normalize_charset setting and whether UTF-8 decoding was successful.

* Mail::SPF is now the only supported module used by the SPF plugin.

* Mail::SPF::Query use is deprecated, along with settings
  do_not_use_mail_spf, do_not_use_mail_spf_query.

 * SPF lookups are not done asynchronously and you may consider using
   an SPF filter at the MTA level (pypolicyd-spf / spf-engine / etc)
   which generates a Received-SPF header that can be parsed by
   SpamAssassin.

 * The default sa-update ruleset doesn't make ASN lookups or header
   additions anymore.  Configure desired methods (asn_use_geodb /
   asn_use_dns) and add_header clauses manually, as described in
   documentation for the Mail::SpamAssassin::Plugin::ASN.

New configuration options
-------------------------

All rules, functions, command line options and modules that contain
"whitelist" or "blacklist" have been renamed to \ 
"welcomelist" and
"blocklist" terms

Old options will continue to work for backwards compatibility until at
least the Apache SpamAssassin version 4.1.0 release

New tflag "nolog" added to hide info coming from rules in SpamAssassin
reports

New dns_options "nov4" and "nov6" added.
IMPORTANT:; You must set nov6 if your DNS resolver is filtering IPv6
AAAA replies.

Razor2 razor_fork option added. It will fork separate Razor2 process
and read in the results later asynchronously, increasing
throughput. When this is used, rule priorities are automatically
adjusted to -100.

Pyzor pyzor_fork option added. It will fork separate Pyzor process and
read in the results later asynchronously, increasing throughput. When
this is used, rule priorities are automatically adjusted to -100

urirhsbl and urirhssub rules now support "notrim" tflag, which forces
querying the full hostname, instead of trimmed domain

report_charset now defaults to UTF-8 which may change the rendering of
SpamAssassin reports

Notable Internal changes
------------------------

Meta rules no longer use priority values, they are evaluated
dynamically when the rules they depend on are finished

DNS and other asynchronous lookups like DCC or Razor2 plugins are now
launched when priority -100 is reached. This allows short circuiting
at lower priority without sending unneeded DNS queries

New internal Mail::SpamAssassin::GeoDB module supporting RelayCountry
and URILocalBL plugins provides a unified interface to Geographic IP
modules. These include:
    MaxMind::DB::Reader (GeoIP2)
    Geo::IP
    IP::Country::DB_File
    IP::Country::Fast.

Bayes and TxRep Message-ID tracking now uses a different hashing
method

Optimizations
-------------

Apache SpamAssassin 4.0.0 represents years of work by the project with
numerous improvements, new rule types, and internal native handling of
messages in international languages. These three key optimizations
will improve the efficiency of SpamAssassin:

    DNS queries are now done asynchronously for overall speed
    improvements

    DCC checks can now use dccifd asynchronously for improved throughput

    Pyzor and Razor fork use separate processes done asynchronously
    for increased throughput

Files:
RevisionActionfile
1.148modifypkgsrc/mail/spamassassin/Makefile
1.81modifypkgsrc/mail/spamassassin/distinfo
1.7modifypkgsrc/mail/spamassassin/patches/patch-Makefile.PL
1.3modifypkgsrc/mail/spamassassin/patches/patch-spamc_libspamc.c