./mail/OSBF-lua, Lua C module for text classification

[ CVSweb ] [ Homepage ] [ RSS ] [ Required by ] [ Add to tracker ]


Branch: CURRENT, Version: 2.0.4nb6, Package name: osbf-lua-2.0.4nb6, Maintainer: pkgsrc-users

OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
for text classification. It is a port of the OSBF classifier implemented in
the CRM114 project. This implementation attempts to put focus on the
classification task itself by using Lua as the scripting language, a powerful
yet light-weight and fast language, which makes it easier to build and test
more elaborated filters and training methods.

The OSBF algorithm is a typical Bayesian classifier but enhanced with two
techniques originally developed for the CRM114 project: Orthogonal Sparse
Bigrams - OSB, for feature extraction, and Exponential Differential Document
Count - EDDC (a.k.a Confidence Factor), for automatic feature selection.
Combined, these two techniques produce a highly accurate classifier. OSBF
was developed focused on two classes, SPAM and NON-SPAM, so the performance
for more than two classes may not be the same.

spamfilter.lua is an anti-spam filter written in Lua using the OSBF-lua
module. It takes special advantage of EDDC to introduce TONE-HR, a highly
effective training method. The combination of OSB, EDDC and TONE-HR to
enhance a classical Bayesian classifier resulted in the best spam filtering
performance in TREC's Spam Track 2006 and the CEAS 2008 Live Spam Filter
Challenge.


Required to run:
[lang/lua52]

Required to build:
[pkgtools/cwrappers]

Master sites:

Filesize: 80.413 KB

Version history: (Expand)


CVS history: (Expand)


   2021-10-26 12:54:34 by Nia Alarie | Files touched by this commit (356)
Log message:
mail: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

The following distfiles were unfetchable (possibly fetched
conditionally?):

./mail/qmail/distinfo netqmail-1.05-TAI-leapsecs.patch
   2021-10-07 16:25:52 by Nia Alarie | Files touched by this commit (357)
Log message:
mail: Remove SHA1 hashes for distfiles
   2020-04-28 11:45:26 by Thomas Klausner | Files touched by this commit (1)
Log message:
OSBF-lua: limit to lua 5.2
   2020-04-14 14:47:12 by Thomas Klausner | Files touched by this commit (1)
Log message:
OSBF-lua: drop maintainership.
   2018-07-04 15:40:45 by Jonathan Perkin | Files touched by this commit (423)
Log message:
*: Move SUBST_STAGE from post-patch to pre-configure

Performing substitutions during post-patch breaks tools such as mkpatches,
making it very difficult to regenerate correct patches after making changes,
and often leading to substituted string replacements being committed.
   2015-11-04 00:27:24 by Alistair G. Crooks | Files touched by this commit (312)
Log message:
Add SHA512 digests for distfiles for mail category

Problems found locating distfiles:
	Package mutt: missing distfile patch-1.5.24.rr.compressed.gz
	Package p5-Email-Valid: missing distfile Email-Valid-1.198.tar.gz
	Package pine: missing distfile fancy.patch.gz
	Package postgrey: missing distfile targrey-0.31-postgrey-1.34.patch
	Package qmail: missing distfile badrcptto.patch
	Package qmail: missing distfile outgoingip.patch
	Package qmail: missing distfile qmail-1.03-realrcptto-2006.12.10.patch
	Package qmail: missing distfile qmail-smtpd-viruscan-1.3.patch
	Package thunderbird24: missing distfile enigmail-1.7.2.tar.gz
	Package thunderbird31: missing distfile enigmail-1.7.2.tar.gz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.
   2014-10-20 00:27:48 by Alexander Nasonov | Files touched by this commit (59) | Package updated
Log message:
Revbump after lang/lua51 update.
   2014-05-03 15:01:25 by Alexander Nasonov | Files touched by this commit (33)
Log message:
Adapt to Lua multiversion support.