./mail/OSBF-lua, Lua C module for text classification

Branch: CURRENT, Version: 2.0.4nb6, Package name: osbf-lua-2.0.4nb6, Maintainer: wiz

OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module
for text classification. It is a port of the OSBF classifier implemented in
the CRM114 project. This implementation focuses on the classification task
itself by using Lua as its scripting language. Lua is powerful yet lightweight
and fast, which makes it easier to build and test more elaborate filters and
training methods.

The OSBF algorithm is a typical Bayesian classifier enhanced with two
techniques originally developed for the CRM114 project: Orthogonal Sparse
Bigrams (OSB) for feature extraction, and Exponential Differential Document
Count (EDDC, a.k.a. Confidence Factor) for automatic feature selection.
Combined, these two techniques produce a highly accurate classifier. OSBF
was developed with two classes in mind, SPAM and NON-SPAM, so performance
with more than two classes may differ.
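
As a rough illustration of the OSB feature extraction step mentioned above,
the sketch below pairs each token with each of the next few tokens and
records how many tokens were skipped in between. It is not taken from the
OSBF-Lua sources: the function name osb_features, the whitespace
tokenization, and the default window of five tokens are assumptions made
for this example only.

```python
def osb_features(tokens, window=5):
    """Return sparse bigrams as (first_token, skipped_count, second_token).

    Illustrative sketch only; `window` (assumed to be 5 here) is the number
    of tokens spanned, so each token is paired with up to window - 1
    following tokens.
    """
    features = []
    for i, first in enumerate(tokens):
        for skipped in range(window - 1):
            j = i + 1 + skipped
            if j >= len(tokens):
                break  # ran off the end of the text
            # The skipped count keeps "a b" distinct from "a <skip> b".
            features.append((first, skipped, tokens[j]))
    return features


if __name__ == "__main__":
    for feature in osb_features("buy cheap pills online now".split()):
        print(feature)
```

In a classifier of this kind, each such feature would typically be hashed and
counted per class; that part, along with the EDDC weighting, is left out of
the sketch.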

spamfilter.lua is an anti-spam filter written in Lua using the OSBF-lua
module. It takes special advantage of EDDC to introduce TONE-HR, a highly
effective training method. The combination of OSB, EDDC and TONE-HR to
enhance a classical Bayesian classifier resulted in the best spam filtering
performance in TREC's Spam Track 2006 and the CEAS 2008 Live Spam Filter
Challenge.


Required to run:
[lang/lua52]

Master sites:

SHA1: 6fd4fb6496c20e9340cdcff4820c50a793e2ea27
RMD160: ba808072739de2bcb40ce81f0177ef7588508670
Filesize: 80.413 KB

CVS history:

   2015-11-04 00:27:24 by Alistair G. Crooks | Files touched by this commit (312)
Log message:
Add SHA512 digests for distfiles for mail category

Problems found locating distfiles:
	Package mutt: missing distfile patch-1.5.24.rr.compressed.gz
	Package p5-Email-Valid: missing distfile Email-Valid-1.198.tar.gz
	Package pine: missing distfile fancy.patch.gz
	Package postgrey: missing distfile targrey-0.31-postgrey-1.34.patch
	Package qmail: missing distfile badrcptto.patch
	Package qmail: missing distfile outgoingip.patch
	Package qmail: missing distfile qmail-1.03-realrcptto-2006.12.10.patch
	Package qmail: missing distfile qmail-smtpd-viruscan-1.3.patch
	Package thunderbird24: missing distfile enigmail-1.7.2.tar.gz
	Package thunderbird31: missing distfile enigmail-1.7.2.tar.gz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.

   2014-10-20 00:27:48 by Alexander Nasonov | Files touched by this commit (59) | Package updated
Log message:
Revbump after lang/lua51 update.

   2014-05-03 15:01:25 by Alexander Nasonov | Files touched by this commit (33)
Log message:
Adapt to Lua multiversion support.

   2013-10-30 07:49:55 by David A. Holland | Files touched by this commit (34) | Package updated
Log message:
Bump PKGREVISION of packages whose Lua depends changed form, but whose
own PKGNAME is unchanged.

   2013-10-11 22:25:34 by Thomas Klausner | Files touched by this commit (5) | Package updated
Log message:
More fixes for lua-5.2. From https://github.com/arunpersaud/osbf-lua
with the help of John R. Shannon.
Bump PKGREVISION.

   2013-07-05 16:31:40 by Thomas Klausner | Files touched by this commit (2)
Log message:
Fix build with lua-5.2.

   2013-07-04 23:27:59 by Adam Ciarcinski | Files touched by this commit (44)
Log message:
Revbump after updating lang/lua to 5.2.2.

   2013-02-01 23:25:26 by Thomas Klausner | Files touched by this commit (2)
Log message:
Pick up maintainership.