Bayes Junk Tool 
Download
Bayes Junk Tool

| Description | Version | Size | Requirements |
|---|---|---|---|
| Binary-only maintenance release 0.2.1 | 0.2.1 | 39k | Java Runtime Environment, version 1.4 or higher |
| Source-only download | 0.2 | 489k | Java Development Kit, version 1.4 or higher |
| Source and binary (Java Class) download | 0.2 | 533k | Java Runtime Environment, version 1.4 or higher |
| Source-only download | 0.1 | 34k | Java Development Kit, version 1.4 or higher |
| Source and binary (Java Class) download | 0.1 | 98k | Java Runtime Environment, version 1.4 or higher |
Sample XML and DAT Token Files

| Description | XML file | XML size | DAT file | DAT size |
|---|---|---|---|---|
| This is a copy of the tokens from my training.dat. Only tokens which had at least 20 good or bad hits were included. I seem to get a high ratio of Spanish and Portuguese spam, so it may filter those out more readily than other types of spam (for example, Chinese). | straxus.xml | 210k | straxus.dat | 66k |
| This is a copy of the tokens from Rob Stow's training.dat. Only tokens which had at least 5 good or bad hits were included. It is optimized for catching "get rich quick" schemes, porn, and some email viruses. | robstow.xml | 195k | robstow.dat | 55k |
| This is a copy of the tokens from Morten Hansen's training.dat. Only tokens which had at least 20 good or bad hits were included. It is not yet known what kinds of spam this token file is optimized for. | mhansen.xml | 1032k | mhansen.dat | 311k |
| This is a copy of the tokens from Dmitry Diskin's training.dat. Only tokens which had at least 5 good or bad hits were included. It is optimized for catching Russian spam. | ddiskin.xml | 129k | ddiskin.dat | 41k |
| This is a copy of the tokens from Christian Hamacher's training.dat. Only tokens which had at least 20 good or bad hits were included. It is optimized for allowing German emails and English emails with technical terms while eliminating most HTML spam. | chamacher.xml | 276k | chamacher.dat | 90k |
| This is a copy of the tokens from Jan Gundtofte-Bruun's training.dat. Only tokens which had at least 5 good or bad hits were included. It is optimized for catching mostly English spam (mortgages, pills, loans, etc.). | jangb.xml | 175k | jangb.dat | 50k |
| This is a copy of the tokens from Oliver Putz's training.dat. Only tokens which had at least 20 good or bad hits were included. It is optimized for allowing German emails. | oputz.xml | 1038k | oputz.dat | 316k |
| This is a copy of the tokens from Will Smith's training.dat. Only tokens which had at least 20 good or bad hits were included. It is optimized for allowing emails that a busy webmaster would receive (such as cron job output, statistics, security notices, Wikipedia changes, and emails relating to open source software) as well as eBay auction notices, while rejecting most English and Chinese spam. | wsmith.xml | 894k | wsmith.dat | 275k |
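Each sample file above keeps only tokens with at least 5 or 20 good or bad hits. The sketch below shows that kind of trimming on an in-memory token table. It is a minimal illustration, not the Bayes Junk Tool's own code: the `TokenFilter` and `TokenCounts` names are hypothetical, it assumes Java 7 or later, and it assumes "at least N good or bad hits" means the combined good + bad count (a comment shows the other reading).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: trims a token table to tokens with enough hits.
 * TokenCounts and the in-memory map are assumptions, not the tool's classes.
 */
public class TokenFilter {

    /** Hypothetical per-token counters: times seen in good mail and in junk mail. */
    static final class TokenCounts {
        final int good;
        final int bad;

        TokenCounts(int good, int bad) {
            this.good = good;
            this.bad = bad;
        }
    }

    /**
     * Keeps tokens whose combined good + bad count is at least minHits.
     * Assumption: for the "either count" reading, test
     * (c.good >= minHits || c.bad >= minHits) instead.
     */
    static Map<String, TokenCounts> filterByHits(Map<String, TokenCounts> tokens, int minHits) {
        Map<String, TokenCounts> kept = new LinkedHashMap<>();
        for (Map.Entry<String, TokenCounts> e : tokens.entrySet()) {
            TokenCounts c = e.getValue();
            if (c.good + c.bad >= minHits) {
                kept.put(e.getKey(), c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up example counts, insertion order preserved by LinkedHashMap.
        Map<String, TokenCounts> tokens = new LinkedHashMap<>();
        tokens.put("mortgage", new TokenCounts(1, 42));
        tokens.put("cron", new TokenCounts(30, 0));
        tokens.put("hapax", new TokenCounts(2, 1));

        Map<String, TokenCounts> kept = filterByHits(tokens, 20);
        System.out.println(kept.keySet()); // prints [mortgage, cron]
    }
}
```

Dropping rarely seen tokens keeps the shared files small, and those tokens carry little statistical weight anyway.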
To merge one of the sample token files with your own training.dat, please do the following:
- Start up the Bayes Junk Tool in GUI mode (-g command-line switch)
- Under the File menu, select "Import and Merge..." (or press Ctrl-I)
- Select the XML or dat file which was downloaded, and press OK. This will merge the selected file into the existing set of tokens. Please be patient with XML files; importing them may take a little while (see bug 3947).
- Select "Save As..." from the File menu (or press Ctrl-S) and save as a Data file. Name the file training.dat.
- When Mozilla is fully closed (including QuickLaunch), copy the saved training.dat over the existing training.dat in your Mozilla profile folder. It is always wise to back up profile files before overwriting them.
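The last step can also be scripted. This is a minimal sketch, assuming Java 7 or later for `java.nio.file` (newer than the Java 1.4 the tool itself needs): it copies the existing training.dat to a backup and then copies the saved file over it. The class name and the `training.dat.bak` backup name are hypothetical choices, and it should only be run while Mozilla, including QuickLaunch, is fully closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper: backs up the profile's training.dat, then replaces it
 * with a newly saved one. Run only while Mozilla is fully closed.
 */
public class ReplaceTrainingData {

    /** Copies savedFile over profileDir/training.dat, keeping the old file as training.dat.bak. */
    static Path replaceWithBackup(Path savedFile, Path profileDir) throws IOException {
        Path target = profileDir.resolve("training.dat");
        Path backup = profileDir.resolve("training.dat.bak"); // hypothetical backup name
        if (Files.exists(target)) {
            // Keep a copy of the current training data before overwriting it.
            Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.copy(savedFile, target, StandardCopyOption.REPLACE_EXISTING);
        return backup;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ReplaceTrainingData <saved training.dat> <profile folder>");
            System.exit(1);
        }
        Path backup = replaceWithBackup(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Replaced training.dat; previous copy (if any) is at " + backup);
    }
}
```

For example, `java ReplaceTrainingData training.dat /path/to/profile`, with both arguments replaced by your own saved file and Mozilla profile folder.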
You should notice an immediate increase in your Junk Mail filter's effectiveness.
If you would like to share either your training.dat or your exported XML training file so that others can benefit from it, please email it to me at straxus@baynet.net. Note that if you use the email link, you will need to remove everything in front of "straxus" at the beginning of the address. I will add instructions about how to create a "nice" XML token file later, but for now feel free to send me what you have.