mozdev.org

Bayes Junk Tool   

| Description | Version | Size | Requirements |
|---|---|---|---|
| Binary-only maintenance release 0.2.1 | 0.2.1 | 39k | Java Runtime Environment (JRE), version 1.4 or higher |
| Source-only download | 0.2 | 489k | Java Development Kit (JDK), version 1.4 or higher |
| Source and binary (Java Class) download | 0.2 | 533k | Java Runtime Environment (JRE), version 1.4 or higher |
| Source-only download | 0.1 | 34k | Java Development Kit (JDK), version 1.4 or higher |
| Source and binary (Java Class) download | 0.1 | 98k | Java Runtime Environment (JRE), version 1.4 or higher |


Sample XML and DAT Token Files

| Description | Size | File name |
|---|---|---|
| This is a copy of the tokens from my training.dat. Only tokens with at least 20 good or bad hits were included. I seem to get a high ratio of Spanish and Portuguese spam, so it may filter those out more readily than other types of spam (for example, Chinese). | 210k | straxus.xml |
| | 66k | straxus.dat |
| This is a copy of the tokens from Rob Stow's training.dat. Only tokens with at least 5 good or bad hits were included. It is optimized for "get rich quick" schemes, pornography, and some email viruses. | 195k | robstow.xml |
| | 55k | robstow.dat |
| This is a copy of the tokens from Morten Hansen's training.dat. Only tokens with at least 20 good or bad hits were included. It is not yet known what kinds of spam this token file is optimized for. | 1032k | mhansen.xml |
| | 311k | mhansen.dat |
| This is a copy of the tokens from Dmitry Diskin's training.dat. Only tokens with at least 5 good or bad hits were included. It is optimized for Russian spam. | 129k | ddiskin.xml |
| | 41k | ddiskin.dat |
| This is a copy of the tokens from Christian Hamacher's training.dat. Only tokens with at least 20 good or bad hits were included. It is optimized for allowing German emails and English emails with technical terms while eliminating most HTML spam. | 276k | chamacher.xml |
| | 90k | chamacher.dat |
| This is a copy of the tokens from Jan Gundtofte-Bruun's training.dat. Only tokens with at least 5 good or bad hits were included. It is optimized for mostly English spam (mortgages, pills, loans, etc.). | 175k | jangb.xml |
| | 50k | jangb.dat |
| This is a copy of the tokens from Oliver Putz's training.dat. Only tokens with at least 20 good or bad hits were included. It is optimized for allowing German emails. | 1038k | oputz.xml |
| | 316k | oputz.dat |
| This is a copy of the tokens from Will Smith's training.dat. Only tokens with at least 20 good or bad hits were included. It is optimized for allowing emails that a busy webmaster would receive (such as cron job output, statistics, security notices, Wikipedia changes, and emails relating to open source software) as well as eBay auction notices, while rejecting most English and Chinese spam. | 894k | wsmith.xml |
| | 275k | wsmith.dat |
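
The "at least N good or bad hits" cutoff used for the sample files above can be pictured with a short sketch. It assumes the tokens are held in memory as a mapping from token text to a (good, bad) pair of hit counts, and that a token qualifies when either count alone reaches the threshold. Both the data layout and that reading of the rule are assumptions for illustration, not the tool's actual code; `filter_tokens` and the sample tokens are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical in-memory layout: token text -> (good_hits, bad_hits).
TokenCounts = Dict[str, Tuple[int, int]]


def filter_tokens(tokens: TokenCounts, min_hits: int) -> TokenCounts:
    """Keep tokens whose good or bad hit count is at least min_hits.

    Assumption: "at least N good or bad hits" means either count on its
    own reaches N, not the sum of the two.
    """
    return {
        token: (good, bad)
        for token, (good, bad) in tokens.items()
        if good >= min_hits or bad >= min_hits
    }


if __name__ == "__main__":
    sample = {
        "mortgage": (0, 42),  # seen often in junk mail
        "cron": (31, 0),      # seen often in good mail
        "hello": (7, 9),      # below a cutoff of 20 on both counts
    }
    print(sorted(filter_tokens(sample, 20)))  # prints ['cron', 'mortgage']
```

With a cutoff of 5 instead of 20, all three sample tokens would be kept, which is why the lower-threshold files tend to carry more rarely seen tokens.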

To merge one of the sample token files with your own training.dat, please do the following:

  1. Start the Bayes Junk Tool in GUI mode (-g command-line switch).
  2. Under the File menu, select "Import and Merge..." (or press Ctrl-I).
  3. Select the XML or DAT file you downloaded and press OK. This merges the selected file into the existing set of tokens. Please be patient with XML files; importing them may take a little while (see bug 3947).
  4. Select "Save As..." from the File menu (or press Ctrl-S) and save as a Data file. Name the file training.dat.
  5. Once Mozilla is fully closed (including QuickLaunch), copy the saved training.dat over the existing training.dat in your Mozilla profile folder. Back up the existing file first; it is always wise to make backups before overwriting profile files.
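
The merge in step 3 can be pictured with a small sketch. It assumes that merging means adding the imported good and bad hit counts to your own counts for each token, and that tokens are held as a mapping from token text to a (good, bad) pair. This is an illustration of the idea under those assumptions, not the Bayes Junk Tool's actual implementation; `merge_tokens` is a hypothetical name.

```python
from typing import Dict, Tuple

# Hypothetical in-memory layout: token text -> (good_hits, bad_hits).
TokenCounts = Dict[str, Tuple[int, int]]


def merge_tokens(own: TokenCounts, imported: TokenCounts) -> TokenCounts:
    """Return a new token set with the imported counts added to own.

    Assumption: a merge sums the hit counts per token. Tokens that appear
    in only one of the two sets are carried over unchanged.
    """
    merged = dict(own)  # copy so the caller's data is left untouched
    for token, (good, bad) in imported.items():
        own_good, own_bad = merged.get(token, (0, 0))
        merged[token] = (own_good + good, own_bad + bad)
    return merged
```

For example, merging `{"loan": (1, 2)}` with `{"loan": (0, 30), "pills": (0, 25)}` gives `{"loan": (1, 32), "pills": (0, 25)}` under this reading.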

You should notice an immediate increase in your Junk Mail filter's effectiveness.
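
The backup-then-copy in step 5 can also be scripted. The sketch below is one way to do it in Python, assuming you already know the path of your Mozilla profile folder and of the training.dat you saved from the tool. The function name `install_training_file` and the `training.dat.bak` backup name are hypothetical choices, and the script does not check whether Mozilla is closed, so that remains a manual step.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_training_file(new_file: Path, profile_dir: Path) -> Optional[Path]:
    """Back up the profile's training.dat, then copy new_file over it.

    Returns the path of the backup, or None if the profile had no
    training.dat to back up. Run this only while Mozilla is fully closed.
    """
    target = profile_dir / "training.dat"
    backup = profile_dir / "training.dat.bak"  # hypothetical backup name

    made_backup = target.exists()
    if made_backup:
        shutil.copy2(target, backup)  # keep the old tokens recoverable

    shutil.copy2(new_file, target)  # overwrite with the merged file
    return backup if made_backup else None


# Example call (both paths are placeholders for your own locations):
# install_training_file(Path("merged/training.dat"), Path("path/to/profile"))
```

If the filter behaves worse after the swap, copying `training.dat.bak` back over `training.dat` (again with Mozilla closed) restores the previous state.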

If you would like to share either your training.dat or your exported XML training file so that others can benefit from it, please email it to me at straxus@baynet.net. Note that when using this link you will need to remove everything except "straxus" from the beginning of the email address. I will add instructions on how to create a "nice" XML token file later, but for now feel free to send me what you have.

The bayesjunktool project can be contacted through the mailing list or the member list.