C. Fischer | 7 Mar 12:46

anon cvs access?

what happened to ifile's CVS repository?

,----
| /src/ifile
| 0 p2 # cvs -z0 up
|  -> main loop with CVSROOT=:pserver:anoncvs@subversions.gnu.org:/cvsroot/ifile
|  -> Connecting to subversions.gnu.org(199.232.41.3):2401
| cvs [update aborted]: connect to subversions.gnu.org(199.232.41.3):2401 failed: Operation timed out
|  -> Lock_Cleanup()
`----

has something like the host or the organization of the repo changed?

  clemens
C. Fischer | 7 Mar 12:47

naive bayes algorithm in ifile?

another idea i'm toying with is making a (portable) standard-prolog
implementation of naive bayes for (email/usenet) text classification.  the
free prologs have improved much over the years, and i want to know if a prolog
implementation is fast enough.

given n categories, with t[i] tokens and m[i] messages in category i
(i in {1..n}), and for every token a record (age, c:i) with i in {1..n}, could
somebody please give a simple, english description of the algorithm needed to
classify a message?  i need to understand how token ageing can be used to keep
the database small, containing only the tokens that contribute the most to
classification and dropping the rest.

do i really need floating point operations or can i get away with integer
arithmetic?  could rational numbers be a better solution?
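
For readers who want something concrete alongside the question above, here is a minimal sketch of multinomial naive Bayes with add-one smoothing. It is a textbook formulation, not ifile's actual code; the function names `train` and `classify` and the data layout are assumptions made for illustration, and token ageing is not modelled at all.

```python
import math
from collections import Counter

def train(labelled_messages):
    """labelled_messages: iterable of (category, list_of_tokens) pairs."""
    token_counts = {}            # category -> Counter of token frequencies
    message_counts = Counter()   # category -> number of training messages
    for category, tokens in labelled_messages:
        token_counts.setdefault(category, Counter()).update(tokens)
        message_counts[category] += 1
    return token_counts, message_counts

def classify(tokens, token_counts, message_counts):
    """Return (category, log score) pairs, best category first."""
    vocabulary = set()
    for counts in token_counts.values():
        vocabulary.update(counts)
    total_messages = sum(message_counts.values())
    scores = {}
    for category, counts in token_counts.items():
        total_tokens = sum(counts.values())
        # Log prior: share of training messages filed under this category.
        score = math.log(message_counts[category] / total_messages)
        for token in tokens:
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            score += math.log((counts[token] + 1) / (total_tokens + len(vocabulary)))
        scores[category] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The sketch sums logarithms because a product of many small probabilities underflows quickly, which is why floating point is the usual choice. Exact rational arithmetic (for example Python's `fractions.Fraction`) would avoid the logarithms, at the cost of numerators and denominators that grow with message length.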

  clemens
C. Fischer | 7 Mar 12:48

usage of ifile's threshold option?

could somebody please give an example of using ifile's `-T' (--threshold)
option?  i want to know how to derive a specific number for it.

  clemens
C. Fischer | 6 Mar 13:58

ifile + MIME

these days, with complicated MIME messages and the applicability of ifile
to the anti-spam domain, it seems ifile should grok MIME.

<URL:http://www.ivarch.com/programs/qsf.shtml>

qsf has quite sensible MIME handling:  only MIME types "text/*" are
classified, with HTML tags stripped and proper qp and base64 decoding.

question:  would it suffice to delete matching "<...>" pairs, or do they
sometimes get escaped in some way, or is it legal to qp/base64 encode them?

i'm thinking of taking qsf's MIME parser and adding it to ifile's lexer.
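
To make the described behaviour concrete (classify only "text/*" parts, decode qp/base64, strip HTML tags), here is a rough sketch using Python's standard `email` and `html.parser` modules. It is not qsf's parser and not ifile code; `classifiable_text` and `strip_html` are names invented for this illustration. A real HTML parser is used rather than deleting "<...>" pairs, because entities such as `&lt;` and a `>` inside attribute values make plain pattern deletion unreliable.

```python
import email
from email import policy
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects character data and drops the tags around it."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)

def classifiable_text(raw_bytes):
    """Return the decoded text of every text/* part of a raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    pieces = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images, other attachments
        # get_content() undoes quoted-printable/base64 and the charset.
        text = part.get_content()
        if part.get_content_subtype() == "html":
            text = strip_html(text)
        pieces.append(text)
    return "\n".join(pieces)
```

Because the transfer encoding is undone before the tags are stripped, tags that arrive qp- or base64-encoded are handled the same way as literal ones.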

  clemens
Jason Rennie | 12 Dec 20:21

ifile 1.3.5 is out

Includes Derek Peschel's updates to configure.ac and argp/configure.in.

Jason
Bill Williamson | 6 Oct 07:38

space vs tab

I am using ifile in a "crazy style" auto-filter arrangement on my 
courier imap server (with procmail).  I have folders with spaces in 
their name, and currently do some odd substitution tricks to get it changed.

Attached is my procmailrc and my ifile.sh "autolearn" file which runs 
once every 15 minutes from cron.  Any suggestions on de-crazifying this 
would be helpful as well.

#!/bin/bash
IFS="
"
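# IFS is a newline here, so paths containing spaces survive the splitting of $(find ...).
for mail in $(find /home/batkiwi/Maildir -cmin -15 -path '*cur*' -type f | grep -v -e 'Trash' -e 'Sent' -e 'Spam')
do
	if [ -f "$mail" ]
	then

	# Strip everything up to Maildir/ and the trailing file name.
	folder=$(echo "$mail" | sed 's/.*Maildir\/\(.*\)\/.*/\1/')
	if [ "$folder" = "cur" ]
	then
		folder="INBOX"
	else
		# Drop the leading dot and the trailing /cur, then turn spaces into colons.
		folder=$(echo "$folder" | sed 's/\.\(.*\)\/cur/\1/' | sed 's/ /:/g')
	fi

	# Folder recorded in the X-IFile-Guess header, with spaces turned into colons.
	oldfolder=$(formail -z -x X-IFile-Guess < "$mail" | sed 's/ /:/g')
	subject=$(formail -z -x Subject < "$mail")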
	if [ "$oldfolder" = "$folder" ]

Paolo | 29 Apr 10:11

patch to add classifying threshold

Hi all,

seems that ifile dev stopped a while ago... anyone still using it?
well, for those interested, I've submitted a patch to add an 'in-between'
response, ie, from patch description:

"this patch adds a -T x (--threshold) option, such that when you classify 
and have at least 2 classes - eg spam,ham - for the top 2 ratings, 
if x>0 you get:

 ifile -T 20 -q msg

 Reading message from standard input...
 spam -78,04470873
 ham -79,89371347
 diff[spam,ham](%) 1,17

and get an 'in between' response for the -q -c mode, if 
R=(rating0-rating1)/(rating0+rating1), R*1000 < x:
            [note the added comment for this ^^^ in patch browser]

ifile -q -c -T 20 msg

 spam,ham

ie 20 means 2% threshold."
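
As a worked check of the formula quoted above, the following sketch recomputes the example's numbers. It only mirrors the arithmetic in the patch description, not the patch's source; the function names are made up, and both the absolute values and the handling of a zero denominator are assumptions (ifile's ratings in the example are negative, so the raw ratio would otherwise come out negative).

```python
def relative_diff(rating0: float, rating1: float) -> float:
    """R = |rating0 - rating1| / |rating0 + rating1| for the top two ratings."""
    denominator = abs(rating0 + rating1)
    if denominator == 0:
        # Assumption: treat ratings that sum to zero as an exact tie.
        return 0.0
    return abs(rating0 - rating1) / denominator

def is_in_between(rating0: float, rating1: float, threshold: float) -> bool:
    """True when R*1000 < threshold, i.e. both classes would be reported."""
    return relative_diff(rating0, rating1) * 1000 < threshold

spam, ham = -78.04470873, -79.89371347
print(f"{relative_diff(spam, ham) * 100:.2f}")  # 1.17, matching diff[spam,ham](%)
print(is_in_between(spam, ham, 20))             # True: 11.7 < 20
```

With `-T 10` the same message would not count as in between, since 11.7 is not below 10.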
So eg you'd use a procmail(1) rules snippet like:
...

:0w

Alexandra Walford | 28 Jan 00:25

Location of Ifile tarballs

Hi,

I'm just wondering where I can find the 1.3.3 source code these
days (the download area, savannah.nongnu.org/download/files, is
empty); 1.3.0 works fine for me but I'm in an upgrade-y mood :)

Thanks!

Alexandra

-- 
"It was a virgin forest, a place where the Hand of Man
  had never set foot."
_______________________________________________
Ifile-discuss mailing list
Ifile-discuss@nongnu.org
http://mail.nongnu.org/mailman/listinfo/ifile-discuss
Karl Vogel | 30 Dec 20:59

New .idata file for my spam collection

Greetings:

My spam collection has grown a bit:

       SpamAssassin collection:   1,897 messages
           www.spamarchive.org:   6,364 messages
        G. Taylor's collection:   2,313 messages
                 UK collection:     646 messages
    Bruce Guenter's collection:  45,018 messages
              Junk I've gotten:  38,190 messages
  -----------------------------------------------
                         Total:  94,428 messages

The average message size is around 5,700 bytes, so I only use the first
5k or so from each message when making new .idata files.  This reduces
the size without hurting accuracy; when I add my good mail to .idata,
I only see one or two spams sneak through per 1000 incoming messages,
and that's usually due to someone on my whitelist getting a courtesy
copy of the same junk.

My spam-only .idata file can be found here:
http://www.dnaco.net/~vogelke/Software/Internet/Servers/Mail/Spam/Ifile/

-- 
Karl Vogel                      I don't speak for the USAF or my company
vogelke at pobox dot com                   http://www.pobox.com/~vogelke

The colder the X-ray table, the more of your body is required on it.
Aleksandr Milewski | 17 Oct 22:10

iFile as help-desk front-end

I'm considering using ifile as a front-end for a helpdesk system, so 
ifile would sort inbound questions by subject.

It looks very promising, but I have a couple of questions.

1. What do the numbers reported by ifile -q really mean?
	I believe that for this system, simply giving up and routing to 
a human would be better than guessing wrong, so I'd like to have a 
"unknown" bin that collects the stuff that isn't matched well by 
ifile. I was under the impression that the numbers reported were a 
"quality of match" metric, but when nothing matches (feeding 
Jabberwocky to ifile when it's been trained on an OS X FAQ), ifile 
returns 0 for all categories. Is this a special case? If I get 
exactly zero, or some very negative number, should I assume the match 
is poor?

2. Does a tiered implementation make sense?
	I may have hundreds of bins in this system, and it occurred to 
me that I could create a system with multiple instances of ifile 
doing a tiered filtering scheme. Something like training the first 
instance on Mac vs. Windows, and letting it filter into those two 
bins. Each of those gets fed into a second filter that classifies 
more specifically.
	Is there any advantage to this approach, or am I better off 
letting ifile sort things out over a large number of bins?
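
The tiered scheme in question 2 can be sketched as below. This illustrates only the routing idea; it does not call ifile, and the `route` function and the classifier callables are hypothetical stand-ins for separately trained ifile instances.

```python
from typing import Callable, Dict

# A classifier is any callable that maps message text to a bin name.
Classifier = Callable[[str], str]

def route(message: str, top: Classifier, second: Dict[str, Classifier]) -> str:
    """Classify coarsely first, then refine with that bin's own classifier."""
    coarse = top(message)
    fine = second.get(coarse)
    if fine is None:
        # No second-tier classifier was trained for this bin.
        return coarse
    return f"{coarse}/{fine(message)}"
```

One property of this design is that a mistake in the first tier cannot be corrected by the second, so the coarse split needs to be the most reliable one.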

Thanks in advance,
	Zandr
-- 
---------------------------------------------------------------------

Xavier DUTOIT | 22 Sep 17:59

A few questions about ifile features compared to other classifiers

Dear all,

I've been looking at lots of Bayesian filtering OSS around, and yours seems to be one of the few multi-purpose ones (along with POPFile), as opposed to "simple" antispam tools. I think ifile is the most interesting one for what I want (a generic mail classifier on the server), but compared to the others (mostly spam-oriented) it seems less complete. I've seen some features that seem quite useful; could you tell me what you think about them (e.g. "I don't find them useful", "too complicated to include", "working on it", "since you've asked, try to add such a feature"...):

1) Storage based on a real database (Berkeley DB for instance) instead of your file format?
Do you think it would improve performance?

2) Mail parsing.
Features like recognizing and decoding MIME attachments in quoted-printable and base64 encoding, ignoring HTML tags in emails, handling things like V'I'A'G'R'A (randomly chosen example ;), scoring only the Received, Subject, To, From, and Cc headers...

Well, if I'm correct, the only thing you can do right now is either parse the header in full or ignore it. Although it is arguable where the mail parser code should go (should it live somewhere other than in ifile?), I feel that it is important to take the specifics of mail formats into account when analysing content.

Have you tried adapting the code written for bogofilter, for instance, to add such features to ifile? Do you think it's worth trying (I'm volunteering)?

3) A last thing about sorting accuracy.
I read on one page that some of you reached 96% accurate classification. How did you calculate that? Do you all get such a high percentage?



Thanks in advance,

Xavier

