Tracy Reed | 12 Apr 02:14

priolist.mfp regex problem

I have been successfully using priolist.mfp to match email addresses that I
want to black/whitelist for ages. Now I want to block on a particular phrase
such as:

yahoo messenger online now

which is from a particularly egregious sort of spam/scam my organization is
receiving. I have tried putting the following combinations, none of which have
worked:

-yahoo messenger online now
-yahoo\ messenger\ online\ now
-yahoo\smessenger\sonline\snow
-"yahoo messenger online now"

and probably various others which I cannot now reproduce. I really expected the
first one to work. I have finally stumbled upon a combination which does work:

-yahoo.messenger.online.now

Why would . work (I'm plenty familiar with regex and know it matches any
character) when neither \s nor a raw space does?

Is there some unintentional parsing being done on the whitespace I put in which
is breaking things?
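
For comparison, here is a minimal sketch of how the same patterns behave when a regex engine receives them intact. It uses Python's re module, not the TRE engine that CRM114 links against, so it is only an analogy. The sample Subject line is made up for the test. If the raw-space and \s forms match here but not from priolist.mfp, that would suggest the line is being split or trimmed before the regex is compiled, which is a guess rather than something confirmed by the code.

```python
import re

# Hypothetical sample text containing the phrase to block.
text = "Subject: yahoo messenger online now"

# Patterns as they would appear after the leading "-" in priolist.mfp.
patterns = [
    r"yahoo messenger online now",
    r"yahoo\ messenger\ online\ now",
    r"yahoo\smessenger\sonline\snow",
    r'"yahoo messenger online now"',
    r"yahoo.messenger.online.now",
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = {p: matches(p, text) for p in patterns}
for p, hit in results.items():
    print(f"{p!r:40} -> {hit}")
```

In this engine only the quoted form fails, because the literal quote characters are not present in the text.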

Thanks!

--

-- 
Tracy Reed

Steve Pellegrin | 29 Mar 03:55

libcrm and svm

Hi Bill,

It's been a long time since I last wrote to the list. I'm interested in playing around with libcrm and the SVM
classifier, but am a bit confused.

1. Where might I find the "best" version of the library? The link on the wiki seems pretty old. For me, "best"
means reliable enough for doing personal spam filtering.
2. The HowTo mentions CRM114_SVM, but I see that there is also CRM114_LIBSVM and that the two choices follow
different code paths in crm114_base. Which should I use, or why might I choose one vs. the other?

Best Regards,
   Steve

wsy | 26 Mar 19:52

Re: Help needed with a "wedged" CRM114 installation

Tracy Reed <treed@...> writes:

> On Mon, Mar 26, 2012 at 08:16:31AM -0400, wsy@... spake thusly:
>> A priolist file looks like this: a + or a -, then a space, then a
>> pattern.  A leading "+" means "whitelist", a leading "-" means
>> "blacklist".  As a regex, it looks like this:
>> 
>>   (\+|-) pattern
>> 
>> For example, to whitelist my lawyer, my doctor, and blacklist 
>> my ex-girlfriend:
>> 
>> + my_lawyer@...
>> + my_doctor@...
>> - my_ex_girlfriend@...
>> + my_new_girlfriend@...
>
> I notice you have a space between the +/- and the email address. Is that
> necessary? Are # allowed as comments? My understanding is that they can be the
> first char on a line as a comment. I recently had a bunch of comments at end of
> lines like:

I *think* there's supposed to be a space there, but it's been
so long since I touched the code that I can't remember!

I *think* that a column 1 '#' can be used as a comment.
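
To make the format above concrete, here is a minimal Python sketch of a reader for such a file. It is not the mailfilter/mailreaver code: the function names, the example.com addresses, and the decision to tolerate a missing space after the +/- are all assumptions for illustration. It treats only a column 1 '#' as a comment, so a trailing "# ..." would stay part of the pattern.

```python
import re


def parse_priolist(lines):
    """Return (action, compiled_regex) pairs in file order."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and lines with '#' in column 1.
        if not line.strip() or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:].strip()
        if sign not in "+-" or not pattern:
            continue  # skipping malformed lines is a choice made for this sketch
        action = "whitelist" if sign == "+" else "blacklist"
        entries.append((action, re.compile(pattern)))
    return entries


def first_match(entries, text):
    """Return the action of the first entry whose regex matches, else None."""
    for action, rx in entries:
        if rx.search(text):
            return action
    return None


# Hypothetical priolist contents.
sample = [
    "# people I trust\n",
    r"+ my_lawyer@example\.com" + "\n",
    r"- my_ex@example\.com" + "\n",
]
entries = parse_priolist(sample)
print(first_match(entries, "From: my_ex@example.com"))
```

Because entries are tested in file order, the first matching line decides the outcome.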

> - my_ex_girlfriend@... # Psycho, do NOT drunk dial!


Martin Lucina | 22 Mar 15:24

Re: Help needed with a "wedged" CRM114 installation

Hi Bill,

wsy@... said:
> Martin Lucina <martin@...> writes:
> 
> > Hi Bill,
> >
> > I've dropped the Cc: to the list, since even after confirming my
> > subscription to crm114-general it still seems to want moderator approval,
> > so I still can't post to the list.

Added the Cc: to the list back in, have now received email from the system
confirming that I'm subscribed.

> >
> >> I'm sorry, I must have missed something.
> >> 
> >> What was the question/issue?
> >
> > Here's my original email:
> >
> > Hello,
> >                             
> > I have been using CRM114, along with the Dovecot "antispam" plugin [1], to
> > handle my spam filtering for some years now.  Things have worked fairly
> > well, with CRM114 catching all SPAM after initial training, and classifying
> > 1-2 emails every few days as "unsure".
> >
> > I've always assumed that the fairly constant trickle of email being
> > classified as "unsure" was due to the fact that I receive most email in

Nico Kadel-Garcia | 28 Feb 04:22

Confidential information scanning backup files

I was discussing a security issue with my new employer today, about scanning backups of servers that should not have confidential data on them for precisely such data. In the short term, scanning the email would do, especially if attachments can be scanned. And I thought of CRM114 for the task, instead of the very slow and painful tools that are often used now.
 
Is there a toolkit for such scanning? I'd much prefer to avoid taking on the full integration project, but if anyone's already got such a toolkit assembled, even if it's a commercial toolkit, I'd love to review it for use at my new workplace.
_______________________________________________
Crm114-general mailing list
Crm114-general@...
https://lists.sourceforge.net/lists/listinfo/crm114-general
Jskud.CRM114 | 7 Feb 07:27

modest update to CLASSIFY_DETAILS.txt

Hi, All.

I'm back to looking at CRM114, and I decided to update some of the
documentation, working from "the (possibly) slightly unstable latest
mainline version" -- thanks for the continued good work, and I hope the 
updates prove useful.  This is the first.

/Jskud

--- crm114-20120205-ORIG/CLASSIFY_DETAILS.txt	2009-09-11 11:25:57.000000000 -0700
+++ crm114-20120205/CLASSIFY_DETAILS.txt	2012-02-06 07:49:00.000000000 -0800
@@ -18,7 +18,8 @@
 The current distribution builds in this set of classifiers.  The
 classifiers are:

-1) SBPH Markovian (the default) This is an extension of Bayesian
+1) SBPH Markovian (the default) - This classifier uses
+   Sparse Binary Polynomial Hashing (SBPH), an extension of Bayesian
    classification, mapping features in the input text into a Markov
    Random Field.  This turns each token in the input into 2^(N-1)
    features, which gives high accuracy but at high computation
@@ -52,7 +53,7 @@
    other classifiers.  It _will_ work against binary files, though,
    which none of the other classifiers will.

-5) Hyperspatial classification - this experimental classifier
+5) Hyperspatial classification - This experimental classifier
    tokenizes, but does not use Bayes law at all, nor statistical
    "clumping".  During learning, each example document generates a
    single point in a 4 billion dimensional hyperspace.  The
@@ -719,7 +720,7 @@

 

-The format of a SBPH or OSB Markovian .css file (and, for winnow a
+The format of a SBPH or OSB Markovian .css file (and, for Winnow a
 .cow file) is a 64-bit hash of a feature (whether the feature is a
 single word, a bigram, or a full SBPH does not matter) and a 32-bit
 representation of the value.  In .css files, the 32 bits is an

Jskud.CRM114 | 7 Feb 07:31

updates to CRM114_Mailfilter_HOWTO.txt

Hi, All.

I'm back to looking at CRM114, and I decided to update some of the
documentation, working from "the (possibly) slightly unstable latest
mainline version" -- thanks for the continued good work, and I hope the 
updates prove useful.  This is the second of two.

I mostly reworked the formatting to make it consistent, and made the
step titles consistent as well, fixing a few obvious typos in the
process.

/Jskud

--- crm114-20120205-ORIG/CRM114_Mailfilter_HOWTO.txt	2009-09-11 11:25:57.000000000 -0700
+++ crm114-20120205/CRM114_Mailfilter_HOWTO.txt	2012-02-06 19:52:36.000000000 -0800
@@ -7,7 +7,7 @@
 		The CRM114 & Mailfilter HOWTO

 		    -Bill Yerazunis, 2003-09-18
-			(last update 2009-03-02)
+			(last update 2012-02-06)

 
 This is the CRM114 Mailfilter HOWTO.  It describes how to set up CRM114
@@ -31,7 +31,7 @@

    ----------------------------------------------------------

-That said, I hope CRM114, Mailreaver, and Mailreaver is useful to you;
+That said, I hope CRM114, Mailfilter, and Mailreaver is useful to you;
 it's been very useful to me.  It's been keeping my mailbox clear of
 clutter for since 2002; I'm convinced it has better performance than
 I-the-human at killing spam without accidentally deleting important
@@ -64,12 +64,15 @@

 	     - Bill Yerazunis (wsy@...)

--------------------------------------------------------------------

-	Step 0:  Scientes Inamicae  (Know Thy Enemy)
+------------------------------------------------------------------------
+------------------------------------------------------------------------
+
+
+      Step 0: Scientes Inamicae  (Know Thy Enemy)

-These are the major steps in using CRM114 Mailfilter.  The steps are
-pretty simple:
+These are the other major steps in using CRM114 Mailfilter.  The steps
+are pretty simple:

       1) Downloading what you need

@@ -85,7 +88,7 @@
          (editing one file, most likely change is ONE line, and we tell
 	 you which one)

-      3) Setting up the needed auxilliary files
+      4) Setting up other needed files

 	 (not more than 2 files to edit of no more than 5 lines each,
 	 plus typing one or two commands)
@@ -115,10 +118,11 @@
 	 don't need to know this, but you may find it useful.

 
+------------------------------------------------------------------------
+------------------------------------------------------------------------

--------------------------------------------------------------------------

-                  Step 1: Downloading.
+	Step 1: Downloading What You Need

 Get yourself a copy of a CRM114 kit.  The kits can always be found by
 visiting the CRM114 homepage at:
@@ -168,16 +172,14 @@

 Download the kits you will need (at least one of .src.tar.gz or
 .i386.tar.gz or .i386.rpm) and then proceed to "Step 2: Setting Up the
-Executables"
-
-
-
---------------------------------------------------------------------------
+Executables".

 
+------------------------------------------------------------------------
+------------------------------------------------------------------------

 
-                       Step 2: Setting Up the Executables
+	Step 2: Setting Up the Executables

 In this step, you will install four binaries into your system.
 The four binaries are:
@@ -262,8 +264,9 @@

 
 Congratulations!  You've now completed the installation of CRM114 and
-utilities from prebuilt binaries.  Proceed to "Step 3: Setting Up Needed
-Files.
+utilities from prebuilt binaries.  Proceed to "Step 3: Configuring
+Mailfilter or Mailreaver".
+

   -----

@@ -382,14 +385,14 @@

 
 Congratulations!  You've now completed the installation of CRM114 and
-utilities from source.  Move on to the next step - "Step 3: Setting Up
-Your .CSS Files" .
-
+utilities from source.  Move on to the next step - "Step 3: Configuring
+Mailfilter or Mailreaver".

 
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------

+
 	Step 3: Configuring Mailfilter or Mailreaver

 In this step you will tell Mailfilter or MailReaver what you want it
@@ -500,11 +503,11 @@
 Now, proceed to "Step 4: Setting Up Other Needed Files" .

 
---------------------------------------------------------------------
---------------------------------------------------------------------
+------------------------------------------------------------------------
+------------------------------------------------------------------------

 
-		Step 4: Setting Up Other Needed Files
+	Step 4: Setting Up Other Needed Files

 Now that the crm114 language is working, you need to set up your
 .css files,  your rewrites.mfp file, and your priolist.mfp file.
@@ -512,8 +515,8 @@
 All of these files need to exist (either by being there, or by
 being symlinked to) the directory where CRM114 will "run in"
 when an actual mail comes in.  Usually this is your per-user
-directory on the mail server (if your mail server is also your
-home directory, then it's there.).    If this is inconvenient,
+directory on the mail server (if your mail server also provides your
+home directory, then it's there.).  If this is inconvenient,
 you can use the --fileprefix option on the command line to
 tell CRM114 to "change over" to a different directory.  The files
 that need to be in the home (or --fileprefix) directory are:
@@ -582,9 +585,9 @@
 whitelists", you can now say "yes, and they're even _prioritized_
 blacklists and whitelists!".

+  -----

-
-       Step 4 Part 1 - Setting up the Rewrites file.
+  Step 4 Part 1 - Setting up the Rewrites file.

 To set up the rewrites.mfp file, edit the file "rewrites.mfp" and
 replace the placeholders (in this case, "wsy", "merl.com", and
@@ -628,23 +631,23 @@
 router, etc, add lines in rewrites.mfp for each email name, email
 address, server, router, and so forth.  This is something you really
 _should_ do, if you have more than one email path leading to the
-account that leads to an account that is being filtered by CRM114 (if
+account that leads to an account that is being filtered by CRM114.  (If
 you don't, a lot of learning will have to be repeated for each path,
 which will cost you accuracy and use up valuable feature slots in the
 .css files that you could use in more valuable ways otherwise.  On the
 other hand, if you have multiple email addresses that all channel
-through one CRM114 fileset, and the addresses recieve very different
+through one CRM114 fileset, and the addresses receive very different
 ratios of spam and nonspam (or, very differnt *types* of spam), then
 it _might_ be to your advantage to not use rewrites.mfp, (just replace
 it with an empty file), so that the extra statistical information of
-the incoming email address is not lost)
+the incoming email address is not lost.)

 If all this confuses you to no end, just make rewrites.mfp be an
-empty file and everything should decently well.
+empty file and everything should work decently well.

-       -----
+  -----

-       Step 4 Part 2 - Setting up the .CSS files
+  Step 4 Part 2 - Setting up the .CSS files

 
 You have a choice here.  You can either build your own files from your
@@ -662,7 +665,7 @@

 If your mail service runs on your local machine (say, you have just
 one machine - and I do hope you have a firewall in that case), then
-mailfilter will almost certainly "run" in your home directory- the
+mailfilter will almost certainly "run" in your home directory - the
 directory you're in when you log in.

 If your mail service runs on a mail server (not your local machine),
@@ -695,7 +698,7 @@
 Once you have these empty files you will have a high (50% or so)
 error rate for the first few hours, till you have 'taught' CRM114
 what your particular mix of spam and nonspam looks like.  Proceed
-below to "Step 4: Configuring Mailfilter".
+below to "Step 5: Engaging Mailfilter".

 Many people want to "preload" their spam collection into CRM114.  This
 used to be a bad idea.  CRM114 is optimized for TOE learning - "Train
@@ -741,8 +744,8 @@

   -----

-    Step 4 Part 2 Method C - BETA TEST - Using mailtrainer.crm to
-    Build .CSS Files
+  Step 4 Part 2 Method C - BETA TEST -
+       Using mailtrainer.crm to Build .CSS Files

 New in 20060101 is the "mailtrainer.crm" program.  This program
 accepts two directories of "archetype" good and spam email, and runs
@@ -767,7 +770,7 @@

   -----

-      Step 4 Part 2 Method D - ALPHA TEST -- MAKEFILE Build And
+   Step 4 Part 2 Method D - ALPHA TEST -- MAKEFILE Build And
         Preload .CSS Files From Fresh Spam and Nonspam

  CAUTION - this applies ONLY to kits 20060606 and later!!!  DO NOT DO
@@ -816,7 +819,7 @@
 installs post 20060606 .  Versions prior to that will hose you if
 you do this.

- --------
+  -----

   Step 4 Part 3 - Checking your installation

@@ -893,6 +896,7 @@
 Note: this works fine for the default classifiers like Markov, OSB,
 and OSB Unique, but _not_ for Winnow, Hyperspace, or Corellative
 classifiers; for OSBF classifiers use osbf-util instead of cssutil.
+See ./CLASSIFY_DETAILS.txt for a description of the classifiers.

 Type in:

@@ -959,10 +963,12 @@
 there are similarities.  That's pretty much typical- and it's a good sign
 that your filtering should be quite accurate.

-Now, move on to "Step 4: Configuring Mailfilter".
+Now, move on to "Step 5: Engaging Mailfilter".
+
+
+------------------------------------------------------------------------
+------------------------------------------------------------------------

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

 	Step 5: Engaging Mailfilter

@@ -985,7 +991,7 @@

   -----

-      Step 5 Method A: For Procmail and Maildrop Users
+  Step 5 Method A: For Procmail and Maildrop Users

 For Procmail users just add a procmail recipe to .procmailrc to run
 CRM114 and mailfilter whenever your other procmail rules fail to
@@ -1011,7 +1017,8 @@
 To use mailreaver instead of mailfilter, just put "mailreaver.crm"
 in instead of "mailfilter.crm" .

-If you get the test message, proceed to "Step 6: Training CRM114".
+If you get the test message, proceed to "Step 6: Training CRM114 and
+Mailfilter".

 -----

@@ -1059,7 +1066,6 @@
 ----------------------------------------------------------------------------
 ----------------------------------------------------------------------------

-
 Advanced Topic: Huge Emails and Denial Of Service Avoidance

 CRM114 has a number of built-in anti-Denial-of-Service (anti-DoS)
@@ -1089,10 +1095,9 @@
    mail/crm-spam

 
-
   -----

-    Step 5 Method B: The .forward hook file
+  Step 5 Method B: The .forward hook file

 For .forward hook users you should be aware that you should NOT put a
 direct link to crm in /etc/smrsh; since crm can do arbitrary things,
@@ -1118,7 +1123,8 @@
   ----

 Once you have engaged CRM114 mailfilter, you now get to train it to
-recognize spam and nonspam.  Proceed to "Step 6: Training CRM114".
+recognize spam and nonspam.  Proceed to "Step 6: Training CRM114 and
+Mailfilter".

 Note: CRM114 contains a design decision that you may have to play
 with.  Instead of doing memory management games, which both consume
@@ -1138,7 +1144,9 @@
 a buffer-shuffling dance to minimize time spent reclaiming and
 compactifying memory.

----------------------------------------------------------------------------
+
+------------------------------------------------------------------------
+------------------------------------------------------------------------

 
 	Step 6: Training CRM114 and Mailfilter
@@ -1162,16 +1170,15 @@
 interchangeably here; the instructions say "mailfilter.crm" but
 mailreaver.crm works exactly the same way from the user point of view.

-   * Mail-to-Myself with In-Line Commands to retrain  (Method A)
-   * shell commands to retrain  (Method B)
-   * Mutt direct interface    (Method C)
-   * Some Other Interface    (Method D)
-
+   * Method A: mail-to-myself with in-line commands to retrain
+   * Method B: shell commands to retrain
+   * Method C: Mutt direct interface
+   * Method D: some other interface

 
-Whatever Way You Train : try to train _approximately_ equal amounts of spam and
-nonspam.  If you are within 50% one way or the other, performance will
-be very good.
+Whatever Way You Train: try to train _approximately_ equal amounts of
+spam and nonspam.  If you are within 50% one way or the other,
+performance will be very good.

 If you are running mailfilter.crm:

@@ -1198,8 +1205,9 @@
 error per thousand).

 
+  -----

-     Step 6 Method A: Mail-to-Myself
+  Step 6 Method A: Mail-to-Myself

 The first way is to use the in-line command feature.  Just forward
 the mistake back to yourself, with full headers (except edit out any
@@ -1247,25 +1255,24 @@
 If you are a mailreaver user, you also have a priority system you can
 access, either by editing your priolist.mfp file directly or by
 sending youself email in the following forms (where mypwd is the
-command passworda_regex_pattern is what will be used for priority
+command password, and a_regex_pattern is what will be used for priority
 matching.  Priority matches can occur in both the headers and body of
 the text.)

     command mypwd maxprio +a_regex_pattern      - sets a maximum priority GOOD
     command mypwd maxprio -a_regex_pattern      - sets a maximum priority SPAM
-    command mypwd minprio +a_regex_pattern      - sets a maximum priority GOOD
-    command mypwd minprio -a_regex_pattern      - sets a maximum priority SPAM
+    command mypwd minprio +a_regex_pattern      - sets a minimum priority GOOD
+    command mypwd minprio -a_regex_pattern      - sets a minimum priority SPAM
     command mypwd delprio a_regex_pattern       - deletes the first priority
-                                                 list entry that fully matches
-                                                 the regex pattern
-
-
+                                                  list entry that fully matches
+                                                  the regex pattern

 
+  -----

-    Step 6 Method B: Shell commands to retrain
+  Step 6 Method B: Shell commands to retrain

-   >> For mailfilter users (mailreaver is different - skip to below! <<
+   >> For mailfilter users (mailreaver is different - skip to below! <<)

 The second way to train in spam and nonspam is to use mailfilter.crm's
 shell command line options.  When you find a spam that was mistakenly
@@ -1286,7 +1293,7 @@
  [[ If you are using mailreaver.crm instead of mailfilter.crm, and
  cacheing is enabled, you don't even need to pipe in the full text in,
  all that's needed is either the intact X-CRM114-CacheID: line or the
- Message-ID line containing an intact sfid.  That's another reason to
+ Message-ID line containing an intact SFID.  That's another reason to
  switch to mailreaver! :) ]]

               >> For mailreaver.crm users <<
@@ -1294,7 +1301,7 @@
 You're in luck, assuming you have taken the default and left cacheing
 turned on.  All you need to pipe into mailreaver for training is any
 text or text fragment containing an intact X-CRM114-CacheID: line or
-the Message-ID line containing an intact sfid; mailreaver will go get
+the Message-ID line containing an intact SFID; mailreaver will go get
 the exact incoming text of the message and train it, so you don't need
 to worry about munged headers.

@@ -1352,9 +1359,9 @@
                          file; instead use the file so noted.

 
+  -----

-
-     Part 6 Method C: For Mutt Users
+  Step 6 Method C: For Mutt Users

 (Contributed by Mathieu Doidy and Joost van Baal:)

@@ -1375,8 +1382,9 @@
    * esc-h will tag a message, falsely classified as spam, as ham.

 
+  -----

-    Part 6 Method D: Some Other Method
+  Step 6 Method D: Some Other Method

 
 There are at least five other ways to retrain CRM114.  Some interface
@@ -1458,12 +1466,11 @@
 of daily use and about a gigabyte of email).

 
-
------------------------------------------------------------------------
-
+------------------------------------------------------------------------
+------------------------------------------------------------------------

 
-     Step 7: Adding Priority Lists, Whitelists, and Blacklists
+	Step 7: Adding Priority Lists, Whitelists, and Blacklists

 If you really want, you can add white, black, and priority lists
 to CRM114.  Most people don't need them, but there are always
@@ -1497,10 +1504,11 @@

 Lastly (well, actually firstly, because prio-listing happens before
 whitelisting or blacklisting) any mail that matches any regex in
-priolist.mfp .  The format of priolist.mfp is that the first character
-on the line is a + or a -, which indicates "whitelist" or "blacklist",
-and the rest of the line is a regex.  These regexes are tested
-in the order given in the file.  An empty file is perfectly acceptable.
+priolist.mfp is handled.  The format of priolist.mfp is that the first
+character on the line is a + or a -, which indicates "whitelist" or
+"blacklist", and the rest of the line is a regex.  These regexes are
+tested in the order given in the file.  An empty file is perfectly
+acceptable.

 For examples of how to set up the whitelist, blacklist, and priolist
 files, see the included "whitelist.mfp.example", "blacklist.mfp.example",
@@ -1513,10 +1521,11 @@
 add, otherwise you may get a rude surprise some day.

 
-----------------------------------------------------------------
+------------------------------------------------------------------------
+------------------------------------------------------------------------

 
-        Step 8: Useful Utilities
+	Step 8: Useful Utilities

 You don't _need_ to know the stuff in this section to set up and use
 CRM114 and mailfilter or mailreaver, but it might be useful to you- or
@@ -1539,7 +1548,7 @@

 
                    The cssutil utility:
-
+                    -------------------

 Usage is

@@ -1560,9 +1569,7 @@
                 -s css-size  - if no cssfile found, create new
                                cssfile with this many buckets.
                 -S css-size  - same as -s, but round up to next
-                               2^n + 1 boundary.
-
-
+                               2^k + 1 boundary.

 
 	   	    The cssdiff utility
@@ -1572,8 +1579,7 @@

     ./cssdiff somefile.css anotherfile.css

-which writes out a summary of how two different .css files are.
-
+which writes out a summary of how different two .css files are.

 
                     The cssmerge utility
@@ -1600,19 +1606,16 @@
      -s NNNN      -new file length, if needed

 
-
-
-
-		Enlarging a .css file
-                ---------------------
+		    Enlarging a .css file
+                    ---------------------

 One of the advantages of CRM114 is that the .css files are relatively
 small and of fixed size; they don't grow out of control and never need
 trimming if you use <microgroom>, which is the default.

 The disadvantage of this is that if your spam/nonspam discrimination
-is too convoluted, it won't be able to sort them out ( in trek-speak
-this is a high-order nonlinearity in the discrimination function ).
+is too convoluted, it won't be able to sort them out (in trek-speak
+this is a high-order nonlinearity in the discrimination function).
 The fix in this situation is to increase the dimensionality of the
 feature space.  The number of dimensions is about 1/12 the number of
 bytes in the .css files; this works well at about a million dimensions
@@ -1640,7 +1643,7 @@

 You can even combine steps 1 and 2, because newer versions of cssmerge
 will create a new file if needed (the -s N flag sets the number of slots
-in the new file; -S N does the same thing but rounds up to a 2^N+1
+in the new file; -S N does the same thing but rounds up to a 2^k+1
 boundary, which is recommended ).

 For example, here's how to increase the size of the spam.css file
@@ -1657,6 +1660,7 @@
 --------------------------------------------------------------------

   		  APPENDIX 1
+
                Using mailtrainer.crm

 
@@ -1902,12 +1906,11 @@
 improve your accuracy still more.

 
+------------------------------------------------------------------------

----------------------------------------------------------------------
-
-That's all!  If you have errors or updates (or find bugs!) please
-let me know; the best way is to join the CRM114-general mailing list; it's
-on the webpage:
+That's all!  If you have errors or updates (or find bugs!) please let me
+know; the best way is to join the CRM114-general mailing list; it's on
+the webpage:

    http://crm114.sourceforge.net

@@ -1920,3 +1923,5 @@
 Enjoy, and good luck.

        -Bill Yerazunis
+
+[]

R A Lichtensteiger | 5 Jan 19:37

rewrites.mfp and IPv6 addresses ... seems to not blow up

Hoi Zaeme,

So, setting up crm114 for someone else led me to a seriously overdue
cleanup of my own rewrites.mfp file.  I decided that it's probably
just doing string replacements to normalize the data, so it should
deal just fine with an IPv6 colon separated address.
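
To show why a colon-separated address should be no harder than a dotted one, here is a minimal Python sketch of that kind of normalization. It is a conceptual analogy only: it does not use the rewrites.mfp syntax, and the rule list, the placeholder name MyMailServerIP, and the addresses (taken from the documentation ranges 2001:db8::/32 and 192.0.2.0/24) are hypothetical.

```python
import re

# Hypothetical rewrite rules: (compiled pattern, replacement text).
RULES = [
    (re.compile(r"2001:db8::25", re.IGNORECASE), "MyMailServerIP"),
    (re.compile(r"192\.0\.2\.25"), "MyMailServerIP"),
]


def normalize(text, rules=RULES):
    """Apply each rewrite rule in order and return the rewritten text."""
    for rx, repl in rules:
        text = rx.sub(repl, text)
    return text


print(normalize("Received: from mx (unknown [2001:DB8::25])"))
print(normalize("Received: from mx (unknown [192.0.2.25])"))
```

A colon is not a regex metacharacter, so the IPv6 rule needs no escaping, while the dots in the IPv4 rule do.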

So far, it seems to do so just fine, so, Bill, I think you can call
crm114 "IPv6 compliant"

Reto
--

-- 
R A Lichtensteiger	rali@...

"I knew he was in love with himself, but I thought it was a summer thing!"
   -- Helen Hunt's character in "Twister"

Khalid Ahsein | 3 Jan 19:25

CRM114 php extension

Hello everybody,


Happy new year :-)

I just released a PHP extension for the C-callable crm114 library.
I have been using it for two weeks on a blog platform to detect spam articles.

Your library works fine.

Would you agree to let me maintain an official PHP extension of the crm114 system for web developers?

Here is the future project website: http://gymx.net/php-crm114/
libcrm114 is statically compiled into the PHP module.

Based on your documentation, I have implemented the major functions.

I'm open to discussing any improvements or modifications with you.

Regards,

Khalid Ahsein

Ger Hobbelt | 14 Oct 23:24

test 2



--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   ger-23zkImhT7stBDgjK7y7TUQ@public.gmane.org
mobile: +31-6-11 120 978
--------------------------------------------------

Jason Lewis | 11 Oct 03:29

problem upgrading from 20070810-BlameTheSegfault to 20100106-BlameMichelson

Hi,

I tried to upgrade the version of crm I use from
20070810-BlameTheSegfault to 20100106-BlameMichelson.

Whenever I run the classify script I get errors (pasted below).

I run crm from a working directory .crm114working, and in that directory
I placed the new mailreaver.crm and mailtrainer.crm files. (Are there
other files I should put there also?)

And then in my procmail filter, crm gets called like this:

:0fw: .msgid.lock
| /usr/bin/crm -u /home/jason/.crm114working  -w 64000000 mailreaver.crm

I re-ran the command with -t to get the trace pasted below.
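
For anyone who wants to reproduce the traced run outside procmail, here is a minimal Python sketch that only assembles the command line. The helper name build_crm_command is made up, and placing -t ahead of the other flags is an assumption; the flags themselves are the ones used in the recipe above.

```python
import shlex


def build_crm_command(workdir, script="mailreaver.crm",
                      window=64000000, trace=False):
    """Build the argv list for the crm invocation used in the recipe."""
    argv = ["/usr/bin/crm"]
    if trace:
        argv.append("-t")  # assumed position for the trace flag
    argv += ["-u", workdir, "-w", str(window), script]
    return argv


cmd = build_crm_command("/home/jason/.crm114working", trace=True)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run with a saved message on standard input to capture the same trace.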

Any ideas what might be going wrong? Is there something I need to do to
my css files to upgrade?

Thanks,

Jason

Executing line 526 :
Statement 526 is non-executable, continuing.

Parsing line 527 :
 -->  {

Executing line 527 :
Statement 527 is an openbracket. depth now 1.

Parsing line 528 :
 -->  isolate (:stats:) //

Executing line 528 :
executing an ISOLATE statement

Parsing line 529 :
 -->  classify [:in_text:] <:*:clf:> /:*:lcr:/     
(:*:fileprefix:nonspam.css :*:fileprefix:spam.css) (:stats:)

Mode #25, 'osb' turned on.
Mode #28, 'unique' turned on.
Mode #22, 'microgroom' turned on.

Executing line 529 :
Performing variable restriction.
Variable before expansion ':in_text:' len 9
Variable after expansion: ':in_text:' len 9
Using variable ':in_text:' for source.
Found that variable
Checking restriction at start 9 len 0 (subscr=0)
Nothing more to do in the var-restrict.
 unique engaged -repeated features are ignored
Classify list: -nonspam.css spam.css-
Classifying with file -nonspam.css- succhash=0, maxhash=0
MMAPping file nonspam.css for direct memory access.
Classifying with file -spam.css- succhash=0, maxhash=1
MMAPping file spam.css for direct memory access.
Running with 2 files for success out of 2 files
Catching FAULT generated on line 529
FAULT reason:

/usr/bin/crm: *ERROR*
  This file should have learncounts, but doesn't, and the learncount
slot is busy.  It's hosed.   Time to die.
 Sorry, but this program is very sick and probably should be killed off.
This happened at line 529 of file /home/jason/.crm114working/mailreaver.crm
(runtime system location: crm_osb_bayes.c(904) in routine:
crm_expr_osb_bayes_classify)

Trying trap at line 677:
trap (:broken_program_message:) /.*/

This TRAP will trap anything matching =.*= .
TRAP matched.
Next statement will be 677
CLASSIFY was a FAIL, skipping forward.
Finished the program /home/jason/.crm114working/mailreaver.crm.
