Myriam Abramson | 1 Aug 14:23 2014

WordTokenizer


Hello,

I am getting something weird with WordTokenizer. I am trying to
tokenize URLs as follows:

java weka.filters.unsupervised.attribute.StringToWordVector -i mytest.arff -W 10 -tokenizer "weka.core.tokenizers.WordTokenizer -delimiters \" ()\!,;/:_#-.='&?~\"" -o temp.out

mytest.arff is as follows:

@relation ds

@attribute url string
@attribute genre {forum,article,profile,error,wiki,product,document,blog,serp,links,portal,comment,e-shop}

@data

http://www.reddit.com/user/Shadow_Jack,forum
http://www.reddit.com/r/RioGrandeValley/comments/2b6a49/dd/cj2zwka,comment
http://www.merriam-webster.com/,portal
https://www.google.com/search?client=ubuntu&channel=fs&q=causative+attack&ie=utf-8&oe=utf-8,serp
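For reference, a rough picture of what the tokenizer does with these URLs. This sketch assumes WordTokenizer splits text the way java.util.StringTokenizer does with the delimiter string from the command above, and assumes the shell backslash before `!` is not part of the delimiter set; the class name UrlTokens is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class UrlTokens {
    // Delimiter set from the -delimiters option above, shell escaping removed (assumption).
    static final String DELIMS = " ()!,;/:_#-.='&?~";

    // Split text on any delimiter character; runs of delimiters yield no empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("http://www.reddit.com/user/Shadow_Jack"));
        // prints [http, www, reddit, com, user, Shadow, Jack]
    }
}
```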

Although I specified -W 10, I get more than 10 words, and the first instance does not contain the genre, which is not a string attribute and should not have been touched. What's going on? I am using Weka 3.7.11. Thanks.

temp.out is as follows:

@relation 'ds-weka.filters.unsupervised.attribute.StringToWordVector-R1-W10-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer
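One thing worth checking for the missing genre: StringToWordVector produces sparse instances, and in Weka's sparse ARFF format any value equal to 0 is left out (omitted values mean 0, not missing). A nominal value is stored as the index of its label, so `forum`, the first label of `genre`, is stored as 0 and would not be printed for the first instance even though it is still there. The stdlib sketch below only mimics that encoding; the layout (genre first, then word counts), the class SparseRow and the method toSparse are illustrative assumptions, not Weka API.

```java
public class SparseRow {
    // First few labels of the genre attribute, in declaration order (index 0 = "forum").
    static final String[] GENRE_LABELS = {"forum", "article", "profile"};

    // values[0] is the genre label index; values[1..] are word counts (layout assumed).
    static String toSparse(double[] values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 0.0) {
                continue; // zeros are omitted, including label index 0 ("forum")
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            String shown = (i == 0) ? GENRE_LABELS[(int) values[i]] : String.valueOf((int) values[i]);
            sb.append(i).append(' ').append(shown);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(toSparse(new double[] {0, 1, 0, 2})); // prints {1 1,3 2}
        System.out.println(toSparse(new double[] {1, 1, 0, 2})); // prints {0 article,1 1,3 2}
    }
}
```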

ajitha padmanabhan | 1 Aug 11:35 2014

Weka Server-Nothing happens

I am using Weka 3-7 on a Windows 7 laptop. When I start WekaServer from the command prompt like this:

C:\Program Files\Weka-3-7>java weka.Run WekaServer -host ajitha -port 8088
2014-07-30 14:49:25.361::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2014-07-30 14:49:25.431::INFO:  jetty-6.1.21
2014-07-30 14:49:25.748::INFO:  Started SocketConnector@ajitha:8088
[WekaServer] Starting purge thread.
[WekaServer] Starting schedule checker.

I get the same output without the hostname and port.
Nothing else happens; it just stays like this.
How should I proceed?

Regards
Ajitha
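The log shows Jetty starting a SocketConnector on ajitha:8088, which suggests the server is up and waiting for requests rather than hung; a server process that simply sits there is what one would expect. A quick way to confirm from a second console that something is listening on that port is a plain TCP connect, sketched below with the standard library only. The host and port come from the command above; the class and method names are illustrative.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unknown host, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // Host and port from the WekaServer command line above.
        System.out.println(isListening("ajitha", 8088, 2000));
    }
}
```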
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Andrei Perhinschi | 31 Jul 19:08 2014

JCBA with PredictiveApriori issue

Hello all,

I am running into an error when attempting to use the JCBA classifier (part of the classAssociationRules package). When I use the default Apriori class association rule miner, everything works great. If I switch to the PredictiveApriori CAR miner, I get the following error:

Problem evaluating classifier: invalid class index

This happens regardless of whether the CAR option is set to true or false (as I understand it, the class index value is only used when it is set to true), and regardless of the class index value, although I usually leave it at the default (-1). My Weka version is 3.7.11, classAssociationRules 1.0.3, and Java 1.7.0_65.

Thank you for your help,

Andrei
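For readers unfamiliar with the option: in Weka's Apriori the class index option defaults to the last attribute, written as -1. The sketch below shows how such a value might be turned into a 0-based attribute index and where an "invalid class index" error would come from. It assumes -1 means "last attribute" and that other values are 1-based; whether PredictiveApriori inside JCBA follows the same rule is an assumption, and the class and method names are hypothetical.

```java
public class ClassIndexOption {
    // Hypothetical helper: map a class-index option value to a 0-based attribute index.
    // Assumes -1 means "last attribute" and other valid values are 1-based.
    static int resolveClassIndex(int option, int numAttributes) {
        if (numAttributes < 1) {
            throw new IllegalArgumentException("dataset has no attributes");
        }
        if (option == -1) {
            return numAttributes - 1;
        }
        if (option >= 1 && option <= numAttributes) {
            return option - 1;
        }
        throw new IllegalArgumentException("invalid class index: " + option);
    }

    public static void main(String[] args) {
        System.out.println(resolveClassIndex(-1, 5)); // prints 4
        System.out.println(resolveClassIndex(2, 5));  // prints 1
        try {
            resolveClassIndex(0, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints invalid class index: 0
        }
    }
}
```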

_______________________________________________
Martin O'Shea | 31 Jul 12:40 2014

Which .jar files to use with Weka and LibSVM in my own Java code

Hello

Apologies if part of this duplicates another posting (list.waikato.ac.nz/pipermail/wekalist/2014-July/061555.html), but I am really stuck at the moment and would appreciate any help available.

If I use Weka Explorer to run some training data against testing data using SVM with a linear kernel, everything is fine.

However, I need to do this programmatically in my own Java code, and at present my code fails at the line eval.evaluateModel(libsvm, test); in the code below:

Instances train = new Instances (...);
train.setClassIndex(train.numAttributes() - 1);
Instances test = new Instances (...);
ClassificationType classificationType = ClassificationType.DAO("LIbSVM");
LibSVM libsvm = new LibSVM();
String options = (classificationType.getParameters());
String[] optionsArray = options.split(" ");
libsvm.setOptions(optionsArray);
String[] pars = libsvm.getOptions();
Evaluation eval = new Evaluation(train);
eval.evaluateModel(libsvm, test);
System.out.println(eval.toSummaryString("\nResults\n======\n", false));

In the try...catch block around this code, the exception is simply reported as ‘null’ (the full stack trace is given in the PS below).

So my question concerns the .jar files in the library folder, or CLASSPATH, of my application: which .jar files should I be using? I know that I need weka.jar and presumably the wrapper file LibSVM.jar, but do I need anything else? What about libsvm.jar as provided by the download from:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

If I do need it, how can I resolve the naming conflict in Windows, where LibSVM.jar and libsvm.jar are treated as the same file?

Thanks

Martin O’Shea.

PS: The full stack trace for the Java exception is:

null
weka.classifiers.functions.LibSVM.distributionForInstance(LibSVM.java:1489)
weka.classifiers.Evaluation.evaluationForSingleInstance(Evaluation.java:1560)
weka.classifiers.Evaluation.evaluateModelOnceAndRecordPrediction(Evaluation.java:1597)
weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1477)
visualRSS.test.Weka_LibSVM_Test.classify(Weka_LibSVM_Test.java:48)
visualRSS.initialisation.TestProgram_Context_Listener.contextInitialized(TestProgram_Context_Listener.java:29)
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3972)
org.apache.catalina.core.StandardContext.start(StandardContext.java:4467)
org.apache.catalina.core.StandardContext.reload(StandardContext.java:3228)
org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:943)
org.apache.catalina.manager.ManagerServlet.doGet(ManagerServlet.java:361)
javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:558)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1555)
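Something worth checking in the code as quoted: `libsvm` is created and configured, then passed straight to `eval.evaluateModel(libsvm, test)` without `libsvm.buildClassifier(train)` ever being called, and the trace ends inside `LibSVM.distributionForInstance`. In Weka a classifier normally has to be built before it is evaluated on a separate test set (and `test` presumably needs its class index set, as `train` does). On the Java versions of that era, a NullPointerException raised by the JVM carries no message, which would be consistent with the exception printing as 'null'. The stdlib sketch below mimics that failure mode with a hypothetical ToyModel class; it is not Weka code.

```java
public class BuildBeforePredict {
    // Hypothetical stand-in for a classifier: weights stay null until build() is called.
    static class ToyModel {
        private double[] weights;

        void build(double[][] x, double[] y) {
            // Trivial "training" for illustration: weight_j = mean of x_ij * y_i.
            weights = new double[x[0].length];
            for (int j = 0; j < weights.length; j++) {
                double sum = 0;
                for (int i = 0; i < x.length; i++) {
                    sum += x[i][j] * y[i];
                }
                weights[j] = sum / x.length;
            }
        }

        double predict(double[] row) {
            double score = 0;
            for (int j = 0; j < row.length; j++) {
                score += weights[j] * row[j]; // NullPointerException if build() was skipped
            }
            return score;
        }
    }

    public static void main(String[] args) {
        double[][] train = {{1, 0}, {0, 1}};
        double[] labels = {1, -1};
        ToyModel model = new ToyModel();
        try {
            model.predict(train[0]); // evaluating before building, as in the code above
        } catch (NullPointerException e) {
            System.out.println("not built yet: " + e.getMessage());
        }
        model.build(train, labels);
        System.out.println(model.predict(train[0])); // prints 0.5
    }
}
```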

 

_______________________________________________
Sowmya V.B. | 31 Jul 10:03 2014

Feature selection followed by 10-fold CV.

Dear All,

This message is about using Feature Selection on the entire dataset and later using 10-fold CV for evaluation.

My questions:

1) Is it better to perform feature selection on only part of the training set instead of the entire training set, especially when we are doing a 10-fold CV later? For example, I used SVMAttributeEval for feature selection first, followed by SMO for classification. Since SVMAttributeEval also uses SMO internally to evaluate the attributes, am I not cheating?

2) Is it possible to choose attributes based on 10-fold CV using SVMAttributeEval?

Cheers,
Sowmya.
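On question 1: selecting attributes on the full dataset and then cross-validating lets the test folds influence which attributes are chosen, so the CV estimate tends to be optimistic. The usual remedy is to redo the selection inside each fold using only that fold's training rows; in Weka, cross-validating a meta.AttributeSelectedClassifier (evaluator plus search plus base classifier) does this. The stdlib sketch below shows only the fold structure. It picks one attribute per fold with a deliberately simple score (absolute difference of class means), which is a stand-in chosen for brevity and unrelated to SVMAttributeEval; the names and data are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SelectInsideCv {
    // Score one attribute by |mean(class 1) - mean(class 0)|, using only the given rows.
    static double score(double[][] x, int[] y, List<Integer> rows, int attr) {
        double sum0 = 0, sum1 = 0;
        int n0 = 0, n1 = 0;
        for (int r : rows) {
            if (y[r] == 1) {
                sum1 += x[r][attr];
                n1++;
            } else {
                sum0 += x[r][attr];
                n0++;
            }
        }
        if (n0 == 0 || n1 == 0) {
            return 0.0; // score undefined if a class is absent from the training rows
        }
        return Math.abs(sum1 / n1 - sum0 / n0);
    }

    // For each of k folds, choose the best attribute from that fold's training rows only.
    static int[] selectPerFold(double[][] x, int[] y, int k) {
        int[] chosen = new int[k];
        for (int fold = 0; fold < k; fold++) {
            List<Integer> trainRows = new ArrayList<>();
            for (int r = 0; r < x.length; r++) {
                if (r % k != fold) {
                    trainRows.add(r); // rows with r % k == fold are this fold's test set
                }
            }
            int best = 0;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < x[0].length; a++) {
                double s = score(x, y, trainRows, a);
                if (s > bestScore) {
                    bestScore = s;
                    best = a;
                }
            }
            chosen[fold] = best;
            // A classifier would now be trained on trainRows using only attribute `best`
            // and evaluated on the held-out rows.
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Attribute 0 tracks the class; attribute 1 is close to noise.
        double[][] x = {{0.1, 0.3}, {0.2, 0.6}, {0.9, 0.4}, {1.0, 0.5}, {0.0, 0.5}, {0.8, 0.4}};
        int[] y = {0, 0, 1, 1, 0, 1};
        System.out.println(Arrays.toString(selectPerFold(x, y, 3))); // prints [0, 0, 0]
    }
}
```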
_______________________________________________
Hamid Ghofrani | 31 Jul 00:16 2014

How to label data inside the ARFF file


Dear All,
I have already sent this question to this link but have had no response yet.


I have the following dataset

          Murder  Assault  UrbanPop   Pct
Alabama     13.2      236        58  21.2
Alaska      10.0      263        48  44.5
Arizona      8.1      294        80  31.0

Each row has a label (Alabama, Alaska, ...). I want to include these labels in my ARFF file so that, after running my analysis and saving it to a results.arff file, the predictions will include the labels.
I thought I would add the labels as an attribute in the ARFF file and then exclude it from the model, something like this:

@attribute Label string
@attribute Murder numeric
@attribute Assault numeric
@attribute UrbanPop numeric
@attribute Pct numeric

@data
Alabama,0.97566,-1.122001,0.439804,-0.154697
Alaska,1.930538,-1.062427,-2.0195,0.434175
Arizona,1.745443,0.73846,-0.05423,0.826264

Then I used weka.filters.unsupervised.attribute.Remove -R 1 to remove the label and ran my analysis. The problem is that when I save the results to an ARFF file, the label is gone. How can I use the label without making it an attribute, so that it also appears in the output but does not affect the Weka analysis?

_______________________________________________
Jeff Pattillo | 30 Jul 16:07 2014

Negative Weights

My company has a process with many tasks and they want to weight the tasks based on how important they are to success.  They want to use this to assign grades between 0 & 1 to the contractors performing the tasks, so that contractors with higher grades should be more likely to bring success.

The weights need to be transparent so the contractors know how they are being graded.  If I perform logistic regression, some of the tasks receive negative weights, which makes no sense in this context.  Would y'all suggest a Bayesian algorithm for this?  Or something else entirely?

Also, for future knowledge, is there such a thing as constrained logistic regression?  Could you drop the tasks with negative weight and use maybe isotonic regression to recalibrate the grades to be the full range between 0 & 1?  I'm sure those are stupid ideas, just wondering what gets done in practice...

Jeff
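On constrained logistic regression: one simple way to keep every task weight non-negative is projected gradient descent, that is, an ordinary gradient step on the logistic log loss followed by clipping any negative weight back to zero. The grade sigmoid(b + w·x) still lies between 0 and 1. The stdlib sketch below illustrates that technique only; it is not a Weka feature, and the learning rate, iteration count and toy data are arbitrary assumptions.

```java
public class NonNegativeLogistic {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Fit intercept b and weights w >= 0 by projected gradient descent on the mean log loss.
    // Returns {b, w1, ..., wd}.
    static double[] fit(double[][] x, int[] y, double lr, int iters) {
        int n = x.length, d = x[0].length;
        double b = 0;
        double[] w = new double[d];
        for (int it = 0; it < iters; it++) {
            double gb = 0;
            double[] gw = new double[d];
            for (int i = 0; i < n; i++) {
                double z = b;
                for (int j = 0; j < d; j++) z += w[j] * x[i][j];
                double err = sigmoid(z) - y[i]; // derivative of the log loss with respect to z
                gb += err;
                for (int j = 0; j < d; j++) gw[j] += err * x[i][j];
            }
            b -= lr * gb / n; // the intercept is left unconstrained
            for (int j = 0; j < d; j++) {
                w[j] = Math.max(0.0, w[j] - lr * gw[j] / n); // projection onto w_j >= 0
            }
        }
        double[] params = new double[d + 1];
        params[0] = b;
        System.arraycopy(w, 0, params, 1, d);
        return params;
    }

    public static void main(String[] args) {
        // Task 1 goes with success; task 2 goes with failure, so its weight is held at 0.
        double[][] x = {{1, 0}, {1, 1}, {0, 1}, {0, 0}, {1, 0}, {0, 1}};
        int[] y = {1, 1, 0, 0, 1, 0};
        double[] p = fit(x, y, 0.5, 2000);
        System.out.printf("b=%.3f w1=%.3f w2=%.3f%n", p[0], p[1], p[2]);
    }
}
```

With perfectly separable toy data like this, the positive weight keeps growing as iterations increase, so a real fit would normally add a ridge penalty or stop early.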
_______________________________________________
Fabio Minerva | 29 Jul 20:54 2014

Evaluate Confidence in Co-Training

Hello all, 

According to the co-training algorithm, each classifier predicts the labels of the unlabeled instances. Then, for each class C, the most confident instances of classifiers A and B are added to the set of labeled instances L. In the API, the method that gives me the confidence for an instance is distributionForInstance(Instance).

Trying both methods, classifyInstance(Instance) and distributionForInstance(Instance), I obtain the same results. How is that possible? Is the confidence then the same as the prediction for that instance? This point is unclear to me.

Thanks for the help 

FM
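On the co-training question: for a nominal class, the default classifyInstance in Weka's AbstractClassifier returns the index of the most probable class, i.e. the argmax of the array returned by distributionForInstance, so the two methods agree on the predicted label by design. The confidence of that prediction is the largest entry of the distribution. The stdlib sketch below, with made-up probability arrays and illustrative names, picks for each class the unlabeled instance predicted as that class with the highest confidence, which is the selection step described above.

```java
import java.util.Arrays;

public class MostConfident {
    // Index of the largest probability: the label a classifier would predict.
    static int argMax(double[] dist) {
        int best = 0;
        for (int c = 1; c < dist.length; c++) {
            if (dist[c] > dist[best]) best = c;
        }
        return best;
    }

    // For each class, the index of the instance predicted as that class with the highest
    // probability, or -1 if no instance is predicted as that class.
    static int[] mostConfidentPerClass(double[][] dists, int numClasses) {
        int[] pick = new int[numClasses];
        double[] bestConf = new double[numClasses];
        Arrays.fill(pick, -1);
        for (int i = 0; i < dists.length; i++) {
            int c = argMax(dists[i]);
            double conf = dists[i][c];
            if (pick[c] == -1 || conf > bestConf[c]) {
                pick[c] = i;
                bestConf[c] = conf;
            }
        }
        return pick;
    }

    public static void main(String[] args) {
        // Made-up class distributions for four unlabeled instances over two classes.
        double[][] dists = {{0.6, 0.4}, {0.9, 0.1}, {0.3, 0.7}, {0.45, 0.55}};
        System.out.println(Arrays.toString(mostConfidentPerClass(dists, 2))); // prints [1, 2]
    }
}
```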
_______________________________________________
Andrei Perhinschi | 28 Jul 20:13 2014

CARuleMiner interface error with JCBA

Hello all,

I am trying to run the CBA classifier (JCBA) found in the “classAssociationRules” package (ver. 1.0.1). I keep encountering the following error:

Exception in thread "Thread-9" java.lang.NoSuchMethodError: weka.associations.CARuleMiner.mineCARs(Lweka/core/Instances;)[Lweka/core/FastVector;
        at weka.classifiers.rules.car.JCBA.buildClassifier(JCBA.java:529)
        at weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1378)

Running the WeightedClassifier classifier, also included in the classAssociationRules package, results in the same error (the message lists WeightedClassifier.java instead of JCBA.java, but is otherwise identical). This leads me to believe the issue is with the CARuleMiner interface and not with the individual classifiers. I am using Weka 3.7.11 on Java 1.7.0_65; if any other information would help, please let me know. If anybody has any idea what the issue could be or how I could get around it, that would be much appreciated.

Thank you for your help

Andrei
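A NoSuchMethodError at run time means the class actually loaded has no method with exactly the descriptor the caller was compiled against. Here JCBA expects `mineCARs` to take an `Instances` and return a `FastVector[]`, which suggests the package was compiled against a different version of the CARuleMiner interface than the one in the weka.jar being used. Reflection is enough to see what the loaded class really declares; the stdlib helper below (illustrative name) lists the public methods of a class that have a given name. With weka.jar on the classpath it could be run with the arguments weka.associations.CARuleMiner mineCARs; what it would print there is not shown here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MethodLister {
    // Describe every public method named methodName on the named class (inherited ones included).
    static List<String> describe(String className, String methodName) throws ClassNotFoundException {
        List<String> out = new ArrayList<>();
        for (Method m : Class.forName(className).getMethods()) {
            if (m.getName().equals(methodName)) {
                out.add(m.toGenericString()); // modifiers, return type, name, parameter types
            }
        }
        return out;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String cls = args.length > 0 ? args[0] : "java.lang.String";
        String name = args.length > 1 ? args[1] : "isEmpty";
        for (String s : describe(cls, name)) {
            System.out.println(s);
        }
    }
}
```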

_______________________________________________
Tan Kean Seng | 28 Jul 08:22 2014

Method to use FURIA classifier in Weka

I would like to use a classifier named FURIA in Weka. The training dataset I used was Iris (4 numeric attributes, a nominal class label, 150 instances), and I chose FURIA as the classifier. However, after I press the "Start" button, Weka starts running but stops immediately (the Weka bird at the bottom right of the window stops moving), and the status line shows "Building model on training data...".

I have no idea what is going on. I tried different datasets, such as Nursery (8 nominal attributes and a nominal class label, 12960 instances) and others downloaded from UCI, but all show the same behaviour: Weka stops running after I press the "Start" button, and the status shows "Building model on training data...".

My questions are, first, how do I use the FURIA classifier in Weka?
Second, do I need to make any special settings before using the classifier (I use the default settings)?

That is all I wanted to ask.

Thank you.



_______________________________________________
laurie bayet | 27 Jul 20:50 2014

Re: Wekalist Digest, Vol 136, Issue 55

hi,
sorry about the late reply.
thank you for your answer; based on your advice I managed to get a proper set-up!
cheers,
laurie


2014-06-30 2:00 GMT+02:00 <wekalist-request <at> list.waikato.ac.nz>:
Send Wekalist mailing list submissions to
        wekalist <at> list.waikato.ac.nz

To subscribe or unsubscribe via the World Wide Web, visit
        http://list.waikato.ac.nz/mailman/listinfo/wekalist
or, via email, send a message with subject or body 'help' to
        wekalist-request <at> list.waikato.ac.nz

You can reach the person managing the list at
        wekalist-owner <at> list.waikato.ac.nz

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wekalist digest..."


Today's Topics:

   1. How to find a specific instance in text classification    when
      using Weka Classifier Visualizer (Amir H. Jadidinejad)
   2. Re: How to find a specific instance in text       classification
      when using Weka Classifier Visualizer (Eibe Frank)
   3. KnowledgeFlow setup for separate training and test sets (test
      labels unknown) using FilteredClassifier? (laurie bayet)
   4. Re: KnowledgeFlow setup for separate training and test sets
      (test labels unknown) using FilteredClassifier? (Mark Hall)


----------------------------------------------------------------------

Message: 1
Date: Sun, 29 Jun 2014 04:19:48 -0700
From: "Amir H. Jadidinejad" <amir.jadidi <at> yahoo.com>
To: Weka Mailing List <wekalist <at> list.waikato.ac.nz>
Subject: [Wekalist] How to find a specific instance in text
        classification  when using Weka Classifier Visualizer
Message-ID:
        <1404040788.99972.YahooMailNeo <at> web163804.mail.gq1.yahoo.com>
Content-Type: text/plain; charset="us-ascii"

I'm doing text classification and the process is as follows:
        * Load text files using "TextDirectoryLoader"
        * Convert text to vector using "StringToWordVector"
        * Apply SVM
Now I want to analyse the classifier's errors, so I use the "Weka Classifier Visualizer", which shows a nice confusion matrix. When I click on a specific instance, a popup window appears containing "Instance: X", where X is a number. Unfortunately, I can't get back to the original text document using this ID.

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.waikato.ac.nz/pipermail/wekalist/attachments/20140629/e167af7c/attachment-0001.html>

------------------------------

Message: 2
Date: Mon, 30 Jun 2014 09:12:41 +1200
From: Eibe Frank <eibe <at> waikato.ac.nz>
To: "Weka machine learning workbench list."
        <wekalist <at> list.waikato.ac.nz>
Subject: Re: [Wekalist] How to find a specific instance in text
        classification when using Weka Classifier Visualizer
Message-ID: <BB666CB0-1C52-4065-97CF-7C3B8F91FCF5 <at> waikato.ac.nz>
Content-Type: text/plain; charset=windows-1252

Use the AddID filter to add an ID attribute to your data. Then, wrap your classifier into a FilteredClassifier that applies the Remove filter to remove the ID attribute.

Cheers,
Eibe
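The idea behind this suggestion (and the same trick covers keeping row labels such as state names in the output) is to carry an identifier alongside each row while keeping it out of the model's input. In Weka that is AddID plus a FilteredClassifier whose Remove filter drops the ID before the base classifier sees the data. The stdlib sketch below mimics only the bookkeeping: the hypothetical prediction function sees the feature values only, and the ID is rejoined with the prediction afterwards. Names and data are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CarryId {
    // Pair each row's ID with the prediction made from its features alone.
    static List<String> predictWithIds(List<String> ids, List<double[]> features,
                                       Function<double[], String> predict) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            String label = predict.apply(features.get(i)); // the model never sees the ID
            out.add(ids.get(i) + "," + label);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2"); // IDs like those AddID would create
        List<double[]> features = Arrays.asList(new double[] {0.9}, new double[] {0.1});
        // Hypothetical model: a threshold on the single feature.
        Function<double[], String> model = f -> f[0] > 0.5 ? "pos" : "neg";
        predictWithIds(ids, features, model).forEach(System.out::println);
        // prints 1,pos then 2,neg
    }
}
```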

On 29 Jun 2014, at 23:19, Amir H. Jadidinejad <amir.jadidi <at> yahoo.com> wrote:

> I'm doing text classification and the process is as follows:
>       * Load text files using "TextDirectoryLoader"
>       * Convert text to vector using "StringToWordVector"
>       * Apply SVM
> Currently, I want to analysis the classifier's error. So leverage "Weka Classifier Visualizer", this is a beautiful confusion matrix. When I click on a specific instance, a popup window is shown. This window contains "Instance: X" where X is a number. Unfortunately I can't access to the original text document using this ID.
>
> Thanks.



------------------------------

Message: 3
Date: Sun, 29 Jun 2014 01:13:12 +0200
From: laurie bayet <lauriebayet <at> gmail.com>
To: wekalist <at> list.waikato.ac.nz
Subject: [Wekalist] KnowledgeFlow setup for separate training and test
        sets (test labels unknown) using FilteredClassifier?
Message-ID:
        <CAGHpV7z62R5WsD6+7swOnKKO6pDuONL5C0b4PGgDWFH_CdxJAw <at> mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I am new to Weka and using it for the prediction of unknown labels for the
first time.

I am using the "*KnowledgeFlow*" GUI in *Weka 3.6.11* and cannot find a way
to set up the workflow properly.

*The training and test sets come from two different .csv files.  *Both .csv
have the same number of columns and the same header.

*The last column of the test set (the Class column) is filled with NaNs as
the labels are unknown.*

I'm trying to use the *FilteredClassifier* but either I get errors, or
nothing happens (the test data does not go through but no error is thrown).

A detailed explanation with screenshots and logs is posted on StackExchange
here:
http://stats.stackexchange.com/questions/105150/weka-test-set-not-getting-processed-in-knowledge-flow-classassigner-fails

Since posting on StackExchange I also tried to put the "NumericToNominal"
filter (the class variable is binary 0/1 so has to be converted to nominal)
before the "ClassAssigner" in SETUP 2 described on StackExchange, with
no luck (as in SETUP 2, no error is thrown but there is no output; it seems
that the test data processing quietly stops after "TestSetMaker").

If necessary I can provide the .csv files for training and test sets, but
my guess is that my problem is very noobish and that its cause will be
obvious to anyone familiar with Weka :-)

Thank you for any help or cue to the solution !

Cheers,
Laurie
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.waikato.ac.nz/pipermail/wekalist/attachments/20140629/92a5e777/attachment-0001.html>

------------------------------

Message: 4
Date: Mon, 30 Jun 2014 10:03:25 +1200
From: Mark Hall <mhall <at> waikato.ac.nz>
To: "Weka machine learning workbench list."
        <wekalist <at> list.waikato.ac.nz>
Subject: Re: [Wekalist] KnowledgeFlow setup for separate training and
        test sets (test labels unknown) using FilteredClassifier?
Message-ID: <CFD6E335.1B2F3%mhall <at> waikato.ac.nz>
Content-Type: text/plain;       charset="UTF-8"

On 29/06/14 11:13 am, "laurie bayet" <lauriebayet <at> gmail.com> wrote:

>Hello,
>I am new to Weka and using it for the prediction of unknown labels for
>the first time.
>
>I am using the "KnowledgeFlow" GUI in Weka 3.6.11 and cannot find a way
>to set up the workflow properly.
>
>The training and test sets come from two different .csv files.  Both .csv
>have the same number of columns and the same header.
>
>The last column of the test set (the Class column) is filled with NaNs as
>the labels are unknown.
>
>I m trying to use the FilteredClassifier but either I get errors, or
>nothing happens (the test data does not go through but no error is
>thrown).
>
>A detailed explanation with screenshots and logs is posted on
>StackExchange here:
>http://stats.stackexchange.com/questions/105150/weka-test-set-not-getting-
>processed-in-knowledge-flow-classassigner-fails
>
>
>Since posting on StackExchange I also tried to put the
>"NumericaToNominal" filter (the class variable is binary 0/1 so has to be
>converted to nominal) before the "ClassAssigner" on the SETUP 2 described
>on StackExchange, with no luck (as in SETUP 2, no error is thrown but
>there is no output, it seems that the test data processing quietly stops
>after "TestSetMaker").
>
>If necessary I can provide the .csv files for training and test sets, but
>my guess is that my problem is  very noobish and that its cause will be
>obvious to anyone familiar with Weka :-)
>
>
>Thank you for any help or cue to the solution !


First of all you should verify that the ARFF structure created by the two
CSVLoaders in your flow is compatible in terms of the order of the
attributes and the order of values for nominal attributes. Connect a
TextViewer directly to each CSVLoader and compare the @attribute
definitions of the two datasets carefully. If there are differences then
you would be best to create two ARFF files to use with a unified header.
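The comparison described above can also be done mechanically. The stdlib sketch below takes the @attribute lines of two headers, as a TextViewer would show them, and reports the first position where they differ, so a difference in attribute order or in the order of nominal values shows up as a mismatch. It compares trimmed lines as plain strings, which is a simplification (differences in internal spacing would also count); the names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderCheck {
    // Describe the first difference between two lists of @attribute lines,
    // or return null if they match (same attributes, same order, same nominal value order).
    static String firstDifference(List<String> train, List<String> test) {
        int n = Math.min(train.size(), test.size());
        for (int i = 0; i < n; i++) {
            String a = train.get(i).trim();
            String b = test.get(i).trim();
            if (!a.equals(b)) {
                return "attribute " + (i + 1) + ": '" + a + "' vs '" + b + "'";
            }
        }
        if (train.size() != test.size()) {
            return "attribute counts differ: " + train.size() + " vs " + test.size();
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("@attribute x numeric", "@attribute class {0,1}");
        List<String> test = Arrays.asList("@attribute x numeric", "@attribute class {1,0}");
        System.out.println(firstDifference(train, test));
        // prints attribute 2: '@attribute class {0,1}' vs '@attribute class {1,0}'
    }
}
```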

In your first flow the main problem is that you are assigning the class
attribute within the context of a MultiFilter in the FilteredClassifier.
All Weka classifiers expect a class attribute to be set in the data which
enters the buildClassifier() method - this is not the case with your setup
as the class attribute will not be set before the data passes through the
filters.

The problem with the second flow is that you are using two ClassAssigner
filters to set the class attribute for the training and testing data
separately. You should use one ClassAssigner filter with both the
trainingSet and testSet connections going into it. This is because the
filtering in the Knowledge Flow is batch based - i.e. there is a
difference between a training and test set in this context. In the case of
a test set connection, it expects that the filter has been
trained/initialized on a training set first. If you want to set the class
attribute separately in the training and testing portions of the flow then
there is actually a dedicated Knowledge Flow component (also called
ClassAssigner, unfortunately) that can be found under the 'Evaluation' tab.
I admit that the error reporting leaves something to be desired in both
these examples :-)
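The batch behaviour described here (a filter is first initialised on the training set, and the test set is then passed through with that same initialisation) can be illustrated with a min-max normaliser. In the stdlib sketch below, which is a simplified stand-in and not Weka's own Normalize filter, the statistics come from the training batch only, and filtering a test batch before any training batch has been seen is refused. Class and method names are illustrative.

```java
import java.util.Arrays;

public class BatchNormalizer {
    private double min;
    private double max;
    private boolean initialised = false;

    // First batch: learn min and max from the training data, then transform it.
    double[] trainBatch(double[] values) {
        min = Double.POSITIVE_INFINITY;
        max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        initialised = true;
        return apply(values);
    }

    // Later batches (e.g. the test set): reuse the training statistics unchanged.
    double[] testBatch(double[] values) {
        if (!initialised) {
            throw new IllegalStateException("filter has not been initialised on a training set");
        }
        return apply(values);
    }

    private double[] apply(double[] values) {
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        BatchNormalizer f = new BatchNormalizer();
        System.out.println(Arrays.toString(f.trainBatch(new double[] {2, 4, 6})));
        // prints [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(f.testBatch(new double[] {8})));
        // prints [1.5]: test values are scaled with the training min and max
    }
}
```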

If you are willing to use the development version of Weka (3.7) you'll
find that the KnowledgeFlow has undergone a radical makeover, and is now
much nicer to use (with additional functionality over 3.6 too).

Cheers,
Mark.





------------------------------

_______________________________________________
Wekalist mailing list
Wekalist <at> list.waikato.ac.nz
http://list.waikato.ac.nz/mailman/listinfo/wekalist


End of Wekalist Digest, Vol 136, Issue 55
*****************************************
