Marc Stein | 12 Feb 19:16 2016

Command Line Strangeness

Hi,

I've been running the same command line configuration for over a year now with no problems. Today something seems to have changed.

java -cp weka.jar weka.classifiers.meta.Bagging -l Bagging2.model -T CL_1455294017.arff -p 1 -o

returns:

=== Predictions on test data ===

 inst#     actual  predicted error prediction (bankyears)
     1        1:?     1:Good       0.789 (2)
 

All good.

I created a new model with the same ARFF configuration and saved it in the local directory. It's named Bagging3.model.

When I run:

java -cp weka.jar weka.classifiers.meta.Bagging -l Bagging3.model -T CL_1455294017.arff -p 1 -o

I see:

java.lang.NullPointerException
at weka.core.Attribute.equalsMsg(Attribute.java:478)
at weka.core.Instances.equalHeadersMsg(Instances.java:638)
at weka.core.Instances.equalHeaders(Instances.java:655)
at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1423)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:359)
at weka.classifiers.meta.Bagging.main(Bagging.java:787)  

Basically, any model that I saved in the past works fine, but new ones all throw errors.

Any idea what is going on?

Previous models were built under Java 7; I am now running Java 8.

Weka version is 3.7.13
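One way to narrow this down is to compare the header stored with the new model against the test file attribute by attribute, rather than relying on `equalHeadersMsg`, which is where the exception is thrown. The sketch below assumes the model file was saved with the training header written after the classifier (as the Explorer typically does when saving a model), reads both objects with `SerializationHelper.readAll`, and assumes the class is the last attribute of the test file. The class `CompareHeaders` is a hypothetical helper, not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;
import weka.core.Attribute;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

// Hypothetical diagnostic: lists where a model's training header and a test set disagree.
class CompareHeaders {

    // Compares attribute count, names, type codes, nominal value counts and class index.
    static List<String> headerDifferences(Instances train, Instances test) {
        List<String> diffs = new ArrayList<>();
        if (train.numAttributes() != test.numAttributes()) {
            diffs.add("attribute count: " + train.numAttributes() + " vs " + test.numAttributes());
        }
        int n = Math.min(train.numAttributes(), test.numAttributes());
        for (int i = 0; i < n; i++) {
            Attribute a = train.attribute(i);
            Attribute b = test.attribute(i);
            if (!a.name().equals(b.name())) {
                diffs.add("attribute " + i + " name: " + a.name() + " vs " + b.name());
            }
            if (a.type() != b.type()) {
                diffs.add("attribute " + i + " type code: " + a.type() + " vs " + b.type());
            } else if (a.isNominal() && a.numValues() != b.numValues()) {
                diffs.add("attribute " + i + " nominal values: " + a.numValues() + " vs " + b.numValues());
            }
        }
        if (train.classIndex() != test.classIndex()) {
            diffs.add("class index: " + train.classIndex() + " vs " + test.classIndex());
        }
        return diffs;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = model file (e.g. Bagging3.model), args[1] = test ARFF file
        Object[] stored = SerializationHelper.readAll(args[0]);
        if (stored.length < 2 || !(stored[1] instanceof Instances)) {
            System.out.println("No training header stored in " + args[0]);
            return;
        }
        Instances trainHeader = (Instances) stored[1];
        Instances test = DataSource.read(args[1]);
        test.setClassIndex(test.numAttributes() - 1); // assumption: class is the last attribute
        for (String d : headerDifferences(trainHeader, test)) {
            System.out.println(d);
        }
    }
}
```

Running it on Bagging2.model and Bagging3.model against the same ARFF file (for example `java -cp weka.jar:. CompareHeaders Bagging3.model CL_1455294017.arff` on Unix) would show whether the new model's stored header differs from the test file.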
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Mahesh Pal | 12 Feb 10:27 2016

M5 model

Hi all,

After running the M5 model tree on my data, part of the resulting model looks like this:

M_ACC <= 8.5 :
|   |   |   HC <= 2.5 :
|   |   |   |   B/C <= 0.5 : LM3
|   |   |   |   B/C >  0.5 : LM4
|   |   |   HC >  2.5 :

My question is: HC never takes decimal values in my data (it is an integer such as 2, 3 or 4), so why does the M5 model tree use the value 2.5 here? Is it because of the binarisation of attributes discussed in "Inducing model trees for continuous classes" by Wang and Witten?
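If HC is declared numeric in the ARFF header, the tree treats it as continuous, and (as in most tree learners, and assuming Weka's M5 follows the same convention) a split threshold is placed halfway between two adjacent distinct values seen in the training data. With integer values 2, 3 and 4 the only candidate thresholds are 2.5 and 3.5, so HC <= 2.5 simply means HC <= 2 for these data. The binarisation discussed by Wang and Witten concerns nominal attributes, which may be why B/C is split at 0.5. A minimal sketch of the midpoint rule; the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of midpoint split thresholds for a numeric attribute.
class SplitPoints {

    // Candidate thresholds: midpoints between consecutive distinct sorted values.
    static List<Double> candidateThresholds(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        List<Double> thresholds = new ArrayList<>();
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1]) {
                thresholds.add((sorted[i - 1] + sorted[i]) / 2.0);
            }
        }
        return thresholds;
    }

    public static void main(String[] args) {
        double[] hc = {2, 3, 4, 3, 2, 4}; // integer-valued attribute like HC
        System.out.println(candidateThresholds(hc)); // [2.5, 3.5]
    }
}
```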

regards,

Mahesh

--
Mahesh Pal
Department of Civil
NIT Kurukshetra, 136119
_______________________________________________
Saravanan vijayakumar | 10 Feb 07:31 2016

Can't find scheme weka.classifiers.functions.LibSVM, or it is not runnable

Dear WEKA users,

I have run into a strange problem while executing the following command in a terminal:

java -classpath weka.jar weka.Run weka.classifiers.functions.LibSVM -T xx.arff -l xy.model -p 0

It throws the error "Can't find scheme weka.classifiers.functions.LibSVM, or it is not runnable".

I am using Weka 3.7.5 and Java build 1.7.0_79-b15 on CentOS 6.5. I mention the OS and Java versions because the same command runs perfectly with a different OS and Java version: it works fine on CentOS 6.6 and Fedora 21 with Java 1.7 and 1.8 respectively, but with similar Java versions it throws the error on CentOS 6.5 and Fedora 19. Could someone please help me understand where the problem is?
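A quick way to compare the machines is to ask the JVM whether the relevant classes can be loaded with the same -classpath. As far as I know, in Weka 3.7.x LibSVM is not part of weka.jar but an add-on package (the Weka wrapper plus LIBSVM's own `libsvm.svm` class), which `weka.Run` looks for among the packages installed for the current user (by default under ~/wekafiles). One plausible cause is therefore that the package is installed on some machines and not on others. The `ClassCheck` helper below is hypothetical, and it only sees the plain classpath, not packages that `weka.Run` loads by itself.

```java
// Hypothetical helper: reports whether classes are visible on the current classpath.
class ClassCheck {

    // True if the class can be found; it is not initialised (second argument false).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"weka.classifiers.functions.LibSVM", "libsvm.svm"};
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Running `java -cp weka.jar:. ClassCheck` on a working and a failing machine, and then again with the package's jar files added to -cp, would show whether the classes are reachable at all.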

Thanks in advance.

--
~~~~~~~~~~~~~~~~~~~~
Dr. Saravanan.V, 
DSK Post-Doctoral Fellow,
Centre for Advanced Study in Crystallography and Biophysics
University of Madras,
Alternate mail: brsaran <at> rediffmail.com; brsaran <at> bicpu.edu.in
Mb: +919884266881
~~~~~~~~~~~~~~~~~~~~~~~~~
_______________________________________________
Phuong Pham | 9 Feb 20:11 2016

Different prediction index when flipping the setProbabilityEstimates option

Hi everyone,
My problem is that I get different results from a model trained with setProbabilityEstimates=true versus a model trained with setProbabilityEstimates=false.

I have attached a pair of train and test files (a very small toy problem) with 2 instances in the test set.
With setProbabilityEstimates=false, RBFSVM predicts both test instances as 0 (class index).
With setProbabilityEstimates=true, RBFSVM predicts both test instances as 1 (class index). I also checked the output probabilities and found that class index 1 is the correct one.
I have also attached a small Java test file that runs on the two data sets.

Could you tell me what I have done wrong here?
Thank you very much,
Phuong
Attachment (Example.java): application/octet-stream, 4036 bytes
Attachment (test.arff): application/octet-stream, 737 bytes
Attachment (train.arff): application/octet-stream, 7586 bytes
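If the classifier here is the LibSVM wrapper, one known cause is worth checking: with probability estimates switched on, LIBSVM fits Platt's sigmoid to the decision values (using an internal cross-validation) and takes the predicted class from those probabilities, and the LIBSVM FAQ notes that this can disagree with the sign of the decision value, especially on very small data sets. The sketch shows how the two can disagree; the sigmoid parameters A and B are made-up values, not ones fitted to the attached files, and the class name is hypothetical.

```java
// Illustration of Platt scaling: P(y = 1 | f) = 1 / (1 + exp(A * f + B)).
class PlattDemo {

    static double plattProbability(double decisionValue, double a, double b) {
        return 1.0 / (1.0 + Math.exp(a * decisionValue + b));
    }

    public static void main(String[] args) {
        double f = -0.3;            // decision value: its sign points to class 0
        double a = -2.0, b = -1.0;  // made-up sigmoid parameters (normally fitted by LIBSVM)
        double p = plattProbability(f, a, b);
        System.out.println("decision value -> class " + (f > 0 ? 1 : 0));
        System.out.println("P(class 1) = " + p + " -> class " + (p > 0.5 ? 1 : 0));
    }
}
```

With B different from 0, the probability-0.5 boundary sits at f = -B/A (here -0.5) rather than at f = 0, so any decision value between -0.5 and 0 is assigned class 0 by its sign but class 1 by its probability.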
_______________________________________________
ahmed Iraq | 9 Feb 13:04 2016

Best documents Classification Method for topics cover ?

Dear all 

I need a way to determine the dominant topic of each document in the following data set. The data set was produced after pre-processing all the documents, and the frequencies of the selected topics are as follows.
I have not included the other words I am not interested in, although some of them may have a high frequency.

TOPICS
id  Doc-name  total words  Politics  sport  food  animals
1   doc1      1000         210       250    100   350
2   doc2      2000         1000      200    200   400
3   doc3      4000         500       100    2000  200
etc.

My questions are: is there any classification method that can find the dominant topic of a document for this kind of data set?
If I consider doc1 to be about animals, is that correct?
Is there any way to calculate the probability of each topic in a document in order to find its dominant topic?
Any suggestions, please?
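Without labelled training documents, a simple heuristic (not a trained classifier) is to normalise each document's topic counts into shares and take the largest. The sketch uses the three rows of the table above; the shares are taken relative to the sum of the listed topic counts, which is an assumption (dividing by total words instead changes the values but not which topic is largest). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical heuristic: the dominant topic is the one with the largest count.
class DominantTopic {
    static final String[] TOPICS = {"Politics", "sport", "food", "animals"};

    // Share of each topic among the listed topic counts (sums to 1 unless all are 0).
    static double[] topicShares(int[] counts) {
        double sum = 0.0;
        for (int c : counts) sum += c;
        double[] shares = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            shares[i] = (sum == 0.0) ? 0.0 : counts[i] / sum;
        }
        return shares;
    }

    // Index of the largest count; the first one wins on ties.
    static int dominant(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Counts from the table, in the order Politics, sport, food, animals.
        Map<String, int[]> docs = new LinkedHashMap<>();
        docs.put("doc1", new int[] {210, 250, 100, 350});
        docs.put("doc2", new int[] {1000, 200, 200, 400});
        docs.put("doc3", new int[] {500, 100, 2000, 200});
        for (Map.Entry<String, int[]> e : docs.entrySet()) {
            int best = dominant(e.getValue());
            double share = topicShares(e.getValue())[best];
            System.out.printf("%s -> %s (%.3f of topic words)%n", e.getKey(), TOPICS[best], share);
        }
    }
}
```

Under this heuristic doc1 is indeed dominated by animals (350 of its 910 topic words, about 0.385), doc2 by Politics and doc3 by food. If labelled documents were available, the count columns could instead serve as numeric attributes for a Weka classifier such as NaiveBayes, whose class distribution would give a per-topic probability.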

Note:

I have calculated the counts as follows: suppose document x contains the following words in its sentences { dog, monkey, birds, cat, spider, cat, donkey, monkey }; then the animals topic count for document x is 8, and so on.
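The counting step described in the note can be sketched as follows; the "animals" word list is a hypothetical lexicon that simply covers the words of the example, and the class name is also hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts how many tokens of a document belong to a topic word list.
class TopicCount {

    static int countTopicWords(List<String> tokens, Set<String> topicWords) {
        int count = 0;
        for (String token : tokens) {
            // Each occurrence counts, so repeated words ("cat", "monkey") add up.
            if (topicWords.contains(token)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical "animals" lexicon covering the example's words.
        Set<String> animals = new HashSet<>(Arrays.asList(
                "dog", "monkey", "birds", "cat", "spider", "donkey"));
        List<String> docX = Arrays.asList(
                "dog", "monkey", "birds", "cat", "spider", "cat", "donkey", "monkey");
        System.out.println(countTopicWords(docX, animals)); // 8
    }
}
```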

--
View this message in context: http://weka.8497.n7.nabble.com/Best-documents-Classification-Method-for-topics-cover-tp36567.html
Sent from the WEKA mailing list archive at Nabble.com.
_______________________________________________

Sandro Puhmeister | 9 Feb 11:15 2016

Creating a new Weka Instances set in Java

Hi,

I'm trying to create some empty training sets with the following constructor:
Instances(java.lang.String name, java.util.ArrayList<Attribute> attInfo, int capacity)

The problem is that I don't use an ArrayList<Attribute> object at all. I have all my numeric data stored in a simple List,
like so: List<double[]> data = new LinkedList<double[]>();

When I create each instance I use: Instance inst = new Instance(1.0, data.get(n));
The data is extracted from processed images (so there are no ARFF files or any other data files).

What can I pass as the second parameter of the new Instances object when building a training
set as described above? Do I need to define a new constructor, or am I missing something?
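The `ArrayList<Attribute>` argument is the list of attribute definitions (the dataset header), so it can be built from the width of the `double[]` rows. Note that `Instances(String, ArrayList<Attribute>, int)` belongs to the Weka 3.7 API, where `Instance` is an interface and rows are created with `DenseInstance`; `new Instance(1.0, values)` is the older 3.6 form. A minimal sketch against the 3.7 API, where the attribute names f0, f1, ..., the helper `fromRows` and the row values are hypothetical, and every column (including the class) is assumed numeric:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

class BuildInstances {

    // Builds a dataset with one numeric attribute per column of the rows.
    // Attribute names "f0", "f1", ... are placeholders.
    static Instances fromRows(String relationName, List<double[]> rows, int numColumns) {
        ArrayList<Attribute> attInfo = new ArrayList<>();
        for (int j = 0; j < numColumns; j++) {
            attInfo.add(new Attribute("f" + j)); // Attribute(String) declares a numeric attribute
        }
        Instances dataset = new Instances(relationName, attInfo, rows.size());
        for (double[] row : rows) {
            // Each row must have exactly numColumns values.
            dataset.add(new DenseInstance(1.0, row));
        }
        return dataset;
    }

    public static void main(String[] args) {
        List<double[]> data = new LinkedList<>();
        data.add(new double[] {0.12, 3.4, 1.0}); // made-up feature values
        data.add(new double[] {0.56, 2.1, 0.0});
        Instances train = fromRows("image-features", data, 3);
        train.setClassIndex(train.numAttributes() - 1); // last column as (numeric) class
        System.out.println(train);
    }
}
```

A nominal class would instead be declared with `new Attribute(name, List<String> labels)`, and its value in each row stored as the index of the label.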

Thank you!

Best Regards.

Sandro
_______________________________________________
Mark Hall | 8 Feb 21:36 2016

Re: Filter didn't make the test instance immediately available!

On 9/02/16, 3:14 AM, "Alexia" <alexietta <at> gmail.com> wrote:

Hi everyone,
it's my first time using this list, but I hope I can find some answers :)

I'm trying to do some sentiment analysis on Twitter using Java with Weka's libraries and a balanced dataset I found online. (I'm using Weka 3.6.13.)
My approach to the problem so far is the following:

1. Create the model using 10-fold cross-validation on a FilteredClassifier, which uses a naive Bayes classifier and, as its filter, a MultiFilter. The MultiFilter applies StringToWordVector first and then an AttributeSelection filter (with InfoGain and Ranker).

2. Using the created .model file, classify unseen instances with "model.classifyInstance(unlabeled.instance(i));"

The first part works without problems, but when I try to classify a string using the loaded *.model I get the error: "Filter didn't make the test instance immediately available!"
I've read online that it could be an issue related to the attribute selection part, but I don't understand how to actually fix it.
Any clue?

I can’t reproduce this problem using a Reuters dataset and the Explorer. Does your test data have the same structure as the training data? Including the class attribute (you can use missing value as the class value for unlabelled test instances)? If everything is the same, does it work when you load your model into the Explorer (or run from the command line)?

Cheers,
Mark.
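Following Mark's suggestion, the unlabelled instance can be built from a copy of the training header, so that it has exactly the same attributes and a missing class value before it is passed to `classifyInstance`. A sketch assuming the Weka 3.7 API (`DenseInstance` and the `List`-based `Attribute` constructors; Weka 3.6.13 uses `weka.core.Instance` and `FastVector` instead), a hypothetical string attribute named `text`, and a hypothetical helper `makeUnlabeled`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

class UnlabeledTweet {

    // Builds a one-row dataset with the training header: the text goes into
    // the string attribute, every other value (including the class) stays missing.
    static Instances makeUnlabeled(Instances trainHeader, String textAttName, String text) {
        Instances data = new Instances(trainHeader, 1);          // header only, no rows
        Instance inst = new DenseInstance(data.numAttributes()); // all values missing
        inst.setDataset(data);
        inst.setValue(data.attribute(textAttName), text);        // registers the string value
        inst.setClassMissing();
        data.add(inst);
        return data;
    }

    public static void main(String[] args) {
        // Hypothetical header: a string attribute and a nominal sentiment class.
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("text", (List<String>) null));
        atts.add(new Attribute("sentiment", Arrays.asList("negative", "positive")));
        Instances header = new Instances("tweets", atts, 0);
        header.setClassIndex(header.numAttributes() - 1);

        Instances unlabeled = makeUnlabeled(header, "text", "what a great day");
        System.out.println(unlabeled);
        // With the loaded FilteredClassifier:
        // double pred = model.classifyInstance(unlabeled.instance(0));
    }
}
```

In practice the header would come from the training data (or from the header saved with the model) rather than being built by hand as in `main`.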

_______________________________________________
Alexia | 8 Feb 15:14 2016

Filter didn't make the test instance immediately available!

Hi everyone,
it's my first time using this list, but I hope I can find some answers :)

I'm trying to do some sentiment analysis on Twitter using Java with Weka's libraries and a balanced dataset I found online. (I'm using Weka 3.6.13.)
My approach to the problem so far is the following:

1. Create the model using 10-fold cross-validation on a FilteredClassifier, which uses a naive Bayes classifier and, as its filter, a MultiFilter. The MultiFilter applies StringToWordVector first and then an AttributeSelection filter (with InfoGain and Ranker).

2. Using the created .model file, classify unseen instances with "model.classifyInstance(unlabeled.instance(i));"

The first part works without problems, but when I try to classify a string using the loaded *.model I get the error: "Filter didn't make the test instance immediately available!"
I've read online that it could be an issue related to the attribute selection part, but I don't understand how to actually fix it.
Any clue?

Thank you,
Sara.
_______________________________________________
Arturo M. Mora | 8 Feb 10:48 2016

GUI vs command line: different selected attributes?

I am doing feature selection using supervised PCA + Ranker. However, the results I get from the GUI and from the command line are not the same, even though the data and parameters are exactly the same.
I am using Weka 3.6.12 both to open the GUI and to run the .jar file from the command line.
Below are the results obtained from the GUI, followed by those from the command line:
-------- GUI options -----------
In "Preprocess" tab, as filter I have:
weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.PrincipalComponents -R 0.95 -A 5" -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
-------- GUI results ------------
@relation 'features_trainString_fold0-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.PrincipalComponents -R 0.95 -A 5-Sweka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1'

@attribute 0.104f76+0.104f13-0.104f34-0.104f75+0.104f15... numeric
@attribute 0.195f27+0.183f68+0.174f99-0.173f52-0.17f14... numeric
@attribute -0.222f29-0.211f31+0.205f77-0.173f6-0.172f190... numeric
@attribute 0.228f1+0.225f86+0.199f47+0.187f185+0.177f61... numeric
.....
@attribute -0.246f163+0.214f161-0.212f144+0.209f151-0.196f124... numeric
<< In total, there are 62 transformed attributes. >>

@data
2.820436,10.004235,-4.873166,-4.664959,-4.810534,...
13.207034,-6.98444,-2.161941,0.057519,4.262764,...
...

---------- Command line weka call ----------
java -XX:+UseSerialGC -Xmx6g -classpath $CLASSPATH:weka.jar weka.filters.supervised.attribute.AttributeSelection \
-b \
-i "$fnFeaturesTrain_weka" -o "$output_fnFeaturesTrain_weka" \
-r "$fnFeaturesTest_weka" -s "$output_fnFeaturesTest_weka" \
  -E "weka.attributeSelection.PrincipalComponents -R 0.95 -A 5" \
  -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" 
-------- Command line output --------------
@relation 'features_polyAV2_trainString_fold0-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.PrincipalComponents -R 0.95 -A 5-Sweka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1'

@attribute -0.104f76-0.104f13+0.104f34+0.104f75-0.104f15... numeric
@attribute 0.195f27+0.184f68+0.175f99-0.174f52-0.17f14... numeric
@attribute 0.221f29+0.211f31-0.205f77+0.173f6+0.172f190... numeric
@attribute 0.227f1+0.224f86+0.199f47+0.188f185+0.177f61... numeric
....
@attribute 0.301f142-0.257f176+0.245f134-0.242f139-0.229f137... numeric
<< In total, there are 63 transformed attributes. >>

@data
-2.848116,10.061543,4.799405,-4.700467,4.803406,...
-13.226917,-6.94257,2.276386,0.087506,-4.26625,...
....


Now, here are the key points:
1 - Why does command-line Weka report one more attribute than the GUI?
2 - If you look closely, the attribute transformations are quite similar, but in some cases the sign is inverted. I don't think this is an issue as long as the sign of a given attribute x is inverted for all samples.
3 - The actual attribute values in @data are quite similar but not the same, e.g.,
first sample (command line): -2.848116,10.061543,4.799405,-4.700467,...
first sample (GUI): 2.820436,10.004235,-4.873166,-4.664959,...
Is this due to rounding artifacts?
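Two properties of PCA bear on points 1 and 2. An eigenvector is only defined up to its sign, so two runs can return v or -v for the same component; every score on that component then flips sign, which matches what point 2 describes. And with a variance-coverage cutoff such as -R 0.95, the number of retained components depends on where the cumulative eigenvalue proportion crosses the threshold, so small numerical differences could change it by one. The sketch illustrates both with made-up numbers (not Arturo's eigenvalues); the cutoff rule in `componentsToKeep` is an assumption about how -R is applied, and `normaliseSign` is a common convention, not necessarily what Weka does.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating PCA sign ambiguity and a variance cutoff.
class PcaChecks {

    // Score of a sample on one component: dot product with its loading vector.
    static double project(double[] x, double[] loadings) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * loadings[i];
        }
        return s;
    }

    // Returns a copy flipped so that the largest-magnitude loading is positive.
    static double[] normaliseSign(double[] v) {
        int maxIdx = 0;
        for (int i = 1; i < v.length; i++) {
            if (Math.abs(v[i]) > Math.abs(v[maxIdx])) maxIdx = i;
        }
        double[] out = v.clone();
        if (out[maxIdx] < 0) {
            for (int i = 0; i < out.length; i++) out[i] = -out[i];
        }
        return out;
    }

    // Smallest k whose leading eigenvalues (sorted in descending order)
    // account for at least `cover` of the total variance.
    static int componentsToKeep(double[] eigenDesc, double cover) {
        double total = 0.0;
        for (double e : eigenDesc) total += e;
        double cumulative = 0.0;
        for (int k = 0; k < eigenDesc.length; k++) {
            cumulative += eigenDesc[k];
            if (cumulative / total >= cover) return k + 1;
        }
        return eigenDesc.length;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] v = {0.6, -0.8};
        double[] minusV = {-0.6, 0.8};
        // Same component, opposite sign: the two scores are exact negatives.
        System.out.println(project(x, v) + " / " + project(x, minusV));
        // Both sign choices normalise to the same vector.
        System.out.println(Arrays.toString(normaliseSign(v)) + " " + Arrays.toString(normaliseSign(minusV)));
        // Slightly different spectra on either side of the 0.95 cutoff.
        System.out.println(componentsToKeep(new double[] {5.0, 3.0, 1.6, 0.4}, 0.95)); // 3
        System.out.println(componentsToKeep(new double[] {5.0, 3.0, 1.4, 0.6}, 0.95)); // 4
    }
}
```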

I ran this test after seeing that my classification results improved by 3-5% on different problems just by using the PCA-transformed data to train and test the models.
It is also a sanity check, to make sure that the test data passed in batch mode (-b option) does not affect the PCA model generated from the training data in any way (I know it should not, since only the training data should be used to build the model, which is then applied to the test data).

Please let me know whether the points listed above are expected differences.

Thank you very much.
Regards



_______________________________________________
Fridon Sattari | 6 Feb 06:21 2016

implementation of A2DE algorithm

Hi All.
I am Sattari from Iran and I study computer science.
I want to implement the A2DE algorithm for my paper with the Weka API, but I can't find any help page for this package. Please help me with an example of how to use the A2DE classifier on training and test datasets.
I want to write the code as for other classifiers:
A2DE aode0 = new A2DE();
aode0.buildClassifier(newAttTrain);
Evaluation evalC = new Evaluation(newAttTrain);
evalC.crossValidateModel(aode0, newAttTrain, 10, new Random(1), (Object??);
But I don't understand what the Object parameter is.
Please help me!
Excuse my English.
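The last parameter of `crossValidateModel` is a Java varargs parameter (`Object... forPredictionsPrinting` in the Weka 3.7 API), used only to collect or print per-instance predictions, so it can simply be left out. Cross-validation also trains its own copies of the classifier, so calling `buildClassifier` beforehand is not needed. A2DE itself is not in weka.jar and its class name depends on the add-on package, so the sketch below uses `NaiveBayes` as a stand-in together with a small made-up data set; the helper names `crossValidate` and `toyData` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

class CrossValidate {

    // Runs k-fold cross-validation; the optional varargs argument is omitted.
    static Evaluation crossValidate(Classifier c, Instances data, int folds, long seed) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(c, data, folds, new Random(seed));
        return eval;
    }

    // Tiny made-up data set: one numeric attribute and a nominal class {a, b}.
    static Instances toyData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("x"));
        atts.add(new Attribute("class", Arrays.asList("a", "b")));
        Instances data = new Instances("toy", atts, 20);
        data.setClassIndex(1);
        for (int i = 0; i < 20; i++) {
            int cls = i % 2;
            // class "b" instances are shifted upwards so the classes separate
            data.add(new DenseInstance(1.0, new double[] {cls * 5.0 + i * 0.1, cls}));
        }
        return data;
    }

    public static void main(String[] args) throws Exception {
        // NaiveBayes stands in for A2DE; swap in the A2DE class from its package.
        Evaluation eval = crossValidate(new NaiveBayes(), toyData(), 10, 1L);
        System.out.println(eval.toSummaryString());
    }
}
```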
_______________________________________________
Soheila Masoudian | 5 Feb 18:43 2016

map test set to train set

I want to map the attributes of a test set to those of a training set (I know there are other ways to get compatible training and test sets, but for some reason I cannot use them).
The attributes are tokens of a string (output of StringToWordVector), like:

@attribute hello numeric

I wrote the following code to do the mapping, but it seems to be wrong!

for (int i = 0; i < test.numInstances(); i++) {

    Instance newInst = new SparseInstance(0);
    newInst.setDataset(newTest);
    List<Attribute> attrList = train.get_AttributeSet();

    for (Attribute att : attrList) {
        double val = test.get(i).value(att);
        newInst.setValue(att, val);
    }
    newTest.add(newInst);
}

The problem seems to be at "val = test.get(i).value(att);", since "att" is an attribute from the training set and has a different attribute index. So how can I pick an attribute from the training set and then get its value from the test set? Thanks in advance.
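One way around the index mismatch is to look each training attribute up in the test header by name with `Instances.attribute(String)`, which returns null when the name is absent, and then read the value through the test set's own attribute object. A sketch assuming the Weka 3.7 API; the class `MapToTrainHeader` is hypothetical. Training words that never occur in the test set get 0 (an absent word for StringToWordVector counts), test-only words are dropped, nominal values are mapped by label, and string or relational attributes are not handled.

```java
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;
import weka.core.Utils;

// Re-expresses test instances in the training header, matching attributes by name.
class MapToTrainHeader {

    static Instances map(Instances train, Instances test) {
        Instances mapped = new Instances(train, test.numInstances()); // training header, no rows
        for (int i = 0; i < test.numInstances(); i++) {
            Instance src = test.instance(i);
            double[] vals = new double[mapped.numAttributes()];
            for (int j = 0; j < mapped.numAttributes(); j++) {
                Attribute trainAtt = mapped.attribute(j);
                Attribute testAtt = test.attribute(trainAtt.name()); // null if absent in test
                if (testAtt == null) {
                    // A word never seen in the test data: count 0; other types: missing.
                    vals[j] = trainAtt.isNumeric() ? 0.0 : Utils.missingValue();
                } else if (src.isMissing(testAtt)) {
                    vals[j] = Utils.missingValue();
                } else if (trainAtt.isNominal()) {
                    // Nominal values are stored as indices, so map them via their labels.
                    int idx = trainAtt.indexOfValue(src.stringValue(testAtt));
                    vals[j] = (idx < 0) ? Utils.missingValue() : idx;
                } else if (trainAtt.isNumeric()) {
                    vals[j] = src.value(testAtt);
                } else {
                    vals[j] = Utils.missingValue(); // string/relational: not mapped here
                }
            }
            mapped.add(new SparseInstance(1.0, vals));
        }
        return mapped;
    }
}
```

With this helper, `newTest = MapToTrainHeader.map(train, test);` would replace the loop above, and `newTest` keeps the training set's class index because its header is copied from `train`.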
_______________________________________________
