Alim Samat | 31 Oct 14:33 2014

Calling attribute selection from Weka in MATLAB

Dear all,

I would like to use some attribute selection methods from Weka and then use the selected attributes in my classifier, which is written in MATLAB.
My question is this. The command for calling an attribute selection method in the SimpleCLI is:
java weka.attributeSelection.CfsSubsetEval -M -s "weka.attributeSelection.GreedyStepwise -D 1 -N 5" -i F:\mytrain.arff

What would the equivalent command be in MATLAB? For a classification problem, for example, we can directly call weka.classifiers.Evaluation.evaluateModel(classifiername,paramtersset) from MATLAB.

Also, I would like to write the resulting string output to a .txt file.

Thank you very much.

Best,

Alim·Samat

PhD Student
Department of Geographical Information Science, Nanjing University,
No.163, Xianlin Avenue, Qixia District,
Nanjing, Jiangsu Province, China, 210023.
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Eman | 30 Oct 18:54 2014

How to evaluate my results

Dear Weka list users,

I used a 75% training / 25% testing split to run my experiments in Java,
using some methods from the Weka command line.

So I got one result for accuracy and one result for the incorrectly
classified instances. I now need to run the same method five times (each
time with a 75% training / 25% testing split, but with a different random
selection of training and testing instances).

So my questions are:
1) How and where can I keep the 75% training / 25% testing splits (in a
file, or is there some other way to keep them and then work with them)? I
tried files, but that takes up a lot of space in RAM.

2) Also, I found in several papers that results are evaluated over many
runs and reported with a standard error, as mean +/- standard
deviation.

So, is there any method available in Weka that gives this kind of
evaluation?
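
A minimal sketch of one way to approach both points, written in plain Java
rather than against any Weka API. It assumes that a split can be regenerated
from its random seed, so that only the seed has to be kept instead of the
training and testing files, and it summarises the per-run accuracies as a
mean and a sample standard deviation. The class name, the dataset size and
the five accuracy values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RepeatedSplit {

    /** Returns the instance indices 0..n-1 in an order fully determined by the seed. */
    static List<Integer> shuffledIndices(int n, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed));
        return idx;
    }

    /** Number of training instances for a given fraction, e.g. 0.75. */
    static int trainSize(int n, double trainFraction) {
        return (int) Math.round(n * trainFraction);
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double stdDev(double[] values) {
        double m = mean(values);
        double squares = 0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    public static void main(String[] args) {
        int n = 1000; // hypothetical number of instances
        for (long seed = 1; seed <= 5; seed++) {
            // The same seed always reproduces the same split, so nothing is stored on disk.
            List<Integer> idx = shuffledIndices(n, seed);
            int cut = trainSize(n, 0.75);
            List<Integer> train = idx.subList(0, cut);
            List<Integer> test = idx.subList(cut, n);
            System.out.println("seed " + seed + ": " + train.size()
                    + " training, " + test.size() + " testing");
        }
        // Made-up accuracies (in percent) standing in for the five runs.
        double[] accuracies = {80.0, 82.0, 78.0, 81.0, 79.0};
        System.out.println("accuracy = " + mean(accuracies) + " +/- " + stdDev(accuracies));
    }
}
```

In a real experiment the index lists would select the training and testing
instances for each run, and the accuracy array would be filled with the
values measured on the five test parts.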

Best Regards

--
View this message in context: http://weka.8497.n7.nabble.com/how-to-eveluate-my-result-tp32575.html
Sent from the WEKA mailing list archive at Nabble.com.

double d s | 29 Oct 06:07 2014

Cluster

Hi all,

I am trying to determine the number of clusters that would be appropriate for a dataset. To do so, I applied several clustering algorithms, such as EM, Cobweb and SimpleKMeans. Each one produces a different result (a different number of clusters).

My question:
What is the best (standard) way to determine the number of clusters in Weka?

Thank you very much.
Sandler


 
Suranga Kasthurirathne | 29 Oct 03:41 2014

Principal components : total of variance exceeds 100% for large data matrix


Hi everyone,

I'm new to Weka and data analysis, but I'm trying to use it (and principal components) to perform dimensionality reduction.

My dataset is 7000 rows long and 35000 columns wide. The class attribute for each row can take one of four values: a, b, c or d.
So even though there are 7000 rows of data, they result in only four possible outcomes. In addition, a considerable majority of the columns have a value of zero.

When I perform a principal component analysis using Weka (with the default settings), it produces a list of ranked attributes. However, the variances of these attributes do not sum to 100%; instead, the total is much larger.
Can anyone suggest what is wrong? Is it because of the zeros in my data, or because I have only four possible outcomes? Would an alternative dimensionality reduction approach work better for my scenario?


--
Thanks and Best Regards,
Suranga
usanchez | 28 Oct 15:47 2014

How to retrain SVM model? (java)

Hi all!

I would like to know how I can retrain an SVM model generated in Java using
the Weka library.

I know that some classifiers offer the option of updating a model by calling
the method "updateClassifier(Instance)", but to do so the classifier must be
Updateable, and I don't think that's the case for SVM.

Another way to do this would be to add the new instances to the previous
ones and build the model again. That way we are not updating the previous
model but creating a new one from scratch. I don't know if this is the best
way to "retrain" a model.

Finally, I know that I can get the support vectors from the model and then
add the vectors taken from the new instances to the old vectors. This way I
could get an updated model. How can I do this in Java using the Weka
library? Is it the best way to retrain the model?

Thank you very much!!

--
View this message in context: http://weka.8497.n7.nabble.com/How-to-retrain-SVM-model-java-tp32567.html
Sent from the WEKA mailing list archive at Nabble.com.

Tali Boneh | 27 Oct 08:45 2014

supervised.attribute.Discretize

Hello,

I am using the filter weka.filters.supervised.attribute.Discretize with Weka 3.7.11.
When using the GUI and the API, I get the following results for two of my attributes (attA and attB).

The results are very close, but the API produces additional strange (illegal) values:
@attribute attA_GUI {'\'(-inf-0.303115]\'','\'(0.303115-0.502771]\'','\'(0.502771-0.802285]\'','\'(0.802285-0.938245]\'','\'(0.938245-0.981397]\'','\'(0.981397-0.999999]\'','\'(0.999999-inf)\''}
@attribute attA_API {'\'(1-1]\'','\'(0.999995-1]\'','\'(0.981397-0.999995]\'','\'(0.938245-0.981397]\'','\'(0.303115-0.50277]\'','\'(0.802284-0.938245]\'','\'(0.50277-0.802284]\'','\'(-inf-0.303115]\''}

@attribute attB_GUI {'\'(-inf-0.000001]\'','\'(0.000001-0.018603]\'','\'(0.018603-0.061755]\'','\'(0.061755-0.197716]\'','\'(0.197716-0.49723]\'','\'(0.49723-0.696885]\'','\'(0.696885-inf)\''}
@attribute attB_API {'\'(0-0.000006]\'','\'(0.000006-0.018603]\'','\'(0.018603-0.061755]\'','\'(0.49723-0.696885]\'','\'(0.061755-0.197716]\'','\'(0.197716-0.49723]\'','\'(0.696885-inf)\'','\'(0-0]\'','\'(-inf-0]\''}

Any suggestions?

Tali
Malith Munasinghe | 27 Oct 06:57 2014

Weka Clustering testing with supplied test set

Hello,

I am new to Weka and I am using it for my final-year research.
In Weka clustering, if we build the clusters with the training set and then re-evaluate the model on the current test set, does that mean the test instances are assigned to the clusters generated from the training set, or are new clusters created from the combined training and test sets?

--

With regards,
Malith Munasinghe,
University of Colombo School of Computing.

Handle : mpmunasinghe (gmail, hotmail, facebook, twitter, linkedin)
Nitish Varshney | 26 Oct 10:39 2014

Ranking test data set feeds and getting high accuracy in test data set

Hi,

   I am new to WEKA. I am trying to perform sentiment analysis on tweet data using the following procedure:

1) POS tags are extracted from the text using the ARK Tweet POS tagger.
2) Adverbs, adjectives and a few other POS tags are used to create the feature set.
3) The RBF and linear SVM kernels provided by LibSVM are used to train the classifier.
PS: Stemming, negation handling and clause handling have also been done. 5000+ tweets were manually annotated by three different annotators to create the training data set. A three-class classification task has been performed.

However, I am currently getting only 76% accuracy with 10-fold cross-validation. I do not understand why I cannot reach the standard accuracy (which, in my view, should be >80%). Please tell me if I am doing something wrong.

Also, I want to rank the test data feeds. Currently the LibSVM function classifyInstance() gives me a binary value only. I want to rank the results so that I can identify strongly positive and strongly negative tweets.

Thanks and Regards,
Nitish Varshney
M.Tech. (2nd Yr Scholar)
Dept of Computer Sc. & Engg.
Indian Institute of Technology Delhi
+91-8447002332
double d s | 25 Oct 09:43 2014

accuracy and precision

Hi dears,


I need assistance with the terms “accuracy” and “precision”.


Some references treat them as different:

Accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's actual (true) value. Precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.

However, after working with different datasets, I observed that the results for accuracy and precision look approximately equal to each other. For instance, I had this result:

=== Summary ===

Correctly Classified Instances         143               95.3333 %
Incorrectly Classified Instances         7                4.6667 %
Kappa statistic                          0.93
Mean absolute error                      0.0495
Root mean squared error                  0.1599
Relative absolute error                 11.1476 %
Root relative squared error             33.9217 %
Total Number of Instances              150

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 1         0         1           1        1           1          Cl1
                 0.94      0.04      0.922       0.94     0.931       0.976      Cl2
                 0.92      0.03      0.939       0.92     0.929       0.976      Cl3
Weighted Avg.    0.953     0.023     0.953       0.953    0.953       0.984


From the result it can be clearly seen that accuracy and precision look almost the same.


Therefore, my questions are:

1- In Weka, do we consider "Correctly Classified Instances" to be the "accuracy"?
2- Are accuracy and precision considered to be the same or different in this result?
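
As a hedged illustration of how the two quantities are computed in a classification setting, the sketch below derives accuracy and per-class precision and recall from a confusion matrix. The matrix itself is not part of the output above; the one used here is an assumption, chosen because it reproduces the reported figures (143 of 150 correct, precision 0.922 for Cl2 and 0.939 for Cl3). The class and method names are made up for illustration.

```java
class ConfusionMetrics {

    /** Accuracy: instances on the diagonal divided by all instances. */
    static double accuracy(int[][] cm) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) {
                    correct += cm[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    /** Precision for class c: correct predictions of c divided by all predictions of c (column sum). */
    static double precision(int[][] cm, int c) {
        int predicted = 0;
        for (int[] row : cm) {
            predicted += row[c];
        }
        return (double) cm[c][c] / predicted;
    }

    /** Recall (TP rate) for class c: correct predictions of c divided by actual instances of c (row sum). */
    static double recall(int[][] cm, int c) {
        int actual = 0;
        for (int v : cm[c]) {
            actual += v;
        }
        return (double) cm[c][c] / actual;
    }

    public static void main(String[] args) {
        // Assumed matrix: rows are the actual class, columns the predicted class.
        int[][] cm = {
            {50, 0, 0},
            {0, 47, 3},
            {0, 4, 46}
        };
        System.out.println("accuracy = " + accuracy(cm));
        for (int c = 0; c < cm.length; c++) {
            System.out.println("Cl" + (c + 1) + ": precision = " + precision(cm, c)
                    + ", recall = " + recall(cm, c));
        }
    }
}
```

In this sense accuracy is a single figure for the whole test set, while precision is defined per class. Averaging recall over the classes with weights equal to the class sizes always gives exactly the accuracy; the weighted average of precision only happens to land close to it here.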


Thanks for any assistance.

Sandler

Alban Levy | 25 Oct 00:48 2014

Statistics of classification's results

Dear list members,

Greetings from Nottingham.


Is there a known relationship between some statistics of a dataset and the statistics obtained from running various classification algorithms on it (kappa statistic,...)? 

More precisely: after running many classification algorithms with 10-fold CV on 6 datasets (each preprocessed in various ways), we normalised some information scores (namely: percentage of correct answers, K&B mean information, kappa, weighted area under ROC) and accumulated the values obtained from each classification (see attached picture).
The surprise came from the varied behaviour of the curves, and I couldn't find any satisfying explanation of why, for example, on some datasets the blue curve (K&B mean information) is on top, while on others the violet one (weighted area under ROC) is. Is there one?

As this was rather puzzling, any lead would be appreciated.

Thanks for reading.
Best regards,
-- 
Alban






Martin L | 24 Oct 15:57 2014

Best way of learning

Hi Weka experts,

I am new to Weka and the field of data mining, and I want to learn them properly.

Before sending this message, I read about the field, but most of the explanations are purely mathematical, and this did not help me learn, especially since Weka produces its results within seconds, so there is no time to observe anything I have read about. On the other hand, I noticed a lot of Java code in many references. I am therefore confused by this combination of maths and programming.

Therefore, I am asking experts in data mining in general, and Weka in particular, for help with the right way to learn both, in order to become proficient and be able to work with Weka with a high level of confidence.

In addition, do I need to learn programming to become proficient with Weka and data mining and to work with them easily?

I would highly appreciate any advice.

Martin
