Copus, Scott | 24 Jul 19:11 2014

Silent/unattended install?

Hi,

 

How can Weka be installed silently/unattended in a Windows x64 environment? Sorry if this is a dumb question, but the typical silent-install methods (the "/s" command-line switch) don't appear to work. I have not found any install-related documentation. Thanks.
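For reference, if the installer turns out to be NSIS-based, my understanding is that NSIS switches are case-sensitive, so the expected invocation would use a capital /S, optionally with /D= for the target directory (the file name below is just an example):

```bat
weka-3-7-11.exe /S /D=C:\Weka
```

(In NSIS installers, /D= must be the last switch and its path must not be quoted.)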

 

--

Scott Copus, Computer Lab Systems Specialist

Academic Technology | Western Kentucky University

(270)745-3042 | http://www.wku.edu/it/labs

 

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Michael Hall | 24 Jul 10:31 2014

Improving classification

I have a classification problem for which, so far, my best results look like this:

Correctly Classified Instances       13283               87.8505 %
Incorrectly Classified Instances      1837               12.1495 %

=== Confusion Matrix ===

    a    b    c    d    e    f    g   <-- classified as
 1682  340    0    0   32    4  102 |    a = 1
  335 1590   42    0  136   46   11 |    b = 2
    0   15 1822   78   23  222    0 |    c = 3
    0    0   42 2095    0   23    0 |    d = 4
    4   49   21    0 2066   20    0 |    e = 5
    1   14  159   47   15 1924    0 |    f = 6
   51    5    0    0    0    0 2104 |    g = 7

Notice that the two misclassification cells in the upper-left corner of the confusion matrix
('a' classified as 'b', and 'b' classified as 'a')
account for about 1/3 of my total misclassification error. So for class 1 (a) and class 2 (b), my 'by class' error results are the worst of the seven…

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.779    0.030    0.811      0.779    0.795      0.762    0.952     0.828     1
                 0.736    0.033    0.790      0.736    0.762      0.725    0.938     0.788     2

The next lowest TP Rate is 0.844 (class 3, as it turns out).

Is there any way to single out these two classes for improvement? I tried using cost-sensitive classification to penalize these errors more heavily, but that doesn't seem to improve overall accuracy for these classes; it mainly shuffles around which other classes they get misclassified as.

My other thought was to take a subset of the data containing only the instances labelled 1 and 2, and train a separate classifier just for these two classes. Would this make sense, or is there some better way to improve classification for weak classes?
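To double-check that 1/3 figure, I computed it directly from the confusion matrix (plain Java, values copied from the output above):

```java
// Double-check: what share of all misclassifications is the a<->b confusion?
// Matrix values copied from the Weka output above (rows = true class a..g).
public class ConfusionCheck {
    static final int[][] M = {
        {1682,  340,    0,    0,   32,    4,  102},
        { 335, 1590,   42,    0,  136,   46,   11},
        {   0,   15, 1822,   78,   23,  222,    0},
        {   0,    0,   42, 2095,    0,   23,    0},
        {   4,   49,   21,    0, 2066,   20,    0},
        {   1,   14,  159,   47,   15, 1924,    0},
        {  51,    5,    0,    0,    0,    0, 2104}
    };

    // Sum of all off-diagonal cells = total misclassified instances.
    static int totalErrors() {
        int errors = 0;
        for (int i = 0; i < M.length; i++)
            for (int j = 0; j < M[i].length; j++)
                if (i != j) errors += M[i][j];
        return errors;
    }

    public static void main(String[] args) {
        int ab = M[0][1] + M[1][0]; // a classified as b, plus b classified as a
        System.out.printf("total errors=%d, a<->b=%d, share=%.3f%n",
                totalErrors(), ab, (double) ab / totalErrors());
        // prints: total errors=1837, a<->b=675, share=0.367
    }
}
```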


Michael Hall



AppConverter convert Apple jvm to openjdk apps http://www195.pair.com/mik3hall/index.html#appconverter




Antonio | 23 Jul 11:36 2014

Problem on LIBSVM regression (time series forecasting)

Hi, I'm trying to use LIBSVM regression to forecast 6 months ahead on the
following data.
I am using LIBSVM with an RBF kernel and SVMType set to epsilon-SVR with default parameters (I'm
not expert enough to tune them).

Because there is so little data, I evaluate on the training data, but I get a forecast of
0.5804 for every one of the 6 months, so it seems that something is wrong?

Does anyone have suggestions for improving this?

Thanks

@relation Regression_test

@attribute Period date yyyy-MM-dd
@attribute Percentage numeric

@data
2011-08-01,0
2011-09-01,0
2011-10-01,0.259403
2011-11-01,0.308642
2011-12-01,0.613497
2012-01-01,2.037662
2012-02-01,1.134486
2012-03-01,0.898727
2012-04-01,0.465357
2012-05-01,0.241168
2012-06-01,0.354359
2012-07-01,0.468987
2012-08-01,0.320699
2012-09-01,0.584155
2012-10-01,0.786552
2012-11-01,0.385395
2012-12-01,0.302407
2013-01-01,0.490101
2013-02-01,0.446799
2013-03-01,0.328194
2013-04-01,0.431381
2013-05-01,0.445664
2013-06-01,0.557984
2013-07-01,0.735813
2013-08-01,0.82297
2013-09-01,0.838937
2013-10-01,1.06926
2013-11-01,0.773331
2013-12-01,0.408285
2014-01-01,0.271486
2014-02-01,0.394293
2014-03-01,0.394572
2014-04-01,0.563725
2014-05-01,0.878016
2014-06-01,0.766179
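For reference, the overall mean of the series is about 0.565, close to the constant 0.5804 forecast. A near-constant prediction around the target mean is typical of an SVR whose parameters (gamma, C and epsilon) have not been tuned, so a grid search over those would be a natural first step. A quick check of the mean (plain Java, values copied from the data above):

```java
// Sanity check: mean of the Percentage series above.
public class SeriesMean {
    static final double[] VALUES = {
        0, 0, 0.259403, 0.308642, 0.613497, 2.037662, 1.134486, 0.898727,
        0.465357, 0.241168, 0.354359, 0.468987, 0.320699, 0.584155, 0.786552,
        0.385395, 0.302407, 0.490101, 0.446799, 0.328194, 0.431381, 0.445664,
        0.557984, 0.735813, 0.82297, 0.838937, 1.06926, 0.773331, 0.408285,
        0.271486, 0.394293, 0.394572, 0.563725, 0.878016, 0.766179
    };

    static double mean() {
        double sum = 0;
        for (double v : VALUES) sum += v;
        return sum / VALUES.length;
    }

    public static void main(String[] args) {
        // 35 monthly values, mean ~= 0.5651
        System.out.printf("n=%d mean=%.4f%n", VALUES.length, mean());
    }
}
```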

--
View this message in context: http://weka.8497.n7.nabble.com/Problem-on-LIBSVM-regression-time-series-forecasting-tp31774.html
Sent from the WEKA mailing list archive at Nabble.com.

Juan Manuel Barreneche | 22 Jul 19:13 2014

Parallel computing for GeneticSearch

Hi all, I need to run attribute selection with WrapperSubsetEval and GeneticSearch. As you may know, this is a relatively slow method, since it performs many training runs in each iteration (at least -F cross-validation folds for each "individual" in the population).

Thinking about this, I was wondering if that process could be parallelized in order to reduce the computation time. Is there any package or built-in way to achieve this?
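To make the question concrete: the fitness evaluations within a generation are independent of each other, so in principle they could run on a thread pool. This is the pattern I have in mind (plain Java with a dummy stand-in fitness function, not Weka code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Evaluate a generation of individuals in parallel. The fitness function is a
// dummy stand-in for WrapperSubsetEval's cross-validated merit.
public class ParallelFitness {
    static double fitness(boolean[] subset) {
        int on = 0;
        for (boolean b : subset) if (b) on++;
        return on; // stand-in: real code would run -F folds of the wrapped classifier
    }

    static double[] evaluateGeneration(List<boolean[]> population, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (boolean[] ind : population)
                futures.add(pool.submit(() -> fitness(ind)));
            double[] scores = new double[population.size()];
            for (int i = 0; i < scores.length; i++)
                scores[i] = futures.get(i).get();
            return scores;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<boolean[]> pop = List.of(
            new boolean[]{true, false, true},
            new boolean[]{true, true, true});
        System.out.println(java.util.Arrays.toString(evaluateGeneration(pop, 2)));
        // prints: [2.0, 3.0]
    }
}
```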

Thanks in advance,
Juan Manuel


--
MSc. Juan M. Barreneche Sarasola

David Costa | 22 Jul 18:35 2014

How to get only the Correctly Classified Instances

Hi all,


Correctly Classified Instances         106               80.916  %
Incorrectly Classified Instances        25               19.084  %
Kappa statistic                          0.6179
Mean absolute error                      0.2303
Root mean squared error                  0.3829
Relative absolute error                 46.0309 %
Root relative squared error             76.5312 %
Total Number of Instances              131     


How can I get only the number of Correctly Classified Instances, instead of the whole stats summary, using Java?
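So far the only approach I can think of is parsing the summary string, e.g.:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extract the "Correctly Classified Instances" count from a Weka summary string.
public class SummaryParse {
    static final Pattern CORRECT =
        Pattern.compile("Correctly Classified Instances\\s+(\\d+)\\s+([\\d.]+)\\s*%");

    // Returns the instance count, or -1 if the line is not found.
    static int correctlyClassified(String summary) {
        Matcher m = CORRECT.matcher(summary);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String summary =
            "Correctly Classified Instances         106               80.916  %\n"
          + "Incorrectly Classified Instances        25               19.084  %\n";
        System.out.println(correctlyClassified(summary)); // prints 106
    }
}
```

But I suspect the Evaluation class exposes these statistics directly (eval.correct() for the count, eval.pctCorrect() for the percentage), which would avoid string parsing altogether?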

thanks and best,
David
awhelan | 22 Jul 00:51 2014

Correct use of Weka API for J48 and AdaBoostM1

Hello, 

I have a question regarding J48 and AdaBoostM1. I have a test dataset and 
training dataset. I am using the Weka API to train and test a J48 classifier, 
and then an AdaBoostM1 classifier that uses J48 as its base classifier. The 
J48 classifier alone seems to work better than AdaBoostM1 with J48 on 
the same data, which I didn't think was possible.
I'm not finding much documentation or many examples of the Weka API for boosting 
or bagging, so I am wondering if I am just using the API wrong. If I am using the 
API correctly, then I realize I need to look at the data I am using and the way I 
am preparing it. 

So my question is simply this: am I using the API correctly in the following two 
pieces of code? Thanks! -Andrew 

1) Simple use of J48 

DataSource trainSource = new DataSource("C:/data.train.arff");
Instances train = trainSource.getDataSet();
train.setClassIndex(train.numAttributes() - 1); // setting class attribute

DataSource testSource = new DataSource("C:/data.test.arff");
Instances test = testSource.getDataSet();
test.setClassIndex(test.numAttributes() - 1); // setting class attribute

// classifier
J48 j48 = new J48();
j48.setUnpruned(false);
j48.buildClassifier(train); // the classifier must be built before it can be evaluated

Evaluation eval = new Evaluation(train);
eval.evaluateModel(j48, test);
System.out.println("Summary: " + eval.toSummaryString());

 
2) J48 with AdaBoostM1 

DataSource trainSource = new DataSource("C:/data.train.arff");
Instances train = trainSource.getDataSet();
train.setClassIndex(train.numAttributes() - 1); // setting class attribute

DataSource testSource = new DataSource("C:/data.test.arff");
Instances test = testSource.getDataSet();
test.setClassIndex(test.numAttributes() - 1); // setting class attribute

// classifier
J48 j48 = new J48();
j48.setUnpruned(true);
j48.setConfidenceFactor(30); // note: ignored when the tree is unpruned; pruned trees normally use values <= 0.5
AdaBoostM1 classifier = new AdaBoostM1();
classifier.setClassifier(j48);
classifier.setNumIterations(100);
classifier.buildClassifier(train);

// evaluate the boosted classifier on the held-out test set
Evaluation eval = new Evaluation(train);
eval.evaluateModel(classifier, test);
System.out.println("Summary: " + eval.toSummaryString());

Does it look like I am using the API correctly? 

Martin O'Shea | 21 Jul 13:52 2014

Multiclass classification using LibSVM

Hello

 

If I have multiple-class training and testing data in .arff files representing daily word frequencies in RSS feeds as follows:

 

@relation _dm_19040_031925_06112013_1383748052958_Boolean-weka.filters.unsupervised.attribute.NumericToNominal-R193

 

@attribute Keyword_us_invest_are_Frequency numeric

@attribute Keyword_syrian_forc_kill_Frequency numeric

@attribute Keyword_europ_debt_crisi_Frequency numeric

@attribute Keyword_bank_of_america_Frequency numeric

@attribute Keyword_exclus_us_fugit_Frequency numeric

@attribute Keyword_debt_rate_cut_Frequency numeric

@attribute Keyword_on_debt_crisi_Frequency numeric

@attribute Keyword_market_fall_on_Frequency numeric

@attribute Keyword_russian_hockey_team_Frequency numeric

@attribute RSSFeedCategoryDescription {'Business and finance and economics','News and current affairs','Science and nature and technology',Sport,'Entertainment and arts'}

 

@data

0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,'Business and finance and economics'

0,3,0,4,0,0,2,1,0,1,2,0,0,0,0,0,1,0,0,1,3,2,0,0,0,0,0,0,0,25,0,0,0,0,1,0,0,2,0,0,1,1,2,1,0,0,0,0,2,1,0,2,0,2,1,2,4,0,2,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,'Business and finance and economics'

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,3,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,'Business and finance and economics'

0,0,2,0,0,7,0,2,1,1,0,0,1,0,0,1,2,0,0,3,4,0,0,0,1,2,0,0,0,2,0,0,0,1,0,0,0,0,0,0,2,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,'Business and finance and economics'

 

And so on.

 

Can you tell me if there are any likely issues with using LibSVM in Weka for this? For example, I am under the impression that it can handle multiple classes, but do the class labels have to be text or numeric?

 

The frequencies have not been calculated using Weka's StringToWordVector but were calculated in Lucene. They have given comparable results using J48 and NaiveBayesMultinomial. I expect the results would also be similar for NaiveBayesMultinomialText, which I think uses StringToWordVector internally?

 

Thanks

 

Martin O’Shea.

ajitha padmanabhan | 21 Jul 09:38 2014

Weka 3.7.11

My research area is distributed data mining. I am using a laptop with Windows 7 and installed Weka 3.7.11 successfully, as it supports distributed data mining.

Now I want to simulate multiple setups. I installed the packages distributedWekaBase, distributedHadoop, and wekaServer.
I have a few questions:
1. How do I set up multiple environments (simulation)?
2. I tried the Weka server with Pentaho, but I am unable to run it; it gives the error java.lang.NoClassDefFoundError: weka/Run.
3. I tried using the Experimenter's advanced tab, but I am unable to go further because I don't know how to configure the hosts.

Please advise on how to simulate this environment, or whether such a simulation is possible at all.
Best Regards
Ajitha
Eibe Frank | 21 Jul 03:59 2014

Re: SMOTE -N question

I agree with your interpretation. But the code seems to be consistent with the original description in

  http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node16.html

Here’s the relevant bit of code:

int[] valueCounts = new int[attr.numValues()];
int iVal = (int) instanceI.value(attr);
valueCounts[iVal]++;
for (int nnEx = 0; nnEx < nearestNeighbors; nnEx++) {
  int val = (int) nnArray[nnEx].value(attr);
  valueCounts[val]++;
}
int maxIndex = 0;
int max = Integer.MIN_VALUE;
for (int index = 0; index < attr.numValues(); index++) {
  if (valueCounts[index] > max) {
    max = valueCounts[index];
    maxIndex = index;
  }
}
values[attr.index()] = maxIndex;
Cheers,
Eibe

On 5 Jun 2014, at 18:43, Kalia Orphanou <korfan01 <at> cs.ucy.ac.cy> wrote:

> 
> I have a classification problem with two classes, working on nominal data. I applied SMOTE-N (for nominal data) in WEKA to deal with imbalanced data. However, it is not clear to me how SMOTE-N generates N synthetic data points for each feature vector in the minority class. SMOTE-N uses a modified version of the value difference metric (VDM) to find the k-nearest neighbors of each feature vector in the minority class; a new minority-class feature vector is then generated by creating a new set of feature values from the majority vote of the feature vector in question and its k nearest neighbors (k-NN). But how is this process repeated to generate multiple different synthetic feature vectors for each feature vector in the minority class?
> 
> The way the algorithm is stated, it seems that one feature vector from the minority class can generate only one synthetic feature vector (using its k-NN). Even if I change N to 200, the synthetic feature vectors generated with N=100 are duplicated.
> 
> 
> -- 
> Kalia Orphanou
> LINC - Laboratory for Internet Computing
> Department of Computer Science, University of Cyprus
> PO Box 20537, 1678 Nicosia, Cyprus
> Tel: +357 22 892673
> Fax: +357 22 892701
> email: korfan01 <at> cs.ucy.ac.cy
> website: www.linc.ucy.ac.cy
