double d s | 25 Oct 09:43 2014

accuracy and precision

Hi all,


I need assistance with the terms “accuracy” and “precision”.


Some references consider them to be different:

Accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's actual (true) value. Precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.

However, after working with different datasets, I observed that the results for accuracy and precision look approximately equal to each other. For instance, I got this result:

=== Summary ===

Correctly Classified Instances         143               95.3333 %

Incorrectly Classified Instances         7                4.6667 %

Kappa statistic                          0.93 

Mean absolute error                      0.0495

Root mean squared error                  0.1599

Relative absolute error                 11.1476 %

Root relative squared error             33.9217 %

Total Number of Instances              150    

 

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1          1         Cl1
               0.94     0.04     0.922      0.94    0.931      0.976     Cl2
               0.92     0.03     0.939      0.92    0.929      0.976     Cl3
Weighted Avg.  0.953    0.023    0.953      0.953   0.953      0.984

From this result it can clearly be seen that the accuracy and the weighted-average precision are almost the same.


Therefore, my questions are:

1- In Weka, is "Correctly Classified Instances" what is meant by the "accuracy"?
2- Are accuracy and precision considered to be the same or different in this result?
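To make the distinction concrete, here is a small plain-Java sketch (the confusion-matrix counts below are hypothetical, not taken from the Weka output above): accuracy is a single number computed over the whole confusion matrix, while precision is computed per predicted class, so the two need not coincide even though they happened to be close in the run above.

```java
// Toy example with a hypothetical 2x2 confusion matrix.
// Rows are the actual class, columns are the predicted class.
public class AccuracyVsPrecision {

    // Accuracy: correct predictions (the diagonal) over all instances.
    static double accuracy(int[][] cm) {
        int correct = 0, total = 0;
        for (int i = 0; i < cm.length; i++)
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) correct += cm[i][j];
            }
        return (double) correct / total;
    }

    // Precision for one class c: correct predictions of c over all
    // predictions of c (a column sum). Unlike accuracy, it is per-class.
    static double precision(int[][] cm, int c) {
        int predicted = 0;
        for (int i = 0; i < cm.length; i++)
            predicted += cm[i][c];
        return (double) cm[c][c] / predicted;
    }

    public static void main(String[] args) {
        int[][] cm = { { 45,  5 },    // actual class 0: 45 correct, 5 wrong
                       { 15, 35 } };  // actual class 1: 15 wrong, 35 correct
        System.out.println("accuracy          = " + accuracy(cm));     // 80/100 = 0.8
        System.out.println("precision class 0 = " + precision(cm, 0)); // 45/60  = 0.75
        System.out.println("precision class 1 = " + precision(cm, 1)); // 35/40  = 0.875
    }
}
```

With these made-up counts the accuracy is 0.8 while the class-0 precision is 0.75, showing the two metrics measure different things even when their values look similar.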


Thanks for any assistance.

Sandler

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Alban Levy | 25 Oct 00:48 2014

Statistics of classification's results

Dear list members,

Greetings from Nottingham.


Is there a known relationship between some statistics of a dataset and the statistics obtained from running various classification algorithms on it (kappa statistic,...)? 

More precisely: after running many classification algorithms with 10-fold CV on 6 datasets (each preprocessed in various ways), we normalised some information scores (namely: percentage of correct answers, K&B mean information, kappa, weighted area under ROC) and accumulated the values obtained from each classification (see attached picture).
The surprise came from the varied behaviour of the curves: I couldn't find any satisfying explanation of why, for example, on one dataset the blue curve (K&B mean information) is on top, while on another the violet one (weighted area under ROC) is. Is there one?

As this was rather puzzling, any lead would be appreciated.

Thanks for reading.
Best regards,
-- 
Alban






Martin L | 24 Oct 15:57 2014

Best way of learning

Hi Weka experts,

I am new to Weka and to the field of data mining, and I want to learn them the right way.

Before sending this letter, I read about the field, but most explanations are given purely in mathematical terms, which did not help me learn; especially since in Weka the results come out directly within seconds, there is no time to connect them to anything I have read before. On the other hand, I noticed that many references contain a lot of Java code. So I am confused by this combination of math and programming.

Therefore, I am asking the experts in data mining in general, and in Weka in particular, about the correct way to learn both, so that I can become proficient and work with Weka with a high level of confidence.

In addition, do I need to learn programming to become proficient with Weka and data mining and to work with them easily?

I would highly appreciate any advice.

Martin

Jeff Pattillo | 23 Oct 23:54 2014

Simple Logistic Worst-Case Run Time

What is the worst-case run time of the algorithm used by SimpleLogistic? In which paper was this algorithm derived? Is a run-time analysis done in that paper?

Thanks in advance! I just need it for a high-level presentation...

Jeff
Vinoth Chandrasekar | 23 Oct 20:14 2014

Reg: Modifying weka Model File

Hi All,

I have been using Weka for quite some time. I use J48 to perform a classification task.
I have identified a path in the J48 tree that increases my false positives. Is there a way in Weka to change the leaf of this path to another value, or to discard this path entirely during evaluation?

Regards,
Vinoth Kumar Chandrasekar
double d s | 23 Oct 10:20 2014

probabilistic classifiers and non-probabilistic classifiers

Hi all,

Is there a way to identify which classifiers are probabilistic?
In addition, is there a common list showing which classifiers are probabilistic and which are non-probabilistic?
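As a plain-Java illustration of the distinction being asked about (this is hypothetical sketch code, not Weka's API): a non-probabilistic classifier commits to a single hard label, while a probabilistic one returns a distribution over the classes, from which a hard label can still be derived by taking the most probable class.

```java
import java.util.Arrays;

// Sketch of the difference between hard and probabilistic prediction.
// The logistic function here is purely illustrative.
public class ProbabilisticSketch {

    // Non-probabilistic: commits to one class label.
    static int hardPredict(double score) {
        return score > 0.5 ? 1 : 0;
    }

    // Probabilistic: returns { P(class 0), P(class 1) },
    // non-negative and summing to 1.
    static double[] softPredict(double score) {
        double p1 = 1.0 / (1.0 + Math.exp(-(score - 0.5)));
        return new double[] { 1.0 - p1, p1 };
    }

    public static void main(String[] args) {
        double score = 0.8;
        System.out.println("hard label:   " + hardPredict(score));
        System.out.println("distribution: " + Arrays.toString(softPredict(score)));
        // The hard label is recoverable from the distribution (argmax),
        // but the distribution is not recoverable from the hard label.
    }
}
```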

Thanks.
Sandler

 
Rachel S | 22 Oct 13:38 2014

Re: Using Weka's prediction model outside Weka

(re-sending due to possible problem in subscription)

Hello,

I know Weka can be used to predict results on new, unseen data using the "save model" and "load model" options.
Is there a way to view the saved model in a readable form, either to learn from it or to use its prediction formula in another program?

Thanks,
Rachel.



Lauren Romeo | 22 Oct 18:15 2014

classifier-specific options in command line weka

Hi,

I am trying to use the classifier-specific options of the LMT classifier on the Weka command line.
I have no problem using the classifier with the general options, but when I try to include the classifier-specific options, I get the following error:

Weka exception: Illegal options: -- -B -R -C -P -I -1 -M 15 -W 0.0 -A


This is the command I am using:

java weka.classifiers.trees.LMT -T test.arff -t train.arff -p 0 -o -i -- -B -R -C -P -I -1 -M 15 -W 0.0 -A

What is the correct way to append the classifier-specific options on the command line so that I do not get the above error?

Thanks in advance.
Cheers,
Lauren

nsegev | 22 Oct 09:21 2014

J48 crash with GC overhead - with fix

I have a fairly large dataset: a few million examples and around 70 features, both numeric and nominal.
I'm using the Weka code base to run decision-tree experiments in my project, specifically with J48.
J48 crashes with a GC overhead error.

I've traced the problem to the split() method of the ClassifierSplitModel class in the weka.classifiers.trees.j48 package.
The reason for the crash: for a nominal feature, split() creates N new Instances objects, where N is the number of possible feature values, each pre-sized to the full training set. It then distributes the training instances among the new sets and finally compactifies them all.
For a nominal feature with 1000+ possible values and a training set of three million examples, this means creating 1000+ arrays of length 3,000,000.
On a 64-bit machine this comes to over 24 GB of memory, and that's before the data is even divided among the arrays.

The fix is a simple one: allocate only the required space.
This means going over the training data once to count how many instances will end up in each subset, and then a second time to actually place them.
See the code below.

Noam.

Original code:

public final Instances[] split(Instances data) throws Exception {

  Instances[] instances = new Instances[m_numSubsets];
  double[] weights;
  double newWeight;
  Instance instance;
  int subset, i, j;

  // Every subset is pre-allocated at the full size of the training set.
  for (j = 0; j < m_numSubsets; j++)
    instances[j] = new Instances((Instances) data, data.numInstances());
  for (i = 0; i < data.numInstances(); i++) {
    instance = ((Instances) data).instance(i);
    weights = weights(instance);
    subset = whichSubset(instance);
    if (subset > -1)
      instances[subset].add(instance);
    else
      for (j = 0; j < m_numSubsets; j++)
        if (Utils.gr(weights[j], 0)) {
          newWeight = weights[j] * instance.weight();
          instances[j].add(instance);
          instances[j].lastInstance().setWeight(newWeight);
        }
  }
  for (j = 0; j < m_numSubsets; j++)
    instances[j].compactify();

  return instances;
}

Fixed code:

public final Instances[] split(Instances data) throws Exception {

  // Subsets are now pre-sized by a counting pass instead of each being
  // allocated at the full size of the training set.
  Instances[] instances = initializeSplitInstances(data);
  double[] weights;
  double newWeight;
  Instance instance;
  int subset, i, j;

  for (i = 0; i < data.numInstances(); i++) {
    instance = ((Instances) data).instance(i);
    weights = weights(instance);
    subset = whichSubset(instance);
    if (subset > -1)
      instances[subset].add(instance);
    else
      for (j = 0; j < m_numSubsets; j++)
        if (Utils.gr(weights[j], 0)) {
          newWeight = weights[j] * instance.weight();
          instances[j].add(instance);
          instances[j].lastInstance().setWeight(newWeight);
        }
  }
  for (j = 0; j < m_numSubsets; j++)
    instances[j].compactify();

  return instances;
}

private Instances[] initializeSplitInstances(Instances data)
    throws Exception {

  // First pass: count how many instances will end up in each subset.
  int[] sizes = new int[m_numSubsets]; // Java zero-initializes the array

  @SuppressWarnings("unchecked")
  Enumeration<Instance> enu = data.enumerateInstances();
  while (enu.hasMoreElements()) {
    Instance instance = enu.nextElement();
    int subset = whichSubset(instance);
    if (subset > -1)
      ++sizes[subset];
    else
      for (int j = 0; j < m_numSubsets; ++j)
        ++sizes[j];
  }

  // Allocate each subset at exactly the required capacity; the second
  // pass in split() then fills them.
  Instances[] instances = new Instances[m_numSubsets];
  for (int j = 0; j < m_numSubsets; ++j)
    instances[j] = new Instances(data, sizes[j]);
  return instances;
}


Joana Machado | 22 Oct 01:28 2014

Classifier

I am using the Weka package (weka.jar) in Java, but some methods are not available.
For example, I want to use the Classifier class with the methods makeCopy() and getOptions(), but they are not defined after doing:
import weka.classifiers.Classifier;
I get the error messages: 'The method getOptions() is undefined for the type Classifier' and 'The method makeCopy(Classifier) is undefined for the type Classifier'.
Do I need to import something else or download other files?
Thank you.
Sven_Schafer@t-online.de | 21 Oct 11:37 2014

Custom comparator and punishments for classifier like SMO

 

Hi folks,


I have two questions for you, and yes, I have already searched the mailing list, but without result. So here are my questions:


1. Is there a way to set a custom instance or attribute comparator for a classifier, e.g. SMO? I have read that every classifier handles the different attribute types (numeric, nominal, string, ...) on its own. Numeric attributes can easily be compared, but my aim is to customise the distance calculation, especially for string or nominal attributes. A detailed example: the distance between the numbers 1 and 3 is twice the distance between 1 and 2, but a distance between values like "A" and "B", or "Hello" and "World", is not defined for any classifier, right? So I want to attach my own comparator or distance function for particular attributes of an instance to the selected classifier. Is something like that possible in Weka? Maybe my premise that a classifier computes distances is wrong; if so, please correct me. ;-)
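Purely as an illustration of what such a custom string distance might compute (this is hypothetical example code, not an existing Weka hook), a common choice is the Levenshtein edit distance, under which "Hello" and "World" are 4 apart:

```java
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn one string
// into another. Uses two rolling rows of the DP table.
public class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty string to each prefix of b.
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a prefix of a to the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Hello", "World")); // 4
        System.out.println(levenshtein("A", "B"));         // 1
    }
}
```

Any such function could in principle play the role of the "comparator" described above, whatever mechanism ends up being available to plug it in.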


2. The second question concerns the operation of classifiers in general. Usually a classifier gets a training set containing only correctly classified instances. Is it possible to mark instances as wrongly classified, e.g. in the training set or elsewhere, so that a classifier could learn with rewards and punishments? Is something like that implemented for classifiers in Weka?


Thanks, Sven

 

