1 Dec 2005 13:51

Macro and micro-averaging (repost)

```Hi Weka people,

Sorry for the post in html last time...no one replied.

I can't figure out how to calculate macro- vs. micro-averaging. The principle is easy to understand, but I
always seem to get the same precision and recall when performing micro-averaging. Can you please help me
get from a confusion matrix like this:

	A	B	C
A	4	1	0
B	0	1	1
C	0	1	2

to a 2x2 contingency table for performing micro-averaging? I get macro-averaged precision of
(4/5+1/2+2/3)/3=66% and recall of (4/4+1/3+2/3)/3=67%, and micro-averaged precision and recall of 7/10 each. (If
this is right, please change the numbers so that they illustrate a confusion matrix which results in
different precision and recall when performing micro-averaging.)

Cheers,
Thomas

PS. Does Weka perform this kind of calculation for me?
```
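The arithmetic in this question can be checked with a small plain-Java sketch. It assumes rows are the predicted class and columns the actual class, which is the reading that reproduces the 4/5, 1/2 and 2/3 precisions quoted in the post; the class and method names are made up for illustration and are not part of Weka. Pooling the three one-vs-rest tables gives TP=7, FP=3, FN=3, TN=17. When every instance has exactly one label, each misclassification counts once as a false positive and once as a false negative, so the pooled FP and FN totals are always equal and micro-averaged precision and recall both come out as the accuracy.

```java
// Sketch: macro- vs. micro-averaged precision/recall from a confusion matrix.
// Assumption (not stated in the post): rows = predicted class, columns = actual class.
final class AveragingSketch {

    /** Returns {macroPrecision, macroRecall, microPrecision, microRecall}. */
    static double[] averages(int[][] m) {
        int n = m.length;
        double macroP = 0.0, macroR = 0.0;
        long tpSum = 0, fpSum = 0, fnSum = 0;
        for (int c = 0; c < n; c++) {
            int tp = m[c][c];
            int rowSum = 0, colSum = 0;
            for (int k = 0; k < n; k++) {
                rowSum += m[c][k]; // everything predicted as class c
                colSum += m[k][c]; // everything that actually is class c
            }
            // Macro: average the per-class ratios.
            macroP += rowSum == 0 ? 0.0 : (double) tp / rowSum;
            macroR += colSum == 0 ? 0.0 : (double) tp / colSum;
            // Micro: pool the per-class counts into one 2x2 table first.
            tpSum += tp;
            fpSum += rowSum - tp;
            fnSum += colSum - tp;
        }
        double microP = (double) tpSum / (tpSum + fpSum);
        double microR = (double) tpSum / (tpSum + fnSum);
        return new double[] { macroP / n, macroR / n, microP, microR };
    }

    public static void main(String[] args) {
        int[][] m = { {4, 1, 0}, {0, 1, 1}, {0, 1, 2} };
        double[] a = averages(m);
        System.out.printf("macro P=%.3f R=%.3f, micro P=%.3f R=%.3f%n",
                a[0], a[1], a[2], a[3]);
    }
}
```

Micro-averaged precision and recall can differ only when the pooled FP and FN counts differ, for example in multi-label problems or when some instances are left unclassified.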
1 Dec 2005 13:59

3-classes FalsePositives, FalseNegatives, TruePositives, TrueNegatives

1 Dec 2005 14:02

using cost-matrix

1 Dec 2005 19:02

Re: Percentage Split - from command line

```Thank you, Peter, for your interest,
but could you be more specific about the RemovePercentage filter?
For example, if I want to use J48 with a percentage split of 40% on the
input file dataset.arff, are the following commands right?

java weka.filters.unsupervised.instance.RemovePercentage -P 40 -i dataset.arff -o train.arff
java weka.filters.unsupervised.instance.RemovePercentage -P 60 -i dataset.arff -o test.arff
java weka.classifiers.trees.J48 -t train.arff -T test.arff

Thanks in advance

On 11/24/05, Peter Reutemann <fracpete <at> waikato.ac.nz> wrote:
>  > I am writing a program, that uses Weka from command-line, and I don't
>  > know what option i have to use to modify  " Percentage Split " on
>  > input file.
>
> You will have to write it yourself or use the RemovePercentage filter
> (weka.filters.unsupervised.instance.RemovePercentage).
>
> The following code fragment from the Explorer
> (weka.gui.explorer.ClassifierPanel) generates two datasets called
> "train" and "test":
>
>    inst.randomize(new Random(rnd));
>    int trainSize = inst.numInstances() * percent / 100;
>    int testSize = inst.numInstances() - trainSize;
>    Instances train = new Instances(inst, 0, trainSize);
>    Instances test = new Instances(inst, trainSize, testSize);
```
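The quoted fragment depends on Weka's `Instances` class. As a rough illustration of the same steps (shuffle, compute the training size with integer arithmetic, cut the data in two), here is a plain-Java sketch that works on any list. The class name, the method name and the use of `java.util.List` are assumptions made for this example, not Weka API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of a percentage split on a generic list (hypothetical helper, not Weka code).
final class PercentageSplitSketch {

    /** Shuffles a copy of rows and returns {train, test}; train holds the first percent%. */
    static <T> List<List<T>> split(List<T> rows, int percent, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        // Same integer arithmetic as the quoted fragment: rounds down.
        int trainSize = shuffled.size() * percent / 100;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add(i);
        List<List<Integer>> parts = split(rows, 40, 1L);
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```

Because the two parts are cut from the same shuffled copy, they never overlap and together contain every row.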

1 Dec 2005 21:05

Re: Percentage Split - from command line

```> For example if I want to use J48 with percentage split 40% in the
> input file: dataset.arff , are the follow commands right??:
>
> java weka.filters.unsupervised.instance.RemovePercentage -P 40 -i dataset.arff  -o  train.arff
> java weka.filters.unsupervised.instance.RemovePercentage -P 60 -i dataset.arff  -o  test.arff
> java weka.classifiers.trees.J48 -t train.arff -T test.arff

The second call of RemovePercentage should be the following:

java weka.filters.unsupervised.instance.RemovePercentage -P 40 -i dataset.arff -o test.arff -V

The "-V" inverts the matching, i.e., instead of removing the first 40
percent it skips them and removes the remaining 60.

Cheers, Peter
--

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/     +64 (7) 838-4466 Ext. 5174
```
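To see why the original pair of commands does not give a clean split, the behaviour described in this reply can be modelled in a few lines of plain Java. This is only a sketch of the semantics as stated above (without the invert flag the first P percent are removed; with it, only the first P percent are kept). The helper name and the rounding of the cut-off are assumptions, not taken from the Weka source.

```java
import java.util.ArrayList;
import java.util.List;

// Models the described RemovePercentage behaviour on a plain list (hypothetical helper).
final class RemovePercentageSketch {

    static <T> List<T> removePercentage(List<T> rows, double percent, boolean invert) {
        // Assumed rounding; the real filter may compute the cut-off differently.
        int cutOff = (int) Math.round(rows.size() * percent / 100.0);
        return invert
                ? new ArrayList<>(rows.subList(0, cutOff))            // keep the first P percent
                : new ArrayList<>(rows.subList(cutOff, rows.size())); // drop the first P percent
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add(i);
        System.out.println("-P 40     -> " + removePercentage(rows, 40, false));
        System.out.println("-P 60     -> " + removePercentage(rows, 60, false));
        System.out.println("-P 40 -V  -> " + removePercentage(rows, 40, true));
    }
}
```

Under this model, `-P 40` and `-P 60` both keep rows from the end of the file, so the second output is a subset of the first, whereas `-P 40` and `-P 40 -V` are complementary.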
2 Dec 2005 01:22

Re: RE: how to locate misclassified instances?

```> I performed the operation I described using shell scripts. Though with the
> exception of adding the RecID -- I did it manually, though there may be a
> filter which would do it for you -- the rest of it can definitely be done
> in explorer.

I've added a new filter to the CVS that adds a unique ID to each instance:
weka.filters.unsupervised.attribute.AddID

More about CVS access:
http://weka.sourceforge.net/wiki/index.php/CVS

HTH

Cheers, Peter
--

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/     +64 (7) 838-4466 Ext. 5174
```
2 Dec 2005 08:13

RE: RE: how to locate misclassified instances?

```thanks a lot,
I will use that next time. For now it works as Andrew described. It is a
bit more difficult but it works
tue

-----Original Message-----
From: Peter Reutemann [mailto:fracpete <at> waikato.ac.nz]
Sent: 2. december 2005 01:23
To: Andrew Rosenberg
Cc: TUED (Tue Deleuran); WekaList
Subject: Re: [Wekalist] RE: how to locate misclassified instances?

> I performed the operation I described using shell scripts. Though with the
> exception of adding the RecID -- I did it manually, though there may be a
> filter which would do it for you -- the rest of it can definitely be done
> in explorer.

I've added a new filter to the CVS that adds a unique ID to each
instance:
weka.filters.unsupervised.attribute.AddID

More about CVS access:
http://weka.sourceforge.net/wiki/index.php/CVS

HTH

```

2 Dec 2005 14:17

Re: RE: how to locate misclassified instances?

```Talk about service with a smile!

Thanks a ton, Peter.

-a.

On Fri, 2 Dec 2005, Peter Reutemann wrote:

> > I performed the operation I described using shell scripts. Though with the
> > exception of adding the RecID -- I did it manually, though there may be a
> > filter which would do it for you -- the rest of it can definitely be done
> > in explorer.
>
> I've added a new filter to the CVS that adds a unique ID to each instance:
>    weka.filters.unsupervised.attribute.AddID
>
> More about CVS access:
>    http://weka.sourceforge.net/wiki/index.php/CVS
>
> HTH
>
> Cheers, Peter
>
```
2 Dec 2005 20:57

Maximum Likelihood V.S. Machine Learning

```I have found that in many cases machine learning methods beat the maximum
likelihood method. Can anyone explain theoretically why machine learning is
better than the maximum likelihood method?

I would appreciate it!

Jianye Ge
```
2 Dec 2005 21:10

Generate instances from BN

```I realized that the generateInstances method will not be correct if
the BN contains edges that do not obey the node ordering.

Here's how I came to that conclusion:

public void generateInstances(){
    // Iterate over the number of instances to generate
    for (int iInstance = 0; iInstance < m_nNrOfInstances; iInstance++) {
        ... ...
        // Assign attributes sequentially
        for (int iAtt = 0; iAtt < nNrOfAtts; iAtt++) {
            ... ...
            // Look for parents of attribute
            for (int iParent = 0; iParent < m_ParentSets[iAtt].getNrOfParents(); iParent++) {
                int nParent = m_ParentSets[iAtt].getParent(iParent);

                // ERROR: What if instance.value(nParent) is not yet generated?
                iCPT = iCPT * m_Instances.attribute(nParent).numValues() + instance.value(nParent);
            }

            ... ... the rest
```
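One common way around the problem described in this post is to visit the attributes in a topological order of the network, so that every parent has a value before any of its children is sampled. The sketch below shows that idea on plain arrays using Kahn's algorithm. It is not the Weka fix; the class name and the `int[][] parents` representation (where `parents[i]` lists the parent indices of node `i`) are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Sketch: order nodes so that parents always precede their children (Kahn's algorithm).
final class TopologicalOrderSketch {

    /** parents[i] holds the parent indices of node i; throws if the graph has a cycle. */
    static int[] topologicalOrder(int[][] parents) {
        int n = parents.length;
        int[] remaining = new int[n]; // number of parents not yet placed in the order
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            remaining[i] = parents[i].length;
            if (remaining[i] == 0) ready.add(i);
        }
        int[] order = new int[n];
        int filled = 0;
        while (!ready.isEmpty()) {
            int node = ready.remove();
            order[filled++] = node;
            // Every child of this node now has one fewer parent left to wait for.
            for (int child = 0; child < n; child++) {
                for (int p : parents[child]) {
                    if (p == node && --remaining[child] == 0) ready.add(child);
                }
            }
        }
        if (filled != n) throw new IllegalArgumentException("parent sets contain a cycle");
        return order;
    }

    public static void main(String[] args) {
        // Node 0 has parent 1, so sampling in index order 0, 1, 2 would read an unset value.
        int[][] parents = { {1}, {}, {0, 1} };
        System.out.println(Arrays.toString(topologicalOrder(parents)));
    }
}
```

The attribute loop would then iterate over the returned order instead of over `0 .. nNrOfAtts - 1`.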
