Eibe Frank | 1 Feb 01:17 2005

Re: Problem running a Windows-made Experiment on a Linux machine

The class that throws the exception is not a Weka class. It's part of 
the Java distribution.

I had the same problem last week or so. It's not a Linux vs. Windows 
issue. The problem appears to occur when you are trying to run some 
classes from Java 1.5 on Java 1.4 or earlier. In my case the problem 
went away when I used exactly the same release versions of Java under 
both Windows and Linux.
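
For background: Java object serialization records a serialVersionUID for each 
class in the stream, and deserialization fails whenever that value differs from 
the UID of the local class -- which is exactly the message below for 
javax.swing.AbstractListModel. As a minimal illustration with a hypothetical 
class (not Weka or JDK code), declaring the UID explicitly pins compatibility:

   import java.io.Serializable;

   public class ExperimentConfig implements Serializable {
     // an explicit UID keeps builds from different JDK releases stream-compatible
     private static final long serialVersionUID = 1L;
     private String name;
   }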

Cheers,
Eibe

> Hi,
>
> I have the problem that I made an experiment using the Experimenter on 
> a Windows machine, transferred the experiment file to a Linux machine, 
> and tried to run the experiment there. I downloaded both the Windows 
> and the Linux version of Weka from the net at about the same time - 
> about two weeks ago. Since then I have tried to get this to run, but I 
> get no further than this error message when running the experiment:
>
> javax.swing.AbstractListModel; local class incompatible: stream 
> classdesc serialVersionUID = 6318899794864351705, local class 
> serialVersionUID = -3285184064379168730
>
> I know that some of my colleagues got it working and their experiments 
> run on the same version of Weka on Linux. But neither a fresh 
> installation of both versions - Linux and Windows - nor a 1-to-1 copy 
> of the Weka directory worked. So what is going wrong here, and how 
> can I fix it?

sione | 1 Feb 02:45 2005

Re: Independent Component Analysis


> I thought someone mentioned translating the Matlab FastICA version to Weka using JAMA, but I can't find the post now. Does anyone know, did FastICA make it to Weka?

Cliff,

Yes, it was me who mentioned some time last year that I would translate the MatLab FastICA package to Java
using JAMA. I have not got to it yet, but I did translate some other MatLab ICA algorithms, such as SOBI
(Second Order Blind Identification), FOBI (Fourth Order Blind Identification), EASI (Equivariant
Adaptive Separation via Independence) and a few others. FastICA is a larger package (many files and
functions) compared to the other MatLab ICA code I have translated (most of it is just a third of
a page to a full page long). This means that more utility functions (and methods) which are
currently not available in JAMA need to be developed to make the translation of FastICA possible.
I have just started working on Java FastICA. When it is completed, I will post it to this list, where
anybody who is interested in integrating it into Weka can do so. I think whoever integrates
the Java FastICA into WEKA will still have to modify the code from 'matrix based' to 'Java double
array based'.

Cheers,
Sione.
_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
Tarkan Kurt | 1 Feb 02:51 2005

ROC Analysis in WEKA

Hi everyone,

Sorry if this question has been covered before, but could anyone help with how 
to perform ROC analysis in Weka? Simple or advanced, all help is welcome.

Thank you

Tarkan
Elena Eneva | 1 Feb 10:06 2005

multi-class Logistic Regression probability distribution?

I have a question about using the coefficients of the WEKA Logistic Regression model to compute the probability distribution over the target classes.

I train a WEKA Logistic Regression model on some training data, and get the coefficients which the model outputs. Then using those coefficients I compute (outside of WEKA) the probability distribution over the target classes for the instances of the training set. When I compare that to the WEKA-produced probability distribution of the training set, there are differences, sometimes major. It makes me think that I don't understand how WEKA comes up with the probability distribution. Please help, if you can.

Here is what I do:

My dataset has 9 attributes, and each one is discretized into 10 buckets (effectively producing 90 binary attributes).
The target class is discretized into 4 classes/buckets.

I run the Logistic Regression classifier (maxIts=-1, ridge=0), and it produces 3 sets of coefficients (90 in each set, plus the intercept), as well as 3 sets of odds ratios (90 in each set).

For each training set instance (with attributes [x_1 .. x_90]), I compute three log odds ratios, based on the three coefficient sets [c1_1 .. c1_90], [c2_1 .. c2_90], and [c3_1 .. c3_90]:

g1 = c1_0 + x_1*c1_1 + x_2*c1_2 + ... + x_90*c1_90
g2 = c2_0 + x_1*c2_1 + x_2*c2_2 + ... + x_90*c2_90
g3 = c3_0 + x_1*c3_1 + x_2*c3_2 + ... + x_90*c3_90.

Let
P1 stand for P(instance is in class 1)
P2 stand for P(instance is in class 2)
P3 stand for P(instance is in class 3)
P4 stand for P(instance is in class 4).

Since:

g1 = log (P1/P4)
g2 = log (P2/P4)
g3 = log (P3/P4)
and
P1 + P2 + P3 + P4 = 1

we get:

P1 = exp(g1)*P4
P2 = exp(g2)*P4
P3 = exp(g3)*P4
P4 = 1/(1 + exp(g1) + exp(g2) + exp(g3))

Is this correct? The probabilities I get for the test set using the above formulas match perfectly those produced by WEKA for some instances, and for some are quite different. What's going on - ideas?
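
In case it helps to pin down where we diverge, here is the computation above as a small Java sketch (my own illustration, not Weka code); g[0], g[1] and g[2] hold g1, g2 and g3:

   // returns {P1, P2, P3, P4} computed from the three linear scores g1..g3
   public static double[] classProbabilities(double[] g) {
     double p4 = 1.0 / (1.0 + Math.exp(g[0]) + Math.exp(g[1]) + Math.exp(g[2]));
     return new double[] {
       Math.exp(g[0]) * p4,   // P1 = exp(g1)*P4
       Math.exp(g[1]) * p4,   // P2 = exp(g2)*P4
       Math.exp(g[2]) * p4,   // P3 = exp(g3)*P4
       p4                     // P4
     };
   }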

Thanks in advance for your help -

Elena

_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
Subramanyam Chitti | 1 Feb 20:14 2005

java Exception

Hi All,
  When I try to run weka, like in the README, I get the following
exception. Please help me out. I have no idea how to deal with this.

$ java -jar weka.jar
Exception in thread "main" java.util.zip.ZipException: No such file or
directory
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:112)
        at java.util.jar.JarFile.<init>(JarFile.java:127)
        at java.util.jar.JarFile.<init>(JarFile.java:65)

Thank you,
Chitti
Elena Eneva | 1 Feb 20:38 2005

multi-class Logistic Regression: part 2

Alright - I found a "More" button for Logistic Regression in the
Windows version of Weka, which leads to an explanation of how to compute
the probability distribution over the classes, given the coefficient
matrix. 

The good news is, I was doing it correctly. 

The bad news is, I still don't understand why my class probabilities on
a test set are not always the same as the ones Weka comes up with
(sometimes they are identical, mostly they are close, but sometimes they
are very different). 

In the Weka explanation, I found this: "although original Logistic
Regression does not deal with instance weights, we modify the algorithm
a little bit to handle the instance weights." Could this have something
to do with the problem, or is this referring to the ridge estimator?

(I run "Logistic -D -R 0.0 -M -1", which is maxIts=-1, ridge=0)

Any help and ideas are much appreciated!
Elena

*****

For those who are interested, here is the Weka info I found on how to
compute the probabilities:

" ...
If there are k classes for n instances with m attributes, the parameter
matrix B to be calculated will be an m*(k-1) matrix.

The probability for class j with the exception of the last class is

Pj(Xi) = exp(Xi*Bj)/((sum[j=1..(k-1)]exp(Xi*Bj))+1)

The last class has probability

1-(sum[j=1..(k-1)]Pj(Xi)) 
	= 1/((sum[j=1..(k-1)]exp(Xi*Bj))+1)
... "

*****
________________________________________
From: Elena Eneva 
Sent: Tuesday, February 01, 2005 12:58 AM
To: wekalist <at> list.scms.waikato.ac.nz
Subject: multi-class Logistic Regression probability distribution?

I have a question about using the coefficients of the WEKA Logistic
Regression model to compute the probability distribution over the target
classes. 

I train a WEKA Logistic Regression model on some training data, and get
the coefficients which the model outputs. Then using those coefficients
I compute (outside of WEKA) the probability distribution over the target
classes for the instances of the training set. When I compare that to
the WEKA-produced probability distribution of the training set, there
are differences, sometimes major. It makes me think that I don't
understand how WEKA comes up with the probability distribution. Please
help, if you can.

Here is what I do:

My dataset has 9 attributes, and each one is discretized into 10 buckets
(effectively producing 90 binary attributes).
The target class is discretized into 4 classes/buckets.

I run the Logistic Regression classifier (maxIts=-1, ridge=0), and it
produces 3 sets of coefficients (90 in each set, plus the intercept), as
well as 3 sets of odds ratios (90 in each set).

For each training set instance (with attributes [x_1 .. x_90]), I
compute three log odds ratios, based on the three coefficient sets [c1_1
.. c1_90], [c2_1 .. c2_90], and [c3_1 .. c3_90]:

g1 = c1_0 + x_1*c1_1 + x_2*c1_2 + ... + x_90*c1_90 
g2 = c2_0 + x_1*c2_1 + x_2*c2_2 + ... + x_90*c2_90 
g3 = c3_0 + x_1*c3_1 + x_2*c3_2 + ... + x_90*c3_90. 

Let 
P1 stand for P(instance is in class 1)
P2 stand for P(instance is in class 2)
P3 stand for P(instance is in class 3)
P4 stand for P(instance is in class 4).

Since: 

g1 = log (P1/P4)
g2 = log (P2/P4)
g3 = log (P3/P4)
and
P1 + P2 + P3 + P4 = 1

we get: 

P1 = exp(g1)*P4
P2 = exp(g2)*P4
P3 = exp(g3)*P4
P4 = 1/(1 + exp(g1) + exp(g2) + exp(g3))

Is this correct? The probabilities I get for the test set using the
above formulas match perfectly those produced by WEKA for some
instances, and for some are quite different. What's going on - ideas?

Thanks in advance for your help - 

Elena
Eibe Frank | 2 Feb 02:21 2005

Re: multi-class Logistic Regression: part 2


On Feb 2, 2005, at 9:12 AM, wekalist-request <at> list.scms.waikato.ac.nz 
wrote:

> The bad news is, I still don't understand why my class probabilities on
> a test set are not always the same as the ones Weka comes up with
> (sometimes they are identical, mostly they are close, but sometimes 
> they
> are very different).
>
> In the Weka explanation, I found this: "although original Logistic
> Regression does not deal with instance weights, we modify the algorithm
> a little bit to handle the instance weights." Could this have something
> to do with the problem, or is this referring to the ridge estimator?

No, weights only affect training, not testing.

One possible explanation for the discrepancy might be a buggy Java 
Virtual Machine. Another might be numerical instability.

Here's the piece of code from Logistic.java that computes the 
probabilities:

   private double[] evaluateProbability(double[] data){
     double[] prob = new double[m_NumClasses],
       v = new double[m_NumClasses];

     // Log-posterior before normalizing
     for(int j = 0; j < m_NumClasses-1; j++){
       for(int k = 0; k <= m_NumPredictors; k++){
         v[j] += m_Par[k][j] * data[k];
       }
     }
     v[m_NumClasses-1] = 0;

     // Do so to avoid scaling problems
     for(int m=0; m < m_NumClasses; m++){
       double sum = 0;
       for(int n=0; n < m_NumClasses-1; n++)
         sum += Math.exp(v[n] - v[m]);
       prob[m] = 1 / (sum + Math.exp(-v[m]));
     }

     return prob;
   }
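
If you want to cross-check your external numbers against Weka's own output 
programmatically, a minimal sketch along these lines should work (the file name 
is a placeholder, and the last attribute is assumed to be the class):

   import java.io.BufferedReader;
   import java.io.FileReader;
   import weka.classifiers.functions.Logistic;
   import weka.core.Instances;

   public class CheckLogistic {
     public static void main(String[] args) throws Exception {
       // load the training data and mark the class attribute
       Instances data = new Instances(new BufferedReader(new FileReader("train.arff")));
       data.setClassIndex(data.numAttributes() - 1);

       // same settings as in your experiment: ridge = 0, maxIts = -1
       Logistic model = new Logistic();
       model.setOptions(new String[] {"-R", "0.0", "-M", "-1"});
       model.buildClassifier(data);

       // Weka's own class distribution for the first instance
       double[] dist = model.distributionForInstance(data.instance(0));
       for (int j = 0; j < dist.length; j++) {
         System.out.println("P(class " + j + ") = " + dist[j]);
       }
     }
   }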

Cheers,
Eibe
Mahesh Joshi | 2 Feb 05:51 2005

Some questions on Sequential Minimal Optimization for SVMs


Hi,

I am currently using SMO for some of my experiments. I came across a 
situation where all the training instances have the SAME CLASS. When 
training SMO on such a data set, I get the following exception:

======================================================================
java.lang.IllegalArgumentException: Can't normalize array. Sum is zero.
         at weka.core.Utils.normalize(Unknown Source)
         at weka.core.Utils.normalize(Unknown Source)
         at 
weka.classifiers.functions.SMO.distributionForInstance(Unknown Source)
         at weka.classifiers.Evaluation.evaluateModelOnce(Unknown Source)
         at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
         at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
         at weka.classifiers.functions.SMO.main(Unknown Source)
Can't normalize array. Sum is zero.
======================================================================

I noticed that there is a parameter to normalize/standardize/neither. Is 
this related in any way to the exception I receive? Trying -N 1/2 did 
not help. Any ideas on where I might be going wrong?

It seems logical that with such a data set, a classifier should probably 
learn that there is just one class and assign the same to all instances. 
This is true for the other classifiers. Has this something to do with 
the inherent binary nature of the SVM classifier?

Another question that I have is about multi-class SVMs. I read in the 
documentation that for the current SMO implementation, multiple binary 
classifiers are constructed and their results are combined. Could anyone 
please elaborate on how these results are combined? Any documentation or 
pointers on this would be greatly appreciated.

Thank you!

Regards,
Mahesh
Eibe Frank | 3 Feb 02:05 2005

Re: Some questions on Sequential Minimal Optimization for SVMs

The exception is due to a bug. Yes, if there is only one class, that 
class should be predicted.

As the online help (and the Javadoc) state, multi-class problems are 
dealt with using "pairwise classification" (or "pairwise coupling" when 
logistic models are fit to the SVM's output).
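
For intuition, the voting scheme behind pairwise classification looks roughly 
like this (a toy sketch with made-up names, not the actual SMO internals):

   public class PairwiseVoting {

     /** Hypothetical binary model trained on just two of the classes. */
     public interface BinaryModel {
       /** Returns true if the model prefers its first class for this instance. */
       boolean prefersFirst(double[] instance);
     }

     /** model[i][j] (i < j) was trained only on instances of classes i and j. */
     public static int classify(BinaryModel[][] model, double[] instance, int numClasses) {
       int[] votes = new int[numClasses];
       for (int i = 0; i < numClasses; i++) {
         for (int j = i + 1; j < numClasses; j++) {
           // each pairwise model casts one vote for the class it prefers
           votes[model[i][j].prefersFirst(instance) ? i : j]++;
         }
       }
       int best = 0;
       for (int c = 1; c < numClasses; c++) {
         if (votes[c] > votes[best]) best = c;
       }
       return best;
     }
   }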

Cheers,
Eibe

On Feb 3, 2005, at 12:28 PM, wekalist-request <at> list.scms.waikato.ac.nz 
wrote:

> Hi,
>
> I am currently using SMO for some of my experiments. I came across a 
> situation where all the training instances have the SAME CLASS. When 
> training SMO on such a data set, I get the following exception:
>
> ======================================================================
> java.lang.IllegalArgumentException: Can't normalize array. Sum is zero.
>         at weka.core.Utils.normalize(Unknown Source)
>         at weka.core.Utils.normalize(Unknown Source)
>         at 
> weka.classifiers.functions.SMO.distributionForInstance(Unknown Source)
>         at weka.classifiers.Evaluation.evaluateModelOnce(Unknown 
> Source)
>         at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
>         at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
>         at weka.classifiers.functions.SMO.main(Unknown Source)
> Can't normalize array. Sum is zero.
> ======================================================================
>
> I noticed that there is a parameter to normalize/standardize/neither. 
> Is this related in any way to the exception I receive? Trying -N 1/2 
> did not help. Any ideas on where I might be going wrong?
>
> It seems logical that with such a data set, a classifier should 
> probably learn that there is just one class and assign the same to all 
> instances. This is true for the other classifiers. Has this something 
> to do with the inherent binary nature of the SVM classifier?
>
> Another question that I have is about multi-class SVMs. I read in the 
> documentation that for the current SMO implementation, multiple binary 
> classifiers are constructed and their results are combined. Could 
> anyone please elaborate on how these results are combined? Any 
> documentation or pointers on this would be greatly appreciated.
>
> Thank you!
>
> Regards,
> Mahesh
Denis Bueno | 3 Feb 07:22 2005

User-extensible information gain metrics for Id3

Not sure if this is exactly the right place to ask this, but here goes.

I have an assignment for school for which I'm using Weka. (Don't
worry; the assignment doesn't require me to _implement_ a machine
learning algorithm; just to _compare_ performances of various
algorithms on some data.) One of the parts of the assignment is to
implement an alternate "information gain"-type metric for Id3.

So I figured the easiest way would be to make a classifier extend
weka.classifiers.Id3 and override the `computeInfoGain' method (of
whose existence I'm aware because I'm looking at the Weka source).
Alas, that method is private. I could easily make that method
protected, override it with my version, recompile Weka, and just use
that, but it seems that the ability to override the
`computeInfoGain' method would be useful in general, so I wonder if
the Weka maintainers would approve of making that change permanent.
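
Roughly what I have in mind, as a sketch only (this assumes computeInfoGain is
made protected and keeps a signature like computeInfoGain(Instances data,
Attribute att) throws Exception):

   import weka.classifiers.Id3;
   import weka.core.Attribute;
   import weka.core.Instances;

   public class GainRatioId3 extends Id3 {

     // assumes the modifier in Id3 has been changed from private to protected
     protected double computeInfoGain(Instances data, Attribute att) throws Exception {
       double infoGain = super.computeInfoGain(data, att);
       double splitInfo = splitInformation(data, att);
       // fall back to plain information gain when the split information is zero
       return splitInfo > 0 ? infoGain / splitInfo : infoGain;
     }

     // entropy of the partition induced by att (my own helper, not part of Weka)
     private double splitInformation(Instances data, Attribute att) {
       double[] counts = new double[att.numValues()];
       for (int i = 0; i < data.numInstances(); i++) {
         counts[(int) data.instance(i).value(att)]++;
       }
       double info = 0;
       for (int i = 0; i < counts.length; i++) {
         if (counts[i] > 0) {
           double p = counts[i] / data.numInstances();
           info -= p * Math.log(p) / Math.log(2);
         }
       }
       return info;
     }
   }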

Maybe there is something I'm overlooking which makes the task I'm
trying to complete even easier. Please tell me if that's the case.

Thanks in advance.

-- 
Denis Bueno
