Hasan, Quazi M | 1 Nov 2010 09:09

Can anyone plz guide me in selecting SVM params

Hello,

I am trying to use an SVM for my classification problem, via SMO in WEKA. However, I don't know how to select the parameter values for the linear SMO classifier. I tried CVParameterSelection, but it requires me to pass a search range for each parameter. How do I choose those ranges? For example:

In CVParameterSelection, addCVParameter("C 2 10 5") can be used to search for the "C" value, but how do I decide that it should vary from 2 to 10? It could just as well be 2 to 100, right?

Please guide me on choosing the parameter values for linear SMO, and for SMO with the polynomial kernel and RBFKernel.
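
As a point of reference for the question above: as far as I understand CVParameterSelection, the string "C 2 10 5" names the option (-C), a lower bound (2), an upper bound (10) and a number of steps (5), so the candidates are evenly spaced. Nothing fixes the range at 2 to 10. One common heuristic, which is an assumption here and not something stated in this thread, is to scan powers of ten first and then narrow the range around the best value. A minimal stdlib-only sketch of both grids (the class and method names are made up for illustration; this is not Weka code):

```java
import java.util.Arrays;

class CGrid {
    // Evenly spaced candidates from lower to upper (inclusive), which is how
    // the "C 2 10 5" string is assumed to be interpreted.
    static double[] linearGrid(double lower, double upper, int steps) {
        double[] grid = new double[steps];
        double increment = (steps > 1) ? (upper - lower) / (steps - 1) : 0.0;
        for (int i = 0; i < steps; i++) {
            grid[i] = lower + i * increment;
        }
        return grid;
    }

    // Powers of ten from 10^minExp to 10^maxExp (inclusive): a coarse first
    // pass for when the right order of magnitude of C is unknown.
    static double[] powersOfTen(int minExp, int maxExp) {
        double[] grid = new double[maxExp - minExp + 1];
        for (int i = 0; i < grid.length; i++) {
            grid[i] = Math.pow(10.0, minExp + i);
        }
        return grid;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linearGrid(2, 10, 5)));
        System.out.println(Arrays.toString(powersOfTen(-2, 2)));
    }
}
```

Each candidate would then be scored by cross-validation on the training data only, and the best one kept.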

 

Thanks,

QUAZI HASAN
Graduate Assistant
The University of Texas at Arlington

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Mark Hall | 1 Nov 2010 09:33

Re: need help on using Linear SMO

On 30/10/10 9:07 AM, Hasan, Quazi M wrote:
> Hello Everyone,
>
> I am trying to use Linear SMO for my text classification. I did the
> following things to make it Linear:
>
>     SMO smoClassifier = new SMO();
>     PolyKernel polyK = new PolyKernel();
>     polyK.setUseLowerOrder(false);
>     polyK.setExponent(1.0);
>     smoClassifier.setKernel(polyK);
>     smoClassifier.buildClassifier(data);
>     Evaluation eval = new Evaluation(trainData);
>     eval.evaluateModel(smoClassifier, testInstances);
>
> But, I'm getting very low accuracy (26%). Here the attributes in the
> arff file are all numeric values (the frequency of words). Could anyone
> tell me what I'm doing wrong?
>
> Thanks in advance.

How many classes has your data got? Have you tried different values for 
the complexity parameter (c), turning off normalization (or normalizing 
the vector length with weka.filters.unsupervised.instance.Normalize)? 
How does NaiveBayesMultinomial perform on your data?

Cheers,
Mark.
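
To illustrate what "normalizing the vector length" means for word-frequency data: a long document has larger raw counts than a short one on the same topic, and scaling every document vector to length 1 removes that difference. The following is a minimal sketch of Euclidean (L2) normalization in plain Java. The class and method names are hypothetical and this is not the code of the Weka filter mentioned above.

```java
import java.util.Arrays;

class UnitLength {
    // Scale a term-frequency vector so that its Euclidean (L2) length is 1.
    // An all-zero vector is returned unchanged to avoid dividing by zero.
    static double[] normalize(double[] counts) {
        double sumSquares = 0.0;
        for (double c : counts) {
            sumSquares += c * c;
        }
        double length = Math.sqrt(sumSquares);
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = (length == 0.0) ? counts[i] : counts[i] / length;
        }
        return out;
    }

    public static void main(String[] args) {
        // A short and a long document with the same word proportions
        // end up with the same normalized vector.
        System.out.println(Arrays.toString(normalize(new double[] {3, 4})));
        System.out.println(Arrays.toString(normalize(new double[] {30, 40})));
    }
}
```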

Mark Hall | 1 Nov 2010 09:40

Re: linux mysql classpath problem

On 30/10/10 1:51 PM, hexhawk wrote:
> Hello,
>
> I am trying to use weka to connect to a mysql database but after
> attempting many things to fix it I continue to get the "Trying to add
> database driver (JDBC): org.gjt.mm.mysql.Driver - Error, not in
> CLASSPATH?" error.
>
> I have the mysql-connector-java-5.1.13-bin.jar in my CLASSPATH. I have
> confirmed this by using 'echo $CLASSPATH' and it being correctly shown.
> I wrote a java program that uses the same jdbcURL that is listed in the
> DatabaseUtils.props that resides in my home directory to query the mysql
> server and it works properly. The only change I made to
> DatabaseUtils.props.mysql was to the jdbcURL line. My url line looks
> like this:
>
> jdbcURL=jdbc:mysql://localhost:3306/weka
>
> I am using jdk version 1.6.0_20 and weka-3-6-3. Inside weka's simplecli
> using the 'java weka.core.SystemInfo' command returns 'java.class.path:
> weka-3-6-3/weka.jar' so for some reason weka isn't picking up my
> CLASSPATH environment variable. After trying the same steps with
> different versions of the mysql-connector and the older weka-3-4-17 and
> getting the same results, I tried it on 2 more linux machines with the
> same error as well. I also tried putting the mysql-connector and the
> DatabaseUtils.props in the weka directory and then properly setting the
> CLASSPATH again, but to no avail. I confirmed the java program I wrote
> to test the mysql connectivity worked no matter where the connector was
> located.
>
> I had access to a friend's windows machine and I was able to follow the
> same steps and it worked the first time, but unfortunately I don't have
> a windows computer to use regularly. All the other machines I tried were
> able to reach the mysql server on my main box without any problems, so I
> doubt that the database is the problem.
>
> I have searched through the mailing list archives and endlessly on
> google to try to find a possible solution, but it appears that most
> people that get the error have not properly set CLASSPATH, and once they
> do, it works. Whenever I run into a problem like this it is 99% of the
> time some mistake that I have made, and I am sure it is in this case too.
>
> Does anyone have any tips for anything else to try? Could there be some
> kind of conflict between my versions of weka/jdk/mysql-connector?

Have you tried starting Weka like so:

java -cp weka-3-6-3/weka.jar:weka-3-6-3/mysql-connector-java-5.1.13-bin.jar weka.gui.GUIChooser

This assumes that the mysql jar file is in weka-3-6-3.

Cheers,
Mark.
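
One possible explanation for the symptom described above, offered as a guess: when a JVM is launched with the -jar option, the CLASSPATH environment variable is ignored, so java.class.path contains only the jar itself. A small stand-alone check, run with the same -cp as the command above, can confirm what the JVM actually sees. This is a sketch using only standard Java calls; the class name ClasspathCheck is made up, and the driver class name is the one from the quoted error message.

```java
class ClasspathCheck {
    // True if the named class can be found by this class's class loader.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class path the JVM is really using, which can differ from $CLASSPATH.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // Driver class name taken from the error message quoted in this thread.
        System.out.println("driver loadable: " + isLoadable("org.gjt.mm.mysql.Driver"));
    }
}
```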


Hasan, Quazi M | 1 Nov 2010 15:48

RE: need help on using Linear SMO

Hi Mark,

Thanks for your response. My answers to your questions are below:

1. How many classes has your data got?

-> I have 100 class labels.

2. Have you tried different values for the complexity parameter (c), turning off normalization (or
normalizing the vector length with weka.filters.unsupervised.instance.Normalize)?

-> Complexity parameter: I tried different values, but the highest accuracy I got was
49%. How should I choose the value of "C", or on what basis should I vary it?

-> Normalization: I haven't tried normalization. What does normalization do? My features are numeric
values: the frequencies of words in each text document.

3. How does NaiveBayesMultinomial perform on your data?

-> With NaiveBayesMultinomial, I got 84% accuracy. I'm now using an SVM to compare against that
result. An SVM should perform better than Naive Bayes, right?

4. Do I also have to set the epsilon value for a linear SVM? How should I choose which values of
epsilon to try? Is there any bound or baseline for selecting that value? Please help me.

Thanks,
-Quazi Hasan.

GDombi | 1 Nov 2010 20:01

Re: Backpropagation analysis using weka

Hi Marylee,

My experience is with other neural network programs,
but I assume they are all set up similarly.

Your network is set up with 4 input nodes, 3 hidden and 3 output.
In reality there are 5 input nodes, 4 hidden and 3 output:
each level except the output has a constantly firing neuron (node) that
prevents the level from giving no signal. One can think of this node as
the intercept in a regression analysis.

This node can bear some of the explanatory weight of the model, but it
should not bear too much; otherwise your variables have no correlation
with the outcome.

There is more to the story but I don't know how WEKA ANNs work in
particular.

Bye for now,

George
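
The intercept analogy can be made concrete with a small sketch. It assumes that the "Threshold" printed for each sigmoid node is simply added to the weighted sum of the node's inputs, i.e. it is the weight of a constant input of 1. That reading matches the description above but is an assumption about Weka's output, and the class below is illustrative Java, not Weka code. The numbers are the Node 3 values from the output quoted in this thread.

```java
class SigmoidNode {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Output of one node: the threshold plays the role of the intercept,
    // the remaining weights multiply the node's inputs.
    static double output(double threshold, double[] weights, double[] inputs) {
        double sum = threshold;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        double threshold = 0.021914050730164863;
        double[] weights = {0.0242525172953017, -0.0305390009971664,
                            0.03355062157463565, -0.019746308149674242};
        // With all inputs at zero, only the threshold contributes, so the
        // node still produces a signal.
        double[] zeros = {0.0, 0.0, 0.0, 0.0};
        System.out.println(output(threshold, weights, zeros));
    }
}
```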

On Sun, 2010-10-31 at 11:00 -0700, MaryLee wrote:
> Hi,
> I used Weka to run a classification on the iris dataset.
> This data set has 4 attributes, so the network has 4 input units, 3 hidden units and 3
> outputs.
> I got the results below from Weka, but I could not understand the meaning
> of the first part, which says
> "Node 0" and "Threshold" and so on (I do not think
> backpropagation uses a threshold; only the perceptron uses it). I do not know if
> I am right. And why are there 5 nodes in the result?
> These are the results; can you please explain them to me? :(
> === Run information ===
> 
> Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.7 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
> Relation:     iris
> Instances:    150
> Attributes:   5
>               sepallength
>               sepalwidth
>               petallength
>               petalwidth
>               class
> Test mode:    2-fold cross-validation
> 
> === Classifier model (full training set) ===
> 
> Sigmoid Node 0
>     Inputs    Weights
>     Threshold    -0.0491382673800735
>     Node 3    0.027023429587306844
>     Node 4    0.047737802038675406
>     Node 5    -8.617050637471438E-4
> Sigmoid Node 1
>     Inputs    Weights
>     Threshold    -0.001813155428890656
>     Node 3    0.004644222421406531
>     Node 4    -0.017963170264953733
>     Node 5    -0.029948356201707205
> Sigmoid Node 2
>     Inputs    Weights
>     Threshold    -0.015015642691489917
>     Node 3    -0.03513942801283209
>     Node 4    -0.02923590167358281
>     Node 5    -0.012147148731891946
> Sigmoid Node 3
>     Inputs    Weights
>     Threshold    0.021914050730164863
>     Attrib sepallength    0.0242525172953017
>     Attrib sepalwidth    -0.0305390009971664
>     Attrib petallength    0.03355062157463565
>     Attrib petalwidth    -0.019746308149674242
> Sigmoid Node 4
>     Inputs    Weights
>     Threshold    0.01390399970937567
>     Attrib sepallength    0.022662858742601488
>     Attrib sepalwidth    0.03201780630162036
>     Attrib petallength    0.02401299820526416
>     Attrib petalwidth    -0.018754663650378568
> Sigmoid Node 5
>     Inputs    Weights
>     Threshold    -0.016249285177650778
>     Attrib sepallength    -0.031117935463524673
>     Attrib sepalwidth    0.04259685106621908
>     Attrib petallength    0.02133110393823469
>     Attrib petalwidth    0.04388816188910931
> Class Iris-setosa
>     Input
>     Node 0
> Class Iris-versicolor
>     Input
>     Node 1
> Class Iris-virginica
>     Input
>     Node 2
> 
> 
> Time taken to build model: 21.87 seconds
> 
> === Stratified cross-validation ===
> === Summary ===
> 
> Correctly Classified Instances          50               33.3333 %
> Incorrectly Classified Instances       100               66.6667 %
> Kappa statistic                          0     
> Mean absolute error                      0.4444
> Root mean squared error                  0.4715
> Relative absolute error                 99.9975 %
> Root relative squared error            100.0097 %
> Total Number of Instances              150     
> 
> === Detailed Accuracy By Class ===
> 
>                TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
>                  0         0          0         0         0          1       Iris-setosa
>                  0         0          0         0         0          0.513   Iris-versicolor
>                  1         1          0.333     1         0.5        0.021   Iris-virginica
> Weighted Avg.    0.333     0.333      0.111     0.333     0.167      0.511
> 
> === Confusion Matrix ===
> 
>   a  b  c   <-- classified as
>   0  0 50 |  a = Iris-setosa
>   0  0 50 |  b = Iris-versicolor
>   0  0 50 |  c = Iris-virginica
> 
> 
MaryLee | 1 Nov 2010 21:51

Re: Backpropagation analysis using weka


Hi George,

Thank you for replying and sharing what you know; it really helps.
Now I can guess that the "Threshold" in Weka's backpropagation output
is the constant neuron,
and I can see clearly what nodes 0, 1 and 2 are: these are the
output neurons,
while 3, 4 and 5 are the hidden units,
and under each of them you can see the inputs to that node.
:)

I have another, different question.
In Weka there are several ways to stop training a network, such as the number of
epochs. Another way is to set a threshold: if the error falls below
this threshold, the network stops training.
I could find the epochs setting in Weka, but I could not find any parameter to
set the threshold.
What I found is something called validationThreshold, which is used to terminate
validation testing, and that is different from what I want.
So how can I find the parameter that sets this threshold on the training
data?

Kind regards,
MaryLee
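
The two stopping criteria described in the question can be written down generically. The sketch below is plain Java with made-up names and a made-up error trace. It only illustrates the logic of "stop at the epoch limit or when the training error drops below a threshold" and does not correspond to any Weka option.

```java
class StopRule {
    // Returns how many epochs are run before training stops: either up to
    // and including the first epoch whose training error is below
    // errorThreshold, or the epoch limit, whichever comes first.
    static int epochsRun(double[] trainingErrorPerEpoch, int maxEpochs, double errorThreshold) {
        int limit = Math.min(maxEpochs, trainingErrorPerEpoch.length);
        for (int epoch = 0; epoch < limit; epoch++) {
            if (trainingErrorPerEpoch[epoch] < errorThreshold) {
                return epoch + 1;
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        // Hypothetical training errors for four epochs.
        double[] errors = {0.5, 0.3, 0.09, 0.05};
        System.out.println(epochsRun(errors, 500, 0.1));
        System.out.println(epochsRun(errors, 2, 0.1));
    }
}
```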

Remco Bouckaert | 1 Nov 2010 22:01

Re: Laplacian Smoothing for Bayesian Networks

On Sunday 31 October 2010 08:25:54 Sauli Rintanen wrote:
> I want to ask a question about the TAN (Tree Augmented Naive Bayes) search
> algorithm (located in the Bayesian Network section in Weka). I want to
> apply Laplace smoothing to the TAN classifier, but I couldn't figure out how
> to do it.
> 
> 
> I know that Laplace smoothing for Naive Bayes is:  P(X=i) = (ni + 1) /
> (N+K) but what is the Laplace Smoothing formula for TAN? I was
> unsuccessful in my Google searches..

The way Bayes nets are organized in Weka is that learning of the network's
structure is completely separate from learning of the conditional probability
tables.

TAN is an algorithm for learning a network structure.

Laplace smoothing is a method for probability estimation.

So, these methods are completely unrelated. By default, the conditional 
probability tables are estimated as P(X=i) = (ni + alpha) / (N+K*alpha) where 
alpha=0.5 by default. To get the estimator you want, just specify alpha as 1 
by setting the appropriate option (-A 1 for SimpleEstimator).

Remco
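
The estimator formula above, P(X=i) = (ni + alpha) / (N + K*alpha), is easy to check by hand. The following is a minimal Java sketch of that formula only, with hypothetical class and method names. It is not the SimpleEstimator source. For a network with parents, such as one found by TAN, the counts would presumably be those observed for one particular configuration of the parent values, so the same formula is applied once per row of the conditional probability table.

```java
import java.util.Arrays;

class SmoothedEstimate {
    // P(X = i) = (n_i + alpha) / (N + K * alpha), where N is the sum of the
    // counts and K is the number of values X can take.
    static double[] estimate(int[] counts, double alpha) {
        int total = 0;
        for (int n : counts) {
            total += n;
        }
        int k = counts.length;
        double[] p = new double[k];
        for (int i = 0; i < k; i++) {
            p[i] = (counts[i] + alpha) / (total + k * alpha);
        }
        return p;
    }

    public static void main(String[] args) {
        int[] counts = {3, 0, 1};
        // alpha = 0.5 is the default mentioned above; alpha = 1 gives Laplace smoothing.
        System.out.println(Arrays.toString(estimate(counts, 0.5)));
        System.out.println(Arrays.toString(estimate(counts, 1.0)));
    }
}
```

With alpha = 1 the value that was never observed still gets probability 1/7 instead of 0.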

Curtis Jensen | 1 Nov 2010 23:03

Re: Re: a problem about hierarchical clustering using Cobweb

"Cobweb can be a little tricky"

FYI: I've found that modifying the cutoff value can be helpful in
producing better results.  Generally, Cobweb either overfits or underfits.
Finding the cutoff value in between the two states is an iterative
effort.  I've had to go out to many decimal places to find good values.

--
Curtis
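
The iterative search described above can be automated with bisection, under one loud assumption: that the number of clusters never increases as the cutoff grows. The sketch below is plain Java. The clusterCount function is a hypothetical stand-in for "run Cobweb with this cutoff and count the resulting clusters" and the toy function in main is made up purely so the example runs.

```java
import java.util.function.DoubleToIntFunction;

class CutoffSearch {
    // Bisection over the cutoff. Assumes clusterCount is non-increasing in
    // the cutoff; returns the last midpoint tried if the target is not hit.
    static double findCutoff(DoubleToIntFunction clusterCount, int target,
                             double low, double high, int maxIterations) {
        double mid = (low + high) / 2.0;
        for (int i = 0; i < maxIterations; i++) {
            mid = (low + high) / 2.0;
            int count = clusterCount.applyAsInt(mid);
            if (count == target) {
                return mid;
            }
            if (count > target) {
                low = mid;   // too many clusters: raise the cutoff
            } else {
                high = mid;  // too few clusters: lower the cutoff
            }
        }
        return mid;
    }

    public static void main(String[] args) {
        // Toy stand-in: fewer "clusters" as the cutoff grows.
        DoubleToIntFunction toy = cutoff -> (int) Math.floor(1.0 / cutoff);
        System.out.println(findCutoff(toy, 4, 0.001, 1.0, 50));
    }
}
```

Because each step halves the interval, reaching many decimal places takes only a few dozen clustering runs.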

On Thu, Oct 28, 2010 at 11:24 AM, Saunders, Stewart <ssaunder <at> purdue.edu> wrote:
> Dan -  Leaf 2 contains just 1 instance.  It is not necessarily instance 1;
> it may be instance 25; you cannot tell from the tree.  Cobweb can be a
> little tricky.
>
> Stewart
>
> -----Original Message-----
> From: wekalist-bounces <at> list.scms.waikato.ac.nz
> [mailto:wekalist-bounces <at> list.scms.waikato.ac.nz] On Behalf Of Dan He
> Sent: Thursday, October 28, 2010 12:58 PM
> To: wekalist
> Subject: [Wekalist] Re: a problem about hierarchical clustering using Cobweb
>
> Hi, anybody used Cobweb in weka before? I haven't got any answer for my
> question yet.
>
> Thanks
> Dan
>
> ----- Original Message -----
> From: "Dan He" <danhe <at> cs.ucla.edu>
> To: "wekalist" <wekalist <at> list.scms.waikato.ac.nz>
> Sent: Monday, October 25, 2010 11:32:33 PM
> Subject: Re: a problem about hierarchical clustering using Cobweb
>
> Hi, anybody knows the answer for my following question?
>
> Thanks
> Dan
>
> ----- Original Message -----
> From: "Dan He" <danhe <at> cs.ucla.edu>
> To: "wekalist" <wekalist <at> list.scms.waikato.ac.nz>
> Sent: Monday, October 25, 2010 10:04:10 AM
> Subject: a problem about hierarchical clustering using Cobweb
>
> Hi,All:
>  I have a problem on how to interpret the results of Cobweb, the
> hierarchical clustering algorithm.
>
> For example:
>
> node 0 [26]
> |   node 1 [22]
> |   |   leaf 2 [1]
> |   node 1 [22]
> |   |   leaf 3 [2]
> |   node 1 [22]
> |   |   node 4 [2]
> |   |   |   leaf 5 [1]
> |   |   node 4 [2]
> |   |   |   leaf 6 [1]
> |   node 1 [22]
> |   |   node 7 [4]
> |   |   |   leaf 8 [1]
> |   |   node 7 [4]
> |   |   |   leaf 9 [1]
> |   |   node 7 [4]
> |   |   |   leaf 10 [1]
>
> I can only see the leaf index, but how to map the leaf index to the instance
> indexes? For example, does leaf 2 map to instance 1? Leaf 3 contains 2
> instances, does it map to instance 2 and instance 3, and similarly, leaf 5
> maps to instance 4?
>
> Thanks
> Dan
>
> --
>
> 2933 Math&Science Building
> www.cs.ucla.edu/~danhe
> 310-206-9096(O)
>
>
Curtis Jensen | 1 Nov 2010 23:03

Re: EM Clustering Processing Time

FYI:  It's been my experience that EM does take a while.  20K
instances with 35 attributes took several hours for me.

--
Curtis

On Wed, Oct 13, 2010 at 7:43 PM, IQB <hasifiqbal <at> gmail.com> wrote:
>
> When i try to cluster a dataset having 17049 entries with 12 attributes per
> line, it takes a few hours to cluster. Is it normal? Any way to change weka
> settings to decrease the processing time?
Mark Hall | 2 Nov 2010 07:45

Re: j48 giving incosistent results...what to do?

On 29/10/10 11:09 PM, gaurav bansal wrote:
> Hi all,
> I have three classes, say a, b and c, to be classified. I have data sets for
> all three classes,
> and I used the following two approaches for classification:
>
> 1) I took the first 1000 instances of each class from the thousands of
> instances available for each class,
> and with the 3000 instances from all 3 classes I trained J48.
> The resulting decision tree gives good results (accuracy >90%)
> on the test sets of class a,
> but only around 50% detection accuracy on the test sets of class b.
>
> 2) For the other approach, I used the Resample filter to get 1000
> instances of each class,
> and with the 3000 instances from all 3 classes I trained J48.
> The resulting decision tree gives good results (accuracy >90%)
> on the test sets of class b,
> but only around 50% detection accuracy on the test sets of class a.
>
> Can anyone please tell me what to do (how to process my training data)
> to get a decision tree
> that gives me consistent results across the different classes?

Do you get similar behavior when running a cross-validation on each of 
your training data sets? Perhaps your samples aren't large enough.

Cheers,
Mark.
