Mark Hall | 1 Mar 2010 02:13

Re: cluster model validation

On 26/02/10 5:59 AM, wessel van persie wrote:
> Dear All,
>
> How can I estimate the performance of models built by unsupervised
> learning algorithms in WEKA?
> I'm talking about algorithms which can be found in the associate or cluster tab.
>
> Is it possible to do a "data reproduction validation" in WEKA?
> This validation is like "classes to cluster validation" but without a
> fixed class.
>
> // X percent of the test data will be removed and validated on
> global X
>
> // pseudo code "data reproduction validation"
> for each record in testset:
>    randomly remove X values from record
>    using model(trainingset) reproduce these removed values
>    success rate  = number of values correctly reproduced

There is no general evaluation process to do this in Weka. I guess the within 
cluster sum of squares (computed by SimpleKMeans on the training data) is sort 
of similar to this.
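
The pseudo code in the question could be sketched roughly as follows. This is
not a Weka feature: the helper names (kmeans, reproduction_error), the toy data
and the hand-written k-means are all hypothetical, and because the attributes
here are numeric the sketch reports a mean squared reproduction error rather
than a success rate. The within-cluster sum of squares is computed alongside it
for comparison.

```python
import random

def sq_dist(a, b, idxs):
    """Squared Euclidean distance over the attribute indices in idxs."""
    return sum((a[i] - b[i]) ** 2 for i in idxs)

def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    dims = range(len(rows[0]))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: sq_dist(r, centroids[c], dims))
            groups[nearest].append(r)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(min(sq_dist(r, c, dims) for c in centroids) for r in rows)
    return centroids, wcss

def reproduction_error(centroids, test_rows, x, seed=0):
    """Hide x values per test record, refill them from the nearest centroid
    (nearest on the attributes still visible), return the mean squared error."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for row in test_rows:
        hidden = rng.sample(range(len(row)), x)
        visible = [i for i in range(len(row)) if i not in hidden]
        best = min(centroids, key=lambda c: sq_dist(row, c, visible))
        total += sq_dist(row, best, hidden)
        count += x
    return total / count

# Made-up training and test data with two obvious clusters.
train = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.0], [5.0, 5.1, 4.9], [5.1, 4.9, 5.0]]
test = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
centroids, wcss = kmeans(train, k=2)
error = reproduction_error(centroids, test, x=1)
print(f"within-cluster SSE on training data: {wcss:.4f}")
print(f"mean squared reproduction error on test data: {error:.4f}")
```

Both numbers shrink as the clusters fit the data more tightly, which is the
sense in which they are "sort of similar"; the reproduction error is measured
on held-out records, the sum of squares on the training data.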

Cheers,
Mark.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz

Mark Hall | 1 Mar 2010 02:17

Re: Reducing Data Set using KNN (IBK)

On 27/02/10 1:48 AM, Uday Kamath wrote:
> Folks
> I want a good reduced data set for a large data set, for computational
> reasons. According to various papers, k-NN and its variations are best for
> doing that. I know IBk in Weka is a k-NN implementation, and there is also
> a "Resample" filter for reducing the data set. My questions are:
> 1. Can I use IBk independently for data set reduction? Does this need a
> code change, as I don't see any options for such output in IBk (3.6)?
> 2. If not, is there a way to wire Resample to work with an existing
> classifier like IBk?

If you are talking about using kNN to reduce the data set in some fashion then 
there isn't anything built in to Weka to do this - some coding would be required.
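
One form such coding could take is Hart's condensed nearest neighbour rule,
which keeps only the instances a 1-NN classifier needs in order to label the
rest of the training data correctly. Whether this is the kind of reduction the
papers mentioned have in mind is an assumption. The sketch below is plain
Python rather than Weka code, and the function names and toy data are
hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numeric feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_label(store, features):
    """Label of the stored instance closest to features (1-NN)."""
    return min(store, key=lambda inst: sq_dist(inst[0], features))[1]

def condense(data):
    """Condensed nearest neighbour: grow a store until 1-NN on the store
    classifies every training instance correctly (or nothing new can be added)."""
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for features, label in data:
            misclassified = nearest_label(store, features) != label
            # The membership check stops contradictory duplicates looping forever.
            if misclassified and (features, label) not in store:
                store.append((features, label))
                changed = True
    return store

# Made-up two-class data: (features, class label).
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b"),
]
reduced = condense(data)
print(f"kept {len(reduced)} of {len(data)} instances")
```

The result depends on the order of the instances, and the repeated passes make
it slow on large data, so it is only a starting point.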

Cheers,
Mark.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Mark Hall | 1 Mar 2010 02:22

Re: Cloneable Instances

On 27/02/10 5:53 AM, Julien Carme wrote:
> Hello,
>
> I think it is rather useful to be able to make deep copies of Instances.
> I have made this possible by implementing .clone() for all the
> necessary classes (Instance, all subclasses of Instance, Attribute,
> ProtectedProperties...)
>
> I was wondering whether there were reasons why these classes were not
> cloneable so far, and whether the maintainers would be interested in such
> a contribution.

I'd say speed is the main reason. Creating subsets of instances, where actual 
values don't have to change, is fast. Are you aware that modification of an 
attribute value in an Instance triggers a deep copy of all the values in the 
instance? Having a clone method (that deep copies) would make it easy for folks 
to create slower code if they weren't aware of the implications.
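
The trade-off can be illustrated with a toy copy-on-write class. This is a
sketch of the idea only, not Weka's code: the Row class and its methods are
hypothetical.

```python
class Row:
    """Toy instance whose value list is copied only when it is modified."""

    def __init__(self, values):
        self._values = values  # may be shared with other Row objects

    def copy(self):
        """Shallow copy: the new Row shares the same value list (cheap)."""
        return Row(self._values)

    def value(self, index):
        return self._values[index]

    def set_value(self, index, new_value):
        """Copy all values before writing, so rows sharing the old list are
        unaffected. The cost is proportional to the number of attributes."""
        self._values = list(self._values)
        self._values[index] = new_value

original = Row([1.0, 2.0, 3.0])
cheap = original.copy()
shared_before = cheap._values is original._values  # nothing copied yet
cheap.set_value(0, 99.0)                           # the copy happens here
print(shared_before, original.value(0), cheap.value(0))
```

With this scheme an eager deep copy pays the copying cost for every row up
front, including rows that are never modified.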

Cheers,
Mark.


Mark Hall | 1 Mar 2010 02:31

Re: SQLITE inconsistent state error from command line only

On 27/02/10 7:28 AM, ninjasmith wrote:
>
> Hi,
>
> I've successfully accessed my SQLite results DB on a Mac using the latest
> Zentus drivers.  I'm using Weka 3-6-1.  However, when I try to connect from
> the command line using the following command
>
> java weka.core.converters.DatabaseLoader -Q "select * from
> LATEST_RESULTS_FEATURESET_2"
>
>
> I get the following error
>
> --- Exception caught ---
>
> Message:   SQLite JDBC: inconsistent internal state
>
> null
>
> This seems awfully similar to a widely discussed error on this board from a
> year ago.  Maybe the fix only works in the Explorer?
>
> I haven't had much success with the .42 Zentus drivers but will try again.
> Has anyone else successfully connected on the command line?

Hmm, the last time this issue came up we decided that the newer Zentus drivers 
didn't like having their result sets closed. At the time, I tried the native 
driver (0.42) for the Mac and it worked fine.


Lisham Bonakdar | 1 Mar 2010 09:35

Re: M5 model tree result

Dear Mark,

I would like to thank you for your help, but the sum of the second numbers
in the parentheses is not 100%. Based on what you said, the sum of the
second numbers should be 100%.

Regards,
Lisham.

On Wed, Feb 24, 2010 at 12:22 AM, Mark Hall <mhall <at> pentaho.com> wrote:
> On 24/02/10 9:28 AM, Lisham Bonakdar wrote:
>> Dear all,
>>
>> I used an M5 model tree for my work (design of rubble-mound breakwaters).
>> The developed tree is shown below.
>>
>> Emo <= 0.325 : LM1 (34/36.948%)
>> Emo > 0.325 :
>> | Emo <= 0.595 : LM2 (71/40.15%)
>> | Emo > 0.595 :
>> | | P <= -0.651 : LM3 (31/21.768%)
>> | | P > -0.651 : LM4 (26/41.683%)
>>
>> I cannot understand the meaning of "(34/36.948%)" or
>> "(26/41.683%)". I was wondering if you can help me.
>
> The first number in the parentheses at each leaf is the number of
> instances that reach that leaf; the second is the square root of the
> mean squared error of the predictions from the leaf's linear model for
> the instances that reach the leaf, expressed as a percentage of the
> global standard deviation of the class attribute (i.e. the standard
> deviation of the class attribute computed from all the training data).
>
> Cheers,
> Mark.

Häring, Tim (LWF) | 1 Mar 2010 09:52

high SMO Parameter

Hello list,

I'm trying to fit an SMO classifier, using a test dataset to calibrate the
individual model parameters. Unfortunately I don't have a background in
computer science or mathematics, so it is not that easy for me to work
through the available SVM literature. I would like to know which values for
an SMO classifier are reasonable. I get the best results when choosing a C
(the complexity parameter) of 100 and a gamma for the RBF kernel of 10. For
me that's fine, but I don't know if those values are meaningful because the
default values for C and gamma were rather small (between 0 and 1).

Is there any SMO / SVM guy who can give me an estimate of these parameters?

Thanks a lot.

TIM

----------------------------------------------------------------------------------- 
Tim Häring
Bavarian State Institute of Forest Research 
Department of Forest Ecology
Hans-Carl-von-Carlowitz-Platz 1
D-85354 Freising

E-Mail: tim.haering <at> lwf.bayern.de
http://www.lwf.bayern.de


Mark Hall | 1 Mar 2010 09:59

Re: M5 model tree result

Lisham Bonakdar wrote:
> Dear Mark,
> 
> I would like to thank you for your help, but the sum of the second
> numbers in the parentheses is not 100%. Based on what you said, the sum
> of the second numbers should be 100%.

No, the sum will not be 100%. The quality of each leaf is just shown as
the RMSE of the predictions for the instances at that leaf divided by 
the global standard deviation (i.e. the RMSE from predicting the mean on 
all the training instances). The smaller this value, the better.
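
As a worked illustration of that ratio, the sketch below computes the second
number for one leaf. The class values and the leaf's predictions are made up,
and leaf_quality is a hypothetical helper name, not part of Weka. The global
standard deviation is taken as the population form, since that equals the RMSE
of always predicting the mean.

```python
import math
import statistics

def leaf_quality(actual, predicted, global_std):
    """RMSE of the leaf's predictions as a percentage of the global std dev."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * math.sqrt(mse) / global_std

# Hypothetical class values for all the training data.
all_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
global_std = statistics.pstdev(all_targets)  # RMSE of always predicting the mean

# Hypothetical instances reaching one leaf, and that leaf's model outputs.
leaf_actual = [1.0, 2.0, 3.0]
leaf_predicted = [1.5, 2.0, 2.5]
quality = leaf_quality(leaf_actual, leaf_predicted, global_std)
print(f"({len(leaf_actual)}/{quality:.3f}%)")
```

Each leaf is scored separately against the same denominator, so nothing forces
the percentages to add up to 100.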

Cheers,
Mark.


Santiago Muelas | 1 Mar 2010 10:49

Re: increase the precision of real-valued attributes

I will try that. Thank you for your answer.

Best regards

Santiago

Mark Hall wrote:
> Santiago Muelas wrote:
>> Hi,
>>
>> I would like to know if it is possible to increase the precision of 
>> the output of real-valued attributes with the command line version of 
>> Weka. From what I have found in the manual (page 14 of the 3.7.1 
>> version), Weka uses a double floating-point representation but usually 
>> only uses seven decimal digits. Since it says "usually", I was hoping 
>> that there could be a way of increasing this number (although I have 
>> not found anything in the manual).
>
> I'm afraid that precision is hard-coded in Weka. You'd have to alter 
> the code for the method(s) that you're interested in in order to 
> increase this.
>
> Cheers,
> Mark.

Katia Kermanidis | 1 Mar 2010 16:10

Weka and LSA

Dear all,

I am familiar with Weka, but now starting to get acquainted with LSA.
Is there a tutorial, or something else I can read, regarding how to use LSA with Weka?
Are there maybe any online examples available?

Thank you very much in advance.

Best regards,

Katia Kermanidis
Department of Informatics
Ionian University
Corfu, Greece
kerman <at> ionio.gr

Harri Saarikoski | 1 Mar 2010 17:14

Re: high SMO Parameter



2010/3/1 Häring, Tim (LWF) <Tim.Haering <at> lwf.bayern.de>
> Hello list,
>
> I'm trying to fit an SMO classifier, using a test dataset to calibrate the
> individual model parameters. Unfortunately I don't have a background in
> computer science or mathematics, so it is not that easy for me to work
> through the available SVM literature. I would like to know which values
> for an SMO classifier are reasonable. I get the best results when choosing
> a C (the complexity parameter) of 100 and a gamma for the RBF kernel of 10.
> For me that's fine, but I don't know if those values are meaningful because
> the default values for C and gamma were rather small (between 0 and 1).
>
> Is there any SMO / SVM guy who can give me an estimate of these parameters?

I guess the general scheme would be to step through values by moving the
decimal point one place at a time, e.g.:
- gamma 0.001, 0.01, 0.1, 1, 10
- C 0.1, 1.0, 10, 100, 1000
but I don't know where, if at all, either parameter maxes or mins out
so that no more progress can be obtained
 
btw, weka has GridSearch and other operators allowing testing of multiple parameters of the same classifier
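
The powers-of-ten scheme above could be sketched like this. It is plain Python
rather than Weka's GridSearch, and toy_score is a made-up stand-in for whatever
cross-validated accuracy one would actually measure; its peak at C=100 and
gamma=10 is chosen only to mirror the values in the question.

```python
import math
from itertools import product

def grid_search(score, c_values, gamma_values):
    """Return (best_score, C, gamma) over every combination in the grid."""
    return max((score(c, g), c, g) for c, g in product(c_values, gamma_values))

# Move the decimal point one place per step.
c_values = [10.0 ** e for e in range(-1, 4)]      # 0.1 up to 1000
gamma_values = [10.0 ** e for e in range(-3, 2)]  # 0.001 up to 10

def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy."""
    return -abs(math.log10(c) - 2) - abs(math.log10(gamma) - 1)

best_score, best_c, best_gamma = grid_search(toy_score, c_values, gamma_values)
print(f"best C={best_c:g}, gamma={best_gamma:g}")
```

If the best value sits on the edge of the grid, as gamma does here, it is worth
extending the grid in that direction before settling on it.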
Harri
 

--
-----------------
Harri M.T. Saarikoski
M.A, PhD graduate student
Helsinki University
Finland