Peter Reutemann | 1 Oct 2009 02:13

Re: Prediction Problem using Weka API with J48

Please no top-posting; see the mailing list etiquette for the reasons
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> I don't think it has to do with the number of attributes in the training and
> unlabeled data as they both have the same number of attributes (19+1 class
> attribute).  I am able to perform a prediction of the unlabeled data using
> Weka Explorer by using the training dataset to generate a J48 model and
> performing prediction using the unlabeled data.  The problem only occurs
> when using Weka API.  I should mention that the training data is not sorted
> by the class attribute (I don't think this should be a problem) but the
> order of the nominal class values of the class attribute in the @attribute
> statement matches in both the training ARFF and the unlabeled ARFF.

I've created a little test class "Blah.java" from the supplied code
and ran it against the UCI dataset "iris", which worked fine for me.
Here's the command line that I used:
  java Blah ~/somewhere/iris.arff ~/somewhere/iris.arff ~/somewhere_else/out.txt

Quick explanation of the parameters:
1. training set
2. unlabeled set
3. output file for the newly generated, labeled instances

For simplicity, I just used the same dataset as unlabeled data (I
ensure in the code that there are no class labels set).

You might want to try this class with your data and see what happens.
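
Before running it, one cheap thing to rule out is a header mismatch between the two ARFF files. The following is only a rough sketch that does not use the Weka API: the class name ArffHeaderCheck and its methods are made up for illustration, and the comparison is purely textual, so it also reports harmless differences such as extra spaces inside a {a, b} value list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative helper (not part of Weka): compares the @attribute lines of two ARFF headers.
class ArffHeaderCheck {

    // Collects the @attribute lines that appear before @data, with runs of whitespace collapsed.
    static List<String> attributeLines(String arffText) {
        List<String> result = new ArrayList<>();
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // the header ends here
            }
            if (lower.startsWith("@attribute")) {
                result.add(line.replaceAll("\\s+", " "));
            }
        }
        return result;
    }

    // True if both headers declare the same attributes in the same order,
    // including the order of the nominal values.
    static boolean sameHeader(String trainArff, String unlabeledArff) {
        return attributeLines(trainArff).equals(attributeLines(unlabeledArff));
    }

    public static void main(String[] args) {
        String train = "@relation t\n@attribute width numeric\n@attribute class {a,b,c}\n@data\n1.0,a\n";
        String swapped = "@relation u\n@attribute width numeric\n@attribute class {b,a,c}\n@data\n1.0,?\n";
        System.out.println("same file:      " + sameHeader(train, train));
        System.out.println("swapped values: " + sameHeader(train, swapped));
    }
}
```

If the check reports a difference, compare the two lists returned by attributeLines by eye before drawing any conclusion.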

Also, next time please post *which* version of Weka you are using. I'm
using post-3.7.0 code (revision 5991).

Rafał Wardas | 1 Oct 2009 02:53

Cluster SimpleKMeans tbc.

Hmm..
 
I'm still fighting with clustering. I spent a few hours testing different combinations of filters (Discretize -> AddCluster) and Cluster.
I think I know why I get wrong results. AddCluster is probably some kind of polynomial over all parameters in the selected range,
with some probability for each, but it is not really connected with my discretized value (fast/medium/slow). That's why, when I create
the clusters, the 3 x 3 matrix (cluster-to-class evaluation) of cluster groups versus the discretized class is not properly distributed
(I think that 80+% of the values should be on the diagonal). To me, your solution is not correct; some steps might be missing.
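
One caveat when reading such a matrix: cluster numbers are arbitrary, so the raw diagonal can look poor even when clusters and classes agree well. Below is a small sketch of that idea, with made-up toy data and a hypothetical class name; it does not call Weka and simply tries every one-to-one mapping of clusters to classes.

```java
// Illustrative sketch (not Weka code): agreement between clusters and classes
// when the cluster numbering is arbitrary.
class ClusterClassAgreement {

    // counts[k][c] = number of instances in cluster k that carry class c.
    static int[][] contingency(int[] clusters, int[] classes, int n) {
        int[][] counts = new int[n][n];
        for (int i = 0; i < clusters.length; i++) {
            counts[clusters[i]][classes[i]]++;
        }
        return counts;
    }

    // Tries every one-to-one cluster-to-class mapping, starting at the given cluster,
    // and returns the largest number of instances any mapping places in matching cells.
    static int bestMatch(int[][] counts, int cluster, boolean[] classUsed) {
        if (cluster == counts.length) {
            return 0;
        }
        int best = 0;
        for (int c = 0; c < counts.length; c++) {
            if (!classUsed[c]) {
                classUsed[c] = true;
                best = Math.max(best, counts[cluster][c] + bestMatch(counts, cluster + 1, classUsed));
                classUsed[c] = false;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: cluster 0 mostly holds class 2, cluster 1 class 0, cluster 2 class 1.
        int[] clusters = {0, 0, 1, 1, 2, 2, 2};
        int[] classes = {2, 2, 0, 0, 1, 1, 0};
        int[][] counts = contingency(clusters, classes, 3);
        int diagonal = counts[0][0] + counts[1][1] + counts[2][2];
        int matched = bestMatch(counts, 0, new boolean[3]);
        System.out.println("raw diagonal: " + diagonal + ", best mapping: " + matched + " of " + clusters.length);
    }
}
```

The brute-force search is only practical for a handful of clusters, which is enough for the three groups discussed here.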

Even when I created the AddCluster filter based only on the discretized value (fast/medium/slow), later, with discretization for SimpleKMeans,
I got 50% incorrect instances.

There is one article in WEKA about a problem similar to mine, with the filter "Discretize", but there Cluster is independent of the input data.

Should I use something other than Cluster?

My algorithm is really simple:
1. Discretize the execution time into 3 classes.
2. Create 3 cluster groups from the rest of the parameters, where cluster group == discretized class.
3. Classify an instance without "execution time" into one of the classes from step 1.
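
Step 1 can be pictured with the following sketch. It assumes equal-width binning, which is only one of the ways a numeric attribute can be discretized; the value range 0 to 3000 and the class name EqualWidthBins are made up for illustration and are not taken from the data discussed here.

```java
// Illustrative sketch of equal-width discretization into named bins (not Weka code).
class EqualWidthBins {

    // Returns the numBins - 1 cut points that split [min, max] into equal-width intervals.
    static double[] cutPoints(double min, double max, int numBins) {
        double width = (max - min) / numBins;
        double[] cuts = new double[numBins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    // Index of the interval that contains value; intervals are closed on the right, i.e. (a, b].
    static int binOf(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value > cuts[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        String[] labels = {"fast", "medium", "slow"};
        double[] cuts = cutPoints(0.0, 3000.0, 3); // hypothetical range of execution times
        for (double t : new double[] {120.0, 1000.0, 1000.5, 2999.0}) {
            System.out.println(t + " -> " + labels[binOf(t, cuts)]);
        }
    }
}
```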

Rafal.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
J K Rai | 1 Oct 2009 10:41

Re: Classifier

Hello

Harri, may I ask you to elaborate in general on the rules of thumb, based on the kind of information sought?

Regards,
Jitendra

--- On Wed, 30/9/09, Harri Saarikoski <harri.saarikoski <at> gmail.com> wrote:

> From: Harri Saarikoski <harri.saarikoski <at> gmail.com>
> Subject: Re: [Wekalist] Classifier
> To: "Weka machine learning workbench list." <wekalist <at> list.scms.waikato.ac.nz>
> Date: Wednesday, 30 September, 2009, 5:06 PM
>
> 2009/9/30 Fernando Zuher <fernando.zuher <at> gmail.com>
>
>> Hi people!
>>
>> I am searching Weka for the classifier that best suits my dataset.
>> How can I find one?
>> Is it just empirical?
>
> It is considered mostly empirical, but some rules of thumb can be set.
> For that, at least the following information about your dataset is needed:
> - number of features / prediction variables (also type: numeric / nominal / binary?)
> - balance of instances -> classes, i.e. how many instances there are for each class
> - number of classes / targets
> - total number of instances / samples
>
>> Regards,
>> Fernando
>
> --
> Harri M.T. Saarikoski
> M.A, PhD graduate student
> Helsinki University
> Finland

magomerlino | 1 Oct 2009 11:43

Re: no suitable driver found


Peter Reutemann-3 wrote:
> 
> 
> OK, I've just tested the following:
> - freshly installed Windows XP (32bit) running inside VirtualBox (I
> don't use Windows at all)
> - installed weka-3.6.1jre.exe (includes Java JRE 1.5.0_18)
> - installed MySQL essentials 5.1.39 MSI package
> - created database "weka"
> - downloaded MySQL Connector/J 5.1.10 and added the jar to my CLASSPATH
> variable
> - saved the DatabaseUtils.props that you provided in my Weka
> installation directory:
>   ~ changed URL to point to my "weka" database on my machine
>   ~ added "INT=5" to the props file (necessary for running the
> Experimenter)
> - ran an experiment on the iris dataset with ZeroR/J48, storing the
> results in the "weka" database
> - fired up the Explorer, connected to the "weka" database and returned
> the results from the Experiment run ("select * from Results0")
> - loaded data just fine, in SqlViewer and Explorer
> 
> Bottom line: no problems whatsoever.
> 
> Looks like there is something funny going on with your OS setup.
> Sorry, no idea what the problem is.
> 
> Cheers, Peter
> -- 
> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
> http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
> 
> 

Thanks for everything.
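
For anyone who finds this thread later: a "no suitable driver found" message often just means the driver jar is not visible to the JVM that runs Weka. A quick way to test that, sketched below with a made-up class name, is to try loading the driver class with the same CLASSPATH. The default class name is the one used by MySQL Connector/J 5.1; pass a different name as the first argument for other drivers.

```java
// Illustrative check (not Weka code): is a JDBC driver class visible on the classpath?
class DriverCheck {

    // True if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name of MySQL Connector/J 5.1 unless another one is given.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + (isOnClasspath(driver) ? " is" : " is NOT") + " on the classpath");
    }
}
```

Run it from the same shell or batch file that starts Weka, so that it sees the same CLASSPATH.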
Paul Adriani | 1 Oct 2009 13:12

(no subject)

Dear Sir,

Today I created the attached file prediction_creator_evaluation.kfml. Using the attached train and test sets, I want to evaluate and save the results that I visualize with a TextViewer component of the KnowledgeFlow. Unfortunately, I cannot save the produced results automatically; manually it is possible by right-clicking and choosing "save result buffer". Is it possible to save these results automatically?

Warm regards,

Paul

Paul Adriani Msc.
Jan van Riebeekstraat 14-3
1057ZX Amsterdam
tel 0644141917

Infocaster BV
paul <at> infocaster.net

Attachment (prediction_creator_evaluation.kfml): application/octet-stream, 14 KiB
Sung Hee Park | 1 Oct 2009 16:57

Is anyone have experienced with loading a large arff file into WEKA ?

Hello all,

I tried to load a large ARFF file of more than 500 MB.
So I increased the Java heap size to 1000 MB with the following command in the batch file that starts WEKA:
java -Xmx1000m -jar weka.jar

But I got an out-of-memory message saying that Weka needs a larger heap size or a smaller data file.
I then increased the Java heap size several times, up to 2 GB, and still got the same error message.

Please let me know the maximum file size that Weka can load. How can I load a large ARFF file into WEKA?
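
One thing worth ruling out in a situation like this is that the -Xmx setting never reaches the JVM that actually runs Weka. The sketch below is independent of Weka and uses a made-up class name; it only prints the heap limit that the running JVM reports.

```java
// Illustrative check (not Weka code): what heap limit does this JVM actually have?
class HeapCheck {

    // Maximum amount of heap the running JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Started as "java -Xmx1000m HeapCheck", it should print a value close to 1000; the reported number may be somewhat lower than the requested one.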
 
Thanks in advance
 
Sung Hee

 
Nissim Matatov | 1 Oct 2009 17:04

Re: Apriori Algorith

It seems you need to use Java for this.
 
First, arrange a CSV file like this:
 
A1 A2 A3 A4
P P P ?
P P P ?
P ? ? ?
P ? ? P
? P ? P
 
? means absent, P means present. Don't use 1, because Weka treats it as a numeric feature.
 
Apriori outputs large itemsets. You can see them in the Explorer. Each itemset comes with a counter. For example, for itemsets of size 2 the output is:
 
A1=P A2=P               2
A1=P A3=P               2
A1=P A4=P               1
A2=P A3=P               2
A2=P A4=P               1
 
If you are interested in the top 3 most frequent itemsets, you can sort by the counter and choose the appropriate itemsets.
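
The counters above can be cross-checked by hand. The following sketch is a plain brute-force count over the table from this message, not Apriori itself and not Weka code (there is no minimum support threshold, and the class name PairCounts is made up); for every pair of columns it counts the rows in which both items are present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative brute-force count of 2-itemsets (not Weka's Apriori implementation).
class PairCounts {

    // For every pair of columns, counts the rows in which both items are present ("P").
    // Pairs that never occur together are left out.
    static Map<String, Integer> countPairs(String[] names, String[][] rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            for (int j = i + 1; j < names.length; j++) {
                int support = 0;
                for (String[] row : rows) {
                    if (row[i].equals("P") && row[j].equals("P")) {
                        support++;
                    }
                }
                if (support > 0) {
                    counts.put(names[i] + "=P " + names[j] + "=P", support);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] names = {"A1", "A2", "A3", "A4"};
        String[][] rows = {
            {"P", "P", "P", "?"},
            {"P", "P", "P", "?"},
            {"P", "?", "?", "?"},
            {"P", "?", "?", "P"},
            {"?", "P", "?", "P"},
        };
        countPairs(names, rows).forEach((itemset, n) -> System.out.println(itemset + "\t" + n));
    }
}
```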
 
The question is how to get the large itemsets. First, you can call the public method toString() and parse the result, which looks the same as in the Explorer. Second, you can add a public method that returns
the private variable m_Ls and apply the same procedure to pick the top N frequent itemsets.
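
The first option could look roughly like the sketch below. It assumes that each itemset line has the shape shown earlier in this message (item=value pairs followed by a count) and skips every other line; the class name TopItemsets and the sample header line are made up, and the real output may need additional handling.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for itemset lines such as "A1=P A2=P     2" (not Weka code).
class TopItemsets {

    // One parsed line: the itemset text and its count.
    static final class Entry {
        final String itemset;
        final int count;

        Entry(String itemset, int count) {
            this.itemset = itemset;
            this.count = count;
        }
    }

    // Keeps only lines that consist of item=value pairs followed by an integer count.
    static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim().replaceAll("\\s+", " ");
            int cut = line.lastIndexOf(' ');
            if (cut < 0) {
                continue;
            }
            String left = line.substring(0, cut);
            String tail = line.substring(cut + 1);
            if (tail.matches("\\d+") && left.matches("(\\S+=\\S+)( \\S+=\\S+)*")) {
                entries.add(new Entry(left, Integer.parseInt(tail)));
            }
        }
        return entries;
    }

    // Returns the n entries with the highest counts; ties keep their original order.
    static List<Entry> top(List<Entry> entries, int n) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Integer.compare(b.count, a.count));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        String output =
              "Large Itemsets L(2):\n" // sample header line, ignored by the parser
            + "A1=P A2=P               2\n"
            + "A1=P A3=P               2\n"
            + "A1=P A4=P               1\n"
            + "A2=P A3=P               2\n"
            + "A2=P A4=P               1\n";
        for (Entry e : top(parse(output), 3)) {
            System.out.println(e.itemset + " -> " + e.count);
        }
    }
}
```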
 
I ran the example with the default values. It seems they are good for your case.
 
Let me know if you run into any problems.
 
Nissim

On Mon, Sep 28, 2009 at 9:01 PM, Pratikshya Kuinkel <prati.ekshya <at> gmail.com> wrote:
> Dear all,
>
> I just need to implement a frequent set mining algorithm for my research. How would I use Weka's associator for doing this? I just need to display the frequent term sets up to three levels with their corresponding frequencies. Also, in each CSV file, the number of items per line is not the same. How can I address these missing values in Weka? Any suggestion would be of great help.
>
> Regards,
> Pratikshya

Yong Bakos | 1 Oct 2009 17:06

Weka 'online' / ceo


Hi,
I was wondering if any of you know what happened to the 'Weka Online'
initiative and would be willing to enlighten me.

The ceodelegates.com domain has expired, and while I did some digging
in the message archives, and a student of mine discovered an old link
(http://74.54.140.114/~mysensev/), I could not find any recent relevant information.

Thanks for your time.

Yong Bakos
Colorado, USA

Thomas Debray | 2 Oct 2009 00:20

Re: Is anyone have experienced with loading a large arff file into WEKA ?

Hello,

Normally, the memory of your computer limits the amount of data Weka can handle. Depending on the technique you use, files of less than 10 MB can already cause out-of-memory errors. Moreover, Weka loads the whole dataset into memory, creating an object for each instance, attribute, etc. (which shouldn't even be necessary for techniques such as Naive Bayes).
Hence, I suggest you downsample your dataset before applying any technique, or even before loading it into Weka (e.g. bootstrapping or random subsampling).
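
Random subsampling before loading can be done in a single pass without holding the file in memory. The sketch below uses reservoir sampling and plain Java I/O rather than Weka; it assumes one instance per line after @data, and the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Illustrative one-pass subsampling of ARFF data rows (not Weka code).
class ArffSubsample {

    // Keeps all header lines up to and including @data, followed by a uniform
    // random sample of at most sampleSize data rows (reservoir sampling).
    static List<String> subsample(Reader in, int sampleSize, long seed) {
        List<String> header = new ArrayList<>();
        List<String> reservoir = new ArrayList<>();
        Random random = new Random(seed);
        boolean inData = false;
        long seen = 0; // data rows seen so far
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!inData) {
                    header.add(line);
                    inData = trimmed.toLowerCase(Locale.ROOT).startsWith("@data");
                    continue;
                }
                if (trimmed.isEmpty() || trimmed.startsWith("%")) {
                    continue; // skip blank lines and comments
                }
                seen++;
                if (reservoir.size() < sampleSize) {
                    reservoir.add(line);
                } else {
                    long slot = (long) (random.nextDouble() * seen); // uniform in [0, seen)
                    if (slot < sampleSize) {
                        reservoir.set((int) slot, line);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        header.addAll(reservoir);
        return header;
    }

    public static void main(String[] args) {
        String arff = "@relation demo\n@attribute x numeric\n@data\n1\n2\n3\n4\n5\n6\n";
        for (String line : subsample(new StringReader(arff), 2, 1L)) {
            System.out.println(line);
        }
    }
}
```

For a real file, pass a FileReader instead of the StringReader and write the returned lines to a new, smaller ARFF file.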
Best regards

Thomas Debray

2009/10/1 Sung Hee Park <sungheep <at> gmail.com>
> Hello all,
>
> I tried to load a large ARFF file of more than 500 MB.
> So I increased the Java heap size to 1000 MB with the following command in the batch file that starts WEKA:
> java -Xmx1000m -jar weka.jar
>
> But I got an out-of-memory message saying that Weka needs a larger heap size or a smaller data file.
> I then increased the Java heap size several times, up to 2 GB, and still got the same error message.
>
> Please let me know the maximum file size that Weka can load. How can I load a large ARFF file into WEKA?
>
> Thanks in advance
>
> Sung Hee

--
Thomas Debray | Julius Center | Stratenum 6.131 | University Medical Center Utrecht  | P.O.Box 85500  | 3508 GA Utrecht | The Netherlands | www.juliuscenter.nl | www.thomasdebray.be | www.netstorm.be
Rafał Wardas | 2 Oct 2009 03:04

Classifier C SVM fails to classify test set.

Hi,

The predicted parameter in the learning set is discretized. I use the learning set with this parameter as the class to train C-SVM, and I get:

=== Confusion Matrix ===
   a   b   c   <-- classified as
 222   9   4 |   a = '(-inf-1151]'
   1  67   5 |   b = '(1151-2302]'
   0   6   6 |   c = '(2302-inf)'

which is what I was expecting (91%).
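
As a side note, the accuracy can be recomputed from the confusion matrix itself by dividing the diagonal by the total. A small sketch (not Weka code, class name made up) using the matrix as posted; its diagonal holds 295 of 320 instances, a little over 92%, so slightly above the figure quoted.

```java
// Illustrative accuracy computation from a confusion matrix (not Weka code).
class ConfusionAccuracy {

    // Fraction of instances on the diagonal of a square confusion matrix;
    // NaN if the matrix contains no instances at all.
    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                total += matrix[r][c];
                if (r == c) {
                    correct += matrix[r][c];
                }
            }
        }
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    public static void main(String[] args) {
        int[][] posted = {
            {222, 9, 4},
            {1, 67, 5},
            {0, 6, 6},
        };
        System.out.printf("accuracy = %.4f%n", accuracy(posted));
    }
}
```

A matrix that contains only zeros has no instances to divide by, which is why this sketch returns NaN in that case.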

When I launch a second run on test data (where the test data is part of the learning set, with the last parameter set to unknown, '?') against the same class parameter,
I get:

Total Number of Instances                0    
Ignored Class Unknown Instances                 12    

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0         0          0         0         0          ?        '(-inf-1151]'
                 0         0          0         0         0          ?        '(1151-2302]'
                 0         0          0         0         0          ?        '(2302-inf)'
Weighted Avg.  NaN       NaN        NaN       NaN       NaN        NaN   

=== Confusion Matrix ===

 a b c   <-- classified as
 0 0 0 | a = '(-inf-1151]'
 0 0 0 | b = '(1151-2302]'
 0 0 0 | c = '(2302-inf)'

It's happening only with that parameter. Can anyone tell me why none of the instances are classified? I can't find a straight answer to that question on the mailing list.

Thanks
R
