Peter Reutemann | 1 Nov 2009 21:53

Re: converting a dataset

Please no top-posting; see the mailing list etiquette for the reasons
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> Could you please tell me the difference between .data, .arff, .names and
> .dat? can I use any of them in java classifier program?

They're just different data formats (open them in a text editor
and you'll see the difference), and you need a specific converter for
loading/saving each of them. ARFF is Weka's own (and preferred) data format.

> How can I partition dataset.arff into two files one for training and other
> one for testing?

I've added a new FAQ: How do I divide a dataset into training and test set?
The FAQs are linked from the Weka homepage.

NB: The Explorer allows you to split the currently loaded dataset
into a training and test set on-the-fly.
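
For doing the split in your own program, here is a minimal sketch in
plain Java. It is not the code from the FAQ and uses no Weka classes:
it assumes the data rows (the lines after @data in the ARFF file) have
already been read into a list, and the names TrainTestSplit, Split and
split() are made up for illustration. The ARFF header would have to be
copied unchanged into both output files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka.
final class TrainTestSplit {

    /** Holds the two partitions produced by split(). */
    static final class Split {
        final List<String> train;
        final List<String> test;

        Split(List<String> train, List<String> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the data rows with the given seed and cuts it
     * into a training part (trainFraction of the rows) and a test part.
     */
    static Split split(List<String> rows, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<String> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(shuffled.size() * trainFraction);
        List<String> train = new ArrayList<>(shuffled.subList(0, trainSize));
        List<String> test = new ArrayList<>(shuffled.subList(trainSize, shuffled.size()));
        return new Split(train, test);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add("row" + i);
        }
        Split s = split(rows, 0.7, 42L);
        System.out.println("train: " + s.train.size() + " rows, test: " + s.test.size() + " rows");
    }
}
```

Passing the same seed again reproduces the same split, which helps when
comparing classifiers on identical data.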

Cheers, Peter

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Peter Reutemann | 1 Nov 2009 21:56

Re: Conditional probabilities java code after constructing the bayesian network

> Can anyone help me to get the conditional probabilities (java code) for
> nodes after constructing the bayesian network using BayesNet class?

Did you look at Weka's source code to see how the member variable
m_Distributions, which I mentioned in my previous post (linked below), is used?
https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2009-October/019817.html

Cheers, Peter

Peter Reutemann | 1 Nov 2009 22:27

Re: Advice about n-grams and comparing vectors

> I have looked up the FAQ and have used the following:
>
> 0) Made 2 Directories mydata and testing. In mydata there are 2 directories,
> active and lazy, with text files in each. In testing, there are two
> directories, active and lazy which have one file each. Both the files in
> testing/active and testing/lazy are also present in the training (mydata)
> directories.
>
> 1)java -classpath ./weka.jar weka.core.converters.TextDirectoryLoader -dir
> ./testing/ > testing1.arff
>
> 2)java -classpath ./weka.jar weka.core.converters.TextDirectoryLoader -dir
> ./mydata/ > training1.arff
>
> 3)java -classpath ./weka.jar
> weka.filters.unsupervised.attribute.StringToWordVector -tokenizer
> "weka.core.tokenizers.NGramTokenizer -min 2 -max 3" -b -i training1.arff -o
> training2.arff -r testing1.arff -s testing2.arff
>
> Now I have a training and testing set in same format
>
> I used the GUI to select a classifier and build a model and saved it.
>
> 4)java -classpath ./weka.jar weka.classifiers.lazy.LWL -l
> lazy.LWL_oct_30_2009.model -T testing2.arff -p 0

Why do you use the GUI for generating the model? And why don't you
just build the classifier on the fly?
-t: training set
-T: test set

Peter Reutemann | 1 Nov 2009 22:30

Re: Developing WEKA time-series plug-in

> I would like to develop a WEKA time-series plug-in, as a Master Project for
> the study Artificial Intelligence.
> This plugin would add some extra preprocessing filters and an extra "test
> option" in the Classify screen.
> And maybe also some extra visualisation options.
>
> For example:
> filter:   windowing, moving average, trend, fast Fourier transform.
> "test option":   sliding window validation
>
> Has there been other development by people who are not on the WEKA team?
> Are there pre-set guidelines on how to do this?

The easiest option for you would be to implement a new panel in the
Explorer. Check out the wiki article "Adding tabs in the Explorer" for
more information:
  http://weka.wikispaces.com/Adding+tabs+in+the+Explorer

Cheers, Peter

Peter Reutemann | 1 Nov 2009 22:31

Re: changing attribute selection algorithm

Please no top-posting; see the mailing list etiquette for the reasons
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> this means that by choosing j48 in classifier tab, selecting an
> attributeSelection like RandomOne  in preprocess tab would be useless?

Different tabs, different functionality. They're independent from each other.

Cheers, Peter

Peter Reutemann | 1 Nov 2009 22:35

Re: changing attribute selection algorithm

>> I've found this line,
>>  minResult = currentModel[i].gainRatio(); at C45ModelSelection  line 162
>> I think this is the line that should be modified to get a new algorithm for
>> splitting a node, e.g. infoGain() instead of gainRatio(). Am I right?
>> But strangely, when I use something like getRandomDouble() instead of
>> gainRatio() and repeat the test, I get the same result on each try,
>> although I think it should be different each time!
>>
>> Any idea please?

Did you check with a debugger whether the modified code was actually
used and a different attribute chosen?

Cheers, Peter

shiloon | 2 Nov 2009 08:05

Re: changing attribute selection algorithm


It wasn't really top-posting; I just edited my previous post!
I checked the tree size, and it changed when I substituted infoGain(). But for
getRandom(), weird: I don't get a random result!


john | 2 Nov 2009 08:46

Re: Advice about n-grams and comparing vectors



From: Peter Reutemann <fracpete <at> gmail.com>
To: Weka machine learning workbench list. <wekalist <at> list.scms.waikato.ac.nz>
Sent: Sun, November 1, 2009 1:27:19 PM
Subject: Re: [Wekalist] Advice about n-grams and comparing vectors

> I have looked up the FAQ and have used the following:
>
> 0) Made 2 Directories mydata and testing. In mydata there are 2 directories,
> active and lazy, with text files in each. In testing, there are two
> directories, active and lazy which have one file each. Both the files in
> testing/active and testing/lazy are also present in the training (mydata)
> directories.
>
> 1)java -classpath ./weka.jar weka.core.converters.TextDirectoryLoader -dir
> ./testing/ > testing1.arff
>
> 2)java -classpath ./weka.jar weka.core.converters.TextDirectoryLoader -dir
> ./mydata/ > training1.arff
>
> 3)java -classpath ./weka.jar
> weka.filters.unsupervised.attribute.StringToWordVector -tokenizer
> "weka.core.tokenizers.NGramTokenizer -min 2 -max 3" -b -i training1.arff -o
> training2.arff -r testing1.arff -s testing2.arff
>
> Now I have a training and testing set in same format
>
> I used the GUI to select a classifier and build a model and saved it.
>
> 4)java -classpath ./weka.jar weka.classifiers.lazy.LWL -l
> lazy.LWL_oct_30_2009.model -T testing2.arff -p 0

Why do you use the GUI for generating the model? And why don't you
just build the classifier on the fly?
-t: training set
-T: test set

If you really want to save the model and re-use it later, then use the
-d option to do this.

> The output is as follows:
> === Predictions on test data ===
>
>  inst#     actual  predicted      error
>      1      0          0          0
>      2      0          0          0
>
> did something go wrong?
>
> The model has
>
> Relative absolute error                 19.021  %
> Root relative squared error             59.9506 %
> Attributes:   1018
> Test mode:    16-fold cross-validation
> Instances:    18

Since you loaded the data in the Explorer, you should have noticed
that StringToWordVector puts the class attribute in the first
position. Unless you specify otherwise, the last attribute is used as
the class attribute (Explorer and commandline).

BTW 16-fold CV is rather odd...

> Also, consider the case where I have a file and want to test if it is lazy
> or active (classification), will I have to re-do the whole procedure again
> Steps 1 through 4 !! It takes quite a few minutes on my puny laptop to
> crunch out the tokenized result.

There's nothing you can do about that: machine learning and data
mining are computationally expensive and memory-intensive processes. But
instead of performing those steps manually, automate them by writing a
script, for instance (batch, bash, groovy, jython, perl, ...)
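
As one way of automating the steps, here is a minimal sketch in Java
that builds the same command lines as in the quoted steps and can run
them with ProcessBuilder. The class WekaPipeline and its method names
are made up for illustration; it assumes weka.jar sits in the working
directory, as in the commands above, and it does not deal with the
class attribute position mentioned earlier.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka.
final class WekaPipeline {

    /** Prefixes the arguments with "java -classpath ./weka.jar <mainClass>". */
    private static List<String> wekaCommand(String mainClass, String... args) {
        List<String> cmd = new ArrayList<>(
                Arrays.asList("java", "-classpath", "./weka.jar", mainClass));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Steps 1 and 2: turn a directory of text files into ARFF (written to stdout). */
    static List<String> loadDirectory(String dir) {
        return wekaCommand("weka.core.converters.TextDirectoryLoader", "-dir", dir);
    }

    /** Step 3: batch-filter training and test set with the same n-gram tokenizer. */
    static List<String> vectorize(String trainIn, String trainOut,
                                  String testIn, String testOut) {
        return wekaCommand("weka.filters.unsupervised.attribute.StringToWordVector",
                "-tokenizer", "weka.core.tokenizers.NGramTokenizer -min 2 -max 3",
                "-b", "-i", trainIn, "-o", trainOut, "-r", testIn, "-s", testOut);
    }

    /** Step 4: build the classifier on the fly (-t) and evaluate it on the test set (-T). */
    static List<String> trainAndTest(String classifier, String train, String test) {
        return wekaCommand(classifier, "-t", train, "-T", test);
    }

    /** Runs a command; stdout goes to stdoutFile if given, otherwise to the console. */
    static int run(List<String> cmd, File stdoutFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        if (stdoutFile != null) {
            pb.redirectOutput(stdoutFile);
        } else {
            pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        }
        return pb.start().waitFor();
    }

    public static void main(String[] args) {
        // Dry run: only print the commands; call run(...) to execute them.
        System.out.println(loadDirectory("./mydata/"));
        System.out.println(loadDirectory("./testing/"));
        System.out.println(vectorize("training1.arff", "training2.arff",
                "testing1.arff", "testing2.arff"));
        System.out.println(trainAndTest("weka.classifiers.lazy.LWL",
                "training2.arff", "testing2.arff"));
    }
}
```

Because ProcessBuilder passes each list element as one argument, the
tokenizer specification needs no extra shell quoting here.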

Cheers, Peter

Thanks Peter, much obliged :-)

Paul Adriani | 2 Nov 2009 14:00

stringtowordvector: sequential execution of n-gram and wordlist option

Dear sir,

 

Weka provides the StringToWordVector filter. When it is applied to a string using both n-grams and a stopword list, the n-gram tokenization is applied before the stopword list. Since this produces many useless combinations in my text categorization task, I would like to know whether the stopword list could be applied first, followed by the n-gram tokenization.
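
To illustrate the order of operations being asked for, here is a minimal sketch in plain Java that removes stopwords first and only then builds word n-grams. It is a standalone preprocessing idea, not an option of StringToWordVector; the class StopwordNGrams, the method ngrams() and the simple split on non-word characters are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Weka.
final class StopwordNGrams {

    /**
     * Lower-cases the text, drops stopwords, then emits all word n-grams
     * with min <= n <= max over the remaining tokens.
     */
    static List<String> ngrams(String text, Set<String> stopwords, int min, int max) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        List<String> result = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                result.add(String.join(" ", tokens.subList(i, i + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "on"));
        // prints [cat sat, sat mat, cat sat mat]
        System.out.println(ngrams("The cat sat on the mat", stopwords, 2, 3));
    }
}
```

Note that filtering first makes words adjacent that were not adjacent in the original text ("sat mat" above), which may or may not be wanted.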

 

Regards,

Paul

Paul Adriani Ba. sc. en ssc.
Jan van Riebeekstraat 14-3
1057ZX Amsterdam
tel 0644141917

Infocaster BV
paul <at> infocaster.net

 

Thomas Debray | 2 Nov 2009 14:10

Re: changing attribute selection algorithm



2009/11/2 shiloon <kalhor <at> gmail.com>

> It wasn't really top-posting; I just edited my previous post!
> I checked the tree size, and it changed when I substituted infoGain(). But for
> getRandom(), weird: I don't get a random result!

If I remember correctly, Weka uses a random seed. In other words, if you don't alter the random seed each time you restart the main process, the random values you get will follow the same sequence.
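
The effect can be seen with plain java.util.Random, independent of Weka. This is only a sketch of the general behaviour: whether getRandomDouble() in the modified code is seeded this way is an assumption, and the class SeedDemo and method draw() are made up for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class, not part of Weka.
final class SeedDemo {

    /** Draws n doubles from a generator initialised with the given seed. */
    static double[] draw(long seed, int n) {
        Random rng = new Random(seed);
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = rng.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed: the "random" sequence repeats on every run.
        System.out.println(Arrays.equals(draw(1L, 5), draw(1L, 5)));
        // Different seed: a different sequence.
        System.out.println(Arrays.equals(draw(1L, 5), draw(2L, 5)));
    }
}
```

So a tree built from such values would look identical on every run unless the seed changes between runs.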
 
 

--
Thomas Debray | Theoretical Epidemiology | Julius Center | Stratenum 6.131 | University Medical Center Utrecht  | P.O.Box 85500  | 3508 GA Utrecht | The Netherlands | www.juliuscenter.nl | www.thomasdebray.be | www.netstorm.be