Paul | 1 Jun 2010 02:14

Re: NumericToNominal on Command Line



On Tue, Jun 1, 2010 at 8:43 AM, juli jaku <julietajaku <at> gmail.com> wrote:
Hello,

I am running Weka from the regular command prompt. I cannot find the class description for NumericToNominal, the unsupervised attribute filter available in the Explorer GUI. Please help?
thanks!

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Vatsala Dorairajan | 1 Jun 2010 03:33

sample CSV file to go with the CSV2Arff class

Hi
Does anyone have a sample of a typical CSV file that can be parsed by 
the CSV2Arff class?

Vatsala


miffy gal | 1 Jun 2010 04:44

RE: Distances of each instances to the center of clusters?


 
From: miffygal <at> live.com
To: wekalist <at> list.scms.waikato.ac.nz
Subject: RE: [Wekalist] Distances of each instances to the center of clusters?
Date: Mon, 31 May 2010 16:48:42 -0400

 

From: miffygal <at> live.com
To: wekalist <at> list.scms.waikato.ac.nz
Subject: RE: [Wekalist] Distances of each instances to the center of clusters?
Date: Mon, 31 May 2010 15:32:27 -0400

 
> Date: Sun, 30 May 2010 16:23:40 +1200
> From: mhall <at> pentaho.com
> To: wekalist <at> list.scms.waikato.ac.nz
> Subject: Re: [Wekalist] Distances of each instances to the center of clusters?
>
> On 29/05/10 5:17 PM, miffy gal wrote:
> >
> >
> > > Date: Sat, 29 May 2010 07:56:00 +1200
> > > From: mhall <at> pentaho.com
> > > To: wekalist <at> list.scms.waikato.ac.nz
> > > Subject: Re: [Wekalist] Distances of each instances to the center of clusters?
> > >
> > > miffy gal wrote:
> > > > Hi
> > > >
> > > > I am using Weka, the Explorer. I know that we can get the cluster
> > > > assignment for each instance or observation by visualizing the
> > > > results and saving them to a file. How about the distance? Can we get
> > > > the distance measured for each observation to the center of the
> > > > clusters (or the similarity measure for each observation to the rest
> > > > of the cluster members) by using the Explorer? If yes, how can I do it?
> > >
> > > For clusterers that produce density estimates (such as EM) the
> > > ClusterMembership filter appends a probability distribution over the
> > > clusters for each instance. These probabilities can be thought of as an
> > > indication of how close an instance is to each cluster center. Note that
> > > clusterers that don't produce density estimates can be wrapped in the
> > > MakeDensityBasedClusterer. This meta clusterer fits Gaussian
> > > distributions to the data in each cluster.
> > >
> > > Cheers,
> > > Mark.
> > >
> >
> > Dear Mark,
> >
> > I am using EM clustering. I want an indication of how close each
> > instance is to the center of its cluster, so that I can find the outliers
> > of each cluster. For me, outliers are those instances that are very
> > different from the other members of the same cluster. So I think that
> > having the distance measurements would help me identify whether a
> > specific instance is far from, or very different from, the other members
> > of the same cluster. Am I correct to think of it this way?
>
> The probabilities are an indication of how close an instance is to the center of
> a cluster for EM. EM uses normal distributions for numeric attributes and a
> discrete estimator (based on Laplace-corrected frequencies) for discrete
> attributes. The closer an instance is to the "middle" of a cluster (as defined
> by the means/modes), the higher the density/probability will be.
>
> >
> > I don't see ClusterMembership filter from the version that I use. Can
> > you tell me which version of weka I should be using? I am not familiar
> > with using weka in multiple steps. Do I have to run the cluster analysis
> > (such as using EM clustering) and save the results and then use filter?
>
> You can find ClusterMembership in weka/filters/unsupervised/attribute in both
> Weka 3.6.x and 3.7.x.
>
> Cheers,
> Mark.
>

Mark,
 
I tried to use the ClusterMembership filter. However, it would not let me select the option. The two cluster filter options cannot be selected. What could be the possible reasons?
 
Some datasets have mixed values, some have only numeric values.
 
Thank you so much.
miffy
 
 
>>>

Mark,
 
I figured out how to solve it now. I changed the visualized box to "No class" and it works. Thank you so much for the help :)
 
Best,
Miffy

 
Mark,
 
I got the probabilities. Now I have more questions. Can I use the probabilities to evaluate whether one instance is closer to the center of a specific cluster than the other instances are?
 
For example, if I have the probabilities like these...
@attribute pCluster_0_0 numeric
@attribute pCluster_0_1 numeric
@attribute pCluster_0_2 numeric
@attribute pCluster_0_3 numeric
@attribute pCluster_0_4 numeric
@attribute pCluster_0_5 numeric
@data
0,0.000397,0,0,0.999603,0
0,0.002407,0,0.017637,0.979957,0

 
Both instances would be closer to cluster 4 than to the other clusters. Would I be able to say that the first instance is closer to the center of cluster 4 than the second instance? If not, what procedure do I have to use to obtain probabilities that can be compared across instances?
 
Thank you so much.
 
Miffy
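
One way to read the numbers above: each data row sums to 1, so the values look like membership probabilities normalized per instance rather than densities. Under that reading they say which cluster an instance most plausibly belongs to, and comparing them across instances does not by itself say which instance is nearer the cluster center. The Java sketch below illustrates this with a hypothetical one-dimensional mixture of two Gaussians. The means, standard deviations, priors, class name and method names are all made up for illustration; this is not Weka's EM code.

```java
class MembershipVsDensity {

    /** Density of a normal distribution with the given mean and standard deviation at x. */
    static double gaussian(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Membership probabilities of x: prior-weighted densities, normalized to sum to 1. */
    static double[] membership(double x, double[] priors, double[] means, double[] sds) {
        double[] p = new double[priors.length];
        double total = 0.0;
        for (int k = 0; k < p.length; k++) {
            p[k] = priors[k] * gaussian(x, means[k], sds[k]);
            total += p[k];
        }
        for (int k = 0; k < p.length; k++) {
            p[k] /= total;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};  // hypothetical mixture weights
        double[] means = {0.0, 10.0};  // hypothetical cluster centers
        double[] sds = {1.0, 1.0};     // hypothetical standard deviations

        // One point at the center of cluster 1, one point far beyond it.
        for (double x : new double[] {10.0, 20.0}) {
            double[] m = membership(x, priors, means, sds);
            System.out.printf("x=%.1f  membership in cluster 1=%.6f  density under cluster 1=%.3e%n",
                    x, m[1], gaussian(x, means[1], sds[1]));
        }
    }
}
```

In this sketch both points receive a membership of essentially 1 for cluster 1, although x = 20 lies ten standard deviations from its center; only the density separates them. If a value that is comparable across instances is needed, the unnormalized density (or its logarithm) under the assigned cluster is the more natural quantity to look at.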
 

Vatsala Dorairajan | 1 Jun 2010 05:03

What classifiers are good for Textual mining

Hi

A lot of the common examples are explained using the J48 classifier or 
the M5P classifier. Are these good for text classification? Also, I 
saw that these classes don't seem to implement the UpdateableClassifier 
interface. Does that mean they can be used only for one-shot training?
I want to be able to classify tokenized text and am looking for a 
classifier that can be trained incrementally. Can anyone 
suggest which classifier is appropriate for such a scenario?

Vatsala


Bill | 1 Jun 2010 06:27

Re: convert nominal to numeric?

Hi Mark and Harri,

Thanks for the tip! Will try it.

Bill

On Tue, Jun 1, 2010 at 6:35 AM, Mark Hall <mhall <at> pentaho.com> wrote:
On 31/05/10 9:38 PM, Bill wrote:
Hi All,

I have an attribute which has numeric and nominal values.
It represents a rating system on a scale from -3 to +3 but there are
also some nominal values like 'j' and 'ne'.
I merged the j and ne into a single value I called Objective and then I
removed all records that had Objective values so that I am left with
numbers between -3 and +3 but I cannot figure out how to get Weka to
understand that I want these to be numbers.

There isn't a filter to do this at present. What you can do is save your data out to an ARFF file and then manually edit the header (i.e. replace the declaration of the nominal values with the keyword "numeric").

Cheers,
Mark.
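
The manual header edit described above could also be scripted. The following is a minimal Java sketch, assuming the attribute name is unquoted and contains no spaces, and that every remaining value in the data section parses as a number. The class name, method name, relation name and attribute names are hypothetical, and the code does not use the Weka API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArffHeaderEdit {

    /** Replaces the nominal value list of the named attribute with the keyword "numeric". */
    static List<String> makeNumeric(List<String> lines, String attributeName) {
        List<String> out = new ArrayList<>();
        boolean inHeader = true;
        for (String line : lines) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase();
            if (lower.startsWith("@data")) {
                inHeader = false; // everything after @data is left untouched
            }
            if (inHeader && lower.startsWith("@attribute")) {
                // Split into: keyword, attribute name, type declaration.
                String[] parts = trimmed.split("\\s+", 3);
                if (parts.length == 3 && parts[1].equals(attributeName)
                        && parts[2].startsWith("{")) {
                    out.add("@attribute " + attributeName + " numeric");
                    continue;
                }
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical file content, given inline to keep the sketch self-contained.
        List<String> arff = Arrays.asList(
                "@relation ratings",
                "@attribute rating {-3,-2,-1,0,1,2,3}",
                "@attribute source {a,b}",
                "@data",
                "-3,a",
                "2,b");
        for (String line : makeNumeric(arff, "rating")) {
            System.out.println(line);
        }
    }
}
```

Only the declaration of the chosen attribute changes; the data rows are copied as they are, which is why any non-numeric labels have to be removed from that column beforehand.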


Nicolas Martin | 1 Jun 2010 10:04

Retrieve top 10 numeric attribute after clustering

Dear Weka users,

I performed SimpleKMeans clustering and I would like to know how to retrieve the top 10 attributes that match while building each cluster.
My attributes are numeric.

Best Regards,

Harri Saarikoski | 1 Jun 2010 12:12

Re: What classifiers are good for Textual mining



2010/6/1 Vatsala Dorairajan <vatsala.dorairajan <at> gmail.com>
Hi

A lot of the common examples are explained using the J48 classifier or the M5P classifier. Are these good for textual classification?

Text classification datasets typically have a lot of (binary) attributes (words, n-grams etc.) pointing to a handful of classes (domains).

Decision trees don't handle a huge number of attributes well. Then again, if you do attribute selection as a preliminary stage, i.e. reduce the number of dimensions, a tree model (consisting of associations between attributes) can be better than any other. In text classification tasks it is likely that, even if you have a million attributes, only a few dozen or a few hundred of them suffice for an optimal model, and attribute selection makes a lot of sense as a way to remove the noise of redundant and duplicate attributes.

I suggest you try running several classifiers (SMO, J48, BayesNet, IBk) on both versions of the dataset (original and reduced)
using cross-validation, and observe which classifier does better overall. Note that the odds are high that the classifier that does best on the original feature space is not the best on the reduced space (overlooking this is a mistake people often make).

Also I saw that these classes don't seem to implement the UpdateableClassifier interface. Does that mean they can be used only for one-shot training?

Mark and others would know better about updateable classifiers (is one available for classifiers other than naive Bayes?)
Harri

I want to be able to classify tokenized text and am looking for a classifier which can be trained on an incremental basis. Can anyone suggest which classifier is appropriate for such a scenario?

Vatsala




--
-----------------
Harri M.T. Saarikoski
M.A, PhD graduate student
Helsinki University
Finland
Capinha | 1 Jun 2010 15:10

Root mean square for evaluating binary response


Recently I've been asked about the validity of using root mean square error for
evaluating probabilities (responses from 0 to 1).
Although I've found some previous posts on similar subjects, I couldn't
find a clear answer to this.
Can someone provide further insight?

Thanks in advance,

César
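
For concreteness, the measure being asked about can be written down as follows. This is only a sketch of the arithmetic, assuming outcomes coded as 0 or 1 and made-up predictions; it does not settle the validity question. The squared version without the root is commonly known as the Brier score. The class and method names are hypothetical.

```java
class ProbabilityRmse {

    /** Root mean squared error between predicted probabilities and 0/1 outcomes. */
    static double rmse(double[] predicted, int[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sumOfSquares += error * error;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {0.9, 0.2, 0.6}; // made-up predicted probabilities
        int[] actual = {1, 0, 1};             // made-up observed outcomes
        System.out.println(rmse(predicted, actual));
    }
}
```

A perfect, fully confident prediction gives 0, and always predicting 0.5 gives 0.5, so the value depends on both the ranking and the calibration of the probabilities.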


guillermina bautista | 1 Jun 2010 15:24

separed training and set

Hi, I use the Weka API to build my classifier, and I need to know whether it is possible to manipulate the instances that are used for training in each fold of cross-validation.

I use
 Evaluation ev = new Evaluation(datosOriginales);

J48 tree = new J48(); // my base classifier

In this part
 ev.crossValidateModel(tree, datosOriginales, 10, new Random(1));

I want to use the training instances of each fold.

Is it possible?
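
A plain-Java sketch of the idea, without Weka on the classpath: running the fold loop by hand makes the training portion of each fold available before the classifier is built. The fold assignment used here (position modulo the number of folds) and all names are assumptions of the sketch. In Weka itself, `Instances.trainCV(numFolds, fold)` and `Instances.testCV(numFolds, fold)` return the corresponding subsets, which may be what is needed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ManualFolds {

    /** For each fold, returns the indices of the items used for training (all items outside the fold). */
    static List<List<Integer>> trainingIndices(int n, int folds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed)); // randomize once, before splitting

        List<List<Integer>> result = new ArrayList<>();
        for (int fold = 0; fold < folds; fold++) {
            List<Integer> train = new ArrayList<>();
            for (int pos = 0; pos < n; pos++) {
                if (pos % folds != fold) { // items at these positions are not in the test fold
                    train.add(order.get(pos));
                }
            }
            result.add(train);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> perFold = trainingIndices(20, 10, 1L);
        for (int fold = 0; fold < perFold.size(); fold++) {
            // This is the point where the training items of the fold could be
            // inspected or modified before a classifier is trained on them.
            System.out.println("fold " + fold + ": " + perFold.get(fold).size() + " training items");
        }
    }
}
```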
myriam abramson | 1 Jun 2010 16:07

Convolutional Neural Net

Hi!

I was wondering whether anybody has tried to implement convolutional neural nets with Weka and, if so, how you did it.
My guess would be to just use the MultilayerPerceptron class and encode the convolution at each layer. Any hints?


cheers,

melipone
