From: miffygal <at> live.com
To: wekalist <at> list.scms.waikato.ac.nz
Subject: RE: [Wekalist] Distances of each instances to the center of clusters?
Date: Mon, 31 May 2010 16:48:42 -0400
From: miffygal <at> live.com
To: wekalist <at> list.scms.waikato.ac.nz
Subject: RE: [Wekalist] Distances of each instances to the center of clusters?
Date: Mon, 31 May 2010 15:32:27 -0400
> Date: Sun, 30 May 2010 16:23:40 +1200
> From: mhall <at> pentaho.com
> To: wekalist <at> list.scms.waikato.ac.nz
> Subject: Re: [Wekalist] Distances of each instances to the center of clusters?
>
> On 29/05/10 5:17 PM, miffy gal wrote:
> >
> >
> > > Date: Sat, 29 May 2010 07:56:00 +1200
> > > From: mhall <at> pentaho.com
> > > To: wekalist <at> list.scms.waikato.ac.nz
> > > Subject: Re: [Wekalist] Distances of each instances to the center of clusters?
> > >
> > > miffy gal wrote:
> > > > Hi
> > > >
> > > > I am using Weka, the Explorer. I know that we can get the cluster
> > > > assignment for each instance or observation by visualizing the results
> > > > and saving them to a file. How about the distance? Can we get the
> > > > distance from each observation to the center of each cluster
> > > > (or a similarity measure between each observation and the rest of the
> > > > cluster members) by using the Explorer? If yes, how can I do it?
> > >
> > > For clusterers that produce density estimates (such as EM) the
> > > ClusterMembership filter appends a probability distribution over the
> > > clusters for each instance. These probabilities can be thought of as an
> > > indication of how close an instance is to each cluster center. Note that
> > > clusterers that don't produce density estimates can be wrapped in the
> > > MakeDensityBasedClusterer. This meta clusterer fits Gaussian
> > > distributions to the data in each cluster.
> > >
> > > Cheers,
> > > Mark.
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: Wekalist <at> list.scms.waikato.ac.nz
> > > List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> >
> > Dear Mark,
> >
> > I am using EM clustering. I want an indication of how close each
> > instance is to the center of its cluster, so that I can find the outliers
> > of each cluster. For me, outliers are those instances that are very
> > different from the other members of the same cluster. So I think that
> > having the distance measurements would help me identify whether a
> > specific instance is far from, or very different from, the other members
> > of the same cluster. Am I correct to think of it this way?
>
> The probabilities are an indication of how close an instance is to the center of
> a cluster for EM. EM uses normal distributions for numeric attributes and a
> discrete estimator (based on Laplace-corrected frequencies) for discrete
> attributes. The closer an instance is to the "middle" of a cluster (as defined
> by the means/modes), the higher the density/probability will be.
>
> >
> > I don't see ClusterMembership filter from the version that I use. Can
> > you tell me which version of weka I should be using? I am not familiar
> > with using weka in multiple steps. Do I have to run the cluster analysis
> > (such as using EM clustering) and save the results and then use filter?
>
> You can find ClusterMembership in weka/filters/unsupervised/attribute in both
> Weka 3.6.x and 3.7.x.
>
> Cheers,
> Mark.
>
Mark,
I tried to use the ClusterMembership filter. However, it would not let me select the option. The two cluster filter options cannot be selected. What could be the possible reasons?
Some datasets have mixed values; others have only numeric values.
Thank you so much.
miffy
>>>
Mark,
I figured out how to solve it. I changed the visualized box to "No class" and it works now. Thank you so much for the help :)
Best,
Miffy
Mark,
I got the probabilities. Now I have more questions. Can I use the probabilities to evaluate whether one instance is closer to the center of a specific cluster than the other instances are?
For example, if I have the probabilities like these...
@attribute pCluster_0_0 numeric
@attribute pCluster_0_1 numeric
@attribute pCluster_0_2 numeric
@attribute pCluster_0_3 numeric
@attribute pCluster_0_4 numeric
@attribute pCluster_0_5 numeric

@data
0,0.000397,0,0,0.999603,0
0,0.002407,0,0.017637,0.979957,0
Both instances would be closer to cluster 4 than to the other clusters. Can I say that the first instance is closer to the center of cluster 4 than the second instance? If not, what procedure should I use to obtain probabilities that can be compared across instances?
Thank you so much.
Miffy
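
A minimal sketch of the comparison asked about above, written in Python rather than Weka's Java and using a made-up one-dimensional, two-cluster model. The `clusters` values and all function names are illustrative assumptions, not Weka's API. Each of the two data rows shown above sums to about 1, which suggests the values are memberships normalized across clusters. If so, a higher value says an instance belongs more exclusively to cluster 4, not necessarily that it lies closer to the center of cluster 4. The sketch contrasts the normalized membership with the unnormalized density, which does fall as an instance moves away from the mean:

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint_densities(x, clusters):
    """Return prior * density for each cluster (not normalized)."""
    return [prior * gaussian_pdf(x, mean, std) for prior, mean, std in clusters]


def memberships(x, clusters):
    """Normalize the joint densities so they sum to 1 across clusters."""
    joint = joint_densities(x, clusters)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical model, one tuple per cluster: (prior, mean, std).
clusters = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

near_center = 0.0   # sits exactly on the mean of cluster 0
far_outlier = -5.0  # five standard deviations from cluster 0, away from cluster 1

for x in (near_center, far_outlier):
    print("x =", x)
    print("  memberships:    ", memberships(x, clusters))
    print("  joint densities:", joint_densities(x, clusters))
```

In this toy model the point at -5 gets a higher membership for cluster 0 than the point sitting on its mean, even though its density under cluster 0 is far lower. To rank instances within one cluster, the per-cluster density is the more direct quantity. Weka's `DensityBasedClusterer` interface appears to expose it through `logDensityPerClusterForInstance`, though that should be checked against the Javadoc for the version in use.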