divadpoc | 1 May 2012 01:52

how to get results of unsupervised clustering

Hi,

I'm currently trying to cluster messages. To do that, I generate an
ARFF file with the following header:

@relation myrel
@attribute id string
@attribute features string
@data

The id could also be of type numeric, but it would be easier to leave it
as a string.

My goal is to cluster the features, and in the end I want to build
a kind of tree structure, like

cluster1
- id1
- id2
- id5
cluster2
- id3
- id4

preferably with some label instead of "clusterX".

At the moment I don't have much of an idea how to achieve this with Weka
and Java. I'm only able to cluster according to the wiki page
http://weka.wikispaces.com/Use+Weka+in+your+Java+code#Clustering -
although the program hasn't stopped yet (almost 30 minutes), so there
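A minimal sketch of the grouping step asked about above, assuming the cluster assignments are already available as an array of cluster indices (in Weka, Clusterer.clusterInstance returns such an index for an instance). The class name ClusterTree, its methods, and the sample ids and assignments are hypothetical; they only illustrate how ids could be collected per cluster and printed as a tree. The id attribute itself would normally be kept out of the clustering so that it does not influence the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups instance ids by cluster index and renders them as a two-level tree. */
final class ClusterTree {

    /**
     * @param ids         one id per instance, in data order
     * @param assignments cluster index per instance (hypothetical input; with Weka
     *                    these could come from Clusterer.clusterInstance)
     */
    static Map<Integer, List<String>> group(String[] ids, int[] assignments) {
        if (ids.length != assignments.length) {
            throw new IllegalArgumentException("ids and assignments differ in length");
        }
        // TreeMap keeps the clusters sorted by index.
        Map<Integer, List<String>> tree = new TreeMap<>();
        for (int i = 0; i < ids.length; i++) {
            tree.computeIfAbsent(assignments[i], k -> new ArrayList<>()).add(ids[i]);
        }
        return tree;
    }

    /** Renders the grouping in the "clusterN / - id" layout used in the post. */
    static String render(Map<Integer, List<String>> tree) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, List<String>> e : tree.entrySet()) {
            // Cluster indices are 0-based; the labels start at 1.
            sb.append("cluster").append(e.getKey() + 1).append('\n');
            for (String id : e.getValue()) {
                sb.append("- ").append(id).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] ids = {"id1", "id2", "id3", "id4", "id5"};
        int[] assignments = {0, 0, 1, 1, 0}; // made-up assignments for illustration
        System.out.print(render(group(ids, assignments)));
    }
}
```

Replacing "clusterN" with a descriptive label would need an extra step, for example picking frequent words per cluster; that is not shown here.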

Mark Hall | 1 May 2012 12:32

Re: same data? different prediction results?

On 28/04/12 12:40 PM, Alexander Brooks wrote:
> Hello. I am testing a J48 classifier. I tried it reading one of the
> local data sets, weather.nominal.arff and then uploading that data set
> to a database and testing it. I am getting different results for the
> predictions. I cannot figure out why this would be.
> Here are the steps that I did to upload the training data.
> 1. Save the arff as a csv file.
> 2. Upload the csv file -all fields became varchar
>
> For the test file I created a new file with the first 4 rows of the
> weather.nominal.arff file. I used the local version of this regardless
> of whether I was using the local or remote data set as the training
> set.
>
> Can anyone explain what could be going on?

The local and database-resident versions of the training data probably
do not have identical header structure. The number, order and types of
the attributes will be the same, but more than likely the order of the
values of the nominal attributes is different. InstanceQuery just
collects the values of discrete attributes in the order in which they
are read from the database. You can check this by printing the two sets
of instances to standard out (or a file) and comparing the headers. If
you are using Weka 3.7 you can wrap your classifier in
weka.classifiers.misc.InputMappedClassifier. This wrapper classifier
builds a mapping between the structure of the data used to train the
model and that of the test instances. It can handle attributes in a
different order, as well as different numbers and orders of the values
of nominal attributes.
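
To illustrate the point about value order: nominal values are stored as indices into the value list declared in the header, so the same index can mean a different label under two headers. The sketch below is plain Java, not Weka code; the class name NominalOrder and the database order shown are assumptions made up for the example. It builds the kind of label-to-label index mapping that a wrapper would need.

```java
import java.util.Arrays;
import java.util.List;

/** Shows why the same label can get a different index under two nominal headers. */
final class NominalOrder {

    /** For each index in 'from', returns the index of the same label in 'to' (-1 if absent). */
    static int[] mapping(List<String> from, List<String> to) {
        int[] map = new int[from.size()];
        for (int i = 0; i < from.size(); i++) {
            map[i] = to.indexOf(from.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        // Order of the "outlook" values as declared in weather.nominal.arff.
        List<String> arffOrder = Arrays.asList("sunny", "overcast", "rainy");
        // Hypothetical order in which the values are first seen when read from a database.
        List<String> dbOrder = Arrays.asList("rainy", "sunny", "overcast");

        int[] map = mapping(dbOrder, arffOrder);
        for (int i = 0; i < map.length; i++) {
            System.out.println("db index " + i + " (" + dbOrder.get(i)
                    + ") -> arff index " + map[i]);
        }
    }
}
```

Without such a mapping, a model trained under one header would read index 0 of a test instance as "rainy" while the test header means "sunny", which is enough to change predictions.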


Mr. Debojit Boro | 1 May 2012 08:05

Re: How to stop weka classify new or untrained classes to known classes?

Hi,

Thanks for the reply. What I want is precisely for Weka not to classify an
instance when it sees an instance of an untrained class. If possible, could
it simply not classify it? Is there any way to stop this? I think I will
probably have to modify the code.

Thanks

Deb

> Hi, Deb:
> I don't know how Weka will cope with this problem, but here is what I
> think about it.
> C4.5 is a supervised learning algorithm, so I don't think it can recognize
> unknown class labels.
> If you really want to assign samples to an unknown class, I suggest you
> do some clustering work.
>
> On Mon, Apr 30, 2012 at 1:47 PM, Mr. Debojit Boro
> <deb0001 <at> tezu.ernet.in> wrote:
>
>> Hi All,
>>
>> Can anyone please help me on this?
>>
>> Deb
>>
>> > Hi All,
>> >
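
One technique that is not proposed in this thread, but is sometimes used to get a "do not classify" outcome without modifying the classifier, is a reject threshold on the predicted class distribution (in Weka, Classifier.distributionForInstance returns one probability per class). The sketch below is plain Java under that assumption; the class name RejectOption, the labels and the 0.8 threshold are hypothetical. Note that a decision tree can be very confident about an instance of a class it has never seen, so this is a partial measure at best.

```java
/** Returns a label only when the top class probability reaches a threshold. */
final class RejectOption {

    /**
     * @param distribution one probability per class (hypothetical input; with Weka
     *                     this could come from distributionForInstance)
     * @param labels       class labels in the same order as the distribution
     * @param threshold    minimum probability needed to accept the prediction
     */
    static String predictOrUnknown(double[] distribution, String[] labels, double threshold) {
        if (distribution.length == 0 || distribution.length != labels.length) {
            throw new IllegalArgumentException("need one probability per label");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        // Reject the prediction when the classifier is not confident enough.
        return distribution[best] >= threshold ? labels[best] : "unknown";
    }

    public static void main(String[] args) {
        String[] labels = {"classA", "classB", "classC"};
        System.out.println(predictOrUnknown(new double[] {0.90, 0.05, 0.05}, labels, 0.8));
        System.out.println(predictOrUnknown(new double[] {0.40, 0.35, 0.25}, labels, 0.8));
    }
}
```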

Mark Hall | 1 May 2012 12:33

Re: how can I get professional help with Weka?

On 28/04/12 1:50 PM, Paul Edam wrote:
> Hello. I am having various problems using Weka in my Java code. Is there
> a way to get professional paid help with Weka?

Pentaho offers a low cost developer support offering for Weka. Or you 
could just ask here on the list if anyone is interested in consulting.

Cheers,
Mark.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Sameendra Samarawickrama | 1 May 2012 13:55

Unary class classification with Support Vector Machines

Hi,


I want to do one-class classification with SMO in Weka. That is, my training directory has only one folder, which contains the files that I want to train SMO with (trainDir->class1->trainingfiles). When I try this, I get the following error:

weka.classifiers.functions.SMO: Cannot handle unary class!

What I want to do is train the classifier with only a single class; then, in the testing phase, when I feed a document to the classifier, it should say so if the document belongs to that class and otherwise return something like 'unknown'.


Thanks.
Sameendra
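
A toy sketch of the behaviour described above (train on one class only, answer with that class or 'unknown'). It is not SMO and not a Weka API: it uses a simple swapped-in technique, a distance-to-centroid threshold over numeric feature vectors, and the class name CentroidOneClass and the sample data are made up for illustration. A real setup would more likely use a dedicated one-class learner.

```java
/** Toy one-class model: accepts points no farther from the centroid than any training point. */
final class CentroidOneClass {

    private final double[] centroid;
    private final double radius;

    /** @param train feature vectors of the single target class (at least one row) */
    CentroidOneClass(double[][] train) {
        if (train.length == 0) {
            throw new IllegalArgumentException("need at least one training vector");
        }
        int dims = train[0].length;
        centroid = new double[dims];
        for (double[] x : train) {
            for (int j = 0; j < dims; j++) {
                centroid[j] += x[j] / train.length;
            }
        }
        // The largest training distance serves as the acceptance radius.
        double max = 0.0;
        for (double[] x : train) {
            max = Math.max(max, distance(x, centroid));
        }
        radius = max;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns targetLabel if x lies within the radius, otherwise "unknown". */
    String classify(double[] x, String targetLabel) {
        return distance(x, centroid) <= radius ? targetLabel : "unknown";
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {2, 0}, {0, 2}, {2, 2}};
        CentroidOneClass model = new CentroidOneClass(train);
        System.out.println(model.classify(new double[] {1, 1}, "class1"));
        System.out.println(model.classify(new double[] {5, 5}, "class1"));
    }
}
```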




Bill | 1 May 2012 14:11

anyone able to do some consulting?

Hi. I have a pretty simple problem using Java and Weka. I am willing to pay a good price for someone to solve it. If you can help I would greatly appreciate it. Please contact me.

Thanks!

Bill | 1 May 2012 15:29

Re: anyone able to do some consulting?

Hello. I have a pretty simple problem using Java and Weka and remote database tables. If anyone would spare some time for consulting I am willing to pay a good rate. Please contact me directly.
Thanks.

On Tue, May 1, 2012 at 9:11 PM, Bill <william108 <at> gmail.com> wrote:
> Hi. I have a pretty simple problem using Java and Weka. I am willing to pay a good price for someone to solve it. If you can help I would greatly appreciate it. Please contact me.
>
> Thanks!



--
Bill

Sameendra Samarawickrama | 1 May 2012 18:07

StringToWordVector() unsupervised?

Hi,

To convert string features to numeric features (for text classification), we apply the StringToWordVector filter. However, it is located in the weka.filters.unsupervised class hierarchy, even though we also use this filter in supervised learning, right? So why is it placed there?

Thanks.
Sameendra

JiangHongTiao | 1 May 2012 18:12

Re: StringToWordVector() unsupervised?

Yes, and we also use it with unsupervised learning; there is no problem. It is only a preprocessing step for the data.
I think it is there because you don't need to supervise this transformation: you don't need to control (for example with class labels) how strings are converted to numeric values.

Best regards
  JJ

"Everything is permitted to me, but not everything is beneficial."
"Everything is permitted to me, but I will not let myself be enslaved by anything."
(1 Cor 6:12) ✝

On 05/01/2012 06:07 PM, Sameendra Samarawickrama wrote:
> Hi,
>
> To convert string features to numeric features (for text classification), we apply the StringToWordVector filter. However, it is located in the weka.filters.unsupervised class hierarchy, even though we also use this filter in supervised learning, right? So why is it placed there?
>
> Thanks.
> Sameendra
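
To make the point about preprocessing concrete, here is a small bag-of-words sketch in plain Java. It is not a reimplementation of StringToWordVector; the class name BagOfWords, the tokenisation rule and the sample texts are assumptions for illustration. What it shows is that turning strings into numeric vectors needs only the text itself and never looks at a class label, which is why the same step fits before both supervised and unsupervised learners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Converts strings to word-count vectors without using any class labels. */
final class BagOfWords {

    /** Lower-cases the text and splits it on runs of non-word characters. */
    static List<String> tokens(String doc) {
        List<String> out = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Builds a sorted vocabulary from the documents alone. */
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String doc : docs) {
            vocab.addAll(tokens(doc));
        }
        return new ArrayList<>(vocab);
    }

    /** Counts how often each vocabulary word occurs in the document. */
    static int[] vectorize(String doc, List<String> vocab) {
        int[] counts = new int[vocab.size()];
        for (String t : tokens(doc)) {
            int i = vocab.indexOf(t);
            if (i >= 0) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("free offer now", "meeting now");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        System.out.println(java.util.Arrays.toString(vectorize("free free now", vocab)));
    }
}
```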

Bozhao Tan | 1 May 2012 18:19

Re: Wekalist Digest, Vol 111, Issue 2

One question about the Support Vector Machine (SMO) in Weka:
In several projects, why is its performance always worse than that of other classifiers, such as a neural network or J48?
Do you see this in your applications too?

Thanks!
henry

On Tue, May 1, 2012 at 9:14 AM, <wekalist-request <at> list.scms.waikato.ac.nz> wrote:

Today's Topics:

  1. Re: how can I get professional help with Weka? (Mark Hall)
  2. Re: How to stop weka classify new or untrained
     classes to known classes? (Mr. Debojit Boro)
  3. Unary class classification with Support Vector Machines
     (Sameendra Samarawickrama)
  4. anyone able to do some consulting? (Bill)
  5. Re: anyone able to do some consulting? (Bill)
  6. StringToWordVector() unsupervised? (Sameendra Samarawickrama)
  7. Re: StringToWordVector() unsupervised? (JiangHongTiao)


----------------------------------------------------------------------

Message: 1
Date: Tue, 1 May 2012 22:33:26 +1200
From: Mark Hall <mhall <at> pentaho.com>
Subject: Re: [Wekalist] how can I get professional help with Weka?
To: Weka machine learning workbench list.
       <wekalist <at> list.scms.waikato.ac.nz>
Message-ID: <4F9FBBF6.1000801 <at> pentaho.com>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed

On 28/04/12 1:50 PM, Paul Edam wrote:
> Hello. I am having various problems using Weka in my Java code. Is there
> a way to get professional paid help with Weka?

Pentaho offers a low cost developer support offering for Weka. Or you
could just ask here on the list if anyone is interested in consulting.

Cheers,
Mark.



------------------------------

Message: 2
Date: Tue, 1 May 2012 11:35:13 +0530 (IST)
From: "Mr. Debojit Boro" <deb0001 <at> tezu.ernet.in>
Subject: Re: [Wekalist] How to stop weka classify new or untrained
       classes to known classes?
To: "Weka machine learning workbench list."
       <wekalist <at> list.scms.waikato.ac.nz>
Cc: "Weka machine learning workbench list."
       <wekalist <at> list.scms.waikato.ac.nz>
Message-ID:
       <43598.202.141.75.162.1335852313.squirrel <at> agnigarh.tezu.ernet.in>
Content-Type: text/plain;charset=iso-8859-1

Hi,

Thanks for the reply. What I want is precisely for Weka not to classify an
instance when it sees an instance of an untrained class. If possible, could
it simply not classify it? Is there any way to stop this? I think I will
probably have to modify the code.

Thanks

Deb


> Hi, Deb:
> I don't know how Weka will cope with this problem, but here is what I
> think about it.
> C4.5 is a supervised learning algorithm, so I don't think it can recognize
> unknown class labels.
> If you really want to assign samples to an unknown class, I suggest you
> do some clustering work.
>
> On Mon, Apr 30, 2012 at 1:47 PM, Mr. Debojit Boro
> <deb0001 <at> tezu.ernet.in> wrote:
>
>> Hi All,
>>
>> Can anyone please help me on this?
>>
>> Deb
>>
>> > Hi All,
>> >
>> > I am new to Weka and still exploring. My problem is quite similar to
>> > the problem referred to on the mailing list as "Detection of new
>> > Classes in Classification" (Mon Jun 13 04:59:07).
>> >
>> > I trained a C4.5 (J48) classifier with instances of three different
>> > classes. But when I test with a test set that consists of instances of
>> > all three trained classes along with instances of new, untrained
>> > classes, it misclassifies the instances of the new classes as one of
>> > the trained classes.
>> >
>> > Is there any way to stop this? Can Weka just not classify an instance
>> > when it sees an instance of a new class, instead of misclassifying it?
>> >
>> > Please help and thanks in advance.
>> >
>> > Deb
>> >
>> >
>> >
>> > ___________________
>> > D I S C L A I M E R
>> > This e-mail may contain privileged information and is intended solely
>> > for the individual named. If you are not the named addressee you should
>> > not disseminate, distribute or copy this e-mail. Please notify the
>> > sender immediately by e-mail if you have received this e-mail in error
>> > and destroy it from your system. Though considerable effort has been
>> > made to deliver error free e-mail messages but it can not be guaranteed
>> > to be secure or error-free as information could be intercepted,
>> > corrupted, lost, destroyed, delayed, or may contain viruses. The
>> > recipient must verify the integrity of this e-mail message.
>> >
>
>
>
> --
> Qingchao Kong (孔庆超)
>
> Ph.D. Candidate
> Institute of Automation, Chinese Academy of Sciences
>
> No. 95 Zhongguancun East Road
> Haidian District, Beijing 100190 China
>



------------------------------

Message: 3
Date: Tue, 1 May 2012 17:25:01 +0530
From: Sameendra Samarawickrama <smsamrc <at> googlemail.com>
Subject: [Wekalist] Unary class classification with Support Vector
       Machines
To: wekalist <at> list.scms.waikato.ac.nz
Message-ID:
       <CAJwXLxhNV5s7ChnfFZR7nQg1Eb=bOuA7iUJQjx=g_ofoxk3QwQ <at> mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I want to do one-class classification with SMO in Weka. That is, my
training directory has only one folder, which contains the files that I
want to train SMO with (trainDir->class1->trainingfiles). When I try
this, I get the following error:

weka.classifiers.functions.SMO: Cannot handle unary class!

What I want to do is train the classifier with only a single class; then,
in the testing phase, when I feed a document to the classifier, it should
say so if the document belongs to that class and otherwise return
something like 'unknown'.


Thanks.
Sameendra

------------------------------

Message: 4
Date: Tue, 1 May 2012 21:11:42 +0900
From: Bill <william108 <at> gmail.com>
Subject: [Wekalist] anyone able to do some consulting?
To: "Weka machine learning workbench list."
       <wekalist <at> list.scms.waikato.ac.nz>
Message-ID:
       <CAJnbHtL0dSGguj3idNrBQ9LR+f0VJP=gFZQKepkPoPoK094zxQ <at> mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi. I have a pretty simple problem using Java and Weka. I am willing to pay
a good price for someone to solve it. If you can help I would greatly
appreciate it. Please contact me.

Thanks!

------------------------------

Message: 5
Date: Tue, 1 May 2012 22:29:00 +0900
From: Bill <william108 <at> gmail.com>
Subject: [Wekalist] Re: anyone able to do some consulting?
To: "Weka machine learning workbench list."
       <wekalist <at> list.scms.waikato.ac.nz>
Message-ID:
       <CAJnbHtK4E-+XhspUEvLHc3E30=FwExb6-rWRdM7RvWe3saUaaA <at> mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello. I have a pretty simple problem using Java and Weka and remote
database tables. If anyone would spare some time for consulting I am
willing to pay a good rate. Please contact me directly.
Thanks.

On Tue, May 1, 2012 at 9:11 PM, Bill <william108 <at> gmail.com> wrote:

> Hi. I have a pretty simple problem using Java and Weka. I am willing to
> pay a good price for someone to solve it. If you can help I would greatly
> appreciate it. Please contact me.
>
> Thanks!
>



--
Bill

------------------------------

Message: 6
Date: Tue, 1 May 2012 21:37:54 +0530
From: Sameendra Samarawickrama <smsamrc <at> googlemail.com>
Subject: [Wekalist] StringToWordVector() unsupervised?
To: wekalist <at> list.scms.waikato.ac.nz
Message-ID:
       <CAJwXLxgR64-=QQXqNUyQEDSaLhkeaf656_E2V7W07TJ5wGNhbA <at> mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

To convert string features to numeric features (for text classification),
we apply the StringToWordVector filter. However, it is located in the
weka.filters.unsupervised class hierarchy, even though we also use this
filter in supervised learning, right? So why is it placed there?

Thanks.
Sameendra

------------------------------

Message: 7
Date: Tue, 01 May 2012 18:12:37 +0200
From: JiangHongTiao <jianghongtiao <at> gmail.com>
Subject: Re: [Wekalist] StringToWordVector() unsupervised?
To: "Weka machine learning workbench list."
       <wekalist <at> list.scms.waikato.ac.nz>
Message-ID: <4FA00B75.7090707 <at> gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Yes, and we also use it with unsupervised learning; there is no problem.
It is only a preprocessing step for the data.
I think it is there because you don't need to supervise this
transformation: you don't need to control (for example with class
labels) how strings are converted to numeric values.

Best regards
  JJ

"Everything is permitted to me, but not everything is beneficial."
"Everything is permitted to me, but I will not let myself be enslaved by anything."
(1 Cor 6:12)
✝


On 05/01/2012 06:07 PM, Sameendra Samarawickrama wrote:
> Hi,
>
> To convert string features to numeric features (for text
> classification), we apply the StringToWordVector filter. However, it is
> located in the weka.filters.unsupervised class hierarchy, even though we
> also use this filter in supervised learning, right? So why is it placed
> there?
>
> Thanks.
> Sameendra
>
>

------------------------------

End of Wekalist Digest, Vol 111, Issue 2
****************************************
