Константин | 20 Sep 18:04 2014

text mining on weka

Hello dear friends. Could you help me by answering some questions about text mining in Weka?

1. If I need to classify Russian texts, where can I get a Russian tokenizer and a Russian stemmer for Weka?
2. In theory, if you change the parameter "wordsToKeep" from 1000 to 100, you should end up working with 100 words. But when I change it, the "StringToWordVector" filter reports that I have 100 words, while in fact all words are still present in the list; it was not pruned to 100. Is this a bug in my Weka, or did I do something wrong?
Best regards,
Konstantin



_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Ashish Dutt | 20 Sep 05:48 2014

How to perform prediction in Weka?

Dear members,
Greetings.
Consider the following scenario.
"There are 80 shoppers in a shopping mall going about their shopping. Assume each shopper has only one record. There are two sets of thirty records each, covering 60 shoppers. There are no records for the remaining 20 shoppers.
The records contain columns such as shopper ID, payment type, and total amount."
My questions are as follows:
1. Is it possible to use the transaction records of the 60 shoppers to predict the payment type for the missing records of the 20 shoppers? If yes, please suggest how.
2. Is it justifiable to predict the missing records based on the data at hand for the 60 shoppers? If yes, how can this be justified using Weka? If no, why not?

Please help me understand this.

Thank you
Ashish

Drew Zhang | 19 Sep 18:23 2014

a simple question about cross validation

Dear list,

When we run a cross-validation experiment with any classifier (e.g., a decision tree), the output by default includes a model. Could anybody clarify where this model comes from, given that CV involves multiple training runs?

Thanks a lot.
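
For readers of the archive: in the Weka Explorer the printed model appears under the heading "Classifier model (full training set)", i.e. it is built once on all the data, while the cross-validation folds are used only to estimate performance and their models are discarded. A minimal plain-Python sketch of that procedure follows. The majority-class "classifier", the helper names `train_majority` and `cross_validate`, and the labels are illustrative assumptions, not Weka APIs.

```python
from collections import Counter

def train_majority(labels):
    """Toy 'classifier': always predicts the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

def cross_validate(labels, k):
    """Estimate accuracy with k-fold CV; the k fold models are thrown away."""
    correct = 0
    for fold in range(k):
        test = [y for i, y in enumerate(labels) if i % k == fold]
        train = [y for i, y in enumerate(labels) if i % k != fold]
        model = train_majority(train)          # temporary model for this fold
        correct += sum(1 for y in test if y == model)
    return correct / len(labels)

labels = ["yes"] * 7 + ["no"] * 3              # hypothetical class labels
cv_accuracy = cross_validate(labels, k=5)      # performance estimate only
final_model = train_majority(labels)           # the single model that is reported
print(final_model, cv_accuracy)
```

The point of the sketch is the last three lines: the estimate comes from the folds, the reported model from the whole dataset.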


Ulrik Stervbo | 19 Sep 17:19 2014

Reading ARFF format

Hello list,

I have just discovered WEKA and find the arff file format so nice I am contemplating its use in another application of mine. However, my eyes start to hurt whenever I see data without a header and I would probably add more information if I use the format in a different context.

I noticed that WEKA has very strict expectations of the format, so that adding an @header declaration breaks reading of the file.

Is there any reason why WEKA is so strict?

Cheers,
Ulrik
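
One option that stays within the format: ARFF treats lines beginning with `%` as comments, so extra header information can travel in comment lines without adding new declarations. The sketch below reads such lines back in plain Python. The `key: value` convention, the function name `read_comment_metadata`, and the sample file contents are assumptions for illustration only; they are not part of ARFF or WEKA.

```python
def read_comment_metadata(arff_text):
    """Collect 'key: value' pairs from the leading % comment lines of an ARFF text."""
    metadata = {}
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("%"):
            if stripped:          # first non-comment, non-blank line ends the header
                break
            continue
        key, sep, value = stripped.lstrip("%").partition(":")
        if sep:                   # ignore comment lines without a colon
            metadata[key.strip()] = value.strip()
    return metadata

# Hypothetical file: metadata in comments, followed by ordinary ARFF content.
example = """% source: sensor A
% created: 2014-09-19
@relation demo
@attribute x numeric
@data
1.0
"""
print(read_comment_metadata(example))
```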
yatindra yadav | 19 Sep 13:04 2014

Re: Wekalist Digest, Vol 139, Issue 49

Dear Friends,

We are working with a biological dataset related to tuberculosis.
We targeted a dataset of pesticides found to have some activity against tuberculosis bacteria,
and used Weka to classify them into active and inactive classes.
We obtained our final classification model with random forest using csv value23.

Is there any way to increase the accuracy of our predictions? That is, can we tune Weka further for more accurate prediction? If so, please advise.

Note that we do not want to lose the biological significance of our dataset when making predictions.

Thanks & regards

Yatindra Nath Yadav

Blog Url: http://yatindradotnet.wordpress.com/

Mobile:7376664327,9369187813

OSDD CSIR TCOF3 RESEARCH FELLOW



On Fri, Sep 19, 2014 at 3:28 PM, <wekalist-request <at> list.waikato.ac.nz> wrote:

Today's Topics:

   1. Re: Hi Eibe, I am seeking your recommendation in order to
      find suitable and clear references for my case (Eibe Frank)
   2. Re: Help with hierarchical clustering problems in Weka (jingxian)
   3. Re: Hi Eibe, I am seeking your recommendation in order to
      find suitable and clear references for my case (double d s)
   4. Classification: other language (Arya)
   5. Re: Classification: other language (Jose Maria Gomez Hidalgo)


----------------------------------------------------------------------

Message: 1
Date: Fri, 19 Sep 2014 19:10:58 +1200
From: Eibe Frank <eibe <at> waikato.ac.nz>
To: "Weka machine learning workbench list."
        <wekalist <at> list.waikato.ac.nz>
Subject: Re: [Wekalist] Hi Eibe, I am seeking your recommendation in
        order to find suitable and clear references for my case
Message-ID: <2CD29B2A-8098-4369-9AD7-F1B7D981E914 <at> waikato.ac.nz>
Content-Type: text/plain; charset=us-ascii

It depends on your application. You might want to consider using the kappa statistic instead, which normalises by the accuracy achieved by a random classifier.

Read up on guidelines regarding values of kappa.

Cheers,
Eibe
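
To illustrate the normalisation mentioned above, here is a minimal Python sketch of the kappa statistic computed from a confusion matrix, with chance agreement estimated from the row and column totals (Cohen's kappa). The function name `kappa` and the example counts are made up for illustration and do not come from the thread.

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, cols: predicted)."""
    size = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal (plain accuracy).
    observed = sum(confusion[i][i] for i in range(size)) / n
    # Chance agreement: product of actual and predicted class totals, per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical two-class result: 55% accuracy on a 90/10 class split.
matrix = [[50, 40],
          [5, 5]]
accuracy = (matrix[0][0] + matrix[1][1]) / 100
print(accuracy, round(kappa(matrix), 3))
```

With these invented counts the accuracy is 0.55 but kappa is close to zero, which is why the same accuracy figure can be good for one dataset and poor for another.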

On 19 Sep 2014, at 18:14, double d s <doubled_s2000 <at> yahoo.com> wrote:

> Hi Eibe,
>
> I hope you are in good health. I would also like to thank you very much for your help and assistance.
>
> I am seeking your recommendation for suitable and clear references for my case. I am looking for an academic resource that indicates the acceptable percentage or range of "Correctly Classified Instances" (or even "Incorrectly Classified Instances") in Weka when evaluating the performance of a classifier. To be more precise: if the classifier's result is "Correctly Classified Instances = 55%", how can I say whether it is a weak, good, acceptable, or excellent result based on a reference from a paper or book?
>
> I would highly appreciate it if you could provide me with at least one reference.
>
> Thanks.
>
> Sandler



------------------------------

Message: 2
Date: Fri, 19 Sep 2014 00:45:22 -0700 (MST)
From: jingxian <jing_xian90 <at> hotmail.com>
To: wekalist <at> list.waikato.ac.nz
Subject: Re: [Wekalist] Help with hierarchical clustering problems in
        Weka
Message-ID: <1411112722018-32248.post <at> n7.nabble.com>
Content-Type: text/plain; charset=us-ascii

Hi Eibe,

Once again, thanks for your help. Finally I figured it out.

Regards,
JingXian



--
View this message in context: http://weka.8497.n7.nabble.com/Help-with-hierarchical-clustering-problems-in-Weka-tp32209p32248.html
Sent from the WEKA mailing list archive at Nabble.com.


------------------------------

Message: 3
Date: Fri, 19 Sep 2014 01:12:11 -0700
From: double d s <doubled_s2000 <at> yahoo.com>
To: "Weka machine learning workbench list."
        <wekalist <at> list.waikato.ac.nz>
Subject: Re: [Wekalist] Hi Eibe, I am seeking your recommendation in
        order to find suitable and clear references for my case
Message-ID:
        <1411114331.25459.YahooMailAndroidMobile <at> web140802.mail.bf1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Eibe,

Thanks for your answer.

Basically, I am already considering kappa and have a suitable reference for it. However, I have been searching for a book or paper to support the interpretation of my results for "Correctly Classified Instances", and so far I have not found any clear reference for this part. That is why I put my question to you.

Kind regards,
Sandler

Sent from Yahoo Mail on Android

------------------------------

Message: 4
Date: Fri, 19 Sep 2014 05:20:50 -0400
From: Arya <arya.rah <at> gmail.com>
To: Weka <wekalist <at> list.waikato.ac.nz>
Subject: [Wekalist] Classification: other language
Message-ID: <8A8E1004-A4A1-406F-9606-B844C862BEF3 <at> gmail.com>
Content-Type: text/plain; charset=us-ascii

Hi

Could someone please guide me through the steps needed to get the classifier to work with another language?

I am new to Weka!

Thanks, all,

Arya

Sent from my iPhone

------------------------------

Message: 5
Date: Fri, 19 Sep 2014 10:58:29 +0100
From: Jose Maria Gomez Hidalgo <jmgomezh <at> yahoo.es>
To: "Weka machine learning workbench list."
        <wekalist <at> list.waikato.ac.nz>
Subject: Re: [Wekalist] Classification: other language
Message-ID:
        <1411120709.21940.YahooMailNeo <at> web171204.mail.ir2.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi

In the WEKA related projects you can find info on using WEKA with R & Python: http://weka.wikispaces.com/Related+Projects

Regards

  Jose Maria

--
José María Gómez Hidalgo
Email: jmgomezh <at> yahoo.es
Twitter: @jmgomez
LinkedIn: http://www.linkedin.com/in/jmgomezh/
Home page: http://www.esp.uem.es/jmgomez/


________________________________
From: Arya <arya.rah <at> gmail.com>
To: Weka <wekalist <at> list.waikato.ac.nz>
Sent: Friday, 19 September 2014 11:20
Subject: [Wekalist] Classification: other language


Hi

Could someone please guide me through the steps needed to get the classifier to work with another language?

I am new to Weka!

Thanks, all,

Arya

Sent from my iPhone


End of Wekalist Digest, Vol 139, Issue 49
*****************************************

Arya | 19 Sep 11:20 2014

Classification: other language

Hi

Could someone please guide me through the steps needed to get the classifier to work with another language?

I am new to Weka!

Thanks, all,

Arya

Sent from my iPhone

double d s | 19 Sep 08:14 2014

Hi Eibe, I am seeking your recommendation in order to find suitable and clear references for my case

Hi Eibe,

I hope you are in good health. I would also like to thank you very much for your help and assistance.

I am seeking your recommendation for suitable and clear references for my case. I am looking for an academic resource that indicates the acceptable percentage or range of "Correctly Classified Instances" (or even "Incorrectly Classified Instances") in Weka when evaluating the performance of a classifier. To be more precise: if the classifier's result is "Correctly Classified Instances = 55%", how can I say whether it is a weak, good, acceptable, or excellent result based on a reference from a paper or book?

I would highly appreciate it if you could provide me with at least one reference.

Thanks.

Sandler


Seth Corrigan | 18 Sep 17:35 2014

Dynamic Bayes in Weka?

Hi All,

Our group is using Weka to run several static Bayes nets. We have a project coming up that will require a dynamic Bayes net but have not seen that Weka supports such models. 

Can you please tell us if this is correct? And if so, are there alternative solutions for specifying and running dynamic Bayes within Weka?

Thanks in advance. 

Best,
Seth
Ahmed Abdeen Hamed | 18 Sep 19:38 2014

Job posting | R&D Data Mining/Text Mining Software Engineer

Dear Weka community,

Apologies for cross-posting!

Here is a great opening in Burlington, Vermont, for people who are interested in data mining and text mining of social media data. It seems that they may help candidates from other states to relocate.

I thought I would share,

Sincerely,

-Ahmed

-----------------------
This is a very exciting position for someone who wants to stay connected with the latest technologies, design an impressive social media based platform, and feel appreciated and important. If you love downtown Burlington, you don't want to miss out on what's happening, and if you want to work flexible hours, you should apply for this position given you have the skills below:

* Demonstrates proficiency in using: open-source APIs (e.g., Apache Commons); web services (e.g., REST, SOAP); search engine core tasks (indexing, querying, ranking) and their related algorithms; recommender systems algorithms, APIs, and platforms; web crawlers; GIS and GIS software; MySql, NoSQL, GraphDB, Neo4J, Cassandra; and social media APIs (Twitter4J, FB4J, LinkedIn APIs). Is very passionate about the product and shows a great appetite for developing cutting-edge technologies, learning new optimization techniques, being a big part of a team, and lending a hand to young developers and interns. Must be willing to provide expertise to the executives in designing new products and services. Must be able to give a clear vision of how she/he would solve complex problems and deal with BigData. Experience in NLP, text mining, or data mining!

* Must have some of the following technical skills: Java, Python, IntelliJ, Eclipse, Lucene/Solr Search APIs. Must be able to parse and handle various data formats (XML, XHTML, JSON, etc.). Demonstrates great patience in debugging code and deploying software into a production environment.

* Must have 2 years' experience using Twitter and LinkedIn.
* Salary is competitive and based on the expertise demonstrated.
* Candidates from other states are most welcome, on the understanding that relocation is required and not negotiable.
* Assistance with relocation is offered.
* Applications are made to SmartRecs <at> SmartRecs.co
Jibran Ahmed | 18 Sep 03:54 2014

EM Algorithm output interpretation.

Hi Folks,
I previously used SimpleKMeans for clustering and used the resulting clusters to predict the cluster of new instances. I have been reading about EM and understand how the algorithm works, but I cannot make sense of the output that the Weka Explorer shows for EM. Can anyone help me interpret the EM output from the Weka Explorer and use it to predict which cluster a new instance belongs to?
Any help will be appreciated. 
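
A rough sketch of the assignment step, under stated assumptions: the EM summary is taken to give a prior probability for each cluster plus a mean and standard deviation for each numeric attribute, and attributes are treated as independent normal distributions within a cluster. A new instance then goes to the cluster with the highest posterior probability. The function names and every number below are invented for illustration; nominal attributes are not covered.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std dev."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def cluster_posteriors(instance, clusters):
    """Return P(cluster | instance) per cluster, assuming independent attributes."""
    joint = []
    for prior, params in clusters:
        likelihood = prior
        for x, (mean, std) in zip(instance, params):
            likelihood *= normal_pdf(x, mean, std)
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical EM summary: (prior, [(mean, std) per attribute]) for two clusters.
clusters = [
    (0.6, [(5.0, 1.0), (1.5, 0.5)]),
    (0.4, [(9.0, 1.5), (4.0, 1.0)]),
]
posteriors = cluster_posteriors([5.5, 2.0], clusters)
best = max(range(len(posteriors)), key=posteriors.__getitem__)
print(best, posteriors)
```

Unlike k-means, which assigns by distance to a centroid, this gives a probability for every cluster, so borderline instances show up as posteriors that are close to each other.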
Sina Aghasi | 17 Sep 20:42 2014

How to convert nominal to numeric?

Hello

I have many nominal attributes; how can I convert them to numeric values? For example, one attribute has the values tcp, udp, icmp. I want the following mapping before running any machine learning algorithm:
tcp -> 1
udp -> 2
icmp -> 3

Regards
Sina Aghasi
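
The mapping asked for above can be sketched in a few lines of plain Python. This is only an illustration of the encoding itself, not a Weka filter; the function name `nominal_to_numeric` and the sample column are assumptions.

```python
def nominal_to_numeric(values, categories):
    """Replace each nominal value by its 1-based position in `categories`."""
    codes = {name: index for index, name in enumerate(categories, start=1)}
    return [codes[v] for v in values]

protocol = ["tcp", "icmp", "udp", "tcp"]            # hypothetical column
encoded = nominal_to_numeric(protocol, ["tcp", "udp", "icmp"])
print(encoded)  # [1, 3, 2, 1]
```

Note that integer codes impose an order (tcp < udp < icmp) that the original values do not have, and many learning algorithms will treat that order as meaningful; one indicator attribute per value is a common alternative.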
