sobiya khan | 20 May 2013 11:52

Unable to understand the word frequency calculation from text files to arff program

Dear Mark,

We found the program (ARFF files from Text Collections on http://weka.wikispaces.com/) for converting text files to ARFF. However, the original program had some errors. We fixed them, and it now yields output.
However, there are a few points about the output that we are unable to decipher. Kindly provide some clarification on the following:

1. The modified program and the sample files are attached for your reference. The program did sample the words that were more than 2 characters in length. However, we didn't understand how it calculated the weightage or frequency for the words so found. How does this represent the weightage of words found in the document?
2. Secondly, how could we build clusters on top of this (using the Java API)? Clusters as simple as dividing the text files/words based on their meaning/discussion, like Personal/Official/Suspicious etc.? For example, if we are using SimpleKMeans, how does Weka recognize the centroid among these words from various documents, and how does it determine similarity among these words? Similarity measures such as Euclidean distance are calculated on numeric data, so how are we going to use this distance measure for words? What will be the criteria for similarity among words to cluster these into separate groups?
3. Thirdly, does Weka provide any means for tagging the clusters with the top N keywords for topic identification? Or is there any mechanism in Weka for topic identification after performing clustering?


I know my questions are a bit too basic, but it's just like we are missing some piece here. Kindly help.
Any help is most appreciated.
Thanks in advance.

Warm Regards
sobiya
Attachment (TextDirectoryToArff.zip): application/zip, 1950 bytes
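
Editorial sketch (not part of the original message, and not the attached TextDirectoryToArff program): the questions about word weights and distance measures can be made concrete by treating each document as a vector of word counts over a shared vocabulary. Once documents are numeric vectors, Euclidean distance can be computed between them directly. The class TermFrequencySketch, its method names, and the sample sentences are hypothetical, written in plain Java rather than against the Weka API, and raw counts are only one possible weighting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only: plain Java, not Weka code and not the attached program.
final class TermFrequencySketch {

    // Count how often each word longer than two characters occurs in a text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.length() > 2) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Express one document's counts as a numeric vector over a shared vocabulary.
    static double[] toVector(Map<String, Integer> counts, List<String> vocabulary) {
        double[] vector = new double[vocabulary.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = counts.getOrDefault(vocabulary.get(i), 0);
        }
        return vector;
    }

    // Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up sample documents, for illustration only.
        String[] docs = {
            "Meeting agenda for the budget meeting",
            "Budget review and agenda for the board",
            "Family dinner and holiday photos"
        };
        List<Map<String, Integer>> perDocCounts = new ArrayList<>();
        TreeSet<String> vocabularySet = new TreeSet<>();
        for (String doc : docs) {
            Map<String, Integer> counts = termFrequencies(doc);
            perDocCounts.add(counts);
            vocabularySet.addAll(counts.keySet());
        }
        List<String> vocabulary = new ArrayList<>(vocabularySet);
        double[][] vectors = new double[docs.length][];
        for (int i = 0; i < docs.length; i++) {
            vectors[i] = toVector(perDocCounts.get(i), vocabulary);
        }
        System.out.println("Vocabulary: " + vocabulary);
        for (int i = 0; i < docs.length; i++) {
            for (int j = i + 1; j < docs.length; j++) {
                System.out.println("distance(doc" + i + ", doc" + j + ") = "
                        + euclidean(vectors[i], vectors[j]));
            }
        }
    }
}
```

In this picture a k-means centroid would simply be the per-word average of the vectors assigned to a cluster, so the words with the largest centroid values are natural candidates for cluster keywords.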
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
shital patil | 20 May 2013 05:39

Help with MultiScheme Classifier

Hello sir,

I have a set of classifiers which I train on a gene expression dataset, and I try to choose the best-performing algorithm among them by 10-fold cross-validation.

I did this manually for each classifier and also using the MultiScheme classifier of Weka with 10 folds through Java code.

I noticed that MultiScheme picks a different classifier as the best one than the one I get manually with cross-validation.

My manual evaluation results are the same as the Explorer output, so I have confirmed that there is no bug in my code.

Why is this happening?

Thank You
_______________________________________________
Wael Gomaa | 20 May 2013 00:16

Text Similarity

I am working in the field of text similarity between short texts.
I have the results of three text similarity algorithms, which were applied individually, and I want to combine them using a machine learning algorithm.
The results are sets of floating-point similarity values between 0 and 1.

How do I perform this task using Weka?
--
Wael Hassan Gomaa
Mobile:  +2 011 4 6767 4 66
PhD student,
Faculty of Computers and Information,
Cairo University, Egypt
_______________________________________________
Mark Hall | 19 May 2013 23:27

KDnuggets Poll on DM software

Hi folks,

Please vote for Weka in the latest KDnuggets poll!

Cheers,
Mark.

_______________________________________________
Baskar Jayaraman | 19 May 2013 18:46

weighted random forest in WEKA...

Breiman et al. (http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf) discuss a modified random forest, called weighted random forest (wRF). Is there an implementation of this in WEKA (base or packages)? I looked, and the closest I can find is to use MetaCost with random forest as the base learner. What I am not sure about is whether this is the same as, or produces similar results to, what the above paper calls wRF.

Any pointers would be appreciated.

Thanks.

Baskar
_______________________________________________
Sardar Sulaman | 19 May 2013 11:10

Use of a Built Classifier on new dataset without pre-processing in Batch filter mode

Hi,

Is it possible in WEKA, or in machine learning generally, to apply a saved classifier model to a new dataset? The new dataset is pre-processed on its own, not in batch filter mode.

I tried this and received a "train and test sets are not compatible" error. (I know it can be done in batch processing filter mode.)

Thanks in anticipation!

/Sardar

_______________________________________________
dirichlet | 19 May 2013 10:37

[Experimenter advanced mode] Cannot set output path

Hi everyone,

I do not know if this is a bug or a 'feature' (using Weka 3.7.9 under Mac OS X Lion), but under the advanced panels of the Experimenter window I am not able to choose the path for the output files (both InstancesResultListener and ResultGenerator). When I try to select it, a file chooser rather than a directory chooser pops up, and it does not let me specify a path.

Here is a screenshot:
http://oi44.tinypic.com/99hs75.jpg

In the end the output paths remain the same and I get this error (the Experimenter is interrupted as soon as it starts):
/Users/splitEvalutorOut.zip (Permission denied)

How can I specify a correct path? I need the advanced mode of the Experimenter interface in order to get statistics for both of my labelled classes (it appears that the simple mode only saves the results for the first one).
_______________________________________________
sobiya khan | 18 May 2013 08:20

ARFF files from Text Collections and Frequency Calculation


Dear Mark,

We found the program (ARFF files from Text Collections on http://weka.wikispaces.com/) for converting text files to ARFF. However, the original program had some errors. We fixed them, and it now yields output.
However, there are a few points about the output that we are unable to decipher. Kindly provide some clarification on the following:

1. Firstly, how does this represent the weightage of words found in the document?
2. Secondly, how could we build clusters based on this? Clusters as simple as dividing the text files/words based on their meaning/discussion, like Personal and Official?

The modified program and the sample files are attached for your reference.
The program did sample the words that were more than 2 characters in length.
However, we didn't understand how it calculated the weightage or frequency for the words so found.

Any help is most appreciated.
Thanks in advance.

Warm Regards
Sobiya

Attachment (TextDirectoryToArff.zip): application/zip, 1950 bytes
_______________________________________________
Robert Bixler | 17 May 2013 16:38

Using multiple cores with weka

Is it possible to use multiple cores to process multiple Weka tasks programmatically? For instance, I want to create a data set and then, over a certain number of iterations, create a train and test set. I then want to create a number of models using different classifiers and evaluate them. I want to create a thread that does this process for each data set and run each thread on a different core. Would I need to use Weka Server and set up a server with the number of cores on my machine to do this?

--
Robert Bixler,

Computer Science and Engineering,
University of Notre Dame
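
Editorial sketch (not part of the original message): independent of Weka Server, a plain Java thread pool is one generic way to spread independent runs across cores. The class ParallelRunsSketch and the method evaluateDataSet are hypothetical; evaluateDataSet is a placeholder for the split, model-building, and evaluation steps described above and contains no Weka calls. The sketch assumes each task works on its own data and model objects rather than sharing them between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: generic Java concurrency, no Weka API involved.
final class ParallelRunsSketch {

    // Placeholder for "build train/test sets, train several models, evaluate them".
    // Here it just returns a toy score derived from the data set id.
    static double evaluateDataSet(int dataSetId) {
        return dataSetId * 0.5;
    }

    // Run one task per data set on a pool sized to the available cores.
    static List<Double> runAll(int numDataSets) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < numDataSets; i++) {
                final int id = i;
                Callable<Double> task = () -> evaluateDataSet(id);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until each task is done.
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Scores per data set: " + runAll(4));
    }
}
```

The operating system decides which core runs which thread, so the pool cannot pin one task to one core; sizing the pool to the core count only keeps the number of concurrent tasks in line with the hardware.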

_______________________________________________
Ranganath S | 17 May 2013 16:00

Output

Hello,

    Does WEKA output the association results in the Associator output window, or does it write them to some file?

Regards,
ER

_______________________________________________

Ranganath S | 17 May 2013 12:54

Association Apriori Stops Abruptly

Hello,

    I am running Association (Apriori scheme) on a machine with 18 GB RAM and a 3.3 GHz double processor running Windows 7 (64-bit). Apriori runs for some time and then stops without logging any error. I have set the heap memory to 10240m. What is the reason for such an abrupt stop? Below are the details:

=== Run information ===

Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     CourseDetails
Instances:    1294
Attributes:   236

Regards,
ER

_______________________________________________