1 Feb 2004 01:32
1 Feb 2004 18:42
connect to oracle
Ali Alkan <alialkan <at> simya.net>
2004-02-01 17:42:48 GMT
Hi,
I suggest you try Cahit Arf V1.0, a data extraction utility for Weka.
You can download it from http://cahitarf.sourceforge.net/.
Good luck,
Ali Alkan, Msc.
Informatics Institute
Istanbul Technical University - Turkey
_______________________________________________ Wekalist mailing list Wekalist <at> list.scms.waikato.ac.nz https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
1 Feb 2004 20:38
String Attributes In Weka
Daniel Harbig <dharbig <at> hotmail.com>
2004-02-01 19:38:27 GMT
Hi,
we are a student project group working on machine learning with WEKA, and
we are completely new to the subject. We have run into some problems with
the string data type in an ARFF document.
We are trying to automatically classify a set of books according to a given
library classification, using the title and keywords. We use attributes like
"book title" and "keywords", which should be compared by similarity, e.g. by
comparing different book titles.
An example set could be:
@Attribute id numeric
@Attribute bookTitle string
@Attribute keywords string
@Attribute class {...}
@DATA
1 , "Data Mining" , " data mining, information retrieval" , CSC21
2 , "Macbeth" , "Shakespeare, Lady Macbeth", LIT32
3 , "Web Information Retrieval" , "information science, internet, retrieval", CSC44
...
Can WEKA then tell that book 1 is more similar to book 3, or in the same
class, by comparing the string attributes?
We have not been able to find out how WEKA processes string attributes.
Do we have to preprocess them? If so, how?
Thanks for all useful hints!
Sarah, Nina, Dan
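
[Editor's note] The question above is whether string attributes need preprocessing. A common approach, sketched here outside Weka and only as an illustration, is to turn each string into word counts (a "bag of words") and compare the counts. Within Weka, the StringToWordVector filter is the usual route to a similar representation; whether it suits this data is an assumption. The helper names `bag_of_words` and `cosine` are hypothetical, and the title and keywords of each book are simply concatenated.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Title and keywords of the three example books, joined into one string each.
books = {
    1: "Data Mining data mining, information retrieval",
    2: "Macbeth Shakespeare, Lady Macbeth",
    3: "Web Information Retrieval information science, internet, retrieval",
}
vectors = {book_id: bag_of_words(text) for book_id, text in books.items()}

sim_1_3 = cosine(vectors[1], vectors[3])  # books 1 and 3 share some words
sim_1_2 = cosine(vectors[1], vectors[2])  # books 1 and 2 share no words
print(sim_1_3 > sim_1_2)
```

Books 1 and 3 share the words "information" and "retrieval", so their similarity is positive, while books 1 and 2 share nothing. A classifier sees the same signal once each word becomes a numeric attribute.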
1 Feb 2004 22:07
Re: J48 output discrepancies
Eibe Frank <eibe <at> cs.waikato.ac.nz>
2004-02-01 21:07:23 GMT
Did you get a difference when you just used J48 by itself? (The output you
sent to the list was based on using J48 in conjunction with the
CostSensitiveClassifier.)

I made two small changes to J48 for the latest release (one bug fix and one
other change), which in some rare cases may lead to slightly different
probability estimates (more info in the CHANGELOG for the latest release).
However, I don't think those changes explain the large differences that you
observed.

Which version of Weka were you using before the upgrade? Do you think you
could send me the data so that I can try and figure out what's going on?

Cheers,
Eibe

On Sunday, February 1, 2004, at 12:00 PM, wekalist-request <at> list.scms.waikato.ac.nz wrote:

> Who can explain me the abnormal behaviour of these two
> subsequent version of J48 outputs? Just updated weka
> tool and realized that j48 had moved to
> 'trees.j48.J48'. I did not change my fit dataset but
> J48 results are significantly different. Look at the
> misclassified numbers.
1 Feb 2004 21:25
Resizeable panes in KnowledgeFlow
Steve Moyle <steve.moyle <at> comlab.ox.ac.uk>
2004-02-01 20:25:46 GMT
Hi,

I have been giving students a demo of Weka and ran into a slight problem
with the textual visualisation of the SMO model in the text viewer. I was
not able to adjust the relative size of the panels listing the models (LHS)
and the model outputs themselves (RHS). The attached bitmap shows the GUI
element that appears. It is not possible to `slide' the vertical separator
left (or right?).

I am using Weka 3.4.1 on Windows XP and

$ java -version
java version "1.4.1_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)

Cheers,
Steve

Dr Steve Moyle
Research Officer
Oxford University Computing Laboratory
Wolfson Building
Parks Road
Oxford OX1 3QD
ENGLAND
Telephone: +44 (0) 780 1749 587
Fax: +44 (0) 1865 273 839
Email: steve.moyle <at> comlab.ox.ac.uk
2 Feb 2004 05:42
Record Id for arff file
<eug <at> wwti.com>
2004-02-02 04:42:05 GMT
Hi,
I need a way to embed a record identifier in ARFF file records. I do not
want this value to be used in any way by the modeling process. I just want
it in the output, so that I can match the generated classifier values
against a database of actual values. So I need a way to mark this field as
a comment field. Does anyone know how to do this?
Eric
PhD Candidate
IIT
Chicago
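
[Editor's note] One tool-independent workaround for the question above, offered only as a sketch: keep the identifiers in a side list, hand the remaining columns to the modeling step, and re-attach the identifiers to the predictions by row order. This assumes the modeling tool returns one prediction per input row in the original order. The helper names `split_ids` and `attach_ids`, the sample rows, and the predictions are all hypothetical.

```python
import csv
import io


def split_ids(rows, id_index=0):
    """Separate the identifier column from the columns used for modeling."""
    ids = [row[id_index] for row in rows]
    features = [row[:id_index] + row[id_index + 1:] for row in rows]
    return ids, features


def attach_ids(ids, predictions):
    """Pair each stored identifier with the prediction for the same row."""
    if len(ids) != len(predictions):
        raise ValueError("row counts differ; order-based matching is unsafe")
    return list(zip(ids, predictions))


# Made-up records: identifier, one numeric attribute, actual class.
raw = "r001,5.1,yes\nr002,4.7,no\n"
rows = list(csv.reader(io.StringIO(raw)))

ids, features = split_ids(rows)
predictions = ["yes", "yes"]  # hypothetical classifier output, in row order
matched = attach_ids(ids, predictions)
print(matched)
```

The length check matters: if any row is dropped or reordered between export and prediction, matching by position silently pairs the wrong records.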
2 Feb 2004 12:44
premature end of line
Ernst Schwartz <Ernst_Schwartz <at> gmx.at>
2004-02-02 11:44:21 GMT
Hi!

I'm trying to get my Apache logs into .arff format for future use with Weka.
I'm having problems with entries like this:

1,62.8.71.38,0,0,24/08/2003:04:34:54,+0200,GET,/default.ida?
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucb
d3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00
=a,HTTP/1.0,404,292

... Weka says "premature end of line" ...

Does anyone have an idea how I could reformat that the way Weka likes it?

thanks
Ernst
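
[Editor's note] A possible cause, stated as an assumption rather than a diagnosis: `%` introduces a comment in ARFF, so an unquoted value containing `%u9090...` may cut the record short, and a record that is split over several lines would fail in a similar way. The sketch below keeps each record on one line and single-quotes any value that contains characters outside a conservative safe set. The helper `arff_quote` and the shortened sample fields are hypothetical.

```python
import re

# Characters assumed safe to leave unquoted; everything else triggers quoting.
_SAFE = re.compile(r"^[A-Za-z0-9_.:/+\-]+$")


def arff_quote(value):
    """Return the value unchanged if plainly safe, else single-quoted with escapes."""
    if _SAFE.match(value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


# Shortened stand-in for the log entry in the message above.
fields = ["1", "62.8.71.38", "GET", "/default.ida?XXXX%u9090%u6858", "HTTP/1.0", "404"]
line = ",".join(arff_quote(f) for f in fields)
print(line)
```

Only the request path is quoted here, because it contains `?` and `%`. The corresponding attribute would need to be declared as a string attribute for such free-form values.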
2 Feb 2004 20:43
Re: J48 output discrepancies
Eibe Frank <eibe <at> cs.waikato.ac.nz>
2004-02-02 19:43:41 GMT
> Who can explain me the abnormal behaviour of these two
> subsequent version of J48 outputs? Just updated weka
> tool and realized that j48 had moved to
> 'trees.j48.J48'. I did not change my fit dataset but
> J48 results are significantly different. Look at the
> misclassified numbers.

Hi Cemal,

Thanks for sending me the data. The difference in the results is due to the
fact that the randomization routine used in the cross-validation has changed
between Weka 3.2 (which you were using before) and Weka 3.4 (which you are
using now). You will find that the estimates change quite a bit if you try
different random number seeds for the cross-validation (with the -s option).

Cheers,
Eibe
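
[Editor's note] The dependence on the random seed can be illustrated with a small sketch. This is not Weka's randomization routine, and the folds here are not stratified; `cv_folds` is a hypothetical helper that only shows how the seed decides which instances land in which fold, and therefore which instances each model is tested on.

```python
import random


def cv_folds(n_instances, k, seed):
    """Shuffle instance indices with the given seed and deal them into k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


folds_a = cv_folds(20, 5, seed=1)
folds_a_again = cv_folds(20, 5, seed=1)
folds_b = cv_folds(20, 5, seed=2)

# The same seed reproduces the same partition.
print(folds_a == folds_a_again)
# A different seed generally yields a different partition.
print(folds_a == folds_b)
```

Because each fold's test set changes with the seed, error estimates from a single cross-validation run carry some variance. Repeating the run with several seeds and comparing the spread gives a sense of how much.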
3 Feb 2004 00:46
Re: Wekalist Digest, Vol 12, Issue 1
Ashraf Kibriya <amk14 <at> cs.waikato.ac.nz>
2004-02-02 23:46:52 GMT
R u in the Uni bro? wanna see u.

wekalist-request <at> list.scms.waikato.ac.nz wrote:

>Send Wekalist mailing list submissions to
>   wekalist <at> list.scms.waikato.ac.nz
>
>To subscribe or unsubscribe via the World Wide Web, visit
>   https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>or, via email, send a message with subject or body 'help' to
>   wekalist-request <at> list.scms.waikato.ac.nz
>
>You can reach the person managing the list at
>   wekalist-owner <at> list.scms.waikato.ac.nz
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Wekalist digest..."
>
>
>Today's Topics:
>
>   1. (no subject) (Chong May Yee)
>   2. filtering output results (Chong May Yee)
>   3. Where to find a list of stop words (Mena Badieh)
>   4. Re: Where to find a list of stop words (Jack Park)
>   5. Urgent questions (zhangd <at> umbc.edu)
>   6. J48 output discrepancies (Cemal Eroglu)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Sat, 31 Jan 2004 10:27:44 +0000
>From: "Chong May Yee" <zhong_meiyi <at> hotmail.com>
>Subject: [Wekalist] (no subject)
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <BAY2-F133QtMXitvVjJ0005957e <at> hotmail.com>
>Content-Type: text/plain; format=flowed
>
>Hi all,
>
>I need to do a few rounds of classification (J48) on a set of data but ran
>into some problems:
>For the first round, I have a tree with two classes (say, classA and
>classB), I run the entire testing data using the tree. Is it possible to
>produce two data sets (maybe in a text format), where 1 file contains those
>data that have been classified as classA, while another file contains those
>data that have been classified as classB (irregardless of whether they have
>been classified correctly or not)?
>
>Thanks in advance!
>May Yee
>
>_________________________________________________________________
>Get 10mb of inbox space with MSN Hotmail Extra Storage
>http://join.msn.com/?pgmarket=en-sg
>
>
>
>
>------------------------------
>
>Message: 2
>Date: Sat, 31 Jan 2004 10:28:20 +0000
>From: "Chong May Yee" <zhong_meiyi <at> hotmail.com>
>Subject: [Wekalist] filtering output results
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <BAY2-F37oP911YBlCzs00042e79 <at> hotmail.com>
>Content-Type: text/plain; format=flowed
>
>Hi all,
>
>I need to do a few rounds of classification (J48) on a set of data but ran
>into some problems:
>For the first round, I have a tree with two classes (say, classA and
>classB), I run the entire testing data using the tree. Is it possible to
>produce two data sets (maybe in a text format), where 1 file contains those
>data that have been classified as classA, while another file contains those
>data that have been classified as classB (irregardless of whether they have
>been classified correctly or not)?
>
>Thanks in advance!
>May Yee
>
>_________________________________________________________________
>Get 10mb of inbox space with MSN Hotmail Extra Storage
>http://join.msn.com/?pgmarket=en-sg
>
>
>
>
>------------------------------
>
>Message: 3
>Date: Sat, 31 Jan 2004 20:20:00 +0200
>From: "Mena Badieh" <menabad <at> yalla.com>
>Subject: [Wekalist] Where to find a list of stop words
>To: "wekalist" <wekalist <at> list.scms.waikato.ac.nz>
>Message-ID: <001301c3e826$d9e4aef0$1f85c741 <at> maria>
>Content-Type: text/plain; charset="windows-1256"
>
>Hello All,
>I am a new researcher in the field of Text Categorization and I want to know where to find a full list of Stop Words??? Does this list vary according to the data set. I am working with 20 NewsGroups data set.
>
>Mena B. Habib
>Faculty of Computers & Information Sciences - Ain Shams University - Cairo - Egypt
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20040131/b4d82b13/attachment-0001.htm
>
>------------------------------
>
>Message: 4
>Date: Sat, 31 Jan 2004 10:30:14 -0800
>From: Jack Park <jackpark <at> thinkalong.com>
>Subject: Re: [Wekalist] Where to find a list of stop words
>To: wekalist <wekalist <at> list.scms.waikato.ac.nz>
>Message-ID: <401BF436.3090801 <at> thinkalong.com>
>Content-Type: text/plain; charset=windows-1256; format=flowed
>
>Mena Badieh wrote:
>
>
>
>>Hello All,
>>I am a new researcher in the field of Text Categorization and I want
>>to know where to find a full list of Stop Words??? Does this list vary
>>according to the data set. I am working with 20 NewsGroups data set.
>>
>>Mena B. Habib
>>Faculty of Computers & Information Sciences - Ain Shams University -
>>Cairo - Egypt
>>
>>
>
>One source: look inside the source code for Lucene, the search engine:
>http://jakarta.apache.org/lucene
>
>Cheers
>Jack
>
>
>
>
>
>------------------------------
>
>Message: 5
>Date: Sat, 31 Jan 2004 15:20:58 -0500 (EST)
>From: zhangd <at> umbc.edu
>Subject: [Wekalist] Urgent questions
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <4359.130.85.90.185.1075580458.squirrel <at> webmail.umbc.edu>
>Content-Type: text/plain;charset=iso-8859-1
>
>Dear friends:
>
>I just started using Weka and find it is a very nice tool. I am currently
>creating a neural network classification model for an e-Commerce
>application and have a few questions about the system:
>
>1) Does Weka provide ROC (receiver operating curves) for classifiers? If
>so, how can I get it in the classification output?
>
>2) In 10-fold cross-validation, is there any way to get the output (in
>particular the information retrieval statistics) of each fold in addition
>to the average of all 10 folds?
>
>3) A classification model (weighting scheme and architecture) can be
>generated for neural network classifiers. In case of 10-fold, which fold
>does the model apply to? I suppose that the model for each fold is
>different.
>
>4) While calculating the margin curve, what is used as the actual
>probability value for an instance to compare with the predicted
>probability value of that instance?
>
>I would greatly appreicate if someone can provide answers to the above
>questions. Thanks in advance.
>
>Zhang
>
>
>
>
>
>
>------------------------------
>
>Message: 6
>Date: Sat, 31 Jan 2004 14:19:07 -0800 (PST)
>From: Cemal Eroglu <cemal_eroglu <at> yahoo.com>
>Subject: [Wekalist] J48 output discrepancies
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <20040131221907.69506.qmail <at> web41205.mail.yahoo.com>
>Content-Type: text/plain; charset=us-ascii
>
>Hi there,
>
>Who can explain me the abnormal behaviour of these two
>subsequent version of J48 outputs? Just updated weka
>tool and realized that j48 had moved to
>'trees.j48.J48'. I did not change my fit dataset but
>J48 results are significantly different. Look at the
>misclassified numbers.
>
>So, which one I should count on?
>
>
>Here is the old one:
>=========
>
>Options: -C costs\costs6.txt -W
>weka.classifiers.j48.J48 -- -M 15 -R -N 96
>
>CostSensitiveClassifier using reweighted training
>instances
>
>weka.classifiers.j48.J48 -R -N 96 -M 15
>
>Classifier Model
>J48 pruned tree
>------------------
>
>c2 = 0
>|   c1 = 0
>|   |   c5 = 0: 0 (2464.79/642.05)
>|   |   c5 = 1
>|   |   |   c3 = 0: 1 (188.42/80.9)
>|   |   |   c3 = 1
>|   |   |   |   c8 = 0: 0 (123.39/46.08)
>|   |   |   |   c8 = 1: 1 (82.94/39.94)
>|   c1 = 1
>|   |   c4 = 0
>|   |   |   c5 = 0: 0 (553.99/261.12)
>|   |   |   c5 = 1: 1 (160.77/50.18)
>|   |   c4 = 1: 1 (428.04/145.41)
>c2 = 1
>|   c1 = 0
>|   |   c3 = 0: 0 (314.88/156.67)
>|   |   c3 = 1
>|   |   |   c5 = 0
>|   |   |   |   c8 = 0: 0 (71.17/33.79)
>|   |   |   |   c8 = 1: 1 (81.92/32.77)
>|   |   |   c5 = 1: 1 (778.76/262.66)
>|   c1 = 1: 1 (3510.3/628.74)
>
>Number of Leaves : 12
>
>Size of the tree : 23
>
>
>Cost Matrix
> 0 1
> 6 0
>
>
>
>
>=== Stratified cross-validation ===
>
>Correctly Classified Instances        6206
>  70.1243 %
>Incorrectly Classified Instances      2644
>  29.8757 %
>Kappa statistic                          0.3105
>Mean absolute error                      0.2988
>Root mean squared error                  0.5466
>Relative absolute error                 96.8048 %
>Root relative squared error            139.1545 %
>Total Number of Instances             8850
>
>
>=== Confusion Matrix ===
>
>    a    b   <-- classified as
> 4949 2214 |    a = 0
>  430 1257 |    b = 1
>
>=====================
>This is the new one:
>
>Options: -C costs\costs6.txt -W
>weka.classifiers.trees.j48.J48 -- -M 15 -R -N 96
>
>CostSensitiveClassifier using reweighted training
>instances
>
>weka.classifiers.trees.j48.J48 -R -N 96 -M 15
>
>Classifier Model
>J48 pruned tree
>------------------
>
>c2 = 0
>|   c1 = 0
>|   |   c5 = 0: 0 (2464.79/642.05)
>|   |   c5 = 1
>|   |   |   c3 = 0: 1 (188.42/80.9)
>|   |   |   c3 = 1
>|   |   |   |   c8 = 0: 0 (123.39/46.08)
>|   |   |   |   c8 = 1: 1 (82.94/39.94)
>|   c1 = 1
>|   |   c4 = 0
>|   |   |   c5 = 0: 0 (553.99/261.12)
>|   |   |   c5 = 1: 1 (160.77/50.18)
>|   |   c4 = 1: 1 (428.04/145.41)
>c2 = 1
>|   c1 = 0
>|   |   c3 = 0: 0 (314.88/156.67)
>|   |   c3 = 1
>|   |   |   c5 = 0
>|   |   |   |   c8 = 0: 0 (71.17/33.79)
>|   |   |   |   c8 = 1: 1 (81.92/32.77)
>|   |   |   c5 = 1: 1 (778.76/262.66)
>|   c1 = 1: 1 (3510.3/628.74)
>
>Number of Leaves : 12
>
>Size of the tree : 23
>
>
>Cost Matrix
> 0 1
> 6 0
>
>
>
>
>=== Stratified cross-validation ===
>
>Correctly Classified Instances        5835
>  65.9322 %
>Incorrectly Classified Instances      3015
>  34.0678 %
>Kappa statistic                          0.2675
>Mean absolute error                      0.4214
>Root mean squared error                  0.4802
>Relative absolute error                136.5501 %
>Root relative squared error            122.2635 %
>Total Number of Instances             8850
>
>
>=== Confusion Matrix ===
>
>    a    b   <-- classified as
> 4536 2627 |    a = 0
>  388 1299 |    b = 1
>
>
>thx.
>
>Cemal
>
>
>=====
>***************************************************
>** This mail was sent using 100% recycled bytes **
>** Save the internet! **
>***************************************************
>
>__________________________________
>Do you Yahoo!?
>Yahoo! SiteBuilder - Free web site building tool. Try it!
>http://webhosting.yahoo.com/ps/sb/
>
>
>
>------------------------------
>
>_______________________________________________
>Wekalist mailing list
>Wekalist <at> list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>
>
>End of Wekalist Digest, Vol 12, Issue 1
>***************************************
>
>
3 Feb 2004 00:55
Re: Wekalist Digest, Vol 12, Issue 1
Ashraf Kibriya <amk14 <at> cs.waikato.ac.nz>
2004-02-02 23:55:27 GMT
Dear Weka list members,

Please accept my apologies for the following last message. Accidentally
clicked on the wrong button on screen.

Regards,
Ashraf

Ashraf Kibriya wrote:

> R u in the Uni bro? wanna see u.
>
> wekalist-request <at> list.scms.waikato.ac.nz wrote:
>