Albrecht Zimmermann | 1 Feb 2004 01:32

Cobweb: Missing Values

Hi y'all,

Since I couldn't find it in the documentation: how does the Cobweb 
implementation treat missing values? Does it ignore them, or treat them 
as a third value?

Thanx in advance, A
Ali Alkan | 1 Feb 2004 18:42

connect to oracle

Hi,

I advise you to try Cahit Arf V1.0 - A Data Extraction Utility for Weka.

You can download it from http://cahitarf.sourceforge.net/.

Good luck,

Ali Alkan, MSc.
Informatics Institute
Istanbul Technical University - Turkey

_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
Daniel Harbig | 1 Feb 2004 20:38

String Attributes In Weka

Hi,

We are a student project group working in the field of machine learning 
using WEKA, and we are completely new to the subject. We have encountered 
some problems using the string data type in an ARFF file.

We are trying to automatically classify a set of books according to a given 
library classification, using the title and keywords. We use attributes like 
"book title" and "keywords", which should be compared for similarity, 
e.g. by comparing different book titles.
An example set could be:

@Attribute id numeric
@Attribute bookTitle string
@Attribute keywords string
@Attribute class {...}

@DATA
1 , "Data Mining" , " data mining, information retrieval" , CSC21
2 , "Macbeth" , "Shakespeare, Lady Macbeth", LIT32
3 , "Web Information Retrieval" , "information science, internet, retrieval", CSC44
...

Is WEKA then able to tell, by comparing the string attributes, that book 1 
is more similar to book 3, or that they belong in the same class?

We have not been able to find out how string attributes are processed by 
WEKA. Do we have to preprocess them? If yes, how?

Thanks for all useful hints!

Sarah, Nina, Dan
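
The kind of comparison asked about above can be sketched outside WEKA. The following is a minimal illustration, not WEKA's own processing: it assumes each book is reduced to the set of lower-cased words in its title and keywords, and it uses Jaccard overlap as the similarity measure. The function names tokens and jaccard are made up for this example. WEKA has a StringToWordVector filter that turns string attributes into word attributes; whether it is available in the version used here is not established in this thread.

```python
import re


def tokens(*texts):
    """Lower-case the texts and return the set of words they contain."""
    return set(re.findall(r"[a-z0-9]+", " ".join(texts).lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# The three example rows from the message above: title plus keywords.
books = {
    1: tokens("Data Mining", "data mining, information retrieval"),
    2: tokens("Macbeth", "Shakespeare, Lady Macbeth"),
    3: tokens("Web Information Retrieval", "information science, internet, retrieval"),
}

print(jaccard(books[1], books[3]))  # 2 shared words out of 7 distinct words
print(jaccard(books[1], books[2]))  # no shared words
```

On the three example rows, books 1 and 3 share two words out of seven distinct ones, while books 1 and 2 share none, so a word-based representation does separate them.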

Eibe Frank | 1 Feb 2004 22:07

Re: J48 output discrepancies


Did you get a difference when you just used J48 by itself? (The output 
you sent to the list was based on using J48 in conjunction with the 
CostSensitiveClassifier.)

I made two small changes to J48 for the latest release (one bug fix and 
one other change), which in some rare cases may lead to slightly 
different probability estimates (more info in the CHANGELOG for the 
latest release). However, I don't think those changes explain the large 
differences that you observed.

Which version of Weka were you using before the upgrade?

Do you think you could send me the data so that I can try and figure 
out what's going on?

Cheers,
Eibe

On Sunday, February 1, 2004, at 12:00  PM, 
wekalist-request <at> list.scms.waikato.ac.nz wrote:

> Who can explain me the abnormal behaviour of these two
> subsequent version of J48 outputs? Just updated weka
> tool and realized that j48 had moved to
> 'trees.j48.J48'. I did not change my fit dataset but
> J48 results are significantly different. Look at the
> misclassified numbers.
Steve Moyle | 1 Feb 2004 21:25

Resizeable panes in KnowledgeFlow

Hi,

I have been giving students a demo of Weka and ran into a slight problem
with the textual visualisation of the SMO model in the text viewer.  I
was not able to adjust the relative size of the panels listing the
models (LHS) and the model outputs themselves (RHS).

The attached bitmap shows the GUI element that appears.  It is not
possible to `slide' the vertical separator left (or right?).

I am using Weka 3.4.1 on Windows XP and
$ java -version
java version "1.4.1_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_02-b06)
Java HotSpot(TM) Client VM (build 1.4.1_02-b06, mixed mode)

Cheers,

Steve

Dr Steve Moyle
Research Officer                            
Oxford University Computing Laboratory      
Wolfson Building                            
Parks Road                                  
Oxford                                      
OX1 3QD                                     
ENGLAND                                     

Telephone: +44 (0) 780 1749 587             
Fax:       +44 (0) 1865 273 839             
Email:     steve.moyle <at> comlab.ox.ac.uk       
eug | 2 Feb 2004 05:42

Record ID for ARFF file

Hi,
I need a way to embed a record identifier in the records of an ARFF file. I do not want this value to be used in any way by the modeling process; I just want it in the output so that I can match the generated classifier values against a database of actual values. So I need a way to mark it as a comment field. Does anyone know how to do this?
Eric
PhD Candidate
IIT
Chicago
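
One way to get this effect, whatever the tool, is to take the identifier column out before modelling and put it back next to the predictions afterwards. The sketch below assumes the rows are already parsed into lists with the identifier in the first position, and that predictions come back in the same order as the rows. The helpers split_ids and attach_ids, the sample rows and the stand-in rule used as a "classifier" are all hypothetical and not part of WEKA.

```python
def split_ids(rows, id_index=0):
    """Separate the identifier column from the columns used for modelling."""
    ids = [row[id_index] for row in rows]
    features = [row[:id_index] + row[id_index + 1:] for row in rows]
    return ids, features


def attach_ids(ids, predictions):
    """Pair each identifier with the prediction made for the same row."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return list(zip(ids, predictions))


# Hypothetical records: identifier first, then two numeric attributes.
rows = [["rec-17", 5.1, 0.2], ["rec-42", 6.3, 1.8], ["rec-99", 4.9, 0.1]]
ids, features = split_ids(rows)

# Stand-in for a real classifier: a fixed threshold on the second attribute.
predictions = ["large" if f[1] > 1.0 else "small" for f in features]

for record_id, predicted in attach_ids(ids, predictions):
    print(record_id, predicted)
```

The model only ever sees the feature lists, so the identifier cannot influence it, yet every output line still carries the identifier needed for the database lookup.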
 
Ernst Schwartz | 2 Feb 2004 12:44

premature end of line

Hi!

I'm trying to get my apache logs in .arff format for future weka-use.
I'm having problems with entries like this:

1,62.8.71.38,0,0,24/08/2003:04:34:54,+0200,GET,/default.ida? 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 
XXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucb 
d3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00 
=a,HTTP/1.0,404,292

Weka reports "premature end of line". Does anyone have an idea how I could 
reformat this the way Weka likes it?

thanks
Ernst
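
One possible cause, offered as an assumption since the thread does not confirm it, is that the ARFF reader treats % as the start of a comment, so an unquoted request path containing %u9090... cuts the row short. A record also has to sit on one physical line. A minimal sketch of quoting such fields before writing the row follows. The names arff_quote, needs_quoting and to_arff_row are illustrative, the set of characters treated as unsafe is a guess, the request path is shortened for the example, and the exact escaping rules should be checked against the Weka version in use.

```python
def arff_quote(value):
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def needs_quoting(value):
    """Return True if the value contains a character assumed to be unsafe unquoted."""
    return any(ch in value for ch in " ,%'\"{}\t")


def to_arff_row(fields):
    """Join fields with commas, quoting only those that need it."""
    return ",".join(arff_quote(f) if needs_quoting(f) else f for f in fields)


# Shortened version of the log entry above; the real path is much longer.
row = to_arff_row(
    ["1", "62.8.71.38", "GET", "/default.ida?XXXX%u9090%u6858", "HTTP/1.0", "404", "292"]
)
print(row)  # 1,62.8.71.38,GET,"/default.ida?XXXX%u9090%u6858",HTTP/1.0,404,292
```

With this approach the attribute holding the request path would presumably have to be declared as a string attribute in the ARFF header.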
Eibe Frank | 2 Feb 2004 20:43

Re: J48 output discrepancies


> Who can explain me the abnormal behaviour of these two
> subsequent version of J48 outputs? Just updated weka
> tool and realized that j48 had moved to
> 'trees.j48.J48'. I did not change my fit dataset but
> J48 results are significantly different. Look at the
> misclassified numbers.

Hi Cemal,

Thanks for sending me the data. The difference in the results is due to 
the fact that the randomization routine used in the cross-validation 
has changed between Weka 3.2 (which you were using before) and 
Weka 3.4 (which you are using now). You will find that the estimates 
change quite a bit if you try different random number seeds for the 
cross-validation (with the -s option).

Cheers,
Eibe
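
The effect described above can be reproduced in miniature. The sketch below is not Weka code: it assumes a toy data set of 20 binary labels, an unstratified 5-fold split, and a "classifier" that simply predicts the majority class of the training part. The only thing that changes between runs is the seed used to shuffle the instances before they are cut into folds.

```python
import random


def cross_validate(labels, folds, seed):
    """Return the accuracy of a majority-class predictor under k-fold CV.

    The instance order is shuffled with the given seed before the data is
    cut into folds, so different seeds give different train/test splits.
    """
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    correct = 0
    for k in range(folds):
        test = order[k::folds]  # every folds-th shuffled index forms one fold
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        # "Training": predict whichever class is more frequent in the training part.
        ones = sum(labels[i] for i in train)
        prediction = 1 if 2 * ones > len(train) else 0
        correct += sum(1 for i in test if labels[i] == prediction)
    return correct / len(labels)


# Toy data: ten instances of class 0 and ten of class 1.
labels = [0] * 10 + [1] * 10

for seed in (1, 2, 3):
    print(seed, cross_validate(labels, 5, seed))
```

The same seed always reproduces the same estimate, while different seeds will typically give different ones, which is why results produced by two different shuffling routines are not directly comparable.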
Ashraf Kibriya | 3 Feb 2004 00:46

Re: Wekalist Digest, Vol 12, Issue 1

R u in the Uni bro? wanna see u.

wekalist-request <at> list.scms.waikato.ac.nz wrote:

>Send Wekalist mailing list submissions to
>	wekalist <at> list.scms.waikato.ac.nz
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>or, via email, send a message with subject or body 'help' to
>	wekalist-request <at> list.scms.waikato.ac.nz
>
>You can reach the person managing the list at
>	wekalist-owner <at> list.scms.waikato.ac.nz
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Wekalist digest..."
>
>
>Today's Topics:
>
>   1. (no subject) (Chong May Yee)
>   2. filtering output results (Chong May Yee)
>   3. Where to find a list of stop words (Mena Badieh)
>   4. Re: Where to find a list of stop words (Jack Park)
>   5. Urgent questions (zhangd <at> umbc.edu)
>   6. J48 output discrepancies (Cemal Eroglu)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Sat, 31 Jan 2004 10:27:44 +0000
>From: "Chong May Yee" <zhong_meiyi <at> hotmail.com>
>Subject: [Wekalist] (no subject)
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <BAY2-F133QtMXitvVjJ0005957e <at> hotmail.com>
>Content-Type: text/plain; format=flowed
>
>Hi all,
>
>I need to do a few rounds of classification (J48) on a set of data but ran 
>into some problems:
>For the first round, I have a tree with two classes (say, classA and 
>classB), I run the entire testing data using the tree. Is it possible to 
>produce two data sets (maybe in a text format), where 1 file contains those 
>data that have been classified as classA, while another file contains those 
>data that have been classified as classB (irregardless of whether they have 
>been classified correctly or not)?
>
>Thanks in advance!
>May Yee
>
>
>
>
>
>------------------------------
>
>Message: 2
>Date: Sat, 31 Jan 2004 10:28:20 +0000
>From: "Chong May Yee" <zhong_meiyi <at> hotmail.com>
>Subject: [Wekalist] filtering output results
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <BAY2-F37oP911YBlCzs00042e79 <at> hotmail.com>
>Content-Type: text/plain; format=flowed
>
>Hi all,
>
>I need to do a few rounds of classification (J48) on a set of data but ran 
>into some problems:
>For the first round, I have a tree with two classes (say, classA and 
>classB), I run the entire testing data using the tree. Is it possible to 
>produce two data sets (maybe in a text format), where 1 file contains those 
>data that have been classified as classA, while another file contains those 
>data that have been classified as classB (irregardless of whether they have 
>been classified correctly or not)?
>
>Thanks in advance!
>May Yee
>
>
>
>
>
>------------------------------
>
>Message: 3
>Date: Sat, 31 Jan 2004 20:20:00 +0200
>From: "Mena Badieh" <menabad <at> yalla.com>
>Subject: [Wekalist] Where to find a list of stop words
>To: "wekalist" <wekalist <at> list.scms.waikato.ac.nz>
>Message-ID: <001301c3e826$d9e4aef0$1f85c741 <at> maria>
>Content-Type: text/plain; charset="windows-1256"
>
>Hello All,
>I am a new researcher in the field of Text Categorization and I want to know where to find a full list of Stop Words??? Does this list vary according to the data set. I am working with 20 NewsGroups data set.
>
>Mena B. Habib
>Faculty of Computers & Information Sciences - Ain Shams University - Cairo - Egypt
>-------------- next part --------------
>An HTML attachment was scrubbed...
>URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20040131/b4d82b13/attachment-0001.htm
>
>------------------------------
>
>Message: 4
>Date: Sat, 31 Jan 2004 10:30:14 -0800
>From: Jack Park <jackpark <at> thinkalong.com>
>Subject: Re: [Wekalist] Where to find a list of stop words
>To: wekalist <wekalist <at> list.scms.waikato.ac.nz>
>Message-ID: <401BF436.3090801 <at> thinkalong.com>
>Content-Type: text/plain; charset=windows-1256; format=flowed
>
>Mena Badieh wrote:
>
>  
>
>>Hello All,
>>I am a new researcher in the field of Text Categorization and I want 
>>to know where to find a full list of Stop Words??? Does this list vary 
>>according to the data set. I am working with 20 NewsGroups data set.
>> 
>>Mena B. Habib
>>Faculty of Computers & Information Sciences - Ain Shams University - 
>>Cairo - Egypt
>>    
>>
>
>One source: look inside the source code for Lucene, the search engine:
>http://jakarta.apache.org/lucene
>
>Cheers
>Jack
>
>
>
>
>
>------------------------------
>
>Message: 5
>Date: Sat, 31 Jan 2004 15:20:58 -0500 (EST)
>From: zhangd <at> umbc.edu
>Subject: [Wekalist] Urgent questions
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <4359.130.85.90.185.1075580458.squirrel <at> webmail.umbc.edu>
>Content-Type: text/plain;charset=iso-8859-1
>
>Dear friends:
>
>I just started using Weka and find it is a very nice tool. I am currently
>creating a neural network classification model for an e-Commerce
>application and have a few questions about the system:
>
>1) Does Weka provide ROC (receiver operating curves) for classifiers? If
>so, how can I get it in the classification output?
>
>2) In 10-fold cross-validation, is there any way to get the output (in
>particular the information retrieval statistics) of each fold in addition
>to the average of all 10 folds?
>
>3) A classification model (weighting scheme and architecture) can be
>generated for neural network classifiers. In case of 10-fold, which fold
>does the model apply to? I suppose that the model for each fold is
>different.
>
>4) While calculating the margin curve, what is used as the actual
>probability value for an instance to compare with the predicted
>probability value of that instance?
>
>I would greatly appreciate it if someone could provide answers to the above
>questions. Thanks in advance.
>
>Zhang
>
>
>
>
>
>
>------------------------------
>
>Message: 6
>Date: Sat, 31 Jan 2004 14:19:07 -0800 (PST)
>From: Cemal Eroglu <cemal_eroglu <at> yahoo.com>
>Subject: [Wekalist] J48 output discrepancies
>To: wekalist <at> list.scms.waikato.ac.nz
>Message-ID: <20040131221907.69506.qmail <at> web41205.mail.yahoo.com>
>Content-Type: text/plain; charset=us-ascii
>
>Hi there,
>
>Who can explain me the abnormal behaviour of these two
>subsequent version of J48 outputs? Just updated weka
>tool and realized that j48 had moved to
>'trees.j48.J48'. I did not change my fit dataset but
>J48 results are significantly different. Look at the
>misclassified numbers.
>
>So, which one I should count on?
>
>
>Here is the old one:
>=========
>
>Options: -C costs\costs6.txt -W
>weka.classifiers.j48.J48 -- -M 15 -R -N 96 
>
>CostSensitiveClassifier using reweighted training
>instances
>
>weka.classifiers.j48.J48 -R -N 96 -M 15
>
>Classifier Model
>J48 pruned tree
>------------------
>
>c2 = 0
>|   c1 = 0
>|   |   c5 = 0: 0 (2464.79/642.05)
>|   |   c5 = 1
>|   |   |   c3 = 0: 1 (188.42/80.9)
>|   |   |   c3 = 1
>|   |   |   |   c8 = 0: 0 (123.39/46.08)
>|   |   |   |   c8 = 1: 1 (82.94/39.94)
>|   c1 = 1
>|   |   c4 = 0
>|   |   |   c5 = 0: 0 (553.99/261.12)
>|   |   |   c5 = 1: 1 (160.77/50.18)
>|   |   c4 = 1: 1 (428.04/145.41)
>c2 = 1
>|   c1 = 0
>|   |   c3 = 0: 0 (314.88/156.67)
>|   |   c3 = 1
>|   |   |   c5 = 0
>|   |   |   |   c8 = 0: 0 (71.17/33.79)
>|   |   |   |   c8 = 1: 1 (81.92/32.77)
>|   |   |   c5 = 1: 1 (778.76/262.66)
>|   c1 = 1: 1 (3510.3/628.74)
>
>Number of Leaves  : 	12
>
>Size of the tree : 	23
>
>
>Cost Matrix
> 0 1
> 6 0
>
>
>
>
>=== Stratified cross-validation ===
>
>Correctly Classified Instances        6206               70.1243 %
>Incorrectly Classified Instances      2644               29.8757 %
>Kappa statistic                          0.3105
>Mean absolute error                      0.2988
>Root mean squared error                  0.5466
>Relative absolute error                 96.8048 %
>Root relative squared error            139.1545 %
>Total Number of Instances             8850     
>
>
>=== Confusion Matrix ===
>
>    a    b   <-- classified as
> 4949 2214 |    a = 0
>  430 1257 |    b = 1
>
>=====================
>This is the new one:
>
>Options: -C costs\costs6.txt -W
>weka.classifiers.trees.j48.J48 -- -M 15 -R -N 96 
>
>CostSensitiveClassifier using reweighted training
>instances
>
>weka.classifiers.trees.j48.J48 -R -N 96 -M 15
>
>Classifier Model
>J48 pruned tree
>------------------
>
>c2 = 0
>|   c1 = 0
>|   |   c5 = 0: 0 (2464.79/642.05)
>|   |   c5 = 1
>|   |   |   c3 = 0: 1 (188.42/80.9)
>|   |   |   c3 = 1
>|   |   |   |   c8 = 0: 0 (123.39/46.08)
>|   |   |   |   c8 = 1: 1 (82.94/39.94)
>|   c1 = 1
>|   |   c4 = 0
>|   |   |   c5 = 0: 0 (553.99/261.12)
>|   |   |   c5 = 1: 1 (160.77/50.18)
>|   |   c4 = 1: 1 (428.04/145.41)
>c2 = 1
>|   c1 = 0
>|   |   c3 = 0: 0 (314.88/156.67)
>|   |   c3 = 1
>|   |   |   c5 = 0
>|   |   |   |   c8 = 0: 0 (71.17/33.79)
>|   |   |   |   c8 = 1: 1 (81.92/32.77)
>|   |   |   c5 = 1: 1 (778.76/262.66)
>|   c1 = 1: 1 (3510.3/628.74)
>
>Number of Leaves  : 	12
>
>Size of the tree : 	23
>
>
>Cost Matrix
> 0 1
> 6 0
>
>
>
>
>=== Stratified cross-validation ===
>
>Correctly Classified Instances        5835               65.9322 %
>Incorrectly Classified Instances      3015               34.0678 %
>Kappa statistic                          0.2675
>Mean absolute error                      0.4214
>Root mean squared error                  0.4802
>Relative absolute error                136.5501 %
>Root relative squared error            122.2635 %
>Total Number of Instances             8850     
>
>
>=== Confusion Matrix ===
>
>    a    b   <-- classified as
> 4536 2627 |    a = 0
>  388 1299 |    b = 1
>
>
>thx.
>
>Cemal
>
>
>=====
>***************************************************
>** This mail was sent using 100% recycled bytes  **
>**            Save the internet!                 **
>***************************************************
>
>
>
>
>------------------------------
>
>_______________________________________________
>Wekalist mailing list
>Wekalist <at> list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>
>
>End of Wekalist Digest, Vol 12, Issue 1
>***************************************
>  
>
Ashraf Kibriya | 3 Feb 2004 00:55

Re: Wekalist Digest, Vol 12, Issue 1

Dear Weka list members,

Please accept my apologies for my last message.  I accidentally 
clicked on the wrong button on screen.

Regards,
Ashraf

Ashraf Kibriya wrote:

> R u in the Uni bro? wanna see u.
>
> wekalist-request <at> list.scms.waikato.ac.nz wrote:
>
