Re: Index of class attribute
Mark Hall <mhall <at> pentaho.com>
2012-03-02 08:47:56 GMT
On 27/02/2012, at 10:37 PM, Nicolas Martin wrote:
> Hello,
>
> My ARFF file looks like this:
>
> @relation test
>
> @attribute fulltoken string
> @attribute class {Y,N}
>
> @data
> {0 "information retrieval", 1 Y}
> {0 "harvesting information on the web", 1 Y}
> {0 "web mining and web retrieval", 1 N}
>
> When I apply the StringToWordVector filter, my ARFF file is transformed into:
>
> @relation 'test-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-T-I-N0-L-S-stemmerweka.core.stemmers.SnowballStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'
>
> @attribute class {Y,N}
> @attribute harvesting numeric
> @attribute information numeric
> @attribute retrieval numeric
> @attribute web numeric
> @attribute mining numeric
>
> @data
> {2 0.281047,3 0.281047}
> {1 0.7615,2 0.281047,4 0.281047}
> {0 N,3 0.281047,4 0.281047,5 0.7615}
>
> As you can see, only the N class value is retained (in the third instance). In fact, I have to declare my
> class attribute as @attribute class {DummyValue,Y,N} to get Y to appear in the transformed dataset:
>
> @attribute class {DummyValue,Y,N}
> @attribute harvesting numeric
> @attribute information numeric
> @attribute retrieval numeric
> @attribute web numeric
> @attribute mining numeric
>
> @data
> {0 Y,2 0.281047,3 0.281047}
> {0 Y,1 0.7615,2 0.281047,4 0.281047}
> {0 N,3 0.281047,4 0.281047,5 0.7615}
>
> What is the best way to proceed?
You do not need a dummy class value. The data is in sparse format, which means that zeros are not explicitly
stored. This goes for nominal attributes as well as numeric ones: the first declared value of a nominal
attribute (Y in your case) has internal index 0, so it is omitted from the sparse output. To convince
yourself that all the information is still there, just load the data back into the Explorer and examine
it via the "Edit" button.
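
To make the sparse-format rule concrete, here is a minimal sketch in plain Python (not Weka code) of how a sparse row can be expanded back into a dense one. The function name expand_sparse_row and the way attributes are described are hypothetical, chosen only for this illustration; it assumes values contain no commas or quoted strings.

```python
def expand_sparse_row(row, attributes):
    """Expand one sparse ARFF data row into a dense list of values.

    attributes: list of (name, kind) pairs, where kind is either the string
    "numeric" or a list of nominal labels in declaration order.
    Illustrative helper only; it does not handle quoted or comma-containing values.
    """
    # Every attribute starts at its "zero" value: 0.0 for numeric attributes,
    # the first declared label (internal index 0) for nominal attributes.
    dense = [kind[0] if isinstance(kind, list) else 0.0 for _, kind in attributes]

    body = row.strip().lstrip("{").rstrip("}").strip()
    if body:
        for entry in body.split(","):
            index, value = entry.strip().split(None, 1)
            kind = attributes[int(index)][1]
            dense[int(index)] = value if isinstance(kind, list) else float(value)
    return dense


# Header of the filtered dataset from the question.
attributes = [
    ("class", ["Y", "N"]),
    ("harvesting", "numeric"),
    ("information", "numeric"),
    ("retrieval", "numeric"),
    ("web", "numeric"),
    ("mining", "numeric"),
]

first = expand_sparse_row("{2 0.281047,3 0.281047}", attributes)
third = expand_sparse_row("{0 N,3 0.281047,4 0.281047,5 0.7615}", attributes)
print(first)
print(third)
```

The first row has no entry for attribute 0, so its class comes out as Y, the first label in {Y,N}; only N (index 1) ever needs to be written out explicitly.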
Cheers,
Mark.
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html