ouyeyu panyu | 1 Jun 2012 01:38

why weka tamper data

Hi There,

I ran into a problem when using weka-3-7-5 to read a column from a MySQL table.

select sl_yn_nm from backtest_20120112_g1_11 limit 0,10;
The type of "sl_yn_nm" is varchar(1) default NULL. It holds one of two values: 0 or 1.
The results should look like the left column below (green in the original message); unfortunately, SOMETIMES what Weka reads looks like the right column (yellow in the original message).
0      1
1      0
0      1
0      1
0      1
1      0
0      1
1      0
0      1
0      1


Do you know why this happened?
Thanks.


_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Michael Prescott | 1 Jun 2012 02:32

Re: why weka tamper data

Your SQL query is unreliable.  SQL doesn't guarantee an order unless you specify one.  Different drivers, or the database itself, could return the results in a different order, which is legal.


If you require a consistent ordering, use a query like:

select sl_yn_nm
from backtest_20120112_g1_11
order by /* some business id */
limit 0,10;

ouyeyu panyu | 1 Jun 2012 04:15

Re: why weka tamper data

Hi Michael,

Thank you for the quick response.
In fact, I have a unique id in my original query.

select uuid, sl_yn_nm from backtest_20120112_g1_11 limit 0,10;
10001    0      1.0                 //for this unique id 10001, "0" in table was transformed to 1.0 by weka
10002    1      0.0                 //for this unique id 10002, "1" in table was transformed to 0.0 by weka
10003    0      1.0
10004    0      1.0
10005    0      1.0
10006    1      0.0
10007    0      1.0
10008    1      0.0
10009    0      1.0
10010    0      1.0

It seems Weka represents strings as numbers internally, but why is "0" represented as 1.0 and vice versa?


Mark Hall | 1 Jun 2012 04:30

Re: why weka tamper data


On 1/06/12 2:15 PM, ouyeyu panyu wrote:
> It seems weka represents strings as numeric internally, but why "0" is
> represented as 1.0 and vice versa?

How are you printing out the results from Weka? You shouldn't be seeing
floating point numbers if the database type in question is varchar;
this gets mapped to nominal attribute values by Weka. So, in this case
the output should be "1"s and "0"s, not "1.0" and "0.0". If you are just
printing out the raw values in the instances then you will see this,
because (as you surmised) Weka represents everything internally as
double values. In the case of nominal string attributes, the double
values are actually indexes into the list of values for the
corresponding attribute.

So my guess is that Michael is correct: the order in which the data
arrives from the database is changing from run to run. Weka's
InstanceQuery class simply collects up nominal values in the order that
they arrive. So the index for value "0" might be 0 on one run and 1 on
the next.
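
To illustrate the point above, here is a minimal sketch in plain Java (no Weka classes) of how assigning indexes to labels in first-seen order makes the internal number depend on the order in which rows arrive. The class NominalIndexDemo and its encode/decode helpers are hypothetical names invented for this illustration; they are not part of Weka's API and only mimic the behaviour described for InstanceQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not Weka code.
class NominalIndexDemo {

    // Encodes each row as the index of its label, assigning indexes in the
    // order in which distinct labels are first seen. Labels that have not
    // been seen before are appended to 'labels'.
    static double[] encode(List<String> rows, List<String> labels) {
        double[] encoded = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            int index = labels.indexOf(rows.get(i));
            if (index < 0) {                 // label not seen before
                labels.add(rows.get(i));
                index = labels.size() - 1;
            }
            encoded[i] = index;
        }
        return encoded;
    }

    // Maps an internal index back to the label it stands for.
    static String decode(double value, List<String> labels) {
        return labels.get((int) value);
    }

    public static void main(String[] args) {
        // The same three rows, delivered in two different orders.
        List<String> runA = Arrays.asList("0", "1", "0");
        List<String> runB = Arrays.asList("1", "0", "0");

        List<String> labelsA = new ArrayList<>();
        List<String> labelsB = new ArrayList<>();
        double[] rawA = encode(runA, labelsA);
        double[] rawB = encode(runB, labelsB);

        // The raw index of label "0" differs between the two runs ...
        System.out.println("run A: labels=" + labelsA + " raw=" + Arrays.toString(rawA));
        System.out.println("run B: labels=" + labelsB + " raw=" + Arrays.toString(rawB));
        // ... but decoding through the label list recovers the stored value.
        System.out.println("run B, second row decodes to " + decode(rawB[1], labelsB));
    }
}
```

In Weka itself, the analogous step would be to print the label rather than the raw double, for example with Instance.stringValue(attIndex) instead of Instance.value(attIndex), and to add an ORDER BY clause to the query so that the arrival order no longer varies.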

Cheers,
Mark.

Giacomo Migliorini | 1 Jun 2012 10:26

Negative Entropy

Hi,
I'm implementing a new attribute selector in Weka and have run into a problem. I need to calculate the entropy of each attribute in the dataset, and I'm using the ContingencyTables.entropy method. The problem is that the calculated entropy is sometimes negative, which should not be possible in theory.

What's wrong?

Regards

--
========================================
* Giacomo Migliorini
*
* via Felice Cavallotti 62, 50052 CERTALDO ITALY
* +39 329 8637321
*
* Twitter:      twitter.com/jack1852
* Skype:      giacomomigliorini
========================================

Stelios Togias | 1 Jun 2012 14:40

Can't have more folds than instances

Hi,


The Javadocs for the EM clusterer in Weka 3.6.6 state:

"The number of folds is fixed to 10, as long as the number of instances in the training set is not smaller 10. If this is the case the number of folds is set equal to the number of instances."  

Shouldn't this mean that I should not receive this error (the one in the subject line) when I use fewer than 10 instances?

On the other hand I am not supplying a separate training and test set. Could this be the reason?

Thanks
Stelios
Sebastian Luna Valero | 1 Jun 2012 16:30

Re: Can't have more folds than instances


Hi Stelios,

I think that the message that you get from WEKA is correct. I will try 
to explain myself but correct me if I am wrong, please.

If you have a set U with 2 instances U = {1, 2}, you may apply 2-fold 
cross-validation like this:

1. Select Te1 = {1} as testing set and Tr1 = {2} as training set, and
obtain the classification results.
2. Select Te2 = {2} as testing set and Tr2 = {1} as training set, and
obtain the classification results.
3. Average results of the above classifications.

However, are you able to apply 3-fold cross-validation with U? The
answer should be "no", since you need at least one more instance to
select a new (testing set, training set) partition that has not been
selected yet.

In general, you have the following:

In k-fold cross-validation, you need x >= k instances in the original 
set U, since you select k disjoint sets for testing (Te1, ..., Tek) and 
their corresponding training sets (Tr1 = U - Te1, ..., Trk = U - Tek).

In fact, when x = k you are actually applying leave-one-out 
cross-validation (LOOCV).

On the other hand, if you have x < k instances in your data set, it
is not possible to select k disjoint, non-empty sets for testing.

Concretely, in your case you are applying 10-fold cross-validation to a
set that has fewer than 10 instances and, as explained above, that is
not possible. My advice is to count the number of instances in your
dataset (x) and then select the number of folds accordingly (x >= k).
For example, if you have 7 instances in your dataset, select
k = 7, 6, 5, 4, 3 or 2 (depending on your needs). Normally, when the
number of instances is so low, I would recommend LOOCV (i.e.,
x = k = 7).
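
The x >= k constraint described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class FoldCountDemo and the methods foldSizes and usableFolds are hypothetical names, not Weka's API, and the way the remainder is spread over the first folds is just one reasonable choice.

```java
import java.util.Arrays;

// Hypothetical illustration only: this is not Weka code.
class FoldCountDemo {

    // Returns the size of each of the k disjoint test folds for x instances.
    // Throws if k > x, because at least one fold would have to be empty.
    static int[] foldSizes(int numInstances, int numFolds) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Need at least 2 folds");
        }
        if (numFolds > numInstances) {
            throw new IllegalArgumentException("Can't have more folds than instances");
        }
        int[] sizes = new int[numFolds];
        for (int fold = 0; fold < numFolds; fold++) {
            sizes[fold] = numInstances / numFolds;
            // Spread the remainder over the first folds, one instance each.
            if (fold < numInstances % numFolds) {
                sizes[fold]++;
            }
        }
        return sizes;
    }

    // Picks a usable fold count: the requested one, capped at x.
    static int usableFolds(int requestedFolds, int numInstances) {
        return Math.min(requestedFolds, numInstances);
    }

    public static void main(String[] args) {
        int x = 7;
        int k = usableFolds(10, x);   // capped at 7, i.e. leave-one-out
        System.out.println("folds=" + k + " sizes=" + Arrays.toString(foldSizes(x, k)));
        System.out.println("3 folds: " + Arrays.toString(foldSizes(x, 3)));
    }
}
```

Capping the requested fold count at the number of instances is what the quoted Javadoc describes; whether a particular code path actually applies that cap is a separate question that this sketch does not answer.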

I hope my explanation was clear for you.

Regards,
Sebastian.

Firoj Alam | 1 Jun 2012 17:02

Help: Labeling test file using RandomSubspace and REPTree

Hi,

Can anybody tell me how to label a test file when two classifiers are involved? For example, I train the system with the following command:

java -Xmx1024m weka.classifiers.meta.RandomSubSpace -P 0.5 -S 1 -I 10 -W weka.classifiers.trees.REPTree -v -o -i -t corpus/data.arff -d models/model -- -L -25 -P

But it is not clear to me how to label the test file using this model, since I am using two classifiers here. Can anybody please check whether the following command is OK?

java -Xmx1024m weka.classifiers.meta.RandomSubSpace -l models/model -T corpus/tst.arff -p 0 -i


Thanks in advance

Firoj


Amalia Carolina Canavire | 1 Jun 2012 20:40

selection attribute + clustering

Thanks. But I have one doubt: what does "objective class" mean in attribute selection?
Is it the first attribute to be analyzed?

2012/5/31 Wikispaces <do-not-reply <at> wikispaces.com>:
> sudheep, a member of Wikispaces, has sent you a message. Please do not reply
> to this email. To respond, follow the link below to view the message in your
> browser.
>
> Clustering and classification are different, so for clustering you should
> not use classification algorithms but clustering algorithms. Also, to find
> out the meaning of any term in an algorithm or its results, there is a
> "More" option in Weka when you double-click the name of the algorithm after
> you select it. Chi-square analysis is an option for attribute
> selection/analysis, which you can do using SPSS or Weka.

--
**************   :) smile, it looks good on you :):):):): amy cgc
**************************



Road Tang | 2 Jun 2012 03:32

[bug][weka3.6.7] DatabaseLoader fails to getDataSet after setUrl().

Hi folks,

I am trying to load instances from a table, but getDataSet() always
fails, while getStructure() works fine.

Is this a bug?

==============getDataSet failure  ===============
-- Code
	// Needs: import weka.core.converters.DatabaseLoader;
	public static void testDBLoader() throws Exception {
		String host = "localhost";
		String user = "roadt";
		String pwd = "pass";
		String db = "wp";
		String query = "select post_type, post_title, post_status from wp_posts";

		DatabaseLoader loader = new DatabaseLoader();

		// Point the loader at the MySQL database and set the query.
		String url = String.format("jdbc:mysql://%s/%s", host, db);
		loader.setSource(url, user, pwd);
		loader.setQuery(query);

		System.out.println(loader.getDataSet());
	}

--  Output

--- Exception caught ---

Message:   No suitable driver found for jdbc:idb=experiments.prp
SQLState:  08001
ErrorCode: 0

null

It looks like getDataSet() ignores the database connection that I
set (via setUrl()/setSource()): the driver error refers to
jdbc:idb=experiments.prp rather than to my MySQL URL.

===== replace getDataSet with getStructure ============

-- Code
	// Needs: import weka.core.converters.DatabaseLoader;
	public static void testDBLoader() throws Exception {
		String host = "localhost";
		String user = "roadt";
		String pwd = "pass";
		String db = "wp";
		String query = "select post_type, post_title, post_status from wp_posts";

		DatabaseLoader loader = new DatabaseLoader();

		// Point the loader at the MySQL database and set the query.
		String url = String.format("jdbc:mysql://%s/%s", host, db);
		loader.setSource(url, user, pwd);
		loader.setQuery(query);

		System.out.println(loader.getStructure());
	}

-- Output
@relation wp_posts

@attribute post_type string
@attribute post_title string
@attribute post_status string

@data

Regards,
-Road Tang.