yavinty | 1 Jan 07:14 2011

How to predict next elements in pattern sequence?


Hello,

I have simple sequences of nominal values, similar to this one:

a b c a b d a b c a b d a b c a b

The above sequence alternates between two patterns, "a b c" and "a b d", so
the expected next element in the sequence is "d", since the current pattern
is "a b d".

I am looking for a method to do this kind of sequence prediction using Weka.
What should I be looking for in Weka? GSP?

Thanks!
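
For illustration, here is one way the task above could be framed outside of any particular Weka scheme: count which symbol followed each window of the last few symbols, and predict the most frequent follower of the current window. This is a plain-Java sketch; the class and method names are hypothetical and it is not a Weka API. Note that a window of two symbols is ambiguous for this sequence ("a b" is followed by "c" more often than by "d"), while a window of three ("c a b") has only ever been followed by "d".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Weka: predicts the next symbol from the last `order` symbols. */
class NextSymbolPredictor {

    /**
     * Counts which symbol followed each window of `order` symbols in the sequence,
     * then returns the most frequent follower of the final window (null if unseen).
     */
    static String predictNext(List<String> seq, int order) {
        if (seq.size() < order) {
            return null;
        }
        Map<String, Map<String, Integer>> followers = new HashMap<>();
        for (int i = 0; i + order < seq.size(); i++) {
            String context = String.join(" ", seq.subList(i, i + order));
            String next = seq.get(i + order);
            followers.computeIfAbsent(context, k -> new HashMap<>())
                     .merge(next, 1, Integer::sum);
        }
        String last = String.join(" ", seq.subList(seq.size() - order, seq.size()));
        Map<String, Integer> counts = followers.get(last);
        if (counts == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> seq = Arrays.asList("a b c a b d a b c a b d a b c a b".split(" "));
        System.out.println(predictNext(seq, 2)); // prints "c": "a b" was followed by c three times, d twice
        System.out.println(predictNext(seq, 3)); // prints "d": "c a b" was always followed by d
    }
}
```

The same windowing could also be used to build a table of lagged nominal attributes with the next symbol as the class, which a classifier for nominal classes could then be trained on.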
-- 
View this message in context: http://old.nabble.com/How-to-predict-next-elements-in-pattern-sequence--tp30566404p30566404.html
Sent from the WEKA mailing list archive at Nabble.com.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.scms.waikato.ac.nz
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Myse Elmadani | 1 Jan 07:42 2011

Re: Sparse ARFF files

On 31 December 2010 21:35, Mark Hall <mhall <at> pentaho.com> wrote:
> This is nearly correct for using sparse format with Apriori. You just need to declare your attributes to have two values, e.g.
>
> @ATTRIBUTE att1 {absent, present}
>
> and then just specify the "present" value where appropriate in the sparse instance representation. The reason for this is that sparse instances don't store zeros (i.e. the first declared value for nominal attributes) explicitly.
>
> You also need to make sure that you use the -Z flag with Apriori, which tells it to treat zeros (i.e. first values) as missing/absent from the item sets so that you don't get "absent" values appearing in the large item sets and rules.
>
> Cheers,
> Mark.
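
As a rough sketch of the format described above (the class and method names are hypothetical, the relation name is made up, and the attribute names att1, att2, ... simply extend the example declaration): each attribute is declared with "absent" first, and a sparse instance lists only the zero-based indices of the attributes whose value is "present".

```java
/** Hypothetical helper, not part of Weka: writes market-basket rows in sparse ARFF form. */
class SparseArffSketch {

    /** Formats one transaction as a sparse ARFF instance; only "present" items are written. */
    static String toSparseInstance(boolean[] present) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < present.length; i++) {
            if (!present[i]) {
                continue; // "absent" is the first declared value (the zero), so it is not stored
            }
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(i).append(" present"); // sparse ARFF uses zero-based attribute indices
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        boolean[] basket = {false, true, false, true};
        System.out.println("@RELATION baskets");
        for (int i = 0; i < basket.length; i++) {
            System.out.println("@ATTRIBUTE att" + (i + 1) + " {absent, present}");
        }
        System.out.println("@DATA");
        System.out.println(toSparseInstance(basket)); // prints {1 present, 3 present}
    }
}
```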

I have corrected the format for use with Apriori. However, as I am using it from within RapidMiner, I have yet to figure out how to set the -Z flag, so I might use the RapidMiner FP-Growth algorithm instead. Thank you for your help anyway.

Myse

inception | 1 Jan 21:49 2011

Some Confusion with Evaluation framework in Weka API


Hi,

I am slightly confused about the Evaluation framework in the Weka API. I am
using contact-lens.arff. My code is as follows:

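import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;

// clens = load arff(contact-lens.arff ....), with the class index set

Evaluation evaluation = new Evaluation(clens);
evaluation.useNoPriors();

// 3-fold cross-validation; new Random(7) seeds the shuffling into folds
J48 j48 = new J48();
evaluation.crossValidateModel(j48, clens, 3, new Random(7));

// print statistics, e.g. with evaluation.toSummaryString()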

I am having trouble with the random seed value. If I change the seed (7, 8,
9), I get different values for ROC area, RMS error and a few other
statistics. Is this usual? In my opinion these values should be constant
regardless of the random seed. Can you please explain what I am missing? The
same happens with other algorithms too (I tested ID3 and NN).

Thanks.

Bernhard Pfahringer | 1 Jan 22:47 2011

Re: Some Confusion with Evaluation framework in Weka API

> I am having trouble with the random seed value. If I change the seed (7, 8,
> 9), I get different values for ROC area, RMS error and a few other
> statistics. Is this usual? In my opinion these values should be constant
> regardless of the random seed.

Why do you expect the values to be constant?
If they were constant, why would there be a random seed?

Bernhard

---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard                  +64 7 838 4041

inception | 2 Jan 00:12 2011

Re: Some Confusion with Evaluation framework in Weka API


Thanks, Bernhard, for the quick reply. I think I am overlooking some basic
idea. Please explain what is wrong with my assumption: given the same
confusion matrix (M1 and M2) in both cases, with the same data and algorithm
but different random seeds (8 and 9, as before), why am I getting two
different values for AUC and mean absolute error?

I was curious because with random seed 1 I get an AUC of 0.937 and with seed
3 I get 0.874. The confusion matrix is the same in both cases. Which value
should I trust, and why?

Can you please clear up my confusion?

Thanks in advance.

Bernhard Pfahringer | 2 Jan 00:45 2011

Re: Some Confusion with Evaluation framework in Weka API

> Thanks, Bernhard, for the quick reply. I think I am overlooking some basic
> idea. Please explain what is wrong with my assumption: given the same
> confusion matrix (M1 and M2) in both cases, with the same data and algorithm
> but different random seeds (8 and 9, as before), why am I getting two
> different values for AUC and mean absolute error?
>

AUC is about ranking: it uses the predicted probabilities to sort your
examples. Accuracy (and the confusion matrix) depends on a specific threshold.
So if your probabilities sort the examples differently in different runs,
but leave each example on the same side of the threshold, you can get
exactly the same accuracy but different AUC values.
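
To make this concrete, here is a small self-contained sketch (plain Java, not Weka code; the scores and the 0.5 threshold are made up for illustration, and it uses two classes for simplicity). Both runs give the same confusion matrix at the threshold, but they rank positives against negatives differently, so the AUC differs.

```java
import java.util.Arrays;

/** Hypothetical illustration, not Weka code: same confusion matrix, different AUC. */
class AucVsAccuracy {

    /** AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5. */
    static double auc(double[] scores, boolean[] positive) {
        double correct = 0;
        int pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) correct += 1.0;
                else if (scores[i] == scores[j]) correct += 0.5;
            }
        }
        return correct / pairs;
    }

    /** Confusion matrix {TP, FN, FP, TN}, predicting positive when score >= threshold. */
    static int[] confusion(double[] scores, boolean[] positive, double threshold) {
        int[] m = new int[4];
        for (int i = 0; i < scores.length; i++) {
            boolean predicted = scores[i] >= threshold;
            if (positive[i]) m[predicted ? 0 : 1]++;
            else m[predicted ? 2 : 3]++;
        }
        return m;
    }

    public static void main(String[] args) {
        boolean[] positive = {true, true, true, false, false, false};
        double[] runA = {0.9, 0.8, 0.4, 0.6, 0.3, 0.2};   // made-up scores for one run
        double[] runB = {0.9, 0.55, 0.1, 0.95, 0.3, 0.2}; // made-up scores for another run

        // Both runs give {2, 1, 1, 2} at threshold 0.5 ...
        System.out.println(Arrays.toString(confusion(runA, positive, 0.5)));
        System.out.println(Arrays.toString(confusion(runB, positive, 0.5)));
        // ... but 8 of 9 pairs are ranked correctly in run A and only 4 of 9 in run B.
        System.out.println(auc(runA, positive));
        System.out.println(auc(runB, positive));
    }
}
```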

> I was curious because with random seed 1 I get an AUC of 0.937 and with seed
> 3 I get 0.874. The confusion matrix is the same in both cases. Which value
> should I trust, and why?

I suppose you are using a rather "unstable" algorithm, and/or a small number
of examples, and/or have a high number of class values. What you are seeing
is that cross-validation has some variance as well. If the variance is as high
as it seems in your case, I'd repeat at least ten times with a new seed each
time and take the average. BTW, this is the default for the Experimenter:
10 times 10-fold cross-validation, to get more robust estimates.
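
A minimal sketch of that procedure, assuming a hypothetical callback that runs one cross-validation for a given seed and returns the metric of interest (with Weka this could wrap crossValidateModel and, for example, areaUnderROC; here it is left abstract so the example stays self-contained). The folds method only illustrates why the seed matters: it controls how the instances are shuffled before they are split into folds. All names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.LongToDoubleFunction;

/** Hypothetical sketch, not Weka code: seed-dependent folds and averaging over seeds. */
class RepeatedCvSketch {

    /** Shuffles instance indices 0..n-1 with the given seed and deals them into k test folds. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    /** Runs one cross-validation per seed (1..repeats) and averages the returned metric. */
    static double meanOverSeeds(int repeats, LongToDoubleFunction cvRun) {
        double sum = 0;
        for (long seed = 1; seed <= repeats; seed++) {
            sum += cvRun.applyAsDouble(seed);
        }
        return sum / repeats;
    }

    public static void main(String[] args) {
        // Different seeds put different instances into each test fold,
        // so the models trained and tested in each fold differ as well.
        System.out.println(folds(24, 3, 7));
        System.out.println(folds(24, 3, 8));
    }
}
```

With a real evaluation plugged in, a call such as meanOverSeeds(10, seed -> runCv(seed)) would return the averaged estimate, where runCv is whatever method performs one cross-validation run with that seed.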

hth, Bernhard

---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard                  +64 7 838 4041

Chris Spencer | 2 Jan 00:47 2011

Continuous Variable Prediction

Hi,


Sorry if this is a newbie question, but what algorithms in Weka support predicting a continuous variable? I'm new to data mining, so I'm not sure what the formal terminology is. Going through the docs, it looks like nearly all the algorithms are intended for classification or assigning a discrete label to an observation, not predicting a continuous variable.

Regards,
Chris

Yavinty | 2 Jan 05:04 2011

Re: Continuous Variable Prediction


Hi Chris,

I am looking for an answer to a similar question about predicting sequences.
It seems that the current state of data mining is all about classification,
and that the concept of time and sequential relations is not addressed yet
(besides linear approximations such as regression or SVM).

Have you tried applying numerical extrapolation methods to predict your
variables?

Cheers,
Yavinty

inception | 2 Jan 05:20 2011

Re: Some Confusion with Evaluation framework in Weka API


I understand the concepts now.
Thanks.

Harri Saarikoski | 2 Jan 07:44 2011

Re: Continuous Variable Prediction



2011/1/2 Chris Spencer <chrisspen <at> gmail.com>
> Hi,
>
> Sorry if this is a newbie question, but what algorithms in Weka support predicting a continuous variable? I'm new to data mining, so I'm not sure what the formal terminology is. Going through the docs, it looks like nearly all the algorithms are intended for classification or assigning a discrete label to an observation, not predicting a continuous variable.


Yes, only roughly one third are regression capable, i.e. can be used to predict a continuous class variable, but that's still a lot to choose from.

To find out which classifiers are capable, load your dataset in the Explorer and go to the Classify tab -> Classifier 'Choose'. Presuming your class variable is last, capable classifiers are marked in bold (selectable) and the rest are greyed out (deselected). Alternatively, after selecting a classifier, left-click the classifier field -> Capabilities.
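
For readers new to the terminology: predicting a continuous (numeric) class is called regression. As a minimal illustration of what such a scheme does, here is a one-attribute least-squares fit in plain Java. It is only a sketch with made-up data and hypothetical names, not the implementation of any Weka classifier.

```java
/** Hypothetical illustration, not Weka code: simple least-squares regression on one attribute. */
class SimpleRegressionSketch {

    /** Fits y = intercept + slope * x by ordinary least squares; returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {meanY - slope * meanX, slope};
    }

    /** Predicts a continuous value for a new x using the fitted {intercept, slope}. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};   // made-up attribute values
        double[] y = {3, 5, 7, 9};   // made-up numeric class values (y = 1 + 2x)
        double[] model = fit(x, y);
        System.out.println(predict(model, 5)); // the prediction is a number, not a class label
    }
}
```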

best, Harri


--
-----------------
Harri M.T. Saarikoski
CEO, IdealX Corporation
Espoo, Finland
www.idealpredictions.com (English)
www.idealpredictions.com/fi (Suomi)
