Suranga Kasthurirathne | 28 Jun 00:15 2016

"Problem evaluating classifier. train and test set are not compatible"


Hi all,

I have the following:

a) a training dataset consisting of an id, 5 numeric columns, and a results column containing N or P values.
b) a test dataset consisting of an id, the same 5 columns, and a dummy results column that also contains N or P values.

I'm trying to train a random forest model using dataset A and use it to predict results for dataset B. However, I keep getting a very frustrating "Problem evaluating classifier. train and test set are not compatible" error message.

If I train and test models separately using holdout, each dataset works fine. But when I try to train on one dataset and test on the other, it just collapses.
Can anyone suggest why I keep getting this message?



--
Best Regards,
Suranga
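
As far as I know, Weka raises this error when the headers of the two datasets are not identical: the same number of attributes, in the same order, with the same names and types, and, for nominal attributes, the same labels in the same order. Typical culprits would be an id column that ends up nominal with different labels in each file, or a class declared as {N,P} in one file and {P,N} in the other. The sketch below is not Weka code. It uses a hypothetical, simplified Attr class to show the kind of comparison involved and to report the first difference.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderCheck {

    /** Hypothetical, simplified stand-in for one attribute declaration (not a Weka class). */
    public static final class Attr {
        final String name;
        final String type;          // "numeric" or "nominal"
        final List<String> labels;  // nominal labels in declaration order; empty for numeric

        public Attr(String name, String type, String... labels) {
            this.name = name;
            this.type = type;
            this.labels = Arrays.asList(labels);
        }
    }

    /** Returns a description of the first difference, or null if the headers match. */
    public static String firstMismatch(List<Attr> train, List<Attr> test) {
        if (train.size() != test.size()) {
            return "attribute count differs: " + train.size() + " vs " + test.size();
        }
        for (int i = 0; i < train.size(); i++) {
            Attr a = train.get(i);
            Attr b = test.get(i);
            if (!a.name.equals(b.name)) {
                return "attribute " + (i + 1) + ": name " + a.name + " vs " + b.name;
            }
            if (!a.type.equals(b.type)) {
                return "attribute " + a.name + ": type " + a.type + " vs " + b.type;
            }
            if (!a.labels.equals(b.labels)) {
                return "attribute " + a.name + ": labels " + a.labels + " vs " + b.labels;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Attr> train = Arrays.asList(
            new Attr("id", "numeric"),
            new Attr("x1", "numeric"),
            new Attr("result", "nominal", "N", "P"));
        List<Attr> test = Arrays.asList(
            new Attr("id", "numeric"),
            new Attr("x1", "numeric"),
            new Attr("result", "nominal", "P", "N")); // same labels, different order
        System.out.println(firstMismatch(train, test));
    }
}
```

Comparing the two ARFF headers side by side in a text editor serves the same purpose. If the id column turns out to be the only difference, removing it from both files before training is one way to sidestep the problem.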
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Mark Hall | 27 Jun 10:47 2016

Re: Problem with prediction appender to save final result in knowledge flow

Bummer. There is a concurrency issue with the InputMappedClassifier at the moment, in that its classifyInstance() and distributionForInstance() methods are not thread safe. Your flow will execute correctly if either the ClassifierPerformanceEvaluator or the PredictionAppender step is removed. I've just committed a fix to stable-3-8 and trunk for the problem in InputMappedClassifier. You can get the fix in the next nightly snapshot.

Thanks for the bug report!

Cheers,
Mark.
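
For readers unfamiliar with this class of bug: a method is not thread safe when concurrent calls share mutable state, so two steps that query the same classifier at once can corrupt each other's intermediate values. The sketch below is not the actual InputMappedClassifier fix. It uses a hypothetical MappingScorer with a reused scratch buffer and shows one common remedy, serializing access with synchronized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedBufferDemo {

    /** Hypothetical stand-in for a wrapper that remaps inputs through a reused buffer. */
    public static class MappingScorer {
        private final double[] scratch = new double[3]; // shared mutable state

        // synchronized: only one thread at a time may fill and read the scratch buffer
        public synchronized double score(double[] input) {
            for (int i = 0; i < scratch.length; i++) {
                scratch[i] = input[i];
            }
            double sum = 0;
            for (double v : scratch) {
                sum += v;
            }
            return sum;
        }
    }

    /** Submits `tasks` scoring jobs to a small pool; results come back in submission order. */
    public static List<Double> scoreConcurrently(MappingScorer scorer, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final double base = t;
                Callable<Double> job = () -> scorer.score(new double[] {base, base, base});
                futures.add(pool.submit(job));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Double> results = scoreConcurrently(new MappingScorer(), 1000);
        boolean allCorrect = true;
        for (int i = 0; i < results.size(); i++) {
            allCorrect &= results.get(i) == 3.0 * i;
        }
        System.out.println("all results correct: " + allCorrect);
    }
}
```

Without the synchronized keyword, two threads could interleave their writes to the scratch buffer and return sums built from a mix of both inputs.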

 

On 27/06/16 2:23 pm, "Yan Luo" <yluo2k <at> yahoo.com> wrote:

Thanks much.

I have modified my training and testing sets, and the NaiveBayes classifier is working now. But the issue remains when I change NaiveBayes to InputMappedClassifier(NaiveBayes). It seems the modelpath parameter of InputMappedClassifier is somehow related to this issue. The flow file and dataset are attached.

Yan

On Sunday, June 26, 2016 6:32 PM, Mark Hall <mhall <at> waikato.ac.nz> wrote:

I can't see a problem with PredictionAppender at the moment. Perhaps you can share your flow file and dataset for debugging purposes?

Cheers,
Mark.

On 27/06/16, 12:31 PM, "Yan Luo" <wekalist-bounces <at> list.waikato.ac.nz on behalf of yluo2k <at> yahoo.com> wrote:

I am testing the Knowledge Flow for saving final prediction results, and something is hard to understand. I tried the NaiveBayes and MOA-NaiveBayes classifiers. With MOA-NaiveBayes, PredictionAppender successfully saved the final result to an ARFF file. But when I changed to NaiveBayes, with nothing else modified in the flow except the classifier, I received the following errors. I hope there is a simple fix.

Thanks,
Yan

17:08:45: [Basic] ClassifierPerformanceEvaluator$1834061622|Scheduling evaluation of fold/set 1 for execution
17:08:45: [ERROR] PredictionAppender$985544574|null
weka.core.WekaException
at weka.knowledgeflow.steps.PredictionAppender.processBatchClassifierCase(PredictionAppender.java:445)
at weka.knowledgeflow.steps.PredictionAppender.processIncoming(PredictionAppender.java:133)
at weka.knowledgeflow.StepManagerImpl.processIncoming(StepManagerImpl.java:1014)
at weka.knowledgeflow.BaseExecutionEnvironment$4.run(BaseExecutionEnvironment.java:382)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: weka.core.WekaException
at weka.knowledgeflow.steps.PredictionAppender.predictProbabilitiesClassifier(PredictionAppender.java:519)
at weka.knowledgeflow.steps.PredictionAppender.processBatchClassifierCase(PredictionAppender.java:416)
... 8 more
Caused by: java.lang.NullPointerException

Paul | 27 Jun 03:43 2016

Generate data set covering all permutations...

I can't seem to find a way in Weka to achieve this (perhaps for a good reason)...

I've got a classification data set of ~11 attributes. Each attribute takes one of 3-5 nominal values. The data set was created manually, and some instances are not covered. Is there a way to use Weka to generate a data set (given the allowed values per attribute) that covers the entire possible space?

Many thanks,

/paul
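
What is being asked for is the Cartesian product of the attribute domains. The sketch below is plain Java rather than a Weka feature. The attribute values in main are made up for illustration, and every domain is assumed to be non-empty. It steps through all combinations with an odometer-style index array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllCombinations {

    /** Enumerates every combination of one value per attribute (Cartesian product). */
    public static List<String[]> enumerate(List<String[]> domains) {
        List<String[]> rows = new ArrayList<>();
        int n = domains.size();
        int[] idx = new int[n]; // odometer: current value index for each attribute
        while (true) {
            String[] row = new String[n];
            for (int a = 0; a < n; a++) {
                row[a] = domains.get(a)[idx[a]];
            }
            rows.add(row);
            // advance the rightmost index, carrying to the left on overflow
            int pos = n - 1;
            while (pos >= 0 && ++idx[pos] == domains.get(pos).length) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return rows; // every index wrapped around: enumeration complete
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative domains only; replace with the real nominal values.
        List<String[]> domains = Arrays.asList(
            new String[] {"low", "medium", "high"},
            new String[] {"yes", "no"},
            new String[] {"red", "green"});
        for (String[] row : enumerate(domains)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

With 11 attributes of 3 to 5 values each, the full space has between 3^11 = 177,147 and 5^11 = 48,828,125 rows, so for the larger domains it may be better to write rows out as they are generated than to hold them all in a list. The generated rows carry no class label; those would still have to be assigned separately.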


Yan Luo | 27 Jun 02:31 2016

Problem with prediction appender to save final result in knowledge flow

I am testing the Knowledge Flow for saving final prediction results, and something is hard to understand. I tried the NaiveBayes and MOA-NaiveBayes classifiers. With MOA-NaiveBayes, PredictionAppender successfully saved the final result to an ARFF file. But when I changed to NaiveBayes, with nothing else modified in the flow except the classifier, I received the following errors. I hope there is a simple fix.

Thanks,
Yan

17:08:45: [Basic] ClassifierPerformanceEvaluator$1834061622|Scheduling evaluation of fold/set 1 for execution
17:08:45: [ERROR] PredictionAppender$985544574|null
weka.core.WekaException
at weka.knowledgeflow.steps.PredictionAppender.processBatchClassifierCase(PredictionAppender.java:445)
at weka.knowledgeflow.steps.PredictionAppender.processIncoming(PredictionAppender.java:133)
at weka.knowledgeflow.StepManagerImpl.processIncoming(StepManagerImpl.java:1014)
at weka.knowledgeflow.BaseExecutionEnvironment$4.run(BaseExecutionEnvironment.java:382)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: weka.core.WekaException
at weka.knowledgeflow.steps.PredictionAppender.predictProbabilitiesClassifier(PredictionAppender.java:519)
at weka.knowledgeflow.steps.PredictionAppender.processBatchClassifierCase(PredictionAppender.java:416)
... 8 more
Caused by: java.lang.NullPointerException
Yan Luo | 26 Jun 20:22 2016

Huge difference of making prediction by using Weka explorer and knowledge flow

I ran a simple comparison of predictions made with the Explorer and with the Knowledge Flow, using exactly the same training and test data and the same classifier (NaiveBayes). Both setups are very simple. The 10-fold cross-validation results are the same, but the prediction results on the test data (1000 labeled cases) are totally different: overall accuracy drops from 75% (Explorer) to 50%. Parameters and random seeds are all the same. Has anyone had the same issue? I am wondering whether I am missing something in how I set up the test.

Thanks,
Yan


Mark Hall | 26 Jun 05:11 2016

Re: KnowledgeFlow overwrites DatabaseSaver table name with relation name

Thanks for the bug report. I’ve committed the fix to trunk and stable-3-8. It will be available in the next nightly snapshot build.

Cheers,
Mark.

On 17/06/16 7:04 am, "Wieslaw Pietruszkiewicz" <wekalist-bounces <at> list.waikato.ac.nz on behalf of wieslaw <at> pietruszkiewicz.com> wrote:

 

Hello everyone,

I’ve noticed a bug in KnowledgeFlow that causes DatabaseSaver to overwrite the table name with the relation name, regardless of the value of the “Use relation name” switch. This happens in the 3.8 branch as well as in the 3.9 branch.

The error is caused by lines 193-209 in weka.knowledgeflow.steps.Saver. Currently, the table name is always set to fileName, including when m_wrappedAlgorithm is an instance of DatabaseConverter. IMO, fileName should only be used if m_isDBSaver is false; otherwise tableName should be set to relationName only if getRelationForTableName is true. I hope I understood the logic correctly.

Please find proposed patch attached.

Cheers,

Wieslaw
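
The rule proposed in this message can be written as a small decision function. This is a hypothetical helper, not the attached patch and not the actual Saver code. The method and parameter names are illustrative, and configuredTableName stands for whatever table name the user set on the saver.

```java
public class TableNameChoice {

    /**
     * Hypothetical helper mirroring the proposed rule; names are illustrative
     * and do not correspond to the real fields of the Saver step.
     */
    public static String chooseName(boolean isDBSaver, boolean useRelationForTableName,
                                    String configuredTableName, String relationName,
                                    String fileName) {
        if (!isDBSaver) {
            return fileName; // file-based savers keep using the file name
        }
        // database savers: relation name only when the switch is on
        return useRelationForTableName ? relationName : configuredTableName;
    }

    public static void main(String[] args) {
        System.out.println(chooseName(true, false, "results", "iris", "out")); // results
        System.out.println(chooseName(true, true, "results", "iris", "out"));  // iris
        System.out.println(chooseName(false, true, "results", "iris", "out")); // out
    }
}
```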

 

Andrew Olney | 26 Jun 03:28 2016

Using MetaCost under FilteredClassifier with Groovy

I'm doing custom cross validation on an attribute using 
FilteredClassifier with Groovy. That works fine.

When I add MetaCost, I end up with ZeroR-type behavior that is 
insensitive to changes in the cost matrix.

The code is below. As you can see in the comments, if I set j48 as the 
classifier for FilteredClassifier, it works fine. If I set j48 as the 
classifier for MetaCost, and then set MetaCost as the classifier for 
FilteredClassifier, things go wrong (as above).

Any thoughts would be appreciated.

import weka.classifiers.Classifier
import weka.classifiers.Evaluation
import weka.core.converters.ConverterUtils.DataSource
import weka.core.Utils
import weka.filters.Filter
import weka.filters.unsupervised.instance.RemoveWithValues
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.MultiFilter
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.supervised.instance.SpreadSubsample
import weka.classifiers.meta.MetaCost
import weka.classifiers.CostMatrix

//key parameters
arffPath = "/z/aolney/research_projects/class5/data/master-merged-data/nd-0316/discourse-features-062116.arff"
classIndex = 24 //zero index
//removeIndices = "1-24,26,27,28" //basic remove class id type info
//remove class id type info and anything not selected by CFS above
removeIndices = "1-24,26,27,28,31-47,49-64,66-110,112-114,116-136,139,142-144,146-149,151-155,157-169,171-173,175,177-205,207-218,220-222,224-228,230-235"

cvIndex = 4 //defines folds for CV; see reasonable values below
//0 SessionID
//1 EPS
//2 Class
//3 School
//4 Teacher

//load data
data = DataSource.read(arffPath)
data.setClassIndex(classIndex) //auth is 24, uptake 23

//prepare to cv
eval = new Evaluation(data)
foldValues = Collections.list( data.attribute(cvIndex).enumerateValues())
foldCount = data.attribute(cvIndex).numValues()

println "Dataset: " + data.relationName()
println "Class: " + data.classAttribute().name() + "/" + (classIndex)
println "Fold attribute: " + data.attribute(cvIndex).name() + "/" + (cvIndex)
println "Folds: " + foldCount

//iterate over folds
println "Processing ..."
//for ( v in foldValues ) {
for( i = 0; i < foldCount; i++ ) {
   println "Fold: " + i + "/" + (foldValues[i])

   // setup filters
   filterTrain = new RemoveWithValues()
   //the attribute on which we cv
   filterTrain.setAttributeIndex("" + (cvIndex+1))
   //the INDEX of attribute value on which we cv
   filterTrain.setNominalIndices("" + (i+1))
   filterTrain.setInputFormat(data)
   filterTest = new RemoveWithValues()
   filterTest.setAttributeIndex("" + (cvIndex+1))
   filterTest.setNominalIndices("" + (i+1))
   filterTest.setInvertSelection(true) //note the inversion
   filterTest.setInputFormat(data)

   // generate data
   train = weka.filters.Filter.useFilter(data, filterTrain)
   test  = weka.filters.Filter.useFilter(data, filterTest)

   // train + evaluate classifier, removing attributes we don't want and subsampling as needed
   rm = new Remove()
   rm.setAttributeIndices(removeIndices)
   sub = new SpreadSubsample()
   sub.setDistributionSpread(1.0) //uniform

   def filterArray = new weka.filters.Filter[1]//[2]
   filterArray[0] = rm
   //filterArray[1] = sub
   multiFilter = new MultiFilter()
   multiFilter.setFilters(filterArray)

   // create the model
   j48 = new J48()
   metacost = new MetaCost()
   metacost.setCostMatrix( CostMatrix.parseMatlab("[0.0 10.0; 1.0 0.0]"))
   metacost.setClassifier(j48)

   fc = new FilteredClassifier();
   fc.setFilter(multiFilter);
   fc.setClassifier(metacost); //j48);

   fc.buildClassifier(train)
   eval.evaluateModel(fc, test)
}
println ""

///////////////////////
// output evaluation //
///////////////////////

print eval.toSummaryString()
print eval.toMatrixString()
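
For reference when choosing the cost matrix: MetaCost relabels each training instance with the class that minimizes expected cost and then trains the base learner on the relabeled data. The sketch below is plain Java, not Weka code. It assumes that rows of the cost matrix are actual classes and columns are predicted classes, which should be checked against the CostMatrix documentation. Under that assumption, the matrix [0.0 10.0; 1.0 0.0] selects the second class only when its estimated probability exceeds 10/11.

```java
public class MinExpectedCost {

    /**
     * Returns the index of the prediction with the lowest expected cost.
     * Assumed convention: cost[actual][predicted].
     */
    public static int bestClass(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int pred = 0; pred < cost[0].length; pred++) {
            double expected = 0;
            for (int actual = 0; actual < probs.length; actual++) {
                expected += probs[actual] * cost[actual][pred];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = pred;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] cost = {{0.0, 10.0}, {1.0, 0.0}};
        // P(class 1) = 0.80 is below 10/11, so class 0 is still the cheaper prediction
        System.out.println(bestClass(new double[] {0.20, 0.80}, cost));
        // P(class 1) = 0.95 is above 10/11, so class 1 becomes the cheaper prediction
        System.out.println(bestClass(new double[] {0.05, 0.95}, cost));
    }
}
```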


Sandro Puhmeister | 25 Jun 11:58 2016

MultiClassClassifier with the option SMO and calibrated with a MultilayerPerceptron on Android

Hello,

I'm trying to build a multi-class classifier for Android usage and I'm stuck with some errors 
from the MultilayerPerceptron part. I've read that it enables you to monitor and interact with 
the classification progress from the control panel in Weka GUI. Here is the problem (I think).

When I try to use this classifier on Android it throws a "class not found error" for the "control 
panel". If I calibrate the classifier with other methods (that work for binary problems) it works 
great on Android but a model calibrated with the MultilayerPerceptron doesn't load at 
all because it wants to interact with some methods that are used with the GUI.

All models Weka built so far work great on my PC (Java Eclipse), but calibration with the 
MultilayerPerceptron gives best results so far and I would like to use it in a MultiClassClassifier 
with the option SMO for Android.

Is there a way to build the classifier without the need for those methods or classes that 
are causing these problems? Is there another solution?

Thank you in advance.

Sincerely,

Sandro
Edward Wiskers | 24 Jun 07:22 2016

Kappa statistic results

Hi all,

Is there any way to get the values that generate Kappa statistic (in the "Classify" panel) out of Weka?

Thanks.
Edward 
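
Kappa can be recomputed from the confusion matrix that the Classify panel prints. It compares the observed agreement p_o (the share of instances on the diagonal) with the chance agreement p_e (derived from the row and column totals): kappa = (p_o - p_e) / (1 - p_e). Below is a minimal plain-Java sketch with made-up counts, not Weka's own implementation. From code, I believe weka.classifiers.Evaluation also exposes confusionMatrix() and kappa(), but please verify against the Javadoc for your version.

```java
public class KappaFromMatrix {

    /** Cohen's kappa from a square confusion matrix of counts. */
    public static double kappa(double[][] cm) {
        int k = cm.length;
        double total = 0;
        double diagonal = 0;
        double[] rowSum = new double[k];
        double[] colSum = new double[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                total += cm[i][j];
                rowSum[i] += cm[i][j];
                colSum[j] += cm[i][j];
                if (i == j) {
                    diagonal += cm[i][j];
                }
            }
        }
        double observed = diagonal / total;        // p_o
        double chance = 0;                         // p_e
        for (int i = 0; i < k; i++) {
            chance += (rowSum[i] / total) * (colSum[i] / total);
        }
        return (observed - chance) / (1 - chance); // NaN if chance agreement is exactly 1
    }

    public static void main(String[] args) {
        // Illustrative counts only: 100 instances, 85 on the diagonal.
        double[][] cm = {{40, 10}, {5, 45}};
        System.out.println("kappa = " + kappa(cm));
    }
}
```

For the illustrative matrix, p_o = 0.85 and p_e = 0.5 * 0.45 + 0.5 * 0.55 = 0.5, which gives a kappa of about 0.7.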

Jeff Pattillo | 23 Jun 18:33 2016

Reclaiming Memory When Calling WEKA from R

So I have an R file that calls out to WEKA.  I repeatedly build a classifier (using the same name each time), pruning attributes as I go.  Eventually I get the following error:

Error in .jarray(x) : java.lang.OutOfMemoryError: Java heap space

If I close down R and restart, I can run the same command and things work fine.  However, if I try to save the session and restore, I still cannot run the commands in WEKA.  This makes me believe that somehow, when R calls WEKA, it's not reclaiming the memory it uses each time it builds a classifier.

Has anyone else had this problem?  Is there a way to force R to close down its connection to WEKA and reclaim the memory?

I have had this problem when I run WEKA from the explorer and try to build multiple classifiers, but I can just close down the explorer and the memory is freed up.  I don't know how to close down WEKA from R.  I have tried

1) detach("package:RWeka", unload = TRUE)
2) rm("Model Name"); gc()
3) Increasing the heap size with options( java.parameters = "-Xmx4g" ) before ever loading rJava and RWeka.

None of these seem to work.  So I am at a loss as to how to get rid of the error without closing down a session and starting over.

Thanks for the help!

Jeff
Julien Cumin | 22 Jun 14:51 2016

Retrieving the current epoch number when building an MLP

Hello,
I am building an MLP in my Java prototype which takes quite some time to complete (a few hours).
I have not found a single way to retrieve the current epoch number during the classifier building process.
The only way I have identified to check the "progress" of the training is by setting the GUI to true, because we have that information in the GUI.
However I don't want that GUI to appear, since it requires user input and is generally unnecessary for me.

Is there really no way to retrieve the current epoch number from a training MLP?
(More generally, is there no support to check the progress of any classifier?)

Thank you for your help,

Julien Cumin