AJIT KUMAR | 18 Apr 13:56 2015

Exporting ROC curve from Knowledge Flow

Hi All,

I have built a ROC curve for a multi-classifier using the Weka Knowledge Flow.
Now I want to export the ROC plot as an image (any format) to include in a document,
but I am unable to do so.

I searched the internet, and every solution I found suggests:

"Shift + Alt + left click gives a save dialog", but that is not working for me.

I am using Weka 3.6.6 on Ubuntu 12.04.

Please guide me through this problem.

Thanking you.

With Regards.
Ajit
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist <at> list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Martin | 17 Apr 10:49 2015

Re: Gaussian multi value modeling in weka

Could you please be more precise about what you want exactly? IOW, what kind of output you're looking for?

Regards,
Martin

On Apr 17, 2015 12:35 PM, "hamedmirashk [via WEKA]" <[hidden email]> wrote:

Is there any function in Weka for multi-dimensional Gaussian modeling? I want to show a Gaussian graph for my n-dimensional ARFF file. I used MultivariateGaussianEstimator, but I have no idea whether it works or not.

_______________________________________________
arvega | 17 Apr 04:26 2015

attribute selection for data of the string type

I have an .*arff* file with data in the *STRING* format which I want
to classify with the NaiveBayesMultinomialText classifier. But I also need
to reduce the number of features from hundreds to a dozen without converting
them into the *NUMERIC* type. How can I do this?
I'd much appreciate your help.

_______________________________________________

Jeffrey Denison | 16 Apr 02:55 2015

Re: Wekalist Digest, Vol 146, Issue 38

Is this list archived anywhere that I can just go to a website & read
by thread or subject?

On 4/15/15, wekalist-request <at> list.waikato.ac.nz
<wekalist-request <at> list.waikato.ac.nz> wrote:
> Send Wekalist mailing list submissions to
> 	wekalist <at> list.waikato.ac.nz
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://list.waikato.ac.nz/mailman/listinfo/wekalist
> or, via email, send a message with subject or body 'help' to
> 	wekalist-request <at> list.waikato.ac.nz
>
> You can reach the person managing the list at
> 	wekalist-owner <at> list.waikato.ac.nz
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wekalist digest..."
>
>
> Today's Topics:
>
>    1. Re: InputMappedClassifier classifies only 2 out of 3 classes
>       (Mark Hall)
>    2. Re: smo building takes ages (Eibe Frank)
>    3. Getting error in MATLAB script to run WEKA's	classification.
>       (Weka List)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 16 Apr 2015 09:46:37 +1200
> From: Mark Hall <mhall <at> waikato.ac.nz>
> To: "Weka machine learning workbench list."
> 	<wekalist <at> list.waikato.ac.nz>
> Subject: Re: [Wekalist] InputMappedClassifier classifies only 2 out of
> 	3 classes
> Message-ID: <D15533E1.2B93C%mhall <at> waikato.ac.nz>
> Content-Type: text/plain; charset="utf-8"
>
> From:  Nikola Milosevic <nikola.milosevic86 <at> gmail.com>
> Reply-To:  "Weka machine learning workbench list."
> <wekalist <at> list.waikato.ac.nz>
> Date:  Thursday, 16 April 2015 3:37 am
> To:  "Weka machine learning workbench list." <wekalist <at> list.waikato.ac.nz>
> Subject:  [Wekalist] InputMappedClassifier classifies only 2 out of 3
> classes
>
>> Hello,
>>
>> I have just joined this mailing list and I need a little help. I have
>> build a
>> model for classifier using Weka GUI. It is classification of some XML
>> tables,
>> so original features included number of rows, columns, header rows, as
>> well as
>> text in captions, footers, headers and stubs. I used StringToWordVector
>> filter
>> and classifier is SMO. I had 3 classes and dataset had 50 samples of the
>> first
>> and second class (settings and finding tables) and 25 samples of third
>> class
>> (support-knowledge). The classifier showed around 86% F-measure for
>> classification using 10-fold cross-validation on the mentioned data.
>>
>> Now, I am trying to create java classifier that will classify unseen
>> unlabeled
>> data using this model. I used InputMappedClassifier, loaded model. I
>> loaded
>> new table as single instance with original features and used
>> StringToWordVector filter and then apply classifier. Code of my classifier
>> is
>> here:
>> https://github.com/nikolamilosevic86/TabInExj/blob/master/src/classifiers/PragmaticClassifier.java
>> and my model is
>> https://github.com/nikolamilosevic86/TabInExj/blob/master/Models/SMOPragmaticModel3.model
>>
>> My problem is that when I run this on my data which have around 4100
>> tables,
>> from which 125 were tables used for training (I would expect at least
>> these 25
>> to be classified as support-knowledge), all data is classified either as
>> settings or findings and there is none support-knowledge table. This
>> should
>> not obviously be the case. Do you know what can be wrong?
>
> The best approach is to use the FilteredClassifier + StringToWordVector and
> SMO. This ensures that the training dictionary is used to vectorise the
> test
> documents. When you apply STWV separately to the training and test sets
> they
> will be vectorised according to separate dictionaries. The
> InputMappedClassifier will use missing values for terms that appear in the
> training set but not in the test set, whereas vectorising test data using
> the training dictionary will never set any values to missing.
>
> Cheers,
> Mark.
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 16 Apr 2015 09:59:04 +1200
> From: Eibe Frank <eibe <at> waikato.ac.nz>
> To: "Weka machine learning workbench list."
> 	<wekalist <at> list.waikato.ac.nz>
> Subject: Re: [Wekalist] smo building takes ages
> Message-ID: <A6433461-B010-4EA6-B542-3862A3EE190A <at> waikato.ac.nz>
> Content-Type: text/plain; charset=utf-8
>
> This behaviour seems quite odd. I just tried generating some data with
> different numbers of attributes using the RandomRBF generator and did not
> observe anything like this (WEKA 3.7.12).
>
> Maybe try LibSVM to check whether it behaves the same way?
>
> A very fast implementation for linear kernels is in LibLINEAR.
>
> You could also force SMO to use the kernel matrix throughout by making the
> exponent for the polynomial kernel slightly larger than 1 (e.g., 1.00001).
> Then, if you set the size of the kernel cache to a negative number, it will
> simply cache the full kernel matrix. This would help in case there is a
> problem with the special-case implementation for the linear kernel in SMO.
>
> Cheers,
> Eibe
>
>> On 15 Apr 2015, at 20:38, Martin Gütlein <guetlein <at> posteo.de> wrote:
>>
>> Hi,
>>
>> I have a binary classification dataset (4000 instances, balanced class
>> distribution) and want to build a SMO model on it.
>>
>> However, training the smo classifier (default settings) takes ages,
>> depending on the number of features (see table below, I would like to use
>> some 8000 features).
>>
>> Is this normal? Can I speed that up?
>>
>> Cheers,
>> Martin
>>
>> num-features -> seconds-build-time
>> 8 -> 2.7
>> 16 -> 10
>> 32 -> 51
>> 64 -> 220
>>
>> P.S. Hoping everythings fine in Hamilton
>>
>>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 15 Apr 2015 19:31:53 -0400
> From: Weka List <wekalist2015 <at> gmail.com>
> To: wekalist <at> list.waikato.ac.nz
> Subject: [Wekalist] Getting error in MATLAB script to run WEKA's
> 	classification.
> Message-ID:
> 	<CAL5Ww5du2bib6rOfRN33jNnJLdVRhgVMgQObMuXuqZzMznF8GQ <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello Friends,
>
> Here is the script in MATLAB to access WEKA's classification function:
>
> %% Example file of how to use Weka's classification functions in Matlab
> %% Loads Weka, selects a dataset, uses SMO classification, and
> evaluates the results
>
> %% FIRST LOAD WEKA
> wekaHome = getenv('WEKA_HOME');
> wekaJar = sprintf('%s/weka.jar',wekaHome);
> if ~exist(wekaJar,'file')
>   error(sprintf('File %s not found',wekaJar));
> end
> javaaddpath(sprintf('%s/weka.jar',wekaHome));
>
> import weka.core.Instances.*
> import weka.classifiers.functions.supportVector.*
> import weka.core.converters.ConverterUtils$DataSource.*
>
> %% LOAD A DATA FILE.
> %% ionosphere.arff is a binary classification task; all predictors
> %% are continuous values
> filename = sprintf('%s/data/ionosphere.arff',wekaHome);
> if ~exist(filename,'file')
>   error(sprintf('File %s not found',filename));
> end
> source =
> javaObject('weka.core.converters.ConverterUtils$DataSource',filename);
> data = source.getDataSet();
> if (data.classIndex() == -1) % -1 means that it is undefined
>     data.setClassIndex(data.numAttributes() - 1);
> end
>
> %% SELECT THE CLASSIFIER SMO, USING THE PUK KERNEL
> %% There are two ways to set parameter values for each Weka function
> %% 1) call methods( weka.classifiers.functions.SMO ) to list the
> %%    specific functions that work, like setC() and setOmega()
> %% 2) construct an array of java strings, like '-t infile -T outfile -i'
> %%    that correspond to what you'd type as a command line argument
> c = weka.classifiers.functions.SMO();
>   c.setC(100);
>   k = weka.classifiers.functions.supportVector.Puk();
>   k.setOmega(1.0);
>   k.setSigma(1.0);
>   c.setKernel(k);
> c.buildClassifier(data); %% "data" here is the training data
>
> % evaluate model (simple evaluation over training set)
> ev = weka.classifiers.Evaluation(data); %% "data" here is the test data
>   v(1) = java.lang.String('-t');
>   v(2) = java.lang.String(filename);
>   v(3) = java.lang.String('-T');
>   v(4) = java.lang.String(filename);
>   v(5) = java.lang.String('-i');
>   params = cat(1,v(1:end));
> ev.evaluateModel(c, params)
>
> I got this script form the following link:
>
> http://www.mathworks.com/matlabcentral/fileexchange/36413-using-wekas-svm-classification-functions-in-matlab/content/svmClassificationInMatlab
>
> When I run it, I get the following error:
>
>>> svmClassificationInMatlab
> Unable to create packages directory (C:\Program Files\Weka-3-7\packages)
> Unable to create repository cache directory (C:\Program
> Files\Weka-3-7\repCache)
> Error using svmClassificationInMatlab (line 51) %LINE 51 IS THE LAST
> LINE OF CODE IN SCRIPT FILE.
>
> Java exception occurred:
> java.lang.Exception:
> Weka exception: Illegal options: -i
>
> General options:
>
> -h or -help
> 	Output help information.
> -synopsis or -info
> 	Output synopsis for classifier (use in conjunction  with -h)
> -t <name of training file>
> 	Sets training file.
> -T <name of test file>
> 	Sets test file. If missing, a cross-validation will be performed
> 	on the training data.
> -c <class index>
> 	Sets index of class attribute (default: last).
> -x <number of folds>
> 	Sets number of folds for cross-validation (default: 10).
> -no-cv
> 	Do not perform any cross validation.
> -force-batch-training
> 	Always train classifier in batch mode, never incrementally.
> -split-percentage <percentage>
> 	Sets the percentage for the train/test set split, e.g., 66.
> -preserve-order
> 	Preserves the order in the percentage split.
> -s <random number seed>
> 	Sets random number seed for cross-validation or percentage split
> 	(default: 1).
> -m <name of file with cost matrix>
> 	Sets file with cost matrix.
> -disable <comma-separated list of evaluation metric names>
> 	Comma separated list of metric names not to print to the output.
> 	Available metrics:
> 	Correct,Incorrect,Kappa,Total cost,Average cost,KB relative,KB
> information,
> 	Correlation,Complexity 0,Complexity scheme,Complexity improvement,
> 	MAE,RMSE,RAE,RRSE,Coverage,Region size,TP rate,FP rate,Precision,Recall,
> 	F-measure,MCC,ROC area,PRC area
> -l <name of input file>
> 	Sets model input file. In case the filename ends with '.xml',
> 	a PMML file is loaded or, if that fails, options are loaded
> 	from the XML file.
> -d <name of output file>
> 	Sets model output file. In case the filename ends with '.xml',
> 	only the options are saved to the XML file, not the model.
> -v
> 	Outputs no statistics for training data.
> -o
> 	Outputs statistics only, not the classifier.
> -do-not-output-per-class-statistics
> 	Do not output statistics for each class.
> -k
> 	Outputs information-theoretic statistics.
> -classifications
> "weka.classifiers.evaluation.output.prediction.AbstractOutput +
> options"
> 	Uses the specified class for generating the classification output.
> 	E.g.: weka.classifiers.evaluation.output.prediction.PlainText
> -p range
> 	Outputs predictions for test instances (or the train instances if
> 	no test instances provided and -no-cv is used), along with the
> 	attributes in the specified range (and nothing else).
> 	Use '-p 0' if no attributes are desired.
> 	Deprecated: use "-classifications ..." instead.
> -distribution
> 	Outputs the distribution instead of only the prediction
> 	in conjunction with the '-p' option (only nominal classes).
> 	Deprecated: use "-classifications ..." instead.
> -r
> 	Only outputs cumulative margin distribution.
> -xml filename | xml-string
> 	Retrieves the options from the XML-data instead of the command line.
> -threshold-file <file>
> 	The file to save the threshold data to.
> 	The format is determined by the extensions, e.g., '.arff' for ARFF
> 	format or '.csv' for CSV.
> -threshold-label <label>
> 	The class label to determine the threshold data for
> 	(default is the first label)
> -no-predictions
> 	Turns off the collection of predictions in order to conserve memory.
>
> Options specific to weka.classifiers.functions.SMO:
>
> -no-checks
> 	Turns off all checks - use with caution!
> 	Turning them off assumes that data is purely numeric, doesn't
> 	contain any missing values, and has a nominal class. Turning them
> 	off also means that no header information will be stored if the
> 	machine is linear. Finally, it also assumes that no instance has
> 	a weight equal to 0.
> 	(default: checks on)
> -C <double>
> 	The complexity constant C. (default 1)
> -N
> 	Whether to 0=normalize/1=standardize/2=neither. (default 0=normalize)
> -L <double>
> 	The tolerance parameter. (default 1.0e-3)
> -P <double>
> 	The epsilon for round-off error. (default 1.0e-12)
> -M
> 	Fit logistic models to SVM outputs.
> -V <double>
> 	The number of folds for the internal
> 	cross-validation. (default -1, use training data)
> -W <double>
> 	The random number seed. (default 1)
> -K <classname and parameters>
> 	The Kernel to use.
> 	(default: weka.classifiers.functions.supportVector.PolyKernel)
> -output-debug-info
> 	If set, classifier is run in debug mode and
> 	may output additional info to the console
> -do-not-check-capabilities
> 	If set, classifier capabilities are not checked before classifier is built
> 	(use with caution).
>
> Options specific to kernel weka.classifiers.functions.supportVector.Puk:
>
> -O <num>
> 	The Omega parameter.
> 	(default: 1.0)
> -S <num>
> 	The Sigma parameter.
> 	(default: 1.0)
> -C <num>
> 	The size of the cache (a prime number), 0 for full cache and
> 	-1 to turn it off.
> 	(default: 250007)
> -output-debug-info
> 	Enables debugging output (if available) to be printed.
> 	(default: off)
> -no-checks
> 	Turns off all checks - use with caution!
> 	(default: checks on)
>
> 	at
> weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1406)
>
> 	at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
>
> Where is the problem?
>
> Secondly, I have 100s of csv files, without any header, all real entries,
> last column labeled with string names. So whole data is a matrix of real
> entries. No, missing or NaN value.
>
> I want to use this function as given in the link, load my 100s of csv files
> automatically to run WEKA's SVM classifier, using SMO on 10 fold
> Cross-Validation, for kernel size from 1 to 5, and print classification
> accuracy for each csv file. I do not want to load each csv file one by one
> as it takes a lot of time.
>
> I will appreciate any advise!
>
> ------------------------------
>
> _______________________________________________
> Wekalist mailing list
> Wekalist <at> list.waikato.ac.nz
> http://list.waikato.ac.nz/mailman/listinfo/wekalist
>
>
> End of Wekalist Digest, Vol 146, Issue 38
> *****************************************
>
_______________________________________________

Weka List | 16 Apr 01:31 2015

Getting error in MATLAB script to run WEKA's classification.

Hello Friends,

Here is the script in MATLAB to access WEKA's classification function:

%% Example file of how to use Weka's classification functions in Matlab
%% Loads Weka, selects a dataset, uses SMO classification, and evaluates the results

%% FIRST LOAD WEKA
wekaHome = getenv('WEKA_HOME');
wekaJar = sprintf('%s/weka.jar',wekaHome);
if ~exist(wekaJar,'file')
  error(sprintf('File %s not found',wekaJar));
end
javaaddpath(sprintf('%s/weka.jar',wekaHome));

import weka.core.Instances.*
import weka.classifiers.functions.supportVector.*
import weka.core.converters.ConverterUtils$DataSource.*

%% LOAD A DATA FILE.
%% ionosphere.arff is a binary classification task; all predictors
%% are continuous values
filename = sprintf('%s/data/ionosphere.arff',wekaHome);
if ~exist(filename,'file')
  error(sprintf('File %s not found',filename));
end
source = javaObject('weka.core.converters.ConverterUtils$DataSource',filename);
data = source.getDataSet();
if (data.classIndex() == -1) % -1 means that it is undefined
    data.setClassIndex(data.numAttributes() - 1);
end

%% SELECT THE CLASSIFIER SMO, USING THE PUK KERNEL
%% There are two ways to set parameter values for each Weka function
%% 1) call methods( weka.classifiers.functions.SMO ) to list the
%%    specific functions that work, like setC() and setOmega()
%% 2) construct an array of java strings, like '-t infile -T outfile -i'
%%    that correspond to what you'd type as a command line argument
c = weka.classifiers.functions.SMO();
  c.setC(100);
  k = weka.classifiers.functions.supportVector.Puk();
  k.setOmega(1.0);
  k.setSigma(1.0);
  c.setKernel(k);
c.buildClassifier(data); %% "data" here is the training data

% evaluate model (simple evaluation over training set)
ev = weka.classifiers.Evaluation(data); %% "data" here is the test data
  v(1) = java.lang.String('-t');
  v(2) = java.lang.String(filename);
  v(3) = java.lang.String('-T');
  v(4) = java.lang.String(filename);
  v(5) = java.lang.String('-i');
  params = cat(1,v(1:end));
ev.evaluateModel(c, params)

I got this script from the following link:

http://www.mathworks.com/matlabcentral/fileexchange/36413-using-wekas-svm-classification-functions-in-matlab/content/svmClassificationInMatlab

When I run it, I get the following error:

>> svmClassificationInMatlab
Unable to create packages directory (C:\Program Files\Weka-3-7\packages)
Unable to create repository cache directory (C:\Program Files\Weka-3-7\repCache)
Error using svmClassificationInMatlab (line 51) %LINE 51 IS THE LAST LINE OF CODE IN SCRIPT FILE.

Java exception occurred:
java.lang.Exception:
Weka exception: Illegal options: -i

General options:

-h or -help
    Output help information.
-synopsis or -info
    Output synopsis for classifier (use in conjunction with -h)
-t <name of training file>
    Sets training file.
-T <name of test file>
    Sets test file. If missing, a cross-validation will be performed
    on the training data.
-c <class index>
    Sets index of class attribute (default: last).
-x <number of folds>
    Sets number of folds for cross-validation (default: 10).
-no-cv
    Do not perform any cross validation.
-force-batch-training
    Always train classifier in batch mode, never incrementally.
-split-percentage <percentage>
    Sets the percentage for the train/test set split, e.g., 66.
-preserve-order
    Preserves the order in the percentage split.
-s <random number seed>
    Sets random number seed for cross-validation or percentage split
    (default: 1).
-m <name of file with cost matrix>
    Sets file with cost matrix.
-disable <comma-separated list of evaluation metric names>
    Comma separated list of metric names not to print to the output.
    Available metrics:
    Correct,Incorrect,Kappa,Total cost,Average cost,KB relative,KB information,
    Correlation,Complexity 0,Complexity scheme,Complexity improvement,
    MAE,RMSE,RAE,RRSE,Coverage,Region size,TP rate,FP rate,Precision,Recall,
    F-measure,MCC,ROC area,PRC area
-l <name of input file>
    Sets model input file. In case the filename ends with '.xml',
    a PMML file is loaded or, if that fails, options are loaded
    from the XML file.
-d <name of output file>
    Sets model output file. In case the filename ends with '.xml',
    only the options are saved to the XML file, not the model.
-v
    Outputs no statistics for training data.
-o
    Outputs statistics only, not the classifier.
-do-not-output-per-class-statistics
    Do not output statistics for each class.
-k
    Outputs information-theoretic statistics.
-classifications "weka.classifiers.evaluation.output.prediction.AbstractOutput + options"
    Uses the specified class for generating the classification output.
    E.g.: weka.classifiers.evaluation.output.prediction.PlainText
-p range
    Outputs predictions for test instances (or the train instances if
    no test instances provided and -no-cv is used), along with the
    attributes in the specified range (and nothing else).
    Use '-p 0' if no attributes are desired.
    Deprecated: use "-classifications ..." instead.
-distribution
    Outputs the distribution instead of only the prediction
    in conjunction with the '-p' option (only nominal classes).
    Deprecated: use "-classifications ..." instead.
-r
    Only outputs cumulative margin distribution.
-xml filename | xml-string
    Retrieves the options from the XML-data instead of the command line.
-threshold-file <file>
    The file to save the threshold data to.
    The format is determined by the extensions, e.g., '.arff' for ARFF
    format or '.csv' for CSV.
-threshold-label <label>
    The class label to determine the threshold data for
    (default is the first label)
-no-predictions
    Turns off the collection of predictions in order to conserve memory.

Options specific to weka.classifiers.functions.SMO:

-no-checks
    Turns off all checks - use with caution!
    Turning them off assumes that data is purely numeric, doesn't
    contain any missing values, and has a nominal class. Turning them
    off also means that no header information will be stored if the
    machine is linear. Finally, it also assumes that no instance has
    a weight equal to 0.
    (default: checks on)
-C <double>
    The complexity constant C. (default 1)
-N
    Whether to 0=normalize/1=standardize/2=neither. (default 0=normalize)
-L <double>
    The tolerance parameter. (default 1.0e-3)
-P <double>
    The epsilon for round-off error. (default 1.0e-12)
-M
    Fit logistic models to SVM outputs.
-V <double>
    The number of folds for the internal
    cross-validation. (default -1, use training data)
-W <double>
    The random number seed. (default 1)
-K <classname and parameters>
    The Kernel to use.
    (default: weka.classifiers.functions.supportVector.PolyKernel)
-output-debug-info
    If set, classifier is run in debug mode and
    may output additional info to the console
-do-not-check-capabilities
    If set, classifier capabilities are not checked before classifier is built
    (use with caution).

Options specific to kernel weka.classifiers.functions.supportVector.Puk:

-O <num>
    The Omega parameter.
    (default: 1.0)
-S <num>
    The Sigma parameter.
    (default: 1.0)
-C <num>
    The size of the cache (a prime number), 0 for full cache and
    -1 to turn it off.
    (default: 250007)
-output-debug-info
    Enables debugging output (if available) to be printed.
    (default: off)
-no-checks
    Turns off all checks - use with caution!
    (default: checks on)

    at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1406)
    at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)

Where is the problem?
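
The usage text printed with the exception lists the options this Weka build accepts, and -i is not among them, which is consistent with the "Illegal options: -i" message. The sketch below checks an option array against the flags copied from the "General options" section of that output. It assumes that list is complete for the general options; the class OptionCheck and its method are hypothetical helpers, not part of Weka, and scheme-specific options (such as those of SMO or Puk) are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Weka.
class OptionCheck {
    // Flags copied from the "General options" section of the error output.
    static final Set<String> GENERAL_FLAGS = new HashSet<>(Arrays.asList(
        "-h", "-help", "-synopsis", "-info", "-t", "-T", "-c", "-x", "-no-cv",
        "-force-batch-training", "-split-percentage", "-preserve-order", "-s",
        "-m", "-disable", "-l", "-d", "-v", "-o",
        "-do-not-output-per-class-statistics", "-k", "-classifications", "-p",
        "-distribution", "-r", "-xml", "-threshold-file", "-threshold-label",
        "-no-predictions"));

    // Returns every token that looks like a flag (starts with '-') but is not
    // in the list above. Negative numeric values would need extra handling.
    static List<String> unknownFlags(String[] options) {
        List<String> unknown = new ArrayList<>();
        for (String token : options) {
            if (token.startsWith("-") && !GENERAL_FLAGS.contains(token)) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Same option sequence as v(1)..v(5) in the posted script.
        String[] posted = {"-t", "ionosphere.arff", "-T", "ionosphere.arff", "-i"};
        System.out.println(unknownFlags(posted)); // prints [-i]
    }
}
```

If the check is right, dropping the fifth element from the option array in the script would be the first thing to try.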

Secondly, I have hundreds of CSV files, without any header, all real-valued entries, with the last column holding string class labels. So the whole data set is a matrix of real entries. There are no missing or NaN values.

I want to use this function as given in the link to load my hundreds of CSV files automatically, run WEKA's SVM classifier (SMO) with 10-fold cross-validation for kernel size from 1 to 5, and print the classification accuracy for each CSV file. I do not want to load each CSV file one by one, as it takes a lot of time.
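
The batch part of this request is a loop over files and kernel settings. Below is a minimal sketch of that loop in Java, assuming all CSV files sit in one directory. The class BatchRunner and the Evaluator interface are hypothetical names introduced here; the Weka calls that load one file and run 10-fold cross-validation are deliberately left behind that interface, since they depend on the Weka version in use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical driver; no Weka classes are used here.
class BatchRunner {
    // Hypothetical hook: returns the cross-validated accuracy (percent)
    // for one CSV file and one kernel setting.
    interface Evaluator {
        double accuracy(Path csvFile, int kernelSetting) throws Exception;
    }

    // Collects all *.csv files directly inside dir, sorted by path.
    static List<Path> listCsvFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files);
        return files;
    }

    // Runs the evaluator for every file and every setting in [minSetting, maxSetting]
    // and returns one tab-separated report line per combination.
    static List<String> run(Path dir, int minSetting, int maxSetting, Evaluator ev)
            throws Exception {
        List<String> report = new ArrayList<>();
        for (Path csv : listCsvFiles(dir)) {
            for (int k = minSetting; k <= maxSetting; k++) {
                double acc = ev.accuracy(csv, k);
                report.add(String.format(Locale.ROOT, "%s\tkernel=%d\t%.2f",
                        csv.getFileName(), k, acc));
            }
        }
        return report;
    }
}
```

A call such as BatchRunner.run(dir, 1, 5, myEvaluator) would then produce five lines per file, where myEvaluator wraps whatever loading and cross-validation code is used.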

I will appreciate any advice!

_______________________________________________
Nikola Milosevic | 15 Apr 17:37 2015

InputMappedClassifier classifies only 2 out of 3 classes

Hello,

I have just joined this mailing list and I need a little help. I have built a classifier model using the Weka GUI. It classifies XML tables, so the original features include the number of rows, columns and header rows, as well as the text in captions, footers, headers and stubs. I used the StringToWordVector filter, and the classifier is SMO. I had 3 classes, and the dataset had 50 samples each of the first and second classes (settings and findings tables) and 25 samples of the third class (support-knowledge). The classifier showed around 86% F-measure using 10-fold cross-validation on this data.

Now I am trying to create a Java classifier that will classify unseen, unlabeled data using this model. I used InputMappedClassifier and loaded the model. I loaded each new table as a single instance with the original features, applied the StringToWordVector filter, and then applied the classifier. The code of my classifier is here: https://github.com/nikolamilosevic86/TabInExj/blob/master/src/classifiers/PragmaticClassifier.java and my model is https://github.com/nikolamilosevic86/TabInExj/blob/master/Models/SMOPragmaticModel3.model

My problem is that when I run this on my data, which has around 4100 tables, of which 125 were the tables used for training (I would expect at least those 25 to be classified as support-knowledge), everything is classified as either settings or findings, and no table is classified as support-knowledge. This should obviously not be the case. Do you know what could be wrong?
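
Mark Hall's reply on the list attributes this to vectorising training and test data with separate dictionaries, and recommends FilteredClassifier with StringToWordVector and SMO so that the training dictionary is reused. The toy sketch below illustrates only the dictionary effect in plain Java. It is not how StringToWordVector is implemented, and DictionaryDemo and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only; no Weka classes are used.
class DictionaryDemo {
    // Maps each distinct word to a column index, in order of first appearance.
    static Map<String, Integer> buildDictionary(List<String> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String w : doc.toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) {
                    dict.putIfAbsent(w, dict.size());
                }
            }
        }
        return dict;
    }

    // Word-presence vector of doc with respect to the given dictionary.
    // Words missing from the dictionary are ignored.
    static int[] vectorise(String doc, Map<String, Integer> dict) {
        int[] v = new int[dict.size()];
        for (String w : doc.toLowerCase().split("\\s+")) {
            Integer idx = dict.get(w);
            if (idx != null) {
                v[idx] = 1;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("patient settings table", "findings of the study");
        String test = "study settings";

        Map<String, Integer> trainDict = buildDictionary(train);
        Map<String, Integer> testDict = buildDictionary(Arrays.asList(test));

        // Same document, two different encodings:
        System.out.println(Arrays.toString(vectorise(test, trainDict))); // [0, 1, 0, 0, 0, 0, 1]
        System.out.println(Arrays.toString(vectorise(test, testDict)));  // [1, 1]
    }
}
```

With the training dictionary the test document lands in the same seven columns the model was trained on; with its own dictionary it has two columns whose meaning differs from the training columns.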

Best regards,
Nikola Milošević
_______________________________________________
Martin Gütlein | 15 Apr 10:38 2015

smo building takes ages

Hi,

I have a binary classification dataset (4000 instances, balanced class
distribution) and want to build an SMO model on it.

However, training the SMO classifier (default settings) takes ages,
depending on the number of features (see table below; I would like to
use some 8000 features).

Is this normal? Can I speed that up?

Cheers,
Martin

num-features -> seconds-build-time
8 -> 2.7
16 -> 10
32 -> 51
64 -> 220

P.S. Hoping everything's fine in Hamilton
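
The timings in the table can be turned into a rough scaling estimate: if build time grows like features^p, then p is the ratio of the logarithms of the time ratio and the feature ratio. The sketch below computes p for each doubling, using only the four measurements from the table; ScalingEstimate is a hypothetical class name. The exponents come out at roughly 1.9 to 2.4, that is, close to quadratic growth over this range. Extrapolating from four points up to 8000 features would be unreliable.

```java
// Hypothetical helper for reading the timing table; plain Java, no Weka.
class ScalingEstimate {
    // Local exponent p in the model time ~ features^p, estimated from two measurements.
    static double exponent(double features1, double seconds1,
                           double features2, double seconds2) {
        return Math.log(seconds2 / seconds1) / Math.log(features2 / features1);
    }

    public static void main(String[] args) {
        double[] features = {8, 16, 32, 64};
        double[] seconds = {2.7, 10, 51, 220};
        for (int i = 1; i < features.length; i++) {
            double p = exponent(features[i - 1], seconds[i - 1], features[i], seconds[i]);
            System.out.printf("%.0f -> %.0f features: exponent %.2f%n",
                    features[i - 1], features[i], p);
        }
    }
}
```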

_______________________________________________

Sebastián Vanrell | 14 Apr 20:42 2015

unable to install gridSearch on weka 3.7.12

Hi folks,

I tried to install the gridSearch package (1.0.9) on Weka 3.7.12 (using the package manager) but could not, and I don't know why. I'm on Linux.
The error message says that the Weka version must be 3.7.13 or later. Has that version been released? Maybe under another name?

I just want to do a parameter selection over C and gamma for an SVM classifier. I was tempted by gridSearch because it allows me to search by order of magnitude. Is there another way to do a "grid search" like that?
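
Independent of any package, a search by order of magnitude only needs the candidate values 10^k for each parameter and a loop over all pairs. The sketch below generates such a grid in plain Java. The class LogGrid and the exponent ranges (-3..3 for C, -5..1 for gamma) are illustrative assumptions, and scoring each pair (for example by cross-validation) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; generates candidate (C, gamma) pairs only.
class LogGrid {
    // Returns 10^minExp, 10^(minExp+1), ..., 10^maxExp.
    static double[] powersOfTen(int minExp, int maxExp) {
        double[] values = new double[maxExp - minExp + 1];
        for (int e = minExp; e <= maxExp; e++) {
            values[e - minExp] = Math.pow(10, e);
        }
        return values;
    }

    // Cartesian product of the two value lists as {C, gamma} pairs.
    static List<double[]> grid(double[] cValues, double[] gammaValues) {
        List<double[]> pairs = new ArrayList<>();
        for (double c : cValues) {
            for (double g : gammaValues) {
                pairs.add(new double[] {c, g});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Exponent ranges are arbitrary examples, not recommendations.
        List<double[]> pairs = grid(powersOfTen(-3, 3), powersOfTen(-5, 1));
        for (double[] pair : pairs) {
            System.out.println("C=" + pair[0] + " gamma=" + pair[1]);
        }
    }
}
```

Each pair would then be used to configure and evaluate the classifier, keeping the pair with the best score.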

Thanks in advance.

Sebastián
_______________________________________________
Martin | 13 Apr 13:03 2015

Re: Confidence Factor WEKA

Please take a look at the attached file.

On 7 April 2015 at 16:33, chm [via WEKA] <[hidden email]> wrote:
Hello all,

How do I get the confidence factor in Java code? And what is the difference between the confidence factor and the result of the distributionForInstance() method?



Attachment: Using WEKA in Java code.pdf (322K)

_______________________________________________
salim.diwani@yahoo.com | 13 Apr 09:25 2015

Min max

I need help with how to calculate min-max in Weka, and also with how to do data set reduction and normalization.
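
Min-max normalization rescales each value to (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1. In Weka this is what the Normalize filter (weka.filters.unsupervised.attribute.Normalize) does for numeric attributes with its default settings. The sketch below shows the arithmetic itself in plain Java for a single column of values; the class MinMax is a hypothetical name and no Weka classes are used.

```java
import java.util.Arrays;

// Hypothetical helper showing the min-max formula on one numeric column.
class MinMax {
    // Rescales values to [0, 1]; a constant column maps to all zeros.
    static double[] scale(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (range == 0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2, 4, 10};
        System.out.println(Arrays.toString(scale(column))); // [0.0, 0.25, 1.0]
    }
}
```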

_______________________________________________
Jeffrey Denison | 13 Apr 01:29 2015

Re: Wekalist Digest, Vol 146, Issue 31

Any word on the advanced data mining MOOC?

On 4/12/15, wekalist-request <at> list.waikato.ac.nz
<wekalist-request <at> list.waikato.ac.nz> wrote:
> Today's Topics:
>
>    1. Re: inclusion of "the" in the stop word list (jason roger)
>    2. link based classification (Nariman Ammar)
>    3. SMOTE example programmatically (Mohamed Tleis)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 13 Apr 2015 02:25:26 +0800
> From: jason roger <jasonroger8 <at> gmail.com>
> To: "Weka machine learning workbench list."
> 	<wekalist <at> list.waikato.ac.nz>
> Subject: Re: [Wekalist] inclusion of "the" in the stop word list
> Message-ID:
> 	<CABWXmRHr42Q9FBUtp25HMLtDby8aTjTGE98unLAi5oNHu5kY4Q <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> It's quite reasonable to have 0s in your results and confusion matrix. This
> is due to the *unknown* classes in the test set, which are represented by "?",
> so WEKA can't test whether the model built on the training set predicts the
> correct value.
>
> On the other hand, if you want to see the predictions for a test set whose
> class is unknown, you can do that from the Classify panel --> "More
> options..." button --> output predictions.
>
> ReutersCorn-test does not have an unknown class (the class of the test set
> should have "?" as the value for each instance you want to predict).
>
> Regarding your problem: Apriori can basically only handle nominal
> attributes, so if you feed it string data, it cannot deal with it.
> Applying the "StringToWordVector" filter does not solve this on its own,
> because this filter converts all the attributes to the numeric type. In
> addition, StringToWordVector *by default* moves the class from the end
> of the data to the very beginning (first attribute),
> leaving the last attribute numeric (because of
> StringToWordVector), which also can't be handled by Apriori.
>
> I think you can do the following steps:
> From the Preprocess panel, apply the StringToWordVector filter, then change
> the class attribute using the "Class" option (just above the histogram). After
> doing so, invoke the "weka.filters.unsupervised.attribute.NumericToNominal"
> filter. Then use "Apriori" from the "Associate" panel.
>
> NB: You can do the previous steps with "FilteredAssociator" using
> "MultiFilter"
>
> Cheers,
> Jason
>
>
>
> On Mon, Apr 13, 2015 at 1:08 AM, M Peter Jurkat <pjurkat <at> unm.edu> wrote:
>
>>  yes, using the stopwords.txt file you sent does work
>>
>>
>>  I started using the filteredClassifier with J48 and StringToWordVector
>> filter on the two small corpora suggested in Tables 17.4 and 17.5 of the
>> Witten, Frank, and Hall text 3rd Ed. - unfortunately J48 predicts all
>> documents in the test set to be classified into the same category so all the
>> measures are zero and the counts are not entered into the confusion
>> matrix
>>
>>
>>  however, when I tried it on the ReutersCorn-train and -test sets the
>> results were reasonable and illustrated what the text wanted it to
>>
>>
>>  then I tried to use filteredAssociator with the StringToWordVector
>> filter and the Apriori algorithm with Confidence and Lift as the weights
>> and got a -2 error both times - don't yet know what that means but I'll
>> try to find out - association with target words, which I can get from the
>> item sets and the rules output from Apriori, is one of the analyses I use
>> with text/string variables - any help resolving this error would be
>> appreciated
>> - peter j
>>  ------------------------------
>> *From:* wekalist-bounces <at> list.waikato.ac.nz <
>> wekalist-bounces <at> list.waikato.ac.nz> on behalf of jason roger <
>> jasonroger8 <at> gmail.com>
>> *Sent:* Saturday, April 11, 2015 12:32 PM
>>
>> *To:* Weka machine learning workbench list.
>> *Subject:* Re: [Wekalist] inclusion of "the" in the stop word list
>>
>>
>>
>>   Jason - sorry to have you do this - I worked it out from your previous
>>> email - turns out editing the common-english-words.txt file I got from
>>> Wikipedia by hand to take out the commas and entering <CR><LF> was
>>> easier
>>> than writing a script to do this - also had to remember to set
>>> useStoplist
>>> to True (duh!) - peter j
>>>
>> I was asking about this information which is not clear.
>>
>>  What version of WEKA are you using?
>>
>>  I attached one file now which has extra stopwords, this file works fine
>> for me. My question: if you load this file (that I attached now) into
>> your
>> WEKA does it work?
>>
>>  Thanks.
>>  Jason
>>
>>
>>
>>
>>>   ------------------------------
>>> *From:* wekalist-bounces <at> list.waikato.ac.nz <
>>> wekalist-bounces <at> list.waikato.ac.nz> on behalf of jason roger <
>>> jasonroger8 <at> gmail.com>
>>> *Sent:* Friday, April 10, 2015 10:50 AM
>>>
>>> *To:* Weka machine learning workbench list.
>>> *Subject:* Re: [Wekalist] inclusion of "the" in the stop word list
>>>
>>>     I attached a file that stops only the word "the". To stop any other
>>> words, just add them to the file, one per line. After deciding which
>>> words you want to stop, simply load this file into WEKA using
>>> "WordsFromFile" (in "stopwordsHandler", a parameter of the
>>> "StringToWordVector" filter).
>>>
>>>  I haven't tried SAS for such cases. Using txt files is quite
>>> easy and does not require anything additional.
>>>
>>>  Cheers,
>>>  Jason
>>>
>>> On Fri, Apr 10, 2015 at 11:52 PM, M Peter Jurkat <pjurkat <at> unm.edu>
>>> wrote:
>>>
>>>>  that sounds eminently reasonable - just looked on the web for such a
>>>> file but the only one I could find has the words separated by commas -
>>>> guess I got to do some programming or make such a file - having used SAS
>>>> it
>>>> looks like the number of such words is about 100+/-, if that much - do
>>>> you
>>>> have any leads to such files? - thanks - p
>>>>  ------------------------------
>>>> *From:* wekalist-bounces <at> list.waikato.ac.nz <
>>>> wekalist-bounces <at> list.waikato.ac.nz> on behalf of jason roger <
>>>> jasonroger8 <at> gmail.com>
>>>> *Sent:* Thursday, April 9, 2015 8:00 PM
>>>> *To:* Weka machine learning workbench list.
>>>> *Subject:* Re: [Wekalist] inclusion of "the" in the stop word list
>>>>
>>>>     Hi,
>>>>
>>>>  You have to create a txt file first, containing each word you want to
>>>> stop, e.g., "the". Each word should be on its own line. Then, in the
>>>> "StringToWordVector" filter, choose the parameter "stopwordsHandler".
>>>> After that, select the "WordsFromFile" option and use it to load the file
>>>> that you created (containing the word "the").
>>>>
>>>>  Cheers,
>>>>  Jason
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 6:29 AM, M Peter Jurkat <pjurkat <at> unm.edu>
>>>> wrote:
>>>>
>>>>>  using Filtered Classifier with J48 and StringToWordVector filter but I'm
>>>>> getting the word "the" as the most frequent word and dominating the
>>>>> tree -
>>>>> I see the word "system32" in the stopwords attribute of the filter but
>>>>> when
>>>>> I click on it I get the Windows directory and I don't see an obvious
>>>>> stop
>>>>> word file?
>>>>>
>>>>>
>>>>>  shouldn't "the" be in the stop word list? - how do I avoid getting it
>>>>> as a word in the term-document matrix? - peter j
>>>>>
> ------------------------------
>
> Message: 2
> Date: Sun, 12 Apr 2015 15:39:05 -0400
> From: Nariman Ammar <nariman.ammar <at> gmail.com>
> To: "Weka machine learning workbench list."
> 	<wekalist <at> list.waikato.ac.nz>,	wekalist-owner <at> list.waikato.ac.nz
> Subject: [Wekalist] link based classification
> Message-ID:
> 	<CAEe6THojCR8aoQbVAa+JZK8Tmqwwpv6Lg9NPBmRg5-E34ZKrLA <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello
> Is link based classification or collaborative classification implemented in
> weka?
>
> --
> Sincerely Yours,
> Nariman Ammar
>
> ------------------------------
>
> Message: 3
> Date: Sun, 12 Apr 2015 23:28:35 +0200
> From: Mohamed Tleis <m.tlais <at> gmail.com>
> To: wekalist <at> list.waikato.ac.nz
> Subject: [Wekalist] SMOTE example programmatically
> Message-ID: <552AE383.9010802 <at> gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Dear Folks,
>
> I am trying to resample my dataset using the SMOTE algorithm in a java
> program; however, I am not able to succeed. I cannot find any method in
> this SMOTE.java documentation that accepts a set of instances! On the
> web I cannot find any examples either! Can you advise please?
>
> Best Regards,
> M. Tleis
>
>
> ------------------------------
>
> _______________________________________________
> Wekalist mailing list
> Wekalist <at> list.waikato.ac.nz
> http://list.waikato.ac.nz/mailman/listinfo/wekalist
>
>
> End of Wekalist Digest, Vol 146, Issue 31
> *****************************************
>
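
The quoted stop-word thread settles on a file with one word per line, while the list found on the web was comma-separated. A minimal sketch of that conversion in Python follows; the function name and the sample string are made up for illustration, and the exact file format Weka expects should be checked against its documentation.

```python
def to_one_word_per_line(comma_separated):
    """Turn 'a,able,about' into one word per line, dropping empty entries."""
    words = [w.strip() for w in comma_separated.split(",")]
    return "\n".join(w for w in words if w) + "\n"

sample = "a,able,about, the ,"             # made-up excerpt of a comma-separated list
converted = to_one_word_per_line(sample)   # "a\nable\nabout\nthe\n"
```

Writing `converted` to a text file gives a list that can be edited by hand afterwards, for example to add or remove individual words.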