Khan, Shehroz | 2 Jan 2008 10:41

Difference between book and developer version

Hello,
What precisely is the difference between Weka Book and Developer version?


Shehroz

_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
Peter Reutemann | 2 Jan 2008 19:50

Re: Difference between book and developer version

> What precisely is the difference between Weka Book and Developer version?

Read FAQ "What's the difference between book and developer version?". Link
to the FAQs is available from the Weka homepage.

Cheers, Peter
-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
Felipe | 2 Jan 2008 21:17

Plans on parallel WEKA

Are there plans to implement a native version of WEKA that runs on multi-core processors such as the Pentium Dual Core or Opteron64 x2? One could use Parallel-Weka, but that only works for cross-validation. (Sorry for my English. I'm from Chile.)

Best regards.

--
Felipe

Mark Hall | 2 Jan 2008 21:55

Re: CFSS

On Mon, Dec 31, 2007 at 12:35:41PM +0530, Krishna wrote:
> 
>    i need to select best features using this method for classification.
>    what is the percentage cutoff(no of folds ) to select features for
>    classification?

This will vary from problem to problem and is up to the practitioner to
decide, either using any domain knowledge they might have or through
experimentation with a learning algorithm that is to use the selected
features.
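
One way to experiment with such a cutoff is sketched below. It is a minimal sketch, assuming the number of folds in which each feature was selected has already been tallied from a cross-validated attribute selection run; the class name, the counts and the cutoff values are hypothetical and not taken from Weka.

```java
import java.util.ArrayList;
import java.util.List;

final class FoldSelectionCutoff {

    /**
     * Returns the indices of the features that were selected in at least
     * minFraction of the folds. foldCounts[i] is the number of folds in
     * which feature i was selected.
     */
    static List<Integer> keepFeatures(int[] foldCounts, int numFolds, double minFraction) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < foldCounts.length; i++) {
            double fraction = (double) foldCounts[i] / numFolds;
            if (fraction >= minFraction) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] foldCounts = {10, 7, 3, 0, 9}; // hypothetical counts from a 10-fold run
        for (double cutoff : new double[] {0.5, 0.8, 1.0}) {
            System.out.println("cutoff " + cutoff + " keeps features "
                + keepFeatures(foldCounts, 10, cutoff));
        }
    }
}
```

Each candidate feature subset would then be judged by training the intended learning algorithm on it and comparing the results.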

Cheers,
Mark.

-- 
Mark Hall
Senior Developer/Consultant
Pentaho
The Open Source Business Intelligence Company
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA

5 Grace Avenue, Hamilton 3210, New Zealand
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax, 
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>

Mark Hall | 2 Jan 2008 22:09

Re: Two questions about cost sensitive learning

On Sun, Dec 30, 2007 at 01:05:42PM +0800, ?? ?? wrote:
> Hello everyone!
>   My database has 10000 samples of type A and about
> 1000 samples of type B, Moreover type B is more
> important than A. It is both class imbalance and cost
> sensitive. 
>   In weka, as I know, there are
> "CostSensitiveClassifier" and "MetaCost" to solve cost
> sensitive problems, but no method to treat class
> imbalance problems. 
>   The first question is how to treat class imbalance
> problems in weka and whether "CostSensitiveClassifier"
> or "MetaCost" method can be used to solve class
> imbalance problems.

You can use CostSensitiveClassifier in cost sensitive learning
mode to address class imbalances. Depending on the base classifier,
either re-weighting or re-sampling will be used. Note that it uses
an ad hoc heuristic for this when there are more than two classes
in the data.
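
The re-weighting idea for the two-class case can be sketched as follows. This is not Weka's code: it is a minimal sketch that assumes each instance is weighted in proportion to the cost of misclassifying its class while the total weight of the dataset is kept unchanged. The class name and the cost values are hypothetical; the class counts come from the question.

```java
final class CostReweighting {

    /**
     * Returns one weight factor per class so that each instance of class c
     * gets a weight proportional to classCost[c], while the total weight of
     * the dataset stays equal to the number of instances.
     */
    static double[] weightFactors(int[] classCounts, double[] classCost) {
        double total = 0.0;
        double weighted = 0.0;
        for (int c = 0; c < classCounts.length; c++) {
            total += classCounts[c];
            weighted += classCounts[c] * classCost[c];
        }
        double[] factors = new double[classCounts.length];
        for (int c = 0; c < factors.length; c++) {
            factors[c] = classCost[c] * total / weighted;
        }
        return factors;
    }

    public static void main(String[] args) {
        int[] counts = {10000, 1000}; // type A, type B (from the question)
        double[] cost = {1.0, 10.0};  // hypothetical: misclassifying a B costs ten times more
        double[] f = weightFactors(counts, cost);
        System.out.println("weight per A instance: " + f[0]);
        System.out.println("weight per B instance: " + f[1]);
    }
}
```

With these hypothetical costs, both classes end up carrying the same total weight, which is how a cost setting can also compensate for the imbalance.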

>   The second question is what is the differences
> between "CostSensitiveClassifier" and "MetaCost".

CostSensitiveClassifier implements cost sensitive learning and cost
sensitive classification (via the minimum expected cost
method). MetaCost is the method by Domingos that achieves a
comprehensible cost sensitive classifier from a bagging approach by
relabeling the training data.
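
The minimum expected cost rule itself is small enough to sketch. The following is an illustration only, assuming a cost matrix indexed as cost[actual][predicted] and class probabilities supplied by some base classifier; the class name and the numbers are hypothetical.

```java
final class MinExpectedCost {

    /**
     * cost[actual][predicted] is the cost of predicting class `predicted`
     * when the true class is `actual`. Returns the prediction with the
     * lowest expected cost under the class probabilities p.
     */
    static int predict(double[] p, double[][] cost) {
        int best = 0;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int predicted = 0; predicted < cost[0].length; predicted++) {
            double expected = 0.0;
            for (int actual = 0; actual < p.length; actual++) {
                expected += p[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = predicted;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] cost = {{0, 1}, {10, 0}}; // hypothetical: missing class 1 costs 10
        double[] p = {0.8, 0.2};             // class 0 is the more probable one
        System.out.println("min-expected-cost class: " + predict(p, cost));
    }
}
```

Here class 0 is more probable, but predicting it has an expected cost of 2.0 against 0.8 for class 1, so the rule chooses class 1.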

Cheers,
Mark.

-- 
Mark Hall
Senior Developer/Consultant
Pentaho
The Open Source Business Intelligence Company
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA

5 Grace Avenue, Hamilton 3210, New Zealand
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax, 
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>

rich | 3 Jan 2008 22:17

controlling cross validation and outputting predictions.



Hi,

I want to be able to return the class prediction and know which instance it relates to during cross validation.
I think I therefore need to control the cross validation step.

Say, for example, I have 1000 data records and I want to do a 5-fold CV. I want to take records 1 to 200 as fold1, 201-400 as fold2, ..., 801-1000 as fold5.

I have read that this may be possible by editing the following method of the weka.classifiers.Evaluation class:
  crossValidateModel(Classifier,Instances,int,Random)

by commenting out the randomize and stratify method calls.

This will potentially only work in the command-line version. I need to see the predictions in the output but can only do this with the GUI. On the command line, -p 0 only shows the predictions on the training dataset when using cross validation.

Does anyone know of a solution for identifying  predictions and the instance number for the test data in cross validation?
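
The fold layout described above (contiguous blocks in file order, no randomization or stratification) can be sketched in plain Java. This is a minimal sketch and not the code of crossValidateModel; the class and method names are hypothetical, and giving the first folds one extra record when the count does not divide evenly is an assumption.

```java
final class ContiguousFolds {

    /**
     * Returns {firstIndex, lastIndex} (0-based, inclusive) of test fold
     * `fold` when n instances are cut into k contiguous folds in file order.
     * If n is not divisible by k, the first n % k folds get one extra instance.
     */
    static int[] testRange(int n, int k, int fold) {
        int base = n / k;
        int extra = n % k;
        int size = base + (fold < extra ? 1 : 0);
        int first = fold * base + Math.min(fold, extra);
        return new int[] {first, first + size - 1};
    }

    public static void main(String[] args) {
        for (int fold = 0; fold < 5; fold++) {
            int[] r = testRange(1000, 5, fold);
            // +1 to report 1-based record numbers as in the question
            System.out.println("fold" + (fold + 1) + ": records "
                + (r[0] + 1) + "-" + (r[1] + 1));
        }
    }
}
```

Because the ranges are fixed, the position of a test instance within its fold maps directly back to its record number in the original file.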

Thanks

Rich 


_______________________________________________
Wekalist mailing list
Wekalist <at> list.scms.waikato.ac.nz
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

JRIP's minNo

Dear all,

I am currently using JRIP and experimenting with different values for its properties (in the Explorer). The parameter minNo appears to be a decimal, given the dot "." in the text box of the generic object editor. However, decimal values appear to have no effect: if I try minNo = 2.1, 2.2, 2.3, 2.4 (and many other values), I get the same accuracy. There are some exceptions, e.g. minNo = 2.0 gives different results from 2.1, but then 2.1 gives the same results as 2.2, 2.3, etc. So the question is: are decimal values actually used? Or is the ceiling of the number used instead (that is, ceil(2.1) = 3)? Or is it just in my examples that the decimal part seems to have no effect? Should I try decimal values, or should integers suffice for the implementation at hand?
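
One possible explanation for this pattern can be checked with a few lines of Java. The sketch below is not taken from the JRIP source: it assumes that minNo is compared against the summed weight of the instances in a split and that every instance has weight 1, so the sums are whole numbers. The class and method names are hypothetical.

```java
final class MinNoThreshold {

    /** Assumed check: a split is acceptable if its covered weight reaches minNo. */
    static boolean accepts(double coveredWeight, double minNo) {
        return coveredWeight >= minNo;
    }

    public static void main(String[] args) {
        double[] settings = {2.0, 2.1, 2.4, 3.0};
        for (double minNo : settings) {
            StringBuilder sb = new StringBuilder("minNo=" + minNo + " accepts counts:");
            for (int count = 1; count <= 4; count++) { // unit-weight instances
                if (accepts(count, minNo)) {
                    sb.append(' ').append(count);
                }
            }
            System.out.println(sb);
        }
    }
}
```

Under these assumptions every value between 2.1 and 3.0 accepts exactly the same counts, while 2.0 additionally accepts a count of 2. With non-unit instance weights the fractional part could matter.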

Thank you very much,
George Valkanas

Peter Reutemann | 4 Jan 2008 00:13

Re: controlling cross validation and outputting predictions.

> I want to be able to return the class prediction and know for which instance it 
> relates during cross validation.
> I think i therefore need to control the cross validation step.
> 
> Say, for example, I have 1000 data records and I want to do a 5-fold CV. I want 
> to take records from 1 to 200 as fold1, 201-400 as fold2 , ..., 801-1000 as fold5.
> 
> I have read that this may be possible by editting the following method of the 
> weka.classifiers.Evaluation class:
>   crossValidateModel(Classifier,Instances,int,Random)
> 
> by commenting out the randomize and stratify method calls.
> 
> This potentially will only work on the command line version. I need to see the 
> predictions in the output but can only do this with the gui. On the command line 
> -p 0 only shows the predictions on the training dataset when using cross validation.
> 
> Does anyone know of a solution for identifying  predictions and the instance 
> number for the test data in cross validation?

If you wanna track instances, then use an ID attribute in combination 
with the FilteredClassifier, your base classifier of choice and the 
Remove filter (to get rid of the ID attribute, before the base 
classifier sees the data). See FAQ "How do I use ID attributes?" for 
more information. Link to the FAQs is available from the Weka homepage.
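
The idea behind that combination can be illustrated without Weka. The following is a plain-Java stand-in and not the FilteredClassifier or Remove API: it assumes each row stores the ID in its first column, strips it before the base classifier sees the row, and reports the ID next to the prediction. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

final class IdAwareClassifier {

    private final ToIntFunction<double[]> base; // stand-in for the base classifier

    IdAwareClassifier(ToIntFunction<double[]> base) {
        this.base = base;
    }

    /** Row layout assumed here: row[0] is the ID, the rest are features. */
    int classify(double[] row) {
        double[] features = Arrays.copyOfRange(row, 1, row.length); // drop the ID
        return base.applyAsInt(features);
    }

    public static void main(String[] args) {
        // Toy base classifier: class 1 if the first feature exceeds 0.5.
        IdAwareClassifier c = new IdAwareClassifier(f -> f[0] > 0.5 ? 1 : 0);
        double[][] rows = {{101, 0.9, 3.0}, {102, 0.1, 4.0}};
        for (double[] row : rows) {
            System.out.println("id=" + (int) row[0] + " predicted=" + c.classify(row));
        }
    }
}
```

The ID stays attached to the row for output purposes, but it can never influence the model because the base classifier only receives the remaining columns.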

On the other hand, if you wanna have more control over what's happening, 
have a look at the article "Generating cross-validation folds (Java 
approach)":
http://weka.sourceforge.net/wiki/index.php/Generating_cross-validation_folds_%28Java_approach%29

The source file "CrossValidationAddPrediction.java" (in the "Links" 
section) runs a normal cross-validation, but also adds the 
prediction/distribution/error flag to the test data of each fold. At the 
end, the evaluation summary and the "enriched" test data is printed to 
the console.
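
As a rough picture of what such a program does, here is a self-contained sketch. It is not the linked source file: it uses a toy majority-class learner and contiguous, unshuffled folds instead of Weka classes, and all names are hypothetical. It records the instance index, actual class, predicted class and an error flag for every test instance.

```java
import java.util.ArrayList;
import java.util.List;

final class CvPredictions {

    /** One output row: instance index, actual class, predicted class. */
    static final class Row {
        final int index;
        final int actual;
        final int predicted;

        Row(int index, int actual, int predicted) {
            this.index = index;
            this.actual = actual;
            this.predicted = predicted;
        }

        boolean isError() {
            return actual != predicted;
        }
    }

    /** Toy learner: most frequent class among the labels outside [testFrom, testTo). */
    static int majority(int[] labels, int testFrom, int testTo, int numClasses) {
        int[] counts = new int[numClasses];
        for (int i = 0; i < labels.length; i++) {
            if (i < testFrom || i >= testTo) {
                counts[labels[i]]++; // training part only
            }
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) {
                best = c;
            }
        }
        return best;
    }

    static List<Row> crossValidate(int[] labels, int folds, int numClasses) {
        List<Row> rows = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            int from = f * labels.length / folds; // contiguous, unshuffled folds
            int to = (f + 1) * labels.length / folds;
            int predicted = majority(labels, from, to, numClasses);
            for (int i = from; i < to; i++) {
                rows.add(new Row(i, labels[i], predicted));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 0, 1, 1, 0, 0, 0, 1};
        for (Row r : crossValidate(labels, 5, 2)) {
            System.out.println((r.index + 1) + " actual=" + r.actual
                + " predicted=" + r.predicted + (r.isError() ? " +" : ""));
        }
    }
}
```

Every instance appears exactly once in the output because each one is in the test set of exactly one fold.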

HTH

Cheers, Peter
-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/     +64 (7) 838-4466 Ext. 5174
rich | 4 Jan 2008 13:08

Re: controlling cross validation and outputting predictions.





On Thu Jan 3 23:13 , Peter Reutemann sent:

> I want to be able to return the class prediction and know for which instance it
> relates during cross validation.
> I think i therefore need to control the cross validation step.
>
> Say, for example, I have 1000 data records and I want to do a 5-fold CV. I want
> to take records from 1 to 200 as fold1, 201-400 as fold2 , ..., 801-1000 as fold5.
>
> I have read that this may be possible by editting the following method of the
> weka.classifiers.Evaluation class:
> crossValidateModel(Classifier,Instances,int,Random)
>
> by commenting out the randomize and stratify method calls.
>
> This potentially will only work on the command line version. I need to see the
> predictions in the output but can only do this with the gui. On the command line
> -p 0 only shows the predictions on the training dataset when using cross validation.
>
> Does anyone know of a solution for identifying predictions and the instance
> number for the test data in cross validation?

> If you wanna track instances, then use an ID attribute in combination
> with the FilteredClassifier, your base classifier of choice and the
> Remove filter (to get rid of the ID attribute, before the base
> classifier sees the data). See FAQ "How do I use ID attributes?" for
> more information. Link to the FAQs is available from the Weka homepage.

Thanks, this works well in the Explorer. Unless I'm mistaken, however, it doesn't work from the command line, because when using cross validation and -p, the predictions output are those made on the training data, not the test data.

cheers

Rich 

> On the other hand, if you wanna have more control over what's happening,
> have a look at the article "Generating cross-validation folds (Java
> approach)":
> http://weka.sourceforge.net/wiki/index.php/Generating_cross-validation_folds_%28Java_approach%29
>
> The source file "CrossValidationAddPrediction.java" (in the "Links"
> section) runs a normal cross-validation, but also adds the
> prediction/distribution/error flag to the test data of each fold. At the
> end, the evaluation summary and the "enriched" test data is printed to
> the console.
>
> HTH
>
> Cheers, Peter
> --
> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
> http://www.cs.waikato.ac.nz/~fracpete/ +64 (7) 838-4466 Ext. 5174

ZHANG Yang | 4 Jan 2008 16:39

BUG in InfoGainSplitCrit.java

Dear WEKAer,

Please refer to
	weka.classifiers.trees.j48.InfoGainSplitCrit

The following method:
///////////////////////////////////////////////////////////////////////////////////
  /**
   * This method is a straightforward implementation of the information
   * gain criterion for the given distribution.
   */
  public final double splitCritValue(Distribution bags) {

    double numerator;

    numerator = oldEnt(bags)-newEnt(bags);

    // Splits with no gain are useless.
    if (Utils.eq(numerator,0))
      return Double.MAX_VALUE;

    // We take the reciprocal value because we want to minimize the
    // splitting criterion's value.
    return bags.total()/numerator;
  }
///////////////////////////////////////////////////////////////////////////////////

There is a BUG here.
The code should be changed into:

    // Splits with no gain are useless.
    if (Utils.eq(numerator,0))
      return 0;

    // We take the reciprocal value because we want to minimize the
    // splitting criterion's value.
    return numerator/bags.total();

Am I right?

This BUG is very harmful to users running experiments with J48.
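
The comment in the quoted method says the reciprocal is returned because the criterion is minimized. How the two forms relate can be checked with a small sketch. It assumes non-negative gains and a caller that picks the candidate with the smallest returned value; how the calling code in J48 actually consumes this value is not shown in this message, and the class and method names below are hypothetical.

```java
final class ReciprocalCriterion {

    /** Mirrors the quoted return values: total / gain, or MAX_VALUE when gain is zero. */
    static double reciprocal(double gain, double total) {
        // The quoted code uses Utils.eq for a tolerant comparison;
        // a plain == keeps this sketch standalone.
        return gain == 0.0 ? Double.MAX_VALUE : total / gain;
    }

    /** Index a minimizing caller would pick from the reciprocal values. */
    static int bestByMinimizingReciprocal(double[] gains, double total) {
        int best = 0;
        for (int i = 1; i < gains.length; i++) {
            if (reciprocal(gains[i], total) < reciprocal(gains[best], total)) {
                best = i;
            }
        }
        return best;
    }

    /** Index a maximizing caller would pick from gain / total. */
    static int bestByMaximizingGain(double[] gains, double total) {
        int best = 0;
        for (int i = 1; i < gains.length; i++) {
            if (gains[i] / total > gains[best] / total) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] gains = {0.0, 12.5, 40.0, 7.0}; // hypothetical oldEnt - newEnt values
        double total = 100.0;
        System.out.println("min of total/gain picks candidate "
            + bestByMinimizingReciprocal(gains, total));
        System.out.println("max of gain/total picks candidate "
            + bestByMaximizingGain(gains, total));
    }
}
```

Under these assumptions both rules select the same candidate, so whether the original or the proposed return values are appropriate depends on whether the caller minimizes or maximizes the result.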

Best Regards,
Yang ZHANG
------------------------------------------------------------
School of Information Technology and Electrical Engineering
The University of Queensland
Email: mokuram <at> itee.uq.edu.au
