Roman Shaposhnik | 2 Sep 23:27 2015

[MADlib-user] ASF Incubator proposal is now under discussion


I just wanted to let you know that I've submitted the
proposal for discussion today:

I would also like to encourage everybody to participate
directly in the discussion on that thread.

Steven Hillion | 16 Aug 17:19 2015

[MADlib-user] proposed MADlib move to Apache Software Foundation

I would like to express my support for making MADlib an incubator project under the Apache Software Foundation.

As developers and users of the MADlib libraries, my development team is keenly interested in seeing this
project attract as wide an audience as possible.

We believe that MADlib is the most flexible, open and powerful mechanism for doing machine learning on data
in relational databases. And with open-source HAWQ, that capability is now extended to Hadoop.

We have seen MADlib used successfully in numerous real-world scenarios, covering projects in
healthcare, finance, media and manufacturing. By extending the community of contributors, we hope that
the potential of MADlib can be applied even more broadly.

Michael Brand | 15 Aug 04:33 2015

[MADlib-user] MADlib's move to ASF

Hi all,

I was, just a few weeks ago, speaking at the Melbourne Data Science
Meetup, talking about in-database analytics and giving MADlib as a
case in point. The reaction was through the roof, and people were really
excited about the possibilities. The problem, of course, is that MADlib
was not usable outside GPDB/HAWQ (or, in a very hobbled form, PostgreSQL). I
think that a move of MADlib to ASF would enable people to make it much
more of an integral part of the Big Data ecosystem, including, for
example, kicking off the development of a MADlib port for other MPP
databases. The basic building blocks MADlib is built on are equally
available in Teradata, SQL Server, etc. So, why not?

My personal outlook on this is that things like Spark come and go, but 
SQL -- though completely unsexy -- is here to stay. Companies doing Big 
Data analytics have 90% of the data they analyse inside SQL DBs. SQL 
isn't going away any time soon, and data scientists all over the world 
need a SQL tool for in-database analytics, or else they are forced to 
sample down, etc., and all of the advantages of your big data go away.

When I was at Pivotal, my common question about MADlib was always "why 
isn't it more open to outside contributions?". Now it seems things are 
changing for the better.

Excellent news!

Salman B.M Raeisi | 12 Aug 14:10 2015

[MADlib-user] python version


I want to know whether MADlib works with Python 3, or only with Python 2.

Salman B.M Raeisi | 12 Aug 14:08 2015

[MADlib-user] regression trees/ensemble tree


I want to know whether MADlib supports regression trees and tree ensembles.

Afra | 27 Jun 03:25 2015

[MADlib-user] Chi-Squared Independence Result Discrepancy

Hello all,

I am getting results from the madlib.chi2_gof_test function that differ from the documented ones. Per the documentation at:

(under “Chi-squared independence test”), I receive the following results using the same arrays:

─[ RECORD 1 ]────┬─────────────────────
statistic        │ 320.125868955635
p_value          │ 1.39464882809491e-63
df               │ 9
phi              │ 4.4730154045931
contingency_coef │ 0.975909209031126

(138.289841626008 is the expected value per the docs). 

I also noticed the SQL had some syntax issues, namely:

   sum(observed) OVER (PARTITION BY id_y) AS expected

(there doesn’t seem to be a comma separating the two).

Thank you very much for any guidance. 
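
For cross-checking numbers like the ones above by hand, here is a minimal sketch of the textbook Pearson chi-squared independence test in plain Python. It is not MADlib's implementation: the function name chi2_independence is hypothetical, and the phi and contingency-coefficient formulas are the standard ones, which assume n is the total count in the table. MADlib may define these quantities differently.

```python
import math


def chi2_independence(table):
    """Textbook Pearson chi-squared independence statistics.

    `table` is a list of rows of observed counts (a contingency table).
    Hypothetical helper for hand-checking; not part of MADlib.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / n
            statistic += (observed - expected) ** 2 / expected

    return {
        "statistic": statistic,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "phi": math.sqrt(statistic / n),
        "contingency_coef": math.sqrt(statistic / (statistic + n)),
    }


if __name__ == "__main__":
    # Made-up 2x2 table, for illustration only.
    print(chi2_independence([[10, 20], [20, 10]]))
```

Under this formula a 4x4 table gives df = (4 - 1) * (4 - 1) = 9, which agrees with the df in the record above, so comparing the remaining fields against a hand computation may help narrow down where the discrepancy comes from.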


John Langton | 16 May 19:59 2015

[MADlib-user] multinom function not found

Hello, I was just trying to run the multinomial function and got an error
that the function does not exist. I checked the madlib schema and sure
enough I'm not finding it in there.

I installed maybe a few months ago. Is there an easy way to check madlib
version? Is this a new function? Any idea why it would be missing?




John T. Langton, Ph.D.
VisiTrend, Inc.

John Langton | 1 Apr 17:58 2015

[MADlib-user] what values are allowed for ARIMA non-seasonal orders

The input parameter is specified as an int[3]

Just wondering what values can be passed into that, 0, 1, 2 or any int?
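
As background for the question, in the textbook ARIMA(p, d, q) formulation the three orders are non-negative integers: p autoregressive lags, d rounds of differencing, and q moving-average lags. The sketch below only illustrates that definition. The helper names validate_orders and difference are hypothetical, and any additional limits MADlib itself places on the int[3] parameter are not covered here.

```python
def validate_orders(orders):
    """Check a textbook ARIMA (p, d, q) triple: three non-negative integers.

    Hypothetical helper; it does not reflect MADlib's own validation.
    """
    if len(orders) != 3:
        raise ValueError("expected exactly three orders: p, d, q")
    for name, value in zip("pdq", orders):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    return tuple(orders)


def difference(series, d):
    """Apply first differencing d times, which is what the order d denotes."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


if __name__ == "__main__":
    print(validate_orders([1, 1, 1]))
    print(difference([1, 4, 9, 16], 2))
```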

BTW: I keep getting this message using the mail app on a mac, even when
specifying a plain text format: "The message's content type was not
explicitly allowed"

Rahul Iyer | 28 Mar 16:59 2015

Re: [MADlib-user] Random Forest classification

You could write the SQL for such a random classifier.
For example, using the dt_golf dataset from the decision tree example page, we
can produce random classes as:

SELECT class, classes[trunc(random()*2 + 1)] AS random_class
FROM dt_golf,
     (SELECT ARRAY['Play', 'Don''t Play'] AS classes) q1;

In the query, the expression trunc(random()*2 + 1) computes a random integer
that is either 1 or 2. You would need to replace '2' with the number of classes
in your data.
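
The same baseline idea can be sketched outside the database in plain Python. This is an illustration under assumptions, not MADlib code: the helper names random_baseline, zero_r and accuracy are hypothetical, and the label list is made up. It also includes a majority-class baseline, which is what WEKA's ZeroR does (it always predicts the most frequent class rather than a random one).

```python
import random
from collections import Counter


def random_baseline(labels, classes, seed=None):
    """Assign a uniformly random class to every row (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in labels]


def zero_r(labels):
    """Majority-class baseline: always predict the most frequent label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return [most_common] * len(labels)


def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up labels for illustration; not read from any real table.
    labels = ["Play"] * 9 + ["Don't Play"] * 5
    classes = ["Play", "Don't Play"]
    print("random baseline:", accuracy(labels, random_baseline(labels, classes, seed=0)))
    print("majority baseline:", accuracy(labels, zero_r(labels)))
```

A trained model is only interesting if its accuracy clearly beats both of these numbers on the same data.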

- Rahul

On Fri, Mar 27, 2015 at 8:56 PM, <dpopova@...> wrote:

> Rahul,
> Sorry to bug you on the weekend. But does MADlib have a random classifier?
> Something that puts random values into the class variable, to establish a
> baseline for a particular dataset?
> For example, WEKA has ZeroR classifier.
> Thank you,
> Diana
> > Yes. That's the goal of tree_predict (signature below):

dpopova | 26 Mar 02:42 2015

[MADlib-user] Random Forest classification

Dear all,

I am trying to run the forest_train function on a dataset. I have already
successfully trained a model on this data using tree_train, but forest_train
gives the following error message:

dianapopova=# \c maddb
You are now connected to database "maddb" as user "dianapopova".
maddb=# SELECT madlib.forest_train('madlib.pull_request_class_merged',
maddb(# 'forest_output',
maddb(# 'pull_req_id',
maddb(# 'merged',
maddb(# '*',
maddb(# NULL
maddb(# );
ERROR:  AttributeError: 'NoneType' object has no attribute 'strip'
CONTEXT:  Traceback (most recent call last):
  PL/Python function "forest_train", line 41, in <module>
  PL/Python function "forest_train", line 279, in forest_train
  PL/Python function "forest_train", line 1486, in _forest_validate_args
PL/Python function "forest_train"

Could you please advise?

Thank you,
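
For readers hitting the same message: PL/Python hands a SQL NULL to the function body as Python's None, and calling .strip() on None raises exactly this AttributeError. The sketch below reproduces the mechanism in plain Python. Both helpers are hypothetical and are not MADlib's actual validation code; which argument triggered the error inside _forest_validate_args is not shown in the traceback.

```python
def naive_parse(raw):
    """Parse a comma-separated list, assuming `raw` is always a string."""
    return [name.strip() for name in raw.strip().split(",")]


def parse_column_list(raw):
    """Same parsing, but treat None (a SQL NULL) as an empty list."""
    if raw is None:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    try:
        naive_parse(None)
    except AttributeError as exc:
        # Same kind of failure as in the traceback above.
        print("naive_parse failed:", exc)
    print(parse_column_list(None))
    print(parse_column_list("col_a, col_b"))
```

The call above passes NULL as its last argument, so that argument is one place to look first when comparing against the function's documented signature.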

AJ Welch | 3 Jan 08:31 2015

[MADlib-user] Distribute on pgxn?

Looks like MADlib was distributed on PGXN at one point:

Why was 1.3 the last distribution? Not enough support to make it a priority?