Giang Nguyen | 16 Feb 19:30 2016

[MADlib-user] Sample datasets to run CRF

Hi guys,
I am new to MADlib, and I am trying to run CRF with MADlib from a user standpoint.
Is there any way I can get some sample datasets to run CRF with?
For example, is there any way I can get crf_label,
crf_regex, train_segment_tbl, and so on?

Thank you very much!

Karen Vuong | 14 Dec 18:21 2015

[MADlib-user] Please Join Apache MADlib Contributors' ConCall Friday 12/18 - 10AM PST

Dear MADlib contributors,

We would like to invite you to the second monthly Contributors meeting this
Friday, 12/18 at 10AM Pacific.

We will discuss the following topics:

* MADlib on PostgreSQL, as presented at the last conference
* Getting to the first Apache MADlib release
* Starter Jira Issues for new contributors
* Open discussion

The call will be moderated by Frank McQuillan from Pivotal Software.

If you have any other topics or questions you would like to add to the
agenda, please respond to this email.

Join the meeting Friday, December 18 at 10AM PST using the following link:

If you have never attended an Adobe Connect meeting before:

Test your connection:

Get a quick overview:


Roman Shaposhnik | 16 Sep 00:59 2015

[MADlib-user] MADlib has been accepted as an ASF Incubator project


Awesome news, everybody: MADlib has been accepted as
an ASF Incubator project:
We are now Apache MADlib (incubating), and with
that the infrastructure migration begins today:

Here's what is going to happen in the next few days:
   0. Please STOP any commits to existing MADlib GitHub
   repos and updating of JIRAs. I will be migrating those
   bits of infrastructure over to ASF starting from tomorrow.
   There will be no changes to their state (commit history, etc.)
   just the location will be a new one.

   1. Make sure that you are subscribed to the new set
   of mailing lists:
   These are *public* mailing lists that are open to anybody.
   All communication about project development is expected to
   happen in the open mostly on dev@...
   In order to subscribe yourself to all these MLs make sure to
   send an empty email to [ML name]-subscribe@...
   For example, in order to subscribe to dev@...
   send an empty email to dev-subscribe@...

   2. One exception to the public communication rule is:

Tomáš Greif | 11 Sep 03:17 2015

[MADlib-user] Parallelization/performance of summary

How do I make MADlib execute more work in parallel? I have a table with
500 numeric columns and 600k rows. The summary function runs for half an hour
because it uses a single thread only (I think). I have a server with 56
hardware threads and 100GB RAM, but I cannot utilize it with the MADlib summary
function. This is strange, as this is easy to parallelize. Any hints?

Running latest postgresql on latest centos with latest madlib (stable
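
One possible workaround, sketched in Python under several assumptions: the
summary function is taken to accept an optional third argument (target_cols)
holding a comma-separated list of columns, which should be checked against the
installed MADlib version. The table name, output prefix and column names are
hypothetical. The sketch only builds one SQL statement per group of columns;
to get any parallelism each statement would have to be run on its own database
connection, since one PostgreSQL connection is served by one backend process.
Every call still scans the whole table, so a speed-up is not guaranteed.

```python
def chunk_columns(columns, n_chunks):
    """Split a list of column names into at most n_chunks groups."""
    if not columns:
        return []
    n_chunks = max(1, n_chunks)
    size = -(-len(columns) // n_chunks)  # ceiling division
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def summary_statements(source_table, output_prefix, columns, n_chunks):
    """Build one madlib.summary() call per group of columns.

    Assumption: the third argument of madlib.summary() is target_cols,
    a comma-separated string of column names.
    """
    statements = []
    for i, group in enumerate(chunk_columns(columns, n_chunks)):
        statements.append(
            "SELECT * FROM madlib.summary('{0}', '{1}_{2}', '{3}');".format(
                source_table, output_prefix, i, ",".join(group)
            )
        )
    return statements
```

For a 500-column table, summary_statements("big_table", "big_table_summary",
cols, 8) yields eight statements, each writing to its own output table, which
could then be submitted over eight separate connections.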

rvs | 10 Sep 04:45 2015

[MADlib-user] ASF Incubator voting has begun


Just wanted to let you guys know that the
voting on accepting MADlib into the ASF
Incubator has started. It would be great
if those of you interested in this next
phase of MADlib's community evolution could
cast their votes on the email thread.

Roman Shaposhnik | 2 Sep 23:27 2015

[MADlib-user] ASF Incubator proposal is now under discussion


I just wanted to let you know that I've submitted the
proposal for discussion today:

I would also like to encourage everybody to directly
participate in the discussion on that thread.

Steven Hillion | 16 Aug 17:19 2015

[MADlib-user] proposed MADlib move to Apache Software Foundation

I would like to express my support for making MADlib an incubator project under the Apache Software Foundation.

As developers and users of the MADlib libraries, my development team is keenly interested in seeing this
project attract as wide an audience as possible.

We believe that MADlib is the most flexible, open and powerful mechanism for doing machine learning on data
in relational databases. And with open-source HAWQ, that capability is now extended to Hadoop.

We have seen MADlib used successfully in numerous real-world scenarios, covering projects in
healthcare, finance, media and manufacturing. By extending the community of contributors, we hope that
the potential of MADlib can be applied even more broadly.

Michael Brand | 15 Aug 04:33 2015

[MADlib-user] MADlib's move to ASF

Hi all,

I was, just a few weeks ago, speaking at the Melbourne Data Science 
Meetup, talking about in-database analytics and giving MADlib as a 
case-in-point. Reaction was through the roof, and people were really 
excited about the possibilities. The problem, of course, is that MADlib 
was not usable outside GPDB/HAWQ (or, very hobbled, also PostgreSQL). I 
think that a move of MADlib to ASF would enable people to make it much 
more of an integral part of the Big Data ecosystem, including, for 
example, kicking off the development of a MADlib port for other MPP 
databases. The basic building blocks MADlib is built on are equally 
available in Teradata, SQL Server, etc. So, why not?

My personal outlook on this is that things like Spark come and go, but 
SQL -- though completely unsexy -- is here to stay. Companies doing Big 
Data analytics have 90% of the data they analyse inside SQL DBs. SQL 
isn't going away any time soon, and data scientists all over the world 
need a SQL tool for in-database analytics, or else they are forced to 
sample down, etc., and all of the advantages of your big data go away.

When I was at Pivotal, my common question about MADlib was always "why 
isn't it more open to outside contributions?". Now it seems things are 
changing for the better.

Excellent news!

Salman B.M Raeisi | 12 Aug 14:10 2015

[MADlib-user] python version


I want to know whether MADlib works with Python 3 or only with Python 2.

Salman B.M Raeisi | 12 Aug 14:08 2015

[MADlib-user] regression trees/ensemble tree


I want to know whether MADlib supports regression trees/ensemble trees.

Afra | 27 Jun 03:25 2015

[MADlib-user] Chi-Squared Independence Result Discrepancy

Hello all,

I am getting results from the madlib.chi2_gof_test function that differ from the documentation. Per the documentation at:

(under “Chi-squared independence test”), I receive the following results using the same arrays:

─[ RECORD 1 ]────┬─────────────────────
statistic        │ 320.125868955635
p_value          │ 1.39464882809491e-63
df               │ 9
phi              │ 4.4730154045931
contingency_coef │ 0.975909209031126

(138.289841626008 is the expected value per the docs). 

I also noticed the SQL had some syntax issues, namely:

   sum(observed) OVER (PARTITION BY id_y) AS expected

(there doesn’t seem to be a comma separating the two). 

Thank you very much for any guidance. 
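
For cross-checking numbers like these by hand, here is a minimal Python 3
sketch of the textbook chi-squared independence computation. It is not
MADlib's implementation, and MADlib may normalise the expected counts or
define phi and the contingency coefficient differently. The function name and
the 2x2 input table are illustrative only, and the p-value is left out because
it needs a chi-squared distribution function that the standard library lacks.

```python
import math


def chi2_independence(table):
    """Textbook chi-squared independence test on a table of observed counts.

    table is a list of rows; each row is a list of counts.
    Expected count for a cell = row total * column total / grand total.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected

    return {
        "statistic": statistic,
        "df": (len(row_sums) - 1) * (len(col_sums) - 1),
        # Textbook definitions, with n taken as the total number of observations.
        "phi": math.sqrt(statistic / total),
        "contingency_coef": math.sqrt(statistic / (statistic + total)),
    }


result = chi2_independence([[10, 20], [30, 40]])
```

Note that the expected counts divide the product of the row and column totals
by the grand total; a query that leaves out that division only gives the same
statistic if the test function rescales the expected values itself.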
