[MADlib-user] MADlib's move to ASF
Just a few weeks ago I spoke at the Melbourne Data Science Meetup,
talking about in-database analytics and giving MADlib as a case in
point. The reaction was through the roof, and people were really
excited about the possibilities. The problem, of course, is that MADlib
was not usable outside GPDB/HAWQ (or, in a very hobbled form, PostgreSQL). I
think that a move of MADlib to ASF would enable people to make it much
more of an integral part of the Big Data ecosystem, including, for
example, kicking off the development of a MADlib port for other MPP
databases. The basic building blocks MADlib is built on are equally
available in Teradata, SQL Server, etc. So, why not?
My personal outlook on this is that things like Spark come and go, but
SQL -- though completely unsexy -- is here to stay. Companies doing Big
Data analytics have 90% of the data they analyse inside SQL DBs. SQL
isn't going away any time soon, and data scientists all over the world
need a SQL tool for in-database analytics, or else they are forced to
sample down, etc., and all of the advantages of big data go away.
When I was at Pivotal, my recurring question about MADlib was "why
isn't it more open to outside contributions?" Now it seems things are
changing for the better.