Shengwen Yang | 9 Apr 03:46 2014

Re: [MADlib-user] User Digest, Vol 22, Issue 3

Hi Paul,

If you go through the LDA example, you can get the description of each
topic in Step 3 using madlib.lda_get_topic_desc. You can also get the
topic distribution for each document, as shown in Step 4 (strictly, it gives
the topic counts, which are easily converted to a probability distribution
by normalization). To get the most important topics of a document, index-sort
topic_count in descending order and take the top k topics (we have an
internal function doing index sorting -

I'd suggest you run the algorithm on some real dataset, like reuters-21578,
to get some meaningful results.
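
The normalization and top-k selection described above can be sketched outside
the database in a few lines of Python. This is only an illustration under
stated assumptions: topic_count is taken to be a plain list of per-topic
counts for one document, topic ids are assumed to be zero-based positions in
that list, and top_k_topics is a hypothetical helper name, not a MADlib
function.

```python
def top_k_topics(topic_count, k):
    """Return (topic_id, probability) pairs for the k largest topic counts.

    Assumption: topic_count[i] is the number of words in the document
    assigned to topic i (zero-based ids are an assumption of this sketch).
    """
    total = sum(topic_count)
    if total == 0:
        # No words assigned to any topic: nothing to rank.
        return []
    # Normalization: counts -> probability distribution over topics.
    probs = [count / total for count in topic_count]
    # Index sort: order topic ids by descending count, lower id first on ties.
    order = sorted(range(len(topic_count)), key=lambda i: (-topic_count[i], i))
    return [(i, probs[i]) for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical counts for a document over four topics.
    print(top_k_topics([3, 0, 6, 1], 2))
```

The returned topic ids can then be looked up in the topic descriptions from
Step 3 to see which words characterize the document's dominant topics.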


On Wed, Apr 9, 2014 at 12:00 AM, <user-request@...> wrote:

> Send User mailing list submissions to
>         user@...
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to
>         user-request@...
> You can reach the person managing the list at
>         user-owner@...

Paul Nock | 7 Apr 23:36 2014

[MADlib-user] Recommendations for a beginner

Can anyone recommend good tutorial information for getting started with
Madlib?  Not really for installation and setup, but actually using the
tools and interpreting the results.

I successfully got everything downloaded, set up postgres, etc. and built
something like the example at  I felt a bit HHGTG
though at the end of it, staring at "42", wondering what the question was...

In this particular case, I was looking to glean topics from text documents,
but I was having a hard time translating the results back into the real
world. I know I have a pretty steep learning curve, but my searching wasn't
giving me much that seemed all that helpful.

Michael D. Box | 7 Apr 18:31 2014

[MADlib-user] MADLib version support

I am new to MADlib, Greenplum, etc. and have been asked to install MADlib on Greenplum.  My first question is:
which version of MADlib should I install, and where can I find an installation guide for that release?  We
currently have a multi-node cluster of Greenplum 4.1.

Thanks in advance,
Hai Qian | 3 Apr 08:12 2014

Re: [MADlib-user] User Digest, Vol 21, Issue 3

* To create a new table, you can use "**". It can write a
data.frame / file / db.Rquery into the database, create a table and return
an object. It can also copy an existing table wrapped by
another object. Please read the user doc for more details.

* To append (or insert) data.frame data to an existing table, you can
specify "append=TRUE" to append the content of a file
or data.frame to an existing data table.

Note, "append" is not supported by the version on CRAN. You will
need to use the latest version on GitHub. There are detailed
instructions on how to install the latest PivotalR on the GitHub page.

Hope this helps.



On Tue, Mar 25, 2014 at 9:00 AM, <user-request@...> wrote:

> Send User mailing list submissions to
>         user@...
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to

Nagubandi, Govind | 24 Mar 23:44 2014

[MADlib-user] PivotalR Method/Function


I started using the PivotalR package to connect to a Greenplum database.

Is there a method I can call to write data back to a table (SQL "Insert") directly from an R object or data
frame? For many reasons this will be very helpful and I did not see it in the current version of the PivotalR
help file.

Thanks, Govind

Krishna Das | 5 Mar 09:18 2014

[MADlib-user] K-means clustering


I am working on huge datasets and I need to do clustering on these records.
I have used kmeans_random and kmeanspp for clustering, but I have the following
questions:

1) The output given by these functions contains centroids. *How do I associate
the data with the different clusters?*  I don't know whether I have missed any
documentation on cluster association.

2) Do the present clustering functions allow users to cluster a dataset with a
grouping clause? Ex: for a dataset with sex, height and weight as variables, I
want to cluster the male and female records separately in a single function
call.

Krishna Das
Caleb Welton | 28 Feb 20:22 2014

Re: [MADlib-user] Median on round(decimal, 2) using madlib.summary()

Digging into the implementation a bit, it looks like the quartiles are
currently implemented using the SQL Standard PERCENTILE_CONT() inverse
distribution function, which unfortunately is not currently supported on
the Postgres database.  We should do a better job of documenting this
limitation or find a workaround on this platform.
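
For readers who need the median or quartiles in the meantime, the linear
interpolation that PERCENTILE_CONT() is generally defined to perform can be
reproduced client-side. The Python below is a minimal sketch under stated
assumptions, not MADlib's implementation: the column values are assumed to
have been fetched into a list already, NULLs are assumed to arrive as None,
and the function name percentile_cont is simply chosen to mirror the SQL
name.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours.

    Assumptions of this sketch: `values` is an in-memory list of numbers,
    None entries stand for SQL NULLs and are ignored, and `fraction`
    is in [0, 1] (0.5 gives the median, 0.25 and 0.75 the quartiles).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    data = sorted(float(v) for v in values if v is not None)
    if not data:
        return None
    # Zero-based fractional position of the requested percentile.
    pos = fraction * (len(data) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        # The position falls exactly on one element.
        return data[lo]
    # Weight the two neighbours by their distance from the position.
    return data[lo] * (hi - pos) + data[hi] * (pos - lo)


if __name__ == "__main__":
    # Hypothetical sample values.
    sample = [1, 2, 3, 4]
    print(percentile_cont(sample, 0.5))
```

Because the values are sorted in memory, this only suits columns (or groups)
small enough to pull to the client.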

On Fri, Feb 28, 2014 at 11:03 AM, Jacob Kroeze <jkroeze@...> wrote:

>  Here's more detail about the environment and the test case run. I'm
> still not getting quartile summary statistics.
> Cheers!
> fas_staging=# select pg_catalog.version();
> -[ RECORD 1 ]---------------------------------------------------------
> version | PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
> fas_staging=# select madlib.version();
> -[ RECORD 1 ]---------------------------------------------------------

Jacob Kroeze | 28 Feb 02:20 2014

[MADlib-user] Median on round(decimal,2) using madlib.summary()

I'm trying to find the median of a table.

select * from madlib.summary(
'salary_percent_change', --target cols
'school,job_title,merit_rating', -- group on cols
TRUE, --get quartiles
NULL, --ntile array

However, the quartiles and median are not returned in the table.
Benjamin D. Shapiro | 13 Jan 06:44 2014

[MADlib-user] support for PostgreSQL 9.3


I noticed there is some work in the codebase on supporting PostgreSQL 9.3, and some discussion of the
possibility of installing MADlib on a PostgreSQL 9.3 system despite it not
being officially supported. So far I have been unsuccessful in building and
installing the latest MADlib version from source on Mac OS X with
PostgreSQL 9.3.

Do the MADlib developers intend to officially support Postgres 9.3 sometime
soon, and if so, could anyone take a guess at a timeline for the release?

It would also be helpful to know any factors that might make a successful
MADlib installation with PostgreSQL 9.3 more likely (e.g. a certain
compiler version or a certain OS) since, for instance, it looks like some
users have successfully installed it on LMDE.

Thank you!

Sandeep Gupta | 14 Dec 04:31 2013

[MADlib-user] list of installed functions


 I have installed madlib. The installation seems correct.
However, I am not sure if all the modules got installed correctly.

When I type SELECT * FROM madlib.linear_solver_dense();
I get some text output that describes the routine.

However, when I type SELECT * FROM madlib.assoc_rules();

I get error
ERROR:  function madlib.assoc_rules() does not exist
LINE 1: SELECT * FROM madlib.assoc_rules();
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.

So I was wondering whether I am doing something wrong or assoc_rules is not
part of the package.

Adrian Schreyer | 12 Dec 18:31 2013

[MADlib-user] Intersection between two sparse vectors

Hi all,

Is it possible with madlib to get the intersection of two sparse vectors as
a new sparse vector, i.e. only with the elements that are equal in both?