Benjamin Wilson | 28 Jul 19:28 2014

[MADlib-user] nearest neighbours search in madlib

Hi all,

Does MADlib offer any facilities for nearest-neighbour lookup on
vector data, either approximate or exact?

Googling around, I found this:
But I am not sure if this is current.

Any help would be greatly appreciated!

Cheers, Benjamin.
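The thread does not say what MADlib itself provides. As a reference point for the exact case, the sketch below shows brute-force k-nearest-neighbour search in plain Python. It is not a MADlib API: the function names `euclidean` and `knn` are hypothetical, and it assumes Euclidean distance over equal-length vectors. It compares the query against every stored point, which is what an exact search costs when there is no index.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: Sequence[float],
        points: Sequence[Sequence[float]],
        k: int) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs for the k points closest to query.

    Exact brute-force search: every point is scored, so a query costs
    O(n * d) for n points of dimension d. Ties are broken by index.
    """
    return heapq.nsmallest(
        k, ((euclidean(query, p), i) for i, p in enumerate(points))
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
    print(knn((0.0, 0.1), pts, 2))
```

Inside a database, the same exact search amounts to computing the distance from the query to every row and keeping the k smallest. Approximate methods trade some accuracy for avoiding that full scan.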

[MADlib-user] optimal ARIMA order parameters

I would like to know whether a function has been developed in MADlib to compute the optimal order for ARIMA
(similar to auto.arima, which checks BIC, AIC, or the smallest MLE using a stepwise selection process).

Please let me know.
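For reference, the selection step that auto.arima-style tools perform reduces to comparing information criteria across fitted candidates: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where L is the maximised likelihood. The sketch below is plain Python, not a MADlib function. It assumes each candidate (p, d, q) has already been fitted by whatever routine you use, and that the routine reports a maximised log-likelihood. The names `aic`, `bic` and `select_order` are hypothetical. It also assumes the parameter count k = p + q + 1 (AR and MA coefficients plus the noise variance). It searches exhaustively over the supplied candidates rather than stepwise as auto.arima does.

```python
import math
from typing import Dict, Tuple

Order = Tuple[int, int, int]  # (p, d, q)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def select_order(fits: Dict[Order, float], n: int, criterion: str = "aic") -> Order:
    """Return the candidate order with the smallest information criterion.

    fits maps (p, d, q) to the maximised log-likelihood of that fitted model,
    and n is the number of observations used in the fit. Assumption: k = p + q + 1;
    add 1 if your fit also estimates a mean or drift term.
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(order: Order) -> float:
        p, _, q = order
        k = p + q + 1
        ll = fits[order]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n)

    return min(fits, key=score)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    fits = {(1, 1, 0): -520.3, (2, 1, 0): -515.1, (2, 1, 1): -514.8}
    print(select_order(fits, n=200), select_order(fits, n=200, criterion="bic"))
```

All candidates in the example share the same differencing order d, because likelihoods computed on differently differenced series are not directly comparable.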

Nick | 10 Jun 18:10 2014

[MADlib-user] Using PivotalR with SQL Server

Sorry if I missed this, but I've been looking and haven't seen any mention of how to use PivotalR with SQL Server.
I know it isn't natively supported, but I can't tell whether it's possible and I just haven't figured it out, or
whether it simply isn't possible.

My RODBC equivalent is below.  VF9 is the server name.

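    library(RODBC)                     # ODBC interface for R
    user_id  <- "my_user_name"
    password <- "my_password"
    # odbcConnect() takes an ODBC data source name; here "VF9"
    myconn <- odbcConnect("VF9", uid = user_id, pwd = password)
    # Execute query
    data_frame <- sqlQuery(myconn, "
        select *
        from Finance.dbo.finance_001
    ")
    odbcClose(myconn)                  # release the connection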

Thank you so much in advance.  This looks like literally the perfect package if I can use it with SQL Server. 

Budi Wibowo | 3 Jun 21:22 2014

[MADlib-user] other flavor of SQL with MADlib

I’m new to MADlib and have just started digging around over the past couple of days. I saw the instructions
for running MADlib on top of PostgreSQL. Is it possible to run MADlib on top of MS SQL Server, MySQL, or Oracle?
Could someone share whether they have done it in the past?

thank you!!

Budi Wibowo
MADlib user discussion mailing list

[MADlib-user] store an R data frame in GPDB

I am playing around with the PivotalR package. I am able to read an existing table in GPDB and do some basic
manipulations using R. After that, I need to store the modified data.frame in GPDB as a table. How do I do
that? What is the function/API for storing a data.frame in GPDB as a table?

Shengwen Yang | 9 Apr 03:46 2014

Re: [MADlib-user] User Digest, Vol 22, Issue 3

Hi Paul,

If you go through the LDA example, you can get the description for each
topic in Step 3 using *madlib*.*lda_get_topic_desc*. You can also get the
topic distribution for each document (in fact it gives the topic counts,
which can be converted to a probability distribution through normalization
very easily), as shown in Step 4. To get the most important topics of a
document, you can index-sort topic_count in descending order and take
the top *k* topics (we have an internal function doing index sorting -

I'd suggest you run the algorithm on a real dataset, such as Reuters-21578,
to get meaningful results.
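The two post-processing steps described above can be sketched in plain Python: normalising a document's topic counts into a probability distribution, and index-sorting them to pick the top k topics. This is not MADlib's internal index-sorting function. The names `topic_distribution` and `top_k_topics` are hypothetical. The sketch assumes topic ids are the 0-based positions in the topic_count array; add 1 if your topic ids are 1-based.

```python
from typing import List, Sequence


def topic_distribution(topic_count: Sequence[int]) -> List[float]:
    """Normalise one document's topic counts so they sum to 1."""
    total = sum(topic_count)
    if total == 0:
        raise ValueError("document has no words assigned to any topic")
    return [c / total for c in topic_count]


def top_k_topics(topic_count: Sequence[int], k: int) -> List[int]:
    """Index-sort the counts in descending order and keep the first k topic ids.

    Python's sort is stable, so topics with equal counts keep their original order.
    """
    order = sorted(range(len(topic_count)), key=lambda t: topic_count[t], reverse=True)
    return order[:k]


if __name__ == "__main__":
    counts = [3, 0, 5, 2]  # illustrative topic counts for one document
    print(topic_distribution(counts), top_k_topics(counts, 2))
```

Because normalisation divides every count by the same total, it does not change the ranking. You can take the top k topics from either the raw counts or the distribution.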


On Wed, Apr 9, 2014 at 12:00 AM, <user-request@...> wrote:

> Send User mailing list submissions to
>         user@...
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to
>         user-request@...
> You can reach the person managing the list at
>         user-owner@...

Paul Nock | 7 Apr 23:36 2014

[MADlib-user] Recommendations for a beginner

Can anyone recommend good tutorial material for getting started with
MADlib? Not so much for installation and setup, but for actually using the
tools and interpreting the results.

I successfully got everything downloaded, set up Postgres, etc., and built
something like the example at  I felt a bit HHGTG
though at the end of it, staring at "42" and wondering what the question was...

In this particular case, I was looking to glean topics from text documents,
but I was having a hard time translating the results back into the real
world. I know I have a pretty steep learning curve, but my searching wasn't
giving me much that seemed all that helpful.

Michael D. Box | 7 Apr 18:31 2014

[MADlib-user] MADLib version support

I am new to MADlib, Greenplum, etc., and have been asked to install MADlib on Greenplum. My first question is:
which version of MADlib should I install, and where can I find an installation guide for that release? We
currently have a multi-node Greenplum 4.1 cluster.

Thanks in advance,
Hai Qian | 3 Apr 08:12 2014

Re: [MADlib-user] User Digest, Vol 21, Issue 3

* To create a new table, you can use "**". It can write a
data.frame / file / db.Rquery into the database, create a table, and return
an object. It can also copy an existing table wrapped by
another object. Please read the user doc for more details.

* To append (or insert) a data.frame data to an existing table, you can
specify "append=TRUE" in to append the content of a file
or data.frame to an existing data table.

Note that "append" is not supported by the version on CRAN; you will
need to use the latest version on GitHub. There are detailed
instructions on the GitHub page about how to install the latest PivotalR.

Hope this helps.



On Tue, Mar 25, 2014 at 9:00 AM, <user-request@...> wrote:

> Send User mailing list submissions to
>         user@...
> To subscribe or unsubscribe via the World Wide Web, visit
> or, via email, send a message with subject or body 'help' to

Nagubandi, Govind | 24 Mar 23:44 2014

[MADlib-user] PivotalR Method/Function


I started using the PivotalR package to connect to a Greenplum database.

Is there a method I can call to write data back to a table (SQL "Insert") directly from an R object or data
frame? For many reasons this will be very helpful and I did not see it in the current version of the PivotalR
help file.

Thanks, Govind

Intelligent Solutions
(o) 212-270-0135
(m) 845-642-2068

Krishna Das | 5 Mar 09:18 2014

[MADlib-user] K-means clustering


I am working on huge datasets and need to cluster these records.
I have used kmeans_random and kmeanspp for clustering, but I have the following questions:

1) The output of these functions contains the centroids. *How do I associate
the data points with the different clusters?* I don't know whether I have missed any
documentation on cluster association.

2) Can the present clustering functions cluster the dataset with a
grouping clause? For example, given a dataset with sex, height, and weight as
variables, I want to cluster the male and female records separately
in a single function call.
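Both questions can be illustrated outside the database. Associating a record with a cluster means assigning it to its nearest centroid. Clustering "with a grouping clause" means running k-means independently inside each group and labelling each row with its group's cluster id. The sketch below is plain Python using Lloyd's algorithm with random initial centroids. It is not MADlib's kmeans_random or kmeanspp, and it does not claim MADlib has a grouping option. The names `nearest_centroid`, `kmeans` and `kmeans_by_group` are hypothetical, as is the `cluster_id` output field. It assumes numeric feature columns and at least k rows per group.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def nearest_centroid(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points: List[tuple], k: int, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Lloyd's algorithm, seeded with k randomly chosen input points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: Dict[int, List[tuple]] = defaultdict(list)
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        new = []
        for j, c in enumerate(centroids):
            members = clusters.get(j)
            # Move each centroid to the mean of its members; an empty cluster keeps its centroid.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    return centroids


def kmeans_by_group(rows: List[dict], group_key: str, features: Sequence[str], k: int) -> List[dict]:
    """Cluster each group separately and tag every row with its cluster id."""
    groups: Dict[object, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    labelled = []
    for members in groups.values():
        pts = [tuple(float(r[f]) for f in features) for r in members]
        cents = kmeans(pts, k)
        for r, p in zip(members, pts):
            labelled.append({**r, "cluster_id": nearest_centroid(p, cents)})
    return labelled


if __name__ == "__main__":
    rows = [  # illustrative records only
        {"sex": "M", "height": 180, "weight": 80}, {"sex": "M", "height": 182, "weight": 82},
        {"sex": "M", "height": 160, "weight": 60}, {"sex": "M", "height": 161, "weight": 61},
        {"sex": "F", "height": 170, "weight": 65}, {"sex": "F", "height": 171, "weight": 66},
        {"sex": "F", "height": 150, "weight": 45}, {"sex": "F", "height": 152, "weight": 46},
    ]
    for r in kmeans_by_group(rows, "sex", ("height", "weight"), k=2):
        print(r)
```

Cluster ids are local to each group, so "cluster 0" among male rows is unrelated to "cluster 0" among female rows. In SQL terms, the assignment step is a per-row lookup of the closest centroid, and the grouping step partitions the table before clustering.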

Krishna Das