Re: Optimizing data insertion
Hi Abhishek
Please send these details, along with your existing hardware details and
configuration, in a new mail (a separate thread).
I am sure many from the group will be able to assist you.
PS : I am on the move.
Raxit
On Thu, Mar 1, 2012 at 2:36 AM, Abhishek Pratap <apratap@...> wrote:
> Raxit
>
> Not sure what you meant by numbers but I will include some. Let me
> know if you are looking for something else.
>
> 1. Each experiment's input data set for Mongo can range from 100
> million to 500 million records.
> 2. In the long run we expect the total number of records to be more
> than (500 million times 200). This could be an underestimate.
> 3. For now I want to begin small relative to #2 and say managing 500
> million records efficiently will be good. We want to have a system
> where we can push data into Mongo as quickly as we can. The data type
> will be mostly strings. Each record will have 10-15 columns.
>
> I will be glad to share more details if needed.
>
> Best,
> -Abhi
>
> On Wed, Feb 29, 2012 at 12:52 PM, Raxit Sheth <raxitsheth2000@...> wrote:
>> Hi Abhishek
>>
>> 1. Any numbers?
>> 2. Is journaling on or off? If it is on, data will be written more than once.
>>
>>
>>
>> Raxit
>>
>> On Thu, Mar 1, 2012 at 1:50 AM, Abhishek Pratap <apratap@...> wrote:
>>> Hi Raxit
>>>
>>> Thanks for following up.
>>>
>>> For performance, I would say I am looking to get the most I can
>>> from Mongo in terms of data insertion and running queries over it. In
>>> our case the data insertion will be frequent, and in production
>>> we will have to push the data into Mongo through either perl or python,
>>> as it needs some cleaning.
>>>
>>> I tried mongoimport without doing the required cleaning on my data
>>> just to test the insertion speed #records/sec to get a sense of
>>> scalability of the DB.
>>>
>>> What I am looking for is some recommendations of the env to run
>>> mongodb in to get the best performance. For example free RAM, impact
>>> of journaling etc.
>>>
>>> And the recommended way of inserting data into the DB through the
>>> perl/python API. Should the insertions be done per record or in batches
>>> (#N records at a time)?
>>>
>>> Thanks!
>>> -Abhi
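
On the per-record versus batch question above, a minimal Python sketch of the
batched approach might look like the following. It is only an illustration
under stated assumptions: `insert_many` is the bulk method that current PyMongo
collections expose (drivers from the time of this thread took a list through
`insert` instead), while `batches`, `insert_in_batches`, `clean`,
`FakeCollection`, and the batch size of 1000 are hypothetical names and values
chosen for the example. The fake collection only stands in for a real one so
the sketch runs without a server.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield successive lists of at most batch_size records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, records, batch_size=1000):
    """Send records to the collection one batch at a time; return the count sent.

    Assumption: `collection` has an insert_many(list_of_dicts) method,
    as a PyMongo Collection does in current driver versions.
    """
    total = 0
    for batch in batches(records, batch_size):
        collection.insert_many(batch)  # one round trip per batch, not per record
        total += len(batch)
    return total


class FakeCollection:
    """Hypothetical stand-in that records the size of each batch it receives."""

    def __init__(self):
        self.calls = []

    def insert_many(self, documents):
        self.calls.append(len(documents))


def clean(line):
    """Hypothetical cleaning step: split a tab-separated line into named fields."""
    fields = line.rstrip("\n").split("\t")
    return {f"col{i}": value for i, value in enumerate(fields)}


# Stream 2500 made-up input lines through cleaning and batched insertion.
lines = (f"id{i}\tvalue{i}\n" for i in range(2500))
fake = FakeCollection()
sent = insert_in_batches(fake, (clean(line) for line in lines), batch_size=1000)
print(sent, fake.calls)
```

Because the input is consumed as a generator, only one batch is held in memory
at a time. With a real server, a PyMongo collection object would be passed in
place of `FakeCollection`; a suitable batch size would have to be measured
rather than assumed.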
>>>
>>> On Tue, Feb 28, 2012 at 11:21 PM, Raxit Sheth <raxitsheth2000@...> wrote:
>>>> A simple extrapolation (which may be wrong):
>>>>
>>>> Regular perl (total time = read from file + parse + insert to mongo):
>>>> 1M records / 210 seconds
>>>> mongoimport <do you do parsing etc. here?>: 10k/sec, i.e. 1M records / 100 seconds.
>>>
>>> I didn't do parsing, for testing purposes.
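
To put the two figures above in terms of the 500 million record target, here is
a small arithmetic sketch in Python. It assumes the insertion rate stays
constant over the whole load, which is an assumption made only for the
estimate; the thread reports nothing beyond the 1M-record test. The variable
and function names are illustrative.

```python
# Rates quoted in the thread.
PERL_SECONDS_PER_MILLION = 210           # read + parse + insert, one record at a time
MONGOIMPORT_RECORDS_PER_SECOND = 10_000  # no parsing or cleaning


def hours_for(records, records_per_second):
    """Estimated load time in hours, assuming a constant insertion rate."""
    return records / records_per_second / 3600


perl_rate = 1_000_000 / PERL_SECONDS_PER_MILLION  # records per second
target = 500_000_000                              # one large experiment

perl_hours = hours_for(target, perl_rate)
import_hours = hours_for(target, MONGOIMPORT_RECORDS_PER_SECOND)

print(f"perl loop:   {perl_rate:,.0f} records/s, {perl_hours:.1f} h for 500M records")
print(f"mongoimport: {MONGOIMPORT_RECORDS_PER_SECOND:,} records/s, {import_hours:.1f} h for 500M records")
```

At these rates the per-record perl loop works out to roughly 29 hours for 500
million records and mongoimport to roughly 14 hours, so the gap between the two
is about a factor of two.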
>>>>
>>>> Is this normally an infrequent activity?
>>>> Performance depends on many factors (RAM, HD speed, OS stuff, etc.), so
>>>> you need to decide what is acceptable.
>>>>
>>>> What kind of performance are you looking for?
>>>>
>>>> Raxit
>>>>
>>>> On Wed, Feb 29, 2012 at 6:31 AM, Abhishek Pratap <apratap@...> wrote:
>>>>> Thanks for the link, Raxit.
>>>>>
>>>>> Using mongoimport I am getting close to 10,000 records inserted /
>>>>> second. Does this sound reasonable ? I know it might depend on the
>>>>> record type but I am trying to get a rough idea if I am seeing the
>>>>> right efficiency.
>>>>>
>>>>> Also, is there no option to index while inserting with mongoimport?
>>>>>
>>>>> Cheers!
>>>>> -Abhi
>>>>>
>>>>> On Tue, Feb 28, 2012 at 11:40 AM, Raxit Sheth <raxitsheth2000 <at> gmail.com> wrote:
>>>>>> You may want to check a utility like mongoimport:
>>>>>>
>>>>>> http://www.mongodb.org/display/DOCS/Import+Export+Tools
>>>>>>
>>>>>> On Wed, Feb 29, 2012 at 12:44 AM, Abhishek Pratap <apratap@...> wrote:
>>>>>>> Hi Guys
>>>>>>>
>>>>>>> I have started playing with MongoDB as of yesterday so my questions
>>>>>>> can be naive and I might ask too many of them in the next couple of
>>>>>>> days.
>>>>>>>
>>>>>>> I am trying to use the DB to create an index for fast lookups for a
>>>>>>> data set of about 500 million records/rows. I am parsing the data from
>>>>>>> input file with perl and then inserting each record one at a time
>>>>>>> into the DB using MongoDB's perl API.
>>>>>>>
>>>>>>> At present I am able to insert about 1 million records into the DB in
>>>>>>> about 3.5 minutes. Of course this also includes perl's file reading and
>>>>>>> parsing time, so it is not really a true estimate of the time it takes
>>>>>>> MongoDB to insert these records.
>>>>>>>
>>>>>>>
>>>>>>> I am wondering if data can be inserted in batch mode and if that will
>>>>>>> be quicker. I am also trying to index on a set of columns. Does
>>>>>>> indexing on more than one column slow down IO from the DB?
>>>>>>>
>>>>>>>
>>>>>>> FYI : The mongo server currently is running on the same box I use for
>>>>>>> dev. For production purposes I would like to know what are the
>>>>>>> recommendations for setting up the database server.
>>>>>>>
>>>>>>> For now I will end here.
>>>>>>>
>>>>>>> -Abhi
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
>>>>>>> To post to this group, send email to mongodb-user@...
>>>>>>> To unsubscribe from this group, send email to mongodb-user+unsubscribe@...
>>>>>>> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>>>>>>>