elif | 1 Mar 2012 01:10

Re: Building mongodb-hadoop examples

Thanks Andy.
I actually ended up writing my own makefile and copying all the jars
to ../lib to make things work, which it did.
Then, though, I moved to a system where I have no permission to
access $HADOOP_HOME/lib.
I couldn't really make it work by playing with $HADOOP_CLASSPATH, so I
just created a jar file that includes all the classes from all the
other required jar files, because for some reason -libjars didn't work
either.
After putting all the classes in WordCountExample.jar, this is how I
run it:

hadoop jar lib/WordCountExample.jar com.mongodb.hadoop.examples.wordcount.WordCount

Do you guys know how I should modify $HADOOP_CLASSPATH so that I don't
have to collect all my jars in one?
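
One possible approach, offered as a sketch rather than a confirmed fix: put every dependency jar on $HADOOP_CLASSPATH as a path-separated list. As far as I know this variable only extends the classpath of the JVM started by the hadoop command on the client, so the map and reduce tasks may still need the jars shipped some other way. The helper name and the "lib" directory below are assumptions for illustration, not part of Hadoop.

```python
import glob
import os


def build_hadoop_classpath(jar_dir, existing=""):
    """Return a classpath string listing every .jar file found in jar_dir.

    jar_dir and existing are illustrative parameters, not a Hadoop API.
    Entries already present in `existing` are kept in front.
    """
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    parts = [p for p in existing.split(os.pathsep) if p] + jars
    return os.pathsep.join(parts)
```

The result could then be exported before running the job, for example by printing "export HADOOP_CLASSPATH=" followed by build_hadoop_classpath("lib") and evaluating that line in the shell.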

thanks,

elif

On Feb 29, 1:45 pm, aharbick <aharb...@...> wrote:
> I've been struggling with the same problem.  About the only thing that
> I can conclude is that the documentation is wrong.
>
> 1.  I've found no way to get ./sbt to build the examples.
> 2.  There are a couple references to "ant" but as far as I can tell
> the r1.0.0-rc0 tag, and the master branch don't have ant build files.
> 3.  The hadoop command to run the wordcount example is wrong (at a
(Continue reading)

Nolhian | 1 Mar 2012 01:11

Really slow queries

Hello,

I made a test collection with 100,000 documents of 2 KB each. I have a
state attribute which is random between 1 and 50. When I search for
documents with a certain state:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BasicCursor', u'indexBounds': []}],
u'millis': 177, u'n': 1821, u'cursor': u'BasicCursor', u'indexBounds':
[], u'nscannedObjects': 100000, u'nscanned': 100000}

This is nice: 177 ms without an index. Now let's add 1,000,000
documents, still 2 KB each. The same query:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BasicCursor', u'indexBounds': []}],
u'millis': 78983, u'n': 21389, u'cursor': u'BasicCursor',
u'indexBounds': [], u'nscannedObjects': 1100000, u'nscanned': 1100000}

79 s! Let's add an index on state and do it again:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BtreeCursor state_1', u'indexBounds':
[[{u'state': 3}, {u'state': 3}]]}], u'millis': 26790, u'n': 21389,
u'cursor': u'BtreeCursor state_1', u'indexBounds': [[{u'state': 3},
{u'state': 3}]], u'nscannedObjects': 21389, u'nscanned': 21389}

26s !

This is really really slow,  I don't understand the gap between 100k
(Continue reading)

Mathias Stearn | 1 Mar 2012 01:32

Re: mongos messages

If you run the same query again do you get null or the correct object? What is the query and what is the shard key for that collection? Which version are you using?


On Wednesday, February 29, 2012 4:43:30 AM UTC-5, Oded Maimon wrote:
well, actually the error is getting a null object when we expect to
get the data itself (and we know the data is in the db)

On Feb 6, 1:12 am, Eliot Horowitz <el...-Ot75HdpNzd8AvxtiuMwx3w@public.gmane.org> wrote:
> Can you send the client error?
> Those are ok in mongos, just indicate meta data changed.
>
> On Sun, Feb 5, 2012 at 11:46 AM, Oded Maimon <oded.mai...@gmail.com> wrote:
> > Hi,
> > we are getting these messages in mongos, and at the same time our application
> > returns an error saying that it didn't get the data it expected:
> > 14:35:08 [conn20129] ns: prod.msgs could not initialize cursor across all
> > shards because :staleconfigdetected for ns: prod.msgs
> > ClusteredCursor::_checkCursor @ shard001/db0101:19281,db0102:19281 attempt:
> > 0
> > 14:35:08 [conn20129] ns: prod.msgs could not initialize cursor across all
> > shards because :staleconfigdetected for ns: prod.msgs
> > ClusteredCursor::_checkCursor @ shard001/db0101:19281,db0102:19281 attempt:
> > 1
> > 14:35:09 [conn20129] ns: prod.msgs could not initialize cursor across all
> > shards because :staleconfigdetected for ns: prod.msgs
> > ClusteredCursor::_checkCursor @ shard001/db0101:19281,db0102:19281 attempt:
> > 2
> > 14:35:11 [conn20129] created new distributed lock for prod.msgs on
> > db0101:19282,db0201:19282,db0102:19282 ( lock timeout : 900000, ping
> > interval : 30000, process : 0 )
>
> > Can we find out what is causing it? How should the application handle such issues?
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "mongodb-user" group.
> > To view this discussion on the web visit
> >https://groups.google.com/d/msg/mongodb-user/-/VjYSs6eKD8EJ.
> > To post to this group, send email to mongodb-user-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> > To unsubscribe from this group, send email to
> > mongodb-user+unsubscribe <at> googlegroups.com.
> > For more options, visit this group at
> >http://groups.google.com/group/mongodb-user?hl=en.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/mongodb-user/-/AHDMwNw6s68J.
To post to this group, send email to mongodb-user-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to mongodb-user+unsubscribe <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
Nat | 1 Mar 2012 01:56

Re: Really slow queries

It shouldn't be that slow. Can you post your db.<Collectionname>.stats(), mongostat, iostat -x 2, your
machine load etc (as attachment or pastebin)?
-----Original Message-----
From: Nolhian <Eldurian@...>
Sender: mongodb-user@...
Date: Wed, 29 Feb 2012 16:11:42 
To: mongodb-user<mongodb-user@...>
Reply-To: mongodb-user@...
Subject: [mongodb-user] Really slow queries

Hello,

I made a test collection with 100,000 documents of 2 KB each. I have a
state attribute which is random between 1 and 50. When I search for
documents with a certain state:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BasicCursor', u'indexBounds': []}],
u'millis': 177, u'n': 1821, u'cursor': u'BasicCursor', u'indexBounds':
[], u'nscannedObjects': 100000, u'nscanned': 100000}

This is nice: 177 ms without an index. Now let's add 1,000,000
documents, still 2 KB each. The same query:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BasicCursor', u'indexBounds': []}],
u'millis': 78983, u'n': 21389, u'cursor': u'BasicCursor',
u'indexBounds': [], u'nscannedObjects': 1100000, u'nscanned': 1100000}

79 s! Let's add an index on state and do it again:

col.find({'state':3}).explain()
{u'allPlans': [{u'cursor': u'BtreeCursor state_1', u'indexBounds':
[[{u'state': 3}, {u'state': 3}]]}], u'millis': 26790, u'n': 21389,
u'cursor': u'BtreeCursor state_1', u'indexBounds': [[{u'state': 3},
{u'state': 3}]], u'nscannedObjects': 21389, u'nscanned': 21389}

26s !

This is really, really slow. I don't understand the gap between 100k
documents without an index (177 ms) and 1 million with an index (26 s).
By the way, I tested with 10 million documents, and the same query took
11 minutes.

At 100k documents the speed is okay for me, but I need a lot more documents
in the collection, and I think more than 500 ms is unacceptable.

Any idea what's going on? Are these timings normal?


Manoj | 1 Mar 2012 02:12

dropping database in secondary to free space

Hi Mongos,

I have a few huge databases in a replica set that take up 95% of
the disk space, and my disk alarm is in a critical state. Even though I
deleted lots of unwanted collections to free up space, it doesn't
help on disk, because mongo won't release the free space to the OS.
Running repairDatabase is not an option because there is not enough space.

So what I am planning to do is take one secondary out of the 3-server
replica set, then drop the database and resync from the
primary to release the space to the OS. Once that is finished on one
secondary, I will bring it back into the set. Once it is properly in sync (how
do I check this, using rs.status()? using the optime?) I will do the
same on the other 2 servers in the set.

My question is how do I do this?

Can I drop one database and resync that? Or do I need to drop
everything and resync?

Please help me

Regards
Manoj

Nat | 1 Mar 2012 02:24

Re: dropping database in secondary to free space

You would need to drop everything and resync. If you want a lower impact on database performance, you
can probably take down that member, attach a temporary disk with plenty of free space, move the data files there,
run repairDatabase, move the data files back, detach the temporary disk, and start mongod again so that it rejoins the replica set.

Srimonti | 1 Mar 2012 02:28

how to pass command line arguments to mongo shell script

I would like to do

./mongo dbname myscript.js --myparam value

where myparam would be passed to myscript as a command line argument.

Is this possible?
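
A possible workaround, sketched under assumptions: as far as I know the mongo shell does not hand unknown flags such as --myparam to the script, but its --eval option can define variables that are evaluated before the script file is loaded. The helper below only builds the argument list; the function name is made up for illustration, and myparam/value are the names from the question.

```python
import json


def mongo_argv(dbname, script, params, shell="./mongo"):
    """Build an argv list that defines each parameter as a shell variable.

    json.dumps is used to turn Python values into literals that are also
    valid JavaScript (strings get quoted and escaped).
    """
    assignments = "".join(
        "var {} = {};".format(name, json.dumps(value))
        for name, value in sorted(params.items())
    )
    return [shell, dbname, "--eval", assignments, script]
```

For example, mongo_argv("dbname", "myscript.js", {"myparam": "value"}) gives a list that can be passed to subprocess.call, and myscript.js should then be able to read myparam as an ordinary global variable.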

Thanks,
Srimonti

Raxit Sheth | 1 Mar 2012 02:29

Re: Optimizing data insertion

Hi Abhishek

With these details, plus your existing hardware details and
configuration, send another mail (in a different thread).
I am sure many in the group will be able to assist you.

PS : I am on the move.

Raxit
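
For a sense of scale, the two rates mentioned in the quoted thread below can be extrapolated to the 500 million record target. This is plain arithmetic on the reported figures and assumes the rate stays constant as the collection grows, which it may well not.

```python
def hours_to_load(total_records, records_per_second):
    """Hours needed to load total_records at a constant insert rate."""
    return total_records / records_per_second / 3600.0


perl_rate = 1000000 / 210.0   # 1M records in 3.5 minutes, parsing included
mongoimport_rate = 10000.0    # records per second reported for mongoimport
target = 500000000            # 500 million records

print("perl, one insert per record: %.1f h" % hours_to_load(target, perl_rate))
print("mongoimport:                 %.1f h" % hours_to_load(target, mongoimport_rate))
```

That works out to roughly 29 hours for the per-record perl loader and roughly 14 hours at the mongoimport rate.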

On Thu, Mar 1, 2012 at 2:36 AM, Abhishek Pratap <apratap@...> wrote:
> Raxit
>
> Not sure what you meant by numbers, but I will include some. Let me
> know if you are looking for something else.
>
> 1. Our each experiment's input data set to Mongo can range from 100
> million to 500 million records
> 2. In the long run we expect the total number of records to be more
> than (500 million times 200). This could be an underestimate.
> 3. For now I want to begin small relative to #2 and say managing 500
> million records efficiently will be good. We want to have a system
> where we can push data into Mongo as quickly as we can. The data type
> will be mostly strings. Each record will have 10-15 columns.
>
> I will be glad to share more details if needed.
>
> Best,
> -Abhi
>
> On Wed, Feb 29, 2012 at 12:52 PM, Raxit Sheth <raxitsheth2000@...> wrote:
>> Hi Abhishek
>>
>> 1. any numbers ?
>> 2. Is journaling on or off? If it is on, data will be written multiple times
>>
>>
>>
>> Raxit
>>
>> On Thu, Mar 1, 2012 at 1:50 AM, Abhishek Pratap <apratap@...> wrote:
>>> Hi Raxit
>>>
>>> Thanks for following up.
>>>
>>> For performance I would say I am looking to leverage the most I can
>>> from Mongo in terms of data insertion and running queries over it. In
>>> our case the data insertion will be frequent and in production cases
>>> we will have to push the data into Mongo either through perl/python as
>>> it needs some cleaning.
>>>
>>> I tried mongoimport without doing the required cleaning on my data
>>> just to test the insertion speed #records/sec to get a sense of
>>> scalability of the DB.
>>>
>>> What I am looking for is some recommendations of the env to run
>>> mongodb in to get the best performance. For example free RAM, impact
>>> of journaling etc.
>>>
>>> And the recommended way of inserting data into the DB through the perl/python
>>> API: should the insertions be done per record or in batches (N
>>> records at a time)?
>>>
>>> Thanks!
>>> -Abhi
>>>
>>> On Tue, Feb 28, 2012 at 11:21 PM, Raxit Sheth <raxitsheth2000@...> wrote:
>>>> simple extrapolate (which may be wrong)
>>>>
>>>> Regular perl (total time = read from file + parse + insert to mongo) :
>>>> 1M record/210 second
>>>> mongoimport <Here you do parsing etc ?> :   10k/sec, i.e. 1M/100 second.
>>>
>>> I didn't do any parsing, for testing purposes
>>>>
>>>> Normally this is non-frequent activity ?
>>>> Performance depends on many factors ? RAM/HD Speed/ OS stuff etc, so
>>>> you need to take call what is acceptable ?
>>>>
>>>> What kind of performance you are looking for ?
>>>>
>>>> Raxit
>>>>
>>>> On Wed, Feb 29, 2012 at 6:31 AM, Abhishek Pratap <apratap@...> wrote:
>>>>> Thanks for link Raxit.
>>>>>
>>>>> Using mongoimport I am getting close to 10,000 records inserted /
>>>>> second.  Does this sound reasonable ? I know it might depend on the
>>>>> record type but I am trying to get a rough idea if I am seeing the
>>>>> right efficiency.
>>>>>
>>>>> Also, is there no option to index while inserting with mongoimport?
>>>>>
>>>>> Cheers!
>>>>> -Abhi
>>>>>
>>>>> On Tue, Feb 28, 2012 at 11:40 AM, Raxit Sheth <raxitsheth2000 <at> gmail.com> wrote:
>>>>>> you may want to check utility like import
>>>>>>
>>>>>> http://www.mongodb.org/display/DOCS/Import+Export+Tools
>>>>>>
>>>>>> On Wed, Feb 29, 2012 at 12:44 AM, Abhishek Pratap <apratap@...> wrote:
>>>>>>> Hi Guys
>>>>>>>
>>>>>>> I have started playing with MongoDB as of yesterday so my questions
>>>>>>> can be naive and I might ask too many of them in the next couple of
>>>>>>> days.
>>>>>>>
>>>>>>> I am trying to use the DB to create an index for fast lookups for a
>>>>>>> data set of about 500 million records/rows. I am parsing the data from
>>>>>>> input file through perl and then inserting each record one at a time
>>>>>>> in the DB using MongoDB perl's API.
>>>>>>>
>>>>>>> At present I am able to insert about 1 million records into the DB in
>>>>>>> about 3.5 minutes. Of course this also includes perl's file reading and
>>>>>>> parsing time, so it is not really a true estimate of the time it takes
>>>>>>> MongoDB to insert these records.
>>>>>>>
>>>>>>> I am wondering if data can be inserted in batch mode and if that will
>>>>>>> be quicker. I am also trying to index on a set of columns. Does
>>>>>>> indexing on more than one column slow down IO from the DB?
>>>>>>>
>>>>>>>
>>>>>>> FYI : The mongo server currently is running on the same box I use for
>>>>>>> dev. For production purposes I would like to know what are the
>>>>>>> recommendations for setting up the database server.
>>>>>>>
>>>>>>> For now I will end here.
>>>>>>>
>>>>>>> -Abhi
>>>>>>>
>
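
On the per-record versus batch question quoted above: a common pattern is to group parsed records into lists of N and hand each list to the driver in one call, which saves a network round trip per record. The sketch below shows only the grouping step and leaves the driver call out; the batch size is an arbitrary choice, and whether batching actually helps on this setup would need measuring.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` records from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each yielded list would then go to the driver's bulk insert call (insert_many in current PyMongo), for example once per 1000 parsed records, instead of one insert per record.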


Manoj | 1 Mar 2012 02:37

Re: dropping database in secondary to free space

I am fine with dropping everything and resyncing.

What's the best way to do this? Let me lay out a plan.

1. Remove one secondary from the set: rs.remove(hostportstr)
2. Stop mongod on that secondary
3. rm -rf everything in /var/lib/mongo
4. Restart the secondary and add it back to the set
5. Wait for the secondary to sync (verify that the sync is complete via
the optime in rs.status()???)
6. Do the same on the second secondary
7. Fail over the primary
8. Do the same on the old primary
9. All good

Does this sound good? I think we can even skip step 1??
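
For step 5, one way to check the sync is to compare each member's state and optimeDate in the rs.status() output against the primary. The sketch below works on a plain dict shaped like that output; the host names, dates and the 10 second threshold are made up for illustration. A member that is still doing its initial sync normally reports STARTUP2 rather than SECONDARY.

```python
from datetime import datetime


def secondaries_in_sync(status, max_lag_seconds=10):
    """Return {member name: bool} for every member except the primary.

    A member counts as in sync when its state is SECONDARY and its
    optimeDate is within max_lag_seconds of the primary's optimeDate.
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY in replica set status")
    result = {}
    for member in members:
        if member is primary:
            continue
        optime = member.get("optimeDate")
        in_sync = False
        if member["stateStr"] == "SECONDARY" and optime is not None:
            lag = (primary["optimeDate"] - optime).total_seconds()
            in_sync = lag <= max_lag_seconds
        result[member["name"]] = in_sync
    return result


# Made-up sample shaped like rs.status() output; host names are placeholders.
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2012, 3, 1, 2, 40, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2012, 3, 1, 2, 39, 58)},
        {"name": "db3:27017", "stateStr": "STARTUP2",
         "optimeDate": datetime(2012, 2, 29, 23, 0, 0)},
    ]
}

print(secondaries_in_sync(sample_status))
```

In this sample db2 is two seconds behind and counts as in sync, while db3 is still in STARTUP2 and does not, so the next member should not be wiped yet.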


Nolhian | 1 Mar 2012 02:42

Re: Really slow queries

Sure !

> db.components.stats()
{
	"ns" : "Test.components",
	"count" : 1100000,
	"size" : 2324397360,
	"storageSize" : 2347428352,
	"numExtents" : 26,
	"nindexes" : 1,
	"lastExtentSize" : 397967360,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 45522944,
	"indexSizes" : {
		"_id_" : 45522944
	},
	"ok" : 1
}

In htop, mongod never goes over 20% CPU and seems to average about 10%.
However, mongod uses 92% of my RAM with or without any requests. My CPU
is an Intel Atom 330 @ 1.60GHz (dual core with Hyper-Threading).

For this 1.1 million document collection with no index, executing
col.find({'state':3}).explain() took a total of 66 s:

mongostat + iostat : http://pastebin.com/R8v009Kp
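
Some quick arithmetic on the figures posted in this thread may help narrow things down. The inputs come from the stats() and explain() output above; the interpretation afterwards is only a guess, since the machine's RAM size and disk type are not stated.

```python
# Figures copied from db.components.stats() and the explain() output in this thread
stats = {"count": 1100000, "size": 2324397360, "totalIndexSize": 45522944}
scan_seconds = 66.0        # full collection scan, no index on state
indexed_millis = 26790.0   # "millis" from the explain() using the state_1 index
indexed_docs = 21389       # "n" from the same explain()

avg_doc_bytes = stats["size"] / float(stats["count"])
scan_mb_per_s = stats["size"] / scan_seconds / 1e6
ms_per_indexed_doc = indexed_millis / indexed_docs

print("average document size: %.0f bytes" % avg_doc_bytes)
print("full scan throughput:  %.1f MB/s" % scan_mb_per_s)
print("indexed query:         %.2f ms per document" % ms_per_indexed_doc)
```

About 35 MB/s for the scan and a bit over 1 ms per document for the indexed query would be consistent with the queries waiting on disk reads rather than on the CPU, which fits the low CPU usage reported above. If the 2.3 GB of data is larger than the machine's RAM, that would explain why the 100k document collection behaves so differently.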

