Shivani | 2 May 2011 04:12

[gensim:248] Pyro and Distributed LDA using gensim

Hello,

I am attempting to use Pyro with distributed LDA, but it does not seem
to be that straightforward.

>>lda = gensim.models.ldamodel.LdaModel(corpus=myCorpus, id2word=myCorpus.dictionary, numTopics=100,passes=20,distributed=True)

gives
ERROR:ldamodel:failed to initialize distributed LDA (No module named Pyro)

I then tried

>>import Pyro.util
>>import Pyro.core
>>lda = gensim.models.ldamodel.LdaModel(corpus=myCorpus, id2word=myCorpus.dictionary, numTopics=100,passes=20,distributed=True)

and got:

ERROR:ldamodel:failed to initialize distributed LDA ('module' object has no attribute 'naming')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/gensim-0.7.8-py2.6.egg/gensim/models/ldamodel.py", line 245, in __init__
    raise RuntimeError("failed to initialize distributed LDA (%s)" % err)
RuntimeError: failed to initialize distributed LDA ('module' object has no attribute 'naming')


Shivani | 2 May 2011 05:28

[gensim:249] LDA batch or streaming

Hello all,

I would like to inspect the LDA model at different points when running
in online mode. That is, if chunks = 10000, the model is updated every
10000 documents, and I would like to access it after each update.
Is there a way to access the model at each of those update points?

Any help is appreciated.

Shivani

Shivani | 2 May 2011 05:39

[gensim:250] Re: LDA batch or streaming

I think I found an answer.
The only way, I guess, is to divide the corpus and call the lda.update()
function on each part.

Shivani
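
A minimal sketch of the "divide the corpus and update" idea described above, using only the standard library. The helper names (iter_chunks, update_in_chunks) and the on_update callback are hypothetical and not part of gensim; with gensim, the update argument would presumably be the model's lda.update method.

```python
from itertools import islice


def iter_chunks(corpus, chunk_size):
    """Yield successive lists of at most chunk_size documents from any iterable."""
    it = iter(corpus)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def update_in_chunks(update, corpus, chunk_size, on_update):
    """Feed the corpus to update() one chunk at a time.

    After every chunk, on_update(docs_seen) is called so the caller can
    inspect, log, or save the model at that point.
    """
    docs_seen = 0
    for chunk in iter_chunks(corpus, chunk_size):
        update(chunk)           # hypothetical: e.g. lda.update with gensim
        docs_seen += len(chunk)
        on_update(docs_seen)    # the "tap" point after each update
    return docs_seen
```

Because the corpus is consumed lazily through islice, this also works for streamed corpora that do not fit in memory.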

On May 1, 11:28 pm, Shivani <raoshiv...@...> wrote:
> Hello all,
>
> I would like to inspect the LDA model at different points when running
> in online mode. That is, if chunks = 10000, the model is updated every
> 10000 documents, and I would like to access it after each update.
> Is there a way to access the model at each of those update points?
>
> Any help is appreciated.
>
> Shivani

simonxue21 | 2 May 2011 11:11

[gensim:251] Can I have a "null" model?

Hi,

I know gensim is mostly built for LSI, where models like tfidf are
almost unavoidable. But I still want to have a "null" model, for cases
like recommendation systems. Is it possible? I guess the effort to
build such a model would not be huge, would it? Can anybody
build one? Thanks.

simonxue21 | 2 May 2011 12:56

[gensim:252] Re: Can I have a "null" model?

The title here may be a little confusing. What I want is
the ability to specify the matrix before SVD, instead of generating
the matrix from the corpus with the predefined "tfidf" method. Thanks.

On May 2, 5:11 pm, simonxue21 <simonxu... <at> gmail.com> wrote:
> Hi,
>
> I know gensim is mostly built for LSI, where models like tfidf are
> almost unavoidable. But I still want to have a "null" model, for cases
> like recommendation systems. Is it possible? I guess the effort to
> build such a model would not be huge, would it? Can anybody
> build one? Thanks.

Shivani | 4 May 2011 00:46

[gensim:253] index cannot be created for large corpora

>>> index = similarities.MatrixSimilarity(myCorpus)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/gensim-0.7.8-py2.6.egg/gensim/similarities/docsim.py", line 111, in __init__
    self.corpus = numpy.empty(shape=(len(corpus), numFeatures), dtype=dtype)
ValueError: array is too big.

I wish to use the online LDA-based algorithm in order to perform
retrieval tasks.
What is the alternative?
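
The traceback above shows the index allocating one dense array of shape (len(corpus), numFeatures). A rough sketch of the arithmetic behind "array is too big" follows. The helper names are hypothetical, and the default of 4 bytes per value assumes a float32 dtype, which the traceback does not confirm.

```python
def dense_index_bytes(num_docs, num_features, bytes_per_value=4):
    """Bytes needed for a dense (num_docs x num_features) array.

    bytes_per_value=4 is an assumption (float32); use 8 for float64.
    """
    return num_docs * num_features * bytes_per_value


def fits_in_budget(num_docs, num_features, budget_bytes, bytes_per_value=4):
    """True if the dense array would fit inside the given memory budget."""
    return dense_index_bytes(num_docs, num_features, bytes_per_value) <= budget_bytes


# Illustrative numbers only, not taken from the thread:
# one million documents over a 50,000-term vocabulary vs. over 100 topics.
bow_bytes = dense_index_bytes(1000000, 50000)    # 200,000,000,000 bytes
topic_bytes = dense_index_bytes(1000000, 100)    # 400,000,000 bytes
```

The number of columns dominates the size, so, as an assumption about the intended workflow, indexing the topic vectors rather than the raw bag-of-words vectors would shrink the array by the ratio of vocabulary size to topic count.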

Stephan Gabler | 4 May 2011 10:05

Re: [gensim:254] Re: cPickle Problem


Hey guys,

thanks a lot for investigating the problem. 

Radim: how do you suggest dealing with this problem?
I suggest adding functionality to the lsi_model (or to models in general) to
store its very large matrices using a different method (NumPy's saving methods, for example) and
then exclude them from the normal pickling process, as described here:
http://stackoverflow.com/questions/2345944/exclude-objects-field-from-pickling-in-python

What do you think?

stephan
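
A minimal sketch of the idea proposed above: drop the large field in __getstate__ so pickle never sees it, and store it in a separate file. The Model class and the save/load helpers are hypothetical stand-ins and not gensim code. The standard-library array module is used here in place of the NumPy saving methods mentioned in the message, purely to keep the sketch self-contained.

```python
import pickle
from array import array


class Model(object):
    """Hypothetical stand-in for a model that owns one very large array."""

    def __init__(self, name, big_matrix=None):
        self.name = name
        self.big_matrix = big_matrix

    def __getstate__(self):
        # Copy the instance dict and blank out the large field,
        # so the pickle stays small.
        state = self.__dict__.copy()
        state["big_matrix"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


def save(model, small_path, big_path):
    """Pickle the small state; write the large array as raw bytes elsewhere."""
    with open(small_path, "wb") as f:
        pickle.dump(model, f, protocol=2)
    with open(big_path, "wb") as f:
        model.big_matrix.tofile(f)


def load(small_path, big_path):
    """Unpickle the small state, then reattach the large array."""
    with open(small_path, "rb") as f:
        model = pickle.load(f)
    values = array("d")
    with open(big_path, "rb") as f:
        values.frombytes(f.read())
    model.big_matrix = values
    return model
```

Note that the in-memory object keeps its big_matrix after pickling; only the pickled copy has the field blanked.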

On 21.04.2011 at 18:17, Radim wrote:

> Matt managed to pinpoint the problem, so I'm forwarding his reply to
> clear up the issue.
> 
> Thanks a lot for investigating,
> Radim
> 
> 
> # From: Matt Goodman
> # ----------------------------------------
> # I dug in a bit more, and it is a pickle/cpickle error.
> #
> # The minimal reproducing error is something like the following:
> # pickle.dump("a"*2**(32-1), open("test.pkl","wb"))

JCL | 4 May 2011 12:32

[gensim:255] Re: LDA forcing python to close

Hi, just a quick update to this thread.

I got 64-bit Python to work with NumPy, SciPy and Gensim, and now have
no trouble at all.

Since my problem was that I couldn't get NumPy and SciPy to work with
my ordinary 64-bit Python distribution, I got hold of a free Enthought
distribution by applying for an educational version. Enthought includes
both of the above-mentioned frameworks, so it was then easy to
install Gensim.

Now everything works just fine.

Kind Regards

Jens

On Apr 20, 4:01 pm, Radim <radimrehu...@...> wrote:
> Hello Jens,
>
> the topics are printed to log, so you have to turn on logging:
>
> import logging
> logging.basicConfig(level=logging.INFO)
>
> Best,
> Radim
>
> On Apr 19, 3:26 pm, Jens Christian Lange <jens.chr.la...@...>
> wrote:

JCL | 4 May 2011 12:35

[gensim:256] Capturing logging text

Hi

Is there any way to capture the text printed via logging?

My main concern is the output of printTopic(s) when using LDA, since
I would like to do some text analysis of the words that explain each
topic.

Thanks for your help

Jens
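
One way to capture logged text, sketched with the standard logging module only: a handler that appends each message to a list instead of printing it. ListHandler and capture_logs are hypothetical names. Attaching to the root logger is an assumption made so that the exact name of the logger gensim uses does not need to be known.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in a Python list instead of printing them."""

    def __init__(self, level=logging.INFO):
        logging.Handler.__init__(self, level)
        self.messages = []

    def emit(self, record):
        # getMessage() returns the message with its arguments merged in.
        self.messages.append(record.getMessage())


def capture_logs(logger_name=None):
    """Attach a ListHandler to the named logger (root by default) and return it."""
    handler = ListHandler()
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return handler
```

After calling capture_logs() and then the function that logs the topics, handler.messages holds the logged lines as plain strings, ready for further text analysis.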

Radim | 4 May 2011 18:35

[gensim:257] Re: Pyro and Distributed LDA using gensim

Hello Shivani,

On May 2, 4:12 am, Shivani <raoshiv...@...> wrote:
> Hello,
>
> I am attempting to use Pyro with distributed LDA, but it does not seem
> to be that straightforward.
>
> >>lda = gensim.models.ldamodel.LdaModel(corpus=myCorpus, id2word=myCorpus.dictionary, numTopics=100,passes=20,distributed=True)
>
> gives
> ERROR:ldamodel:failed to initialize distributed LDA (No module named Pyro)

If you cannot even import Pyro, that means it is not installed
properly. How did you install it? My guess is that this is a Pyro
version mismatch.

Gensim requires Pyro 4.x (not the 3.x series). Gensim should install
the 4.x version automatically, if you install with `easy_install
gensim[distributed]` (gensim's setup.py asks for Pyro >= 4.1).

HTH,
Radim
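
A small sketch for checking the two things discussed above: whether a module can be imported at all, and which version it reports. describe_module is a hypothetical helper. Reading the version from a __version__ attribute is an assumption; a given Pyro release may expose its version elsewhere, in which case None is returned.

```python
import importlib


def describe_module(name):
    """Return (importable, version) for a module name.

    version is None when the module exposes no __version__ attribute
    (an assumed convention, not guaranteed for every package).
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return (False, None)
    return (True, getattr(module, "__version__", None))


if __name__ == "__main__":
    # "Pyro" is the module name taken from the error message in this thread.
    print(describe_module("Pyro"))
```

A result of (False, None) corresponds to the "No module named Pyro" error; a result of (True, ...) with an unexpected version would point to the mismatch case.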

>
> I then tried
>
> >>import Pyro.util
> >>import Pyro.core

