Jithin | 1 Oct 2011 05:19

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

Thanks a million, Uwe. That fixes it.

On Sat, Oct 1, 2011 at 4:16 AM, Uwe Schindler [via Lucene] <
ml-node+s472066n3383905h73 <at> n3.nabble.com> wrote:

> Hi,
>
> The junk is appended here: buffer.append(termAtt.buffer());
>
> I assume you are on Lucene 3.1+, so use buffer.append(termAtt); termAtt
> implements CharSequence, so it can be appended to any StringBuilder.
> The code you are using appends the whole char array, which may contain
> characters after termAtt.length().
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
> > -----Original Message-----
> > From: Jithin [mailto:[hidden email]]
> > Sent: Friday, September 30, 2011 11:12 PM
> > To: [hidden email]
> > Subject: Writing a TokenConcatenateFilter - junk characters appearing on output.
> >
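Uwe's explanation can be reproduced without Lucene. The sketch below uses a hypothetical stand-in, FakeTermAttribute (not Lucene's CharTermAttribute), that keeps a reusable char[] plus a valid length, which is the same buffer()/length() split the real attribute has. Appending the raw array copies stale characters left behind by an earlier, longer term; appending the object as a CharSequence copies only length() characters.

```java
public class TermBufferDemo {
    // Hypothetical, simplified model of a term attribute: a reusable char[]
    // plus the number of valid characters. Not the Lucene class.
    public static final class FakeTermAttribute implements CharSequence {
        private char[] buffer = new char[16];
        private int length = 0;

        // Copies a new term into the existing buffer. Characters beyond the
        // new length are left over from earlier, longer terms (or are '\0').
        public void setTerm(String term) {
            if (term.length() > buffer.length) {
                buffer = new char[term.length()];
            }
            term.getChars(0, term.length(), buffer, 0);
            length = term.length();
        }

        public char[] buffer() { return buffer; }

        @Override public int length() { return length; }

        @Override public char charAt(int index) {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return buffer[index];
        }

        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }

        @Override public String toString() { return new String(buffer, 0, length); }
    }

    // Buggy: StringBuilder.append(char[]) copies the whole backing array.
    public static String appendWholeBuffer(FakeTermAttribute termAtt) {
        return new StringBuilder().append(termAtt.buffer()).toString();
    }

    // Fixed: StringBuilder.append(CharSequence) copies only length() chars.
    public static String appendTerm(FakeTermAttribute termAtt) {
        return new StringBuilder().append(termAtt).toString();
    }

    public static void main(String[] args) {
        FakeTermAttribute termAtt = new FakeTermAttribute();
        termAtt.setTerm("concatenate");
        termAtt.setTerm("hello"); // shorter term, same buffer

        // '\0' is replaced by '.' so the junk becomes visible
        System.out.println(appendWholeBuffer(termAtt).replace('\0', '.')); // hellotenate.....
        System.out.println(appendTerm(termAtt));                           // hello
    }
}
```

Equivalently, buffer.append(termAtt.buffer(), 0, termAtt.length()) copies only the valid prefix of the array.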

Jithin | 1 Oct 2011 06:42

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

I have added this custom filter at the end of my query. Now only my first
document is getting indexed.

--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3384379.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Jithin | 1 Oct 2011 07:27

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

What I meant to say is that my analyzer chain now looks like this:

            <analyzer type="index">
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="words.txt" />
                <filter class="org.ctown.solr.analysis.CTConcatFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-_]" replacement=" " />
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^\p{L}\p{Nd}\p{Mn}\p{Mc}\s+]" replacement="" />
                <tokenizer class="solr.KeywordTokenizerFactory" />
            </analyzer>

But only my first document is getting indexed. Is there any logging I can
enable to see what is going wrong?
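To check what the index-time chain does to a field value before the custom filter sees it, the two char filters, the whitespace tokenizer and the lower-casing can be approximated in plain Java with java.util.regex, the same regex engine the pattern char filter relies on. This is a sketch under assumptions: splitting on \s+ only approximates WhitespaceTokenizer, and the stop filter and CTConcatFilter are left out because words.txt and the filter's source are not shown. AnalyzerChainSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {
    // The two PatternReplaceCharFilterFactory patterns, as Java string literals.
    static final String DASH_OR_UNDERSCORE = "[-_]";
    static final String NOT_KEPT = "[^\\p{L}\\p{Nd}\\p{Mn}\\p{Mc}\\s+]";

    // Approximates the chain up to LowerCaseFilterFactory: char filters,
    // whitespace tokenization, lower-casing. Stop words and the custom
    // concatenation filter are intentionally omitted.
    public static List<String> analyze(String text) {
        String filtered = text.replaceAll(DASH_OR_UNDERSCORE, " ")
                              .replaceAll(NOT_KEPT, "");
        List<String> tokens = new ArrayList<>();
        for (String token : filtered.split("\\s+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Foo-Bar_baz, qux!")); // [foo, bar, baz, qux]
        System.out.println(analyze("C++ rocks"));         // [c++, rocks]
    }
}
```

The second call shows a side effect of the pattern: inside square brackets, + is a literal character rather than a quantifier, so plus signs survive the second char filter. If that is not intended, dropping the + from the class keeps only letters, digits, marks and whitespace.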

--
View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-TokenConcatenateFilter-junk-characters-appearing-on-output-tp3383684p3384419.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Jithin | 1 Oct 2011 09:19

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

Figured out the issue: the finished flag needs to be reset to false once the
current stream is over.

    if (finished) {
        logger.debug("Finished");
        finished = false;
        return false;
    }

Looks like the same filter instance is being reused. Makes sense.
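Resetting the flag inside incrementToken(), as in the snippet above, works because consumers stop reading at the first false. A common alternative is to clear all per-document state in reset(), which consumers such as the indexer call before reading a (possibly reused) stream; in Lucene 3.x that means overriding reset() in the TokenFilter and calling super.reset() first. The sketch below is a self-contained model of that pattern, not Lucene code: ConcatFilter, its Iterator<String> input and the space separator are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConcatFilterDemo {
    // Hypothetical, simplified concatenating filter. It loosely follows the
    // TokenStream pattern (reset, then incrementToken until false) but reads
    // from an Iterator<String> instead of Lucene attributes.
    public static final class ConcatFilter {
        private Iterator<String> input;
        private final StringBuilder buffer = new StringBuilder();
        private boolean finished;
        private String term;

        // Prepares this (possibly reused) instance for a new stream by
        // clearing ALL per-stream state, including the finished flag.
        public void reset(Iterator<String> newInput) {
            input = newInput;
            buffer.setLength(0);
            finished = false;
            term = null;
        }

        // Emits one token joining every input token with a space (assumed
        // separator), then reports end of stream.
        public boolean incrementToken() {
            if (finished) {
                return false;
            }
            finished = true;
            while (input.hasNext()) {
                if (buffer.length() > 0) {
                    buffer.append(' ');
                }
                buffer.append(input.next());
            }
            if (buffer.length() == 0) {
                return false; // empty input: nothing to emit
            }
            term = buffer.toString();
            return true;
        }

        public String term() { return term; }
    }

    // Runs one document through a shared filter instance.
    public static List<String> analyze(ConcatFilter filter, List<String> tokens) {
        filter.reset(tokens.iterator());
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.term());
        }
        return out;
    }

    public static void main(String[] args) {
        ConcatFilter filter = new ConcatFilter();
        System.out.println(analyze(filter, Arrays.asList("new", "york"))); // [new york]
        System.out.println(analyze(filter, Arrays.asList("san", "jose"))); // [san jose]
    }
}
```

If reset() did not clear finished, the second document would produce no token at all, which matches the "only my first document is indexed" symptom.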


Shai Erera | 1 Oct 2011 15:37

Re: TaxWriter leakage?

That's weird. The line that throws the NPE seems like it ... well, cannot
throw an NPE :).

LuceneTaxonomyWriter, line 724:

    return getParentArray().getArray()[ordinal];

getParentArray() never returns null, so it cannot be from there.
ParentArray.getArray() cannot return null, as its internal array is
initialized by LuceneTaxonomyWriter.getParentArray() (see lines 712-713), so
that's really odd.
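The argument rests on the expression having only two dereferences that can throw an NPE: the result of getParentArray() and the result of getArray() (indexing a null array throws NPE). One way to confirm which one fails, if either, is to split the chain so each dereference sits on its own line. The sketch below does this on hypothetical stand-in classes; it does not reproduce LuceneTaxonomyWriter, and the synchronized lazy initialization and the parent data are assumptions.

```java
import java.util.Objects;

public class ParentLookupSketch {
    // Hypothetical stand-in for a parent array: parents[i] is the parent ordinal of i.
    public static final class ParentArraySketch {
        private final int[] parents;
        ParentArraySketch(int[] parents) { this.parents = parents; }
        public int[] getArray() { return parents; }
    }

    private ParentArraySketch parentArray;

    // Lazily builds the array, so by construction this never returns null.
    private synchronized ParentArraySketch getParentArray() {
        if (parentArray == null) {
            parentArray = new ParentArraySketch(new int[] {-1, 0, 0, 1}); // made-up data
        }
        return parentArray;
    }

    // Same as getParentArray().getArray()[ordinal], but with one dereference
    // per line so a stack trace (and the message) names the null reference.
    public int getParent(int ordinal) {
        ParentArraySketch pa = Objects.requireNonNull(getParentArray(), "parentArray");
        int[] parents = Objects.requireNonNull(pa.getArray(), "parents");
        return parents[ordinal];
    }

    public static void main(String[] args) {
        System.out.println(new ParentLookupSketch().getParent(3)); // 1
    }
}
```

An out-of-range ordinal would surface as ArrayIndexOutOfBoundsException rather than NPE, so an NPE at that line points at one of the two references.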

But perhaps we're looking in the wrong place. Can you please describe the
steps you perform, and how ThreadedIndexWriter is related?

Shai

On Thu, Sep 29, 2011 at 7:22 PM, Mihai Caraman <caraman.mihai <at> gmail.com> wrote:

> Hmm... if I leave it a couple of minutes before restarting, it doesn't log
> the proper shutdown steps, but it does restart correctly.
>
> 2011/9/29 Mihai Caraman <caraman.mihai <at> gmail.com>
>
> > There may be some leakage while using ThreadedIndexWriter...
> >
> > The app starts as a listener servlet in Tomcat 6.
> > First start, all ok.
> > First close, none of these lines appear:

Uwe Schindler | 1 Oct 2011 16:47

Re: TaxWriter leakage?

Maybe another Java 7 bug? Are you using Java 7?
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de


Andrzej Bialecki | 3 Oct 2011 11:52

[ANN] Luke 3.4.0 release

Hi,

I'm happy to announce that Luke - The Lucene Index Toolbox for Lucene 
3.4.0 is available now for download from the project page:

	http://code.google.com/p/luke

Changes in version 3.4.0 (released on 2011.10.03):
* Issues 46 and 47: Update to Lucene 3.4.0 and adapt to changed APIs.
* Rearranged "field flags" so that they are more logical and cover the
   index options added in 3.4.0. E.g. omitNorms is represented as
   "with Norms" and marked by "N"; IndexOptions are expanded to "Idfp"
   to mark indexed fields with docs, freqs and positions.

Enjoy!

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Shai Erera | 3 Oct 2011 13:45

Re: [ANN] Luke 3.4.0 release

Thanks Andrzej!

I downloaded the standalone lukeall-3.4.0.jar and ran "java -jar
lukeall-3.4.0.jar" and I get this:

java.lang.NumberFormatException: For input string: "60  "
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:59)
        at java.lang.Integer.parseInt(Integer.java:469)
        at java.lang.Integer.valueOf(Integer.java:565)
        at thinlet.Thinlet.addAttribute(Unknown Source)
        at thinlet.Thinlet.parse(Unknown Source)
        at thinlet.Thinlet.parse(Unknown Source)
        at thinlet.Thinlet.parse(Unknown Source)
        at thinlet.Thinlet.parse(Unknown Source)
        at org.getopt.luke.Luke.addComponent(Unknown Source)
        at org.getopt.luke.Luke.<init>(Unknown Source)
        at org.getopt.luke.Luke.startLuke(Unknown Source)
        at org.getopt.luke.Luke.main(Unknown Source)
Exception in thread "main" java.lang.IllegalArgumentException: unknown text string for null
        at thinlet.Thinlet.getDefinition(Unknown Source)
        at thinlet.Thinlet.setString(Unknown Source)
        at org.getopt.luke.Luke.errorMsg(Unknown Source)
        at org.getopt.luke.Luke.addComponent(Unknown Source)
        at org.getopt.luke.Luke.<init>(Unknown Source)
        at org.getopt.luke.Luke.startLuke(Unknown Source)
        at org.getopt.luke.Luke.main(Unknown Source)

I use Windows 7, tried with Java 5 & 6 (Oracle and IBM) and get the same
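The first exception can be reproduced in isolation: Integer.valueOf delegates to Integer.parseInt (as the trace shows), and parseInt rejects any surrounding whitespace, so the attribute value "60  " with its trailing spaces fails. Where that value comes from inside Luke's UI definition is not visible in the trace; the trim() below only demonstrates the parsing behaviour and is not necessarily how Luke was fixed. ParseIntDemo is a hypothetical name.

```java
public class ParseIntDemo {
    // Returns true if Integer.valueOf accepts the string exactly as given.
    public static boolean parses(String s) {
        try {
            Integer.valueOf(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String attr = "60  "; // the value from the stack trace, trailing spaces included
        System.out.println(parses(attr));        // false
        System.out.println(parses(attr.trim())); // true
    }
}
```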

Erick Erickson | 3 Oct 2011 14:45

Re: [ANN] Luke 3.4.0 release

Same thing happened to me on a Mac, Java 1.6

FWIW
Erick


Mihai Caraman | 3 Oct 2011 16:09

Re: [ANN] Luke 3.4.0 release

Same on Win7 and Ubuntu 11.


