David Worms | 1 Feb 01:52 2003

Re: [LARM] next steps


On Friday, January 31, 2003, at 03:48  PM, Clemens Marschner wrote:
>
> Great, so how should we go on?
>
> I suggest we wait for you, David, so that you can make the code a little
> more stable and change the things you mentioned. You said something about
> two weeks (?)

Two weeks is the time I need to become more familiar with the crawler, 
set up some config, try Merlin, and take a deeper look at the Excalibur 
event package. At that point, I could send similar but cleaner code.

> I would say we should then be at a point where we could get rid of
> de.lanlab.* packages and move the rest to something like org.apache.larm
> and then put it into the sandbox.

or maybe incubator.apache.org

> We should also check if performance is a problem, especially with those
> factory methods.

We could easily avoid the message factories and use regular constructors.

> Within this time we should also review the docs and adapt the LARM-speak
> (MessageHandler? MessageListener? MessageProcessor? Stage? Storage?)

Tatu Saloranta | 1 Feb 21:47 2003

Re: Escaping bug \( and ? or *

On Friday 31 January 2003 13:27, Lukas Zapletal wrote:
> Hello all,
>
> Let's have an indexed text "Test (1) and test (2)".
>
> Now search for: \(1\)
>
> Everything OK, so let's search for: \(?\)
>
> Nothing found! It's the same with \" and maybe other escaped characters.
>
> Is this a bug? Is it already fixed in CVS? If not, how can we fix it?

I think the problem is that the analyzer you used for indexing strips out 
parentheses. So, the text actually indexed would look something like:
"test 1 test 2" (assuming 'and' is a stop word and is removed). Thus there's
no token matching the term "(1)" or "(2)".
The same goes for most other punctuation characters; they are routinely
stripped by the analyzer, as they usually are not very useful for searching.

To make it work the way you want, you need to modify the analyzer to 
include parentheses, perhaps so that they are kept only if
they contain just a single alphanumeric token (otherwise
"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
not what you want).
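
As a rough illustration of the stripping described above, here is a toy
tokenizer in plain Java. It is not Lucene's analyzer: the class name
ToyAnalyzer, the stop-word list, and the splitting rule are all assumptions
made up for this sketch. It only shows how parentheses can vanish before
anything reaches the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for an analyzer; NOT Lucene's implementation.
class ToyAnalyzer {
    // Hypothetical stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("and", "or", "the", "a"));

    // Lowercases the text, splits on anything that is not a letter or digit,
    // and drops stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty() && !STOP_WORDS.contains(piece)) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [test, 1, test, 2]: no token contains a parenthesis.
        System.out.println(tokenize("Test (1) and test (2)"));
    }
}
```

With tokens like these, a term such as "(1)" has nothing to match unless
the query side goes through the same stripping.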

-+ Tatu +-

Lukas Zapletal | 2 Feb 17:09 2003

Re: Escaping bug \( and ? or *

Lukas Zapletal wrote:

> Hello all,
>
> Let's have an indexed text "Test (1) and test (2)".
>
> Now search for: \(1\)
>
> Everything OK, so let's search for: \(?\)
>
> Nothing found! It's the same with \" and maybe other escaped characters.
>
> Is this a bug? Is it already fixed in CVS? If not, how can we fix it?
>
> Thanks for help! Lucene rocks.
>
> ps - has anybody compiled Lucene with GCJ? If so, were there any results
> in performance?
>
attaching JUnit test....

-- 
Lukas Zapletal      [lzap <at> root.cz]
http://www.tanecni-olomouc.cz/lzap

Attachment (juEscapeBug.java): text/x-java, 2279 bytes

bugzilla | 2 Feb 17:12 2003

DO NOT REPLY [Bug 16677] New: - Escape bug

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16677>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16677

Escape bug

           Summary: Escape bug
           Product: Lucene
           Version: 1.2
          Platform: All
        OS/Version: Windows XP
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: QueryParser
        AssignedTo: lucene-dev <at> jakarta.apache.org
        ReportedBy: lzap <at> seznam.cz

package cz.finesoft.socd;

import junit.framework.*;
import org.apache.lucene.index.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;

none none | 2 Feb 17:14 2003

Re: Escaping bug \( and ? or *

I am not sure, but could it be because you use two different analyzers for indexing and searching?

--

On Sun, 02 Feb 2003 17:09:21  
 Lukas Zapletal wrote:
>Lukas Zapletal wrote:
>
>> Hello all,
>>
>> Let's have an indexed text "Test (1) and test (2)".
>>
>> Now search for: \(1\)
>>
>> Everything OK, so let's search for: \(?\)
>>
>> Nothing found! It's the same with \" and maybe other escaped characters.
>>
>> Is this a bug? Is it already fixed in CVS? If not, how can we fix it?
>>
>> Thanks for help! Lucene rocks.
>>
>> ps - has anybody compiled Lucene with GCJ? If so, were there any results
>> in performance?
>>
>attaching JUnit test....
>
>-- 
>Lukas Zapletal      [lzap <at> root.cz]

Lukas Zapletal | 2 Feb 17:28 2003

Re: Escaping bug \( and ? or *

Tatu Saloranta wrote:

>To make it work the way you want, you need to modify the analyzer to 
>include parentheses, perhaps so that they are kept only if
>they contain just a single alphanumeric token (otherwise
>"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
>not what you want).
>
AAAH, and I was creating the JUnit test...

Could somebody please reject the record in Bugzilla? I'm sorry.

Thanks Tatu! Have a nice day.

--
Respect to all the space pilots whose lives ended on board Columbia.

-- 
Lukas Zapletal      [lzap <at> root.cz]
http://www.tanecni-olomouc.cz/lzap

Lukas Zapletal | 2 Feb 17:44 2003

Re: Escaping bug \( and ? or *

Tatu Saloranta wrote:

>I think the problem is that the analyzer you used for indexing strips out 
>parentheses. So, the text actually indexed would look something like:
>"test 1 test 2" (assuming 'and' is a stop word and is removed). Thus there's
>no token matching the term "(1)" or "(2)".
>The same goes for most other punctuation characters; they are routinely
>stripped by the analyzer, as they usually are not very useful for searching.
>
>To make it work the way you want, you need to modify the analyzer to 
>include parentheses, perhaps so that they are kept only if
>they contain just a single alphanumeric token (otherwise
>"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
>not what you want).
>
Well, I don't think this is true.

I use this analyzer for queries as well, so the parentheses and other 
punctuation are also stripped when I make a query.

This MAY be a bug. PLEASE TEST THE CODE.

-- 
Lukas Zapletal      [lzap <at> root.cz]
http://www.tanecni-olomouc.cz/lzap

Ralph Schaer | 2 Feb 09:37 2003

NOT Bug in 1.3

Hi
I've found an error in 1.3: NOT is not working correctly. Here's the 
program. With the latest nightly build the Searcher finds the document, but
it should not. With version 1.2 the program works correctly.
Regards
Ralph

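      import org.apache.lucene.analysis.SimpleAnalyzer;
      import org.apache.lucene.document.*;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.queryParser.QueryParser;
      import org.apache.lucene.search.*;

      public class NotTest {
        public static void main(String[] args) throws Exception {
          // Index one document whose "txt" field contains both "one" and "two".
          IndexWriter writer = new IndexWriter("c:\\temp\\ix", new SimpleAnalyzer(), true);
          Document doc = new Document();
          doc.add(Field.Text("txt", "one"));
          doc.add(Field.Text("txt", "two"));
          doc.add(Field.UnIndexed("id", "1"));
          writer.addDocument(doc);
          writer.optimize();
          writer.close();
          // "one NOT two" should exclude this document, so nothing should be printed.
          Searcher searcher = new IndexSearcher("c:\\temp\\ix");
          Query query = QueryParser.parse("one NOT two", "txt", new SimpleAnalyzer());
          Hits hits = searcher.search(query);
          for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("id"));
          }
          searcher.close();
        }
      }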
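
To spell out the behaviour expected from "one NOT two", here is a small
sketch in plain Java. It is not Lucene code: the class NotSemanticsSketch
and its matches method are hypothetical names used only to model one
required and one prohibited term over a document's set of terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of a required term plus a prohibited term; NOT Lucene's query classes.
class NotSemanticsSketch {
    // A document matches when it contains the required term
    // and does not contain the prohibited term.
    static boolean matches(Set<String> docTerms, String required, String prohibited) {
        return docTerms.contains(required) && !docTerms.contains(prohibited);
    }

    public static void main(String[] args) {
        // The document from the program above: field "txt" holds "one" and "two".
        Set<String> doc = new HashSet<>(Arrays.asList("one", "two"));
        // Prints false: the document contains the prohibited term.
        System.out.println(matches(doc, "one", "two"));
    }
}
```

Under this reading the document must not be returned, which is what
version 1.2 does.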

bugzilla | 2 Feb 19:54 2003

DO NOT REPLY [Bug 16677] - Escape bug

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16677

Escape bug

otis <at> apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Additional Comments From otis <at> apache.org  2003-02-02 18:54 -------
This is not a bug.  The provided unit test class uses 2 different Analyzers for
indexing and searching.
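
A sketch in plain Java of how such a mismatch can lose a match. It makes no
claim about which analyzers the attached test actually used: the two
"analyzers" below (whitespaceTerms and alnumTerms) and the class
AnalyzerMismatchDemo are hypothetical stand-ins, and the index is modelled
as a plain set of terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of an index as a set of terms; NOT Lucene code.
class AnalyzerMismatchDemo {
    // Hypothetical analyzer A: lowercases and splits on whitespace,
    // so punctuation stays attached to the terms.
    static Set<String> whitespaceTerms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    // Hypothetical analyzer B: lowercases and splits on anything
    // that is not a letter or digit, so punctuation disappears.
    static Set<String> alnumTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String piece : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // A query matches when every query term is present among the indexed terms.
    static boolean matches(Set<String> indexed, Set<String> queryTerms) {
        return !queryTerms.isEmpty() && indexed.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> indexed = alnumTerms("Test (1) and test (2)");
        // Same analyzer on both sides: prints true.
        System.out.println(matches(indexed, alnumTerms("(1)")));
        // Different analyzer on the query side: prints false.
        System.out.println(matches(indexed, whitespaceTerms("(1)")));
    }
}
```

Using one analyzer for both indexing and searching keeps the two term sets
comparable.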

bugzilla | 3 Feb 17:31 2003

DO NOT REPLY [Bug 16719] New: - java.io.IOException: Pipe closed

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16719

java.io.IOException: Pipe closed

           Summary: java.io.IOException: Pipe closed
           Product: Lucene
           Version: 1.2
          Platform: PC
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Analysis
        AssignedTo: lucene-dev <at> jakarta.apache.org
        ReportedBy: rballing <at> gmx.at

When indexing some 550 HTML files, the following exception is raised repeatedly 
after file 500, but the index seems to build OK.

java.io.IOException: Pipe closed
	at java.io.PipedReader.receive(Unknown Source)
	at java.io.PipedReader.receive(Unknown Source)
	at java.io.PipedWriter.write(Unknown Source)
	at java.io.Writer.write(Unknown Source)