Chris Hostetter | 1 Apr 2007 01:26

Re: Lucene nightly build failure


: I imagine there are user accounts/passwords available.  I volunteer
: to be his backup, although I hope the only reason I would be needed
: for it is if he decides he doesn't want to maintain it anymore and
: not the bus scenario.

Sorry, i didn't mean to be negative -- it's just the classic
tech documentation/support scenario that i've come to appreciate is even
more important with OSS projects than it is in a centralized company.

The old nightly builds were built and managed in a way that was clearly
documented, and although not every committer has an account on
lucene.zones.apache.org to deal with any changes/problems -- several do.
I just want to make sure we have the same level of coverage moving forward
to a new system.

-Hoss
Grant Ingersoll | 1 Apr 2007 01:40

Re: Lucene nightly build failure


On Mar 31, 2007, at 7:26 PM, Chris Hostetter wrote:

>
> : I imagine there are user accounts/passwords available.  I volunteer
> : to be his backup, although I hope the only reason I would be needed
> : for it is if he decides he doesn't want to maintain it anymore and
> : not the bus scenario.
>
> Sorry, i didn't mean to be negative -- it's just the classic
> tech documentation/support scenario that i've come to appreciate is
> even more important with OSS projects than it is in a centralized
> company.
>
> The old nightly builds were built and managed in a way that was clearly
> documented, and although not every committer has an account on
> lucene.zones.apache.org to deal with any changes/problems -- several
> do. I just want to make sure we have the same level of coverage moving
> forward to a new system.
>

Yep, I agree.  I am working on documenting it and communicating with  
Nigel on getting access.  I would propose that the same people who  
have zones access should have Hudson access.  One of the benefits to  
Hudson is that we don't have to run under a crontab under a specific  
account, which makes it easier for several people to maintain.

I will update the Wiki when I get more info.

Thang Luong Minh | 1 Apr 2007 08:27

Issues to watch out when making changes to Lucene's gap encoding scheme

Dear Lucene developers,

As part of my school project, I've proposed to implement a gap encoding
scheme and use it to replace the current scheme in Lucene, which is the
byte-aligned scheme. The main purpose of changing the Lucene source is
the learning experience: dealing with open source code, conforming
to the current system, and handling other design issues.

The gap encoding I'm implementing is the Fixed Binary Coding scheme, in which
every gap in a group is coded using the same number of bits. That scheme is
described in the paper here http://crpit.com/confpapers/CRPITV27Anh.pdf. The
advantage of the scheme is extremely fast decoding speed. In order to
index efficiently, we need to decompose a list of gaps into groups so that
the total number of bits used is minimized.
Ex: a list of gaps is broken into 4 groups: (38, 17, 13, 34)  (6)  (4, 1, 3, 1)
(2, 3, 1)
A list of 4 selectors is 9, 1, 6, 9, each of which corresponds to one group.
By looking at a selector we know how many bits are used to code each gap
in the group associated with that selector, and how many gaps are in that group.
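
As a rough illustration of the idea of coding every gap in a group with the
same number of bits, here is a minimal sketch in plain Java. It is not the
paper's algorithm and not Lucene code: the class and method names are
hypothetical, the grouping is taken as given rather than optimized, and the
selector table from the paper is not reproduced (the width and count are
simply passed around as plain integers).

```java
import java.util.Arrays;

/** Sketch of fixed-width coding of gap groups; all names here are hypothetical. */
final class FixedBinaryGroups {

    /** Number of bits needed to represent a positive gap, e.g. 38 needs 6 bits. */
    static int bitsFor(int gap) {
        return 32 - Integer.numberOfLeadingZeros(Math.max(gap, 1));
    }

    /** Width of a group: the bits needed by its largest gap. */
    static int widthOf(int[] group) {
        int width = 1;
        for (int gap : group) {
            width = Math.max(width, bitsFor(gap));
        }
        return width;
    }

    /** Packs every gap of the group into exactly `width` bits, most significant bit first. */
    static byte[] encode(int[] group, int width) {
        byte[] out = new byte[(group.length * width + 7) / 8];
        int bitPos = 0;
        for (int gap : group) {
            for (int b = width - 1; b >= 0; b--, bitPos++) {
                if (((gap >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads `count` gaps of `width` bits each; the decoder must be told both values. */
    static int[] decode(byte[] in, int count, int width) {
        int[] gaps = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int b = 0; b < width; b++, bitPos++) {
                int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                value = (value << 1) | bit;
            }
            gaps[i] = value;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // The grouping from the example above, taken as given.
        int[][] groups = { {38, 17, 13, 34}, {6}, {4, 1, 3, 1}, {2, 3, 1} };
        int totalBits = 0;
        for (int[] group : groups) {
            int width = widthOf(group);
            int[] roundTrip = decode(encode(group, width), group.length, width);
            totalBits += width * group.length;
            System.out.println(Arrays.toString(group) + ": " + width
                    + " bits per gap, round trip ok = " + Arrays.equals(group, roundTrip));
        }
        System.out.println("total payload bits (selectors not counted): " + totalBits);
    }
}
```

For the four example groups the widths come out as 6, 3, 3 and 2 bits, which
is 45 payload bits for 12 gaps before the selectors are counted. Written as
VInts, each of these gaps would take one byte. The decoder needs the width
and the gap count of each group, which is the information the selectors carry.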

Having described the new coding scheme, below are the changes I have made to
the Lucene source code:
+ I mostly touched the classes that deal with proximity streams (.prx). In
DocumentWriter.writePostings() and SegmentMerger.appendPostings(), instead
of calling writeVInt for each gap, I go through the whole gap list, decompose
it into groups, encode the whole list as described in the scheme, and
writeByte() the result to the .prx stream.
+ Handle decoding gap by gap in SegmentTermPositions.
+ By the nature of the scheme, I need to know the selector list in order to
decode the gaps, so I need another sel stream to store the selectors. As gaps

Mohsen Saboorian | 1 Apr 2007 08:31

Hits.hitDoc(int)


Hi,

Since Hits.hitDoc(int) is private, the only way to access the nth document in a
returned Hits object is to iterate over it n times. Why don't you make it
public?

Thanks.
-- 
View this message in context: http://www.nabble.com/Hits.hitDoc%28int%29-tf3500174.html#a9775149
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Chris Hostetter | 1 Apr 2007 09:55

Re: Hits.hitDoc(int)


1) if you have questions about using the java lucene library, please send
them to the java-user <at> lucene mailing list ... java-dev is for discussing
the development of the library.

2) ...

: Since Hits.hitDoc(int) is private, the only way to access the nth document in a
: returned Hits object is to iterate over it n times. Why don't you make it
: public?

i don't understand why you think you need to iterate to get the nth match
... just call hits.doc(n)

-Hoss
Mohsen Saboorian | 1 Apr 2007 12:19

Re: Hits.hitDoc(int)


I was just *incorrectly* thinking that this is an API problem, and that's why
I sent my mail to this list. Anyway, sorry for the disturbance.

Chris Hostetter wrote:
> 
> 
> 1) if you have questions about using the java lucene library, please send
> them to the java-user <at> lucene mailing list ... java-dev is for discussing
> the development of the library.
> 
> 2) ...
> 
> : Since Hits.hitDoc(int) is private, the only way to access the nth document
> : in a returned Hits object is to iterate over it n times. Why don't you
> : make it public?
> 
> i don't understand why you think you need to iterate to get the nth match
> ... just call hits.doc(n)
> 
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe <at> lucene.apache.org
> For additional commands, e-mail: java-dev-help <at> lucene.apache.org
> 

Chris Hostetter | 1 Apr 2007 20:51

Re: Hits.hitDoc(int)


: I was just *incorrectly* thinking that this is an API problem, and that's why

well, even if you think there is a problem with the API, the best approach
is to start by emailing a question to java-user to make sure you understand
the correct usage of the API, and then once you are sure that you do,
follow up with a thread on java-dev about changing the API.

: I sent my mail to this list. Anyway, sorry for the disturbance.

there's no reason to be sorry - it's just a question of audience; we
want to make sure people get good answers to their questions, and you are
more likely to get helpful answers from the (larger) java-user audience.

-Hoss
Nicolas Lalevée (JIRA) | 1 Apr 2007 21:26

[jira] Updated: (LUCENE-662) Extendable writer and reader of field data


     [ https://issues.apache.org/jira/browse/LUCENE-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Lalevée updated LUCENE-662:
-----------------------------------

    Attachment: indexFormat.patch
                indexFormat-only.patch

Synchronized with the trunk, and therefore with the payload feature. This allowed me to consolidate the
payload writing, which is in two places today, into one class: it is now in the DefaultPostingWriter class.

Since my last update, nothing on the TODO list has been fixed. Furthermore, there is a regression
in the new patch: ensureOpen() is not correctly handled for lazy loaded fields, and one test fails. This is
because the FieldsReader no longer handles it in my patch. As the data structure can be
customized, lazy loading is delegated to the FieldData created by the FieldsReader, so the two instances
have to communicate about the closing of the streams. That is a new item for the TODO list.

As discussed in java-dev, here is a light patch with only the index format handling, without the
possibility to redefine how data and postings are stored/retrieved.

> Extendable writer and reader of field data
> ------------------------------------------
>
>                 Key: LUCENE-662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store

deva_java | 2 Apr 2007 10:58

Invalid Sort results when column contains link names of like "*.html,*.xls,*.doc etc."


Hi,

In my application, I want to sort the index on different columns in
ascending or descending order, but the results are not sorted correctly.

For example:

Sorting works well when the column contains a single word or phrase,

but when the phrase contains link names such as .html, .doc or .xls,

the sort is based on the extension, so I get invalid sorting results,
such as the "A" series ending up in the middle.

How can I solve this? Please help me.

Regards
devanadan

-- 
View this message in context: http://www.nabble.com/Invalid-Sort-results-when-column-contains-link-names-of-like-%22*.html%2C*.xls%2C*.doc-etc.%22-tf3504432.html#a9787099
Steven Rowe | 2 Apr 2007 16:24

Re: Invalid Sort results when column contains link names of like "*.html,*.xls,*.doc etc."

Hi devanadan,

deva_java wrote:
> In my application, I want to sort the index on different columns in
> ascending or descending order, but the results are not sorted correctly.
> 
> For example:
> 
> Sorting works well when the column contains a single word or phrase,
> but when the phrase contains link names such as .html, .doc or .xls,
> the sort is based on the extension, so I get invalid sorting results,
> such as the "A" series ending up in the middle.

This question, one of Lucene usage rather than of development, really
belongs on the java-user list instead of java-dev.

I'm guessing that the sort field you're using is tokenized - if so, this
is a problem.
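
To illustrate why a tokenized sort field could produce the kind of ordering
described, here is a minimal sketch in plain Java. It does not use Lucene and
does not reproduce Lucene's internals: the tokenizer is a hypothetical
stand-in for an analyzer, the sample values are invented, and picking the last
token as the effective sort key is an assumption made only to show the effect.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

/** Sketch only: compares sorting on whole values with sorting on one surviving token. */
final class TokenizedSortSketch {

    /** Hypothetical stand-in for an analyzer: lower-cases and splits on non-alphanumerics. */
    static String[] tokenize(String value) {
        return value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Assumed effective sort key when only one token of the field is used (here: the last). */
    static String lastToken(String value) {
        String[] tokens = tokenize(value);
        return tokens.length == 0 ? "" : tokens[tokens.length - 1];
    }

    /** What an untokenized field gives: the whole value is the single sort term. */
    static String[] sortWhole(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** What the assumed tokenized case gives: ordering driven by the extension token. */
    static String[] sortByLastToken(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy, Comparator.comparing(TokenizedSortSketch::lastToken));
        return copy;
    }

    public static void main(String[] args) {
        // Invented sample values resembling link names.
        String[] names = { "B report.xls", "A series.html", "C notes.doc" };
        System.out.println("whole value:   " + Arrays.toString(sortWhole(names)));
        System.out.println("by last token: " + Arrays.toString(sortByLastToken(names)));
    }
}
```

With these sample values, sorting on the whole value puts "A series.html"
first, while sorting on the last token (doc, html, xls) puts it in the middle.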

From the org.apache.lucene.search.Sort API documentation[1]:

    The fields used to determine sort order must be carefully chosen.
    Documents must contain a single term in such a field .... The field
    must be indexed, but should not be tokenized, and does not need to
    be stored (unless you happen to want it back with the rest of your
    document data). In other words:

      document.add(new Field("byNumber", Integer.toString(x),
                   Field.Store.NO, Field.Index.UN_TOKENIZED));


