Michael Busch (JIRA) | 1 Nov 2007 02:21

[jira] Updated: (LUCENE-935) Improve maven artifacts


     [
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-935:
---------------------------------

    Attachment: lucene-935-new.patch

This small patch adds the property "m2.repository.url", which defaults to
"file://${maven.dist.dir}". You can easily set a different value:
  ant -Dm2.repository.url="file://C:\temp\maven" 

I'm going to commit this now. If our build machine cannot deploy to
the snapshot repository using the local path
"file:///www/people.apache.org/maven-snapshot-repository", please let me
know; we would then have to register a provider for an appropriate transfer
protocol.

> Improve maven artifacts
> -----------------------
>
>                 Key: LUCENE-935
>                 URL: https://issues.apache.org/jira/browse/LUCENE-935
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Build
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor

Michael Busch (JIRA) | 1 Nov 2007 02:36

[jira] Updated: (LUCENE-935) Improve maven artifacts


     [
https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-935:
---------------------------------

    Fix Version/s:     (was: 2.3)

I just committed the latest patch.

I'm leaving this open in case more work needs to be done in order to deploy to the snapshot repository.
I'm clearing the fix version though.

> Improve maven artifacts
> -----------------------
>
>                 Key: LUCENE-935
>                 URL: https://issues.apache.org/jira/browse/LUCENE-935
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Build
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>         Attachments: lucene-935-new.patch, lucene-935-rename-poms.patch, lucene-935.patch
>
>
> There are a couple of things we can improve for the next release:
> - "*pom.xml" files should be renamed to "*pom.xml.template"

Grant Ingersoll | 1 Nov 2007 02:48

Re: [jira] Updated: (LUCENE-935) Improve maven artifacts

Funny, I was just thinking of looking into setting up snapshot  
publishing.  Let me know what needs to be done to hook it up into the  
build process and I can take care of that.

Right now, Zones uses scp to copy the website to p.a.o, but we should
be able to set up scp for M2 using the distribution management tag,
right?  Nightly builds are currently hosted on zones.

-Grant

On Oct 31, 2007, at 9:21 PM, Michael Busch (JIRA) wrote:

>
>     [
> https://issues.apache.org/jira/browse/LUCENE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Michael Busch updated LUCENE-935:
> ---------------------------------
>
>    Attachment: lucene-935-new.patch
>
> This small patch adds the property "m2.repository.url", that  
> defaults to
> "file://${maven.dist.dir}". You easily can set a different value:
>  ant -Dm2.repository.url="file://C:\temp\maven"
>
> I'm going to commit this now. In case our build machine cannot  
> deploy to
> the snapshot repository using the local path

Hoss Man (JIRA) | 1 Nov 2007 03:25

[jira] Commented: (LUCENE-1040) Can't quickly create StopFilter


    [
https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539273
] 

Hoss Man commented on LUCENE-1040:
----------------------------------

If the StopFilter constructor that takes in a Set no longer needs the "boolean ignoreCase", we should
probably: deprecate that constructor, document that ignoreCase is ignored and the Set must only contain
lowercase items, and add a new Set-based constructor without that param.
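
A minimal sketch of the three steps suggested above, written against a hypothetical stand-in class rather than the real StopFilter. The class name `StopWordMatcher`, the `isStopWord` method, and the choice to lowercase terms at lookup time are all assumptions made for illustration; only the constructor shapes come from the comment.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for StopFilter; only the constructor shapes are illustrated. */
class StopWordMatcher {
    private final Set<String> stopWords;

    /**
     * Old Set-based constructor, kept for compatibility.
     *
     * @param ignoreCase ignored; the set must contain only lowercase entries
     * @deprecated use {@link #StopWordMatcher(Set)} instead
     */
    @Deprecated
    StopWordMatcher(Set<String> stopWords, boolean ignoreCase) {
        this(stopWords); // the flag is deliberately not consulted
    }

    /** New Set-based constructor without the ignoreCase parameter. */
    StopWordMatcher(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Assumption: terms are lowercased before lookup, so the set must be lowercase. */
    boolean isStopWord(String term) {
        return stopWords.contains(term.toLowerCase(Locale.ROOT));
    }
}
```

With this shape, existing callers of the two-argument constructor keep compiling (with a deprecation warning), while new callers pass a pre-built lowercase Set directly.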

> Can't quickly create StopFilter
> -------------------------------
>
>                 Key: LUCENE-1040
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1040
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>         Attachments: CharArraySet.patch
>
>
> Due to the use of CharArraySet by StopFilter, one can no longer efficiently pre-create a Set for use by future StopFilter instances.

--

-- 
This message is automatically generated by JIRA.
-

Hoss Man (JIRA) | 1 Nov 2007 03:32

[jira] Resolved: (LUCENE-1041) This document has errors that must be fixed beforeUsing HTMLDocument class . Gives the following error This document has errors that must be fixed before using HTML Tidy to generate a tidied up version.


     [
https://issues.apache.org/jira/browse/LUCENE-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved LUCENE-1041.
------------------------------

    Resolution: Invalid

> This document has errors that must be fixed beforeUsing HTMLDocument class . Gives the following error This document has errors that must be fixed before using HTML Tidy to generate a tidied up version.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1041
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>         Environment: Solaris 10. http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/ant/src/java/org/apache/lucene/ant/
>            Reporter: DURGA DEEP
>            Priority: Minor
>
> Writing e-mail parser, and we are impeded by this error.
> {noformat}
>                     HtmlDocument hd = new HtmlDocument (p.getInputStream());
>                     doc.add( new Field ( "contents", new StringReader(hd.getBody())) );
> HTMLDocument: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html?org/apache/lucene/ant/HtmlDocument.html
> line 29 column 27 - Error: <st1:place> is not recognized!
> line 29 column 47 - Error: <st1:country-region> is not recognized!
> line 36 column 21 - Error: <o:p> is not recognized!

Michael Busch | 1 Nov 2007 04:47

Re: [jira] Updated: (LUCENE-935) Improve maven artifacts

Hi Grant,

do you think you could just run the ant target
"generate-maven-artifacts", which will deploy to the local dist dir, and
then have the build script copy it using scp?

We could use the maven-ant-tasks for that, but the "wagon-ssh" provider,
which is needed for scp, seems to be in beta state, which makes me
nervous. Also, using scp in the build script seems easier to implement.

Would that work?

-Michael

Grant Ingersoll wrote:
> Funny, I was just thinking of looking into setting up snapshot
> publishing.  Let me know what needs to be done to hook it up into the
> build process and I can take care of that.
> 
> Right now, Zones uses scp to copy the website to p.a.o, but we should be
> able to setup scp for M2 using the distribution management tag, right? 
> Nightly builds are currently hosted on zones.
> 
> -Grant
> 
> On Oct 31, 2007, at 9:21 PM, Michael Busch (JIRA) wrote:
> 
>>
>>     [
>>

robert engels | 1 Nov 2007 05:28

possible segment merge improvement?

Currently, when merging segments, every document is parsed and then
rewritten since the field numbers may differ between the segments  
(compressed data is not uncompressed in the latest versions).

It would seem that in many (if not most) uses of Lucene, the fields
stored within each document in an index are relatively static,
changing, if at all, for all documents added after some point X.

Why not check the fields dictionary for the segments being merged,  
and if the same, just copy the binary data directly?

In the common case this should be a vast improvement.

Anyone worked on anything like this? Am I missing something?

Robert Engels

jian chen | 1 Nov 2007 06:30

Re: possible segment merge improvement?

Hi, Robert,

That's a brilliant idea! Thanks so much for suggesting that.

Cheers,

Jian

On 10/31/07, robert engels <rengels <at> ix.netcom.com> wrote:
>
> Currently, when merging segments, every document is parsed and then
> rewritten since the field numbers may differ between the segments
> (compressed data is not uncompressed in the latest versions).
>
> It would seem that in many (if not most) Lucene uses the fields
> stored within each document with an index are relatively static,
> probably changing for all documents added after point X, if at all.
>
> Why not check the fields dictionary for the segments being merged,
> and if the same, just copy the binary data directly?
>
> In the common case this should be a vast improvement.
>
> Anyone worked on anything like this? Am I missing something?
>
> Robert Engels
>
>
>
> ---------------------------------------------------------------------

robert engels | 1 Nov 2007 07:06

Re: possible segment merge improvement?

It seems that the following are needed:

FieldInfos.hashCode(); // to allow for fast equals failure
FieldInfos.equals();

for most efficient buffer reuse during merge to avoid GC, add

int FieldsReader.doclength(int doc);
int FieldsReader.binarydoc(int doc,byte[] buffer);

this will allow the caller to reuse the existing buffer if large  
enough, or create a new one

and

FieldsWriter.addBinaryDocument(byte[] buffer,int len);

All of the above methods are trivial.
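
As a rough illustration of the first two methods, here is a sketch of equals() and hashCode() on a simplified field table. `FieldTable` and `FieldEntry` are hypothetical classes, not Lucene's FieldInfos; the assumption is that two segments are compatible for a raw copy only when they map the same field numbers to the same field definitions. The hash is cached so that a mismatch can be rejected cheaply before the full comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified field definition (name plus one flag). */
final class FieldEntry {
    final String name;
    final boolean indexed;

    FieldEntry(String name, boolean indexed) {
        this.name = name;
        this.indexed = indexed;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldEntry)) return false;
        FieldEntry other = (FieldEntry) o;
        return name.equals(other.name) && indexed == other.indexed;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, indexed);
    }
}

/** Hypothetical stand-in for FieldInfos: list position is the field number. */
final class FieldTable {
    private final List<FieldEntry> byNumber = new ArrayList<>();
    private int cachedHash;
    private boolean hashValid;

    /** Adds a field and returns its number. */
    int add(String name, boolean indexed) {
        byNumber.add(new FieldEntry(name, indexed));
        hashValid = false; // the table changed, so the cached hash is stale
        return byNumber.size() - 1;
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = byNumber.hashCode();
            hashValid = true;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FieldTable)) return false;
        FieldTable other = (FieldTable) o;
        // Fast failure: different hashes can never be equal tables.
        if (hashCode() != other.hashCode()) return false;
        return byNumber.equals(other.byNumber);
    }
}
```

Because the list is ordered, two tables with the same fields in a different numbering compare as unequal, which is the case where documents would still need to be rewritten.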

SegmentMerger just needs to be changed to compare the readers to be  
merged, and if all have equal FieldInfos, then use a short circuit  
copy similar to

byte[] buffer = new byte[1024];

for each reader {
    for doc in reader {
        if doc not deleted {
            int len = reader.doclength(doc);
            if (len > buffer.length) {

robert engels | 1 Nov 2007 07:30

Re: possible segment merge improvement?

Actually, slightly better signatures would use method overloading:

int FieldsReader.length(int doc); // length of document in bytes
int FieldsReader.doc(int doc, byte[] buffer); // read a formatted document into a buffer

void FieldsWriter.addDocument(byte[] buffer, int len); // write an already formatted document from a buffer
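
To make the intended use of these three signatures concrete, here is a self-contained sketch of a buffer-reusing copy loop. The interfaces `RawDocReader` and `RawDocWriter`, the `RawMerge` helper, the `maxDoc`/`isDeleted` methods, and the in-memory reader are all hypothetical and exist only for illustration; they are not Lucene's FieldsReader or FieldsWriter. Growing the buffer to exactly the required length is also an assumption.

```java
import java.util.List;

/** Hypothetical reader exposing documents as already formatted bytes. */
interface RawDocReader {
    int maxDoc();                      // number of document slots
    boolean isDeleted(int doc);
    int length(int doc);               // length of document in bytes
    int doc(int doc, byte[] buffer);   // fills buffer, returns bytes read
}

/** Hypothetical writer accepting already formatted bytes. */
interface RawDocWriter {
    void addDocument(byte[] buffer, int len);
}

final class RawMerge {
    /** Copies all live documents and returns how many were copied. */
    static int copyAll(List<? extends RawDocReader> readers, RawDocWriter writer) {
        byte[] buffer = new byte[1024]; // reused across documents to limit garbage
        int copied = 0;
        for (RawDocReader reader : readers) {
            for (int doc = 0; doc < reader.maxDoc(); doc++) {
                if (reader.isDeleted(doc)) continue;
                int len = reader.length(doc);
                if (len > buffer.length) {
                    buffer = new byte[len]; // grow only when a document does not fit
                }
                int read = reader.doc(doc, buffer);
                writer.addDocument(buffer, read);
                copied++;
            }
        }
        return copied;
    }
}

/** In-memory reader for demonstration; a null entry marks a deleted document. */
final class InMemoryReader implements RawDocReader {
    private final byte[][] docs;

    InMemoryReader(byte[]... docs) {
        this.docs = docs;
    }

    public int maxDoc() { return docs.length; }

    public boolean isDeleted(int doc) { return docs[doc] == null; }

    public int length(int doc) { return docs[doc].length; }

    public int doc(int doc, byte[] buffer) {
        System.arraycopy(docs[doc], 0, buffer, 0, docs[doc].length);
        return docs[doc].length;
    }
}
```

Since the buffer is reused, a writer in this sketch must consume or copy the first `len` bytes before returning. A caller would run this loop only after confirming that all readers share equal field definitions, and fall back to the normal per-field merge otherwise.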

On Nov 1, 2007, at 1:06 AM, robert engels wrote:

> It seems that the following are needed:
>
> FieldInfos.hashCode(); // to allow for fast equals failure
> FieldInfos.equals();
>
> for most efficient buffer reuse during merge to avoid GC, add
>
> int FieldsReader.doclength(int doc);
> int FieldsReader.binarydoc(int doc,byte[] buffer);
>
> this will allow the caller to reuse the existing buffer if large  
> enough, or create a new one
>
> and
>
> FieldsWriter.addBinaryDocument(byte[] buffer,int len);
>
> All of the above methods are trivial.
>

