david groeling | 14 Dec 2004 14:53

Start-Url Problems

I'm having a heck of a time getting HTTP search to
work. I have read the documentation over and over and
have tried numerous, if not all, of the variable
settings that I can imagine.
Here are the current and most accurate settings that I
use. With these settings I can do everything except index from
the web. I would like to index from the web because of the
use of active pages such as .php content.

$DOCUMENT_ROOT = '/www/Apache2/htdocs/';
$BASE_URL = 'https://your-domain.org:8082/';
    I know this is the setting that determines the actual
    link specification from your search page. All
    my links point to the proper page and server
    protocol, HTTPS.
$CGIBIN = '/cgi-bin/perlfect/search/';
$INSTALL_DIR = '/www/apache2/cgi-bin/perlfect/search/';
@EXT = ("htm","html","shtml","xml","php");
$INDEXER_CGI_PASSWORD = "MY-PASSWORD";

Now, for all this to work properly I cannot
use the HTTP start URL, which I really would like to.
As stated, I have tried every possible value I can
think of. The best result I have gotten is 1 page
scanned, 0 pages indexed, 0 content indexed, when I use
the HTTP start URL variable.

$HTTP_START_URL = '';


David Wessel | 13 Dec 2004 14:40

moving index from nt to unix

Hello,

I want to move the index from my NT machine to a linux box but it fails to
open it. Why would I want to do this? I am indexing a clearcase-vob that
cannot be accessed from linux, therefore I need to do the indexing on the NT
machine.

Is there some other way of doing this? search.pl says:
Cannot open /usr/local/httpd/htdocs/search/data/inv_index:  at search.pl
line 76.
(the file exists and the permissions are o.k.)

Thanks for any help,

David

--------------------------------------------
David Wessel
Auszubildender Fachinformatik - Anwendungsentwicklung
GWI Research GmbH
Fachgruppe Kommunikation / Schnittstellen
Monaiser Straße 11, 54294 Trier
Tel.:	0651 / 8247 - 0
Fax.: 0651 / 8247 - 100

GWI SST - Hotline 01805 / 494483
http://www.gwi-ag.com
david.wessel <at> gwi-ag.com
--------------------------------------------


Jeramiah M. Bowling | 7 Dec 2004 21:37

Windows 2003 - CGI Misbehaved

Can anyone help?

I have installed perlfect search and indexed successfully on a Windows 2003 box.  I have allowed perl
extensions, CGI extensions, and ISAPI extensions.  I have given the anonymous user read and execute
access at both the IIS level and the NTFS level.

I get the following error when I go to search.pl through a browser:

The specified CGI application misbehaved by not returning a complete set of HTTP headers.

I have searched through the forums to no avail.  My perl install is off of the root of c:\, but as I said I
installed and indexed with no problem.  In my search.pl I have added the location of my perl on the top line as:

#!c:/perl

*******************************************

Below is my conf.pl file:

*******************************************

# Perlfect Search configuration file
#$rcs = ' $Id: conf.pl,v 1.64 2003/02/24 21:10:16 daniel Exp $ ' ;

# NOTE: Whenever you change one of the options that's marked with [re-index]
# you need to run indexer.pl again to make the change take effect.

###########################################################################
### basic configuration
### You'll have to adapt these values if you didn't use setup.pl

Roger Growe | 7 Dec 2004 20:22

Problems with indexing

Thanks for your quick reply.  I made the changes below and got the exact
same response.





$DOCUMENT_ROOT = 'http://www.quinacrine.com/';
$BASE_URL = 'http://www.quinacrine.com';
$CGIBIN = "/cgi-bin/searchsite/";
$INSTALL_DIR = '/nfs/cust/5/80/46/564085/cgi-bin/searchsite/';
@EXT = ("html", "htm", "shtml", "txt");
$INDEXER_CGI_PASSWORD =
$HTTP_START_URL = 'http://www.quinacrine.com/';
$HTTP_MAX_PAGES = 150;
$HTTP_SERVER_ROOT = $DOCUMENT_ROOT;
@HTTP_LIMIT_URLS = 'http://www.quinacrine.com/';


Thanks again, any suggestions?



Roger Growe



----- Original Message -----
From: "Daniel Naber" <daniel.naber <at> t-online.de>
To: "Roger Growe" <roger_growe <at> earthlink.net>
Cc: <perlfect-search <at> perlfect.com>
Sent: Monday, December 06, 2004 4:59 PM
Subject: Re: [Perlfect-search] Problems with indexing


> On Monday 06 December 2004 22:18, Roger Growe wrote:
>
> > $HTTP_START_URL = 'http://www.quinacrine.com/index.html';
>
> Only URLs below this will be indexed, so either remove the "index.html" or
> set
> @HTTP_LIMIT_URLS = ("http://www.quinacrine.com/");
>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
>
_______________________________________________
perlfect-search mailing list
perlfect-search <at> perlfect.com
To unsubscribe, set other personal options or view the list archives please visit:
http://perlfect.com/mailman/listinfo/perlfect-search
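
Daniel's two alternatives from the reply quoted above can be sketched as a conf.pl fragment. This is only an illustration built from the values quoted in this thread, not a tested configuration; pick one option, not both. conf.pl marks $HTTP_START_URL as [re-index], so indexer.pl has to be run again after changing it.

```perl
# Option 1: start at the site root, so every page below it qualifies.
$HTTP_START_URL = 'http://www.quinacrine.com/';

# Option 2: keep index.html as the start page, but widen the limit explicitly.
# $HTTP_START_URL  = 'http://www.quinacrine.com/index.html';
# @HTTP_LIMIT_URLS = ("http://www.quinacrine.com/");
```

Neither option addresses the "response code 500" errors by itself; those mean the crawler could not fetch the pages at all.
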
Roger Growe | 6 Dec 2004 22:18

Problems with indexing

I'd really appreciate a hand with this script.

After working through the instructions, all the correspondence in the mail list archives and other sources I am still getting this when starting the index:

Using DB_File...
Checking for old temp files...
Building string of special characters...
Loading 'no index' regular expressions:
    - frontpage2.html
    - frontpage.html
[etc.]
Loading stopwords...371 stopwords loaded.
Starting crawler...
Note: I will not visit more than $HTTP_MAX_PAGES=150 pages.
Loading http://www.quinacrine.com/robots.txt...
Error: Couldn't get 'http://www.quinacrine.com/robots.txt': response code 500
Not using any robots.txt.
Error: Couldn't get 'http://www.quinacrine.com/index.html': response code 500
Crawler finished: indexed 0 files, 0 terms (0 different terms).
Ignored 0 files because of conf/no_index.txt
Ignored 0 files because of robots.txt

I thought it might be the structure of the site that was the problem. The pages are not in the root but in the 'web' directory, like so:

root
        .config
        .sessions
        cgi-bin
        logs
        web

In cgi-bin I have these:

searchsite
        conf
        data
        Perlfect
        temp
        templates

I installed manually by necessity and need to index through HTTP for the same reason. All syntax, permissions, and other rules that I can find check out. Unix server.

 

The main sections of config.pl look like this now:

$DOCUMENT_ROOT = 'http://www.quinacrine.com/';

# The base url of your site (normally that's the URL which
# corresponds to $DOCUMENT_ROOT).
$BASE_URL = 'http://www.quinacrine.com';

# The url in which Perlfect Search is located (usually somewhere in cgi-bin/).
$CGIBIN = "/cgi-bin/searchsite/";

# The full-path of the directory where Perlfect Search is installed.
$INSTALL_DIR = '/nfs/cust/5/80/46/564085/cgi-bin/searchsite/';

# Only files with these extensions should be indexed (case-sensitive).
# This is only relevant for file system indexing, when you index files via
# http you need to set @HTTP_CONTENT_TYPES instead. [re-index]
@EXT = ("html", "htm", "shtml", "txt");

[Password section]

###########################################################################
### http configuration
### You only need this if you want to index your pages via http

# Where you want the indexer to start via http. Leave empty if
# you want to index the files in the filesystem ($DOCUMENT_ROOT).
# ** WARNING **: Do not use for foreign servers! It might use too many
# resources on other people's servers. [re-index]
# example: $HTTP_START_URL = 'http://localhost/';
$HTTP_START_URL = 'http://www.quinacrine.com/index.html';

 

Thinking that the file structure could be the issue, I put a copy of robots.txt in the root, but still got the 500 response. I've left $HTTP_START_URL blank, used 'http://www.quinacrine.com/', and tried other things I could think of to break this jam.

Thanks for your help in advance,

Roger Growe

David Wessel | 2 Dec 2004 15:22

indexing a site // index-size

Hello,

I have indexed a filesystem containing ~8000 files of different sizes, most
of them under 80k

I was wondering which $CONTEXT_SIZE makes sense. Right now I use 0 and
it works fine and the search is fast (index under 25 MB). Yet I need more
detailed search results. Has anyone experimented with this value, or can
anyone give recommendations?

thanks,

David

--------------------------------------------
David Wessel
Auszubildender Fachinformatik - Anwendungsentwicklung
GWI Research GmbH
Fachgruppe Kommunikation / Schnittstellen
Monaiser Straße 11, 54294 Trier
Tel.:	0651 / 8247 - 0
Fax.: 0651 / 8247 - 100

GWI SST - Hotline 01805 / 494483
http://www.gwi-ag.com
david.wessel <at> gwi-ag.com
--------------------------------------------


Yves Hanotiau | 1 Dec 2004 10:13

multiple templates for the search result page

Hello, 

I'm using Perlfect Search 3.31.
Is it possible to have different templates for the "search result" page ?

Thanks in advance

Yves HANOTIAU

David Wessel | 1 Dec 2004 15:55

Search result misbehavior via http / fine-working under console

Hello,

I have successfully installed Perlfect Search under Win XP Pro (with xampp and
xampp-perl-addon) and indexed some files. Running "perl search.pl [word]"
at the command prompt returns appropriate search results. There is, however,
a problem with the display of search results using the web interface.

It returns search results for the _first_ query but no more. Entering
another search word does not change the results. When I check the error.log,
it says:

[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 110.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@force" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 115.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@not" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 116.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@other" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 117.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@docs" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 118.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@valid_docs" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 119.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%answer" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 120.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 141.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%urls_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 144.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@stopwords" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 237.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@stopwords_ignored" will not
stay shared at C:/Programme/xampp/htdocs/search/search.pl line 248.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 255.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%terms_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 279.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@force" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 280.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@not" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 285.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@other" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 287.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@not" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 305.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%inv_index_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 306.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@force" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 312.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@valid_docs" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 315.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%docs_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 323.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@force" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 330.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@other" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 330.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@valid_docs" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 333.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%inv_index_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 336.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 339.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%docs_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 341.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%answer" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 346.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%dates_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 349.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 367.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%answer" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 380.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%docs_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 400.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "@stopwords_ignored" will not
stay shared at C:/Programme/xampp/htdocs/search/search.pl line 415.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%titles_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 449.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%dates_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 486.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%sizes_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 505.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$query" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 587.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%answer" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 588.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%content_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 637.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "%desc_db" will not stay
shared at C:/Programme/xampp/htdocs/search/search.pl line 640.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$punct" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 685.
[Wed Dec  1 15:04:57 2004] search.pl: Variable "$punct" will not stay shared
at C:/Programme/xampp/htdocs/search/search.pl line 726.
[Wed Dec  1 15:04:57 2004] tools.pl: Argument "O_RDONLY" isn't numeric in
subroutine entry at C:/Programme/xampp/perl/site/lib/DB_File.pm line 278.

The last line is repeated a couple of times whenever I trigger a new search.

Any ideas? Sorry, it's not a public site for you to see it.

greetings,

David

--------------------------------------------
David Wessel
Auszubildender Fachinformatik - Anwendungsentwicklung
GWI Research GmbH
Fachgruppe Kommunikation / Schnittstellen
Monaiser Straße 11, 54294 Trier
Tel.:	0651 / 8247 - 0
Fax.: 0651 / 8247 - 100

GWI SST - Hotline 01805 / 494483
http://www.gwi-ag.com
david.wessel <at> gwi-ag.com
--------------------------------------------
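
The "will not stay shared" warnings in the log above are what perl prints when a named subroutine is nested inside another subroutine and refers to one of the outer subroutine's "my" variables. That typically happens when a CGI script is run by a persistent wrapper such as mod_perl's registry handler, which compiles the whole script into one subroutine; whether the xampp-perl-addon is set up that way here is an assumption. The sketch below is not taken from search.pl: handler and run_search are hypothetical names chosen only to reproduce the effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a registry-style wrapper: the whole script body
# becomes one subroutine that is called once per request.
sub handler {
    my ($input) = @_;
    my $query = $input;    # plays the role of a file-level "my" variable

    # A named sub nested in another sub binds to the first $query only.
    # perl warns here: Variable "$query" will not stay shared
    sub run_search { return "results for '$query'" }

    return run_search();
}

print handler('apple'),  "\n";    # results for 'apple'
print handler('banana'), "\n";    # results for 'apple' again
```

Running the script from the command line starts a fresh interpreter for every query, which would fit the observation that "perl search.pl [word]" works. Common remedies are to run the script as plain CGI, or to declare the shared variables with "our" instead of "my"; neither has been checked against search.pl here.
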


Stefan Gläßer | 24 Nov 2004 18:53

Limitations?

Hi,

I would like to index appr. 3500 pages with an average 
size of 70KB. Is it possible for perlfect-search to do
it or should I look for some other software?

Greets,
 Stefan


Philipp Gühring | 23 Nov 2004 12:15

MemoryLeak, DB_Symlink

Hi,

I developed a new Database interface for Perlfect Search, that directly 
interfaces the filesystem with symlinks and files.

It works pretty well; the only issue left is big keys with a size > 255.

But it clearly shows that indexer.pl is still leaking memory somewhere.

Does anyone have an idea where indexer.pl is leaking?

Many greetings,
Philipp Gühring
Attachment (DB_Symlink-0.1.tgz): application/x-tgz, 3350 bytes
Attachment (indexer.diff): text/x-diff, 879 bytes
Philipp Gühring | 22 Nov 2004 00:57

Scalability

Hi,

Perlfect Search is really nice. 
But for only 2000 documents, I am using grep. 
Can we get Perlfect Search to handle 200.000.000 documents?

Nov 21 16:06:46 linux3 kernel: __alloc_pages: 0-order allocation failed 
(gfp=0x1f0/0)
Nov 21 16:06:47 linux3 kernel: __alloc_pages: 0-order allocation failed 
(gfp=0x1d2/0)
Nov 21 16:06:47 linux3 kernel: __alloc_pages: 0-order allocation failed 
(gfp=0x1d2/0)
Nov 21 16:06:47 linux3 kernel: VM: killing process perl

:-(

What is causing the memory-consumption here? 
Are the database-tied hashes using so much memory? 
Or is the problem in the indexer itself?
( I already have $LOW_MEMORY_INDEX = 1; )

By the way, we will soon have it finally integrated on 
http://www.quintessenz.at/ , which needs about 50.000 documents.

Many greetings,
Philipp Gühring