jiho | 1 Feb 14:17 2008

Prevent bots (Google) to index WebSVN

Hello everyone,

I've been happily using WebSVN for a while to make my repositories
public in a nice and friendly manner. The repositories are hosted on a
Linux box (Fedora Core 8) with Apache, and they use MultiViews. RSS is
disabled but tarballs are enabled.

They were recently indexed by Google and MSN, as the Apache logs show:

65.55.208.171 - - [27/Jan/2008:04:17:18 +0100] "GET /##whatever##/models_larvae/?rev=52&sc=0 HTTP/1.0" 200 6567 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
65.55.208.171 - - [27/Jan/2008:04:17:19 +0100] "GET /##whatever##/models_larvae/?rev=35&sc=0 HTTP/1.0" 200 6411 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
[...]
66.249.66.50 - - [01/Feb/2008:14:34:17 +0100] "GET /##whatever##/ownfor/bbscript/trunk/doc/figures/deco/bluefish.png?op=diff&rev=&sc=1 HTTP/1.1" 200 4039 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.50 - - [01/Feb/2008:14:34:23 +0100] "GET /##whatever##/ownfor/bbscript/trunk/src/GNU_GPL.txt?op=log&rev=6&sc=1&isdir=0 HTTP/1.1" 200 6326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
[...]
This generated a huge number of files in /tmp, which completely filled
the machine's hard drive. In addition, even if these repositories
are public, I would rather point people to the address and prevent
[...]

Mark.Ziegler | 1 Feb 15:30 2008

Mark Ziegler is out of the office.


I will be out of the office starting 01.02.2008 and will return on
04.02.2008.

I will reply to your message after my return.

Chairman of the Supervisory Board: Dr. Thomas Bach
Executive Board: Rainer Hundsdörfer (Chairman), Dr. Dieter Japs, Karl Wachter
Registered office: Tauberbischofsheim, Register court: Mannheim HRB 560227
VAT ID No. DE 146587898

Alessandro Vesely | 1 Feb 17:56 2008

Re: Prevent bots (Google) to index WebSVN

jiho wrote:
>  how can I 
> prevent bots to access them? It may rather be an Apache question I agree 
> but the problem is so obvious with WebSVN that you may want to make an 
> FAQ of it.

See http://en.wikipedia.org/wiki/Robots_Exclusion_Standard

jiho | 1 Feb 18:21 2008

Re: Prevent bots (Google) to index WebSVN


On 2008-February-01  , at 17:56 , Alessandro Vesely wrote:
> jiho wrote:
>> how can I prevent bots to access them? It may rather be an Apache  
>> question I agree but the problem is so obvious with WebSVN that you  
>> may want to make an FAQ of it.
>
> See http://en.wikipedia.org/wiki/Robots_Exclusion_Standard

Thanks. I did indeed do this. It did not stop the indexing currently in
progress (unfortunately), but hopefully it will stop the next one.
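
As a rough illustration of the Robots Exclusion Standard approach, the
following Python sketch checks what a well-behaved crawler would be
allowed to fetch under a given robots.txt. The /websvn/ path and the
example.com URLs are assumptions made for illustration; they are not
taken from the setup described in this thread. The check relies on
Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/websvn/" is an assumed install path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /websvn/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a cooperating crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/websvn/repo/trunk/?op=log",  # assumed WebSVN URL
        "http://example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

robots.txt is only a request to cooperating crawlers, and crawlers
fetch it on their own schedule, which is consistent with a crawl that
is already under way not stopping immediately.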

I still think this should be mentioned somewhere in the WebSVN docs,
since robots crawling the site trigger all the tarball links and
thereby create a huge number of files in /tmp (>100 GB here, for
repositories that are probably quite small compared to "average"
ones).
Alternatively (and better), the 'nofollow' attribute should be set on
some links, such as the tarball links, the version comparison links
(yes, Google is also trying to index code comparisons between all
versions!), the blame links, etc. While indexing the content of the
HEAD code may be useful and actually wanted, I see little use in
indexing the rest.
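
To make the 'nofollow' idea concrete, here is a small Python sketch of
how a link renderer could add rel="nofollow" for selected operations.
This is not WebSVN code (WebSVN is written in PHP) and the function is
hypothetical. The "diff" operation appears in the logs quoted earlier
in this thread; "dl" and "blame" are assumed operation names.

```python
from html import escape

# Operations assumed to be expensive or pointless to index.
# "diff" is seen in the quoted logs; "dl" and "blame" are assumptions.
NOFOLLOW_OPS = {"dl", "diff", "blame"}


def render_link(href: str, text: str, op: str) -> str:
    """Build an HTML anchor, adding rel="nofollow" for selected operations."""
    rel = ' rel="nofollow"' if op in NOFOLLOW_OPS else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_link("/repo/?op=dl&rev=52", "Tarball", "dl"))
    print(render_link("/repo/trunk/", "trunk", "browse"))
```

rel="nofollow" is only a hint to crawlers, so it would complement a
robots.txt rule rather than replace it.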

Thanks again and I hope you can fix this somehow.

JiHO
---
http://jo.irisson.free.fr/

Jurnell Cockhren | 19 Feb 18:39 2008

Problem(??) with Authentication (Solved!)

Hey,
This is in response to 2 previous posts made by Kevin Luck and Chris Bergstresser. The problem they cited was the repository list disappearing from the WebSVN index page once "$config->useAuthenticationFile" was used.


Here's a little info on my setup: there are a total of 20 repos housed on a single server that authenticates against eDirectory (LDAP). All the repos were added to WebSVN using "$config->parentPath". About half of the repos require users to be authorized to view their contents. WebSVN's authentication system even allows subdirectories within repositories to require authentication. There are two ways of achieving web-based authentication:
1. Authentication before the page is loaded.
2. Authentication when a user clicks on a repository.

The first is achieved by placing the authentication procedures in the "Directory" block within the websvn/apache2 settings. The second is achieved by placing your authentication procedures in a separate "Location" block.

I will provide the details for a decent authentication file based on the following scenario (and explain the main cause of the trouble):
Assume there are 3 repositories (repo1, repo2 and repo3) and 7 users (tim, bob, james, frank, me, you, and admin). The following is a sample authentication file for WebSVN:

[groups]
repo1 = tim, bob, admin
repo2 = james, me, admin
repo3 = you, frank, admin

[repo1:/]
@repo1 = r

[repo2:/]
@repo2 = r

[repo3:/]
@repo3 = r

The reason the repository list "disappears" is that your authentication file is missing a directive that allows anyone to see the root directory (the list of repos).

Syntax is important! Some people may have thought about using:
[/]
* = r
However, this allows for read-access to all folders (repos) and subfolders to anyone and defeats the purpose of Authentication.

Hence, the above authentication file should begin with:
[/:/]
* = r
This allows the list of repos to be viewed by anyone, while still respecting the rules set for each repository.
This info should be included in the WebSVN docs.
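
For readers who want to see how the [groups] entries and the @group
rules in such a file relate, here is a small Python sketch that reads a
file in this format and answers "does this section grant read access to
this user?". It only looks at the rules written in one exact section
and is a deliberate simplification, not WebSVN's or Subversion's actual
lookup logic; the function names are hypothetical.

```python
from configparser import ConfigParser

AUTHZ = """\
[groups]
repo1 = tim, bob, admin
repo2 = james, me, admin

[/:/]
* = r

[repo1:/]
@repo1 = r

[repo2:/]
@repo2 = r
"""


def load_authz(text):
    """Parse an access file; only '=' separates names from rights."""
    parser = ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep user and group names case-sensitive
    parser.read_string(text)
    return parser


def can_read(parser, section, user):
    """True if `user` is granted 'r' by a rule in exactly this section."""
    if not parser.has_section(section):
        return False
    groups = parser["groups"] if parser.has_section("groups") else {}
    for who, rights in parser.items(section):
        if "r" not in rights:
            continue
        if who == "*" or who == user:
            return True
        if who.startswith("@"):
            members = [m.strip() for m in groups.get(who[1:], "").split(",")]
            if user in members:
                return True
    return False


if __name__ == "__main__":
    authz = load_authz(AUTHZ)
    print(can_read(authz, "/:/", "anyone"))     # rule "* = r" applies
    print(can_read(authz, "repo1:/", "tim"))    # tim is in group repo1
    print(can_read(authz, "repo1:/", "james"))  # james is not
```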

Sorry for the length of this email.
 
Jurnell Cockhren

Chris Bergstresser | 21 Feb 03:24 2008

Re: Problem(??) with Authentication (Solved!)

On Tue, Feb 19, 2008 at 11:39 AM, Jurnell Cockhren <taker12 <at> gmail.com> wrote:
> This is in response to 2 previous posts made by Kevin Luck and Chris
> Bergstresser. They problem they cited concerned the disappearing of the
> repository list on the websvn index page once the
> "$config->useAuthenticationFile" function was used.

   Thanks for your help.

> The reasoning for the "disappearing" the the repository list is that in your
> authentication files you're missing a directive to allow for anyone to see
> the root directory (list of repos).

   If I make this change I can see the repository list, but none of
the subdirectories in the repository are visible.

   Here's my setup: I've got a repository (called "adv", and the
display name is "My Project"), and an associated access and passwd
file.  This works perfectly for access through local clients (such as
TortoiseSVN).
   I've installed the WebSVN source in a password-protected
subdirectory of Apache (using the same passwd file as svn), and I've
added the repository using "$config->addRepository" and a local (e.g.
"file://...") URL.  This requires a password to access the directory
through the web, and I can browse the repository.  It doesn't respect
any of the access restrictions, however.
   If I add the "[/:/]" directive to the access file, and add the
"$config->useAuthenticationFile" command, it lists the repository, but
doesn't show any subdirectories at all.
   What am I not understanding?

-- Chris

bubblboy | 23 Feb 18:19 2008

raw file download patch

Hi,

After using websvn for a while I often found myself wanting to use (and 
to allow others to do) wget http://example.com/websvn/my/file to get the 
latest version of a file. So, I created a patch for it. It's not very 
well integrated with the themes yet, and the mime-type sent is always 
application/octet-stream (unless there is an explicit svn:mime-type 
property set) but it's better than nothing, I guess.

There are some other things I would like to add to websvn, so if you are 
OK with it I would like to request a subversion account with commit 
access :).

Greetings,

Hraban Luyat

P.S.: The patch is available at [1] in case the attachment is filtered out.

[1] https://0brg.net/~bb/websvndownloadrawfile.patch
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe <at> websvn.tigris.org
For additional commands, e-mail: dev-help <at> websvn.tigris.org

fekepp | 26 Feb 17:07 2008

Access to secured files via View Log / Changed Files

Hi all,

I have some problems with path restrictions and WebSVN.

My websvn is configured with:

$config->parentPath('Path/to/parent (e.g. c:\\svn)');
$config->useAuthenticationFile('/path/to/accessfile'); // Global access file

In my access file I defined some users and set some path rights for a
test repository:

[groups]
all = user1, user2

[test:/]
@all = r

[test:/secretdir]
@all =
user1 = rw

This works with "normal" svn: user2 cannot check out, read or write
in test/secretdir, while user1 is allowed to make changes.

With WebSVN, user2 cannot see the secretdir directory in the path list,
which is as expected. But if user2 looks at the "View Log" history,
user2 can see all revisions, including those made in
test/secretdir... And if user2 clicks on one of those revisions made in
secretdir, user2 has access to the files added/modified/deleted by user1
in the "Last modification" section.

Using WebSVN, user2 can bypass all restrictions just by checking the log
history for modifications in a secret dir and downloading new and/or
modified files.
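
For illustration only, here is a minimal Python sketch of the filtering
one would expect in this situation: changed paths that fall under a
restricted directory are dropped before a log entry is shown. This is
not WebSVN's code. The function names and sample paths are
hypothetical, and the list of denied directories is assumed to have
already been derived from the access file for the current user.

```python
from pathlib import PurePosixPath


def is_under(path: str, directory: str) -> bool:
    """True if `path` is `directory` itself or lies somewhere below it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def visible_paths(changed_paths, denied_dirs):
    """Keep only the changed paths outside every denied directory."""
    return [
        p for p in changed_paths
        if not any(is_under(p, d) for d in denied_dirs)
    ]


if __name__ == "__main__":
    changed = ["/trunk/a.c", "/secretdir/key.txt", "/secretdir2/notes.txt"]
    print(visible_paths(changed, ["/secretdir"]))
```

Comparing whole path components rather than raw string prefixes keeps a
sibling such as /secretdir2 from being hidden by a rule on /secretdir.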

Two questions:
Did I do anything wrong with my configuration?
or
Is this a bug?

Regards

Felix aka fekepp

fekepp | 26 Feb 17:27 2008

Re: Access to secured files via View Log / Changed Files

I switched the template from "calm" to "BlueGrey":

//$config->setTemplatePath("$locwebsvnreal/templates/calm/");
$config->setTemplatePath("$locwebsvnreal/templates/BlueGrey/");

Now I am no longer able to download files, but I can still see the full
log history, including revisions in "secret" directories...

Maybe it is a bug, but it depends on the template you use.

Asking svn directly, I am still not able to check out/modify/delete files
in such a "secret" directory, and the log command for a revision XXXX in
such a directory prints

------------------------------------------------------------------------
rXXXX | (no author) | (no date) | 1 line

on the screen:)

regards
fekepp
