Stefan Fuhrmann | 23 Oct 14:51 2014

Deferring FSFS revprop caching to 1.10

Hi all,

I ran a few tests to determine the impact of the missing
revprop caching on trunk. For checkouts, the overhead
is ~2 extra CPU cores per saturated 10Gb network. Even
that will be not be severe issue to users and the changes
mentioned below bring it down to .5 cores.

'svn log -v' is a factor of 2 slower for the http: client and
4x for the svn: client. Server load is approx 3x and 8x
as high, respectively, if revprop caching is not available.

Most of the overhead is spent parsing packed revprop
manifest info. I wrote a series of patches that reduces
the overhead such that the http: client will see no real
difference while the svn: client takes only 50% longer
than it would with a revprop caching server.

These are only ballpark numbers run on a system with
high bandwidths, low latencies and low fopen() overhead.
However, I think we can tune the revprop file granularity
and parsing now and defer the FSFS revprop caching to
1.10 without a massive impact on users.

-- Stefan^2.
Picon

[l10n] Translation status report for trunk r1633270

Translation status report for trunk <at> r1633270

  lang   trans untrans   fuzzy     obs
--------------------------------------
    de    2759     107     279     483  ++++++++++++++++++++++++++U~~~oooo
    es    2261     605     823     527  ++++++++++++++++++UUUUU~~~~~~~oooo
    fr    2565     301     511     111  ++++++++++++++++++++++UUU~~~~~
    it    2121     745     947     340  ++++++++++++++++UUUUUU~~~~~~~~oo
    ja    2251     615     872     763  ++++++++++++++++++UUUUU~~~~~~~oooooo
    ko    2397     469     644     217  ++++++++++++++++++++UUUU~~~~~~o
    nb    2311     555     774     500  +++++++++++++++++++UUUU~~~~~~~oooo
    pl    2336     530     741     297  +++++++++++++++++++UUUU~~~~~~~oo
 pt_BR    2097     769     960     321  ++++++++++++++++UUUUUU~~~~~~~~oo
    sv    2725     141     305      78  +++++++++++++++++++++++++UU~~~
 zh_CN    2620     246     432      19  +++++++++++++++++++++++UUU~~~~
 zh_TW    2038     828     997     377  +++++++++++++++UUUUUUU~~~~~~~~oo

Stefan Fuhrmann | 20 Oct 15:16 2014

FS API functions missing FS API tests

Hi there,

I've been going through svn_fs.h and found that the following
functions aren't being called directly in any of our tests. They
may still be tested indirectly through higher level functions but
I think most of them should get a low-level test as well:

svn_fs_access_get_username
svn_fs_access_add_lock_token2
svn_fs_purge_txn
svn_fs_txn_base_revision
svn_fs_is_txn_root
svn_fs_is_revision_root
svn_fs_txn_root_name
svn_fs_path_change2_create
svn_fs_check_path
svn_fs_is_file
svn_fs_node_relation
svn_fs_node_created_path
svn_fs_props_different
svn_fs_props_changed
svn_fs_get_mergeinfo2
svn_fs_merge
svn_fs_dir_optimal_order
svn_fs_file_length
svn_fs_file_checksum(sha1)
svn_fs_try_process_file_contents
svn_fs_contents_different
svn_fs_contents_changed
svn_fs_info_config_files
svn_fs_get_file_delta_stream
svn_fs_lock_target_set_token
svn_fs_get_locks2
svn_fs_print_modules

[svn_fs_node_history2, svn_fs_history_prev2 are opaque and will
 be tested through 'svn log']

I'll be working through that list by the end of this week and add
tests where they are simple enough and then post an updated list
here.

-- Stefan^2.
Philip Martin | 15 Oct 12:15 2014

Some x509 branch review points

1)

In x509parse.c:x509_get_version:

  err = asn1_get_tag(p, end, &len,
                     ASN1_CONTEXT_SPECIFIC | ASN1_CONSTRUCTED | 0);

Why the "| 0"?  It doesn't do anything.

2)

In x509parse.c:x509_get_name:

  cur->next = apr_palloc(result_pool, sizeof(x509_name));

  if (cur->next == NULL)
    return SVN_NO_ERROR;

We generally assume apr_palloc will abort rather than return NULL,
that's certainly the assumption in other places in this file.  If we
were to detect an allocation failure then returning no error is not
correct.

3)

In x509parse.c:x509_get_name:

  oid = &cur->oid;
  oid->tag = **p;

  err = asn1_get_tag(p, end, &oid->len, ASN1_OID);
  if (err)
    return svn_error_create(SVN_ERR_X509_CERT_INVALID_NAME, err, NULL);

The asn1_get_tag() call verifies both that *p has not reached end and
that the tag is ASN1_OID.  This happens after assigning **p to oid->tag,
so if asn1_get_tag were to detect that *p had reached end it would
happen after we had dereferenced **p.  This bug occurs in other places:
x509_get_sig, x509_get_alg, etc.

Fix by assigning ASN1_OID explicitly or by moving the assignment after
asn1_get_tag.

--

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Picon

[l10n] Translation status report for trunk r1631608

Translation status report for trunk <at> r1631608

  lang   trans untrans   fuzzy     obs
--------------------------------------
    de    2765     108     279     477  ++++++++++++++++++++++++++U~~~oooo
    es    2263     610     825     526  ++++++++++++++++++UUUUU~~~~~~~oooo
    fr    2571     302     513     108  ++++++++++++++++++++++UUU~~~~~
    it    2123     750     949     340  ++++++++++++++++UUUUUU~~~~~~~~oo
    ja    2253     620     874     763  ++++++++++++++++++UUUUU~~~~~~~oooooo
    ko    2399     474     646     217  ++++++++++++++++++++UUUU~~~~~~o
    nb    2313     560     776     500  +++++++++++++++++++UUUU~~~~~~~oooo
    pl    2338     535     743     297  +++++++++++++++++++UUUU~~~~~~~oo
 pt_BR    2099     774     962     321  ++++++++++++++++UUUUUU~~~~~~~~oo
    sv    2731     142     305      72  +++++++++++++++++++++++++UU~~~
 zh_CN    2626     247     434      13  +++++++++++++++++++++++UUU~~~~
 zh_TW    2040     833     999     377  +++++++++++++++UUUUUUU~~~~~~~~oo

Lieven Govaerts | 10 Oct 17:41 2014
Picon

Re: ra_serf bug(s)

On Tue, Oct 7, 2014 at 2:57 PM, Conrad Rad <cse.cem <at> gmail.com> wrote:
> Hi all,
>
> I'm working on a thing that links libsvn_ra_serf (default RA in Fedora
> Linux) as well as other pieces of the SVN libraries.
>
> I am using the 'replay' feature to dump edits from a remote repository.
>
> Inadvertently, I requested r0 as my starting_revision. This is my
> mistake, however: nothing in the generic replay or ra_serf-specific
> replay code produced a warning or error. This could be improved if r0
> is always invalid (first bug). (I am less confident about this one
> than the next one.)
>
> ra_serf makes a series of valid requests (200 OK) before finally
> asking the server for REPLAY of r0. In this case the server is
> googlecode.com. In response to the invalid REPLAY it gave a 500 Server
> Error response. I will attach a pcap.

Googlecode returns the 500 Internal Error, svn.apache.org will happily
return the content of r0.
IMO googlecode is doing the wrong thing here, there's nothing special
about r0 (it's empty, but still a valid revision).

> The second bug is: ra_serf ignores the 500 response and then waits
> forever for a valid response that isn't coming. So at this point the
> program is just stuck in svn_ra_replay_range ->
> svn_ra_serf__replay_range -> svn_ra_serf__context_run_wait -> ... ->
> epoll(2) forever, when it should have aborted and returned an error on
> the 500.
>
> I'm guessing that a fix might involve toggling
> no_fail_on_http_failure_status or setting up a response_error()
> handler on the svn_ra_serf__handler_t, but I'm not an expert on SVN
> internals.
>
> I'm using subversion-1.8.10, libserf-1.3.7, and apr-1.5.1 from Fedora 20.

It looks like this issue has been solved on trunk. When I use a
modified svnsync with start_revision set to 0, and sync from a
googlecode repo, I get following error from the 500 error response:

subversion/svnsync/svnsync.c:1558,
subversion/svnsync/svnsync.c:1504,
subversion/libsvn_ra/ra_loader.c:1358,
subversion/libsvn_ra_serf/replay.c:788,
subversion/libsvn_ra_serf/util.c:930,
subversion/libsvn_ra_serf/util.c:895,
subversion/libsvn_ra_serf/multistatus.c:560:
(apr_err=SVN_ERR_FS_NO_SUCH_REVISION)
svnsync: E160006: No such revision -1

Maybe you can try your program with svn trunk libraries to see if it
works for you too?

Lieven

> Thanks,
> Conrad

Conrad Rad | 10 Oct 16:22 2014
Picon

Re: ra_serf bug(s)

On Tue, Oct 7, 2014 at 8:57 AM, Conrad Rad <cse.cem <at> gmail.com> wrote:
> ...
>
> The second bug is: ra_serf ignores the 500 response and then waits
> forever for a valid response that isn't coming. So at this point the
> program is just stuck in svn_ra_replay_range ->
> svn_ra_serf__replay_range -> svn_ra_serf__context_run_wait -> ... ->
> epoll(2) forever, when it should have aborted and returned an error on
> the 500.
>
> I'm guessing that a fix might involve toggling
> no_fail_on_http_failure_status or setting up a response_error()
> handler on the svn_ra_serf__handler_t, but I'm not an expert on SVN
> internals.
>
> I'm using subversion-1.8.10, libserf-1.3.7, and apr-1.5.1 from Fedora 20.

Due to overwhelming consensus / lack of objection (*birds chirping*),
I've filed an issue for this in the tracker:

http://subversion.tigris.org/issues/show_bug.cgi?id=4523

Thanks,
Conrad

Have a nice day | 9 Oct 03:48 2014
Picon

help

I would like to join the open source community.I have some ideas i think would be useful.

Jay Gordon
Fitzpatrick, Ben | 8 Oct 11:51 2014
Picon

svnperms.py Config.get method broken

Hi everyone,

svnperms.py currently has a bug in the 'get' method of the Config class, on
line 114 in
https://svn.apache.org/viewvc/subversion/trunk/tools/hook-scripts/svnperms.py?revision=1295006&view=markup:

    def get(self, section, option, default=None):
        return self._sections_dict.get(option, default)

As you can tell by the naming, 'option' is not meant to be used as a key in
'self._sections_dict' ('section' is), and it's then missing a step to actually
extract the option value information.

Cheers,

Ben Fitzpatrick

Alexey Neyman | 8 Oct 02:27 2014
Picon
Picon

Pool usage in Python bindings

Hi,

 

I have encountered a problem with Subversion's Python bindings. For example,

the following simple program does not work:

 

[[[

#!/usr/bin/python

from svn import repos, core

if __name__ == '__main__':

stream = core.Stream(core.svn_stream_open_readonly("dump"))

print stream.read()

stream.close()

]]]

 

$ ./0.py

Traceback (most recent call last):

File "./0.py", line 7, in <module>

print stream.read()

File "/usr/lib/python2.7/dist-packages/svn/core.py", line 212, in read

data = svn_stream_read(self._stream, SVN_STREAM_CHUNK_SIZE)

File "/usr/lib/python2.7/dist-packages/libsvn/core.py", line 5362, in svn_stream_read

return _core.svn_stream_read(*args)

File "/usr/lib/python2.7/dist-packages/libsvn/core.py", line 5585, in assert_valid

assert self.__dict__["_is_valid"](), "Variable has already been deleted"

AssertionError: Variable has already been deleted

 

The problem is that the function to open a stream takes two pool arguments:

result_pool and scratch_pool. In the bindings, the following code is

generated to wrap svn_stream_open_readonly() [formatted for readability]:

 

[[[

if (svn_swig_py_get_pool_arg(args, SWIGTYPE_p_apr_pool_t,

&_global_py_pool, &_global_pool))

SWIG_fail;

arg3 = _global_pool;

if (svn_swig_py_get_pool_arg(args, SWIGTYPE_p_apr_pool_t,

&_global_py_pool, &_global_pool))

SWIG_fail;

arg4 = _global_pool;

...

result = (svn_error_t *)svn_stream_open_readonly(arg1,

(char const *)arg2,arg3,arg4);

...

Py_XDECREF(_global_py_pool);

Py_XDECREF(_global_py_pool);

]]]

 

In case the pool arguments are passed as default (None), each of the calls to

svn_swig_py_get_pool_arg() will create a subpool of the application pool and

assign it to _global_pool. Then, the 2nd subpool will be decremented twice

at the end of the function, while the 1st one will leak.

 

If the caller passes both pool arguments, it is even worse: both arg3 and

arg4 will be initialized by svn_swig_get_pool_arg() to point to the scratch

pool - it always checks the last argument of the function to be a valid pool

and if it is - uses that to initialize the _global_pool:

 

[[[

int argnum = PyTuple_GET_SIZE(args) - 1;

 

if (argnum >= 0)

{

PyObject *input = PyTuple_GET_ITEM(args, argnum);

if (input != Py_None && PyObject_HasAttrString(input, markValid))

{

*pool = svn_swig_py_must_get_ptr(input, type, argnum+1);

]]]

 

Obviously, the caller would expect the result from the wrapped function

to be allocated in the result pool - it would be unexpected to see the

returned variables to disappear when the *scratch* pool is freed.

 

Even if the svn_swig_py_get_pool_arg had recovered both pool pointers

correctly, an attempt to pass two pools would fail in the validity check

in the wrapper function. Obviously, if one passes two distinct pools as

arguments, only one of them can be equal to the _global_pool:

 

[[[

if (obj1) {

/* Verify that the user supplied a valid pool */

if (obj1 != Py_None && obj1 != _global_py_pool) {

SWIG_Python_TypeError(..., obj1);

...

if (obj2) {

/* Verify that the user supplied a valid pool */

if (obj2 != Py_None && obj2 != _global_py_pool) {

SWIG_Python_TypeError(..., obj2);

]]]

 

Am I correct in assuming that the division of a single 'pool' argument

into distinct 'result_pool' and 'scratch_pool' is rather recent change

and the bindings were not updated for that?

 

Now, fixing this is somewhat complicated. The typemaps for 'apr_pool_t *'

use %typemap(default) to obtain the object passed in, not the %typemap(in).

There's a reason for this: the _global_pool may be then used to perform

the memory allocations needed to convert other arguments - which, according

to Subversion's coding style, always precede the pool arguments passed last.

For the %typemap(default), however, SWIG does not set the $input macro -

so the $svn_argnum 'hack' cannot be used to determine the argument number

from which the pool should be obtained.

 

The allocations from thus created _global_pool can be either temporary,

needed to convert a Python object into an appropriate C/APR type (e.g.

svn_swig_py_seq_to_array to convert a list to array) or more permanent

(such as svn_swig_py_setup_ra_callbacks - which should be kept for the

lifetime of the returned svn_ra_session_t - the svn_ra_local__open()

stashes the passed in 'callbacks' pointer for later use).

 

Given all that, I could think of two approaches to fixing this issue.

 

1. Hackish, but simple.

 

Change the call signature of the functions receiving one or two pool

arguments: make them always receive a single argument for any pools

it may have, which may be either None (meaning "use application pool"),

a pool object (which will be used for both the result and scratch

allocations) or a (result_pool, scratch_pool) tuple.

 

This approach has two obvious drawbacks: first, it changes the

number of arguments on the existing interfaces (but this may not

be a bad thing, actually, given that it makes the breakage more

explicit - rather than subtly using the scratch pool as described above).

Second, it makes the argument lists for functions taking two pool

arguments not match their C counterparts. E.g., svn_stream_open_readonly

would be invoked as:

 

// C

res = svn_stream_open_readonly(&stream, "path",

result_pool, scratch_pool);

 

# Python

stream = core.svn_stream_open_readonly("path", \

(result_pool, scratch_pool))

 

Additionally, if there are functions that only take a scratch_pool

argument, but still need a permanent pool for some allocations, they

would beed to be called with such tuple explicitly - at least with

a (None, scratch_pool) tuple - even though the C counterpart only

has a single pool argument. This is somewhat counterintuitive and

error-prone, I think.

 

2. Less hackish, but requiring more effort.

 

Do not use the passed-in pools for allocations in the wrapper

itself. Instead, implement svn_swig_py_get_scratch_pool() and

svn_swig_py_get_persistent_pool() - to create subpools of the

application pool - and use these subpool in the typemaps for other

types that need memory for conversion. The scratch pool will then

be freed at the end of the wrapper.

 

Persistent pools would originally be kept indefinitely. This is

a leak, but it is probably better than what we do currently

(using scratch pool for such allocations). Then, we could go

over the interfaces using such persisten pools one by one,

determine the required lifetime of such pool - record it in the

Python object with the same lifetime and destroy the pool in

that object's destructor.

 

I am leaning towards the second approach.

 

 

PS. I use svn_ra_session_t as an example where a semi-permanent

allocation is necessary. However, I couldn't find the interface

that destroys/closes the svn_ra_session_t. Is it correct that the

svn_ra_session_t is always destroyed by destroying the corresponding

session pool? If so, the bindings for this type need to extend it

with a destructor function to call svn_pool_destroy(session->pool),

or it will leak the memory if the application pool was used to

create it.

 

 

Comments/opinions/alternative approaches welcome.

 

Regards,

Alexey.

 

Navrotskiy Artem | 3 Oct 09:26 2014
Picon

Subversion binary file detection is look like broken

Hello,
 
Subversion console client try to detect binary file with algorythm:
  1. File is NOT BINARY if it contains only BOM UTF-8 signature (why not check as first N bytes is corret UTF-8?);
  2. File is BINARY if first 1024 bytes contains ZERO byte (uniform distribution of bytes takes change of absent ZERO byte: (1 - 1 / 256) ^ 1024 = ~1.8%);
  3. File is BINARY if first 1024 bytes contains over 85% of characters not in range 0x07-0x0D, 0x20-0x7F (total we have 153 "binary" bytes, ~60%).
This algoritm looks like broken.
 
For example:
  1. File "text.txt":
    Is file contains text block from wikipedia about subversion in UTF-8 (https://ru.wikipedia.org/wiki/Subversion) and unfortunaly contains too many cyrillic charactes (on character - 2 "binary" bytes).
  2. File "binary.txt" detected as "text"
    It was created by "dd if=/dev/urandom of=binary.txt count=1 bs=2048" and unfortunaly does not contains ZERO byte in first 1024 bytes.
 
-- 
Best regards,
Navrotskiy Artem
 
6$ÓAQõ^¥ÝFBfÝex}:^ã`œìØÌ·$N;÷Þ³Sé}ç9Ú¨âÞÞýe/ùò¯¹[æk΋‘Y-žúœ"ÛQ)_+_±³™¼n.çØÃb¤ç-Ú³Ç`ƒ·Ž‡ËÙ=â	æF™Ëœ$dÞÉâú
CJcu¢ÝǕWîëw­¾Vö†ex.‰6Œ™RˆÙ'ÞôˆWµìqPŠ÷»‹ëÀï!¸à¯(™CÆ­±~ ÖŠ -Ø`iQi·X®™ê“+ǽú=Fƒ~,P¯ñ֏¢I`1Ù(gm¨ézq·Ùdɝç÷â$G3 Šp`º‚üFÍñ”,ޜ­*wä1Ó´/€-¤Ä%iè0q»cN9éù,28Q(ä́®u¯2RìȊEj‡‹ÝÝéVPä9ïûñýËÑØe¶˜„zw}1—IÞKÀìTwn‘ÈlíâBígÀ搩‚ëtšc!Å\:\¼àImҋp#é;Ãx\ƒd¿ÂuŒº¤_愻ÜðâÀ gÓ¶žpõ¹ð‰[£0îZjFQ <at> ‹jòY‚€éÔºÃFö;áXWúbÞMÍ¿ZÀän–ŒVi\gÍ˸(‹y…¹†—*.	LùÔ:h1
'T <at> YuUµèßdísUg©aêÁ	Á¯W²BPÚìÒä÷¸™2‰ì˜Ý³+½G¨šÇCXav_P
ûîãâ(ŒÃ{Îïåûv2®S€ÝÇCLÚžñ‰Ëñfq/\?Ó+Q¬ý¼ÁÃmrÏä <at> 6VÔQ?â7ëoË÷ÎçºWs†„ŸLxzE/
Èt¿F(6gðŠ„ƒ1ŸýéÄ+”$M“:ÒE%ô×hˆ¶ÌþÚL°_n55­IF¬ûŽÀXíAÛ­º
“—îl¢£ãs{j5ç׋­Ïj¬¥x©¿Pƒµ‚ü*æV4³¥¹÷ü"þCKù˜	ê(hFÔW£b˜/÷aÁ‰
•¥
ݧg–ÌBñ6¼ÜýgM—Dê©ñg <at> ¯Z?zµÙiS|ù?G§
rž¿'ÇюZ„ebë«ÂuÌ䐆Ðí^øIs:zßß0TD¾	¯Á†Ò{·\È—»¶
–14wºí)ÖN\=¹J€jR‹qíOA¦Å±pp©èW’S·Øeý»|£ôD۝¹]7oŠ{#–A£Öpf'«cx·NÈúd§ÌO´š·‚5Ú>ž×¤×>ngæcN9×ß­H‹½G´°®Ã(£ç»£=Z5Ø!²ÓL¾q”£ø£Güf³Ð¤'YüÇE÷¡ò‘†2øÎÂoB<ÆïFq:Ô¯ß=Öð <at> Ëû“ü ¬é_¸Տ
DzúQþcU <at> 4¶ò­ð§
Xé€/qo•ÍddGqèKXZeù <at> '¦Ë 'ok&ÈEÏùÄݯæóOŠY¿|½š2Š„Ê.¯^®´‚*6ÁNv«­s¿¦(áHmˆ¾!nj¸oàbON˜ ¥¶N’“ˬ0 êj€
”¡•±3EɏLÊI]‚
2$8=îÁMá¡Ü{Óã,
ðWK,g‰^Õ÷Åù(À$;í=89
Ë
Разработка Subversion была начата
в 2000 году по инициативе и при
финансовой поддержке CollabNet.
Инициаторы проекта
хотели создать свободную
систему управления
версиями, в основном
похожую на CVS, но лишённую её
ошибок и неудобств. В то
время не существовало
лучших программ этого
класса со свободной
лицензией, CVS была
стандартом де-факто среди
разработчиков
свободного программного
обеспечения. Выбрав её за
основу, разработчики Subversion
надеялись упростить
разработку за счёт
использования уже
проверенных концепций и в
то же время облегчить
переход на новую систему
многочисленным
пользователям CVS.[15]

Gmane