Junio C Hamano | 29 Jan 2011 01:02

Re: Can't find the revelant commit with git-log

René Scharfe <rene.scharfe <at> lsrfire.ath.cx> writes:

> Subject: pickaxe: don't simplify history too much
>
> If pickaxe is used, turn off history simplification and make sure to keep
> merges with at least one interesting parent.
>
> If path specs are used, merges that have at least one parent whose files
> match those in the specified subset are edited out.  This is good in
> general, but leads to unexpectedly few results if used together with
> pickaxe.  Merges that also have an interesting parent (in terms of -S or
> -G) are dropped, too.
>
> This change makes sure pickaxe takes precedence over history
> simplification.

Hmmm, I understand the _motivation_ behind the change in the second hunk,
in that you _might_ want to dig the side branch that did not contribute
anything to the end result when looking for a needle with either -S or -G,
but doesn't the same logic apply to things like --grep?

I do not think it is a good idea to unconditionally disable simplification
for -S/G without a way for the user to countermand (even though I could be
persuaded to say that flipping the default for -S/-G/--grep might have
been a better alternative in hindsight).

The user can control this behaviour by giving or not giving --simplify
from the command line anyway, no?

As to the first hunk, I have no idea why this is a good change.
[...]

Shawn Pearce | 29 Jan 2011 02:32

Re: [RFC] Add --create-cache to repack

On Fri, Jan 28, 2011 at 13:09, Nicolas Pitre <nico <at> fluxnic.net> wrote:
> On Fri, 28 Jan 2011, Shawn Pearce wrote:
>
>> On Fri, Jan 28, 2011 at 10:46, Nicolas Pitre <nico <at> fluxnic.net> wrote:
>> > On Fri, 28 Jan 2011, Shawn Pearce wrote:
>> >
>> >> This started because I was looking for a way to speed up clones coming
>> >> from a JGit server.  Cloning the linux-2.6 repository is painful,

Well, scratch the idea in this thread.  I think.

I retested JGit vs. CGit on an identical linux-2.6 repository.  The
repository was fully packed, but had two pack files.  362M and 57M,
and was created by packing a 1 month old master, marking it .keep, and
then repacking -a -d to get most recent last month into another pack.
This results in some files that should be delta compressed together
being stored whole in the two packs (obviously).

The two implementations take the same amount of time to generate the
clone.  3m28s / 3m22s for JGit, 3m23s for C Git.  The JGit created
pack is actually smaller 376.30 MiB vs. C Git's 380.59 MiB.  I point
out this data because improvements made to JGit may show similar
improvements to CGit given how close they are in running time.

I fully implemented the reuse of a cached pack behind a thin pack idea
I was trying to describe in this thread.  It saved 1m7s off the JGit
running time, but increased the data transfer by 25 MiB.  I didn't
expect this much of an increase; I honestly expected the thin pack
portion to be, well, thinner.  The issue is the thin pack cannot delta
against all of the history, it's only delta compressing against the tip
[...]

René Scharfe | 29 Jan 2011 03:34

Re: Can't find the revelant commit with git-log

Am 29.01.2011 01:02, schrieb Junio C Hamano:
> René Scharfe <rene.scharfe <at> lsrfire.ath.cx> writes:
> 
>> Subject: pickaxe: don't simplify history too much
>>
>> If pickaxe is used, turn off history simplification and make sure to keep
>> merges with at least one interesting parent.
>>
>> If path specs are used, merges that have at least one parent whose files
>> match those in the specified subset are edited out.  This is good in
>> general, but leads to unexpectedly few results if used together with
>> pickaxe.  Merges that also have an interesting parent (in terms of -S or
>> -G) are dropped, too.
>>
>> This change makes sure pickaxe takes precedence over history
>> simplification.
> 
> Hmmm, I understand the _motivation_ behind the change in the second hunk,
> in that you _might_ want to dig the side branch that did not contribute
> anything to the end result when looking for a needle with either -S or -G,
> but doesn't the same logic apply to things like --grep?

Yes, that's true.  I have to admit that I'm mostly reacting to the
unintuitive output given in the specific case ("test driven") and
probably don't fully understand the underlying problem and all its
implications.

> I do not think it is a good idea to unconditionally disable simplification
> for -S/G without a way for the user to countermand (even though I could be
> persuaded to say that flipping the default for -S/-G/--grep might have
[...]

Shawn Pearce | 29 Jan 2011 03:34

Re: [RFC] Add --create-cache to repack

On Fri, Jan 28, 2011 at 17:32, Shawn Pearce <spearce <at> spearce.org> wrote:
>
> Well, scratch the idea in this thread.  I think.
>
> I retested JGit vs. CGit on an identical linux-2.6 repository.  The
> repository was fully packed, but had two pack files.  362M and 57M,
> and was created by packing a 1 month old master, marking it .keep, and
> then repacking -a -d to get most recent last month into another pack.
> This results in some files that should be delta compressed together
> being stored whole in the two packs (obviously).
>
> The two implementations take the same amount of time to generate the
> clone.  3m28s / 3m22s for JGit, 3m23s for C Git.  The JGit created
> pack is actually smaller 376.30 MiB vs. C Git's 380.59 MiB.

I just tried caching only the object list of what is reachable from a
particular commit.  The file is a small 20 byte header:

  4 byte magic
  4 byte version
  4 byte number of commits (C)
  4 byte number of trees (T)
  4 byte number of blobs (B)

Then C commit SHA-1s, followed by T tree SHA-1 + 4 byte path_hash,
followed by B blob SHA-1 + 4 byte path_hash.  For any project the size
is basically on par with the .idx file for the pack v1 format, so ~41
MB for linux-2.6.  The file is stored as
$GIT_OBJECT_DIRECTORY/cache/$COMMIT_SHA1.list, and is completely
pack-independent.
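
The layout described above can be sketched in C as follows. This is only an illustration of the stated field sizes, not the code from the experiment: the struct, the function names, and the big-endian byte order are all assumptions, since the message gives none of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIST_HEADER_SIZE 20 /* five 4-byte fields, as listed above */
#define SHA1_RAW_SIZE    20 /* one binary SHA-1 */
#define PATH_HASH_SIZE    4 /* per tree and blob entry */

/* Hypothetical in-memory view of the header; field names are illustrative. */
struct list_header {
	uint32_t magic;
	uint32_t version;
	uint32_t nr_commits; /* C */
	uint32_t nr_trees;   /* T */
	uint32_t nr_blobs;   /* B */
};

/* Read one 32-bit value, assuming big-endian storage (an assumption). */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the header; returns 0 on success, -1 if the buffer is too short. */
int parse_list_header(const unsigned char *buf, size_t len,
		      struct list_header *out)
{
	assert(buf && out);
	if (len < LIST_HEADER_SIZE)
		return -1;
	out->magic      = read_be32(buf);
	out->version    = read_be32(buf + 4);
	out->nr_commits = read_be32(buf + 8);
	out->nr_trees   = read_be32(buf + 12);
	out->nr_blobs   = read_be32(buf + 16);
	return 0;
}

/* File size implied by the counts: header + C*20 + (T+B)*(20+4) bytes. */
uint64_t expected_list_size(const struct list_header *h)
{
	assert(h);
	return LIST_HEADER_SIZE +
	       (uint64_t)h->nr_commits * SHA1_RAW_SIZE +
	       ((uint64_t)h->nr_trees + h->nr_blobs) *
	       (SHA1_RAW_SIZE + PATH_HASH_SIZE);
}
```

Comparing expected_list_size() with the actual file length would be one cheap way to reject a truncated list file before trusting its counts.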
[...]

Vitor Antunes | 29 Jan 2011 03:41

Re: [PATCH] git-p4: Corrected typo.

Hi Thomas,

First of all I'd like to thank you for your feedback. It's my first try
at creating and submitting a patch, so having someone's guidance helps a
lot :)

I'll rebase my patches against the head of the tree and squash the fix
to avoid multiple commits. While I do that I'll also review my commit
message and sign-off the patches according to what you said. Hopefully
I will be able to do this during this weekend.

From git-diff-tree man page:

"""
-M[<n>]
    Detect renames. If n is specified, it is a threshold on the
    similarity index (i.e. amount of addition/deletions compared to the
    file’s size). For example, -M90% means git should consider a
    delete/add pair to be a rename if more than 90% of the file hasn’t
    changed.
"""

But from my latest tests I think that this option is ignored in
diff-tree (I think it's only used in git log). With this in mind I'll
need to add some code to implement the check of the score value of
diff-tree output string. Again from its man page:

"""
Status letters C and R are always followed by a score (denoting the
percentage of similarity between the source and target of the move or
[...]
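
The score check mentioned above could look roughly like the sketch below. It assumes the status field has already been cut out of a diff-tree output line as a string such as "R090"; both function names are hypothetical, and treating "score >= threshold" as the acceptance rule is an assumption rather than something the man page excerpt states.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Split a status field such as "R090" or "C75" into letter and score.
 * Returns 0 on success, -1 if the letter is not R or C, or if it is
 * not followed by a decimal score in the range 0..100.
 */
int parse_status_score(const char *field, char *letter, int *score)
{
	int value = 0;
	size_t i;

	assert(field && letter && score);
	if (field[0] != 'R' && field[0] != 'C')
		return -1;
	if (!isdigit((unsigned char)field[1]))
		return -1;
	for (i = 1; isdigit((unsigned char)field[i]); i++) {
		value = value * 10 + (field[i] - '0');
		if (value > 100)
			return -1;
	}
	if (field[i] != '\0')
		return -1;
	*letter = field[0];
	*score = value;
	return 0;
}

/* Hypothetical filter: accept a rename/copy only if score >= min_percent. */
int score_meets_threshold(const char *field, int min_percent)
{
	char letter;
	int score;

	if (parse_status_score(field, &letter, &score) < 0)
		return 0;
	return score >= min_percent;
}
```

With a threshold of 90, score_meets_threshold("R090", 90) accepts the pair and score_meets_threshold("R089", 90) rejects it; fields without a score, such as "M", are rejected as well.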

Nguyen Thai Ngoc Duy | 29 Jan 2011 04:13

Re: [PATCH 09/21] tree_entry_interesting(): support depth limit

2011/1/29 Junio C Hamano <gitster <at> pobox.com>:
> Nguyễn Thái Ngọc Duy  <pclouds <at> gmail.com> writes:
>
>>  static const char *get_mode(const char *str, unsigned int *modep)
>>  @@ -557,8 +558,13 @@ int tree_entry_interesting(const struct name_entry *entry,
>>       int pathlen, baselen = base->len;
>>       int never_interesting = -1;
>>
>> -     if (!ps || !ps->nr)
>> -             return 1;
>> +     if (!ps->nr) {
>> +             if (!ps->recursive || ps->max_depth == -1)
>> +                     return 1;
>> +             return !!within_depth(base->buf, baselen,
>> +                                   !!S_ISDIR(entry->mode),
>> +                                   ps->max_depth);
>> +     }
>
> Back in 1d848f6 (tree_entry_interesting(): allow it to say "everything is
> interesting", 2007-03-21), a new return value "2" was introduced to allow
> this function to tell the caller that all the remaining entries in the
> tree object the caller is feeding the entries to this function _will_
> match.  This was to optimize away expensive pathspec matching done by this
> function.
>
> In that version, "no pathspec" case wasn't changed to return 2 but still
> returned 1 ("I tell you that this entry matches; call me with the next
> entry").  We could have changed it to return 2, but the overhead was only
> a call to a function that checks the number of pathspecs and was not so
> bad.
[...]
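
For readers following the quoted hunk, here is a minimal sketch of what a depth check with the same argument order as the within_depth() call might do. The function name depth_within_limit() is hypothetical and the body is an assumption based only on the call site (base path, its length, a starting depth of 1 for directories, and a maximum depth); the helper in the patch series may differ.

```c
#include <assert.h>

/*
 * Count the '/' separators in the first namelen bytes of name, starting
 * from the given depth. Returns 0 as soon as the count exceeds
 * max_depth, and 1 if the whole prefix stays within the limit.
 */
int depth_within_limit(const char *name, int namelen, int depth, int max_depth)
{
	const char *cp, *end;

	assert(name && namelen >= 0);
	end = name + namelen;
	for (cp = name; cp < end; cp++) {
		if (*cp != '/')
			continue;
		if (++depth > max_depth)
			return 0;
	}
	return 1;
}
```

Under these assumptions a base of "a/b/" with a starting depth of 1 and a limit of 2 is rejected, while the same base with a starting depth of 0 is accepted.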

Nicolas Pitre | 29 Jan 2011 05:08

Re: [RFC] Add --create-cache to repack

On Fri, 28 Jan 2011, Shawn Pearce wrote:

> On Fri, Jan 28, 2011 at 13:09, Nicolas Pitre <nico <at> fluxnic.net> wrote:
> > On Fri, 28 Jan 2011, Shawn Pearce wrote:
> >
> >> On Fri, Jan 28, 2011 at 10:46, Nicolas Pitre <nico <at> fluxnic.net> wrote:
> >> > On Fri, 28 Jan 2011, Shawn Pearce wrote:
> >> >
> >> >> This started because I was looking for a way to speed up clones coming
> >> >> from a JGit server.  Cloning the linux-2.6 repository is painful,
> 
> Well, scratch the idea in this thread.  I think.
> 
> I retested JGit vs. CGit on an identical linux-2.6 repository.  The
> repository was fully packed, but had two pack files.  362M and 57M,
> and was created by packing a 1 month old master, marking it .keep, and
> then repacking -a -d to get most recent last month into another pack.
> This results in some files that should be delta compressed together
> being stored whole in the two packs (obviously).
> 
> The two implementations take the same amount of time to generate the
> clone.  3m28s / 3m22s for JGit, 3m23s for C Git.  The JGit created
> pack is actually smaller 376.30 MiB vs. C Git's 380.59 MiB.  I point
> out this data because improvements made to JGit may show similar
> improvements to CGit given how close they are in running time.

What are those improvements?

Now, the fact that JGit is so close to CGit must be because the actual 
cost is outside of them such as within zlib, otherwise the C code should 
[...]

Shawn Pearce | 29 Jan 2011 05:35

Re: [RFC] Add --create-cache to repack

On Fri, Jan 28, 2011 at 20:08, Nicolas Pitre <nico <at> fluxnic.net> wrote:
>> pack is actually smaller 376.30 MiB vs. C Git's 380.59 MiB.  I point
>> out this data because improvements made to JGit may show similar
>> improvements to CGit given how close they are in running time.
>
> What are those improvements?

None right now.  JGit is similar to CGit algorithm-wise.  (Actually it
looks like JGit has a faster diff implementation, but that's a
different email.)

If you are asking about why JGit created a slightly smaller pack
file... it splits the delta window during threaded delta search
differently than CGit does, and we align our blocks slightly
differently when comparing two objects to generate a delta sequence
for them.  These two variations mean JGit produces different deltas
than CGit does.  Sometimes we are smaller, sometimes we are larger.
But it's a small difference, on the order of 1-4 MiB for something like
linux-2.6.  I don't think it's worthwhile trying to analyze the
specific differences in implementations and retrofit those differences
into the other one.

What I was trying to say was, _if_ we made a change to JGit and it
dropped the running time, that same change in CGit should have _at
least_ the same running time improvement, if not better.  I was
pointing out that this cached-pack change dropped the running time by
1 minute, so CGit should also see a similar improvement (if not
better).  I would prefer to test against CGit for this sort of thing,
but it's been too long since I last poked pack-objects.c and the
revision code in CGit, while the JGit equivalents are really fresh in
[...]

Junio C Hamano | 29 Jan 2011 06:47

Re: Can't find the revelant commit with git-log

René Scharfe <rene.scharfe <at> lsrfire.ath.cx> writes:

> Perhaps we should check my underlying assumption first: is it reasonable
> to expect a git log command to show the same commits with and without a
> path spec that covers all changed files?

The simplest case would be "git log ." vs "git log" from the root level of
the repository, right?  Traditionally, the former is "please show _one_
simplest history that can explain how the current commit came to be"
(i.e. with merge simplification), while the latter is "please list
everything that is behind the current commit" (i.e. without), I think.

It feels unintuitive, but my understanding of the rationale behind the
design is that, the expectation Linus had when he first did the pathspec
limited traversal was that most of the time "git log $path" is used to get
an explanation.  It follows that having to say "git log --simplify $path"
would have been a nuisance, so "with pathspec, we simplify" was thought to
be a reasonable default.

Dmitry S. Kravtsov | 29 Jan 2011 11:01

Features from GitSurvey 2010

Hello,

I want to dedicate my coursework at University to implementing
some useful git feature. So I'm interested in some kind of list of
the development status of these features:
https://git.wiki.kernel.org/index.php/GitSurvey2010#17._Which_of_the_following_features_would_you_like_to_see_implemented_in_git.3F

Or I'll be glad to know what features are now 'free' and what are
currently in active development.

Best Regards
-- 
Dmitry S. Kravtsov
