Re: darcs patch: switch Darcs.Patch.FileName to be ByteString.Char8 int...
Neil Mitchell <ndmitchell <at> gmail.com>
2011-11-06 13:23:43 GMT
Sorry for the ridiculous delay in replying, but perhaps the information is still useful. I think a filepath-bytestring package could be very useful for path-heavy apps such as darcs. However, before doing that I would suggest you profile inside the filepath library. It was written for correctness, not speed, and there are plenty of places where I traverse a string many more times than necessary. There are plenty of tests, so any performance improvements should be easy to check.
On Friday, September 25, 2009, Jason Dagit <dagit <at> codersbase.com> wrote:
> On Thu, Sep 24, 2009 at 1:13 AM, Reinier Lamers <reinier.lamers <at> gmail.com> wrote:
>> Hi Jason,
>> 2009/9/24 Jason Dagit <dagit <at> codersbase.com>:
>> > Doing the above test I have discovered that Darcs.Patch.FileName is a
>> > very costly module. It is costly mainly in terms of space usage. The
>> > space usage forces the garbage collector to run far too frequently and
>> > this burns up CPU time, allocates a ton of virtual memory, and wastes
>> > significant amounts of RAM. On my test machine, the virtual memory
>> > usage is just over 1GB when profiling, and uses 400-500 megs of
>> > physical RAM.
>> Great that you do this research! If we keep up the current pace of
>> performance hacking, darcs will complete before you even hit the enter
>> key in a few years
> Heh. Thanks. I haven't had any real luck improving things yet though. In fact, at Ganesh's request I think I'm giving up on optimizing the darcs.net source any further. I've moved on to working with darcs-hs. I just sent Petr a patch for hashed-storage that makes zipTrees significantly faster on my test case and now darcs-hs can run my test in about 23 seconds (regular darcs is about 29 seconds, so yay!). zipTrees could probably be further improved but at this point it's no longer on the radar as a slow point in the code so I'm moving on to other functions.
> System.FilePath is one of the big slowdowns now. I wonder if we need a System.FilePath.ByteString version? I don't know if it would help. The real problem is that we do a lot of path munging that we should perhaps not be doing.
> Hashed.Storage.AnchoredPath.floatPath looks like this:
> -- | Take a relative FilePath and turn it into an AnchoredPath. The operation
> -- is unsafe and if you break it, you keep both pieces. More useful for
> -- exploratory purposes (ghci) than for serious programming.
> floatPath :: FilePath -> AnchoredPath
> floatPath = AnchoredPath . map (Name . BS.pack) . splitDirectories
>           . normalise . dropTrailingPathSeparator
> The expensive parts are as follows (from most expensive to least):
> 1. normalise
> 2. BS.pack
> 3. splitDirectories
> splitDirectories and normalise both come from System.FilePath. Neil, do you think a ByteString filepath would help?
> The other thing I don't understand here is the haddock for this function. What does it split? I don't understand what pieces it makes; it certainly seems that it just returns an AnchoredPath. If it's a joke then I don't understand it (I also don't think a joke belongs in a haddock because it's confusing). The other odd thing is that it's used quite a lot in Darcs.IO. If it's unsafe and meant for interactive use why do we rely on it so much? Is that a bug waiting to happen? Petr, comments please?
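For reference, here is a rough sketch of what a ByteString-based variant of the floatPath pipeline quoted above might look like. It is only an illustration, not code from hashed-storage or filepath: the name floatPathBS is hypothetical, Name and AnchoredPath are local stand-ins for the real types, and it assumes the path is already a strict ByteString, is relative, and uses '/' as the only separator. It does the splitting and the clean-up (empty and "." segments) in one pass over the split result instead of running normalise, splitDirectories and BS.pack separately.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical stand-ins for the hashed-storage types, so the sketch
-- compiles on its own.
newtype Name = Name BC.ByteString deriving (Eq, Show)
newtype AnchoredPath = AnchoredPath [Name] deriving (Eq, Show)

-- | Split a relative, '/'-separated path into its components.
-- Empty segments (from "//" or a trailing "/") and "." segments are
-- dropped, which roughly covers what normalise and
-- dropTrailingPathSeparator do for such paths.
-- Assumption: no drive letters, no leading '/', no Windows separators.
floatPathBS :: BC.ByteString -> AnchoredPath
floatPathBS = AnchoredPath . map Name . filter keep . BC.split '/'
  where
    dot = BC.pack "."
    keep segment = not (BC.null segment) && segment /= dot
```

This is not a drop-in replacement: it differs from the System.FilePath version on edge cases such as a bare "." or an absolute path, and it only helps if the caller already holds a ByteString, since converting from String would bring the BS.pack cost back. Whether it is actually faster would need to be checked with the profiler.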
darcs-users mailing list
darcs-users <at> darcs.net