Re: [GSoC] HAMMER compression and new unionfs
Naohiro Aota <naota <at> elisp.net>
2011-04-01 09:46:05 GMT
Michael Neumann <mneumann <at> ntecs.de> writes:
> On Tuesday, 29.03.2011, at 00:48 +0900, Naohiro Aota wrote:
>> I'm Naohiro Aota, an undergraduate student at Osaka University, Japan.
>> Last year I participated in GSoC with Gentoo and worked on porting the
>> Gentoo system to DragonFly. Since then I have been very interested in the
>> DragonFly kernel, so I'd like to take part in GSoC with some DragonFly
>> kernel work this year. I've read the project page and got interested in
>> these two ideas: HAMMER compression and new unionfs. (yes, I like
>> filesystems ;))
>> I have some questions about the ideas.
>> About HAMMER compression:
>> - "compression could be turned on a per-file": could this also support
>> having all files under "/foo" compressed?
> Individual blocks of data will be compressed, so that it could happen
> that a file contains uncompressed and compressed data blocks. You only
> have to record a flag whether a given block is compressed (or not) and
> uncompress/compress it transparently before passing it to/from the
> buffer cache. The decision whether to compress a block when writing a
> file can be made in several ways: either a filesystem-wide flag (all files
> created within this filesystem will be compressed by default), a recursively
> inherited per-directory flag (a new file that gets created inside this
> directory will be compressed), or what is also feasible is that the
> compression is done by the reblocker, i.e. as a background process, so
> that you will never directly write compressed data "online" (this could
> be a starting point).
So if I have a fully uncompressed file like this ("|" indicates a block
boundary):
file Foo: |ABC|DEF|GH|
then, when it gets partly compressed, it becomes:
file Foo: |<compressed 1>|EFG|H|
and finally, when all blocks are compressed, it becomes:
file Foo: |<compressed 1>|<compressed 2>|
Is this right?
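To make sure I understand the per-block flag, here is a minimal userland sketch in Python. It is only a model of the idea, not HAMMER code: the `Block` record, its `compressed` field, and the use of zlib are my own assumptions, and unlike my diagram above it keeps block boundaries fixed instead of merging blocks.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    data: bytes        # bytes exactly as they would be stored
    compressed: bool   # hypothetical per-block "is compressed" flag


def store_block(raw: bytes, compress: bool) -> Block:
    """Store one block, compressing only if asked and only if it saves space."""
    if compress:
        packed = zlib.compress(raw)
        if len(packed) < len(raw):
            return Block(packed, True)
    return Block(raw, False)


def load_block(block: Block) -> bytes:
    """Return the uncompressed contents, whatever the stored form is."""
    return zlib.decompress(block.data) if block.compressed else block.data


def read_file(blocks: List[Block]) -> bytes:
    """Read a file whose blocks may be a mix of compressed and plain."""
    return b"".join(load_block(b) for b in blocks)


if __name__ == "__main__":
    raw = [b"A" * 4096, b"B" * 4096, b"C" * 100]
    # Only the first block has been compressed so far (the "partly" state).
    blocks = [store_block(raw[0], True),
              store_block(raw[1], False),
              store_block(raw[2], False)]
    assert read_file(blocks) == b"".join(raw)
```

The reader never needs to know which blocks were compressed; it only looks at the flag on each block.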
The implementation process would be:
- Implement a userland tool to set compression on a file (hammer set-compress <file> ?)
- Implement system calls or similar to be used by the tool
- Implement a userland tool to search for and compress blocks (hammer compress ?)
- Implement the ioctl or similar
- Check that blocks are really compressed
- Improve: implement a per-directory flag
- Improve: implement a per-filesystem flag
- (document the feature and the implementation)
Is there anything else to do?
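For the three kinds of flags, I imagine the decision at file creation time roughly like the Python sketch below. The function name, the dictionaries, and the precedence (explicit per-file setting, then the nearest ancestor directory with a flag, then the filesystem default) are all my assumptions, not something the project page defines.

```python
import posixpath
from typing import Dict


def flag_for_new_file(path: str,
                      file_flags: Dict[str, bool],
                      dir_flags: Dict[str, bool],
                      fs_default: bool) -> bool:
    """Decide whether a newly created file should be compressed.

    Assumed precedence: per-file flag, then the closest ancestor
    directory that has a flag, then the filesystem-wide default.
    """
    if path in file_flags:
        return file_flags[path]

    directory = posixpath.dirname(path)
    while True:
        if directory in dir_flags:
            return dir_flags[directory]
        if directory in ("/", ""):
            break  # reached the top without finding a directory flag
        directory = posixpath.dirname(directory)

    return fs_default


if __name__ == "__main__":
    dir_flags = {"/foo": True, "/foo/raw": False}
    print(flag_for_new_file("/foo/a.txt", {}, dir_flags, False))
    print(flag_for_new_file("/foo/raw/b.bin", {}, dir_flags, False))
```

With this, setting the flag on "/foo" would cover every file created below it, unless a closer directory or the file itself says otherwise.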
> As we keep historical data for a longer period of time (this is how
> HAMMER works and we like it), compression could increase the amount of
> historical data that we can store. As most of the historical data is
> only very infrequently accessed (they mainly serve as backup), the
> decompression need not be hyper-performant (IMHO), but of course an
> acceptable performance is desirable (due to slow disk reads, compression
> could even lead to faster access).
>> - Do file size measurement commands, such as "df", "du" and "ls", also
>> need to change? (the actual disk space used and the file size may differ
>> if compressed)
> I think it will be enough to display the uncompressed file size, not the
> compressed one, so no changes should be required. Note that we also have
> deduplication, so "du" and "ls" will IMHO not show the actual disk space
> used anyway.
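Just to illustrate the two numbers being discussed, here is a small Python model. It is a sketch under my own assumptions (zlib as the compressor, a hypothetical `sizes` helper), not how HAMMER would account for space: the logical size is what tools would keep reporting, and the stored size is what the blocks would actually occupy.

```python
import zlib
from typing import List, Tuple


def sizes(raw_blocks: List[bytes], compress_flags: List[bool]) -> Tuple[int, int]:
    """Return (logical_size, stored_size) for a file made of the given blocks."""
    # Logical size: the uncompressed length, i.e. what "ls" would show.
    logical = sum(len(b) for b in raw_blocks)
    # Stored size: compressed length for flagged blocks, raw length otherwise.
    stored = sum(len(zlib.compress(b)) if c else len(b)
                 for b, c in zip(raw_blocks, compress_flags))
    return logical, stored


if __name__ == "__main__":
    blocks = [b"A" * 4096, b"B" * 4096]
    logical, stored = sizes(blocks, [True, False])
    print(logical, stored)
```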