Feature requests related to sparse files
<markk <at> clara.co.uk>
2011-03-15 12:34:22 GMT
Here are a couple of suggestions relating to star's support for sparse files.
1. Allow -force-hole to apply to archive creation, not just extraction.
The -force-hole option tells star to extract all files from an archive
sparsely. That is, for files in the archive which contain regions of
all-zero bytes, star creates corresponding holes in the output files.
The advantage is that the resulting sparse files occupy less disk space
and are faster to work with. (Reading holes is very fast since no actual
disk I/O is needed.) However, -force-hole only applies on archive
extraction, not creation.
Suppose you have some non-sparse files which have large all-zero regions.
You want to archive them efficiently. What you'd like is for star to scan
the files for all-zero regions and archive them as sparse files. The tar
archive will thus be much smaller, and future extraction
much faster. The archive will likely be more compressible and compression
will be faster.
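To make the proposed scan concrete, here is an illustrative Python sketch (not star's code; the function name data_regions and the 512-byte granularity are assumptions). It reads a file block by block and records the offset/length of every run of blocks containing nonzero bytes; the all-zero runs it skips are what could be stored as holes, roughly the offset/length map that sparse tar formats record.

```python
import os

BLOCK = 512  # assumed scan granularity: one tar record


def data_regions(path, block=BLOCK):
    """Hypothetical helper: return ([(offset, length), ...], size).

    Each pair covers a run of blocks containing at least one nonzero byte.
    Blocks that are entirely zero are omitted, so they could be archived
    as holes. Adjacent nonzero blocks are merged into one region.
    """
    regions = []
    offset = 0
    start = None  # start of the current data region, None inside a zero run
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):  # all-zero block
                if start is not None:
                    regions.append((start, offset - start))
                    start = None
            elif start is None:
                start = offset
            offset += len(chunk)
    if start is not None:
        regions.append((start, offset - start))
    return regions, offset
```

For a file holding 512 bytes of data, 1024 zero bytes and then 100 more bytes of data, this returns `([(0, 512), (1536, 100)], 1636)`.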
The -sparse option has no effect, since it applies only to files which are
already sparse. As a workaround, you could create sparse copies like this:
$ mv bigfile bigfile.old
$ cp --archive --sparse=always bigfile.old bigfile && rm bigfile.old
Then archive those using -sparse. But that wastes time and (temporarily)
disk space. Allowing -force-hole to apply to archive creation would solve
this.
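For reference, "already sparse" is usually decided with a heuristic along these lines: a file whose allocated space is smaller than its apparent size probably has holes. The Python sketch below (hypothetical name looks_sparse) assumes st_blocks counts 512-byte units, which holds on Linux and most Unix systems, though POSIX leaves the unit unspecified. A dense file full of zeros fails this test, which is why the workaround above is needed.

```python
import os


def looks_sparse(path):
    """Hypothetical helper: True if the file appears to contain holes.

    Compares allocated space (st_blocks, assumed to be in 512-byte units)
    with the apparent size. Filesystems with transparent compression can
    give false positives.
    """
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size
```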
2. Add an option to have star create the archive file sparsely.
There may be cases where you don't want to archive files sparsely, perhaps
for interchange with tar implementations on other systems. However, it
would still be possible to reduce the disk space used by the archive. When
creating an archive, where the output is a normal file (not a fifo, tape
or stdout), star could write the archive sparsely. For each chunk of data
to be written to the archive file, star would check whether it is all-zero;
if so, it would seek forward in the archive instead of writing the data.
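The seek-instead-of-write idea can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not star's implementation: the class name SparseWriter is hypothetical, and the 10240-byte chunk size is assumed to match the traditional tar blocking factor of 20 x 512-byte records. Because a seek past the end of the file does not by itself extend it, the writer sets the final length explicitly when it is closed.

```python
import os


class SparseWriter:
    """Hypothetical writer: seeks over all-zero chunks to leave holes.

    Only valid for a seekable regular file (not a FIFO, tape or stdout).
    Chunks are taken relative to each write() call, so callers should
    write whole records.
    """

    def __init__(self, path, chunk=10240):
        self.f = open(path, "wb")
        self.chunk = chunk
        self.pos = 0  # logical number of bytes written so far

    def write(self, data):
        for i in range(0, len(data), self.chunk):
            piece = data[i:i + self.chunk]
            if piece.count(0) == len(piece):
                # All-zero chunk: advance the offset without writing data.
                self.f.seek(len(piece), os.SEEK_CUR)
            else:
                self.f.write(piece)
            self.pos += len(piece)

    def close(self):
        # A trailing seek leaves the file short; extending it with
        # truncate() makes the missing tail read back as zeros.
        self.f.truncate(self.pos)
        self.f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Reading the file back yields exactly the bytes passed to write(); only the on-disk allocation differs.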
In some cases that could be a big performance win; imagine archiving a
file with large all-zero regions, where the archive file is on the same
disk as the source file. Since no disk I/O is needed to write the all-zero
regions in the .tar archive, disk thrashing would be much reduced.