Hendrik Pagenhardt <Hendrik.Pagenhardt <at> gmx.net>
2004-01-09 13:15:43 GMT
always searching for easy hacks to improve performance of GNUnet, I
looked at the source of "estimateAvailableBlocks" in the MYSQL module.
What I really want is to reduce the number of calls to this function
(I'll try that at the weekend), but for now I have a question about the
current calculation.
First I noticed that the average row length is multiplied by the
number of rows (costing two SQL statements), even though MySQL already
returns the size of the data file in the "Data_length" column of
SHOW TABLE STATUS. So row counting should not be necessary. Secondly,
this calculation omits the space used by the table indexes. As those
currently add another 20% of the table data size to the space consumed
by the content storage, I would suggest either documenting that the
quota applies to the data files only, or, preferably, adjusting the
calculation to include the index size. That could easily be done by
adding the "Data_length" and "Index_length" columns of
SHOW TABLE STATUS, still saving the row count.
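A sketch of what I mean (the table name "gn080" is just a placeholder;
the actual GNUnet bucket table names may differ):

```sql
-- One statement replaces the row count plus avg-row-length query.
SHOW TABLE STATUS LIKE 'gn080';

-- From the single result row, estimate the space consumed as:
--   used_bytes = Data_length + Index_length
-- instead of the current  Avg_row_length * Rows,
-- which ignores the index files entirely.
```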
A way to shrink the indexes, and possibly speed up data access as a
whole, might be to not use a primary key on the bucket tables. It could
be replaced by an index over the first 3 bytes of the hash. Randomness
assumed, this would be sufficient to distinctly identify 256^3 rows,
which is an order of magnitude higher than what could be placed in a
bucket. I tested this on my system with a copied GNUnet table (125MB
data file, 25 MB index) and the index shrank to less than half (9 MB).
Query performance was as fast or even faster this way. And the reduced
index size should help caching.
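Assuming the hash column is a fixed-length binary field (the column and
table names below are illustrative, not taken verbatim from the GNUnet
schema), the change could look like:

```sql
-- Drop the full-hash primary key and index only the first 3 bytes.
-- With random hashes, 256^3 (about 16.7 million) distinct prefixes far
-- exceed the number of rows a single bucket will ever hold, so the
-- shorter key is still selective enough for lookups.
ALTER TABLE gn080 DROP PRIMARY KEY;
ALTER TABLE gn080 ADD INDEX hash_prefix (hash(3));
```

MySQL supports such column prefix key parts on CHAR, VARCHAR, BINARY
and BLOB columns, which is what makes the 3-byte index possible without
adding an extra column.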
The only drawback is that the "writeContent" method must be rewritten