3 Jan 2005 03:25
Re: dividing B-trees
Olly Betts <olly <at> survex.com>
2005-01-03 02:25:06 GMT
2005-01-03 02:25:06 GMT
I've been thinking about this more. I realised a while ago that where we know all possible keys in a block have a common prefix, we don't need to store all of each key. For example, if the relevant dividing keys in the level above are "mutilate" and "muzzle", all keys in the block must start with "mu", so we can avoid storing these 2 characters repeatedly. In fact we don't need to explicitly store that we've removed them, or even explicitly store how many we've not store - this can easily be seen from the dividing keys. With this scheme, when we split a block, we'll quite often get a longer common prefix on one or both subblocks. In that case, we need to scan through the keys and trim off the extra byte or bytes. But this means if we split a block which would end up over 100% full, we can easily end up with two blocks both < 50% full. This probably gives us more latitude for picking a good split point, and indeed the choice of split point may be able to assist the prefix removal. The delete case is also interesting - when we remove an empty block, we remove a dividing key in the level above, and this may result in the adjacent block having a shorter common prefix (not in keys that are currently in it, but in keys which could be). Which could result in it becoming over 100% full, and needing to be split! Sequential mode seems trickier though. If we're inserting at the end of the table (a very common case), we don't have a dividing key to bound above, so unless the preceding diving key starts with '\xff', we must store keys whole. When the block fills, we insert a dividing key, at which point we'll often be able to remove a prefix for all keys in the block at which point it won't be full. With 8K blocks (the default),(Continue reading)
RSS Feed