Improved Header Scan Cache for Jam
Matt Armstrong <matt+jamming.7904ac <at> lickey.com>
2002-01-03 19:01:55 GMT
I just submitted code to //guest/matt_armstrong/jam/hdrscan_cache that
implements a header scan cache for Jam.
This code is an incremental improvement over Craig McPheeters'
original version in //guest/craig_mcpheeters/jam/src/. I've talked
with Craig and he plans to roll most or all of my changes into his
version.
I have even higher hopes -- I'd like it to make it into stock jam.
- A header scan cache can improve things when HDRGRIST is in use.
For example, with stock Jam if you always set HDRGRIST to
$(SOURCE_GRIST), standard headers such as /usr/include/stdio.h
will now get scanned once for each SubDir. With the header scan
cache, common headers will be scanned only once.
This makes it practical to always use HDRGRIST. This means that
the stock Jambase can support multiple header files of the same
name. I think this rectifies a frequently encountered weakness
It is important to point out that you get this benefit
regardless of whether the cache is saved to disk.
- The header scan cache is persistent across runs of Jam only if
the user wants it (controlled via the HCACHEFILE variable). So
by default Jam will not sprinkle cache files all over the source
tree, and it is possible to use LOCATE to put the persistent
copy of the cache in, e.g., a build output directory.
Storing the header cache on disk can bring real benefits. On
the medium sized project I use jam for, it seems to speed jam
startup (time to first build action) by a factor of 6. People
are happy to wait 15 seconds instead of 90.
It is important to point out that about half of this speedup
occurs even if the cache is not persistent, since our project
makes heavy use of HDRGRIST to correctly find all the header
files in the project.
- The cache is implemented in such a way that it can never change
the semantics of what Jam does. The call to a target's HDRRULE
will be identical with or without the cache code.
Here is the text of the README.header_scan_cache that is part of the
This change implements a header scan cache in a form that
(cross fingers) can be incorporated into the stock version of
Jam.
This code is taken from //guest/craig_mcpheeters/jam/src/ on
the Perforce public depot. Many thanks to Craig McPheeters
for making his code available. It is delimited by the
OPT_HEADER_CACHE_EXT #define within the code.
Jam has a facility to scan source files for other files they
might include. This code implements a cache of these scans,
so the entire source tree need not be scanned each time jam is
run. This brings the following benefits:
- If a file would otherwise be scanned multiple times in a
single jam run (because the same file is represented by
multiple targets, perhaps each with a different grist),
it will now be scanned only once. In this way, things
are faster even if the cache file is not present when
Jam is run.
- If a cache entry is present in the cache file when Jam
starts, and the file has not changed since the last time
it was scanned, Jam will not bother to re-scan it. This
markedly reduces Jam startup time for large projects.
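The first benefit above (one scan per file, no matter how many
gristed targets refer to it) can be sketched in C roughly as follows.
This is only an illustration, not the code in the depot: the names
scan_once, scan_file and scanned_files are hypothetical, and a real
implementation would use a hash table rather than a linear search.
The point is that the lookup key is the bound path of the file, not
the gristed target name.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES 64

/* Hypothetical in-memory cache: one slot per file on disk, keyed by its
 * bound path (the grist is deliberately not part of the key). */
static const char *scanned_files[MAX_FILES];
static int n_scanned_files = 0;
static int real_scans = 0;      /* counts the expensive scans actually done */

/* Stand-in for the expensive regexp scan of one file. */
static void scan_file(const char *boundname)
{
    (void)boundname;
    real_scans++;
}

/* Called once per target. Scans the file only if no earlier target was
 * bound to the same path. Returns 1 on a cache hit, 0 on a miss.
 * The caller must keep boundname alive; only the pointer is stored. */
static int scan_once(const char *target, const char *boundname)
{
    int i;

    (void)target;               /* gristed name; irrelevant to the lookup */
    for (i = 0; i < n_scanned_files; i++)
        if (strcmp(scanned_files[i], boundname) == 0)
            return 1;

    assert(n_scanned_files < MAX_FILES);
    scanned_files[n_scanned_files++] = boundname;
    scan_file(boundname);
    return 0;
}
```

With this sketch, two targets that differ only in grist but are bound
to /usr/include/stdio.h trigger a single call to scan_file; the second
call to scan_once returns 1 without scanning.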
This code has improvements over Craig McPheeters' original
version. I've described all of these changes to Craig and he
intends to incorporate them back into his version. The
improvements are:
- The actual name of the cache file is controlled by the
HCACHEFILE Jam variable. If HCACHEFILE is left unset
(the default), reading and writing of a cache file is
not performed. The cache is always used internally
regardless of HCACHEFILE, which helps when HDRGRIST
causes the same file to be scanned multiple times.
Setting LOCATE and SEARCH on the HCACHEFILE works as
well, so you can place it anywhere on disk you like or even
search for it in several directories. You may also set
it in your environment to share it amongst all your
- The .jamdeps file is in a new format that allows binary
data to be in any of the fields, in particular the file
names. The original code would break if a file name
contained the '@' or '\n' characters. The format is
also versioned, allowing upgrades to automatically
ignore old .jamdeps files. The format remains human
readable. In addition, care has been taken to not add
the entry into the header cache until the entire record
has been successfully read from the file.
- The cache stores the value of HDRPATTERN with each cache
entry, and it is compared along with the file's date to
determine if there is a cache hit. If the HDRPATTERN
does not match, it is treated as a cache miss. This
allows HDRPATTERN to change without worrying about stale
cache entries. It also allows the same file to be
scanned multiple times with different HDRPATTERN values.
- Each cache entry is given an "age" which is the maximum
number of times a given header cache entry can go unused
before it is purged from the cache. This helps clean up
old entries in the .jamdeps file when files move around
or are removed from your project.
You control the maximum age with the HCACHEMAXAGE
variable. If set to 0, no cache aging is performed.
Otherwise it is the number of times jam must be run
before an unused cache entry is purged. The default for
HCACHEMAXAGE if left unset is 100.
- Jambase itself is changed.
SubDir now always sets HDRGRIST to $(SOURCE_GRIST) so
header scanning can deal with multiple header files of
the same name in different directories. With the header
cache, this no longer incurs a performance penalty
-- a given file will still only be scanned once.
The FGristSourceFiles rule is now just an alias for
FGristFiles. Header files do not necessarily have
global visibility, and the header cache eliminates any
performance penalty this might otherwise incur.
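The cache-hit test described in the list above (file date plus
HDRPATTERN) might look something like the following C sketch. The
struct and function names are hypothetical, and HDRPATTERN is treated
here as a single string for simplicity; the real code may store and
compare it differently.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical cache record; the field names are illustrative only. */
struct hcache_entry {
    const char *boundname;   /* path of the scanned file */
    time_t      mtime;       /* file's modification time when scanned */
    const char *hdrpattern;  /* HDRPATTERN value used for that scan */
};

/* Returns 1 if the entry can be reused for this lookup, 0 if the file
 * must be rescanned. A changed date or a different HDRPATTERN is a miss. */
static int hcache_is_hit(const struct hcache_entry *e,
                         const char *boundname, time_t mtime,
                         const char *hdrpattern)
{
    assert(e && boundname && hdrpattern);

    if (strcmp(e->boundname, boundname) != 0)
        return 0;               /* different file */
    if (e->mtime != mtime)
        return 0;               /* file changed since the last scan */
    if (strcmp(e->hdrpattern, hdrpattern) != 0)
        return 0;               /* scanned with a different pattern */
    return 1;
}
```

Because the pattern is part of the comparison, changing HDRPATTERN in
a Jamfile simply turns every affected lookup into a miss, and the
files are rescanned with the new pattern.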
Because of all these improvements, the following claims can be
made about this header cache implementation that can not be
made about Craig McPheeters' original version.
- The semantics of a Jam run will never be different
because of the header cache (the HDRPATTERN check
- It will never be necessary to delete .jamdeps to fix
obscure jam problems or purge old entries.
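The aging rule controlled by HCACHEMAXAGE, which is what makes manual
purging unnecessary, could be sketched in C along these lines. The
names are hypothetical, and the exact boundary (purge once the unused
count reaches the maximum) is an assumption made for this example, not
something taken from the actual code.

```c
#include <assert.h>

/* Default from the README text above: HCACHEMAXAGE is 100 if unset. */
#define HCACHE_DEFAULT_MAXAGE 100

/* Hypothetical per-entry bookkeeping. */
struct aged_entry {
    int age;    /* consecutive jam runs in which the entry went unused */
    int used;   /* nonzero if this run had a cache hit on the entry */
};

/* End-of-run decision made while writing the cache file: returns 1 to
 * keep the entry, 0 to purge it. A maxage of 0 disables aging. */
static int keep_entry(struct aged_entry *e, int maxage)
{
    assert(e && maxage >= 0);

    if (e->used) {
        e->age = 0;             /* any use resets the counter */
        return 1;
    }
    if (maxage == 0)
        return 1;               /* aging disabled: never purge */

    e->age++;
    return e->age < maxage;     /* purge after maxage unused runs */
}
```

With a maxage of 3, an entry that is never hit survives two runs and
is dropped when the cache is written at the end of the third; a single
hit at any point resets its age to 0.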