17 May 19:52
UIMA internals memory footprint
Hi all,

I have begun seeing heavy memory use when processing largish documents through a UIMA pipeline, and I want to make sure that what I'm seeing of UIMA's internal memory use is in line with expectations. For either a 1,500,000-byte or a 15,000,000-byte document with the same annotations (100,000 10-character annotations), we incur roughly 13 MB of "overhead" for internal UIMA data structures. Is this expected?

Details: To narrow down the issue, I made a very simple test annotator that mimics what my annotators do. The annotator creates a document of N bytes and sets it in a view in the CAS, then transforms the bytes to an HTML string, which is set in a second view in the CAS. Next, for each view, the annotator creates 50,000 annotations. Each annotation has two 5-character attributes.

I profiled my application with two profilers (JProbe and YourKit), taking heap snapshots before and after processing, and saw similar results.

I know there's a lot going on under the hood, so I'm trying to get an idea of what kind of size factor I can expect for a given document size. Right now, according to my calculations and verified by the profiler, the expected memory usage for just my data (i.e. the two views of the document and the strings making up the annotations) is:
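For anyone wanting to reproduce this kind of back-of-the-envelope figure, here is a minimal sketch of a payload estimate. It is not the poster's own calculation. It assumes string data costs 2 bytes per char (UTF-16, no compact strings), ignores object headers and any UIMA bookkeeping, and assumes the HTML view has the same number of chars as the raw document has bytes, which understates it if markup is added. The class and method names are hypothetical.

```java
public class PayloadEstimate {
    // Assumption: each char of string data costs 2 bytes (UTF-16).
    static final long BYTES_PER_CHAR = 2;

    // Bytes of character data for all annotation attribute strings.
    public static long annotationStringBytes(long views, long annotationsPerView,
                                             long attrsPerAnnotation, long charsPerAttr) {
        long totalChars = views * annotationsPerView * attrsPerAnnotation * charsPerAttr;
        return totalChars * BYTES_PER_CHAR;
    }

    // Raw document bytes + HTML view chars + annotation strings.
    // Object headers and UIMA-internal structures are deliberately excluded.
    public static long payloadBytes(long rawDocBytes, long htmlChars, long views,
                                    long annotationsPerView, long attrsPerAnnotation,
                                    long charsPerAttr) {
        return rawDocBytes
                + htmlChars * BYTES_PER_CHAR
                + annotationStringBytes(views, annotationsPerView,
                                        attrsPerAnnotation, charsPerAttr);
    }

    public static void main(String[] args) {
        // The two document sizes from the post; htmlChars == rawDocBytes is an assumption.
        long[] docSizes = {1_500_000L, 15_000_000L};
        for (long n : docSizes) {
            long payload = payloadBytes(n, n, 2, 50_000, 2, 5);
            System.out.println(n + " byte document: estimated payload " + payload + " bytes");
        }
    }
}
```

Subtracting such an estimate from the heap growth reported by the profiler gives a rough figure for what the framework itself retains.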
--Thilo