4 Jun 03:10
Some offset issues with Open Calais
I fiddled around a bit more with this, trying various things that actually connected to the service. I finally figured out that if you send the string "xxx & yyy" to the service, it actually processes the string <Document><Title>1212537108630-85FDAB4B-292518</Title><Date>2008-06-03</Date><Body>xxx &amp; yyy</Body></Document> or something like that, and that the returned offsets are relative to this string.

Correcting the returned offsets so that they correspond to what you sent looks like it has 2 parts. The first part, the prefix "<Document ... <Body>", is pretty easily accounted for. The second part, undoing the expansion of & to &amp;, requires more work. Other characters are also converted, some strangely. I've seen the usual: < converted to &lt;, > converted to &gt;. The character " seemed to be converted to &quot;.

All this is apparently a "bug": their forum includes a post saying the problem with the "&" will be fixed in the next release. I've posted a reply to their forum asking about other characters besides the "&".

One final note: their API says that for the POST method, content sent using that method needs to be escaped. I think that means the kind of [...]
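The two-part correction described above could be sketched roughly as follows. This is only an illustration, not anything from the Open Calais API: the names `OBSERVED_ESCAPES`, `build_offset_map` and `correct_offset` are made up here, the escape table covers only the four conversions seen so far, and the prefix length is assumed to be known to the caller (the generated title may well vary in length).

```python
# Conversions observed so far; the service may convert other characters too.
OBSERVED_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def build_offset_map(sent_text, escapes=OBSERVED_ESCAPES):
    """Map each position in the escaped text back to an index in sent_text.

    Entry i is the index in sent_text of the character that produced
    position i of the escaped text. One extra entry maps the end position.
    """
    mapping = []
    for index, char in enumerate(sent_text):
        expanded = escapes.get(char, char)
        # Every position inside an expansion points back at the one
        # original character that was expanded.
        mapping.extend([index] * len(expanded))
    mapping.append(len(sent_text))
    return mapping


def correct_offset(returned_offset, sent_text, prefix_length):
    """Translate an offset in the wrapped, escaped document to sent_text.

    prefix_length is the assumed length of everything up to and
    including the opening <Body> tag.
    """
    mapping = build_offset_map(sent_text)
    body_offset = returned_offset - prefix_length  # part 1: drop the prefix
    if not 0 <= body_offset < len(mapping):
        raise ValueError("offset falls outside the body text")
    return mapping[body_offset]  # part 2: undo the expansions


if __name__ == "__main__":
    prefix = ("<Document><Title>1212537108630-85FDAB4B-292518</Title>"
              "<Date>2008-06-03</Date><Body>")
    sent = "xxx & yyy"
    # "yyy" starts at position 10 of the escaped body "xxx &amp; yyy".
    print(correct_offset(len(prefix) + 10, sent, len(prefix)))
```

An offset that lands in the middle of an expansion such as &amp; maps back to the single original character, which seems like the least surprising choice. If more conversions turn up, they would need to be added to the table.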
--Thilo
Eddie Epstein wrote:
> The CAS reference passed to the annotator process method changes when
> Sofa capabilities are declared. See
>