Re: Is it possible to query by 2 index ranges at the same time
I read the thread from a few days ago regarding OIDs. While I agree with your position, I think it's
beneficial to be able to get a list of OIDs to be able to perform these kinds of operations. I think the
premise from the other thread was that of "relying" on OIDs for relational purposes.
I also like the idea of where the query system may be going. But, independently of that, being able to specify
something like :oid t in the get-instances-by-range function and then be able to "walk" through a set of
oids, specifying the base class and the "walk" mechanism will instantiate those objects. Think of it as a
primitive to the query system.
On Fri, June 1, 2007 2:08 am, Ian Eslick <eslick@...> said:
> That's very insightful and quite correct! I was going to use this
> mechanism as part of the query system I want to get working for 1.0.
> This is pretty much the mechanism that relational databases use to do
> join queries and Elephant will be no different.
> As a matter of discipline, I didn't want to encourage users to mess
> around with oids because of the issues outlined in an e-mail thread a
> few days ago. There will be an internal elephant api to access this
> functionality for advanced users. That function will essentially be
> a copy of map-inverted-index, but using cursor-get to get only the
> oid instead of cursor-pget to get the oid and primary value. I'm
> happy to be argued out of this position, but my experience is that
> people will start to depend on this interface and then complain when
> features are added later that leads to bugs in user code.
> The query system will perform these kinds of optimizations for you.
> The interface will be something like:
> (lambda (obj)
> (render obj screen))
> (select-objects (geometry)
> (between geometry.x 10 20)
> (between geometry.y -5 15)))
> For a class named geometry with indexed slots x and y. If x and y
> are indexed, then the query will be very efficient. If not, then it
> will be a linear walk. I aim to add informative statements and
> statistics to select-objects to help people optimize queries. This
> is similar to the select clause in SQL, but will have features
> oriented towards path queries (object references) and graph queries.
> The select-objects statement effectively returns a generator, which
> will probably be a list of oids, either in memory or maintained on
> disk for large query sets.
> I'm open to suggestions on syntax. I'm not happy with my current
> sketches here.
> PS - If you really want to get OIDs as the user, you can using
> cursors over the inverted index to get the ranges yourself, then you
> can do exactly the intersection operation you mentioned before. Of
> course you need the classes of the oids so selected to recover the
> On May 31, 2007, at 11:43 PM, lists@... wrote:
>> Please excuse if I don't make a direct reference to Elephant
>> solving this in my comment below. However, I remembered reading
>> something just like this in AllegroCache's Reference Manual, in
>> which it said, and I quote:
>> "In a database every object has a unique object identifier (oid).
>> This value can be retrieved using db- object- oid. An oid is an
>> integer. There is no way to determine the class of an object given
>> its oid.
>> Usually a program need not concern itself with oids. However in
>> certain circumstances it may be convenient to work with oids. One
>> such case is when combining the results of multiple indexes over
>> the same class. You may want to ask for the set of objects whose X
>> slot is greater than 10 and whose Y slot is greater than 20. It
>> consumes fewer resources to ask for the oids of objects whose X
>> slot is greater than 10 than to ask for the objects themselves. In
>> the later case the objects retrieved have to be instantiated in the
>> object cache and there's no point in doing that if you don't need
>> all those objects. In this case you don't need all those objects
>> since you only need those objects whose Y slot is also greater
>> then 20. Thus the optimal way to do this query is to find the
>> intersection of the oids corresponding to "X > 10" and those oids
>> with "Y > 20" and then from that intersection find the objects
>> corresponding to the oids."
>> They later document the following function:
>> "retrieve- from- index- range (class slot name
>> initial value end value
>> &key (db *allegrocache*) oid)
>> returns all objects of the given class (a persistent class object
>> or a symbol) whose slot slot name has a value in the range
>> beginning with initial value up to but not including end value. If
>> oid is true then the object id values are returned instead of the
>> It's interesting to note the "oid" key argument. I'm not sure if
>> Elephant's get-instances-by-range supports something like that.
>> But, a similar solution to the problem presented is:
>> "We could find the oids of all the employees whose first name
>> begins with “Jo” and whose last name begins with “F”
>> (intersection (retrieve-from-index-range
>> ‘Employee ‘first-name “Jo”
>> “Jp” :oid
>> ‘Employee ‘last-name “F”
>> “G” :oid
>> given a persistent class Employee with indices on slots first-name
>> and last-name.
>> If Elephant supports something similar to the "oid" key argument,
>> it would be very useful for these kind of queries, where you can
>> get an initial result set and only create the instances of those
>> objects if/when needed.
>> Just my $0.02 :)
>> - Daniel
>> On Thu, May 31, 2007 11:07 pm, Ian Eslick <eslick@...> said:
>>> The easiest way to do this is to follow Robert's suggestion, and
>>> declare slots x and y to be indexed (assuming your parameters are
>>> slot values of a persistent objects) and then say:
>>> (nconc (get-instances-by-range 'my-class 'x 10 20)
>>> (get-instances-by-range 'my-class 'y -5 15)))
>>> Of course you might do a little better, performance-wise, if you
>>> filter the y value as you traverse the x range (or vice-versa).
>>> Since get-instances-by-range is just a wrapper around map-
>>> index that collects visited objects, this could be up to twice as
>>> fast, although in practice I'd expect the benefit to be marginal:
>>> (let ((matches nil))
>>> (map-inverted-index (lambda (obj)
>>> (when (and (> y -5) (< y 15))
>>> (push obj matches)))
>>> This version avoids a second map-inverted-index operation (called by
>>> get-instances-by-range) by only collecting objects if their y
>>> coordinate is in the range. You can wrap all this in a function like
>>> (get-objects-in-region x1 y1 x2 y2). Map over the x or y based on
>>> which query is smaller, or which parameter is sparser. If you don't
>>> know, then it doesn't really matter which you choose!
>>> If you get some performance comparisons of these two scenarios,
>>> please post them to the list!
>>> PS - It would be interesting to see an R-tree (or one of the well-
>>> known variants such as R* or Priority R-Trees) in Elephant. Instead
>>> of using indexed classes, you would directly implement the R-tree
>>> nodes as persistent instances and write the construction and
>>> retrieval algorithms as if they were operating on in-memory objects.
>>> Graphs are a very natural structure to implement in Elephant and the
>>> only special provision I can think of might be maintaining a
>>> persistent set of free nodes for the dynamic case.
>>> On May 31, 2007, at 10:17 PM, Robert L. Read wrote:
>>>> I'm assuming x and y are properites of a data object, which has
>>>> some other component z which
>>>> you with to retrieve, and you query is that you want to find all
>>>> the records (x,y,z) such that
>>>> (10 < x < 20) and ( -5 < y < 15).
>>>> There is a spectrum of solutions to this problem. However, in the
>>>> general case Elephant will
>>>> not compute this with the best possible asymptotic complexity,
>>>> although it may be better than
>>>> a relational database at doing so.
>>>> Here is the most naive solution:
>>>> Examine every object in the database, and report those that meet
>>>> that above condition.
>>>> Although this may seem silly, don't knock it till you try it....it
>>>> may be perfectly reasonable.
>>>> A second solution is to make sure that you have specified :index on
>>>> the x component
>>>> (or, without loss of generality, the y component), and then using
>>>> the "get-instance-by-range"
>>>> feature to get all of the instances within the x range, and use a
>>>> simple lisp function to discard
>>>> those for which the y component does not match.
>>>> This will be relatively fast if the x range excludes a lot of
>>>> objects and the y range doesn't.
>>>> It's not obvious to me that one can do better than that without
>>>> some significant coding
>>>> (For example, a "grid file" is a datastructure designed to answer
>>>> such queries efficiently.
>>>> An R-Tree is another geometric structure.)
>>>> A relational database will not do better, in the worst case. A
>>>> relational database will do
>>>> a better job at statistical sampling---that is, a query optimizer
>>>> should, in a huge database,
>>>> be able to decide whether it should use the "x" index or the "y"
>>>> index first based on the
>>>> selectivity of those clauses in the boolean expression. However,
>>>> fundamentally it can't
>>>> do any better than to pick the best index, use that only examine
>>>> some of the objects, and then
>>>> look at the values of the other ones.
>>>> On Fri, 2007-06-01 at 03:48 +0300, Ignas Mikalajunas wrote:
>>>>> Hi, i have an interesting usecase and i would like to ask whether
>>>>> i can solve it by using Elephant. I would like to index objects
>>>>> placed on a discreet 2d grid of an arbitarry size, which would
>>>>> require queries like (10 < x < 20) AND (-5 < y < 15). Is it
>>>>> possible to perform such a query with elephant? I am not sure i
>>>>> have looked in the right place, but i could not find such example
>>>>> in the manual. Ignas Mikalajūnas
>>>>> _______________________________________________ elephant-devel
>>>>> site list elephant-devel@... http://common-lisp.net/
>>>> elephant-devel site list
>>> elephant-devel site list
>> elephant-devel site list