Hurry Query
===========

The hurry query system for the Zope 3 catalog builds on catalog
indexes as defined in Zope 3 core, as well as the indexes in
zc.catalog. It is in part inspired by AdvancedQuery for Zope 2 by
Dieter Maurer, though it has an independent origin.

Setup
-----

Let's define a simple content object. First its interface::

  >>> from zope.interface import Interface, Attribute, implements
  >>> class IContent(Interface):
  ...     f1 = Attribute('f1')
  ...     f2 = Attribute('f2')
  ...     f3 = Attribute('f3')
  ...     f4 = Attribute('f4')
  ...     t1 = Attribute('t1')
  ...     t2 = Attribute('t2')

And its implementation::

  >>> from zope.app.container.contained import Contained
  >>> class Content(Contained):
  ...     implements(IContent)
  ...     def __init__(self, id, f1='', f2='', f3='', f4='', t1='', t2=''):
  ...         self.id = id
  ...         self.f1 = f1
  ...         self.f2 = f2
  ...         self.f3 = f3
  ...         self.f4 = f4
  ...         self.t1 = t1
  ...         self.t2 = t2
  ...     def __cmp__(self, other):
  ...         return cmp(self.id, other.id)

The id attribute is there just so we can easily identify the objects
we find again. By including the __cmp__ method we make sure search
results can be stably sorted.

We use a fake int id utility here so we can test independently of the
full-blown Zope environment::

  >>> from zope import interface
  >>> import zope.app.intid.interfaces
  >>> from zope.app.testing import ztapi
  >>> class DummyIntId(object):
  ...     interface.implements(zope.app.intid.interfaces.IIntIds)
  ...     MARKER = '__dummy_int_id__'
  ...     def __init__(self):
  ...         self.counter = 0
  ...         self.data = {}
  ...     def register(self, obj):
  ...         intid = getattr(obj, self.MARKER, None)
  ...         if intid is None:
  ...             setattr(obj, self.MARKER, self.counter)
  ...             self.data[self.counter] = obj
  ...             intid = self.counter
  ...             self.counter += 1
  ...         return intid
  ...     def getObject(self, intid):
  ...         return self.data[intid]
  ...     def __iter__(self):
  ...         return iter(self.data)
  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(
  ...     zope.app.intid.interfaces.IIntIds, intid)

Now let's register a catalog::

  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')

And set it up with various indexes::

  >>> from zope.app.catalog.field import FieldIndex
  >>> from zope.app.catalog.text import TextIndex
  >>> catalog['f1'] = FieldIndex('f1', IContent)
  >>> catalog['f2'] = FieldIndex('f2', IContent)
  >>> catalog['f3'] = FieldIndex('f3', IContent)
  >>> catalog['f4'] = FieldIndex('f4', IContent)
  >>> catalog['t1'] = TextIndex('t1', IContent)
  >>> catalog['t2'] = TextIndex('t2', IContent)

Now let's create some objects so that they'll be cataloged::

  >>> content = [
  ...     Content(1, 'a', 'b', 'd'),
  ...     Content(2, 'a', 'c'),
  ...     Content(3, 'X', 'c'),
  ...     Content(4, 'a', 'b', 'e'),
  ...     Content(5, 'X', 'b', 'e'),
  ...     Content(6, 'Y', 'Z')]

And catalog them now::

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)

Now let's register a query utility::

  >>> from hurry.query.query import Query
  >>> from hurry.query.interfaces import IQuery
  >>> ztapi.provideUtility(IQuery, Query())

Set up some code to make querying and displaying the results easy::

  >>> from zope.app import zapi
  >>> from hurry.query.interfaces import IQuery
  >>> def displayQuery(q):
  ...     query = zapi.getUtility(IQuery)
  ...     r = query.searchResults(q)
  ...     return [e.id for e in sorted(list(r))]

FieldIndex Queries
------------------

Now for a query where f1 equals a::

  >>> from hurry.query import Eq
  >>> f1 = ('catalog1', 'f1')
  >>> displayQuery(Eq(f1, 'a'))
  [1, 2, 4]

Not equals (this is more efficient than the generic ~ operator)::

  >>> from hurry.query import NotEq
  >>> displayQuery(NotEq(f1, 'a'))
  [3, 5, 6]

Testing whether a field is in a set::

  >>> from hurry.query import In
  >>> displayQuery(In(f1, ['a', 'X']))
  [1, 2, 3, 4, 5]

Whether documents are in a specified range::

  >>> from hurry.query import Between
  >>> displayQuery(Between(f1, 'X', 'Y'))
  [3, 5, 6]

You can leave out one end of the range::

  >>> displayQuery(Between(f1, 'X', None)) # 'X' < 'a'
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(Between(f1, None, 'X'))
  [3, 5]

You can also use greater-or-equal and less-or-equal for the same
purpose::

  >>> from hurry.query import Ge, Le
  >>> displayQuery(Ge(f1, 'X'))
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(Le(f1, 'X'))
  [3, 5]

It's also possible to use not with the ~ operator::

  >>> displayQuery(~Eq(f1, 'a'))
  [3, 5, 6]

Using and (&)::

  >>> f2 = ('catalog1', 'f2')
  >>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b'))
  [1, 4]

Using or (|)::

  >>> displayQuery(Eq(f1, 'a') | Eq(f2, 'b'))
  [1, 2, 4, 5]

These can be chained::

  >>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b') & Between(f1, 'a', 'b'))
  [1, 4]
  >>> displayQuery(Eq(f1, 'a') | Eq(f1, 'X') | Eq(f2, 'b'))
  [1, 2, 3, 4, 5]

And nested::

  >>> displayQuery((Eq(f1, 'a') | Eq(f1, 'X')) & (Eq(f2, 'b') | Eq(f2, 'c')))
  [1, 2, 3, 4, 5]

"and" and "or" can also be spelled differently::

  >>> from hurry.query import And, Or
  >>> displayQuery(And(Eq(f1, 'a'), Eq(f2, 'b')))
  [1, 4]
  >>> displayQuery(Or(Eq(f1, 'a'), Eq(f2, 'b')))
  [1, 2, 4, 5]

Combination of In and &
-----------------------

A combination of 'In' and '&'::

  >>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']))
  [1, 2, 3, 4, 5, 6]
  >>> displayQuery(In(f1, ['Z']))
  []
  >>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']) & In(f1, ['Z']))
  []

SetIndex queries
----------------

The SetIndex is defined in zc.catalog. Let's make a catalog which uses
it::

  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(
  ...     zope.app.intid.interfaces.IIntIds, intid)
  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')
  >>> from zc.catalog.catalogindex import SetIndex
  >>> catalog['f1'] = SetIndex('f1', IContent)
  >>> catalog['f2'] = FieldIndex('f2', IContent)

First let's set up some new data::

  >>> content = [
  ...     Content(1, ['a', 'b', 'c'], 1),
  ...     Content(2, ['a'], 1),
  ...     Content(3, ['b'], 1),
  ...     Content(4, ['c', 'd'], 2),
  ...     Content(5, ['b', 'c'], 2),
  ...     Content(6, ['a', 'c'], 2)]

And catalog them now::

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)

Now do an 'any of' query, which returns all documents that
contain any of the values listed::

  >>> from hurry.query.set import AnyOf
  >>> displayQuery(AnyOf(f1, ['a', 'c']))
  [1, 2, 4, 5, 6]
  >>> displayQuery(AnyOf(f1, ['c', 'b']))
  [1, 3, 4, 5, 6]
  >>> displayQuery(AnyOf(f1, ['a']))
  [1, 2, 6]

Do an 'all of' query, which returns all documents that
contain all of the values listed::

  >>> from hurry.query.set import AllOf
  >>> displayQuery(AllOf(f1, ['a']))
  [1, 2, 6]
  >>> displayQuery(AllOf(f1, ['a', 'b']))
  [1]
  >>> displayQuery(AllOf(f1, ['a', 'c']))
  [1, 6]

We can combine this with other queries::

  >>> displayQuery(AnyOf(f1, ['a']) & Eq(f2, 1))
  [1, 2]

ValueIndex queries
------------------

The ``ValueIndex`` is defined in ``zc.catalog`` and provides a
generalization of the standard field index.

  >>> from hurry.query import value

Let's set up a catalog that uses this index:

  >>> intid = DummyIntId()
  >>> ztapi.provideUtility(zope.app.intid.interfaces.IIntIds, intid)

  >>> from zope.app.catalog.interfaces import ICatalog
  >>> from zope.app.catalog.catalog import Catalog
  >>> catalog = Catalog()
  >>> ztapi.provideUtility(ICatalog, catalog, 'catalog1')

  >>> from zc.catalog.catalogindex import ValueIndex
  >>> catalog['f1'] = ValueIndex('f1', IContent)

Next we set up some content data to fill the indices:

  >>> content = [
  ...     Content(1, 'a'),
  ...     Content(2, 'b'),
  ...     Content(3, 'c'),
  ...     Content(4, 'd'),
  ...     Content(5, 'c'),
  ...     Content(6, 'a')]

And catalog them now:

  >>> for entry in content:
  ...     catalog.index_doc(intid.register(entry), entry)

Let's now query for all objects where ``f1`` equals 'a':

  >>> f1 = ('catalog1', 'f1')
  >>> displayQuery(value.Eq(f1, 'a'))
  [1, 6]

Next, let's find all objects where ``f1`` does not equal 'a'; this is
more efficient than the generic ``~`` operator:

  >>> displayQuery(value.NotEq(f1, 'a'))
  [2, 3, 4, 5]

You can also query for all objects where the value of ``f1`` is in a
set of values:

  >>> displayQuery(value.In(f1, ['a', 'd']))
  [1, 4, 6]

The next set of queries lets you compare values. For example, you can
ask for all objects whose value lies within a given range; both ends
are inclusive unless ``exclude_min`` or ``exclude_max`` is set:

  >>> displayQuery(value.Between(f1, 'a', 'c'))
  [1, 2, 3, 5, 6]

  >>> displayQuery(value.Between(f1, 'a', 'c', exclude_min=True))
  [2, 3, 5]

  >>> displayQuery(value.Between(f1, 'a', 'c', exclude_max=True))
  [1, 2, 6]

  >>> displayQuery(value.Between(f1, 'a', 'c',
  ...                            exclude_min=True, exclude_max=True))
  [2]

You can also leave out one end of the range:

  >>> displayQuery(value.Between(f1, 'c', None))
  [3, 4, 5]
  >>> displayQuery(value.Between(f1, None, 'c'))
  [1, 2, 3, 5, 6]

You can also use greater-or-equal and less-or-equal for the same
purpose:

  >>> displayQuery(value.Ge(f1, 'c'))
  [3, 4, 5]
  >>> displayQuery(value.Le(f1, 'c'))
  [1, 2, 3, 5, 6]

Of course, you can chain those queries with the others as demonstrated
before.

The ``value`` module also supports ``zc.catalog`` extents. The first
query is ``ExtentAny``, which returns all documents matching the
extent. If the extent is ``None``, all document ids are returned:

  >>> displayQuery(value.ExtentAny(f1, None))
  [1, 2, 3, 4, 5, 6]

If we now create an extent that covers only the first four documents,

  >>> from zc.catalog.extentcatalog import FilterExtent
  >>> extent = FilterExtent(lambda extent, uid, obj: True)
  >>> for i in range(4):
  ...     extent.add(i, i)

then only the first four are returned:

  >>> displayQuery(value.ExtentAny(f1, extent))
  [1, 2, 3, 4]

The opposite query is the ``ExtentNone`` query, which returns all ids
in the extent that are *not* in the index:

  >>> id = intid.register(Content(7, 'b'))
  >>> id = intid.register(Content(8, 'c'))
  >>> id = intid.register(Content(9, 'a'))

  >>> extent = FilterExtent(lambda extent, uid, obj: True)
  >>> for i in range(9):
  ...     extent.add(i, i)

  >>> displayQuery(value.ExtentNone(f1, extent))
  [7, 8, 9]
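
The ``&``, ``|`` and ``~`` operators used throughout this document can
be read as intersection, union and complement on sets of document ids.
The following standalone sketch models that reading in plain Python. It
is an illustration only, not the code of ``hurry.query``: the classes
below are hypothetical stand-ins that merely reuse the names ``Eq``,
``Between``, ``And``, ``Or`` and ``Not``, each index is assumed to be a
plain dict from document id to value, and the ``display`` helper is
invented for the example. The data mirrors ``f1`` and ``f2`` from the
FieldIndex section.

```python
# Standalone model of the query semantics; NOT hurry.query's implementation.
# An "index" here is simply a dict mapping document id -> indexed value.

class Term(object):
    """Base class that gives every query the &, | and ~ operators."""
    def __and__(self, other):
        return And(self, other)
    def __or__(self, other):
        return Or(self, other)
    def __invert__(self):
        return Not(self)

class Eq(Term):
    def __init__(self, index, value):
        self.index, self.value = index, value
    def apply(self, all_ids):
        return {d for d, v in self.index.items() if v == self.value}

class Between(Term):
    """Inclusive range; None leaves that end of the range open."""
    def __init__(self, index, lo, hi):
        self.index, self.lo, self.hi = index, lo, hi
    def apply(self, all_ids):
        return {d for d, v in self.index.items()
                if (self.lo is None or v >= self.lo)
                and (self.hi is None or v <= self.hi)}

class And(Term):
    def __init__(self, *terms):
        self.terms = terms
    def apply(self, all_ids):
        result = set(all_ids)
        for term in self.terms:
            result &= term.apply(all_ids)   # intersection
        return result

class Or(Term):
    def __init__(self, *terms):
        self.terms = terms
    def apply(self, all_ids):
        result = set()
        for term in self.terms:
            result |= term.apply(all_ids)   # union
        return result

class Not(Term):
    def __init__(self, term):
        self.term = term
    def apply(self, all_ids):
        return set(all_ids) - self.term.apply(all_ids)   # complement

def display(query, all_ids):
    """Hypothetical helper: evaluate a query and return sorted ids."""
    return sorted(query.apply(all_ids))

# Same values as f1 and f2 in the FieldIndex section, keyed by content id.
f1 = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}
f2 = {1: 'b', 2: 'c', 3: 'c', 4: 'b', 5: 'b', 6: 'Z'}
ALL_IDS = set(f1)

print(display(Eq(f1, 'a') & Eq(f2, 'b'), ALL_IDS))
print(display(Eq(f1, 'a') | Eq(f2, 'b'), ALL_IDS))
print(display(~Eq(f1, 'a'), ALL_IDS))
```

Note that the complement needs the full set of known document ids,
which is why ``apply`` takes ``all_ids``. This is one way to see why a
dedicated ``NotEq`` can be cheaper than the generic ``~`` operator, as
the sections above mention.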