Cheshire3 Objects: Index


An Index is an object which defines an access point into records and is responsible for extracting that information from them. It can then store the extracted information in an IndexStore. The entry point can be defined using one or more XPath expressions, and the extraction process can be defined using a workflow chain of standard objects. These chains must start with an Extractor, but from there may include PreParsers, Parsers, Transformers, Normalizers and even other Indexes.

A simple chain for a stemmed keyword index might be:
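As a rough illustration in plain Python (this is not the Cheshire3 API; every function name below is hypothetical, and the suffix-stripping stemmer is a deliberately naive stand-in for a real stemming Normalizer), such a chain can be modelled as an extractor followed by a sequence of normalizers:

```python
import re


def extract_keywords(text):
    """Extractor stage (hypothetical): map each keyword to its occurrence count."""
    terms = {}
    for word in re.findall(r"[A-Za-z]+", text):
        terms[word] = terms.get(word, 0) + 1
    return terms


def naive_stem(term):
    """Normalizer stage (hypothetical): strip a few common English suffixes."""
    for suffix in ("ing", "es", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def apply_normalizer(terms, normalize):
    """Apply one normalizer to every term, merging counts of terms that collide."""
    merged = {}
    for term, count in terms.items():
        new_term = normalize(term)
        if new_term:
            merged[new_term] = merged.get(new_term, 0) + count
    return merged


def run_chain(text, extractor, normalizers):
    """Run the extractor first, then each normalizer in order."""
    terms = extractor(text)
    for normalize in normalizers:
        terms = apply_normalizer(terms, normalize)
    return terms


# Extractor -> case normalizer -> stemmer
stemmed = run_chain(
    "Indexing indexes indexed Index",
    extract_keywords,
    [str.lower, naive_stem],
)
```

Each stage only consumes the output of the previous one, which is why the chain has to begin with an extractor: the normalizers need terms to work on.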

Whereas a chain which takes a URL from an attribute, fetches it and then extracts title keywords from it might look like:
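The following sketch models that longer chain using only the Python standard library. It is not Cheshire3 code: the element name `link`, the attribute `href`, the helper names and the in-memory `FAKE_WEB` lookup (used in place of a real network fetch) are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Parser stage (hypothetical): collect the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_url(record_xml, element_path, attribute):
    """Extractor stage: read a URL from an attribute in the record."""
    node = ET.fromstring(record_xml).find(element_path)
    return node.get(attribute) if node is not None else None


def title_keywords(html):
    """Parse the fetched document and return lower-cased title keywords."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", parser.title)]


def index_linked_title(record_xml, fetch):
    """Chain: extract URL -> fetch document -> parse -> extract title keywords."""
    url = extract_url(record_xml, ".//link", "href")
    if url is None:
        return []
    return title_keywords(fetch(url))


# Stand-in for a network fetch so the example runs offline.
FAKE_WEB = {
    "http://example.org/page": (
        "<html><head><title>Cheshire3 Index Objects</title></head>"
        "<body><p>Body text is ignored.</p></body></html>"
    )
}

record = '<record><link href="http://example.org/page"/></record>'
keywords = index_linked_title(record, FAKE_WEB.__getitem__)
```

Passing the fetch step in as a function keeps the chain testable without network access; a real deployment would substitute an actual HTTP fetch at that stage.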


Further documentation is available on configuring Cheshire3 Indexes.

__init__(domNode, parentObject)
    The constructor takes a DOM tree containing the configuration of the index and the object which the index should consider as its parent, normally a database.

index_record(session, record)
    Index a record according to this index's rules.

remove_record(session, record)
    Remove the terms of a record from this index.

update_record(session, newRecord, oldRecord)
    Remove the terms of the old record and add those of the new one.

search(session, searchClause, database)
    Returns: resultSet
    Process a search query against this index and return a resultSet.

scan(session, searchClause, numberOfTerms, direction)
    Returns: list of term information
    Handle a scan/browse request.

sort(session, resultSets)
    Returns: resultSet
    Sort resultSets according to the extraction rules of this index.

begin_indexing(session)
    Set up for batch mode indexing (many records will be indexed; data is optimized and saved on commit).

commit_indexing(session)
    Save the results of batch mode indexing.
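
To show how the methods above fit together, here is a toy in-memory stand-in. It is not the Cheshire3 implementation: `ToyIndex` is hypothetical, records are plain dicts, the session is a placeholder, and `search` takes a bare term rather than a searchClause and database. Only the method names and the begin/index/commit ordering are taken from the list above.

```python
class ToyIndex:
    """Hypothetical in-memory index mirroring some of the method names above."""

    def __init__(self):
        self.terms = {}      # committed data: term -> set of record ids
        self.pending = None  # batch buffer, active between begin and commit

    def _extract(self, record):
        # Stand-in for the extraction chain: lower-cased whitespace tokens.
        return set(record["text"].lower().split())

    def begin_indexing(self, session):
        self.pending = {}

    def index_record(self, session, record):
        # In batch mode, terms are buffered until commit_indexing is called.
        target = self.pending if self.pending is not None else self.terms
        for term in self._extract(record):
            target.setdefault(term, set()).add(record["id"])

    def commit_indexing(self, session):
        for term, ids in (self.pending or {}).items():
            self.terms.setdefault(term, set()).update(ids)
        self.pending = None

    def remove_record(self, session, record):
        for term in self._extract(record):
            ids = self.terms.get(term)
            if ids is not None:
                ids.discard(record["id"])
                if not ids:
                    del self.terms[term]

    def update_record(self, session, newRecord, oldRecord):
        # Remove the old record's terms, then add the new record's terms.
        self.remove_record(session, oldRecord)
        self.index_record(session, newRecord)

    def search(self, session, term):
        # Simplified: a single term instead of a searchClause and database.
        return sorted(self.terms.get(term.lower(), set()))


session = None  # placeholder; a real session object is not modelled here
index = ToyIndex()

index.begin_indexing(session)
index.index_record(session, {"id": 1, "text": "Medieval manuscripts"})
index.index_record(session, {"id": 2, "text": "Modern manuscripts"})
before_commit = index.search(session, "manuscripts")
index.commit_indexing(session)
after_commit = index.search(session, "manuscripts")

index.update_record(
    session,
    {"id": 2, "text": "Modern maps"},
    {"id": 2, "text": "Modern manuscripts"},
)
after_update = index.search(session, "manuscripts")
```

In this sketch nothing indexed in batch mode is searchable until `commit_indexing` runs, which reflects the "save data on commit" behaviour described for `begin_indexing`.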