Cheshire3 Configuration Files

Introduction

Because Cheshire3 is so flexible and modular, both in how it can be implemented and in how its pieces fit together, it needs configuration files that specify which pieces to use and in which order. The configuration files are themselves modular: as many objects as desired can be defined in one file and then imported where required. They are built from a small number of elements, with some additional constructions for specialised objects.

Example configurations are included in the Cheshire3 distribution and may be used as a base on which to build.

Every object in the system that is not instantiated from a request or as the result of processing requires a configuration section. Many of these configurations contain only the class of the object to instantiate and an identifier by which to refer to it. Object constructors are called with two arguments: the top DOM node of their configuration and a parent object. This allows a tree hierarchy of objects, with a Server at the top level. It also means that objects can handle their own specialised configuration elements, while leaving the common elements to the base configuration handler.
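The constructor convention just described (top DOM node plus parent object) can be illustrated with a minimal sketch using Python's standard xml.dom.minidom. The class ConfiguredObject and the server identifier SimpleServer are hypothetical; this is not Cheshire3's actual class hierarchy, only a demonstration of how (DOM node, parent) pairs produce a tree while the base class handles a common element such as <objectType>.

```python
from xml.dom import minidom


class ConfiguredObject:
    """Hypothetical stand-in for a configurable object (not Cheshire3's API).

    Built from the top DOM node of its configuration plus a parent object,
    so that objects form a tree rooted at a server.
    """

    def __init__(self, config_node, parent=None):
        self.parent = parent
        self.children = []
        # Common attributes that every configuration carries.
        self.id = config_node.getAttribute("id")
        self.type = config_node.getAttribute("type")
        self.object_type = None
        # The base class handles a common element; a subclass could
        # inspect the same node for its own specialised elements.
        for child in config_node.childNodes:
            if (child.nodeType == child.ELEMENT_NODE
                    and child.tagName == "objectType"):
                self.object_type = "".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE).strip()
        if parent is not None:
            parent.children.append(self)

    def ancestry(self):
        """Return identifiers from this object up to the root."""
        ids, node = [], self
        while node is not None:
            ids.append(node.id)
            node = node.parent
        return ids


server_xml = '<config type="server" id="SimpleServer"/>'
db_xml = ('<config type="database" id="db_l5r">'
          '<objectType>database.SimpleDatabase</objectType></config>')

server = ConfiguredObject(minidom.parseString(server_xml).documentElement)
db = ConfiguredObject(minidom.parseString(db_xml).documentElement,
                      parent=server)
print(db.object_type, db.ancestry())
# database.SimpleDatabase ['db_l5r', 'SimpleServer']
```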

The main elements are described here; the specialised elements and values are described in object-specific pages.

Example
<config type="database" id="db_l5r">
  <objectType>database.SimpleDatabase</objectType>
  <paths>
    <path type="defaultPath">/home/cheshire/c3/cheshire3/l5r</path>
    <path type="metadataPath">metadata.bdb</path>
    <object type="recordStore" ref="l5rRecordStore"/>
  </paths>
  <options>
    <setting type="log">handle_search</setting>
  </options>
  <subConfigs>
    <subConfig type="parser" id="l5rAttrParser">
      <objectType>parser.SaxParser</objectType>
      <options>
        <setting type="attrHash">text@type</setting>
      </options>
    </subConfig>
    <subConfig type="index" id="l5r-idx-1">
      <objectType>index.SimpleIndex</objectType>
      <paths>
        <object type="indexStore" ref="l5rIndexStore"/>
      </paths>
      <source>
        <xpath>/card/name</xpath>
        <process>
          <object type="extractor" ref="ExactExtractor"/>
          <object type="normalizer" ref="CaseNormalizer"/>
        </process>
      </source>
    </subConfig>
    <path type="index" id="l5r-idx-2">configs/idx2-cfg.xml</path>
  </subConfigs>
  <objects>
    <path ref="l5rAttrParser"/>
    <path ref="l5r-idx-1"/>
  </objects>
</config>
			
<config>

The top-level element of any configuration file is the <config> element, which describes at least one object to construct. It should have an 'id' attribute giving the object's identifier within the system, and a 'type' attribute specifying what sort of object is being created.

If the configuration file is not for the top-level Server, this element must contain an <objectType> element. It may also contain one each of <docs>, <paths>, <subConfigs>, <options> and <objects>.
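The rules above are simple enough to check mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function check_config is hypothetical, and it assumes that the top-level Server configuration is recognisable by type="server". Cheshire3 itself may be stricter or more lenient than this.

```python
import xml.etree.ElementTree as ET

# Optional child elements that may each appear at most once.
OPTIONAL_ONCE = ("docs", "paths", "subConfigs", "options", "objects")


def check_config(elem):
    """Return a list of problems found in one <config> or <subConfig>.

    Hypothetical checker; assumes the Server config has type="server".
    """
    problems = []
    for attr in ("id", "type"):
        if not elem.get(attr):
            problems.append(f"missing '{attr}' attribute")
    # Everything except the top-level Server needs an <objectType>.
    if elem.get("type") != "server" and elem.find("objectType") is None:
        problems.append("missing <objectType>")
    for tag in OPTIONAL_ONCE:
        if len(elem.findall(tag)) > 1:
            problems.append(f"more than one <{tag}>")
    return problems


good = ET.fromstring(
    '<config type="database" id="db_l5r">'
    '<objectType>database.SimpleDatabase</objectType></config>')
print(check_config(good))  # []
```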

<objectType>

This element contains the module and class to use when instantiating the object. If the class comes from one of the base Cheshire3 modules, the value is simply written in regular Python module.class syntax, e.g. database.SimpleDatabase. If it does not, the module must be imported; see the <imports> section.
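To make the module.class convention concrete, here is a minimal sketch of turning such a string into a class with Python's standard importlib. The function resolve_object_type and its base_package parameter are hypothetical: base_package merely stands in for "one of the base Cheshire3 modules", and the demonstration uses standard-library classes rather than Cheshire3 ones.

```python
import importlib


def resolve_object_type(object_type, base_package=None):
    """Turn a 'module.class' string into a class object.

    Hypothetical sketch: if base_package is given, 'module' is looked up
    inside it first (standing in for the base modules), then as an
    independently importable top-level module.
    """
    module_name, _, class_name = object_type.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.class', got {object_type!r}")
    candidates = [module_name]
    if base_package:
        candidates.insert(0, f"{base_package}.{module_name}")
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate module name
        return getattr(module, class_name)
    raise ImportError(f"no module found for {object_type!r}")


print(resolve_object_type("decoder.JSONDecoder", base_package="json"))
# <class 'json.decoder.JSONDecoder'>
```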

<imports>

[Coming soon]

<docs>

This element may be used to document the configured object, for example to explain that a particular tokenizer splits data into sentences based on some pre-defined pattern.

<paths>

This element may contain <path> and <object> elements, which are stored when the object is built. <path> elements store a file path to a resource required by the object. <object> elements create references to other objects in the system by their identifier, for example the default recordStore used by the database.

<path>

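This element gives the path to a resource and has several attributes to govern its use. In the example above, the 'type' attribute names what the path is for (e.g. defaultPath or metadataPath), and a <path> inside <subConfigs> also carries an 'id' for the configuration stored in that file.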

<object>

A reference to an object in the system. It has two attributes: 'type', giving the type of object, and 'ref', giving the object's identifier.
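As an illustration of how a <paths> section separates file paths from object references, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper read_paths is hypothetical; it simply collects values keyed by their 'type' attribute and does not resolve relative paths or look up the referenced objects.

```python
import xml.etree.ElementTree as ET


def read_paths(config_elem):
    """Collect <path> values and <object> references from <paths>.

    Hypothetical helper: returns two dicts keyed by the 'type' attribute.
    """
    paths, object_refs = {}, {}
    paths_elem = config_elem.find("paths")
    if paths_elem is None:
        return paths, object_refs
    for child in paths_elem:
        if child.tag == "path":
            paths[child.get("type")] = (child.text or "").strip()
        elif child.tag == "object":
            object_refs[child.get("type")] = child.get("ref")
    return paths, object_refs


config = ET.fromstring("""
<config type="database" id="db_l5r">
  <objectType>database.SimpleDatabase</objectType>
  <paths>
    <path type="defaultPath">/home/cheshire/c3/cheshire3/l5r</path>
    <path type="metadataPath">metadata.bdb</path>
    <object type="recordStore" ref="l5rRecordStore"/>
  </paths>
</config>""")
paths, refs = read_paths(config)
print(refs)  # {'recordStore': 'l5rRecordStore'}
```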

<options>

This section may include one or more <setting> elements (values that cannot be changed) and <default> elements (values that can be overridden in a request).

<setting> and <default>

Settings and defaults have a 'type' attribute specifying which setting or default the value is for; the content of the element is the value itself. Each class supports different setting and default types.
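The difference between a setting and a default can be sketched in a few lines of Python using the standard xml.etree.ElementTree. The Options class, the shape of the per-request dictionary, and the default type resultLimit are all hypothetical; only the 'log' setting comes from the example above.

```python
import xml.etree.ElementTree as ET


class Options:
    """Hypothetical reader for an <options> element.

    <setting> values are fixed; <default> values may be overridden by a
    per-request dictionary (the request format is an assumption).
    """

    def __init__(self, config_elem):
        self.settings, self.defaults = {}, {}
        options = config_elem.find("options")
        if options is not None:
            for elem in options:
                target = {"setting": self.settings,
                          "default": self.defaults}.get(elem.tag)
                if target is not None:
                    target[elem.get("type")] = (elem.text or "").strip()

    def get_setting(self, name, fallback=None):
        # Settings ignore the request entirely.
        return self.settings.get(name, fallback)

    def get_default(self, name, request=None, fallback=None):
        # A request may override a default.
        if request and name in request:
            return request[name]
        return self.defaults.get(name, fallback)


cfg = ET.fromstring(
    '<config><options>'
    '<setting type="log">handle_search</setting>'
    '<default type="resultLimit">20</default>'  # hypothetical type
    '</options></config>')
opts = Options(cfg)
print(opts.get_default("resultLimit", request={"resultLimit": "50"}))  # 50
```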

<subConfigs>

This wrapper element contains one or more <subConfig> elements. Each <subConfig> has the same model as <config>, so a nested tree of configurations and subconfigurations can be constructed. It may also contain <path> elements giving the path of another file to read in and treat as further subconfigurations.

Cheshire3 employs 'Just In Time' instantiation of objects. That is to say, objects are instantiated only when required by the system, or when requested from their parent object in a script.
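The 'Just In Time' behaviour can be sketched as a parent that indexes its subconfigurations by id but builds each object only on first request, then caches it. The class LazyParent, its get_object method and the factory callable are hypothetical names chosen for this illustration, not Cheshire3's own. <path> entries pointing at other files are recorded but not read in this sketch.

```python
import xml.etree.ElementTree as ET


class LazyParent:
    """Hypothetical parent object showing 'Just In Time' instantiation."""

    def __init__(self, config_elem, factory):
        self.factory = factory        # callable(elem, parent) -> object
        self.sub_configs = {}         # id -> inline <subConfig> element
        self.external_paths = {}      # id -> file holding further configs
        self.objects = {}             # id -> already-built object (cache)
        sub = config_elem.find("subConfigs")
        if sub is not None:
            for child in sub:
                if child.tag == "subConfig":
                    self.sub_configs[child.get("id")] = child
                elif child.tag == "path":
                    self.external_paths[child.get("id")] = (
                        (child.text or "").strip())

    def get_object(self, identifier):
        # Build on first request only; later requests reuse the cache.
        if identifier not in self.objects:
            elem = self.sub_configs[identifier]  # KeyError if unknown
            self.objects[identifier] = self.factory(elem, self)
        return self.objects[identifier]


calls = []


def factory(elem, parent):
    calls.append(elem.get("id"))
    return elem.findtext("objectType")


cfg = ET.fromstring("""
<config type="database" id="db_l5r">
  <subConfigs>
    <subConfig type="index" id="l5r-idx-1">
      <objectType>index.SimpleIndex</objectType>
    </subConfig>
    <path type="index" id="l5r-idx-2">configs/idx2-cfg.xml</path>
  </subConfigs>
</config>""")
db = LazyParent(cfg, factory)
```

Nothing is built when db is created; the first call to db.get_object("l5r-idx-1") invokes the factory, and subsequent calls return the cached object.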

<subConfig>

This element has the same model as the <config> element to allow for nested configurations. 'id' and 'type' attributes are mandatory for this element.

<objects>

The <objects> element contains one or more <path> elements, each referring to the identifier of a subconfiguration. This reference instructs the system to actually instantiate the object from that configuration (after which it would be referred to with an <object> element).

Note that while this is no longer required (due to the implementation of 'Just In Time' object instantiation), it remains in the configuration schema because there are still situations in which it may be desirable, for example to instantiate objects with long spin-up times at the server level.
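Eager instantiation via <objects> can be sketched as a start-up step that builds every listed object immediately and fails loudly if a reference does not match a declared identifier. The function instantiate_listed and its factory argument are hypothetical; the sketch handles only inline <subConfig> elements (not subconfigurations held in other files), and its lookup is an exact, case-sensitive string match, so an identifier that differs only in case is reported rather than silently ignored.

```python
import xml.etree.ElementTree as ET


def instantiate_listed(config_elem, factory):
    """Eagerly build every object named in <objects>.

    Hypothetical sketch: each <path ref="..."/> must match the id of an
    inline <subConfig>; lookup is an exact, case-sensitive match.
    """
    inline = {}
    sub_configs = config_elem.find("subConfigs")
    if sub_configs is not None:
        inline = {s.get("id"): s for s in sub_configs.findall("subConfig")}
    built = {}
    for path in config_elem.findall("objects/path"):
        ref = path.get("ref")
        if ref not in inline:
            raise KeyError(f"<objects> refers to unknown subConfig {ref!r}")
        built[ref] = factory(inline[ref])
    return built


TEMPLATE = """
<config type="database" id="db_l5r">
  <subConfigs>
    <subConfig type="parser" id="l5rAttrParser">
      <objectType>parser.SaxParser</objectType>
    </subConfig>
    <subConfig type="index" id="l5r-idx-1">
      <objectType>index.SimpleIndex</objectType>
    </subConfig>
  </subConfigs>
  <objects>
    <path ref="{parser_ref}"/>
    <path ref="l5r-idx-1"/>
  </objects>
</config>"""

good_cfg = ET.fromstring(TEMPLATE.format(parser_ref="l5rAttrParser"))
built = instantiate_listed(good_cfg, lambda e: e.findtext("objectType"))
print(built)
# {'l5rAttrParser': 'parser.SaxParser', 'l5r-idx-1': 'index.SimpleIndex'}
```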

Object Specific Elements