Cheshire3 Installation

Introduction

The following instructions walk you through installing Cheshire3 and its prerequisites from scratch under Linux (or any Unix). If you have trouble at any stage, feel free to contact us.

Easy Install

This is the easy, preferred method for installing Cheshire3.

  1. Download the latest version of the software from http://www.cheshire3.org/download/latest. This includes one or more packages and a shell script, build.sh, which will compile everything for you.
  2. Put build.sh and all of the packages you want to install in the directory you want to use as the home directory for Cheshire3. We recommend creating a new user called 'cheshire' and running the build in that user's home directory. If you cannot create a new user, then $HOME/cheshire3/ is the recommended location.
  3. Run build.sh, then go and have a coffee for 20 minutes while everything compiles.
  4. Come back, and follow the instructions on your screen. Normally this involves adding the 'export' commands to your shell's init file (for example ~/.bashrc) and possibly changing the web server's configuration.
  5. Proceed to database configuration :)
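
The steps above could be scripted roughly as follows. This is only a minimal sketch and not part of the distribution: the function name c3_easy_install and its two arguments (the directory holding the downloaded files and the target directory) are assumptions made for illustration, and it assumes build.sh can be run with sh.

```shell
# Hypothetical helper: copy the downloaded packages and build.sh into the
# Cheshire3 home directory, then run the build there.
#   $1 = directory containing the downloaded files (assumed layout)
#   $2 = Cheshire3 home directory (defaults to $HOME/cheshire3)
c3_easy_install() {
    download_dir=$1
    c3_home=${2:-$HOME/cheshire3}

    mkdir -p "$c3_home" || return 1
    cp "$download_dir"/* "$c3_home"/ || return 1

    # Run build.sh from inside the target directory; the subshell keeps
    # the caller's working directory unchanged.
    (cd "$c3_home" && sh ./build.sh)
}
```

For example, c3_easy_install ~/downloads would build in $HOME/cheshire3. The instructions that build.sh shows on screen still need to be followed by hand afterwards.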
Install by Hand

These are the requirements for Cheshire3, if you want to install everything by hand.

The links below are provided in case you want to check whether there is a more recent version than the one in our build packages. Note that newer versions will not have been tested and won't necessarily be supported, but they might have useful bug fixes.

Package          Minimum    Current   Location                                                   Note
expat            1.95.8     1.95.8    http://sourceforge.net/projects/expat/                     (see note)
BerkeleyDB       4.0        4.4.20    http://www.sleepycat.com/
Python           2.3.0      2.4.3     http://www.python.org/
4Suite           1.0a3-cvs  1.0b3     http://sourceforge.net/projects/foursuite/
ZSI              1.5-cvs    1.7       http://sourceforge.net/projects/pywebsvcs/                 2.0 not supported
PyZ3950                     2.06      http://www.panix.com/~asl2/software/PyZ3950/
python-dateutil  0.9        1.1       http://labix.org/python-dateutil
SRW              1.1        1.1-3     http://srw.cheshire3.org/downloads/
libxml2          2.6.10     2.6.26    http://www.xmlsoft.org/
libxslt          1.1.8      1.1.17    http://www.xmlsoft.org/
lxml             1.0.1      1.0.3     http://codespeak.net/lxml/
numarray         1.5.1      1.5.1     http://www.stsci.edu/resources/software_hardware/numarray
TextIndexNG      2.1        3.1.9     http://www.zopyx.com/OpenSource/TextIndexNG
apache           2.0.42     2.0.58    http://httpd.apache.org/
mod_python       3.1.0      3.2.10    http://www.modpython.org/
Cheshire3        0.9        0.9.9     http://www.cheshire3.org/
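
If you want to compare a version that is already installed against the minimum versions listed above, a small shell helper can do the comparison. The sketch below is an illustration only: version_ge is a hypothetical name, and it only understands purely numeric dotted versions such as 2.6.26, not strings like 1.0a3-cvs.

```shell
# Hypothetical helper: succeed if dotted numeric version $1 >= $2.
# Components are compared left to right as integers; a missing component
# counts as 0, so "4" and "4.0" are treated as equal.
# Note: plain sh has no local variables, so a, b, x and y are global.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}    # first component of each version
        y=${b%%.*}
        # Drop the first component; clear the variable when none remain.
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
        x=${x:-0}
        y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
    done
    return 0
}
```

For example, version_ge 2.4.3 2.3.0 succeeds, so a Python 2.4.3 installation satisfies the 2.3.0 minimum.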
Installing

To ensure that you have the most recent version of these instructions, it is suggested that you also follow along in the build.sh script.

Below is a set of instructions to install all of the requirements in user space, rather than globally. If you want to install globally, omit the --prefix option from the configure commands. The example install location is /home/cheshire/install and the source is unpacked in /home/cheshire/build.

Before embarking on the process below, you'll need to have a C compiler (we strongly recommend GCC) and a make utility installed, along with the appropriate libraries. You probably already do, but just in case, these utilities can often be found under 'Development Tools' in the package management applications provided with *nix distributions.
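
A quick way to confirm that the build tools are on your PATH is sketched below. The helper name missing_tools is hypothetical; it relies only on the standard command -v lookup and prints the names of any tools it cannot find.

```shell
# Hypothetical helper: print the names of any given tools that are not
# found on PATH (prints an empty line if everything is present).
missing_tools() {
    missing=
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}
```

Running missing_tools gcc make before you start should print an empty line. If your system provides the compiler as cc rather than gcc, check for that name instead.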

If you don't install everything in one session, you'll need to ensure that these environment variables are set again in each new session:

export CPPFLAGS=-I/home/cheshire/install/include
export LDFLAGS=-L/home/cheshire/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
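
If you would rather not retype the three exports, they can be derived from a single prefix. This is a sketch under assumptions: the function name c3_env is made up for illustration, and you would need to define it in each new session (for example from your shell's init file) before calling it.

```shell
# Hypothetical helper: set the build environment for a given install prefix
# (defaults to the example location used in this document).
c3_env() {
    prefix=${1:-/home/cheshire/install}
    export CPPFLAGS="-I$prefix/include"
    export LDFLAGS="-L$prefix/lib"
    # ${VAR:+...} only adds the separating colon when LD_LIBRARY_PATH is
    # already set, which avoids a stray leading colon.
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$prefix/lib"
}
```

Calling c3_env with no argument reproduces the three exports above. Calling it more than once in the same session appends the lib directory to LD_LIBRARY_PATH again, which is harmless but untidy.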
  1. Install Expat

    Expat is the XML parser library that everything links to, so you'll need to install this first. Libxml2 is an alternative parser, but you'll need expat regardless as it's linked by Apache, Python and 4Suite.

    ./configure --prefix=/home/cheshire/install
    make
    make install

    Python 2.4+ and 4Suite 1.0a4+ both include expat version 1.95.8. Previously the bundled versions differed, which could cause problems when running under Apache. The 'minimum' versions have been updated accordingly, but earlier versions will be okay if you are willing to hack the same version of expat into Python and 4Suite, or if you do not run under Apache. You do not need to install this package if it is already present on your system, and most *nix distributions include Expat. Check that the installed version is 1.95.8; if it is not, you probably want to install 1.95.8 globally rather than local to cheshire.

  2. Install BerkeleyDB

    BerkeleyDB is a very fast transactional database system. It's used by such giants as eBay, Amazon and IBM. For the purposes of Cheshire3, it is 10+ times faster than a relational database used for the same job. It's used by all of the Store interfaces (indexes, records, configurations and other objects).

    cd build_unix
    ../dist/configure --prefix=/home/cheshire/install
    make
    make install

    BerkeleyDB is already present on most Linux systems. If version 4 or greater is installed, then this step is unnecessary.

  3. Install Python

    Python is the language that all of the main operational functions are written in, as opposed to the raw number crunching which is mostly done by C libraries. It's easy to understand and maintain, enabling developers to get right into the nitty gritty if desired, but without significant sacrifices to performance.

    MacOSX note: use --enable-framework in the configure to build as a framework.

    ./configure --prefix=/home/cheshire/install
    make
    make install
  4. Install 4Suite

    4Suite is the best XML processing library available at the current time for Python. We use it for XPath and XSLT processing, as well as most DOM creation. See libxml2 for an alternative, but currently it does not support SAX2 under Python, nor does it produce unicode objects, just strings.

    python ./setup.py build
    python ./setup.py install
  5. Install ZSI

    ZSI is the best Python SOAP toolkit. The most recent version (1.5) comes with a WSDL compiler, however it's not yet quite up to SRW.

    The most recent version of ZSI requires either that PyXML be installed (which is not otherwise required for Cheshire3, and can conflict with 4Suite, which is a better package) or that you use the CVS version. The CVS version is available from the Cheshire3 FTP site.

    python ./setup.py build
    python ./setup.py install
  6. Install TextIndexNG (Optional)

    TextIndexNG includes a wrapper around the Snowball stemming language. It provides interfaces to stemmers which reduce a word down to its lexical stem, e.g. 'princesses' to 'princess' or 'understanding' to 'understand'. TextIndexNG comes with a lot of other code, which is also installed but not used. The author of the package was approached regarding splitting out the stemming library, but was not amenable to the suggestion.

    python ./setup.py install
  7. Install PyZ3950

    This package is required even if you don't want to enable Z39.50 interfaces as it contains the CQL libraries used in all C3 queries. It's also used by the Z3950SearchDocumentStream as a Z client.

    You may need to install lex and yacc by hand first, as these are required to build the ASN.1 compiler.

    cp lex.py yacc.py /home/cheshire/install/lib/python2.4/site-packages/
    python ./setup.py install
  8. Install SRW (Optional)

    This very small package contains the stubs for ZSI as well as a quick SRW demo client in python. If you're not going to enable an SRW/U interface, or the SRWSearchDocumentStream, you can omit it.

    python ./setup.py install
  9. Install DateUtils (Optional)

    The DateUtils code provides an excellent free text date parser (though it doesn't currently handle multiple dates in the same block of text).

    python ./setup.py install
  10. Install libxml2 (Optional)

    This library is faster than expat and comes with its own XPath and XSLT implementations. It's not required if you don't feel like it, but it does parse XML really fast!

    ./configure --prefix=/home/cheshire/install --with-python
    make
    make install
    cd python
    python ./setup.py install
  11. Install libxslt (Optional)

    A companion library to libxml2 to process XSLT.

    ./configure --prefix=/home/cheshire/install --with-python
    make
    make install
  12. Install Apache (Optional)

    Apache is only required if you want to have a remote interface to the Cheshire3 databases, e.g. by SRW/U, OAI or Z39.50. Try to make sure that it links against the version of expat you just installed by checking the output of configure to see which library it picks up. Note that using regular CGI calls rather than mod_python handlers (below) will be much slower, as the infrastructure takes a second or so to configure and instantiate.

    export CPPFLAGS=-I/home/cheshire/install/include
    export LDFLAGS=-L/home/cheshire/install/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
    ./configure --prefix=/home/cheshire/install --enable-mods=all --with-berkeley-db=/home/cheshire/install --enable-suexec
    make
    make install

    Apache is also already present on most systems. However, you must ensure that it is run with the right environment variables so that it will link against the libraries that have been installed. You'll also need to ensure that the user Apache runs as has read (and potentially write) access to the databases in which the index and record data are maintained.

  13. Install mod_python (Optional)

    If you haven't installed Apache, you can skip this section. Mod_python allows Apache to run python code internally to handle connections and requests. Each apache thread gets its own python interpreter which is only started once and left running. This means that the Cheshire3 architecture only needs to be built once, rather than per invocation.

    ./configure --prefix=/home/cheshire/install --with-python=/home/cheshire/install/bin/python2.4 --with-apxs=/home/cheshire/install/bin/apxs
    make
    make install
  14. Install PVM (Optional)

    The Parallel Virtual Machine library is a very fast, low transaction cost parallelization system. It lets you run processes on multiple machines and compiles for multiple platforms. As Python and hence Cheshire3 will run on multiple platforms without any additional effort this means that you can build a completely heterogeneous cluster without any difficulties.

    [Coming]
  15. Install PyPVM (Required if PVM is installed)

    This is the Python wrapper around the PVM library.

    [Coming]
  16. If you run into issues with the 'sort' utility breaking, check you have the latest version of textutils installed.

    ./configure --prefix=/home/cheshire/install
    make
    make install
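
Most of the C packages in the steps above follow the same configure, make, make install cycle with the same prefix, so the cycle can be wrapped in one function. The sketch below is an illustration, not what build.sh does: the name build_c_package and the optional MAKE variable (which lets you substitute a different make program) are assumptions.

```shell
# Hypothetical helper: build and install an unpacked source tree under a
# private prefix.
#   $1 = unpacked source directory
#   $2 = install prefix, e.g. /home/cheshire/install
#   remaining arguments are passed through to ./configure
build_c_package() {
    src_dir=$1
    prefix=$2
    shift 2    # both leading arguments are required
    (
        # Work in a subshell so a failure cannot leave the caller in the
        # source directory.
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" "$@" || exit 1
        ${MAKE:-make} || exit 1
        ${MAKE:-make} install
    )
}
```

For example, build_c_package /home/cheshire/build/libxml2-2.6.26 /home/cheshire/install --with-python would cover the first three commands of the libxml2 step (the directory name here is a guess at what the unpacked tarball is called). BerkeleyDB does not fit this pattern because it is configured from build_unix with ../dist/configure, and the Python packages use setup.py instead.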
Configuration
  1. Environment variables required if you have installed this as a local user.

    export LD_LIBRARY_PATH=/home/cheshire/install/lib
    export LD_RUN_PATH=/home/cheshire/install/lib

    Also ensure that Apache is run with these environment variables (see the envvars / envvars-std files installed alongside the httpd binary).

  2. Configure Apache.
    • The standard configuration is typically sufficient to start with. Add:
      Include conf/cheshire3.conf
    • And then in cheshire3.conf:

      # Load mod_python
      LoadModule python_module modules/mod_python.so
      
      # SRW/U interface at /srw/dbname
      <Directory /home/cheshire/install/htdocs/srw>
        SetHandler mod_python
        PythonDebug On
        PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
        PythonHandler srwApacheHandler
      </Directory>
      
      # Z3950 interface on 2100
      Listen 2100
      <VirtualHost *:2100>
        PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
        PythonConnectionHandler zApacheHandler
        PythonDebug On
      </VirtualHost>
  3. Configure Cheshire3

    See further documentation.
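
For the environment variable and Apache steps above, the same lines tend to get added to a file more than once if you repeat the process. A small helper that appends a line only when it is not already present avoids that. This is a sketch: append_once is a hypothetical name, and the file locations in the usage note are assumptions based on the example prefix.

```shell
# Hypothetical helper: append a line to a file unless an identical line
# is already there. The file is created if it does not exist.
#   $1 = file to edit
#   $2 = exact line to add
append_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once /home/cheshire/install/conf/httpd.conf 'Include conf/cheshire3.conf' would add the Include line, and the same helper could add the two export lines to Apache's envvars file. After editing, apachectl configtest will check the configuration syntax before you restart.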

Troubleshooting

If you get strange errors from mod_python under Linux, first try restarting Apache. If this fails with a 'No space left on device' error when there obviously is space, then you've hit the semaphore problem.
The fix is:

echo "512 32000 32 512" > /proc/sys/kernel/sem

Or see:

http://clarens.sourceforge.net/index.php?docs+faq
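
Before applying the semaphore fix above, it can be useful to see what the current limits are. On Linux the four numbers in /proc/sys/kernel/sem are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI. The sketch below just labels them; show_sem_limits is a hypothetical name and the optional file argument exists only so the function can be tried against a copy of the file.

```shell
# Hypothetical helper: print the four System V semaphore limits with labels.
#   $1 = file to read (defaults to /proc/sys/kernel/sem)
show_sem_limits() {
    sem_file=${1:-/proc/sys/kernel/sem}
    # The file holds four whitespace-separated integers on one line.
    read -r semmsl semmns semopm semmni < "$sem_file" || return 1
    echo "SEMMSL=$semmsl SEMMNS=$semmns SEMOPM=$semopm SEMMNI=$semmni"
}
```

Run show_sem_limits before and after the echo command to confirm the change took effect. Writing to /proc needs root and does not survive a reboot, so you may also want to set kernel.sem in your sysctl configuration. The ipcs -s command lists the semaphore sets currently allocated, which can show whether stale Apache semaphores are using up the limit.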