Cheshire3 Object Model: Class PreParser

Module baseObjects :: Class PreParser

Class PreParser

Object Tree:
           object --+    
                    |    
configParser.C3Object --+
                        |
                       PreParser
Known Subclasses:
multivalent.MultivalentPreParser, multivalent.MvdPdfPreParser, preParser.AmpPreParser, preParser.B64DecodePreParser, preParser.B64EncodePreParser, preParser.BzipPreParser, preParser.CharacterEntityPreParser, preParser.GzipPreParser, preParser.HtmlSmashPreParser, preParser.HtmlTidyPreParser, preParser.MarcToSgmlPreParser, preParser.MarcToXmlPreParser, preParser.NormalizerPreParser, preParser.UrlPreParser, preParser.PdfToTxtPreParser, preParser.PdfToXmlPreParser, preParser.PrintableOnlyPreParser, preParser.RegexpSmashPreParser, preParser.SgmlPreParser, preParser.TagStripPreParser, preParser.TxtToXmlPreParser, textmining.tmPreParser.PosPreParser, textmining.tmPreParser.GeniaTextPreParser, textmining.tmPreParser.TsujiiChunkerPreParser

A PreParser takes a Document and creates a second one from it. For example, the input Document might consist solely of a URL, and the output would be a Document containing the data that the PreParser fetched from that address. This allows workflow chains to be strung together in many ways, possibly including ways that the original implementation did not foresee.
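The chaining idea can be sketched in plain Python. The `Document` and `PreParser` classes below are simplified, hypothetical stand-ins rather than Cheshire3's own classes. Only the `process_document(session, doc)` signature comes from this page, and `session` is passed as `None` as a placeholder. Because each stage returns a new Document, stages can be composed in any order. The two stages loosely mirror what the names `B64DecodePreParser` and `GzipPreParser` suggest, but they are not the Cheshire3 implementations.

```python
import base64
import gzip


class Document:
    """Hypothetical minimal stand-in for a Cheshire3 Document."""

    def __init__(self, text):
        self.text = text


class PreParser:
    """Hypothetical base class: subclasses override process_document."""

    def process_document(self, session, doc):
        raise NotImplementedError


class B64DecodeStage(PreParser):
    # Decode base64 content into raw bytes, wrapped in a new Document.
    def process_document(self, session, doc):
        return Document(base64.b64decode(doc.text))


class GunzipStage(PreParser):
    # Decompress gzip bytes into a new Document.
    def process_document(self, session, doc):
        return Document(gzip.decompress(doc.text))


def run_chain(session, doc, preparsers):
    """Feed each PreParser's output Document into the next one."""
    for pp in preparsers:
        doc = pp.process_document(session, doc)
    return doc


# Input: XML that was gzip-compressed and then base64-encoded.
payload = base64.b64encode(gzip.compress(b"<rec>hello</rec>"))
result = run_chain(None, Document(payload), [B64DecodeStage(), GunzipStage()])
print(result.text)  # b'<rec>hello</rec>'
```

Reversing the order of the two stages would fail here, since the gzip stage expects already-decoded bytes; in a real workflow the chain order is part of the configuration.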

Instance Methods

process_document(self, session, doc)
Take a Document, transform it and return a new Document object.

Inherited from configParser.C3Object: __init__, auth_function, get_config, get_default, get_object, get_path, get_setting, log_function, unauth_function, unlog_function

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__


Class Variables

Inherited from configParser.C3Object: configStore, defaults, functionLogger, id, name, objectType, objects, parent, paths, permissionHandlers, settings, subConfigs, unresolvedObjects

Inherited from object: __class__


Method Details

process_document(self, session, doc)

Take a Document, transform it and return a new Document object.
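A subclass would typically override `process_document` and build a fresh Document instead of modifying the one it was given, so the input stays intact for any other processing. The sketch below assumes a simplified, hypothetical `Document` with only a `text` attribute, and the hypothetical `SimpleTagStripPreParser` uses a naive regex purely to illustrate the pattern; it is not necessarily how Cheshire3's `TagStripPreParser` works.

```python
import re


class Document:
    """Hypothetical stand-in: only the attribute this sketch needs."""

    def __init__(self, text):
        self.text = text


class PreParser:
    def process_document(self, session, doc):
        raise NotImplementedError


class SimpleTagStripPreParser(PreParser):
    """Hypothetical example: remove anything that looks like an SGML/XML tag."""

    TAG_RE = re.compile(r"<[^>]+>")

    def process_document(self, session, doc):
        stripped = self.TAG_RE.sub("", doc.text)
        # Build and return a new Document; the input doc is left unchanged.
        return Document(stripped)


original = Document("<p>Hello <b>world</b></p>")
new = SimpleTagStripPreParser().process_document(None, original)
print(new.text)       # Hello world
print(original.text)  # <p>Hello <b>world</b></p>
```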