Cheshire3 Object Model: Class Document

Module baseObjects :: Class Document

Class Document

Known Subclasses:
document.StringDocument

A Document is the raw data that will become a Record. It may be processed into a Record by a Parser, or into another Document type by a PreParser. Documents can be stored in a DocumentStore if necessary, but can generally be discarded. A Document may be anything from a JPG file, to an unparsed XML file, to a string containing a URL. This allows for future compatibility with new formats, which can be incorporated into the system by implementing a Document type and a PreParser.
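
As a rough illustration of adding a new format through a Document type plus a PreParser, the sketch below uses stand-in classes rather than the Cheshire3 source. `CsvDocument`, `CsvToXmlPreParser`, the simplified `process_document(doc)` signature, and the choice to keep the data in `self.text` are all assumptions made for the example.

```python
import csv
import io
from xml.sax.saxutils import escape


class Document:
    """Stand-in for the documented base class (not the Cheshire3 source)."""

    def __init__(self, data, creator="", history=None, mimeType="", parent=None):
        self.text = data  # assumption: raw data is kept in `text`
        self.creator = creator
        self.processHistory = list(history) if history else []
        self.mimeType = mimeType
        self.parent = parent

    def get_raw(self):
        return self.text


class CsvDocument(Document):
    """Hypothetical Document type for a new format (comma-separated text)."""


class CsvToXmlPreParser:
    """Hypothetical PreParser: turns a CsvDocument into an XML string Document."""

    def process_document(self, doc):
        rows = csv.reader(io.StringIO(doc.get_raw()))
        header = next(rows)  # assumes the header cells are valid XML names
        records = []
        for row in rows:
            fields = "".join(
                "<{0}>{1}</{0}>".format(name, escape(value))
                for name, value in zip(header, row)
            )
            records.append("<row>" + fields + "</row>")
        xml = "<rows>" + "".join(records) + "</rows>"
        # Return a new Document and extend the history instead of mutating it.
        return Document(
            xml,
            creator="CsvToXmlPreParser",
            history=doc.processHistory + ["CsvToXmlPreParser"],
            mimeType="text/xml",
            parent=doc.parent,
        )


doc = CsvDocument("title,year\nA & B,1999\n", mimeType="text/csv")
out = CsvToXmlPreParser().process_document(doc)
```

The resulting XML Document could then be handed to a Parser to produce a Record; that step is outside the scope of this sketch.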

Instance Methods

__init__(self, data, creator="", history=[], mimeType="", parent=None)
The constructor takes the data which should be used to construct the document.
get_raw(self)
Return the raw data associated with this document.

Class Variables

documentStore  
id  
mimeType  
parent  
processHistory  
text  

Method Details

__init__(self, data, creator="", history=[], mimeType="", parent=None)
(Constructor)

The constructor takes the data used to construct the document; what form that data takes depends on the implementation. It may optionally also take a creator object, process history information, and a MIME type. The parent option is for documents that have been extracted from another document, for example pages from a book.

get_raw(self)

Return the raw data associated with this document.
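
The two methods above can be sketched with a small stand-in class that copies the documented constructor signature. This is not the Cheshire3 implementation: storing the data in `self.text`, copying the history list, and passing the parent as an object reference are assumptions made for illustration, and the form the real parent reference takes is not stated here.

```python
class Document:
    """Stand-in with the documented constructor signature (illustration only)."""

    def __init__(self, data, creator="", history=[], mimeType="", parent=None):
        self.text = data  # assumed storage attribute
        self.creator = creator
        self.processHistory = list(history)  # copy, so the default list is never shared
        self.mimeType = mimeType
        self.parent = parent

    def get_raw(self):
        """Return the raw data associated with this document."""
        return self.text


# A "book" whose pages are separated by form feed characters.
book = Document("page one\fpage two", creator="loader", mimeType="text/plain")

# Extract each page as its own Document and record the book as its parent.
pages = [
    Document(
        text,
        creator="splitter",
        history=book.processHistory + ["split-pages"],
        mimeType=book.mimeType,
        parent=book,
    )
    for text in book.get_raw().split("\f")
]
```

Each page keeps a reference back to the book through `parent`, while `get_raw()` returns only that page's own data.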

Class Variable Details

documentStore
Value: ''

id
Value: -1

mimeType
Value: ''

parent
Value: ('', '', -1)

processHistory
Value: []

text
Value: ''
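
Because these values are declared on the class, they act as defaults shared by every instance until an instance assigns its own value. The sketch below uses a stand-in class holding only the defaults listed above. Whether the real constructor rebinds each of them per instance is not shown here, so treat the shared-list behaviour as a general Python caveat rather than a statement about Cheshire3.

```python
class Document:
    """Stand-in holding only the class-level defaults listed above."""

    documentStore = ''
    id = -1
    mimeType = ''
    parent = ('', '', -1)
    processHistory = []
    text = ''


a = Document()
b = Document()

# Assignment creates an instance attribute; the class default is untouched.
a.id = 42

# Rebinding to a new list is safe: only `a` sees the change.
a.processHistory = a.processHistory + ["parsed"]

# In-place mutation goes through to the shared class-level list.
b.processHistory.append("oops")
```

After this runs, `Document.id` is still `-1`, `a.processHistory` is a list of its own, and the entry appended through `b` lives on the class-level list that any instance without its own `processHistory` will see.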