Configuring PreParsers


PreParsers convert documents, as they are introduced to the system, into the form in which they can be most easily processed.

Each PreParser typically does only one thing, so their configuration sections are short.


Example preParser configurations:

<subConfig type="preParser" id="SgmlPreParser">
  <objectType>extractor.SgmlPreParser</objectType>
  <options>
    <setting type="emptyElements">lb ptr extptr hr</setting>
  </options>
</subConfig>

<subConfig type="preParser" id="CharacterEntityPreParser">
  <objectType>extractor.CharacterEntityPreParser</objectType>
</subConfig>


There is little more to say, as each of these objects does one thing and has few options or paths to set. The first example is one of the few that takes a setting: a list of empty SGML elements to be converted to empty XML elements (e.g. <hr> -> <hr/>).
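
To make the empty-element conversion concrete, here is a minimal Python sketch of the kind of rewrite described above. It is an illustration only, not the SgmlPreParser implementation: the function name close_empty_elements and its regular-expression approach are assumptions, and the only detail taken from the configuration is the space-separated list of element names.

```python
import re


def close_empty_elements(text, empty_elements):
    """Rewrite SGML-style empty tags (<hr>) as XML empty tags (<hr/>).

    Hypothetical helper for illustration; empty_elements is a
    space-separated list of names, as in the emptyElements setting.
    """
    names = "|".join(re.escape(name) for name in empty_elements.split())
    if not names:
        return text
    # Match a start tag for one of the listed names, with optional attributes.
    tag = re.compile(r"<(?:%s)(?:\s[^<>]*)?>" % names)

    def fix(match):
        found = match.group(0)
        # Leave tags that are already closed in the XML style untouched.
        if found.endswith("/>"):
            return found
        return found[:-1] + "/>"

    return tag.sub(fix, text)


if __name__ == "__main__":
    sample = '<p>line one<lb>line two<ptr target="n1"><hr/></p>'
    print(close_empty_elements(sample, "lb ptr extptr hr"))
```

Elements not named in the list, such as <p> in the sample, are left as they are, which is why the setting has to enumerate every empty element used in the source documents.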

Some of the currently available PreParsers: