org.writersforge.catalan.transform.text
Class Normalizer

java.lang.Object
  extended by org.writersforge.catalan.transform.BaseNodeProcessor
      extended by org.writersforge.catalan.transform.text.TextProcessor
          extended by org.writersforge.catalan.transform.text.Normalizer
All Implemented Interfaces:
org.writersforge.bellows.traverse.NodeProcessor

public class Normalizer
extends TextProcessor

A text processor that cleans up input text according to a configurable set of rules. It is primarily used to collapse runs of whitespace. This processor accepts only String input and produces only String output.

Author:
jsheets

Constructor Summary
Normalizer()
          Creates a new instance of Normalizer.
Normalizer(org.writersforge.bellows.Datum xml)
          Creates a new instance of Normalizer from the XML spec.
 
Method Summary
 void addExclusion(java.lang.String startToken, java.lang.String endToken)
          Adds an exclusion region, for example quoted data that should not be normalized.
protected  java.util.List processText(java.lang.String text)
          Processes the text node.
 void setNormalizeTokens(java.lang.String[] tokens)
          Assigns a new set of tokens to normalize.
 void setResolver(java.lang.String resolver)
          Assigns the text that the processor substitutes for normalized text.
 java.lang.String toString()
          Converts this object to a String value.
 
Methods inherited from class org.writersforge.catalan.transform.text.TextProcessor
processNode
 
Methods inherited from class org.writersforge.catalan.transform.BaseNodeProcessor
addLeftover, end, getLeftovers, getNodes, start
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Normalizer

public Normalizer(org.writersforge.bellows.Datum xml)
Creates a new instance of Normalizer from the XML spec.

Parameters:
xml - XML initializing spec

Normalizer

public Normalizer()
Creates a new instance of Normalizer.

Method Detail

setNormalizeTokens

public void setNormalizeTokens(java.lang.String[] tokens)
Assigns a new set of tokens to normalize. The processor collapses each consecutive run of normalization tokens into a single occurrence of the resolver text. The default normalization tokens are the space character (" "), the tab character ("\t"), the line feed character ("\n"), and the carriage return character ("\r").

Parameters:
tokens - an array of the new normalization tokens
See Also:
setResolver(String)

setResolver

public void setResolver(java.lang.String resolver)
Assigns the text that the processor substitutes for each run of normalization tokens. The default resolver text is a single space character (" ").

Parameters:
resolver - the text to insert in place of normalized text
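
The collapsing behavior described for setNormalizeTokens and setResolver can be illustrated with a small standalone sketch. This is not the Normalizer source: the class CollapseSketch and its collapse method are hypothetical names, and details such as matching tokens in list order and not trimming leading or trailing runs are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not part of org.writersforge.
final class CollapseSketch {

    // Replaces each consecutive run of normalization tokens with one copy of the resolver.
    static String collapse(String text, List<String> tokens, String resolver) {
        StringBuilder out = new StringBuilder();
        boolean inRun = false;   // true while we are inside a run of tokens
        int i = 0;
        while (i < text.length()) {
            String matched = null;
            for (String token : tokens) {
                // Assumption: the first token that matches at this position wins.
                if (!token.isEmpty() && text.startsWith(token, i)) {
                    matched = token;
                    break;
                }
            }
            if (matched != null) {
                if (!inRun) {
                    out.append(resolver);   // emit the resolver once per run
                    inRun = true;
                }
                i += matched.length();
            } else {
                out.append(text.charAt(i));
                inRun = false;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The default tokens and resolver listed in the documentation above.
        List<String> defaults = Arrays.asList(" ", "\t", "\n", "\r");
        System.out.println(collapse("a  \t b\r\n\r\nc", defaults, " "));   // prints "a b c"
    }
}
```

With a different resolver, for example "_", every run is replaced by that text instead of a space.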

addExclusion

public void addExclusion(java.lang.String startToken,
                         java.lang.String endToken)
Adds an exclusion region, for example quoted data that should not be normalized. Each region requires a start token and an end token. By default, the processor has no exclusions. To add exclusions for both single-quoted and double-quoted text:
  Normalizer normalizer = new Normalizer ();
  normalizer.addExclusion ("'", "'");
  normalizer.addExclusion ("\"", "\"");
 

Parameters:
startToken - the token which starts the exclusion region
endToken - the token which ends the exclusion region
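
How exclusion regions interact with normalization can be sketched in a standalone form. The class ExclusionSketch and its normalize method are hypothetical and only approximate the documented behavior. The sketch assumes the default whitespace tokens and a single-space resolver, copies the delimiters along with the excluded text, and treats an unterminated region as running to the end of the input; the real class may handle these cases differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch; not part of org.writersforge.
final class ExclusionSketch {

    // Collapses whitespace runs to one space, copying excluded regions verbatim.
    // Each entry of exclusions is a {startToken, endToken} pair.
    static String normalize(String text, String[][] exclusions) {
        List<String> tokens = Arrays.asList(" ", "\t", "\n", "\r");
        StringBuilder out = new StringBuilder();
        boolean inRun = false;
        int i = 0;
        while (i < text.length()) {
            // Check whether an exclusion region starts at this position.
            String[] region = null;
            for (String[] pair : exclusions) {
                if (!pair[0].isEmpty() && text.startsWith(pair[0], i)) {
                    region = pair;
                    break;
                }
            }
            if (region != null) {
                int from = i + region[0].length();
                int close = text.indexOf(region[1], from);
                // Assumption: an unterminated region extends to the end of the text.
                int end = (close < 0) ? text.length() : close + region[1].length();
                out.append(text, i, end);   // copy the region untouched, delimiters included
                i = end;
                inRun = false;
                continue;
            }
            boolean isToken = false;
            for (String token : tokens) {
                if (text.startsWith(token, i)) {
                    i += token.length();
                    isToken = true;
                    break;
                }
            }
            if (isToken) {
                if (!inRun) {
                    out.append(' ');   // one resolver per run of tokens
                    inRun = true;
                }
            } else {
                out.append(text.charAt(i++));
                inRun = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] quotes = { { "'", "'" }, { "\"", "\"" } };
        System.out.println(normalize("say   'a   b'   now", quotes));   // prints "say 'a   b' now"
    }
}
```

Whitespace outside the quotes is collapsed, while the spacing inside the quoted region is preserved exactly.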

processText

protected java.util.List processText(java.lang.String text)
Processes the text node. Not called for non-String nodes.

Specified by:
processText in class TextProcessor
Parameters:
text - the text to process
Returns:
the processed text

toString

public java.lang.String toString()
Converts this object to a String value.

Returns:
a String representation of this object