org.writersforge.catalan.transform
Class Transformer

java.lang.Object
  |
  +--org.writersforge.catalan.transform.Transformer

public class Transformer
extends java.lang.Object

An XML-driven data transformer for converting data from one format to another. Performs a series of operations against a List of input data objects. The input data will change content and size as it goes through the chain of transforms. The currently supported processors are described in the annotated example below.

The following XML loads the Transformer instance with a series of transforms. Comments for each transform are interspersed below.
 <transform>
 
Simple text search and replace. Replaces all occurrences of the 'oldtext' attribute in the input data with the value of the 'newtext' attribute. Corresponds to the TextReplacer processor. The example below will replace all hyphen characters with "X" characters. So input data with three text nodes [ "a--ple", "------", "e-ample" ] will become [ "aXXple", "XXXXXX", "eXample" ].
   <replace oldtext='-' newtext='X'/>
 
Simple text search and replace with limited replacements. Only replaces the first 'count' occurrences in each node. The count resets for each new node. The example below would convert the input data [ "a--ple", "------", "e-ample" ] to [ "aXXple", "XX----", "eXample" ].
   <replace oldtext='-' newtext='X' count='2'/>
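The per-node counting rule described above can be sketched in plain Java. This is an illustrative stand-in, not the Catalan TextReplacer source: the class name LimitedReplaceSketch, the method signature, and the use of a negative count to mean "no limit" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; mimics the documented replace behaviour on String nodes only. */
final class LimitedReplaceSketch {

    /**
     * Replaces up to maxCount occurrences of oldText in each node.
     * A negative maxCount means "no limit" (an assumption of this sketch).
     * The counter restarts for every node.
     */
    static List<String> replace(List<String> nodes, String oldText, String newText, int maxCount) {
        if (oldText.isEmpty()) {
            throw new IllegalArgumentException("oldText must not be empty");
        }
        List<String> out = new ArrayList<>();
        for (String node : nodes) {
            StringBuilder sb = new StringBuilder();
            int from = 0;      // start of the text not yet copied
            int replaced = 0;  // replacements made in this node
            while (maxCount < 0 || replaced < maxCount) {
                int hit = node.indexOf(oldText, from);
                if (hit < 0) {
                    break;
                }
                sb.append(node, from, hit).append(newText);
                from = hit + oldText.length();
                replaced++;
            }
            sb.append(node, from, node.length()); // copy the untouched remainder
            out.add(sb.toString());
        }
        return out;
    }
}
```

Calling replace with the three sample nodes, "-", "X", and a count of 2 yields [ "aXXple", "XX----", "eXample" ], matching the example above.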
 
Variable replacement. Replaces UNIX-style variables with static text. Variables take the form of "${variable}". The extra markup is part of the variable, and is removed when the variable is replaced. For example, the text node "The ${version} version of ${product}" would become "The 0.1.3 version of Catalan" when run through the transform below.
   <lookup>
     <var name='product' text='Catalan'/>
     <var name='version' text='0.1.3'/>
   </lookup>
 
Variable replacement with alternate markup. Replaces custom-style variables. The start-token and end-token attributes define the alternate markup for the variables. For example, the text node "The @version@ version of @product@" would become "The 0.1.3 version of Catalan" when run through the transform below.
   <lookup start-token='@' end-token='@'>
     <var name='product' text='Catalan'/>
     <var name='version' text='0.1.3'/>
   </lookup>
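As a rough illustration of how the start and end tokens frame a variable, the sketch below expands variables with plain string replacement. The class VariableLookupSketch and its expand method are hypothetical, and leaving unknown variables untouched is an assumption; the source does not say what the real processor does in that case.

```java
import java.util.Map;

/** Hypothetical helper; not the Catalan lookup processor. */
final class VariableLookupSketch {

    /**
     * Replaces every startToken + name + endToken marker with the mapped text.
     * Markers whose name is not in vars are left as-is (an assumption).
     * Replacement text is not protected from later expansion in this sketch.
     */
    static String expand(String text, Map<String, String> vars, String startToken, String endToken) {
        String result = text;
        for (Map.Entry<String, String> var : vars.entrySet()) {
            String marker = startToken + var.getKey() + endToken;
            result = result.replace(marker, var.getValue()); // literal, non-regex replace
        }
        return result;
    }
}
```

With product mapped to "Catalan" and version mapped to "0.1.3", both "The ${version} version of ${product}" (tokens "${" and "}") and "The @version@ version of @product@" (tokens "@" and "@") expand to "The 0.1.3 version of Catalan".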
 
Token splitter. Chops up input data by the given tokens. The example below would convert the input data [ "one,two::three;", ":four:" ] into [ "one", "two", "three", "four" ].
   <tokenize>
     <token>,</token>
     <token>;</token>
     <token>:</token>
   </tokenize>
 
Token splitter with tokens. Chops up input data by the given tokens and keeps the tokens in the output. The example below would convert the input data [ "one,two::three;", ":four:" ] into [ "one", ",", "two", ":", ":", "three", ";", ":", "four", ":" ].
   <tokenize include-delimiters='yes'>
     <token>,</token>
     <token>;</token>
     <token>:</token>
   </tokenize>
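The two tokenizer variants differ only in whether the delimiters are emitted as their own nodes. A minimal sketch, assuming single-character tokens and String-only input (the class TokenSplitSketch and its signature are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; handles single-character delimiters only. */
final class TokenSplitSketch {

    /** Splits every node at the given delimiters, dropping empty pieces. */
    static List<String> split(List<String> nodes, Set<Character> delimiters, boolean includeDelimiters) {
        List<String> out = new ArrayList<>();
        for (String node : nodes) {
            StringBuilder current = new StringBuilder();
            for (char c : node.toCharArray()) {
                if (delimiters.contains(c)) {
                    if (current.length() > 0) {        // flush the piece before the delimiter
                        out.add(current.toString());
                        current.setLength(0);
                    }
                    if (includeDelimiters) {           // mirrors include-delimiters='yes'
                        out.add(String.valueOf(c));
                    }
                } else {
                    current.append(c);
                }
            }
            if (current.length() > 0) {                // flush the trailing piece of this node
                out.add(current.toString());
            }
        }
        return out;
    }
}
```

For the input [ "one,two::three;", ":four:" ] with the delimiters ',', ';' and ':', this produces the two outputs listed in the examples above, depending on the includeDelimiters flag.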
 
Input data concatenator. Converts all input data nodes into String form and concatenates them all together into a single String. Converts Datum trees to XML with DatumWriter, and calls String.valueOf() on everything else. Output from a simple <concat> transform will always be a single node with String data. For example, given input data of [ "one", new Integer(2), <three/>, <four><five/></four> ], where the XML data is actually a Datum tree, the transform below would result in literal String output of [ "one2<three/>\n<four>\n <five/>\n</four>\n" ]. The extra whitespace is a by-product of DatumWriter.
   <concat/>
 
Input data concatenator with limited node count. Converts input data nodes into String form until it reaches the specified node count or runs out of input data. Any unprocessed nodes are passed to the output untouched. For example, given the input data in the example above, the following example would create output of [ "one2<three/>\n", <four><five/></four> ]. The first three input data nodes are concatenated and the fourth node, the second Datum tree, is passed through as a non-Stringified Datum tree.
   <concat count='3'/>
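The count rule can be sketched as follows. This simplified version calls String.valueOf() on every node and does not model the DatumWriter step for Datum trees; the class ConcatSketch and the use of a negative count for "all nodes" are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; Datum-to-XML conversion is deliberately left out. */
final class ConcatSketch {

    /**
     * Concatenates the first count nodes (all nodes when count is negative)
     * into one String and passes any remaining nodes through unchanged.
     */
    static List<Object> concat(List<?> nodes, int count) {
        int limit = (count < 0) ? nodes.size() : Math.min(count, nodes.size());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < limit; i++) {
            sb.append(String.valueOf(nodes.get(i)));
        }
        List<Object> out = new ArrayList<>();
        out.add(sb.toString());                          // the single concatenated node
        out.addAll(nodes.subList(limit, nodes.size()));  // untouched leftovers
        return out;
    }
}
```

Given [ "one", 2, "three", "four" ] and a count of 3, the sketch returns [ "one2three", "four" ].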
 
Whitespace normalizer. Converts all consecutive spans of whitespace into single space characters. By default, the space (" "), tab ("\t"), and return ("\n" and "\r") characters are considered to be whitespace. The string "  \t\t white space \r\n " becomes " white space ".
   <normalize/>
 
Normalizer with custom whitespace. Resolves all spans of custom tokens into single custom output characters. If any <token> elements are defined, the default whitespace tokens no longer apply. You can change the token the whitespace resolves to, with the resolver attribute. In the example below, all consecutive spans of space and tab characters will resolve to the '#' character. Thus, the string "  \t white space \t\n " would become "#white#space#\n#".
   <normalize resolver='#'>
     <token> </token>
     <token>\t</token>
   </normalize>
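Span collapsing with a custom token set and resolver can be illustrated with a single pass over the text. The class NormalizeSketch is hypothetical and supports single-character tokens only; exclusion areas are not modelled here.

```java
import java.util.Set;

/** Hypothetical helper; collapses runs of token characters into one resolver character. */
final class NormalizeSketch {

    static String normalize(String text, Set<Character> tokens, char resolver) {
        StringBuilder sb = new StringBuilder();
        boolean inSpan = false; // true while inside a run of token characters
        for (char c : text.toCharArray()) {
            if (tokens.contains(c)) {
                if (!inSpan) {          // emit the resolver once per run
                    sb.append(resolver);
                    inSpan = true;
                }
            } else {
                sb.append(c);
                inSpan = false;
            }
        }
        return sb.toString();
    }
}
```

With the tokens ' ' and '\t' and the resolver '#', the string "  \t white space \t\n " becomes "#white#space#\n#". The newline survives because it is no longer in the token set.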
 
Whitespace normalizer with custom exclusion areas. The exclusion areas can be delimited by a single token, with the delim attribute, or with different start and end tokens, using the start-delim and end-delim attributes. The transform below would convert the text "--one--|---two--|--three----[--four-five---]--six" into "XoneX|---two--|XthreeX[--four-five---]Xsix", normalizing anything not in an exclusion area delimited by '|...|' or '[...]'.
   <normalize resolver='X'>
     <token>-</token>
     <exclude delim='|'/>
     <exclude start-delim='[' end-delim=']'/>
   </normalize>
 

Object to ASCII converter. This processor packs simple Java objects into a packed ASCII data string according to the field specification described in AsciiFieldManager. It does its best to convert the input data objects into the field types in the spec. Any input data left over once the spec is filled is passed through, untouched. Input data that is too long for its field is clipped, which results in data loss. Input data nodes are not consumed on padding fields. For example, given the input data [ 12345, "one", "two", "three", "four" ], the processor and field spec below would result in output data of [ "1234  onetwothr   ", "four" ]. The default padding for 'x' fields is the space character.
   <to-ascii spec='4i 2x 3s[3] 3x'/>
 
Object to ASCII converter with custom padding. The padding attribute lets you change the default padding. The padding string is repeated across all padding fields. Given the input data from the previous example, the output data would be [ "1234ABonetwothrCDA", "four" ].
   <to-ascii spec='4i 2x 3s[3] 3x' padding='ABCD'/>
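The way the padding string carries over from one padding field to the next can be sketched as below. This is not the AsciiFieldManager implementation: the class AsciiPackSketch is hypothetical, it understands only the 'i', 's' and 'x' field types used in the examples, and right-padding short values with spaces is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; supports only fields of the form <width>[isx] with an optional [repeat]. */
final class AsciiPackSketch {
    private static final Pattern FIELD = Pattern.compile("(\\d+)([isx])(?:\\[(\\d+)\\])?");

    static List<Object> pack(List<?> nodes, String spec, String padding) {
        if (padding.isEmpty()) {
            throw new IllegalArgumentException("padding must not be empty");
        }
        StringBuilder packed = new StringBuilder();
        int next = 0;    // index of the next unconsumed input node
        int padPos = 0;  // running position in the repeating padding string
        for (String field : spec.trim().split("\\s+")) {
            Matcher m = FIELD.matcher(field);
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported field: " + field);
            }
            int width = Integer.parseInt(m.group(1));
            int repeat = (m.group(3) == null) ? 1 : Integer.parseInt(m.group(3));
            boolean isPadding = m.group(2).equals("x");
            for (int r = 0; r < repeat; r++) {
                if (isPadding) {
                    // Padding fields consume no input; the padding string continues where it left off.
                    for (int i = 0; i < width; i++) {
                        packed.append(padding.charAt(padPos++ % padding.length()));
                    }
                } else {
                    String value = (next < nodes.size()) ? String.valueOf(nodes.get(next++)) : "";
                    if (value.length() > width) {
                        value = value.substring(0, width); // clip oversized data
                    }
                    packed.append(value);
                    for (int i = value.length(); i < width; i++) {
                        packed.append(' ');                // assumed fill for short values
                    }
                }
            }
        }
        List<Object> out = new ArrayList<>();
        out.add(packed.toString());
        out.addAll(nodes.subList(next, nodes.size()));     // leftovers pass through
        return out;
    }
}
```

For the input [ 12345, "one", "two", "three", "four" ] and the spec '4i 2x 3s[3] 3x', a padding of " " gives [ "1234  onetwothr   ", "four" ] and a padding of "ABCD" gives [ "1234ABonetwothrCDA", "four" ].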
 
ASCII to Object converter. This processor performs the same transformation as <to-ascii> except in reverse. The input data is one or more blocks of packed ASCII data, and the output is the set of exploded Java objects and arrays from all the input data. For example, given the input data [ "1234--onetwothrxxx" ], the output data would be [ 1234, new String[] { "one", "two", "thr" } ]. The padding "--" and "xxx" are ignored. The input data [ "1234..onetwothrABC" ] would result in exactly the same output data.
   <from-ascii spec='4i 2x 3s[3] 3x'/>
 
Object to Datum converter. This processor channels input data nodes into an XML structure (actually a Bellows Datum tree), based on a push/pop stack of formatting directives. The <start-element/> element pulls the next input data node, converts it to a string, and uses that as the element name. The <end-element/> directive closes the current element. It's possible to nest elements to arbitrary depths. The <attribute/> element pulls the next two input nodes, using the first for the attribute name and the second for the attribute value. Finally, the <pcdata/> element appends the current input node to the PCDATA content of the current element. For example, the input data [ "one", "two", "three", "four", "five", "six" ] processed by the below transform would create a Datum tree corresponding to the XML: "<one three='four' five='six'>two</one>". If the input data contains more nodes than the <to-xml> transform uses, the extra nodes will be copied directly to the output, after the Datum tree.
   <to-xml>
     <start-element/>
     <pcdata/>
     <attribute/>
     <attribute/>
     <end-element/>
   </to-xml>
 
Object to Datum converter with static content. By default, the <to-xml> transform pulls all of its non-markup content from the input data nodes. However, it is possible to override that content with static text inside the transform. The element name can be set with the 'name' attribute; the attribute content can be set with the 'name' and 'value' attributes; and PCDATA content can be set by simply including it as PCDATA in the <pcdata> element. Statically set values do not consume input data. Thus, input data of [ "one", "two", "three", "four", "five", "six" ] processed with the below transform would result in output data of [ "<staticroot attr1='two' attr2='staticval'>one--three</staticroot>", "four", "five", "six" ]. The unused input nodes are passed through to the output.
   <to-xml>
     <start-element name='staticroot'/>
     <pcdata/>
     <attribute name='attr1'/>
     <attribute name='attr2' value='staticval'/>
     <pcdata>--</pcdata>
     <pcdata/>
     <end-element/>
   </to-xml>
 </transform>
 
Datum to Object converter. The <from-xml> processor decomposes Datum XML trees into component Java objects. The <query> elements select which parts of the XML document to operate on; each <from-xml> processor can contain more than one query, and queries can be nested inside of each other. Nested queries act upon the set of Datum objects selected by the parent query, with a relative path. Inside the query, commands select the content to place in the output. The <property> command looks up the named XML attribute in all selected elements. The <type> command places the element name in the output. The <datum> command copies the Datum object itself into the output. The <int>, <string>, and <float> commands place static nodes into the output. Those commands in the example below would result in new Integer(1), "two", and new Double(3.3) output nodes.
   <from-xml>
     <query path='root/child'>
       <property name='prop1'/>
       <property name='prop2'/>
       <int value='1'/>
       <string value='two'/>
       <float value='3.3'/>
       <query path='child/*[@use=yes]'>
         <type/>
         <property name='id'/>
       </query>
     </query>
     <query path='root/child[2]'>
         <datum/>
     </query>
   </from-xml>
 
XML to JavaBean converter. Maps an XML document into a JavaBean instance. Requires two input nodes: the JavaBean class, as either a String or a Class instance; and a Datum tree. The converter will do its best to recursively load the XML data into the JavaBean, matching element and attribute names to JavaBean properties. Supports many different naming styles, e.g., "my-bean", "my_bean", "MY-BEAN", "MY_BEAN", "MyBean", and "myBean". Also searches for primitive properties as both child elements and attributes.
   <xml-to-bean/>
 
JavaBean to XML converter. Generates an XML Datum tree from a JavaBean instance, converting JavaBean properties into XML-style element names, e.g., "my-bean", "my-bean-property". By default, creates all properties as nested child elements.
   <bean-to-xml/>
 
JavaBean to XML converter with collapsed attributes. Generates an XML Datum tree from a JavaBean instance, storing all primitive JavaBean properties as attributes instead of elements. Still creates child elements for complex properties like nested JavaBeans.
   <bean-to-xml collapse='true'/>
 
JavaBean to XML converter with alternate naming styles. Generates an XML Datum tree from a JavaBean instance, using different element and/or attribute (if collapse='true') naming styles. The examples below represent the styles "my-bean", "my-bean", "my_bean", "MY-BEAN", "MY_BEAN", "MyBean", and "myBean", respectively.
   <bean-to-xml naming-style='default'/>
   <bean-to-xml naming-style='lower-hyphen'/>
   <bean-to-xml naming-style='lower-underscore'/>
   <bean-to-xml naming-style='upper-hyphen'/>
   <bean-to-xml naming-style='upper-underscore'/>
   <bean-to-xml naming-style='case-delim'/>
   <bean-to-xml naming-style='javabean'/>
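To make the seven style names concrete, the sketch below derives each form from a camel-case JavaBean name. The class NamingStyleSketch is hypothetical and the word-splitting rule (a new word starts at every upper-case letter) is a simplification; it would split an acronym such as "URL" into single letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; maps a camel-case name onto the documented naming styles. */
final class NamingStyleSketch {

    /** Splits a name such as "myBean" or "MyBean" into lower-case words. */
    static List<String> words(String camelCase) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : camelCase.toCharArray()) {
            if (Character.isUpperCase(c) && word.length() > 0) {
                words.add(word.toString());
                word.setLength(0);
            }
            word.append(Character.toLowerCase(c));
        }
        if (word.length() > 0) {
            words.add(word.toString());
        }
        return words;
    }

    static String format(String camelCase, String style) {
        List<String> words = words(camelCase);
        switch (style) {
            case "default":
            case "lower-hyphen":     return String.join("-", words);
            case "lower-underscore": return String.join("_", words);
            case "upper-hyphen":     return String.join("-", words).toUpperCase(Locale.ROOT);
            case "upper-underscore": return String.join("_", words).toUpperCase(Locale.ROOT);
            case "case-delim":       return capitalizeFrom(words, 0);
            case "javabean":         return capitalizeFrom(words, 1);
            default: throw new IllegalArgumentException("unknown naming style: " + style);
        }
    }

    /** Joins the words, capitalizing every word at index firstCapital or later. */
    private static String capitalizeFrom(List<String> words, int firstCapital) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            sb.append(i >= firstCapital ? Character.toUpperCase(w.charAt(0)) + w.substring(1) : w);
        }
        return sb.toString();
    }
}
```

For the name "myBean", the seven styles in the order listed above produce "my-bean", "my-bean", "my_bean", "MY-BEAN", "MY_BEAN", "MyBean", and "myBean".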
 
Composite processors. This is an organizational wrapper which lets you group multiple processors into a single unit. Groups can be nested to arbitrary depths. Plain groups work on all types of data content.
   <group>
     <concat/>
     <normalize/>
   </group>
 
Composite processors with XML filter. This is a composite processor which lets you select a subset of an XML tree to operate on.
   <group select='root/child'>
     <xform-rename new-name='new-root'/>
     <group select='child/*'>
       <xform-rename new-name='grandchild'/>
     </group>
   </group>
 
XML processor to insert static element content. Evaluates the select query and creates a copy of the nested content in each of the matched query nodes. The example below would place a copy of the full <newContent> element, including any attributes, into all <child> elements immediately inside the base <root> element of the input tree. As with all xform processors, all non-Datum content is passed through to the output untouched.
   <xform-insert select='root/child'>
     <newContent>
       <subContent1/>
       <subContent2 prop='value'/>
     </newContent>
   </xform-insert>
 
XML processor to insert static attribute content. Creates static attributes in all matched query nodes. This example would create a 'newProp' attribute with the value of 'newValue' on all selected <child> elements.
   <xform-insert select='root/child'
     attribute="newProp" value="newValue"/>
 
XML processor to delete element content. Permanently deletes all selected nodes. This example removes all <child> elements inside the <root> element.
   <xform-delete select='root/child'/>
 
XML processor to delete attribute content. Permanently deletes the named attribute from all selected elements. The transform below removes the 'origProp' attribute from all selected <child> elements.
   <xform-delete select='root/child' 
     attribute='origProp'/>
 
XML processor to duplicate element content. Creates a new copy of the selected elements at each element that the 'dest' query selects. This example would make copies of all selected <child> elements and put them into each selected <to> element. If more than one destination node is selected, the processor will make multiple copies of the same source elements. The transform does not alter the original content, e.g., the 'root/child' nodes below.
   <xform-copy select='root/child' dest='root/to'/>
 
XML processor to duplicate attribute content. Collects all matched attributes and places all of them in each destination node, similar to the element copy processor. If more than one attribute is copied into the same destination node, the second and later attributes are mangled to keep the attribute names unique, by appending numbers to the duplicated attributes. Thus, if the example below matches three 'origProp' attributes in the selected <child> elements, the processor will create the attributes 'origProp', 'origProp2', and 'origProp3' in each destination element.
   <xform-copy select='root/from/child' dest='root/to' 
     attribute='origProp'/>
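The name-mangling rule can be sketched with an ordered map standing in for a destination element's attributes. The class AttributeMangleSketch is hypothetical, and mangling against attributes that already exist on the destination is an assumption; the source only describes collisions among the copied attributes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; a Map models the attributes of one destination element. */
final class AttributeMangleSketch {

    /** Adds each copied value under name, name2, name3, ... using the first name not already taken. */
    static Map<String, String> copyInto(Map<String, String> destAttributes, String name, List<String> values) {
        Map<String, String> result = new LinkedHashMap<>(destAttributes);
        for (String value : values) {
            String candidate = name;
            int suffix = 2; // the first duplicate becomes name2
            while (result.containsKey(candidate)) {
                candidate = name + suffix++;
            }
            result.put(candidate, value);
        }
        return result;
    }
}
```

Copying three matched 'origProp' values into an empty destination yields the attribute names 'origProp', 'origProp2', and 'origProp3', in that order.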
 
XML processor to move element content. Moves element content to other parts of the XML tree. Behaves exactly like the copy processor, except it deletes all the source nodes. If the destination selects more than one node, the source nodes will be copied separately to each destination node.
   <xform-move select='root/from/child' dest='root/to'/>
 
XML processor to move attribute content. Moves attributes to other elements in the XML tree. Behaves exactly like the copy processor, except it deletes all the source attributes. If the destination selects more than one node, the attributes will be copied separately to each destination node, with any necessary attribute name mangling.
   <xform-move select='root/from/child' dest='root/to' 
     attribute='origProp'/>
 
XML processor to rename elements. Renames all selected elements to the new name. The example below would rename all selected <child> elements to <newChild>.
   <xform-rename select='root/child' new-name='newChild'/>
 
XML processor to rename attributes. Renames the specified attribute in all selected elements. The example below would rename all 'oldProp' attributes in the selected <child> elements to 'newProp'.
   <xform-rename select='root/child' 
     attribute='oldProp' new-name='newProp'/>
 
XML processor to increase element nesting. Wraps the selected elements with a newly created wrapper element. In the example below, the processor would place all selected <child> elements into <child-wrap> elements, without losing their place in the <root> element. Thus, after the transform, the same <child> elements could be selected with a query of 'root/child-wrap/child'.
   <xform-wrap select='root/child' wrapper='child-wrap'/>
 
XML processor to decrease element nesting. Removes all selected elements without deleting the child content of those elements. Essentially a non-recursive delete. The inline transform is the opposite of the wrap transform. All inlined content is inserted in place; if an inlined element has more than one child, all children will be inserted into the parent where the former inlined element was. This may offset the index counts of later elements. All attributes in the inlined elements are lost. Thus, the transform below would convert the sample input data into the sample output data below:
   <xform-inline select='root/child'/>
 
INPUT:
   <root>
     <child>
       <grandchild1/>
       <grandchild2/>
     </child>
     <other/>
     <child>
       <grandchild3/>
     </child>
   </root>
 
OUTPUT:
   <root>
     <grandchild1/>
     <grandchild2/>
     <other/>
     <grandchild3/>
   </root>
 
XML processor to convert attributes into elements. Converts attributes into PCDATA elements. For each of the selected nodes, the processor will move the requested attribute into a child element, placing the content into PCDATA inside the element. In the example below, an element "<child prop='value'/>" would become "<child><prop>value</prop></child>". Elements without the attribute will not be altered.
   <xform-to-element select='root/child' attribute='prop'/>
 
XML processor to convert element PCDATA content into attributes. The reverse of <xform-to-element>, this processor converts PCDATA elements into attributes on the parent element. First it extracts all PCDATA from all selected elements and appends it together into a single string, then assigns it to the named attribute. The entire content of all nodes becomes one attribute, and any attributes in the selected nodes are lost. The transform below would convert the input data into the output data shown below.
   <xform-to-attribute select='root/child' attribute='prop'/>
 
INPUT:
   <root>
     <child child-prop='child-prop-value'>CHILD1</child>
     <notChild>NOT-CHILD</notChild>
     <child>CHILD2</child>
   </root>
 
OUTPUT:
   <root prop='CHILD1CHILD2'>
     <notChild>NOT-CHILD</notChild>
   </root>
 
XML processor to change naming styles. Recursively converts the selected elements and their attributes into the requested naming style. Uses the same styles as the <bean-to-xml> transform above. An optional select parameter specifies which branches to convert; if the select query is omitted, the processor will convert the entire tree.
   <xform-style new-style='javabean' select='root/child'/>
 
XML processor to collapse simple PCDATA elements into attributes. Recursively converts all elements which contain only PCDATA into attributes of the same name in the parent element. An optional select parameter specifies which branches to convert; if the select query is omitted, the processor will convert the entire tree. Elements which contain other element content will not be converted. Given the transform below, the XML "<root><child>CHILD-DATA</child></root>" would become "<root child='CHILD-DATA'/>".
   <xform-style new-style='collapsed' select='root/child'/>
 
XML processor to expand attributes into PCDATA elements. The reverse of the collapse transform. Given the transform below, the XML "<root child='CHILD-DATA'/>" would become "<root><child>CHILD-DATA</child></root>".
   <xform-style new-style='expanded'
     select='root/child'/>
 
User-defined processors. If the above processors aren't enough, or if a custom processor would greatly simplify the transformation process, you can implement a processor of your own and invoke it anywhere in the transform. The processor must derive from CustomNodeProcessor. The <custom> processor XML itself is passed to the custom processor, which can use any attributes or sub-elements inside <custom> to initialize itself.
   <custom class='org.mypackage.MyCustomProcessor' param='value1'>
     <param value='value2'/>
     <param value='value3'/>
   </custom>
 
 </transform>
 

Author:
jsheets

Constructor Summary
Transformer(org.writersforge.bellows.Datum transform)
          Creates a new instance of Transformer, loading a collection of NodeProcessor implementations based on the specification in the supplied Datum tree.
 
Method Summary
 java.lang.String[] getProcessorIds()
          Retrieves an array of unique identifiers for all NodeProcessors in this Transformer.
 java.util.List process(java.lang.String[] ids, java.util.List nodes)
          Runs the List of input nodes through a set of processors, by id.
 java.util.List processAll(java.util.List nodes)
          Runs the List of input nodes through every processor in the Transformer's XML specification, in order.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Transformer

public Transformer(org.writersforge.bellows.Datum transform)
Creates a new instance of Transformer, loading a collection of NodeProcessor implementations based on the specification in the supplied Datum tree.

Parameters:
transform - the XML specification

Method Detail

processAll

public java.util.List processAll(java.util.List nodes)
Runs the List of input nodes through every processor in the Transformer's XML specification, in order.

Parameters:
nodes - input data nodes
Returns:
the processed output data nodes

getProcessorIds

public java.lang.String[] getProcessorIds()
Retrieves an array of unique identifiers for all NodeProcessors in this Transformer. A processor's id can be explicitly specified in the XML specification with the 'id' attribute. If the 'id' attribute does not exist for a processor, the id becomes the fully qualified class of the NodeProcessor followed by a space and the zero-based index of the processor in the XML specification. For example, given this XML specification:
 <transform>
   <replace oldtext='oldtext1' newtext='newtext1'/>
   <replace id='textreplace' oldtext='oldtext2' newtext='newtext2'/>
   <replace oldtext='oldtext3' newtext='newtext3'/>
 </transform>

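The returned ids would be the generated id of the first processor (its fully qualified class name followed by " 0"), the explicit id "textreplace", and the generated id of the third processor (its class name followed by " 2").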

If any processors share the same explicit id, the second and later duplicates will all be treated as if they had no explicit id. Thus, this XML specification:

 <transform>
   <replace id='textreplace' oldtext='oldtext1' newtext='newtext1'/>
   <replace id='textreplace' oldtext='oldtext2' newtext='newtext2'/>
   <replace id='textreplace' oldtext='oldtext3' newtext='newtext3'/>
 </transform>

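would result in the id "textreplace" for the first processor only; the second and third processors would receive generated ids ending in " 1" and " 2".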
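The id rules described above can be sketched as a small standalone routine. This is not the Transformer source: the class ProcessorIdSketch, its parameters, and the class name "example.Replacer" used in the note below are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; reproduces the documented id assignment rules. */
final class ProcessorIdSketch {

    /**
     * explicitIds[i] is the 'id' attribute of processor i, or null when absent;
     * classNames[i] is the fully qualified class name of that processor.
     */
    static String[] assignIds(String[] explicitIds, String[] classNames) {
        String[] ids = new String[explicitIds.length];
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < ids.length; i++) {
            String explicit = explicitIds[i];
            // Set.add returns false for a repeated explicit id, which then falls back to the generated id.
            if (explicit != null && seen.add(explicit)) {
                ids[i] = explicit;
            } else {
                ids[i] = classNames[i] + " " + i; // class name, a space, zero-based index
            }
        }
        return ids;
    }
}
```

With three processors of the made-up class "example.Replacer", explicit ids of { null, "textreplace", null } give { "example.Replacer 0", "textreplace", "example.Replacer 2" }, and three identical explicit ids of "textreplace" give { "textreplace", "example.Replacer 1", "example.Replacer 2" }.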

Returns:
an array of processor identifiers

process

public java.util.List process(java.lang.String[] ids,
                              java.util.List nodes)
Runs the List of input nodes through a set of processors, by id. The input data will be sent through the processors in the order they appear in the ids array, even if that means the same id is run more than once.

Parameters:
ids - the ids of the transforms to run
nodes - input data nodes
Returns:
the processed output data nodes
Throws:
java.lang.IllegalArgumentException - if an id in ids does not exist in the XML specification
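The ordering and error behaviour of process can be illustrated with a map of stand-in processors. The class ProcessByIdSketch and the use of UnaryOperator to model a processor are assumptions of this sketch; whether the real method checks all ids before running any processor is not stated in the source, so the up-front validation here is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical helper; each processor is modelled as a function from node list to node list. */
final class ProcessByIdSketch {

    static List<Object> process(Map<String, UnaryOperator<List<Object>>> processors,
                                String[] ids, List<Object> nodes) {
        // Reject unknown ids before running anything (assumed ordering of the check).
        for (String id : ids) {
            if (!processors.containsKey(id)) {
                throw new IllegalArgumentException("unknown processor id: " + id);
            }
        }
        // Run the processors in the order given, repeating an id as often as it appears.
        List<Object> current = nodes;
        for (String id : ids) {
            current = processors.get(id).apply(current);
        }
        return current;
    }
}
```

Passing the ids { "a", "b", "a" } runs processor "a", then "b", then "a" again, each on the output of the previous step.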