org.writersforge.catalan.text.extractors
Class RegexpSplitter

java.lang.Object
  extended by org.writersforge.catalan.text.extractors.RegexpSplitter
All Implemented Interfaces:
ITextExtractor

public class RegexpSplitter
extends java.lang.Object
implements ITextExtractor

Text splitter that identifies delimiters by matching a regular expression. A single match of the regular expression may span multiple lines of the input text.
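
As an illustration of a delimiter that spans several lines, the following is a minimal sketch built only on java.util.regex. The class MultilineDelimiterSketch and its split method are hypothetical stand-ins introduced for this example; they are not part of this package and may not reflect how RegexpSplitter is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the library's RegexpSplitter.
class MultilineDelimiterSketch {

    // Splits text at every match of regexp and drops the matched delimiters.
    static List<String> split(String text, String regexp) {
        Matcher matcher = Pattern.compile(regexp).matcher(text);
        List<String> fragments = new ArrayList<>();
        int start = 0;
        while (matcher.find()) {
            // Everything between the previous delimiter and this one is a fragment.
            fragments.add(text.substring(start, matcher.start()));
            start = matcher.end();
        }
        // Text after the last delimiter (or the whole text if nothing matched).
        fragments.add(text.substring(start));
        return fragments;
    }

    public static void main(String[] args) {
        String text = "Chapter one.\n\n***\n\nChapter two.";
        // The delimiter is a "***" line plus the blank lines around it,
        // so one match covers several lines of the input.
        List<String> fragments = split(text, "\\n\\s*\\*{3}\\s*\\n");
        System.out.println(fragments); // prints [Chapter one., Chapter two.]
    }
}
```

Because the pattern is matched against the whole document rather than line by line, a delimiter can include line breaks without any special handling.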

Author:
jsheets

Constructor Summary
RegexpSplitter(java.lang.String regexp, boolean keepDelimiters)
          Creates a new instance of RegexpSplitter.
 
Method Summary
 java.lang.String[] extractText(java.lang.String text)
          Extracts fragments of text from the input text document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegexpSplitter

public RegexpSplitter(java.lang.String regexp,
                      boolean keepDelimiters)
Creates a new instance of RegexpSplitter.

Parameters:
regexp - regular expression that identifies delimiters in the input text
keepDelimiters - true to pass the matched delimiter lines to the output as separate fragments; false to omit them
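
The effect of keepDelimiters can be sketched as follows. KeepDelimitersSketch is a hypothetical class written for this example; it mirrors the constructor and extractText signatures documented here, but its body is an assumption about the behavior, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the library's RegexpSplitter.
class KeepDelimitersSketch {
    private final Pattern pattern;
    private final boolean keepDelimiters;

    KeepDelimitersSketch(String regexp, boolean keepDelimiters) {
        this.pattern = Pattern.compile(regexp);
        this.keepDelimiters = keepDelimiters;
    }

    String[] extractText(String text) {
        List<String> fragments = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        int start = 0;
        while (matcher.find()) {
            fragments.add(text.substring(start, matcher.start()));
            if (keepDelimiters) {
                // Assumed behavior: the delimiter becomes its own output entry.
                fragments.add(matcher.group());
            }
            start = matcher.end();
        }
        fragments.add(text.substring(start));
        return fragments.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "intro\n---\nbody";
        String regexp = "\\n-{3}\\n";
        // With keepDelimiters: "intro", the delimiter, "body".
        System.out.println(new KeepDelimitersSketch(regexp, true).extractText(text).length);  // prints 3
        // Without keepDelimiters: "intro", "body".
        System.out.println(new KeepDelimitersSketch(regexp, false).extractText(text).length); // prints 2
    }
}
```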

Method Detail

extractText

public java.lang.String[] extractText(java.lang.String text)
Extracts fragments of text from the input text document.

This implementation locates delimiters by matching the regular expression passed to the constructor against the input text.

Specified by:
extractText in interface ITextExtractor
Parameters:
text - input text document
Returns:
extracted text fragments