public abstract class TokenizerBase
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class | TokenizerBase.Builder: Abstract Builder shared by all tokenizers |
static class | TokenizerBase.Mode |
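
The relationship between an abstract builder and the protected `configure(TokenizerBase.Builder builder)` method can be illustrated with a minimal sketch. Every name below (`BuilderSketch`, `BaseBuilder`, `BaseTokenizer`, `SimpleTokenizer`, `SketchMode` and its constants) is a hypothetical stand-in, not the library's own class, and the real `Builder` very likely carries more settings than the single mode shown here.

```java
// A sketch only: all types here are hypothetical stand-ins for illustration.
class BuilderSketch {
    // Assumed constants; consult TokenizerBase.Mode for the real ones.
    enum SketchMode { NORMAL, SEARCH }

    // Plays the role of an abstract builder shared by all tokenizers.
    abstract static class BaseBuilder {
        protected SketchMode mode = SketchMode.NORMAL;

        BaseBuilder mode(SketchMode mode) {
            this.mode = mode;
            return this;
        }

        abstract BaseTokenizer build();
    }

    // Plays the role of the abstract tokenizer base class.
    abstract static class BaseTokenizer {
        protected SketchMode mode;

        // Copies the shared settings from the builder into the tokenizer.
        protected void configure(BaseBuilder builder) {
            this.mode = builder.mode;
        }

        SketchMode mode() {
            return mode;
        }
    }

    // A concrete tokenizer with its own builder subclass.
    static final class SimpleTokenizer extends BaseTokenizer {
        SimpleTokenizer(BaseBuilder builder) {
            configure(builder);
        }

        static final class Builder extends BaseBuilder {
            @Override
            SimpleTokenizer build() {
                return new SimpleTokenizer(this);
            }
        }
    }

    public static void main(String[] args) {
        BaseTokenizer tokenizer =
                new SimpleTokenizer.Builder().mode(SketchMode.SEARCH).build();
        System.out.println(tokenizer.mode()); // prints SEARCH
    }
}
```

Keeping the shared settings in the base builder means each concrete tokenizer only has to supply `build()`, while the base class decides how those settings are applied.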
Modifier and Type | Field and Description |
---|---|
protected java.util.EnumMap<ViterbiNode.Type,Dictionary> | dictionaryMap |
protected TokenFactory | tokenFactory |
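
The `dictionaryMap` field is an `EnumMap` keyed by node type, so each kind of lattice node can be resolved against its own dictionary. The sketch below shows that lookup pattern with `java.util.EnumMap`. The enum constants, the `SketchDictionary` interface, and the `lookup` helper are assumptions made for illustration; check `ViterbiNode.Type` and `Dictionary` for the real definitions.

```java
import java.util.EnumMap;
import java.util.Map;

class DictionaryMapSketch {
    // Assumed node kinds for illustration; not taken from ViterbiNode.Type.
    enum NodeType { KNOWN, UNKNOWN, USER }

    // Hypothetical stand-in for the Dictionary type.
    interface SketchDictionary {
        String describe(int wordId);
    }

    static EnumMap<NodeType, SketchDictionary> buildMap() {
        EnumMap<NodeType, SketchDictionary> map = new EnumMap<>(NodeType.class);
        map.put(NodeType.KNOWN, id -> "known#" + id);
        map.put(NodeType.UNKNOWN, id -> "unknown#" + id);
        // USER is deliberately left unmapped to show that get() can return null.
        return map;
    }

    // Picks the dictionary registered for the node type, if any.
    static String lookup(Map<NodeType, SketchDictionary> map, NodeType type, int wordId) {
        SketchDictionary dictionary = map.get(type);
        return dictionary == null
                ? "no dictionary for " + type
                : dictionary.describe(wordId);
    }

    public static void main(String[] args) {
        EnumMap<NodeType, SketchDictionary> map = buildMap();
        System.out.println(lookup(map, NodeType.KNOWN, 7)); // prints known#7
        System.out.println(lookup(map, NodeType.USER, 1));  // prints no dictionary for USER
    }
}
```

An `EnumMap` suits this role because its keys are restricted to one enum type and it is backed by an array indexed by ordinal, which keeps per-node lookups cheap.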
Constructor and Description |
---|
TokenizerBase() |
Modifier and Type | Method and Description |
---|---|
protected void | configure(TokenizerBase.Builder builder) |
protected <T extends TokenBase> java.util.List<T> | createTokenList(java.lang.String text): Tokenizes the provided text and returns a list of tokens with various feature information |
void | debugLattice(java.io.OutputStream outputStream, java.lang.String text): Writes the Viterbi lattice for the provided text to an output stream |
void | debugTokenize(java.io.OutputStream outputStream, java.lang.String text): Tokenizes the provided text and outputs the corresponding Viterbi lattice and the Viterbi path to the provided output stream |
java.util.List<? extends TokenBase> | tokenize(java.lang.String text) |
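
The signatures above suggest how a concrete tokenizer can expose its own token type: `tokenize` returns `List<? extends TokenBase>`, while the generic `createTokenList` lets the caller pick `T`. The following is a minimal sketch of that pattern, assuming a subclass narrows the return type of `tokenize`. All class names are hypothetical stand-ins, the `newToken` hook is invented for the example, and whitespace splitting replaces the real lattice-based segmentation.

```java
import java.util.ArrayList;
import java.util.List;

class TypedTokenizeSketch {
    // Hypothetical stand-in for TokenBase.
    static class SketchTokenBase {
        private final String surface;
        SketchTokenBase(String surface) { this.surface = surface; }
        String getSurface() { return surface; }
    }

    // Hypothetical concrete token type.
    static class SketchToken extends SketchTokenBase {
        SketchToken(String surface) { super(surface); }
    }

    abstract static class SketchTokenizerBase {
        // Invented hook: subclasses decide which concrete token gets built.
        protected abstract SketchTokenBase newToken(String surface);

        // Same shape as tokenize(String): callers only see the base type.
        public List<? extends SketchTokenBase> tokenize(String text) {
            return createTokenList(text);
        }

        // Same shape as createTokenList(String): the call site fixes T.
        @SuppressWarnings("unchecked")
        protected <T extends SketchTokenBase> List<T> createTokenList(String text) {
            List<T> tokens = new ArrayList<>();
            // Whitespace splitting stands in for real segmentation.
            for (String piece : text.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add((T) newToken(piece));
                }
            }
            return tokens;
        }
    }

    static class SketchTokenizer extends SketchTokenizerBase {
        @Override
        protected SketchTokenBase newToken(String surface) {
            return new SketchToken(surface);
        }

        // Covariant override: List<SketchToken> is a subtype of
        // List<? extends SketchTokenBase>, so the narrower type is allowed.
        @Override
        public List<SketchToken> tokenize(String text) {
            return createTokenList(text);
        }
    }

    static List<String> surfaces(String text) {
        List<String> out = new ArrayList<>();
        for (SketchToken token : new SketchTokenizer().tokenize(text)) {
            out.add(token.getSurface());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(surfaces("tokenize this text")); // prints [tokenize, this, text]
    }
}
```

The unchecked cast inside `createTokenList` is safe only as long as the factory hook really produces the type the subclass asks for, which is the trade-off of this design.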
protected TokenFactory tokenFactory
protected java.util.EnumMap<ViterbiNode.Type,Dictionary> dictionaryMap
protected void configure(TokenizerBase.Builder builder)
public java.util.List<? extends TokenBase> tokenize(java.lang.String text)
protected <T extends TokenBase> java.util.List<T> createTokenList(java.lang.String text)
This method is thread safe
T - token type
text - text to tokenize
public void debugTokenize(java.io.OutputStream outputStream, java.lang.String text) throws java.io.IOException
The output is written in DOT format.
This method is not thread safe
outputStream - output stream to write to
text - text to tokenize
java.io.IOException - if an error occurs when writing the lattice and path
public void debugLattice(java.io.OutputStream outputStream, java.lang.String text) throws java.io.IOException
The output is written in DOT format.
This method is not thread safe
outputStream - output stream to write to
text - text to create lattice for
java.io.IOException - if an error occurs when writing the lattice
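
Because both debug methods write DOT to a caller-supplied `OutputStream` and are documented as not thread safe, one way to use them is to capture the output in memory and serialize calls behind a lock. The sketch below shows that calling pattern. `writeChainAsDot` is a hypothetical stand-in with the same parameter shape as the debug methods; it emits a simple chain of nodes, not the library's actual lattice layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class DebugOutputSketch {
    // One shared lock so that debug calls never run concurrently.
    private static final Object DEBUG_LOCK = new Object();

    // Quotes a DOT identifier, escaping backslashes and double quotes.
    static String quote(String id) {
        return "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Hypothetical stand-in for a debug method: writes BOS -> words -> EOS as DOT.
    static void writeChainAsDot(OutputStream out, String text) throws IOException {
        StringBuilder dot = new StringBuilder("digraph sketch {\n");
        String previous = "BOS";
        for (String piece : text.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue;
            }
            dot.append("  ").append(quote(previous))
               .append(" -> ").append(quote(piece)).append(";\n");
            previous = piece;
        }
        dot.append("  ").append(quote(previous)).append(" -> ")
           .append(quote("EOS")).append(";\n}\n");
        out.write(dot.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Captures the DOT text in memory while holding the lock.
    static String captureDot(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        synchronized (DEBUG_LOCK) {
            try {
                writeChainAsDot(buffer, text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(captureDot("a b"));
    }
}
```

DOT text captured this way can be saved to a file and rendered with Graphviz, for example with `dot -Tpng`.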