public abstract class AbstractTfidfVectorizer<VECTOR_TYPE> extends TextVectorizer<VECTOR_TYPE>

Inherited nested classes/interfaces: Vectorizer.RecordCallBack

Inherited fields: cache, MIN_WORD_FREQUENCY, minWordFrequency, STOP_WORDS, stopWords, TOKENIZER, tokenizerFactory, VOCAB_CACHE

| Constructor and Description |
|---|
| AbstractTfidfVectorizer() |

| Modifier and Type | Method and Description |
|---|---|
| TokenizerFactory | createTokenizerFactory(Configuration conf): Create a tokenizer factory based on the configuration |
| abstract VECTOR_TYPE | createVector(java.lang.Object[] args): Create a vector based on the given arguments |
| void | doWithTokens(Tokenizer tokenizer): Increment counts, add to collection,... |
| abstract VECTOR_TYPE | fitTransform(RecordReader reader): Fit based on a record reader |
| abstract VECTOR_TYPE | transform(Record record): Transform a record into a vector |

Methods inherited from class TextVectorizer: fit, fit, initialize, toString, wordFrequenciesForRecord

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Vectorizer: fitTransform

public void doWithTokens(Tokenizer tokenizer)
Description copied from: TextVectorizer
doWithTokens in class TextVectorizer<VECTOR_TYPE>

public TokenizerFactory createTokenizerFactory(Configuration conf)
Description copied from: TextVectorizer
createTokenizerFactory in class TextVectorizer<VECTOR_TYPE>
Parameters: conf - the configuration to use

public abstract VECTOR_TYPE createVector(java.lang.Object[] args)
Description copied from: Vectorizer
Parameters: args - the arguments to create a vector with

public abstract VECTOR_TYPE fitTransform(RecordReader reader)
Description copied from: Vectorizer

public abstract VECTOR_TYPE transform(Record record)
Description copied from: Vectorizer
Parameters: record - the record to write
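
The division of labour described above (shared token counting in doWithTokens, with vector construction left to the abstract createVector and transform methods) can be sketched in plain Java. This is only an illustration under stated assumptions: TfidfSketch, SketchVectorizer and ListVectorizer are hypothetical stand-ins, not the library's classes; documents are modelled as lists of tokens rather than Record or Tokenizer objects; and the weighting used (raw term count times ln(N / document frequency)) is one common TF-IDF variant, which may differ from the formula the library applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; this is NOT the library class.
final class TfidfSketch {

    /** Counting is shared here; vector creation is left to subclasses. */
    abstract static class SketchVectorizer<V> {
        // Number of documents containing each word (sorted for a stable vocabulary order).
        protected final Map<String, Integer> docFrequency = new TreeMap<>();
        protected int numDocs = 0;

        /** Rough analogue of doWithTokens: update counts from one document's tokens. */
        void doWithTokens(List<String> tokens) {
            numDocs++;
            // Each distinct word counts once per document.
            for (String word : new HashSet<>(tokens)) {
                docFrequency.merge(word, 1, Integer::sum);
            }
        }

        /** Fit on a whole corpus by feeding every document through doWithTokens. */
        void fit(List<List<String>> corpus) {
            for (List<String> doc : corpus) {
                doWithTokens(doc);
            }
        }

        /** TF-IDF weight for every vocabulary word, in vocabulary order (assumed formula). */
        double[] tfidf(List<String> tokens) {
            Map<String, Integer> termCounts = new HashMap<>();
            for (String t : tokens) {
                termCounts.merge(t, 1, Integer::sum);
            }
            double[] weights = new double[docFrequency.size()];
            int i = 0;
            for (Map.Entry<String, Integer> e : docFrequency.entrySet()) {
                int count = termCounts.getOrDefault(e.getKey(), 0);
                double idf = Math.log((double) numDocs / e.getValue());
                weights[i++] = count * idf;
            }
            return weights;
        }

        /** Rough analogue of createVector(Object[] args): subclasses pick the vector type. */
        abstract V createVector(Object[] args);

        /** Rough analogue of transform: turn one document into a vector. */
        V transform(List<String> tokens) {
            return createVector(new Object[] { tfidf(tokens) });
        }
    }

    /** Concrete subclass that chooses List<Double> as its vector type. */
    static final class ListVectorizer extends SketchVectorizer<List<Double>> {
        @Override
        List<Double> createVector(Object[] args) {
            double[] weights = (double[]) args[0];
            List<Double> out = new ArrayList<>(weights.length);
            for (double w : weights) {
                out.add(w);
            }
            return out;
        }
    }

    public static void main(String[] argv) {
        ListVectorizer v = new ListVectorizer();
        v.fit(Arrays.asList(Arrays.asList("cat", "sat"), Arrays.asList("cat", "ran")));
        // Vocabulary order is cat, ran, sat; "cat" appears in every document, so its weight is 0.
        System.out.println(v.transform(Arrays.asList("cat", "sat", "sat")));
    }
}
```

Keeping the vector type generic means the counting logic is written once, while each subclass decides how the weights are packaged, which is the role VECTOR_TYPE plays in the class signature.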