public interface VocabCache
Modifier and Type | Method and Description |
---|---|
double | `idf(java.lang.String word)` The inverse document frequency of a word, based on the number of documents the word has occurred in |
void | `incrementCount(java.lang.String word)` Increment a word's count by 1 |
void | `incrementCount(java.lang.String word, double by)` Increment a word's count by the given amount |
void | `incrementDocCount(java.lang.String word)` Increment a word's document count by 1 |
void | `incrementDocCount(java.lang.String word, double by)` Increment a word's document count by the given amount |
void | `incrementNumDocs(double by)` Increment the total number of documents by the given amount |
void | `initialize(Configuration conf)` Initialize the cache from the given configuration |
int | `minWordFrequency()` The minimum frequency a word needs in order to be included in the vocab (default 5) |
double | `numDocs()` The total number of documents |
double | `tfidf(java.lang.String word, double frequency)` Calculate the tf-idf of a word given its frequency |
Index | `vocabWords()` All of the vocab words, in order; note that these are not all of the possible tokens |
java.lang.String | `wordAt(int i)` Returns the word at a particular index in the vocab |
double | `wordFrequency(java.lang.String word)` Get the frequency of a word |
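
The summary separates three counters: a per-word count, a per-word document count, and a total document count. The following is a minimal sketch of how an in-memory store might keep them, assuming one `HashMap` per counter. The class name `InMemoryCounts` and its `docCount` accessor are hypothetical, and the class deliberately does not implement `VocabCache` (the `Index` and `Configuration` types are left out).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified in-memory store for the counting half of VocabCache.
 * It does NOT implement the real interface; it only shows how the count,
 * doc-count and num-docs methods relate to each other.
 */
class InMemoryCounts {
    private final Map<String, Double> wordCounts = new HashMap<>();
    private final Map<String, Double> docCounts = new HashMap<>();
    private double numDocs = 0.0;

    // Single-argument overloads delegate to the two-argument forms with by = 1.
    void incrementCount(String word) { incrementCount(word, 1.0); }

    void incrementCount(String word, double by) {
        wordCounts.merge(word, by, Double::sum); // add `by` to the existing count (or start at `by`)
    }

    void incrementDocCount(String word) { incrementDocCount(word, 1.0); }

    void incrementDocCount(String word, double by) {
        docCounts.merge(word, by, Double::sum);
    }

    void incrementNumDocs(double by) { numDocs += by; }

    double wordFrequency(String word) { return wordCounts.getOrDefault(word, 0.0); }

    // Hypothetical accessor; the interface above does not expose raw document counts.
    double docCount(String word) { return docCounts.getOrDefault(word, 0.0); }

    double numDocs() { return numDocs; }

    public static void main(String[] args) {
        InMemoryCounts counts = new InMemoryCounts();
        counts.incrementCount("cat");
        counts.incrementCount("cat", 2.0);
        counts.incrementDocCount("cat");
        counts.incrementNumDocs(1.0);
        System.out.println(counts.wordFrequency("cat")); // 3.0
        System.out.println(counts.docCount("cat"));      // 1.0
        System.out.println(counts.numDocs());            // 1.0
    }
}
```

Using `double` for every counter mirrors the `double` return types and `by` parameters in the summary table, which allow fractional (for example, weighted) increments.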

Method Detail

void incrementNumDocs(double by)
Parameters:
by - the amount to increment the number of documents by

double numDocs()
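
How `numDocs` relates to the per-word counters depends on how the caller feeds documents in. The sketch below shows one plausible protocol, assuming that every token occurrence bumps the word count, each distinct word bumps its document count once per document, and each document bumps `numDocs` by 1. `CorpusCounter` and `addDocument` are hypothetical names, not part of `VocabCache`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of feeding tokenized documents into the three counters.
 * The comments name the VocabCache method each step corresponds to.
 */
class CorpusCounter {
    final Map<String, Double> wordCounts = new HashMap<>();
    final Map<String, Double> docCounts = new HashMap<>();
    double numDocs = 0.0;

    void addDocument(List<String> tokens) {
        for (String token : tokens) {
            wordCounts.merge(token, 1.0, Double::sum);   // like incrementCount(word)
        }
        for (String token : new HashSet<>(tokens)) {     // each distinct word once per document
            docCounts.merge(token, 1.0, Double::sum);    // like incrementDocCount(word)
        }
        numDocs += 1.0;                                   // like incrementNumDocs(1)
    }

    public static void main(String[] args) {
        CorpusCounter c = new CorpusCounter();
        c.addDocument(Arrays.asList("the", "cat", "saw", "the", "dog"));
        c.addDocument(Arrays.asList("the", "dog", "ran"));
        System.out.println(c.wordCounts.get("the")); // 3.0
        System.out.println(c.docCounts.get("the"));  // 2.0
        System.out.println(c.numDocs);               // 2.0
    }
}
```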

java.lang.String wordAt(int i)
Parameters:
i - the index of the word to get

void initialize(Configuration conf)
Parameters:
conf - the configuration to initialize with

double wordFrequency(java.lang.String word)
Parameters:
word - the word to get the frequency for

int minWordFrequency()

Index vocabWords()
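
`minWordFrequency`, `vocabWords` and `wordAt` together describe a filtered, ordered vocabulary. The sketch below builds such a list from raw counts, assuming that ordering means descending count with ties broken alphabetically; the documentation above only says the words are "ordered". It returns a plain `List<String>` instead of the `Index` type, and `VocabBuilder`, `buildVocab` and `DEFAULT_MIN_WORD_FREQUENCY` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: keep only words whose count reaches the minimum word
 * frequency (default 5, per the docs above) and order them. The ordering used
 * here (descending count, then alphabetical) is an ASSUMPTION.
 */
class VocabBuilder {
    static final int DEFAULT_MIN_WORD_FREQUENCY = 5;

    static List<String> buildVocab(Map<String, Double> counts, int minWordFrequency) {
        List<String> vocab = new ArrayList<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            if (e.getValue() >= minWordFrequency) {
                vocab.add(e.getKey());
            }
        }
        vocab.sort((a, b) -> {
            int byCount = Double.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);             // deterministic tie-break
        });
        return vocab;
    }

    // Analogue of wordAt(int i): the word at index i of the ordered vocab.
    static String wordAt(List<String> vocab, int i) {
        return vocab.get(i);
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("the", 12.0);
        counts.put("cat", 5.0);
        counts.put("dog", 5.0);
        counts.put("ran", 2.0); // below the threshold, so excluded
        List<String> vocab = buildVocab(counts, DEFAULT_MIN_WORD_FREQUENCY);
        System.out.println(vocab);            // [the, cat, dog]
        System.out.println(wordAt(vocab, 1)); // cat
    }
}
```

Words that fall below the threshold are dropped, which is one reason the vocab words are not all of the possible tokens.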

void incrementDocCount(java.lang.String word)
Parameters:
word - the word to increment the document count for

void incrementDocCount(java.lang.String word, double by)
Parameters:
word - the word to increment the document count for
by - the amount to increment by

void incrementCount(java.lang.String word)
Parameters:
word - the word to increment the count for

void incrementCount(java.lang.String word, double by)
Parameters:
word - the word to increment the count for
by - the amount to increment by

double idf(java.lang.String word)
Parameters:
word - the word to get the idf for

double tfidf(java.lang.String word, double frequency)
Parameters:
word - the word to calculate the tf-idf for
frequency - the frequency of the word
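
The interface does not state which formula `idf` and `tfidf` use. The sketch below assumes the common textbook definitions, idf(w) = ln(numDocs / docCount(w)) and tfidf(w) = frequency * idf(w), with no smoothing; a real implementation may use a different log base or smoothing. `TfIdf` is a hypothetical helper class that takes the counts as explicit arguments.

```java
/**
 * Hypothetical sketch of idf/tfidf using textbook definitions (an ASSUMPTION):
 *   idf(w)   = ln(numDocs / docCount(w))
 *   tfidf(w) = frequency * idf(w)
 */
class TfIdf {
    static double idf(double numDocs, double docCount) {
        if (docCount <= 0) {
            // Word never seen in any document: give it zero weight (an arbitrary
            // choice here; the unsmoothed formula would otherwise be infinite).
            return 0.0;
        }
        return Math.log(numDocs / docCount); // natural log
    }

    static double tfidf(double frequency, double numDocs, double docCount) {
        return frequency * idf(numDocs, docCount);
    }

    public static void main(String[] args) {
        // 2 documents: "the" occurs in both, "cat" occurs in only one.
        System.out.println(idf(2.0, 2.0));        // 0.0, a word in every document carries no weight
        System.out.println(tfidf(3.0, 2.0, 1.0)); // 3 * ln(2), about 2.08
    }
}
```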