public interface TextVectorizer extends Vectorizer
Modifier and Type | Method and Description |
---|---|
`void` | `fit()`: Train the model |
`InvertedIndex<VocabWord>` | `getIndex()`: Inverted index |
`VocabCache<VocabWord>` | `getVocabCache()`: The vocab sorted in descending order |
`long` | `numWordsEncountered()`: Returns the number of words encountered so far |
`org.nd4j.linalg.api.ndarray.INDArray` | `transform(java.util.List<java.lang.String> tokens)`: Transforms the matrix |
`org.nd4j.linalg.api.ndarray.INDArray` | `transform(java.lang.String text)`: Transforms the matrix |
`org.nd4j.linalg.dataset.DataSet` | `vectorize(java.io.File input, java.lang.String label)` |
`org.nd4j.linalg.dataset.DataSet` | `vectorize(java.io.InputStream is, java.lang.String label)`: Text coming from an input stream considered as one document |
`org.nd4j.linalg.dataset.DataSet` | `vectorize(java.lang.String text, java.lang.String label)`: Vectorizes the passed in text treating it as one document |
Methods inherited from interface Vectorizer: `vectorize`
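
The fit-then-transform contract summarized above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the DL4J implementation: `ToyCountVectorizer` and its `vocabulary()` method are hypothetical names, the corpus is assumed to be pre-tokenized, a `double[]` stands in for `INDArray`, and "descending order" is assumed to mean descending word frequency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not DL4J code.
final class ToyCountVectorizer {
    private final List<List<String>> corpus;               // pre-tokenized documents
    private final Map<String, Integer> counts = new HashMap<>();
    private final List<String> vocab = new ArrayList<>();  // most frequent word first
    private long wordsEncountered;

    ToyCountVectorizer(List<List<String>> corpus) {
        this.corpus = corpus;
    }

    // Analogue of fit(): one pass over the corpus to build the vocabulary.
    void fit() {
        counts.clear();
        vocab.clear();
        wordsEncountered = 0;
        for (List<String> doc : corpus) {
            for (String token : doc) {
                counts.merge(token, 1, Integer::sum);
                wordsEncountered++;
            }
        }
        vocab.addAll(counts.keySet());
        // Descending frequency; ties are broken alphabetically so the order is stable.
        vocab.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
    }

    // Analogue of getVocabCache(): the vocabulary, most frequent word first.
    List<String> vocabulary() {
        return List.copyOf(vocab);
    }

    // Analogue of numWordsEncountered(): total tokens seen during fit().
    long numWordsEncountered() {
        return wordsEncountered;
    }

    // Analogue of transform(List<String>): one count per vocabulary word.
    // Tokens that were never seen during fit() are ignored.
    double[] transform(List<String> tokens) {
        double[] vector = new double[vocab.size()];
        for (String token : tokens) {
            int column = vocab.indexOf(token);
            if (column >= 0) {
                vector[column] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        ToyCountVectorizer vectorizer = new ToyCountVectorizer(List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"),
                List.of("the", "end")));
        vectorizer.fit();
        System.out.println(vectorizer.vocabulary());
        System.out.println(vectorizer.numWordsEncountered());
        System.out.println(Arrays.toString(
                vectorizer.transform(List.of("the", "cat", "the", "bird"))));
    }
}
```

In this sketch `transform` is only meaningful after `fit()` has run, because the vector has one column per vocabulary word. A real implementation may order the vocabulary or handle unseen tokens differently.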
`VocabCache<VocabWord> getVocabCache()`

`org.nd4j.linalg.dataset.DataSet vectorize(java.io.InputStream is, java.lang.String label)`

Parameters:
- `is`: the input stream to read from
- `label`: the label to assign

`org.nd4j.linalg.dataset.DataSet vectorize(java.lang.String text, java.lang.String label)`

Parameters:
- `text`: the text to vectorize
- `label`: the label of the text

`void fit()`

`org.nd4j.linalg.dataset.DataSet vectorize(java.io.File input, java.lang.String label)`

Parameters:
- `input`: the text to vectorize
- `label`: the label of the text

Returns: a `DataSet` with a set of weights (relative to the implementation; could be word counts or tf-idf scores)

`org.nd4j.linalg.api.ndarray.INDArray transform(java.lang.String text)`

Parameters:
- `text`: text to transform

Returns: `INDArray`

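
The `vectorize` return description notes that the weights may be word counts or tf-idf scores, depending on the implementation. The sketch below contrasts the two for a tiny pre-tokenized corpus. It is an assumption-laden illustration: `ToyWeights`, `wordCounts`, and `tfidf` are hypothetical names, and the tf-idf formula used here (count times the natural log of corpus size over document frequency) is one common variant that a given implementation may not share.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; this is not DL4J code.
final class ToyWeights {

    // Raw term counts for one tokenized document.
    static Map<String, Double> wordCounts(List<String> doc) {
        Map<String, Double> counts = new TreeMap<>();
        for (String token : doc) {
            counts.merge(token, 1.0, Double::sum);
        }
        return counts;
    }

    // tf-idf under the assumed formula: count * ln(N / df),
    // where N is the corpus size and df the number of documents containing the word.
    static Map<String, Double> tfidf(List<List<String>> corpus, List<String> doc) {
        Map<String, Integer> docFreq = new TreeMap<>();
        for (List<String> other : corpus) {
            // A set ensures each document contributes at most once per word.
            for (String token : new HashSet<>(other)) {
                docFreq.merge(token, 1, Integer::sum);
            }
        }
        Map<String, Double> weights = new TreeMap<>();
        for (Map.Entry<String, Double> entry : wordCounts(doc).entrySet()) {
            int df = docFreq.getOrDefault(entry.getKey(), 0);
            // Words absent from the corpus get weight 0 instead of dividing by zero.
            double idf = df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
            weights.put(entry.getKey(), entry.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("the", "cat", "sat"),
                List.of("the", "dog", "sat"),
                List.of("the", "end"));
        List<String> doc = corpus.get(0);
        System.out.println("counts: " + wordCounts(doc));
        System.out.println("tf-idf: " + tfidf(corpus, doc));
    }
}
```

With counts, every word in the first document weighs the same. With this tf-idf variant, a word that appears in every document ("the") drops to zero, while a word unique to one document ("cat") gets the largest weight.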
`org.nd4j.linalg.api.ndarray.INDArray transform(java.util.List<java.lang.String> tokens)`

Parameters:
- `tokens`

`long numWordsEncountered()`

`InvertedIndex<VocabWord> getIndex()`
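
For readers unfamiliar with the structure `getIndex()` exposes, an inverted index maps each word to the documents that contain it. The sketch below shows the general idea only. `ToyInvertedIndex`, `addDocument`, and `documents` are hypothetical names chosen for this example and should not be read as the `InvertedIndex` interface's actual methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for illustration only; this is not DL4J code.
final class ToyInvertedIndex {
    // word -> sorted ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Record every token of a document under the given document id.
    void addDocument(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, key -> new TreeSet<>()).add(docId);
        }
    }

    // Ids of all documents containing the word, in ascending order;
    // an empty list if the word was never indexed.
    List<Integer> documents(String word) {
        return new ArrayList<>(postings.getOrDefault(word, new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, List.of("the", "cat", "sat"));
        index.addDocument(1, List.of("the", "dog", "sat"));
        index.addDocument(2, List.of("the", "end"));
        System.out.println(index.documents("the"));
        System.out.println(index.documents("sat"));
        System.out.println(index.documents("bird"));
    }
}
```

Looking up a word this way avoids rescanning the corpus, which is why document-frequency statistics such as those used for tf-idf are typically derived from an index of this shape.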