TextVectorizer

All Superinterfaces:

java.io.Serializable, Vectorizer

All Known Implementing Classes:

BagOfWordsVectorizer, BaseTextVectorizer, TfidfVectorizer
```
public interface TextVectorizer
extends Vectorizer
```
Vectorizes text

Method Summary

Modifier and Type	Method and Description
`void`	`fit()` Train the model
`InvertedIndex<VocabWord>`	`getIndex()` Inverted index
`VocabCache<VocabWord>`	`getVocabCache()` The vocab sorted in descending order
`long`	`numWordsEncountered()` Returns the number of words encountered so far
`org.nd4j.linalg.api.ndarray.INDArray`	`transform(java.util.List<java.lang.String> tokens)` Transforms the given tokens into a vector
`org.nd4j.linalg.api.ndarray.INDArray`	`transform(java.lang.String text)` Transforms the given text into a vector
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.io.File input, java.lang.String label)` Vectorizes the text read from the given file
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.io.InputStream is, java.lang.String label)` Vectorizes text read from an input stream, treating it as one document
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.lang.String text, java.lang.String label)` Vectorizes the passed in text treating it as one document

Methods inherited from interface org.deeplearning4j.datasets.vectorizer.Vectorizer
vectorize
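
The methods summarized above suggest a fit-then-transform workflow: `fit()` builds the vocabulary, after which `transform` maps text onto a vector indexed by that vocabulary. The following is a minimal plain-Java sketch of that idea for the bag-of-words case. `ToyBagOfWords` and all of its members are hypothetical stand-ins, it returns `double[]` rather than `INDArray`, and its tokenizer and tie-breaking rule are assumptions; it is not the DL4J implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a bag-of-words vectorizer; not the DL4J implementation. */
final class ToyBagOfWords {
    private final Map<String, Integer> counts = new HashMap<>(); // word -> corpus frequency
    private final List<String> vocab = new ArrayList<>();        // words, most frequent first
    private long wordsEncountered = 0;

    /** Assumed tokenizer: lower-cases and splits on anything that is not a-z. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Analogue of fit(): scans the corpus and builds the vocabulary. */
    void fit(List<String> documents) {
        for (String doc : documents) {
            for (String token : tokenize(doc)) {
                counts.merge(token, 1, Integer::sum);
                wordsEncountered++;
            }
        }
        vocab.clear();
        vocab.addAll(counts.keySet());
        // Descending frequency; ties broken alphabetically so the order is deterministic.
        vocab.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
    }

    /** Analogue of getVocabCache(): the vocabulary in descending frequency order. */
    List<String> vocab() {
        return vocab;
    }

    /** Analogue of numWordsEncountered(): total tokens seen during fit. */
    long numWordsEncountered() {
        return wordsEncountered;
    }

    /** Analogue of transform(List<String>): one count per vocabulary word. */
    double[] transform(List<String> tokens) {
        double[] vector = new double[vocab.size()];
        for (String token : tokens) {
            int index = vocab.indexOf(token);
            if (index >= 0) {          // out-of-vocabulary tokens are ignored
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    /** Analogue of transform(String): tokenizes, then counts. */
    double[] transform(String text) {
        return transform(tokenize(text));
    }
}
```

In this sketch, fitting on the three documents "the cat sat", "the cat ran" and "the dog" yields the vocabulary `[the, cat, dog, ran, sat]`, and transforming "the dog and the cat" gives the counts `[2, 1, 1, 0, 0]`, since "and" is not in the vocabulary.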

- Method Detail
  - getVocabCache
```
VocabCache<VocabWord> getVocabCache()
```
    The vocab sorted in descending order
    
    Returns:
    
    the vocab sorted in descending order
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(java.io.InputStream is,
                                          java.lang.String label)
```
    Vectorizes text read from an input stream, treating it as one document
    
    Parameters:
    
    is - the input stream to read from
    
    label - the label to assign
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(java.lang.String text,
                                          java.lang.String label)
```
    Vectorizes the passed in text treating it as one document
    
    Parameters:
    
    text - the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - fit
```
void fit()
```
    Train the model
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(java.io.File input,
                                          java.lang.String label)
```
    Parameters:
    
    input - the file containing the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - transform
```
org.nd4j.linalg.api.ndarray.INDArray transform(java.lang.String text)
```
    Transforms the given text into a vector
    
    Parameters:
    
    text - text to transform
    
    Returns:
    
    the vectorized text as an INDArray
  - transform
```
org.nd4j.linalg.api.ndarray.INDArray transform(java.util.List<java.lang.String> tokens)
```
    Transforms the given tokens into a vector
    
    Parameters:
    
    tokens - the tokens to transform
    
    Returns:
    
    the vectorized tokens as an INDArray
  - numWordsEncountered
```
long numWordsEncountered()
```
    Returns the number of words encountered so far
    
    Returns:
    
    the number of words encountered so far
  - getIndex
```
InvertedIndex<VocabWord> getIndex()
```
    Inverted index
    
    Returns:
    
    the inverted index for this vectorizer
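
To illustrate what an inverted index provides and how TF-IDF scores (one of the weightings mentioned for `vectorize`) can be derived from it, here is a small plain-Java sketch. `ToyInvertedIndex` and its methods are hypothetical and do not mirror the `InvertedIndex` API; the weighting used, `tf * ln(N / df)`, is one common TF-IDF variant and is an assumption, not necessarily the formula `TfidfVectorizer` applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of an inverted index; not DL4J's InvertedIndex. */
final class ToyInvertedIndex {
    // word -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // document id -> its tokens, kept so term frequencies can be counted
    private final List<List<String>> documents = new ArrayList<>();

    /** Adds an already tokenized document and returns its id. */
    int addDocument(List<String> tokens) {
        int docId = documents.size();
        documents.add(new ArrayList<>(tokens));
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Ids of all documents containing the word; empty if the word is unknown. */
    Set<Integer> documentsContaining(String word) {
        return postings.getOrDefault(word, new TreeSet<>());
    }

    /** Assumed weighting: term frequency times ln(N / document frequency). */
    double tfidf(String word, int docId) {
        int df = documentsContaining(word).size();
        if (df == 0) {
            return 0.0; // unknown word carries no weight
        }
        long tf = documents.get(docId).stream().filter(word::equals).count();
        return tf * Math.log((double) documents.size() / df);
    }
}
```

With this weighting, a word that occurs in every document scores 0 everywhere, while a word concentrated in a few documents scores higher in the documents where it appears often.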
