TfidfVectorizer

java.lang.Object
- org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
- - org.deeplearning4j.bagofwords.vectorizer.TfidfVectorizer

All Implemented Interfaces:

java.io.Serializable, TextVectorizer, Vectorizer
```
public class TfidfVectorizer
extends BaseTextVectorizer
```
See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`TfidfVectorizer.Builder`

Field Summary
- Fields inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
  index, isParallel, iterator, labelsSource, minWordFrequency, stopWords, tokenizerFactory, vocabCache

Constructor Summary

Constructors
Constructor and Description
`TfidfVectorizer()`

Method Summary

Modifier and Type	Method and Description
`double`	`tfidfWord(java.lang.String word, long wordCount, long documentLength)`
`org.nd4j.linalg.api.ndarray.INDArray`	`transform(java.util.List<java.lang.String> tokens)` Transforms the given tokens into a vector of weights
`org.nd4j.linalg.api.ndarray.INDArray`	`transform(java.lang.String text)` Transforms the given text into a vector of weights
`org.nd4j.linalg.dataset.DataSet`	`vectorize()` Vectorizes the input source into a dataset
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.io.File input, java.lang.String label)`
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.io.InputStream is, java.lang.String label)` Vectorizes text read from an input stream, treating it as one document
`org.nd4j.linalg.dataset.DataSet`	`vectorize(java.lang.String text, java.lang.String label)` Vectorizes the passed in text treating it as one document

Methods inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
buildVocab, fit, getLabelsSource, numWordsEncountered

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.deeplearning4j.bagofwords.vectorizer.TextVectorizer
getIndex, getVocabCache

- Constructor Detail
  - TfidfVectorizer
```
public TfidfVectorizer()
```
- Method Detail
  - vectorize
```
public org.nd4j.linalg.dataset.DataSet vectorize(java.io.InputStream is,
                                                 java.lang.String label)
```
    Vectorizes text read from an input stream, treating it as one document
    
    Parameters:
    
    is - the input stream to read from
    
    label - the label to assign
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
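
    To illustrate what "considered as one document" can mean for a stream, the following is a minimal stand-alone sketch that reads an entire stream into a single string before any tokenization. The class name `StreamDocumentSketch`, the method `readAsOneDocument`, and the UTF-8 charset are assumptions made for this example; the page does not state how `TfidfVectorizer` reads or decodes the stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of Deeplearning4j. */
final class StreamDocumentSketch {

    /** Reads every line of the stream and joins them into one document string. */
    static String readAsOneDocument(InputStream is) {
        // Assumption: the text is UTF-8 encoded.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        String document = readAsOneDocument(new ByteArrayInputStream(bytes));
        // Both lines end up in the same document string.
        System.out.println(document);
    }
}
```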
  - vectorize
```
public org.nd4j.linalg.dataset.DataSet vectorize(java.lang.String text,
                                                 java.lang.String label)
```
    Vectorizes the passed in text treating it as one document
    
    Parameters:
    
    text - the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - vectorize
```
public org.nd4j.linalg.dataset.DataSet vectorize(java.io.File input,
                                                 java.lang.String label)
```
    Parameters:
    
    input - the file containing the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - transform
```
public org.nd4j.linalg.api.ndarray.INDArray transform(java.lang.String text)
```
    Transforms the given text into a vector of weights
    
    Parameters:
    
    text - text to transform
    
    Returns:
    
    INDArray
  - transform
```
public org.nd4j.linalg.api.ndarray.INDArray transform(java.util.List<java.lang.String> tokens)
```
    Description copied from interface: TextVectorizer
    
    Transforms the given tokens into a vector of weights
    
    Returns:
    
    INDArray
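
    As a rough picture of turning a token list into a weight vector, the sketch below maps tokens onto a fixed vocabulary and fills one slot per vocabulary word. It is written under several assumptions: it uses a plain `double[]` instead of an `INDArray`, the names `TransformSketch`, `vocabIndex`, and `docFrequency` are hypothetical, and the weighting shown (relative term frequency times base-10 inverse document frequency) is one common TF-IDF variant, not necessarily the one this class applies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not the Deeplearning4j implementation. */
final class TransformSketch {

    /**
     * Builds a vector with one entry per vocabulary word.
     * Tokens missing from the vocabulary are ignored.
     */
    static double[] transform(List<String> tokens,
                              Map<String, Integer> vocabIndex,
                              Map<String, Integer> docFrequency,
                              int totalDocs) {
        double[] vector = new double[vocabIndex.size()];

        // Count how often each token occurs in this document.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            Integer column = vocabIndex.get(entry.getKey());
            Integer df = docFrequency.get(entry.getKey());
            if (column == null || df == null || df == 0) {
                continue; // out-of-vocabulary token
            }
            double tf = (double) entry.getValue() / tokens.size();
            double idf = Math.log10((double) totalDocs / df);
            vector[column] = tf * idf;
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocabIndex = new HashMap<>();
        vocabIndex.put("deep", 0);
        vocabIndex.put("learning", 1);
        vocabIndex.put("java", 2);

        Map<String, Integer> docFrequency = new HashMap<>();
        docFrequency.put("deep", 1);      // appears in 1 of 10 documents
        docFrequency.put("learning", 10); // appears in every document
        docFrequency.put("java", 5);

        List<String> tokens = Arrays.asList("deep", "learning", "deep", "unknown");
        double[] vector = transform(tokens, vocabIndex, docFrequency, 10);
        System.out.println(Arrays.toString(vector)); // prints [0.5, 0.0, 0.0]
    }
}
```

    In this sketch a word that occurs in every document gets weight 0, because its inverse document frequency is log10(1) = 0.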
  - tfidfWord
```
public double tfidfWord(java.lang.String word,
                        long wordCount,
                        long documentLength)
```
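
    This page gives no description for `tfidfWord`. For orientation, the sketch below computes a TF-IDF score from a word count and a document length using one widely used definition: term frequency as count divided by document length, multiplied by the base-10 logarithm of total documents over documents containing the word. The class `TfidfSketch` and the explicit `totalDocs` and `docsContainingWord` parameters are assumptions for this example; whether `tfidfWord` uses the same formula or logarithm base is not confirmed here.

```java
/** Hypothetical stand-alone sketch; not the Deeplearning4j implementation. */
final class TfidfSketch {

    /** Term frequency: occurrences of the word divided by the document length. */
    static double tf(long wordCount, long documentLength) {
        return documentLength == 0 ? 0.0 : (double) wordCount / documentLength;
    }

    /** Inverse document frequency: log10(total documents / documents containing the word). */
    static double idf(long totalDocs, long docsContainingWord) {
        if (totalDocs == 0 || docsContainingWord == 0) {
            return 0.0; // avoid division by zero and log of zero
        }
        return Math.log10((double) totalDocs / docsContainingWord);
    }

    /** TF-IDF score as the product of the two factors above. */
    static double tfidf(long wordCount, long documentLength,
                        long totalDocs, long docsContainingWord) {
        return tf(wordCount, documentLength) * idf(totalDocs, docsContainingWord);
    }

    public static void main(String[] args) {
        // A word occurs 2 times in a 10-token document,
        // and 1 of the 100 documents in the corpus contains it.
        System.out.println(tfidf(2, 10, 100, 1)); // prints 0.4
    }
}
```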
  - vectorize
```
public org.nd4j.linalg.dataset.DataSet vectorize()
```
    Vectorizes the input source into a dataset
    
    Returns:
    
    the dataset built from the input source
    
    Author:
    
    Adam Gibson
