Word2Vec.Builder

java.lang.Object
- org.deeplearning4j.spark.models.embeddings.word2vec.Word2Vec.Builder

Enclosing class: Word2Vec

public static class Word2Vec.Builder
extends java.lang.Object

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`batchSize`
`protected VectorsConfiguration`	`configuration`
`protected int`	`layerSize`
`protected double`	`learningRate`
`protected double`	`minLearningRate`
`protected int`	`minWordFrequency`
`protected double`	`negative`
`protected int`	`nGrams`
`protected int`	`numEpochs`
`protected int`	`numIterations`
`protected double`	`sampling`
`protected long`	`seed`
`protected java.util.List<java.lang.String>`	`stopWords`
`protected TokenizerFactory`	`tokenizerFactory`
`protected boolean`	`useAdaGrad`
`protected boolean`	`useUnk`
`protected int`	`windowSize`

Constructor Summary

Constructors
Constructor and Description
`Builder()` Creates a Builder instance with default parameters set.
`Builder(VectorsConfiguration configuration)` Uses a VectorsConfiguration bean to initialize the Word2Vec model parameters.

Method Summary

Modifier and Type	Method and Description
`Word2Vec.Builder`	`batchSize(int batchSize)` Specifies the mini-batch size used in a single iteration during training.
`Word2Vec`	`build()`
`Word2Vec.Builder`	`epochs(int numEpochs)` Specifies the number of epochs over the whole corpus. PLEASE NOTE: not implemented.
`Word2Vec.Builder`	`iterations(int numIterations)` Specifies the number of iterations over each batch on each node.
`Word2Vec.Builder`	`layerSize(int layerSize)` Specifies the dimensionality of the output vectors.
`Word2Vec.Builder`	`learningRate(double lr)` Specifies the initial learning rate for the model.
`Word2Vec.Builder`	`minLearningRate(double mlr)` Specifies the lower bound for learning rate decay.
`Word2Vec.Builder`	`minWordFrequency(int minWordFrequency)` Specifies the minimum word frequency threshold.
`Word2Vec.Builder`	`negative(int negative)` Specifies the negative sampling value.
`Word2Vec.Builder`	`sampling(double sampling)` Specifies the subsampling value.
`Word2Vec.Builder`	`seed(long seed)` Specifies the random seed used during weight initialization.
`Word2Vec.Builder`	`setNGrams(int nGrams)` Specifies the N for n-grams.
`Word2Vec.Builder`	`stopWords(java.util.List<java.lang.String> stopWords)` Defines the list of stop words to be ignored during vocabulary building and training.
`Word2Vec.Builder`	`tokenizerFactory(java.lang.String tokenizer)` Specifies the TokenizerFactory class to be used for tokenization.
`Word2Vec.Builder`	`tokenizerFactory(TokenizerFactory factory)` Specifies the TokenizerFactory to be used for tokenization. PLEASE NOTE: you can't use an anonymous implementation here.
`Word2Vec.Builder`	`tokenPreprocessor(java.lang.String tokenPreprocessor)` Specifies the TokenPreProcessor class to be used during tokenization.
`Word2Vec.Builder`	`useAdaGrad(boolean reallyUse)` Specifies whether adaptive gradients should be used during model training.
`Word2Vec.Builder`	`useUnknown(boolean reallyUse)` Specifies whether the UNK word should be used in place of words that are absent from the vocabulary.
`Word2Vec.Builder`	`windowSize(int windowSize)` Specifies the window size.
`Word2Vec.Builder`	`workers(int workers)` Specifies the number of workers for the training process.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - nGrams
```
protected int nGrams
```
  - numIterations
```
protected int numIterations
```
  - minWordFrequency
```
protected int minWordFrequency
```
  - numEpochs
```
protected int numEpochs
```
  - learningRate
```
protected double learningRate
```
  - minLearningRate
```
protected double minLearningRate
```
  - windowSize
```
protected int windowSize
```
  - negative
```
protected double negative
```
  - sampling
```
protected double sampling
```
  - seed
```
protected long seed
```
  - useAdaGrad
```
protected boolean useAdaGrad
```
  - tokenizerFactory
```
protected TokenizerFactory tokenizerFactory
```
  - configuration
```
protected VectorsConfiguration configuration
```
  - layerSize
```
protected int layerSize
```
  - stopWords
```
protected java.util.List<java.lang.String> stopWords
```
  - batchSize
```
protected int batchSize
```
  - useUnk
```
protected boolean useUnk
```
- Constructor Detail
  - Builder
```
public Builder()
```
    Creates a Builder instance with default parameters set.
  - Builder
```
public Builder(VectorsConfiguration configuration)
```
    Uses a VectorsConfiguration bean to initialize the Word2Vec model parameters.
    
    Parameters:
    
    configuration - VectorsConfiguration bean holding the model parameters
- Method Detail
  - windowSize
```
public Word2Vec.Builder windowSize(int windowSize)
```
    Specifies the window size.
    
    Parameters:
    
    windowSize - size of the context window
    
    Returns:
    
    this builder
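
    As a rough illustration of what a window size means in word2vec-style training, the sketch below collects the tokens that lie within `windowSize` positions of a center token. `ContextWindowSketch` and `context` are hypothetical names, not part of Deeplearning4j, and the actual Spark implementation may sample or shrink the window differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper only; not a Deeplearning4j class. */
class ContextWindowSketch {

    /** Returns the tokens within windowSize positions of index center, excluding the center token itself. */
    static List<String> context(List<String> tokens, int center, int windowSize) {
        List<String> result = new ArrayList<>();
        // Clamp the window to the sentence boundaries.
        int from = Math.max(0, center - windowSize);
        int to = Math.min(tokens.size() - 1, center + windowSize);
        for (int i = from; i <= to; i++) {
            if (i != center) {
                result.add(tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(context(sentence, 2, 1)); // [quick, fox]
        System.out.println(context(sentence, 0, 2)); // [quick, brown]
    }
}
```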
  - negative
```
public Word2Vec.Builder negative(int negative)
```
    Specifies the negative sampling value.
    
    Parameters:
    
    negative - negative sampling value
    
    Returns:
    
    this builder
  - sampling
```
public Word2Vec.Builder sampling(double sampling)
```
    Specifies the subsampling value.
    
    Parameters:
    
    sampling - subsampling value
    
    Returns:
    
    this builder
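
    For orientation, the subsampling heuristic described in the original word2vec work discards a word with probability 1 - sqrt(t / f), where t is the sampling value and f is the word's relative frequency in the corpus. The sketch below implements that formula; whether this Spark implementation applies exactly the same formula is an assumption, and `SubsamplingSketch` is a hypothetical name.

```java
/** Illustrative helper only; not a Deeplearning4j class. */
class SubsamplingSketch {

    /**
     * Probability of discarding one occurrence of a word, given its relative
     * corpus frequency and the sampling threshold t: 1 - sqrt(t / f).
     * Words at or below the threshold are never discarded.
     */
    static double discardProbability(double relativeFrequency, double sampling) {
        if (sampling <= 0.0 || relativeFrequency <= sampling) {
            return 0.0;
        }
        return 1.0 - Math.sqrt(sampling / relativeFrequency);
    }

    public static void main(String[] args) {
        // A word that makes up 1% of the corpus, with a threshold of 1e-4.
        System.out.println(discardProbability(0.01, 1e-4));
        // A rare word below the threshold is always kept.
        System.out.println(discardProbability(1e-5, 1e-4));
    }
}
```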
  - learningRate
```
public Word2Vec.Builder learningRate(double lr)
```
    Specifies the initial learning rate for the model.
    
    Parameters:
    
    lr - initial learning rate
    
    Returns:
    
    this builder
  - minLearningRate
```
public Word2Vec.Builder minLearningRate(double mlr)
```
    Specifies the lower bound for learning rate decay.
    
    Parameters:
    
    mlr - minimum learning rate that decay will not go below
    
    Returns:
    
    this builder
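
    How an initial learning rate and a minimum learning rate interact can be sketched with a linear decay that is clamped at the floor, which is the scheme used by the reference word2vec tool. That this class decays linearly is an assumption; `LearningRateDecaySketch` and its parameter names are hypothetical.

```java
/** Illustrative helper only; not a Deeplearning4j class. */
class LearningRateDecaySketch {

    /**
     * Linearly decays the learning rate with training progress and never
     * lets it fall below minLearningRate.
     */
    static double currentRate(double learningRate, double minLearningRate,
                              long wordsProcessed, long totalWords) {
        double progress = (double) wordsProcessed / (double) totalWords;
        double decayed = learningRate * (1.0 - progress);
        return Math.max(minLearningRate, decayed);
    }

    public static void main(String[] args) {
        double lr = 0.025;
        double mlr = 1e-4;
        System.out.println(currentRate(lr, mlr, 0, 1000));    // start of training: initial rate
        System.out.println(currentRate(lr, mlr, 500, 1000));  // halfway: half the initial rate
        System.out.println(currentRate(lr, mlr, 1000, 1000)); // end of training: clamped to the floor
    }
}
```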
  - iterations
```
public Word2Vec.Builder iterations(int numIterations)
```
    Specifies the number of iterations over each batch on each node.
    
    Parameters:
    
    numIterations - number of iterations over each batch
    
    Returns:
    
    this builder
  - epochs
```
public Word2Vec.Builder epochs(int numEpochs)
```
    Specifies the number of epochs over the whole corpus. PLEASE NOTE: not implemented.
    
    Parameters:
    
    numEpochs - number of epochs over the whole corpus
    
    Returns:
    
    this builder
  - minWordFrequency
```
public Word2Vec.Builder minWordFrequency(int minWordFrequency)
```
    Specifies the minimum word frequency threshold. All words below this threshold will be ignored.
    
    Parameters:
    
    minWordFrequency - minimum word frequency
    
    Returns:
    
    this builder
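
    The effect of a minimum frequency threshold can be sketched as a count-then-filter step over the corpus tokens. This is a simplified, single-machine illustration; `MinFrequencySketch` is a hypothetical name, and the distributed vocabulary construction in this class is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper only; not a Deeplearning4j class. */
class MinFrequencySketch {

    /** Counts tokens and drops every word that occurs fewer than minWordFrequency times. */
    static Map<String, Integer> vocabulary(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // Words below the threshold are ignored.
        counts.values().removeIf(count -> count < minWordFrequency);
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(vocabulary(tokens, 2)); // {a=3, b=2}
    }
}
```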
  - useAdaGrad
```
public Word2Vec.Builder useAdaGrad(boolean reallyUse)
```
    Specifies whether adaptive gradients should be used during model training.
    
    Parameters:
    
    reallyUse - true to use adaptive gradients
    
    Returns:
    
    this builder
  - seed
```
public Word2Vec.Builder seed(long seed)
```
    Specifies the random seed used during weight initialization.
    
    Parameters:
    
    seed - random seed
    
    Returns:
    
    this builder
  - tokenizerFactory
```
public Word2Vec.Builder tokenizerFactory(@NonNull
                                         TokenizerFactory factory)
```
    Specifies the TokenizerFactory to be used for tokenization. PLEASE NOTE: you can't use an anonymous implementation here.
    
    Parameters:
    
    factory - TokenizerFactory instance to use
    
    Returns:
    
    this builder
  - tokenizerFactory
```
public Word2Vec.Builder tokenizerFactory(@NonNull
                                         java.lang.String tokenizer)
```
    Specifies the TokenizerFactory class to be used for tokenization.
    
    Parameters:
    
    tokenizer - class name for tokenizerFactory
    
    Returns:
    
    this builder
  - tokenPreprocessor
```
public Word2Vec.Builder tokenPreprocessor(@NonNull
                                          java.lang.String tokenPreprocessor)
```
    Specifies the TokenPreProcessor class to be used during tokenization.
    
    Parameters:
    
    tokenPreprocessor - class name for tokenPreProcessor
    
    Returns:
    
    this builder
  - workers
```
public Word2Vec.Builder workers(int workers)
```
    Specifies the number of workers for the training process. This value is used to repartition the RDD. PLEASE NOTE: the recommended value is the number of vCPUs available within your Spark cluster.
    
    Parameters:
    
    workers - number of workers
    
    Returns:
    
    this builder
  - layerSize
```
public Word2Vec.Builder layerSize(int layerSize)
```
    Specifies the dimensionality of the output vectors.
    
    Parameters:
    
    layerSize - number of dimensions of the output vectors
    
    Returns:
    
    this builder
  - setNGrams
```
public Word2Vec.Builder setNGrams(int nGrams)
```
    Specifies the N for n-grams.
    
    Parameters:
    
    nGrams - the N for n-grams
    
    Returns:
    
    this builder
  - stopWords
```
public Word2Vec.Builder stopWords(@NonNull
                                  java.util.List<java.lang.String> stopWords)
```
    Defines the list of stop words to be ignored during vocabulary building and training.
    
    Parameters:
    
    stopWords - list of stop words
    
    Returns:
    
    this builder
  - batchSize
```
public Word2Vec.Builder batchSize(int batchSize)
```
    Specifies the mini-batch size used in a single iteration during training.
    
    Parameters:
    
    batchSize - mini-batch size
    
    Returns:
    
    this builder
  - useUnknown
```
public Word2Vec.Builder useUnknown(boolean reallyUse)
```
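    Specifies whether the UNK word should be used in place of words that are absent from the vocabulary.
    
    Parameters:
    
    reallyUse - true to substitute UNK for words that are absent from the vocabulary
    
    Returns:
    
    this builder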
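
    The substitution this flag controls can be pictured as follows. The literal token `"UNK"`, the choice to drop out-of-vocabulary words when the flag is off, and the name `UnknownTokenSketch` are all assumptions made for illustration, not behavior confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative helper only; not a Deeplearning4j class. */
class UnknownTokenSketch {

    // Assumed placeholder token for out-of-vocabulary words.
    static final String UNK = "UNK";

    /**
     * Keeps in-vocabulary tokens. Out-of-vocabulary tokens are replaced with UNK
     * when useUnknown is true, and dropped otherwise (an assumption).
     */
    static List<String> apply(List<String> tokens, Set<String> vocabulary, boolean useUnknown) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (vocabulary.contains(token)) {
                result.add(token);
            } else if (useUnknown) {
                result.add(UNK);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> vocabulary = new HashSet<>(Arrays.asList("spark", "word"));
        List<String> tokens = Arrays.asList("spark", "zebra", "word");
        System.out.println(apply(tokens, vocabulary, true));  // [spark, UNK, word]
        System.out.println(apply(tokens, vocabulary, false)); // [spark, word]
    }
}
```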
  - build
```
public Word2Vec build()
```
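
    Every setter above returns the builder, so calls can be chained and finished with `build()`. The sketch below shows that pattern with a small stand-in class so that it compiles without any dependencies. `SketchModel` and its nested `Builder` are hypothetical, mirror only three of the documented method names, and use arbitrary placeholder defaults rather than the library's defaults.

```java
/** Stand-in for a configured model; not the Deeplearning4j Word2Vec class. */
class SketchModel {
    final int layerSize;
    final int windowSize;
    final double learningRate;

    private SketchModel(Builder builder) {
        this.layerSize = builder.layerSize;
        this.windowSize = builder.windowSize;
        this.learningRate = builder.learningRate;
    }

    /** Stand-in builder that mirrors the fluent style of the documented methods. */
    static class Builder {
        // Placeholder defaults chosen for this sketch only.
        protected int layerSize = 100;
        protected int windowSize = 5;
        protected double learningRate = 0.025;

        Builder layerSize(int layerSize) {
            this.layerSize = layerSize;
            return this; // returning the builder is what allows chaining
        }

        Builder windowSize(int windowSize) {
            this.windowSize = windowSize;
            return this;
        }

        Builder learningRate(double lr) {
            this.learningRate = lr;
            return this;
        }

        SketchModel build() {
            return new SketchModel(this);
        }
    }

    public static void main(String[] args) {
        SketchModel model = new SketchModel.Builder()
                .layerSize(200)
                .windowSize(7)
                .build();
        System.out.println(model.layerSize + " " + model.windowSize + " " + model.learningRate);
    }
}
```

    With the real class, the same chaining style applies to the methods listed under Method Detail, ending in `build()`, which returns a `Word2Vec`.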
