public static class Word2Vec.Builder
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected int | batchSize |
protected VectorsConfiguration | configuration |
protected int | layerSize |
protected double | learningRate |
protected double | minLearningRate |
protected int | minWordFrequency |
protected double | negative |
protected int | nGrams |
protected int | numEpochs |
protected int | numIterations |
protected double | sampling |
protected long | seed |
protected java.util.List<java.lang.String> | stopWords |
protected TokenizerFactory | tokenizerFactory |
protected boolean | useAdaGrad |
protected boolean | useUnk |
protected int | windowSize |
Constructor | Description |
---|---|
Builder() | Creates a Builder instance with default parameters. |
Builder(VectorsConfiguration configuration) | Uses a VectorsConfiguration bean to initialize Word2Vec model parameters. |
Modifier and Type | Method | Description |
---|---|---|
Word2Vec.Builder | batchSize(int batchSize) | Specifies the mini-batch size used in a single training iteration. |
Word2Vec | build() | |
Word2Vec.Builder | epochs(int numEpochs) | Specifies the number of epochs over the whole corpus. PLEASE NOTE: NOT IMPLEMENTED. |
Word2Vec.Builder | iterations(int numIterations) | Specifies the number of iterations over each batch on each node. |
Word2Vec.Builder | layerSize(int layerSize) | Specifies the dimensionality of the output word vectors. |
Word2Vec.Builder | learningRate(double lr) | Specifies the initial learning rate for the model. |
Word2Vec.Builder | minLearningRate(double mlr) | Specifies the lower bound for learning-rate decay. |
Word2Vec.Builder | minWordFrequency(int minWordFrequency) | Specifies the minimum word frequency threshold. |
Word2Vec.Builder | negative(int negative) | Specifies the negative sampling value. |
Word2Vec.Builder | sampling(double sampling) | Specifies the subsampling value. |
Word2Vec.Builder | seed(long seed) | Specifies the random seed used during weight initialization. |
Word2Vec.Builder | setNGrams(int nGrams) | Specifies N for n-grams. |
Word2Vec.Builder | stopWords(java.util.List<java.lang.String> stopWords) | Defines a list of stop words to be ignored during vocabulary building and training. |
Word2Vec.Builder | tokenizerFactory(java.lang.String tokenizer) | Specifies the TokenizerFactory class to be used for tokenization. |
Word2Vec.Builder | tokenizerFactory(TokenizerFactory factory) | Specifies the TokenizerFactory to be used for tokenization. PLEASE NOTE: anonymous implementations cannot be used here. |
Word2Vec.Builder | tokenPreprocessor(java.lang.String tokenPreprocessor) | Specifies the TokenPreProcessor class to be used during tokenization. |
Word2Vec.Builder | useAdaGrad(boolean reallyUse) | Specifies whether adaptive gradients (AdaGrad) should be used during model training. |
Word2Vec.Builder | useUnknown(boolean reallyUse) | Specifies whether the UNK word should be used in place of words absent from the vocabulary. |
Word2Vec.Builder | windowSize(int windowSize) | Specifies the context window size. |
Word2Vec.Builder | workers(int workers) | Specifies the number of workers for the training process. |
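Every setter in the table returns Word2Vec.Builder, so calls can be chained before a final build(). As a hedged illustration of that pattern only, the sketch below uses invented classes (MiniWord2VecConfig, MiniWord2VecBuilder, MiniWord2Vec) rather than DL4J types; the default values and the checks in build() are assumptions, not DL4J's actual defaults or validation.

```java
import java.util.Objects;

/** Hypothetical stand-in for a configuration bean such as VectorsConfiguration. */
class MiniWord2VecConfig {
    int layerSize = 100;          // assumed default, for illustration only
    int windowSize = 5;           // assumed default
    double learningRate = 0.025;  // assumed default
    long seed = 42L;              // assumed default
}

/** Hypothetical immutable model produced by build(). */
final class MiniWord2Vec {
    final int layerSize;
    final int windowSize;
    final double learningRate;
    final long seed;

    MiniWord2Vec(int layerSize, int windowSize, double learningRate, long seed) {
        this.layerSize = layerSize;
        this.windowSize = windowSize;
        this.learningRate = learningRate;
        this.seed = seed;
    }
}

/** Hypothetical fluent builder mirroring the Word2Vec.Builder pattern. */
class MiniWord2VecBuilder {
    protected int layerSize;
    protected int windowSize;
    protected double learningRate;
    protected long seed;

    /** Default parameters, copied from a fresh config object. */
    MiniWord2VecBuilder() {
        this(new MiniWord2VecConfig());
    }

    /** Initializes every field from the given configuration bean. */
    MiniWord2VecBuilder(MiniWord2VecConfig configuration) {
        Objects.requireNonNull(configuration, "configuration");
        this.layerSize = configuration.layerSize;
        this.windowSize = configuration.windowSize;
        this.learningRate = configuration.learningRate;
        this.seed = configuration.seed;
    }

    // Each setter stores its value and returns the builder so calls can be chained.
    MiniWord2VecBuilder layerSize(int layerSize) { this.layerSize = layerSize; return this; }
    MiniWord2VecBuilder windowSize(int windowSize) { this.windowSize = windowSize; return this; }
    MiniWord2VecBuilder learningRate(double lr) { this.learningRate = lr; return this; }
    MiniWord2VecBuilder seed(long seed) { this.seed = seed; return this; }

    /** Checks the collected settings (illustrative checks only) and creates the model. */
    MiniWord2Vec build() {
        if (layerSize <= 0) throw new IllegalStateException("layerSize must be positive");
        if (windowSize <= 0) throw new IllegalStateException("windowSize must be positive");
        return new MiniWord2Vec(layerSize, windowSize, learningRate, seed);
    }
}
```

Usage follows the same shape as the real builder: `new MiniWord2VecBuilder().layerSize(300).windowSize(8).build()`.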
protected int nGrams
protected int numIterations
protected int minWordFrequency
protected int numEpochs
protected double learningRate
protected double minLearningRate
protected int windowSize
protected double negative
protected double sampling
protected long seed
protected boolean useAdaGrad
protected TokenizerFactory tokenizerFactory
protected VectorsConfiguration configuration
protected int layerSize
protected java.util.List<java.lang.String> stopWords
protected int batchSize
protected boolean useUnk
public Builder()
public Builder(VectorsConfiguration configuration)
Parameters: configuration
public Word2Vec.Builder windowSize(int windowSize)
Parameters: windowSize
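In the word2vec skip-gram model, the window size bounds how many tokens on each side of a center word count as its context. The sketch below assumes a fixed, symmetric window; the original word2vec C tool also shrinks the window randomly per position, and this page does not say what DL4J does. `contextPairs` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowSketch {
    /** Returns "center->context" pairs using a fixed symmetric window (an assumption). */
    static List<String> contextPairs(List<String> tokens, int windowSize) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            int from = Math.max(0, i - windowSize);
            int to = Math.min(tokens.size() - 1, i + windowSize);
            for (int j = from; j <= to; j++) {
                if (j != i) { // a word is not part of its own context
                    pairs.add(tokens.get(i) + "->" + tokens.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [the->quick, quick->the, quick->brown, brown->quick, brown->fox, fox->brown]
        System.out.println(contextPairs(Arrays.asList("the", "quick", "brown", "fox"), 1));
    }
}
```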
public Word2Vec.Builder negative(int negative)
Parameters: negative
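Negative sampling trains each observed (word, context) pair against a few randomly drawn "noise" words. The sketch reads `negative` as the number of noise words drawn per pair and uses the noise distribution from the original word2vec paper (unigram counts raised to the 0.75 power). The exponent, the cumulative-weight sampling, and the helper names are assumptions, not DL4J's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class NegativeSamplingSketch {
    /** Cumulative weights count^0.75 (exponent assumed from the word2vec paper). */
    static double[] cumulativeWeights(int[] counts) {
        double[] cumulative = new double[counts.length];
        double running = 0.0;
        for (int i = 0; i < counts.length; i++) {
            running += Math.pow(counts[i], 0.75);
            cumulative[i] = running;
        }
        return cumulative;
    }

    /**
     * Draws `negative` word indices, skipping the positive (target) index.
     * Assumes at least one non-target word has a positive count.
     */
    static List<Integer> sample(int[] counts, int target, int negative, Random rng) {
        double[] cumulative = cumulativeWeights(counts);
        double total = cumulative[cumulative.length - 1];
        List<Integer> result = new ArrayList<>();
        while (result.size() < negative) {
            double r = rng.nextDouble() * total;
            int idx = 0;
            while (cumulative[idx] <= r) idx++; // linear scan; a real table would be faster
            if (idx != target) result.add(idx);
        }
        return result;
    }
}
```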
public Word2Vec.Builder sampling(double sampling)
Parameters: sampling
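The `sampling` value is a subsampling threshold for very frequent words. As a sketch, the rule below is the one from the original word2vec paper: an occurrence of a word with relative corpus frequency f is kept with probability min(1, sqrt(t / f)), where t is the threshold. The original C tool uses a slightly different variant, and whether DL4J uses either is an assumption; `keepProbability` is a hypothetical helper.

```java
class SubsamplingSketch {
    /**
     * Probability of keeping one occurrence of a word, using the assumed rule
     * P(keep) = min(1, sqrt(t / f)) with f = count / totalWords.
     */
    static double keepProbability(long count, long totalWords, double threshold) {
        if (threshold <= 0) return 1.0; // assumed convention: 0 disables subsampling
        double f = (double) count / totalWords;
        return Math.min(1.0, Math.sqrt(threshold / f));
    }

    public static void main(String[] args) {
        // A word making up 1% of the corpus with threshold 1e-4 is kept about 10% of the time.
        System.out.println(keepProbability(10_000, 1_000_000, 1e-4));
    }
}
```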
public Word2Vec.Builder learningRate(double lr)
Parameters: lr
public Word2Vec.Builder minLearningRate(double mlr)
Parameters: mlr
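learningRate sets the starting value and minLearningRate the floor that decay does not go below. The sketch assumes linear decay with training progress, as in the original word2vec tool; the exact schedule DL4J applies is not described on this page, and `currentLearningRate` is a hypothetical helper.

```java
class LearningRateSketch {
    /**
     * Decays linearly from lr toward zero as training progresses,
     * but never returns less than minLr (the assumed role of minLearningRate).
     */
    static double currentLearningRate(double lr, double minLr, long wordsSeen, long totalWords) {
        double progress = Math.min(1.0, (double) wordsSeen / totalWords);
        return Math.max(minLr, lr * (1.0 - progress));
    }

    public static void main(String[] args) {
        for (long seen = 0; seen <= 1000; seen += 250) {
            System.out.printf("seen=%d lr=%.5f%n", seen, currentLearningRate(0.025, 1e-4, seen, 1000));
        }
    }
}
```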
public Word2Vec.Builder iterations(int numIterations)
Parameters: numIterations
public Word2Vec.Builder epochs(int numEpochs)
Parameters: numEpochs
public Word2Vec.Builder minWordFrequency(int minWordFrequency)
Parameters: minWordFrequency
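minWordFrequency drops rare words from the vocabulary. A minimal sketch, assuming the threshold is inclusive (a word is kept when its count is at least the threshold); `buildVocab` is a hypothetical helper, not a DL4J method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VocabSketch {
    /** Counts tokens and keeps only words seen at least minWordFrequency times. */
    static Map<String, Integer> buildVocab(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        Map<String, Integer> vocab = new TreeMap<>(); // sorted for readable output
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minWordFrequency) {
                vocab.put(e.getKey(), e.getValue());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        // prints {a=3, b=2}
        System.out.println(buildVocab(Arrays.asList("a", "b", "a", "c", "a", "b"), 2));
    }
}
```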
public Word2Vec.Builder useAdaGrad(boolean reallyUse)
Parameters: reallyUse
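With adaptive gradients enabled, each parameter gets its own effective step size, which shrinks as squared gradients accumulate. The sketch shows the standard AdaGrad update; the epsilon value and the `AdaGradSketch` class are assumptions for illustration, not DL4J's code.

```java
class AdaGradSketch {
    private static final double EPSILON = 1e-8; // assumed; avoids division by zero
    private final double[] historicalSquares;   // running sum of squared gradients per parameter
    private final double learningRate;

    AdaGradSketch(int size, double learningRate) {
        this.historicalSquares = new double[size];
        this.learningRate = learningRate;
    }

    /** Applies one AdaGrad step in place: w -= lr * g / (sqrt(sum of g^2) + eps). */
    void update(double[] weights, double[] gradient) {
        for (int i = 0; i < weights.length; i++) {
            historicalSquares[i] += gradient[i] * gradient[i];
            weights[i] -= learningRate * gradient[i] / (Math.sqrt(historicalSquares[i]) + EPSILON);
        }
    }
}
```

With a constant gradient, each successive step is smaller than the last, because the accumulated sum in the denominator keeps growing.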
public Word2Vec.Builder seed(long seed)
Parameters: seed
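Fixing the seed makes weight initialization reproducible: the same seed yields the same initial vectors. The sketch uses java.util.Random and the uniform (r - 0.5) / layerSize initialization of the original word2vec tool; both the generator and the scheme are assumptions about what DL4J does internally, and `initWeights` is hypothetical.

```java
import java.util.Random;

class SeedSketch {
    /** Initializes a vocabSize x layerSize matrix with small uniform values from a seeded RNG. */
    static double[][] initWeights(int vocabSize, int layerSize, long seed) {
        Random rng = new Random(seed); // same seed -> same sequence of values
        double[][] w = new double[vocabSize][layerSize];
        for (int i = 0; i < vocabSize; i++) {
            for (int j = 0; j < layerSize; j++) {
                // values fall in [-0.5 / layerSize, 0.5 / layerSize)
                w[i][j] = (rng.nextDouble() - 0.5) / layerSize;
            }
        }
        return w;
    }
}
```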
public Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory factory)
Parameters: factory
public Word2Vec.Builder tokenizerFactory(@NonNull java.lang.String tokenizer)
Parameters: tokenizer - the class name of the TokenizerFactory
public Word2Vec.Builder tokenPreprocessor(@NonNull java.lang.String tokenPreprocessor)
Parameters: tokenPreprocessor - the class name of the TokenPreProcessor
public Word2Vec.Builder workers(int workers)
Parameters: workers
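tokenizerFactory(String) and tokenPreprocessor(String) take class names, which suggests the classes are instantiated reflectively. One plausible reason anonymous TokenizerFactory implementations are not supported is that an anonymous class has no canonical name (Class.getCanonicalName() returns null), so it cannot be recorded and recreated by name; this is an inference, not something this page states. The sketch uses a hypothetical `SimpleTokenizerFactory` interface rather than DL4J's TokenizerFactory.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a tokenizer factory interface. */
interface SimpleTokenizerFactory {
    List<String> tokenize(String text);
}

/** A named implementation with a no-argument constructor, so it can be created by class name. */
class WhitespaceTokenizerFactory implements SimpleTokenizerFactory {
    @Override
    public List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }
}

class ReflectiveFactorySketch {
    /** Creates a factory from its binary class name, as a String-based setter might. */
    static SimpleTokenizerFactory fromClassName(String className) throws ReflectiveOperationException {
        return (SimpleTokenizerFactory) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        SimpleTokenizerFactory f = fromClassName(WhitespaceTokenizerFactory.class.getName());
        System.out.println(f.tokenize("  hello   word2vec builder ")); // prints [hello, word2vec, builder]

        SimpleTokenizerFactory anonymous = new SimpleTokenizerFactory() {
            @Override
            public List<String> tokenize(String text) {
                return Arrays.asList(text.split(","));
            }
        };
        // An anonymous class has no canonical name to store and look up later.
        System.out.println(anonymous.getClass().getCanonicalName()); // prints null
    }
}
```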
public Word2Vec.Builder layerSize(int layerSize)
Parameters: layerSize
public Word2Vec.Builder setNGrams(int nGrams)
Parameters: nGrams
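setNGrams sets N, but this page does not say how the builder uses n-grams. The sketch below only shows what word-level n-grams of size N are, joining tokens with a single space (an assumed convention); `nGrams` here is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {
    /** Returns all contiguous word n-grams of length n, joined by single spaces. */
    static List<String> nGrams(List<String> tokens, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }
}
```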
public Word2Vec.Builder stopWords(@NonNull java.util.List<java.lang.String> stopWords)
Parameters: stopWords
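Stop words are ignored during vocabulary building and training. A minimal sketch of the filtering step, assuming exact, case-sensitive matching (whether DL4J normalizes case is not stated here); `removeStopWords` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordsSketch {
    /** Drops every token that appears in the stop-word list (exact match). */
    static List<String> removeStopWords(List<String> tokens, List<String> stopWords) {
        Set<String> stop = new HashSet<>(stopWords); // constant-time lookups instead of scanning the list
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stop.contains(t)) kept.add(t);
        }
        return kept;
    }
}
```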
public Word2Vec.Builder batchSize(int batchSize)
Parameters: batchSize
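batchSize controls how many items go into the mini-batch processed in a single training iteration. The sketch splits a sequence into consecutive batches with a shorter final batch; what DL4J actually batches (words, sentences, or training pairs) is not stated here, so treat `toBatches` as a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSketch {
    /** Splits items into consecutive mini-batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(items.size(), start + batchSize);
            batches.add(new ArrayList<>(items.subList(start, end))); // copy so batches are independent
        }
        return batches;
    }
}
```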
public Word2Vec.Builder useUnknown(boolean reallyUse)
Parameters: reallyUse
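With useUnknown(true), words absent from the vocabulary are replaced by a single UNK word. The sketch assumes the token is spelled "UNK" and that, when the option is off, out-of-vocabulary words are skipped; both are assumptions about DL4J's behavior, and `applyVocab` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class UnknownWordSketch {
    static final String UNK = "UNK"; // assumed spelling of the unknown-word token

    /** Maps out-of-vocabulary tokens to UNK when useUnk is true, otherwise skips them. */
    static List<String> applyVocab(List<String> tokens, Set<String> vocab, boolean useUnk) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (vocab.contains(t)) {
                out.add(t);
            } else if (useUnk) {
                out.add(UNK);
            } // else: assumed behavior, the unknown token is simply skipped
        }
        return out;
    }
}
```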
public Word2Vec build()