public static class Word2Vec.Builder extends SequenceVectors.Builder<VocabWord>
Modifier and Type | Field and Description |
---|---|
protected boolean | allowParallelTokenization |
protected LabelAwareIterator | labelAwareIterator |
protected SentenceIterator | sentenceIterator |
protected TokenizerFactory | tokenizerFactory |
Fields inherited from class SequenceVectors.Builder: batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, window, workers
Constructor and Description |
---|
Builder() |
Builder(VectorsConfiguration configuration) |
Modifier and Type | Method and Description |
---|---|
Word2Vec.Builder | allowParallelTokenization(boolean allow): Enables/disables parallel tokenization. |
Word2Vec.Builder | batchSize(int batchSize): Defines the mini-batch size. |
Word2Vec | build(): Builds a Word2Vec instance with the defined settings/options. |
Word2Vec.Builder | elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm): Sets a specific LearningAlgorithm as the elements learning algorithm. |
Word2Vec.Builder | elementsLearningAlgorithm(java.lang.String algorithm): Sets a specific LearningAlgorithm as the elements learning algorithm, given its fully qualified class name. |
Word2Vec.Builder | enableScavenger(boolean reallyEnable): Enables/disables periodic vocabulary truncation during vocabulary construction. Default value: disabled. |
Word2Vec.Builder | epochs(int numEpochs): Defines the number of epochs (iterations over the whole training corpus) used for training. |
Word2Vec.Builder | index(InvertedIndex<VocabWord> index): Deprecated. |
Word2Vec.Builder | iterate(DocumentIterator iterator) |
Word2Vec.Builder | iterate(LabelAwareIterator iterator): Feeds a LabelAwareIterator into the model; such iterators are usually used with ParagraphVectors. |
Word2Vec.Builder | iterate(SentenceIterator iterator): Feeds a SentenceIterator containing the training corpus into the model. |
Word2Vec.Builder | iterate(SequenceIterator<VocabWord> iterator): Feeds a SequenceIterator containing the training corpus into the model. |
Word2Vec.Builder | iterations(int iterations): Defines the number of iterations performed for each mini-batch during training. |
Word2Vec.Builder | layerSize(int layerSize): Defines the number of dimensions of the output vectors. |
Word2Vec.Builder | learningRate(double learningRate): Defines the initial learning rate for model training. |
Word2Vec.Builder | lookupTable(WeightLookupTable<VocabWord> lookupTable): Allows an external WeightLookupTable to be used. |
Word2Vec.Builder | minLearningRate(double minLearningRate): Defines the minimal learning rate value for training. |
Word2Vec.Builder | minWordFrequency(int minWordFrequency): Defines the minimal word frequency in the training corpus. |
Word2Vec.Builder | modelUtils(ModelUtils<VocabWord> modelUtils): Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc. |
Word2Vec.Builder | negativeSample(double negative): Defines whether negative sampling should be used; set > 0 as the negative sampling argument, or 0 to disable. |
Word2Vec.Builder | resetModel(boolean reallyReset): Defines whether the model should be completely wiped before building. |
Word2Vec.Builder | sampling(double sampling): Defines whether subsampling should be used; set > 0 as the subsampling threshold, or 0 to disable. |
Word2Vec.Builder | seed(long randomSeed): Defines the seed for the random number generator. |
Word2Vec.Builder | setVectorsListeners(java.util.Collection<VectorsListener<VocabWord>> vectorsListeners): Sets VectorsListeners for this SequenceVectors model. |
Word2Vec.Builder | stopWords(java.util.Collection<VocabWord> stopList): Defines stop words that should be ignored during training. |
Word2Vec.Builder | stopWords(java.util.List<java.lang.String> stopList): Defines stop words that should be ignored during training. |
Word2Vec.Builder | tokenizerFactory(TokenizerFactory tokenizerFactory): Defines the TokenizerFactory used for string tokenization during training. PLEASE NOTE: if an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal. |
Word2Vec.Builder | trainElementsRepresentation(boolean trainElements): Hardcoded to TRUE, since training element (word) representations is the whole point of Word2Vec. |
Word2Vec.Builder | trainSequencesRepresentation(boolean trainSequences): Hardcoded to FALSE, since Word2Vec does not train sequence representations. |
Word2Vec.Builder | unknownElement(VocabWord element): Allows specifying the SequenceElement that will be used as the UNK element, if UNK is used. |
Word2Vec.Builder | useAdaGrad(boolean reallyUse): Defines whether adaptive gradients (AdaGrad) should be used. |
protected Word2Vec.Builder | useExistingWordVectors(WordVectors vec): Has no effect for Word2Vec. |
Word2Vec.Builder | useHierarchicSoftmax(boolean reallyUse): Enables/disables hierarchical softmax. |
Word2Vec.Builder | usePreciseWeightInit(boolean reallyUse): If set to true, initial weights for elements/sequences will be derived from the elements themselves. |
Word2Vec.Builder | useUnknown(boolean reallyUse): Allows specifying whether the UNK token should be used internally. |
Word2Vec.Builder | useVariableWindow(int... windows): Allows variable window sizes to be used. |
Word2Vec.Builder | vocabCache(VocabCache<VocabWord> vocabCache): Allows an external VocabCache to be used. |
Word2Vec.Builder | windowSize(int windowSize): Defines the context window size. |
Word2Vec.Builder | workers(int numWorkers): Defines the maximum number of concurrent threads available for training. |
Methods inherited from class SequenceVectors.Builder: presetTables, sequenceLearningAlgorithm, sequenceLearningAlgorithm
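For orientation, here is a minimal usage sketch of this builder. It is an illustration rather than canonical usage: the corpus path and hyperparameter values are placeholders, and the helper classes (BasicLineIterator, DefaultTokenizerFactory, CommonPreprocessor) are assumed to come from the deeplearning4j text/tokenization modules.

```java
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecBuilderSketch {
    public static void main(String[] args) throws Exception {
        // Corpus with one sentence per line; the path is a placeholder.
        SentenceIterator iter = new BasicLineIterator("corpus.txt");

        // The same TokenizerFactory should be reused if an external VocabCache is supplied later.
        TokenizerFactory tokenizer = new DefaultTokenizerFactory();
        tokenizer.setTokenPreProcessor(new CommonPreprocessor());

        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)     // ignore words seen fewer than 5 times
                .layerSize(100)          // dimensionality of the word vectors
                .windowSize(5)           // context window size
                .iterations(1)           // iterations per mini-batch
                .epochs(1)               // passes over the whole corpus
                .learningRate(0.025)
                .minLearningRate(1e-4)
                .seed(42)
                .iterate(iter)
                .tokenizerFactory(tokenizer)
                .build();

        vec.fit();                       // run training
        System.out.println(vec.wordsNearest("day", 10));
    }
}
```

The shorter sketches further below reuse this iterator/tokenizer setup and only vary the options under discussion.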
protected SentenceIterator sentenceIterator
protected LabelAwareIterator labelAwareIterator
protected TokenizerFactory tokenizerFactory
protected boolean allowParallelTokenization
public Builder()
public Builder(@NonNull VectorsConfiguration configuration)
protected Word2Vec.Builder useExistingWordVectors(@NonNull WordVectors vec)
Overrides: useExistingWordVectors in class SequenceVectors.Builder<VocabWord>
Parameters: vec - existing WordVectors model

public Word2Vec.Builder iterate(@NonNull DocumentIterator iterator)

public Word2Vec.Builder iterate(@NonNull SentenceIterator iterator)
Parameters: iterator

public Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Parameters: tokenizerFactory

@Deprecated
public Word2Vec.Builder index(@NonNull InvertedIndex<VocabWord> index)

public Word2Vec.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Overrides: iterate in class SequenceVectors.Builder<VocabWord>
Parameters: iterator

public Word2Vec.Builder iterate(@NonNull LabelAwareIterator iterator)
Parameters: iterator

public Word2Vec.Builder batchSize(int batchSize)
Overrides: batchSize in class SequenceVectors.Builder<VocabWord>
Parameters: batchSize

public Word2Vec.Builder iterations(int iterations)
Overrides: iterations in class SequenceVectors.Builder<VocabWord>
Parameters: iterations

public Word2Vec.Builder epochs(int numEpochs)
Overrides: epochs in class SequenceVectors.Builder<VocabWord>
Parameters: numEpochs

public Word2Vec.Builder layerSize(int layerSize)
Overrides: layerSize in class SequenceVectors.Builder<VocabWord>
Parameters: layerSize

public Word2Vec.Builder learningRate(double learningRate)
Overrides: learningRate in class SequenceVectors.Builder<VocabWord>
Parameters: learningRate

public Word2Vec.Builder minWordFrequency(int minWordFrequency)
Overrides: minWordFrequency in class SequenceVectors.Builder<VocabWord>
Parameters: minWordFrequency

public Word2Vec.Builder minLearningRate(double minLearningRate)
Overrides: minLearningRate in class SequenceVectors.Builder<VocabWord>
Parameters: minLearningRate

public Word2Vec.Builder resetModel(boolean reallyReset)
Overrides: resetModel in class SequenceVectors.Builder<VocabWord>
Parameters: reallyReset

public Word2Vec.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Overrides: vocabCache in class SequenceVectors.Builder<VocabWord>
Parameters: vocabCache

public Word2Vec.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Overrides: lookupTable in class SequenceVectors.Builder<VocabWord>
Parameters: lookupTable
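vocabCache(...) and lookupTable(...) accept externally constructed storage. The sketch below assumes AbstractCache (org.deeplearning4j.models.word2vec.wordstore.inmemory) and InMemoryLookupTable (org.deeplearning4j.models.embeddings.inmemory) as the implementations, reuses the iterator/tokenizer from the first sketch, and keeps the table's vector length in sync with layerSize.

```java
static Word2Vec buildWithExternalStorage(SentenceIterator iter, TokenizerFactory tokenizer) {
    // Externally managed vocabulary and weight table (assumed implementations, see lead-in).
    AbstractCache<VocabWord> cache = new AbstractCache.Builder<VocabWord>().build();
    WeightLookupTable<VocabWord> table = new InMemoryLookupTable.Builder<VocabWord>()
            .cache(cache)
            .vectorLength(100)   // keep in sync with layerSize below
            .useAdaGrad(false)
            .build();

    return new Word2Vec.Builder()
            .vocabCache(cache)   // external VocabCache
            .lookupTable(table)  // external WeightLookupTable
            .layerSize(100)
            .resetModel(true)    // wipe any previously held state before building
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```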
public Word2Vec.Builder sampling(double sampling)
Overrides: sampling in class SequenceVectors.Builder<VocabWord>
Parameters: sampling - set > 0 as the subsampling argument, or 0 to disable

public Word2Vec.Builder useAdaGrad(boolean reallyUse)
Overrides: useAdaGrad in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse

public Word2Vec.Builder negativeSample(double negative)
Overrides: negativeSample in class SequenceVectors.Builder<VocabWord>
Parameters: negative - set > 0 as the negative sampling argument, or 0 to disable
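Taken together, sampling(...), negativeSample(...) and useHierarchicSoftmax(...) control subsampling and the output-layer approximation. A minimal sketch with illustrative values, reusing the iterator/tokenizer from the first sketch:

```java
static Word2Vec buildWithNegativeSampling(SentenceIterator iter, TokenizerFactory tokenizer) {
    return new Word2Vec.Builder()
            .sampling(1e-3)               // subsampling threshold for frequent words; 0 disables
            .negativeSample(10)           // 10 negative samples per training example; 0 disables
            .useHierarchicSoftmax(false)  // rely on negative sampling instead of hierarchical softmax
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```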
public Word2Vec.Builder stopWords(@NonNull java.util.List<java.lang.String> stopList)
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList

public Word2Vec.Builder trainElementsRepresentation(boolean trainElements)
Overrides: trainElementsRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainElements

public Word2Vec.Builder trainSequencesRepresentation(boolean trainSequences)
Overrides: trainSequencesRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainSequences

public Word2Vec.Builder stopWords(@NonNull java.util.Collection<VocabWord> stopList)
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList
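Both stopWords(...) overloads exclude the given tokens from training. A minimal sketch using the List<String> overload (the word list is illustrative only; java.util.List and java.util.Arrays are assumed imported):

```java
static Word2Vec buildWithStopWords(SentenceIterator iter, TokenizerFactory tokenizer) {
    List<String> stops = Arrays.asList("the", "a", "an", "of", "to"); // illustrative list only
    return new Word2Vec.Builder()
            .stopWords(stops)    // these tokens are ignored during training
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```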
public Word2Vec.Builder windowSize(int windowSize)
Overrides: windowSize in class SequenceVectors.Builder<VocabWord>
Parameters: windowSize

public Word2Vec.Builder seed(long randomSeed)
Overrides: seed in class SequenceVectors.Builder<VocabWord>
Parameters: randomSeed

public Word2Vec.Builder workers(int numWorkers)
Overrides: workers in class SequenceVectors.Builder<VocabWord>
Parameters: numWorkers

public Word2Vec.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Overrides: modelUtils in class SequenceVectors.Builder<VocabWord>
Parameters: modelUtils - model utils to be used
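modelUtils(...) swaps the backend behind similarity(), wordsNearest(), accuracy(), etc. The sketch below assumes FlatModelUtils (org.deeplearning4j.models.embeddings.reader.impl) as an available implementation performing exhaustive lookups; any other ModelUtils implementation can be substituted.

```java
static Word2Vec buildWithFlatModelUtils(SentenceIterator iter, TokenizerFactory tokenizer) {
    return new Word2Vec.Builder()
            .modelUtils(new FlatModelUtils<VocabWord>()) // backs similarity()/wordsNearest() with exhaustive lookups
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```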
public Word2Vec.Builder useVariableWindow(int... windows)
Overrides: useVariableWindow in class SequenceVectors.Builder<VocabWord>
Parameters: windows

public Word2Vec.Builder unknownElement(VocabWord element)
Overrides: unknownElement in class SequenceVectors.Builder<VocabWord>
Parameters: element

public Word2Vec.Builder useUnknown(boolean reallyUse)
Overrides: useUnknown in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse
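useUnknown(...) and unknownElement(...) control how out-of-vocabulary tokens are represented, while useVariableWindow(...) replaces the single fixed window. A sketch with illustrative values, again reusing the first sketch's setup:

```java
static Word2Vec buildWithUnknownHandling(SentenceIterator iter, TokenizerFactory tokenizer) {
    return new Word2Vec.Builder()
            .useUnknown(true)                           // map out-of-vocabulary tokens to UNK
            .unknownElement(new VocabWord(1.0, "UNK"))  // element standing in for unknown words
            .useVariableWindow(3, 5, 7)                 // train with several window sizes instead of one
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```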
public Word2Vec.Builder setVectorsListeners(@NonNull java.util.Collection<VectorsListener<VocabWord>> vectorsListeners)
Overrides: setVectorsListeners in class SequenceVectors.Builder<VocabWord>
Parameters: vectorsListeners

public Word2Vec.Builder elementsLearningAlgorithm(@NonNull java.lang.String algorithm)
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - fully qualified class name

public Word2Vec.Builder elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm)
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - ElementsLearningAlgorithm implementation
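Both elementsLearningAlgorithm(...) overloads select the training algorithm for word (element) vectors, either as an instance or by fully qualified class name. The sketch below assumes the CBOW implementation from org.deeplearning4j.models.embeddings.learning.impl.elements as an alternative to skip-gram:

```java
static Word2Vec buildWithCbow(SentenceIterator iter, TokenizerFactory tokenizer) {
    return new Word2Vec.Builder()
            // Instance overload; the String overload takes the fully qualified class name instead.
            .elementsLearningAlgorithm(new CBOW<VocabWord>())
            .iterate(iter)
            .tokenizerFactory(tokenizer)
            .build();
}
```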
public Word2Vec.Builder allowParallelTokenization(boolean allow)
Parameters: allow

public Word2Vec.Builder enableScavenger(boolean reallyEnable)
Overrides: enableScavenger in class SequenceVectors.Builder<VocabWord>
Parameters: reallyEnable

public Word2Vec.Builder useHierarchicSoftmax(boolean reallyUse)
Overrides: useHierarchicSoftmax in class SequenceVectors.Builder<VocabWord>

public Word2Vec.Builder usePreciseWeightInit(boolean reallyUse)
Overrides: usePreciseWeightInit in class SequenceVectors.Builder<VocabWord>

public Word2Vec build()
Overrides: build in class SequenceVectors.Builder<VocabWord>