public static class Word2Vec.Builder extends SequenceVectors.Builder<VocabWord>
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | allowParallelTokenization |
| protected LabelAwareIterator | labelAwareIterator |
| protected SentenceIterator | sentenceIterator |
| protected TokenizerFactory | tokenizerFactory |

Fields inherited from class SequenceVectors.Builder: batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, window, workers

| Constructor and Description |
|---|
| Builder() |
| Builder(VectorsConfiguration configuration) |


| Modifier and Type | Method and Description |
|---|---|
| Word2Vec.Builder | allowParallelTokenization(boolean allow): Enables or disables parallel tokenization. |
| Word2Vec.Builder | batchSize(int batchSize): Defines the mini-batch size. |
| Word2Vec | build(): Builds a Word2Vec instance with the defined settings and options. |
| Word2Vec.Builder | elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm): Sets a specific LearningAlgorithm implementation as the elements learning algorithm. |
| Word2Vec.Builder | elementsLearningAlgorithm(java.lang.String algorithm): Sets a specific LearningAlgorithm, given by its fully qualified class name, as the elements learning algorithm. |
| Word2Vec.Builder | enableScavenger(boolean reallyEnable): Enables or disables periodic vocab truncation during construction. Default value: disabled. |
| Word2Vec.Builder | epochs(int numEpochs): Defines the number of epochs (iterations over the whole training corpus) for training. |
| Word2Vec.Builder | index(InvertedIndex<VocabWord> index): Deprecated. |
| Word2Vec.Builder | iterate(DocumentIterator iterator) |
| Word2Vec.Builder | iterate(LabelAwareIterator iterator): Feeds a LabelAwareIterator to the builder. |
| Word2Vec.Builder | iterate(SentenceIterator iterator): Feeds a SentenceIterator that contains the training corpus into the model. |
| Word2Vec.Builder | iterate(SequenceIterator<VocabWord> iterator): Feeds a SequenceIterator that contains the training corpus into the model. |
| Word2Vec.Builder | iterations(int iterations): Defines the number of iterations done for each mini-batch during training. |
| Word2Vec.Builder | layerSize(int layerSize): Defines the number of dimensions for output vectors. |
| Word2Vec.Builder | learningRate(double learningRate): Defines the initial learning rate for model training. |
| Word2Vec.Builder | lookupTable(WeightLookupTable<VocabWord> lookupTable): Allows an external WeightLookupTable to be used. |
| Word2Vec.Builder | minLearningRate(double minLearningRate): Defines the minimum learning rate for training. |
| Word2Vec.Builder | minWordFrequency(int minWordFrequency): Defines the minimum word frequency in the training corpus. |
| Word2Vec.Builder | modelUtils(ModelUtils<VocabWord> modelUtils): Sets the ModelUtils to be used as the provider of utility methods: similarity(), wordsNearest(), accuracy(), etc. |
| Word2Vec.Builder | negativeSample(double negative): Defines whether negative sampling should be used: set > 0 as the negative sampling argument, or 0 to disable. |
| Word2Vec.Builder | resetModel(boolean reallyReset): Defines whether the model should be completely wiped before building. |
| Word2Vec.Builder | sampling(double sampling): Defines whether subsampling should be used: set > 0 as the subsampling argument, or 0 to disable. |
| Word2Vec.Builder | seed(long randomSeed): Defines the seed for the random number generator. |
| Word2Vec.Builder | setVectorsListeners(java.util.Collection<VectorsListener<VocabWord>> vectorsListeners): Sets the VectorsListeners for this SequenceVectors model. |
| Word2Vec.Builder | stopWords(java.util.Collection<VocabWord> stopList): Defines stop words that should be ignored during training. |
| Word2Vec.Builder | stopWords(java.util.List<java.lang.String> stopList): Defines stop words that should be ignored during training. |
| Word2Vec.Builder | tokenizerFactory(TokenizerFactory tokenizerFactory): Defines the TokenizerFactory to be used for string tokenization during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal. |
| Word2Vec.Builder | trainElementsRepresentation(boolean trainElements): Hardcoded to TRUE, since that's the whole point of Word2Vec. |
| Word2Vec.Builder | trainSequencesRepresentation(boolean trainSequences): Hardcoded to FALSE, since Word2Vec trains element vectors rather than sequence vectors. |
| Word2Vec.Builder | unknownElement(VocabWord element): Specifies the SequenceElement to be used as the UNK element, if UNK is used. |
| Word2Vec.Builder | useAdaGrad(boolean reallyUse): Defines whether adaptive gradients should be used. |
| protected Word2Vec.Builder | useExistingWordVectors(WordVectors vec): This method has no effect for Word2Vec. |
| Word2Vec.Builder | useHierarchicSoftmax(boolean reallyUse): Enables or disables hierarchic softmax. |
| Word2Vec.Builder | usePreciseWeightInit(boolean reallyUse): If set to true, initial weights for elements/sequences will be derived from the elements themselves. |
| Word2Vec.Builder | useUnknown(boolean reallyUse): Specifies whether the UNK word should be used internally. |
| Word2Vec.Builder | useVariableWindow(int... windows): Allows a variable window size to be used. |
| Word2Vec.Builder | vocabCache(VocabCache<VocabWord> vocabCache): Allows an external VocabCache to be used. |
| Word2Vec.Builder | windowSize(int windowSize): Defines the context window size. |
| Word2Vec.Builder | workers(int numWorkers): Defines the maximum number of concurrent threads available for training. |

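
Every public setter in the summary returns Word2Vec.Builder rather than SequenceVectors.Builder<VocabWord>, which is what keeps a call chain typed as the subclass. The sketch below illustrates that covariant-return pattern in plain Java. All class names (BaseBuilder, WordBuilder, Model, WordModel) and the default values are hypothetical stand-ins, not DL4J types; it shows the pattern under those assumptions and is not the library's implementation.

```java
// Hypothetical stand-ins for a built model and its specialised subclass.
class Model {
    final int layerSize;
    final int window;

    Model(int layerSize, int window) {
        this.layerSize = layerSize;
        this.window = window;
    }
}

class WordModel extends Model {
    final boolean parallelTokenization;

    WordModel(int layerSize, int window, boolean parallelTokenization) {
        super(layerSize, window);
        this.parallelTokenization = parallelTokenization;
    }
}

// Hypothetical parent builder: its setters return the parent type.
class BaseBuilder {
    protected int layerSize = 100; // arbitrary default for this sketch
    protected int window = 5;      // arbitrary default for this sketch

    BaseBuilder layerSize(int layerSize) {
        this.layerSize = layerSize;
        return this;
    }

    BaseBuilder windowSize(int window) {
        this.window = window;
        return this;
    }

    Model build() {
        return new Model(layerSize, window);
    }
}

// Hypothetical child builder: each override narrows the return type,
// so subclass-only setters stay reachable in the middle of a chain.
class WordBuilder extends BaseBuilder {
    protected boolean allowParallelTokenization;

    @Override
    WordBuilder layerSize(int layerSize) {
        super.layerSize(layerSize);
        return this;
    }

    @Override
    WordBuilder windowSize(int window) {
        super.windowSize(window);
        return this;
    }

    WordBuilder allowParallelTokenization(boolean allow) {
        this.allowParallelTokenization = allow;
        return this;
    }

    @Override
    WordModel build() {
        return new WordModel(layerSize, window, allowParallelTokenization);
    }
}

class BuilderSketch {
    public static void main(String[] args) {
        WordModel m = new WordBuilder()
                .layerSize(50)                   // returns WordBuilder, not BaseBuilder
                .allowParallelTokenization(true) // subclass-only setter still compiles here
                .windowSize(3)
                .build();
        System.out.println(m.layerSize + " " + m.window + " " + m.parallelTokenization); // prints "50 3 true"
    }
}
```

Without the overrides, `layerSize(50)` would return the parent type and the following `allowParallelTokenization(true)` call would not compile.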
Methods inherited from class SequenceVectors.Builder: presetTables, sequenceLearningAlgorithm, sequenceLearningAlgorithm

protected SentenceIterator sentenceIterator
protected LabelAwareIterator labelAwareIterator
protected TokenizerFactory tokenizerFactory
protected boolean allowParallelTokenization

public Builder()
public Builder(@NonNull VectorsConfiguration configuration)

protected Word2Vec.Builder useExistingWordVectors(@NonNull WordVectors vec)
Overrides: useExistingWordVectors in class SequenceVectors.Builder<VocabWord>
Parameters: vec - existing WordVectors model

public Word2Vec.Builder iterate(@NonNull DocumentIterator iterator)

public Word2Vec.Builder iterate(@NonNull SentenceIterator iterator)
Parameters: iterator

public Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Parameters: tokenizerFactory

@Deprecated
public Word2Vec.Builder index(@NonNull InvertedIndex<VocabWord> index)

public Word2Vec.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Overrides: iterate in class SequenceVectors.Builder<VocabWord>
Parameters: iterator

public Word2Vec.Builder iterate(@NonNull LabelAwareIterator iterator)
Parameters: iterator

public Word2Vec.Builder batchSize(int batchSize)
Overrides: batchSize in class SequenceVectors.Builder<VocabWord>
Parameters: batchSize

public Word2Vec.Builder iterations(int iterations)
Overrides: iterations in class SequenceVectors.Builder<VocabWord>
Parameters: iterations

public Word2Vec.Builder epochs(int numEpochs)
Overrides: epochs in class SequenceVectors.Builder<VocabWord>
Parameters: numEpochs

public Word2Vec.Builder layerSize(int layerSize)
Overrides: layerSize in class SequenceVectors.Builder<VocabWord>
Parameters: layerSize

public Word2Vec.Builder learningRate(double learningRate)
Overrides: learningRate in class SequenceVectors.Builder<VocabWord>
Parameters: learningRate

public Word2Vec.Builder minWordFrequency(int minWordFrequency)
Overrides: minWordFrequency in class SequenceVectors.Builder<VocabWord>
Parameters: minWordFrequency

public Word2Vec.Builder minLearningRate(double minLearningRate)
Overrides: minLearningRate in class SequenceVectors.Builder<VocabWord>
Parameters: minLearningRate

public Word2Vec.Builder resetModel(boolean reallyReset)
Overrides: resetModel in class SequenceVectors.Builder<VocabWord>
Parameters: reallyReset

public Word2Vec.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Overrides: vocabCache in class SequenceVectors.Builder<VocabWord>
Parameters: vocabCache

public Word2Vec.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Overrides: lookupTable in class SequenceVectors.Builder<VocabWord>
Parameters: lookupTable

public Word2Vec.Builder sampling(double sampling)
Overrides: sampling in class SequenceVectors.Builder<VocabWord>
Parameters: sampling - set > 0 as the subsampling argument, or 0 to disable

public Word2Vec.Builder useAdaGrad(boolean reallyUse)
Overrides: useAdaGrad in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse

public Word2Vec.Builder negativeSample(double negative)
Overrides: negativeSample in class SequenceVectors.Builder<VocabWord>
Parameters: negative - set > 0 as the negative sampling argument, or 0 to disable

public Word2Vec.Builder stopWords(@NonNull java.util.List<java.lang.String> stopList)
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList

public Word2Vec.Builder trainElementsRepresentation(boolean trainElements)
Overrides: trainElementsRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainElements

public Word2Vec.Builder trainSequencesRepresentation(boolean trainSequences)
Overrides: trainSequencesRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainSequences

public Word2Vec.Builder stopWords(@NonNull java.util.Collection<VocabWord> stopList)
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList

public Word2Vec.Builder windowSize(int windowSize)
Overrides: windowSize in class SequenceVectors.Builder<VocabWord>
Parameters: windowSize

public Word2Vec.Builder seed(long randomSeed)
Overrides: seed in class SequenceVectors.Builder<VocabWord>
Parameters: randomSeed

public Word2Vec.Builder workers(int numWorkers)
Overrides: workers in class SequenceVectors.Builder<VocabWord>
Parameters: numWorkers

public Word2Vec.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Overrides: modelUtils in class SequenceVectors.Builder<VocabWord>
Parameters: modelUtils - model utils to be used

public Word2Vec.Builder useVariableWindow(int... windows)
Overrides: useVariableWindow in class SequenceVectors.Builder<VocabWord>
Parameters: windows

public Word2Vec.Builder unknownElement(VocabWord element)
Overrides: unknownElement in class SequenceVectors.Builder<VocabWord>
Parameters: element

public Word2Vec.Builder useUnknown(boolean reallyUse)
Overrides: useUnknown in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse

public Word2Vec.Builder setVectorsListeners(@NonNull java.util.Collection<VectorsListener<VocabWord>> vectorsListeners)
Overrides: setVectorsListeners in class SequenceVectors.Builder<VocabWord>
Parameters: vectorsListeners

public Word2Vec.Builder elementsLearningAlgorithm(@NonNull java.lang.String algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - fully qualified class name

public Word2Vec.Builder elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - ElementsLearningAlgorithm implementation

public Word2Vec.Builder allowParallelTokenization(boolean allow)
Parameters: allow

public Word2Vec.Builder enableScavenger(boolean reallyEnable)
Overrides: enableScavenger in class SequenceVectors.Builder<VocabWord>
Parameters: reallyEnable

public Word2Vec.Builder useHierarchicSoftmax(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
Overrides: useHierarchicSoftmax in class SequenceVectors.Builder<VocabWord>

public Word2Vec.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
Overrides: usePreciseWeightInit in class SequenceVectors.Builder<VocabWord>

public Word2Vec build()
Description copied from class: SequenceVectors.Builder
Overrides: build in class SequenceVectors.Builder<VocabWord>
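
The learningRate and minLearningRate setters bound the step size used during training, but this page does not say how the rate moves between the two values. A common choice, used by the original word2vec tool, is linear decay with a floor. The sketch below assumes that schedule; the helper name currentRate and its wordsSeen and totalWords parameters are hypothetical, and DL4J's actual schedule may differ.

```java
// Minimal sketch of a linearly decaying learning rate with a lower bound.
// Assumption: the schedule is linear in the fraction of words processed;
// this is an illustration, not DL4J's implementation.
class LearningRateSketch {

    /**
     * @param learningRate    initial learning rate
     * @param minLearningRate floor the rate never drops below
     * @param wordsSeen       words processed so far (hypothetical counter)
     * @param totalWords      words expected over the whole run, must be > 0
     */
    static double currentRate(double learningRate, double minLearningRate,
                              long wordsSeen, long totalWords) {
        double progress = (double) wordsSeen / totalWords;
        double decayed = learningRate * (1.0 - progress);
        return Math.max(minLearningRate, decayed);
    }

    public static void main(String[] args) {
        long total = 1000;
        for (long seen : new long[] {0, 500, 1000}) {
            System.out.println(seen + " words -> rate " + currentRate(0.025, 0.0001, seen, total));
        }
    }
}
```

With an initial rate of 0.025 and a floor of 0.0001, the rate starts at 0.025, halves at the midpoint, and is clamped to the floor once all words have been processed.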