public static class ParagraphVectors.Builder extends Word2Vec.Builder
| Modifier and Type | Field and Description |
|---|---|
| protected DocumentIterator | docIter |
| protected LabelAwareIterator | labelAwareIterator |
| protected LabelsSource | labelsSource |

Fields inherited from class Word2Vec.Builder: allowParallelTokenization, sentenceIterator, tokenizerFactory

Fields inherited from class SequenceVectors.Builder: batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, window, workers

| Constructor and Description |
|---|
| Builder() |
| Builder(VectorsConfiguration configuration) |

| Modifier and Type | Method and Description |
|---|---|
| ParagraphVectors.Builder | allowParallelTokenization(boolean allow)<br>Enables or disables parallel tokenization. |
| ParagraphVectors.Builder | batchSize(int batchSize)<br>Defines the mini-batch size. |
| ParagraphVectors | build()<br>Builds a SequenceVectors instance (here, a ParagraphVectors) with the defined settings and options. |
| ParagraphVectors.Builder | elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm)<br>Sets the given LearningAlgorithm as the elements learning algorithm. |
| ParagraphVectors.Builder | elementsLearningAlgorithm(java.lang.String algorithm)<br>Sets the given LearningAlgorithm as the elements learning algorithm. |
| ParagraphVectors.Builder | enableScavenger(boolean reallyEnable)<br>Enables or disables periodic vocabulary truncation during construction. Default value: disabled. |
| ParagraphVectors.Builder | epochs(int numEpochs)<br>Defines the number of epochs (passes over the whole training corpus) for training. |
| ParagraphVectors.Builder | index(InvertedIndex<VocabWord> index) |
| ParagraphVectors.Builder | iterate(DocumentIterator iterator)<br>Feeds a DocumentIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterate(LabelAwareDocumentIterator iterator)<br>Feeds a LabelAwareDocumentIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterate(LabelAwareIterator iterator)<br>Feeds a LabelAwareIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterate(LabelAwareSentenceIterator iterator)<br>Feeds a LabelAwareSentenceIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterate(SentenceIterator iterator)<br>Feeds a SentenceIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterate(SequenceIterator<VocabWord> iterator)<br>Feeds a SequenceIterator that contains the training corpus into ParagraphVectors. |
| ParagraphVectors.Builder | iterations(int iterations)<br>Defines the number of iterations performed for each mini-batch during training. |
| ParagraphVectors.Builder | labels(java.util.List<java.lang.String> labels)<br>Deprecated. |
| ParagraphVectors.Builder | labelsSource(LabelsSource source)<br>Attaches a predefined labels source to ParagraphVectors. |
| ParagraphVectors.Builder | layerSize(int layerSize)<br>Defines the number of dimensions of the output vectors. |
| ParagraphVectors.Builder | learningRate(double learningRate)<br>Defines the initial learning rate for model training. |
| ParagraphVectors.Builder | lookupTable(WeightLookupTable<VocabWord> lookupTable)<br>Defines an external WeightLookupTable to be used. |
| ParagraphVectors.Builder | minLearningRate(double minLearningRate)<br>Defines the minimal learning rate value for training. |
| ParagraphVectors.Builder | minWordFrequency(int minWordFrequency)<br>Defines the minimal word frequency in the training corpus. |
| ParagraphVectors.Builder | modelUtils(ModelUtils<VocabWord> modelUtils)<br>Sets the ModelUtils that will be used as the provider of utility methods: similarity(), wordsNearest(), accuracy(), etc. |
| ParagraphVectors.Builder | negativeSample(double negative)<br>Defines whether negative sampling is used: a value > 0 is the negative sampling argument, 0 disables it. |
| ParagraphVectors.Builder | resetModel(boolean reallyReset)<br>Defines whether the model should be completely wiped before building. |
| ParagraphVectors.Builder | sampling(double sampling)<br>Defines whether subsampling is used: a value > 0 is the subsampling argument, 0 disables it. |
| ParagraphVectors.Builder | seed(long randomSeed)<br>Defines the seed for the random number generator. |
| ParagraphVectors.Builder | sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm)<br>Sets the given LearningAlgorithm as the sequence learning algorithm. |
| ParagraphVectors.Builder | sequenceLearningAlgorithm(java.lang.String algorithm)<br>Sets the given LearningAlgorithm as the sequence learning algorithm. |
| ParagraphVectors.Builder | setVectorsListeners(java.util.Collection<VectorsListener<VocabWord>> vectorsListeners)<br>Sets the VectorsListeners for this SequenceVectors model. |
| ParagraphVectors.Builder | stopWords(java.util.Collection<VocabWord> stopList)<br>Defines stop words that should be ignored during training. |
| ParagraphVectors.Builder | stopWords(java.util.List<java.lang.String> stopList)<br>Defines stop words that should be ignored during training. |
| ParagraphVectors.Builder | tokenizerFactory(TokenizerFactory tokenizerFactory)<br>Defines the TokenizerFactory used to tokenize strings during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used so that the derived tokens stay equal. |
| ParagraphVectors.Builder | trainElementsRepresentation(boolean trainElements)<br>Defines whether word representations should be built together with document representations. |
| ParagraphVectors.Builder | trainSequencesRepresentation(boolean trainSequences)<br>This method is hardcoded to TRUE, since that is the whole point of ParagraphVectors. |
| ParagraphVectors.Builder | trainWordVectors(boolean trainElements)<br>Defines whether word representations should be built together with document representations. |
| ParagraphVectors.Builder | unknownElement(VocabWord element)<br>Specifies the SequenceElement that will be used as the UNK element, if UNK is used. |
| ParagraphVectors.Builder | useAdaGrad(boolean reallyUse)<br>Defines whether adaptive gradients should be used. |
| ParagraphVectors.Builder | useExistingWordVectors(WordVectors vec)<br>Allows a pre-built WordVectors model (Word2Vec or GloVe) to be used for ParagraphVectors. |
| ParagraphVectors.Builder | useHierarchicSoftmax(boolean reallyUse)<br>Enables or disables hierarchic softmax. |
| ParagraphVectors.Builder | usePreciseWeightInit(boolean reallyUse)<br>If set to true, initial weights for elements/sequences will be derived from the elements themselves. |
| ParagraphVectors.Builder | useUnknown(boolean reallyUse)<br>Specifies whether the UNK word should be used internally. |
| ParagraphVectors.Builder | useVariableWindow(int... windows)<br>This method has no effect for ParagraphVectors. |
| ParagraphVectors.Builder | vocabCache(VocabCache<VocabWord> vocabCache)<br>Defines an external VocabCache to be used. |
| ParagraphVectors.Builder | windowSize(int windowSize)<br>Defines the context window size. |
| ParagraphVectors.Builder | workers(int numWorkers)<br>Defines the maximum number of concurrent threads available for training. |

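
The summaries for batchSize, epochs, iterations, learningRate and minLearningRate describe a training schedule without showing how the values combine. The sketch below is a hypothetical, library-free illustration and not DL4J code: the class TrainingScheduleSketch and its methods are invented for this example. It assumes that each epoch splits the corpus into mini-batches, that each mini-batch is processed `iterations` times, and that the learning rate decays linearly from the initial value down to the minimal value. The linear shape is an assumption borrowed from the classic word2vec schedule; this page does not state which schedule the library uses.

```java
// Hypothetical sketch, NOT part of DL4J: shows one way the schedule knobs could interact.
final class TrainingScheduleSketch {

    /** Number of mini-batches needed to cover corpusSize sequences (ceiling division). */
    static int batchesPerEpoch(int corpusSize, int batchSize) {
        return (corpusSize + batchSize - 1) / batchSize;
    }

    /** Total passes over mini-batches: epochs * batches per epoch * iterations per batch. */
    static long totalPasses(int corpusSize, int batchSize, int epochs, int iterations) {
        return (long) epochs * batchesPerEpoch(corpusSize, batchSize) * iterations;
    }

    /** ASSUMED linear decay from learningRate, never dropping below minLearningRate. */
    static double decayedRate(double learningRate, double minLearningRate, long done, long total) {
        double rate = learningRate * (1.0 - (double) done / total);
        return Math.max(minLearningRate, rate);
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not library defaults.
        long total = totalPasses(1000, 512, 3, 2);
        System.out.println("total passes: " + total);
        for (long done = 0; done <= total; done += 4) {
            System.out.println("pass " + done + " rate " + decayedRate(0.025, 0.0001, done, total));
        }
    }
}
```

Under these assumptions, raising epochs or iterations multiplies the amount of work, while minLearningRate only matters near the end of training, when the decayed rate would otherwise approach zero.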
Methods inherited from class SequenceVectors.Builder: presetTables

protected LabelAwareIterator labelAwareIterator

protected LabelsSource labelsSource

protected DocumentIterator docIter

public Builder()

public Builder(@NonNull VectorsConfiguration configuration)

public ParagraphVectors.Builder useExistingWordVectors(@NonNull WordVectors vec)
Overrides: useExistingWordVectors in class Word2Vec.Builder
Parameters: vec - existing WordVectors model

public ParagraphVectors.Builder trainWordVectors(boolean trainElements)
Parameters: trainElements

public ParagraphVectors.Builder labelsSource(@NonNull LabelsSource source)
Parameters: source

@Deprecated public ParagraphVectors.Builder labels(@NonNull java.util.List<java.lang.String> labels)
Parameters: labels

public ParagraphVectors.Builder iterate(@NonNull LabelAwareDocumentIterator iterator)
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull LabelAwareSentenceIterator iterator)
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull LabelAwareIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull DocumentIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull SentenceIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Overrides: modelUtils in class Word2Vec.Builder
Parameters: modelUtils - model utils to be used

public ParagraphVectors.Builder unknownElement(VocabWord element)
Overrides: unknownElement in class Word2Vec.Builder
Parameters: element

public ParagraphVectors.Builder allowParallelTokenization(boolean allow)
Overrides: allowParallelTokenization in class Word2Vec.Builder
Parameters: allow

public ParagraphVectors.Builder useUnknown(boolean reallyUse)
Overrides: useUnknown in class Word2Vec.Builder
Parameters: reallyUse

public ParagraphVectors.Builder enableScavenger(boolean reallyEnable)
Overrides: enableScavenger in class Word2Vec.Builder
Parameters: reallyEnable

public ParagraphVectors build()
Description copied from class: SequenceVectors.Builder
Overrides: build in class Word2Vec.Builder

public ParagraphVectors.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Overrides: tokenizerFactory in class Word2Vec.Builder
Parameters: tokenizerFactory

public ParagraphVectors.Builder index(@NonNull InvertedIndex<VocabWord> index)
Overrides: index in class Word2Vec.Builder

public ParagraphVectors.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder batchSize(int batchSize)
Overrides: batchSize in class Word2Vec.Builder
Parameters: batchSize

public ParagraphVectors.Builder iterations(int iterations)
Overrides: iterations in class Word2Vec.Builder
Parameters: iterations

public ParagraphVectors.Builder epochs(int numEpochs)
Overrides: epochs in class Word2Vec.Builder
Parameters: numEpochs

public ParagraphVectors.Builder layerSize(int layerSize)
Overrides: layerSize in class Word2Vec.Builder
Parameters: layerSize

public ParagraphVectors.Builder setVectorsListeners(@NonNull java.util.Collection<VectorsListener<VocabWord>> vectorsListeners)
Overrides: setVectorsListeners in class Word2Vec.Builder
Parameters: vectorsListeners

public ParagraphVectors.Builder learningRate(double learningRate)
Overrides: learningRate in class Word2Vec.Builder
Parameters: learningRate

public ParagraphVectors.Builder minWordFrequency(int minWordFrequency)
Overrides: minWordFrequency in class Word2Vec.Builder
Parameters: minWordFrequency

public ParagraphVectors.Builder minLearningRate(double minLearningRate)
Overrides: minLearningRate in class Word2Vec.Builder
Parameters: minLearningRate

public ParagraphVectors.Builder resetModel(boolean reallyReset)
Overrides: resetModel in class Word2Vec.Builder
Parameters: reallyReset

public ParagraphVectors.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Overrides: vocabCache in class Word2Vec.Builder
Parameters: vocabCache

public ParagraphVectors.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Overrides: lookupTable in class Word2Vec.Builder
Parameters: lookupTable

public ParagraphVectors.Builder sampling(double sampling)
Overrides: sampling in class Word2Vec.Builder
Parameters: sampling - set > 0 as subsampling argument, or 0 to disable

public ParagraphVectors.Builder useAdaGrad(boolean reallyUse)
Overrides: useAdaGrad in class Word2Vec.Builder
Parameters: reallyUse

public ParagraphVectors.Builder negativeSample(double negative)
Overrides: negativeSample in class Word2Vec.Builder
Parameters: negative - set > 0 as negative sampling argument, or 0 to disable

public ParagraphVectors.Builder stopWords(@NonNull java.util.List<java.lang.String> stopList)
Overrides: stopWords in class Word2Vec.Builder
Parameters: stopList

public ParagraphVectors.Builder trainElementsRepresentation(boolean trainElements)
Overrides: trainElementsRepresentation in class Word2Vec.Builder
Parameters: trainElements

public ParagraphVectors.Builder trainSequencesRepresentation(boolean trainSequences)
Overrides: trainSequencesRepresentation in class Word2Vec.Builder
Parameters: trainSequences

public ParagraphVectors.Builder stopWords(@NonNull java.util.Collection<VocabWord> stopList)
Overrides: stopWords in class Word2Vec.Builder
Parameters: stopList

public ParagraphVectors.Builder windowSize(int windowSize)
Overrides: windowSize in class Word2Vec.Builder
Parameters: windowSize

public ParagraphVectors.Builder workers(int numWorkers)
Overrides: workers in class Word2Vec.Builder
Parameters: numWorkers

public ParagraphVectors.Builder sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - SequenceLearningAlgorithm implementation

public ParagraphVectors.Builder sequenceLearningAlgorithm(java.lang.String algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - fully qualified class name

public ParagraphVectors.Builder useHierarchicSoftmax(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
Overrides: useHierarchicSoftmax in class Word2Vec.Builder

public ParagraphVectors.Builder useVariableWindow(int... windows)
Overrides: useVariableWindow in class Word2Vec.Builder
Parameters: windows

public ParagraphVectors.Builder elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: elementsLearningAlgorithm in class Word2Vec.Builder
Parameters: algorithm - ElementsLearningAlgorithm implementation

public ParagraphVectors.Builder elementsLearningAlgorithm(java.lang.String algorithm)
Description copied from class: SequenceVectors.Builder
Overrides: elementsLearningAlgorithm in class Word2Vec.Builder
Parameters: algorithm - fully qualified class name

public ParagraphVectors.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
Overrides: usePreciseWeightInit in class Word2Vec.Builder

public ParagraphVectors.Builder seed(long randomSeed)
Overrides: seed in class Word2Vec.Builder
Parameters: randomSeed
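
Nearly every method above overrides its Word2Vec.Builder counterpart only to narrow the return type to ParagraphVectors.Builder. A likely reason is that Java's covariant return types let a fluent chain keep the subclass type, so subclass-only methods such as labelsSource(...) stay reachable after an inherited setter has been called. The following is a minimal, library-free sketch of that pattern. BaseVectors, DocVectors, labelPrefix and CovariantBuilderDemo are hypothetical names chosen for illustration and are not DL4J classes.

```java
// Hypothetical classes for illustration only; none of these names come from DL4J.
class BaseVectors {
    final int layerSize;

    BaseVectors(int layerSize) {
        this.layerSize = layerSize;
    }

    static class Builder {
        protected int layerSize = 100;

        // Returns the base builder type.
        BaseVectors.Builder layerSize(int layerSize) {
            this.layerSize = layerSize;
            return this;
        }

        BaseVectors build() {
            return new BaseVectors(layerSize);
        }
    }
}

class DocVectors extends BaseVectors {
    final String labelPrefix;

    DocVectors(int layerSize, String labelPrefix) {
        super(layerSize);
        this.labelPrefix = labelPrefix;
    }

    static class Builder extends BaseVectors.Builder {
        protected String labelPrefix = "DOC_";

        // Covariant override: same behavior, narrower return type,
        // so the chain can continue with subclass-only methods.
        @Override
        DocVectors.Builder layerSize(int layerSize) {
            super.layerSize(layerSize);
            return this;
        }

        // Subclass-only setter, reachable after layerSize(...) thanks to the override.
        DocVectors.Builder labelPrefix(String labelPrefix) {
            this.labelPrefix = labelPrefix;
            return this;
        }

        // build() is narrowed the same way.
        @Override
        DocVectors build() {
            return new DocVectors(layerSize, labelPrefix);
        }
    }
}

final class CovariantBuilderDemo {
    public static void main(String[] args) {
        DocVectors vectors = new DocVectors.Builder()
                .layerSize(50)          // inherited setting, still typed as DocVectors.Builder
                .labelPrefix("SENT_")   // would not compile without the covariant override
                .build();
        System.out.println(vectors.layerSize + " " + vectors.labelPrefix);
    }
}
```

The same mechanism lets a subclass neutralize an inherited option, which matches the summaries above stating that trainSequencesRepresentation is hardcoded to TRUE and that useVariableWindow has no effect for ParagraphVectors.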