public static class ParagraphVectors.Builder extends Word2Vec.Builder
Modifier and Type | Field and Description |
---|---|
protected DocumentIterator | docIter |
protected LabelAwareIterator | labelAwareIterator |
protected LabelsSource | labelsSource |

Fields inherited from class Word2Vec.Builder: allowParallelTokenization, sentenceIterator, tokenizerFactory

Fields inherited from class SequenceVectors.Builder: batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, window, workers

Constructor and Description |
---|
Builder() |
Builder(VectorsConfiguration configuration) |
Modifier and Type | Method | Description |
---|---|---|
ParagraphVectors.Builder | allowParallelTokenization(boolean allow) | Enables or disables parallel tokenization. |
ParagraphVectors.Builder | batchSize(int batchSize) | Defines the mini-batch size. |
ParagraphVectors | build() | Builds a SequenceVectors instance with the defined settings/options. |
ParagraphVectors.Builder | elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm) | Sets a specific LearningAlgorithm as the elements learning algorithm. |
ParagraphVectors.Builder | elementsLearningAlgorithm(java.lang.String algorithm) | Sets a specific LearningAlgorithm, given by its fully qualified class name, as the elements learning algorithm. |
ParagraphVectors.Builder | enableScavenger(boolean reallyEnable) | Enables or disables periodic vocabulary truncation during construction. Default value: disabled. |
ParagraphVectors.Builder | epochs(int numEpochs) | Defines the number of epochs (iterations over the whole training corpus) for training. |
ParagraphVectors.Builder | index(InvertedIndex<VocabWord> index) | |
ParagraphVectors.Builder | iterate(DocumentIterator iterator) | Feeds a DocumentIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterate(LabelAwareDocumentIterator iterator) | Feeds a LabelAwareDocumentIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterate(LabelAwareIterator iterator) | Feeds a LabelAwareIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterate(LabelAwareSentenceIterator iterator) | Feeds a LabelAwareSentenceIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterate(SentenceIterator iterator) | Feeds a SentenceIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterate(SequenceIterator<VocabWord> iterator) | Feeds a SequenceIterator that contains the training corpus into ParagraphVectors. |
ParagraphVectors.Builder | iterations(int iterations) | Defines the number of iterations done for each mini-batch during training. |
ParagraphVectors.Builder | labels(java.util.List<java.lang.String> labels) | Deprecated. |
ParagraphVectors.Builder | labelsSource(LabelsSource source) | Attaches a pre-defined labels source to ParagraphVectors. |
ParagraphVectors.Builder | layerSize(int layerSize) | Defines the number of dimensions for output vectors. |
ParagraphVectors.Builder | learningRate(double learningRate) | Defines the initial learning rate for model training. |
ParagraphVectors.Builder | lookupTable(WeightLookupTable<VocabWord> lookupTable) | Allows an external WeightLookupTable to be used. |
ParagraphVectors.Builder | minLearningRate(double minLearningRate) | Defines the minimal learning rate value for training. |
ParagraphVectors.Builder | minWordFrequency(int minWordFrequency) | Defines the minimal word frequency in the training corpus. |
ParagraphVectors.Builder | modelUtils(ModelUtils<VocabWord> modelUtils) | Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc. |
ParagraphVectors.Builder | negativeSample(double negative) | Defines whether negative sampling is used: a value > 0 is taken as the negative sampling argument, 0 disables it. |
ParagraphVectors.Builder | resetModel(boolean reallyReset) | Defines whether the model should be wiped out completely prior to building. |
ParagraphVectors.Builder | sampling(double sampling) | Defines whether subsampling is used: a value > 0 is taken as the subsampling argument, 0 disables it. |
ParagraphVectors.Builder | seed(long randomSeed) | Defines the random seed for the random number generator. |
ParagraphVectors.Builder | sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm) | Sets a specific LearningAlgorithm as the sequence learning algorithm. |
ParagraphVectors.Builder | sequenceLearningAlgorithm(java.lang.String algorithm) | Sets a specific LearningAlgorithm, given by its fully qualified class name, as the sequence learning algorithm. |
ParagraphVectors.Builder | setVectorsListeners(java.util.Collection<VectorsListener<VocabWord>> vectorsListeners) | Sets VectorsListeners for this SequenceVectors model. |
ParagraphVectors.Builder | stopWords(java.util.Collection<VocabWord> stopList) | Defines stop words that should be ignored during training. |
ParagraphVectors.Builder | stopWords(java.util.List<java.lang.String> stopList) | Defines stop words that should be ignored during training. |
ParagraphVectors.Builder | tokenizerFactory(TokenizerFactory tokenizerFactory) | Defines the TokenizerFactory to be used for string tokenization during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal. |
ParagraphVectors.Builder | trainElementsRepresentation(boolean trainElements) | Defines whether word representations should be built together with document representations. |
ParagraphVectors.Builder | trainSequencesRepresentation(boolean trainSequences) | This method is hardcoded to TRUE, since that is the whole point of ParagraphVectors. |
ParagraphVectors.Builder | trainWordVectors(boolean trainElements) | Defines whether word representations should be built together with document representations. |
ParagraphVectors.Builder | unknownElement(VocabWord element) | Specifies the SequenceElement that will be used as the UNK element, if UNK is used. |
ParagraphVectors.Builder | useAdaGrad(boolean reallyUse) | Defines whether adaptive gradients should be used. |
ParagraphVectors.Builder | useExistingWordVectors(WordVectors vec) | Allows a pre-built WordVectors model (Word2Vec or GloVe) to be used for ParagraphVectors. |
ParagraphVectors.Builder | useHierarchicSoftmax(boolean reallyUse) | Enables or disables hierarchic softmax. |
ParagraphVectors.Builder | usePreciseWeightInit(boolean reallyUse) | If set to true, initial weights for elements/sequences will be derived from the elements themselves. |
ParagraphVectors.Builder | useUnknown(boolean reallyUse) | Specifies whether the UNK word should be used internally. |
ParagraphVectors.Builder | useVariableWindow(int... windows) | This method has no effect for ParagraphVectors. |
ParagraphVectors.Builder | vocabCache(VocabCache<VocabWord> vocabCache) | Allows an external VocabCache to be used. |
ParagraphVectors.Builder | windowSize(int windowSize) | Defines the context window size. |
ParagraphVectors.Builder | workers(int numWorkers) | Defines the maximum number of concurrent threads available for training. |

Methods inherited from class SequenceVectors.Builder: presetTables
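
Nearly every setter in the method summary overrides a parent method and returns ParagraphVectors.Builder instead of the parent builder type. A common reason for this in Java builders is to keep subclass-only options (such as labelsSource) reachable in the middle of a fluent chain. The sketch below illustrates that covariant-return pattern with plain Java; BaseBuilder, DocBuilder, and labelPrefix are hypothetical stand-ins invented for the example, not DL4J classes or options.

```java
// Hypothetical stand-ins for a parent builder and a subclass builder; not DL4J code.
class BaseBuilder {
    protected int layerSize = 100;

    // The parent setter returns the parent builder type.
    public BaseBuilder layerSize(int layerSize) {
        this.layerSize = layerSize;
        return this;
    }
}

class DocBuilder extends BaseBuilder {
    protected String labelPrefix = "";

    // Covariant override: same behaviour, narrower return type.
    @Override
    public DocBuilder layerSize(int layerSize) {
        super.layerSize(layerSize);
        return this;
    }

    // Subclass-only option; reachable after layerSize(...) only because of the override.
    public DocBuilder labelPrefix(String labelPrefix) {
        this.labelPrefix = labelPrefix;
        return this;
    }

    // Returns a string summary instead of a real model, to keep the sketch small.
    public String build() {
        return labelPrefix + layerSize;
    }
}

class CovariantBuilderSketch {
    public static void main(String[] args) {
        String built = new DocBuilder().layerSize(50).labelPrefix("DOC_").build();
        System.out.println(built); // prints DOC_50
    }
}
```

Without the override in DocBuilder, the call `layerSize(50)` would return a BaseBuilder and the following `labelPrefix(...)` call would not compile.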
protected LabelAwareIterator labelAwareIterator
protected LabelsSource labelsSource
protected DocumentIterator docIter
public Builder()
public Builder(@NonNull VectorsConfiguration configuration)
public ParagraphVectors.Builder useExistingWordVectors(@NonNull WordVectors vec)
Overrides: useExistingWordVectors in class Word2Vec.Builder
Parameters: vec - existing WordVectors model

public ParagraphVectors.Builder trainWordVectors(boolean trainElements)
Parameters: trainElements

public ParagraphVectors.Builder labelsSource(@NonNull LabelsSource source)
Parameters: source

@Deprecated
public ParagraphVectors.Builder labels(@NonNull java.util.List<java.lang.String> labels)
Parameters: labels

public ParagraphVectors.Builder iterate(@NonNull LabelAwareDocumentIterator iterator)
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull LabelAwareSentenceIterator iterator)
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull LabelAwareIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull DocumentIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder iterate(@NonNull SentenceIterator iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Overrides: modelUtils in class Word2Vec.Builder
Parameters: modelUtils - model utils to be used

public ParagraphVectors.Builder unknownElement(VocabWord element)
Overrides: unknownElement in class Word2Vec.Builder
Parameters: element

public ParagraphVectors.Builder allowParallelTokenization(boolean allow)
Overrides: allowParallelTokenization in class Word2Vec.Builder
Parameters: allow

public ParagraphVectors.Builder useUnknown(boolean reallyUse)
Overrides: useUnknown in class Word2Vec.Builder
Parameters: reallyUse

public ParagraphVectors.Builder enableScavenger(boolean reallyEnable)
Overrides: enableScavenger in class Word2Vec.Builder
Parameters: reallyEnable

public ParagraphVectors build()
Description copied from class: SequenceVectors.Builder
Build SequenceVectors instance with defined settings/options
Overrides: build in class Word2Vec.Builder

public ParagraphVectors.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Overrides: tokenizerFactory in class Word2Vec.Builder
Parameters: tokenizerFactory

public ParagraphVectors.Builder index(@NonNull InvertedIndex<VocabWord> index)
Overrides: index in class Word2Vec.Builder

public ParagraphVectors.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Overrides: iterate in class Word2Vec.Builder
Parameters: iterator

public ParagraphVectors.Builder batchSize(int batchSize)
Overrides: batchSize in class Word2Vec.Builder
Parameters: batchSize

public ParagraphVectors.Builder iterations(int iterations)
Overrides: iterations in class Word2Vec.Builder
Parameters: iterations

public ParagraphVectors.Builder epochs(int numEpochs)
Overrides: epochs in class Word2Vec.Builder
Parameters: numEpochs

public ParagraphVectors.Builder layerSize(int layerSize)
Overrides: layerSize in class Word2Vec.Builder
Parameters: layerSize

public ParagraphVectors.Builder setVectorsListeners(@NonNull java.util.Collection<VectorsListener<VocabWord>> vectorsListeners)
Overrides: setVectorsListeners in class Word2Vec.Builder
Parameters: vectorsListeners

public ParagraphVectors.Builder learningRate(double learningRate)
Overrides: learningRate in class Word2Vec.Builder
Parameters: learningRate

public ParagraphVectors.Builder minWordFrequency(int minWordFrequency)
Overrides: minWordFrequency in class Word2Vec.Builder
Parameters: minWordFrequency
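
To make the effect of minWordFrequency concrete, the following sketch counts tokens and keeps only those that occur at least a given number of times. It is a plain-Java illustration under the assumption that the threshold is inclusive; MinFrequencySketch and buildVocab are hypothetical names, and the actual vocabulary construction inside the library may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MinFrequencySketch {
    // Counts tokens and keeps only those seen at least minWordFrequency times.
    // Assumption: the threshold is inclusive (count >= minWordFrequency).
    static Map<String, Integer> buildVocab(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // TreeMap gives a deterministic (alphabetical) order for printing.
        Map<String, Integer> vocab = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                vocab.put(entry.getKey(), entry.getValue());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "mat", "the", "cat");
        System.out.println(buildVocab(tokens, 2)); // prints {cat=2, the=3}
    }
}
```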

public ParagraphVectors.Builder minLearningRate(double minLearningRate)
Overrides: minLearningRate in class Word2Vec.Builder
Parameters: minLearningRate
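
The learningRate and minLearningRate settings describe a starting value and a floor. One way such a pair is typically used in word2vec-style training is linear decay with training progress, clamped at the floor. The sketch below shows that schedule in plain Java; the exact schedule used by the library is an assumption here, and LearningRateDecaySketch, currentRate, wordsProcessed, and totalWords are hypothetical names.

```java
class LearningRateDecaySketch {
    // Linear decay from learningRate towards zero as training progresses,
    // never dropping below minLearningRate.
    // Assumption: progress is measured as wordsProcessed / totalWords.
    static double currentRate(double learningRate, double minLearningRate,
                              long wordsProcessed, long totalWords) {
        double progress = (double) wordsProcessed / (double) totalWords;
        return Math.max(minLearningRate, learningRate * (1.0 - progress));
    }

    public static void main(String[] args) {
        double start = 0.025;
        double floor = 0.0001;
        long total = 1000;
        for (long done = 0; done <= total; done += 250) {
            System.out.println(done + " words -> rate " + currentRate(start, floor, done, total));
        }
    }
}
```

With these inputs the rate starts at the initial value, halves by the midpoint, and ends at the floor rather than at zero.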

public ParagraphVectors.Builder resetModel(boolean reallyReset)
Overrides: resetModel in class Word2Vec.Builder
Parameters: reallyReset

public ParagraphVectors.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Overrides: vocabCache in class Word2Vec.Builder
Parameters: vocabCache

public ParagraphVectors.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Overrides: lookupTable in class Word2Vec.Builder
Parameters: lookupTable

public ParagraphVectors.Builder sampling(double sampling)
Overrides: sampling in class Word2Vec.Builder
Parameters: sampling - a value > 0 is used as the subsampling argument; 0 disables subsampling

public ParagraphVectors.Builder useAdaGrad(boolean reallyUse)
Overrides: useAdaGrad in class Word2Vec.Builder
Parameters: reallyUse
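
For the sampling(double sampling) setting listed above, it can help to see how a subsampling argument translates into a per-word keep probability. The sketch below uses the formula from the original word2vec C tool; whether this library applies exactly the same formula is an assumption, and SubsamplingSketch and keepProbability are hypothetical names.

```java
class SubsamplingSketch {
    // Probability of keeping one occurrence of a word during training.
    // Formula taken from the original word2vec C tool (an assumption for this library):
    //   p = (sqrt(count / (sampling * total)) + 1) * (sampling * total) / count
    static double keepProbability(long wordCount, long totalWords, double sampling) {
        if (sampling <= 0) {
            return 1.0; // 0 disables subsampling: every occurrence is kept
        }
        double threshold = sampling * totalWords;
        double p = (Math.sqrt(wordCount / threshold) + 1.0) * threshold / wordCount;
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        long total = 1_000_000L;
        System.out.println("frequent word: " + keepProbability(100_000L, total, 1e-3));
        System.out.println("rare word:     " + keepProbability(10L, total, 1e-3));
        System.out.println("disabled:      " + keepProbability(100_000L, total, 0.0));
    }
}
```

Under this formula very frequent words are dropped most of the time, while rare words are always kept.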

public ParagraphVectors.Builder negativeSample(double negative)
Overrides: negativeSample in class Word2Vec.Builder
Parameters: negative - a value > 0 is used as the negative sampling argument; 0 disables negative sampling

public ParagraphVectors.Builder stopWords(@NonNull java.util.List<java.lang.String> stopList)
Overrides: stopWords in class Word2Vec.Builder
Parameters: stopList

public ParagraphVectors.Builder trainElementsRepresentation(boolean trainElements)
Overrides: trainElementsRepresentation in class Word2Vec.Builder
Parameters: trainElements

public ParagraphVectors.Builder trainSequencesRepresentation(boolean trainSequences)
Overrides: trainSequencesRepresentation in class Word2Vec.Builder
Parameters: trainSequences

public ParagraphVectors.Builder stopWords(@NonNull java.util.Collection<VocabWord> stopList)
Overrides: stopWords in class Word2Vec.Builder
Parameters: stopList

public ParagraphVectors.Builder windowSize(int windowSize)
Overrides: windowSize in class Word2Vec.Builder
Parameters: windowSize
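
The context window size bounds how far from a centre token the training context may reach. The sketch below extracts the tokens within windowSize positions on either side of a centre index. It shows only the maximum extent of the window; word2vec-style trainers often shrink the window randomly per position, and the library's internal handling is not shown here. ContextWindowSketch and context are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContextWindowSketch {
    // Returns the tokens at most windowSize positions away from the centre token,
    // excluding the centre token itself and clipping at the sequence boundaries.
    static List<String> context(List<String> tokens, int centre, int windowSize) {
        List<String> result = new ArrayList<>();
        int from = Math.max(0, centre - windowSize);
        int to = Math.min(tokens.size() - 1, centre + windowSize);
        for (int i = from; i <= to; i++) {
            if (i != centre) {
                result.add(tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(context(tokens, 2, 1)); // prints [quick, fox]
        System.out.println(context(tokens, 0, 2)); // prints [quick, brown]
    }
}
```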

public ParagraphVectors.Builder workers(int numWorkers)
Overrides: workers in class Word2Vec.Builder
Parameters: numWorkers

public ParagraphVectors.Builder sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets specific LearningAlgorithm as Sequence Learning Algorithm
Overrides: sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - SequenceLearningAlgorithm implementation

public ParagraphVectors.Builder sequenceLearningAlgorithm(java.lang.String algorithm)
Description copied from class: SequenceVectors.Builder
Sets specific LearningAlgorithm as Sequence Learning Algorithm
Overrides: sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - fully qualified class name

public ParagraphVectors.Builder useHierarchicSoftmax(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
Enable/disable hierarchic softmax
Overrides: useHierarchicSoftmax in class Word2Vec.Builder

public ParagraphVectors.Builder useVariableWindow(int... windows)
Overrides: useVariableWindow in class Word2Vec.Builder
Parameters: windows

public ParagraphVectors.Builder elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets specific LearningAlgorithm as Elements Learning Algorithm
Overrides: elementsLearningAlgorithm in class Word2Vec.Builder
Parameters: algorithm - ElementsLearningAlgorithm implementation

public ParagraphVectors.Builder elementsLearningAlgorithm(java.lang.String algorithm)
Description copied from class: SequenceVectors.Builder
Sets specific LearningAlgorithm as Elements Learning Algorithm
Overrides: elementsLearningAlgorithm in class Word2Vec.Builder
Parameters: algorithm - fully qualified class name

public ParagraphVectors.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
If set to true, initial weights for elements/sequences will be derived from the elements themselves.
Overrides: usePreciseWeightInit in class Word2Vec.Builder

public ParagraphVectors.Builder seed(long randomSeed)
Overrides: seed in class Word2Vec.Builder
Parameters: randomSeed