public class SparkParagraphVectors extends SparkSequenceVectors<VocabWord>
Nested classes inherited from class SparkSequenceVectors: SparkSequenceVectors.Builder<T extends SequenceElement>
Nested classes inherited from class SequenceVectors: SequenceVectors.AsyncSequencer
Fields inherited from class SparkSequenceVectors: configurationBroadcast, ela, elementsFreqAccum, elementsFreqAccumExtra, exporter, isAutoDiscoveryMode, isEnvironmentReady, paramServerConfiguration, shallowVocabCache, shallowVocabCacheBroadcast, sla, storageLevel, vocabCacheBroadcast
Fields inherited from class SequenceVectors: configuration, configured, elementsLearningAlgorithm, enableScavenger, eventListeners, existingModel, iterator, log, scoreElements, scoreSequences, sequenceLearningAlgorithm, unknownElement
Fields inherited from class WordVectorsImpl: batchSize, DEFAULT_UNK, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, numIterations, resetModel, sampling, seed, stopWords, trainElementsVectors, trainSequenceVectors, useAdeGrad, useUnknown, variableWindows, vocab, window, workers
Modifier | Constructor and Description |
---|---|
protected | SparkParagraphVectors() |
Modifier and Type | Method and Description |
---|---|
void | fitLabelledDocuments(org.apache.spark.api.java.JavaRDD<LabelledDocument> documentsRdd): This method builds a ParagraphVectors model, expecting a JavaRDD<LabelledDocument>. |
void | fitMultipleFiles(org.apache.spark.api.java.JavaPairRDD<java.lang.String,java.lang.String> documentsRdd): This method builds a ParagraphVectors model, expecting a JavaPairRDD with the label as key and the whole document, as a single string, as value. |
protected VocabCache<ShallowSequenceElement> | getShallowVocabCache() |
protected void | validateConfiguration() |
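fitMultipleFiles expects a JavaPairRDD<String, String> whose key is the label and whose value is an entire document as one string. On Spark, JavaSparkContext.wholeTextFiles(path) yields pairs of exactly that shape (key = file path, value = file content), in which case the file path would become the label. The sketch below is a local, Spark-free illustration of the same layout: it reads a directory into (label, document) pairs, using each file name as the label. The class name LabelDocumentPairs and the file-name-as-label convention are assumptions for illustration, not part of this API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds (label, document-in-a-string) pairs locally,
// mirroring the key/value layout that fitMultipleFiles expects.
class LabelDocumentPairs {

    /**
     * Reads every regular file in dir into a (label, document) pair.
     * Assumption for illustration: the label is the file name and the value
     * is the whole file content decoded as UTF-8.
     */
    static List<Map.Entry<String, String>> readPairs(Path dir) throws IOException {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-regular files
                }
                String label = file.getFileName().toString();
                String document = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                pairs.add(new AbstractMap.SimpleEntry<>(label, document));
            }
        }
        // Sort by label so the result does not depend on directory listing order.
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("docs");
        Files.write(dir.resolve("sports.txt"), "the match ended in a draw".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("finance.txt"), "stocks rose sharply today".getBytes(StandardCharsets.UTF_8));
        for (Map.Entry<String, String> pair : readPairs(dir)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Sorting by label is only there to make the local output deterministic; a Spark pair RDD carries no such ordering guarantee.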
Methods inherited from class SparkSequenceVectors: broadcastEnvironment, buildShallowVocabCache, fit, fitLists, fitSequences, getCounter
Methods inherited from class SequenceVectors: buildVocab, getElementsScore, getSequencesScore, getUNK, getWordVectorMatrix, initLearners, trainSequence
Methods inherited from class WordVectorsImpl: accuracy, getLayerSize, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, setLookupTable, setModelUtils, setVocab, similarity, similarWordsInVocabTo, update, update, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface WordVectors: accuracy, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, setModelUtils, setUNK, similarity, similarWordsInVocabTo, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum
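Several inherited methods, such as similarity and wordsNearest, compare learned vectors. Assuming they rank by cosine similarity, the usual measure for word2vec-style embeddings (check WordVectorsImpl for the exact behaviour), the computation reduces to the sketch below over plain arrays. CosineNeighbors and the three-dimensional toy vectors are hypothetical and are not output from a trained model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: cosine similarity and a brute-force nearest-neighbour
// lookup over an in-memory word -> vector map.
class CosineNeighbors {

    /** Cosine similarity dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to k words most similar to word, excluding word itself; empty if word is unknown. */
    static List<String> nearest(Map<String, double[]> vectors, String word, int k) {
        double[] query = vectors.get(word);
        if (query == null) {
            return new ArrayList<>(); // unknown word: no neighbours
        }
        List<String> candidates = new ArrayList<>();
        for (String other : vectors.keySet()) {
            if (!other.equals(word)) {
                candidates.add(other);
            }
        }
        // Highest similarity first.
        candidates.sort(Comparator.comparingDouble((String w) -> cosine(query, vectors.get(w))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("king", new double[] {0.9, 0.1, 0.2});
        vectors.put("queen", new double[] {0.85, 0.15, 0.25});
        vectors.put("apple", new double[] {0.1, 0.9, 0.3});
        System.out.println(cosine(vectors.get("king"), vectors.get("queen")));
        System.out.println(nearest(vectors, "king", 1));
    }
}
```

In a trained model the vectors would presumably come from the inherited getWordVector rather than a hand-built map, and a real implementation would avoid scanning the whole vocabulary for every query.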
protected VocabCache<ShallowSequenceElement> getShallowVocabCache()
Overrides: getShallowVocabCache in class SparkSequenceVectors<VocabWord>
protected void validateConfiguration()
Overrides: validateConfiguration in class SparkSequenceVectors<VocabWord>
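public void fitMultipleFiles(org.apache.spark.api.java.JavaPairRDD<java.lang.String,java.lang.String> documentsRdd)
Parameters: documentsRdd - pairs of (label, document text), with each whole document stored as a single string
public void fitLabelledDocuments(org.apache.spark.api.java.JavaRDD<LabelledDocument> documentsRdd)
Parameters: documentsRdd - the labelled documents to build the model from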
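The two entry points appear to carry the same information in different shapes: fitMultipleFiles takes (label, document text) pairs, while fitLabelledDocuments takes LabelledDocument objects. The sketch below converts pairs into labelled documents one-to-one. SimpleLabelledDocument is a hypothetical stand-in for LabelledDocument (document text plus a list of labels) so that the example runs without DL4J; on Spark, the same per-record mapping would be applied with JavaPairRDD.map. Nothing here implies that fitMultipleFiles performs this conversion internally.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns (label, document) pairs into labelled-document objects.
class PairsToDocuments {

    /** Hypothetical stand-in for LabelledDocument: document text plus its labels. */
    static final class SimpleLabelledDocument {
        final String content;
        final List<String> labels;

        SimpleLabelledDocument(String content, List<String> labels) {
            this.content = content;
            this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
        }
    }

    /** Maps each (label, document-in-a-string) pair to a document carrying that single label. */
    static List<SimpleLabelledDocument> toDocuments(List<Map.Entry<String, String>> pairs) {
        List<SimpleLabelledDocument> docs = new ArrayList<>();
        for (Map.Entry<String, String> pair : pairs) {
            docs.add(new SimpleLabelledDocument(pair.getValue(), Collections.singletonList(pair.getKey())));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("DOC_1", "the match ended in a draw"));
        pairs.add(new AbstractMap.SimpleEntry<>("DOC_2", "stocks rose sharply today"));
        for (SimpleLabelledDocument doc : toDocuments(pairs)) {
            System.out.println(doc.labels + " : " + doc.content);
        }
    }
}
```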