public class SparkComputationGraph extends SparkListenable
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_EVAL_SCORE_BATCH_SIZE |
| TrainingMaster | trainingMaster |

| Constructor and Description |
|---|
| SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster) |
| SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster) |
| SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster) |
| SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster) Instantiate a ComputationGraph instance with the given context and network. |
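As a sketch of typical construction (the `ParameterAveragingTrainingMaster` implementation, its builder settings, and the variables `sc` and `conf` below are assumptions drawn from common DL4J Spark usage, not part of this page):

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class SparkGraphSetup {
    public static SparkComputationGraph build(JavaSparkContext sc,
                                              ComputationGraphConfiguration conf) {
        // A TrainingMaster controls how distributed training is executed;
        // ParameterAveragingTrainingMaster is one common choice.
        TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(32) // examples per RDD DataSet object
                .batchSizePerWorker(32)      // minibatch size on each worker
                .averagingFrequency(5)       // average parameters every 5 minibatches
                .build();

        // Constructor taking a JavaSparkContext, a configuration, and a TrainingMaster
        return new SparkComputationGraph(sc, conf, tm);
    }
}
```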
| Modifier and Type | Method and Description |
|---|---|
| double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average) Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. |
| double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average, int minibatchSize) Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. |
| double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average) Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. |
| double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average, int minibatchSize) Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. |
| <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> | feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData, int batchSize) Feed-forward the specified data, with the given keys. |
| <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> | feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData, int batchSize) Feed-forward the specified data, with the given keys. |
| ComputationGraph | fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd) Fit the ComputationGraph with the given data set. |
| ComputationGraph | fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd) Fit the ComputationGraph with the given data set. |
| ComputationGraph | fit(java.lang.String path) Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream). |
| ComputationGraph | fit(java.lang.String path, int minPartitions) Deprecated. Use fit(String) instead. |
| ComputationGraph | fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd) Fit the ComputationGraph with the given data set. |
| ComputationGraph | fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd) Fit the ComputationGraph with the given data set. |
| ComputationGraph | fitMultiDataSet(java.lang.String path) Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects. The assumption is that the directory contains a number of serialized MultiDataSet objects. |
| ComputationGraph | fitMultiDataSet(java.lang.String path, int minPartitions) Deprecated. |
| ComputationGraph | fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths) Fit the network using a list of paths for serialized DataSet objects. |
| ComputationGraph | fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths) Fit the network using a list of paths for serialized MultiDataSet objects. |
| ComputationGraph | getNetwork() |
| double | getScore() Gets the last (average) minibatch score from calling fit. |
| org.apache.spark.api.java.JavaSparkContext | getSparkContext() |
| SparkTrainingStats | getSparkTrainingStats() |
| TrainingMaster | getTrainingMaster() |
| <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms) DataSet version of scoreExamples(JavaPairRDD, boolean). |
| <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize) DataSet version of scoreExamples(JavaPairRDD, boolean, int). |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms) DataSet version of scoreExamples(JavaRDD, boolean). |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize) DataSet version of scoreExamples(JavaPairRDD, boolean, int). |
| <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms) Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. |
| <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize) Score the examples individually, using a specified batch size. |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms) Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. |
| org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize) Score the examples individually, using a specified batch size. |
| void | setCollectTrainingStats(boolean collectTrainingStats) |
| void | setNetwork(ComputationGraph network) |
| void | setScore(double lastScore) |
Methods inherited from class SparkListenable: setListeners, setListeners, setListeners, setListeners

public static final int DEFAULT_EVAL_SCORE_BATCH_SIZE
public SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
                             ComputationGraph network,
                             TrainingMaster trainingMaster)

Instantiate a ComputationGraph instance with the given context and network.

Parameters:
sparkContext - the spark context to use
network - the network to use

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext,
                             ComputationGraph network,
                             TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
                             ComputationGraphConfiguration conf,
                             TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext,
                             ComputationGraphConfiguration conf,
                             TrainingMaster trainingMaster)
public org.apache.spark.api.java.JavaSparkContext getSparkContext()
public void setCollectTrainingStats(boolean collectTrainingStats)
public SparkTrainingStats getSparkTrainingStats()
public ComputationGraph getNetwork()
public TrainingMaster getTrainingMaster()
public void setNetwork(ComputationGraph network)
public ComputationGraph fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream).

Parameters:
path - Path to the directory containing the serialized DataSet objects

@Deprecated
public ComputationGraph fit(java.lang.String path, int minPartitions)

Deprecated. Use fit(String) instead.

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized DataSet objects.

Parameters:
paths - List of paths

public ComputationGraph fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects. The assumption is that the directory contains a number of serialized MultiDataSet objects.

Parameters:
path - Path to the directory containing the serialized MultiDataSet objects

public ComputationGraph fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized MultiDataSet objects.

Parameters:
paths - List of paths

@Deprecated
public ComputationGraph fitMultiDataSet(java.lang.String path, int minPartitions)

Deprecated. Use fitMultiDataSet(String) instead.

public double getScore()

Gets the last (average) minibatch score from calling fit.
public void setScore(double lastScore)
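A minimal training sketch using the fit and getScore methods above (the variables `sparkGraph` and `trainData` are assumed to have been built elsewhere, e.g. via the constructors described earlier):

```java
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class SparkGraphTraining {
    public static double trainOneEpoch(SparkComputationGraph sparkGraph,
                                       JavaRDD<DataSet> trainData) {
        // fit() distributes training across the cluster according to the
        // configured TrainingMaster and returns the trained network.
        ComputationGraph trained = sparkGraph.fit(trainData);

        // getScore() returns the last (average) minibatch score from the fit call.
        return sparkGraph.getScore();
    }
}
```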
public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data,
                             boolean average)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data,
                             boolean average,
                             int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations will be done (to avoid using too much memory by processing the whole partition in one go)

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                         boolean average)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                         boolean average,
                                         int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations will be done (to avoid using too much memory by processing the whole partition in one go)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data,
                                                             boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaRDD, boolean).

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data,
                                                             boolean includeRegularizationTerms,
                                                             int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data,
                                                                                   boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaPairRDD, boolean).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data,
                                                                                   boolean includeRegularizationTerms,
                                                                                   int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                                                         boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                                                         boolean includeRegularizationTerms,
                                                                         int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                                                                               boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Type Parameters:
K - Key type

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

Returns:
A JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData,
                                                                                                                  int batchSize)

Feed-forward the specified data, with the given keys.

Type Parameters:
K - Type of data for key - may be anything

Parameters:
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData,
                                                                                                              int batchSize)

Feed-forward the specified data, with the given keys.

Type Parameters:
K - Type of data for key - may be anything

Parameters:
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data,
                                                                                               boolean includeRegularizationTerms,
                                                                                               int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Type Parameters:
K - Key type

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

Returns:
A JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)
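The evaluation methods above can be combined as in the following sketch (the variables `sparkGraph`, `testData`, `keyedData`, and `keyedFeatures` are assumed inputs built elsewhere; the String key type is just an example, since K may be anything):

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;

public class SparkGraphEvaluation {
    public static void evaluate(SparkComputationGraph sparkGraph,
                                JavaRDD<DataSet> testData,
                                JavaPairRDD<String, DataSet> keyedData,
                                JavaPairRDD<String, INDArray[]> keyedFeatures) {
        // Single aggregate score over the whole test set, averaged,
        // with a per-worker scoring minibatch of 64 examples.
        double avgScore = sparkGraph.calculateScore(testData, true, 64);

        // Per-example scores (including l1/l2 regularization terms if any),
        // keyed so each score can be traced back to its example.
        JavaPairRDD<String, Double> perExample =
                sparkGraph.scoreExamples(keyedData, true, 64);

        // Keyed feed-forward: one INDArray[] of network outputs per key.
        JavaPairRDD<String, INDArray[]> outputs =
                sparkGraph.feedForwardWithKey(keyedFeatures, 64);
    }
}
```

Using the keyed variants is useful when results must be joined back to other data by example identifier, which the plain JavaDoubleRDD variants do not support.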