public class SparkComputationGraph extends SparkListenable
Field Summary

Modifier and Type | Field and Description
---|---
static int | DEFAULT_EVAL_SCORE_BATCH_SIZE

Fields inherited from class SparkListenable: trainingMaster
Constructor Summary

Constructor and Description
---
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster): Instantiate a ComputationGraph instance with the given context and network.
Method Summary

Modifier and Type | Method and Description
---|---
double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average): Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set.
double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average, int minibatchSize): Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set.
double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average): Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.
double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average, int minibatchSize): Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.
<K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> | feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData, int batchSize): Feed-forward the specified data, with the given keys.
<K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> | feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData, int batchSize): Feed-forward the specified data, with the given keys.
ComputationGraph | fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fit(java.lang.String path): Fit the SparkComputationGraph network using a directory of DataSet objects, each serialized using DataSet.save(OutputStream).
ComputationGraph | fit(java.lang.String path, int minPartitions): Deprecated. Use fit(String).
ComputationGraph | fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fitMultiDataSet(java.lang.String path): Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects.
ComputationGraph | fitMultiDataSet(java.lang.String path, int minPartitions): Deprecated.
ComputationGraph | fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths): Fit the network using a list of paths for serialized DataSet objects.
ComputationGraph | fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths): Fit the network using a list of paths for serialized MultiDataSet objects.
ComputationGraph | getNetwork()
double | getScore(): Gets the last (average) minibatch score from calling fit.
org.apache.spark.api.java.JavaSparkContext | getSparkContext()
SparkTrainingStats | getSparkTrainingStats()
TrainingMaster | getTrainingMaster()
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms): DataSet version of scoreExamples(JavaPairRDD, boolean).
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize): DataSet version of scoreExamples(JavaPairRDD, boolean, int).
org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms): DataSet version of scoreExamples(JavaRDD, boolean).
org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize): DataSet version of scoreExamples(JavaPairRDD, boolean, int).
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms): Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE.
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize): Score the examples individually, using a specified batch size.
org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms): Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE.
org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize): Score the examples individually, using a specified batch size.
void | setCollectTrainingStats(boolean collectTrainingStats)
void | setNetwork(ComputationGraph network)
void | setScore(double lastScore)

Methods inherited from class SparkListenable: setListeners (four overloads)
Field Detail

public static final int DEFAULT_EVAL_SCORE_BATCH_SIZE
Constructor Detail

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster)

Instantiate a ComputationGraph instance with the given context and network.

Parameters:
sparkContext - the Spark context to use
network - the network to use

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
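As a sketch of typical construction (the ParameterAveragingTrainingMaster shown here is one TrainingMaster implementation from the deeplearning4j-spark module; the builder values are illustrative, not recommendations):

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class SparkGraphSetup {
    // Wrap an existing (initialized) ComputationGraph for distributed training.
    public static SparkComputationGraph build(JavaSparkContext sc, ComputationGraph net) {
        // The TrainingMaster controls how training is distributed across workers;
        // parameter averaging is one strategy: each worker fits on its partition,
        // and parameters are averaged every 'averagingFrequency' minibatches.
        TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(32)
                .batchSizePerWorker(32)   // minibatch size on each worker
                .averagingFrequency(5)    // average parameters every 5 minibatches
                .build();
        return new SparkComputationGraph(sc, net, tm);
    }
}
```

The JavaSparkContext overloads are convenience variants; the SparkContext overloads behave the same way.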
Method Detail

public org.apache.spark.api.java.JavaSparkContext getSparkContext()

public void setCollectTrainingStats(boolean collectTrainingStats)

public SparkTrainingStats getSparkTrainingStats()

public ComputationGraph getNetwork()

public TrainingMaster getTrainingMaster()

public void setNetwork(ComputationGraph network)
public ComputationGraph fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream).

Parameters:
path - Path to the directory containing the serialized DataSet objects

@Deprecated public ComputationGraph fit(java.lang.String path, int minPartitions)

Deprecated. Use fit(String).

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized DataSet objects.

Parameters:
paths - List of paths

public ComputationGraph fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects. The assumption is that the directory contains a number of serialized MultiDataSet objects.

Parameters:
path - Path to the directory containing the serialized MultiDataSet objects

public ComputationGraph fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized MultiDataSet objects.

Parameters:
paths - List of paths

@Deprecated public ComputationGraph fitMultiDataSet(java.lang.String path, int minPartitions)

Deprecated. Use fitMultiDataSet(String).
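A minimal sketch of the fit workflow, assuming an already-constructed SparkComputationGraph and a distributed training set (epoch count and score reporting are illustrative):

```java
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class SparkFitExample {
    // Each call to fit(...) runs one pass over the RDD and returns the
    // trained network; looping gives multiple epochs.
    public static ComputationGraph train(SparkComputationGraph sparkNet,
                                         JavaRDD<DataSet> trainData,
                                         int numEpochs) {
        ComputationGraph trained = sparkNet.getNetwork();
        for (int epoch = 0; epoch < numEpochs; epoch++) {
            trained = sparkNet.fit(trainData);
            // getScore() reports the last (average) minibatch score from fit
            System.out.println("Epoch " + epoch + " score: " + sparkNet.getScore());
        }
        return trained;
    }
}
```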
public double getScore()
public void setScore(double lastScore)
public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations are performed, to avoid using too much memory by processing the whole partition in one go

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations are performed, to avoid using too much memory by processing the whole partition in one go

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaRDD, boolean).

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaPairRDD, boolean).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).
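The distinction between the two scoring styles can be sketched as follows (batch size of 64 is illustrative); calculateScore aggregates to a single number, while scoreExamples yields one score per example:

```java
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class ScoringExample {
    public static void score(SparkComputationGraph sparkNet, JavaRDD<DataSet> data) {
        // One aggregate score for the whole data set: average (not sum) the
        // loss, scoring in minibatches of at most 64 examples per worker
        double avgScore = sparkNet.calculateScore(data, true, 64);

        // One score per example, including any l1/l2 regularization terms
        JavaDoubleRDD perExample = sparkNet.scoreExamples(data, true, 64);

        System.out.println("average score: " + avgScore);
        System.out.println("mean per-example score: " + perExample.mean());
    }
}
```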
public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Parameters:
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

Returns: JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData, int batchSize)

Feed-forward the specified data, with the given keys.

Parameters:
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData, int batchSize)

Feed-forward the specified data, with the given keys.

Parameters:
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Parameters:
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

Returns: JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)
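A sketch of keyed inference with feedForwardWithKey (the String key type and batch size are illustrative); keeping a key per input lets predictions be joined back to their source records:

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;

public class InferenceExample {
    // feedForwardWithKey returns INDArray[] per key because a ComputationGraph
    // may have multiple inputs and outputs; for a single-input, single-output
    // graph, feedForwardWithKeySingle works with plain INDArray values instead.
    public static JavaPairRDD<String, INDArray[]> predict(
            SparkComputationGraph sparkNet,
            JavaPairRDD<String, INDArray[]> features) {
        // Feed-forward in batches of 32 examples per worker
        return sparkNet.feedForwardWithKey(features, 32);
    }
}
```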