public class SparkComputationGraph extends SparkListenable
Field Summary

Modifier and Type | Field and Description
---|---
static int | DEFAULT_EVAL_SCORE_BATCH_SIZE

Fields inherited from class SparkListenable: trainingMaster
Constructor Summary

Constructor and Description
---
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster): Instantiate a ComputationGraph instance with the given context and network.
Method Summary

Modifier and Type | Method and Description
---|---
double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average): Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set.
double | calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average, int minibatchSize): Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set.
double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average): Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.
double | calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average, int minibatchSize): Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.
<K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> | feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData, int batchSize): Feed-forward the specified data, with the given keys.
<K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> | feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData, int batchSize): Feed-forward the specified data, with the given keys.
ComputationGraph | fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fit(java.lang.String path): Fit the SparkComputationGraph network using a directory of DataSet objects, each serialized using DataSet.save(OutputStream).
ComputationGraph | fit(java.lang.String path, int minPartitions): Deprecated. Use fit(String).
ComputationGraph | fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd): Fit the ComputationGraph with the given data set.
ComputationGraph | fitMultiDataSet(java.lang.String path): Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects.
ComputationGraph | fitMultiDataSet(java.lang.String path, int minPartitions): Deprecated.
ComputationGraph | fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths): Fit the network using a list of paths for serialized DataSet objects.
ComputationGraph | fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths): Fit the network using a list of paths for serialized MultiDataSet objects.
ComputationGraph | getNetwork()
double | getScore(): Gets the last (average) minibatch score from calling fit.
org.apache.spark.api.java.JavaSparkContext | getSparkContext()
SparkTrainingStats | getSparkTrainingStats()
TrainingMaster | getTrainingMaster()
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms): DataSet version of scoreExamples(JavaPairRDD, boolean).
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize): DataSet version of scoreExamples(JavaPairRDD, boolean, int).
org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms): DataSet version of scoreExamples(JavaRDD, boolean).
org.apache.spark.api.java.JavaDoubleRDD | scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize): DataSet version of scoreExamples(JavaPairRDD, boolean, int).
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms): Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE.
<K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize): Score the examples individually, using a specified batch size.
org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms): Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE.
org.apache.spark.api.java.JavaDoubleRDD | scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize): Score the examples individually, using a specified batch size.
void | setCollectTrainingStats(boolean collectTrainingStats)
void | setNetwork(ComputationGraph network)
void | setScore(double lastScore)

Methods inherited from class SparkListenable: setListeners (four overloads)
Field Detail

public static final int DEFAULT_EVAL_SCORE_BATCH_SIZE
Constructor Detail

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster)

Instantiate a ComputationGraph instance with the given context and network.

Parameters:
sparkContext - the Spark context to use
network - the network to use

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
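As a sketch of typical construction (the ParameterAveragingTrainingMaster shown here is one TrainingMaster implementation from the deeplearning4j-spark module; the builder values are illustrative, not recommendations):

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class SparkGraphSetup {
    // Wrap an existing (initialized) ComputationGraph for distributed training.
    public static SparkComputationGraph build(JavaSparkContext sc, ComputationGraph net) {
        // The TrainingMaster controls how training is distributed across workers;
        // parameter averaging is one strategy: each worker fits on its partition,
        // and parameters are averaged every 'averagingFrequency' minibatches.
        TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(32)
                .batchSizePerWorker(32)   // minibatch size on each worker
                .averagingFrequency(5)    // average parameters every 5 minibatches
                .build();
        return new SparkComputationGraph(sc, net, tm);
    }
}
```

The JavaSparkContext overloads are convenience variants; the SparkContext overloads behave the same way.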
Method Detail

public org.apache.spark.api.java.JavaSparkContext getSparkContext()

public void setCollectTrainingStats(boolean collectTrainingStats)

public SparkTrainingStats getSparkTrainingStats()

public ComputationGraph getNetwork()

public TrainingMaster getTrainingMaster()

public void setNetwork(ComputationGraph network)
public ComputationGraph fit(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fit(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream).

Parameters:
path - Path to the directory containing the serialized DataSet objects

@Deprecated public ComputationGraph fit(java.lang.String path, int minPartitions)

Deprecated. Use fit(String).

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized DataSet objects.

Parameters:
paths - List of paths

public ComputationGraph fitMultiDataSet(org.apache.spark.rdd.RDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> rdd)

Fit the ComputationGraph with the given data set.

Parameters:
rdd - Data to train on

public ComputationGraph fitMultiDataSet(java.lang.String path)

Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects. The assumption is that the directory contains a number of serialized MultiDataSet objects.

Parameters:
path - Path to the directory containing the serialized MultiDataSet objects

public ComputationGraph fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<java.lang.String> paths)

Fit the network using a list of paths for serialized MultiDataSet objects.

Parameters:
paths - List of paths

@Deprecated public ComputationGraph fitMultiDataSet(java.lang.String path, int minPartitions)

Deprecated. Use fitMultiDataSet(String).
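A minimal sketch of the fit workflow, assuming an already-constructed SparkComputationGraph and a distributed training set (epoch count and score reporting are illustrative):

```java
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class SparkFitExample {
    // Each call to fit(...) runs one pass over the RDD and returns the
    // trained network; looping gives multiple epochs.
    public static ComputationGraph train(SparkComputationGraph sparkNet,
                                         JavaRDD<DataSet> trainData,
                                         int numEpochs) {
        ComputationGraph trained = sparkNet.getNetwork();
        for (int epoch = 0; epoch < numEpochs; epoch++) {
            trained = sparkNet.fit(trainData);
            // getScore() reports the last (average) minibatch score from fit
            System.out.println("Epoch " + epoch + " score: " + sparkNet.getScore());
        }
        return trained;
    }
}
```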
public double getScore()
public void setScore(double lastScore)
public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScore(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations are performed, to avoid using too much memory by processing the whole partition in one go

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean average, int minibatchSize)

Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.

Parameters:
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If a partition contains more examples than this, multiple scoring operations are performed, to avoid using too much memory by processing the whole partition in one go

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaRDD, boolean).

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms)

DataSet version of scoreExamples(JavaPairRDD, boolean).

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.DataSet> data, boolean includeRegularizationTerms, int batchSize)

DataSet version of scoreExamples(JavaPairRDD, boolean, int).
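The distinction between the two scoring styles can be sketched as follows (batch size of 64 is illustrative); calculateScore aggregates to a single number, while scoreExamples yields one score per example:

```java
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class ScoringExample {
    public static void score(SparkComputationGraph sparkNet, JavaRDD<DataSet> data) {
        // One aggregate score for the whole data set: average (not sum) the
        // loss, scoring in minibatches of at most 64 examples per worker
        double avgScore = sparkNet.calculateScore(data, true, 64);

        // One score per example, including any l1/l2 regularization terms
        JavaDoubleRDD perExample = sparkNet.scoreExamples(data, true, 64);

        System.out.println("average score: " + avgScore);
        System.out.println("mean per-example score: " + perExample.mean());
    }
}
```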
public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.

Parameters:
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms)

Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Parameters:
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)

Returns: JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray> featuresData, int batchSize)

Feed-forward the specified data, with the given keys.

Parameters:
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.api.ndarray.INDArray[]> featuresData, int batchSize)

Feed-forward the specified data, with the given keys.

Parameters:
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,java.lang.Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,org.nd4j.linalg.dataset.api.MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)

Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately.

Parameters:
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring

Returns: JavaPairRDD<K,Double> containing the scores of each example

See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)
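A sketch of keyed inference with feedForwardWithKey (the String key type and batch size are illustrative); keeping a key per input lets predictions be joined back to their source records:

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;

public class InferenceExample {
    // feedForwardWithKey returns INDArray[] per key because a ComputationGraph
    // may have multiple inputs and outputs; for a single-input, single-output
    // graph, feedForwardWithKeySingle works with plain INDArray values instead.
    public static JavaPairRDD<String, INDArray[]> predict(
            SparkComputationGraph sparkNet,
            JavaPairRDD<String, INDArray[]> features) {
        // Feed-forward in batches of 32 examples per worker
        return sparkNet.feedForwardWithKey(features, 32);
    }
}
```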