public static class ParameterAveragingTrainingMaster.Builder
extends java.lang.Object
| Constructor and Description |
|---|
| Builder(int rddDataSetNumExamples) Same as Builder(Integer, int), but automatically sets the number of workers based on JavaSparkContext.defaultParallelism() |
| Builder(java.lang.Integer numWorkers, int rddDataSetNumExamples) Create a builder, specifying the number of workers (Spark executors * number of threads per executor) to be used. Note: this should match the configuration of the cluster. |
| Modifier and Type | Method and Description |
|---|---|
| ParameterAveragingTrainingMaster.Builder | averagingFrequency(int averagingFrequency) Frequency with which to average worker parameters. Note: too high or too low can be bad for different reasons. Too low (such as 1) can result in a lot of network traffic; too high (>> 20 or so) can result in accuracy issues or problems with network convergence. |
| ParameterAveragingTrainingMaster.Builder | batchSizePerWorker(int batchSizePerWorker) Batch size (in number of examples) per worker, for each fit(DataSet) call. |
| ParameterAveragingTrainingMaster | build() |
| ParameterAveragingTrainingMaster.Builder | exportDirectory(java.lang.String exportDirectory) When rddTrainingApproach(RDDTrainingApproach) is set to RDDTrainingApproach.Export (as it is by default), the data is exported to a temporary directory first. |
| ParameterAveragingTrainingMaster.Builder | rddTrainingApproach(RDDTrainingApproach rddTrainingApproach) The approach to use when training on a RDD<DataSet> or RDD<MultiDataSet>. |
| ParameterAveragingTrainingMaster.Builder | repartionData(Repartition repartition) Set if/when repartitioning should be conducted for the training data. Default: always repartition (if required to guarantee the correct number of partitions and the correct number of examples in each partition). |
| ParameterAveragingTrainingMaster.Builder | repartitionStrategy(RepartitionStrategy repartitionStrategy) Used in conjunction with repartionData(Repartition) (which defines when repartitioning should be conducted); repartitionStrategy defines how the repartitioning should be done. |
| ParameterAveragingTrainingMaster.Builder | rngSeed(long rngSeed) Random number generator seed, used mainly for enforcing repeatable splitting on RDDs. Default: no seed set (i.e., random seed). |
| ParameterAveragingTrainingMaster.Builder | saveUpdater(boolean saveUpdater) Set whether the updater (i.e., historical state for momentum, AdaGrad, etc.) should be saved. |
| ParameterAveragingTrainingMaster.Builder | storageLevel(org.apache.spark.storage.StorageLevel storageLevel) Set the storage level for RDD<DataSet>s. Default: StorageLevel.MEMORY_ONLY_SER(), i.e., store in memory, in serialized form. To use no RDD persistence, use null. |
| ParameterAveragingTrainingMaster.Builder | storageLevelStreams(org.apache.spark.storage.StorageLevel storageLevelStreams) Set the storage level for RDDs used when fitting data from streams: either PortableDataStreams (sc.binaryFiles, via SparkDl4jMultiLayer.fit(String) and SparkComputationGraph.fit(String)) or String paths (via SparkDl4jMultiLayer.fitPaths(JavaRDD), SparkComputationGraph.fitPaths(JavaRDD) and SparkComputationGraph.fitPathsMultiDataSet(JavaRDD)). |
| ParameterAveragingTrainingMaster.Builder | trainingHooks(java.util.Collection<TrainingHook> trainingHooks) Adds training hooks to the master. |
| ParameterAveragingTrainingMaster.Builder | trainingHooks(TrainingHook... hooks) Adds training hooks to the master. |
| ParameterAveragingTrainingMaster.Builder | workerPrefetchNumBatches(int prefetchNumBatches) Set the number of minibatches to asynchronously prefetch in the worker. |
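The interaction between batchSizePerWorker and averagingFrequency above can be sketched numerically. All values below are hypothetical, chosen only for illustration:

```java
// Hypothetical cluster settings, for illustration only.
public class AveragingSketch {
    public static void main(String[] args) {
        int numWorkers = 4;           // Spark executors * threads per executor
        int batchSizePerWorker = 16;  // examples per minibatch, per worker
        int averagingFrequency = 5;   // minibatches between parameter averagings

        // Examples processed across the whole cluster between two averaging steps:
        int examplesPerAveragingRound = numWorkers * batchSizePerWorker * averagingFrequency;
        System.out.println(examplesPerAveragingRound); // 320
    }
}
```

With averagingFrequency = 1, every minibatch triggers network traffic for averaging; with a very large value, the workers' parameters can drift apart before being averaged, which is the convergence concern noted above.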
public Builder(int rddDataSetNumExamples)
Same as Builder(Integer, int), but automatically sets the number of workers based on JavaSparkContext.defaultParallelism().
rddDataSetNumExamples - Number of examples in each DataSet object in the RDD<DataSet>
public Builder(java.lang.Integer numWorkers, int rddDataSetNumExamples)
Create a builder, specifying the number of workers (Spark executors * number of threads per executor) to be used. Note: this should match the configuration of the cluster.
It is also necessary to specify how many examples are in each DataSet that appears in the RDD<DataSet> or JavaRDD<DataSet> used for training. The two most common cases:
(a) Preprocessed data pipelines will often load binary DataSet objects with N > 1 examples in each; in this case, rddDataSetNumExamples should be set to N.
(b) "In line" data pipelines (for example, CSV String -> record reader -> DataSet just before training) will typically have exactly 1 example in each DataSet object; in this case, rddDataSetNumExamples should be set to 1.
numWorkers - Number of Spark execution threads in the cluster. May be null. If null, the number of workers will be obtained from JavaSparkContext.defaultParallelism(), which should provide the number of cores in the cluster.
rddDataSetNumExamples - Number of examples in each DataSet object in the RDD<DataSet>
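The two cases above can be checked with simple arithmetic; the DataSet-object counts below are hypothetical, for illustration only:

```java
// Hypothetical RDD sizes for the two common cases described above.
public class RddDataSetNumExamplesSketch {
    public static void main(String[] args) {
        // Case (a): preprocessed pipeline, each serialized DataSet holds N = 32 examples
        int dataSetObjectsA = 1000;
        int rddDataSetNumExamplesA = 32;   // set rddDataSetNumExamples to N = 32
        System.out.println(dataSetObjectsA * rddDataSetNumExamplesA); // 32000 total examples

        // Case (b): in-line pipeline, exactly 1 example per DataSet object
        int dataSetObjectsB = 32000;
        int rddDataSetNumExamplesB = 1;    // set rddDataSetNumExamples to 1
        System.out.println(dataSetObjectsB * rddDataSetNumExamplesB); // 32000 total examples
    }
}
```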
public ParameterAveragingTrainingMaster.Builder trainingHooks(java.util.Collection<TrainingHook> trainingHooks)
Adds training hooks to the master.
trainingHooks - the training hooks to add

public ParameterAveragingTrainingMaster.Builder trainingHooks(TrainingHook... hooks)
Adds training hooks to the master.
hooks - the training hooks to add

public ParameterAveragingTrainingMaster.Builder batchSizePerWorker(int batchSizePerWorker)
Batch size (in number of examples) per worker, for each fit(DataSet) call.
batchSizePerWorker - Size of each minibatch to use for each worker

public ParameterAveragingTrainingMaster.Builder averagingFrequency(int averagingFrequency)
Frequency with which to average worker parameters. Note: too high or too low can be bad for different reasons. Too low (such as 1) can result in a lot of network traffic; too high (>> 20 or so) can result in accuracy issues or problems with network convergence.
averagingFrequency - Frequency (in number of minibatches of size 'batchSizePerWorker') to average parameters

public ParameterAveragingTrainingMaster.Builder workerPrefetchNumBatches(int prefetchNumBatches)
Set the number of minibatches to asynchronously prefetch in the worker. Default: 0 (no prefetching).
prefetchNumBatches - Number of minibatches (DataSets of size batchSizePerWorker) to fetch

public ParameterAveragingTrainingMaster.Builder saveUpdater(boolean saveUpdater)
Set whether the updater (i.e., historical state for momentum, AdaGrad, etc.) should be saved. This is enabled by default.
saveUpdater - If true: retain the updater state (default). If false: don't retain it (updaters will be reinitialized in each worker after averaging).

public ParameterAveragingTrainingMaster.Builder repartionData(Repartition repartition)
Set if/when repartitioning should be conducted for the training data. Default: always repartition (if required to guarantee the correct number of partitions and the correct number of examples in each partition).
repartition - Setting for repartitioning

public ParameterAveragingTrainingMaster.Builder repartitionStrategy(RepartitionStrategy repartitionStrategy)
Used in conjunction with repartionData(Repartition) (which defines when repartitioning should be conducted); repartitionStrategy defines how the repartitioning should be done. See RepartitionStrategy for details.
repartitionStrategy - Repartitioning strategy to use

public ParameterAveragingTrainingMaster.Builder storageLevel(org.apache.spark.storage.StorageLevel storageLevel)
Set the storage level for RDD<DataSet>s. Default: StorageLevel.MEMORY_ONLY_SER(), i.e., store in memory, in serialized form. To use no RDD persistence, use null.
Note: Spark's StorageLevel.MEMORY_ONLY() and StorageLevel.MEMORY_AND_DISK() can be problematic with off-heap data (which DL4J/ND4J uses extensively). Spark does not account for off-heap memory when deciding if/when to drop blocks to ensure enough free memory; consequently, for DataSet RDDs that are larger than the total amount of (off-heap) memory, this can lead to OOM issues. Put another way: Spark counts only the on-heap size of DataSet and INDArray objects (which is negligible), resulting in a significant underestimate of the true DataSet object sizes; more DataSets are thus kept in memory than we can really afford.
storageLevel - Storage level to use for DataSet RDDs

public ParameterAveragingTrainingMaster.Builder storageLevelStreams(org.apache.spark.storage.StorageLevel storageLevelStreams)
Set the storage level for RDDs used when fitting data from streams: either PortableDataStreams (sc.binaryFiles, via SparkDl4jMultiLayer.fit(String) and SparkComputationGraph.fit(String)) or String paths (via SparkDl4jMultiLayer.fitPaths(JavaRDD), SparkComputationGraph.fitPaths(JavaRDD) and SparkComputationGraph.fitPathsMultiDataSet(JavaRDD)).
The default storage level is StorageLevel.MEMORY_ONLY(), which should be appropriate in most cases.
storageLevelStreams - Storage level to use

public ParameterAveragingTrainingMaster.Builder rddTrainingApproach(RDDTrainingApproach rddTrainingApproach)
The approach to use when training on a RDD<DataSet> or RDD<MultiDataSet>.
Default: RDDTrainingApproach.Export, which exports data to a temporary directory first.
rddTrainingApproach - Training approach to use when training from a RDD<DataSet> or RDD<MultiDataSet>
public ParameterAveragingTrainingMaster.Builder exportDirectory(java.lang.String exportDirectory)
When rddTrainingApproach(RDDTrainingApproach) is set to RDDTrainingApproach.Export (as it is by default), the data is exported to a temporary directory first.
Default: null, i.e., use {hadoop.tmp.dir}/dl4j/; in this case, data is exported to {hadoop.tmp.dir}/dl4j/SOME_UNIQUE_ID/.
If you specify a directory, the directory {exportDirectory}/SOME_UNIQUE_ID/ will be used instead.
exportDirectory - Base directory to export data

public ParameterAveragingTrainingMaster.Builder rngSeed(long rngSeed)
Random number generator seed, used mainly for enforcing repeatable splitting on RDDs. Default: no seed set (i.e., random seed).
rngSeed - RNG seed

public ParameterAveragingTrainingMaster build()
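Putting the methods above together, a minimal configuration sketch (not a complete training program: the import path and the surrounding SparkContext, network configuration, and DL4J Spark dependencies are assumptions beyond this page, and all numeric values are illustrative only):

```java
import org.apache.spark.storage.StorageLevel;
// Package path assumed for DL4J's Spark module:
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

// Sketch: configure a ParameterAveragingTrainingMaster using the builder above.
// Single-arg constructor: workers taken from JavaSparkContext.defaultParallelism().
ParameterAveragingTrainingMaster tm =
        new ParameterAveragingTrainingMaster.Builder(1) // 1 example per DataSet object (case (b))
                .batchSizePerWorker(16)
                .averagingFrequency(5)
                .workerPrefetchNumBatches(2)
                .saveUpdater(true)
                .storageLevel(StorageLevel.MEMORY_ONLY_SER())
                .rngSeed(12345L)
                .build();
```

The resulting training master would then be passed to a Spark-aware wrapper such as SparkDl4jMultiLayer or SparkComputationGraph for fitting.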