public class BatchAndExportDataSetsFunction
extends java.lang.Object
implements org.apache.spark.api.java.function.Function2<java.lang.Integer,java.util.Iterator<org.nd4j.linalg.dataset.DataSet>,java.util.Iterator<java.lang.String>>
Function for use with RDD<DataSet>.mapPartitionsWithIndex.
It does two things:
1. Batch DataSets together to the specified minibatch size. This may require splitting or combining existing DataSet objects.
2. Export the resulting DataSet objects to the specified directory.
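The batching step can be sketched without Spark or ND4J. In this minimal, hypothetical sketch each input DataSet is modelled as a plain List<T> of examples, and `rebatch` is an illustrative name, not part of the class's API. It assumes that leftover examples at the end of a partition are emitted as one smaller final batch; the description above does not say how the real class handles that remainder.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: each input List<T> stands in for one DataSet of examples.
class RebatchSketch {

    // Re-groups variable-sized inputs into batches of exactly minibatchSize examples.
    // Larger inputs are split, smaller ones are combined; only the last batch may be smaller.
    static <T> List<List<T>> rebatch(Iterator<List<T>> input, int minibatchSize) {
        if (minibatchSize <= 0) {
            throw new IllegalArgumentException("minibatchSize must be positive: " + minibatchSize);
        }
        List<List<T>> out = new ArrayList<>();
        List<T> current = new ArrayList<>(minibatchSize);
        while (input.hasNext()) {
            for (T example : input.next()) {
                current.add(example);
                if (current.size() == minibatchSize) {
                    out.add(current); // batch is full: emit it and start a new one
                    current = new ArrayList<>(minibatchSize);
                }
            }
        }
        if (!current.isEmpty()) {
            out.add(current); // assumption: leftovers form a smaller final batch
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> in = List.of(List.of(1, 2, 3, 4, 5), List.of(6), List.of(7, 8));
        System.out.println(rebatch(in.iterator(), 3)); // prints [[1, 2, 3], [4, 5, 6], [7, 8]]
    }
}
```

In the usage example the first input (5 examples) is split across two batches, while the single-example input is combined with the remainder of the first.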
Naming convention for exported files: "dataset_" + partitionIdx + JVM_UID + "_" + idx + ".bin", where idx is the index of the exported DataSet object within this partition.
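As an illustration of the naming convention, the following sketch builds the file name from its parts. `jvmUid` is a hypothetical parameter standing in for JVM_UID, and `exportPath` assumes exportBaseDirectory is a local filesystem path; how the real class obtains JVM_UID and resolves the base directory is not covered here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch of the documented naming convention; not the DL4J implementation.
class ExportNameSketch {

    // "dataset_" + partitionIdx + JVM_UID + "_" + idx + ".bin"
    // jvmUid is a hypothetical parameter standing in for JVM_UID.
    static String fileName(int partitionIdx, String jvmUid, int idx) {
        return "dataset_" + partitionIdx + jvmUid + "_" + idx + ".bin";
    }

    // Assumes exportBaseDirectory is a local filesystem path.
    static Path exportPath(String exportBaseDirectory, int partitionIdx, String jvmUid, int idx) {
        return Paths.get(exportBaseDirectory).resolve(fileName(partitionIdx, jvmUid, idx));
    }

    public static void main(String[] args) {
        System.out.println(fileName(3, "a1b2", 0)); // prints dataset_3a1b2_0.bin
    }
}
```

Note that the convention puts no separator between partitionIdx and JVM_UID, so the two values run together in the name (dataset_3a1b2_0.bin above).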
Constructor and Description |
---|
BatchAndExportDataSetsFunction(int minibatchSize, java.lang.String exportBaseDirectory) |
Modifier and Type | Method and Description |
---|---|
java.util.Iterator<java.lang.String> | call(java.lang.Integer partitionIdx, java.util.Iterator<org.nd4j.linalg.dataset.DataSet> iterator) |
public BatchAndExportDataSetsFunction(int minibatchSize, java.lang.String exportBaseDirectory)
Parameters:
minibatchSize - Number of examples per exported minibatch; existing DataSets are combined or split as necessary
exportBaseDirectory - Base directory for exported files

public java.util.Iterator<java.lang.String> call(java.lang.Integer partitionIdx, java.util.Iterator<org.nd4j.linalg.dataset.DataSet> iterator) throws java.lang.Exception
Specified by:
call in interface org.apache.spark.api.java.function.Function2<java.lang.Integer,java.util.Iterator<org.nd4j.linalg.dataset.DataSet>,java.util.Iterator<java.lang.String>>
Throws:
java.lang.Exception
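To show how batching and export fit together behind the call(partitionIdx, iterator) signature, here is a dependency-free sketch. Everything in it is an assumption for illustration: each DataSet is replaced by a List<Double> (one double per example), serialization is a simple count-plus-doubles binary format rather than ND4J's own, `jvmUid` is a hypothetical constructor parameter, the export directory is a local path, and the returned strings are taken to be the written file paths (inferred from the Iterator<String> return type, not stated above).

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, dependency-free sketch; not the DL4J implementation.
// Each List<Double> stands in for one DataSet (one double per example).
class BatchAndExportSketch {
    private final int minibatchSize;
    private final String exportBaseDirectory;
    private final String jvmUid; // hypothetical stand-in for JVM_UID

    BatchAndExportSketch(int minibatchSize, String exportBaseDirectory, String jvmUid) {
        if (minibatchSize <= 0) {
            throw new IllegalArgumentException("minibatchSize must be positive: " + minibatchSize);
        }
        this.minibatchSize = minibatchSize;
        this.exportBaseDirectory = exportBaseDirectory;
        this.jvmUid = jvmUid;
    }

    // Mirrors the shape of call(partitionIdx, iterator): batch, export, return written paths.
    Iterator<String> call(Integer partitionIdx, Iterator<List<Double>> iterator) throws IOException {
        List<String> written = new ArrayList<>();
        List<Double> pending = new ArrayList<>();
        int idx = 0;
        while (iterator.hasNext()) {
            pending.addAll(iterator.next()); // combine smaller inputs
            while (pending.size() >= minibatchSize) { // split larger inputs
                List<Double> head = pending.subList(0, minibatchSize);
                List<Double> batch = new ArrayList<>(head);
                head.clear(); // drop the exported examples from the buffer
                written.add(export(batch, partitionIdx, idx++));
            }
        }
        if (!pending.isEmpty()) {
            // Assumption: leftover examples become one smaller final batch.
            written.add(export(pending, partitionIdx, idx));
        }
        return written.iterator();
    }

    // Writes one batch as an int count followed by the doubles (illustrative format only).
    private String export(List<Double> batch, int partitionIdx, int idx) throws IOException {
        Path dir = Paths.get(exportBaseDirectory);
        Files.createDirectories(dir);
        Path file = dir.resolve("dataset_" + partitionIdx + jvmUid + "_" + idx + ".bin");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(batch.size());
            for (double v : batch) {
                out.writeDouble(v);
            }
        }
        return file.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("export");
        BatchAndExportSketch fn = new BatchAndExportSketch(2, tmp.toString(), "uid");
        Iterator<String> paths = fn.call(0, List.of(List.of(1.0, 2.0, 3.0), List.of(4.0, 5.0)).iterator());
        paths.forEachRemaining(System.out::println);
    }
}
```

With a minibatch size of 2, the five example values in the usage example produce three files: dataset_0uid_0.bin and dataset_0uid_1.bin with two examples each, and dataset_0uid_2.bin with the single leftover example.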