public class BatchAndExportMultiDataSetsFunction
extends java.lang.Object
implements org.apache.spark.api.java.function.Function2<java.lang.Integer,java.util.Iterator<org.nd4j.linalg.dataset.api.MultiDataSet>,java.util.Iterator<java.lang.String>>
Function used with RDD<MultiDataSet>.mapPartitionsWithIndex.
It does two things:
1. Batch MultiDataSets together, to the specified minibatch size. This may result in splitting or combining existing
MultiDataSet objects as required
2. Export the MultiDataSet objects to the specified directory.
Naming convention for exported files: "mds_" + partitionIdx + JVM_UID + "_" + idx + ".bin", where 'idx' is the index of the exported MultiDataSet object within this partition.
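The two steps above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the class `RebatchSketch`, the helpers `rebatch` and `fileName`, and the use of a random UUID for `JVM_UID` are all hypothetical, and plain lists of integers stand in for MultiDataSet objects. It also assumes that any leftover examples at the end of a partition form a smaller final batch, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

class RebatchSketch {

    // Hypothetical stand-in for the per-JVM unique identifier used in file names.
    static final String JVM_UID = UUID.randomUUID().toString().replace("-", "");

    /**
     * Regroups batches of arbitrary size into batches of minibatchSize examples,
     * splitting or combining the input batches as required.
     * Each inner list stands in for one MultiDataSet, each Integer for one example.
     * Assumption: leftover examples are emitted as a smaller final batch.
     */
    static List<List<Integer>> rebatch(List<List<Integer>> input, int minibatchSize) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (List<Integer> batch : input) {
            for (Integer example : batch) {
                current.add(example);
                if (current.size() == minibatchSize) {
                    out.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }

    /** Builds a file name following the convention quoted above. */
    static String fileName(int partitionIdx, String jvmUid, int idx) {
        return "mds_" + partitionIdx + jvmUid + "_" + idx + ".bin";
    }

    public static void main(String[] args) {
        // Input batches of 3, 5 and 2 examples, rebatched to a minibatch size of 4.
        List<List<Integer>> input = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6, 7, 8),
                Arrays.asList(9, 10));
        List<List<Integer>> batches = rebatch(input, 4);

        int partitionIdx = 0; // hypothetical partition index
        for (int idx = 0; idx < batches.size(); idx++) {
            System.out.println(fileName(partitionIdx, JVM_UID, idx)
                    + " holds " + batches.get(idx).size() + " examples");
        }
    }
}
```

With the input shown, the ten examples are regrouped into batches of 4, 4 and 2, and each batch gets one file name whose trailing index counts the batches within the partition.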
Constructor and Description |
---|
BatchAndExportMultiDataSetsFunction(int minibatchSize, java.lang.String exportBaseDirectory) |
Modifier and Type | Method and Description |
---|---|
java.util.Iterator<java.lang.String> | call(java.lang.Integer partitionIdx, java.util.Iterator<org.nd4j.linalg.dataset.api.MultiDataSet> iterator) |
public BatchAndExportMultiDataSetsFunction(int minibatchSize, java.lang.String exportBaseDirectory)
Parameters:
minibatchSize - Minibatch size to combine examples to (if necessary)
exportBaseDirectory - Base directory for exporting
public java.util.Iterator<java.lang.String> call(java.lang.Integer partitionIdx, java.util.Iterator<org.nd4j.linalg.dataset.api.MultiDataSet> iterator) throws java.lang.Exception
Specified by:
call in interface org.apache.spark.api.java.function.Function2<java.lang.Integer,java.util.Iterator<org.nd4j.linalg.dataset.api.MultiDataSet>,java.util.Iterator<java.lang.String>>
Throws:
java.lang.Exception