public static class TransformProcess.Builder
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
TransformProcess.Builder | addConstantColumn(java.lang.String newColumnName, ColumnType newColumnType, Writable fixedValue): Add a new column, where all values in the column are identical and as specified. |
TransformProcess.Builder | addConstantDoubleColumn(java.lang.String newColumnName, double value): Add a new double column, where the value is identical for all records. |
TransformProcess.Builder | addConstantIntegerColumn(java.lang.String newColumnName, int value): Add a new integer column, where the value is identical for all records. |
TransformProcess.Builder | addConstantLongColumn(java.lang.String newColumnName, long value): Add a new long column, where the value is identical for all records. |
TransformProcess.Builder | appendStringColumnTransform(java.lang.String column, java.lang.String toAppend): Append a String to a specified column. |
TransformProcess | build(): Create the TransformProcess object. |
TransformProcess.Builder | calculateSortedRank(java.lang.String newColumnName, java.lang.String sortOnColumn, WritableComparator comparator): Calculate the rank of each example, after sorting the examples. |
TransformProcess.Builder | calculateSortedRank(java.lang.String newColumnName, java.lang.String sortOnColumn, WritableComparator comparator, boolean ascending): Calculate the rank of each example, after sorting the examples. |
TransformProcess.Builder | categoricalToInteger(java.lang.String... columnNames): Convert the specified column(s) from a categorical representation to an integer representation. |
TransformProcess.Builder | categoricalToOneHot(java.lang.String... columnNames): Convert the specified column(s) from a categorical representation to a one-hot representation. |
TransformProcess.Builder | conditionalCopyValueTransform(java.lang.String columnToReplace, java.lang.String sourceColumn, Condition condition): Replace the value in a specified column with a new value taken from another column, if a condition is satisfied. The condition can be any generic condition, including one on columns other than the column being modified. |
TransformProcess.Builder | conditionalReplaceValueTransform(java.lang.String column, Writable newValue, Condition condition): Replace the values in a specified column with a specified new value, if some condition holds. |
TransformProcess.Builder | convertFromSequence(): Convert a sequence to a set of individual values (by treating each value in each sequence as a separate example). |
TransformProcess.Builder | convertToSequence(java.lang.String keyColumn, SequenceComparator comparator): Convert a set of independent records/examples into a sequence, according to some key. |
TransformProcess.Builder | convertToString(java.lang.String inputColumn): Convert the specified column to a string. |
TransformProcess.Builder | doubleColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames): Calculate and add a new double column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | doubleMathFunction(java.lang.String columnName, MathFunction mathFunction): Perform a mathematical operation (such as sin(x), ceil(x), exp(x) etc) on a column |
TransformProcess.Builder | doubleMathOp(java.lang.String columnName, MathOp mathOp, double scalar): Perform a mathematical operation (add, subtract, scalar max etc) on the specified double column, with a scalar |
TransformProcess.Builder | duplicateColumn(java.lang.String column, java.lang.String newName): Duplicate a single column |
TransformProcess.Builder | duplicateColumns(java.util.List<java.lang.String> columnNames, java.util.List<java.lang.String> newNames): Duplicate a set of columns |
TransformProcess.Builder | filter(Condition condition): Add a filter operation, based on the specified condition. |
TransformProcess.Builder | filter(Filter filter): Add a filter operation to be executed after the previously-added operations have been executed |
TransformProcess.Builder | integerColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames): Calculate and add a new integer column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | integerMathOp(java.lang.String column, MathOp mathOp, int scalar): Perform a mathematical operation (add, subtract, scalar max etc) on the specified integer column, with a scalar |
TransformProcess.Builder | integerToCategorical(java.lang.String columnName, java.util.List<java.lang.String> categoryStateNames): Convert the specified column from an integer representation (assume values 0 to numCategories-1) to a categorical representation, given the specified state names |
TransformProcess.Builder | integerToCategorical(java.lang.String columnName, java.util.Map<java.lang.Integer,java.lang.String> categoryIndexNameMap): Convert the specified column from an integer representation to a categorical representation, given the specified mapping between integer indexes and state names |
TransformProcess.Builder | longColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames): Calculate and add a new long column by performing a mathematical operation on a number of existing columns. |
TransformProcess.Builder | longMathOp(java.lang.String columnName, MathOp mathOp, long scalar): Perform a mathematical operation (add, subtract, scalar max etc) on the specified long column, with a scalar |
TransformProcess.Builder | normalize(java.lang.String column, Normalize type, DataAnalysis da): Normalize the specified column with a given type of normalization |
TransformProcess.Builder | reduce(IReducer reducer): Reduce (i.e., aggregate/combine) a set of examples (typically by key). |
TransformProcess.Builder | reduceSequenceByWindow(IReducer reducer, WindowFunction windowFunction): Reduce (i.e., aggregate/combine) a set of sequence examples - for each sequence individually - using a window function. |
TransformProcess.Builder | removeAllColumnsExceptFor(java.util.Collection<java.lang.String> columnNames): Remove all columns, except for those that are specified here |
TransformProcess.Builder | removeAllColumnsExceptFor(java.lang.String... columnNames): Remove all columns, except for those that are specified here |
TransformProcess.Builder | removeColumns(java.util.Collection<java.lang.String> columnNames): Remove all of the specified columns, by name |
TransformProcess.Builder | removeColumns(java.lang.String... columnNames): Remove all of the specified columns, by name |
TransformProcess.Builder | renameColumn(java.lang.String oldName, java.lang.String newName): Rename a single column |
TransformProcess.Builder | renameColumns(java.util.List<java.lang.String> oldNames, java.util.List<java.lang.String> newNames): Rename multiple columns |
TransformProcess.Builder | reorderColumns(java.lang.String... newOrder): Reorder the columns using a partial or complete new ordering. |
TransformProcess.Builder | splitSequence(SequenceSplit split): Split sequences into 1 or more other sequences. |
TransformProcess.Builder | stringMapTransform(java.lang.String columnName, java.util.Map<java.lang.String,java.lang.String> mapping): Replace one or more String values in the specified column with new values. |
TransformProcess.Builder | stringRemoveWhitespaceTransform(java.lang.String columnName): Remove all whitespace characters from the values in the specified String column |
TransformProcess.Builder | stringToCategorical(java.lang.String columnName, java.util.List<java.lang.String> stateNames): Convert the specified String column to a categorical column. |
TransformProcess.Builder | stringToTimeTransform(java.lang.String column, java.lang.String format, org.joda.time.DateTimeZone dateTimeZone): Convert a String column (containing a date/time String) to a time column (by parsing the date/time String) |
TransformProcess.Builder | timeMathOp(java.lang.String columnName, MathOp mathOp, long timeQuantity, java.util.concurrent.TimeUnit timeUnit): Perform a mathematical operation (add, subtract, scalar min/max only) on the specified time column |
TransformProcess.Builder | transform(Transform transform): Add a transformation to be executed after the previously-added operations have been executed |
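
The categorical conversions in the summary above (categoricalToInteger, categoricalToOneHot, integerToCategorical) can be illustrated with a small plain-Java sketch. It does not call DataVec: CategoricalSketch and its methods are hypothetical stand-ins, and the assumption that an integer index is the position of the value in the state-name list (0 to numCategories-1) is taken from the integerToCategorical description rather than from the library source.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in sketch (not DataVec code): models the mappings between
// categorical, integer, and one-hot representations of a column value.
class CategoricalSketch {

    // categorical -> integer: assumed to be the position of the value
    // in the list of state names (0 to numCategories-1).
    static int toInteger(List<String> stateNames, String value) {
        int idx = stateNames.indexOf(value);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown state: " + value);
        }
        return idx;
    }

    // categorical -> one-hot: one 0/1 entry per state, with a single 1
    // at the index of the value.
    static int[] toOneHot(List<String> stateNames, String value) {
        int[] out = new int[stateNames.size()];
        out[toInteger(stateNames, value)] = 1;
        return out;
    }

    // integer -> categorical: the inverse of toInteger.
    static String toCategorical(List<String> stateNames, int index) {
        return stateNames.get(index);
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("red", "green", "blue");
        System.out.println(toInteger(states, "green"));                // 1
        System.out.println(Arrays.toString(toOneHot(states, "blue"))); // [0, 0, 1]
        System.out.println(toCategorical(states, 0));                  // red
    }
}
```

A one-hot conversion replaces a single categorical column with one column per state, which is why the sketch returns an array whose length equals the number of states.
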
public Builder(Schema initialSchema)
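
Each builder method records one operation and returns the builder, and the transform and filter entries are documented as executing after the previously-added operations. A minimal sketch of that pattern in plain Java may help. MiniProcess and its two methods are hypothetical stand-ins that operate on one Map per record; they are not DataVec classes and do not use a Schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a fluent builder that records operations
// and later applies them in the order they were added.
class MiniProcess {
    private final List<UnaryOperator<Map<String, Object>>> steps;

    private MiniProcess(List<UnaryOperator<Map<String, Object>>> steps) {
        this.steps = steps;
    }

    // Apply every recorded step to a copy of the record, in insertion order.
    Map<String, Object> execute(Map<String, Object> record) {
        Map<String, Object> current = new LinkedHashMap<>(record);
        for (UnaryOperator<Map<String, Object>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    static class Builder {
        private final List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();

        // Record a rename; column order is preserved.
        Builder renameColumn(String oldName, String newName) {
            steps.add(r -> {
                Map<String, Object> out = new LinkedHashMap<>();
                r.forEach((k, v) -> out.put(k.equals(oldName) ? newName : k, v));
                return out;
            });
            return this;
        }

        // Record a removal of the named columns.
        Builder removeColumns(String... names) {
            steps.add(r -> {
                Map<String, Object> out = new LinkedHashMap<>(r);
                for (String n : names) {
                    out.remove(n);
                }
                return out;
            });
            return this;
        }

        MiniProcess build() {
            return new MiniProcess(new ArrayList<>(steps));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 7);
        record.put("name", "a");
        MiniProcess p = new MiniProcess.Builder()
                .renameColumn("name", "label")
                .removeColumns("id")
                .build();
        System.out.println(p.execute(record)); // {label=a}
    }
}
```

Because steps run in order, a later step must refer to columns by the names produced by earlier steps: after the rename above, a removeColumns("name") call would find nothing to remove in this sketch.
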
public TransformProcess.Builder transform(Transform transform)
transform - Transform to execute

public TransformProcess.Builder filter(Filter filter)
filter - Filter operation to execute

public TransformProcess.Builder filter(Condition condition)
condition - Condition to filter on

public TransformProcess.Builder removeColumns(java.lang.String... columnNames)
columnNames - Names of the columns to remove

public TransformProcess.Builder removeColumns(java.util.Collection<java.lang.String> columnNames)
columnNames - Names of the columns to remove

public TransformProcess.Builder removeAllColumnsExceptFor(java.lang.String... columnNames)
columnNames - Names of the columns to keep

public TransformProcess.Builder removeAllColumnsExceptFor(java.util.Collection<java.lang.String> columnNames)
columnNames - Names of the columns to keep

public TransformProcess.Builder renameColumn(java.lang.String oldName, java.lang.String newName)
oldName - Original column name
newName - New column name

public TransformProcess.Builder renameColumns(java.util.List<java.lang.String> oldNames, java.util.List<java.lang.String> newNames)
oldNames - List of original column names
newNames - List of new column names

public TransformProcess.Builder reorderColumns(java.lang.String... newOrder)
newOrder - Names of the columns, in the order they will appear in the output

public TransformProcess.Builder duplicateColumn(java.lang.String column, java.lang.String newName)
column - Name of the column to duplicate
newName - Name of the new (duplicate) column

public TransformProcess.Builder duplicateColumns(java.util.List<java.lang.String> columnNames, java.util.List<java.lang.String> newNames)
columnNames - Names of the columns to duplicate
newNames - Names of the new (duplicated) columns

public TransformProcess.Builder integerMathOp(java.lang.String column, MathOp mathOp, int scalar)
column - The integer column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder integerColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder longMathOp(java.lang.String columnName, MathOp mathOp, long scalar)
columnName - The long column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder longColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder doubleMathOp(java.lang.String columnName, MathOp mathOp, double scalar)
columnName - The double column to perform the operation on
mathOp - The mathematical operation
scalar - The scalar value to use in the mathematical operation

public TransformProcess.Builder doubleColumnsMathOp(java.lang.String newColumnName, MathOp mathOp, java.lang.String... columnNames)
newColumnName - Name of the new/derived column
mathOp - Mathematical operation to execute on the columns
columnNames - Names of the columns to use in the mathematical operation

public TransformProcess.Builder doubleMathFunction(java.lang.String columnName, MathFunction mathFunction)
columnName - Column name to operate on
mathFunction - MathFunction to apply to the column

public TransformProcess.Builder timeMathOp(java.lang.String columnName, MathOp mathOp, long timeQuantity, java.util.concurrent.TimeUnit timeUnit)
columnName - The time column to perform the operation on
mathOp - The mathematical operation
timeQuantity - The quantity used in the mathematical operation
timeUnit - The unit that timeQuantity is specified in

public TransformProcess.Builder categoricalToOneHot(java.lang.String... columnNames)
columnNames - Names of the categorical column(s) to convert to a one-hot representation

public TransformProcess.Builder categoricalToInteger(java.lang.String... columnNames)
columnNames - Names of the categorical column(s) to convert to an integer representation

public TransformProcess.Builder integerToCategorical(java.lang.String columnName, java.util.List<java.lang.String> categoryStateNames)
columnName - Name of the column to convert
categoryStateNames - Names of the states for the categorical column

public TransformProcess.Builder integerToCategorical(java.lang.String columnName, java.util.Map<java.lang.Integer,java.lang.String> categoryIndexNameMap)
columnName - Name of the column to convert
categoryIndexNameMap - Mapping from integer index to state name for the categorical column

public TransformProcess.Builder addConstantColumn(java.lang.String newColumnName, ColumnType newColumnType, Writable fixedValue)
newColumnName - Name of the new column
newColumnType - Type of the new column
fixedValue - Value in the new column for all records

public TransformProcess.Builder addConstantDoubleColumn(java.lang.String newColumnName, double value)
newColumnName - Name of the new column
value - Value in the new column for all records

public TransformProcess.Builder addConstantIntegerColumn(java.lang.String newColumnName, int value)
newColumnName - Name of the new column
value - Value in the new column for all records

public TransformProcess.Builder addConstantLongColumn(java.lang.String newColumnName, long value)
newColumnName - Name of the new column
value - Value in the new column for all records

public TransformProcess.Builder convertToString(java.lang.String inputColumn)
inputColumn - The input column to convert

public TransformProcess.Builder normalize(java.lang.String column, Normalize type, DataAnalysis da)
column - Column to normalize
type - Type of normalization to apply
da - DataAnalysis object

public TransformProcess.Builder convertToSequence(java.lang.String keyColumn, SequenceComparator comparator)
keyColumn - Column to use as a key (values with the same key will be combined into sequences)
comparator - A SequenceComparator to order the values within each sequence (for example, by time or String order)

public TransformProcess.Builder convertFromSequence()
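
The relationship between convertToSequence and convertFromSequence can be sketched in plain Java as grouping by key, ordering within each group, and flattening again. This is a stand-in under stated assumptions, not the DataVec implementation: SequenceSketch and Event are hypothetical, a java.util.Comparator plays the role of the SequenceComparator, and the order in which the sequences themselves are returned (first-seen key order here) is a choice made for the sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch (not DataVec code): independent records -> sequences -> records.
class SequenceSketch {

    // A minimal record with a key column and a time column.
    static final class Event {
        final String key;
        final long time;

        Event(String key, long time) {
            this.key = key;
            this.time = time;
        }

        @Override
        public String toString() {
            return key + "@" + time;
        }
    }

    // Group records that share a key, then order each group with the comparator.
    static List<List<Event>> toSequences(List<Event> events, Comparator<Event> order) {
        Map<String, List<Event>> byKey = new LinkedHashMap<>();
        for (Event e : events) {
            byKey.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        }
        List<List<Event>> sequences = new ArrayList<>();
        for (List<Event> seq : byKey.values()) {
            seq.sort(order);
            sequences.add(seq);
        }
        return sequences;
    }

    // Treat each step of each sequence as a separate example again.
    static List<Event> fromSequences(List<List<Event>> sequences) {
        List<Event> out = new ArrayList<>();
        for (List<Event> seq : sequences) {
            out.addAll(seq);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 3), new Event("b", 1), new Event("a", 1));
        List<List<Event>> sequences =
                toSequences(events, Comparator.comparingLong(e -> e.time));
        System.out.println(sequences);                // [[a@1, a@3], [b@1]]
        System.out.println(fromSequences(sequences)); // [a@1, a@3, b@1]
    }
}
```
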
public TransformProcess.Builder splitSequence(SequenceSplit split)
split - SequenceSplit that defines how splits will occur

public TransformProcess.Builder reduce(IReducer reducer)
reducer - Reducer to use

public TransformProcess.Builder reduceSequenceByWindow(IReducer reducer, WindowFunction windowFunction)
reducer - Reducer to use to reduce each window
windowFunction - Window function to apply to each sequence individually

public TransformProcess.Builder calculateSortedRank(java.lang.String newColumnName, java.lang.String sortOnColumn, WritableComparator comparator)
Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
newColumnName - Name of the new column (will contain the rank for each example)
sortOnColumn - Column to sort on
comparator - Comparator used to sort examples

public TransformProcess.Builder calculateSortedRank(java.lang.String newColumnName, java.lang.String sortOnColumn, WritableComparator comparator, boolean ascending)
Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.
newColumnName - Name of the new column (will contain the rank for each example)
sortOnColumn - Column to sort on
comparator - Comparator used to sort examples
ascending - If true, sort ascending; if false, sort descending

public TransformProcess.Builder stringToCategorical(java.lang.String columnName, java.util.List<java.lang.String> stateNames)
columnName - Name of the String column to convert to categorical
stateNames - State names of the category

public TransformProcess.Builder stringRemoveWhitespaceTransform(java.lang.String columnName)
columnName - Name of the column to remove whitespace from

public TransformProcess.Builder stringMapTransform(java.lang.String columnName, java.util.Map<java.lang.String,java.lang.String> mapping)
Keys in the map are the original values; the values in the map are their replacements. If a String appears in the data but does not appear in the provided map (as a key), that String value will not be modified.
columnName - Name of the column in which to do replacement
mapping - Map of oldValues -> newValues

public TransformProcess.Builder stringToTimeTransform(java.lang.String column, java.lang.String format, org.joda.time.DateTimeZone dateTimeZone)
column - String column containing the date/time Strings
format - Format of the strings. Time format is specified as per http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html
dateTimeZone - Timezone of the column

public TransformProcess.Builder appendStringColumnTransform(java.lang.String column, java.lang.String toAppend)
column - Column to append the value to
toAppend - String to append to the end of each writable

public TransformProcess.Builder conditionalReplaceValueTransform(java.lang.String column, Writable newValue, Condition condition)
column - Column to operate on
newValue - Value to use as replacement, if condition is satisfied
condition - Condition that must be satisfied for replacement

public TransformProcess.Builder conditionalCopyValueTransform(java.lang.String columnToReplace, java.lang.String sourceColumn, Condition condition)
columnToReplace - Name of the column in which values will be replaced (if condition is satisfied)
sourceColumn - Name of the column from which the new values will be taken
condition - Condition to use

public TransformProcess build()
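
The replacement rules documented for stringMapTransform (values not present as keys are left unmodified) and conditionalReplaceValueTransform (replace only where the condition holds) can be sketched in plain Java. ReplaceSketch is a hypothetical stand-in that works on a single column held as a List; a java.util.function.Predicate plays the role of the Condition, and nothing here calls DataVec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Stand-in sketch (not DataVec code): per-column replacement semantics.
class ReplaceSketch {

    // String map: keys are original values, map values are replacements;
    // a String that is not a key in the map is kept as-is.
    static List<String> mapStrings(List<String> column, Map<String, String> mapping) {
        List<String> out = new ArrayList<>();
        for (String s : column) {
            out.add(mapping.getOrDefault(s, s));
        }
        return out;
    }

    // Conditional replace: use newValue where the condition holds,
    // otherwise keep the original value.
    static List<Integer> replaceIf(List<Integer> column, int newValue,
                                   Predicate<Integer> condition) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : column) {
            out.add(condition.test(v) ? newValue : v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("nyc", "NY");
        System.out.println(mapStrings(Arrays.asList("NY", "nyc", "LA"), mapping)); // [NY, NY, LA]
        System.out.println(replaceIf(Arrays.asList(5, -1, 3), 0, v -> v < 0));     // [5, 0, 3]
    }
}
```
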