public class DataFrames
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
`static java.lang.String` | `SEQUENCE_INDEX_COLUMN` |
`static java.lang.String` | `SEQUENCE_UUID_COLUMN` |
Modifier and Type | Method and Description |
---|---|
`static org.apache.spark.sql.types.StructType` | `fromSchema(Schema schema)`: Convert a DataVec schema to a Spark StructType. |
`static org.apache.spark.sql.types.StructType` | `fromSchemaSequence(Schema schema)`: Convert the DataVec sequence schema to a StructType for Spark, for example for use in `toDataFrameSequence(Schema, JavaRDD)`. Note: as per `toDataFrameSequence(Schema, JavaRDD)`, the StructType has two additional columns added to it. Column 0 is the sequence UUID (name: `SEQUENCE_UUID_COLUMN`), a UUID for the original sequence; column 1 is the sequence index (name: `SEQUENCE_INDEX_COLUMN`), an index (integer, starting at 0) for the position of this record in the original time series. These two columns are required if the data is to be converted back into a sequence at a later point, for example using `toRecordsSequence(DataRowsFacade)`. |
`static Schema` | `fromStructType(org.apache.spark.sql.types.StructType structType)`: Create a DataVec schema from a Spark StructType. |
`static org.apache.spark.sql.Column` | `max(DataRowsFacade dataFrame, java.lang.String columnName)`: Max for a column. |
`static org.apache.spark.sql.Column` | `mean(DataRowsFacade dataFrame, java.lang.String columnName)`: Mean for a column. |
`static org.apache.spark.sql.Column` | `min(DataRowsFacade dataFrame, java.lang.String columnName)`: Min for a column. |
`static java.util.List<Writable>` | `rowToWritables(Schema schema, org.apache.spark.sql.Row row)`: Convert a given Row to a list of writables, given the specified Schema. |
`static org.apache.spark.sql.Column` | `std(DataRowsFacade dataFrame, java.lang.String columnName)`: Standard deviation for a column. |
`static java.lang.String[]` | `toArray(java.util.List<java.lang.String> list)`: Convert a string list into an array. |
`static java.util.List<org.apache.spark.sql.Column>` | `toColumn(java.util.List<java.lang.String> columns)`: Convert a list of string column names to Columns. |
`static org.apache.spark.sql.Column[]` | `toColumns(java.lang.String... columns)`: Convert an array of string column names to Columns. |
`static DataRowsFacade` | `toDataFrame(Schema schema, org.apache.spark.api.java.JavaRDD<java.util.List<Writable>> data)`: Create a DataFrame from an RDD of writables, given a schema. |
`static DataRowsFacade` | `toDataFrameSequence(Schema schema, org.apache.spark.api.java.JavaRDD<java.util.List<java.util.List<Writable>>> data)`: Convert the given sequence data set to a DataFrame. Note: the resulting DataFrame has two additional columns added to it. Column 0 is the sequence UUID (name: `SEQUENCE_UUID_COLUMN`), a UUID for the original sequence; column 1 is the sequence index (name: `SEQUENCE_INDEX_COLUMN`), an index (integer, starting at 0) for the position of this record in the original time series. These two columns are required if the data is to be converted back into a sequence at a later point, for example using `toRecordsSequence(DataRowsFacade)`. |
`static java.util.List<java.lang.String>` | `toList(java.lang.String[] input)`: Convert a string array into a list. |
`static org.nd4j.linalg.api.ndarray.INDArray` | `toMatrix(java.util.List<org.apache.spark.sql.Row> rows)`: Convert a list of rows to a matrix. |
`static Pair<Schema,org.apache.spark.api.java.JavaRDD<java.util.List<Writable>>>` | `toRecords(DataRowsFacade dataFrame)`: Create a compatible schema and RDD for DataVec. |
`static Pair<Schema,org.apache.spark.api.java.JavaRDD<java.util.List<java.util.List<Writable>>>>` | `toRecordsSequence(DataRowsFacade dataFrame)`: Convert the given DataFrame to a sequence. Note: it is assumed here that the DataFrame has been created by `toDataFrameSequence(Schema, JavaRDD)`. |
`static org.apache.spark.sql.Column` | `var(DataRowsFacade dataFrame, java.lang.String columnName)`: Variance for a column. |
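The two sequence columns described above follow a simple convention: every record of a sequence is prefixed with the sequence's UUID and its 0-based position. A minimal plain-Java sketch of that convention (the library builds these columns internally; the column names used here are placeholders for illustration only, not the actual values of `SEQUENCE_UUID_COLUMN` / `SEQUENCE_INDEX_COLUMN`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class SequenceColumnsSketch {
    // Placeholder names standing in for DataFrames.SEQUENCE_UUID_COLUMN /
    // SEQUENCE_INDEX_COLUMN; the real constant values are library-defined.
    static final String SEQUENCE_UUID_COLUMN = "sequence_uuid";
    static final String SEQUENCE_INDEX_COLUMN = "sequence_index";

    // Prepend a per-sequence UUID and a 0-based step index to each record,
    // as the toDataFrameSequence note describes for columns 0 and 1.
    static List<List<Object>> flattenSequence(List<List<Object>> sequence) {
        String uuid = UUID.randomUUID().toString(); // one UUID per original sequence
        List<List<Object>> rows = new ArrayList<>();
        for (int i = 0; i < sequence.size(); i++) {
            List<Object> row = new ArrayList<>();
            row.add(uuid); // column 0: sequence UUID
            row.add(i);    // column 1: sequence index, starting at 0
            row.addAll(sequence.get(i));
            rows.add(row);
        }
        return rows;
    }
}
```

Grouping rows by column 0 and sorting within each group by column 1 is what makes the reverse conversion (`toRecordsSequence(DataRowsFacade)`) possible after the rows have been shuffled across a cluster.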
`public static final java.lang.String SEQUENCE_UUID_COLUMN`

`public static final java.lang.String SEQUENCE_INDEX_COLUMN`
`public static org.apache.spark.sql.Column std(DataRowsFacade dataFrame, java.lang.String columnName)`

Parameters:
- `dataFrame` - the dataframe to get the column from
- `columnName` - the name of the column to get the standard deviation for

`public static org.apache.spark.sql.Column var(DataRowsFacade dataFrame, java.lang.String columnName)`

Parameters:
- `dataFrame` - the dataframe to get the column from
- `columnName` - the name of the column to get the variance for

`public static org.apache.spark.sql.Column min(DataRowsFacade dataFrame, java.lang.String columnName)`

Parameters:
- `dataFrame` - the dataframe to get the column from
- `columnName` - the name of the column to get the min for

`public static org.apache.spark.sql.Column max(DataRowsFacade dataFrame, java.lang.String columnName)`

Parameters:
- `dataFrame` - the dataframe to get the column from
- `columnName` - the name of the column to get the max for

`public static org.apache.spark.sql.Column mean(DataRowsFacade dataFrame, java.lang.String columnName)`

Parameters:
- `dataFrame` - the dataframe to get the column from
- `columnName` - the name of the column to get the mean for

`public static org.apache.spark.sql.types.StructType fromSchema(Schema schema)`

Parameters:
- `schema` - the schema to convert

`public static org.apache.spark.sql.types.StructType fromSchemaSequence(Schema schema)`

Convert the DataVec sequence schema to a StructType for Spark, for example for use in `toDataFrameSequence(Schema, JavaRDD)`. Note: as per `toDataFrameSequence(Schema, JavaRDD)`, the StructType has two additional columns added to it:
- Column 0: sequence UUID (name: `SEQUENCE_UUID_COLUMN`) - a UUID for the original sequence
- Column 1: sequence index (name: `SEQUENCE_INDEX_COLUMN`) - an index (integer, starting at 0) for the position of this record in the original time series

These two columns are required if the data is to be converted back into a sequence at a later point, for example using `toRecordsSequence(DataRowsFacade)`.

Parameters:
- `schema` - Schema to convert

`public static Schema fromStructType(org.apache.spark.sql.types.StructType structType)`

Parameters:
- `structType` - the struct type to create the schema from

`public static Pair<Schema,org.apache.spark.api.java.JavaRDD<java.util.List<Writable>>> toRecords(DataRowsFacade dataFrame)`

Parameters:
- `dataFrame` - the dataframe to convert

`public static Pair<Schema,org.apache.spark.api.java.JavaRDD<java.util.List<java.util.List<Writable>>>> toRecordsSequence(DataRowsFacade dataFrame)`

Convert the given DataFrame to a sequence. It is assumed here that the DataFrame has been created by `toDataFrameSequence(Schema, JavaRDD)`; in particular, it is expected to contain the sequence UUID and sequence index columns described there. Typical use: normalization via the `Normalization` static methods.

Parameters:
- `dataFrame` - Data frame to convert back to `List<List<Writable>>` form

`public static DataRowsFacade toDataFrame(Schema schema, org.apache.spark.api.java.JavaRDD<java.util.List<Writable>> data)`

Parameters:
- `schema` - the schema to use
- `data` - the data to convert

`public static DataRowsFacade toDataFrameSequence(Schema schema, org.apache.spark.api.java.JavaRDD<java.util.List<java.util.List<Writable>>> data)`

Convert the given sequence data set to a DataFrame. Note: the resulting DataFrame has two additional columns added to it:
- Column 0: sequence UUID (name: `SEQUENCE_UUID_COLUMN`) - a UUID for the original sequence
- Column 1: sequence index (name: `SEQUENCE_INDEX_COLUMN`) - an index (integer, starting at 0) for the position of this record in the original time series

These two columns are required if the data is to be converted back into a sequence at a later point, for example using `toRecordsSequence(DataRowsFacade)`.

Parameters:
- `schema` - Schema for the data
- `data` - Sequence data to convert to a DataFrame

`public static java.util.List<Writable> rowToWritables(Schema schema, org.apache.spark.sql.Row row)`

Parameters:
- `schema` - Schema for the data
- `row` - Row of data

`public static java.util.List<java.lang.String> toList(java.lang.String[] input)`

Parameters:
- `input` - the input to create the list from

`public static java.lang.String[] toArray(java.util.List<java.lang.String> list)`

Parameters:
- `list` - the input to create the array from

`public static org.nd4j.linalg.api.ndarray.INDArray toMatrix(java.util.List<org.apache.spark.sql.Row> rows)`

Parameters:
- `rows` - the list of rows to convert

`public static java.util.List<org.apache.spark.sql.Column> toColumn(java.util.List<java.lang.String> columns)`

Parameters:
- `columns` - the columns to convert

`public static org.apache.spark.sql.Column[] toColumns(java.lang.String... columns)`

Parameters:
- `columns` - the columns to convert
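`toMatrix` stacks a list of Spark Rows into a single INDArray. A plain-Java analogue using `double[][]` illustrates the row-stacking involved (an illustrative sketch only, not the library implementation, which returns an ND4J `INDArray` from `org.apache.spark.sql.Row` values):

```java
import java.util.List;

public class ToMatrixSketch {
    // Stack equally-sized numeric rows into a 2-D array, analogous to
    // DataFrames.toMatrix stacking a List<Row> into an INDArray.
    static double[][] toMatrix(List<double[]> rows) {
        double[][] matrix = new double[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            matrix[i] = rows.get(i).clone(); // defensive copy of each row
        }
        return matrix;
    }
}
```

Each input row becomes one row of the matrix, so all rows are expected to have the same length (one value per schema column).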