public class TfidfRecordReader extends FileRecordReader
appendLabel, conf, currentFile, inputSplit, iter, labels
listeners
APPEND_LABEL, LABELS, NAME_SPACE
Constructor and Description |
---|
TfidfRecordReader() |
Modifier and Type | Method and Description |
---|---|
void |
close() |
Configuration |
getConf()
Return the configuration used by this object.
|
int |
getNumFeatures() |
TfidfVectorizer |
getTfidfVectorizer() |
boolean |
hasNext()
Whether there are any more records
|
void |
initialize(Configuration conf,
InputSplit split)
Called once at initialization.
|
void |
initialize(InputSplit split)
Called once at initialization.
|
java.util.List<Record> |
loadFromMetaData(java.util.List<RecordMetaData> recordMetaDatas)
Load multiple records from the given list of
RecordMetaData instances |
Record |
loadFromMetaData(RecordMetaData recordMetaData)
Load a single record from the given
RecordMetaData instance. Note that for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using RecordReader.loadFromMetaData(List) |
java.util.List<Writable> |
next()
Get the next record
|
Record |
nextRecord()
Similar to
RecordReader.next(), but returns a Record object that may include metadata such as the source
of the data |
void |
reset()
Reset record reader iterator
|
void |
setConf(Configuration conf)
Set the configuration to be used by this object.
|
void |
setTfidfVectorizer(TfidfVectorizer tfidfVectorizer) |
void |
shuffle() |
void |
shuffle(java.util.Random random) |
doInitialize, getCurrentLabel, getLabels, record, setLabels
getListeners, invokeListeners, setListeners, setListeners
public void initialize(InputSplit split) throws java.io.IOException, java.lang.InterruptedException
RecordReader
initialize
in interface RecordReader
initialize
in class FileRecordReader
split - the split that defines the range of records to read
java.io.IOException
java.lang.InterruptedException
public void initialize(Configuration conf, InputSplit split) throws java.io.IOException, java.lang.InterruptedException
RecordReader
initialize
in interface RecordReader
initialize
in class FileRecordReader
conf - a configuration for initialization
split - the split that defines the range of records to read
java.io.IOException
java.lang.InterruptedException
public void reset()
RecordReader
reset
in interface RecordReader
reset
in class FileRecordReader
public Record nextRecord()
RecordReader
Similar to RecordReader.next(), but returns a Record object that may include metadata such as the source of the data
nextRecord
in interface RecordReader
nextRecord
in class FileRecordReader
public java.util.List<Writable> next()
RecordReader
next
in interface RecordReader
next
in class FileRecordReader
public boolean hasNext()
RecordReader
hasNext
in interface RecordReader
hasNext
in class FileRecordReader
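The reset(), next() and hasNext() methods above form the usual iteration protocol: read while hasNext() is true, then call reset() to start another pass. The following is a minimal sketch of that loop. `ReaderLoopSketch` is a hypothetical stand-in written so the example compiles without DataVec on the classpath; it is not the library class, and it uses `List<String>` where the real reader returns `java.util.List<Writable>`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a record reader; NOT the DataVec class.
// It mirrors only the hasNext()/next()/reset() method names listed above.
final class ReaderLoopSketch {
    private final List<List<String>> records;
    private Iterator<List<String>> iter;

    ReaderLoopSketch(List<List<String>> records) {
        this.records = records;
        this.iter = records.iterator();
    }

    boolean hasNext() { return iter.hasNext(); }

    List<String> next() { return iter.next(); }

    // Rewind to the first record so the data can be read again.
    void reset() { iter = records.iterator(); }

    // Drain the reader and return how many records were seen.
    static int countRecords(ReaderLoopSketch reader) {
        int n = 0;
        while (reader.hasNext()) {
            reader.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ReaderLoopSketch reader = new ReaderLoopSketch(Arrays.asList(
                Arrays.asList("0.1", "0.9", "spam"),
                Arrays.asList("0.7", "0.2", "ham")));
        System.out.println(countRecords(reader)); // first pass over both records
        System.out.println(reader.hasNext());     // reader is exhausted here
        reader.reset();
        System.out.println(countRecords(reader)); // second pass after reset()
    }
}
```

With the real class, the same loop would presumably follow a call to one of the initialize(...) overloads, and close() would be called when reading is finished.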
public void close() throws java.io.IOException
close
in interface java.io.Closeable
close
in interface java.lang.AutoCloseable
close
in class FileRecordReader
java.io.IOException
public void setConf(Configuration conf)
Configurable
setConf
in interface Configurable
setConf
in class FileRecordReader
public Configuration getConf()
Configurable
getConf
in interface Configurable
getConf
in class FileRecordReader
public TfidfVectorizer getTfidfVectorizer()
public void setTfidfVectorizer(TfidfVectorizer tfidfVectorizer)
public int getNumFeatures()
public void shuffle()
public void shuffle(java.util.Random random)
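This page does not describe what getTfidfVectorizer() and getNumFeatures() return beyond their types. As a rough illustration only, the sketch below computes TF-IDF vectors with the textbook weighting tf * ln(N / df), where the vector length equals the vocabulary size. Both the weighting and the idea that the feature count corresponds to the vocabulary size are assumptions, not statements about the library's TfidfVectorizer. `TfidfSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of TF-IDF features; NOT the DataVec TfidfVectorizer.
final class TfidfSketch {
    // Sorted vocabulary over all documents; its size is the feature count here.
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String doc : docs) {
            for (String tok : doc.toLowerCase().split("\\s+")) {
                if (!tok.isEmpty()) vocab.add(tok);
            }
        }
        return new ArrayList<>(vocab);
    }

    // Assumed textbook weighting: tf(term, doc) * ln(N / df(term)).
    static double[] vectorize(String doc, List<String> docs, List<String> vocab) {
        double[] vec = new double[vocab.size()];
        List<String> tokens = Arrays.asList(doc.toLowerCase().split("\\s+"));
        for (int i = 0; i < vocab.size(); i++) {
            String term = vocab.get(i);
            int tf = Collections.frequency(tokens, term);
            int df = 0;
            for (String d : docs) {
                if (Arrays.asList(d.toLowerCase().split("\\s+")).contains(term)) df++;
            }
            // A term that occurs in no document contributes zero weight.
            vec[i] = (df == 0) ? 0.0 : tf * Math.log((double) docs.size() / df);
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog barked");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab.size() + " features: " + vocab);
        System.out.println(Arrays.toString(vectorize(docs.get(0), docs, vocab)));
    }
}
```

In this sketch a term that appears in every document (such as "the") gets weight zero, because ln(N / df) is zero when df equals N.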
public Record loadFromMetaData(RecordMetaData recordMetaData) throws java.io.IOException
RecordReader
Load a single record from the given RecordMetaData instance. Note that for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using RecordReader.loadFromMetaData(List)
loadFromMetaData
in interface RecordReader
loadFromMetaData
in class FileRecordReader
recordMetaData - Metadata for the record that we want to load from
java.io.IOException - If I/O error occurs during loading
public java.util.List<Record> loadFromMetaData(java.util.List<RecordMetaData> recordMetaDatas) throws java.io.IOException
RecordReader
Load multiple records from the given list of RecordMetaData instances
loadFromMetaData
in interface RecordReader
loadFromMetaData
in class FileRecordReader
recordMetaDatas - Metadata for the records that we want to load from
java.io.IOException - If I/O error occurs during loading
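The note on loadFromMetaData says that batch loading is more efficient for data that has to be scanned. The sketch below illustrates one plausible reason: if reaching a record means scanning from the start of the source, every single-record load repeats that scan, while a batch load can collect all requested records in one pass. `MetaDataLoadSketch` is a hypothetical stand-in that uses line indices in place of RecordMetaData; it does not reflect how the library actually implements either method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a scan-only text source; NOT the DataVec API.
final class MetaDataLoadSketch {
    private final List<String> lines;
    long linesScanned = 0; // counts how much of the source has been read

    MetaDataLoadSketch(List<String> lines) { this.lines = lines; }

    // Single load: scan from the start until the requested line is reached.
    String loadOne(int lineIndex) {
        for (int i = 0; i < lines.size(); i++) {
            linesScanned++;
            if (i == lineIndex) return lines.get(i);
        }
        throw new IndexOutOfBoundsException("no line " + lineIndex);
    }

    // Batch load: one scan that picks up every requested line,
    // returned in ascending line order.
    List<String> loadMany(List<Integer> lineIndices) {
        TreeSet<Integer> wanted = new TreeSet<>(lineIndices);
        List<String> out = new ArrayList<>();
        if (wanted.isEmpty()) return out;
        int last = wanted.last();
        for (int i = 0; i < lines.size() && i <= last; i++) {
            linesScanned++;
            if (wanted.contains(i)) out.add(lines.get(i));
        }
        return out;
    }
}
```

For a four-line source, loading lines 1 and 3 one at a time scans 2 + 4 = 6 lines in this sketch, while the batch call scans 4.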