Package org.terrier.indexing
Class TwitterJSONCollection
- java.lang.Object
- org.terrier.indexing.TwitterJSONCollection
- All Implemented Interfaces:
- java.io.Closeable, java.lang.AutoCloseable, Collection
public class TwitterJSONCollection extends java.lang.Object implements Collection
This class represents a collection of tweets stored in JSON format. Like TRECCollection, it expects a collection specification listing all of the files to be read. Each file is assumed to be in gzip format, with one tweet per line. The Gson parser (com.google.gson) is used to read the tweet JSON, and each tweet is represented as a FlatJSONDocument.
- Since:
- 4.0
- Author:
- Richard McCreadie
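The input layout described above (gzip-compressed files with one tweet per line) can be illustrated with a minimal sketch that uses only the JDK. It is not the Terrier implementation: the class name GzipTweetLinesSketch and its helper methods are hypothetical, the sample JSON is invented for illustration, and the lines are returned as raw strings rather than parsed with Gson. Skipping blank lines is also an assumption, not documented behaviour.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of Terrier.
class GzipTweetLinesSketch {

    /** Writes each string as one line of a gzip-compressed UTF-8 file. */
    static void writeGzipLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    /** Reads a gzip-compressed file and returns one string per non-blank line. */
    static List<String> readGzipLines(Path file) throws IOException {
        List<String> tweets = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Assumption: blank lines carry no tweet and are skipped.
                if (!line.trim().isEmpty()) {
                    tweets.add(line);
                }
            }
        }
        return tweets;
    }

    /** Writes the lines to a temporary gzip file, reads them back, and cleans up. */
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("tweets", ".json.gz");
            try {
                writeGzipLines(file, lines);
                return readGzipLines(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample tweets, one JSON object per line.
        List<String> sample = Arrays.asList(
                "{\"id\":1,\"text\":\"hello\"}",
                "{\"id\":2,\"text\":\"world\"}");
        for (String tweet : roundTrip(sample)) {
            System.out.println(tweet);
        }
    }
}
```

In the real class, each line would be handed to the Gson parser instead of being kept as a string.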
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected Document | currentDocument | The current document
protected java.lang.String | currentFilename | The name of the current file
protected java.io.BufferedReader | currentTweetStream | The underlying file stream reading tweets from the current file
protected boolean | endOfCollection | Have we reached the end of the collection yet?
protected int | FileNumber | The index in the FilesToProcess of the currently processed file.
protected java.util.List<java.lang.String> | FilesToProcess | The list of files to process.
protected com.google.gson.JsonStreamParser | JSONStream | The JSON stream containing the tweets
protected static org.slf4j.Logger | logger | logger for this class
protected boolean | SkipFile | A boolean which is true when a new file is open.
-
Constructor Summary
Constructors
Constructor | Description
TwitterJSONCollection() |
TwitterJSONCollection(java.lang.String CollectionSpecFile) |
TwitterJSONCollection(java.lang.String addressCollectionFilename, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3) | additional constructors required by TRECIndexing
TwitterJSONCollection(java.util.List<java.lang.String> files, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3) |
-
Method Summary
Methods
Modifier and Type | Method | Description
void | addFileToProcess(java.lang.String JSONFile) |
void | close() |
boolean | endOfCollection() | Returns true if the end of the collection has been reached
Document | getDocument() | Get the document object representing the current document.
void | init() |
protected void | loadJSON(java.lang.String file) |
boolean | nextDocument() | Move the collection to the start of the next document.
boolean | openNextFile() | Opens the next file from the collection specification.
protected void | readCollectionSpec(java.lang.String CollectionSpecFilename) |
com.google.gson.JsonObject | readTweet() |
void | reset() | Resets the Collection iterator to the start of the collection.
-
-
-
Field Detail
-
logger
protected static final org.slf4j.Logger logger
logger for this class
-
FilesToProcess
protected java.util.List<java.lang.String> FilesToProcess
The list of files to process.
-
SkipFile
protected boolean SkipFile
A boolean which is true when a new file is open.
-
JSONStream
protected com.google.gson.JsonStreamParser JSONStream
The JSON stream containing the tweets
-
currentTweetStream
protected java.io.BufferedReader currentTweetStream
The underlying file stream reading tweets from the current file
-
currentDocument
protected Document currentDocument
The current document
-
currentFilename
protected java.lang.String currentFilename
The name of the current file
-
FileNumber
protected int FileNumber
The index in the FilesToProcess of the currently processed file.
-
endOfCollection
protected boolean endOfCollection
Have we reached the end of the collection yet?
-
-
Constructor Detail
-
TwitterJSONCollection
public TwitterJSONCollection(java.lang.String CollectionSpecFile)
-
TwitterJSONCollection
public TwitterJSONCollection()
-
TwitterJSONCollection
public TwitterJSONCollection(java.lang.String addressCollectionFilename, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
additional constructors required by TRECIndexing
-
TwitterJSONCollection
public TwitterJSONCollection(java.util.List<java.lang.String> files, java.lang.String ignored1, java.lang.String ignored2, java.lang.String ignored3)
-
-
Method Detail
-
init
public void init()
-
loadJSON
protected void loadJSON(java.lang.String file) throws java.io.IOException
- Throws:
- java.io.IOException
-
addFileToProcess
public void addFileToProcess(java.lang.String JSONFile)
-
readCollectionSpec
protected void readCollectionSpec(java.lang.String CollectionSpecFilename)
-
openNextFile
public boolean openNextFile() throws java.io.IOException
Opens the next file from the collection specification.
- Returns:
- boolean true if the file was opened successfully. If there are no more files to open, it returns false.
- Throws:
- java.io.IOException - if there is an exception while opening the collection files.
-
close
public void close() throws java.io.IOException
- Specified by:
- close in interface java.lang.AutoCloseable
- Specified by:
- close in interface java.io.Closeable
- Throws:
- java.io.IOException
-
nextDocument
public boolean nextDocument()
Description copied from interface: Collection
Move the collection to the start of the next document.
- Specified by:
- nextDocument in interface Collection
- Returns:
- boolean true if there exists another document in the collection, otherwise it returns false.
-
readTweet
public com.google.gson.JsonObject readTweet()
-
getDocument
public Document getDocument()
Description copied from interface: Collection
Get the document object representing the current document.
- Specified by:
- getDocument in interface Collection
- Returns:
- Document the current document.
-
endOfCollection
public boolean endOfCollection()
Description copied from interface: Collection
Returns true if the end of the collection has been reached
- Specified by:
- endOfCollection in interface Collection
- Returns:
- boolean true if the end of collection has been reached, otherwise it returns false.
-
reset
public void reset()
Description copied from interface: Collection
Resets the Collection iterator to the start of the collection.
- Specified by:
- reset in interface Collection
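The iteration methods documented above (nextDocument, getDocument, endOfCollection, reset) suggest a simple driving loop. The sketch below illustrates that calling pattern with a hypothetical in-memory stand-in, InMemoryTweets, which is not the Terrier class and returns plain strings instead of Document objects. It also assumes that endOfCollection() becomes true once nextDocument() finds nothing left; this page does not confirm the exact point at which the real class sets that flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the iteration protocol; not Terrier code.
class CollectionIterationSketch {

    /** In-memory stand-in exposing the same four method names. */
    static class InMemoryTweets {
        private final List<String> tweets;
        private int position = -1;          // index of the current document
        private boolean endOfCollection;

        InMemoryTweets(List<String> tweets) {
            this.tweets = tweets;
            this.endOfCollection = tweets.isEmpty();
        }

        /** Advances to the next document; returns false when none is left. */
        boolean nextDocument() {
            if (position + 1 >= tweets.size()) {
                endOfCollection = true;     // assumption: flag is set here
                return false;
            }
            position++;
            return true;
        }

        /** Returns the current document; call only after nextDocument() returned true. */
        String getDocument() {
            return tweets.get(position);
        }

        boolean endOfCollection() {
            return endOfCollection;
        }

        /** Moves the iterator back to the start of the collection. */
        void reset() {
            position = -1;
            endOfCollection = tweets.isEmpty();
        }
    }

    /** Typical driving loop: advance, then fetch the current document. */
    static List<String> drain(InMemoryTweets collection) {
        List<String> seen = new ArrayList<>();
        while (collection.nextDocument()) {
            seen.add(collection.getDocument());
        }
        return seen;
    }

    public static void main(String[] args) {
        InMemoryTweets collection = new InMemoryTweets(Arrays.asList("tweet-1", "tweet-2"));
        System.out.println(drain(collection));
        collection.reset();                 // the same documents can be read again
        System.out.println(drain(collection));
    }
}
```

With the real class, the same loop shape would apply, with getDocument() returning a Document and close() called once iteration is finished.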
-
-