Class WARC10Collection

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, Collection

    public class WARC10Collection
    extends WARC018Collection
    Parses web crawls stored in the WARC format, version 0.10. Uses the same properties as WARC018Collection.
    Author:
    Craig Macdonald
    • Constructor Detail

      • WARC10Collection

        public WARC10Collection()
      • WARC10Collection

        public WARC10Collection(java.io.InputStream input)
      • WARC10Collection

        public WARC10Collection(java.lang.String CollectionSpecFilename)
      • WARC10Collection

        public WARC10Collection(java.util.List<java.lang.String> files,
                                java.lang.String TagSet,
                                java.lang.String BlacklistSpecFilename,
                                java.lang.String ignored)
      • WARC10Collection

        public WARC10Collection(java.util.List<java.lang.String> files)
      • WARC10Collection

        public WARC10Collection(java.lang.String CollectionSpecFilename,
                                java.lang.String TagSet,
                                java.lang.String BlacklistSpecFilename,
                                java.lang.String ignored)
    • Method Detail

      • processRedirect

        protected void processRedirect(java.lang.String source,
                                       java.lang.String target)
      • nextDocument

        public boolean nextDocument()
        Move the collection to the start of the next document.
        Specified by:
        nextDocument in interface Collection
        Overrides:
        nextDocument in class WARC018Collection
        Returns:
        true if there is another document in the collection, false otherwise.
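
The nextDocument() contract described above (advance, then report whether a document is available) lends itself to a simple while loop. The following is a minimal sketch of that iteration pattern only. It does not use the real Terrier classes: DocumentSource, ListDocumentSource, and countDocuments are hypothetical stand-ins introduced for illustration, and an in-memory list takes the place of a WARC file.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a collection interface: only the two methods
// this sketch needs. This is NOT the real Terrier Collection interface.
interface DocumentSource extends Closeable {
    // Advances to the next document; true if one exists, false otherwise.
    boolean nextDocument();

    // Narrowed so that close() declares no checked exception.
    @Override
    void close();
}

// Hypothetical in-memory source used here instead of a WARC file.
class ListDocumentSource implements DocumentSource {
    private final Iterator<String> it;
    private String current;
    boolean closed = false;

    ListDocumentSource(List<String> docs) {
        this.it = docs.iterator();
    }

    @Override
    public boolean nextDocument() {
        if (!it.hasNext()) {
            current = null;
            return false;
        }
        current = it.next();
        return true;
    }

    // The document the source is currently positioned on, or null.
    String currentDocument() {
        return current;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class NextDocumentLoopSketch {
    // Walks the source to the end and closes it, counting documents seen.
    static int countDocuments(DocumentSource source) {
        int count = 0;
        try (DocumentSource s = source) {
            while (s.nextDocument()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        DocumentSource source =
            new ListDocumentSource(Arrays.asList("doc1", "doc2", "doc3"));
        System.out.println(countDocuments(source)); // prints 3
    }
}
```

Because the class implements java.io.Closeable, a try-with-resources block is a natural way to make sure the underlying input is released once nextDocument() returns false.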