My idea for this issue is that the Lexicon will have configurable key types and value types. For unigram, the key type will always be String. The value type will always be a subclass of LexiconEntry. LexiconEntry will implement two interfaces: TermStatistics (possibly extended by FieldStatistics), and BitIndexPointer, which represents the byte/bit offset and the number of pointers for that term (usually the same as Nt).
The 'name' of the structure in the index will always be 'lexicon' for unigram. However, it can be changed when the lexicon is being used for another purpose, e.g. I would suggest '2lexicon' for a 2-gram lexicon.
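To make the proposal a bit more concrete, here is a rough Java sketch of the shape described above. Only the names Lexicon, LexiconEntry, TermStatistics and BitIndexPointer and the structure names 'lexicon' and '2lexicon' come from this comment; every method name, the BasicLexiconEntry class, the in-memory HashMap and the use of a List<String> as the 2-gram key are assumptions made for illustration, not the actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Term statistics; the method names here are hypothetical.
interface TermStatistics {
    int getDocumentFrequency(); // Nt: number of documents containing the term
    int getFrequency();         // total occurrences in the collection
}

// Byte/bit offset into the postings, plus the number of pointers stored there.
interface BitIndexPointer {
    long getOffset();
    byte getOffsetBits();
    int getNumberOfEntries();
}

// The value type of every lexicon is a subclass of LexiconEntry.
abstract class LexiconEntry implements TermStatistics, BitIndexPointer {}

// Hypothetical concrete entry, for illustration only.
final class BasicLexiconEntry extends LexiconEntry {
    private final int nt;
    private final int tf;
    private final long offset;
    private final byte offsetBits;

    BasicLexiconEntry(int nt, int tf, long offset, byte offsetBits) {
        this.nt = nt;
        this.tf = tf;
        this.offset = offset;
        this.offsetBits = offsetBits;
    }

    public int getDocumentFrequency() { return nt; }
    public int getFrequency() { return tf; }
    public long getOffset() { return offset; }
    public byte getOffsetBits() { return offsetBits; }
    // Assumption: the number of pointers equals Nt, which is the usual case.
    public int getNumberOfEntries() { return nt; }
}

// Lexicon with configurable key and value types and a configurable structure name.
// A HashMap stands in for the on-disk structure in this sketch.
final class Lexicon<K, V extends LexiconEntry> {
    private final String structureName;
    private final Map<K, V> entries = new HashMap<>();

    Lexicon(String structureName) { this.structureName = structureName; }

    String getStructureName() { return structureName; }
    void put(K key, V value) { entries.put(key, value); }
    V getLexiconEntry(K key) { return entries.get(key); } // null if the key is absent
}

public class LexiconSketch {
    public static void main(String[] args) {
        // Unigram lexicon: String keys, structure name 'lexicon'.
        Lexicon<String, BasicLexiconEntry> unigram = new Lexicon<>("lexicon");
        unigram.put("terrier", new BasicLexiconEntry(3, 7, 1024L, (byte) 5));

        // 2-gram lexicon: a different key type and structure name.
        Lexicon<List<String>, BasicLexiconEntry> bigram = new Lexicon<>("2lexicon");
        bigram.put(Arrays.asList("information", "retrieval"),
                   new BasicLexiconEntry(2, 4, 2048L, (byte) 0));

        System.out.println(unigram.getStructureName() + " Nt="
                + unigram.getLexiconEntry("terrier").getDocumentFrequency());
        System.out.println(bigram.getStructureName() + " Nt="
                + bigram.getLexiconEntry(Arrays.asList("information", "retrieval"))
                        .getDocumentFrequency());
    }
}
```

The point of the generics is that code reading a lexicon only depends on LexiconEntry, so the same lookup path can serve the unigram case and any other structure registered under a different name.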
Stay tuned, more details soon.
The issue here is whether or not we want to store all or most of the information in a single file. The Lexicon contains information about how to find information about the elements of our algebra/probability space. For example, exact matching or k-grams may require a dedicated structure and a dedicated lexicon.
So, suppose that all related issues about unigram lexicons have been solved, so that we have all the desiderata: all fields, field counters, an intelligent merger, etc. In principle we then implicitly have all the information about the collection: where each token occurs, and under the scope of which label/tag. Now, we want to store more information. What is this new information?