Package org.terrier.terms
Class PorterStemmer
java.lang.Object
  org.terrier.terms.StemmerTermPipeline
    org.terrier.terms.PorterStemmer
- All Implemented Interfaces:
Stemmer, TermPipeline
- Direct Known Subclasses:
WeakPorterStemmer
public class PorterStemmer extends StemmerTermPipeline
Stemmer implementing the Porter Stemming Algorithm, by Martin Porter. The Stemmer class transforms a word into its root form. The input word can be provided one character at a time (by calling add()), or all at once by calling one of the stem(...) methods.
- Since:
- 3.0
-
-
Constructor Summary
Constructor: Description
PorterStemmer(): constructor
PorterStemmer(TermPipeline next): Constructs an instance of PorterStemmer.
-
Method Summary
Modifier and Type Method: Description
void add(char ch): Add a character to the word being stemmed.
void add(char[] w, int wLen): Adds wLen characters to the word being stemmed contained in a portion of a char[] array.
protected boolean cons(int _i)
protected boolean cvc(int _i)
protected boolean doublec(int _j)
protected boolean ends(java.lang.String s)
char[] getResultBuffer(): Returns a reference to a character buffer containing the results of the stemming process.
int getResultLength(): Returns the length of the word resulting from the stemming process.
protected int m()
static void main(java.lang.String[] args): Test program for demonstrating the Stemmer.
protected void r(java.lang.String s)
protected void setto(java.lang.String s)
void stem(): Stem the word placed into the Stemmer buffer through calls to add().
java.lang.String stem(java.lang.String s): Returns the stem of a given term.
protected void step1()
protected void step2()
protected void step3()
protected void step4()
protected void step5()
protected void step6()
java.lang.String toString(): After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer and getResultLength (which is generally more efficient).
protected boolean vowelinstem()
-
Methods inherited from class org.terrier.terms.StemmerTermPipeline
processTerm, reset
-
Field Detail
-
b
protected char[] b
-
i
protected int i
-
i_end
protected int i_end
-
j
protected int j
-
k
protected int k
-
INC
protected static final int INC
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
PorterStemmer
public PorterStemmer()
No-argument constructor.
-
PorterStemmer
public PorterStemmer(TermPipeline next)
Constructs an instance of PorterStemmer.
- Parameters:
next - the next TermPipeline
-
-
Method Detail
-
add
public void add(char ch)
Add a character to the word being stemmed. When you have finished adding characters, call stem() to stem the word.
-
add
public void add(char[] w, int wLen)
Adds wLen characters, taken from a portion of a char[] array, to the word being stemmed. This is like repeated calls of add(char ch), but faster.
-
toString
public java.lang.String toString()
After a word has been stemmed, it can be retrieved with toString(), or a reference to the internal buffer can be obtained with getResultBuffer() and getResultLength(), which is generally more efficient.
- Overrides:
toString in class java.lang.Object
-
getResultLength
public int getResultLength()
Returns the length of the word resulting from the stemming process.
-
getResultBuffer
public char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
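The following is a minimal sketch of why getResultBuffer() has to be paired with getResultLength(). WordBuffer is a hypothetical stand-in written for this illustration, not Terrier's code, and the value 50 for INC is an assumption (the actual constant value is not listed on this page). The point it shows is that a buffer grown in fixed increments is usually longer than the word it holds.

```java
import java.util.Arrays;

// Hypothetical stand-in for the stemmer's internal buffer; not Terrier's actual code.
class WordBuffer {
    // Growth increment. The real value of PorterStemmer.INC is not given here; 50 is assumed.
    private static final int INC = 50;
    private char[] b = new char[INC]; // backing buffer, may be longer than the word
    private int i = 0;                // number of valid characters in b

    // Append one character, growing the buffer by INC when it is full.
    void add(char ch) {
        if (i == b.length) {
            b = Arrays.copyOf(b, i + INC);
        }
        b[i++] = ch;
    }

    // Append wLen characters from the start of w with a single array copy.
    void add(char[] w, int wLen) {
        if (i + wLen > b.length) {
            b = Arrays.copyOf(b, i + wLen + INC);
        }
        System.arraycopy(w, 0, b, i, wLen);
        i += wLen;
    }

    char[] getResultBuffer() { return b; }
    int getResultLength() { return i; }

    public static void main(String[] args) {
        WordBuffer buf = new WordBuffer();
        buf.add("run".toCharArray(), 3);
        buf.add('s');
        // Only the first getResultLength() characters are meaningful.
        String word = new String(buf.getResultBuffer(), 0, buf.getResultLength());
        System.out.println(word + " (buffer capacity " + buf.getResultBuffer().length + ")");
    }
}
```

Reading the buffer directly avoids allocating a String per word, which is presumably what the "more efficient" remark under toString() refers to.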
-
cons
protected final boolean cons(int _i)
-
m
protected final int m()
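cons and m carry no description on this page. In Porter's published algorithm, a word is viewed as [C](VC)^m[V], where C and V are runs of consonants and vowels, and m is the "measure" that many suffix rules test. The sketch below follows that definition and assumes that cons(int) and m() in this class play the same roles. PorterMeasure and its String-taking methods are hypothetical names for illustration; the class itself works on its internal buffer b rather than on a String argument.

```java
// Hypothetical helper illustrating Porter's consonant test and measure.
class PorterMeasure {
    // True if w[i] is a consonant. 'y' counts as a consonant at the start of
    // the word or after a vowel, and as a vowel after a consonant.
    static boolean cons(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !cons(w, i - 1);
            default:
                return true;
        }
    }

    // Counts vowel-to-consonant transitions, i.e. m in [C](VC)^m[V].
    static int m(String w) {
        int count = 0;
        boolean prevVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean isCons = cons(w, i);
            if (isCons && prevVowel) {
                count++;
            }
            prevVowel = !isCons;
        }
        return count;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tree", "trouble", "troubles"}) {
            System.out.println(w + " -> m=" + m(w));
        }
    }
}
```

With this definition, "tree" and "by" have measure 0, "trouble" and "ivy" have measure 1, and "troubles" and "private" have measure 2, matching the examples in Porter's description of the algorithm.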
-
vowelinstem
protected final boolean vowelinstem()
-
doublec
protected final boolean doublec(int _j)
-
cvc
protected final boolean cvc(int _i)
-
ends
protected final boolean ends(java.lang.String s)
-
setto
protected final void setto(java.lang.String s)
-
r
protected final void r(java.lang.String s)
-
step1
protected final void step1()
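The step methods are undocumented here. As an illustration of what one rule group looks like, the sketch below implements the plural-removal rules from step 1a of Porter's published algorithm (SSES to SS, IES to I, SS unchanged, S removed). That step1() in this class covers these rules is an assumption, and PluralRule and stripPlural are hypothetical names. The class itself edits its char buffer in place instead of building new strings.

```java
// Hypothetical illustration of Porter's step 1a (plural removal).
class PluralRule {
    // Applies the first matching rule; the longest suffixes are tried first.
    static String stripPlural(String w) {
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // caresses -> caress
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ponies -> poni
        }
        if (w.endsWith("ss")) {
            return w;                              // caress -> caress
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // cats -> cat
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + stripPlural(w));
        }
    }
}
```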
-
step2
protected final void step2()
-
step3
protected final void step3()
-
step4
protected final void step4()
-
step5
protected final void step5()
-
step6
protected final void step6()
-
stem
public void stem()
Stem the word placed into the Stemmer buffer through calls to add(). This method returns nothing; retrieve the result with getResultLength()/getResultBuffer() or toString().
-
main
public static void main(java.lang.String[] args)
Test program for demonstrating the Stemmer. It reads text from a list of files, stems each word, and writes the result to standard output. Note that the word to be stemmed is expected to be in lower case: forcing lower case must be done outside the Stemmer class.
Usage: Stemmer file-name file-name ...
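Since lower-casing is the caller's job, a caller might normalise tokens before handing them to the stemmer, roughly as sketched below. LowercaseBeforeStemming and stemAll are hypothetical names, the tokenisation on non-letters is an assumption, and the stemmer is passed in as a function so the sketch stays self-contained. With Terrier on the classpath, one would presumably pass something like s -> porter.stem(s) for a PorterStemmer instance named porter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper: lower-cases tokens before they reach a stemmer.
class LowercaseBeforeStemming {
    // Splits text on runs of non-letters, lower-cases each token,
    // then applies the supplied stemmer function to it.
    static List<String> stemAll(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if text starts with a separator
            }
            out.add(stemmer.apply(token.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Identity function used as a placeholder for a real stemmer.
        System.out.println(stemAll("Running DOGS, cats!", s -> s));
    }
}
```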
-
stem
public java.lang.String stem(java.lang.String s)
Returns the stem of a given term.
- Parameters:
s - String, the term to be stemmed.
- Returns:
String, the stem of the given term.
-
-