Package org.apache.lucene.analysis.en
Class AbstractWordsFileFilterFactory
java.lang.Object
  org.apache.lucene.analysis.AbstractAnalysisFactory
    org.apache.lucene.analysis.TokenFilterFactory
      org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory
All Implemented Interfaces: ResourceLoaderAware
Direct Known Subclasses: CommonGramsFilterFactory, KeepWordFilterFactory, StopFilterFactory
public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Abstract parent class for analysis factories that accept a stopwords file as input.
Concrete implementations can leverage the following input attributes. All attributes are optional:
- ignoreCase defaults to false.
- words should be the name of a stopwords file to parse. If not specified, the factory uses the value provided by the createDefaultWords() implementation in the concrete subclass.
- format defines how the words file will be parsed, and defaults to wordset. If words is not specified, then format must not be specified.
The valid values for the format option are:
- wordset - This is the default format, which supports one word per line (including any intra-word whitespace) and allows whole line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
- snowball - This format allows for multiple words specified on each line, and trailing comments may be specified using the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
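To make the difference between the two format values concrete, here is a minimal sketch that parses lines the way each format is described above. It uses plain Java collections rather than Lucene's WordlistLoader, so the class and method names (WordFileFormatSketch, parseWordset, parseSnowball) are hypothetical, and edge-case behavior in Lucene itself may differ.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: models the two documented file formats without using Lucene.
final class WordFileFormatSketch {

  // wordset: one entry per line, "#" starts a whole-line comment, blank lines are skipped.
  static Set<String> parseWordset(List<String> lines) {
    Set<String> words = new LinkedHashSet<>();
    for (String line : lines) {
      if (line.startsWith("#")) {
        continue; // whole-line comment
      }
      String entry = line.trim();
      if (entry.isEmpty()) {
        continue; // blank line
      }
      words.add(entry); // intra-word whitespace is kept, e.g. "new york"
    }
    return words;
  }

  // snowball: "|" starts a trailing comment; the remaining text is split on whitespace.
  static Set<String> parseSnowball(List<String> lines) {
    Set<String> words = new LinkedHashSet<>();
    for (String line : lines) {
      int bar = line.indexOf('|');
      String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
      if (content.isEmpty()) {
        continue; // blank line or comment-only line
      }
      for (String word : content.split("\\s+")) {
        words.add(word);
      }
    }
    return words;
  }
}
```

Under this sketch, the line "new york" yields one entry in wordset format but two entries ("new" and "york") in snowball format, which is the practical reason to pick one format over the other.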
Field Summary
Fields
Modifier and Type           Field             Description
private java.lang.String    format
static java.lang.String     FORMAT_SNOWBALL
static java.lang.String     FORMAT_WORDSET
private boolean             ignoreCase
private java.lang.String    wordFiles
private CharArraySet        words
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Modifier     Constructor                                                                              Description
protected    AbstractWordsFileFilterFactory()                                                         Default ctor for compatibility with SPI
public       AbstractWordsFileFilterFactory(java.util.Map<java.lang.String,java.lang.String> args)    Initialize this factory via a set of key-value pairs.
-
Method Summary
Modifier and Type                  Method                           Description
protected abstract CharArraySet    createDefaultWords()             Default word set implementation.
java.lang.String                   getFormat()
java.lang.String                   getWordFiles()
CharArraySet                       getWords()
void                               inform(ResourceLoader loader)    Initialize the set of stopwords provided via ResourceLoader, or using defaults.
boolean                            isIgnoreCase()
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, create, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
-
FORMAT_WORDSET
public static final java.lang.String FORMAT_WORDSET
See Also: Constant Field Values
-
FORMAT_SNOWBALL
public static final java.lang.String FORMAT_SNOWBALL
See Also: Constant Field Values
-
words
private CharArraySet words
-
wordFiles
private final java.lang.String wordFiles
-
format
private final java.lang.String format
-
ignoreCase
private final boolean ignoreCase
Constructor Detail
-
AbstractWordsFileFilterFactory
protected AbstractWordsFileFilterFactory()
Default ctor for compatibility with SPI
-
AbstractWordsFileFilterFactory
public AbstractWordsFileFilterFactory(java.util.Map<java.lang.String,java.lang.String> args)
Initialize this factory via a set of key-value pairs.
Method Detail
-
inform
public void inform(ResourceLoader loader) throws java.io.IOException
Initialize the set of stopwords provided via ResourceLoader, or using defaults.
Specified by: inform in interface ResourceLoaderAware
Throws: java.io.IOException
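The class description states three rules that determine where the word set comes from: a words file is parsed according to format, format falls back to wordset, and format is rejected when no words file is given. The sketch below models only that decision, based on those documented rules. It is not the Lucene source; WordsResolutionSketch, Source, and resolve are hypothetical names, and matching format case-insensitively is an assumption.

```java
// Illustrative only: models the documented attribute rules, not Lucene's actual code.
final class WordsResolutionSketch {

  // Where the word set would come from for a given pair of attributes.
  enum Source { DEFAULT_WORDS, WORDSET_FILE, SNOWBALL_FILE }

  // words and format are the raw attribute values; null means "not specified".
  static Source resolve(String words, String format) {
    if (words == null) {
      if (format != null) {
        // Documented rule: format must not be specified without words.
        throw new IllegalArgumentException(
            "'format' cannot be specified without 'words': " + format);
      }
      // Documented rule: fall back to the subclass's createDefaultWords().
      return Source.DEFAULT_WORDS;
    }
    // Documented rule: format defaults to wordset.
    String effective = (format == null) ? "wordset" : format;
    // Assumption: format names are compared case-insensitively.
    if ("wordset".equalsIgnoreCase(effective)) {
      return Source.WORDSET_FILE;
    }
    if ("snowball".equalsIgnoreCase(effective)) {
      return Source.SNOWBALL_FILE;
    }
    throw new IllegalArgumentException("Unknown 'format': " + format);
  }
}
```

For example, resolve("stopwords.txt", null) selects the wordset path, while resolve(null, "snowball") is rejected because a format was given without a words file.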
-
createDefaultWords
protected abstract CharArraySet createDefaultWords()
Default word set implementation.
-
getWords
public CharArraySet getWords()
-
getWordFiles
public java.lang.String getWordFiles()
-
getFormat
public java.lang.String getFormat()
-
isIgnoreCase
public boolean isIgnoreCase()