Package org.apache.lucene.analysis.ko
Class KoreanTokenizerFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenizerFactory
      - org.apache.lucene.analysis.ko.KoreanTokenizerFactory

All Implemented Interfaces:
ResourceLoaderAware
public class KoreanTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for KoreanTokenizer.

    <fieldType name="text_ko" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KoreanTokenizerFactory"
                   decompoundMode="discard"
                   userDictionary="user.txt"
                   userDictionaryEncoding="UTF-8"
                   outputUnknownUnigrams="false"
                   discardPunctuation="true" />
      </analyzer>
    </fieldType>

Supports the following attributes:
- userDictionary: path to the user dictionary.
- userDictionaryEncoding: character encoding of the user dictionary.
- decompoundMode: decompound mode; one of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
- outputUnknownUnigrams: if true, outputs unigrams for unknown words.
- discardPunctuation: if true, punctuation tokens are dropped from the output.
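The attribute semantics above can be illustrated with a small, stdlib-only Java sketch. DecompoundModeSketch and AttributeParsingSketch are hypothetical names, not Lucene API. The sketch assumes only what the list states: decompoundMode takes one of three values (matched case-insensitively here) and falls back to 'discard'. Rejecting flag values other than true/false is a choice made in this sketch; the real factory may parse booleans more leniently.

```java
import java.util.Locale;

// Hypothetical mirror of the constants in Lucene's KoreanTokenizer.DecompoundMode.
enum DecompoundModeSketch { NONE, DISCARD, MIXED }

// Stdlib-only sketch of attribute parsing; not the factory's actual code.
final class AttributeParsingSketch {
  private AttributeParsingSketch() {}

  // decompoundMode: matched case-insensitively; a missing value means "discard".
  // Enum.valueOf throws IllegalArgumentException for anything but none/discard/mixed.
  static DecompoundModeSketch parseMode(String value) {
    String name = (value == null) ? "discard" : value;
    return DecompoundModeSketch.valueOf(name.toUpperCase(Locale.ROOT));
  }

  // Boolean attributes such as outputUnknownUnigrams or discardPunctuation.
  // This sketch accepts only "true"/"false" (any case); the caller supplies the default.
  static boolean parseFlag(String value, boolean defaultValue) {
    if (value == null) {
      return defaultValue;
    }
    if (value.equalsIgnoreCase("true")) {
      return true;
    }
    if (value.equalsIgnoreCase("false")) {
      return false;
    }
    throw new IllegalArgumentException("Expected true or false, got: " + value);
  }
}
```

With this sketch, parseMode("mixed") yields MIXED, while a misspelled value such as "mix" is rejected instead of silently falling back to the default.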
Since:
7.4.0
Field Summary
Fields (Modifier and Type, Field, Description):
- private static java.lang.String DECOMPOUND_MODE
- private static java.lang.String DISCARD_PUNCTUATION
- private boolean discardPunctuation
- private KoreanTokenizer.DecompoundMode mode
- static java.lang.String NAME: SPI name
- private static java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
- private boolean outputUnknownUnigrams
- private static java.lang.String USER_DICT_ENCODING
- private static java.lang.String USER_DICT_PATH
- private UserDictionary userDictionary
- private java.lang.String userDictionaryEncoding
- private java.lang.String userDictionaryPath
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors (Constructor, Description):
- KoreanTokenizerFactory(): Default ctor for compatibility with SPI.
- KoreanTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new KoreanTokenizerFactory.
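Lucene analysis factories that take a Map of arguments conventionally consume each recognized key and reject whatever is left, so a misspelled attribute fails fast instead of being ignored. The following is a hypothetical sketch of that convention (ArgsConsumptionSketch is not a Lucene class, and the real constructor also parses the values). It works on a copy of the map and assumes "luceneMatchVersion" is the parameter name behind the inherited LUCENE_MATCH_VERSION_PARAM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "consume known keys, reject leftovers" convention.
final class ArgsConsumptionSketch {
  private ArgsConsumptionSketch() {}

  // Attribute names documented for this factory, plus "luceneMatchVersion"
  // (assumed to be the value of the inherited LUCENE_MATCH_VERSION_PARAM).
  static final Set<String> KNOWN = Set.of(
      "userDictionary", "userDictionaryEncoding", "decompoundMode",
      "outputUnknownUnigrams", "discardPunctuation", "luceneMatchVersion");

  // Returns the recognized entries; throws if any unrecognized key remains.
  static Map<String, String> consume(Map<String, String> args) {
    Map<String, String> remaining = new HashMap<>(args); // leave the caller's map untouched
    Map<String, String> consumed = new HashMap<>();
    for (String key : KNOWN) {
      String value = remaining.remove(key);
      if (value != null) {
        consumed.put(key, value);
      }
    }
    if (!remaining.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + remaining);
    }
    return consumed;
  }
}
```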
Method Summary
Methods (Modifier and Type, Method, Description):
- KoreanTokenizer create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory.
- void inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc.).
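Because the factory implements ResourceLoaderAware, setup happens in phases: the constructor records attribute values, inform loads files such as the user dictionary, and create builds a tokenizer from that state. The sketch below models this ordering with hypothetical types (ResourceLoaderSketch, LifecycleFactorySketch) and returns a string instead of a KoreanTokenizer. Throwing when create precedes inform is a guard added for illustration, not documented behavior of the real factory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Lucene's ResourceLoader, reduced to the one method used here.
interface ResourceLoaderSketch {
  InputStream openResource(String name) throws IOException;
}

// Sketch of the construct -> inform -> create ordering; not Lucene's implementation.
final class LifecycleFactorySketch {
  private final String userDictionaryPath; // null when no user dictionary is configured
  private List<String> userDictionary;     // filled in by inform(); stays null without a path
  private boolean informed;

  LifecycleFactorySketch(String userDictionaryPath) {
    this.userDictionaryPath = userDictionaryPath;
  }

  // Load external resources once, before any tokenizer is created.
  void inform(ResourceLoaderSketch loader) throws IOException {
    if (userDictionaryPath != null) {
      try (InputStream in = loader.openResource(userDictionaryPath)) {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        userDictionary = text.lines().collect(Collectors.toList());
      }
    }
    informed = true;
  }

  // Build a per-stream object from the loaded state. A real factory would
  // return a KoreanTokenizer; this sketch returns a description string instead.
  String create() {
    if (!informed) {
      throw new IllegalStateException("inform() must be called before create()");
    }
    int entries = (userDictionary == null) ? 0 : userDictionary.size();
    return "tokenizer with " + entries + " user dictionary entries";
  }
}
```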
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
NAME
public static final java.lang.String NAME
SPI name.
See Also: Constant Field Values
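The SPI name lets configuration refer to this factory by a short name rather than a class name; in Lucene code, the inherited forName method accepts such a name together with an argument map. As a stdlib-only illustration, the hypothetical registry below maps names to constructors. It is not Lucene's SPI loader, and lower-casing names on lookup is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical name -> constructor registry illustrating what an SPI name is used for.
// Not Lucene's SPI loader; lower-casing names on lookup is this sketch's choice.
final class FactoryRegistrySketch {
  private final Map<String, Function<Map<String, String>, Object>> constructors = new HashMap<>();

  void register(String name, Function<Map<String, String>, Object> constructor) {
    constructors.put(name.toLowerCase(Locale.ROOT), constructor);
  }

  // Looks up a factory by short name and passes the configuration arguments to it.
  Object forName(String name, Map<String, String> args) {
    Function<Map<String, String>, Object> constructor =
        constructors.get(name.toLowerCase(Locale.ROOT));
    if (constructor == null) {
      throw new IllegalArgumentException("No factory registered under '" + name + "'");
    }
    return constructor.apply(args);
  }
}
```

Usage might look like registry.register("korean", ...) followed by registry.forName("korean", args), where "korean" is an assumed value for NAME; check the Constant Field Values page for the actual string.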

USER_DICT_PATH
private static final java.lang.String USER_DICT_PATH
See Also: Constant Field Values

USER_DICT_ENCODING
private static final java.lang.String USER_DICT_ENCODING
See Also: Constant Field Values

DECOMPOUND_MODE
private static final java.lang.String DECOMPOUND_MODE
See Also: Constant Field Values

OUTPUT_UNKNOWN_UNIGRAMS
private static final java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
See Also: Constant Field Values

DISCARD_PUNCTUATION
private static final java.lang.String DISCARD_PUNCTUATION
See Also: Constant Field Values

userDictionaryPath
private final java.lang.String userDictionaryPath

userDictionaryEncoding
private final java.lang.String userDictionaryEncoding

userDictionary
private UserDictionary userDictionary

mode
private final KoreanTokenizer.DecompoundMode mode

outputUnknownUnigrams
private final boolean outputUnknownUnigrams

discardPunctuation
private final boolean discardPunctuation

Method Detail

inform
public void inform(ResourceLoader loader) throws java.io.IOException
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc.).
Specified by: inform in interface ResourceLoaderAware
Throws: java.io.IOException
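inform reads the user dictionary named by userDictionary through the ResourceLoader and decodes it with userDictionaryEncoding. The sketch below shows one way to decode strictly, so that a wrong encoding surfaces as an IOException rather than as garbled dictionary entries. UserDictionaryReaderSketch is hypothetical; the UTF-8 fallback and skipping '#' comment lines and blank lines are assumptions, not behavior documented on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class UserDictionaryReaderSketch {
  private UserDictionaryReaderSketch() {}

  // Reads dictionary lines, failing on malformed or unmappable bytes instead of
  // silently substituting U+FFFD. encoding == null falls back to UTF-8 (assumed default).
  static List<String> readLines(InputStream in, String encoding) throws IOException {
    Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
    CharsetDecoder decoder = charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Assumption: '#' starts a comment line, and blank lines carry no entry.
        String trimmed = line.trim();
        if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
          lines.add(trimmed);
        }
      }
    }
    return lines;
  }
}
```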

create
public KoreanTokenizer create(AttributeFactory factory)
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory.
Specified by: create in class TokenizerFactory