java.lang.Object
  org.apache.lucene.analysis.morph.Viterbi<Token,Viterbi.Position>
    org.apache.lucene.analysis.ko.Viterbi
Viterbi subclass for Korean morphological analysis.
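The core idea behind a Viterbi tokenizer is a minimum-cost path search over a lattice of dictionary candidates. Each candidate word carries a word cost, and each adjacent pair of words adds a connection cost that depends on their part-of-speech ids. The constructor's TokenInfoFST and ConnectionCosts parameters suggest that this class works from those two ingredients. The sketch below is a simplified, self-contained illustration under those assumptions and is not Lucene's implementation. LatticeViterbi, Edge and bestPath are hypothetical names, and posId 0 is assumed to stand for both the beginning-of-sentence and end-of-sentence states.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Hypothetical lattice search sketch; not the Lucene API. */
final class LatticeViterbi {

  /** Candidate word covering text[start, end) with a word cost and a POS id (0 = BOS/EOS). */
  record Edge(int start, int end, String surface, int wordCost, int posId) {}

  /** Returns the surface forms on the minimum-cost path through a lattice over n chars. */
  static List<String> bestPath(int n, List<Edge> edges, int[][] connCost) {
    List<Edge> sorted = new ArrayList<>(edges);
    // Predecessors of an edge end where it starts, so they start earlier: sort by start.
    sorted.sort(Comparator.comparingInt(Edge::start));
    Map<Edge, Long> cost = new HashMap<>(); // best total cost of a path ending with this edge
    Map<Edge, Edge> back = new HashMap<>(); // back-pointer to the best predecessor
    Edge bos = new Edge(0, 0, "<BOS>", 0, 0);
    cost.put(bos, 0L);

    for (Edge e : sorted) {
      long bestCost = Long.MAX_VALUE;
      Edge bestPrev = null;
      for (Map.Entry<Edge, Long> p : cost.entrySet()) {
        if (p.getKey().end() != e.start()) continue; // only adjacent predecessors
        long c = p.getValue() + connCost[p.getKey().posId()][e.posId()] + e.wordCost();
        if (c < bestCost) {
          bestCost = c;
          bestPrev = p.getKey();
        }
      }
      if (bestPrev != null) { // unreachable edges are skipped
        cost.put(e, bestCost);
        back.put(e, bestPrev);
      }
    }

    // Close the path with a transition into the end-of-sentence state (posId 0).
    Edge bestLast = null;
    long bestTotal = Long.MAX_VALUE;
    for (Map.Entry<Edge, Long> p : cost.entrySet()) {
      Edge e = p.getKey();
      if (e == bos || e.end() != n) continue;
      long c = p.getValue() + connCost[e.posId()][0];
      if (c < bestTotal) {
        bestTotal = c;
        bestLast = e;
      }
    }

    // Follow back-pointers from the last word, prepending to restore text order.
    LinkedList<String> path = new LinkedList<>();
    for (Edge e = bestLast; e != null && e != bos; e = back.get(e)) {
      path.addFirst(e.surface());
    }
    return path;
  }
}
```

With candidates "ab" (cost 10) and "a" + "b" (cost 3 each), the split wins when the connection cost between the two POS ids is 1 and loses when it is 5. This shows how connection costs, not just word costs, steer segmentation.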
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.morph.Viterbi
Viterbi.Position, Viterbi.WrappedPositionArray<U extends Viterbi.Position>
Field Summary
Fields (modifier and type, then field)
private final CharacterDefinition characterDefinition
private final EnumMap<TokenType, Dictionary<? extends KoMorphData>> dictionaryMap
private final boolean discardPunctuation
private GraphvizFormatter<KoMorphData> dotOut
private final KoreanTokenizer.DecompoundMode mode
private final boolean outputUnknownUnigrams
private final UnknownDictionary unkDictionary
Fields inherited from class org.apache.lucene.analysis.morph.Viterbi
buffer, costs, enableSpacePenaltyFactor, end, lastBackTracePos, MAX_UNKNOWN_WORD_LENGTH, outputLongestUserEntryOnly, outputNBest, pending, pos, positions, VERBOSE, wordIdRef
Constructor Summary
Constructors
Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, KoreanTokenizer.DecompoundMode mode, boolean outputUnknownUnigrams)
Method Summary
Methods (modifier and type, method, description)
protected void backtrace(Viterbi.Position endPosData, int fromIDX)
    Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list.
protected int computeSpacePenalty(MorphData morphData, int wordID, int numSpaces)
    Returns the space penalty associated with the provided POS.Tag.
(package private) Dictionary<? extends KoMorphData> getDict
private static boolean isCommonOrInherited
private static boolean isPunctuation(char ch)
private static boolean isPunctuation(char ch, int cid)
private static boolean isSameScript(Character.UnicodeScript scriptOne, Character.UnicodeScript scriptTwo)
    Determine if two scripts are compatible.
protected int processUnknownWord(boolean anyMatches, Viterbi.Position posData)
    Add unknown words to the position graph.
(package private) void setGraphvizFormatter
private boolean shouldFilterToken(Token token)
Methods inherited from class org.apache.lucene.analysis.morph.Viterbi
add, backtraceNBest, computePenalty, fixupPendingList, forward, getPending, getPos, isEnd, isOutputNBest, resetBuffer, resetState, shouldSkipProcessUnknownWord
Field Details
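dictionaryMap
private final EnumMap<TokenType, Dictionary<? extends KoMorphData>> dictionaryMap
unkDictionary
private final UnknownDictionary unkDictionary
characterDefinition
private final CharacterDefinition characterDefinition
discardPunctuation
private final boolean discardPunctuation
mode
private final KoreanTokenizer.DecompoundMode mode
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
dotOut
private GraphvizFormatter<KoMorphData> dotOut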
Constructor Details
Viterbi
Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, TokenInfoDictionary dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, UserDictionary userDictionary, ConnectionCosts costs, UnknownDictionary unkDictionary, CharacterDefinition characterDefinition, boolean discardPunctuation, KoreanTokenizer.DecompoundMode mode, boolean outputUnknownUnigrams)
Method Details
processUnknownWord
protected int processUnknownWord(boolean anyMatches, Viterbi.Position posData) throws IOException
Description copied from class: Viterbi
Add unknown words to the position graph.
Specified by:
processUnknownWord in class Viterbi<Token,Viterbi.Position>
Returns:
word length
Throws:
IOException
setGraphvizFormatter
backtrace
protected void backtrace(Viterbi.Position endPosData, int fromIDX)
Description copied from class: Viterbi
Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list. The pending list is then in reverse order (the last token should be returned first).
Specified by:
backtrace in class Viterbi<Token,Viterbi.Position>
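The description says the pending list ends up reversed. This happens because the walk follows back-pointers from right to left. The sketch below illustrates that under simplifying assumptions: one best token per end position, and tokens represented as plain strings. BacktraceSketch, Node, backtrace and nextToken are hypothetical names, not Lucene's types.

```java
import java.util.List;

/** Hypothetical sketch of back-pointer backtracing into a reversed pending list. */
final class BacktraceSketch {

  /** Best token ending at some position; backPos is where its predecessor token ends. */
  record Node(String surface, int backPos) {}

  /**
   * Walks back from endPos to lastBackTracePos through bestNodeAt, appending each token
   * to pending. Because the walk goes right to left, pending is in reverse text order.
   */
  static void backtrace(Node[] bestNodeAt, int endPos, int lastBackTracePos, List<String> pending) {
    int pos = endPos;
    while (pos > lastBackTracePos) {
      Node node = bestNodeAt[pos];
      pending.add(node.surface());
      pos = node.backPos();
    }
  }

  /** Consumers take from the end of the reversed list, which yields tokens in text order. */
  static String nextToken(List<String> pending) {
    return pending.remove(pending.size() - 1);
  }
}
```

Popping from the tail keeps each removal O(1) on an ArrayList. Storing the list reversed avoids shifting elements on every token.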
computeSpacePenalty
protected int computeSpacePenalty(MorphData morphData, int wordID, int numSpaces)
Returns the space penalty associated with the provided POS.Tag.
Overrides:
computeSpacePenalty in class Viterbi<Token,Viterbi.Position>
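A space penalty makes the search reluctant to start a word with a tag that normally attaches to the preceding word (for example a particle or an ending) when whitespace separates them. The sketch below shows that shape under stated assumptions. The Tag enum is a small hypothetical subset, the set of attaching tags is illustrative, and the penalty value 3000 is a hypothetical constant. None of these is taken from Lucene's dictionary or code.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a POS-dependent space penalty; not Lucene's computeSpacePenalty. */
final class SpacePenaltySketch {

  /** A small, hypothetical subset of part-of-speech tags. */
  enum Tag {
    NNG, // common noun
    VV,  // verb
    E,   // verbal ending
    J,   // particle
    XSN  // noun-derivational suffix
  }

  /** Tags that normally attach to the previous word, so a preceding space is suspicious. */
  private static final Set<Tag> ATTACHING = EnumSet.of(Tag.E, Tag.J, Tag.XSN);

  /** Hypothetical penalty value; a real analyzer would tune or configure it. */
  static final int PENALTY = 3000;

  /** Extra cost for a word whose leftmost tag is leftTag when it follows numSpaces spaces. */
  static int computeSpacePenalty(Tag leftTag, int numSpaces) {
    return numSpaces > 0 && ATTACHING.contains(leftTag) ? PENALTY : 0;
  }
}
```

The penalty is added on top of the word and connection costs, so it biases rather than forbids a split.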
getDict
shouldFilterToken
private boolean shouldFilterToken(Token token)
isPunctuation
private static boolean isPunctuation(char ch)
isPunctuation
private static boolean isPunctuation(char ch, int cid)
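Together, the discardPunctuation field and the private isPunctuation and shouldFilterToken methods suggest that punctuation tokens can be dropped from the output. The following is a minimal sketch of that idea. It assumes a character counts as punctuation when its Unicode general category (from Character.getType) is a separator, control, format, punctuation or symbol category. It also assumes a token is filtered when discarding is enabled and its first character is punctuation. PunctuationSketch is a hypothetical name, the category list is an assumption rather than Lucene's exact rule, and the character-class id taken by the two-argument overload is not modeled.

```java
/** Hypothetical sketch of punctuation detection and filtering; not Lucene's exact rules. */
final class PunctuationSketch {

  /** Treats separator, control, format, punctuation and symbol categories as punctuation. */
  static boolean isPunctuation(char ch) {
    switch (Character.getType(ch)) {
      case Character.SPACE_SEPARATOR:
      case Character.LINE_SEPARATOR:
      case Character.PARAGRAPH_SEPARATOR:
      case Character.CONTROL:
      case Character.FORMAT:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.CONNECTOR_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
      case Character.INITIAL_QUOTE_PUNCTUATION:
      case Character.FINAL_QUOTE_PUNCTUATION:
      case Character.MATH_SYMBOL:
      case Character.CURRENCY_SYMBOL:
      case Character.MODIFIER_SYMBOL:
      case Character.OTHER_SYMBOL:
        return true;
      default:
        return false;
    }
  }

  /** A token is dropped when discarding is on and its first char is punctuation. */
  static boolean shouldFilterToken(String surface, boolean discardPunctuation) {
    return discardPunctuation && !surface.isEmpty() && isPunctuation(surface.charAt(0));
  }
}
```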
isCommonOrInherited
isSameScript
private static boolean isSameScript(Character.UnicodeScript scriptOne, Character.UnicodeScript scriptTwo) Determine if two scripts are compatible.
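The private helper isCommonOrInherited listed above hints at how script compatibility could be decided. The sketch below assumes that scripts are compatible when they are equal, or when either one is Character.UnicodeScript.COMMON (digits, most punctuation) or INHERITED (combining marks), since those values carry no script of their own. ScriptSketch is a hypothetical class name, and the exact rule in Lucene may differ.

```java
/** Hypothetical sketch of script compatibility; assumes COMMON/INHERITED match any script. */
final class ScriptSketch {

  /** COMMON and INHERITED characters take on the script of their neighbors. */
  static boolean isCommonOrInherited(Character.UnicodeScript script) {
    return script == Character.UnicodeScript.INHERITED
        || script == Character.UnicodeScript.COMMON;
  }

  /** Two scripts are compatible if they are equal or either one is script-neutral. */
  static boolean isSameScript(Character.UnicodeScript one, Character.UnicodeScript two) {
    return one == two || isCommonOrInherited(one) || isCommonOrInherited(two);
  }
}
```

Under this rule a digit such as '1' (script COMMON) is compatible with both Hangul and Latin, while Hangul and Latin are incompatible with each other.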