Class Dictionary
- java.lang.Object
-
- org.apache.lucene.analysis.hunspell.Dictionary
-
public class Dictionary extends java.lang.Object
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class - Description
(package private) static class Dictionary.Breaks - Possible word breaks according to BREAK directives
private static class Dictionary.DefaultAsUtf8FlagParsingStrategy - Used to read flags as UTF-8 even if the rest of the file is in the default (8-bit) encoding
private static class Dictionary.DoubleASCIIFlagParsingStrategy - Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded as two ASCII characters whose codes must be combined into a single character.
(package private) static class Dictionary.FlagParsingStrategy - Abstraction of the process of parsing flags taken from the affix and dic files
private static class Dictionary.NumFlagParsingStrategy - Implementation of Dictionary.FlagParsingStrategy that assumes each flag is encoded in its numerical form.
private static class Dictionary.SimpleFlagParsingStrategy - Simple implementation of Dictionary.FlagParsingStrategy that treats the chars in each String as individual flags.
-
Field Summary
Fields
Modifier and Type Field - Description
(package private) static int AFFIX_APPEND
private static int AFFIX_CONDITION
(package private) static int AFFIX_FLAG
(package private) static int AFFIX_STRIP_ORD
(package private) char[] affixData
private int aliasCount
private java.lang.String[] aliases
private boolean alternateCasing
private static byte[] BOM_UTF8
(package private) Dictionary.Breaks breaks
(package private) static java.util.Map<java.lang.String,java.lang.String> CHARSET_ALIASES
(package private) boolean checkCompoundCase
(package private) boolean checkCompoundDup
(package private) java.util.List<CheckCompoundPattern> checkCompoundPatterns
(package private) boolean checkCompoundRep
(package private) boolean checkCompoundTriple
(package private) boolean checkSharpS
(package private) char circumfix
(package private) boolean complexPrefixes
(package private) char compoundBegin
(package private) char compoundEnd
(package private) char compoundFlag
(package private) char compoundForbid
(package private) int compoundMax
(package private) char compoundMiddle
(package private) int compoundMin
(package private) char compoundPermit
(package private) java.util.List<CompoundRule> compoundRules
private int currentAffix
(package private) java.nio.charset.CharsetDecoder decoder
(package private) static java.nio.charset.Charset DEFAULT_CHARSET
private static int DEFAULT_FLAGS
(package private) boolean enableSplitSuggestions
private static char FLAG_SEPARATOR
(package private) static char FLAG_UNSET
(package private) FlagEnumerator.Lookup flagLookup - The list of unique flagsets (wordforms).
(package private) Dictionary.FlagParsingStrategy flagParsingStrategy
(package private) char forbiddenword
(package private) char forceUCase
(package private) boolean fullStrip
(package private) boolean hasCustomMorphData - we set this during sorting, so we know to add an extra int (index in morphData) to FST output
(package private) static char HIDDEN_FLAG
(package private) ConvTable iconv
private char[] ignore
(package private) boolean ignoreCase
(package private) char keepcase
(package private) java.lang.String language
(package private) java.util.List<java.util.List<java.lang.String>> mapTable
(package private) static int MAX_PROLOGUE_SCAN_WINDOW
(package private) int maxDiff
(package private) int maxNGramSuggestions
private static char MORPH_SEPARATOR
private int morphAliasCount
private java.lang.String[] morphAliases
(package private) java.util.List<java.lang.String> morphData
(package private) char needaffix
(package private) java.lang.String[] neighborKeyGroups
(package private) static char[] NOFLAGS
(package private) char noSuggest
(package private) ConvTable oconv
(package private) char onlyincompound
(package private) boolean onlyMaxDiff
(package private) java.util.ArrayList<AffixCondition> patterns - All condition checks used by prefixes and suffixes.
(package private) FST<IntsRef> prefixes
(package private) java.util.List<RepEntry> repTable
private char[] secondStagePrefixFlags - All flags used in affix continuation classes.
private char[] secondStageSuffixFlags - All flags used in affix continuation classes.
(package private) boolean simplifiedTriple
(package private) char[] stripData
(package private) int[] stripOffsets
(package private) char subStandard
(package private) FST<IntsRef> suffixes
(package private) java.lang.String tryChars
(package private) java.lang.String wordChars
(package private) WordStorage words - The entries in the .dic file, mapping to their set of flags
-
Constructor Summary
Constructors
Constructor - Description
Dictionary(Directory tempDir, java.lang.String tempFileNamePrefix, java.io.InputStream affix, java.io.InputStream dictionary) - Creates a new Dictionary containing the information read from the provided InputStreams to hunspell affix and dictionary files.
Dictionary(Directory tempDir, java.lang.String tempFileNamePrefix, java.io.InputStream affix, java.util.List<java.io.InputStream> dictionaries, boolean ignoreCase) - Creates a new Dictionary containing the information read from the provided InputStreams to hunspell affix and dictionary files.
-
Method Summary
Methods
Modifier and Type Method - Description
private void addHiddenCapitalizedWord(java.lang.StringBuilder reuse, OfflineSorter.ByteSequencesWriter writer, java.lang.String word, java.lang.String afterSep)
private int addMorphFields(java.util.Map<java.lang.String,java.lang.Integer> indices, java.lang.String morphFields)
private void addPhoneticRepEntries(java.lang.String word, java.lang.String ph)
(package private) char affixData(int affixIndex, int offset)
private FST<IntsRef> affixFST(java.util.TreeMap<java.lang.String,java.util.List<java.lang.Integer>> affixes)
(package private) char caseFold(char c) - folds single character (according to LANG if present)
private void checkCriticalDirectiveSame(java.lang.String directive, java.io.LineNumberReader reader, java.lang.Object expected, java.lang.Object actual)
(package private) java.lang.CharSequence cleanInput(java.lang.CharSequence input, java.lang.StringBuilder reuse)
(package private) static java.lang.String extractLanguageCode(java.lang.String isoCode)
private java.lang.String firstArgument(java.io.LineNumberReader reader, java.lang.String line)
(package private) int formStep()
(package private) int getAffixCondition(int affix)
private java.lang.String getAliasValue(int id)
private java.nio.charset.CharsetDecoder getDecoder(java.lang.String encoding) - Retrieves the CharsetDecoder for the given encoding.
(package private) static java.nio.file.Path getDefaultTempDir() - Returns the default temporary directory pointed to by java.io.tmpdir.
(package private) static Dictionary.FlagParsingStrategy getFlagParsingStrategy(java.lang.String flagLine, java.nio.charset.Charset charset) - Determines the appropriate Dictionary.FlagParsingStrategy based on the FLAG definition line taken from the affix file
boolean getIgnoreCase() - Returns true if this dictionary was constructed with the ignoreCase option
(package private) boolean hasFlag(int entryId, char flag)
(package private) boolean hasFlag(IntsRef forms, char flag)
(package private) boolean hasLanguage(java.lang.String... langCodes)
(package private) static int indexOfSpaceOrTab(java.lang.String text, int start)
(package private) boolean isCrossProduct(int affix)
(package private) boolean isDotICaseChangeDisallowed(char[] word)
(package private) boolean isSecondStagePrefix(char flag)
(package private) boolean isSecondStageSuffix(char flag)
private IntsRef lookup(FST<IntsRef> fst, char[] word)
DictEntries lookupEntries(java.lang.String root)
(package private) IntsRef lookupPrefix(char[] word)
(package private) IntsRef lookupSuffix(char[] word)
(package private) IntsRef lookupWord(char[] word, int offset, int length) - Looks up Hunspell word forms from the dictionary
private static boolean maybeConsume(java.io.BufferedInputStream stream, byte[] bytes) - Consume the provided byte sequence in full, if present.
(package private) boolean mayNeedInputCleaning()
private int mergeDictionaries(java.util.List<java.io.InputStream> dictionaries, java.nio.charset.CharsetDecoder decoder, IndexOutput output)
private static int morphBoundary(java.lang.String line)
(package private) boolean needsInputCleaning(java.lang.CharSequence input)
(package private) static IntsRef nextArc(FST<IntsRef> fst, FST.Arc<IntsRef> arc, FST.BytesReader reader, IntsRef output, int ch)
private void parseAffix(java.util.TreeMap<java.lang.String,java.util.List<java.lang.Integer>> affixes, java.util.Set<java.lang.Character> secondStageFlags, java.lang.String header, java.io.LineNumberReader reader, AffixKind kind, java.util.Map<java.lang.String,java.lang.Integer> seenPatterns, java.util.Map<java.lang.String,java.lang.Integer> seenStrips, FlagEnumerator flags) - Parses a specific affix rule putting the result into the provided affix map
private void parseAlias(java.lang.String line)
private Dictionary.Breaks parseBreaks(java.io.LineNumberReader reader, java.lang.String line)
private java.util.List<CompoundRule> parseCompoundRules(java.io.LineNumberReader reader, int num)
private ConvTable parseConversions(java.io.LineNumberReader reader, int num)
private java.util.List<java.lang.String> parseMapEntry(java.io.LineNumberReader reader, java.lang.String line)
private void parseMorphAlias(java.lang.String line)
private int parseNum(java.io.LineNumberReader reader, java.lang.String line)
private void readAffixFile(java.io.InputStream affixStream, java.nio.charset.CharsetDecoder decoder, FlagEnumerator flags) - Reads the affix file through the provided InputStream, building up the prefix and suffix maps
private void readConfig(java.io.InputStream stream, java.nio.charset.Charset streamCharset) - Parses the encoding and flag format specified in the provided InputStream
private java.util.List<java.lang.String> readMorphFields(java.lang.String word, java.lang.String unparsed)
private WordStorage readSortedDictionaries(Directory tempDir, java.lang.String sorted, FlagEnumerator flags, int wordCount)
private static java.nio.charset.CharsetDecoder replacingDecoder(java.nio.charset.Charset charset)
private static boolean shouldSkipEscapedChar(char ch)
private java.lang.String singleArgument(java.io.LineNumberReader reader, java.lang.String line)
private java.lang.String sortWordsOffline(Directory tempDir, java.lang.String tempFileNamePrefix, IndexOutput unsorted)
private java.lang.String[] splitBySpace(java.io.LineNumberReader reader, java.lang.String line, int expectedParts)
private java.lang.String[] splitBySpace(java.io.LineNumberReader reader, java.lang.String line, int minParts, int maxParts)
private java.util.List<java.lang.String> splitMorphData(java.lang.String morphData)
(package private) java.lang.String toLowerCase(java.lang.String word)
(package private) static char[] toSortedCharArray(java.util.Set<java.lang.Character> set)
(package private) java.lang.String toTitleCase(java.lang.String word)
private java.lang.String unescapeEntry(java.lang.String entry)
private int writeNormalizedWordEntry(java.lang.StringBuilder reuse, OfflineSorter.ByteSequencesWriter writer, java.lang.String line)
-
-
-
Field Detail
-
MAX_PROLOGUE_SCAN_WINDOW
static final int MAX_PROLOGUE_SCAN_WINDOW
- See Also:
- Constant Field Values
-
NOFLAGS
static final char[] NOFLAGS
-
FLAG_UNSET
static final char FLAG_UNSET
- See Also:
- Constant Field Values
-
DEFAULT_FLAGS
private static final int DEFAULT_FLAGS
- See Also:
- Constant Field Values
-
HIDDEN_FLAG
static final char HIDDEN_FLAG
- See Also:
- Constant Field Values
-
DEFAULT_CHARSET
static final java.nio.charset.Charset DEFAULT_CHARSET
-
decoder
java.nio.charset.CharsetDecoder decoder
-
breaks
Dictionary.Breaks breaks
-
patterns
java.util.ArrayList<AffixCondition> patterns
All condition checks used by prefixes and suffixes. These are typically reused across many affix stripping rules, so they are deduplicated to save RAM.
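The deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual code: `ConditionInterner` and its methods are hypothetical names, and plain strings stand in for `AffixCondition` objects. The `seenPatterns` map mirrors the parameter of the same name documented for parseAffix (condition -> index of patterns).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: identical condition strings share one slot in the list.
final class ConditionInterner {
  // Unique conditions, addressed by index (strings stand in for AffixCondition).
  private final List<String> patterns = new ArrayList<>();
  // Maps a condition to its index in 'patterns'.
  private final Map<String, Integer> seenPatterns = new HashMap<>();

  /** Returns the index of the condition, storing it only on first sight. */
  int intern(String condition) {
    Integer existing = seenPatterns.get(condition);
    if (existing != null) {
      return existing;
    }
    int index = patterns.size();
    patterns.add(condition);
    seenPatterns.put(condition, index);
    return index;
  }

  /** Number of distinct conditions stored so far. */
  int uniqueCount() {
    return patterns.size();
  }
}
```

Each affix rule would then store only the small integer index instead of its own copy of the condition.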
-
words
WordStorage words
The entries in the .dic file, mapping to their set of flags
-
flagLookup
final FlagEnumerator.Lookup flagLookup
The list of unique flagsets (wordforms). Theoretically huge, but small in practice (for Polish this is 756); otherwise humans wouldn't be able to deal with it either.
-
stripData
char[] stripData
-
stripOffsets
int[] stripOffsets
-
wordChars
java.lang.String wordChars
-
affixData
char[] affixData
-
currentAffix
private int currentAffix
-
AFFIX_FLAG
static final int AFFIX_FLAG
- See Also:
- Constant Field Values
-
AFFIX_STRIP_ORD
static final int AFFIX_STRIP_ORD
- See Also:
- Constant Field Values
-
AFFIX_CONDITION
private static final int AFFIX_CONDITION
- See Also:
- Constant Field Values
-
AFFIX_APPEND
static final int AFFIX_APPEND
- See Also:
- Constant Field Values
-
flagParsingStrategy
Dictionary.FlagParsingStrategy flagParsingStrategy
-
aliases
private java.lang.String[] aliases
-
aliasCount
private int aliasCount
-
morphAliases
private java.lang.String[] morphAliases
-
morphAliasCount
private int morphAliasCount
-
morphData
final java.util.List<java.lang.String> morphData
-
hasCustomMorphData
boolean hasCustomMorphData
Set during sorting, so that we know to add an extra int (index in morphData) to the FST output.
-
ignoreCase
boolean ignoreCase
-
checkSharpS
boolean checkSharpS
-
complexPrefixes
boolean complexPrefixes
-
secondStagePrefixFlags
private char[] secondStagePrefixFlags
All flags used in affix continuation classes. If an outer affix's flag isn't here, there's no need to do 2-level affix stripping with it.
-
secondStageSuffixFlags
private char[] secondStageSuffixFlags
All flags used in affix continuation classes. If an outer affix's flag isn't here, there's no need to do 2-level affix stripping with it.
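One plausible way to make the "is this flag used in a continuation class?" check cheap is to keep the flags in a sorted char array and binary-search it. The sketch below assumes that approach; `SecondStageFlags` and `contains` are hypothetical names, and only the `toSortedCharArray(Set<Character>)` signature is taken from this page.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical sketch of a sorted-array flag set with binary-search membership.
final class SecondStageFlags {
  /** Copies a set of flag characters into a sorted array. */
  static char[] toSortedCharArray(Set<Character> set) {
    char[] chars = new char[set.size()];
    int i = 0;
    for (Character c : set) {
      chars[i++] = c;
    }
    Arrays.sort(chars);
    return chars;
  }

  /** Membership test; valid only because the array is sorted. */
  static boolean contains(char[] sortedFlags, char flag) {
    return Arrays.binarySearch(sortedFlags, flag) >= 0;
  }
}
```

If `contains` returns false for an outer affix's flag, the second level of affix stripping can be skipped for that affix.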
-
circumfix
char circumfix
-
keepcase
char keepcase
-
forceUCase
char forceUCase
-
needaffix
char needaffix
-
forbiddenword
char forbiddenword
-
onlyincompound
char onlyincompound
-
compoundBegin
char compoundBegin
-
compoundMiddle
char compoundMiddle
-
compoundEnd
char compoundEnd
-
compoundFlag
char compoundFlag
-
compoundPermit
char compoundPermit
-
compoundForbid
char compoundForbid
-
checkCompoundCase
boolean checkCompoundCase
-
checkCompoundDup
boolean checkCompoundDup
-
checkCompoundRep
boolean checkCompoundRep
-
checkCompoundTriple
boolean checkCompoundTriple
-
simplifiedTriple
boolean simplifiedTriple
-
compoundMin
int compoundMin
-
compoundMax
int compoundMax
-
compoundRules
java.util.List<CompoundRule> compoundRules
-
checkCompoundPatterns
java.util.List<CheckCompoundPattern> checkCompoundPatterns
-
ignore
private char[] ignore
-
tryChars
java.lang.String tryChars
-
neighborKeyGroups
java.lang.String[] neighborKeyGroups
-
enableSplitSuggestions
boolean enableSplitSuggestions
-
repTable
java.util.List<RepEntry> repTable
-
mapTable
java.util.List<java.util.List<java.lang.String>> mapTable
-
maxDiff
int maxDiff
-
maxNGramSuggestions
int maxNGramSuggestions
-
onlyMaxDiff
boolean onlyMaxDiff
-
noSuggest
char noSuggest
-
subStandard
char subStandard
-
iconv
ConvTable iconv
-
oconv
ConvTable oconv
-
fullStrip
boolean fullStrip
-
language
java.lang.String language
-
alternateCasing
private boolean alternateCasing
-
BOM_UTF8
private static final byte[] BOM_UTF8
-
CHARSET_ALIASES
static final java.util.Map<java.lang.String,java.lang.String> CHARSET_ALIASES
-
FLAG_SEPARATOR
private static final char FLAG_SEPARATOR
- See Also:
- Constant Field Values
-
MORPH_SEPARATOR
private static final char MORPH_SEPARATOR
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
Dictionary
public Dictionary(Directory tempDir, java.lang.String tempFileNamePrefix, java.io.InputStream affix, java.io.InputStream dictionary) throws java.io.IOException, java.text.ParseException
Creates a new Dictionary containing the information read from the provided InputStreams to hunspell affix and dictionary files. You have to close the provided InputStreams yourself.
- Parameters:
tempDir - Directory to use for offline sorting
tempFileNamePrefix - prefix to use to generate temp file names
affix - InputStream for reading the hunspell affix file (won't be closed).
dictionary - InputStream for reading the hunspell dictionary file (won't be closed).
- Throws:
java.io.IOException - Can be thrown while reading from the InputStreams
java.text.ParseException - Can be thrown if the content of the files does not meet expected formats
-
Dictionary
public Dictionary(Directory tempDir, java.lang.String tempFileNamePrefix, java.io.InputStream affix, java.util.List<java.io.InputStream> dictionaries, boolean ignoreCase) throws java.io.IOException, java.text.ParseException
Creates a new Dictionary containing the information read from the provided InputStreams to hunspell affix and dictionary files. You have to close the provided InputStreams yourself.
- Parameters:
tempDir - Directory to use for offline sorting
tempFileNamePrefix - prefix to use to generate temp file names
affix - InputStream for reading the hunspell affix file (won't be closed).
dictionaries - InputStreams for reading the hunspell dictionary files (won't be closed).
- Throws:
java.io.IOException - Can be thrown while reading from the InputStreams
java.text.ParseException - Can be thrown if the content of the files does not meet expected formats
-
-
Method Detail
-
formStep
int formStep()
-
lookupWord
IntsRef lookupWord(char[] word, int offset, int length)
Looks up Hunspell word forms from the dictionary
-
lookupPrefix
IntsRef lookupPrefix(char[] word)
-
lookupSuffix
IntsRef lookupSuffix(char[] word)
-
nextArc
static IntsRef nextArc(FST<IntsRef> fst, FST.Arc<IntsRef> arc, FST.BytesReader reader, IntsRef output, int ch)
-
readAffixFile
private void readAffixFile(java.io.InputStream affixStream, java.nio.charset.CharsetDecoder decoder, FlagEnumerator flags) throws java.io.IOException, java.text.ParseException
Reads the affix file through the provided InputStream, building up the prefix and suffix maps
- Parameters:
affixStream - InputStream to read the content of the affix file from
decoder - CharsetDecoder to decode the content of the file
- Throws:
java.io.IOException - Can be thrown while reading from the InputStream
java.text.ParseException
-
checkCriticalDirectiveSame
private void checkCriticalDirectiveSame(java.lang.String directive, java.io.LineNumberReader reader, java.lang.Object expected, java.lang.Object actual) throws java.text.ParseException
- Throws:
java.text.ParseException
-
parseMapEntry
private java.util.List<java.lang.String> parseMapEntry(java.io.LineNumberReader reader, java.lang.String line) throws java.text.ParseException
- Throws:
java.text.ParseException
-
hasLanguage
boolean hasLanguage(java.lang.String... langCodes)
-
lookupEntries
public DictEntries lookupEntries(java.lang.String root)
- Parameters:
root - a string to look up in the dictionary. No case conversion or affix removal is performed. To get the possible roots of any word, you may call Hunspell.getRoots(String)
- Returns:
- the dictionary entries for the given root, or null if there's none
-
extractLanguageCode
static java.lang.String extractLanguageCode(java.lang.String isoCode)
-
parseNum
private int parseNum(java.io.LineNumberReader reader, java.lang.String line) throws java.text.ParseException
- Throws:
java.text.ParseException
-
singleArgument
private java.lang.String singleArgument(java.io.LineNumberReader reader, java.lang.String line) throws java.text.ParseException
- Throws:
java.text.ParseException
-
firstArgument
private java.lang.String firstArgument(java.io.LineNumberReader reader, java.lang.String line) throws java.text.ParseException
- Throws:
java.text.ParseException
-
splitBySpace
private java.lang.String[] splitBySpace(java.io.LineNumberReader reader, java.lang.String line, int expectedParts) throws java.text.ParseException
- Throws:
java.text.ParseException
-
splitBySpace
private java.lang.String[] splitBySpace(java.io.LineNumberReader reader, java.lang.String line, int minParts, int maxParts) throws java.text.ParseException
- Throws:
java.text.ParseException
-
parseCompoundRules
private java.util.List<CompoundRule> parseCompoundRules(java.io.LineNumberReader reader, int num) throws java.io.IOException, java.text.ParseException
- Throws:
java.io.IOException
java.text.ParseException
-
parseBreaks
private Dictionary.Breaks parseBreaks(java.io.LineNumberReader reader, java.lang.String line) throws java.io.IOException, java.text.ParseException
- Throws:
java.io.IOException
java.text.ParseException
-
affixFST
private FST<IntsRef> affixFST(java.util.TreeMap<java.lang.String,java.util.List<java.lang.Integer>> affixes) throws java.io.IOException
- Throws:
java.io.IOException
-
parseAffix
private void parseAffix(java.util.TreeMap<java.lang.String,java.util.List<java.lang.Integer>> affixes, java.util.Set<java.lang.Character> secondStageFlags, java.lang.String header, java.io.LineNumberReader reader, AffixKind kind, java.util.Map<java.lang.String,java.lang.Integer> seenPatterns, java.util.Map<java.lang.String,java.lang.Integer> seenStrips, FlagEnumerator flags) throws java.io.IOException, java.text.ParseException
Parses a specific affix rule, putting the result into the provided affix map
- Parameters:
affixes - Map where the result of the parsing will be put
header - Header line of the affix rule
reader - LineNumberReader to read the content of the rule from
seenPatterns - map from condition -> index of patterns, for deduplication.
- Throws:
java.io.IOException - Can be thrown while reading the rule
java.text.ParseException
-
affixData
char affixData(int affixIndex, int offset)
-
isCrossProduct
boolean isCrossProduct(int affix)
-
getAffixCondition
int getAffixCondition(int affix)
-
parseConversions
private ConvTable parseConversions(java.io.LineNumberReader reader, int num) throws java.io.IOException, java.text.ParseException
- Throws:
java.io.IOException
java.text.ParseException
-
readConfig
private void readConfig(java.io.InputStream stream, java.nio.charset.Charset streamCharset) throws java.io.IOException, java.text.ParseException
Parses the encoding and flag format specified in the provided InputStream
- Throws:
java.io.IOException
java.text.ParseException
-
maybeConsume
private static boolean maybeConsume(java.io.BufferedInputStream stream, byte[] bytes) throws java.io.IOException
Consume the provided byte sequence in full, if present. Otherwise leave the input stream intact.
- Returns:
- true if the sequence matched and has been consumed.
- Throws:
java.io.IOException
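The consume-or-rewind contract above can be illustrated with `BufferedInputStream.mark` and `reset`. This is a sketch under the assumption that mark/reset is how the stream is left intact on a mismatch; `PrefixConsumer` is a hypothetical holder class, and a typical use would be skipping a UTF-8 byte order mark (compare the BOM_UTF8 field).

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical sketch: consume a known byte prefix, or rewind if it is absent.
final class PrefixConsumer {
  /** Consumes 'bytes' from the stream if they come next; otherwise rewinds. */
  static boolean maybeConsume(BufferedInputStream stream, byte[] bytes) throws IOException {
    // Remember the current position; we read at most bytes.length bytes past it.
    stream.mark(bytes.length);
    for (byte expected : bytes) {
      int next = stream.read(); // -1 at end of stream, else 0..255
      if (next != (expected & 0xFF)) {
        stream.reset(); // mismatch or end of stream: leave the input intact
        return false;
      }
    }
    return true;
  }
}
```

After a `false` result the next `read()` returns the same byte it would have returned before the call.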
-
getDecoder
private java.nio.charset.CharsetDecoder getDecoder(java.lang.String encoding)
Retrieves the CharsetDecoder for the given encoding. Note: this isn't perfect, as I think ISCII-DEVANAGARI and MICROSOFT-CP1251 etc. are allowed...
- Parameters:
encoding - Encoding to retrieve the CharsetDecoder for
- Returns:
- CharsetDecoder for the given encoding
-
replacingDecoder
private static java.nio.charset.CharsetDecoder replacingDecoder(java.nio.charset.Charset charset)
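This page does not describe replacingDecoder, but the name suggests a decoder that substitutes a replacement character for bad input instead of throwing. A minimal JDK-only sketch of that idea follows; it is an assumption about the behavior, and `LenientDecoding` and `decode` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch of a decoder that replaces undecodable bytes.
final class LenientDecoding {
  /** Builds a decoder that replaces malformed or unmappable input. */
  static CharsetDecoder replacingDecoder(Charset charset) {
    return charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
  }

  /** Decodes a whole byte array with the lenient decoder. */
  static String decode(byte[] bytes, Charset charset) throws CharacterCodingException {
    return replacingDecoder(charset).decode(ByteBuffer.wrap(bytes)).toString();
  }
}
```

With this configuration a dictionary file containing a few stray bytes can still be read, at the cost of U+FFFD appearing in the affected entries.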
-
getFlagParsingStrategy
static Dictionary.FlagParsingStrategy getFlagParsingStrategy(java.lang.String flagLine, java.nio.charset.Charset charset)
Determines the appropriate Dictionary.FlagParsingStrategy based on the FLAG definition line taken from the affix file
- Parameters:
flagLine - Line containing the flag information
- Returns:
- FlagParsingStrategy that handles parsing flags in the way specified in the FLAG definition
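The selection can be pictured as a switch on the second token of the FLAG line. The sketch below is an illustration only: `FlagParsingSketch`, `FlagParser`, and `forFlagLine` are hypothetical names, the accepted values (`long`, `num`, `UTF-8`) follow the Hunspell affix file format, and the `(first << 8) | second` combination for two-character flags is an assumption about how the codes are merged.

```java
// Hypothetical sketch of choosing a flag parser from a FLAG definition line.
final class FlagParsingSketch {
  /** Stand-in for Dictionary.FlagParsingStrategy. */
  interface FlagParser {
    char[] parseFlags(String rawFlags);
  }

  /** Each char of the string is one flag. */
  static final FlagParser SIMPLE = String::toCharArray;

  /** Flags are decimal numbers separated by commas, e.g. "101,202". */
  static final FlagParser NUM = raw -> {
    if (raw.isEmpty()) {
      return new char[0];
    }
    String[] parts = raw.split(",");
    char[] flags = new char[parts.length];
    for (int i = 0; i < parts.length; i++) {
      flags[i] = (char) Integer.parseInt(parts[i].trim());
    }
    return flags;
  };

  /** Each flag is two ASCII chars, merged here as (first << 8) | second. */
  static final FlagParser DOUBLE_ASCII = raw -> {
    if (raw.length() % 2 != 0) {
      throw new IllegalArgumentException("Odd number of chars in long flags: " + raw);
    }
    char[] flags = new char[raw.length() / 2];
    for (int i = 0; i < flags.length; i++) {
      flags[i] = (char) ((raw.charAt(2 * i) << 8) | raw.charAt(2 * i + 1));
    }
    return flags;
  };

  /** Picks a parser from a line such as "FLAG long". */
  static FlagParser forFlagLine(String flagLine) {
    String[] parts = flagLine.trim().split("\\s+");
    if (parts.length != 2 || !parts[0].equals("FLAG")) {
      throw new IllegalArgumentException("Illegal FLAG specification: " + flagLine);
    }
    switch (parts[1]) {
      case "long":
        return DOUBLE_ASCII;
      case "num":
        return NUM;
      case "UTF-8":
        return SIMPLE;
      default:
        throw new IllegalArgumentException("Unknown flag type: " + parts[1]);
    }
  }
}
```

Representing every flag as a single `char` regardless of its textual encoding is what lets the rest of the class store flags in plain `char` fields such as `circumfix` or `needaffix`.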
-
unescapeEntry
private java.lang.String unescapeEntry(java.lang.String entry)
-
shouldSkipEscapedChar
private static boolean shouldSkipEscapedChar(char ch)
-
morphBoundary
private static int morphBoundary(java.lang.String line)
-
indexOfSpaceOrTab
static int indexOfSpaceOrTab(java.lang.String text, int start)
-
mergeDictionaries
private int mergeDictionaries(java.util.List<java.io.InputStream> dictionaries, java.nio.charset.CharsetDecoder decoder, IndexOutput output) throws java.io.IOException
- Throws:
java.io.IOException
-
writeNormalizedWordEntry
private int writeNormalizedWordEntry(java.lang.StringBuilder reuse, OfflineSorter.ByteSequencesWriter writer, java.lang.String line) throws java.io.IOException
- Returns:
- the number of word entries written
- Throws:
java.io.IOException
-
addHiddenCapitalizedWord
private void addHiddenCapitalizedWord(java.lang.StringBuilder reuse, OfflineSorter.ByteSequencesWriter writer, java.lang.String word, java.lang.String afterSep) throws java.io.IOException
- Throws:
java.io.IOException
-
toLowerCase
java.lang.String toLowerCase(java.lang.String word)
-
toTitleCase
java.lang.String toTitleCase(java.lang.String word)
-
sortWordsOffline
private java.lang.String sortWordsOffline(Directory tempDir, java.lang.String tempFileNamePrefix, IndexOutput unsorted) throws java.io.IOException
- Throws:
java.io.IOException
-
readSortedDictionaries
private WordStorage readSortedDictionaries(Directory tempDir, java.lang.String sorted, FlagEnumerator flags, int wordCount) throws java.io.IOException
- Throws:
java.io.IOException
-
readMorphFields
private java.util.List<java.lang.String> readMorphFields(java.lang.String word, java.lang.String unparsed)
-
addMorphFields
private int addMorphFields(java.util.Map<java.lang.String,java.lang.Integer> indices, java.lang.String morphFields)
-
addPhoneticRepEntries
private void addPhoneticRepEntries(java.lang.String word, java.lang.String ph)
-
isDotICaseChangeDisallowed
boolean isDotICaseChangeDisallowed(char[] word)
-
parseAlias
private void parseAlias(java.lang.String line)
-
getAliasValue
private java.lang.String getAliasValue(int id)
-
parseMorphAlias
private void parseMorphAlias(java.lang.String line)
-
splitMorphData
private java.util.List<java.lang.String> splitMorphData(java.lang.String morphData)
-
hasFlag
boolean hasFlag(IntsRef forms, char flag)
-
hasFlag
boolean hasFlag(int entryId, char flag)
-
mayNeedInputCleaning
boolean mayNeedInputCleaning()
-
needsInputCleaning
boolean needsInputCleaning(java.lang.CharSequence input)
-
cleanInput
java.lang.CharSequence cleanInput(java.lang.CharSequence input, java.lang.StringBuilder reuse)
-
toSortedCharArray
static char[] toSortedCharArray(java.util.Set<java.lang.Character> set)
-
isSecondStagePrefix
boolean isSecondStagePrefix(char flag)
-
isSecondStageSuffix
boolean isSecondStageSuffix(char flag)
-
caseFold
char caseFold(char c)
Folds a single character (according to LANG if present)
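Language-dependent folding matters most for the Turkic dotted and dotless i. The sketch below assumes that this is the case LANG is meant to handle (the `alternateCasing` field and isDotICaseChangeDisallowed hint at it, but this page does not say so); `CaseFoldSketch` and the `turkicCasing` parameter are hypothetical.

```java
// Hypothetical sketch of language-aware single-character case folding.
final class CaseFoldSketch {
  /** Lowercases one char, optionally using Turkic rules for I and İ. */
  static char caseFold(char c, boolean turkicCasing) {
    if (turkicCasing) {
      if (c == 'I') {
        return '\u0131'; // LATIN SMALL LETTER DOTLESS I
      }
      if (c == '\u0130') { // LATIN CAPITAL LETTER I WITH DOT ABOVE
        return 'i';
      }
    }
    return Character.toLowerCase(c);
  }
}
```

Without the special case, a Turkish word starting with "I" would fold to a form beginning with "i" and fail to match a dictionary entry spelled with the dotless letter.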
-
getIgnoreCase
public boolean getIgnoreCase()
Returns true if this dictionary was constructed with the ignoreCase option
-
getDefaultTempDir
static java.nio.file.Path getDefaultTempDir() throws java.io.IOException
Returns the default temporary directory pointed to by java.io.tmpdir. If not accessible or not available, an IOException is thrown.
- Throws:
java.io.IOException
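The documented behavior can be approximated with the JDK alone. The sketch below is an assumption about what "not accessible or not available" means (here: property unset, not a directory, or not writable); `TempDirSketch` and `defaultTempDir` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of resolving and validating java.io.tmpdir.
final class TempDirSketch {
  /** Returns the java.io.tmpdir directory, or throws if it cannot be used. */
  static Path defaultTempDir() throws IOException {
    String property = System.getProperty("java.io.tmpdir");
    if (property == null) {
      throw new IOException("Java has no temporary folder property (java.io.tmpdir)");
    }
    Path dir = Paths.get(property);
    if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
      throw new IOException("Temporary folder is not a writable directory: " + dir);
    }
    return dir;
  }
}
```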
-
-