Class BinaryDictionary
- java.lang.Object
  - org.apache.lucene.analysis.ja.dict.BinaryDictionary
- All Implemented Interfaces:
Dictionary
- Direct Known Subclasses:
TokenInfoDictionary, UnknownDictionary
public abstract class BinaryDictionary extends java.lang.Object implements Dictionary
Base class for a binary-encoded in-memory dictionary.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BinaryDictionary.ResourceScheme | Deprecated, for removal: This API element is subject to removal in a future version.
-
Field Summary
Fields
Modifier and Type | Field | Description
private java.nio.ByteBuffer | buffer |
static java.lang.String | DICT_FILENAME_SUFFIX |
static java.lang.String | DICT_HEADER |
static int | HAS_BASEFORM | Flag indicating that the entry has base-form data.
static int | HAS_PRONUNCIATION | Flag indicating that the entry has pronunciation data.
static int | HAS_READING | Flag indicating that the entry has reading data.
private java.lang.String[] | inflFormDict |
private java.lang.String[] | inflTypeDict |
private java.lang.String[] | posDict |
static java.lang.String | POSDICT_FILENAME_SUFFIX |
static java.lang.String | POSDICT_HEADER |
private int[] | targetMap |
static java.lang.String | TARGETMAP_FILENAME_SUFFIX |
static java.lang.String | TARGETMAP_HEADER |
private int[] | targetMapOffsets |
static int | VERSION |
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | BinaryDictionary(IOSupplier<java.io.InputStream> targetMapResource, IOSupplier<java.io.InputStream> posResource, IOSupplier<java.io.InputStream> dictResource) |
-
Method Summary
Modifier and Type | Method | Description
private static int | baseFormOffset(int wordId) |
java.lang.String | getBaseForm(int wordId, char[] surfaceForm, int off, int len) | Gets the base form of a word.
java.lang.String | getInflectionForm(int wordId) | Gets the inflection form of a token.
java.lang.String | getInflectionType(int wordId) | Gets the inflection type of a token.
int | getLeftId(int wordId) | Gets the left ID of the specified word.
java.lang.String | getPartOfSpeech(int wordId) | Gets the part of speech of a token.
java.lang.String | getPronunciation(int wordId, char[] surface, int off, int len) | Gets the pronunciation of a token.
java.lang.String | getReading(int wordId, char[] surface, int off, int len) | Gets the reading of a token.
static java.io.InputStream | getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path) | Deprecated, for removal: This API element is subject to removal in a future version.
int | getRightId(int wordId) | Gets the right ID of the specified word.
int | getWordCost(int wordId) | Gets the cost of the specified word.
private boolean | hasBaseFormData(int wordId) |
private boolean | hasPronunciationData(int wordId) |
private boolean | hasReadingData(int wordId) |
void | lookupWordIds(int sourceId, IntsRef ref) |
private static void | populatePosDict(DataInput in, int posSize, java.lang.String[] posDict, java.lang.String[] inflTypeDict, java.lang.String[] inflFormDict) |
private static void | populateTargetMap(DataInput in, int[] targetMap, int[] targetMapOffsets) |
private int | pronunciationOffset(int wordId) |
private int | readingOffset(int wordId) |
private java.lang.String | readString(int offset, int length, boolean kana) |
-
Field Detail
-
DICT_FILENAME_SUFFIX
public static final java.lang.String DICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
TARGETMAP_FILENAME_SUFFIX
public static final java.lang.String TARGETMAP_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
POSDICT_FILENAME_SUFFIX
public static final java.lang.String POSDICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
DICT_HEADER
public static final java.lang.String DICT_HEADER
- See Also:
- Constant Field Values
-
TARGETMAP_HEADER
public static final java.lang.String TARGETMAP_HEADER
- See Also:
- Constant Field Values
-
POSDICT_HEADER
public static final java.lang.String POSDICT_HEADER
- See Also:
- Constant Field Values
-
VERSION
public static final int VERSION
- See Also:
- Constant Field Values
-
buffer
private final java.nio.ByteBuffer buffer
-
targetMapOffsets
private final int[] targetMapOffsets
-
targetMap
private final int[] targetMap
-
posDict
private final java.lang.String[] posDict
-
inflTypeDict
private final java.lang.String[] inflTypeDict
-
inflFormDict
private final java.lang.String[] inflFormDict
-
HAS_BASEFORM
public static final int HAS_BASEFORM
Flag indicating that the entry has base-form data. Otherwise, the entry is not inflected and its base form is the same as its surface form.
- See Also:
- Constant Field Values
-
HAS_READING
public static final int HAS_READING
Flag indicating that the entry has reading data. Otherwise, the reading is the surface form converted to katakana.
- See Also:
- Constant Field Values
-
HAS_PRONUNCIATION
public static final int HAS_PRONUNCIATION
Flag indicating that the entry has pronunciation data. Otherwise, the pronunciation is the same as the reading.
- See Also:
- Constant Field Values
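The three flags above describe a fallback chain: a missing base form means the surface form, a missing reading means the surface form in katakana, and a missing pronunciation means the reading. The following is a minimal, hypothetical sketch of testing such flags, assuming each flag is a distinct bit with the values 1, 2 and 4 (the real values are on the Constant Field Values page and may differ). EntryFlagsSketch and its methods are illustrative names, not part of Lucene.

```java
// Hypothetical sketch: testing per-entry flag bits to choose a data source.
// ASSUMPTION: the flag values 1, 2 and 4 are illustrative; see the
// "Constant Field Values" page for the real constants.
final class EntryFlagsSketch {
  static final int HAS_BASEFORM = 1;      // assumed value
  static final int HAS_READING = 2;       // assumed value
  static final int HAS_PRONUNCIATION = 4; // assumed value

  /** Returns true if the given flag bit is set in an entry's flags. */
  static boolean has(int flags, int flag) {
    return (flags & flag) != 0;
  }

  /** Describes where each attribute comes from, following the field docs above. */
  static String describe(int flags) {
    String base = has(flags, HAS_BASEFORM) ? "stored" : "surface form (not inflected)";
    String reading = has(flags, HAS_READING) ? "stored" : "surface form in katakana";
    String pron = has(flags, HAS_PRONUNCIATION) ? "stored" : "same as reading";
    return "baseForm=" + base + "; reading=" + reading + "; pronunciation=" + pron;
  }

  public static void main(String[] args) {
    // A hypothetical entry that stores only a reading.
    System.out.println(describe(HAS_READING));
  }
}
```

Running main prints `baseForm=surface form (not inflected); reading=stored; pronunciation=same as reading`. Storing only the non-default attributes is what makes the flags worthwhile: most entries can presumably skip one or more fields entirely.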
-
Constructor Detail
-
BinaryDictionary
protected BinaryDictionary(IOSupplier<java.io.InputStream> targetMapResource, IOSupplier<java.io.InputStream> posResource, IOSupplier<java.io.InputStream> dictResource) throws java.io.IOException
- Throws:
java.io.IOException
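The constructor takes three IOSupplier<java.io.InputStream> arguments rather than already-open streams. A reasonable reading is that this lets the dictionary decide when each resource is opened and ensures it is closed after use. The sketch below illustrates that pattern with a hypothetical IOStreamSupplier interface (the same shape as a supplier whose get() may throw IOException) so that it runs without Lucene on the classpath. It only reads each resource fully; it does not parse headers or contents, and the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a supplier whose get() may throw IOException.
@FunctionalInterface
interface IOStreamSupplier {
  InputStream get() throws IOException;
}

// Hypothetical sketch of consuming three lazily opened resources.
final class ResourceLoadingSketch {
  /** Opens each resource in turn, reads it fully, and closes it before the next. */
  static byte[][] loadAll(IOStreamSupplier targetMap, IOStreamSupplier pos, IOStreamSupplier dict)
      throws IOException {
    IOStreamSupplier[] suppliers = {targetMap, pos, dict};
    byte[][] contents = new byte[suppliers.length][];
    for (int i = 0; i < suppliers.length; i++) {
      // The stream is opened only here; try-with-resources closes it even on error.
      try (InputStream in = suppliers[i].get()) {
        contents[i] = in.readAllBytes(); // requires Java 9+
      }
    }
    return contents;
  }

  /** Builds an in-memory supplier; a real caller might open a file or class resource. */
  static IOStreamSupplier fromString(String s) {
    return () -> new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    byte[][] data = loadAll(fromString("map"), fromString("pos"), fromString("dict"));
    System.out.println(data[0].length + " " + data[1].length + " " + data[2].length);
  }
}
```

Running main prints `3 3 4`. Passing suppliers instead of streams means a failure while reading one resource cannot leave the others open, because none of them is opened before it is needed.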
-
Method Detail
-
populateTargetMap
private static void populateTargetMap(DataInput in, int[] targetMap, int[] targetMapOffsets) throws java.io.IOException
- Throws:
java.io.IOException
-
populatePosDict
private static void populatePosDict(DataInput in, int posSize, java.lang.String[] posDict, java.lang.String[] inflTypeDict, java.lang.String[] inflFormDict) throws java.io.IOException
- Throws:
java.io.IOException
-
getResource
@Deprecated(forRemoval=true, since="9.1") public static final java.io.InputStream getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path) throws java.io.IOException
Deprecated, for removal: This API element is subject to removal in a future version.
- Throws:
java.io.IOException
-
lookupWordIds
public void lookupWordIds(int sourceId, IntsRef ref)
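lookupWordIds has no description on this page. Judging from its signature and the private targetMap and targetMapOffsets fields, it appears to map a source ID to a contiguous run of word IDs stored in one flat array. The sketch below models that layout under stated assumptions: Slice is a hypothetical stand-in for IntsRef (which exposes the fields ints, offset and length), and the offsets array is assumed to carry one trailing sentinel entry so that every group, including the last, spans [offsets[i], offsets[i + 1]).

```java
// Hypothetical sketch of a flat "source ID -> word IDs" map.
// ASSUMPTION: targetMapOffsets has one trailing sentinel entry, so group i
// occupies targetMap[targetMapOffsets[i] .. targetMapOffsets[i + 1]).
final class TargetMapSketch {
  /** Hypothetical stand-in for IntsRef: a view into a shared int array. */
  static final class Slice {
    int[] ints;
    int offset;
    int length;
  }

  private final int[] targetMap;
  private final int[] targetMapOffsets;

  /** Flattens per-source groups of word IDs into one array plus offsets. */
  TargetMapSketch(int[][] groups) {
    int total = 0;
    for (int[] g : groups) {
      total += g.length;
    }
    targetMap = new int[total];
    targetMapOffsets = new int[groups.length + 1];
    int pos = 0;
    for (int i = 0; i < groups.length; i++) {
      targetMapOffsets[i] = pos;
      System.arraycopy(groups[i], 0, targetMap, pos, groups[i].length);
      pos += groups[i].length;
    }
    targetMapOffsets[groups.length] = pos; // sentinel: end of the last group
  }

  /** Points ref at the word IDs for sourceId without copying them. */
  void lookupWordIds(int sourceId, Slice ref) {
    ref.ints = targetMap;
    ref.offset = targetMapOffsets[sourceId];
    ref.length = targetMapOffsets[sourceId + 1] - ref.offset;
  }
}
```

Because the caller's ref shares the backing array, a lookup allocates nothing; the trade-off is that callers must treat ref.ints as read-only.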
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Gets the left ID of the specified word.
- Specified by:
getLeftId in interface Dictionary
- Returns:
the left ID
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Gets the right ID of the specified word.
- Specified by:
getRightId in interface Dictionary
- Returns:
the right ID
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Gets the cost of the specified word.
- Specified by:
getWordCost in interface Dictionary
- Returns:
the word's cost
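Left IDs, right IDs and word costs are the inputs to a MeCab-style lattice search, in which a path's cost is the sum of each word's cost plus a connection cost looked up by the previous word's right ID and the next word's left ID. The sketch below computes that sum for one fixed path. The toy connection-cost matrix, the name PathCostSketch, and the use of ID 0 for both sentence boundaries are hypothetical; BinaryDictionary itself does not supply a connection-cost matrix.

```java
// Hypothetical sketch: total cost of one fixed path through a MeCab-style lattice.
// ASSUMPTIONS: conn is a toy connection-cost matrix indexed
// [right ID of previous word][left ID of next word], and ID 0 stands for
// both the beginning and the end of the sentence.
final class PathCostSketch {
  static int pathCost(int[][] conn, int[] leftIds, int[] rightIds, int[] wordCosts) {
    int total = 0;
    int prevRight = 0; // beginning-of-sentence
    for (int i = 0; i < wordCosts.length; i++) {
      total += conn[prevRight][leftIds[i]] + wordCosts[i];
      prevRight = rightIds[i];
    }
    return total + conn[prevRight][0]; // transition to end-of-sentence
  }

  public static void main(String[] args) {
    int[][] conn = {{0, 5, 3}, {2, 0, 1}, {4, 1, 0}};
    // Two hypothetical words with IDs 1 and 2 and word costs 100 and 200.
    System.out.println(pathCost(conn, new int[] {1, 2}, new int[] {1, 2}, new int[] {100, 200}));
  }
}
```

Running main prints `310` (5 + 100, then 1 + 200, then 4 to reach the end). A Viterbi search evaluates many such paths and keeps the one with the lowest total.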
-
getBaseForm
public java.lang.String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Gets the base form of a word.
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the base form, which differs from the surface form only for inflected words; otherwise null
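Because getBaseForm returns null for words that are not inflected, a caller that always wants a base form has to fall back to the surface form. The method also receives the surface form, which suggests that part of the base form may be recovered from it. The sketch below shows both ideas under an assumed encoding (a count of leading characters shared with the surface form, plus a stored suffix); BaseFormSketch, its methods and the encoding are hypothetical, and the real binary layout may differ. In the usage example, the surface form 食べた (past tense) shares the prefix 食べ with its base form 食べる.

```java
// Hypothetical sketch. ASSUMPTION: a base form is encoded as the number of
// leading chars it shares with the surface form plus a stored suffix.
final class BaseFormSketch {
  /** Rebuilds a base form from a shared surface prefix and a stored suffix. */
  static String rebuild(char[] surface, int off, int prefixLength, String storedSuffix) {
    return new String(surface, off, prefixLength) + storedSuffix;
  }

  /** Caller-side fallback: null means "not inflected", so use the surface form. */
  static String baseFormOrSurface(String baseForm, char[] surface, int off, int len) {
    return baseForm != null ? baseForm : new String(surface, off, len);
  }

  public static void main(String[] args) {
    char[] surface = "\u98df\u3079\u305f".toCharArray(); // "tabeta" (past tense)
    String base = rebuild(surface, 0, 2, "\u308b");       // shared prefix "tabe" + "ru"
    System.out.println(base.equals("\u98df\u3079\u308b")); // "taberu"
  }
}
```

Running main prints `true`.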
-
getReading
public java.lang.String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Gets the reading of a token.
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the reading of the token
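When an entry has no reading data, the HAS_READING field documents that the reading is the surface form converted to katakana. The sketch below shows one way to do that conversion on the surface/off/len slice that getReading receives. It relies on the fact that hiragana U+3041 to U+3096 sit exactly 0x60 below the corresponding katakana; characters outside that range (kanji, katakana, ASCII) are kept unchanged. KatakanaSketch is a hypothetical name, and whether this exactly matches the library's fallback is an assumption. For example, 食べる becomes 食ベル.

```java
// Hypothetical sketch of the katakana fallback for entries without reading data.
// Hiragana U+3041..U+3096 sit exactly 0x60 below the matching katakana,
// so adding 0x60 converts them; all other chars are copied unchanged.
final class KatakanaSketch {
  static String toKatakana(char[] surface, int off, int len) {
    char[] out = new char[len];
    for (int i = 0; i < len; i++) {
      char ch = surface[off + i];
      out[i] = (ch >= 0x3041 && ch <= 0x3096) ? (char) (ch + 0x60) : ch;
    }
    return new String(out);
  }

  public static void main(String[] args) {
    // "taberu": one kanji plus two hiragana; only the hiragana change.
    String r = toKatakana("\u98df\u3079\u308b".toCharArray(), 0, 3);
    System.out.println(r.equals("\u98df\u30d9\u30eb"));
  }
}
```

Running main prints `true`.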
-
getPartOfSpeech
public java.lang.String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Gets the part of speech of a token.
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the part of speech of the token
-
getPronunciation
public java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Gets the pronunciation of a token.
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the pronunciation of the token
-
getInflectionType
public java.lang.String getInflectionType(int wordId)
Description copied from interface: Dictionary
Gets the inflection type of a token.
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the inflection type, or null
-
getInflectionForm
public java.lang.String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Gets the inflection form of a token.
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of the token
- Returns:
the inflection form, or null
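getPartOfSpeech, getInflectionType and getInflectionForm take only a word ID, and the class keeps three private String[] fields (posDict, inflTypeDict, inflFormDict). One plausible layout, assumed here, is three parallel tables indexed by a single per-entry ID, with null marking entries that do not inflect. The sketch below models that with hypothetical names and English placeholder strings; real IPADIC-based entries use Japanese tags such as 名詞-一般.

```java
// Hypothetical sketch. ASSUMPTION: part of speech, inflection type and
// inflection form live in parallel arrays indexed by one per-entry ID;
// null means the entry does not inflect.
final class PosTablesSketch {
  private final String[] posDict;
  private final String[] inflTypeDict;
  private final String[] inflFormDict;

  PosTablesSketch(String[] posDict, String[] inflTypeDict, String[] inflFormDict) {
    if (posDict.length != inflTypeDict.length || posDict.length != inflFormDict.length) {
      throw new IllegalArgumentException("tables must have the same length");
    }
    this.posDict = posDict;
    this.inflTypeDict = inflTypeDict;
    this.inflFormDict = inflFormDict;
  }

  String partOfSpeech(int id) {
    return posDict[id];
  }

  String inflectionType(int id) {
    return inflTypeDict[id]; // may be null
  }

  String inflectionForm(int id) {
    return inflFormDict[id]; // may be null
  }

  public static void main(String[] args) {
    PosTablesSketch t = new PosTablesSketch(
        new String[] {"noun", "verb"},
        new String[] {null, "ichidan"},
        new String[] {null, "basic form"});
    System.out.println(t.partOfSpeech(1) + " / " + t.inflectionType(1) + " / " + t.inflectionType(0));
  }
}
```

Running main prints `verb / ichidan / null`. Keeping these strings in small shared tables means each dictionary entry only needs an index rather than repeated text.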
-
baseFormOffset
private static int baseFormOffset(int wordId)
-
readingOffset
private int readingOffset(int wordId)
-
pronunciationOffset
private int pronunciationOffset(int wordId)
-
hasBaseFormData
private boolean hasBaseFormData(int wordId)
-
hasReadingData
private boolean hasReadingData(int wordId)
-
hasPronunciationData
private boolean hasPronunciationData(int wordId)
-
readString
private java.lang.String readString(int offset, int length, boolean kana)
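readString(int offset, int length, boolean kana) is private and undocumented, but its kana parameter suggests a separate, more compact encoding for kana-only strings. The sketch below shows one such scheme: katakana stored as one byte per character (an offset from U+30A0), anything else as two-byte UTF-16 code units. The byte layout and the name CompactStringSketch are assumptions for illustration, not a description of the actual file format, and the sketch takes the ByteBuffer as a parameter instead of reading a private field.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a compact string encoding. ASSUMPTION: katakana-only
// strings use one byte per char (offset from U+30A0); any other string uses
// two bytes per char (big-endian UTF-16 code units, ByteBuffer's default order).
final class CompactStringSketch {
  static String readString(ByteBuffer buffer, int offset, int length, boolean kana) {
    char[] text = new char[length];
    for (int i = 0; i < length; i++) {
      text[i] = kana
          ? (char) (0x30A0 + (buffer.get(offset + i) & 0xff)) // 1 byte per char
          : buffer.getChar(offset + (i << 1));                // 2 bytes per char
    }
    return new String(text);
  }

  public static void main(String[] args) {
    // Katakana "taberu" (U+30BF U+30D9 U+30EB) stored as offsets from U+30A0.
    ByteBuffer kana = ByteBuffer.wrap(new byte[] {0x1F, 0x39, 0x4B});
    System.out.println(readString(kana, 0, 3, true).equals("\u30bf\u30d9\u30eb"));
  }
}
```

Running main prints `true`. Since readings and pronunciations are typically katakana, a one-byte form of this kind would roughly halve their storage compared with two-byte characters.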