Class DirectoryTaxonomyReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
TaxonomyReader which retrieves stored taxonomy information from a Directory.
Reading from the on-disk index on every method call is too slow, so this implementation employs caching: some methods cache recent requests and their results, while other methods prefetch all the data into memory and then provide answers directly from in-memory tables. See the documentation of individual methods for comments on their performance.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
TaxonomyReader.ChildrenIterator -
Field Summary
Fields
private static final long BYTES_PER_CACHE_ENTRY
private LRUHashMap<Integer, FacetLabel> categoryCache
private static final int DEFAULT_CACHE_VALUE
private final DirectoryReader indexReader
private LRUHashMap<FacetLabel, Integer> ordinalCache
private TaxonomyIndexArrays taxoArrays
private final long taxoEpoch
private final DirectoryTaxonomyWriter taxoWriter
Fields inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
INVALID_ORDINAL, ROOT_ORDINAL
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
DirectoryTaxonomyReader(DirectoryTaxonomyWriter taxoWriter)
Opens a DirectoryTaxonomyReader over the given DirectoryTaxonomyWriter (for NRT).
DirectoryTaxonomyReader(DirectoryReader indexReader, DirectoryTaxonomyWriter taxoWriter, LRUHashMap<FacetLabel, Integer> ordinalCache, LRUHashMap<Integer, FacetLabel> categoryCache, TaxonomyIndexArrays taxoArrays)
Expert: Use this method to explicitly force the DirectoryTaxonomyReader to use specific parent/children arrays and caches.
DirectoryTaxonomyReader(Directory directory)
Open for reading a taxonomy stored in a given Directory.
-
Method Summary
private void checkOrdinalBounds(int... ordinals)
Checks if the ordinals in the array are >= 0 and < DirectoryTaxonomyReader#indexReader.maxDoc().
protected void doClose()
Performs the actual task of closing the resources that are used by the taxonomy reader.
protected DirectoryTaxonomyReader doOpenIfChanged()
Implements the opening of a new DirectoryTaxonomyReader instance if the taxonomy has changed.
int[] getBulkOrdinals(FacetLabel... categoryPaths)
Returns the ordinals of the categories given as a path.
getBulkPath(int... ordinals)
Returns an array of FacetLabels for a given array of ordinals.
getChildResources()
Returns nested resources of this class.
getCommitUserData()
Retrieve user committed data.
getInternalIndexReader()
Expert: returns the underlying DirectoryReader instance that is used by this TaxonomyReader.
int getOrdinal(FacetLabel cp)
Returns the ordinal of the category given as a path.
getParallelTaxonomyArrays()
Returns a ParallelTaxonomyArrays object which can be used to efficiently traverse the taxonomy tree.
getPath(int ordinal)
Returns the path name of the category with the given ordinal.
private FacetLabel[] getPathFromCache(int... ordinals)
int getSize()
Returns the number of categories in the taxonomy.
protected DirectoryReader openIndexReader(IndexWriter writer)
Open the DirectoryReader from this IndexWriter.
protected DirectoryReader openIndexReader(Directory directory)
Open the DirectoryReader from this Directory.
long ramBytesUsed()
Return the memory usage of this object in bytes.
void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(FacetLabel).
toString(int max)
Returns ordinal -> label mapping, up to the provided max ordinal or number of ordinals, whichever is smaller.
Methods inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
close, decRef, ensureOpen, getChildren, getOrdinal, getRefCount, incRef, openIfChanged, tryIncRef
-
Field Details
-
DEFAULT_CACHE_VALUE
private static final int DEFAULT_CACHE_VALUE
-
BYTES_PER_CACHE_ENTRY
private static final long BYTES_PER_CACHE_ENTRY -
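taxoWriter
private final DirectoryTaxonomyWriter taxoWriter
-
taxoEpoch
private final long taxoEpoch
-
indexReader
private final DirectoryReader indexReader
-
ordinalCache
private LRUHashMap<FacetLabel, Integer> ordinalCache
-
categoryCache
private LRUHashMap<Integer, FacetLabel> categoryCache
-
taxoArrays
private TaxonomyIndexArrays taxoArrays
-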
-
Constructor Details
-
DirectoryTaxonomyReader
DirectoryTaxonomyReader(DirectoryReader indexReader, DirectoryTaxonomyWriter taxoWriter, LRUHashMap<FacetLabel, Integer> ordinalCache, LRUHashMap<Integer, FacetLabel> categoryCache, TaxonomyIndexArrays taxoArrays) throws IOException
Expert: Use this method to explicitly force the DirectoryTaxonomyReader to use specific parent/children arrays and caches.
Called from doOpenIfChanged(). If the taxonomy has been recreated, you should pass null as the caches and parent/children arrays.
- Parameters:
indexReader - An indexReader that is opened in the desired Directory
taxoWriter - The DirectoryTaxonomyWriter from which to obtain newly added categories, in real-time.
ordinalCache - a FacetLabel to Integer ordinal mapping if it already exists
categoryCache - an ordinal to FacetLabel mapping if it already exists
taxoArrays - taxonomy arrays that store the parent, siblings, children information
- Throws:
IOException
-
DirectoryTaxonomyReader
Open for reading a taxonomy stored in a given Directory.
- Parameters:
directory - The Directory in which the taxonomy resides.
- Throws:
CorruptIndexException - if the Taxonomy is corrupt.
IOException - if another error occurred.
-
DirectoryTaxonomyReader
Opens a DirectoryTaxonomyReader over the given DirectoryTaxonomyWriter (for NRT).
- Parameters:
taxoWriter - The DirectoryTaxonomyWriter from which to obtain newly added categories, in real-time.
- Throws:
IOException
-
-
Method Details
-
doClose
Description copied from class: TaxonomyReader
Performs the actual task of closing the resources that are used by the taxonomy reader.
- Specified by:
doClose in class TaxonomyReader
- Throws:
IOException
-
doOpenIfChanged
Implements the opening of a new DirectoryTaxonomyReader instance if the taxonomy has changed.
NOTE: the returned DirectoryTaxonomyReader shares the ordinal and category caches with this reader. This is not expected to cause any issues, unless the two instances continue to live. The reader guarantees that the two instances cannot affect each other in terms of correctness of the caches; however, if the size of the cache is changed through setCacheSize(int), it will affect both reader instances.
- Specified by:
doOpenIfChanged in class TaxonomyReader
- Throws:
IOException
-
openIndexReader
Open the DirectoryReader from this Directory.
- Throws:
IOException
-
openIndexReader
Open the DirectoryReader from this IndexWriter.
- Throws:
IOException
-
getInternalIndexReader
Expert: returns the underlying DirectoryReader instance that is used by this TaxonomyReader.
-
getParallelTaxonomyArrays
Description copied from class: TaxonomyReader
Returns a ParallelTaxonomyArrays object which can be used to efficiently traverse the taxonomy tree.
- Specified by:
getParallelTaxonomyArrays in class TaxonomyReader
- Throws:
IOException
-
getCommitUserData
Description copied from class: TaxonomyReader
Retrieve user committed data.
- Specified by:
getCommitUserData in class TaxonomyReader
- Throws:
IOException
-
getOrdinal
Description copied from class: TaxonomyReader
Returns the ordinal of the category given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).
- Specified by:
getOrdinal in class TaxonomyReader
- Returns:
the category's ordinal or TaxonomyReader.INVALID_ORDINAL if the category wasn't found.
- Throws:
IOException
-
getBulkOrdinals
Description copied from class: TaxonomyReader
Returns the ordinals of the categories given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).
The implementation in DirectoryTaxonomyReader is generally faster than iteratively calling TaxonomyReader.getOrdinal(FacetLabel).
- Overrides:
getBulkOrdinals in class TaxonomyReader
- Returns:
array of the categories' ordinals, or TaxonomyReader.INVALID_ORDINAL if a category wasn't found.
- Throws:
IOException
-
getPath
Description copied from class: TaxonomyReader
Returns the path name of the category with the given ordinal.
- Specified by:
getPath in class TaxonomyReader
- Throws:
IOException
-
getPathFromCache
-
checkOrdinalBounds
Checks if the ordinals in the array are >= 0 and < DirectoryTaxonomyReader#indexReader.maxDoc().
- Parameters:
ordinals - Integer array of ordinals
- Throws:
IllegalArgumentException - if one of the ordinals is out of bounds
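The bounds check described above could be sketched in plain Java roughly as follows. This is only an illustration, not the Lucene implementation: the class name OrdinalBounds, the method name check, and the maxDoc parameter (standing in for the value of indexReader.maxDoc()) are assumptions made for the example.

```java
/** Hypothetical helper illustrating the bounds check; not part of Lucene. */
final class OrdinalBounds {
  private OrdinalBounds() {}

  /**
   * Verifies that every ordinal lies in the half-open range [0, maxDoc).
   * maxDoc is an assumed stand-in for indexReader.maxDoc() of the taxonomy index.
   */
  static void check(int maxDoc, int... ordinals) {
    for (int ordinal : ordinals) {
      if (ordinal < 0 || ordinal >= maxDoc) {
        throw new IllegalArgumentException(
            "ordinal " + ordinal + " is out of the range [0, " + maxDoc + ")");
      }
    }
  }

  public static void main(String[] args) {
    check(5, 0, 3, 4); // every ordinal is within bounds, so this returns normally
    try {
      check(5, 2, 5); // 5 is not smaller than maxDoc
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating the whole array before doing any lookup means a bulk call either processes all ordinals or fails up front, rather than failing midway through.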
-
getBulkPath
Returns an array of FacetLabels for a given array of ordinals.
This API is generally faster than iteratively calling getPath(int) over an array of ordinals. It uses the getPath(int) method iteratively when it detects that the index was created using StoredFields (with no performance gains) and uses DocValues based iteration when the index is based on BinaryDocValues. Lucene switched to BinaryDocValues in version 9.0.
- Overrides:
getBulkPath in class TaxonomyReader
- Parameters:
ordinals - Array of category ordinals that were added to the taxonomy index
- Throws:
IOException
-
getSize
public int getSize()
Description copied from class: TaxonomyReader
Returns the number of categories in the taxonomy. Note that the number of categories returned is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
- Specified by:
getSize in class TaxonomyReader
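To make the relationship between inserted categories, ordinals, and the reported size concrete, here is a toy in-memory sketch in plain Java. It is not how Lucene stores a taxonomy: the class ToyTaxonomy, its slash-separated string paths, and its constants are assumptions made for illustration, with INVALID_ORDINAL and ROOT_ORDINAL only playing the roles of the TaxonomyReader constants of the same name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory taxonomy for illustration only; not how Lucene stores categories. */
final class ToyTaxonomy {
  static final int INVALID_ORDINAL = -1; // "not found" sentinel used by this sketch
  static final int ROOT_ORDINAL = 0;     // the root is always the first category

  private final Map<String, Integer> ordinals = new HashMap<>();
  private final List<String> paths = new ArrayList<>();

  ToyTaxonomy() {
    addSingle(""); // the root is the empty path and receives ordinal 0
  }

  /** Assigns the next serial number to the path unless it is already known. */
  private int addSingle(String path) {
    Integer existing = ordinals.get(path);
    if (existing != null) {
      return existing;
    }
    int ordinal = paths.size();
    ordinals.put(path, ordinal);
    paths.add(path);
    return ordinal;
  }

  /** Adds a category such as "Author/Mark Twain", adding any missing ancestors first. */
  int addCategory(String path) {
    int slash = path.indexOf('/');
    while (slash >= 0) {
      addSingle(path.substring(0, slash));
      slash = path.indexOf('/', slash + 1);
    }
    return addSingle(path);
  }

  int getOrdinal(String path) {
    return ordinals.getOrDefault(path, INVALID_ORDINAL);
  }

  int getSize() {
    return paths.size();
  }

  public static void main(String[] args) {
    ToyTaxonomy taxonomy = new ToyTaxonomy();
    taxonomy.addCategory("Author/Mark Twain");
    taxonomy.addCategory("Author/Jane Austen");
    // Two categories were inserted explicitly; the root and "Author" were added implicitly.
    System.out.println("size = " + taxonomy.getSize());
    System.out.println("unknown = " + taxonomy.getOrdinal("Publisher"));
  }
}
```

In this sketch, inserting two leaf categories yields a size of four, because the root and the shared parent each take an ordinal as well.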
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
-
setCacheSize
public void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(FacetLabel).
Currently, if the given size is smaller than the current size of a cache, it will not shrink, and rather will be limited to its current size.
- Parameters:
size - the new maximum cache size, in number of entries.
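One way the "does not shrink" behavior can arise is shown in the sketch below, built on the JDK's LinkedHashMap. It is an assumption-laden illustration rather than Lucene's LRUHashMap: the class ResizableLruMap and its setMaxSize method are hypothetical names. Because eviction happens only on insertion and removes at most one entry, lowering the limit below the current size leaves the map at its current size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU map with an adjustable limit; a sketch, not Lucene's LRUHashMap. */
final class ResizableLruMap<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private int maxSize;

  ResizableLruMap(int maxSize) {
    super(16, 0.75f, true); // accessOrder=true: iteration runs from least to most recently used
    this.maxSize = maxSize;
  }

  /** Changes the limit without evicting anything right away. */
  void setMaxSize(int maxSize) {
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Consulted after each insertion; at most the least recently used entry is dropped.
    return size() > maxSize;
  }

  public static void main(String[] args) {
    ResizableLruMap<Integer, String> cache = new ResizableLruMap<>(4);
    for (int i = 0; i < 4; i++) {
      cache.put(i, "category-" + i);
    }
    cache.setMaxSize(2);         // smaller than the current size: nothing is evicted yet
    cache.put(4, "category-4");  // one insertion evicts one eldest entry
    System.out.println("size after shrink request = " + cache.size());
  }
}
```

A cache that had to shrink immediately would instead need to loop in setMaxSize and remove eldest entries until the new limit is met.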
-
toString
Returns ordinal -> label mapping, up to the provided max ordinal or number of ordinals, whichever is smaller.
-