Package org.apache.lucene.util.fst
Class FSTCompiler<T>
- java.lang.Object
  - org.apache.lucene.util.fst.FSTCompiler<T>
public class FSTCompiler<T> extends java.lang.Object

Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).

NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698

The parameterized type T is the output type. See the subclasses of Outputs.

FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible; however, they cannot be packed.
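To make the construction described above more concrete, here is a minimal sketch in plain Java 9+ (no Lucene dependency) of building a minimal acyclic automaton from sorted inputs, restricted to the FSA case (no outputs). It is not FSTCompiler's implementation: the class MinimalFsaSketch, its State type and its string-keyed register are hypothetical simplifications, and the names frontier, freezeTail and compile are borrowed from this page only to suggest where those roles sit. Each new input shares a prefix with the previous one; nodes past that prefix can never gain arcs again, so they are frozen deepest-first and replaced by an already frozen, structurally identical node when one exists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch, not Lucene's code: minimal acyclic FSA (no outputs) from sorted inputs. */
class MinimalFsaSketch {
  static final class State {
    final int id;
    boolean isFinal;
    final TreeMap<Integer, State> arcs = new TreeMap<>(); // label -> target, sorted by label

    State(int id) { this.id = id; }

    /** Structural key; children are already frozen and canonical, so their ids identify them. */
    String key() {
      StringBuilder sb = new StringBuilder(isFinal ? "F" : "N");
      for (Map.Entry<Integer, State> e : arcs.entrySet()) {
        sb.append(',').append(e.getKey()).append("->").append(e.getValue().id);
      }
      return sb.toString();
    }
  }

  private final Map<String, State> register = new HashMap<>(); // frozen states, deduplicated
  private final List<State> frontier = new ArrayList<>();     // unfrozen path of the last input
  private int[] lastInput;
  private int nextId;

  MinimalFsaSketch() { frontier.add(new State(nextId++)); }

  void add(int[] input) {
    if (lastInput != null) {
      int cmp = Arrays.compare(lastInput, input); // signed, lexicographic
      if (cmp > 0) throw new IllegalArgumentException("inputs must be added in sorted order");
      if (cmp == 0) return; // a repeated input changes nothing when there are no outputs
    }
    int prefix = 0; // length of the prefix shared with the previous input
    while (lastInput != null && prefix < lastInput.length && prefix < input.length
        && lastInput[prefix] == input[prefix]) {
      prefix++;
    }
    freezeTail(prefix); // nodes past the shared prefix can no longer gain arcs
    for (int i = prefix; i < input.length; i++) {
      State next = new State(nextId++);
      frontier.get(i).arcs.put(input[i], next);
      frontier.add(next);
    }
    frontier.get(input.length).isFinal = true;
    lastInput = input.clone();
  }

  /** Freezes frontier nodes deeper than prefix, deepest first, reusing an equal frozen node if any. */
  private void freezeTail(int prefix) {
    for (int i = frontier.size() - 1; i > prefix; i--) {
      State node = frontier.remove(i);
      State canonical = register.computeIfAbsent(node.key(), k -> node);
      State parent = frontier.get(i - 1);
      parent.arcs.put(parent.arcs.lastKey(), canonical); // sorted input: newest arc has largest label
    }
  }

  /** Returns the root, or null if nothing is accepted. */
  State compile() {
    freezeTail(0);
    State root = frontier.get(0);
    return (root.isFinal || !root.arcs.isEmpty()) ? root : null;
  }

  static boolean accepts(State root, int[] input) {
    State s = root;
    for (int label : input) {
      s = s.arcs.get(label);
      if (s == null) return false;
    }
    return s.isFinal;
  }

  static int countStates(State root) {
    Set<State> seen = new HashSet<>(); // State keeps Object.equals, so this is identity-based
    Deque<State> todo = new ArrayDeque<>(List.of(root));
    while (!todo.isEmpty()) {
      State s = todo.poll();
      if (seen.add(s)) todo.addAll(s.arcs.values());
    }
    return seen.size();
  }

  public static void main(String[] args) {
    MinimalFsaSketch b = new MinimalFsaSketch();
    for (String w : new String[] {"cat", "cats", "dog", "dogs"}) b.add(w.codePoints().toArray());
    System.out.println(countStates(b.compile())); // prints 7 (a plain trie would have 9 states)
  }
}
```

For cat, cats, dog and dogs, the nodes reached after "cat" and after "dog" are both accepting with a single arc on s to an accepting leaf, so they collapse into one shared suffix; that sharing is what makes the result minimal rather than a trie.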
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
- (package private) static class FSTCompiler.Arc<T>
  Expert: holds a pending (seen but not yet serialized) arc.
- static class FSTCompiler.Builder<T>
  Fluent-style constructor for FST FSTCompiler.
- (package private) static class FSTCompiler.CompiledNode
- (package private) static class FSTCompiler.FixedLengthArcsBuffer
  Reusable buffer for building nodes with fixed length arcs (binary search or direct addressing).
- (package private) static interface FSTCompiler.Node
- (package private) static class FSTCompiler.UnCompiledNode<T>
  Expert: holds a pending (seen but not yet serialized) Node.
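FixedLengthArcsBuffer is described only as a buffer for nodes with fixed length arcs (binary search or direct addressing). The sketch below is a hypothetical, much-simplified model, not Lucene's node encoding, that contrasts the two lookup strategies that phrase names: binary search over a node's sorted labels, and a direct-addressing table with one slot per label in the node's label range. The oversizing() helper is an illustrative measure of the empty slots direct addressing pays for; it is an assumption, since this page does not say how directAddressingMaxOversizingFactor is actually applied.

```java
import java.util.Arrays;

/** Hypothetical sketch: finding an arc index by label in a node with fixed-length arcs. */
class FixedLengthArcLookupSketch {
  /** Binary search: arcs stored densely, sorted by label; O(log n) per lookup. */
  static int findBinarySearch(int[] sortedLabels, int label) {
    int idx = Arrays.binarySearch(sortedLabels, label);
    return idx >= 0 ? idx : -1; // -1 means "no arc with this label"
  }

  /** Direct addressing: one slot per label in [first, last]; O(1) lookup, but gaps cost space. */
  static int[] buildDirectTable(int[] sortedLabels) {
    int first = sortedLabels[0];
    int last = sortedLabels[sortedLabels.length - 1];
    int[] table = new int[last - first + 1];
    Arrays.fill(table, -1); // -1 marks a label with no arc
    for (int i = 0; i < sortedLabels.length; i++) {
      table[sortedLabels[i] - first] = i;
    }
    return table;
  }

  static int findDirect(int[] table, int firstLabel, int label) {
    int slot = label - firstLabel;
    return (slot < 0 || slot >= table.length) ? -1 : table[slot];
  }

  /** Slots allocated per real arc: 1.0 means no gaps in the label range. */
  static double oversizing(int[] sortedLabels) {
    int range = sortedLabels[sortedLabels.length - 1] - sortedLabels[0] + 1;
    return (double) range / sortedLabels.length;
  }
}
```

For labels a, c and d, direct addressing allocates four slots for three arcs (one empty slot for b), trading that space for a lookup that needs no search.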
-
Field Summary
Fields (Modifier and Type, Field)
- (package private) boolean allowFixedLengthArcs
- (package private) long arcCount
- (package private) long binarySearchNodeCount
- (package private) BytesStore bytes
- private NodeHash<T> dedupHash
- (package private) static float DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR
- (package private) long directAddressingExpansionCredit
- (package private) float directAddressingMaxOversizingFactor
- (package private) long directAddressingNodeCount
- private boolean doShareNonSingletonNodes
- (package private) FSTCompiler.FixedLengthArcsBuffer fixedLengthArcsBuffer
- private FSTCompiler.UnCompiledNode<T>[] frontier
- (package private) FST<T> fst
- (package private) long lastFrozenNode
- private IntsRefBuilder lastInput
- private int minSuffixCount1
- private int minSuffixCount2
- private T NO_OUTPUT
- (package private) long nodeCount
- (package private) int[] numBytesPerArc
- (package private) int[] numLabelBytesPerArc
- private int shareMaxTailLength
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
- private FSTCompiler(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, boolean allowFixedLengthArcs, int bytesPageBits, float directAddressingMaxOversizingFactor)
- public FSTCompiler(FST.INPUT_TYPE inputType, Outputs<T> outputs)
  Instantiates an FST/FSA builder with default settings and pruning options turned off.
-
Method Summary
Methods (Modifier and Type, Method, Description)
- void add(IntsRef input, T output)
  Add the next input/output pair.
- FST<T> compile()
  Returns final FST.
- private void compileAllTargets(FSTCompiler.UnCompiledNode<T> node, int tailLength)
- private FSTCompiler.CompiledNode compileNode(FSTCompiler.UnCompiledNode<T> nodeIn, int tailLength)
- private void freezeTail(int prefixLenPlus1)
- long fstRamBytesUsed()
- long getArcCount()
- float getDirectAddressingMaxOversizingFactor()
- long getMappedStateCount()
- long getNodeCount()
- long getTermCount()
- private boolean validOutput(T output)
-
Field Detail
-
DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR
static final float DIRECT_ADDRESSING_MAX_OVERSIZING_FACTOR
- See Also:
- Constant Field Values
-
NO_OUTPUT
private final T NO_OUTPUT
-
minSuffixCount1
private final int minSuffixCount1
-
minSuffixCount2
private final int minSuffixCount2
-
doShareNonSingletonNodes
private final boolean doShareNonSingletonNodes
-
shareMaxTailLength
private final int shareMaxTailLength
-
lastInput
private final IntsRefBuilder lastInput
-
frontier
private FSTCompiler.UnCompiledNode<T>[] frontier
-
lastFrozenNode
long lastFrozenNode
-
numBytesPerArc
int[] numBytesPerArc
-
numLabelBytesPerArc
int[] numLabelBytesPerArc
-
fixedLengthArcsBuffer
final FSTCompiler.FixedLengthArcsBuffer fixedLengthArcsBuffer
-
arcCount
long arcCount
-
nodeCount
long nodeCount
-
binarySearchNodeCount
long binarySearchNodeCount
-
directAddressingNodeCount
long directAddressingNodeCount
-
allowFixedLengthArcs
final boolean allowFixedLengthArcs
-
directAddressingMaxOversizingFactor
final float directAddressingMaxOversizingFactor
-
directAddressingExpansionCredit
long directAddressingExpansionCredit
-
bytes
final BytesStore bytes
-
Constructor Detail
-
FSTCompiler
public FSTCompiler(FST.INPUT_TYPE inputType, Outputs<T> outputs)
Instantiates an FST/FSA builder with default settings and pruning options turned off. For more tuning and tweaking, see FSTCompiler.Builder.
-
FSTCompiler
private FSTCompiler(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, boolean allowFixedLengthArcs, int bytesPageBits, float directAddressingMaxOversizingFactor)
-
Method Detail
-
getDirectAddressingMaxOversizingFactor
public float getDirectAddressingMaxOversizingFactor()
-
getTermCount
public long getTermCount()
-
getNodeCount
public long getNodeCount()
-
getArcCount
public long getArcCount()
-
getMappedStateCount
public long getMappedStateCount()
-
compileNode
private FSTCompiler.CompiledNode compileNode(FSTCompiler.UnCompiledNode<T> nodeIn, int tailLength) throws java.io.IOException
- Throws:
java.io.IOException
-
freezeTail
private void freezeTail(int prefixLenPlus1) throws java.io.IOException
- Throws:
java.io.IOException
-
add
public void add(IntsRef input, T output) throws java.io.IOException
Add the next input/output pair. The provided input must be sorted after the previous one according to IntsRef.compareTo(org.apache.lucene.util.IntsRef). It's also OK to add the same input twice in a row with different outputs, as long as Outputs implements the Outputs.merge(T, T) method. Note that input is fully consumed after this method returns (so the caller is free to reuse it), but output is not. So if your outputs are changeable (e.g. ByteSequenceOutputs or IntSequenceOutputs), then you cannot reuse them across calls.
- Throws:
java.io.IOException
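The contract of add(IntsRef, T) can be modelled in a few lines of plain Java 9+ without Lucene types. In this hypothetical sketch, AddContractSketch stores int[] inputs with their outputs, rejects an input that sorts before the previous one, passes a repeated input to a caller-supplied BinaryOperator standing in for Outputs.merge(T, T), copies the input, and keeps the output by reference. The ordering is assumed to mirror IntsRef.compareTo: signed and lexicographic, with a prefix sorting before its extensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical model of the add() contract; plain Java 9+, no Lucene types. */
class AddContractSketch<T> {
  final List<int[]> inputs = new ArrayList<>();
  final List<T> outputs = new ArrayList<>();
  private final BinaryOperator<T> merge; // stand-in for Outputs.merge; null means unsupported

  AddContractSketch(BinaryOperator<T> merge) {
    this.merge = merge;
  }

  void add(int[] input, T output) {
    if (!inputs.isEmpty()) {
      int[] last = inputs.get(inputs.size() - 1);
      int cmp = Arrays.compare(last, input); // signed, lexicographic; a prefix sorts first
      if (cmp > 0) {
        throw new IllegalArgumentException("out of order: " + Arrays.toString(input));
      }
      if (cmp == 0) { // same input twice in a row: merge the two outputs
        if (merge == null) {
          throw new UnsupportedOperationException("repeated input but no merge function");
        }
        int i = outputs.size() - 1;
        outputs.set(i, merge.apply(outputs.get(i), output));
        return;
      }
    }
    inputs.add(input.clone()); // input fully consumed: the caller may reuse its array
    outputs.add(output);       // output kept by reference: a mutable output must not be reused
  }
}
```

With int[] outputs, mutating an array after passing it to add() silently changes the stored output, which is why mutable outputs must not be reused across calls; mutating a reused input array has no such effect because the sketch copied it.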
-
validOutput
private boolean validOutput(T output)
-
compile
public FST<T> compile() throws java.io.IOException
Returns final FST. NOTE: this will return null if nothing is accepted by the FST.
- Throws:
java.io.IOException
-
compileAllTargets
private void compileAllTargets(FSTCompiler.UnCompiledNode<T> node, int tailLength) throws java.io.IOException
- Throws:
java.io.IOException
-
fstRamBytesUsed
public long fstRamBytesUsed()
-