java.lang.Object
org.apache.lucene.util.automaton.StringsToAutomaton
Builds a minimal, deterministic
Automaton that accepts a set of strings using the
algorithm described in Incremental Construction
of Minimal Acyclic Finite-State Automata by Daciuk, Mihov, Watson and Watson. This requires
sorted input data, but is very fast (nearly linear with the input size). Also offers the ability
to directly build a binary Automaton representation. Users should access this
functionality through Automata static methods.- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static final classDFSA state withcharlabels on transitions. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate BytesRefBuilderUsed for input order checking (only through assertions right now)private final StringsToAutomaton.StateRoot automaton state.A "registry" for state interning. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprivate void(package private) static AutomatonBuild a minimal, deterministic automaton from a sorted list ofBytesRefrepresenting strings in UTF-8.(package private) static Automatonbuild(BytesRefIterator input, boolean asBinary) Build a minimal, deterministic automaton from a sorted list ofBytesRefrepresenting strings in UTF-8.private AutomatonCalled after adding all terms.private static intconvert(Automaton.Builder a, StringsToAutomaton.State s, IdentityHashMap<StringsToAutomaton.State, Integer> visited) Internal recursive traversal for conversion.private voidReplace last child ofstatewith an already registered state or stateRegistry the last child state.private booleansetPrevious(BytesRef current) Copycurrentinto an internal buffer.
-
Field Details
-
stateRegistry
A "registry" for state interning. -
root
Root automaton state. -
previous
Used for input order checking (only through assertions right now)
-
-
Constructor Details
-
StringsToAutomaton
private StringsToAutomaton()The default constructor is private. Use static methods directly.
-
-
Method Details
-
setPrevious
Copycurrentinto an internal buffer. -
convert
private static int convert(Automaton.Builder a, StringsToAutomaton.State s, IdentityHashMap<StringsToAutomaton.State, Integer> visited) Internal recursive traversal for conversion. -
completeAndConvert
Called after adding all terms. Performs final minimization and converts to a standardAutomatoninstance. -
build
-
build
Build a minimal, deterministic automaton from a sorted list ofBytesRefrepresenting strings in UTF-8. These strings must be binary-sorted. Creates anAutomatonwith either UTF-8 codepoints as transition labels or binary (compiled) transition labels based onasBinary.- Throws:
IOException
-
add
-
replaceOrRegister
Replace last child ofstatewith an already registered state or stateRegistry the last child state.
-