java.lang.Object
org.apache.lucene.search.SimpleCollector
org.apache.lucene.facet.FacetsCollector
org.apache.lucene.facet.RandomSamplingFacetsCollector
- All Implemented Interfaces:
Collector, LeafCollector
- Direct Known Subclasses:
RandomSamplingFacetsCollector.ReducedRandomSamplingFacetsCollector
Collects hits for subsequent faceting, using sampling if needed. Once you've run a search and
collected hits into this collector, instantiate one of the
Facets subclasses to do the facet
counting. Note that this collector does not collect the scores of matching docs (i.e. FacetsCollector.MatchingDocs.scores() is null).
If you require the original set of hits, you can call getOriginalMatchingDocs().
Also, since the counts of the top facets are based on the sampled set, you can amortize the counts
by calling amortizeFacetCounts(org.apache.lucene.facet.FacetResult, org.apache.lucene.facet.FacetsConfig, org.apache.lucene.search.IndexSearcher).
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
private static class RandomSamplingFacetsCollector.ReducedRandomSamplingFacetsCollector
private static class RandomSamplingFacetsCollector.XORShift64Random
Faster alternative for java.util.Random, inspired by http://dmurphy747.wordpress.com/2011/03/23/xorshift-vs-random-performance-in-java/
Nested classes/interfaces inherited from class org.apache.lucene.facet.FacetsCollector
FacetsCollector.MatchingDocs
-
Field Summary
Fields (Modifier and Type, Field):
private int leftoverBin
private int leftoverIndex
private static final int NOT_CALCULATED
private final RandomSamplingFacetsCollector.XORShift64Random random
private List<FacetsCollector.MatchingDocs> sampledDocs
private final int sampleSize
private double samplingRate
private int totalHits
-
Constructor Summary
Constructors (Constructor, Description):
RandomSamplingFacetsCollector(int sampleSize)
Constructor with the given sample size and default seed.
RandomSamplingFacetsCollector(int sampleSize, long seed)
Constructor with the given sample size and seed.
-
Method Summary
Methods (Modifier and Type, Method, Description):
FacetResult amortizeFacetCounts(FacetResult res, FacetsConfig config, IndexSearcher searcher)
Note: if you use a counting Facets implementation, you can amortize the sampled counts by calling this method.
static CollectorManager<RandomSamplingFacetsCollector,RandomSamplingFacetsCollector> createManager(int sampleSize, long seed)
Creates a CollectorManager for concurrent random sampling through RandomSamplingFacetsCollector.
private FacetsCollector.MatchingDocs createSample
Creates a sample of the given hits.
private List<FacetsCollector.MatchingDocs> createSampledDocs(List<FacetsCollector.MatchingDocs> matchingDocsList)
Creates a sampled copy of the matching documents list.
getMatchingDocs()
Returns the sampled list of the matching documents.
getOriginalMatchingDocs()
Returns the original matching documents.
double getSamplingRate()
Returns the sampling rate that was used.
Methods inherited from class org.apache.lucene.facet.FacetsCollector
collect, doSetNextReader, finish, getKeepScores, scoreMode, setScorer
Methods inherited from class org.apache.lucene.search.SimpleCollector
getLeafCollector
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.search.LeafCollector
collect, competitiveIterator
-
Field Details
-
NOT_CALCULATED
private static final int NOT_CALCULATED
-
sampleSize
private final int sampleSize -
random
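The nested class summary describes the type of this field, XORShift64Random, as a faster alternative to java.util.Random. As a rough illustration of what an xorshift generator looks like, here is a minimal sketch. The class name XorShift64Sketch, the shift constants (21, 35, 4), the fallback seed constant, and the nextInt helper are assumptions made for this example; the page does not state which constants or methods the private class actually uses.

```java
/** Hypothetical 64-bit xorshift generator; not the library's private class. */
final class XorShift64Sketch {
    private long state;

    XorShift64Sketch(long seed) {
        // An xorshift state of 0 would stay 0 forever, so substitute a nonzero value.
        // Assumption: the real collector picks its own seed when 0 is passed.
        this.state = (seed == 0L) ? 0x9E3779B97F4A7C15L : seed;
    }

    /** Advances the state with three shift-and-xor steps and returns it. */
    long nextLong() {
        state ^= state << 21;
        state ^= state >>> 35;
        state ^= state << 4;
        return state;
    }

    /** Returns a value in [0, bound); bound must be positive. */
    int nextInt(int bound) {
        return (int) Math.floorMod(nextLong(), (long) bound);
    }

    public static void main(String[] args) {
        XorShift64Sketch random = new XorShift64Sketch(42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(random.nextInt(100));
        }
    }
}
```

Two generators built from the same nonzero seed produce the same sequence, which is what makes a seeded sample reproducible.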
-
samplingRate
private double samplingRate -
sampledDocs
-
totalHits
private int totalHits -
leftoverBin
private int leftoverBin -
leftoverIndex
private int leftoverIndex
-
-
Constructor Details
-
RandomSamplingFacetsCollector
public RandomSamplingFacetsCollector(int sampleSize)
Constructor with the given sample size and default seed.
-
RandomSamplingFacetsCollector
public RandomSamplingFacetsCollector(int sampleSize, long seed)
Constructor with the given sample size and seed.
- Parameters:
sampleSize - The preferred sample size. If the number of hits is greater than the sample size, sampling is done using a sampling ratio of sample size / totalN. For example: 1000 hits with a sample size of 10 results in a sampling ratio of 0.01. If the number of hits is lower, no sampling is done at all.
seed - The random seed. If 0, a seed will be chosen for you.
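The arithmetic described for sampleSize can be sketched as follows. This is an illustration only: the class and method names are hypothetical, returning 1.0 to mean "no sampling" is a convention chosen for this sketch (the page does not say what getSamplingRate() reports in that case), and the behavior when the hit count exactly equals the sample size is an assumption.

```java
/** Hypothetical helper that mirrors the documented ratio: sample size / total hits. */
final class SamplingRateSketch {

    static double samplingRate(int sampleSize, int totalHits) {
        // Sampling only applies when there are more hits than the preferred sample size.
        if (totalHits <= sampleSize) {
            return 1.0; // assumption: 1.0 stands for "every hit is kept"
        }
        return (double) sampleSize / totalHits;
    }

    public static void main(String[] args) {
        // The documented example: 1000 hits with a sample size of 10.
        System.out.println(samplingRate(10, 1000)); // prints 0.01
        // Fewer hits than the sample size: no sampling.
        System.out.println(samplingRate(10, 5));    // prints 1.0
    }
}
```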
-
-
Method Details
-
getMatchingDocs
Returns the sampled list of the matching documents. Note that a FacetsCollector.MatchingDocs instance is returned per segment, even if no hits from that segment are included in the sampled set.
Note: One or more of the MatchingDocs might be empty (not containing any hits) as a result of sampling.
Note: MatchingDocs.totalHits is copied from the original MatchingDocs; scores is set to null.
- Overrides:
getMatchingDocs in class FacetsCollector
-
getOriginalMatchingDocs
Returns the original matching documents. -
createSampledDocs
private List<FacetsCollector.MatchingDocs> createSampledDocs(List<FacetsCollector.MatchingDocs> matchingDocsList) Create a sampled copy of the matching documents list. -
createSample
Creates a sample of the given hits.
-
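The page does not describe how createSample selects hits. One common way to draw a roughly uniform sample at a fixed rate is to split the hits into equal-sized bins and pick one random hit per bin; the sketch below shows that idea under stated assumptions. The class name, the method signature, and the use of java.util.Random are all hypothetical. The leftoverBin and leftoverIndex fields suggest the real class carries partial bins from one segment to the next, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical bin-based sampler; a sketch, not the collector's private method. */
final class BinSamplingSketch {

    /** Picks one random element from each consecutive bin of binSize doc ids. */
    static List<Integer> sample(int[] docIds, int binSize, Random random) {
        List<Integer> sampled = new ArrayList<>();
        for (int start = 0; start < docIds.length; start += binSize) {
            // The last bin may be shorter than binSize.
            int end = Math.min(start + binSize, docIds.length);
            int pick = start + random.nextInt(end - start);
            sampled.add(docIds[pick]);
        }
        return sampled;
    }

    public static void main(String[] args) {
        int[] docIds = new int[1000];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        // A sampling ratio of 0.01 corresponds to bins of 100 hits.
        System.out.println(sample(docIds, 100, new Random(7L)));
    }
}
```

With 1000 hits and a bin size of 100, the sketch returns 10 doc ids, one from each block of 100.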
amortizeFacetCounts
public FacetResult amortizeFacetCounts(FacetResult res, FacetsConfig config, IndexSearcher searcher) throws IOException
Note: if you use a counting Facets implementation, you can amortize the sampled counts by calling this method. Uses the FacetsConfig and the IndexSearcher to determine the upper bound for each facet value.
- Throws:
IOException
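To make the idea of amortizing concrete, here is a minimal sketch of scaling a sampled count back up and capping it at an upper bound. It assumes that amortizing means dividing the sampled count by the sampling rate; the class name, method name, and the truncation to int are choices made for this example. In the real method the upper bound comes from the FacetsConfig and the IndexSearcher, whereas here it is simply passed in.

```java
/** Hypothetical illustration of amortizing one sampled facet count. */
final class AmortizeSketch {

    /**
     * Scales a count observed in the sample to an estimate for the full hit set,
     * never exceeding the supplied upper bound for that facet value.
     */
    static int amortize(int sampledCount, double samplingRate, int upperBound) {
        int estimated = (int) (sampledCount / samplingRate);
        return Math.min(upperBound, estimated);
    }

    public static void main(String[] args) {
        // 30 sampled hits at a sampling rate of 0.25 suggest about 120 hits overall.
        System.out.println(amortize(30, 0.25, 1000)); // prints 120
        // If only 100 documents can carry this facet value, the estimate is capped.
        System.out.println(amortize(30, 0.25, 100));  // prints 100
    }
}
```

The cap matters because a scaled-up estimate can otherwise exceed the number of documents that actually carry the facet value.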
-
getSamplingRate
public double getSamplingRate()
Returns the sampling rate that was used.
-
createManager
public static CollectorManager<RandomSamplingFacetsCollector,RandomSamplingFacetsCollector> createManager(int sampleSize, long seed)
Creates a CollectorManager for concurrent random sampling through RandomSamplingFacetsCollector.
-