http://java.dzone.com/articles/custom-security-filtering-solr
Post filtering
Even when caching is disabled, a filter's document set is by default generated in advance, independently of the main query. In some cases generating that filter set is prohibitively expensive. One example is access control filtering that must take the user's query context into account to decide which documents may be returned. Ideally, only matching documents (those that match the query and the straightforward filters) should be evaluated for access control. It is wasteful to evaluate documents that would not match anyway.
- Documents have an “access control list” associated with them, specifying allowed and disallowed users as well as allowed and disallowed groups.
- The access control list is an ordered list of allowed/disallowed users and groups. Order matters: the first matching rule determines access.
- If no rule allowing access matches, the document is not allowed.
- Solr only runs a PostFilter as a post filter when caching is disabled for it and its cost is >= 100, which is why the getCost method is written to return a cost of at least 100.
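The rules above, and the reason post filtering saves work, can be modeled outside Solr. The following is a minimal, self-contained sketch that does not use Solr's PostFilter or DelegatingCollector API; the record and method names (AclRule, Doc, isAllowed, search) are hypothetical. It evaluates an ordered ACL with first-match-wins and default deny, runs the cheap query first so the ACL check only touches matching documents, and encodes the cache=false / cost >= 100 condition as a predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical, self-contained model of the ACL rules; it does not use Solr's PostFilter API.
final class AclPostFilterSketch {

    // One ACL entry: allow or deny, applied to a user or to a group.
    record AclRule(boolean allow, boolean isGroup, String principal) {}

    // A toy document: id, searchable text, and its ordered ACL.
    record Doc(int id, String text, List<AclRule> acl) {}

    // Search output plus the number of ACL evaluations that were performed.
    record Result(List<Integer> hits, int aclChecks) {}

    // First matching rule decides; if no rule matches, access is denied.
    static boolean isAllowed(List<AclRule> acl, String user, Set<String> groups) {
        for (AclRule rule : acl) {
            boolean matches = rule.isGroup()
                    ? groups.contains(rule.principal())
                    : rule.principal().equals(user);
            if (matches) {
                return rule.allow();
            }
        }
        return false;
    }

    // Mirrors Solr's condition for a PostFilter: used as a post filter only when cache=false and cost >= 100.
    static boolean runsAsPostFilter(boolean cache, int cost) {
        return !cache && cost >= 100;
    }

    // Runs the cheap query first and the expensive ACL check only on documents that already match.
    static Result search(List<Doc> docs, Predicate<Doc> query, String user, Set<String> groups) {
        List<Integer> hits = new ArrayList<>();
        int aclChecks = 0;
        for (Doc doc : docs) {
            if (!query.test(doc)) {
                continue; // non-matching documents never reach the ACL check
            }
            aclChecks++;
            if (isAllowed(doc.acl(), user, groups)) {
                hits.add(doc.id());
            }
        }
        return new Result(hits, aclChecks);
    }
}
```

With three documents of which only two match the query, search performs two ACL evaluations rather than three; in a real index the saving grows with the number of documents the query rules out.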
In this implementation, the access control rules are specified entirely on each document, in the acl field. To filter efficiently by these rules at query time, Lucene's FieldCache is used. Building the FieldCache data structure has an upfront cost in time and RAM, in exchange for fast access at query time. Whenever FieldCache is used (sorting, some faceting implementations, function queries, and this custom query parser), it is wise to configure appropriate warming queries so that the FieldCache entries are built at commit time rather than making end users wait longer at query time.
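The trade-off described above (a one-time build cost, then cheap per-document lookups) can be illustrated with a small stand-in. This is a hypothetical sketch, not Lucene's FieldCache API: the class, the slowAclLookup function, and the slowReads counter are assumptions made for illustration, and the class is not thread-safe.

```java
import java.util.function.IntFunction;

// Hypothetical stand-in for Lucene's FieldCache, not its real API: one value per
// document id, built in a single pass and then read with a plain array lookup.
// Not thread-safe; a sketch only.
final class FieldCacheSketch {
    private final int maxDoc;
    private final IntFunction<String> slowAclLookup; // e.g. reading stored fields per document
    private String[] values; // null until built
    private int slowReads = 0; // how many slow per-document reads have happened

    FieldCacheSketch(int maxDoc, IntFunction<String> slowAclLookup) {
        this.maxDoc = maxDoc;
        this.slowAclLookup = slowAclLookup;
    }

    // Builds the whole array once; later calls are no-ops.
    // Calling this from a warming query moves the cost to commit time.
    void warm() {
        if (values != null) {
            return;
        }
        String[] built = new String[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            built[doc] = slowAclLookup.apply(doc);
            slowReads++;
        }
        values = built;
    }

    // Without prior warming, the first caller (an end user's query) pays the build cost.
    String acl(int doc) {
        warm();
        return values[doc];
    }

    int slowReads() {
        return slowReads;
    }
}
```

After warm() has run once, any number of acl(doc) calls cost an array read each. In Solr itself, warming queries are typically configured as newSearcher / firstSearcher event listeners in solrconfig.xml.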
To make the results easy to present, a quick and dirty Velocity template, ids.vm, was added to the conf/velocity directory:
Finally, let's look at the results, using the base request http://localhost:8983/solr/select?q=*:*&wt=velocity&v.template=ids:
https://trello.com/c/5z5PpR4r/50-design-solr-document-level-security-filter-solution
Document Level Security
ManifoldCF (Connector Framework)
One way to add document level security to your search is through Apache ManifoldCF. ManifoldCF "defines a security model for target repositories that permits them to enforce source-repository security policies".
It works by adding security tokens from the source repositories as metadata on the indexed documents. Then, at query time, a Search Component adds a filter to every query so that it matches only documents the logged-in user is allowed to see. ManifoldCF supports Active Directory (AD) security out of the box.
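The query-time step can be pictured as turning the user's tokens into a single filter query. The sketch below is an illustration under stated assumptions, not ManifoldCF's Search Component: the field name acl_token is hypothetical (assumed to be a multi-valued string field holding the copied tokens), and returning a match-nothing filter for a user with no tokens is a fail-closed design choice made here, not something the source specifies.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of building a token filter query; not ManifoldCF's actual component.
final class TokenFilterSketch {
    // Hypothetical field holding the security tokens copied from the source repository.
    static final String TOKEN_FIELD = "acl_token";

    // Quotes a token so characters such as '-' are not parsed as query operators.
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds an fq matching documents that carry at least one of the user's tokens.
    static String buildFilterQuery(List<String> userTokens) {
        if (userTokens.isEmpty()) {
            return "-*:*"; // no tokens: fail closed and match no documents
        }
        return userTokens.stream()
                .map(TokenFilterSketch::quote)
                .collect(Collectors.joining(" OR ", TOKEN_FIELD + ":(", ")"));
    }
}
```

For a user whose tokens are the SIDs S-1-5-32-544 and S-1-1-0, buildFilterQuery returns acl_token:("S-1-5-32-544" OR "S-1-1-0").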
Path-Based Authentication
<requestHandler name="/instock" class="solr.DisMaxRequestHandler">
  <lst name="appends">
    <str name="fq">inStock:true</str>
  </lst>
  <lst name="invariants">
    <str name="facet.field">cat</str>
  </lst>
</requestHandler>
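In a Solr request handler, "defaults" apply only when the request omits a parameter, "appends" are added to whatever the request sends, and "invariants" replace the request's values outright. The sketch below models that merge with plain maps to show why the appended fq cannot be removed by a client of /instock. It is an illustration only; the class and method names are hypothetical and this is not Solr's SolrParams implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how a request handler combines parameter sections; not Solr's code.
final class HandlerParamsSketch {
    static Map<String, List<String>> effectiveParams(
            Map<String, List<String>> request,
            Map<String, List<String>> defaults,
            Map<String, List<String>> appends,
            Map<String, List<String>> invariants) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Start from what the client sent.
        request.forEach((k, v) -> result.put(k, new ArrayList<>(v)));
        // Defaults only fill in parameters the client did not send.
        defaults.forEach((k, v) -> result.putIfAbsent(k, new ArrayList<>(v)));
        // Appends are added on top of the client's values, so they cannot be dropped.
        appends.forEach((k, v) -> result.computeIfAbsent(k, key -> new ArrayList<>()).addAll(v));
        // Invariants override whatever the client sent.
        invariants.forEach((k, v) -> result.put(k, new ArrayList<>(v)));
        return result;
    }
}
```

With the /instock configuration, a request carrying fq=cat:book and facet.field=price ends up with fq values [cat:book, inStock:true] and facet.field [cat].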
Authentication: any path starting with "/core1/" requires authentication; other paths can be accessed without authenticating. Authentication is performed against a "realm" called "Test Realm". The realm verifies the supplied credentials.
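The rule can be expressed as a prefix check followed by a credential check against the realm. This is a minimal sketch that assumes HTTP Basic authentication and an in-memory user/password map standing in for "Test Realm"; the source does not say which scheme or servlet-container configuration is used, and all names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Map;

// Hypothetical model of path-based authentication, assuming HTTP Basic auth.
final class PathAuthSketch {
    static final String PROTECTED_PREFIX = "/core1/";
    static final String REALM = "Test Realm";

    static boolean requiresAuth(String path) {
        return path.startsWith(PROTECTED_PREFIX);
    }

    // Value a server would send in WWW-Authenticate when credentials are missing or wrong.
    static String challenge() {
        return "Basic realm=\"" + REALM + "\"";
    }

    // users is a hypothetical user -> password store standing in for the realm.
    static boolean isAuthorized(String path, String authorization, Map<String, String> users) {
        if (!requiresAuth(path)) {
            return true; // unprotected paths need no credentials
        }
        if (authorization == null || !authorization.startsWith("Basic ")) {
            return false;
        }
        String decoded;
        try {
            decoded = new String(Base64.getDecoder().decode(authorization.substring("Basic ".length())),
                    StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return false; // malformed Base64
        }
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return false;
        }
        String expected = users.get(decoded.substring(0, colon));
        if (expected == null) {
            return false; // unknown user
        }
        // Constant-time comparison of the stored and supplied passwords.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                decoded.substring(colon + 1).getBytes(StandardCharsets.UTF_8));
    }
}
```

A real deployment would store hashed credentials rather than plain passwords; the map here only keeps the example self-contained.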