Solr 8.1.1 Authorization, Core, HanLP, Searching, Commits, NRT ...

This post was last updated 1930 days ago; some of the information may be out of date.

This article uses Solr in standalone mode.

intro

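  • Basic setup and deployment
  • Authentication and authorization
  • Core creation and configuration
  • Data Import Handler
  • managed-schema configuration
  • HanLP Chinese word segmentation
  • Standard Query Parser syntax
  • Suggester (autocomplete) component
  • NRT (near real-time) search concepts
  • Soft commits and hard commits
  • DirectoryFactory
  • indexConfig
  • Segments
  • Cache
  • Java client SolrJ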

start, stop, restart

# using Solr in standalone mode
# help
./solr -help
./solr start -help
./solr stop -help

# start, stop, restart (running Solr as root requires -force)
./solr start -p 8983 -force
./solr restart -p 8983 -force
./solr stop -p 8983
./solr stop -all

Authorization Plugins

Solr supports several authentication mechanisms, including Basic Authentication and JWT Authentication; see Solr Ref Guide 8.1, securing-solr.html.

  • Enable Basic Authentication

    Create the file security.json in /usr/solr-8.1.1/server/solr with the following content:

    {
      "authentication":{ 
        "blockUnknown":  true, 
        "class": "solr.BasicAuthPlugin",
        "credentials": {"solr":"IV0EHq1OnNrj6gvR......"}, 
        "realm": "My Solr users", 
        "forwardCredentials": false 
      },
      "authorization":{
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [{"name":"security-edit", "role":"admin"}], 
        "user-role": {"solr":"admin"}
      }
    }
    
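    1. Enables the Basic authentication plugin and the rule-based authorization plugin.
    2. blockUnknown: true rejects unauthenticated requests.
    3. Defines a user named solr whose default password is SolrRocks.
    4. Defines an admin role that has permission to edit security settings.
    5. Assigns the solr user to the admin role.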
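    The credentials value is not the plain password but a salted hash followed by the salt. A minimal Python sketch of how such a pair can be produced, assuming (from my reading of Solr's Sha256AuthenticationProvider, not from a documented API) that the hash is sha256(sha256(salt + password)) and both parts are Base64-encoded. make_credentials and check_password are hypothetical helpers; the supported way to add or change users is the Authentication API's set-user command, which hashes the password server-side.

```python
import base64
import hashlib
import hmac
import os


def make_credentials(password, salt=None):
    """Return 'base64(hash) base64(salt)', the format used in security.json.

    Assumption: hash = sha256(sha256(salt + password)), as in Solr's
    Sha256AuthenticationProvider.
    """
    if salt is None:
        salt = os.urandom(32)  # fresh random 32-byte salt
    first = hashlib.sha256(salt + password.encode("utf-8")).digest()
    second = hashlib.sha256(first).digest()
    return (base64.b64encode(second).decode("ascii") + " "
            + base64.b64encode(salt).decode("ascii"))


def check_password(password, credentials):
    """Re-hash the candidate password with the stored salt and compare."""
    _, salt_b64 = credentials.split(" ")
    expected = make_credentials(password, base64.b64decode(salt_b64))
    return hmac.compare_digest(expected, credentials)


if __name__ == "__main__":
    cred = make_credentials("SolrRocks")
    print(cred)                               # value for "credentials": {"solr": ...}
    print(check_password("SolrRocks", cred))  # True
```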

Create Core

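  1. Via the Admin UI. The following must be in place before the core is created, otherwise creation fails:

    • instanceDir must already exist
    • instanceDir must contain a conf folder
    • the conf folder must contain solrconfig.xml and managed-schema
    • on success, core.properties is created in instanceDir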
      # core.properties
      name=blog
      config=./conf/solrconfig.xml
      schema=./conf/managed-schema
      dataDir=./data
      
  2. Via the CoreAdmin API.

    GET /solr/admin/cores?action=CREATE&name=blog&instanceDir=/usr/solr-8.1.1/server/blog&config=solrconfig.xml&dataDir=data HTTP/1.1
    Host: 127.0.0.1:8983
    Authorization: Basic c29scjpTb2xyUm9ja3M=
    
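    The same request can be built from code. The Authorization header is simply Base64 of user:password, which is where c29scjpTb2xyUm9ja3M= comes from. A minimal Python sketch that builds (but does not send) the CREATE request; the host and the helper names basic_auth and create_core_request are assumptions for illustration:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

SOLR = "http://127.0.0.1:8983/solr"  # assumption: local standalone node


def basic_auth(user, password):
    """Value of the HTTP Basic Authorization header for user:password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def create_core_request(name, instance_dir, user="solr", password="SolrRocks"):
    """Build (but do not send) the CoreAdmin CREATE request shown above."""
    query = urlencode({
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
        "config": "solrconfig.xml",
        "dataDir": "data",
    })
    req = Request(f"{SOLR}/admin/cores?{query}")
    req.add_header("Authorization", basic_auth(user, password))
    return req


if __name__ == "__main__":
    req = create_core_request("blog", "/usr/solr-8.1.1/server/blog")
    print(req.full_url)
    print(req.get_header("Authorization"))  # Basic c29scjpTb2xyUm9ja3M=
    # Sending it: urllib.request.urlopen(req).read()
```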

Managed-schema

  1. Set the uniqueKey
    <uniqueKey>id</uniqueKey>

  2. Add the index fields

    <!-- fields -->
    <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
    <field name="article" type="text_cn" multiValued="true" indexed="true" stored="true"/>
    <field name="authorName" type="string" indexed="true" stored="true"/>
    <field name="language" type="string" indexed="true" stored="false"/>
    <field name="mobileUrl" type="string" indexed="true" stored="true"/>
    <field name="publishDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR"/>
    <field name="siteName" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_cn" indexed="true" stored="true"/>
    <field name="summary" type="text_cn" indexed="true" stored="true"/>
    <field name="summaryAuto" type="text_cn" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
    <!-- copyField -->
    <copyField source="authorName" dest="article"/>
    <copyField source="siteName" dest="article"/>
    <copyField source="title" dest="article"/>
    <copyField source="summary" dest="article"/>
    
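    copyField copies the raw value of each source field into the destination before analysis, so a single document can contribute several values to article; that is why article is declared multiValued="true" (Solr rejects the document otherwise). A toy Python model of that behavior, not Solr code; apply_copy_fields is a hypothetical helper:

```python
# (source, dest) pairs taken from the copyField rules above
COPY_FIELDS = [
    ("authorName", "article"),
    ("siteName", "article"),
    ("title", "article"),
    ("summary", "article"),
]


def apply_copy_fields(doc, rules=COPY_FIELDS):
    """Toy model: append every source value to the destination's value list."""
    out = dict(doc)  # leave the caller's document untouched
    for source, dest in rules:
        if source not in doc:
            continue
        values = doc[source] if isinstance(doc[source], list) else [doc[source]]
        existing = out.get(dest, [])
        if not isinstance(existing, list):
            existing = [existing]
        out[dest] = existing + values
    return out


if __name__ == "__main__":
    doc = {"id": "1", "title": "Solr NRT", "siteName": "新浪科技"}
    print(apply_copy_fields(doc)["article"])  # ['新浪科技', 'Solr NRT']
```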

HanLP or IKAnalyzer

  1. HanLP Chinese word segmentation (recommended)

    • Integrating HanLP: full-text-retrieval-solr-integrated-hanlp-chinese-word-segmentation.html
      <fieldType name="text_cn" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <!-- never enable index mode in the query analyzer -->
          <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      
  2. IKAnalyzer

    • IKAnalyzer (IKAnalyzer2012_u6.jar is very old 😰) is not compatible with newer Solr versions
    • GitHub: https://github.com/magese/ik-analyzer-solr supports Solr 7 and 8
    • Put ik-analyzer-8.1.0.jar into /usr/solr-8.1.1/server/solr-webapp/webapp/WEB-INF/lib
    • Configure managed-schema: add the IK tokenizer together with stop-word and synonym filters
      <!-- IK tokenizer -->
      <fieldType name="text_ik" class="solr.TextField">
        <analyzer type="index">
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" conf="ik.conf" useSmart="false"/>
           <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
           <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" conf="ik.conf" useSmart="true"/>
          <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
          <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      
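    A quick way to compare how text_cn (HanLP) and text_ik split the same sentence, and to confirm that the index-time and query-time chains differ as intended, is the Field Analysis handler Solr registers implicitly at /analysis/field (the Admin UI Analysis screen uses it). A minimal Python sketch that only builds the request; the host, core, credentials, sample text and the helper name analysis_request are assumptions:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://127.0.0.1:8983/solr"  # assumption: local standalone node


def analysis_request(core, field_type, text, user="solr", password="SolrRocks"):
    """Build (not send) a Field Analysis request for one field type."""
    query = urlencode({
        "analysis.fieldtype": field_type,  # e.g. text_cn or text_ik
        "analysis.fieldvalue": text,       # analyzed with the index-time chain
        "analysis.query": text,            # analyzed with the query-time chain
        "wt": "json",
    })
    req = Request(f"{BASE}/{core}/analysis/field?{query}")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


if __name__ == "__main__":
    for ft in ("text_cn", "text_ik"):
        print(analysis_request("blog", ft, "中文分词测试").full_url)
```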

Data Import Handler

  1. Create data-config.xml under /usr/solr-8.1.1/server/solr/blog/conf with the following content:

    <dataConfig>  
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"  
               url="jdbc:mysql://127.0.0.1:3306/solr"  
               user="solr" password="xxoo" batchSize="-1" />  
      <document>  
         <entity name="article" pk="id" transformer="DateFormatTransformer"  
           query="SELECT * from readhub"  
           deltaImportQuery="SELECT * from readhub where id='${dataimporter.delta.id}'"   
           deltaQuery="SELECT id FROM readhub where date_format(publishDate,'%Y-%m-%d %H:%i:%s') > '${dataimporter.last_index_time}'"  >  
           <field column="id" name="id" />  
           <field column="authorName" name="authorName" />  
           <field column="language" name="language" />  
           <field column="mobileUrl" name="mobileUrl" />  
           <field column="publishDate" name="publishDate" />  
           <field column="siteName" name="siteName" />  
           <field column="title" name="title" />  
           <field column="summary" name="summary" />  
           <field column="summaryAuto" name="summaryAuto" />  
           <field column="url" name="url" />  
        </entity>   
     </document>   
     <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" />   
    </dataConfig>  
    
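  2. Encrypt the database password 🔗 encrypting-a-database-password
    Create an encryptionkey file under /usr/solr-8.1.1/server/solr/blog/ containing a plain-text encryption key (a strong passphrase of your choice, not the database password), then generate the encrypted database password with: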

    echo -n "my-jdbc-password" | openssl enc -aes-128-cbc -a -salt -md md5 -pass  file:/usr/solr-8.1.1/server/solr/blog/encryptionkey  
    
    # Put the output of the command above into the password attribute in data-config.xml:
    <dataSource type="JdbcDataSource"   
       driver="com.mysql.jdbc.Driver"   
       url="jdbc:mysql://127.0.0.1:3306/solr"   
       user="solr"   
       password="U2FsdGVkXXXXXXXXXXXXXXXXXXXXX="   
       encryptKeyFile="/usr/solr-8.1.1/server/solr/blog/encryptionkey"   
       batchSize="-1" />
    
  3. Configure the DataImportHandler by adding the following to /usr/solr-8.1.1/server/solr/blog/conf/solrconfig.xml:

    <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />  
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">   
      <lst name="defaults">   
         <str name="config">data-config.xml</str>   
      </lst>   
    </requestHandler>  
    
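    Once the handler is registered, imports are triggered through /dataimport with a command parameter (full-import, delta-import and status, among others). A minimal Python sketch that builds those requests without sending them; the host, core, credentials and the helper name dih_request are assumptions:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

DIH = "http://127.0.0.1:8983/solr/blog/dataimport"  # assumption: core "blog"


def dih_request(command, user="solr", password="SolrRocks", **options):
    """Build (not send) a DataImportHandler command request."""
    params = {"command": command, "wt": "json"}
    for key, value in options.items():
        # Solr expects lowercase true/false for boolean parameters
        params[key] = str(value).lower() if isinstance(value, bool) else value
    req = Request(f"{DIH}?{urlencode(params)}")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


if __name__ == "__main__":
    # Re-import every row of readhub, deleting existing documents first.
    print(dih_request("full-import", clean=True, commit=True).full_url)
    # Import only rows matched by deltaQuery (publishDate after last_index_time).
    print(dih_request("delta-import", clean=False, commit=True).full_url)
    # Poll progress and the statistics of the last run.
    print(dih_request("status").full_url)
```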

Uploading Structured Data

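Besides DIH, documents can be posted straight to the core's /update handler as a JSON array. A minimal Python sketch that builds such a request; the host, core, credentials, the sample document and the helper names are assumptions, and commitWithin=1000 (milliseconds) is just one way to get the change committed without an explicit commit call. Solr date fields expect ISO-8601 in UTC, so the local time is converted explicitly.

```python
import base64
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request

# assumption: core "blog"; commitWithin asks Solr to commit within 1000 ms
UPDATE_URL = "http://127.0.0.1:8983/solr/blog/update?commitWithin=1000"


def to_solr_date(dt):
    """Format an aware datetime as the UTC ISO-8601 string Solr date fields expect."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def update_request(docs, user="solr", password="SolrRocks"):
    """Build (not send) a POST of a JSON array of documents to /update."""
    body = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    req = Request(UPDATE_URL, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


if __name__ == "__main__":
    cst = timezone(timedelta(hours=8))  # China Standard Time
    doc = {
        "id": "demo-1",  # hypothetical sample document
        "title": "Solr 近实时搜索",
        "siteName": "新浪科技",
        "publishDate": to_solr_date(datetime(2019, 9, 4, 13, 5, 24, tzinfo=cst)),
    }
    req = update_request([doc])
    print(req.get_method(), req.full_url)
    print(doc["publishDate"])  # 2019-09-04T05:05:24Z
```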

CoreAdmin API

  • /solr/admin/cores?action=STATUS&core=blog
  • /solr/admin/cores?action=CREATE&name=blog&instanceDir=/usr/solr-8.1.1/server/blog&config=solrconfig.xml&dataDir=data
  • /solr/admin/cores?action=RELOAD&core=blog
  • /solr/admin/cores?action=SWAP&core=core-name&other=other-core-name

Searching

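By default, queries are parsed by the Standard Query Parser (defType=lucene). Common syntax:

  • Fuzzy search: sol~1, where the number is the maximum edit distance (0 to 2, default 2); the older similarity form such as sol~0.5 is still accepted but converted to an edit distance
  • Proximity search: "solr apache"~5 matches when the two terms are within 5 words of each other
  • Range search: publishDate:[2019-08-10T00:00:00Z TO 2019-09-10T00:00:00Z]; square brackets [] include the endpoints, curly braces {} exclude them
  • Boosting: Solr^4 apache boosts the term Solr by a factor of 4 relative to apache
  • Boolean operators: AND, +, OR, NOT, -
  • Subqueries: (Solr OR Apache) AND Jetty
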
  1. Standard Query Parser 8_1/the-standard-query-parser.html

    • q
    • fq
    • sort
    • fl
    • df
    • start, rows
    • q.op
  2. Highlighting 8_1/highlighting.html

    • hl
    • hl.fl
    • hl.fragsize
    • hl.snippets
    • hl.tag.pre
    • hl.tag.post
  3. Faceting 8_1/faceting.html

    • facet
    • facet.query
    • facet.field
    • facet.prefix
    • facet.sort
    • facet.limit
    • facet.mincount
    • facet.threads
  4. Grouping 8_1/result-grouping.html

    • group
    • group.field
    • group.query
    • group.sort

More in the official documentation: /solr/guide/8_1/searching.html
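The parameters listed above combine in a single /select request. A minimal Python sketch that assembles one; the host, core, field choices and the 30-day window are assumptions, and search_url is a hypothetical helper. A list of tuples is used instead of a dict because fq may appear more than once.

```python
from urllib.parse import parse_qs, urlencode, urlparse

SELECT = "http://127.0.0.1:8983/solr/blog/select"  # assumption: core "blog"


def search_url(keyword, site=None, rows=10):
    """Build a /select URL combining query, filter, highlight and facet parameters."""
    params = [
        ("q", keyword),                      # Standard Query Parser syntax
        ("df", "article"),                   # default field for unqualified terms
        ("q.op", "AND"),                     # all terms must match
        ("fq", "publishDate:[NOW-30DAYS TO NOW]"),
        ("sort", "publishDate desc"),
        ("fl", "id,title,url,publishDate"),
        ("start", "0"),
        ("rows", str(rows)),
        ("hl", "true"),
        ("hl.method", "unified"),            # hl.tag.pre/post apply to this highlighter
        ("hl.fl", "title,summary"),
        ("hl.snippets", "2"),
        ("hl.fragsize", "100"),
        ("hl.tag.pre", "<em>"),
        ("hl.tag.post", "</em>"),
        ("facet", "true"),
        ("facet.field", "siteName"),
        ("facet.mincount", "1"),
        ("facet.limit", "10"),
    ]
    if site:
        # fq may be repeated; each filter is cached independently
        params.append(("fq", f'siteName:"{site}"'))
    return f"{SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    url = search_url("solr 搜索", site="新浪科技")
    for key, values in parse_qs(urlparse(url).query).items():
        print(key, values)
```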

Suggester

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">titleSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">titleSuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">title</str>
    <str name="suggestAnalyzerFieldType">text_ik</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>
localhost:8983/solr/blog/suggest?suggest=true&suggest.build=false&suggest.dictionary=siteNameSuggester&suggest.q=新浪

# Response
{
  "responseHeader":{
    "status":0,
    "QTime":93},
  "suggest":{"siteNameSuggester":{
      "新浪":{
        "numFound":2,
        "suggestions":[{
            "term":"新浪",
            "weight":0,
            "payload":""},
          {
            "term":"新浪科技",
            "weight":0,
            "payload":""}]}}}}

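The response nests suggestions under the dictionary name and then under the original suggest.q text. A small Python sketch that pulls out the suggested terms from a response shaped like the one above; suggestion_terms is a hypothetical helper:

```python
import json


def suggestion_terms(response_text, dictionary, query):
    """Extract suggested terms from a /suggest JSON response.

    The response nests: suggest -> dictionary name -> original query -> suggestions.
    """
    data = json.loads(response_text)
    block = data.get("suggest", {}).get(dictionary, {}).get(query, {})
    return [s["term"] for s in block.get("suggestions", [])]


if __name__ == "__main__":
    sample = '''{"responseHeader":{"status":0,"QTime":93},
      "suggest":{"siteNameSuggester":{"新浪":{"numFound":2,"suggestions":[
        {"term":"新浪","weight":0,"payload":""},
        {"term":"新浪科技","weight":0,"payload":""}]}}}}'''
    print(suggestion_terms(sample, "siteNameSuggester", "新浪"))  # ['新浪', '新浪科技']
```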
Near Real Time

🔗 near-real-time-searching

Document durability and searchability are controlled by commits. The "Near" in "Near Real Time" is configurable to meet the needs of your application. Commits are either "hard" or "soft" and can be issued by a client (say SolrJ), via a REST call, or configured to occur automatically in solrconfig.xml. The usual recommendation is to configure your commit strategy in solrconfig.xml and avoid issuing commits externally.

Typically in NRT applications, hard commits are configured with openSearcher=false, and soft commits are configured to make documents visible for search.

  • hard commit
    A hard commit calls fsync on the index files to ensure they have been flushed to stable storage. The current transaction log is closed and a new one is opened. See the "transaction log" discussion below for how data is recovered in the absence of a hard commit. Optionally a hard commit can also make documents visible for search, but this is not recommended for NRT searching as it is more expensive than a soft commit.

  • soft commit
    A soft commit is faster since it only makes index changes visible and does not fsync index files, start a new segment or start a new transaction log. Search collections that have NRT requirements will want to soft commit often enough to satisfy the visibility requirements of the application. A soft commit may be "less expensive" than a hard commit (openSearcher=true), but it is not free, so the soft commit interval should be as long as the application's requirements allow.
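When a client does need to commit explicitly (for example in tests), the three commit styles above map to request parameters on /update. A minimal Python sketch that only builds the URLs; the host, core and the names COMMIT_PARAMS and commit_url are assumptions, and in production the autoCommit/autoSoftCommit settings in solrconfig.xml remain the recommended route.

```python
from urllib.parse import urlencode

UPDATE = "http://127.0.0.1:8983/solr/blog/update"  # assumption: core "blog"

# Request parameters for each explicit commit style described above.
COMMIT_PARAMS = {
    # fsync + new transaction log; documents stay invisible
    "hard": {"commit": "true", "openSearcher": "false"},
    # durable and visible, but the most expensive option
    "hard-open": {"commit": "true", "openSearcher": "true"},
    # visible only; no fsync, no new transaction log
    "soft": {"softCommit": "true"},
}


def commit_url(kind):
    """Return the /update URL that issues one explicit commit of the given kind."""
    if kind not in COMMIT_PARAMS:
        raise ValueError(f"unknown commit kind: {kind!r}")
    return f"{UPDATE}?{urlencode(COMMIT_PARAMS[kind])}"


if __name__ == "__main__":
    for kind in COMMIT_PARAMS:
        print(kind, commit_url(kind))
```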

Commits

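With autoSoftCommit disabled (maxTime = -1) and autoCommit's openSearcher set to false, added or updated documents only become visible after the core is reloaded through the CoreAdmin RELOAD API. This shows that autoSoftCommit implicitly opens a new searcher.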

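If openSearcher is false, a commit writes the most recent index updates to the index directory but does not create a new IndexSearcher instance. Creating a new IndexSearcher is what makes those recent updates immediately visible.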
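The visibility rules above can be summarized in a toy Python model (plain bookkeeping, not how Solr implements segments or searchers): a hard commit with openSearcher=false makes documents durable but not searchable, a soft commit makes them searchable, and a reload exposes what has been committed. ToyIndex and its methods are hypothetical names for illustration.

```python
class ToyIndex:
    """Toy model of commit visibility and durability; not Solr's implementation."""

    def __init__(self):
        self.docs = []          # every document added (RAM buffer + transaction log)
        self.durable_upto = 0   # how many docs a hard commit has flushed to disk
        self.visible_upto = 0   # how many docs the current searcher can see

    def add(self, doc):
        self.docs.append(doc)

    def hard_commit(self, open_searcher=False):
        self.durable_upto = len(self.docs)
        if open_searcher:
            self.visible_upto = len(self.docs)

    def soft_commit(self):
        # New searcher over everything indexed so far, without fsync.
        self.visible_upto = len(self.docs)

    def reload(self):
        # Toy assumption: a reload exposes at least everything hard-committed.
        self.visible_upto = max(self.visible_upto, self.durable_upto)

    def search(self):
        return self.docs[:self.visible_upto]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add("doc-1")
    idx.hard_commit(open_searcher=False)  # autoCommit with openSearcher=false
    print(idx.search())                   # []
    idx.reload()                          # CoreAdmin RELOAD
    print(idx.search())                   # ['doc-1']
    idx.add("doc-2")
    idx.soft_commit()                     # autoSoftCommit
    print(idx.search())                   # ['doc-1', 'doc-2']
```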

Edit history (2 edits)
14032 updated this post on 2019-09-04 13:05:24
14032 updated this post on 2019-06-26 22:03:31
