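This article covers using Solr in standalone mode.
intro
- Basic setup and deployment
- Authentication and authorization
- Core creation and configuration
- Data Import Handler
- managed-schema configuration
- HanLP natural-language word segmentation
- Standard Query Parser syntax
- Suggester, the auto-suggest component
- NRT, the near-real-time search concept
- Soft commit and hard commit
- DirectoryFactory
- indexConfig
- Segments
- Cache
- Java client SolrJ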
start, stop, restart
# using Solr in standalone mode
# help
./solr -help
./solr start -help
./solr stop -help
# start, stop, restart (starting Solr as the root user requires -force)
./solr start -p 8983 -force
./solr restart -p 8983 -force
./solr stop -p 8983
./solr stop -all
Authorization Plugins
Solr supports several authentication schemes, including Basic Authentication and JWT Authentication; see the Solr Ref Guide 8.1, securing-solr.html.
- Enable Basic Authentication
  Create a file named security.json in the /usr/solr-8.1.1/server/solr folder with the following content:
  {
    "authentication": {
      "blockUnknown": true,
      "class": "solr.BasicAuthPlugin",
      "credentials": {"solr": "IV0EHq1OnNrj6gvR......"},
      "realm": "My Solr users",
      "forwardCredentials": false
    },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "permissions": [{"name": "security-edit", "role": "admin"}],
      "user-role": {"solr": "admin"}
    }
  }
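  - This enables Basic Authentication and the rule-based authorization plugin.
  - The parameter blockUnknown: true means unauthenticated requests are rejected.
  - A user named solr is defined, with the default password SolrRocks.
  - The admin role is defined and has permission to edit the security settings.
  - The solr user is assigned the admin role.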
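With blockUnknown set to true, every request has to carry credentials. As a minimal sketch (the helper name basic_auth_header is made up for illustration), this is how the value of an HTTP Basic Authorization header is derived from the solr / SolrRocks pair mentioned above:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic 'Authorization' header."""
    # Basic auth is just base64("user:password"); it is encoding, not encryption.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("solr", "SolrRocks"))  # Basic c29scjpTb2xyUm9ja3M=
```

The printed value is the same Authorization header that appears in the CoreAdmin request in the Create Core section.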
Create Core
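- Create it through the Admin UI. Solr must be able to find the following when the core is created, otherwise creation fails:
  - instanceDir must already exist
  - instanceDir must contain a conf folder
  - the conf folder must contain solrconfig.xml and managed-schema
  - On success, core.properties is created under /instanceDir:
    # core.properties
    name=blog
    config=./conf/solrconfig.xml
    schema=./conf/managed-schema
    dataDir=./data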
- Create it by calling the CoreAdmin API.
  GET /solr/admin/cores?action=CREATE&name=blog&instanceDir=/usr/solr-8.1.1/server/blog&config=solrconfig.xml&dataDir=data HTTP/1.1
  Host: 127.0.0.1:8983
  Authorization: Basic c29scjpTb2xyUm9ja3M=
  Postman-Token: 66fcf8ec-aa89-4ba6-98f3-c36339d71a84,dbbf0e06-62b5-4994-a160-0feb4fcdfcc4
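Because core creation fails when the directory layout is incomplete, it can help to check the layout first. The following is only a sketch of that check: the function name missing_core_files is hypothetical, and it tests only the three conditions listed in this section, not everything Solr itself validates.

```python
from pathlib import Path
from typing import List


def missing_core_files(instance_dir: str) -> List[str]:
    """Return the paths that are missing for a core to be created from instance_dir."""
    root = Path(instance_dir)
    if not root.is_dir():
        # instanceDir itself must already exist.
        return [str(root)]
    # conf/ must exist and hold both configuration files.
    required = [
        root / "conf" / "solrconfig.xml",
        root / "conf" / "managed-schema",
    ]
    return [str(path) for path in required if not path.is_file()]


# An empty list means the layout satisfies the three conditions above.
print(missing_core_files("/usr/solr-8.1.1/server/blog"))
```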
Managed-schema
- Set the uniqueKey
  <uniqueKey>id</uniqueKey>
- Add the index fields
  <uniqueKey>id</uniqueKey>
  <!-- field -->
  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
  <field name="article" type="text_cn" multiValued="true" indexed="true" stored="true"/>
  <field name="authorName" type="string" indexed="true" stored="true"/>
  <field name="language" type="string" indexed="true" stored="false"/>
  <field name="mobileUrl" type="string" indexed="true" stored="true"/>
  <field name="publishDate" type="tdate" indexed="true" stored="true" default="NOW+8HOUR"/>
  <field name="siteName" type="string" indexed="true" stored="true"/>
  <field name="title" type="text_cn" indexed="true" stored="true"/>
  <field name="summary" type="text_cn" indexed="true" stored="true"/>
  <field name="summaryAuto" type="text_cn" indexed="true" stored="true"/>
  <field name="url" type="string" indexed="true" stored="true"/>
  <!-- copyField -->
  <copyField source="authorName" dest="article"/>
  <copyField source="siteName" dest="article"/>
  <copyField source="title" dest="article"/>
  <copyField source="summary" dest="article"/>
HanLP or IKAnalyzer
- HanLP Chinese word segmentation (recommended)
  - Integrating HanLP segmentation: full-text-retrieval-solr-integrated-hanlp-chinese-word-segmentation.html
    <fieldType name="text_cn" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <!-- Do not enable index mode in the query analyzer -->
        <tokenizer class="com.hankcs.lucene.HanLPTokenizerFactory" enableIndexMode="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
- IKAnalyzer
  The original IKAnalyzer (IKAnalyzer2012_u6.jar) is very old 😰 and no longer compatible with newer versions. The GitHub project https://github.com/magese/ik-analyzer-solr supports Solr 7 and 8.
  - Put ik-analyzer-8.1.0.jar into /usr/solr-8.1.1/server/solr-webapp/webapp/WEB-INF/lib
  - Configure managed-schema: add the ik tokenizer, plus the synonym and stop-word settings
    <!-- ik tokenizer -->
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" conf="ik.conf" useSmart="false"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" conf="ik.conf" useSmart="true"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
Data Import Handler
- Create a data-config.xml file under /usr/solr-8.1.1/server/solr/blog/conf with the following content:
  <dataConfig>
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://127.0.0.1:3306/solr"
                user="solr"
                password="xxoo"
                batchSize="-1" />
    <document>
      <entity name="article" pk="id" transformer="DateFormatTransformer"
              query="SELECT * from readhub"
              deltaImportQuery="SELECT * from readhub where oId='${dataimporter.delta.id}'"
              deltaQuery="SELECT id FROM readhub where date_format(publishDate,'%Y-%m-%d %H:%i:%s') > '${dataimporter.last_index_time}'">
        <field column="id" name="id" />
        <field column="authorName" name="authorName" />
        <field column="language" name="language" />
        <field column="mobileUrl" name="mobileUrl" />
        <field column="publishDate" name="publishDate" />
        <field column="siteName" name="siteName" />
        <field column="title" name="title" />
        <field column="summary" name="summary" />
        <field column="summaryAuto" name="summaryAuto" />
        <field column="url" name="url" />
      </entity>
    </document>
    <propertyWriter dateFormat="yyyy-MM-dd HH:mm:ss" type="SimplePropertiesWriter" />
  </dataConfig>
- Encrypting the database password (see encrypting-a-database-password)
  Create a file named encryptionkey under /usr/solr-8.1.1/server/solr/blog/ and write a plaintext passphrase into it (this is the encryption key), then obtain the encrypted password with:
  echo -n "my-jdbc-password" | openssl enc -aes-128-cbc -a -salt -md md5 -pass file:/usr/solr-8.1.1/server/solr/blog/encryptionkey
  # Put the string printed above into the password attribute in data-config.xml, as follows
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/solr"
              user="solr"
              password="U2FsdGVkXXXXXXXXXXXXXXXXXXXXX="
              encryptKeyFile="/usr/solr-8.1.1/server/solr/blog/encryptionkey"
              batchSize="-1" />
- Configure the DataImportHandler
  Add the following to /usr/solr-8.1.1/server/solr/blog/conf/solrconfig.xml:
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
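To make the role of deltaQuery and deltaImportQuery concrete, here is a rough simulation of the two-step delta import. It is an illustration only, not DIH code: an in-memory SQLite table stands in for the MySQL readhub table, the primary-key column is assumed to be id, only three columns are modelled, and the function name delta_import is invented.

```python
import sqlite3


def delta_import(conn, last_index_time):
    """Mimic a delta import: find changed primary keys, then load each row."""
    # Step 1 (the deltaQuery): primary keys of rows changed since the last import.
    changed = conn.execute(
        "SELECT id FROM readhub WHERE publishDate > ?", (last_index_time,)
    ).fetchall()
    # Step 2 (the deltaImportQuery): fetch each changed row in full.
    docs = []
    for (row_id,) in changed:
        row = conn.execute(
            "SELECT id, title, publishDate FROM readhub WHERE id = ?", (row_id,)
        ).fetchone()
        docs.append({"id": row[0], "title": row[1], "publishDate": row[2]})
    return docs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readhub (id TEXT PRIMARY KEY, title TEXT, publishDate TEXT)")
conn.executemany(
    "INSERT INTO readhub VALUES (?, ?, ?)",
    [
        ("1", "old post", "2019-08-01 09:00:00"),
        ("2", "new post", "2019-08-20 12:30:00"),
    ],
)
# Only the row published after the last index time is picked up.
print(delta_import(conn, "2019-08-10 00:00:00"))
```

Timestamps in the yyyy-MM-dd HH:mm:ss layout compare correctly as plain strings, which is why the propertyWriter dateFormat and the format used in deltaQuery need to agree.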
Uploading Structured Data
CoreAdmin API
/solr/admin/cores?action=STATUS&core=blog
/solr/admin/cores?action=CREATE&name=blog&instanceDir=/usr/solr-8.1.1/server/blog&config=solrconfig.xml&dataDir=data
/solr/admin/cores?action=RELOAD&core=blog
/solr/admin/cores?action=SWAP&core=core-name&other=other-core-name
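The CoreAdmin calls all share one shape: an action plus its parameters. A small sketch of assembling them follows; the helper name core_admin_url is made up, and the base URL assumes the 127.0.0.1:8983 instance used elsewhere in this article.

```python
from urllib.parse import urlencode


def core_admin_url(action, base="http://127.0.0.1:8983/solr", **params):
    """Build a CoreAdmin API URL for the given action and parameters."""
    query = {"action": action, **params}
    # safe="/" leaves path separators unescaped, matching the URLs listed above.
    return f"{base}/admin/cores?{urlencode(query, safe='/')}"


print(core_admin_url("STATUS", core="blog"))
print(core_admin_url("RELOAD", core="blog"))
print(core_admin_url("SWAP", core="core-name", other="other-core-name"))
```

When authentication is enabled, these requests also need the Authorization header.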
Searching
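The Standard Query Parser is used by default, i.e. defType=lucene.
Fuzzy query: sol~0.5 (the number is a similarity value)
Proximity query: "solr apache"~5
Range query: date:[20190810 TO 20190910]; square brackets [] include the bounds, curly braces {} exclude them
Boosted query: Solr^4 apache
Boolean operators: AND, +, OR, NOT, -
Grouped sub-queries: (Solr OR Apache) AND Jetty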
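The bracket rule for range queries is easy to get backwards, so here is a tiny sketch that spells it out. The helper range_query is hypothetical and does no escaping of the values.

```python
def range_query(field, lower, upper, include_lower=True, include_upper=True):
    """Format a range query; [ ] include a bound, { } exclude it."""
    left = "[" if include_lower else "{"
    right = "]" if include_upper else "}"
    return f"{field}:{left}{lower} TO {upper}{right}"


print(range_query("date", "20190810", "20190910"))
# date:[20190810 TO 20190910]
print(range_query("date", "20190810", "20190910", include_upper=False))
# date:[20190810 TO 20190910}
```

The second call mixes the two bracket types, so the lower bound is included and the upper bound is excluded.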
- Standard Query Parser (8_1/the-standard-query-parser.html)
  - q
  - fq
  - sort
  - fl
  - df
  - start, rows
  - q.op
- Highlighting (8_1/highlighting.html)
  - hl
  - hl.fl
  - hl.fragsize
  - hl.snippets
  - hl.tag.pre
  - hl.tag.post
- Faceting (8_1/faceting.html)
  - facet
  - facet.query
  - facet.field
  - facet.prefix
  - facet.sort
  - facet.limit
  - facet.mincount
  - facet.threads
- Grouping (8_1/result-grouping.html)
  - group
  - group.field
  - group.query
  - group.sort
For more, see the official documentation: /solr/guide/8_1/searching.html
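As a sketch of how several of these parameters combine in one request, the function below assembles a /select URL. The name select_url and its arguments are invented for illustration; the core name blog and the field names come from the schema earlier in this article.

```python
from urllib.parse import urlencode


def select_url(core, q, filters=(), fields=(), highlight_fields=(),
               start=0, rows=10, base="http://127.0.0.1:8983/solr"):
    """Build a /select URL from common query, filter and highlighting parameters."""
    params = [("q", q), ("start", start), ("rows", rows)]
    # fq may be repeated; each filter query becomes its own parameter.
    params += [("fq", f) for f in filters]
    if fields:
        params.append(("fl", ",".join(fields)))
    if highlight_fields:
        params += [("hl", "true"), ("hl.fl", ",".join(highlight_fields))]
    return f"{base}/{core}/select?{urlencode(params)}"


print(select_url("blog", "title:solr", filters=["language:zh"],
                 fields=["id", "title"], highlight_fields=["title"]))
```

Passing the parameters as a list of pairs rather than a dict is what allows fq to appear more than once.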
Suggester
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">titleSuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">titleSuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">title</str>
<str name="weightField">title</str>
<str name="suggestAnalyzerFieldType">text_ik</str>
<str name="buildOnStartup">true</str>
</lst>
</searchComponent>
localhost:8983/solr/blog/suggest?suggest=true&suggest.build=false&suggest.dictionary=siteNameSuggester&suggest.q=新浪
# Response
{
"responseHeader":{
"status":0,
"QTime":93},
"suggest":{"siteNameSuggester":{
"新浪":{
"numFound":2,
"suggestions":[{
"term":"新浪",
"weight":0,
"payload":""},
{
"term":"新浪科技",
"weight":0,
"payload":""}]}}}}
Near Real Time
Document durability and searchability are controlled by commits. The "Near" in "Near Real Time" is configurable to meet the needs of your application. Commits are either "hard" or "soft" and can be issued by a client (say SolrJ), via a REST call, or configured to occur automatically in solrconfig.xml. The usual recommendation is to configure your commit strategy in solrconfig.xml (see below) and avoid issuing commits externally.
Typically in NRT applications, hard commits are configured with openSearcher=false, and soft commits are configured to make documents visible for search.
- hard commit
  A hard commit calls fsync on the index files to ensure they have been flushed to stable storage. The current transaction log is closed and a new one is opened. See the "transaction log" discussion below for how data is recovered in the absence of a hard commit. Optionally a hard commit can also make documents visible for search, but this is not recommended for NRT searching as it is more expensive than a soft commit.
- soft commit
  A soft commit is faster since it only makes index changes visible and does not fsync index files, start a new segment, or start a new transaction log. Search collections that have NRT requirements will want to soft commit often enough to satisfy the visibility requirements of the application. A soft commit may be "less expensive" than a hard commit (openSearcher=true), but it is not free. It is recommended that the soft commit interval be set for as long as is reasonable given the application requirements.
Commits
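With autoSoftCommit disabled (maxTime = -1), if openSearcher is false for autoCommit, updated or added index entries only become visible after the core is reloaded through the API. In other words, autoSoftCommit effectively does the job of openSearcher.
openSearcher: if set to false, a commit writes the most recent index updates to the index directory but does not create a new IndexSearcher instance. Creating a new IndexSearcher instance is what makes the recently updated index data visible immediately.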
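The difference between "durable" and "visible" can be pictured with a toy model. This is a deliberately simplified sketch, not how Solr is implemented: the class ToyIndex and its attributes are invented, and it ignores segments, the transaction log and core reloads entirely.

```python
class ToyIndex:
    """Toy model of commit semantics: durability and visibility are separate."""

    def __init__(self):
        self._added = []    # every document accepted so far
        self.durable = []   # documents flushed to stable storage by a hard commit
        self.visible = []   # documents the current searcher can see

    def add(self, doc):
        self._added.append(doc)

    def soft_commit(self):
        # Visibility only: a new searcher is opened, nothing is flushed.
        self.visible = list(self._added)

    def hard_commit(self, open_searcher=False):
        # Durability: everything added so far is flushed to stable storage.
        self.durable = list(self._added)
        if open_searcher:
            # Optionally also open a new searcher so the documents become visible.
            self.visible = list(self._added)


index = ToyIndex()
index.add("doc1")
index.hard_commit(open_searcher=False)
print(index.durable, index.visible)  # ['doc1'] []
index.soft_commit()
print(index.durable, index.visible)  # ['doc1'] ['doc1']
```

The first print mirrors the situation described above: after a hard commit with openSearcher=false the document is safely stored but not yet searchable, and it only shows up once something opens a new searcher.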