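State of product search before the refactor
Inaccurate results: searches return products unrelated to the keyword
No results: a keyword search returns nothing
Inaccurate result ordering
Products are not synchronized to the search engine promptly after being published or modified
Poor word segmentation
Low search efficiency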
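Root-cause analysis of the search problems
Technical factors
- ER diagram of the product module
- Diagram of the relationship between data tables and ES indices before the refactor
- Search logic diagram before the refactor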
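- Poorly designed index data structure. As the diagrams above show, the system has two product types, single items and sets, and both have SKU specifications. There are also customer material codes, which are associated with the ERP material table and with single-item SKUs so that customers can quickly find materials under the codes they already know. This product information was stored in four indices: cor_goods for product information, cor_products for single-item SKUs, cor_pack for set SKUs, and cor_user_material for customer material codes. A product search had to query all four indices, which lowered search efficiency, made the search conditions more complex to write, and made accurate sorting impossible.
- Unreasonable search logic. According to the search logic diagram above, after the user enters a keyword and clicks search, the system first searches the cor_user_material index. If that index returns results, they are returned to the user and the other indices are never searched. If it returns nothing, the next index is searched, and so on, until the cor_goods index is searched last and its result is returned whether or not it is empty. This produces inaccurate results. For example, a user wants products whose code starts with AB. Because the cor_user_material index contains a product with code AB, only that product is returned, and products with codes ABC, ABCD, and ABCDE in the other indices are never returned. Likewise, a user searches for the code ABC. Because of word segmentation, the query hits the product with code AB in the customer material code index, and the search never continues to the other indices where ABC lives.
- Unreasonable way of synchronizing products to Elasticsearch. A Linux crontab job queried once per minute for products that needed synchronizing, so product changes did not reach Elasticsearch promptly, and the more products there were, the longer an update took.
- Search results needed post-processing for customer blocking and brand blocking. Certain products, and all products of certain brands, are hidden from some customers. The blocking logic was not implemented in the search engine, so results were post-processed after the search. Because this required querying MySQL for the blocking relationships and the search conditions were complex, processing was slow.
- Insufficient understanding of Elasticsearch, which was therefore not used well. For example, a user may remember only the beginning, middle, or end of a code. Because prefix and wildcard queries were not used alongside the other queries, the user could not find the expected results. In addition, boost was not used sensibly, which led to inaccurate ordering, and so on.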
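The early-return problem in the old search logic can be reproduced with a small simulation. This is a minimal sketch in Python (the project itself is PHP), assuming four in-memory lists stand in for the four indices and a simple prefix test stands in for the real query; the codes AB, ABC, ABCD, and ABCDE come from the example above, while their placement in particular indices is hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-ins for the four indices, listed in the old cascade order.
INDICES: Dict[str, List[str]] = {
    "cor_user_material": ["AB"],
    "cor_products": ["ABC", "ABCD"],
    "cor_pack": ["ABCDE"],
    "cor_goods": [],
}


def cascade_search(prefix: str) -> List[str]:
    """Old behaviour: return the hits of the first index that has any."""
    for codes in INDICES.values():
        hits = [c for c in codes if c.startswith(prefix)]
        if hits:
            return hits  # later indices are never consulted
    return []


def unified_search(prefix: str) -> List[str]:
    """Search every source at once, as a single merged index would allow."""
    return [c for codes in INDICES.values() for c in codes if c.startswith(prefix)]


print(cascade_search("AB"))   # ['AB']
print(unified_search("AB"))   # ['AB', 'ABC', 'ABCD', 'ABCDE']
```

The cascade stops at the first index with a hit, so the three longer codes are invisible to the user no matter how relevant they are.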
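Business factors
- Customers may call the same product by a different name than the company does. For example, the company calls a product B while the customer calls it C, so when the customer searches for C, product B is not found. These relationships need to be maintained so that customers can find products more easily.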
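Results after the refactor
Customers can find products more easily and more accurately
Search results are ordered accurately
Search efficiency is improved
Product changes are synchronized to the search engine promptly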
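Refactoring process
Redesigning the index structure
Diagram of the relationship between data tables and the ES index
To improve search accuracy and efficiency and to solve the sorting problem, the information for three product types, single items, sets, and configurable sets (a product type added in this iteration), is merged into one product document. That information includes the product name, product code, product category, SKU information, customer blocking relationships, brand blocking relationships, customer material codes, and keyword-to-product associations. A product type field distinguishes the three types, and all documents are stored in a single goods index.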
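As an illustration of the merge, the sketch below assembles one document from separate rows. It is written in Python for brevity and is only a sketch: the output field names follow the index mapping in this article, but the function name, the shape of the input rows, and the sample values are assumptions, and most fields are omitted.

```python
from typing import Any, Dict, List


def build_goods_document(goods: Dict[str, Any],
                         skus: List[Dict[str, Any]],
                         materials: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge a product row, its SKU rows, and customer material rows
    into one document shaped like the goods index mapping."""
    # Group customer material codes by the SKU they belong to.
    by_sku: Dict[int, List[Dict[str, Any]]] = {}
    for m in materials:
        by_sku.setdefault(m["product_id"], []).append({
            "user_id": m["user_id"],
            "user_material_sn": m["user_material_sn"],
            "user_material_sn_full": m["user_material_sn"],
        })
    return {
        "goods_id": goods["goods_id"],
        "goods_type": goods["goods_type"],
        # Each code and name is stored twice: analyzed text and exact keyword.
        "goods_sn": goods["goods_sn"],
        "goods_sn_full": goods["goods_sn"],
        "goods_name": goods["goods_name"],
        "goods_name_full": goods["goods_name"],
        "goods_sku": [
            {
                "product_id": s["product_id"],
                "product_sn": s["product_sn"],
                "product_sn_full": s["product_sn"],
                "user_material": by_sku.get(s["product_id"], []),
            }
            for s in skus
        ],
    }


doc = build_goods_document(
    {"goods_id": 3490, "goods_type": 1, "goods_sn": "PH903", "goods_name": "PH903 handle"},
    [{"product_id": 1608, "product_sn": "PH903.00.012"}],
    [{"product_id": 1608, "user_id": 7, "user_material_sn": "M-001"}],
)
print(doc["goods_sku"][0]["user_material"])
```

Keeping SKUs and material codes nested inside the product means one query can score a product on any of its codes and still return the product as a single hit.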
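When product information is modified in the admin console, the ID of the modified product is pushed to a product update queue and synchronized to the Elasticsearch goods index in real time.
Users search for products in the goods index through the client.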
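The queue-driven synchronization can be sketched as a producer and a consumer. The article does not name the queue technology, so this Python sketch assumes an in-process queue as a stand-in, and the function names and the load_doc and index_doc callbacks are hypothetical.

```python
import queue
from typing import Any, Callable, Dict, List

update_queue: "queue.Queue[int]" = queue.Queue()


def on_goods_changed(goods_id: int) -> None:
    """Called by the admin side after a product is saved."""
    update_queue.put(goods_id)


def drain_and_sync(load_doc: Callable[[int], Dict[str, Any]],
                   index_doc: Callable[[int, Dict[str, Any]], None]) -> List[int]:
    """Consume pending IDs, drop duplicates, and reindex each product once."""
    pending: List[int] = []
    seen = set()
    while True:
        try:
            goods_id = update_queue.get_nowait()
        except queue.Empty:
            break
        if goods_id not in seen:
            seen.add(goods_id)
            pending.append(goods_id)
    for goods_id in pending:
        # Rebuild the whole document from the source of truth, then index it.
        index_doc(goods_id, load_doc(goods_id))
    return pending


# Usage: three change events for two products result in two reindex calls.
store: Dict[int, Dict[str, Any]] = {}
for changed in (3490, 3491, 3490):
    on_goods_changed(changed)
synced = drain_and_sync(lambda i: {"goods_id": i}, store.__setitem__)
print(synced)
```

Queuing only the product ID and rebuilding the document at consume time means a burst of edits to the same product costs one reindex rather than several.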
Index structure (JSON):
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word",
"char_filter": [
"html_strip"
],
"filter": [
"my_stopwords",
"lowercase"
]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": [
"the",
"a",
".",
","
]
}
}
}
},
"mappings": {
"new_goods_index_type": {
"_source": {
"enabled": true
},
"properties": {
"goods_id": {
"type": "integer"
},
"goods_type": {
"type": "byte"
},
"goods_pack": {
"type": "byte"
},
"brand_id": {
"type": "short"
},
"brand_name": {
"type": "keyword"
},
"category_id": {
"type": "short"
},
"category_name": {
"type": "keyword"
},
"goods_name_full": {
"type": "keyword"
},
"goods_name": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"goods_sn_full": {
"type": "keyword"
},
"goods_sn": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"goods_subhead": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"goods_desc": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"goods_desc_mobile": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"keywords": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"brand_is_restrict": {
"type": "byte"
},
"show_brand_customer_ids": {
"type": "nested",
"properties": {
"customer_id": {
"type": "integer"
}
}
},
"is_restrict": {
"type": "byte"
},
"show_customer_ids": {
"type": "nested",
"properties": {
"customer_id": {
"type": "integer"
}
}
},
"show_customer_rank_ids": {
"type": "nested",
"properties": {
"customer_rank_id": {
"type": "integer"
}
}
},
"market_price": {
"type": "double"
},
"shop_price": {
"type": "double"
},
"measure_unit": {
"type": "text",
"index": false
},
"comment_num": {
"type": "integer"
},
"goods_img": {
"type": "text",
"index": false
},
"pack_images": {
"type": "text",
"index": false
},
"goods_thumb": {
"type": "text",
"index": false
},
"is_on_sale": {
"type": "byte"
},
"onsale_time": {
"type": "integer"
},
"is_alone_sale": {
"type": "byte"
},
"is_delete": {
"type": "byte"
},
"is_show_index": {
"type": "byte"
},
"is_best": {
"type": "byte"
},
"is_hot": {
"type": "byte"
},
"is_new": {
"type": "byte"
},
"sale_num": {
"type": "integer"
},
"collect_num": {
"type": "integer"
},
"click_count": {
"type": "integer"
},
"sort_order": {
"type": "integer"
},
"video": {
"type": "text",
"index": false
},
"desc_video": {
"type": "text",
"index": false
},
"is_integral": {
"type": "integer"
},
"need_integral": {
"type": "integer"
},
"is_integral_exchange": {
"type": "integer"
},
"is_integral_off_sale": {
"type": "integer"
},
"is_unite_promotion": {
"type": "integer"
},
"goods_sku": {
"type": "nested",
"properties": {
"product_id": {
"type": "integer"
},
"product_sn_full": {
"type": "keyword"
},
"product_sn": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"is_restrict": {
"type": "byte"
},
"show_customer_ids": {
"type": "nested",
"properties": {
"customer_id": {
"type": "integer"
}
}
},
"user_material": {
"type": "nested",
"properties": {
"user_id": {
"type": "integer"
},
"user_material_sn": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"user_material_sn_full": {
"type": "keyword"
},
"user_material_name": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"user_material_name_full": {
"type": "keyword"
},
"hopo_material_sn": {
"type": "keyword"
}
}
},
"goods_attr_str": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
},
"pack_sku": {
"type": "nested",
"properties": {
"pack_id": {
"type": "integer"
},
"pack_sn_full": {
"type": "keyword"
},
"pack_sn": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"is_restrict": {
"type": "byte"
},
"show_customer_ids": {
"type": "nested",
"properties": {
"customer_id": {
"type": "integer"
}
}
},
"pack_attr_str": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
},
"keyword_goods_assoc": {
"type": "nested",
"properties": {
"goods_keyword": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
},
"goods_sort": {
"type": "integer"
}
}
}
}
}
}
}
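Adjusting the search logic
Product search logic diagram
Instead of searching four indices one after another, the search now runs against a single index, and the pagination, sorting, customer blocking, and brand blocking that used to be handled in application code are now handled by the search engine.
Example search query: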
{
"query":{
"bool":{
"must":[
{
"term":{
"is_on_sale":1
}
},
{
"term":{
"is_alone_sale":1
}
},
{
"term":{
"is_delete":0
}
},
{
"terms":{
"brand_id":[
1,
4,
16,
19,
20
]
}
},
{
"bool":{
"should":[
{
"term":{
"goods_name_full":{
"value":"PH",
"boost":3
}
}
},
{
"match":{
"goods_name":{
"query":"PH",
"boost":2,
"minimum_should_match":"100%"
}
}
},
{
"wildcard":{
"goods_name_full":{
"value":"*PH*",
"boost":2
}
}
},
{
"term":{
"goods_sn_full":{
"value":"PH",
"boost":3
}
}
},
{
"match":{
"goods_sn":{
"query":"PH",
"boost":2,
"minimum_should_match":"90%"
}
}
},
{
"wildcard":{
"goods_sn_full":{
"value":"*PH*",
"boost":2
}
}
},
{
"prefix":{
"goods_sn_full":{
"value":"PH",
"boost":2
}
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"term":{
"goods_sku.product_sn_full":{
"value":"PH",
"boost":3
}
}
}
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"match":{
"goods_sku.product_sn":{
"query":"PH",
"boost":2,
"minimum_should_match":"100%"
}
}
}
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"prefix":{
"goods_sku.product_sn_full":{
"value":"PH",
"boost":2
}
}
}
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"wildcard":{
"goods_sku.product_sn_full":{
"value":"*PH*",
"boost":2
}
}
}
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"match":{
"goods_sku.goods_attr_str":{
"query":"PH",
"minimum_should_match":"100%"
}
}
}
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"term":{
"pack_sku.pack_sn_full":{
"value":"PH",
"boost":3
}
}
}
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"match":{
"pack_sku.pack_sn":{
"query":"PH",
"boost":2,
"minimum_should_match":"100%"
}
}
}
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"prefix":{
"pack_sku.pack_sn_full":{
"value":"PH",
"boost":2
}
}
}
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"wildcard":{
"pack_sku.pack_sn_full":{
"value":"*PH*",
"boost":2
}
}
}
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"match":{
"pack_sku.pack_attr_str":{
"query":"PH",
"minimum_should_match":"100%"
}
}
}
}
},
{
"nested":{
"path":"keyword_goods_assoc",
"query":{
"match":{
"keyword_goods_assoc.goods_keyword":{
"query":"PH",
"minimum_should_match":"99%"
}
}
}
}
}
]
}
},
{
"bool":{
"filter":{
"bool":{
"must":[
{
"term":{
"is_restrict":{
"value":0
}
}
},
{
"term":{
"brand_is_restrict":{
"value":0
}
}
},
{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"nested":{
"path":"goods_sku",
"query":{
"exists":{
"field":"goods_sku.is_restrict"
}
}
}
}
]
}
},
{
"nested":{
"path":"goods_sku",
"query":{
"term":{
"goods_sku.is_restrict":0
}
}
}
}
]
}
},
{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"nested":{
"path":"pack_sku",
"query":{
"exists":{
"field":"pack_sku.is_restrict"
}
}
}
}
]
}
},
{
"nested":{
"path":"pack_sku",
"query":{
"term":{
"pack_sku.is_restrict":0
}
}
}
}
]
}
},
{
"bool":{
"must_not":[
{
"nested":{
"path":"show_customer_rank_ids",
"query":{
"exists":{
"field":"show_customer_rank_ids.customer_rank_id"
}
}
}
}
]
}
}
]
}
}
}
}
]
}
},
"sort":[
{
"_score":{
"order":"desc"
}
},
{
"sale_num":{
"order":"desc"
}
},
{
"sort_order":{
"order":"asc"
}
},
{
"is_new":{
"order":"desc"
}
}
],
"_source":[
],
"highlight":{
"fields":{
"pack_sku.pack_sn_full":{
},
"goods_sku.product_sn_full":{
}
}
},
"from":40,
"size":20
}
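The repetitive should clauses in the query above lend themselves to being generated. The following Python sketch shows one way to do that under stated assumptions: the helper names are hypothetical, the boosts follow the sample query (3 for an exact term, 2 otherwise), and every match clause uses 100% for minimum_should_match, whereas the sample uses 90% for goods_sn. User input is not escaped here, so a keyword containing * or ? would be treated as a wildcard pattern.

```python
from typing import Any, Dict, List


def code_clauses(keyword: str, text_field: str, keyword_field: str) -> List[Dict[str, Any]]:
    """Exact, analyzed, prefix, and wildcard clauses for one code field."""
    return [
        {"term": {keyword_field: {"value": keyword, "boost": 3}}},
        {"match": {text_field: {"query": keyword, "boost": 2,
                                "minimum_should_match": "100%"}}},
        {"prefix": {keyword_field: {"value": keyword, "boost": 2}}},
        {"wildcard": {keyword_field: {"value": f"*{keyword}*", "boost": 2}}},
    ]


def nested_clauses(path: str, clauses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each clause in a nested query for the given path."""
    return [{"nested": {"path": path, "query": c}} for c in clauses]


def page_window(page: int, per_page: int) -> Dict[str, int]:
    """Translate a 1-based page number into from/size."""
    return {"from": (page - 1) * per_page, "size": per_page}


should = (
    code_clauses("PH", "goods_sn", "goods_sn_full")
    + nested_clauses("goods_sku", code_clauses(
        "PH", "goods_sku.product_sn", "goods_sku.product_sn_full"))
    + nested_clauses("pack_sku", code_clauses(
        "PH", "pack_sku.pack_sn", "pack_sku.pack_sn_full"))
)
body = {
    "query": {"bool": {"should": should, "minimum_should_match": 1}},
    **page_window(3, 20),
}
print(body["from"], body["size"], len(should))
```

Page 3 with 20 items per page gives the same from and size values as the sample query.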
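Maintaining keyword-to-product association rules
Using the keyword search monitoring table, pick out the keywords with a low hit rate, then create an association rule for each such keyword that links it to single items, sets, and configurable sets, which raises the hit rate.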
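Selecting low-hit-rate keywords could look like the Python sketch below. The article does not describe the monitoring table's columns, so the (searches, searches with results) pair, the thresholds, and the sample figures are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def low_hit_keywords(stats: Dict[str, Tuple[int, int]],
                     min_searches: int = 10,
                     max_hit_rate: float = 0.2) -> List[str]:
    """stats maps keyword -> (searches, searches_with_results)."""
    picked = []
    for keyword, (searches, hits) in stats.items():
        # Ignore rarely used keywords; they are too noisy to act on.
        if searches >= min_searches and hits / searches <= max_hit_rate:
            picked.append(keyword)
    # Most-searched first, so the most visible gaps are fixed first.
    return sorted(picked, key=lambda k: stats[k][0], reverse=True)


# Hypothetical monitoring data.
stats = {
    "C": (50, 2),         # customer's name for product B: almost never hits
    "PH903": (120, 118),  # healthy keyword
    "rare": (5, 0),       # too few searches to judge
    "knob": (30, 3),
}
print(low_hit_keywords(stats))   # ['C', 'knob']
```

Each keyword returned is a candidate for an association rule that links it to the products customers actually mean.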
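Problems encountered during the refactor and their solutions
Products cannot be found by the prefix, middle, or suffix of a product code
The cause is that the prefix and wildcard queries targeted fields of type text. A text field is analyzed into separate tokens before it is stored, and prefix and wildcard patterns are matched against those individual tokens rather than against the original string, so a pattern that spans the original code finds nothing.
The fix is to run prefix and wildcard queries against keyword fields, which index the whole value as a single term. For the difference between the keyword and text types, see the article "elasticsearch text类型与keyword类型区别" (the difference between the Elasticsearch text and keyword types).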
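The difference can be seen in a small simulation. This Python sketch assumes a crude tokenizer (split on non-alphanumeric characters, then lowercase) in place of ik_max_word, whose real output will differ, and uses fnmatchcase as a stand-in for wildcard matching; the point is only that a pattern is tested against each indexed term separately.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def toy_analyze(value: str) -> List[str]:
    """Crude stand-in for an analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def wildcard_on_text(value: str, pattern: str) -> bool:
    """On a text field, the pattern is tested against each token separately."""
    return any(fnmatchcase(tok, pattern) for tok in toy_analyze(value))


def wildcard_on_keyword(value: str, pattern: str) -> bool:
    """A keyword field indexes the whole value as one term."""
    return fnmatchcase(value, pattern)


code = "PH903.00.069H5"
print(toy_analyze(code))                       # ['ph903', '00', '069h5']
print(wildcard_on_text(code, "*903.00*"))      # False: no single token contains '903.00'
print(wildcard_on_keyword(code, "*903.00*"))   # True
print(wildcard_on_text(code, "PH9*"))          # False: the tokens were lowercased
print(wildcard_on_keyword(code, "PH9*"))       # True
```

The last two lines also show a second effect of analysis: the lowercase filter in my_analyzer means an uppercase pattern cannot match the stored tokens.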
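After an exact search by SKU code, the detail page does not preselect that SKU
When a user finds a product by searching for a SKU's complete code, opening the product's detail page should preselect that SKU. That requires knowing whether the product was found through an exact match on one of its SKU codes.
After some research online, it turned out that search-result highlighting can solve this. The approach is to add highlight settings for the SKU code fields to the search request, as follows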
"highlight":{
"fields":{
"pack_sku.pack_sn_full":{
},
"goods_sku.product_sn_full":{
}
}
}
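Because SKU codes are unique, a hit on a SKU code yields exactly one product in the results, and the highlight entry for the SKU code field contains exactly one element. Below is the result of searching by the SKU code PH903.00.069H5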
{
"msg":"搜索成功",
"error_code":0,
"content":{
"lists":{
"total":1,
"current_page":"1",
"per_page":"20",
"last_page":1,
"data":[
{
"goods_id":3490,
"goods_type":1,
"goods_pack":0,
"brand_id":1,
"brand_name":"HOPO",
"category_id":405,
"category_name":"推拉门(方轴)",
"goods_name_full":"PH903-执手平开门窗推拉门窗方轴执手",
"goods_name":"PH903-执手平开门窗推拉门窗方轴执手",
"goods_sn_full":"PH903",
"goods_sn":"PH903",
"goods_subhead":"执手转动时带机械声增强手感;执手美观大方;执手、方轴分离出货,方便用户更换",
"goods_desc":"商品描述",
"goods_desc_mobile":"",
"keywords":[
"方钢执手、七字执手"
],
"brand_is_restrict":0,
"show_brand_customer_ids":[
],
"is_restrict":0,
"show_customer_ids":[
],
"show_customer_rank_ids":[
],
"market_price":"0.00",
"shop_price":"0.00",
"measure_unit":"个",
"comment_num":0,
"goods_img":"images\/202110\/goods_img\/286_thumb_P_16345880204861.jpg",
"pack_images":"",
"goods_thumb":"images\/202110\/goods_img\/286_thumb_P_16345880204861.jpg",
"is_on_sale":1,
"onsale_time":0,
"is_alone_sale":1,
"is_delete":0,
"is_show_index":1,
"is_best":0,
"is_hot":0,
"is_new":0,
"sale_num":2097290,
"collect_num":13,
"click_count":14932,
"sort_order":100,
"video":"",
"desc_video":"",
"is_integral":0,
"need_integral":0,
"is_integral_exchange":0,
"is_integral_off_sale":0,
"is_unite_promotion":0,
"goods_sku":[
{
"product_id":1608,
"product_sn":"PH903.00.012",
"product_sn_full":"PH903.00.012",
"is_restrict":0,
"user_material":[
],
"show_customer_ids":[
],
"goods_attr_str":"喷银|无轴"
},
{
"product_id":1609,
"product_sn":"PH903.00.043",
"product_sn_full":"PH903.00.043",
"is_restrict":1,
"user_material":[
],
"show_customer_ids":[
{
"customer_id":"1048"
}
],
"goods_attr_str":"喷古铜|无轴"
},
{
"product_id":1770,
"product_sn":"PH903.00.044",
"product_sn_full":"PH903.00.044",
"is_restrict":0,
"user_material":[
],
"show_customer_ids":[
],
"goods_attr_str":"无轴|喷深古铜"
},
{
"product_id":3235,
"product_sn":"PH903.00.069H5",
"product_sn_full":"PH903.00.069H5",
"is_restrict":1,
"user_material":[
],
"show_customer_ids":[
{
"customer_id":"1048"
},
{
"customer_id":"272"
},
{
"customer_id":"4615"
}
],
"goods_attr_str":"无轴|亚黑"
},
{
"product_id":4129,
"product_sn":"PH903.00.011H1",
"product_sn_full":"PH903.00.011H1",
"is_restrict":1,
"user_material":[
],
"show_customer_ids":[
{
"customer_id":"763"
},
{
"customer_id":"3076"
},
{
"customer_id":"468"
}
],
"goods_attr_str":"无轴|氧银"
},
{
"product_id":4130,
"product_sn":"PH903.00.045H5",
"product_sn_full":"PH903.00.045H5",
"is_restrict":1,
"user_material":[
],
"show_customer_ids":[
{
"customer_id":"763"
},
{
"customer_id":"1350"
},
{
"customer_id":"453"
},
{
"customer_id":"387"
},
{
"customer_id":"21"
},
{
"customer_id":"1048"
},
{
"customer_id":"2142"
},
{
"customer_id":"136"
},
{
"customer_id":"1967"
},
{
"customer_id":"443"
}
],
"goods_attr_str":"无轴|新氧化深古铜"
}
],
"pack_sku":[
],
"keyword_goods_assoc":[
],
"highlight":{
"goods_sku.product_sn_full":[
"<em>PH903.00.069H5<\/em>"
]
},
"sku_id":3235,
"price_format":{
"number":1,
"price":"32.77",
"price_format":"¥32.77"
}
}
]
}
},
"scope":null,
"log_id":"c715126e1cfce6dff48fad51ff7d5634"
}
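If there is exactly one highlighted fragment and it equals one of the current product's SKU codes, the product was found through an exact SKU-code search. The following code returns the SKU that was hit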
/**
 * Return the ID of the SKU hit by an exact SKU-code search,
 * or 0 if the product was not found through an exact SKU-code match.
 */
protected function getHitSku($goods)
{
    // Single-item SKUs: map product_sn => product_id.
    if (isset($goods['highlight']['goods_sku.product_sn_full'])) {
        $productSnArr = $goods['highlight']['goods_sku.product_sn_full'];
        $skuArr = array_column($goods['goods_sku'], 'product_id', 'product_sn');
    }
    // Set SKUs: map pack_sn => pack_id.
    if (isset($goods['highlight']['pack_sku.pack_sn_full'])) {
        $productSnArr = $goods['highlight']['pack_sku.pack_sn_full'];
        $skuArr = array_column($goods['pack_sku'], 'pack_id', 'pack_sn');
    }
    // An exact hit yields a single fragment: the whole code wrapped in <em> tags.
    if (isset($productSnArr, $skuArr) && is_array($productSnArr) && count($productSnArr) === 1 && !empty($productSnArr[0])) {
        $keywords = $this->params['keywords'];
        $keywordsEm = "<em>{$keywords}</em>";
        if ($keywordsEm === $productSnArr[0]) {
            return $skuArr[$keywords] ?? 0;
        }
    }
    return 0;
}
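To check the idea against the sample response, here is a Python sketch of the same test. It is not a line-for-line port of the PHP method: the function name and structure are assumptions, it checks single-item SKUs before set SKUs, and the goods record is a trimmed copy of the response shown earlier.

```python
import re
from typing import Any, Dict

EM_TAG = re.compile(r"^<em>(.*)</em>$")


def get_hit_sku(goods: Dict[str, Any], keywords: str) -> int:
    """Return the SKU id hit by an exact SKU-code search, or 0."""
    sources = (
        ("goods_sku.product_sn_full", "goods_sku", "product_sn_full", "product_id"),
        ("pack_sku.pack_sn_full", "pack_sku", "pack_sn_full", "pack_id"),
    )
    highlight = goods.get("highlight", {})
    for hl_field, sku_list, sn_key, id_key in sources:
        fragments = highlight.get(hl_field)
        if not fragments or len(fragments) != 1:
            continue
        m = EM_TAG.match(fragments[0])
        # Only a highlight covering the whole code counts as an exact hit.
        if m and m.group(1) == keywords:
            for sku in goods.get(sku_list, []):
                if sku.get(sn_key) == keywords:
                    return sku[id_key]
    return 0


goods = {
    "goods_sku": [
        {"product_id": 1608, "product_sn_full": "PH903.00.012"},
        {"product_id": 3235, "product_sn_full": "PH903.00.069H5"},
    ],
    "pack_sku": [],
    "highlight": {"goods_sku.product_sn_full": ["<em>PH903.00.069H5</em>"]},
}
print(get_hit_sku(goods, "PH903.00.069H5"))   # 3235
print(get_hit_sku(goods, "PH903"))            # 0
```

The first call returns the same value as the sku_id field in the sample response.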
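Inaccurate results when searching by specification
The specification field goods_attr_str is of type text, with an analyzer configured for both indexing and searching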
"goods_attr_str": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
As the index structure shows, the tokenizer in use is ik_max_word
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word",
"char_filter": [
"html_strip"
],
"filter": [
"my_stopwords",
"lowercase"
]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": [
"the",
"a",
".",
","
]
}
}
}
}
Specifications are stored as a single string, separated by |
"goods_attr_str":"喷银|无轴"
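Because the query used is match_phrase, the keyword is analyzed before the search, and goods_attr_str is likewise analyzed before it is stored. As a result, filtering by several selected specifications returns inaccurate results.
The fix is to add a goods_attr_str_full field of type keyword to hold the product specifications, and to query it with wildcard. For example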
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"goods_attr_str_full": {
"value": "*黑色*"
}
}
},
{
"wildcard": {
"goods_attr_str_full": {
"value": "*内开*"
}
}
}
]
}
}
}
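When the user selects several specifications, the wildcard clauses can be generated from the selection. The Python sketch below assumes a helper named spec_filter (hypothetical) and, like the example above, addresses the field by its bare name; if the field is placed inside the nested goods_sku object, the clauses would need a nested wrapper and the full field path. The matches function is only a local approximation of what the wildcard clauses test.

```python
from typing import Any, Dict, List


def spec_filter(specs: List[str], field: str = "goods_attr_str_full") -> Dict[str, Any]:
    """Build a bool/must of wildcard clauses, one per selected specification."""
    return {
        "query": {
            "bool": {
                "must": [{"wildcard": {field: {"value": f"*{s}*"}}} for s in specs]
            }
        }
    }


def matches(stored: str, specs: List[str]) -> bool:
    """Local equivalent of the clauses: every spec must be a substring."""
    return all(s in stored for s in specs)


stored = "喷银|无轴"   # value format taken from the sample document
print(matches(stored, ["喷银", "无轴"]))     # True
print(matches(stored, ["喷古铜", "无轴"]))   # False
print(len(spec_filter(["黑色", "内开"])["query"]["bool"]["must"]))
```

One caveat of substring matching: a value that is contained in another value, such as 古铜 inside 喷深古铜, will match both.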
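Case sensitivity in search
Product names and codes use uppercase letters and are stored in the keyword fields goods_name_full and goods_sn_full. Keyword fields are matched exactly, including case, so when a user searches names and codes with a lowercase keyword such as ph312, products whose code is PH312.00.1T or whose name starts with PH312.00.1T are not found.
Solution:
Define a normalizer and set it on the goods_name_full and goods_sn_full fields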
{
"settings":{
"number_of_shards":1,
"number_of_replicas":0,
"analysis":{
"analyzer":{
"my_analyzer":{
"type":"custom",
"tokenizer":"ik_max_word",
"char_filter":[
"html_strip"
],
"filter":[
"my_stopwords",
"lowercase"
]
}
},
"normalizer":{
"case_insensitive":{
"type":"custom",
"filter":["lowercase"]
}
},
"filter":{
"my_stopwords":{
"type":"stop",
"stopwords":[
"the",
"a",
".",
","
]
}
}
}
},
"mappings":{
"new_goods_index_type":{
"_source":{
"enabled":true
},
"properties":{
"goods_name_full":{
"type":"keyword",
"normalizer":"case_insensitive"
},
"goods_sn_full":{
"type":"keyword",
"normalizer":"case_insensitive"
}
}
}
}
}
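The effect of the normalizer can be illustrated with a toy keyword field. This Python sketch assumes the normalizer is just a lowercase step applied both when a value is indexed and when a query term is looked up, which is how Elasticsearch documents normalizers for term-level queries such as term; the class and its methods are hypothetical and model nothing else about Elasticsearch.

```python
from typing import Dict, List


def normalize(value: str) -> str:
    """Stand-in for the case_insensitive normalizer: a lowercase filter."""
    return value.lower()


class KeywordField:
    """Toy keyword field: maps each indexed term to the documents holding it."""

    def __init__(self, use_normalizer: bool) -> None:
        self.use_normalizer = use_normalizer
        self.terms: Dict[str, List[int]] = {}

    def _norm(self, value: str) -> str:
        return normalize(value) if self.use_normalizer else value

    def index(self, doc_id: int, value: str) -> None:
        self.terms.setdefault(self._norm(value), []).append(doc_id)

    def prefix(self, query: str) -> List[int]:
        q = self._norm(query)
        return sorted(d for t, ids in self.terms.items() if t.startswith(q) for d in ids)


plain = KeywordField(use_normalizer=False)
plain.index(1, "PH312.00.1T")
normalized = KeywordField(use_normalizer=True)
normalized.index(1, "PH312.00.1T")

print(plain.prefix("ph312"))        # []
print(normalized.prefix("ph312"))   # [1]
print(normalized.prefix("PH312"))   # [1]
```

Because the stored term is lowercased as well, existing documents need to be reindexed after the normalizer is added to the mapping.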
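Reflections
This refactor gave me a deeper understanding of Elasticsearch and made me more capable with it. I also ran into all kinds of problems along the way, most of them caused by my own unfamiliarity with Elasticsearch, so I need to keep strengthening my grasp of its fundamentals.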