Elasticsearch: How to boost search results within a region

Updated: 2024-10-12 10:30:21

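Many real-world location searches need to boost the results from one particular region, so that hits in that region score higher and appear first in the returned results. Some typical scenarios: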

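  • In geo-location search, boosting the results for one region in order to draw more attention to the population of that region. In Elasticsearch we can search by administrative region; an earlier article shows how that is done, and more about EMS (Elastic Maps Service) can be found through the linked page.
  • In practice, regions are rarely divided along administrative boundaries. Take a specialized industry such as courier delivery: certain couriers may be assigned to one delivery area, but once every parcel in that area has been delivered they can help with adjacent areas. In that case we can boost the area each courier is responsible for, so that parcels in their own area rank first in the search results and parcels in adjacent areas come next.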

In both cases we may need to define a custom region. We can draw the region we want as a Polygon and boost the search results that fall inside it.

Approach

We can do this with the compound bool query that Elasticsearch provides:

{
  "query": {
    "bool": {
      "must": [ region to search ],
      "should": [ region that overlaps the search region and gets boosted ]
    }
  }
}

If you are not familiar with compound queries, see my earlier article “开始使用Elasticsearch (2)” (Getting started with Elasticsearch (2)).
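As a rough sketch of the skeleton above, the request body can be assembled from two coordinate rings. The helper name `build_region_boost_query` and the small example rings are made up for illustration; the clause layout (a geo_shape polygon in must, a geo_polygon in should) follows the query used later in this article.

```python
import json


def build_region_boost_query(field, search_ring, boost_ring):
    """Build a bool query body: `must` limits hits to search_ring,
    `should` adds score to hits that also fall inside boost_ring.

    Each ring is a list of [lon, lat] pairs whose first and last pair are equal.
    """
    for ring in (search_ring, boost_ring):
        if len(ring) < 4 or ring[0] != ring[-1]:
            raise ValueError("a ring needs at least 4 points and must be closed")
    return {
        "query": {
            "bool": {
                # Every hit must lie inside the search region.
                "must": [
                    {"geo_shape": {field: {"shape": {
                        "type": "polygon",
                        "coordinates": [search_ring],
                    }}}}
                ],
                # Optional clause: hits that match it get extra score.
                "should": [
                    {"geo_polygon": {field: {"points": boost_ring}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical rings, only to show the shape of the output.
    outer = [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]
    inner = [[2, 2], [2, 5], [5, 5], [5, 2], [2, 2]]
    print(json.dumps(build_region_boost_query("location", outer, inner), indent=2))
```

The function only builds a dictionary; sending it to a cluster is left to whichever client you already use.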

Preparing the data

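Before doing this exercise, you may want to read my earlier article “Elasticsearch:如何制作 GeoJSON 文件并进行地理位置搜索” (Elasticsearch: How to create a GeoJSON file and run geo-location searches), which describes in detail how to import data and how to use GeoJSON to draw a boundary. For today's exercise we use the following data: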

POST my_locations/_bulk
{ "index" : { "_id" : "3" } }
{ "location" : [ -104.06876, 39.77462 ], "name": "C" }
{ "index" : { "_id" : "4" } }
{ "location" : [ -103.59538, 38.5718 ], "name": "D" }
{ "index" : { "_id" : "5" } }
{ "location" : [ -104.94538, 38.16629 ], "name": "E" }
{ "index" : { "_id" : "1" } }
{ "location" : [ -105.38369, 40.11067 ], "name": "A" }
{ "index" : { "_id" : "6" } }
{ "location" : [ -107.99602, 39.17918 ], "name": "F" }
{ "index" : { "_id" : "2" } }
{ "location" : [ -104.34051, 40.03688 ], "name": "B" }
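One thing the request above does not show is the mapping of the location field. The geo queries used in this article need location to be a geo field, so the sketch below assumes a geo_point mapping created before indexing; that mapping is an assumption, not something taken from the article. The helper `bulk_lines` is hypothetical and simply renders the same NDJSON lines as the request above. Note that a geo_point given as an array is ordered [lon, lat].

```python
import json

# Assumed body for `PUT my_locations`, sent before the _bulk request.
MAPPING_BODY = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"}
        }
    }
}


def bulk_lines(docs):
    """Render (doc_id, lon, lat, name) tuples as an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, lon, lat, name in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Array form of geo_point: longitude first, then latitude.
        lines.append(json.dumps({"location": [lon, lat], "name": name}))
    # A _bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        ("3", -104.06876, 39.77462, "C"),
        ("4", -103.59538, 38.5718, "D"),
    ]
    print(json.dumps(MAPPING_BODY))
    print(bulk_lines(docs), end="")
```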

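Run the _bulk request above to index the documents, then create the corresponding index pattern. As in the earlier article, for display purposes we also created a GeoJSON file: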

simple.json

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[
          [-106.10465, 40.16875], [-106.0736, 39.33315], [-105.142, 39.16482], [-103.85329, 39.18889],
          [-103.52723, 39.77609], [-104.17935, 40.27545], [-105.17305, 40.33465], [-106.10465, 40.16875]
        ]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[
          [-109.07025, 41.00014], [-109.07025, 36.99584], [-102.02114, 36.99584],
          [-102.02114, 41.00014], [-109.07025, 41.00014]
        ]]
      }
    }
  ]
}

Following the article “Elasticsearch:如何制作 GeoJSON 文件并进行地理位置搜索”, we can draw the corresponding boundaries on a map.

On that map, documents A, B and C lie inside the Polygon, while D, E and F lie outside it. Our requirements are:

  1. Find all documents that lie inside the rectangle
  2. Boost all documents that lie inside the Polygon, so that they score higher
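The inside/outside split can be checked without a map. The sketch below is a plain ray-casting test that treats longitude and latitude as flat x/y coordinates, which is a simplification and not how Elasticsearch evaluates geo queries; at this scale it is enough to confirm which documents fall in which shape. The function and constant names are made up for illustration, and the coordinates are copied from the data and simple.json above.

```python
# Boundaries from simple.json as (lon, lat) rings; first point equals last point.
POLYGON = [
    (-106.10465, 40.16875), (-106.0736, 39.33315), (-105.142, 39.16482),
    (-103.85329, 39.18889), (-103.52723, 39.77609), (-104.17935, 40.27545),
    (-105.17305, 40.33465), (-106.10465, 40.16875),
]
RECTANGLE = [
    (-109.07025, 41.00014), (-109.07025, 36.99584), (-102.02114, 36.99584),
    (-102.02114, 41.00014), (-109.07025, 41.00014),
]
# Documents from the _bulk request, keyed by name.
DOCS = {
    "A": (-105.38369, 40.11067), "B": (-104.34051, 40.03688),
    "C": (-104.06876, 39.77462), "D": (-103.59538, 38.5718),
    "E": (-104.94538, 38.16629), "F": (-107.99602, 39.17918),
}


def point_in_ring(point, ring):
    """Ray casting: count how many edges a ray going east from the point crosses."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def names_inside(ring):
    """Sorted names of the documents whose location falls inside the ring."""
    return sorted(name for name, pt in DOCS.items() if point_in_ring(pt, ring))


if __name__ == "__main__":
    print(names_inside(RECTANGLE))  # ['A', 'B', 'C', 'D', 'E', 'F']
    print(names_inside(POLYGON))    # ['A', 'B', 'C']
```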

Search results

Given these requirements, we can run the following search:

GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "polygon",
                "coordinates": [[
                  [-109.07025, 41.00014], [-109.07025, 36.99584], [-102.02114, 36.99584],
                  [-102.02114, 41.00014], [-109.07025, 41.00014]
                ]]
              }
            }
          }
        }
      ],
      "should": [
        {
          "geo_polygon": {
            "location": {
              "points": [
                [-106.10465, 40.16875], [-106.0736, 39.33315], [-105.142, 39.16482], [-103.85329, 39.18889],
                [-103.52723, 39.77609], [-104.17935, 40.27545], [-105.17305, 40.33465], [-106.10465, 40.16875]
              ]
            }
          }
        }
      ]
    }
  }
}

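Note that must uses the coordinates of the rectangle from the GeoJSON file simple.json, while should uses the coordinates of the polygon. Because a rectangle can be seen as a special case of a polygon, the rectangle is expressed here as a geo_shape polygon. For the rectangle you could also use geo_bounding_box instead.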
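For the geo_bounding_box alternative, the rectangle ring has to be reduced to its top-left and bottom-right corners. The sketch below does that conversion; the helper name `ring_to_bounding_box` is hypothetical, and it assumes an axis-aligned rectangle that does not cross the antimeridian.

```python
def ring_to_bounding_box(field, ring):
    """Turn an axis-aligned rectangle ring of [lon, lat] pairs into a geo_bounding_box clause."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    west, east = min(lons), max(lons)
    south, north = min(lats), max(lats)
    # Every vertex must be a corner of the box, otherwise the ring is not a plain rectangle.
    for lon, lat in ring:
        if lon not in (west, east) or lat not in (south, north):
            raise ValueError("ring is not an axis-aligned rectangle")
    return {
        "geo_bounding_box": {
            field: {
                "top_left": {"lat": north, "lon": west},
                "bottom_right": {"lat": south, "lon": east},
            }
        }
    }


if __name__ == "__main__":
    rectangle = [
        [-109.07025, 41.00014], [-109.07025, 36.99584], [-102.02114, 36.99584],
        [-102.02114, 41.00014], [-109.07025, 41.00014],
    ]
    print(ring_to_bounding_box("location", rectangle))
```

The resulting clause could take the place of the geo_shape clause in must; the should clause stays as it is.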

The must/should query above returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 6, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_source" : { "location" : [-104.06876, 39.77462], "name" : "C" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "location" : [-105.38369, 40.11067], "name" : "A" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "_source" : { "location" : [-104.34051, 40.03688], "name" : "B" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "4", "_score" : 0.0, "_source" : { "location" : [-103.59538, 38.5718], "name" : "D" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "5", "_score" : 0.0, "_source" : { "location" : [-104.94538, 38.16629], "name" : "E" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "6", "_score" : 0.0, "_source" : { "location" : [-107.99602, 39.17918], "name" : "F" } }
    ]
  }
}

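In the returned results, documents A, B and C score higher (1.0 against 0.0 for the others) and are listed first.

If we drop the should clause and search without the boost: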

GET my_locations/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "geo_shape": {
            "location": {
              "shape": {
                "type": "polygon",
                "coordinates": [[
                  [-109.07025, 41.00014], [-109.07025, 36.99584], [-102.02114, 36.99584],
                  [-102.02114, 41.00014], [-109.07025, 41.00014]
                ]]
              }
            }
          }
        }
      ]
    }
  }
}

The result of this unboosted search is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 6, "relation" : "eq" },
    "max_score" : 0.0,
    "hits" : [
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "3", "_score" : 0.0, "_source" : { "location" : [-104.06876, 39.77462], "name" : "C" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "4", "_score" : 0.0, "_source" : { "location" : [-103.59538, 38.5718], "name" : "D" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "5", "_score" : 0.0, "_source" : { "location" : [-104.94538, 38.16629], "name" : "E" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "1", "_score" : 0.0, "_source" : { "location" : [-105.38369, 40.11067], "name" : "A" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "6", "_score" : 0.0, "_source" : { "location" : [-107.99602, 39.17918], "name" : "F" } },
      { "_index" : "my_locations", "_type" : "_doc", "_id" : "2", "_score" : 0.0, "_source" : { "location" : [-104.34051, 40.03688], "name" : "B" } }
    ]
  }
}

Here every document scores 0.0, so A, B and C no longer necessarily come first.
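The order of the hits in the two responses is consistent with a simple picture: documents are ranked by score, and documents with equal scores keep the order in which they were indexed. The sketch below reproduces both orders with a stable sort. This is an observation about the output shown in this article, not a documented guarantee, so tie order should not be relied on; the names `INDEX_ORDER`, `INSIDE_POLYGON` and `rank` are made up for illustration.

```python
# Order in which the documents appear in the _bulk request.
INDEX_ORDER = ["C", "D", "E", "A", "F", "B"]
# Documents that match the should clause (inside the Polygon).
INSIDE_POLYGON = {"A", "B", "C"}


def rank(names, boosted):
    """Rank documents by score, highest first; ties keep their incoming order."""
    scores = {
        # Scores as seen in the responses: 1.0 when the should clause matches, else 0.0.
        name: (1.0 if boosted and name in INSIDE_POLYGON else 0.0)
        for name in names
    }
    # sorted() is stable, so equal scores do not change relative order.
    return sorted(names, key=lambda name: -scores[name])


if __name__ == "__main__":
    print(rank(INDEX_ORDER, boosted=True))   # ['C', 'A', 'B', 'D', 'E', 'F']
    print(rank(INDEX_ORDER, boosted=False))  # ['C', 'D', 'E', 'A', 'F', 'B']
```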

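One point to note: in recent versions geo_shape is the recommended replacement for geo_polygon, but in practice I found that a geo_shape query does not assign any score; the score is 0. geo_bounding_box and geo_polygon do produce a score, so for this use case I recommend using them for scoring.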
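If you would rather stay with geo_shape, one possible workaround is to wrap it in a constant_score query, which gives every matching document a score equal to its boost. The sketch below is a variation on the article's query, not the author's method, and it has not been run against a cluster: the rectangle moves to filter (which never contributes to the score) and each boosted region becomes a constant_score clause in should. The helper names and the tier weights are assumptions chosen to mirror the courier scenario, with a courier's own area weighted above an adjacent one.

```python
def geo_shape_clause(field, ring):
    """geo_shape polygon clause for a closed ring of [lon, lat] pairs."""
    return {"geo_shape": {field: {"shape": {"type": "polygon", "coordinates": [ring]}}}}


def boosted_region_clause(field, ring, boost):
    """constant_score wrapper: every document inside the ring scores exactly `boost`."""
    if boost <= 0:
        raise ValueError("boost must be positive")
    return {"constant_score": {"filter": geo_shape_clause(field, ring), "boost": boost}}


def tiered_query(field, search_ring, tiers):
    """Build a query whose hits are limited to search_ring and weighted per tier.

    tiers is a list of (ring, boost) pairs, for example a courier's own area
    with a high boost and an adjacent area with a lower one.
    """
    return {
        "query": {
            "bool": {
                # filter restricts the hits without adding to the score.
                "filter": [geo_shape_clause(field, search_ring)],
                # Each matching tier adds its boost to the document score.
                "should": [boosted_region_clause(field, ring, boost) for ring, boost in tiers],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical rings and weights, only to show the shape of the request body.
    everything = [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]
    own_area = [[1, 1], [1, 4], [4, 4], [4, 1], [1, 1]]
    adjacent_area = [[4, 1], [4, 4], [7, 4], [7, 1], [4, 1]]
    print(tiered_query("location", everything, [(own_area, 2.0), (adjacent_area, 1.0)]))
```

Because the bool query has a filter clause, the should clauses stay optional, so documents outside every tier are still returned with the lowest score.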

Published: 2024-03-23 21:40:18
Article link: https://www.elefans.com/category/jswz/34/1743145.html