
Elastic Search on multiple fields with fuzziness matches and sort on multiple fields combined scores

I'm working with Elasticsearch in Laravel. My index has three fields: text, mood, and haloha_id. Think of haloha_id as a post and text as a comment on that post. First I want to match haloha_id, and only if it matches should any further matching happen. Once haloha_id has matched, I want to match a substring in the text field and then match mood (an integer such as 0, 1, or 2). mood should be matched only if some of text matched, otherwise not. I'm building a "Like Mine" query, meaning that only comments similar to the user's own comment on a specific post are shown. The issues with my query are:

  • My own comment doesn't appear at the top, even though it is a 100% match.

  • If someone else's mood and comment match mine 100%, theirs doesn't appear at the top either.

    I removed the mood-related part of the query, but the scores did not change, which suggests the score doesn't include the mood match at all.

Here is my query.

 "query" => [
     "bool" => [
         "should" => [
             "match" => [
                 "text" => [
                     "query" => $userHaloha->filtered_text,
                     "fuzziness" => "AUTO",
                 ],
             ],
         ],
         "minimum_should_match" => 1,
         "must" => [
             "match" => [
                 "mood" => $userHaloha->mood,
             ],
             "match" => [
                 "haloha_id" => $userHaloha->haloha_id,
             ],
         ],
     ],
 ],


Answer

The query is self-explanatory. I have put haloha_id in the filter block (which does not score documents), text in the must block (which scores documents), and mood in the should block (which boosts documents that match it).

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "haloha_id": "5ecf6bff25a36366cd134db2"
          }
        }
      ],
      "must": [
        {
          "match": {
            "text": {
              "query": "chilli ",
              "fuzziness": "auto"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "mood": {
              "value": 2
            }
          }
        }
      ]
    }
  }
}
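For the Laravel side, the same body can be written as a PHP array. This is only a sketch: the function name buildLikeMineQuery is made up for illustration, and its parameters stand in for $userHaloha->haloha_id, $userHaloha->filtered_text and $userHaloha->mood from the question. Each clause is wrapped in its own array on purpose. In the question's array, "must" contains the key "match" twice, and PHP keeps only the last value for a duplicate key, so the mood clause was probably never sent to Elasticsearch. That may be why removing it did not change the scores.

```php
<?php
// Hypothetical helper (not from the question or the answer): builds the
// search body shown above as a PHP array.
function buildLikeMineQuery(string $halohaId, string $text, int $mood): array
{
    return [
        'query' => [
            'bool' => [
                // filter: documents must match, but the clause adds nothing to the score
                'filter' => [
                    ['match' => ['haloha_id' => $halohaId]],
                ],
                // must: documents must match, and the clause contributes to the score
                'must' => [
                    ['match' => ['text' => ['query' => $text, 'fuzziness' => 'auto']]],
                ],
                // should: optional, adds to the score of documents that match it
                'should' => [
                    ['term' => ['mood' => ['value' => $mood]]],
                ],
            ],
        ],
    ];
}

// Sample values copied from the JSON query above.
$body = buildLikeMineQuery('5ecf6bff25a36366cd134db2', 'chilli ', 2);
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Because every clause is a separate list element, adding a second must or should clause later cannot silently overwrite an existing one.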

The issue of mood:3 being ranked higher than mood:2 (the term searched for in the should clause) is due to sharding.

From the docs:

If you notice that two documents with the same content get different scores or that an exact match is not ranked first, then the issue might be related to sharding. By default, Elasticsearch makes each shard responsible for producing its own scores. However since index statistics are an important contributor to the scores, this only works well if shards have similar index statistics. The assumption is that since documents are routed evenly to shards by default, then index statistics should be very similar and scoring would work as expected. However in the event that you either:

  • use routing at index time,

  • query multiple indices,

  • or have too little data in your index

then there are good chances that all shards that are involved in the search request do not have similar index statistics and relevancy could be bad.

If you have a small dataset, the easiest way to work around this issue is to index everything into an index that has a single shard (index.number_of_shards: 1), which is the default. Then index statistics will be the same for all documents and scores will be consistent.
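As a rough illustration of the single-shard workaround, the parameters for creating such an index could look like the PHP array below. The helper name singleShardIndexParams and the index name halohas are placeholders, not taken from the question. Bear in mind that the number of primary shards is fixed when an index is created, so an existing multi-shard index would have to be recreated and its documents reindexed.

```php
<?php
// Hypothetical helper: builds index-creation parameters with one primary shard.
function singleShardIndexParams(string $indexName): array
{
    return [
        'index' => $indexName,
        'body' => [
            'settings' => [
                // One shard means all documents share the same index statistics.
                'index' => ['number_of_shards' => 1],
            ],
        ],
    ];
}

// 'halohas' is a placeholder index name.
$params = singleShardIndexParams('halohas');
echo json_encode($params, JSON_PRETTY_PRINT), PHP_EOL;
```

With the official elasticsearch-php client, an array of this shape would typically be passed to $client->indices()->create($params). Other clients may expect a different structure.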

User contributions licensed under: CC BY-SA