
Solr issue with words containing dashes, hyphens, etc.

For some reason my Solr installation acts wonky (I'm also a newbie to this topic).

Example: in my DB I have an item named “Brandname XX-7 Yadda Ladida”.

If I search for:

Brandname XX7: I don’t get the item in the results (first 20) at all.

Brandname XX-7: I get the expected result in 8th position; first position is taken by the item “Brandname XX-2 Yadda Ladida”.

Brandname XX-7 Ladida: I get the expected result in 7th position; first position is taken by the item “Brandname XX-2 Yadda Ladida”.

Brandname XX-7 Yadda Ladida: I get the expected result AGAIN in 7th position; first position is taken by the item “Brandname XX-2 Yadda Ladida”.

PS: everything is case-insensitive.

What am I doing wrong? Please advise.

This is my managed-schema XML file: http://pastebin.com/Z9nc36QD

UPDATE: this is the debug output of an example query, searching for “boss dd-7”:

  "debug":{
    "rawquerystring":"Brandname xx-7",
    "querystring":"Brandname xx-7",
    "parsedquery":"_text_:Brandname (_text_:xx _text_:7)",
    "parsedquery_toString":"_text_:Brandname (_text_:xx _text_:7)",


Answer

OK, now it works, simply by removing this line from my schema:

<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25" />

and adding

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />

My final configuration (the field type I use is text_general):

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
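
A possible explanation for why removing the NGramFilterFactory line helped (a guess, not checked against the original schema): as far as I understand, with minGramSize="2" an n-gram filter emits nothing for a one-character token, so a lone "7" or "2" never reaches the index and the two items look identical to Solr. The Python sketch below only approximates that behaviour. The `char_ngrams` helper and the token lists are hypothetical, and it assumes the tokenizer splits on the hyphen (as the "xx" and "7" clauses in the debug output suggest) and that the n-gram filter ran at index time.

```python
def char_ngrams(token, min_gram=2, max_gram=25):
    """Return every substring of `token` whose length lies in [min_gram, max_gram].

    Rough stand-in for an n-gram token filter; not Solr's actual implementation.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams would run past the end of the token
            grams.append(token[start:end])
    return grams


# Hypothetical token streams after tokenizing on the hyphen and lowercasing.
item_7 = ["brandname", "xx", "7", "yadda", "ladida"]
item_2 = ["brandname", "xx", "2", "yadda", "ladida"]

# Terms that would end up in the index for each item under these assumptions.
indexed_7 = {gram for token in item_7 for gram in char_ngrams(token)}
indexed_2 = {gram for token in item_2 for gram in char_ngrams(token)}

print(char_ngrams("7"))        # [] : a one-character token yields no grams
print(indexed_7 == indexed_2)  # True: the two items index the same terms
```

If that assumption holds, the query clause for "7" cannot tell “XX-7” apart from “XX-2”, which would match the ranking I was seeing. The Analysis screen in the Solr admin UI should show what the real analyzer chain emits for a given field and input.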
User contributions licensed under: CC BY-SA