lucene - Solr wildcard issue with the '-' character
I am using Solr and tokenizing the field as follows:
<field name="title" type="text_general" multiValued="false" indexed="true" stored="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</field>
I append * to each search term to get matching results. For example, title:app* gives me app, application and similar results.
But if the search term contains a '-', the query fails to return anything. For example:
title:child-play* does not return the document with the title child-play!
Can anyone point out what the issue might be?
After debugging, I got this for title:child-play:
"debug":{
  "rawquerystring":"title:child-play",
  "querystring":"title:child-play",
  "parsedquery":"title:child title:play",
  "parsedquery_toString":"title:child title:play",
And this for title:child-play*:
"debug":{
  "rawquerystring":"companyname:child-play*",
  "querystring":"companyname:child-play*",
  "parsedquery":"companyname:child-play*",
  "parsedquery_toString":"companyname:child-play*",
I recommend using WordDelimiterFilterFactory.
Just change the type of the field to a custom type; in this case it's "text_general":
<field name="title" type="text_general"/>
Then you need to create the new type.
For example, here are my settings; you can customise them however you want.
<fieldType name="text_general" class="solr.TextField" omitNorms="false" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
            generateNumberParts="0" stemEnglishPossessive="0" splitOnCaseChange="1"
            preserveOriginal="1" catenateAll="1" catenateWords="1" catenateNumbers="1"
            generateWordParts="1" splitOnNumerics="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
            generateNumberParts="1" stemEnglishPossessive="0" splitOnCaseChange="1"
            preserveOriginal="1" catenateAll="1" catenateWords="1" catenateNumbers="1"
            generateWordParts="1" splitOnNumerics="1"/>
  </analyzer>
</fieldType>
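Why this helps with child-play*: as the debug output for the wildcard query shows, a wildcard or prefix term is not split by the tokenizer, so it can only match an indexed term that literally starts with child-play. With preserveOriginal="1", the original token child-play is indexed alongside child and play, which gives the prefix query something to match. On Solr 6.5 and later, WordDelimiterFilterFactory is deprecated in favour of WordDelimiterGraphFilterFactory, which must be followed by FlattenGraphFilterFactory in an index-time analyzer. Below is a sketch of the same approach with the graph filter, not a tested drop-in: the type name text_wdf is hypothetical, stopwords.txt is assumed to exist in the config set, and LowerCaseFilterFactory is moved after the delimiter filter so that splitOnCaseChange still sees the original casing (in the configuration above it runs on already-lowercased text).

```xml
<!-- Sketch only: "text_wdf" is a hypothetical type name, and stopwords.txt is
     assumed to exist in the config set. Requires Solr 6.5+ for the graph filter. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Split on '-' and similar delimiters, but keep the original token
         (e.g. "child-play") so prefix queries like child-play* can match it. -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <!-- Graph output must be flattened before it is written to the index. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <!-- Lowercase after splitting so case changes are still visible to the splitter. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_wdf" indexed="true" stored="true" multiValued="false"/>
```

Whichever filter you use, reindex after changing the field type, because documents already in the index were analyzed with the old chain.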
Please read more here:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Arguments:
generateWordParts: (integer, default 1) If non-zero, splits words at delimiters. For example: "CamelCase", "hot-spot" -> "Camel", "Case", "hot", "spot"
generateNumberParts: (integer, default 1) If non-zero, splits numeric strings at delimiters: "1947-32" -> "1947", "32"
splitOnCaseChange: (integer, default 1) If 0, words are not split on camel-case changes: "BugBlaster-XL" -> "BugBlaster", "XL". With the default (non-zero), "BugBlaster" is also split into "Bug", "Blaster".
splitOnNumerics: (integer, default 1) If 0, words are not split on alpha-to-numeric transitions: "FemBot3000" -> "Fem", "Bot3000"
catenateWords: (integer, default 0) If non-zero, maximal runs of word parts are joined: "hot-spot-sensor's" -> "hotspotsensor"
catenateNumbers: (integer, default 0) If non-zero, maximal runs of number parts are joined: "1947-32" -> "194732"
catenateAll: (0/1, default 0) If non-zero, runs of word and number parts are joined: "Zap-Master-9000" -> "ZapMaster9000"
preserveOriginal: (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" -> "Zap-Master-9000", "Zap", "Master", "9000"
protected: (optional) The pathname of a file containing a list of protected words that should be passed through without splitting.
stemEnglishPossessive: (integer, default 1) If 1, strips the possessive "'s" from each subword.
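The protected parameter can keep selected hyphenated words intact while everything else is still split. A minimal sketch, assuming a file named protwords.txt in the config set (the file name and its contents are illustrative, one word per line): words listed there pass through the filter without being split. The entries are meant to be written in lowercase because LowerCaseFilterFactory runs before the delimiter filter in the configuration above.

```xml
<!-- Sketch: protwords.txt is an assumed file in the config set, one word per line,
     for example a line containing: wi-fi
     Listed words pass through this filter without being split. -->
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt"
        generateWordParts="1"
        generateNumberParts="1"
        preserveOriginal="1"/>
```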