FIX: Use the same mode for Chinese search when indexing and querying. (#14780)

In :query mode, cppjieba turns the term `白名单` into `名单白名单`. When `白名单` appears inside a longer string of text, however, cppjieba does not tokenize it that way. Searches can therefore fail to match, because the search data generated at indexing time may not contain every term that :query mode generates. For now, we keep the two sides in parity by using :mix mode for both indexing and querying. This may make search less accurate, but we plan to support CJK search properly in the future.
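
The mismatch can be illustrated with a minimal Ruby sketch. It does not call cppjieba; the token lists are copied from the spec expectations changed in this commit, on the assumption that the old expectation reflects :query-mode output and the new one reflects :mix-mode output. The helpers `missing_terms` and `matches?` are hypothetical, and the sketch assumes a search only matches when every query term is present in the indexed data.

```ruby
require 'set'

# Token lists copied from the spec expectations in this commit.
# Assumption: the old expectation is :query-mode output, the new one :mix-mode output.
MIX_TERMS   = %w[Discourse 中国 基础设施 网络 正在 组装].freeze
QUERY_TERMS = %w[Discourse 中国 基础 设施 基础设施 网络 正在 组装].freeze

# Hypothetical helper: returns the query terms that the index does not contain.
def missing_terms(indexed_terms, query_terms)
  query_terms.reject { |term| indexed_terms.include?(term) }
end

# Hypothetical helper: assumes a match requires every query term to be indexed.
def matches?(indexed_terms, query_terms)
  missing_terms(indexed_terms, query_terms).empty?
end

# The index is built from :mix-mode tokens.
indexed = Set.new(MIX_TERMS)

# Querying with :query-mode tokens asks for terms the index never stored.
puts missing_terms(indexed, QUERY_TERMS).inspect # => ["基础", "设施"]
puts matches?(indexed, QUERY_TERMS)              # => false

# Querying with the same :mix-mode tokens stays in parity with the index.
puts matches?(indexed, MIX_TERMS)                # => true
```

Using one mode on both sides removes the extra sub-terms from the query, at the cost of the finer-grained matching that :query mode was meant to provide.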
2025-05-31 20:45:35 +08:00 · 2021-11-01 10:14:47 +08:00
parent a059c7251f
commit a03c48b720
2 changed files with 4 additions and 12 deletions
--- a/spec/components/search_spec.rb
+++ b/spec/components/search_spec.rb
@@ -1107,7 +1107,7 @@ describe Search do
    it 'splits English / Chinese and filter out stop words' do
      SiteSetting.default_locale = 'zh_CN'
      data = Search.prepare_data(sentence).split(' ')
-      expect(data).to eq(["Discourse", "中国", "基础", "设施", "基础设施", "网络", "正在", "组装"])
+      expect(data).to eq(["Discourse", "中国", "基础设施", "网络", "正在", "组装"])
    end

    it 'splits for indexing and filter out stop words' do
@@ -1119,12 +1119,6 @@ describe Search do
    it 'splits English / Traditional Chinese and filter out stop words' do
      SiteSetting.default_locale = 'zh_TW'
      data = Search.prepare_data(sentence_t).split(' ')
-      expect(data).to eq(["Discourse", "太平", "平山", "太平山", "森林", "遊樂區"])
-    end
-
-    it 'splits for indexing and filter out stop words' do
-      SiteSetting.default_locale = 'zh_TW'
-      data = Search.prepare_data(sentence_t, :index).split(' ')
      expect(data).to eq(["Discourse", "太平山", "森林", "遊樂區"])
    end