FEATURE: Split up text segmentation for Chinese and Japanese.

* Chinese segmentation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation; it was dropped in c677877e4fe5381f613279901f36ae255c909573
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese. The two settings are mutually
exclusive, as the sketch below illustrates.
Alan Guo Xiang Tan
2022-01-26 15:24:11 +08:00
parent 9ddd1f739e
commit 930f51e175
14 changed files with 406 additions and 72 deletions
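
The split settings are guarded by the two validators added below, which keep them mutually exclusive. A minimal sketch of the resulting behavior, assuming Discourse's SiteSetting assignment API (this console session is illustrative, not part of the diff):

# Illustrative only: the validators added in this commit prevent both
# tokenizers from being enabled at once.
SiteSetting.search_tokenize_japanese = true

# With Japanese tokenization on, the Chinese validator rejects any value:
SearchTokenizeChineseValidator.new.valid_value?(true)  # => false

SiteSetting.search_tokenize_japanese = false
SearchTokenizeChineseValidator.new.valid_value?(true)  # => true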


@@ -0,0 +1,14 @@
# frozen_string_literal: true

class SearchTokenizeChineseValidator
  def initialize(opts = {})
  end

  def valid_value?(value)
    !SiteSetting.search_tokenize_japanese
  end

  def error_message
    I18n.t("site_settings.errors.search_tokenize_japanese_enabled")
  end
end


@@ -0,0 +1,14 @@
# frozen_string_literal: true

class SearchTokenizeJapaneseValidator
  def initialize(opts = {})
  end

  def valid_value?(value)
    !SiteSetting.search_tokenize_chinese
  end

  def error_message
    I18n.t("site_settings.errors.search_tokenize_chinese_enabled")
  end
end
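
Note that valid_value? ignores the incoming value entirely: validity depends only on the state of the opposing setting. A hypothetical spec-style sketch of the Japanese-side behavior (not part of this commit, assuming Discourse's standard RSpec setup):

# frozen_string_literal: true

# Hypothetical spec, not part of this commit: exercises the mutual
# exclusion from the Japanese side.
RSpec.describe SearchTokenizeJapaneseValidator do
  it "rejects enabling Japanese tokenization while Chinese is enabled" do
    SiteSetting.search_tokenize_chinese = true
    expect(described_class.new.valid_value?(true)).to eq(false)
  end

  it "accepts it once Chinese tokenization is disabled" do
    SiteSetting.search_tokenize_chinese = false
    expect(described_class.new.valid_value?(true)).to eq(true)
  end
end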