Add Ellipsis to SENTENCE_SPLITTER (#1044)

Ellipsis is used to terminate sentences in Chinese and often appears at the end of titles.
Felix Yan 2020-09-12 03:56:46 +08:00 committed by GitHub
parent 12e01656f0
commit 9ee4c21f6b


@@ -202,7 +202,7 @@ class Entry:
     SENTENCE_SPLITTER = re.compile(
         r"""
         (                       # A sentence ends at one of two sequences:
-            [.!?\u203C\u203D\u2047\u2048\u2049\u3002\uFE52\uFE57\uFF01\uFF0E\uFF1F\uFF61]  # Either, a sequence starting with a sentence terminal,
+            [.!?\u2026\u203C\u203D\u2047\u2048\u2049\u22EF\u3002\uFE52\uFE57\uFF01\uFF0E\uFF1F\uFF61]  # Either, a sequence starting with a sentence terminal,
             [\'\u2019\"\u201D]? # an optional right quote,
             [\]\)]*             # optional closing brackets and
             \s+                 # a sequence of required spaces.
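
The effect of the change can be sketched as follows. This is a minimal illustration, not the project's code: it rebuilds only the first alternative visible in the hunk above, since the rest of the pattern is not shown. The names OLD_TERMINALS, NEW_TERMINALS, and make_splitter are hypothetical helpers introduced here for comparison.

```python
import re

# Character classes copied from the hunk above (before and after the change).
# The constant names are hypothetical; the source keeps a single inline class.
OLD_TERMINALS = r".!?\u203C\u203D\u2047\u2048\u2049\u3002\uFE52\uFE57\uFF01\uFF0E\uFF1F\uFF61"
NEW_TERMINALS = r".!?\u2026\u203C\u203D\u2047\u2048\u2049\u22EF\u3002\uFE52\uFE57\uFF01\uFF0E\uFF1F\uFF61"


def make_splitter(terminals: str) -> re.Pattern:
    """Build a reduced splitter covering only the first alternative in the hunk."""
    return re.compile(
        rf"""
        [{terminals}]           # a sentence terminal,
        ['\u2019"\u201D]?       # an optional right quote,
        [\]\)]*                 # optional closing brackets and
        \s+                     # a sequence of required spaces.
        """,
        re.VERBOSE,
    )


old_splitter = make_splitter(OLD_TERMINALS)
new_splitter = make_splitter(NEW_TERMINALS)

# U+2026 HORIZONTAL ELLIPSIS followed by a space.
sample = "Wait\u2026 there is more"

print(old_splitter.search(sample) is None)      # old class does not treat the ellipsis as a terminal
print(new_splitter.search(sample) is not None)  # new class does
```

The same check applies to U+22EF (MIDLINE HORIZONTAL ELLIPSIS), the second code point added by this commit.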