Support title splitting for fullwidth CJK terminals (#1163)

* Split by fullwidth terminals without spaces.
* Add test
* Update write.feature
eshrh 2021-01-23 18:29:43 -05:00 committed by GitHub
parent ef563c807f
commit 8a78c34917
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 22 additions and 5 deletions

@@ -28,6 +28,20 @@ Feature: Writing new entries.
         | basic_folder |
         | basic_dayone |
+    Scenario Outline: CJK entry should be split at fullwidth period without following space.
+        Given we use the config "<config_file>.yaml"
+        And we use the password "test" if prompted
+        When we run "jrnl "
+        And we run "jrnl -1"
+        Then the output should contain "| "
+        Examples: configs
+        | config_file |
+        | basic_onefile |
+        | basic_encrypted |
+        | basic_folder |
+        | basic_dayone |
     Scenario Outline: Writing an entry from command line should store the entry
         Given we use the config "<config_file>.yaml"
         And we use the password "bad doggie no biscuit" if prompted

@@ -204,14 +204,17 @@ class Entry:
 # https://github.com/fnl/segtok
 SENTENCE_SPLITTER = re.compile(
     r"""
-(                       # A sentence ends at one of two sequences:
-    [.!?\u2026\u203C\u203D\u2047\u2048\u2049\u22EF\u3002\uFE52\uFE57\uFF01\uFF0E\uFF1F\uFF61] # Either, a sequence starting with a sentence terminal,
+    (
+    [.!?\u2026\u203C\u203D\u2047\u2048\u2049\u22EF\uFE52\uFE57] # Sequence starting with a sentence terminal,
     [\'\u2019\"\u201D]? # an optional right quote,
-    [\]\)]*             # optional closing brackets and
-    \s+                 # a sequence of required spaces.
-)""",
+    [\]\)]*             # optional closing bracket
+    \s+                 # AND a sequence of required spaces.
+    )
+    |[\uFF01\uFF0E\uFF1F\uFF61\u3002] # CJK full/half width terminals usually do not have following spaces.
+    """,
     re.VERBOSE,
 )
 SENTENCE_SPLITTER_ONLY_NEWLINE = re.compile("\n")
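
For reviewers who want to check the new behaviour by hand, here is a minimal sketch. The pattern is the updated one from the hunk above (the variant that ends with the `|[\uFF01...]` alternative). The helper `split_first_sentence` and the sample strings are hypothetical and exist only for illustration; they are not taken from the jrnl code base or its test suite.

```python
import re

# Updated pattern, as it appears in the hunk above.
SENTENCE_SPLITTER = re.compile(
    r"""
    (
    [.!?\u2026\u203C\u203D\u2047\u2048\u2049\u22EF\uFE52\uFE57] # Sequence starting with a sentence terminal,
    [\'\u2019\"\u201D]? # an optional right quote,
    [\]\)]*             # optional closing bracket
    \s+                 # AND a sequence of required spaces.
    )
    |[\uFF01\uFF0E\uFF1F\uFF61\u3002] # CJK full/half width terminals usually do not have following spaces.
    """,
    re.VERBOSE,
)


def split_first_sentence(text):
    """Hypothetical helper: split text after the first sentence terminal."""
    match = SENTENCE_SPLITTER.search(text)
    if match is None:
        # No terminal found: the whole text is treated as the title.
        return text, ""
    return text[: match.end()].strip(), text[match.end():].strip()


# ASCII terminal: still requires trailing whitespace before it counts.
latin = split_first_sentence("Hello world. More text here")

# Fullwidth ideographic full stop (U+3002): splits with no following space.
cjk = split_first_sentence("こんにちは。元気ですか")

# ASCII period with no following whitespace: no split.
version = split_first_sentence("v1.2 release")

print(latin)
print(cjk)
print(version)
```

Under the old pattern the second call would not split, because every terminal, including U+3002, had to be followed by whitespace. Moving the fullwidth and halfwidth CJK terminals into their own alternative drops that requirement for them only, so ASCII text such as `v1.2` is still left intact.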