Fix improper charset handling in PY2 path of u(x)

I'm fully aware that I may have just added another layer of impropriety, but this one-line change fixed the charset errors I was getting.

I'll illustrate this change with an example string `Charšet`, entered (e.g. through stdin) in UTF-8 encoding.
To the best of my knowledge, the previous version would first have encoded this string to `Char\xc5\xa1et` (i.e., it encoded each byte outside the ASCII range as a hex escape code), and then have parsed that escaped string to `CharÅ¡et` (i.e., after "r" it sees the two Unicode code points U+00C5 and U+00A1 instead of the single character `š`).

My version simply takes this str for what it is: a UTF-8 representation of the unicode string `Charšet`.
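
The difference can be reproduced outside Python 2 with a small sketch. Python 3 has no `string-escape` codec, so `approx_string_escape` below is a hypothetical stand-in that only approximates it (it hex-escapes backslashes and every byte outside printable ASCII). `old_u` and `new_u` are likewise illustrative names for the two PY2 code paths, not functions from the code base.

```python
def approx_string_escape(raw: bytes) -> bytes:
    """Hypothetical approximation of Python 2's 'string-escape' codec:
    hex-escape backslashes and anything outside printable ASCII."""
    out = []
    for b in raw:
        if 32 <= b < 127 and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out).encode("ascii")


def old_u(raw: bytes) -> str:
    # Previous behaviour: escape each byte, then parse the escapes,
    # so every original byte becomes its own code point.
    return approx_string_escape(raw).decode("unicode_escape")


def new_u(raw: bytes) -> str:
    # New behaviour: treat the bytes as UTF-8 text.
    return raw.decode("utf-8")


raw = "Char\u0161et".encode("utf-8")  # 'Charšet' as UTF-8 bytes
escaped = approx_string_escape(raw)   # b'Char\\xc5\\xa1et'
before = old_u(raw)                   # 'CharÅ¡et' (U+00C5, U+00A1)
after = new_u(raw)                    # 'Charšet'  (U+0161)
```

Note that the new path assumes the input really is UTF-8; bytes in another encoding would raise `UnicodeDecodeError` or decode to the wrong characters.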
Thijs van Dijk 2015-03-22 16:34:58 +01:00
parent 1a65ae57cb
commit 354cc3244c


@@ -67,12 +67,7 @@ def set_keychain(journal_name, password):
 def u(s):
     """Mock unicode function for python 2 and 3 compatibility."""
-    if PY3:
-        return str(s)
-    elif isinstance(s, basestring) and type(s) is not unicode:
-        return unicode(s.encode('string-escape'), "unicode_escape")
-    return unicode(s)
+    return s if PY3 or type(s) is unicode else s.decode("utf-8")
 def py2encode(s):
     """Encode in Python 2, but not in python 3."""