Commit Graph

17 Commits (jebbatime)

Author SHA1 Message Date
Damien George 7c85c7c210 py/unicode: Fix check for valid utf8 being stricter about contn chars. 2018-11-26 16:13:08 +11:00
tll 68c28174d0 py/objstr: Add check for valid UTF-8 when making a str from bytes.
This patch adds a function utf8_check() to check for a valid UTF-8 encoded
string, and calls it when constructing a str from raw bytes.  The feature
is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and
is enabled if unicode is enabled.  It costs about 110 bytes on Thumb-2, 150
bytes on Xtensa and 170 bytes on x86-64.
2017-09-06 16:43:09 +10:00
Damien George e9404e5f5f tests: Improve coverage of array, range, dict, slice, exc, unicode. 2016-10-17 11:43:47 +11:00
Paul Sokolovsky d1771bbae0 tests/unicode_subscr.py: Detailed test for subscripting unicode strings. 2016-07-25 19:28:19 +03:00
Damien George 75a811a6df tests: Move int+unicode test to unicode-specific test directory. 2015-09-07 21:36:24 +01:00
Damien George 0be3c70cd8 py/lexer: Raise SyntaxError when unicode char point out of range. 2015-09-07 17:19:17 +01:00
Damien George 51b9a0d0c4 py/objstr: Make string formatting 8-bit clean. 2015-08-29 23:13:51 +01:00
Damien George 7ed58cb663 py: Support unicode (utf-8 encoded) identifiers in Python source.
Enabled simply by making the identifier lexing code 8-bit clean.
2015-06-09 10:58:07 +00:00
Damien George 9dd3640464 tests: Add missing tests for builtins, and many other things. 2015-04-04 22:05:30 +01:00
Damien George 92ab95f215 tests: Add some tests to improve coverage. 2015-01-29 14:56:09 +00:00
stijn a3efe04dce Use mode/encoding kwargs in io and unicode tests
mode argument is used to assert it works
encoding argument is used to make sure CPython uses the correct encoding
as it does not automatically use utf-8
2014-10-21 22:10:38 +03:00
Damien George 1694bc733d py: Add stream reading of n unicode chars; unicode support by default.
With unicode enabled, this patch allows reading a fixed number of
characters from text-mode streams; eg file.read(5) will read 5 unicode
chars, which can made of more than 5 bytes.

For an ASCII stream (ie no chars > 127) it only needs to do 1 read.  If
there are lots of non-ASCII chars in a stream, then it needs multiple
reads of the underlying object.

Adds a new test for this case.  Enables unicode support by default on
unix and stmhal ports.
2014-07-19 18:34:04 +01:00
Paul Sokolovsky ed07d035d5 tests: Add basic test for unicode file i/o. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky 63143c94ce tests: Test for explicit start/end args to str methods for unicode. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky b1949e4c09 tests: Add tests for unicode find()/rfind()/index(). 2014-06-27 00:04:19 +03:00
Paul Sokolovsky 17994d1bd3 tests: Add test for unicode string iteration. 2014-06-27 00:04:19 +03:00
Chris Angelico 1e3781bc35 tests: Add unicode test. 2014-06-27 00:04:17 +03:00