micropython

Commit Graph

Author	SHA1	Message	Date
Damien George	7c85c7c210	py/unicode: Fix check for valid utf8 being stricter about contn chars.	2018-11-26 16:13:08 +11:00
tll	68c28174d0	py/objstr: Add check for valid UTF-8 when making a str from bytes. This patch adds a function utf8_check() to check for a valid UTF-8 encoded string, and calls it when constructing a str from raw bytes. The feature is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and is enabled if unicode is enabled. It costs about 110 bytes on Thumb-2, 150 bytes on Xtensa and 170 bytes on x86-64.	2017-09-06 16:43:09 +10:00
Damien George	e9404e5f5f	tests: Improve coverage of array, range, dict, slice, exc, unicode.	2016-10-17 11:43:47 +11:00
Paul Sokolovsky	d1771bbae0	tests/unicode_subscr.py: Detailed test for subscripting unicode strings.	2016-07-25 19:28:19 +03:00
Damien George	75a811a6df	tests: Move int+unicode test to unicode-specific test directory.	2015-09-07 21:36:24 +01:00
Damien George	0be3c70cd8	py/lexer: Raise SyntaxError when unicode char point out of range.	2015-09-07 17:19:17 +01:00
Damien George	51b9a0d0c4	py/objstr: Make string formatting 8-bit clean.	2015-08-29 23:13:51 +01:00
Damien George	7ed58cb663	py: Support unicode (utf-8 encoded) identifiers in Python source. Enabled simply by making the identifier lexing code 8-bit clean.	2015-06-09 10:58:07 +00:00
Damien George	9dd3640464	tests: Add missing tests for builtins, and many other things.	2015-04-04 22:05:30 +01:00
Damien George	92ab95f215	tests: Add some tests to improve coverage.	2015-01-29 14:56:09 +00:00
stijn	a3efe04dce	Use mode/encoding kwargs in io and unicode tests. The mode argument is used to assert it works; the encoding argument is used to make sure CPython uses the correct encoding, as it does not automatically use utf-8.	2014-10-21 22:10:38 +03:00
Damien George	1694bc733d	py: Add stream reading of n unicode chars; unicode support by default. With unicode enabled, this patch allows reading a fixed number of characters from text-mode streams; eg file.read(5) will read 5 unicode chars, which can be made of more than 5 bytes. For an ASCII stream (ie no chars > 127) it only needs to do 1 read. If there are lots of non-ASCII chars in a stream, then it needs multiple reads of the underlying object. Adds a new test for this case. Enables unicode support by default on unix and stmhal ports.	2014-07-19 18:34:04 +01:00
Paul Sokolovsky	ed07d035d5	tests: Add basic test for unicode file i/o.	2014-06-27 00:04:20 +03:00
Paul Sokolovsky	63143c94ce	tests: Test for explicit start/end args to str methods for unicode.	2014-06-27 00:04:20 +03:00
Paul Sokolovsky	b1949e4c09	tests: Add tests for unicode find()/rfind()/index().	2014-06-27 00:04:19 +03:00
Paul Sokolovsky	17994d1bd3	tests: Add test for unicode string iteration.	2014-06-27 00:04:19 +03:00
Chris Angelico	1e3781bc35	tests: Add unicode test.	2014-06-27 00:04:17 +03:00

17 Commits (jebbatime)