Windows purports to use wide strings for files, but it is broken. Python 2 tried to "help" with Unicode in the same way that cats "help" ensure that everything is knocked successfully onto the floor. Python 3 sort of behaves itself, while in the background it insists that if a single character in a string will not fit in one byte, then the entire string gets converted to a fixed-width encoding big enough for that character (in CPython, 1, 2 or 4 bytes per character). Actually that is really a different manifestation of the insistence on fixed width rather than of UTF-16 specifically, but that type of thinking is still the problem. All of my contact with Python is to fix broken stuff, where most of the time the fix is to start over with a better platform. Every time I end up having to use it, I find new failures in its design which are even more senseless than the time before.

Infogalactic has UTF-8 accounting for 85% of websites in 2015, and Wikipedia claims 98% presently. The latter is uncritical and has no criticism of UTF-8 at all, and the former only really covers the obvious points about some operations being mildly less convenient with variable-width encoding and sometimes using 3 bytes instead of 2 for an Asian glyph. Everyone recommends it as a standard, even Microsoft. I have no direct problem supporting UTF-8 and running tests against UTF-8 input and those icon characters (emoji) it has, but its dominance has me very suspicious. If it has the blessing of that much of the world, that must include plenty of people whose judgment I do not trust. This almost never happens unless it's a format lock-in to secure future compliance of the developer, or to incur royalties if a piece of a program happens to be able to encode it (MP3 is that way). I don't doubt its usefulness, I just get concerned at the lack of anyone presenting any real contrary position at all. Is my suspicion unfounded, or does UTF-8 have some deep rabbit-hole beneath the surface?
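The Python 3 storage behaviour complained about above can be checked directly. This is only a rough sketch, and it assumes CPython 3.3 or later, where strings use the flexible representation; other interpreters may store strings differently. The helper name `bytes_per_char` is made up for this post. It estimates the per-character cost by comparing the sizes of two strings built from the same sample, so the fixed object header cancels out.

```python
import sys


def bytes_per_char(sample: str, repeats: int = 1000) -> float:
    """Estimate how many bytes CPython spends per character of `sample`.

    Hypothetical helper: it builds two strings from the same sample and
    divides the size difference by the length difference, which cancels
    the fixed per-object overhead. Assumes CPython 3.3+ string storage.
    """
    short = sample * repeats
    long_ = sample * (2 * repeats)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) / (len(long_) - len(short))


samples = {
    "ASCII only": "a",
    "Latin-1 letter": "\u00e9",                        # e with acute accent
    "euro sign": "\u20ac",                             # needs 2 bytes
    "emoji": "\U0001F600",                             # needs 4 bytes
    "ASCII plus one emoji": "aaaaaaaaa\U0001F600",     # 9 ASCII + 1 emoji
}

for label, sample in samples.items():
    print(f"{label}: {bytes_per_char(sample)} bytes per character")
```

The last sample is the interesting one: nine ASCII letters and one emoji, and every character in the string gets charged at the width of the widest one.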
The price of Unicode is accepting that a character is no longer one byte, and moreover that its encoding may be variable width. UTF-8 is proudly variable-width, matching ASCII byte for byte in most documents while using only the upper-half byte values (0x80 and above) to encode non-ASCII characters, a robust system which limits the damage when an unaware program reads it as ASCII. The variable width is an issue for a few types of program such as word processors, though most operations (copying, sorting, searching) need not care.

In the early 1990s someone just had to insist on fixed byte width, and out popped the 16-bit encoding (UCS-2) that later became UTF-16. Two bytes per character, costing a load of extra storage for most files, and absolutely no plan for what happens when we reach 0xFFFF. Java, C# and JavaScript all signed up to it for their fixed-width string storage. The Unicode character space then started getting flooded with every alphabet in existence, plus symbols for mathematics, music and such. It looks to me that they got up to 0xD000 before admitting that it was going to run dry. The fix was to reserve the range 0xD800 to 0xDFFF and use pairs of code units in that range (surrogate pairs) to represent 0x10000 and above, which makes UTF-16 variable-width after all.

Even worse, the UTF-8 scheme as originally designed could store code points up to 0x7FFFFFFF, but the UTF-16 extension only reaches 0x10FFFF, and now Unicode itself specifies 0x10FFFF as the limit just to appease UTF-16. This is literally as absurd as having to use a gas-powered generator to recharge an electric car. UTF-16 has wrecked everything for zero benefit in return. Even C++ has an entire wide-string type and string library suite for it, which is widely known as a massive trap to avoid; put UTF-8 bytes inside normal strings instead.
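For anyone who wants to see where the 0x10FFFF ceiling comes from, here is a small Python sketch of the surrogate pair arithmetic next to the UTF-8 byte counts. The function names `utf8_length` and `to_surrogate_pair` are my own for illustration; the sample code points are arbitrary picks.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed for one code point in present-day UTF-8 (max 0x10FFFF)."""
    if code_point < 0x80:
        return 1      # identical to ASCII
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def to_surrogate_pair(code_point: int):
    """Split a code point in 0x10000..0x10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF use a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# Arbitrary samples: a Latin letter, an accented letter, the euro sign, an emoji.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    utf16_bytes = len(chr(cp).encode("utf-16-le"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s) in UTF-8, {utf16_bytes} in UTF-16")

high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 as a surrogate pair: {high:04X} {low:04X}")
```

Two 10-bit halves give 20 bits, so the largest value a pair can reach is 0x10000 + 0xFFFFF = 0x10FFFF. That is the whole reason for the limit.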
Standard ASCII is one byte per character, ending at 0x7F, which gives just about enough space for control bytes, spaces, punctuation and uppercase/lowercase Latin.

Extended ASCII uses the upper half to include some accented letters, and there have been attempts to use dreaded "code pages" for switching alphabet entirely, but Unicode is the proper international solution.
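To make the code page problem concrete, here is a minimal Python sketch. The byte 0xE9 and the three code pages are just examples I picked, and `decode_under` is a made-up helper name. It shows the same upper-half byte turning into a different letter depending on which code page the reader assumes, while plain ASCII stays the same everywhere.

```python
import unicodedata

# One byte from the upper half, above the ASCII range (an arbitrary example).
RAW = bytes([0xE9])


def decode_under(code_page: str, raw: bytes = RAW) -> str:
    """Interpret the same raw byte under a given legacy code page."""
    return raw.decode(code_page)


for code_page in ("cp1252", "cp1251", "cp437"):
    glyph = decode_under(code_page)
    # Print the Unicode name rather than the glyph, so the output is ASCII-safe.
    print(f"{code_page}: U+{ord(glyph):04X} {unicodedata.name(glyph)}")

# Plain ASCII text is unaffected: its bytes are identical in ASCII and UTF-8.
assert "Latin".encode("ascii") == "Latin".encode("utf-8")
```

Under Unicode each of those letters has its own fixed code point, so the guessing game goes away.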