
Microsoft's Crime

Sorry for the clickbait, but I see what Microsoft did back then to push their OS into the market as borderline criminal, and in my opinion something like that should be brought up in a historical retrospective.

I mean the "localized Windows versions" / codepages: they went and started interpreting the byte range 127-255 differently depending on the Windows country version. So what IBM had still used as funny block graphics for menus etc., they used for storing real information. And they forgot(?) to put the respective encoding standard into the payload. The receiver (decoder) of the information had no chance of recovering the original information unless he happened to own the matching Windows country version, plus the (meta) knowledge about the encoding used. I still remember the sticky notes naming the Windows country version that were stuck on the floppy disks of our colleagues abroad.
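
To see the effect in code: a minimal Python sketch (mine, not from the original post) of how the very same bytes above 127 turn into completely different characters depending on which Windows codepage the decoder assumes:

    # The same three bytes, decoded under different Windows codepages:
    data = bytes([0xE4, 0xF6, 0xFC])

    print(data.decode("cp1252"))  # 'äöü' - what a German sender meant
    print(data.decode("cp1251"))  # 'дць' - what a Russian Windows shows
    print(data.decode("cp1253"))  # 'δφό' - what a Greek Windows shows

Without the sticker on the floppy disk, nothing in the bytes tells you which of these readings is the right one.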

---

I once wrote up the details elsewhere in English, in case anyone is interested:

Bytes Without a Meaning: The IBM/Microsoft Codepage Fubar - And Its Relevance Today

How Global Standards Work
Real quick: IT is about processing information, but computers can only handle bits and bytes. So when information is to be stored, it must be converted to bytes, i.e. "encoded". When information is to be restored, so that a system can act on it or a human can understand it, it must be "decoded" from bytes.

Note: Text created by and for humans is only a tiny fraction of the information that gets encoded into bytes, by and for systems, along the information processing chain.

If information is exchanged, the decoder must know exactly how the encoder converted the information to bytes in order to reverse the process and get the information back.
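
As a tiny Python illustration of that contract: encoding and decoding are an explicit pair, and the round trip only works because both sides name the same standard:

    text = "Grüße"                       # the information (here: human text)
    payload = text.encode("cp1252")      # encoder: information -> bytes
    restored = payload.decode("cp1252")  # decoder: bytes -> information
    assert restored == text              # holds ONLY because both sides
                                         # agreed on the same standard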

To allow information exchange with others, tons of globally agreed standards have been created, each defining exactly how the encoding of information into bytes has to happen.

Typically, many systems on different "layers" are involved when information is exchanged - and each of them only needs to be able to decode its own part in order to act correctly, i.e. forward to the next system and, in the end, depending on the use case, maybe display something like this text to a human, who then decodes it in the final layer, his brain. Hopefully ;-)

Here are the main ways standards define how information is encoded within their layer (everything else in an arbitrary byte sequence is just treated as "payload", i.e. opaque for the current layer); see the sketch after this list:

- By position of bits (e.g. IP, TCP, ...), then without identifiers

- By identifier/value pairs, where the identifiers are either
  - fixed bit sequences,
  - numbers, or
  - alphanumerics up to human-readable words (e.g. HTTP headers) - prefixed by or wrapped within defined 'special' characters, like '<', '/>' or ANSI escape sequences.

When it is done with words, those are, by definition for a global standard, always ASCII.
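
A rough Python sketch of both styles (the fixed-layout header below is made up for illustration, merely in the spirit of IP/TCP; the second part mimics an HTTP-style header):

    import struct

    # 1) By position of bits/bytes: a hypothetical fixed-layout header
    #    (1 byte version, 2 bytes payload length), then opaque payload.
    packet = struct.pack("!BH", 1, 5) + b"hello"
    version, length = struct.unpack("!BH", packet[:3])
    payload = packet[3:3 + length]       # opaque for this layer

    # 2) By identifier/value pairs with ASCII words, HTTP-header style:
    header = b"Content-Type: text/plain\r\n"
    name, value = header.rstrip(b"\r\n").split(b": ", 1)
    print(name.decode("ascii"), "=", value.decode("ascii"))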

The Mistake

Now the nasty thing that happened long ago: IBM came up with a not-so-bad idea for how more information could be encoded within the same number of bytes, for a certain type of information - symbols intended for humans - by mapping the unstandardised bit sequences via various mappings, i.e. lookup tables.

IBM did this just for eye candy (text-based menus) - but MS, trying to get their OS established worldwide, did it for the symbols humans use to write text. That way, people could enter or receive text containing the symbols of their local language and e.g. print it out locally - on an affordable OS. This boosted the worldwide perception of the OS, no questions asked, and created a 'de-facto' standard (Asian countries were later covered as well, using two-byte-wide standards).

But in their de-facto standards for text encoding, they forgot(?) to specify the applied encoding standard within the payload itself(!) (e.g. by defining a byte at position 0, which would have made decoding possible for anybody with JUST the payload, the bytes, at hand). Instead, they hard-coded(!) it into the "localized versions" of the OS itself. Result: in global companies, people had to put stickers on media to inform the receiver about the codepage used to encode the text... You basically had to install that Russian Windows to be able to read the presentation from your Russian colleague, and there was no way around that.
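
What such a "byte at position 0" could have looked like - a purely hypothetical, self-describing format sketched in Python; the codepage IDs are invented for illustration:

    # Hypothetical self-describing text format: byte 0 names the codepage.
    CODEPAGES = {1: "cp1252", 2: "cp1251", 3: "cp1253"}  # made-up IDs

    def encode(text, cp_id):
        return bytes([cp_id]) + text.encode(CODEPAGES[cp_id])

    def decode(payload):
        # The decoder needs NOTHING but the payload itself.
        return payload[1:].decode(CODEPAGES[payload[0]])

    msg = encode("Grüße aus München", 1)
    print(decode(msg))  # works without knowing the sender's OS version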

Fortunately this mess was resolved by the Unicode indirection, which defines ONE intermediate lookup table to encode ALL symbols humans (not systems) globally use to write down information. And the encoding of the keys of THIS table into the final bytes is done via standards like UTF-8.
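
The two-step indirection is easy to see in Python: first the symbol maps to its key in the ONE global table (the Unicode code point), then a separate standard turns that key into bytes:

    ch = "ä"
    print(ord(ch))                 # 228 - the code point, the key in the ONE table
    print(ch.encode("utf-8"))      # b'\xc3\xa4' - key -> bytes via UTF-8
    print(ch.encode("utf-16-le"))  # b'\xe4\x00' - same key, other byte encoding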

Unfortunately, though, textual information for humans is often meant to last: it is stored in order to be decoded again and again over time - unlike e.g. a proprietary routing protocol or IBM's fancy text menu enhancements. Those come and go; texts last.

Only in recent years have we overcome the problem, and code pages bear next to zero relevance anymore for most computing tasks.

And even if they did: it is unacceptable to 'punish' each and every user of a computing language for a domain-specific engineering problem.

I don't have much knowledge about the Windows platform, so I'll quote Wikipedia: "Windows code pages are sets of characters or code pages (known as character encodings in other operating systems) used in Microsoft Windows from the 1980s and 1990s. Windows code pages were gradually superseded when Unicode was implemented in Windows, although they are still supported both within Windows and other platforms."

I consider the case closed: the problem WAS huge in the human text processing domain - but today it is of practically no relevance anymore.

Note: We still don't see the Unicode -> bytes encoding standard within the payload itself (except in one of them), but the commonly accepted one, UTF-8, is able to detect 'itself' with just the payload at hand. Further, in networking, all wrapping protocols that can carry text payload for humans define identifiers for the decoding standard to apply.
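
That "self-detection" is really a strict validity check: legacy 8-bit text almost never happens to form valid UTF-8 multi-byte sequences, so a failed strict decode is a strong (though not infallible) hint. A rough Python sketch:

    def looks_like_utf8(payload):
        # Heuristic, not proof: random codepage bytes above 127 rarely
        # form valid UTF-8 lead/continuation byte sequences.
        try:
            payload.decode("utf-8", errors="strict")
            return True
        except UnicodeDecodeError:
            return False

    print(looks_like_utf8("Grüße".encode("utf-8")))   # True
    print(looks_like_utf8("Grüße".encode("cp1252")))  # False: the bytes
                                                      # 0xFC, 0xDF are invalid here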

