2016-04-19

How Microsoft Makes the World a Worse Place

"Microsoft has no taste" - Steve Jobs, 1995.

Much of the modern computer world grew out of Unix. BSD and Mac OS X descend from Unix, and Linux is Unix-like. On the opposite side stands Windows, created by M$. Yes, money is the main thing Microsoft cares about, rather than making the world a better place.

Originally, the computing environment was simple and unified. The directory root is "/", and a path looks like /bin/program/subfolder. But Windows changed that. Each disk gets a drive letter such as C:, and path components are separated by "\" instead of "/", so a full path looks like "C:\Programs\xxx\yyy". Worse, nothing is unified even within Windows: a network path uses the form "\\myserver\xxx", with no drive letter like "C:" at all.
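To see how many path syntaxes a portable program has to recognize, here is a minimal C sketch that sorts a string into the three absolute path forms mentioned above. The enum and the classify_path function are hypothetical names for illustration, and the rules are simplified: real Windows paths also include extended-length prefixes and drive-relative forms such as C:foo.

```c
#include <ctype.h>

/* Rough categories of absolute path syntax. Simplified sketch only. */
typedef enum {
    PATH_UNIX_ABSOLUTE,   /* e.g. "/bin/program/subfolder" */
    PATH_WINDOWS_DRIVE,   /* e.g. "C:\Programs\xxx\yyy"    */
    PATH_WINDOWS_UNC,     /* e.g. "\\myserver\xxx"         */
    PATH_OTHER            /* relative or unrecognized      */
} path_kind;

path_kind classify_path(const char *p) {
    /* Unix: a single root, "/" */
    if (p[0] == '/')
        return PATH_UNIX_ABSOLUTE;
    /* Windows network (UNC) path: two leading backslashes */
    if (p[0] == '\\' && p[1] == '\\')
        return PATH_WINDOWS_UNC;
    /* Windows drive path: letter, colon, separator */
    if (isalpha((unsigned char)p[0]) && p[1] == ':' &&
        (p[2] == '\\' || p[2] == '/'))
        return PATH_WINDOWS_DRIVE;
    return PATH_OTHER;
}
```

On Unix-like systems only the first check is ever needed; the other two branches exist purely because of Windows.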

In Unix, each line of text ends with LF (line feed). Windows uses CR+LF (carriage return followed by line feed) instead. If a file is used on Linux or Mac and your colleague edits it on Windows, every LF might be converted to CR+LF. Then it will drive you crazy when you commit the file to Git or another version control system: Git will report every single line as modified.
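Undoing the damage is mechanical. As a minimal sketch (crlf_to_lf is a hypothetical helper, not a standard function), converting CR+LF line endings back to LF in a memory buffer could look like this:

```c
#include <stddef.h>

/* Convert Windows CR+LF line endings to Unix LF, in place.
   Returns the new length. A lone CR not followed by LF is kept as-is.
   Sketch only: a CR+LF pair split across two separate buffers is not handled. */
size_t crlf_to_lf(char *buf, size_t len) {
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        /* Skip a CR that is immediately followed by LF;
           the LF itself is copied on the next iteration. */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

In practice you would rather let Git do this for you: the core.autocrlf setting, or a .gitattributes file containing `* text=auto`, normalizes line endings when files are committed.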

The character encoding issue is even worse. For internationalization and localization, we now have several Unicode encoding formats, such as UTF-8, UTF-16 and UTF-32. Guess what Microsoft uses? Not UTF-8. The Windows "wide" APIs take LPWSTR, a pointer to a string of 16-bit wchar_t units. Early Windows NT treated these as UCS-2, a fixed two bytes per character. Since Windows 2000 they are UTF-16, which is a variable-length encoding, so code that still treats one wchar_t as one character breaks on characters outside the Basic Multilingual Plane, such as many emoji.
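The variable-length nature of UTF-16 is easy to see in code. This minimal C sketch (utf16_encode is a hypothetical helper, not a Windows API) encodes one Unicode code point into 16-bit units. A character above U+FFFF needs two units, a so-called surrogate pair, which is exactly what "one wchar_t per character" code gets wrong.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is not
   a valid Unicode scalar value (above U+10FFFF, or a surrogate value). */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                           /* Basic Multilingual Plane */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+4E2D encodes to the single unit 0x4E2D, while U+1F600 encodes to the pair 0xD83D 0xDE00.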


On top of that, Windows still has something called a "code page". The narrow-string ("ANSI") versions of the WinAPI functions, such as CreateFileA, take char* strings and interpret their bytes according to the system's active code page. So in Windows, you have to set a system-wide code page for all such programs to correctly interpret the characters in their strings, and the same byte can mean a completely different character under a different code page.
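A code page is essentially a lookup table from bytes to characters. The sketch below is deliberately tiny: decode_byte and the enum names are hypothetical, and only ASCII plus the single byte 0xE9 are covered. The numbers 437 (OEM United States) and 1252 (Western European) are real Windows code page identifiers. It shows the same byte 0xE9 decoding to "é" under code page 1252 but to "Θ" under code page 437.

```c
#include <stdint.h>

/* Two illustrative code pages, using their Windows identifiers. */
enum { CP_OEM_US = 437, CP_WESTERN = 1252 };

/* Map one byte to a Unicode code point under a given code page.
   Hypothetical and incomplete: a real converter needs a full 256-entry
   table per code page. Unsupported bytes return U+FFFD (replacement char). */
uint32_t decode_byte(unsigned char b, int codepage) {
    if (b < 0x80)
        return b;  /* both code pages agree with ASCII in this range */
    if (b == 0xE9) {
        if (codepage == CP_WESTERN) return 0x00E9;  /* Latin small e with acute */
        if (codepage == CP_OEM_US)  return 0x0398;  /* Greek capital theta */
    }
    return 0xFFFD;
}
```

A text file written on a machine using one code page and opened on a machine using the other will silently show the wrong characters; nothing in the bytes themselves says which table to use.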

There is no code page issue in Unix-like OSes. Both Linux and Mac OS X support Unicode well, especially UTF-8, which is also widely used in Internet formats such as HTML and XML. A UTF-8 code unit is 8 bits (1 byte), so the type is simply char* in C or std::string in C++. For example, when you open a file by calling fopen(), the path argument is a char*, and a UTF-8 encoded path just works, because the C library passes the bytes to the operating system without any code page conversion. As you can see, the usage is unified, simple and easy, no matter whether the input string is ASCII or UTF-8 encoded.
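Here is a minimal C sketch of that point, assuming a Unix-like system: a UTF-8 encoded file name goes through fopen() as a plain char*. The function name roundtrip_utf8_path is hypothetical, and writing under /tmp is an assumption about the environment.

```c
#include <stdio.h>
#include <string.h>

/* Write `text` to a file whose name is a UTF-8 encoded char* string,
   read it back, and delete the file. Returns 1 on success, 0 on failure.
   No encoding conversion happens anywhere: the path is just bytes. */
int roundtrip_utf8_path(const char *path, const char *text) {
    FILE *f = fopen(path, "w");
    if (!f) return 0;
    fputs(text, f);
    fclose(f);

    char buf[256] = {0};
    f = fopen(path, "r");
    if (!f) { remove(path); return 0; }
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    remove(path);  /* clean up the test file */
    return strcmp(buf, text) == 0;
}
```

For example, calling it with the path "/tmp/caf\xC3\xA9.txt" (the UTF-8 bytes for "café.txt") works exactly like an ASCII path. On Windows, by contrast, the same char* path is traditionally interpreted in the active ANSI code page, which is why Microsoft's C runtime provides the separate wide-character function _wfopen().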

In summary, M$ has successfully made the computer world a worse place over the past two decades.