Almost all web traffic now uses the utf-8 encoding, a clever hack which works because ascii is a seven-bit code but web traffic uses 8-bit bytes.
If the first bit is 0, treat the byte as ascii.
if the first bit is 1, treat the byte as part of a multi-byte unicode character.
multi-byte characters in utf-8 can officially be up to four bytes long, with 11 of those 32 bits used for tracking the size of the multi-byte block. That leaves 2^21 code points available, about two million in total, easily enough for every alphabet you could need to write on a website, and all without breaking ascii.
Almost all web traffic now uses the utf-8 encoding, a clever hack which works because ascii is a seven-bit code but web traffic uses 8-bit bytes.
multi-byte characters in utf-8 can officially be up to four bytes long, with 11 of those 32 bits used for tracking the size of the multi-byte block. That leaves 2^21 code points available, about two million in total, easily enough for every alphabet you could need to write on a website, and all without breaking ascii.
Very neat!