Thanks Hank Green.

  • nycki@lemmy.world
    4 hours ago

    Almost all web traffic now uses the UTF-8 encoding, a clever hack that works because ASCII is a seven-bit code but web traffic uses 8-bit bytes, so every ASCII byte has a spare high bit that is always 0.

    • If the first (highest) bit is 0, treat the byte as ASCII.
    • If the first bit is 1, treat the byte as part of a multi-byte Unicode character.
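
    A rough sketch of that rule in Python, in case it helps. The function name `classify_bytes` and the labels are made up for illustration; all it does is test the high bit of each byte.

```python
def classify_bytes(data: bytes) -> list[str]:
    """Label each byte by its high bit: 0 means ASCII, 1 means multi-byte."""
    labels = []
    for b in data:
        if b & 0x80 == 0:
            # High bit clear: a plain ASCII byte.
            labels.append("ascii")
        else:
            # High bit set: lead or continuation byte of a multi-byte character.
            labels.append("multibyte")
    return labels


# "a" is one ASCII byte; "é" encodes to two bytes, both with the high bit set.
labels = classify_bytes("aé".encode("utf-8"))
```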

    Multi-byte characters in UTF-8 can officially be up to four bytes long, with 11 of those 32 bits used for tracking the size of the multi-byte block. That leaves 21 bits for the code point itself, so 2^21 possible values, about two million in total (Unicode itself caps the range at U+10FFFF, roughly 1.1 million code points). That is easily enough for every alphabet you could need to write on a website, and all without breaking ASCII.
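
    To make the bit counting concrete, here is a minimal sketch of the four-byte layout, assuming the standard pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (5 + 2 + 2 + 2 = 11 marker bits, 21 payload bits). The helper name `encode_four_byte` is hypothetical; real code would just call `str.encode("utf-8")`.

```python
def encode_four_byte(code_point: int) -> bytes:
    """Pack a code point into the four-byte UTF-8 form by hand."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the four-byte range")
    return bytes([
        0b11110000 | (code_point >> 18),           # 5 marker bits + top 3 payload bits
        0b10000000 | ((code_point >> 12) & 0x3F),  # 2 marker bits + next 6 payload bits
        0b10000000 | ((code_point >> 6) & 0x3F),   # 2 marker bits + next 6 payload bits
        0b10000000 | (code_point & 0x3F),          # 2 marker bits + last 6 payload bits
    ])


# U+1F600 is an emoji that needs the four-byte form.
encoded = encode_four_byte(0x1F600)
matches_builtin = encoded == chr(0x1F600).encode("utf-8")
```

    Every byte it produces has the high bit set, so none of them can be mistaken for ASCII.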