Languages, Alphabets, and Encodings

Gordon P. Hemsley


This document strives to record the alphabets used in the world's languages and to identify the Web-accessible encodings capable of representing text written in those alphabets.

Structure of this document

The document is divided first by script, each identified by its four-character script subtag as listed in the IANA Language Subtag Registry. Each script page lists the languages whose alphabets derive from that base script. Each language entry, in turn, lists the primary letters of its alphabet, plus certain secondary letters and punctuation characters commonly used in that language.

These lists will take the form of tables that give the Unicode code point, block, and description of each character, arranged in alphabetical order where possible. [UNICODE]
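As a rough illustration of how the rows of such a table might be generated, the following sketch uses Python's standard `unicodedata` module to look up the code point and character name for each letter. The helper name `describe` and the sample letters are assumptions made for this example only. The standard library does not expose the Unicode block, so that column is left out here and would have to come from the Unicode Character Database's block data.

```python
import unicodedata


def describe(chars):
    """Return one (code point, character name) row per character.

    `describe` is a hypothetical helper for this sketch, not part of
    any standard. Characters without a name get a placeholder.
    """
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in chars
    ]


# Print a small tab-separated table for a few sample letters.
for codepoint, name in describe("AaĄą"):
    print(f"{codepoint}\t{name}")
```

The rows come back in the order the characters were supplied, so alphabetical ordering for a given language is left to the caller.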

Following the character tables will be a list of the Web-accessible encodings capable of representing text in that language, in order of preference. In general, UTF-8 is recommended for all newly created documents in any language. [ENCODING]
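One way to check whether an encoding can represent a language's characters is simply to try encoding them. The sketch below does this with Python's built-in codecs. The function name `supporting_encodings` and the candidate list are assumptions for illustration, and Python's codec tables are not guaranteed to match the definitions in the Encoding Standard exactly, so the result should be read as an approximation.

```python
def supporting_encodings(chars, candidates):
    """Return the candidate encodings that can encode every character.

    The candidates are tested in the order given, so passing them in
    order of preference yields a result in order of preference.
    """
    supported = []
    for encoding in candidates:
        try:
            chars.encode(encoding)
        except UnicodeEncodeError:
            # At least one character has no mapping in this encoding.
            continue
        supported.append(encoding)
    return supported


# Hypothetical candidate list, with UTF-8 first as the preferred choice.
CANDIDATES = ["utf-8", "iso-8859-2", "windows-1252", "iso-8859-1"]

# The letter L with stroke (U+0141, U+0142) is absent from the
# Western European encodings.
print(supporting_encodings("Łł", CANDIDATES))
```

Because UTF-8 can encode every Unicode character, it always passes this check, which is consistent with recommending it for new documents.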

Full table of contents

  1. Latin script [Latn]
    1. English [en]
    2. Kashubian [csb]
    3. Hawaiian [haw]
    4. Hiligaynon [hil]


Encoding Standard. Anne van Kesteren. WHATWG.
The Alphabets of Europe. Michael Everson.
Latin-derived alphabet. Wikipedia, The Free Encyclopedia.
Unicode Standard. Unicode Consortium.