Unicode Pages
These pages contain some material related to
Unicode and the UTF-8
encoding.
There is a collection of UTF-8 encoded sample
text files which might be helpful for testing Unicode implementations
or fonts. I also generated a simple index of the
Unicode chart tables in text format,
since the official
charts are in PDF format and may therefore be difficult to use on
text terminals or for testing Unicode fonts.
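As a rough illustration of how a plain-text chart index of this kind could be produced, here is a minimal Python sketch. It is not the generator used for the index on these pages; the function name chart_lines is hypothetical, and the character names come from whatever Unicode version ships with the local Python's unicodedata module.

```python
import unicodedata


def chart_lines(first, last):
    """Yield one text line per named code point in first..last (inclusive)."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        # name() returns the default (None) for code points that have no
        # name in this Python's Unicode database, e.g. controls or
        # unassigned code points.
        name = unicodedata.name(ch, None)
        if name is None:
            continue
        yield f"U+{cp:04X}  {ch}  {name}"


if __name__ == "__main__":
    # Example range: the chess symbols U+2654..U+265F.
    for line in chart_lines(0x2654, 0x265F):
        print(line)
```

Redirecting the output to a file gives a listing that can be viewed on a text terminal with the font under test.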
Unicode sample text files (UTF-8 encoding)
numbers.txt
This file contains a table of digits and numerals
in various writing systems and presentation forms.
latinres.txt
A table of characters whose glyphs closely
resemble letters of the Latin alphabet. Useful for
comparing similarities and differences.
diaeresis.txt
A (non-exhaustive) table of characters with
diaeresis, useful for consistency comparisons.
alphabets.txt
A selection of alphabets for different languages
and writing systems.
runic.txt
This file contains transliteration tables of
old Germanic and Anglo-Saxon Runes (Fuþorks).
utf8boxes.txt
A file to test the line drawing characters from
the Unicode Box Drawing block (U+2500..U+257F).
This can only be expected to display correctly with monospace
or text terminal fonts.
utf8block.txt
A file to test the block drawing characters from
the Unicode Block Elements block (U+2580..U+259F).
This can only be expected to display correctly with monospace
or text terminal fonts.
utf8super.txt
A file for testing superscript / subscript
characters present in the Unicode standard.
iroha.txt
A file containing the Japanese Iroha poem
in Hiragana, Katakana and Kanji (encoded in UTF-8).
utf8runes.txt
utf8cards.txt
A file for testing the playing card suit symbols (hearts,
spades, clubs and diamonds), including the inverted versions.
utf8chess.txt
A file containing some sample chess boards
for testing the Unicode chess symbols.
charsets.txt
Code tables of various 8-bit character sets, including
all those from the ISO-8859 series and also some
others. The file itself is encoded in UTF-8.
hearts.txt
This file lists the Unicode code points of characters whose
glyphs contain hearts or heart-like shapes.
tengwar.txt
A test file for the support of the Tengwar
script, as developed by J.R.R. Tolkien.
The Tengwar characters in this file are encoded following
Michael Everson's Tengwar proposal, in the Private Use Area
of Plane 0 (starting from U+E000).
ring.txt
An extract from Tolkien's original work The Lord of the Rings,
containing the first mention of the ring inscription.
I transcribed the text from a paperback edition of the
book.
More UTF-8 sample files can be found elsewhere, e.g. in Markus Kuhn's example section.
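For a quick terminal check in the spirit of utf8boxes.txt and utf8block.txt above, the two drawing ranges can also be printed directly. The following is a minimal Python sketch, not a reproduction of those files; the helper name block_grid and the 16-column layout are assumptions chosen for illustration.

```python
def block_grid(first, last, width=16):
    """Return text rows of up to `width` characters covering first..last."""
    rows = []
    for start in range(first, last + 1, width):
        end = min(start + width - 1, last)
        cells = "".join(chr(cp) for cp in range(start, end + 1))
        # Prefix each row with the code point of its first character.
        rows.append(f"U+{start:04X}  {cells}")
    return rows


if __name__ == "__main__":
    for row in block_grid(0x2500, 0x257F):  # Box Drawing
        print(row)
    print()
    for row in block_grid(0x2580, 0x259F):  # Block Elements
        print(row)
```

With a monospace font that covers both ranges, every row should have the same width and adjacent line segments should join without gaps.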
Sample text files (other encodings)
koi8boxes.txt (KOI-8 encoding)
A test file containing the box drawing
elements of the KOI-8 code.
fulltable.txt
(any ASCII compatible 8-bit encoding)
A file containing a table of all possible
8-bit characters. This file can be used with any
8-bit encoding which includes the 7-bit ASCII
block (e.g. ISO-8859-n for all n,
KOI-8, CP850 or CP437).
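To see how the same 8-bit values render under different encodings, the bytes can be decoded explicitly. The sketch below is a minimal Python example and not the way fulltable.txt was made; byte_table is a hypothetical helper, and the codec names (latin-1, koi8_r, cp850, cp437) are the ones Python's standard library uses, with koi8_r standing in for KOI-8 as an assumption.

```python
def byte_table(encoding, first=0x20, last=0xFF, width=16):
    """Decode each byte value first..last with `encoding`, `width` per row."""
    rows = []
    for start in range(first, last + 1, width):
        raw = bytes(range(start, min(start + width, last + 1)))
        # Bytes that the encoding leaves undefined become U+FFFD.
        text = raw.decode(encoding, errors="replace")
        # Show control and other non-printable characters as '.'.
        text = "".join(ch if ch.isprintable() else "." for ch in text)
        rows.append(f"{start:02X}  {text}")
    return rows


if __name__ == "__main__":
    for enc in ("latin-1", "koi8_r", "cp850", "cp437"):
        print(enc)
        for row in byte_table(enc):
            print(row)
        print()
```

The rows for 0x20 to 0x7E are identical for all four encodings, since they share the 7-bit ASCII block; the differences appear from 0x80 upwards.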
Author: Christian Steinruecken