
mrahmedcomputing

KS3, GCSE, A-Level Computing Resources

Lesson 5. ASCII and Unicode


Lesson Objective

  1. Understand the purpose of ASCII and Unicode.
  2. Be able to convert text into binary.
  3. Be able to explain the term "Character Set".
  4. Be able to calculate the file size of a piece of text.

Lesson Notes

7 bit ASCII Table

ASCII (pronounced "az-kee" or "ass-key" if American) stands for the American Standard Code for Information Interchange. It serves as a character encoding standard used for electronic communication between computers, telecommunications equipment, and other devices. Here are some key points about ASCII:

  1. Character Encoding: ASCII assigns standard numeric values to letters, numerals, punctuation marks, and other characters commonly used in computers. Each character is represented by a unique numerical code.
  2. 128 Values: ASCII defines only 128 code values, of which 95 are printable characters. These are the digits (0 to 9), lowercase letters (a to z), uppercase letters (A to Z), punctuation symbols, and the space. The remaining 33 codes are non-printing control characters, such as carriage return and line feed.
  3. Binary Representation: ASCII encodes characters into seven-bit integers. For instance, the lowercase letter "i" is represented by binary 1101001 (hexadecimal 69 or decimal 105).
  4. Evolution and Scope: Modern computer systems have largely moved to Unicode, which has room for over a million code points (1,114,112), but the first 128 Unicode code points match the original ASCII set exactly. ASCII therefore remains a foundation for character encoding in computing.
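
The key points above can be tried out in Python. This is a minimal sketch rather than part of the original lesson: the function name `text_to_binary` is made up for illustration, and it assumes the text contains only ASCII characters.

```python
def text_to_binary(text):
    """Return the 7-bit ASCII binary code of each character, separated by spaces."""
    codes = []
    for char in text:
        code = ord(char)  # numeric code of the character, e.g. 105 for "i"
        if code > 127:
            raise ValueError(f"{char!r} is not an ASCII character")
        codes.append(format(code, "07b"))  # pad to 7 binary digits
    return " ".join(codes)


print(ord("i"))              # 105
print(hex(ord("i")))         # 0x69
print(text_to_binary("i"))   # 1101001
print(text_to_binary("Hi"))  # 1001000 1101001
```

Each character becomes one 7-bit group, so a two-letter word produces 14 bits in total.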

Despite being an American standard, ASCII does not include a code point for the cent symbol (¢) or support English terms with diacritical marks (such as résumé and jalapeño) or proper nouns with diacritical marks (such as Beyoncé).
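
One way to see this limit is to check which characters in a word fall outside the ASCII range of 0 to 127. The sketch below is illustrative only; the helper name `non_ascii_characters` is an assumption and not part of the lesson.

```python
def non_ascii_characters(text):
    """Return the characters in text whose code is outside the ASCII range 0 to 127."""
    return [char for char in text if ord(char) > 127]


for word in ["resume", "résumé", "jalapeño", "Beyoncé", "¢"]:
    # str.isascii() is True only when every character has a code below 128
    print(word, word.isascii(), non_ascii_characters(word))

try:
    "résumé".encode("ascii")
except UnicodeEncodeError:
    print("résumé cannot be stored as ASCII")
```

Only the unaccented spelling "resume" passes the check; every other word contains at least one character that ASCII has no code for.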

Binary Dec Hex Char Binary Dec Hex Char Binary Dec Hex Char
0100000 32 20 (space) 1000000 64 40 @ 1100000 96 60 `
0100001 33 21 ! 1000001 65 41 A 1100001 97 61 a
0100010 34 22 " 1000010 66 42 B 1100010 98 62 b
0100011 35 23 # 1000011 67 43 C 1100011 99 63 c
0100100 36 24 $ 1000100 68 44 D 1100100 100 64 d
0100101 37 25 % 1000101 69 45 E 1100101 101 65 e
0100110 38 26 & 1000110 70 46 F 1100110 102 66 f
0100111 39 27 ' 1000111 71 47 G 1100111 103 67 g
0101000 40 28 ( 1001000 72 48 H 1101000 104 68 h
0101001 41 29 ) 1001001 73 49 I 1101001 105 69 i
0101010 42 2A * 1001010 74 4A J 1101010 106 6A j
0101011 43 2B + 1001011 75 4B K 1101011 107 6B k
0101100 44 2C , 1001100 76 4C L 1101100 108 6C l
0101101 45 2D - 1001101 77 4D M 1101101 109 6D m
0101110 46 2E . 1001110 78 4E N 1101110 110 6E n
0101111 47 2F / 1001111 79 4F O 1101111 111 6F o
0110000 48 30 0 1010000 80 50 P 1110000 112 70 p
0110001 49 31 1 1010001 81 51 Q 1110001 113 71 q
0110010 50 32 2 1010010 82 52 R 1110010 114 72 r
0110011 51 33 3 1010011 83 53 S 1110011 115 73 s
0110100 52 34 4 1010100 84 54 T 1110100 116 74 t
0110101 53 35 5 1010101 85 55 U 1110101 117 75 u
0110110 54 36 6 1010110 86 56 V 1110110 118 76 v
0110111 55 37 7 1010111 87 57 W 1110111 119 77 w
0111000 56 38 8 1011000 88 58 X 1111000 120 78 x
0111001 57 39 9 1011001 89 59 Y 1111001 121 79 y
0111010 58 3A : 1011010 90 5A Z 1111010 122 7A z
0111011 59 3B ; 1011011 91 5B [ 1111011 123 7B {
0111100 60 3C < 1011100 92 5C \ 1111100 124 7C |
0111101 61 3D = 1011101 93 5D ] 1111101 125 7D }
0111110 62 3E > 1011110 94 5E ^ 1111110 126 7E ~
0111111 63 3F ? 1011111 95 5F _ 1111111 127 7F DEL
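
Going the other way, from binary back to text, means reading each 7-bit group as a number and looking up the character with that code. The sketch below shows one way to do this in Python; the function names `binary_to_text` and `table_row` are made up for illustration. The second function works out the binary, decimal and hexadecimal values for any code, which gives a way to check a row of an ASCII table.

```python
def binary_to_text(bits):
    """Convert space-separated 7-bit binary codes back into text."""
    return "".join(chr(int(code, 2)) for code in bits.split())


def table_row(code):
    """Return the binary, decimal, hexadecimal and character for one ASCII code."""
    return f"{code:07b} {code} {code:02X} {chr(code)}"


print(binary_to_text("1001000 1101001"))          # Hi
print(binary_to_text("1000011 1100001 1110100"))  # Cat
print(table_row(65))                              # 1000001 65 41 A
print(table_row(97))                              # 1100001 97 61 a
```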

8 bit ASCII

8-bit ASCII, also known as Extended ASCII, builds on the original American Standard Code for Information Interchange (ASCII) by using 8 binary digits (bits) for each character instead of 7.

Standard ASCII represents characters using 7 bits (2^7 = 128 code points). 8-bit ASCII uses 8 bits per character, which doubles this to 2^8 = 256 code points.
The additional bit makes room for 128 more characters, such as special symbols, accented letters, and other language-specific characters. There is no single Extended ASCII standard: different 8-bit character sets assign different characters to codes 128 to 255.

In summary, 8-bit ASCII keeps the original 128 ASCII codes and adds 128 more, which makes it usable for more languages and symbols.
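
The effect of the extra bit can be checked with a short Python sketch. The two character sets used here, Latin-1 and code page 437, are examples chosen for illustration and are not named in the lesson; the helper name `decode_byte` is also made up. They show that the extra 128 codes do not always mean the same thing.

```python
def decode_byte(value, encoding):
    """Decode a single byte value (0 to 255) using the given 8-bit character set."""
    return bytes([value]).decode(encoding)


print(2 ** 7)  # 128 codes available with 7 bits
print(2 ** 8)  # 256 codes available with 8 bits

# Codes 0 to 127 are the same as ASCII in both character sets
print(decode_byte(65, "latin-1"), decode_byte(65, "cp437"))    # A A

# Code 233 (binary 11101001) is above 127, so its meaning depends on the set
print(decode_byte(233, "latin-1"), decode_byte(233, "cp437"))  # é Θ
```

This is one reason text saved on one system could appear with the wrong symbols on another before Unicode became common.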

A Spooky Ghost

        _,.--.
      .'      `-.
     /   O O   \
    |          /
    |         /
    |        /
     \      /
      `.__.'
    

A Cat

        /\_/\
       ( o.o )
      > ^ <
     /  ---  \
    /         \
   /           \
  

????

        /\
        /  \
       / o o \
      /   ^   \
     /         \
    /_/-\___\_\
    

An Apple

        ,--./,-.
        / #      \
       |          |
        \        / 
         `._,._,'
    

Unicode

Unicode, formally known as The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium. Its purpose is to support the use of text written in all of the world's major writing systems.

Unicode assigns a unique number to every character, regardless of the platform, program, or language. Before Unicode, various character encodings existed, each with limitations. These early encoding methods could not cover all languages and often conflicted with one another. Unicode changed this by providing a consistent way to represent characters across different languages.

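Unicode was originally designed as a 16-bit code, which allows 65,536 different characters. The current standard defines code points from U+0000 to U+10FFFF (1,114,112 in total), and the number of bits used to store a character depends on the encoding: UTF-8 uses 8 to 32 bits, UTF-16 uses 16 or 32 bits, and UTF-32 always uses 32 bits.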
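
How much storage a piece of text needs depends on how many bits are used for each character. The sketch below is an illustration, not part of the original lesson: the helper name `size_in_bytes` and the sample characters are assumptions. It prints the Unicode code point of a few characters with their size in three common Unicode encodings, then works out the file size of a short message as number of characters multiplied by bits per character.

```python
def size_in_bytes(text, encoding):
    """Return how many bytes text occupies when stored in the given encoding."""
    return len(text.encode(encoding))


# Code point, then size in bytes for UTF-8, UTF-16 and UTF-32
for char in ["A", "é", "こ", "😀"]:
    print(
        char,
        f"U+{ord(char):04X}",
        size_in_bytes(char, "utf-8"),
        size_in_bytes(char, "utf-16-le"),
        size_in_bytes(char, "utf-32-le"),
    )

# File size of text = number of characters x bits per character
message = "Hello World"  # 11 characters, including the space
print(len(message) * 7)                         # 77 bits in 7-bit ASCII
print(len(message) * 8)                         # 88 bits in 8-bit ASCII
print(size_in_bytes(message, "utf-16-le") * 8)  # 176 bits in UTF-16
```

The "-le" encoding names are used so that Python does not add a byte order mark, which would otherwise be counted in the size.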

Here are some example characters from the Unicode character set:

  • こんにちは (Japanese)
  • 厦灣 (Chinese)
  • 한국 (Korean)
  • ćōčīūīū (Hawaiian)
  • العربية (Arabic)
  • Hello World (English)
  • سلام الليكم (Urdu)
  • বাইলার বালার (Bengali)
  • हमात वारीन्र (Hindi)
  • Γεια σου (Greek)