Lesson 6. Compression and Encryption

Lesson Objective

Understand the difference between lossless and lossy compression.
Explain run length encoding and dictionary based compression.
Define symmetric and asymmetric encryption.
Understand how and why hashing may be used to encrypt data.

Lesson Notes

Data Transfer

Data is constantly transferred within computer systems and across networks. Typically, these transfers occur at high speeds and with accuracy.

However, as distances increase, data transfer becomes slower. Longer distances also make data more susceptible to interference.

Additionally, storage space can be limited, posing a challenge for managing data effectively.

Compression?

Compression is a technique used to reduce the size of digital files, making them more manageable for storage and transmission.

Lossy and Lossless Compression

There are 2 types of compression you need to be aware of.

Lossy Compression: By using this type of compression, some data is permanently removed from the original file, resulting in a smaller file size. For example, by reducing the color depth of an image, shades of colors are averaged, leading to a smaller file. JPEG format uses lossy compression.
By using Lossy compression, some quality is sacrificed in favor of smaller file sizes.

Lossless Compression: With lossless compression, files are reduced in size without any loss of data. It is ideal for files where data integrity is crucial, such as text, spreadsheets, and financial records. However, lossless compression typically doesn't achieve the same level of reduction as lossy methods. Various lossless standards exist, ensuring that files can be restored to their original quality.

In summary, lossy compression prioritizes smaller sizes at the cost of quality, whereas lossless compression maintains data integrity without sacrificing quality .

Run Length Encoding

Run-length encoding (RLE) is a form of lossless data compression. RLE aims to reduce file size while preserving data integrity. It is commonly used for simple graphic images, such as icons, line drawings, and animations. In RLE, runs of data (sequences where the same value occurs consecutively) are stored as a single data value along with a count. Instead of representing each run individually, RLE condenses them into a more compact format.

Imagine a screen with plain black text on a white background.
The scan line might look like this:
WWWWWWWWBWWWWWWWBBBWWWWWWWWWWWWWBBWWWWWW.
By applying RLE, it becomes: 8W1B7W3B13W2B6W.
This represents the original 67 characters in only 18.

00000000000001000000000000111000000000000000000000000100000000000000

130 11 120 31 240 11 140

Need Example...

Dictionary Coding

Dictionary coding is a data compression technique that replaces repeated strings of data with shorter codes. It is a lossless compression technique, meaning the original data can be perfectly reconstructed from the compressed data. Dictionary coding is used in many different applications, such as text compression, code compression, and image compression.

“Fear leads to anger; anger leads to hatred; hatred leads to conflict; conflict leads to suffering.”

----

1 10 2 10 3 10 4 9 10 4 10 2 10 3 10 5 9 10 5 10 2 10 3 10 6 9 10 6 10 2 10 3 10 7 8

Reference	Data	Frequency
1	Fear	1
2	leads	4
3	to	4
4	anger	2
5	hatred	2
6	conflict	2
7	suffering	1
8	.	1
9	;	3
10	[space]	15

Original File Size:
8 bits x 98 chars
784 bits
÷ 8 = 98 bytes

Compressed File Size:
2⁴ = 16 (4 bits needed for 10)
4 x 35 = 140 bits
÷ 8 = 17.5 bytes

Data Size: 784 - 140 = 644 saved

Percentage Decrease: (784 - 140)/788*100 = 82.21%

Benefits of Compression

Efficient Transmission: Smaller files mean fewer data packets, resulting in faster transmission.
Enhanced Download Speeds: Faster downloads (less data) for videos, audio, and images (including speech for VOIP systems).
Data Allowance Savings: Reduced file sizes lead to lower data consumption. Beneficial for users with limited data plans.
Webpage Loading Speed: Accelerates loading of webpages containing images.
Storage Optimization: Less disk space required on servers and local devices.
Improved Streaming Experience: Enables smoother streaming of music, games, and videos.

Encryption

Encryption is the use of an algorithm and a key to convert message data into a form that is unreadable without the key to decrypt the data.

Used to secure information during transmission so that only the sender and receiver can understand the message.

Decryption is the use of an algorithm and a key to convert encrypted message data into plain text.

Symmetric Encryption (Private Key)

Symmetric encryption is any technique where the same key is used to both encrypt and decrypt the data. The Caesar Cipher is one of the simplest symmetric encryption techniques, and of course, one of the easiest to crack.

Here are some symmetric encryption methods:

Caesar Cipher

The Caesar cipher is a substitution cipher where each letter in the plaintext is replaced by a letter a fixed number of positions down the alphabet. It is named after Julius Caesar, who used it in his private correspondence.

How does it work?

Choose a shift value (usually between 1 and 25). Shift each letter of the alphabet by the chosen value. For example, with a left shift of 3, 'D' becomes 'A', 'E' becomes 'B', and so on. Now look at the example below:

Plain: “THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG”
Ciphertext (with a left shift of 3): “QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD”

To decrypt, perform a right shift by the same value.

Here is another example. Can you work out the cipher key?

Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: FGHIJKLMNOPQRSTUVWXYZABCDE

RJJY RJ FY YMJ TQI GFWS
“MEET ME AT THE OLD BARN”

Here is a diagram of a Caesar Cipher wheel.

Vernam Cipher

The Vernam Cipher, also known as the “One-Time Pad”, is a powerful and unbreakable encryption technique when used correctly.

The Vernam Cipher falls under the category of symmetric encryption. Both the sender and recipient share the same secret key.

The Vernam Cipher uses a random or pseudorandom stream of data (the keystream). Each character of the plaintext is combined with the corresponding character from the keystream using the Boolean “exclusive or” (XOR) function. The resulting ciphertext is generated.

When used with a truly random and secret key of the same length as the plaintext, the Vernam Cipher is unbreakable. It provides exceptional security because the keystream is unpredictable and used only once (hence the term “one-time pad”).

Example:

Cat = 01000011 01100001 01110100

pop = 01110000 01101111 01110000

You then perform an XOR operation (like binary addition) using the keyword.

Ciphertext = 00111111 00001110 00000100

Ciphertext = 63, 14, 4 (Denary)

Ciphertext = 3F, 0E, 04 (Hex)

Extra:
The Vernam Cipher was devised in 1918 by Gilbert S. Vernam, an engineer at AT&T. It introduced a crucial improvement to the Vigenère cipher, attributed to the 16th-century French cryptographer Blaise de Vigenère.

A	B	Z
0	0	0
0	1	1
1	0	1
1	1	0

Cryptanalysis

Cryptanalysis involves attempting to “break” a cryptographic code without having knowledge of the specific cipher or the secret key. While symmetric encryption relies on a shared secret key between the sender and receiver, modern computers can easily compromise it. Despite efforts to keep the key secret, the vulnerability remains.

Exmaple Technique: Brute Force Attack

A brute force attack is an attempt to crack a password using a trial and error approach and hoping, to guess it correctly. This is a dated attack method, but is still effective and commonly used by hackers. Depending on the strength of a password, cracking it can take anywhere from a few seconds to many years.

Asymmetric Encryption (Public Key)

Asymmetric encryption is a technique which uses different keys for encryption and decryption, allowing computers over the Internet to securely communicate with each other. Here are a list of steps outlining the public key encryption process:

Key generation: Each person (or their computer) must generate a pair of keys that identifies them: a private key and a public key. The sender's public key and private key are mathematically related, and the receiver's public and private keys are mathematically related. The current nationally recommended key length is 2048, or even 3072 bits.
Key exchange: The sending and receiving computers exchange public keys with each other via a reliable channel, like TCP/IP. The private keys are never exchanged.
Encryption: The sending computer encrypts the secret data using the receiving computer's public key and a mathematical operation. The power of public key encryption is in that mathematical operation. It's a "one-way function", which means it's incredibly difficult for a computer to reverse the operation and discover the original data. Even the public key cannot be used to decrypt the data. The math of the one-way function relies on prime numbers, the difficulty of factoring large primes, and modular arithmetic.
Sending encrypted data: The sender can now safely transmit the encrypted data over the Internet without worry of third parties.
Decryption: Now the receiver can decrypt the message, using their private key. The only key that can be used to decrypt the message (in the world!).

Example - Computers A and B:

A will only have a private key known only to A.
A will also have a public key, which is mathematically related to A's private key. It is called a public key because anyone can access it.
B will also have a private key and a related public key.
For A to send a secure message to B, A will first encrypt the message using B's public key.
As the public and private keys are related, the message can only be decrypted by B using B's private key.
As no-one else knows B's private key, even if the message was intercepted, it could not be decrypted.

Digital Signature?

While encryption typically involves using the recipient's public key to secure a message, a malicious third party could encrypt a message using your own public key. This deceptive approach might lead you to believe that the message is legitimate, but in reality, it could be from an unauthorized source.

To ensure the authenticity and integrity of an electronic message, a sender can digitally sign it. Digital signatures use the sender's private key to create a unique cryptographic hash of the message. Recipients can verify this signature using the sender's public key.

Hashing ##########

A hashing function is a way to turn any length input into a unique fixed length code. This code is called a hash.

It is very difficult to turn a hash back (almost impossible - one way encryption) into the original input, so hashing is often used to protect sensitive data, such as passwords.

Here is an example:

Input: “mypassword”
Hash: “1234567890abcdef1234567890abcdef”

Input: “myPassword”
Hash: “fedcba0987654321fedcba0987654321”

To store a password securely, a website or app will hash the password before storing it in its database.

When you try to log in, the website or app will hash your password again and compare it to the stored hash. If the two hashes match, then you are allowed to log in.

If a hacker steals a password database, they will only see the hashes, not the actual passwords. Hashing is a very important security measure for protecting passwords and other sensitive data.

Popular methods:

SHA-1
MD5

mrahmedcomputing