
mrahmedcomputing

KS3, GCSE, A-Level Computing Resources

Lesson 9. Language Translators


Lesson Objectives

  • Understand the role of each of the following: assembler, compiler, interpreter.
  • Explain the differences between compilation and interpretation.
  • Describe the stages of compilation: lexical analysis, syntax analysis, code generation and optimisation.
  • Explain why an intermediate language such as bytecode is produced as the final output by some compilers and how it is subsequently used.
  • Understand the difference between source code and object (executable) code.
  • Explain linkers and loaders and the use of libraries.

Lesson Notes

Programming Languages

Programming languages have evolved over time, and we can group them into four main categories based on how close they are to the computer's inner workings.

The first two generations, machine code and assembly language, are like speaking directly to the machine in its own language, which is complex and hard for humans to understand.

The third and fourth generations, such as Python (third generation) and SQL (fourth generation), are closer to natural language, making them easier for people to write and understand. Programs written in them must be translated into machine code before the computer can execute them.


Low Level Languages

Think of your computer as a machine that only understands one language: ones and zeros (machine code). Low-level languages let you communicate with the machine almost directly in this native tongue. Unlike higher-level languages such as Python or Java, which are designed to be human-friendly, low-level languages require you to know the specific instructions that the processor can execute. This gives you fine-grained control, but it makes programs harder to write and less portable (a program written for one type of processor will not run on another). In short, low-level languages sit close to the hardware: they give you direct control over the machine at the cost of being less intuitive for humans.

Machine Code (Gen 1)

Machine code, also known as machine language, is the fundamental language of computers. It's a set of instructions written in binary (0s and 1s) that the processor (CPU) directly understands and executes. It's the lowest level of programming, meaning it's the closest you can get to speaking directly to the computer's hardware.

Here's a breakdown of its key features:

  • Binary language: Made up of 0s and 1s, representing the on and off states of electrical circuits within the CPU.
  • Directly executed by the CPU: No translation is needed; the CPU decodes and executes machine code instructions as they are.
  • Specific to processor architecture: Each CPU has its own unique instruction set, making machine code specific to that hardware.
  • Low-level and complex: Difficult for humans to read and write due to its binary nature and absence of human-readable syntax.
  • Fast and efficient: Since it's directly executed by the CPU, it can be very fast and efficient, especially for tasks requiring precise hardware control.

Note that programs written in ALL the other generations of programming language must ultimately be converted into this machine code before the CPU can run them.

Assembly Code (Gen 2)

Assembly code is a low-level programming language that sits between human-readable code and the machine code understood by computers. It lets you work directly with the computer's hardware, but in a form that is more intuitive than raw binary.

Here are some key features of assembly code:

  • Closer to machine code: Unlike higher-level languages such as Python or Java, assembly instructions closely resemble the actual instructions the processor executes (typically one assembly instruction per machine code instruction). This gives you fine-grained control over the hardware but makes the code less portable (it will not work on all systems) and more complex to write.
  • Uses mnemonics: Instead of writing binary code directly, assembly uses human-readable symbols called mnemonics. For example, "MOV" might represent the instruction to move data from one location to another. These mnemonics are still much closer to machine code than to natural language, but they are far easier to read than binary.
  • Requires understanding of instruction sets: Each processor architecture has its own unique set of instructions, so the specific assembly language you use will depend on the hardware you're targeting.
  • Powerful but often less popular: While offering precise hardware control, assembly can be tedious and error-prone, making it less frequently used than higher-level languages in modern development. It's often reserved for tasks requiring maximum performance or direct hardware interaction, like device drivers or embedded systems programming.

Example: Machine Code

00000000 00000001 00000001 00000000 00000001 00000001 00000001 00000001 00000000 00000000 00000000 00000001 00000001 00000000 00000000 00000001 00000001 00000001 00000000 00000000 00000001 00000001 00000001 00000001 00000001 00000001 00000000 00000001 00000000 00000001

Example: Assembly Code

LDR R1, 301 ;load contents of 301 into R1
LDR R2, 302 ;load contents of 302 into R2
ADD R1, R1, R2 ;add R2 to R1, store in R1
LDR R3, 303 ;load contents of 303 into R3
ADD R1, R1, R3 ;add R3 to R1, store in R1
STR R1, 304 ;store result in 304
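To see what the six instructions above actually do, here is a minimal Python sketch that simulates them. The register names and memory addresses come from the example; the starting values stored at addresses 301, 302 and 303 are made-up assumptions for illustration only.

```python
# Simulated main memory: address -> value (the starting values are hypothetical)
memory = {301: 5, 302: 7, 303: 2, 304: 0}
registers = {"R1": 0, "R2": 0, "R3": 0}

program = [
    ("LDR", "R1", 301),         # load contents of 301 into R1
    ("LDR", "R2", 302),         # load contents of 302 into R2
    ("ADD", "R1", "R1", "R2"),  # R1 = R1 + R2
    ("LDR", "R3", 303),         # load contents of 303 into R3
    ("ADD", "R1", "R1", "R3"),  # R1 = R1 + R3
    ("STR", "R1", 304),         # store R1 into 304
]

def run(program):
    """Execute each (opcode, operands...) tuple in order."""
    for opcode, *operands in program:
        if opcode == "LDR":
            reg, address = operands
            registers[reg] = memory[address]
        elif opcode == "ADD":
            dest, src1, src2 = operands
            registers[dest] = registers[src1] + registers[src2]
        elif opcode == "STR":
            reg, address = operands
            memory[address] = registers[reg]
        else:
            raise ValueError(f"unknown opcode {opcode}")

run(program)
print(memory[304])  # 5 + 7 + 2 = 14
```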

Extra Note: Many machine code and assembly instructions contain two parts:

  1. the opcode - this is the actual instruction
  2. the operand - this is a value that the instruction uses or manipulates

Both opcode and operand values are ultimately represented in binary.
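As an illustration, suppose a hypothetical 8-bit instruction format in which the first 4 bits hold the opcode and the last 4 bits hold the operand (real processors use wider and more varied formats). The sketch below splits such an instruction using bit shifting and masking.

```python
def decode(instruction: int) -> tuple[int, int]:
    """Split a hypothetical 8-bit instruction into (opcode, operand)."""
    opcode = (instruction >> 4) & 0b1111  # high 4 bits
    operand = instruction & 0b1111        # low 4 bits
    return opcode, operand

opcode, operand = decode(0b0010_0101)
print(f"opcode={opcode:04b} operand={operand:04b}")  # opcode=0010 operand=0101
```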


High Level Languages

High-level languages are not dependent on a particular computer architecture, and programs written in them must be translated into machine code by a translator (a compiler or an interpreter).

Python, VB, C#, etc. (Gen 3)

Writing directly in machine code is incredibly difficult and not feasible for larger programs. Assembly language, while a step up, still requires memorizing specific codes and instructions, making it complex and error-prone for intricate projects.

Enter third-generation languages! These languages use English-like keywords and familiar structures, allowing you to focus on the logic of your program rather than on the processor's individual instructions. You can group and organise your code into routines and subroutines, making it easier to understand and maintain.

Java, PHP, C, C++, C#, Pascal, COBOL and Visual Basic are also examples of third-generation programming languages.

SQL, Prolog (Gen 4)

Fourth-generation languages take a different approach from traditional "how-to" programming. Instead of telling the computer every step to take (procedural), these languages focus on "what needs to be done" (declarative). Think of it as ordering a dish rather than writing out the whole recipe.

This makes them especially useful for tasks like accessing databases, where you specify what information you need rather than the intricate steps to retrieve it.

They are also designed to be easy to read: their syntax resembles everyday language, so even people without extensive programming experience can feel comfortable using them.

Prolog is also often grouped with the fourth-generation languages. Note that MySQL is not a separate language: it is a database management system that is queried using SQL.

Example: Java

public class Main {
    public static void main(String[] args) {
        String name = "Ahmed";
        System.out.println(name);
        int n1 = 4;
        int n2 = 5;
        System.out.println(n1 + n2);
    }
}

Example: SQL

SELECT firstnames
FROM Customers
WHERE age > 24;
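To contrast the declarative (fourth-generation) style with the procedural (third-generation) style, the sketch below runs the SQL example's query, written in the standard SELECT ... FROM ... WHERE order, against a small in-memory SQLite database using Python's standard sqlite3 module, and then produces the same result with an explicit loop. The table contents are invented for illustration, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

rows = [("Ahmed", 30), ("Bea", 22), ("Chen", 41)]  # made-up sample data

# Declarative (4GL): state WHAT you want; the database decides HOW to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (firstnames TEXT, age INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", rows)
sql_result = [name for (name,) in conn.execute(
    "SELECT firstnames FROM Customers WHERE age > 24 ORDER BY firstnames")]
conn.close()

# Procedural (3GL): spell out every step of the search yourself.
loop_result = []
for name, age in rows:
    if age > 24:
        loop_result.append(name)
loop_result.sort()

print(sql_result)   # ['Ahmed', 'Chen']
print(loop_result)  # ['Ahmed', 'Chen']
```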


Programming Language Translators

Translators

Computers can only run machine code. When you write a program in a second-, third- or fourth-generation language, the program must be translated into machine code with an assembler, a compiler or an interpreter.

[Image: the translation process]

Assembler

An assembler is a program that translates an assembly language program into machine code.
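At its core, an assembler's job can be sketched as a table lookup: each mnemonic maps to a binary opcode, and the operand is converted to binary alongside it. The instruction set below (LOAD, ADD, STORE and HALT, with 4-bit opcodes and 4-bit operands) is entirely hypothetical; a real assembler also handles labels, addressing modes and directives.

```python
# Hypothetical instruction set: mnemonic -> 4-bit opcode
OPCODES = {"HALT": 0b0000, "LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011}

def assemble_line(line: str) -> str:
    """Translate one line such as 'ADD 5' into an 8-bit machine code string."""
    parts = line.split(";")[0].split()  # drop any comment, then split into words
    mnemonic = parts[0].upper()
    operand = int(parts[1]) if len(parts) > 1 else 0
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    if not 0 <= operand <= 15:
        raise ValueError(f"operand does not fit in 4 bits: {operand}")
    return f"{OPCODES[mnemonic]:04b}{operand:04b}"

def assemble(source: str) -> list[str]:
    """Assemble every line that contains an instruction."""
    return [assemble_line(line) for line in source.splitlines()
            if line.split(";")[0].strip()]

program = """
LOAD 3   ; load the value at address 3
ADD 4    ; add the value at address 4
STORE 5  ; store the result at address 5
HALT
"""
print(assemble(program))  # ['00010011', '00100100', '00110101', '00000000']
```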

Compiler

A compiler takes the source code of a high-level language program and translates it into machine code. The resulting code generally runs faster than interpreted code because no translation is needed at run time. Compilers typically produce a single executable file (for example, an .exe file on Windows).

Compilation is the process of translating high-level source code into machine code (object code) before execution.

The entire program is translated by the compiler in one go, before it is executed, so the resulting executable can be run many times without being translated again. Examples: C, C++, Rust.

Interpreter

An interpreter analyses and executes a high-level program one line (statement) at a time, translating only the line it is currently running. Execution is slower because the program is analysed line by line at runtime. The advantage is that the program starts running immediately, without waiting for the whole program to be compiled. It is also clear which line is being executed, so when an error occurs it is easy to see where it happened.

Interpretation involves executing the source code line by line using an interpreter. The interpreter reads and executes the code directly, during runtime, and no separate executable file is produced. Because there is no separate compilation step, an interpreted program can be changed and rerun quickly. Examples: Python, JavaScript, Ruby.
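The difference in when errors are discovered can be sketched with Python's built-in compile() and exec() functions. One function translates the whole program before running any of it (compiler-style); the other translates and runs one line at a time (interpreter-style). This is only a simulation of the two strategies under simplifying assumptions (each line is a complete statement); it is not how CPython itself works, since CPython compiles a whole script file to bytecode before running it.

```python
def run_compiled(lines):
    """Compiler-style: translate the whole program first, then run it."""
    env = {}
    try:
        code = compile("\n".join(lines), "<program>", "exec")
    except SyntaxError as err:
        return env, f"syntax error on line {err.lineno}: nothing was run"
    exec(code, env)
    return env, "finished"

def run_interpreted(lines):
    """Interpreter-style: translate and run one line at a time."""
    env = {}
    for number, line in enumerate(lines, start=1):
        try:
            code = compile(line, f"<line {number}>", "exec")
        except SyntaxError:
            return env, f"syntax error on line {number}: earlier lines already ran"
        exec(code, env)
    return env, "finished"

program = ["a = 1", "b = a + 1", "c = b +"]  # line 3 is deliberately broken

env, message = run_compiled(program)
print(message, "| b defined?", "b" in env)
env, message = run_interpreted(program)
print(message, "| b defined?", "b" in env)
```

With the compiler-style function, the error on line 3 stops everything before any line runs; with the interpreter-style function, lines 1 and 2 have already executed when the error is reached.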


Bytecode?

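Some compilers (for example, the Java compiler) do not produce machine code for a specific processor. Instead they produce bytecode, an intermediate representation that is not tied to any one CPU. The bytecode is then run by a virtual machine (such as the Java Virtual Machine), which interprets it or translates it into machine code for the processor it is running on. Because of this, bytecode allows cross-platform compatibility: the same bytecode can be executed on any machine with a compatible virtual machine or interpreter.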
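Python offers a convenient way to look at real bytecode: CPython compiles each function to bytecode, which the Python virtual machine then executes. The sketch below uses the standard dis module to list those instructions. The function add and the helper bytecode_listing are illustrative names. The exact instruction names differ between Python versions, so no expected listing is shown.

```python
import dis

def add(a, b):
    return a + b

def bytecode_listing(func):
    """Return (instruction name, argument) pairs for func's bytecode."""
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)]

# The raw bytecode is simply a sequence of bytes stored on the code object...
print(type(add.__code__.co_code))
# ...which the dis module can present as human-readable instructions.
for name, arg in bytecode_listing(add):
    print(name, arg)
```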


A-Level Question: Explain the stages of compilation: lexical analysis, syntax analysis, code generation and optimisation.

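  1. Lexical Analysis: Splits the source code into lexemes (basic units such as keywords, identifiers, operators and literals) and turns each into a token; whitespace and comments are removed.
  2. Syntax Analysis (Parsing): Checks that the tokens generated in the previous stage follow the grammar rules of the language and builds a syntax tree from them.
  3. Semantic Analysis: Checks that the program makes sense according to the language's rules, for example that variables are declared before use and that types are used correctly.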
  4. Intermediate Code Generation: Produces an intermediate representation (e.g., three-address code).
  5. Code Optimization: Improves code efficiency (e.g., removing dead code, loop unrolling).
  6. Code Generation: Translates intermediate code into machine code.
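The four stages named in the lesson objectives can be sketched for a tiny, made-up language whose programs are single assignments such as "total = price + 2 * 3". The stack-machine output (PUSH, LOAD, ADD, MUL, STORE) is a hypothetical target, and the only optimisation shown is constant folding (working out 2 * 3 at compile time); a real compiler does far more at every stage, including semantic analysis.

```python
import re

# Stage 1 - lexical analysis: group characters into (token type, lexeme) pairs
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[+*=])|(?P<SKIP>\s+)")

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "SKIP":  # whitespace is discarded here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

# Stage 2 - syntax analysis: check the grammar and build a syntax tree
#   assignment := NAME "=" expr ; expr := term ("+" term)* ; term := factor ("*" factor)*
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, text=None):
        nonlocal pos
        token_kind, token_text = peek()
        if token_kind != kind or (text is not None and token_text != text):
            raise SyntaxError(f"expected {text or kind} at token {pos}")
        pos += 1
        return token_text

    def factor():
        if peek()[0] == "NUMBER":
            return ("num", int(take("NUMBER")))
        return ("var", take("NAME"))

    def chain(operand, op):  # left-associative run of one operator
        node = operand()
        while peek() == ("OP", op):
            take("OP", op)
            node = (op, node, operand())
        return node

    name = take("NAME")
    take("OP", "=")
    tree = ("assign", name, chain(lambda: chain(factor, "*"), "+"))
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after expression")
    return tree

# Stage 3 - optimisation: fold operations on two constants into one constant
def fold(node):
    if node[0] == "assign":
        return ("assign", node[1], fold(node[2]))
    if node[0] in ("+", "*"):
        left, right = fold(node[1]), fold(node[2])
        if left[0] == right[0] == "num":
            value = left[1] + right[1] if node[0] == "+" else left[1] * right[1]
            return ("num", value)
        return (node[0], left, right)
    return node

# Stage 4 - code generation: emit instructions for a hypothetical stack machine
def generate(node):
    kind = node[0]
    if kind == "assign":
        return generate(node[2]) + [f"STORE {node[1]}"]
    if kind == "num":
        return [f"PUSH {node[1]}"]
    if kind == "var":
        return [f"LOAD {node[1]}"]
    opcode = "ADD" if kind == "+" else "MUL"
    return generate(node[1]) + generate(node[2]) + [opcode]

def compile_source(source):
    return generate(fold(parse(tokenize(source))))

print(tokenize("total = price + 2 * 3"))
print(compile_source("total = price + 2 * 3"))  # ['LOAD price', 'PUSH 6', 'ADD', 'STORE total']
```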

Library Programs

A library in computing is a collection of pre-written, reusable resources (for example, subroutines) that programmers can use during software development instead of writing everything themselves.

Advantages:

  • Efficiency (saves time and effort).
  • Reliability (tested and optimized).
  • Reusability (used in different parts of projects).
  • Community support.

Disadvantages:

  • Dependency issues.
  • Compatibility challenges.
  • Overhead for small tasks.
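For example, rather than writing and testing their own averaging routine, a Python programmer can import the standard library's statistics module. In this sketch the hand-written function my_mean is included only for comparison, and the sample marks are made up.

```python
import statistics

marks = [62, 75, 81, 58, 90]  # hypothetical test marks

# Without a library: write (and test) the routine yourself
def my_mean(values):
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# With a library: reuse a pre-written, already-tested routine
library_mean = statistics.mean(marks)

print(my_mean(marks), library_mean)
```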

Linkers

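A linker combines the object code produced by the compiler with the object code of any library subroutines the program uses. It resolves the references between them, assigning the correct memory addresses so that each call to a library subroutine reaches it and control returns correctly to the calling program.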
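A linker's job of resolving addresses can be sketched with two made-up object modules: a main program that calls a library subroutine by name, and a library module that defines it. The linker places the modules one after another, records the address where each symbol ends up, and patches every CALL with the real address. The instruction format and module layout here are hypothetical.

```python
# Each hypothetical object module: its instructions plus the symbols it defines
# (symbol name -> offset within the module).
main_module = {
    "code": ["LOAD 9", "CALL square", "STORE 10", "HALT"],
    "symbols": {"main": 0},
}
library_module = {
    "code": ["MUL 9", "RETURN"],
    "symbols": {"square": 0},
}

def link(modules):
    """Combine modules into one program and replace symbolic CALLs with addresses."""
    symbol_table, base = {}, 0
    for module in modules:  # pass 1: work out where every symbol will live
        for name, offset in module["symbols"].items():
            symbol_table[name] = base + offset
        base += len(module["code"])
    program = []
    for module in modules:  # pass 2: resolve references to those symbols
        for instruction in module["code"]:
            op, _, arg = instruction.partition(" ")
            if op == "CALL":
                if arg not in symbol_table:
                    raise NameError(f"unresolved symbol: {arg}")
                instruction = f"CALL {symbol_table[arg]}"
            program.append(instruction)
    return program, symbol_table

program, symbols = link([main_module, library_module])
print(symbols)  # {'main': 0, 'square': 4}
print(program)  # ['LOAD 9', 'CALL 4', 'STORE 10', 'HALT', 'MUL 9', 'RETURN']
```

Linking only the main module on its own would fail with an unresolved symbol, which mirrors the "missing library" errors a real linker reports.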

Loaders

Loaders copy programs and subroutines into main memory.

The loader allocates memory addresses to the programs and subroutines being loaded.
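A loader can be sketched as finding a free block of main memory, copying the program into it, and adjusting any addresses that were written as if the program started at address 0 (relocation). Here memory is a plain Python list, and the convention that only JUMP targets are addresses, written relative to the program's start, is a hypothetical simplification.

```python
MEMORY_SIZE = 16
memory = [None] * MEMORY_SIZE  # None marks a free memory cell

def load(program, memory):
    """Copy program into the first free block that fits and relocate its JUMPs."""
    size = len(program)
    for start in range(len(memory) - size + 1):
        if all(cell is None for cell in memory[start:start + size]):
            break  # found a free block beginning at 'start'
    else:
        raise MemoryError("no free block large enough for the program")
    for offset, instruction in enumerate(program):
        op, _, arg = instruction.partition(" ")
        if op == "JUMP":  # targets are relative to the program's start
            instruction = f"JUMP {start + int(arg)}"
        memory[start + offset] = instruction
    return start  # the base address allocated to the program

memory[0:3] = ["OS", "OS", "OS"]  # pretend the operating system uses cells 0-2
base = load(["INPUT", "OUTPUT", "JUMP 0"], memory)
print(base)        # 3
print(memory[:7])  # ['OS', 'OS', 'OS', 'INPUT', 'OUTPUT', 'JUMP 3', None]
```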


Source Code vs. Object Code

Source code refers to high-level code or assembly code written by programmers. Source code is easy to read and modify. It is written in a high-level language (e.g., C, C++, Java, Python) or in assembly language, and often contains comments for better understanding. Source code serves as the blueprint for a software application, describing how it should behave.

Object code refers to low-level code that can be understood by machines (computers). Object code is generated from source code by a compiler or other translator. It is in machine code format, although it may need to be linked with other object code (for example, library routines) before it can be run. Object code contains a sequence of machine-understandable instructions that the Central Processing Unit (CPU) can execute directly. Unlike source code, object code is not plain text; it is stored in a binary format.

