How to Learn Assembly Language

Assembly language (also known as “assembly” or sometimes “assembler”) is a programming language used to write software that closely corresponds to how a microprocessor actually executes instructions. It is primarily concerned with the activities of a central processing unit (CPU): binary arithmetic, storing values in memory, and jumping (“branching”) to other parts of a program. Although a great deal of knowledge about the computer system is required, assembly language has relatively few commands to memorize.

As it mirrors the architecture of computer systems (from the design of CPUs, to the ways that memory and peripherals are made accessible to the system), familiarity with the components of a computer system will be needed. This information is often given at the start of computer science courses – such as Computer Science for Everyone with Java at Udemy.com. In addition, learning binary arithmetic and conversion to and from binary, decimal, and hexadecimal number systems would be extremely useful.
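As a small illustration of those number-system conversions, here is a minimal Python sketch (Python is used only as a convenient calculator; the value 200 is an arbitrary example, not something from a particular CPU):

```python
# Convert one value between decimal, binary, and hexadecimal notation.
value = 200                       # written in decimal

as_binary = format(value, "08b")  # 8-digit binary string
as_hex = format(value, "02X")     # 2-digit uppercase hexadecimal string

print(as_binary)  # 11001000
print(as_hex)     # C8

# Parsing goes the other way: int() takes the base as its second argument.
from_binary = int("11001000", 2)
from_hex = int("C8", 16)
print(from_binary, from_hex)      # 200 200
```

Each hexadecimal digit stands for exactly four binary digits (C is 1100, 8 is 1000), which is why hexadecimal is the usual shorthand in assembly listings.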

Getting Started

Unlike high-level languages, such as C and Pascal, you cannot easily transfer assembly language programs from one type of computer system to another. In general, you will learn assembly language for one CPU and will have to learn a new set of instructions for use with a different CPU. However, in many cases, the concepts and understanding that you gain can be applied to the new system.

To begin, only a few pieces of software are required:

  • An assembler or cross-assembler for the target machine you are writing programs for.
  • A text editor, such as Notepad, Vi, or Programmer’s Notepad.
  • Some means of loading software onto the machine.

An assembler takes in text files containing assembly language and produces executable code for a particular processor.

A cross-assembler is a piece of software that runs on one machine (for example, a Windows PC) but produces executable code for a different type of machine (for example, the Commodore 64, Sega Mega Drive, or Nintendo Game Boy). Despite the subtle difference, people often use the term “assembler” when referring to cross-assemblers.

Depending on the platform you are working on, and whether or not you are writing programs for a different machine, you may need some conversion tools to take the output from the assembler and turn it into a file that can be loaded on the target machine or by an emulator.

Processors

Over the years, there has been a lot of variation in processor architecture, as many different manufacturers have released microprocessors and they have evolved in power and capability.

All processors are made up of at least four key parts:

  1. Register Array
    The CPU’s registers are like memory cells, each capable of storing a single working value. The number of registers (and their names) differs between processors. Special registers hold the memory location of the next instruction to execute (the program counter, PC), the stack pointer (SP, which is outlined later in this article), and status flags that, amongst other things, indicate the result of certain instructions.
  2. Arithmetic Logic Unit (ALU)
    Performs binary arithmetic on binary numbers held in the processor’s registers.
  3. Control Unit
    Manages the loading, decoding and execution of instructions.
  4. System Bus
    Corresponds to the input and output pins on the microprocessor chip – for interfacing with memory by setting the memory address on the address bus, and reading or writing data through the data bus. The control bus includes output signals for telling electronic circuits what type of instruction is being performed, and inputs for things like interrupt signals and reset.

Microcontrollers are a similar class of device. However, while you must connect microprocessors to external memory circuits and storage devices from which to load programs, microcontrollers have an area of memory on the chip itself that you use to store the program.

Assembly Language Instructions

An instruction in assembly language consists of a short mnemonic used to “name” it, followed by any required arguments. Each instruction usually sits on its own line, and the assembler converts each line to a sequence of numbers that are actually used to tell the processor what to do.

The names of instructions and registers are different between processors. However, you should begin to see how the concepts generally remain unchanged.

The three main groups of instructions are:

  1. Data transfer instructions
  2. Arithmetic instructions
  3. Program control instructions

Data transfer instructions are concerned with moving information between processor registers and memory.

When writing assembly language for the Zilog Z80 (a common 8-bit processor from the mid 1970s), a fixed value can be loaded into the register A using the instruction:

ld a, 0x08

The 0x prefix is used to indicate that the number that follows has been written in hexadecimal notation. Some assemblers use a dollar sign ($) instead.

The same instruction on the 6502 processor (another 8-bit processor, as used in the Apple II, Commodore 64, and Nintendo Entertainment System) looks like this:

lda #$08

The 6502 also has a register named A. However, in this second example, the # symbol tells the assembler to use immediate addressing mode, which indicates that the value typed is the actual value to use, and is not (for example) a memory address. Not every assembly syntax needs a marker for immediate values: the Z80 example above uses none.

On the Motorola 68000, a 16/32-bit processor (32-bit registers with a 16-bit data bus), the operation looks different again, but it does the same job:

move.b #$08,d0

68000 data registers are numbered, D0–D7. The .b at the end of the instruction mnemonic is used to tell the assembler that you want to move a byte value.
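To make the idea of an assembler turning each line into numbers more concrete, the following Python sketch encodes the Z80 and 6502 load instructions shown above. The function name encode_load_a and the OPCODES table are hypothetical names invented for this illustration; the opcode bytes themselves (0x3E for ld a, n and 0xA9 for lda #n) are the documented encodings for those two instructions.

```python
# Opcode bytes for "load an 8-bit constant into register A".
OPCODES = {
    "z80": 0x3E,   # ld a, n
    "6502": 0xA9,  # lda #n
}

def encode_load_a(cpu: str, value: int) -> bytes:
    """Return the machine-code bytes that load `value` into register A."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # One opcode byte followed by the constant itself.
    return bytes([OPCODES[cpu], value])

print(encode_load_a("z80", 0x08).hex())   # 3e08
print(encode_load_a("6502", 0x08).hex())  # a908
```

The same operation produces different bytes on each processor, which is why assembled programs cannot be moved between CPU families.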

Basic Arithmetic

Arithmetic instructions perform math such as addition, subtraction, multiplication and division, although simple 8-bit processors like the 6502 provide only addition and subtraction. For example, the code below loads a number into register A and then subtracts another number from it. The SBC (subtract with carry) instruction on the 6502 works on register A and puts the result back into the same register, but this may be different on other platforms. SBC also subtracts an extra 1 when the carry flag is clear, so the carry is set with SEC first.

lda #07
sec
sbc #01

If the result is negative or zero then certain bits of the processor’s status flags are changed. This can be used in combination with the next group of instructions.
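The effect of a subtraction on the status flags can be modelled in a few lines of Python. This is a simplified sketch of the 6502's SBC in binary (non-decimal) mode, not a full emulation: the function name sbc and the dictionary of flags are illustrative choices, and the overflow flag is left out.

```python
def sbc(a: int, operand: int, carry: int = 1):
    """Simplified model of 6502 SBC: A - operand - (1 - carry)."""
    raw = a - operand - (1 - carry)
    result = raw & 0xFF            # a register holds only 8 bits
    flags = {
        "Z": result == 0,          # zero flag
        "N": bool(result & 0x80),  # negative flag: a copy of bit 7
        "C": raw >= 0,             # carry stays set when no borrow was needed
    }
    return result, flags

print(sbc(7, 1))  # (6, {'Z': False, 'N': False, 'C': True})
print(sbc(1, 1))  # (0, {'Z': True, 'N': False, 'C': True})
print(sbc(1, 2))  # (255, {'Z': False, 'N': True, 'C': False})
```

The last line shows the 8-bit wrap-around: 1 minus 2 leaves 255 (0xFF) in the register, which is -1 when read as a signed byte.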

Program Control and Conditions

Program control instructions are primarily concerned with jumping to other areas of the program, and can be used to create the if statements and loops used in higher-level languages.

_wait_sub_cpu_r:
    ld bc, 0x1A01
    in a, (c)
    bit 5, a
    jp nz, _wait_sub_cpu_r

The example above uses a few instructions that are not introduced here. But it is, in effect, a demonstration of how to write a do…while loop. The key part to understand is the use of bit and jp on the final two lines of code.

The bit instruction here tests whether bit 5 of the byte stored in register A is set (1) or clear (0). Bits are numbered from 0 at the least significant end, so this is the sixth bit from the right. Like many of the arithmetic instructions, bit sets the value of status flags depending on the result of the operation. If bit 5 is set, the z flag (indicating zero) is cleared. If bit 5 is clear, z is set.

The jump instruction (jp) loops back to the start of the example if the z flag is not set, an action that is specified using the argument nz (not-zero).
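For readers more used to high-level languages, the loop can be sketched in Python as follows. The function wait_until_bit5_clear and its read_port argument are hypothetical stand-ins for the hardware port read; the list of readings is made-up test data.

```python
def wait_until_bit5_clear(read_port):
    """Keep reading the port until bit 5 of the value is 0 (a do...while loop)."""
    while True:
        a = read_port()           # in a, (c)
        if not a & 0b0010_0000:   # bit 5, a: z is set when the bit is clear
            return a              # z set, so jp nz falls through
        # z clear (bit 5 still set): jp nz jumps back to the label

# Simulated port: bit 5 is set in the first two readings, clear in the third.
readings = iter([0x20, 0x3F, 0x1F])
print(hex(wait_until_bit5_clear(lambda: next(readings))))  # 0x1f
```

The body always runs at least once before the condition is checked, which is what makes this a do...while loop rather than a while loop.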

Compare instructions are arithmetic in nature, but do not store the result of the calculation. They are used to check whether a number is less than, equal to, or greater than a second number. To do this, the instruction generally subtracts one value from the other and discards the answer. The status flags then show whether the result was zero (the numbers are equal) or negative (the first value is less than the second); a result that is neither means the first value is greater. Reading the negative flag this way assumes signed values that do not overflow. Unsigned comparisons usually test the carry flag instead.
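A compare can be sketched in Python as a subtraction whose result is thrown away. The function name compare is illustrative, and the sketch assumes unsigned 8-bit operands. For unsigned values, bit 7 of the difference is not a reliable "less than" test, so the sketch uses the borrow instead, which processors such as the 6502 and Z80 report through the carry flag.

```python
def compare(first: int, second: int) -> str:
    """Classify two unsigned 8-bit values using only flag-style information."""
    diff = first - second           # computed, but never stored in a register
    zero = (diff & 0xFF) == 0       # zero flag: the values are equal
    borrow = diff < 0               # a borrow was needed: first < second
    if zero:
        return "equal"
    return "less" if borrow else "greater"

print(compare(5, 5))    # equal
print(compare(3, 9))    # less
print(compare(200, 1))  # greater
```

In the last case the difference is 199, which has bit 7 set, so a test based only on the negative flag would wrongly report "less".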

Labels and Variables

_wait_sub_cpu_r is an example of a label in assembly language. Labels point to locations in the program and are calculated by the assembler as the address of the instruction (or data) that follows in the source code. During assembly, anytime an instruction (such as the jump in the example above) references a label, the assembler inserts the memory location it points to instead.
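The way an assembler works out label addresses can be sketched in Python. This is a toy model, not how any particular assembler is implemented: the load address 0x8000, the extra label done, and the trailing ret are assumptions added for illustration, while the instruction sizes are the real Z80 encoded lengths.

```python
# Each entry is (label or None, size of the encoded instruction in bytes).
program = [
    ("_wait_sub_cpu_r", 3),  # ld bc, 0x1A01
    (None, 2),               # in a, (c)
    (None, 2),               # bit 5, a
    (None, 3),               # jp nz, _wait_sub_cpu_r
    ("done", 1),             # ret (hypothetical extra instruction)
]

def resolve_labels(program, origin):
    """First pass: walk the program and record the address of every label."""
    labels = {}
    address = origin
    for label, size in program:
        if label is not None:
            labels[label] = address
        address += size
    return labels

labels = resolve_labels(program, origin=0x8000)
print({name: hex(addr) for name, addr in labels.items()})
# {'_wait_sub_cpu_r': '0x8000', 'done': '0x800a'}

# Second pass, for the jump only: opcode 0xC2 (jp nz) plus the 16-bit
# target address, low byte first.
jump = bytes([0xC2]) + labels["_wait_sub_cpu_r"].to_bytes(2, "little")
print(jump.hex())  # c20080
```

The label name never appears in the output; only the address it stands for does.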

In the example below, the assembly directive defb is used to tell the assembler to insert the value 8 into the program. The label K_ASCII is given to the memory location occupied by this value.

K_ASCII: defb 8

In most computer systems, the program is loaded into random access memory (RAM) before it is executed. This means that the values used in a running program can be changed. K_ASCII has become a usable variable and you can change the value in the memory cell that K_ASCII points to:

ld a, 0
ld (K_ASCII), a

The brackets around K_ASCII indicate the addressing mode: the value in register A is stored in the memory location whose address the label stands for, rather than the label’s address itself being used as the value.

The Stack

Most processors have instructions for dealing with a memory construct known as the stack. This is not a separate area of memory, only a different way of working with the memory in a system.

A special register, called the stack pointer (SP), holds the address of the next memory cell to be used. Pushing a value to the stack stores that value in the memory location specified by the SP. The SP is then decremented to point to the next available location, so the stack grows downward through memory. This is how the 6502 behaves. Some other processors adjust the SP before storing, but the principle is the same.

Popping or pulling a value from the stack reverses these steps. The SP is first incremented so that it points back at the most recently pushed value, and the value in that memory location is then returned.

The stack is a convenient method of storing variables temporarily.

On the 6502, the instructions PHA and PLA push values onto the stack and pull them back off. Note that they can only work with the accumulator (register A). The example below loads 5 into the accumulator and then stores this value onto the stack using PHA. It then changes the value of A. The original value is then retrieved with PLA.

lda #5
pha
lda #2
pla
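The four instructions above can be traced with a small Python model of the 6502 stack. The class name Stack6502 is hypothetical, and the starting stack pointer of 0xFF is an assumption (programs normally set it themselves). The fixed stack area at addresses 0x0100 to 0x01FF is how the 6502 really works.

```python
class Stack6502:
    """Tiny model of the 6502 accumulator and hardware stack."""

    def __init__(self):
        self.memory = bytearray(0x200)  # enough to cover the stack page
        self.sp = 0xFF                  # assumed starting value
        self.a = 0

    def pha(self):
        self.memory[0x0100 + self.sp] = self.a  # store at the address SP names
        self.sp = (self.sp - 1) & 0xFF          # then move SP down

    def pla(self):
        self.sp = (self.sp + 1) & 0xFF          # move SP back up first
        self.a = self.memory[0x0100 + self.sp]  # then read the value

cpu = Stack6502()
cpu.a = 5   # lda #5
cpu.pha()   # pha
cpu.a = 2   # lda #2
cpu.pla()   # pla
print(cpu.a, hex(cpu.sp))  # 5 0xff
```

After the sequence, A holds 5 again and the stack pointer is back where it started, so the push and pull are balanced.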

The stack is used by a certain class of program control instructions known as calls. These form the basis of how functions and procedures are implemented in assembly language.

When jumping to another part of the program using a call instruction, the processor pushes the return address (the program counter) onto the stack. When the function is finished, this address is then popped off the stack and put back into the program counter, so that the program can continue from where it left off.
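The call and return mechanism can be sketched with a toy interpreter in Python. The instruction names (print, call, ret, halt) and the run function are invented for this illustration and do not belong to any real processor. Real CPUs also differ in exactly which address they push.

```python
def run(program, start=0):
    """Execute a toy program and return the list of printed messages."""
    pc = start      # program counter
    stack = []      # return addresses
    trace = []
    while True:
        op, arg = program[pc]
        pc += 1                 # pc now points at the following instruction
        if op == "call":
            stack.append(pc)    # push the return address
            pc = arg            # jump to the subroutine
        elif op == "ret":
            pc = stack.pop()    # pop the return address back into pc
        elif op == "print":
            trace.append(arg)
        elif op == "halt":
            return trace

program = [
    ("print", "before"),  # address 0
    ("call", 4),          # address 1
    ("print", "after"),   # address 2
    ("halt", None),       # address 3
    ("print", "inside"),  # address 4: start of the subroutine
    ("ret", None),        # address 5
]
print(run(program))  # ['before', 'inside', 'after']
```

Because return addresses live on a stack, a subroutine can itself call another subroutine and each return will still find the right place to resume.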

Moving Forward

This article is only an overview of assembly language and the main concepts. To continue, you will need to find the tools and documentation for the platform that you are intending to program. In particular, look for reference information and data sheets about the CPU and its instruction set.

You should become familiar with binary arithmetic and number systems, and a basic knowledge of digital electronics, logic, and circuits will also help you to understand what is actually happening inside the computer. The key to assembly language is building that understanding, and then applying it in your programming.