UA Language Reference
This document is the complete reference for the Unified Assembly (UA) instruction set — the portable assembly language used by the UA compiler.
Table of Contents
- Source File Format
- Precompiler Directives
- Comments
- Labels
- Variables
- Functions
- Registers
- Numeric Literals
- String Literals
- Standard Libraries
- Instruction Set
- Data Movement
- Arithmetic
- Bitwise Logic
- Shift Operations
- Comparison
- Control Flow
- Stack Operations
- System
- Variables
- Memory Allocation
Source File Format
- File extension:
.UA - Encoding: ASCII / UTF-8
- One instruction per line
- Whitespace (spaces and tabs) is ignored around operands
- Blank lines are allowed
- Lines are terminated by newline (
\nor\r\n)
Example:
; A minimal UA program
LDI R0, 42
LDI R1, 8
ADD R0, R1
HLT
Precompiler Directives
Before lexing, the UA precompiler evaluates lines starting with @. Directives are not assembly instructions — they control conditional compilation, file inclusion, and stub markers.
| Directive | Description |
|---|---|
@IF_ARCH <arch> | Begin a conditional block — included only when -arch matches |
@IF_SYS <system> | Begin a conditional block — included only when -sys matches |
@ENDIF | Close the most recent @IF_ARCH or @IF_SYS block |
@IMPORT <path> | Include another .ua file (each file imported at most once) |
@DUMMY [message] | Emit a stub diagnostic to stderr; no code generated |
@DEFINE <NAME> <VALUE> | Define a compile-time text macro — every occurrence of NAME in subsequent non-directive lines is replaced with VALUE |
@ARCH_ONLY <a>,<b>,... | Abort compilation unless -arch matches one of the listed architectures |
@SYS_ONLY <s>,<t>,... | Abort compilation unless -sys matches one of the listed systems |
Conditional Compilation
@IF_ARCH x86
; This section is only assembled when -arch x86
LDI R0, 42
@ENDIF
@IF_SYS linux
; This section is only assembled when -sys linux
INT #0x80
@ENDIF
Conditional blocks can be nested (up to 64 levels):
@IF_ARCH x86
@IF_SYS win32
; x86 + Windows only
@ENDIF
@ENDIF
File Import
@IMPORT "lib/utils.ua"
@IMPORT helpers.ua ; unquoted form also accepted
- Paths are resolved relative to the importing file's directory
- Each unique file is imported once — duplicate
@IMPORTlines are skipped - Imported files are preprocessed recursively (their directives are evaluated)
- Import nesting is limited to 16 levels
Namespace Prefixing
All labels, function definitions, and variable declarations in an imported file are automatically prefixed with the filename (without extension and path):
; main.ua
@IMPORT "lib/math.ua"
CALL math.add_values ; calls add_values defined in math.ua
GET R0, math.result ; accesses variable 'result' from math.ua
math.double_it() ; syntactic sugar for CALL math.double_it
The prefix is derived from the filename: lib/math.ua → math, helpers.ua → helpers.
This prevents name collisions when importing multiple files and provides clear provenance for every symbol.
Architecture & System Guards
@ARCH_ONLY and @SYS_ONLY are hard constraints — if the current target does not match any entry in the comma-separated list, compilation is immediately aborted with an error. This is ideal for library files that only make sense for specific platforms.
; This file only compiles for ARM-family targets
@ARCH_ONLY arm, arm64
; This file only compiles for Linux or macOS
@SYS_ONLY linux, macos
Unlike @IF_ARCH/@IF_SYS (which silently skip code), @ARCH_ONLY/@SYS_ONLY fail loudly and stop the build. Use @IF_* for conditional sections within a universal file; use @ARCH_ONLY/@SYS_ONLY to restrict an entire file to specific targets.
Compile-Time Macros (@DEFINE)
@DEFINE SCON 0x98
@DEFINE SBUF 0x99
@DEFINE BAUD 0xFD
Defines a compile-time text substitution. Every subsequent occurrence of NAME on non-directive lines is replaced with VALUE before the line reaches the lexer. Replacement is token-boundary-aware — only whole identifiers matching NAME are substituted (partial matches inside other words are not affected).
| Rule | Description |
|---|---|
| Syntax | @DEFINE <NAME> <VALUE> — name is an identifier; value is the rest of the line (trimmed) |
| Limit | Up to 512 macros per compilation unit |
| Name length | Maximum 63 characters |
| Value length | Maximum 63 characters |
| Scope | Global — a @DEFINE is visible to all lines processed after it, including code from subsequent @IMPORT files |
| No redefinition | Defining the same name twice appends a second entry; the first match wins |
| Whole-token only | @DEFINE P0 0x80 replaces P0 but not DPH0 or P0x |
Typical use — hardware register definitions:
@IMPORT hw_mcs51 ; imports @DEFINE P0 0x80, SCON 0x98, etc.
LDI R0, 0x50
LDI R1, SCON ; expands to: LDI R1, 0x98
STORE R0, R1
Because macros are expanded before the lexer runs, the final token stream contains only literal numeric values — no runtime cost, no RAM overhead.
Hardware Definition Libraries
UA ships with hardware definition libraries that use @DEFINE to expose register addresses for common platforms. Import them with @IMPORT just like standard libraries:
| Library | Target | Import |
|---|---|---|
hw_mcs51 | Intel 8051 SFRs | @IMPORT hw_mcs51 |
hw_x86_pc | x86 PC I/O ports | @IMPORT hw_x86_pc |
hw_riscv_virt | RISC-V QEMU virt MMIO | @IMPORT hw_riscv_virt |
hw_arm_virt | ARM/ARM64 QEMU virt MMIO | @IMPORT hw_arm_virt |
Each library file begins with an @ARCH_ONLY guard that prevents accidental use on an incompatible target.
Stub Markers
@DUMMY This feature is not yet implemented
@DUMMY
Prints a diagnostic to stderr during compilation. No code is emitted.
Origin Address (@ORG)
@ORG 0x0000 ; decimal or hex address
@ORG 0x000B
Sets the origin address for subsequent code. In the generated binary the compiler pads the output with zero bytes until the target address is reached.
| Rule | Description |
|---|---|
| Forward only | The target address must be ≥ the current program counter. Moving backwards is a fatal error. |
| All architectures | @ORG is universal — it works on every supported backend (x86, x86_32, ARM, ARM64, RISC-V, MCS-51). |
| Bare-metal use case | Primarily used for placing interrupt vectors and ISRs at hardware-mandated addresses. |
Example — 8051 interrupt vectors:
@ARCH_ONLY mcs51
@ORG 0x0000 ; reset vector
JMP main
@ORG 0x000B ; Timer 0 interrupt vector
timer0_isr:
; ... handle interrupt ...
RETI
@ORG 0x0030 ; main program
main:
; initialisation code
Opcode Compliance
After parsing, the compiler validates every instruction against a per-opcode compliance table that specifies which architectures and systems support each opcode. If any instruction is not supported by the target, compilation fails with a clear diagnostic:
UA Compliance Error
-------------------
Line 5: opcode 'SYS' is not supported on architecture 'mcs51'
Supported architectures: x86, x86_32, arm, arm64, riscv
All 37 built-in opcodes are currently universal (supported on all architectures and systems). As architecture-specific instructions are added in future phases, the compliance table ensures safe, portable code — and @IF_ARCH / @ARCH_ONLY provide the mechanism to write platform-specific alternatives.
Comments
Line comments begin with ; and extend to the end of the line:
LDI R0, 10 ; this is a comment
; this entire line is a comment
There are no block comments.
Labels
Labels mark code addresses for use with jump and call instructions. A label is an identifier followed by a colon (:):
start:
LDI R0, 0
loop:
INC R0
CMP R0, 100
JNZ loop
HLT
Rules:
- Labels consist of letters (
a-z,A-Z), digits (0-9), underscores (_), and dots (.) - Labels must start with a letter or underscore
- Maximum length: 128 characters
- Labels are case-sensitive
- Duplicate labels within a file are an error
- Forward references are supported (two-pass assembly)
- Dots are used for namespace-qualified names (e.g.,
math.start)
Variables
Variables are compiler-managed named storage locations. Unlike registers, variables persist across function calls and can be accessed from anywhere in the program.
Declaring Variables
VAR counter ; declare variable, initialized to 0
VAR result, 42 ; declare variable with initial value
Writing to Variables
SET counter, R0 ; store register value into variable
SET result, 99 ; store immediate value into variable
Reading from Variables
GET R0, counter ; load variable value into register
GET R1, result
Storage Model
Variables are stored differently depending on the target architecture:
| Backend | Storage | Size | Address Range |
|---|---|---|---|
| x86-64 | Data section after code | 8 bytes | RIP-relative addressing |
| x86-32 | Data section after code | 4 bytes | Absolute addressing |
| ARM | Data section after code | 4 bytes | MOVW/MOVT + LDR/STR via r12 |
| 8051 | Internal RAM (direct) | 1 byte | 0x08–0x7F |
Rules:
- Maximum 256 variables per program (120 for 8051 due to RAM limits)
- Variable names follow the same rules as labels
- Variables must be declared before use with
SETorGET VARdeclarations with an initial value emit initialization code at the declaration point- On 8051, immediate values in
SETare limited to 8 bits (0–255)
Example: Using Variables
VAR x, 10
VAR y, 20
GET R0, x ; R0 = 10
GET R1, y ; R1 = 20
ADD R0, R1 ; R0 = 30
SET x, R0 ; x = 30
HLT
Functions
Functions are labels with declared parameter names. The parameter list documents which variables the function expects to be available.
Defining Functions
my_function(arg1, arg2):
GET R0, arg1
GET R1, arg2
ADD R0, R1
RET
Both syntaxes are equivalent:
my_func: ; plain label (no documented parameters)
my_func(a, b): ; function definition (parameters a, b documented)
Calling Functions
Functions can be called with CALL or using syntactic sugar:
CALL my_function ; standard call
my_function() ; syntactic sugar — equivalent to CALL my_function
CALL my_function(R0, R1) ; call with argument annotations
Rules
- Maximum 8 parameters per function definition
- Parameters must be declared as variables (using
VAR) before the function is called - Function definitions are labels — they follow all label rules
- The parameter list is metadata for documentation and validation; the actual argument passing uses
SET/GETon the named variables
Complete Example
JMP main
VAR a
VAR b
add(a, b):
GET R0, a
GET R1, b
ADD R0, R1
RET
main:
SET a, 15
SET b, 27
CALL add
; R0 now contains 42
HLT
Registers
UA provides 16 virtual registers named R0 through R15. The actual number of usable registers depends on the target backend:
| Backend | Usable Registers | Notes |
|---|---|---|
| x86-64 | R0–R7 (8) | R8–R15 rejected (would require REX.B encoding) |
| x86-32 | R0–R7 (8) | Maps to IA-32 32-bit registers |
| ARM | R0–R7 (8) | Maps directly to ARM r0–r7; r12 used as scratch |
| 8051 | R0–R7 (8) | Maps to bank-0 registers |
Register names are case-insensitive: R0, r0, and R0 are the same register.
x86-64 Register Mapping
| UA Register | x86-64 Register | Purpose |
|---|---|---|
| R0 | RAX | Accumulator / return value |
| R1 | RCX | General purpose |
| R2 | RDX | General purpose |
| R3 | RBX | General purpose (callee-saved) |
| R4 | RSP | Stack pointer |
| R5 | RBP | Base pointer |
| R6 | RSI | General purpose |
| R7 | RDI | General purpose |
Warning: R4 (RSP) and R5 (RBP) are the stack and base pointers. Modifying them directly can corrupt the stack.
x86-32 (IA-32) Register Mapping
| UA Register | x86-32 Register | Purpose |
|---|---|---|
| R0 | EAX | Accumulator / return value |
| R1 | ECX | General purpose |
| R2 | EDX | General purpose |
| R3 | EBX | General purpose (callee-saved) |
| R4 | ESP | Stack pointer |
| R5 | EBP | Base pointer |
| R6 | ESI | General purpose |
| R7 | EDI | General purpose |
Warning: R4 (ESP) and R5 (EBP) are the stack and base pointers. Modifying them directly can corrupt the stack.
ARM (ARMv7-A) Register Mapping
| UA Register | ARM Register | Purpose |
|---|---|---|
| R0 | r0 | General purpose / return value |
| R1 | r1 | General purpose |
| R2 | r2 | General purpose |
| R3 | r3 | General purpose |
| R4 | r4 | General purpose |
| R5 | r5 | General purpose |
| R6 | r6 | General purpose |
| R7 | r7 | General purpose |
Note: ARM r12 (IP) is used internally as a scratch register for large immediates. r13 (SP), r14 (LR), and r15 (PC) are reserved and cannot be used as UA registers.
8051 Register Mapping
| UA Register | 8051 Register | Direct Address |
|---|---|---|
| R0 | R0 | 0x00 |
| R1 | R1 | 0x01 |
| R2 | R2 | 0x02 |
| R3 | R3 | 0x03 |
| R4 | R4 | 0x04 |
| R5 | R5 | 0x05 |
| R6 | R6 | 0x06 |
| R7 | R7 | 0x07 |
All registers are in 8051 register bank 0.
ARM64 (AArch64) Register Mapping
| UA Register | AArch64 Register | Purpose |
|---|---|---|
| R0 | X0 | General purpose / return value |
| R1 | X1 | General purpose |
| R2 | X2 | General purpose |
| R3 | X3 | General purpose |
| R4 | X4 | General purpose |
| R5 | X5 | General purpose |
| R6 | X6 | General purpose |
| R7 | X7 | General purpose |
Note: X9 and X10 are used internally as scratch registers. X30 (LR) and X31 (SP/XZR) are reserved.
RISC-V (RV64I) Register Mapping
| UA Register | RISC-V Register | ABI Name | Purpose |
|---|---|---|---|
| R0 | x10 | a0 | Argument / return value |
| R1 | x11 | a1 | Argument |
| R2 | x12 | a2 | Argument |
| R3 | x13 | a3 | Argument |
| R4 | x14 | a4 | Argument |
| R5 | x15 | a5 | Argument |
| R6 | x16 | a6 | Argument |
| R7 | x17 | a7 | Argument / syscall number |
Note: x5 (t0) and x6 (t1) are used internally as scratch registers. x0 (zero, hardwired), x1 (ra), and x2 (sp) are reserved.
Numeric Literals
Numbers can be expressed in three bases:
| Format | Prefix | Example | Value |
|---|---|---|---|
| Decimal | (none) | 42, -7, 0 | 42, -7, 0 |
| Hexadecimal | 0x or 0X | 0xFF, 0x1A | 255, 26 |
| Binary | 0b or 0B | 0b1010, 0B11001100 | 10, 204 |
Immediate values are prefixed with # in the instruction:
LDI R0, 42 ; decimal
LDI R1, 0xFF ; hexadecimal
AND R0, 0b1111 ; binary mask
Range Limits
| Backend | Immediate Range | Notes |
|---|---|---|
| x86-64 | -2,147,483,648 to 2,147,483,647 | 32-bit sign-extended to 64-bit |
| x86-32 | -2,147,483,648 to 2,147,483,647 | 32-bit native |
| ARM | -2,147,483,648 to 2,147,483,647 | 32-bit via MOVW/MOVT |
| ARM64 | -2,147,483,648 to 2,147,483,647 | 32-bit via MOVZ/MOVK (up to 64-bit with multiple MOVK) |
| RISC-V | -2,147,483,648 to 2,147,483,647 | 32-bit via LUI+ADDI |
| 8051 | -128 to 255 | 8-bit values |
String Literals
String literals are enclosed in double quotes and used with the LDS instruction:
LDS R0, "Hello, World!\n"
Escape Sequences
| Sequence | Character | Byte Value |
|---|---|---|
\n | Newline | 0x0A |
\t | Horizontal tab | 0x09 |
\r | Carriage return | 0x0D |
\0 | Null byte | 0x00 |
\\ | Backslash | 0x5C |
\" | Double quote | 0x22 |
Any other character following a backslash is kept as-is.
Storage
String data is stored in a string table appended after the variable data section. Each string is null-terminated. Duplicate string literals are automatically de-duplicated by the backend — identical strings share the same storage.
The layout of the output binary is:
[ code section ][ variable data ][ string data ]
Standard Libraries
UA ships with standard library files in the lib/ directory adjacent to the compiler executable. Library files are imported using @IMPORT with the std_ prefix:
@IMPORT std_io
@IMPORT std_string
When a @IMPORT path starts with std_, the compiler automatically resolves it to the lib/ directory next to the executable, appending .ua if needed.
std_io
I/O functions for console output. Uses @IF_ARCH / @IF_SYS precompiler guards to provide platform-specific implementations.
| Function | Description |
|---|---|
std_io.print | Write a null-terminated string to stdout. Pass string address in R0. All registers may be clobbered. |
Supported platforms:
| Architecture | System | Syscall Convention |
|---|---|---|
| x86 | linux | SYSCALL — RAX=1 (write), RDI=fd, RSI=buf, RDX=count |
| x86 | win32 | Write dispatcher translates Linux convention to WriteFile API |
| x86_32 | linux | INT 0x80 — EAX=4 (write), EBX=fd, ECX=buf, EDX=count |
| arm | linux | SVC #0 — R7=4 (write), R0=fd, R1=buf, R2=count |
| riscv | linux | ECALL — a7=64 (write), a0=fd, a1=buf, a2=count |
ARM64 and 8051 are not yet supported (ARM64: syscall register X8 not accessible; 8051: no OS).
@IMPORT std_io
LDS R0, "Hello, World!\n"
CALL std_io.print
HLT
std_string
String manipulation functions. Uses architecture-neutral MVIS instructions and works on all backends.
| Function | Description |
|---|---|
std_string.strlen | Compute the length of a null-terminated string. Pass address in R0, result returned in R1. Clobbers R0, R2, R3. |
8051 Note:
strlenusesLOADBwith R0 as the pointer (indirect addressing via@R0), which only accesses internal RAM (0x00–0xFF).
@IMPORT std_string
LDS R0, "test"
CALL std_string.strlen ; R1 = 4
std_math
Integer math utility functions. Architecture-neutral — works on all backends.
| Function | Description |
|---|---|
std_math.pow | Integer exponentiation. Set std_math.base and std_math.exp, then CALL std_math.pow. Result in R0. |
std_math.factorial | Compute n!. Set std_math.n, then CALL std_math.factorial. Result in R0. |
std_math.max | Return the larger of two values. Set std_math.a and std_math.b, then CALL std_math.max. Result in R0. |
std_math.abs | Absolute value. Set std_math.val, then CALL std_math.abs. Result in R0. |
Example:
@IMPORT std_math
; Compute 2^10 = 1024
SET std_math.base, 2
SET std_math.exp, 10
CALL std_math.pow ; R0 = 1024
; Compute 5! = 120
SET std_math.n, 5
CALL std_math.factorial ; R0 = 120
; max(7, 42) = 42
SET std_math.a, 7
SET std_math.b, 42
CALL std_math.max ; R0 = 42
; abs(-15) = 15
SET std_math.val, -15
CALL std_math.abs ; R0 = 15
std_arrays
Byte-array utility functions for working with BUFFER-allocated memory. Architecture-neutral — works on all backends.
| Function | Description |
|---|---|
std_arrays.fill_bytes | Fill a buffer region with a byte value. Set std_arrays.dst (address), std_arrays.count (length), std_arrays.value (byte). |
std_arrays.copy_bytes | Copy bytes between buffers. Set std_arrays.src (source address), std_arrays.dst (dest address), std_arrays.count (length). |
Example:
@IMPORT std_arrays
BUFFER my_buf, 32
; Fill 32 bytes with 0xFF
GET R0, my_buf
SET std_arrays.dst, R0
SET std_arrays.count, 32
SET std_arrays.value, 0xFF
CALL std_arrays.fill_bytes
; Copy first 16 bytes of my_buf to another location
BUFFER other_buf, 16
GET R0, my_buf
SET std_arrays.src, R0
GET R0, other_buf
SET std_arrays.dst, R0
SET std_arrays.count, 16
CALL std_arrays.copy_bytes
Instruction Set
UA defines 37 instructions organized into nine categories. This is the Minimum Viable Instruction Set (MVIS).
Data Movement
| Mnemonic | Syntax | Description |
|---|---|---|
MOV | MOV Rd, Rs | Copy register Rs into Rd |
LDI | LDI Rd, #imm | Load immediate value into Rd |
LOAD | LOAD Rd, Rs | Load from memory: Rd ← [Rs] |
STORE | STORE Rs, Rd | Store to memory: [Rd] ← Rs |
LDS | LDS Rd, "str" | Load string address: Rd ← pointer to string literal |
LOADB | LOADB Rd, Rs | Load byte from memory: Rd ← zero-extend(byte [Rs]) |
STOREB | STOREB Rs, Rd | Store byte to memory: byte [Rd] ← low byte of Rs |
Examples:
LDI R0, 100 ; R0 = 100
MOV R1, R0 ; R1 = R0 (copy)
LOAD R2, R0 ; R2 = memory[R0]
STORE R1, R0 ; memory[R0] = R1
8051 Note:
LOADandSTOREuse indirect addressing (@Ri) and require the pointer register to be R0 or R1.
LDS — Load String Address
LDS loads the address of a null-terminated string literal into a register. The string data is stored after the variable data section in the output binary. Duplicate strings are de-duplicated by the backend.
LDS R0, "Hello, World!\n" ; R0 = pointer to string data
String literals support escape sequences: \n (newline), \t (tab), \r (carriage return), \0 (null), \\ (backslash), \" (double quote).
x86-64 Note:
LDSusesLEA r64, [RIP+disp32](RIP-relative addressing). The displacement is patched during code generation.x86-32 Note:
LDSusesLEA r32, [disp32]with an absolute address fixup.ARM Note:
LDSusesMOVW+MOVTto load the absolute string address.ARM64 Note:
LDSusesMOVZ+MOVKto load the absolute string address.RISC-V Note:
LDSusesLUI+ADDIto load the string address.8051 Note:
LDSemitsMOV DPTR, #imm16(stub — 8051 has limited string support).
LOADB / STOREB — Byte-Granularity Memory Access
LOADB reads a single byte from the address in Rs and zero-extends it into Rd. STOREB writes the low byte of Rs to the address in Rd.
LOADB R1, R0 ; R1 = zero-extend(byte at address R0)
STOREB R1, R0 ; byte at address R0 = low byte of R1
These instructions are essential for traversing null-terminated strings character by character.
Arithmetic
| Mnemonic | Syntax | Description |
|---|---|---|
ADD | ADD Rd, Rs or ADD Rd, #imm | Rd = Rd + Rs/imm |
SUB | SUB Rd, Rs or SUB Rd, #imm | Rd = Rd - Rs/imm |
MUL | MUL Rd, Rs or MUL Rd, #imm | Rd = Rd × Rs/imm |
DIV | DIV Rd, Rs or DIV Rd, #imm | Rd = Rd ÷ Rs/imm |
INC | INC Rd | Rd = Rd + 1 |
DEC | DEC Rd | Rd = Rd - 1 |
Examples:
LDI R0, 10
LDI R1, 3
ADD R0, R1 ; R0 = 13
SUB R0, 1 ; R0 = 12
MUL R0, R1 ; R0 = 36
DIV R0, R1 ; R0 = 12
INC R0 ; R0 = 13
DEC R0 ; R0 = 12
x86-64 Note:
DIVuses signed division (IDIV). The polyfill saves and restores RDX becauseIDIVclobbers RDX:RAX.8051 Note:
MULandDIVuse the hardwareMUL AB/DIV ABinstructions via the accumulator (A) and B register.
Bitwise Logic
| Mnemonic | Syntax | Description |
|---|---|---|
AND | AND Rd, Rs or AND Rd, #imm | Rd = Rd & Rs/imm |
OR | OR Rd, Rs or OR Rd, #imm | Rd = Rd | Rs/imm |
XOR | XOR Rd, Rs or XOR Rd, #imm | Rd = Rd ^ Rs/imm |
NOT | NOT Rd | Rd = ~Rd (bitwise complement) |
Examples:
LDI R0, 0xFF
AND R0, 0x0F ; R0 = 0x0F (mask lower nibble)
OR R0, 0xF0 ; R0 = 0xFF
XOR R0, 0xFF ; R0 = 0x00
NOT R0 ; R0 = ~R0
Shift Operations
| Mnemonic | Syntax | Description |
|---|---|---|
SHL | SHL Rd, Rs or SHL Rd, #imm | Shift Rd left by Rs/imm bits |
SHR | SHR Rd, Rs or SHR Rd, #imm | Shift Rd right by Rs/imm bits |
Examples:
LDI R0, 1
SHL R0, 4 ; R0 = 16 (1 << 4)
SHR R0, 2 ; R0 = 4 (16 >> 2)
x86-64 Note: Register-based shifts use the CL register. The backend saves/restores RCX automatically.
8051 Note:
SHLemitsRL A(rotate left),SHRemitsRR A(rotate right), repeated n times for immediate operands. These are rotations, not true logical shifts — bits wrap around.
Comparison
| Mnemonic | Syntax | Description |
|---|---|---|
CMP | CMP Ra, Rb or CMP Ra, #imm | Compare Ra with Rb/imm (sets flags) |
CMP performs a subtraction without storing the result. The flags (zero, carry, sign) are set and used by subsequent conditional jumps (JZ, JNZ, JL, JG).
Example:
CMP R0, R1
JZ equal ; jump if R0 == R1
JNZ not_equal ; jump if R0 != R1
JL less ; jump if R0 < R1 (signed)
JG greater ; jump if R0 > R1 (signed)
8051 Note: Register comparison uses
CLR C; SUBB A,Rn. Immediate comparison usesCJNE A,#imm,$+3.
Control Flow
| Mnemonic | Syntax | Description |
|---|---|---|
JMP | JMP label | Unconditional jump |
JZ | JZ label | Jump if zero flag is set (equal after CMP) |
JNZ | JNZ label | Jump if zero flag is clear (not equal after CMP) |
JL | JL label | Jump if less (signed, after CMP) |
JG | JG label | Jump if greater (signed, after CMP) |
CALL | CALL label | Call subroutine (pushes return address) |
RET | RET | Return from subroutine |
Example:
CALL my_function
HLT
my_function:
LDI R0, 42
RET
x86-64 Note:
JMP,JZ,JNZ,JL,JG, andCALLuse 32-bit relative offsets (rel32), allowing jumps up to ±2 GB.JLemits0F 8C rel32(JL near),JGemits0F 8F rel32(JG near).ARM Note:
JLemitsBLT(condition code 0xB),JGemitsBGT(condition code 0xC). Both use 24-bit signed offsets (±32 MB).ARM64 Note:
JLemitsB.LT(condition 0xB),JGemitsB.GT(condition 0xC). Both use 19-bit signed offsets (±1 MB).RISC-V Note:
JLemitsBLT t0, x0(branch if scratch < 0 after CMP subtraction).JGemitsBLT x0, t0(swapped operands: branch if 0 < scratch, i.e., result > 0). Both use B-type encoding with a 12-bit signed offset.8051 Note:
JMPandCALLuse 16-bit absolute addresses (LJMP/LCALL).JZandJNZuse 8-bit relative offsets (range: -128 to +127 bytes).JLemitsJC rel8(2 bytes — carry flag is set bySUBBif the first operand is less).JGuses a 6-byte polyfill:JC $+4; JZ $+2; SJMP target(skip if less or equal, jump if strictly greater).
Stack Operations
| Mnemonic | Syntax | Description |
|---|---|---|
PUSH | PUSH Rs | Push register onto the stack |
POP | POP Rd | Pop top of stack into register |
Example:
LDI R0, 99
PUSH R0 ; save R0
LDI R0, 0 ; overwrite R0
POP R0 ; restore R0 (R0 = 99 again)
8051 Note:
PUSHandPOPuse direct addressing (PUSH direct/POP direct) with the register's bank-0 address (0x00–0x07).
System
| Mnemonic | Syntax | Description |
|---|---|---|
INT | INT #imm | Software interrupt |
SYS | SYS | System call (OS-level) |
NOP | NOP | No operation |
HLT | HLT | Halt execution |
Examples:
NOP ; do nothing
INT 0x21 ; software interrupt 0x21
SYS ; invoke OS system call
HLT ; stop
x86-64 Note:
HLTgeneratesRET(return to caller / OS).INTgenerates the nativeINT ninstruction (CD nn).SYSgenerates theSYSCALLinstruction (0F 05).x86-32 Note:
SYSgeneratesINT 0x80(CD 80) for Linux system calls.ARM Note:
SYSgeneratesSVC #0(supervisor call).ARM64 Note:
SYSgeneratesSVC #0.RISC-V Note:
SYSgeneratesECALL.8051 Note:
HLTgenerates an infinite loop (SJMP $, opcode80 FE).INT #ngeneratesLCALLto the interrupt vector address(n × 8) + 3.SYSis not supported (no operating system on 8051).
Variable Instructions
| Mnemonic | Syntax | Description |
|---|---|---|
VAR | VAR name or VAR name, #imm | Declare a named variable (optional initializer) |
SET | SET name, Rs or SET name, #imm | Store a register or immediate into a variable |
GET | GET Rd, name | Load a variable's value into a register |
Examples:
VAR counter, 0 ; declare with initial value
VAR flags ; declare (default 0)
LDI R0, 42
SET counter, R0 ; counter = 42
SET flags, 0xFF ; flags = 0xFF
GET R1, counter ; R1 = 42
GET R2, flags ; R2 = 0xFF
x86-64 Note: Variables are stored as 8-byte values in a data section appended after code.
SET/GETuse RIP-relative MOV instructions with 32-bit displacement.x86-32 Note: Variables are 4-byte values accessed via absolute
[disp32]addressing.ARM Note: Variables are 4-byte values. The compiler loads the variable address into r12 (scratch) using MOVW+MOVT, then uses LDR/STR for the actual access. For
SETwith an immediate, r11 is also used as a scratch register.8051 Note: Variables occupy one byte each in internal RAM (direct addresses 0x08–0x7F).
SET/GETuse direct-addressing MOV instructions. Maximum 120 variables.
Memory Allocation
| Mnemonic | Syntax | Description |
|---|---|---|
BUFFER | BUFFER name, size | Allocate a named contiguous byte buffer of the given size |
BUFFER reserves a contiguous block of zero-initialized bytes in the data section (or internal RAM on 8051). Unlike VAR (which stores a single word), BUFFER allocates an arbitrary number of bytes.
Example:
BUFFER my_buf, 64 ; allocate 64 bytes named "my_buf"
GET R0, my_buf ; R0 = address of the buffer
LDI R1, 0x41 ; 'A'
STOREB R1, R0 ; write 'A' to first byte
BUFFER names follow the same rules as variable and label names. The size operand is a mandatory immediate specifying the number of bytes to allocate.
Storage Model:
| Backend | Storage | Location |
|---|---|---|
| x86-64 | Data section after code | After variables, before strings (8-byte aligned start address) |
| x86-32 | Data section after code | After variables, before strings |
| ARM | Data section after code | After variables, before strings |
| ARM64 | Data section after code | After variables, before strings |
| RISC-V | Data section after code | After variables, before strings |
| 8051 | Internal RAM | Consecutive bytes starting after variables (0x08+) |
The data layout with buffers is:
[ code section ][ variable data ][ buffer data ][ string data ]
Accessing buffer contents uses GET to obtain the base address, then LOADB/STOREB with register arithmetic for byte-level access.
8051 Note: Buffer bytes are allocated in internal RAM (direct addresses). Since 8051 RAM is limited to ~120 usable bytes, buffer sizes must be modest. Buffers share the address space with variables.
Operand Rules
Each instruction has a fixed operand shape enforced at parse time:
| Shape | Meaning | Example |
|---|---|---|
reg | A register (R0–R15) | NOT R0 |
reg, reg | Two registers | MOV R0, R1 |
reg, reg_or_imm | Register + register or immediate | ADD R0, R1 or ADD R0, 5 |
imm | An immediate value | INT 0x21 |
label | A label reference | JMP start |
| (none) | No operands | NOP, HLT, RET |
| Instruction | Shape |
|---|---|
| MOV | reg, reg |
| LDI | reg, imm |
| LOAD, STORE | reg, reg |
| ADD, SUB, MUL, DIV, AND, OR, XOR, SHL, SHR | reg, reg_or_imm |
| CMP | reg, reg_or_imm |
| NOT, INC, DEC | reg |
| PUSH, POP | reg |
| JMP, JZ, JNZ, JL, JG, CALL | label |
| INT | imm |
| NOP, HLT, RET, SYS | (none) |
| LDS | reg, string |
| LOADB, STOREB | reg, reg |
| VAR | name [, imm] |
| SET | name, reg_or_imm |
| GET | reg, name |
| BUFFER | name, size |
Incorrect operand shapes produce a compile-time error with the source line number.
Backend-Specific Notes
x86-64
- All operations are 64-bit (REX.W prefix)
LDIusesMOV r64, imm32(sign-extended to 64-bit)DIVis signed (IDIV) and includes a polyfill to save/restore RDXSHL/SHRwith a register operand saves/restores RCX (shift amount must be in CL)LOAD/STOREhandle the RSP (SIB byte) and RBP (displacement byte) special casesHLTemitsRET(0xC3) — returns control to the JIT runner or OSJLemits0F 8C rel32(6 bytes),JGemits0F 8F rel32(6 bytes)BUFFERallocates zero-initialized bytes in the data section after variables
x86-32 (IA-32)
- All operations are 32-bit (no REX prefix)
LDIusesMOV r32, imm32(5 bytes,B8+rd)INC/DECuse the single-byte encodings40+rd/48+rd(not available in 64-bit mode)PUSH/POPuse single-byte encodings50+rd/58+rdDIVis signed (IDIV) usingCDQfor sign extension (instead ofCQO)SHL/SHRwith a register operand saves/restores ECXLOAD/STOREhandle the ESP (SIB byte) and EBP (displacement byte) special casesHLTemitsRET(0xC3)JLemits0F 8C rel32(6 bytes),JGemits0F 8F rel32(6 bytes)BUFFERallocates zero-initialized bytes in the data section after variables- No JIT support — use
-arch x86for JIT execution
ARM (ARMv7-A)
- All instructions are 32-bit fixed width (4 bytes each)
- All instructions use condition code AL (always execute)
LDIusesMOVWfor values 0–65535 (4 bytes); addsMOVTfor larger values (8 bytes total)- ALU instructions with immediates: if the value fits in ARM’s rotated-imm8 encoding, it is encoded inline; otherwise, the value is loaded into r12 (scratch) first
MULuses the ARMMULinstruction;DIVusesSDIV(requires ARMv7VE / integer divide extension)SHL/SHRuse barrel-shifted MOV (LSL/LSR)JMP/JZ/JNZ/CALLuse ARM branch instructions with 24-bit signed offsets (±32 MB range)- Branch offsets account for the ARM pipeline (PC+8)
RETandHLTemitBX LR(branch to link register)INT #nemitsSVC #n(supervisor call)PUSHemitsSTR Rd, [SP, #-4]!(pre-indexed store with writeback)POPemitsLDR Rd, [SP], #4(post-indexed load)NOPemits the canonicalMOV R0, R0(0xE1A00000)- No JIT support on x86 hosts — use for cross-compilation only
8051/MCS-51
- All operations are 8-bit
- Arithmetic and logic route through the accumulator (A register)
LOAD/STOREuse indirect addressing (@R0or@R1only)MUL/DIVuse theMUL AB/DIV ABhardware instructions via the B registerSHL/SHRuse rotate instructions (RL A/RR A) — bits wrap aroundJZ/JNZare limited to ±127 bytes (8-bit relative offset)JLemitsJC rel8(2 bytes — carry flag set bySUBBmeans less-than)JGuses a 6-byte polyfill:JC $+4; JZ $+2; SJMP target(skip if less or equal, jump if strictly greater)BUFFERallocates consecutive bytes in internal RAM (shares address space with variables)JMP/CALLuse 16-bit absolute addressing (LJMP/LCALL)INT #nis polyfilled asLCALL (n*8)+3(standard interrupt vector table layout)HLTemitsSJMP $(0x80, 0xFE) — infinite self-loop