UA Language Reference

This document is the complete reference for the Unified Assembly (UA) instruction set — the portable assembly language used by the UA compiler.


Table of Contents

  1. Source File Format
  2. Precompiler Directives
  3. Comments
  4. Labels
  5. Variables
  6. Functions
  7. Registers
  8. Numeric Literals
  9. String Literals
  10. Standard Libraries
  11. Instruction Set
  12. Operand Rules
  13. Backend-Specific Notes

Source File Format

  • File extension: .ua
  • Encoding: ASCII / UTF-8
  • One instruction per line
  • Whitespace (spaces and tabs) is ignored around operands
  • Blank lines are allowed
  • Lines are terminated by newline (\n or \r\n)

Example:

; A minimal UA program
    LDI  R0, 42
    LDI  R1, 8
    ADD  R0, R1
    HLT

Precompiler Directives

Before lexing, the UA precompiler evaluates lines starting with @. Directives are not assembly instructions — they control conditional compilation, file inclusion, and stub markers.

| Directive | Description |
| --- | --- |
| @IF_ARCH <arch> | Begin a conditional block, included only when -arch matches |
| @IF_SYS <system> | Begin a conditional block, included only when -sys matches |
| @ENDIF | Close the most recent @IF_ARCH or @IF_SYS block |
| @IMPORT <path> | Include another .ua file (each file imported at most once) |
| @DUMMY [message] | Emit a stub diagnostic to stderr; no code generated |
| @DEFINE <NAME> <VALUE> | Define a compile-time text macro: every occurrence of NAME in subsequent non-directive lines is replaced with VALUE |
| @ARCH_ONLY <a>,<b>,... | Abort compilation unless -arch matches one of the listed architectures |
| @SYS_ONLY <s>,<t>,... | Abort compilation unless -sys matches one of the listed systems |

Conditional Compilation

@IF_ARCH x86
    ; This section is only assembled when -arch x86
    LDI R0, 42
@ENDIF

@IF_SYS linux
    ; This section is only assembled when -sys linux
    INT #0x80
@ENDIF

Conditional blocks can be nested (up to 64 levels):

@IF_ARCH x86
    @IF_SYS win32
        ; x86 + Windows only
    @ENDIF
@ENDIF

File Import

@IMPORT "lib/utils.ua"
@IMPORT helpers.ua          ; unquoted form also accepted
  • Paths are resolved relative to the importing file's directory
  • Each unique file is imported once — duplicate @IMPORT lines are skipped
  • Imported files are preprocessed recursively (their directives are evaluated)
  • Import nesting is limited to 16 levels
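
The import rules above can be modelled with a short Python sketch. This is not the UA precompiler's code: the function name `expand_imports`, the in-memory `files` dictionary, and the exact point at which the 16-level limit triggers are assumptions made for illustration. Resolution of `std_` and extensionless library names is left out.

```python
from posixpath import dirname, join, normpath

MAX_DEPTH = 16  # nesting limit stated in the rules above


def expand_imports(path, files, seen=None, depth=0):
    """Return the lines of `path` with every @IMPORT expanded in place.

    `files` maps a path to its source text (a stand-in for the file system).
    """
    seen = set() if seen is None else seen
    path = normpath(path)
    if path in seen:              # each unique file is imported once
        return []
    if depth > MAX_DEPTH:         # assumed reading of "16 levels"
        raise RecursionError("import nesting too deep: " + path)
    seen.add(path)

    out = []
    for line in files[path].splitlines():
        stripped = line.strip()
        if stripped.startswith("@IMPORT"):
            # Drop a trailing comment, then optional quotes.
            target = stripped[len("@IMPORT"):].split(";")[0].strip().strip('"')
            # Paths are resolved relative to the importing file's directory.
            out += expand_imports(join(dirname(path), target), files, seen, depth + 1)
        else:
            out.append(line)
    return out


files = {
    "main.ua": '@IMPORT "lib/a.ua"\n@IMPORT lib/a.ua ; duplicate, skipped\nHLT',
    "lib/a.ua": "@IMPORT b.ua\nNOP",
    "lib/b.ua": "INC R0",
}
print(expand_imports("main.ua", files))  # ['INC R0', 'NOP', 'HLT']
```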

Namespace Prefixing

All labels, function definitions, and variable declarations in an imported file are automatically prefixed with the filename (without extension and path):

; main.ua
@IMPORT "lib/math.ua"

    CALL math.add_values     ; calls add_values defined in math.ua
    GET  R0, math.result     ; accesses variable 'result' from math.ua
    math.double_it()         ; syntactic sugar for CALL math.double_it

The prefix is derived from the filename: lib/math.ua → math, helpers.ua → helpers.

This prevents name collisions when importing multiple files and provides clear provenance for every symbol.
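
The prefix rule can be sketched in a few lines of Python. The helper names `namespace_prefix` and `qualify` are hypothetical; the sketch only mirrors the rule stated above (drop the directory and the extension) and is not the compiler's implementation.

```python
from pathlib import PurePosixPath


def namespace_prefix(import_path: str) -> str:
    """Derive the symbol prefix: filename without directory or extension."""
    return PurePosixPath(import_path.replace("\\", "/")).stem


def qualify(import_path: str, symbol: str) -> str:
    """Build the namespaced name under which an imported symbol is visible."""
    return f"{namespace_prefix(import_path)}.{symbol}"


print(qualify("lib/math.ua", "add_values"))  # math.add_values
print(namespace_prefix("helpers.ua"))        # helpers
```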

Architecture & System Guards

@ARCH_ONLY and @SYS_ONLY are hard constraints — if the current target does not match any entry in the comma-separated list, compilation is immediately aborted with an error. This is ideal for library files that only make sense for specific platforms.

; This file only compiles for ARM-family targets
@ARCH_ONLY arm, arm64

; This file only compiles for Linux or macOS
@SYS_ONLY linux, macos

Unlike @IF_ARCH/@IF_SYS (which silently skip code), @ARCH_ONLY/@SYS_ONLY fail loudly and stop the build. Use @IF_* for conditional sections within a universal file; use @ARCH_ONLY/@SYS_ONLY to restrict an entire file to specific targets.

Compile-Time Macros (@DEFINE)

@DEFINE SCON    0x98
@DEFINE SBUF    0x99
@DEFINE BAUD    0xFD

Defines a compile-time text substitution. Every subsequent occurrence of NAME on non-directive lines is replaced with VALUE before the line reaches the lexer. Replacement is token-boundary-aware — only whole identifiers matching NAME are substituted (partial matches inside other words are not affected).

| Rule | Description |
| --- | --- |
| Syntax | @DEFINE <NAME> <VALUE>: name is an identifier; value is the rest of the line (trimmed) |
| Limit | Up to 512 macros per compilation unit |
| Name length | Maximum 63 characters |
| Value length | Maximum 63 characters |
| Scope | Global: a @DEFINE is visible to all lines processed after it, including code from subsequent @IMPORT files |
| No redefinition | Defining the same name twice appends a second entry; the first match wins |
| Whole-token only | @DEFINE P0 0x80 replaces P0 but not DPH0 or P0x |

Typical use — hardware register definitions:

@IMPORT hw_mcs51              ; imports @DEFINE P0 0x80, SCON 0x98, etc.

    LDI  R0, 0x50
    LDI  R1, SCON              ; expands to: LDI R1, 0x98
    STORE R0, R1

Because macros are expanded before the lexer runs, the final token stream contains only literal numeric values — no runtime cost, no RAM overhead.
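
As an illustration of whole-token substitution with first-match-wins semantics, here is a minimal Python sketch. The function `expand_macros` is hypothetical and simplified: it does not treat comments, string literals, or dotted names specially, and how the real precompiler handles those cases is not specified here.

```python
import re

# An identifier that is not glued to a preceding word character, so the
# "x98" inside "0x98" or the "P0" inside "DPH0" is never matched on its own.
IDENT = re.compile(r"(?<![A-Za-z0-9_])[A-Za-z_][A-Za-z0-9_]*")


def expand_macros(line, macros):
    """Expand macros in one source line.

    `macros` is a list of (name, value) pairs in definition order; when a
    name is defined twice, the earlier entry is the one that applies.
    """
    table = {}
    for name, value in macros:
        table.setdefault(name, value)  # first match wins
    return IDENT.sub(lambda m: table.get(m.group(0), m.group(0)), line)


macros = [("SCON", "0x98"), ("P0", "0x80"), ("P0", "0x90")]
print(expand_macros("LDI  R1, SCON", macros))       # LDI  R1, 0x98
print(expand_macros("MOV DPH0, P0x, P0", macros))   # MOV DPH0, P0x, 0x80
```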

Hardware Definition Libraries

UA ships with hardware definition libraries that use @DEFINE to expose register addresses for common platforms. Import them with @IMPORT just like standard libraries:

| Library | Target | Import |
| --- | --- | --- |
| hw_mcs51 | Intel 8051 SFRs | @IMPORT hw_mcs51 |
| hw_x86_pc | x86 PC I/O ports | @IMPORT hw_x86_pc |
| hw_riscv_virt | RISC-V QEMU virt MMIO | @IMPORT hw_riscv_virt |
| hw_arm_virt | ARM/ARM64 QEMU virt MMIO | @IMPORT hw_arm_virt |

Each library file begins with an @ARCH_ONLY guard that prevents accidental use on an incompatible target.

Stub Markers

@DUMMY This feature is not yet implemented
@DUMMY

Prints a diagnostic to stderr during compilation. No code is emitted.

Origin Address (@ORG)

@ORG 0x0000       ; decimal or hex address
@ORG 0x000B

Sets the origin address for subsequent code. In the generated binary the compiler pads the output with zero bytes until the target address is reached.

| Rule | Description |
| --- | --- |
| Forward only | The target address must be ≥ the current program counter. Moving backwards is a fatal error. |
| All architectures | @ORG is universal: it works on every supported backend (x86, x86_32, ARM, ARM64, RISC-V, MCS-51). |
| Bare-metal use case | Primarily used for placing interrupt vectors and ISRs at hardware-mandated addresses. |

Example — 8051 interrupt vectors:

@ARCH_ONLY mcs51

@ORG 0x0000          ; reset vector
    JMP main

@ORG 0x000B          ; Timer 0 interrupt vector
timer0_isr:
    ; ... handle interrupt ...
    RETI

@ORG 0x0030          ; main program
main:
    ; initialisation code
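
The zero-padding and forward-only rules of @ORG can be sketched in Python as follows. The function `apply_org` is hypothetical, and the sketch assumes the image starts at address 0, so the current program counter equals the number of bytes emitted so far.

```python
def apply_org(image: bytearray, target: int) -> None:
    """Pad `image` with zero bytes until its length reaches `target`."""
    pc = len(image)  # assumption: image begins at address 0
    if target < pc:
        # Moving backwards is a fatal error.
        raise ValueError(f"@ORG 0x{target:04X} is behind current address 0x{pc:04X}")
    image.extend(bytes(target - pc))


image = bytearray(b"\x02\x00\x30")  # 3 bytes standing in for the reset-vector jump
apply_org(image, 0x000B)            # next code lands at the Timer 0 vector
print(len(image))                   # 11
```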

Opcode Compliance

After parsing, the compiler validates every instruction against a per-opcode compliance table that specifies which architectures and systems support each opcode. If any instruction is not supported by the target, compilation fails with a clear diagnostic:

  UA Compliance Error
  -------------------
  Line 5: opcode 'SYS' is not supported on architecture 'mcs51'
  Supported architectures: x86, x86_32, arm, arm64, riscv

All 37 built-in opcodes are currently universal (supported on all architectures and systems). As architecture-specific instructions are added in future phases, the compliance table ensures safe, portable code — and @IF_ARCH / @ARCH_ONLY provide the mechanism to write platform-specific alternatives.


Comments

Line comments begin with ; and extend to the end of the line:

    LDI R0, 10      ; this is a comment
; this entire line is a comment

There are no block comments.


Labels

Labels mark code addresses for use with jump and call instructions. A label is an identifier followed by a colon (:):

start:
    LDI R0, 0
loop:
    INC R0
    CMP R0, 100
    JNZ loop
    HLT

Rules:

  • Labels consist of letters (a-z, A-Z), digits (0-9), underscores (_), and dots (.)
  • Labels must start with a letter or underscore
  • Maximum length: 128 characters
  • Labels are case-sensitive
  • Duplicate labels within a file are an error
  • Forward references are supported (two-pass assembly)
  • Dots are used for namespace-qualified names (e.g., math.start)
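
The lexical rules above (allowed characters, first character, maximum length) can be expressed as a small Python check. The name `is_valid_label` is hypothetical; duplicate detection and forward references are outside this sketch.

```python
import re

# First character: letter or underscore; then letters, digits, '_' or '.'.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
MAX_LABEL_LEN = 128


def is_valid_label(name: str) -> bool:
    """Check a label name (without the trailing colon) against the rules."""
    return len(name) <= MAX_LABEL_LEN and LABEL_RE.fullmatch(name) is not None


print(is_valid_label("math.start"))  # True
print(is_valid_label("1loop"))       # False
```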

Variables

Variables are compiler-managed named storage locations. Unlike registers, variables persist across function calls and can be accessed from anywhere in the program.

Declaring Variables

    VAR  counter             ; declare variable, initialized to 0
    VAR  result, 42          ; declare variable with initial value

Writing to Variables

    SET  counter, R0         ; store register value into variable
    SET  result, 99          ; store immediate value into variable

Reading from Variables

    GET  R0, counter         ; load variable value into register
    GET  R1, result

Storage Model

Variables are stored differently depending on the target architecture:

| Backend | Storage | Size | Addressing / Address Range |
| --- | --- | --- | --- |
| x86-64 | Data section after code | 8 bytes | RIP-relative addressing |
| x86-32 | Data section after code | 4 bytes | Absolute addressing |
| ARM | Data section after code | 4 bytes | MOVW/MOVT + LDR/STR via r12 |
| 8051 | Internal RAM (direct) | 1 byte | 0x08–0x7F |

Rules:

  • Maximum 256 variables per program (120 for 8051 due to RAM limits)
  • Variable names follow the same rules as labels
  • Variables must be declared before use with SET or GET
  • VAR declarations with an initial value emit initialization code at the declaration point
  • On 8051, immediate values in SET are limited to 8 bits (0–255)

Example: Using Variables

    VAR  x, 10
    VAR  y, 20

    GET  R0, x           ; R0 = 10
    GET  R1, y           ; R1 = 20
    ADD  R0, R1          ; R0 = 30
    SET  x, R0           ; x = 30
    HLT

Functions

Functions are labels with declared parameter names. The parameter list documents which variables the function expects to be available.

Defining Functions

my_function(arg1, arg2):
    GET  R0, arg1
    GET  R1, arg2
    ADD  R0, R1
    RET

Both syntaxes are equivalent:

my_func:          ; plain label (no documented parameters)
my_func(a, b):    ; function definition (parameters a, b documented)

Calling Functions

Functions can be called with CALL or using syntactic sugar:

    CALL my_function         ; standard call
    my_function()            ; syntactic sugar — equivalent to CALL my_function
    CALL my_function(R0, R1) ; call with argument annotations

Rules

  • Maximum 8 parameters per function definition
  • Parameters must be declared as variables (using VAR) before the function is called
  • Function definitions are labels — they follow all label rules
  • The parameter list is metadata for documentation and validation; the actual argument passing uses SET/GET on the named variables

Complete Example

    JMP  main

    VAR  a
    VAR  b

add(a, b):
    GET  R0, a
    GET  R1, b
    ADD  R0, R1
    RET

main:
    SET  a, 15
    SET  b, 27
    CALL add
    ; R0 now contains 42
    HLT

Registers

UA provides 16 virtual registers named R0 through R15. The actual number of usable registers depends on the target backend:

| Backend | Usable Registers | Notes |
| --- | --- | --- |
| x86-64 | R0–R7 (8) | R8–R15 rejected (would require REX.B encoding) |
| x86-32 | R0–R7 (8) | Maps to IA-32 32-bit registers |
| ARM | R0–R7 (8) | Maps directly to ARM r0–r7; r12 used as scratch |
| 8051 | R0–R7 (8) | Maps to bank-0 registers |

Register names are case-insensitive: R0 and r0 are the same register.

x86-64 Register Mapping

| UA Register | x86-64 Register | Purpose |
| --- | --- | --- |
| R0 | RAX | Accumulator / return value |
| R1 | RCX | General purpose |
| R2 | RDX | General purpose |
| R3 | RBX | General purpose (callee-saved) |
| R4 | RSP | Stack pointer |
| R5 | RBP | Base pointer |
| R6 | RSI | General purpose |
| R7 | RDI | General purpose |

Warning: R4 (RSP) and R5 (RBP) are the stack and base pointers. Modifying them directly can corrupt the stack.

x86-32 (IA-32) Register Mapping

| UA Register | x86-32 Register | Purpose |
| --- | --- | --- |
| R0 | EAX | Accumulator / return value |
| R1 | ECX | General purpose |
| R2 | EDX | General purpose |
| R3 | EBX | General purpose (callee-saved) |
| R4 | ESP | Stack pointer |
| R5 | EBP | Base pointer |
| R6 | ESI | General purpose |
| R7 | EDI | General purpose |

Warning: R4 (ESP) and R5 (EBP) are the stack and base pointers. Modifying them directly can corrupt the stack.

ARM (ARMv7-A) Register Mapping

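| UA Register | ARM Register | Purpose |
| --- | --- | --- |
| R0 | r0 | General purpose / return value |
| R1 | r1 | General purpose |
| R2 | r2 | General purpose |
| R3 | r3 | General purpose |
| R4 | r4 | General purpose |
| R5 | r5 | General purpose |
| R6 | r6 | General purpose |
| R7 | r7 | General purpose |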

Note: ARM r12 (IP) is used internally as a scratch register for large immediates. r13 (SP), r14 (LR), and r15 (PC) are reserved and cannot be used as UA registers.

8051 Register Mapping

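| UA Register | 8051 Register | Direct Address |
| --- | --- | --- |
| R0 | R0 | 0x00 |
| R1 | R1 | 0x01 |
| R2 | R2 | 0x02 |
| R3 | R3 | 0x03 |
| R4 | R4 | 0x04 |
| R5 | R5 | 0x05 |
| R6 | R6 | 0x06 |
| R7 | R7 | 0x07 |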

All registers are in 8051 register bank 0.

ARM64 (AArch64) Register Mapping

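| UA Register | AArch64 Register | Purpose |
| --- | --- | --- |
| R0 | X0 | General purpose / return value |
| R1 | X1 | General purpose |
| R2 | X2 | General purpose |
| R3 | X3 | General purpose |
| R4 | X4 | General purpose |
| R5 | X5 | General purpose |
| R6 | X6 | General purpose |
| R7 | X7 | General purpose |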

Note: X9 and X10 are used internally as scratch registers. X30 (LR) and X31 (SP/XZR) are reserved.

RISC-V (RV64I) Register Mapping

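| UA Register | RISC-V Register | ABI Name | Purpose |
| --- | --- | --- | --- |
| R0 | x10 | a0 | Argument / return value |
| R1 | x11 | a1 | Argument |
| R2 | x12 | a2 | Argument |
| R3 | x13 | a3 | Argument |
| R4 | x14 | a4 | Argument |
| R5 | x15 | a5 | Argument |
| R6 | x16 | a6 | Argument |
| R7 | x17 | a7 | Argument / syscall number |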

Note: x5 (t0) and x6 (t1) are used internally as scratch registers. x0 (zero, hardwired), x1 (ra), and x2 (sp) are reserved.


Numeric Literals

Numbers can be expressed in three bases:

| Format | Prefix | Example | Value |
| --- | --- | --- | --- |
| Decimal | (none) | 42, -7, 0 | 42, -7, 0 |
| Hexadecimal | 0x or 0X | 0xFF, 0x1A | 255, 26 |
| Binary | 0b or 0B | 0b1010, 0B11001100 | 10, 204 |

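In the syntax tables of this reference, immediate operands are written as #imm. The examples write immediates both with the # prefix (INT #0x80) and without it: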

    LDI   R0, 42        ; decimal
    LDI   R1, 0xFF      ; hexadecimal
    AND   R0, 0b1111    ; binary mask
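
The three literal formats can be decoded with a short Python function. This is a sketch of the notation only, not the UA lexer: the name `parse_literal` is hypothetical, and accepting an optional leading # and a minus sign in front of hex or binary literals are assumptions.

```python
def parse_literal(text: str) -> int:
    """Convert a UA-style numeric literal to an integer."""
    t = text.strip()
    if t.startswith("#"):      # optional immediate marker (assumed optional)
        t = t[1:]
    negative = t.startswith("-")
    if negative:
        t = t[1:]
    low = t.lower()            # prefixes are accepted in either case
    if low.startswith("0x"):
        value = int(low[2:], 16)
    elif low.startswith("0b"):
        value = int(low[2:], 2)
    else:
        value = int(t, 10)
    return -value if negative else value


for literal in ("42", "-7", "0xFF", "0x1A", "0b1010", "0B11001100"):
    print(literal, "=", parse_literal(literal))
```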

Range Limits

| Backend | Immediate Range | Notes |
| --- | --- | --- |
| x86-64 | -2,147,483,648 to 2,147,483,647 | 32-bit sign-extended to 64-bit |
| x86-32 | -2,147,483,648 to 2,147,483,647 | 32-bit native |
| ARM | -2,147,483,648 to 2,147,483,647 | 32-bit via MOVW/MOVT |
| ARM64 | -2,147,483,648 to 2,147,483,647 | 32-bit via MOVZ/MOVK (up to 64-bit with multiple MOVK) |
| RISC-V | -2,147,483,648 to 2,147,483,647 | 32-bit via LUI+ADDI |
| 8051 | -128 to 255 | 8-bit values |
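
A range check against the limits in this table might look like the following Python sketch. The dictionary keys reuse the architecture names that appear elsewhere in this reference (x86, x86_32, arm, arm64, riscv, mcs51); the function name `fits_immediate` is hypothetical.

```python
INT32 = (-2**31, 2**31 - 1)

# Inclusive immediate ranges per backend, copied from the table above.
IMMEDIATE_RANGE = {
    "x86": INT32,
    "x86_32": INT32,
    "arm": INT32,
    "arm64": INT32,
    "riscv": INT32,
    "mcs51": (-128, 255),
}


def fits_immediate(arch: str, value: int) -> bool:
    """Return True if `value` is inside the immediate range of `arch`."""
    low, high = IMMEDIATE_RANGE[arch]
    return low <= value <= high


print(fits_immediate("mcs51", 255))  # True
print(fits_immediate("mcs51", 256))  # False
```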

String Literals

String literals are enclosed in double quotes and used with the LDS instruction:

    LDS  R0, "Hello, World!\n"

Escape Sequences

| Sequence | Character | Byte Value |
| --- | --- | --- |
| \n | Newline | 0x0A |
| \t | Horizontal tab | 0x09 |
| \r | Carriage return | 0x0D |
| \0 | Null byte | 0x00 |
| \\ | Backslash | 0x5C |
| \" | Double quote | 0x22 |

Any other character following a backslash is kept as-is.
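
The escape table can be modelled by a small Python decoder. The function `decode_string_body` is hypothetical. It reads "kept as-is" as meaning that an unknown escape emits the following character without the backslash, which is an assumption about the lexer's behaviour.

```python
ESCAPES = {
    "n": b"\x0a",
    "t": b"\x09",
    "r": b"\x0d",
    "0": b"\x00",
    "\\": b"\x5c",
    '"': b"\x22",
}


def decode_string_body(body: str) -> bytes:
    """Turn the text between the double quotes into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escape: mapped byte. Unknown escape: the character itself.
            out += ESCAPES.get(nxt, nxt.encode("utf-8"))
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)


print(decode_string_body(r"Hello, World!\n"))  # b'Hello, World!\n'
```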

Storage

String data is stored in a string table appended after the variable data section. Each string is null-terminated. Duplicate string literals are automatically de-duplicated by the backend — identical strings share the same storage.

The layout of the output binary is:

[ code section ][ variable data ][ string data ]
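
A de-duplicating, null-terminated string table can be sketched in Python as below. The class `StringTable` and its `intern` method are hypothetical; offsets are relative to the start of the string data, and how a backend turns them into final addresses is not covered.

```python
class StringTable:
    """Append-only table of null-terminated strings with de-duplication."""

    def __init__(self):
        self.data = bytearray()
        self.offsets = {}

    def intern(self, text: bytes) -> int:
        """Return the offset of `text`, storing it only on first use."""
        if text in self.offsets:
            return self.offsets[text]      # identical strings share storage
        offset = len(self.data)
        self.data += text + b"\x00"        # each string is null-terminated
        self.offsets[text] = offset
        return offset


table = StringTable()
print(table.intern(b"Hi"))   # 0
print(table.intern(b"Yo"))   # 3
print(table.intern(b"Hi"))   # 0
print(bytes(table.data))     # b'Hi\x00Yo\x00'
```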

Standard Libraries

UA ships with standard library files in the lib/ directory adjacent to the compiler executable. Library files are imported using @IMPORT with the std_ prefix:

@IMPORT std_io
@IMPORT std_string

When a @IMPORT path starts with std_, the compiler automatically resolves it to the lib/ directory next to the executable, appending .ua if needed.

std_io

I/O functions for console output. Uses @IF_ARCH / @IF_SYS precompiler guards to provide platform-specific implementations.

| Function | Description |
| --- | --- |
| std_io.print | Write a null-terminated string to stdout. Pass string address in R0. All registers may be clobbered. |

Supported platforms:

| Architecture | System | Syscall Convention |
| --- | --- | --- |
| x86 | linux | SYSCALL: RAX=1 (write), RDI=fd, RSI=buf, RDX=count |
| x86 | win32 | Write dispatcher translates Linux convention to WriteFile API |
| x86_32 | linux | INT 0x80: EAX=4 (write), EBX=fd, ECX=buf, EDX=count |
| arm | linux | SVC #0: R7=4 (write), R0=fd, R1=buf, R2=count |
| riscv | linux | ECALL: a7=64 (write), a0=fd, a1=buf, a2=count |

ARM64 and 8051 are not yet supported (ARM64: syscall register X8 not accessible; 8051: no OS).

@IMPORT std_io
    LDS  R0, "Hello, World!\n"
    CALL std_io.print
    HLT

std_string

String manipulation functions. Uses architecture-neutral MVIS instructions and works on all backends.

| Function | Description |
| --- | --- |
| std_string.strlen | Compute the length of a null-terminated string. Pass address in R0, result returned in R1. Clobbers R0, R2, R3. |

8051 Note: strlen uses LOADB with R0 as the pointer (indirect addressing via @R0), which only accesses internal RAM (0x00–0xFF).

@IMPORT std_string
    LDS  R0, "test"
    CALL std_string.strlen   ; R1 = 4

std_math

Integer math utility functions. Architecture-neutral — works on all backends.

| Function | Description |
| --- | --- |
| std_math.pow | Integer exponentiation. Set std_math.base and std_math.exp, then CALL std_math.pow. Result in R0. |
| std_math.factorial | Compute n!. Set std_math.n, then CALL std_math.factorial. Result in R0. |
| std_math.max | Return the larger of two values. Set std_math.a and std_math.b, then CALL std_math.max. Result in R0. |
| std_math.abs | Absolute value. Set std_math.val, then CALL std_math.abs. Result in R0. |

Example:

@IMPORT std_math

    ; Compute 2^10 = 1024
    SET  std_math.base, 2
    SET  std_math.exp, 10
    CALL std_math.pow        ; R0 = 1024

    ; Compute 5! = 120
    SET  std_math.n, 5
    CALL std_math.factorial  ; R0 = 120

    ; max(7, 42) = 42
    SET  std_math.a, 7
    SET  std_math.b, 42
    CALL std_math.max        ; R0 = 42

    ; abs(-15) = 15
    SET  std_math.val, -15
    CALL std_math.abs        ; R0 = 15

std_arrays

Byte-array utility functions for working with BUFFER-allocated memory. Architecture-neutral — works on all backends.

| Function | Description |
| --- | --- |
| std_arrays.fill_bytes | Fill a buffer region with a byte value. Set std_arrays.dst (address), std_arrays.count (length), std_arrays.value (byte). |
| std_arrays.copy_bytes | Copy bytes between buffers. Set std_arrays.src (source address), std_arrays.dst (dest address), std_arrays.count (length). |

Example:

@IMPORT std_arrays

    BUFFER  my_buf, 32

    ; Fill 32 bytes with 0xFF
    GET  R0, my_buf
    SET  std_arrays.dst, R0
    SET  std_arrays.count, 32
    SET  std_arrays.value, 0xFF
    CALL std_arrays.fill_bytes

    ; Copy first 16 bytes of my_buf to another location
    BUFFER  other_buf, 16
    GET  R0, my_buf
    SET  std_arrays.src, R0
    GET  R0, other_buf
    SET  std_arrays.dst, R0
    SET  std_arrays.count, 16
    CALL std_arrays.copy_bytes

Instruction Set

UA defines 37 instructions organized into ten categories. This is the Minimum Viable Instruction Set (MVIS).

Data Movement

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| MOV | MOV Rd, Rs | Copy register Rs into Rd |
| LDI | LDI Rd, #imm | Load immediate value into Rd |
| LOAD | LOAD Rd, Rs | Load from memory: Rd ← [Rs] |
| STORE | STORE Rs, Rd | Store to memory: [Rd] ← Rs |
| LDS | LDS Rd, "str" | Load string address: Rd ← pointer to string literal |
| LOADB | LOADB Rd, Rs | Load byte from memory: Rd ← zero-extend(byte [Rs]) |
| STOREB | STOREB Rs, Rd | Store byte to memory: byte [Rd] ← low byte of Rs |

Examples:

    LDI   R0, 100       ; R0 = 100
    MOV   R1, R0        ; R1 = R0 (copy)
    LOAD  R2, R0        ; R2 = memory[R0]
    STORE R1, R0        ; memory[R0] = R1

8051 Note: LOAD and STORE use indirect addressing (@Ri) and require the pointer register to be R0 or R1.

LDS — Load String Address

LDS loads the address of a null-terminated string literal into a register. The string data is stored after the variable data section in the output binary. Duplicate strings are de-duplicated by the backend.

    LDS  R0, "Hello, World!\n"   ; R0 = pointer to string data

String literals support escape sequences: \n (newline), \t (tab), \r (carriage return), \0 (null), \\ (backslash), \" (double quote).

x86-64 Note: LDS uses LEA r64, [RIP+disp32] (RIP-relative addressing). The displacement is patched during code generation.

x86-32 Note: LDS uses LEA r32, [disp32] with an absolute address fixup.

ARM Note: LDS uses MOVW+MOVT to load the absolute string address.

ARM64 Note: LDS uses MOVZ+MOVK to load the absolute string address.

RISC-V Note: LDS uses LUI+ADDI to load the string address.

8051 Note: LDS emits MOV DPTR, #imm16 (stub — 8051 has limited string support).

LOADB / STOREB — Byte-Granularity Memory Access

LOADB reads a single byte from the address in Rs and zero-extends it into Rd. STOREB writes the low byte of Rs to the address in Rd.

    LOADB R1, R0     ; R1 = zero-extend(byte at address R0)
    STOREB R1, R0    ; byte at address R0 = low byte of R1

These instructions are essential for traversing null-terminated strings character by character.

Arithmetic

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| ADD | ADD Rd, Rs or ADD Rd, #imm | Rd = Rd + Rs/imm |
| SUB | SUB Rd, Rs or SUB Rd, #imm | Rd = Rd - Rs/imm |
| MUL | MUL Rd, Rs or MUL Rd, #imm | Rd = Rd × Rs/imm |
| DIV | DIV Rd, Rs or DIV Rd, #imm | Rd = Rd ÷ Rs/imm |
| INC | INC Rd | Rd = Rd + 1 |
| DEC | DEC Rd | Rd = Rd - 1 |

Examples:

    LDI   R0, 10
    LDI   R1, 3
    ADD   R0, R1         ; R0 = 13
    SUB   R0, 1          ; R0 = 12
    MUL   R0, R1         ; R0 = 36
    DIV   R0, R1         ; R0 = 12
    INC   R0             ; R0 = 13
    DEC   R0             ; R0 = 12

x86-64 Note: DIV uses signed division (IDIV). The polyfill saves and restores RDX because IDIV clobbers RDX:RAX.

8051 Note: MUL and DIV use the hardware MUL AB / DIV AB instructions via the accumulator (A) and B register.

Bitwise Logic

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| AND | AND Rd, Rs or AND Rd, #imm | Rd = Rd & Rs/imm |
| OR | OR Rd, Rs or OR Rd, #imm | Rd = Rd \| Rs/imm |
| XOR | XOR Rd, Rs or XOR Rd, #imm | Rd = Rd ^ Rs/imm |
| NOT | NOT Rd | Rd = ~Rd (bitwise complement) |

Examples:

    LDI   R0, 0xFF
    AND   R0, 0x0F       ; R0 = 0x0F (mask lower nibble)
    OR    R0, 0xF0       ; R0 = 0xFF
    XOR   R0, 0xFF       ; R0 = 0x00
    NOT   R0             ; R0 = ~R0

Shift Operations

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| SHL | SHL Rd, Rs or SHL Rd, #imm | Shift Rd left by Rs/imm bits |
| SHR | SHR Rd, Rs or SHR Rd, #imm | Shift Rd right by Rs/imm bits |

Examples:

    LDI   R0, 1
    SHL   R0, 4          ; R0 = 16 (1 << 4)
    SHR   R0, 2          ; R0 = 4  (16 >> 2)

x86-64 Note: Register-based shifts use the CL register. The backend saves/restores RCX automatically.

8051 Note: SHL emits RL A (rotate left), SHR emits RR A (rotate right), repeated n times for immediate operands. These are rotations, not true logical shifts — bits wrap around.
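
The difference between an 8-bit rotation and a true logical shift can be seen in a small Python model. The helpers `rl8`, `rr8`, `shl8`, and `shr8` are hypothetical names; they model the arithmetic on an 8-bit accumulator value and are not generated code.

```python
def rl8(a: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits (what repeated RL A does)."""
    n %= 8
    return ((a << n) | (a >> (8 - n))) & 0xFF


def rr8(a: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits (what repeated RR A does)."""
    n %= 8
    return ((a >> n) | (a << (8 - n))) & 0xFF


def shl8(a: int, n: int) -> int:
    """Logical shift left on 8 bits: bits shifted out are lost."""
    return (a << n) & 0xFF


def shr8(a: int, n: int) -> int:
    """Logical shift right on 8 bits: zeros are shifted in."""
    return (a & 0xFF) >> n


print(hex(rl8(0x81, 1)), hex(shl8(0x81, 1)))  # 0x3 0x2
print(hex(rr8(0x01, 1)), hex(shr8(0x01, 1)))  # 0x80 0x0
```

The results agree whenever no set bit crosses the edge of the byte (for example, 1 shifted left by 4 is 16 either way) and differ once a bit wraps around.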

Comparison

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| CMP | CMP Ra, Rb or CMP Ra, #imm | Compare Ra with Rb/imm (sets flags) |

CMP performs a subtraction without storing the result. The flags (zero, carry, sign) are set and used by subsequent conditional jumps (JZ, JNZ, JL, JG).

Example:

    CMP   R0, R1
    JZ    equal          ; jump if R0 == R1
    JNZ   not_equal      ; jump if R0 != R1
    JL    less           ; jump if R0 < R1 (signed)
    JG    greater        ; jump if R0 > R1 (signed)

8051 Note: Register comparison uses CLR C; SUBB A,Rn. Immediate comparison uses CJNE A,#imm,$+3.

Control Flow

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| JMP | JMP label | Unconditional jump |
| JZ | JZ label | Jump if zero flag is set (equal after CMP) |
| JNZ | JNZ label | Jump if zero flag is clear (not equal after CMP) |
| JL | JL label | Jump if less (signed, after CMP) |
| JG | JG label | Jump if greater (signed, after CMP) |
| CALL | CALL label | Call subroutine (pushes return address) |
| RET | RET | Return from subroutine |

Example:

    CALL  my_function
    HLT

my_function:
    LDI   R0, 42
    RET

x86-64 Note: JMP, JZ, JNZ, JL, JG, and CALL use 32-bit relative offsets (rel32), allowing jumps up to ±2 GB. JL emits 0F 8C rel32 (JL near), JG emits 0F 8F rel32 (JG near).

ARM Note: JL emits BLT (condition code 0xB), JG emits BGT (condition code 0xC). Both use 24-bit signed offsets (±32 MB).

ARM64 Note: JL emits B.LT (condition 0xB), JG emits B.GT (condition 0xC). Both use 19-bit signed offsets (±1 MB).

RISC-V Note: JL emits BLT t0, x0 (branch if scratch < 0 after CMP subtraction). JG emits BLT x0, t0 (swapped operands: branch if 0 < scratch, i.e., result > 0). Both use B-type encoding with a 12-bit signed offset.

8051 Note: JMP and CALL use 16-bit absolute addresses (LJMP/LCALL). JZ and JNZ use 8-bit relative offsets (range: -128 to +127 bytes). JL emits JC rel8 (2 bytes — carry flag is set by SUBB if the first operand is less). JG uses a 6-byte polyfill: JC $+4; JZ $+2; SJMP target (skip if less or equal, jump if strictly greater).

Stack Operations

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| PUSH | PUSH Rs | Push register onto the stack |
| POP | POP Rd | Pop top of stack into register |

Example:

    LDI   R0, 99
    PUSH  R0             ; save R0
    LDI   R0, 0          ; overwrite R0
    POP   R0             ; restore R0 (R0 = 99 again)

8051 Note: PUSH and POP use direct addressing (PUSH direct / POP direct) with the register's bank-0 address (0x00–0x07).

System

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| INT | INT #imm | Software interrupt |
| SYS | SYS | System call (OS-level) |
| NOP | NOP | No operation |
| HLT | HLT | Halt execution |

Examples:

    NOP                  ; do nothing
    INT   0x21           ; software interrupt 0x21
    SYS                  ; invoke OS system call
    HLT                  ; stop

x86-64 Note: HLT generates RET (return to caller / OS). INT generates the native INT n instruction (CD nn). SYS generates the SYSCALL instruction (0F 05).

x86-32 Note: SYS generates INT 0x80 (CD 80) for Linux system calls.

ARM Note: SYS generates SVC #0 (supervisor call).

ARM64 Note: SYS generates SVC #0.

RISC-V Note: SYS generates ECALL.

8051 Note: HLT generates an infinite loop (SJMP $, opcode 80 FE). INT #n generates LCALL to the interrupt vector address (n × 8) + 3. SYS is not supported (no operating system on 8051).
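
The 8051 vector formula is easy to check numerically. The function `mcs51_int_vector` below is a hypothetical helper that only evaluates (n × 8) + 3 as stated in the note.

```python
def mcs51_int_vector(n: int) -> int:
    """Address that INT #n calls on the 8051 backend: (n * 8) + 3."""
    return n * 8 + 3


for n in range(4):
    print(f"INT #{n} -> LCALL 0x{mcs51_int_vector(n):04X}")
```

For n = 1 the formula gives 0x000B, the address used for the Timer 0 vector in the @ORG example.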

Variable Instructions

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| VAR | VAR name or VAR name, #imm | Declare a named variable (optional initializer) |
| SET | SET name, Rs or SET name, #imm | Store a register or immediate into a variable |
| GET | GET Rd, name | Load a variable's value into a register |

Examples:

    VAR  counter, 0          ; declare with initial value
    VAR  flags               ; declare (default 0)

    LDI  R0, 42
    SET  counter, R0         ; counter = 42
    SET  flags, 0xFF         ; flags = 0xFF

    GET  R1, counter         ; R1 = 42
    GET  R2, flags           ; R2 = 0xFF

x86-64 Note: Variables are stored as 8-byte values in a data section appended after code. SET/GET use RIP-relative MOV instructions with 32-bit displacement.

x86-32 Note: Variables are 4-byte values accessed via absolute [disp32] addressing.

ARM Note: Variables are 4-byte values. The compiler loads the variable address into r12 (scratch) using MOVW+MOVT, then uses LDR/STR for the actual access. For SET with an immediate, r11 is also used as a scratch register.

8051 Note: Variables occupy one byte each in internal RAM (direct addresses 0x08–0x7F). SET/GET use direct-addressing MOV instructions. Maximum 120 variables.

Memory Allocation

| Mnemonic | Syntax | Description |
| --- | --- | --- |
| BUFFER | BUFFER name, size | Allocate a named contiguous byte buffer of the given size |

BUFFER reserves a contiguous block of zero-initialized bytes in the data section (or internal RAM on 8051). Unlike VAR (which stores a single word), BUFFER allocates an arbitrary number of bytes.

Example:

    BUFFER  my_buf, 64       ; allocate 64 bytes named "my_buf"
    GET     R0, my_buf       ; R0 = address of the buffer
    LDI     R1, 0x41         ; 'A'
    STOREB  R1, R0           ; write 'A' to first byte

BUFFER names follow the same rules as variable and label names. The size operand is a mandatory immediate specifying the number of bytes to allocate.

Storage Model:

| Backend | Storage | Location |
| --- | --- | --- |
| x86-64 | Data section after code | After variables, before strings (8-byte aligned start address) |
| x86-32 | Data section after code | After variables, before strings |
| ARM | Data section after code | After variables, before strings |
| ARM64 | Data section after code | After variables, before strings |
| RISC-V | Data section after code | After variables, before strings |
| 8051 | Internal RAM | Consecutive bytes starting after variables (0x08+) |

The data layout with buffers is:

[ code section ][ variable data ][ buffer data ][ string data ]

Accessing buffer contents uses GET to obtain the base address, then LOADB/STOREB with register arithmetic for byte-level access.

8051 Note: Buffer bytes are allocated in internal RAM (direct addresses). Since 8051 RAM is limited to ~120 usable bytes, buffer sizes must be modest. Buffers share the address space with variables.


Operand Rules

Each instruction has a fixed operand shape enforced at parse time:

| Shape | Meaning | Example |
| --- | --- | --- |
| reg | A register (R0–R15) | NOT R0 |
| reg, reg | Two registers | MOV R0, R1 |
| reg, reg_or_imm | Register + register or immediate | ADD R0, R1 or ADD R0, 5 |
| imm | An immediate value | INT 0x21 |
| label | A label reference | JMP start |
| (none) | No operands | NOP, HLT, RET |

| Instruction | Shape |
| --- | --- |
| MOV | reg, reg |
| LDI | reg, imm |
| LOAD, STORE | reg, reg |
| ADD, SUB, MUL, DIV, AND, OR, XOR, SHL, SHR | reg, reg_or_imm |
| CMP | reg, reg_or_imm |
| NOT, INC, DEC | reg |
| PUSH, POP | reg |
| JMP, JZ, JNZ, JL, JG, CALL | label |
| INT | imm |
| NOP, HLT, RET, SYS | (none) |
| LDS | reg, string |
| LOADB, STOREB | reg, reg |
| VAR | name [, imm] |
| SET | name, reg_or_imm |
| GET | reg, name |
| BUFFER | name, size |

Incorrect operand shapes produce a compile-time error with the source line number.


Backend-Specific Notes

x86-64

  • All operations are 64-bit (REX.W prefix)
  • LDI uses MOV r64, imm32 (sign-extended to 64-bit)
  • DIV is signed (IDIV) and includes a polyfill to save/restore RDX
  • SHL/SHR with a register operand saves/restores RCX (shift amount must be in CL)
  • LOAD/STORE handle the RSP (SIB byte) and RBP (displacement byte) special cases
  • HLT emits RET (0xC3) — returns control to the JIT runner or OS
  • JL emits 0F 8C rel32 (6 bytes), JG emits 0F 8F rel32 (6 bytes)
  • BUFFER allocates zero-initialized bytes in the data section after variables

x86-32 (IA-32)

  • All operations are 32-bit (no REX prefix)
  • LDI uses MOV r32, imm32 (5 bytes, B8+rd)
  • INC/DEC use the single-byte encodings 40+rd/48+rd (not available in 64-bit mode)
  • PUSH/POP use single-byte encodings 50+rd/58+rd
  • DIV is signed (IDIV) using CDQ for sign extension (instead of CQO)
  • SHL/SHR with a register operand saves/restores ECX
  • LOAD/STORE handle the ESP (SIB byte) and EBP (displacement byte) special cases
  • HLT emits RET (0xC3)
  • JL emits 0F 8C rel32 (6 bytes), JG emits 0F 8F rel32 (6 bytes)
  • BUFFER allocates zero-initialized bytes in the data section after variables
  • No JIT support — use -arch x86 for JIT execution

ARM (ARMv7-A)

  • All instructions are 32-bit fixed width (4 bytes each)
  • All instructions use condition code AL (always execute)
  • LDI uses MOVW for values 0–65535 (4 bytes); adds MOVT for larger values (8 bytes total)
  • ALU instructions with immediates: if the value fits in ARM’s rotated-imm8 encoding, it is encoded inline; otherwise, the value is loaded into r12 (scratch) first
  • MUL uses the ARM MUL instruction; DIV uses SDIV (requires ARMv7VE / integer divide extension)
  • SHL/SHR use barrel-shifted MOV (LSL/LSR)
  • JMP/JZ/JNZ/CALL use ARM branch instructions with 24-bit signed offsets (±32 MB range)
  • Branch offsets account for the ARM pipeline (PC+8)
  • RET and HLT emit BX LR (branch to link register)
  • INT #n emits SVC #n (supervisor call)
  • PUSH emits STR Rd, [SP, #-4]! (pre-indexed store with writeback)
  • POP emits LDR Rd, [SP], #4 (post-indexed load)
  • NOP emits the canonical MOV R0, R0 (0xE1A00000)
  • No JIT support on x86 hosts — use for cross-compilation only
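
The MOVW/MOVT rule for LDI in the list above can be illustrated by splitting a value into its 16-bit halves. This Python sketch is an assumption-laden model, not the backend: the function `arm_ldi_plan` is hypothetical, and it assumes negative values are handled as 32-bit two's complement.

```python
def arm_ldi_plan(value: int):
    """Return the (mnemonic, imm16) steps needed to load `value`."""
    v = value & 0xFFFFFFFF           # view the value as a 32-bit pattern
    low, high = v & 0xFFFF, v >> 16
    plan = [("MOVW", low)]           # MOVW sets the low half, clears the high half
    if high:
        plan.append(("MOVT", high))  # MOVT fills in the upper 16 bits
    return plan


print(arm_ldi_plan(42))          # [('MOVW', 42)]
print(arm_ldi_plan(0x12345678))  # [('MOVW', 22136), ('MOVT', 4660)]
print(len(arm_ldi_plan(0x12345678)) * 4, "bytes")  # 8 bytes
```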

8051/MCS-51

  • All operations are 8-bit
  • Arithmetic and logic route through the accumulator (A register)
  • LOAD/STORE use indirect addressing (@R0 or @R1 only)
  • MUL/DIV use the MUL AB / DIV AB hardware instructions via the B register
  • SHL/SHR use rotate instructions (RL A / RR A) — bits wrap around
  • JZ/JNZ are limited to ±127 bytes (8-bit relative offset)
  • JL emits JC rel8 (2 bytes — carry flag set by SUBB means less-than)
  • JG uses a 6-byte polyfill: JC $+4; JZ $+2; SJMP target (skip if less or equal, jump if strictly greater)
  • BUFFER allocates consecutive bytes in internal RAM (shares address space with variables)
  • JMP/CALL use 16-bit absolute addressing (LJMP/LCALL)
  • INT #n is polyfilled as LCALL (n*8)+3 (standard interrupt vector table layout)
  • HLT emits SJMP $ (0x80, 0xFE) — infinite self-loop