UA Beginner's Guide

Welcome to Unified Assembly (UA) — a portable assembly language that lets you write low-level code once and compile it to six different CPU architectures. This guide will take you from zero to running programs.

What Is UA?
Building the Compiler
Your First Program
Understanding the Basics
Working with Numbers
Variables
Functions
Hello World with I/O
Standard Libraries
Conditional Compilation
Common Patterns
Architecture-Specific Code
Standard Libraries and Cross-Platform I/O
Tutorial: Your First Interactive App (The UA Calculator)
Using @DEFINE — Hardware Constants Without Runtime Cost
Working with Arrays (std_array)
Working with Vectors (std_vector)
File I/O with std_iostream
What to Read Next

What Is UA?

Most assembly languages are tied to a single CPU — x86 assembly only works on Intel/AMD chips, ARM assembly only works on ARM chips, etc. UA changes that. You write your program once using UA's unified instruction set, then compile it to any of these targets:

Architecture	Flag	Typical Use
x86-64	`-arch x86`	Desktop PCs, servers (64-bit)
x86-32 (IA-32)	`-arch x86_32`	Legacy 32-bit x86 systems
ARM (ARMv7-A)	`-arch arm`	Smartphones, Raspberry Pi
ARM64 (AArch64)	`-arch arm64`	Apple Silicon, modern ARM servers
RISC-V (RV64I)	`-arch riscv`	Open-source hardware, embedded
8051 (MCS-51)	`-arch mcs51`	Microcontrollers, embedded

The core instruction set — called MVIS (Minimum Viable Instruction Set) — is identical across all targets. The compiler handles the translation to native machine code behind the scenes.

Building the Compiler

UA is a single-binary compiler written in C99 with zero dependencies. Build it with one command:

Linux / macOS:

cd src
gcc -std=c99 -Wall -Wextra -pedantic -o ua \
    main.c lexer.c parser.c codegen.c precompiler.c \
    backend_8051.c backend_x86_64.c backend_x86_32.c \
    backend_arm.c backend_arm64.c backend_risc_v.c \
    emitter_pe.c emitter_elf.c emitter_macho.c

Windows:

cd src
gcc -std=c99 -Wall -Wextra -pedantic -o ua.exe ^
    main.c lexer.c parser.c codegen.c precompiler.c ^
    backend_8051.c backend_x86_64.c backend_x86_32.c ^
    backend_arm.c backend_arm64.c backend_risc_v.c ^
    emitter_pe.c emitter_elf.c emitter_macho.c

That's it. No build system, no package manager, no dependencies.

Your First Program

Create a file called first.ua:

; first.ua — my first UA program
    LDI  R0, 10       ; load the number 10 into register R0
    LDI  R1, 32       ; load 32 into R1
    ADD  R0, R1        ; R0 = R0 + R1 = 42
    HLT                ; stop execution

Compile and Run

JIT execution (runs immediately on your x86-64 machine):

ua first.ua -arch x86 --run

Compile to a Linux ELF binary:

ua first.ua -arch x86 -sys linux -o first
./first

Compile to a Windows executable:

ua first.ua -arch x86 -sys win32 -o first.exe
first.exe

Cross-compile for a different architecture:

ua first.ua -arch arm -o first.bin          # ARM binary
ua first.ua -arch riscv -sys linux -o first  # RISC-V ELF
ua first.ua -arch mcs51 -o firmware.bin      # 8051 firmware

Understanding the Basics

Registers

Registers are the fastest storage locations inside a CPU. UA gives you 8 general-purpose registers named R0 through R7. Think of them as named variables that live directly in the CPU:

Register	Typical Use
`R0`	Accumulator, return values
`R1`–`R3`	General computation
`R4`	Stack pointer (use with care!)
`R5`	Base pointer (use with care!)
`R6`–`R7`	General computation

Tip: Avoid modifying R4 and R5 directly — they manage the call stack on most architectures.

The compiler maps these to real hardware registers automatically. For example, R0 becomes RAX on x86-64, r0 on ARM, and a0 on RISC-V.

Instructions

Every UA instruction fits on one line and follows this pattern:

    MNEMONIC  operand1, operand2    ; optional comment

For example:

    LDI   R0, 42       ; load immediate: R0 = 42
    ADD   R0, R1        ; add registers:  R0 = R0 + R1
    MOV   R2, R0        ; copy register:  R2 = R0
    INC   R0            ; increment:      R0 = R0 + 1
    NOP                 ; do nothing (no operands)

Operands can be:

Registers: R0, R1, ... R7
Immediates (numbers): 42, 0xFF, 0b1010
Labels: start, loop, my_function
Strings: "Hello\n"

Labels and Control Flow

Labels mark positions in your code. They end with a colon and let you jump around:

start:
    LDI  R0, 0          ; R0 = 0 (our counter)

loop:
    INC  R0              ; R0 = R0 + 1
    CMP  R0, 10          ; compare R0 with 10
    JNZ  loop            ; if R0 != 10, go back to "loop"

    ; When we get here, R0 = 10
    HLT

Jump instructions:

Instruction	Meaning
`JMP label`	Always jump to label
`JZ label`	Jump if last comparison was equal (zero)
`JNZ label`	Jump if not equal (not zero)
`JL label`	Jump if less than (signed)
`JG label`	Jump if greater than (signed)

The conditional jumps (JZ, JNZ, JL, JG) test the CPU flags set by the most recent CMP instruction. JMP is unconditional and needs no preceding CMP.
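
One way to picture how CMP and the jump instructions relate is a small Python model. This is an illustration only, not how any UA backend represents flags: `compare` and `jump_taken` are hypothetical helper names, and the model keeps just the signed ordering of the two operands.

```python
def compare(a: int, b: int) -> int:
    """Model CMP a, b: return -1, 0, or 1 for a < b, a == b, a > b (signed)."""
    return (a > b) - (a < b)

def jump_taken(mnemonic: str, flags: int) -> bool:
    """Decide whether a jump is taken, given the result of the last compare."""
    rules = {
        "JMP": True,          # unconditional
        "JZ": flags == 0,     # equal
        "JNZ": flags != 0,    # not equal
        "JL": flags < 0,      # less than (signed)
        "JG": flags > 0,      # greater than (signed)
    }
    return rules[mnemonic]

# The counting loop shown earlier: INC R0 / CMP R0, 10 / JNZ loop
r0 = 0
while True:
    r0 += 1
    if not jump_taken("JNZ", compare(r0, 10)):
        break
print(r0)  # 10
```

The loop exits on the first iteration where the compare reports equality, which is why R0 ends at exactly 10.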

Comments

Comments start with ; and extend to the end of the line:

    LDI  R0, 42     ; this is a comment
; this entire line is a comment

Working with Numbers

UA supports three number formats:

Format	Examples	Notes
Decimal	`42`, `-7`, `0`	Default
Hexadecimal	`0xFF`, `0x1A`	Prefix `0x`
Binary	`0b1010`, `0b11001100`	Prefix `0b`

    LDI  R0, 255        ; decimal
    LDI  R1, 0xFF       ; hex (same as 255)
    LDI  R2, 0b11111111 ; binary (same as 255)
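
All three notations name the same value. As a cross-check outside UA, the short Python sketch below parses each literal form. Python happens to accept the same `0x` and `0b` prefixes; that is a convenience for this illustration and says nothing about how the UA lexer is written. `parse_literal` is a hypothetical helper name.

```python
def parse_literal(text: str) -> int:
    """Parse a decimal, 0x-prefixed hex, or 0b-prefixed binary literal."""
    # Base 0 tells int() to infer the base from the prefix
    return int(text, 0)

values = [parse_literal(s) for s in ("255", "0xFF", "0b11111111")]
print(values)  # [255, 255, 255]
```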

Variables

Registers are fast but limited. Variables give you named storage that persists across function calls:

    ; Declare variables
    VAR  counter, 0      ; variable "counter" initialized to 0
    VAR  result           ; variable "result" (default: 0)

    ; Write to a variable
    LDI  R0, 42
    SET  counter, R0     ; counter = 42
    SET  result, 99      ; result = 99 (immediate value)

    ; Read from a variable
    GET  R0, counter     ; R0 = 42
    GET  R1, result      ; R1 = 99

Complete example — counting to 10:

    VAR count, 0

    LDI R0, 0

loop:
    INC  R0
    SET  count, R0
    CMP  R0, 10
    JNZ  loop

    ; count = 10, R0 = 10
    HLT

Functions

Functions in UA are labels with an optional parameter list. You call them with CALL and return with RET:

    JMP  main              ; skip past function definitions

; Function: add two numbers
; Input:  variables a, b
; Output: R0 = a + b
add(a, b):
    GET  R0, a
    GET  R1, b
    ADD  R0, R1
    RET

main:
    VAR  a
    VAR  b

    SET  a, 15
    SET  b, 27
    CALL add
    ; R0 now holds 42

    HLT

Key points:

Function parameters are just variable names — they document what the function expects
Arguments are passed via named variables (SET before CALL)
Return values are typically left in R0
CALL pushes the return address; RET pops it and jumps back
Always JMP past function bodies at the top of your program (or they'll execute when the program starts!)

Syntactic Sugar

You can call functions with a shorthand syntax:

    add()                  ; same as: CALL add
    std_io.print()         ; same as: CALL std_io.print

Hello World with I/O

UA includes a standard I/O library for printing to the console. Here's the classic Hello World:

@IMPORT std_io

    LDS   R0, "Hello, World!\n"
    CALL  std_io.print
    HLT

Step by step:

@IMPORT std_io — imports the standard I/O library (provides std_io.print)
LDS R0, "Hello, World!\n" — loads the address of the string into R0
CALL std_io.print — calls the print function (expects string address in R0)
HLT — stops the program

Save the program as hello.ua, then compile and run:

ua hello.ua -arch x86 -sys linux -o hello
./hello
Hello, World!

String Escape Sequences

Escape	Meaning
`\n`	Newline
`\t`	Tab
`\r`	Carriage return
`\0`	Null byte
`\\`	Backslash
`\"`	Double quote

Standard Libraries

UA ships with several standard libraries, all written in UA itself:

std_io — Console Output

@IMPORT std_io

    LDS  R0, "Hello!\n"
    CALL std_io.print       ; print string pointed to by R0

std_string — String Utilities

@IMPORT std_string

    LDS  R0, "test"
    CALL std_string.strlen  ; R1 = 4 (string length)

std_math — Integer Math

@IMPORT std_math

    ; Power: 2^10 = 1024
    SET  std_math.base, 2
    SET  std_math.exp, 10
    CALL std_math.pow        ; R0 = 1024

    ; Factorial: 5! = 120
    SET  std_math.n, 5
    CALL std_math.factorial  ; R0 = 120

    ; Maximum: max(7, 42) = 42
    SET  std_math.a, 7
    SET  std_math.b, 42
    CALL std_math.max        ; R0 = 42

    ; Absolute value: abs(-15) = 15
    SET  std_math.val, -15
    CALL std_math.abs        ; R0 = 15

std_arrays — Byte Array Operations

@IMPORT std_arrays

    BUFFER my_buf, 32

    ; Fill buffer with 0xFF
    GET  R0, my_buf
    SET  std_arrays.dst, R0
    SET  std_arrays.count, 32
    SET  std_arrays.value, 0xFF
    CALL std_arrays.fill_bytes

File I/O with `std_iostream`

Read and write files using stream-style operations.
Supported on: x86, x86_32, arm, arm64, riscv (not 8051/MCS-51).

@IMPORT std_io
@IMPORT std_iostream

    BUFFER read_buf, 128

    ; --- Write to a file ---
    LDS  R0, "output.txt"
    SET  std_iostream.iostream_path, R0
    SET  std_iostream.iostream_mode, 1        ; 1 = write (create/truncate)
    CALL std_iostream.fopen

    LDS  R0, "Hello, file!\n"
    SET  std_iostream.iostream_buf, R0
    SET  std_iostream.iostream_count, 13
    CALL std_iostream.fwrite

    CALL std_iostream.fclose

    ; --- Read it back ---
    LDS  R0, "output.txt"
    SET  std_iostream.iostream_path, R0
    SET  std_iostream.iostream_mode, 0        ; 0 = read
    CALL std_iostream.fopen

    GET  R0, read_buf
    SET  std_iostream.iostream_buf, R0
    SET  std_iostream.iostream_count, 128
    CALL std_iostream.fread                   ; R0 = bytes read

    GET  R0, read_buf
    CALL std_io.print                         ; prints "Hello, file!\n"

    CALL std_iostream.fclose

Shared variables (set before calling a function):

Variable	Purpose
`iostream_path`	Pointer to null-terminated file path string
`iostream_mode`	`0` = read, `1` = write (create/truncate)
`iostream_buf`	Pointer to source or destination buffer
`iostream_count`	Number of bytes to read or write
`iostream_fd`	File descriptor / handle (set by `fopen`)

Conditional Compilation

UA's precompiler lets you write platform-specific code in a single file:

@IF_ARCH / @IF_SYS — Conditional Blocks

@IF_ARCH x86
    ; This code is only compiled for x86-64
    LDI R0, 64
@ENDIF

@IF_ARCH mcs51
    ; This code is only compiled for 8051
    LDI R0, 8
@ENDIF

@IF_SYS linux
    ; Linux-only code
    SYS
@ENDIF

@ARCH_ONLY / @SYS_ONLY — Hard Guards

If your entire file is architecture-specific, use hard guards at the top:

@ARCH_ONLY x86, x86_32     ; abort compilation if target isn't x86 family
@SYS_ONLY linux, win32      ; abort compilation if system isn't Linux or Windows

Unlike @IF_ARCH (which silently skips code), @ARCH_ONLY stops compilation with an error if the target doesn't match.

Common Patterns

Counting Loop

    LDI  R0, 0          ; counter = 0
    LDI  R1, 10         ; limit = 10

loop:
    ; ... do work with R0 ...
    INC  R0
    CMP  R0, R1
    JNZ  loop            ; repeat until R0 == 10
    HLT

Conditional (If/Else)

    CMP  R0, 0
    JZ   is_zero         ; if R0 == 0, go to is_zero

not_zero:
    ; R0 is not zero — do something
    JMP  done

is_zero:
    ; R0 is zero — do something else

done:
    HLT

Maximum of Two Values

    ; Assume R0 and R1 hold two values
    CMP  R0, R1
    JG   r0_bigger       ; if R0 > R1, skip

    MOV  R0, R1          ; R0 was smaller, replace with R1

r0_bigger:
    ; R0 now holds the larger value
    HLT

Stack Save/Restore

my_function:
    PUSH R3              ; save R3 (we'll clobber it)
    PUSH R6              ; save R6

    ; ... use R3 and R6 freely ...

    POP  R6              ; restore R6 (LIFO order!)
    POP  R3              ; restore R3
    RET
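
Why the POPs must mirror the PUSHes can be seen with a plain Python list standing in for the stack. This is a sketch under simplifying assumptions: the register file is a dictionary, the initial values are arbitrary, and nothing here reflects how a real call stack is laid out in memory.

```python
stack = []
regs = {"R3": 111, "R6": 222}

# PUSH R3, PUSH R6
stack.append(regs["R3"])
stack.append(regs["R6"])

# The function body clobbers both registers
regs["R3"] = 0
regs["R6"] = 0

# POP R6, POP R3: the reverse order of the pushes
regs["R6"] = stack.pop()
regs["R3"] = stack.pop()

print(regs)  # {'R3': 111, 'R6': 222}
```

Popping in the same order as the pushes would swap the two saved values, leaving R3 with R6's old contents and vice versa.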

Working with Byte Buffers

    BUFFER data, 16          ; allocate 16 bytes
    GET    R0, data          ; R0 = address of buffer

    ; Write 'A' to first byte
    LDI    R1, 0x41          ; 'A' = 0x41
    STOREB R1, R0            ; buffer[0] = 'A'

    ; Read it back
    LOADB  R2, R0            ; R2 = buffer[0] = 0x41

Complete Program: Sum 1 to N

@IMPORT std_io

    JMP main

; sum_n: compute 1 + 2 + ... + R0
; Input:  R0 = N
; Output: R0 = sum
sum_n:
    LDI  R1, 0           ; R1 = accumulator (sum)
    LDI  R2, 1           ; R2 = counter (starts at 1)

sum_loop:
    ADD  R1, R2           ; sum += counter
    INC  R2               ; counter++
    CMP  R2, R0
    JG   sum_done         ; if counter > N, done
    JMP  sum_loop

sum_done:
    MOV  R0, R1           ; return sum in R0 (the loop has already added N)
    RET

main:
    LDI  R0, 10           ; compute sum(1..10)
    CALL sum_n
    ; R0 = 55
    HLT
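
The expected result can be checked against the closed form N(N+1)/2. The Python sketch below mirrors the structure of sum_loop (add the counter, increment it, exit once the counter exceeds N) and compares the outcome with that formula. `sum_n` here is a Python stand-in for illustration, not the UA routine itself.

```python
def sum_n(n: int) -> int:
    """Mirror of sum_loop: add counter, increment, stop when counter > n."""
    total = 0      # plays the role of R1
    counter = 1    # plays the role of R2
    while True:
        total += counter
        counter += 1
        if counter > n:   # CMP R2, R0 / JG sum_done
            break
    return total

print(sum_n(10))            # 55
print(10 * (10 + 1) // 2)   # 55
```

By the time the loop exits, N itself has already been added, so the accumulator holds the complete sum.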

Architecture-Specific Code

The MVIS instructions work everywhere, but UA also offers architecture-specific opcodes for when you need hardware features unique to a particular CPU. These opcodes will only compile for their supported architectures.

For a quick overview:

Opcode	Architectures	Purpose
`CPUID`	x86, x86_32	Query CPU identification info
`RDTSC`	x86, x86_32	Read hardware timestamp counter
`BSWAP`	x86, x86_32	Byte-swap register (endian conversion)
`PUSHA`	x86_32 only	Push all general-purpose registers
`POPA`	x86_32 only	Pop all general-purpose registers
`DJNZ`	mcs51 only	Decrement register and jump if not zero
`CJNE`	mcs51 only	Compare and jump if not equal
`SETB`	mcs51 only	Set a bit
`CLR`	mcs51 only	Clear a bit or the accumulator
`RETI`	mcs51 only	Return from interrupt
`WFI`	arm, arm64, riscv	Wait for interrupt (low-power sleep)
`DMB`	arm, arm64	Data memory barrier
`EBREAK`	riscv only	Debugger breakpoint
`FENCE`	riscv only	Memory ordering fence

Using an architecture-specific opcode on the wrong target will produce a compliance error at compile time:

  UA Compliance Error
  -------------------
  Line 5: opcode 'CPUID' is not supported on architecture 'arm'
  Supported architectures: x86, x86_32

Use @IF_ARCH to guard architecture-specific code:

@IF_ARCH x86
    CPUID                ; only compiled for x86-64
@ENDIF

@IF_ARCH mcs51
    DJNZ R0, loop       ; only compiled for 8051
@ENDIF

For full details, see:

MVIS Opcodes Reference — complete MVIS instruction set
Architecture-Specific Opcodes — non-MVIS instructions per architecture

Standard Libraries and Cross-Platform I/O

One of UA's most powerful features is writing portable I/O code that works across every supported target — without you needing to know the underlying operating system or CPU conventions. This chapter explains how the std_io library achieves this and how you can use it in your own programs.

The Problem: Every Platform Is Different

Printing a string to the screen sounds simple, but every OS and CPU does it differently:

Platform	Mechanism	Syscall Write	Syscall Read	Syscall Register
x86-64 Linux	`SYSCALL`	1	0	RAX (R0)
x86-32 Linux	`INT 0x80`	4	3	EAX (R0)
ARM Linux	`SVC #0`	4	3	r7 (R7)
ARM64 Linux	`SVC #0`	64	63	X8 (via R7)
RISC-V Linux	`ECALL`	64	63	a7 (R7)
x86-64 Windows	Win32 API	WriteFile	ReadFile	dispatcher

The syscall numbers are different, the registers for arguments are different, and even the instruction to trigger the call is different. Writing raw syscalls would force you to maintain six separate codepaths.

The Solution: `std_io` and the Precompiler

UA solves this with two mechanisms working together:

The SYS instruction — a single MVIS opcode that emits the correct syscall instruction for the target (SYSCALL, INT 0x80, SVC #0, ECALL, or a Win32 API call).
Precompiler conditionals (@IF_ARCH, @IF_SYS) — let the library set up the right registers for each platform, all in one source file.

The std_io library uses these to give you two simple, universal functions:

@IMPORT std_io

    ; --- Printing ---
    LDS  R0, "Hello, world!\n"
    CALL std_io.print          ; print null-terminated string at R0

    ; --- Reading ---
    BUFFER input, 64           ; allocate a 64-byte buffer
    GET  R0, input             ; R0 = buffer address
    LDI  R1, 64               ; R1 = max bytes to read
    CALL std_io.read           ; read from stdin into buffer
    ; R0 = bytes actually read (on Linux)

That's it. The same code compiles unchanged for x86-64, x86-32, ARM, ARM64, and RISC-V, on Linux and (where supported) Windows.

How `std_io.print` Works (Under the Hood)

Let's trace what happens when you call std_io.print on different platforms.

Every version of print follows the same algorithm:

Save the buffer pointer into the register the OS expects.
Compute strlen — walk bytes with LOADB until a null byte is found.
Load the syscall number and file descriptor into their platform-specific registers.
Call SYS to invoke the OS.
RET back to the caller.

The only things that change between platforms are which registers hold what and which syscall number to use.
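
The steps above can be sketched in Python to make the strlen walk concrete. This is a model under stated assumptions, not the library source: memory is a flat `bytes` object, `ua_print_model` is a hypothetical name, and instead of invoking the OS it returns the file descriptor and byte range that a write call would receive.

```python
def ua_print_model(memory: bytes, addr: int):
    """Return (fd, payload) that a write syscall would receive for the string at addr."""
    # Step 2: strlen, walking one byte at a time until the null terminator
    length = 0
    while memory[addr + length] != 0:
        length += 1
    # Step 3: stdout is file descriptor 1
    fd = 1
    # Step 4: the OS would be asked to write `length` bytes starting at addr
    return fd, memory[addr:addr + length]

mem = b"junk\x00Hello, World!\n\x00"
fd, payload = ua_print_model(mem, 5)
print(fd, payload)  # 1 b'Hello, World!\n'
```

The null terminator itself is never written; it only marks where the count stops.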

Example: x86-64 Linux vs. ARM Linux

x86-64 Linux — the write syscall expects:

R0 (RAX) = syscall number (1)
R7 (RDI) = file descriptor (1 = stdout)
R6 (RSI) = buffer pointer
R2 (RDX) = byte count

@IF_ARCH x86
print:
    MOV  R6, R0          ; RSI = buf
    ; ... strlen loop sets R2 = length ...
    LDI  R0, 1           ; RAX = 1 (write)
    LDI  R7, 1           ; RDI = 1 (stdout)
    SYS                  ; SYSCALL
    RET
@ENDIF

ARM Linux — the write syscall expects:

R7 = syscall number (4)
R0 = file descriptor (1 = stdout)
R1 = buffer pointer
R2 = byte count

@IF_ARCH arm
@IF_SYS linux
print:
    MOV  R1, R0          ; r1 = buf
    ; ... strlen loop sets R2 = length ...
    LDI  R7, 4           ; r7 = 4 (write)
    LDI  R0, 1           ; r0 = 1 (stdout)
    SYS                  ; SVC #0
    RET
@ENDIF
@ENDIF

Notice how the algorithm is identical — only the register assignments and syscall numbers differ. The @IF_ARCH / @IF_SYS blocks ensure only the correct version is compiled.

How `std_io.read` Works

The read function is even simpler because there is no strlen step — the caller provides the maximum byte count directly in R1.

The calling convention is:

R0 = pointer to a buffer (created with BUFFER)
R1 = maximum number of bytes to read

After the call, R0 contains the number of bytes actually read (on Linux). On Windows, the byte count is stored internally by the dispatcher.

A complete example:

@IMPORT std_io

    BUFFER my_buf, 128       ; 128-byte input buffer
    JMP main

main:
    ; Prompt the user
    LDS  R0, "Enter your name: "
    CALL std_io.print

    ; Read input
    GET  R0, my_buf          ; R0 = buffer address
    LDI  R1, 127             ; leave 1 byte for null terminator
    CALL std_io.read         ; R0 = bytes read

    ; Null-terminate the input
    MOV  R2, R0              ; R2 = bytes read
    GET  R3, my_buf
    ADD  R3, R2              ; R3 = address of first byte after input
    LDI  R1, 0
    STOREB R1, R3            ; write null terminator

    ; Echo it back
    LDS  R0, "Hello, "
    CALL std_io.print
    GET  R0, my_buf
    CALL std_io.print
    HLT
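
The reason for reading 127 bytes into a 128-byte buffer shows up clearly in a small Python model. It follows the Linux behaviour described above, where the call reports the number of bytes read. `read_into` is a hypothetical stand-in for std_io.read, and an in-memory stream replaces the keyboard.

```python
import io

def read_into(buf: bytearray, stream, max_bytes: int) -> int:
    """Copy at most max_bytes from stream into buf and return the count."""
    data = stream.read(max_bytes)
    buf[:len(data)] = data
    return len(data)

buf = bytearray(128)                              # BUFFER my_buf, 128
n = read_into(buf, io.BytesIO(b"Ada\n"), 127)     # leave one byte spare
buf[n] = 0                                        # STOREB R1, R3 with R1 = 0
text = bytes(buf[:n]).decode("ascii")
print(repr(text))  # 'Ada\n'
```

Even when the input fills all 127 bytes, index 127 is still inside the buffer, so the terminator always has a place to go. The trailing newline is part of the input, which is why the echoed name ends the line.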

The ARM64 Trick: Hidden Register X8

AArch64 (ARM64) Linux is unique: it puts the syscall number in register X8, but UA only maps R0–R7 to X0–X7. There's no R8!

The UA compiler handles this transparently. When you write SYS and compile for ARM64, the backend automatically emits:

MOV X8, X7      ; copy R7 (X7) into X8
SVC #0          ; supervisor call

This means you use the same convention as ARM and RISC-V — put the syscall number in R7 — and the compiler takes care of the rest. You never need to worry about X8.

The Win32 Dispatcher

On Windows, there are no syscall numbers. Instead, the SYS instruction jumps to a built-in syscall dispatcher in the generated executable:

If R0 = 1 (or any nonzero value), the dispatcher calls WriteFile via kernel32.dll → prints the buffer.
If R0 = 0, the dispatcher calls ReadFile via kernel32.dll → reads into the buffer.

The register setup is the same as Linux x86-64 (R6 = buffer, R2 = count), so the std_io library's x86 section handles both Linux and Windows without any @IF_SYS split. The x86 print and read blocks are simply guarded by @IF_ARCH x86 and work on both operating systems.

Writing Your Own Precompiler-Guarded Libraries

You can use the same technique in your own code. The key precompiler directives are:

Directive	Purpose
`@IF_ARCH x86`	Include block only for x86-64
`@IF_ARCH arm64`	Include block only for AArch64
`@IF_SYS linux`	Include block only for Linux
`@IF_SYS win32`	Include block only for Windows
`@ENDIF`	End a conditional block
`@ARCH_ONLY x86, arm`	Abort compilation if arch doesn't match
`@SYS_ONLY linux`	Abort compilation if system doesn't match

Conditionals nest freely:

@IF_SYS linux
    @IF_ARCH arm
        ; ARM + Linux only
    @ENDIF
    @IF_ARCH arm64
        ; ARM64 + Linux only
    @ENDIF
@ENDIF
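
To see how nested conditionals resolve, here is a minimal Python sketch of a line filter. It is an assumption-laden model rather than the actual precompiler: `filter_conditionals` is a hypothetical name, each directive is assumed to take a single target name, comments are stripped naively at the first `;`, and the hard guards (@ARCH_ONLY, @SYS_ONLY) are left out.

```python
def filter_conditionals(lines, arch, system):
    """Keep only the lines whose enclosing @IF_ARCH/@IF_SYS blocks match the target."""
    active = [True]          # stack: is the current nesting level being emitted?
    kept = []
    for line in lines:
        words = line.split(";")[0].split()   # drop comments, then tokenize
        head = words[0] if words else ""
        if head == "@IF_ARCH":
            active.append(active[-1] and words[1] == arch)
        elif head == "@IF_SYS":
            active.append(active[-1] and words[1] == system)
        elif head == "@ENDIF":
            active.pop()
        elif active[-1]:
            kept.append(line)
    return kept

source = [
    "@IF_SYS linux",
    "    @IF_ARCH arm",
    "        LDI R7, 4",
    "    @ENDIF",
    "    @IF_ARCH arm64",
    "        LDI R7, 64",
    "    @ENDIF",
    "@ENDIF",
    "HLT",
]
print(filter_conditionals(source, "arm64", "linux"))
# ['        LDI R7, 64', 'HLT']
```

An inner block is emitted only when every enclosing block also matches, which is what the `active[-1] and ...` test expresses.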

Quick Reference: Syscall Register Cheat Sheet

When writing your own syscalls (beyond what std_io provides), here's the register mapping:

x86-64 (SYSCALL):

Argument	UA Register	Native Register
Syscall#	R0	RAX
Arg 1	R7	RDI
Arg 2	R6	RSI
Arg 3	R2	RDX
Return	R0	RAX

ARM / ARM64 / RISC-V (SVC / ECALL):

Argument	UA Register	ARM	ARM64	RISC-V
Syscall#	R7	r7	X7→X8	a7
Arg 1	R0	r0	X0	a0
Arg 2	R1	r1	X1	a1
Arg 3	R2	r2	X2	a2
Return	R0	r0	X0	a0

x86-32 (INT 0x80):

Argument	UA Register	Native Register
Syscall#	R0	EAX
Arg 1	R3	EBX
Arg 2	R1	ECX
Arg 3	R2	EDX
Return	R0	EAX

Tutorial: Your First Interactive App (The UA Calculator)

You've learned registers, instructions, control flow, I/O, and standard libraries. Now let's put it all together by building something real: an interactive calculator that reads two numbers and an operator from the keyboard, performs the math, and prints the result.

This tutorial walks through tests/calc.ua line by line.

Why a Calculator?

A calculator is the perfect first "real" program because it exercises every major UA concept:

Concept	How the Calculator Uses It
`BUFFER`	Allocating memory to capture keyboard input
`VAR` / `GET` / `SET`	Storing and retrieving the parsed numbers
`CALL`	Invoking library functions (`std_io.print`, `std_io.read`, `std_string.parse_int`, `std_string.to_string`)
`CMP` / `JZ`	Branching on the operator character
`ADD` / `SUB`	The actual arithmetic
`LOADB` / `STOREB`	Byte-level work inside `parse_int` and `to_string`
`@IMPORT`	Pulling in standard libraries
`@ARCH_ONLY`	Restricting to supported targets

The ASCII Trap: Why You Need `parse_int` and `to_string`

This is the single most important concept for beginners working close to the metal.

When you type 5 on your keyboard and press Enter, the operating system does not hand your program the number 5. It hands you the byte 0x35 (decimal 53) — the ASCII code for the character '5'. A newline (0x0A) follows it.

So the "number" 42 arrives as three bytes in your buffer:

Buffer:  [ 0x34 ] [ 0x32 ] [ 0x0A ]   ← raw bytes
           '4'      '2'      '\n'

If you tried to ADD these bytes directly, you'd get 0x34 + 0x32 = 0x66 — which is the letter 'f'. Not helpful.

std_string.parse_int fixes this. It walks each byte, subtracts 48 (the ASCII value of '0'), multiplies a running total by 10, and adds the digit:

'4' → 0x34 - 48 = 4      total = 0 * 10 + 4 = 4
'2' → 0x32 - 48 = 2      total = 4 * 10 + 2 = 42
'\n' → stop
                          Result: 42  ✓
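
The same walk can be written as a few lines of Python. This is a sketch of the technique, not the library source: it assumes the input holds only ASCII digits followed by a terminator (null, newline, or carriage return) and performs no validation.

```python
def parse_int(data: bytes) -> int:
    """Convert ASCII digits to an integer, stopping at null, newline, or carriage return."""
    total = 0
    for byte in data:
        if byte in (0x00, 0x0A, 0x0D):   # terminators
            break
        digit = byte - 48                # ASCII '0' is 48
        total = total * 10 + digit
    return total

print(parse_int(b"42\n"))        # 42
print(0x34 + 0x32, chr(0x66))    # 102 f  (adding the raw bytes instead)
```

The second line of output reproduces the ASCII trap: adding the raw bytes gives 102, the code for 'f', rather than anything resembling 42.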

The reverse problem hits when you want to print a result. The number 123 is a single integer in a register — but the screen expects three separate ASCII characters ('1', '2', '3'). std_string.to_string extracts digits by dividing by 10, converts each to ASCII (add 48), and writes them to a buffer.

Walking Through `calc.ua`

1. Header and Imports

@ARCH_ONLY x86, x86_32, arm, arm64, riscv
@IMPORT std_io
@IMPORT std_string

@ARCH_ONLY locks the program to architectures that support console I/O (everything except the bare-metal 8051). The two @IMPORT lines pull in the I/O and string-conversion libraries.

2. Data Section

    BUFFER input_1, 32       ; keyboard buffer for first number
    BUFFER input_2, 32       ; keyboard buffer for second number
    BUFFER input_op, 8       ; keyboard buffer for operator (+, -)
    BUFFER output_buf, 32    ; output buffer for result string

    VAR num1                 ; parsed first operand
    VAR num2                 ; parsed second operand
    VAR result               ; computed result

BUFFER reserves raw byte arrays. When you call std_io.read, the OS writes the user's keystrokes into these buffers. 32 bytes is generous for a number: it holds up to 31 bytes of input (the digits plus the trailing newline) and a null terminator.

VAR creates named integer variables. After parsing a string into an integer, we SET the variable so we can GET it back later.

Key difference: GET R0, input_1 gives you the address of the buffer (a pointer). GET R0, num1 gives you the value stored in the variable.

3. Reading and Parsing a Number

    LDS  R0, "Enter first number: "
    CALL std_io.print

    GET  R0, input_1         ; R0 = address of input buffer
    LDI  R1, 31              ; max bytes to read
    CALL std_io.read         ; OS fills buffer with keystrokes

    GET  R0, input_1         ; R0 = buffer address (for parse_int)
    CALL std_string.parse_int
    SET  num1, R0            ; save the integer

This three-step pattern — prompt → read → parse — repeats for each input. Notice we pass 31 (not 32) to std_io.read, leaving room for a null terminator.

After parse_int returns, R0 holds a real integer that ADD and SUB can work with.

4. Operator Dispatch

    GET  R3, input_op        ; R3 = address of operator buffer
    LOADB R0, R3             ; R0 = first byte (the operator character)

    LDI  R1, 43              ; '+' = ASCII 43
    CMP  R0, R1
    JZ   do_add

    LDI  R1, 45              ; '-' = ASCII 45
    CMP  R0, R1
    JZ   do_sub

    LDS  R0, "Error: unknown operator\n"
    CALL std_io.print
    HLT

Here's where UA's clean branching shines. Compare this to raw x86 assembly:

UA (clear intent)	x86-64 (cryptic)
`CMP R0, R1`	`cmp al, 0x2B`
`JZ do_add`	`je 0x004010A0`

In UA, you compare two named registers and jump to a human-readable label rather than a raw hex address. CMP still sets the CPU flags that JZ tests, but the intent stays front and center: "if the byte equals '+', go do addition."

The fall-through case (neither '+' nor '-') prints an error and halts — defensive programming even in assembly.

5. Performing the Math

do_add:
    GET  R0, num1
    GET  R1, num2
    ADD  R0, R1              ; R0 = num1 + num2
    SET  result, R0
    JMP  show_result

do_sub:
    GET  R0, num1
    GET  R1, num2
    SUB  R0, R1              ; R0 = num1 - num2
    SET  result, R0
    JMP  show_result

Load both operands from variables, perform one arithmetic instruction, save the result. This is as clean as assembly gets.

6. Converting Back and Printing

show_result:
    GET  R0, result          ; R0 = computed integer
    GET  R1, output_buf      ; R1 = output buffer address
    CALL std_string.to_string

    LDS  R0, "Result: "
    CALL std_io.print

    GET  R0, output_buf
    CALL std_io.print

    LDS  R0, "\n"
    CALL std_io.print

    HLT

to_string converts the integer in R0 into ASCII characters, writing them into output_buf. Then we print a label, the result string, and a newline.

Inside `parse_int`: Digit-by-Digit Conversion

For the curious, here's the core loop from std_string.parse_int with annotations:

parse_int:
    MOV  R3, R0          ; R3 = string pointer
    LDI  R0, 0           ; running total = 0
    LDI  R6, 10          ; constant: multiplier (and newline sentinel!)
    LDI  R7, 48          ; constant: ASCII '0'

parse_int_loop:
    LOADB R1, R3         ; load one byte from the string
    CMP  R1, ...         ; if null, newline, or \r → stop
    JZ   parse_int_done
    SUB  R1, R7          ; convert ASCII to digit (e.g. '5' - 48 = 5)
    MUL  R0, R6          ; total = total × 10
    ADD  R0, R1          ; total = total + digit
    INC  R3              ; next character
    JMP  parse_int_loop

Notice the clever reuse of R6: the value 10 serves double duty as both the multiplication constant and the newline character check.

Inside `to_string`: Digits in Reverse

The hardest part of to_string is that division extracts digits backwards — least significant first:

123 ÷ 10 = 12 remainder 3  → write '3'
 12 ÷ 10 =  1 remainder 2  → write '2'
  1 ÷ 10 =  0 remainder 1  → write '1'

The buffer now contains "321". A swap-based reversal loop fixes the order to "123" before null-terminating.

Since UA has no MOD instruction, the remainder is computed manually:

    MOV  R0, R3          ; R0 = value
    MOV  R2, R3          ; R2 = value (backup)
    DIV  R0, R7          ; R0 = value / 10
    PUSH R0              ; save quotient
    MUL  R0, R7          ; R0 = quotient × 10
    SUB  R2, R0          ; R2 = value - quotient×10 = remainder

This is the kind of low-level trick you learn working close to the metal — and it works identically on every architecture UA supports.
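
Both ideas, the remainder without MOD and the in-place reversal, fit in a short Python sketch. It is a model of the technique rather than the library code, and it assumes a non-negative input; how the real to_string treats negative values is not covered here.

```python
def to_string(value: int) -> bytes:
    """Convert a non-negative integer to null-terminated ASCII decimal digits."""
    digits = bytearray()
    while True:
        quotient = value // 10
        remainder = value - quotient * 10   # no MOD: value - (value / 10) * 10
        digits.append(remainder + 48)       # ASCII '0' is 48
        value = quotient
        if value == 0:
            break
    # Digits were produced least significant first; swap ends toward the middle
    i, j = 0, len(digits) - 1
    while i < j:
        digits[i], digits[j] = digits[j], digits[i]
        i += 1
        j -= 1
    digits.append(0)                        # null terminator
    return bytes(digits)

print(to_string(123))  # b'123\x00'
```

Running the division at least once before testing for zero means an input of 0 still produces the single digit '0'.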

Compiling and Running

Linux (x86-64):

./ua tests/calc.ua -arch x86 -sys linux -format elf -o calc
chmod +x calc
./calc

Windows (x86-64):

ua.exe tests/calc.ua -arch x86 -sys win32 -format pe -o calc.exe
calc.exe

ARM64 Linux (e.g. Raspberry Pi 4 with 64-bit OS):

./ua tests/calc.ua -arch arm64 -sys linux -format elf -o calc
chmod +x calc
./calc

Sample session (identical on every platform):

Enter first number: 100
Enter second number: 58
Operator (+ or -): -
Result: 42

Exercises

Try extending the calculator on your own:

Add multiplication: Check for '*' (ASCII 42) and use MUL R0, R1.
Add division: Check for '/' (ASCII 47) and use DIV R0, R1. What happens if the second number is 0?
Loop it: Instead of HLT after printing, JMP back to main to let the user do multiple calculations.
Handle negatives on input: Modify parse_int to check for a leading '-' (ASCII 45) and negate the result if found.

Using `@DEFINE` — Hardware Constants Without Runtime Cost

When writing bare-metal or embedded programs, you deal with hardware register addresses — magic hex numbers like 0x98 or 0x89. Scattering these through your code makes it hard to read and easy to get wrong. The @DEFINE directive solves this by giving names to constants at compile time.

What `@DEFINE` Does

@DEFINE LED_PORT  0x80
@DEFINE DELAY     255

After these lines, every time the precompiler sees LED_PORT on a code line, it replaces it with 0x80 — before the lexer ever sees it. The result is exactly as if you had typed 0x80 yourself. No variable, no RAM, no runtime overhead.

Your First `@DEFINE` Program

Create a file called define_demo.ua:

; define_demo.ua — @DEFINE basics
@DEFINE ANSWER   42
@DEFINE LIMIT    10

    LDI  R0, 0          ; counter = 0

loop:
    INC  R0
    CMP  R0, LIMIT      ; expands to: CMP R0, 10
    JNZ  loop

    LDI  R1, ANSWER      ; expands to: LDI R1, 42
    HLT

Compile and run:

ua define_demo.ua -arch x86 --run

The precompiler replaces LIMIT → 10 and ANSWER → 42 before lexing. The machine code is identical to writing the numbers directly — but your source is self-documenting.

Important Rules

Rule	What It Means
Whole-token only	`@DEFINE P0 0x80` replaces `P0` but not `DPH0` or `P0x`
One per line	Each `@DEFINE` goes on its own line
Order matters	A macro is only visible to lines after its `@DEFINE`
Max 512 macros	More than enough for any hardware platform
No nesting	`@DEFINE A B` then `@DEFINE B 5` — `A` expands to `B`, not to `5`
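
Three of these rules (whole-token matching, order dependence, and no nesting) can be illustrated with a short Python sketch. It is a model built on assumptions, not the real precompiler: `expand_defines` is a hypothetical name, comments are split off naively at the first `;` and left untouched, string literals get no special handling, and the 512-macro limit is not enforced.

```python
import re

TOKEN = re.compile(r"\b[A-Za-z_]\w*\b")

def expand_defines(lines):
    """Apply @DEFINE substitutions in a single pass, whole tokens only."""
    macros = {}
    out = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "@DEFINE":
            macros[parts[1]] = parts[2]      # visible to later lines only
            continue
        code, sep, comment = line.partition(";")
        # Replace each identifier once; the replacement text is not rescanned
        code = TOKEN.sub(lambda m: macros.get(m.group(0), m.group(0)), code)
        out.append(code + sep + comment)
    return out

demo = [
    "@DEFINE P0 0x80",
    "@DEFINE A B",
    "@DEFINE B 5",
    "    LDI R0, P0",
    "    LDI R1, DPH0",
    "    LDI R2, A",
]
print(expand_defines(demo))
# ['    LDI R0, 0x80', '    LDI R1, DPH0', '    LDI R2, B']
```

Because each identifier is looked up exactly once, `A` becomes `B` and stops there, and `DPH0` is left alone because it is a different token from `P0`.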

Hardware Libraries: Pre-Built `@DEFINE` Collections

Writing @DEFINE for every register by hand would be tedious. UA ships with hardware definition libraries that do it for you:

Library	Import	What You Get
hw_mcs51	`@IMPORT hw_mcs51`	8051 SFRs: `P0`, `SCON`, `SBUF`, `TMOD`, `TH1`, `IE`, etc.
hw_x86_pc	`@IMPORT hw_x86_pc`	PC I/O ports: `PORT_COM1`, `PORT_VGA_CMD`, `PORT_KEYBOARD`, etc.
hw_riscv_virt	`@IMPORT hw_riscv_virt`	QEMU virt: `UART0_BASE`, `CLINT_BASE`, `PLIC_BASE`, etc.
hw_arm_virt	`@IMPORT hw_arm_virt`	QEMU virt: `PL011_BASE`, `GIC_DIST`, `GIC_CPU`, etc.

These libraries are guarded by @ARCH_ONLY, so they only compile for the correct target.

Example: 8051 UART Transmit

Here's a real bare-metal program that configures the 8051 serial port and transmits the letter 'A'. Compare the macro version you write with what the precompiler produces after expansion:

What you write:

@IMPORT hw_mcs51

    LDI  R0, 0x20
    LDI  R1, TMOD          ; Timer 1, Mode 2
    STORE R0, R1

    LDI  R0, 0xFD
    LDI  R1, TH1            ; 9600 baud
    STORE R0, R1

    LDI  R0, 0x50
    LDI  R1, SCON           ; Serial Mode 1, REN
    STORE R0, R1

    LDI  R0, 0x41           ; 'A'
    LDI  R1, SBUF           ; transmit!
    STORE R0, R1
    HLT

What the precompiler produces (after macro expansion):

    LDI  R0, 0x20
    LDI  R1, 0x89           ; TMOD = 0x89
    STORE R0, R1

    LDI  R0, 0xFD
    LDI  R1, 0x8D           ; TH1  = 0x8D
    STORE R0, R1

    LDI  R0, 0x50
    LDI  R1, 0x98           ; SCON = 0x98
    STORE R0, R1

    LDI  R0, 0x41
    LDI  R1, 0x99           ; SBUF = 0x99
    STORE R0, R1
    HLT

The human reads SCON and SBUF; the CPU sees 0x98 and 0x99. Best of both worlds.

Compile the full demo:

ua examples/8051_uart_tx.ua -arch mcs51 -o uart_tx.bin

When to Use `@DEFINE` vs. `VAR`

Feature	`@DEFINE`	`VAR`
When it's resolved	Compile time (precompiler)	Run time (CPU instructions)
RAM cost	Zero	Uses memory for each variable
Can change at runtime	No — it's a fixed substitution	Yes — `SET` / `GET` read/write freely
Best for	Hardware addresses, magic numbers, configuration constants	Counters, accumulators, user data

Rule of thumb: If a value is known at compile time and never changes, use @DEFINE. If it needs to change while the program runs, use VAR.

Working with Arrays (`std_array`)

The std_array library gives you a fixed-size byte array — similar to C++ std::array<uint8_t, N>. You allocate a BUFFER, tell the library where it lives and how big it is, then call functions like fill, front, back, at, and set_at.

Quick reference

Function	Input variables	Output (R0)	Description
`front`	`ptr`, `size`	First byte	Like `arr[0]`
`back`	`ptr`, `size`	Last byte	Like `arr[size-1]`
`data`	`ptr`	Buffer address	Like `arr.data()`
`begin`	`ptr`	Buffer address	Like `arr.begin()`
`end`	`ptr`, `size`	`ptr + size`	Like `arr.end()`
`empty`	`size`	1 if empty, else 0	Like `arr.empty()`
`size_of`	`size`	Element count	Like `arr.size()`
`at`	`ptr`, `index`	Byte at index	Like `arr[i]` / `arr.at(i)`
`set_at`	`ptr`, `index`, `value`	(none)	Like `arr[i] = v`
`fill`	`ptr`, `size`, `value`	(none)	Like `arr.fill(v)`

Example: Fill and read back

@IMPORT std_array

BUFFER my_arr, 8

JMP main

main:
    ; --- Setup: point std_array at our buffer ---
    GET  R0, my_arr
    SET  std_array.ptr, R0
    SET  std_array.size, 8

    ; --- Fill every slot with 0x42 ---
    SET  std_array.value, 0x42
    CALL std_array.fill

    ; --- Read the first element ---
    CALL std_array.front         ; R0 = 0x42

    ; --- Overwrite index 3 with 0xFF ---
    SET  std_array.index, 3
    SET  std_array.value, 0xFF
    CALL std_array.set_at

    ; --- Read it back ---
    SET  std_array.index, 3
    CALL std_array.at            ; R0 = 0xFF

    ; --- Check size and empty ---
    CALL std_array.size_of       ; R0 = 8
    CALL std_array.empty         ; R0 = 0 (not empty)

    HLT

Save as array_demo.ua and run:

ua array_demo.ua -arch x86 --run        # JIT on x86-64
ua array_demo.ua -arch arm   -o out.bin  # Cross-compile for ARM
ua array_demo.ua -arch mcs51 -o out.bin  # Cross-compile for 8051

Why does this work on every architecture?
The std_array library uses only MVIS instructions (LOADB, STOREB, ADD, CMP, etc.). MVIS is the portable subset supported by all six backends.

How it works under the hood

The library doesn't allocate memory itself — you provide a BUFFER and pass its address via the ptr variable. This keeps the library purely computational with zero hidden state, making it safe and predictable on bare-metal targets like the 8051.
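
Since the library only reads `ptr` and `size` when a function is called, a single import can serve more than one buffer: re-point the variables before each call. This is a minimal sketch, assuming two hypothetical buffers named `buf_a` and `buf_b`, and that `fill` leaves `ptr` and `size` untouched, as the earlier example suggests.

```
@IMPORT std_array

BUFFER buf_a, 4
BUFFER buf_b, 16

JMP main

main:
    ; --- Work on buf_a ---
    GET  R0, buf_a
    SET  std_array.ptr, R0
    SET  std_array.size, 4
    SET  std_array.value, 0x11
    CALL std_array.fill          ; buf_a = 4 x 0x11

    ; --- Re-point the same library at buf_b ---
    GET  R0, buf_b
    SET  std_array.ptr, R0
    SET  std_array.size, 16
    SET  std_array.value, 0x22
    CALL std_array.fill          ; buf_b = 16 x 0x22

    CALL std_array.back          ; R0 = 0x22 (last byte of buf_b)

    HLT
```

The trade-off is that `ptr` and `size` are shared by every call, so set both again whenever you switch buffers.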

Working with Vectors (`std_vector`)

The std_vector library is a dynamic-size byte vector — similar to C++ std::vector<uint8_t>. Unlike std_array, the vector tracks a logical size that can grow from 0 up to a maximum capacity. You still pre-allocate a BUFFER as backing storage, then use push_back, pop_back, and resize to manage elements.

Quick reference

Function	Input variables	Output (R0)	Description
`clear`	(none)	(none)	Reset to empty (`size = 0`)
`push_back`	`value`	(none)	Append byte (ignored if full)
`pop_back`	(none)	Removed byte (0 if empty)	Remove & return last element
`front`	`ptr`	First byte	Like `vec[0]`
`back`	`ptr`, `vec_size`	Last byte	Like `vec[size-1]`
`data`	`ptr`	Buffer address	Like `vec.data()`
`begin`	`ptr`	Buffer address	Like `vec.begin()`
`end`	`ptr`, `vec_size`	`ptr + vec_size`	Like `vec.end()`
`empty`	`vec_size`	1 if empty, else 0	Like `vec.empty()`
`size_of`	`vec_size`	Current element count	Like `vec.size()`
`capacity_of`	`capacity`	Max capacity	Like `vec.capacity()`
`at`	`ptr`, `index`	Byte at index	Like `vec[i]`
`set_at`	`ptr`, `index`, `value`	(none)	Like `vec[i] = v`
`resize`	`new_size`	(none)	Grow (zero-fill) or shrink

Example: Stack-like push/pop usage

@IMPORT std_vector

BUFFER my_vec, 64

JMP main

main:
    ; --- Setup: point std_vector at our buffer ---
    GET  R0, my_vec
    SET  std_vector.ptr, R0
    SET  std_vector.capacity, 64
    CALL std_vector.clear            ; start empty

    ; --- Push three values ---
    SET  std_vector.value, 10
    CALL std_vector.push_back        ; vec = [10]
    SET  std_vector.value, 20
    CALL std_vector.push_back        ; vec = [10, 20]
    SET  std_vector.value, 30
    CALL std_vector.push_back        ; vec = [10, 20, 30]

    ; --- Check size and front/back ---
    CALL std_vector.size_of          ; R0 = 3
    CALL std_vector.front            ; R0 = 10
    CALL std_vector.back             ; R0 = 30

    ; --- Pop the last element ---
    CALL std_vector.pop_back         ; R0 = 30, vec = [10, 20]

    ; --- Resize to grow (zero-fills new slots) ---
    SET  std_vector.new_size, 5
    CALL std_vector.resize           ; vec = [10, 20, 0, 0, 0]

    ; --- Read the zero-filled element ---
    SET  std_vector.index, 3
    CALL std_vector.at               ; R0 = 0

    ; --- Shrink back ---
    SET  std_vector.new_size, 1
    CALL std_vector.resize           ; vec = [10]

    CALL std_vector.front            ; R0 = 10

    HLT

Save as vector_demo.ua and run:

ua vector_demo.ua -arch x86 --run        # JIT on x86-64
ua vector_demo.ua -arch riscv -o out.bin  # Cross-compile for RISC-V
ua vector_demo.ua -arch mcs51 -o out.bin  # Cross-compile for 8051

Key differences from `std_array`

	`std_array`	`std_vector`
Size	Fixed (you set `size` once)	Dynamic (`push_back` / `pop_back` / `resize`)
Tracked state	None (stateless library)	`vec_size` is managed internally
Initialization	Just set `ptr` and `size`	Must call `clear` after setting `ptr` and `capacity`
Growth	N/A — size never changes	`push_back` appends; silently ignored if at capacity
Shrink	N/A	`pop_back` removes last; `resize` truncates

Safety notes

No bounds checking: at and set_at do not validate the index. Accessing out-of-range indices causes undefined behavior — just like C arrays.
Capacity is a hard limit: push_back silently does nothing when vec_size == capacity. The vector never reallocates (there is no heap).
pop_back on empty: Returns 0 instead of crashing. Check empty first if you need to distinguish zero from an actual element.
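
The notes above can be turned into two defensive habits: check `empty` before `pop_back`, and check `size_of` after a `push_back` that might have been dropped. The sketch below is illustrative rather than definitive. The buffer name `tiny` and the label `skip_pop` are hypothetical, and the register comments follow the behavior documented in the quick-reference table.

```
@IMPORT std_vector

BUFFER tiny, 2

JMP main

main:
    ; --- Setup: a deliberately small vector ---
    GET  R0, tiny
    SET  std_vector.ptr, R0
    SET  std_vector.capacity, 2
    CALL std_vector.clear            ; vec = []

    ; --- Guard pop_back: skip it when the vector is empty ---
    CALL std_vector.empty            ; R0 = 1 (empty)
    CMP  R0, 0
    JNZ  skip_pop                    ; R0 != 0 -> nothing to pop
    CALL std_vector.pop_back

skip_pop:
    ; --- Capacity is a hard limit ---
    SET  std_vector.value, 1
    CALL std_vector.push_back        ; vec = [1]
    SET  std_vector.value, 2
    CALL std_vector.push_back        ; vec = [1, 2]
    SET  std_vector.value, 3
    CALL std_vector.push_back        ; ignored: vec_size == capacity

    CALL std_vector.size_of          ; R0 = 2, so the third push was dropped

    HLT
```

Since `push_back` reports nothing when it is ignored, comparing `size_of` with `capacity_of` before pushing is the only way to know whether there is room.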

What to Read Next

Document	What It Covers
MVIS Opcodes Reference	Every MVIS instruction with syntax, behavior, and backend notes
Architecture-Specific Opcodes	Non-MVIS instructions: x86, 8051, ARM, RISC-V specifics
Language Reference	Complete syntax reference — registers, operands, string literals
Compiler Usage	CLI flags, output formats, build instructions
Architecture	Internal compiler pipeline and design
