Performance Benchmarks

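Nebula is engineered for high-performance execution. This page reports how its bytecode VM (--vm mode) compares with Python 3.11 on a set of microbenchmarks.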

Executive Summary

Benchmark	Nebula	Python 3.11	Speedup
Fibonacci(28)	0.05s	0.20s	4x
Sum Loop (1M)	0.15s	0.45s	3x
String Concat	0.08s	0.12s	1.5x
Matrix Math	0.22s	0.65s	3x

Methodology

All benchmarks were run on:

CPU: AMD Ryzen 7 5800X
RAM: 32GB DDR4
OS: Windows 11 / Linux (Ubuntu 22.04)
Nebula: v1.0.0 (--vm mode)
Python: 3.11.4

Each test was run 10 times; the median is reported.
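
The harness used for the Python side is not shown in this document. As a rough sketch of the "10 runs, median reported" procedure, assuming wall-clock timing with time.perf_counter (the names median_runtime and sum_loop are illustrative, not part of any Nebula tooling):

```python
import statistics
import time


def median_runtime(func, runs=10):
    """Call func `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def sum_loop(n=100_000):
    # Stand-in workload (hypothetical); swap in the benchmark you want to time.
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


print(f"sum_loop median: {median_runtime(sum_loop):.4f}s")
```

The median is less sensitive than the mean to a single slow run caused by background activity on the machine.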

Detailed Benchmarks

Fibonacci (Recursive)

Classic recursive implementation:

fn fib(n) do
    if n <= 1 do
        give n
    end
    give fib(n - 1) + fib(n - 2)
end

log(fib(28))

n	Nebula	Python	Speedup
25	0.01s	0.05s	5x
28	0.05s	0.20s	4x
30	0.12s	0.52s	4.3x
35	1.20s	5.40s	4.5x
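
The Python source used for the comparison is not included here. Assuming it mirrors the Nebula function line for line, it would look roughly like this:

```python
def fib(n):
    # Same naive double recursion as the Nebula version: no memoization,
    # so the number of calls grows exponentially with n.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(28))  # 317811
```

Keeping both versions unmemoized matters: the benchmark is meant to measure function-call and arithmetic overhead, not algorithmic cleverness.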

Loop Performance

Simple counter loop:

fn loop_test(n) do
    sum = 0
    i = 0
    while i < n do
        sum = sum + i
        i = i + 1
    end
    give sum
end

log(loop_test(1000000))

Iterations	Nebula	Python	Speedup
100K	0.03s	0.10s	3.3x
1M	0.15s	0.45s	3x
10M	1.50s	4.80s	3.2x

String Operations

String concatenation and manipulation:

fn string_test() do
    result = ""
    for i = 0, 10000 do
        result = result + "x"
    end
    give len(result)
end

Operations	Nebula	Python	Note
10K concat	0.08s	0.12s	String interning helps

Constant Folding

Static math expressions:

# These are computed at compile time
perm result = 2 + 3 * 4 - 1
perm circle = 3.14159 * 10 * 10

Expression	Nebula	Python
Static math	0.00s	0.01s

The Nebula compiler folds constant expressions, so runtime cost is zero.
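
Nebula's compiler internals are not shown here, so the following is only a sketch of the general technique, written in Python against Python's own ast module. It folds binary operations whose operands are both numeric literals and leaves everything else untouched; the ConstantFolder and fold_expression names are illustrative.

```python
import ast
import operator

# Operators this sketch knows how to fold.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        # Fold children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        fold = _FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            value = fold(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold_expression(source):
    """Parse one expression, fold its constant parts, and return the new source."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(tree)


print(fold_expression("2 + 3 * 4 - 1"))  # 13
print(fold_expression("x + 2 * 3"))      # x + 6
```

The second call shows the limit of the technique: anything that depends on a variable still has to be evaluated at run time.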

Why Nebula is Fast

1. NaN Boxing

All values fit in 64 bits. No heap allocation for primitives.

Number:  [64-bit IEEE 754 float]
Integer: [NaN-tagged 48-bit integer]
Boolean: [NaN-tagged single bit]
Pointer: [NaN-tagged 48-bit address]
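
The exact bit layout Nebula uses is not specified beyond the summary above. The sketch below shows one common way to do it, using Python integers as stand-ins for 64-bit words; the tag values and the choice of bits 48 to 50 for the tag are assumptions made for illustration.

```python
import struct

QNAN = 0x7FF8_0000_0000_0000           # exponent all ones plus the quiet bit
TAG_SHIFT = 48
PAYLOAD_MASK = (1 << 48) - 1
TAG_INT, TAG_BOOL, TAG_PTR = 1, 2, 3   # hypothetical tag values


def box_number(x):
    """A plain double is stored as its own IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_tagged(tag, payload):
    """Hide a 48-bit payload inside the mantissa of a quiet NaN."""
    return QNAN | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK)


def is_number(bits):
    # Anything that is not a quiet NaN carrying a nonzero tag is a double.
    return (bits & QNAN) != QNAN or ((bits >> TAG_SHIFT) & 0x7) == 0


def unbox(bits):
    if is_number(bits):
        return struct.unpack("<d", struct.pack("<Q", bits))[0]
    tag = (bits >> TAG_SHIFT) & 0x7
    payload = bits & PAYLOAD_MASK
    if tag == TAG_INT:
        # Sign-extend the 48-bit two's-complement payload.
        if payload & (1 << 47):
            payload -= 1 << 48
        return payload
    if tag == TAG_BOOL:
        return bool(payload & 1)
    return ("ptr", payload)


print(unbox(box_number(1.5)))                   # 1.5
print(unbox(box_tagged(TAG_INT, -42)))          # -42
print(unbox(box_tagged(TAG_BOOL, 1)))           # True
print(hex(box_tagged(TAG_PTR, 0xDEADBEEF)))     # 0x7ffb0000deadbeef
```

Because every value is a single 64-bit word, the VM stack can be a flat array and a type check is a mask and compare rather than a pointer dereference.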

2. Global Indexing

Global variables are resolved to array indices at compile time, so each access is a direct array read or write rather than a hash lookup:

# Source
x = 10

# VM sees
STORE_GLOBAL_0  # Direct array access
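
To illustrate the idea, here is a small sketch in which a compile step assigns each global name a slot number once and the run step only ever touches a list by index. The GlobalTable class and the tuple-based instruction format are assumptions for illustration, not Nebula's actual data structures.

```python
class GlobalTable:
    """Compile-time map from names to slots; the VM only sees the slots."""

    def __init__(self):
        self.slots = {}

    def slot_for(self, name):
        # The first use of a name allocates the next free slot.
        return self.slots.setdefault(name, len(self.slots))


def compile_assignments(assignments, table):
    """Turn (name, value) pairs into ("STORE_GLOBAL", slot, value) instructions."""
    return [("STORE_GLOBAL", table.slot_for(name), value)
            for name, value in assignments]


def run(code, n_globals):
    globals_array = [None] * n_globals
    for op, slot, value in code:
        if op == "STORE_GLOBAL":
            globals_array[slot] = value   # plain indexed store, no hashing at run time
    return globals_array


table = GlobalTable()
code = compile_assignments([("x", 10), ("y", 20), ("x", 30)], table)
print(code)                          # [('STORE_GLOBAL', 0, 10), ('STORE_GLOBAL', 1, 20), ('STORE_GLOBAL', 0, 30)]
print(run(code, len(table.slots)))   # [30, 20]
```

The name-to-slot dictionary exists only during compilation; the hashing cost is paid once per name instead of once per access.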

3. String Interning

Equality comparison of interned strings is O(1):

perm a = "hello"
perm b = "hello"
log(a == b)  # Pointer comparison, instant
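
A minimal sketch of how an intern table makes this possible, assuming a simple dictionary-backed design (the InternTable class is illustrative; Nebula's actual table is not described here):

```python
class InternTable:
    """Keeps one canonical object per distinct string value."""

    def __init__(self):
        self._strings = {}

    def intern(self, text):
        # Return the stored copy if this value was seen before,
        # otherwise store and return this one.
        return self._strings.setdefault(text, text)


table = InternTable()

# Build the same text twice at run time so the two objects start out distinct.
a = table.intern("".join(["hel", "lo"]))
b = table.intern("".join(["he", "llo"]))

print(a is b)   # True: both names refer to the single canonical object
```

The cost of hashing and comparing characters is paid once when a string is interned; after that, equality is an identity check on two references.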

4. Peephole Optimization

Redundant bytecode sequences are replaced with shorter equivalents:

LOAD_CONST 1    →    LOAD_CONST 3
LOAD_CONST 2
ADD
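
The rewrite above can be sketched as a single pass over the instruction list. This is a simplified illustration that assumes a tuple-based instruction format and ignores jump targets, which a real peephole pass must respect:

```python
def peephole(code):
    """Replace LOAD_CONST a; LOAD_CONST b; ADD with LOAD_CONST (a + b)."""
    out = []
    for instr in code:
        if (
            instr == ("ADD",)
            and len(out) >= 2
            and out[-1][0] == "LOAD_CONST"
            and out[-2][0] == "LOAD_CONST"
        ):
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("LOAD_CONST", left + right))
        else:
            out.append(instr)
    return out


print(peephole([("LOAD_CONST", 1), ("LOAD_CONST", 2), ("ADD",)]))
# [('LOAD_CONST', 3)]
```

Because the pass inspects the already-optimized output rather than the raw input, chains such as 1 + 2 + 4 collapse to a single constant in one sweep.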

5. Specialized Instructions

Common patterns have dedicated opcodes:

LOAD_LOCAL 0    →    LOAD_LOCAL_0 (single byte)
INC_LOCAL 0     (increment without load/store)
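
As a hedged illustration of how a compiler might select these opcodes, the sketch below rewrites a generic instruction list. It assumes that INC_LOCAL stands in for the four-instruction sequence LOAD_LOCAL n, LOAD_CONST 1, ADD, STORE_LOCAL n; that pattern and the STORE_LOCAL opcode name are assumptions, since this document only lists the resulting opcodes.

```python
def specialize(code):
    """Swap common generic sequences for dedicated opcodes."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        # Assumed pattern: LOAD_LOCAL n; LOAD_CONST 1; ADD; STORE_LOCAL n
        if (
            len(window) == 4
            and window[0][0] == "LOAD_LOCAL"
            and window[1] == ("LOAD_CONST", 1)
            and window[2] == ("ADD",)
            and window[3] == ("STORE_LOCAL", window[0][1])
        ):
            out.append(("INC_LOCAL", window[0][1]))
            i += 4
        elif code[i] == ("LOAD_LOCAL", 0):
            # Operand folded into the opcode, so no operand byte is needed.
            out.append(("LOAD_LOCAL_0",))
            i += 1
        else:
            out.append(code[i])
            i += 1
    return out


loop_step = [
    ("LOAD_LOCAL", 0), ("LOAD_CONST", 1), ("ADD",), ("STORE_LOCAL", 0),
    ("LOAD_LOCAL", 0),
]
print(specialize(loop_step))   # [('INC_LOCAL', 0), ('LOAD_LOCAL_0',)]
```

Fewer instructions means fewer trips through the VM's dispatch loop, which is often the dominant cost in a bytecode interpreter.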

Running Your Own Benchmarks

# Calls func `iterations` times and logs the total elapsed time for all calls.
fn benchmark(name, iterations, func) do
    start = clock()
    i = 0
    while i < iterations do
        func()
        i = i + 1
    end
    elapsed = clock() - start
    log(name, ":", elapsed, "ms")
end

benchmark("fib", 1000, fn() = fib(20))

Comparison with Other Languages

Language	Fib(30)	Relative to C
C (gcc -O3)	0.01s	1x
Rust	0.01s	1x
Nebula	0.12s	12x
Lua	0.18s	18x
Python	0.52s	52x
Ruby	0.65s	65x
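
The relative column is each language's time divided by the C time. The arithmetic can be reproduced from the table with a few lines of Python (the timings are copied from the table above; nothing here is newly measured):

```python
# Fib(30) timings in seconds, taken from the table above.
fib30_seconds = {
    "C (gcc -O3)": 0.01,
    "Rust": 0.01,
    "Nebula": 0.12,
    "Lua": 0.18,
    "Python": 0.52,
    "Ruby": 0.65,
}

baseline = fib30_seconds["C (gcc -O3)"]
relative = {lang: secs / baseline for lang, secs in fib30_seconds.items()}

for lang, factor in relative.items():
    print(f"{lang}: {factor:.0f}x")

# Nebula against Python on the same row of data.
print(f"Python / Nebula: {fib30_seconds['Python'] / fib30_seconds['Nebula']:.1f}x")  # 4.3x
```

With a 0.01s baseline reported to two decimal places, the ratios are coarse; they are best read as orders of magnitude rather than precise factors.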

In these benchmarks, Nebula runs 1.5x to 5x faster than Python 3.11 (and about 12x slower than C at -O3 on Fib(30)) while maintaining similar ease of use.