Performance Benchmarks

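Nebula is engineered for high-performance execution. This page compares its VM against Python 3.11 on a set of micro-benchmarks and describes the optimizations behind the results.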

Executive Summary

Benchmark        Nebula   Python 3.11   Speedup
Fibonacci(28)    0.05s    0.20s         4x
Sum Loop (1M)    0.15s    0.45s         3x
String Concat    0.08s    0.12s         1.5x
Matrix Math      0.22s    0.65s         3x

Methodology

All benchmarks were run on:

  • CPU: AMD Ryzen 7 5800X
  • RAM: 32GB DDR4
  • OS: Windows 11 / Linux (Ubuntu 22.04)
  • Nebula: v1.0.0 (--vm mode)
  • Python: 3.11.4

Each test was run 10 times, and the median is reported.
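
The run-10-times, report-the-median procedure can be sketched on the Python side as follows. This is a minimal illustration, not the harness used for the numbers on this page: the helper name `median_runtime` and the sample workload are assumptions.

```python
import statistics
import time

def median_runtime(func, runs=10):
    """Call func `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical workload: sum the first 100,000 integers.
elapsed = median_runtime(lambda: sum(range(100_000)))
print(f"median: {elapsed:.4f}s")
```

The median is less sensitive than the mean to a single slow run caused by background activity on the machine.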

Detailed Benchmarks

Fibonacci (Recursive)

Classic recursive implementation:

fn fib(n) do
    if n <= 1 do
        give n
    end
    give fib(n - 1) + fib(n - 2)
end

log(fib(28))

n     Nebula   Python   Speedup
25    0.01s    0.05s    5x
28    0.05s    0.20s    4x
30    0.12s    0.52s    4.3x
35    1.20s    5.40s    4.5x
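
The Python program used for this comparison is not shown here. A direct translation of the Nebula function above would presumably look like the following sketch, which is an assumption rather than the exact script that was measured.

```python
def fib(n):
    # Same naive double recursion as the Nebula version, with no memoization.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(28))  # 317811
```

Keeping the recursion naive matters: the benchmark measures function-call overhead, so a memoized or iterative version would measure something else.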

Loop Performance

Simple counter loop:

fn loop_test(n) do
    sum = 0
    i = 0
    while i < n do
        sum = sum + i
        i = i + 1
    end
    give sum
end

log(loop_test(1000000))

Iterations   Nebula   Python   Speedup
100K         0.03s    0.10s    3.3x
1M           0.15s    0.45s    3x
10M          1.50s    4.80s    3.2x

String Operations

Repeated string concatenation:

fn string_test() do
    result = ""
    for i = 0, 10000 do
        result = result + "x"
    end
    give len(result)
end

Operations   Nebula   Python   Note
10K concat   0.08s    0.12s    String interning helps

Constant Folding

Static math expressions:

# These are computed at compile time
perm result = 2 + 3 * 4 - 1
perm circle = 3.14159 * 10 * 10

Expression    Nebula   Python
Static math   0.00s    0.01s

The Nebula compiler folds constant expressions, so runtime cost is zero.
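
To illustrate the idea, here is a minimal constant folder written in Python. It is a sketch under stated assumptions: the nested-tuple expression format and the `fold` function are hypothetical and do not reflect Nebula's actual compiler internals.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Recursively replace operator nodes whose operands are both constants."""
    if not isinstance(node, tuple):
        return node  # a number literal or a variable name (string)
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)  # evaluated once, at compile time
    return (op, left, right)  # still depends on a variable, keep the node

# 2 + 3 * 4 - 1, parsed as ((2 + (3 * 4)) - 1)
print(fold(("-", ("+", 2, ("*", 3, 4)), 1)))  # 13
# x * (2 + 3): the variable stays, the constant part is folded
print(fold(("*", "x", ("+", 2, 3))))  # ('*', 'x', 5)
```

After folding, the compiler only has to emit a single constant load for the first expression.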

Why Nebula is Fast

1. NaN Boxing

Every value fits in a single 64-bit word, so primitives need no heap allocation.

Number:  [64-bit IEEE 754 float]
Integer: [NaN-tagged 48-bit integer]
Boolean: [NaN-tagged single bit]
Pointer: [NaN-tagged 48-bit address]
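
The layout above can be modeled in Python with the `struct` module. This is a sketch of the general NaN-boxing technique only: the tag values and their position (bits 48 to 50) are assumptions, not Nebula's actual encoding.

```python
import struct

QNAN = 0x7FF8_0000_0000_0000   # exponent all ones plus the quiet bit
TAG_SHIFT = 48
PAYLOAD_MASK = (1 << 48) - 1
TAG_INT, TAG_BOOL, TAG_PTR = 1, 2, 3   # hypothetical tag values

def box_float(x):
    """Reinterpret the double's 8 bytes as an unsigned 64-bit word."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_tagged(tag, payload):
    """Hide a 48-bit payload inside the unused bits of a quiet NaN."""
    return QNAN | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK)

def unbox(word):
    tag = (word >> TAG_SHIFT) & 0x7
    if (word & QNAN) != QNAN or tag == 0:
        # An ordinary double. A real VM would also canonicalize NaNs produced
        # by arithmetic so they can never collide with a tagged value.
        return struct.unpack("<d", struct.pack("<Q", word))[0]
    payload = word & PAYLOAD_MASK
    if tag == TAG_INT:
        # Sign-extend the 48-bit two's complement payload.
        return payload - (1 << 48) if payload >= (1 << 47) else payload
    if tag == TAG_BOOL:
        return bool(payload)
    return ("ptr", payload)

print(unbox(box_float(3.25)))            # 3.25
print(unbox(box_tagged(TAG_INT, -5)))    # -5
print(unbox(box_tagged(TAG_BOOL, 1)))    # True
```

Decoding needs only a mask and a compare, which is why type checks on boxed values are cheap.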

2. Global Indexing

Global variables are resolved to array indices at compile time, so each access is a direct array read or write rather than a hash lookup:

# Source
x = 10

# VM sees
STORE_GLOBAL_0  # Direct array access
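
A small Python model shows the split between compile-time name resolution and run-time slot access. The `GlobalTable` class, the tuple instruction format, and the helper names are hypothetical and only illustrate the technique.

```python
class GlobalTable:
    """Assigns each global name a fixed slot the first time it is seen."""

    def __init__(self):
        self.slots = {}    # name -> index, consulted only while compiling
        self.values = []   # index -> value, the only structure the VM touches

    def resolve(self, name):
        if name not in self.slots:
            self.slots[name] = len(self.values)
            self.values.append(None)
        return self.slots[name]

def compile_assign(table, name, constant):
    # The name is turned into a slot number here, once.
    return [("LOAD_CONST", constant), ("STORE_GLOBAL", table.resolve(name))]

def run(table, code):
    stack = []
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "STORE_GLOBAL":
            table.values[arg] = stack.pop()  # plain list index, no name hashing

table = GlobalTable()
code = compile_assign(table, "x", 10)
run(table, code)
print(code)          # [('LOAD_CONST', 10), ('STORE_GLOBAL', 0)]
print(table.values)  # [10]
```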

3. String Interning

Equality checks between interned strings are O(1), because equal strings share a single object:

perm a = "hello"
perm b = "hello"
log(a == b)  # Pointer comparison, instant
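
The mechanism can be sketched in Python with a dictionary acting as the intern pool. The `InternPool` class is a hypothetical illustration, not Nebula's implementation.

```python
class InternPool:
    """Keeps one canonical object per distinct string value."""

    def __init__(self):
        self._pool = {}

    def intern(self, text):
        # setdefault returns the stored object if an equal string already exists.
        return self._pool.setdefault(text, text)

pool = InternPool()
a = pool.intern("hello")
b = pool.intern("".join(["hel", "lo"]))  # built at run time, a separate object until interned
print(a is b)  # True
```

The cost moves to string creation, where each new string is hashed and looked up once. Every later equality check is an identity comparison that does not depend on the string's length.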

4. Peephole Optimization

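Redundant bytecode sequences are replaced with cheaper equivalents. For example, an addition of two constants collapses into a single load: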

LOAD_CONST 1    →    LOAD_CONST 3
LOAD_CONST 2
ADD
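
A peephole pass for this one pattern might look like the sketch below. The tuple instruction format and the `peephole` function are assumptions for illustration; a real pass would also have to adjust jump targets when instructions are removed.

```python
def peephole(code):
    """Fold LOAD_CONST a, LOAD_CONST b, ADD into a single LOAD_CONST a+b."""
    out = []
    for instr in code:
        out.append(instr)
        # Inspect the three most recently emitted instructions.
        if (len(out) >= 3
                and out[-1] == ("ADD",)
                and out[-2][0] == "LOAD_CONST"
                and out[-3][0] == "LOAD_CONST"):
            folded = out[-3][1] + out[-2][1]
            out[-3:] = [("LOAD_CONST", folded)]
    return out

print(peephole([("LOAD_CONST", 1), ("LOAD_CONST", 2), ("ADD",)]))
# [('LOAD_CONST', 3)]
```

Because the check runs against the already rewritten output, chains such as 1 + 2 + 3 fold step by step into one constant.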

5. Specialized Instructions

Common patterns have dedicated opcodes:

LOAD_LOCAL 0    →    LOAD_LOCAL_0 (single byte)
INC_LOCAL 0     (increment without load/store)
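
As a rough sketch, a compiler pass could select these opcodes as shown below. The four-instruction increment pattern, the cutoff of four dedicated `LOAD_LOCAL_n` opcodes, and the `specialize` function are all assumptions, since the exact rules Nebula applies are not described here.

```python
def specialize(code):
    """Rewrite generic instruction sequences into dedicated opcodes."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "LOAD_LOCAL"
                and window[1] == ("LOAD_CONST", 1)
                and window[2] == ("ADD",)
                and window[3] == ("STORE_LOCAL", window[0][1])):
            # x = x + 1 on a local becomes one instruction.
            out.append(("INC_LOCAL", window[0][1]))
            i += 4
        elif code[i][0] == "LOAD_LOCAL" and code[i][1] < 4:
            # Assumed: slots 0 to 3 have operand-free, single-byte opcodes.
            out.append((f"LOAD_LOCAL_{code[i][1]}",))
            i += 1
        else:
            out.append(code[i])
            i += 1
    return out

program = [("LOAD_LOCAL", 0), ("LOAD_CONST", 1), ("ADD",), ("STORE_LOCAL", 0),
           ("LOAD_LOCAL", 1)]
print(specialize(program))  # [('INC_LOCAL', 0), ('LOAD_LOCAL_1',)]
```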

Running Your Own Benchmarks

fn benchmark(name, iterations, func) do
    start = clock()
    i = 0
    while i < iterations do
        func()
        i = i + 1
    end
    elapsed = clock() - start
    log(name, ":", elapsed, "ms")
end

benchmark("fib", 1000, fn() = fib(20))

Comparison with Other Languages

Language      Fib(30)   Relative to C
C (gcc -O3)   0.01s     1x
Rust          0.01s     1x
Nebula        0.12s     12x
Lua           0.18s     18x
Python        0.52s     52x
Ruby          0.65s     65x

On the benchmarks above, Nebula runs between 1.5x and 5x faster than Python 3.11 while maintaining similar ease of use.