Assert Programming Language

by Denis Dedkov

This version of the text assumes you’re using a stable build of the Assert language.

Introduction

"At least someone will teach them how to write the words assert and dump in their code..."

Who Assert Is For

Assert is ideal for many people for a variety of reasons. The community consisting of me is very hospitable and happy to answer any questions. Hundreds of companies, large and small, don't use Assert in production.

Source Code

The source files from which this book is generated can be found on GitHub.

Getting Started

In this chapter, we’ll discuss:

  • Installing the Assert compilers on Linux
  • Writing a Hello, world! example
  • Linking the compiled binary with ld

Installation

The following steps install the latest stable version of the Assert compiler. I can't guarantee that every example in the book that compiles now will continue to compile with newer compiler versions. The output may also differ significantly between versions, because I can change everything at any moment.

Command Line Notation

In this chapter, we’ll show some commands used in the terminal. Lines that you should enter in a terminal all start with $. You don’t need to type in the $ character; it indicates the start of each command. Lines that don’t start with $ typically show the output of the previous command.

Installing compilers on Linux

Clone the official Assert repository:

$ git clone --recurse-submodules https://github.com/d3phys/assert-lang.git
Cloning into 'assert-lang'...
$ cd assert-lang

Check out the stable branch (git creates a local branch that tracks origin/stable, so git pull works later):

$ git checkout stable

Build sources:

$ make
...
Assert language is compiled now!
Read: https://d3phys.github.io/assert-book/

Now you have four applications in your current folder:

  • ./tr - front end: compiles source code to an AST.
  • ./cum - back end: compiles an AST for AMD x86-64.
  • ./cum-llvm - back end: compiles an AST to LLVM IR.
  • ./rev - front end: decompiles an AST back to source code.

You also have the standard library as relocatable object files: asslib.o and asslib-llvm.o.

See the Hello, World! example to learn how to use them.

Updating and Uninstalling

After you’ve installed Assert via git, updating to the latest version is easy. Pull the updates and run make again.

$ git pull
$ make

To uninstall Assert, remove the local assert-lang repository from your PC:

$ ls
... ... assert-lang ... ...
$ rm -R assert-lang

Hello, World!

Now that you’ve installed the Assert compilers, let’s write your first Assert program. It’s traditional when learning a new language to write a little program that prints the text Hello, world! to the screen, but we can't write it because the Assert language does not support string literals :D

That's why we will write a program that prints 448378203247, or 0x68656c6c6f, which is hello encoded as ASCII bytes. Strictly speaking, we should print 0x6f6c6c6568 (or olleh), because that is the integer you get when the bytes of hello are stored in little-endian order.
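
The arithmetic above can be checked with a short Python sketch. Python is used here only as a calculator, and the variable names are illustrative. It packs the ASCII bytes of hello into an integer in both byte orders.

```python
# Pack the ASCII bytes of "hello" into an integer in both byte orders.
text = b"hello"
big = int.from_bytes(text, "big")        # 'h' is the most significant byte
little = int.from_bytes(text, "little")  # 'h' is the least significant byte

print(big, hex(big))   # 448378203247 0x68656c6c6f
print(hex(little))     # 0x6f6c6c6568

# Reading the little-endian value from its most significant byte down
# spells the string backwards.
print(little.to_bytes(5, "big").decode("ascii"))  # olleh
```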

Create and edit the hello.ass file:

$ touch hello.ass
$ vi hello.ass

Write the following code:

dump main()
{
        assert(out(448378203247));
        return 0;
}

Now let's compile the hello.ass file. First, we have to compile it to an AST:

$ ./tr hello.ass hello.tree

Now you have to choose an appropriate back-end compiler.

Legacy Compiler

To use the legacy compiler, compile the binary with ./cum:

$ ./cum hello.tree hello.o

hello.o is an ordinary relocatable object file. We can analyze it with readelf or objdump:

$ readelf -W -a hello.o

Now let's link the hello.o relocatable object file. Note that the ld arguments and the dynamic linker path may differ on your system:

$ ld -o hello hello.o asslib.o /lib64/libc.so.6 -I/lib64/ld-linux-x86-64.so.2

Note! You can also link the standard Assert library dynamically:

$ ld -o hello hello.o asslib.so /lib64/libc.so.6 -I/lib64/ld-linux-x86-64.so.2

But don't forget to make asslib.so findable by the dynamic linker at run time, for example by adding its directory to LD_LIBRARY_PATH.

LLVM IR compiler

To use the LLVM compiler, run the following command:

$ ./cum-llvm hello.tree hello.ir

hello.ir is the generated LLVM IR. Possible output is listed below:

$ cat hello.ir
; ModuleID = 'hello.ir'
source_filename = "hello.ir"

define void @__ass_globals_init() {
.entry:
  ret void
}

declare i64 @__ass_print(i64)

declare i64 @__ass_scan()

define i64 @main() {
.entry:
  %0 = call i64 @__ass_print(i64 448378203247)
  ret i64 0
}

Then you can compile it to an object file with llc (llc emits assembly by default, so -filetype=obj is required):

$ llc -filetype=obj -O2 hello.ir -o hello.o

Next, link hello.o with the standard Assert library:

$ gcc hello.o asslib-llvm.o -o hello

Running the program

That's it! We can run the ./hello program:

$ ./hello
448378203247

Common Programming Concepts

In this chapter, you’ll learn about variables, basic types, functions, comments, and control flow. These basics will give you a strong starting point.

Assertion Failed!

The Assert language has one distinctive feature: the assert keyword.

Read about the assert keyword.

Assert

The assert keyword is the best feature of the language. Each statement must be wrapped in assert(). Otherwise, the statement is silently disabled: the compiler gives no notice, and the statement never runs.

dump main()
{
        assert(x = 2);
        assert(x = x * 20);
        return x;
}

Let's study an example. Consider the code below, where the assignment is not wrapped in assert:

dump main()
{
        x = 2;
        return 0;
}

After compilation it will look like this (again without any notice):

dump main()
{
        while (0) {
                x = 2;
        }
        
        return 0;
}

Why not just remove this code?

The answer is simple: removing it is less convenient than it sounds. Study the code below:

dump example() {
      if (x > 0)
              x = 10;
}

If we just remove x = 10; the if is left without a body, which results in a compilation error:

dump example() {
      if (x > 0)
}

Data Types

Every value in Assert is a signed 64-bit integer, so formally Assert is a statically typed language. Numbers can still be used as booleans: the language normalizes the result of logical operations to 0 or 1. For example:

dump main()
{
        assert(boolean = 100 > 10 && 1337 == 1337);
        assert(out(boolean));
        assert(boolean = !boolean);
        assert(out(boolean));
        return 0;
}

$ ./boolean
1
0
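
The normalization described above can be modeled with a small Python sketch. The helper names are hypothetical, and treating every nonzero integer as true is an assumption, suggested by loops such as while (times) elsewhere in this book.

```python
def logical_and(a, b):
    """Logical AND whose result is normalized to 0 or 1."""
    return int(a != 0 and b != 0)


def logical_not(a):
    """Logical NOT whose result is normalized to 0 or 1."""
    return int(a == 0)


# Mirrors: boolean = 100 > 10 && 1337 == 1337
boolean = logical_and(int(100 > 10), int(1337 == 1337))
print(boolean)  # 1

# Mirrors: boolean = !boolean
boolean = logical_not(boolean)
print(boolean)  # 0

# Nonzero operands are treated as true, and the result is still 0 or 1.
print(logical_and(7, -3))  # 1
```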

Arrays

Assert supports arrays with a simple syntax. You can't declare an array, but you can access any element like this:

assert(arr[10] = 101);

The first time you access an element, the compiler allocates the minimum possible memory for the array in the stack frame or in the bss section (read Scoping Rules). Study the example below:

assert(GLOBAL[12] = 0);
dump main()
{
        assert(local[3] = 1);
        return 0;
}

  • assert(GLOBAL[12] = 0); creates a 13-element array and sets the element at index 12 to 0.
  • assert(local[3] = 1); creates a 4-element array and sets the element at index 3 to 1.
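
As a rough Python model of that sizing rule (the function name is hypothetical, and the real compiler logic may differ): the smallest array that covers an access at index i needs i + 1 elements.

```python
def min_array_length(indices):
    """Smallest element count that covers every index accessed."""
    return max(indices) + 1


print(min_array_length([12]))       # 13, as for GLOBAL[12]
print(min_array_length([3]))        # 4, as for local[3]
print(min_array_length([0, 3, 1]))  # 4, only the highest index matters
```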

Overflow control

The language does not check for integer overflow in any way.
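
The book does not say what a program observes when a value leaves the signed 64-bit range. The Python sketch below assumes two's-complement wraparound, which is what x86-64 integer instructions produce; the LLVM back end may behave differently. The helper names are hypothetical.

```python
def wrap_i64(value):
    """Reduce an integer to the signed 64-bit range (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def factorial(x):
    """Factorial computed with 64-bit wraparound after every multiplication."""
    result = 1
    for k in range(2, x + 1):
        result = wrap_i64(result * k)
    return result


print(factorial(20))  # 2432902008176640000, still fits in 64 bits
print(factorial(21))  # -4249290049419214848, wrapped around
```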

Functions

You’ve already seen one of the most important functions in the language: the main function, which is the entry point of all programs. You’ve also seen the dump keyword, which allows you to declare new functions.

Note! Every function must have a return statement. A return does not have to be wrapped in assert. Read about the assert keyword.

dump main()
{
        assert(function());
        assert(return 0);
}

dump function()
{
        assert(out(101));
        assert(return 10);
}

$ ./bin
101

Parameters

We can define functions to have parameters, which are special variables that are part of a function’s signature.

dump main()
{
        return print(101, 4);
}

dump print(val, times)
{
        while (times) {
                assert(out(val));
                assert(times = times - 1);
        }

        return 0;
}

$ ./bin
101
101
101
101

Recursion

Recursive calls are supported. See the Factorial example.

Control Flow

The most common constructs that let you control the flow of execution of Assert code are if expressions and while loops.

if Expression

An if expression allows you to branch your code depending on conditions. Study the example:

dump main()
{
        assert(out(compare(101)));
        assert(out(compare(99)));
        return 0;
}

dump compare(val)
{
        if (val > 100) {
                return 1;
        } else {
                return 0;
        }
}

$ ./bin
1
0

Note! You can skip the braces when an if or while block contains a single statement. For example:

dump compare(val)
{
       if (val > 100)
               return 1;
       else
               return 0;
}

The above is definitely more compact.

Let's consider other (more beautiful) ways to write the compare function.

dump compare(val)
{
        if (val > 100)
                return 1;
                
        return 0;
}

dump compare(val)
{
        return val > 100;
}

And finally:

dump compare(val)
        return val > 100;

Handling Multiple Conditions with else if

You can't chain multiple conditions with else if. But you can still write your code smarter.

Conditional Loops with while

A program will often need to evaluate a condition within a loop. While the condition is true, the loop runs. Study the example:

dump main() 
{
        assert(i = 0);
        while (i < 4) {
                assert(out(i));
                assert(i = i + 1);
        }       

        return 0;
}

$ ./bin
0
1
2
3

Scoping Rules

Assert scoping rules are pretty straightforward:

Global overrides local.

The main reason is that the Assert AST does not support variable declarations. Let's study the example below:

assert(GLOBAL = 228);
dump main()
{
        assert(out(GLOBAL));
        
        assert(GLOBAL = 1337);
        assert(out(GLOBAL));
        return 0;
}

$ ./bin
228
1337

Memory allocation

The Assert compiler ./cum allocates:

  • Local variables in the stack frame.
  • Global variables in the bss segment.

Examples

This chapter is a collection of runnable examples that illustrate various Assert concepts and standard library keywords. You can find the complete code in assert-lang/examples.

Factorial

dump factorial(x) {
        if (x <= 1)
                return 1;
                
        return factorial(x - 1) * x;        
}

dump main() {
        assert(out(factorial(in())));
        return 0;
}

$ ./bin
10
3628800

Quadratic Equation

You can find the complete code in the Assert examples. Here I will comment on some of the tricks that I used when writing the code. Because the language has no real-number (floating-point) types, I had to write an integer square root function, sqrt.

dump sqrt(n) 
{
        if (n <= 0)
                return 0;
                
        assert(sol = -404);
        assert(x   =  n/2 || 1);

        while (x != sol && x != sol + 1) {
                assert(sol = x);
                assert(x = (x + n/x) / 2);
        }

        return sol;
}
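
For comparison, here is a textbook integer Newton iteration written as a Python sketch. It is not a translation of the Assert code above: the starting value and the stopping rule are different, and the function name is hypothetical. It returns the floor of the square root and is checked against Python's math.isqrt.

```python
import math


def isqrt_newton(n):
    """Floor of the square root of n, using Newton's iteration on integers."""
    if n <= 0:
        return 0
    x = n                 # start at or above the true root
    y = (x + 1) // 2      # first Newton step from x = n
    while y < x:          # estimates decrease until they reach the floor
        x = y
        y = (x + n // x) // 2
    return x


for n in (0, 1, 4, 15, 16, 10**12):
    print(n, isqrt_newton(n), math.isqrt(n))
```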

Appendix

The following sections contain reference material you may find useful.

A - Keywords

The following list contains keywords that are reserved for current use by the Assert language. As such, they cannot be used as identifiers, including names of functions, variables, parameters:

  • assert - wrap a statement so that it is executed
  • dump - define a function or the function pointer type
  • if - branch based on the result of a conditional expression
  • else - fallback for if
  • while - loop conditionally based on the result of an expression
  • in - standard input (asslib)
  • out - standard output (asslib)
  • inv - define a constant

B - Grammar

Assert uses a context-free grammar (CFG). The notation is a mixture of EBNF and my own preferences, and is described below. You can find the full Assert language grammar here: assert-lang/grammar.

Usage            Notation
definition       ->
concatenation    ,
termination      ;
alternation      |
optional         [ ... ]
repetition       { ... }
grouping         ( ... )
terminal string  " ... "

C - Abstract Syntax Tree

The AST was created for cross-compilation with other stupid languages. That's why there are so many strange decisions. You can study the AST standard here.

Other cross compilation language projects are listed below:

D - Performance

LLVM IR Compiler

This back end generates LLVM IR, so you can run any optimization that LLVM supports on the result.

Legacy Compiler

Of course, there are some performance issues. Let's compare the performance of the C compiler and the Assert compiler using factorial as an example. Here is the C code, followed by the compiler optimization flags:

#include <stdio.h>
#include <stdint.h>

int64_t factorial(int64_t x)
{
        if (x <= 1)
                return 1;
                
        return x * factorial(x - 1);
}

int main()
{
        printf("%ld", factorial(20));
        for (size_t i = 100000000; i > 0; i--)
                factorial(20);
        
        return 0;
}

$ g++ -o factorial_O0 -O0 factorial.cpp
$ g++ -o factorial_O1 -O1 factorial.cpp

Study the Assert code here:

dump factorial(x) {
        if (x <= 0)
                return 1;

        return x * factorial(x - 1);
}

dump main() {
        assert(x = 100000000);
        assert(out(factorial(20)));
        while (x > 0) {
                assert(factorial(20));
                assert(x = x - 1);
        }
        
        return 0;
}

The Assert code is compiled and linked as follows:

$ ./tr factorial.ass factorial.tree
$ ./cum factorial.tree factorial.o
$ ld -o factorial factorial.o asslib.o /lib64/libc.so.6 -I/lib64/ld-linux-x86-64.so.2 

The Linux perf utility gives the following results:

 Performance counter stats for './factorial':

          5 843,50 msec task-clock:u              #    0,999 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
                55      page-faults:u             #    9,412 /sec                   
    26 042 804 818      cycles:u                  #    4,457 GHz                    
    55 400 158 492      instructions:u            #    2,13  insn per cycle         
     6 500 036 070      branches:u                #    1,112 G/sec                  
       184 308 574      branch-misses:u           #    2,84% of all branches        

       5,848005098 seconds time elapsed

       5,841837000 seconds user
       0,000000000 seconds sys

Performance counter stats for './factorial_O0':

          5 839,94 msec task-clock:u              #    0,999 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
               114      page-faults:u             #   19,521 /sec                   
    26 115 713 798      cycles:u                  #    4,472 GHz                    
    26 203 004 347      instructions:u            #    1,00  insn per cycle         
     6 200 471 267      branches:u                #    1,062 G/sec                  
       220 171 781      branch-misses:u           #    3,55% of all branches        

       5,846114561 seconds time elapsed

       5,835352000 seconds user
       0,003325000 seconds sys

 Performance counter stats for './factorial_O1':

          4 124,80 msec task-clock:u              #    1,000 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
               113      page-faults:u             #   27,395 /sec                   
    18 127 811 023      cycles:u                  #    4,395 GHz                    
    19 803 003 158      instructions:u            #    1,09  insn per cycle         
     6 100 470 116      branches:u                #    1,479 G/sec                  
       400 124 522      branch-misses:u           #    6,56% of all branches        

       4,125413963 seconds time elapsed

       4,123597000 seconds user
       0,000000000 seconds sys

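Performance is a complex topic in general. Still, the numbers above suggest that the legacy compiler generates code whose run time is comparable to g++ -O0 (about 5.8 seconds in both runs), even though it executes roughly twice as many instructions (55.4 billion versus 26.2 billion).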
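
The derived figures in the perf output can be recomputed from the raw counters. The Python sketch below copies the cycle, instruction, and elapsed-time values from the three reports above; the dictionary layout and names are only illustrative.

```python
# (cycles, instructions, elapsed seconds) copied from the perf reports above.
runs = {
    "factorial":    (26_042_804_818, 55_400_158_492, 5.848005098),
    "factorial_O0": (26_115_713_798, 26_203_004_347, 5.846114561),
    "factorial_O1": (18_127_811_023, 19_803_003_158, 4.125413963),
}

# Instructions per cycle, the "insn per cycle" column in perf.
ipc = {name: insns / cycles for name, (cycles, insns, _) in runs.items()}
for name, value in ipc.items():
    print(f"{name}: {value:.2f} instructions per cycle")

# How many more instructions the Assert binary executes than the -O0 binary.
ratio = runs["factorial"][1] / runs["factorial_O0"][1]
print(f"instruction ratio, Assert vs -O0: {ratio:.2f}")
```

The Assert binary retires about twice as many instructions as the -O0 binary but also about twice as many per cycle, which is why the elapsed times end up nearly equal.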

E - Supported Architectures

LLVM IR Compiler

LLVM IR is a platform-independent intermediate representation that can be used to represent code for any target architecture that is supported by LLVM.

Legacy Compiler

The following list contains supported architectures:

  • amd64 - also known as x86-64, Intel 64, or EM64T.

F - Standard Library

The Assert Standard Library (or asslib) is an interface between the abstract Assert language and the operating system. It implements the keywords that need operating system services, such as in and out.

If you dump a compiled file, you can see the standard library symbol names in the ELF .symtab section (entries 6 and 7 in the output below):

$ readelf -s compiled.o

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello.o
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 .rodata
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 .data
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __ass_print
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __ass_scan
     8: 0000000000000038     0 NOTYPE  GLOBAL DEFAULT    1 _start
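
To pick the unresolved standard library symbols out of such a listing automatically, a small Python sketch could filter the rows by their Bind and Ndx columns. The function name is hypothetical, and the sample text is copied from the last four rows of the table above.

```python
SYMTAB = """\
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __ass_print
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __ass_scan
     8: 0000000000000038     0 NOTYPE  GLOBAL DEFAULT    1 _start
"""


def undefined_globals(symtab_text):
    """Return names of GLOBAL symbols whose section index is UND."""
    names = []
    for line in symtab_text.splitlines():
        fields = line.split()
        # Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name
        if len(fields) == 8 and fields[4] == "GLOBAL" and fields[6] == "UND":
            names.append(fields[7])
    return names


print(undefined_globals(SYMTAB))  # ['__ass_print', '__ass_scan']
```

These undefined symbols are the ones the linker resolves against asslib.o or asslib-llvm.o.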

Sources

You can find the source code in the official Assert repository:

  • asslib.s - standard library implementation for legacy backend
  • asslib-llvm.c - standard library implementation for llvm backend
  • STDLIB - ELF64 configuration file.