C - Performance

LLVM IR Compiler

It generates LLVM IR. You can run all possible optimizations that llvm supports.

Legacy Compiler

Of course, there are some performance issues. Let's compare the performance of the C compiler and the Assert compiler using the factorial as an example.

#include <stdio.h>
#include <stdint.h>

int64_t factorial(int64_t x)
{
        if (x <= 1)
                return 1;
                
        return x * factorial(x - 1);
}

int main()
{
        printf("%ld", factorial(20));
        for (size_t i = 100000000; i > 0; i--)
                factorial(20);
        
        return 0;
}
$ g++ -o factorial_O0 -O0 factorial.cpp
$ g++ -o factorial_O1 -O1 factorial.cpp

Study the Assert code here:

dump factorial(x) {
        if (x <= 0)
                return 1;

        return x * factorial(x - 1);
}

dump main() {
        assert(x = 100000000);
        assert(out(factorial(20)));
        while (x > 0) {
                assert(factorial(20));
                assert(x = x - 1);
        }
        
        return 0;
}

Here is the C code and compiler optimization flags below:

$ ./tr factorial.ass factorial.tree
$ ./cum factorial.tree factorial.o
$ ld -o factorial factorial.o asslib.o /lib64/libc.so.6 -I/lib64/ld-linux-x86-64.so.2 

Linux perf unility gives the following results:

 Performance counter stats for './factorial':

          5 843,50 msec task-clock:u              #    0,999 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
                55      page-faults:u             #    9,412 /sec                   
    26 042 804 818      cycles:u                  #    4,457 GHz                    
    55 400 158 492      instructions:u            #    2,13  insn per cycle         
     6 500 036 070      branches:u                #    1,112 G/sec                  
       184 308 574      branch-misses:u           #    2,84% of all branches        

       5,848005098 seconds time elapsed

       5,841837000 seconds user
       0,000000000 seconds sys

Performance counter stats for './factorial_O0':

          5 839,94 msec task-clock:u              #    0,999 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
               114      page-faults:u             #   19,521 /sec                   
    26 115 713 798      cycles:u                  #    4,472 GHz                    
    26 203 004 347      instructions:u            #    1,00  insn per cycle         
     6 200 471 267      branches:u                #    1,062 G/sec                  
       220 171 781      branch-misses:u           #    3,55% of all branches        

       5,846114561 seconds time elapsed

       5,835352000 seconds user
       0,003325000 seconds sys
 Performance counter stats for './factorial_O1':

          4 124,80 msec task-clock:u              #    1,000 CPUs utilized          
                 0      context-switches:u        #    0,000 /sec                   
                 0      cpu-migrations:u          #    0,000 /sec                   
               113      page-faults:u             #   27,395 /sec                   
    18 127 811 023      cycles:u                  #    4,395 GHz                    
    19 803 003 158      instructions:u            #    1,09  insn per cycle         
     6 100 470 116      branches:u                #    1,479 G/sec                  
       400 124 522      branch-misses:u           #    6,56% of all branches        

       4,125413963 seconds time elapsed

       4,123597000 seconds user
       0,000000000 seconds sys

Performance is generally a complex thing. But we can conclude that the compiler generates code comparable to the -O0 g++ option.