Turing Garbage Collection Proof of Concept

TheCool1Kevin · **Posted:** Mon Jul 02, 2018 10:21 pm

Garbage Collection PoC

A few whiles back, I started on my Game Engine for Turing, along with a preliminary 2D Physics Engine. Little beknownst to me, OpenTuring is an object-oriented programming language, which meant that I could store vectors in objects instead of separate components. Wow! Coming from a background of C# programming, I did not understand the concept of "clean up your own trash," and found "free after use" quite irritating and annoying. So, my research began. I stumbled upon the work of Foundry and his work runtime manipulations of the Turing opcodes were simply amazing. And along with this, did the possibility and idea of a garbage collector begin to form in my head.

Before we begin I'll assume you have a solid grasp of object-oriented concepts, and a solid understanding of pointers and references. This article aims to demonstrate the furthest limits of the Turing language, with no practical goal in mind.

Introduction and the Problem

Say you want to allocate an object. In C, you would do something like this:

c:

vec2_t* myVector = (vec2_t *) malloc(sizeof(vec2_t));

Now, running the following code, you'll realize that the program will quickly gobble up a lot of memory, leading to an eventual crash:

c:

while(true)
vec2_t* myVector = (vec2_t *) malloc(sizeof(vec2_t));

However, in Java or C#, the same code doesn't crash the program - why?

C#:

while(true)
Vector2 myVector = new Vector2();

In C, everytime the loop runs, the program allocates some space on the heap, and when the heap runs out of memory, it requests some more from the operating system, and when the operating system runs out - well, the program crashes (process a little simplified). This means that with every malloc you call, you need to call free, making development quite difficult. Imagine writing a complicated program (say a game engine), and you forget to free some object in the game loop... good luck figuring out where the memory leak is coming from!
Which is why garbage collection exists - for idiots like me (and maybe you). The term garbage refers to objects that are still allocated in memory but are not in use by the program.

C#:

A objA = new A();
objA = null;

Above is a simple example of garbage. The program creates a new object objA, and then proceeds set it to null. Thus, after the second line, the object allocated on the first line becomes garbage as it can no longer be accessed through the reference pointer objA. A garbage collector is a form of automatic memory management, and the collector attempts to reclaim memory occupied by garbage. Sounds simple, doesn't it? Nope.

The Solution

The solution is to find out when and where garbage occurs. Then we can mark and garbage, and sweep it right into the trash can. This is rightfully named the mark and sweep algorithm.
Before we even start marking the garbage, we need to keep track of every single object that we create. This, unfortunately, is not viable without a custom object creation routine. The PoC code provides a way of doing this through the New procedure.

Turing:

proc New(adr : addressint, obj : cheat addressint)

Essentially to create any object, one would:

Turing:

var myInstance : unchecked ^MyClass New(addr(myInstance), MyClass)

Which is significantly longer than the default:

Turing:

var myInstance : ^MyClass new myInstance

Note the usage of the unchecked pointer.
The PoC creation routine injects code at runtime to create a new object (as Turing provides no way of doing it properly), and it registers it with the garbage collector by calling GC.

There are 2 checks the GC will perform to determine whether a pointer has become garbage or not.

Turing:

proc TryMarkGarbage(idx : int)

Turing:

proc MarkStack(ptr : int)

Note that these are never to be called by the user from outside the Objects module.
The first check is rather simple. It checks for whether an object pointer's value has changed from its value assigned during the object's creation. If so, then we assume the original object reference to be lost, and thus the object has become garbage. This is demonstrated in our previous example with objA. Essentially:

Turing:

var myInstance : unchecked ^MyClass New(addr(myInstance), MyClass)
New(addr(myInstance), MyClass)

After line 1, an object will be allocated to the heap, and its address will be stored in myInstance. After the second line, a new object is created, and its reference will be stored in myInstance, thus the first reference is lost, and the object becomes garbage.
The second check is slightly more complicated. It exploits the fact that variable within procedures and functions are stored on the stack in Turing. To demonstrate:

Turing:

proc A()
var foo : int := 0
put "Lower address: ", addr(foo)
end A
proc B()
A()
var bar : int := 0
put "Higher address: ", addr(bar)
end B
var baz : int := 0
B()

As can be seen, addr(foo) will be at a lower address than addr(bar) (since x86 stacks grow downwards instead of upwards), which means that if a variable is at a higher stack frame, then it is inaccessible. However, to exploit this property, we must determine what the absolute top of the stack (i.e., lowest possible stack address) is so we don't accidentally free a potentially in use variable in the data section of our program. We can approximate this with a variable directly before a function call, like baz in this case. Thus given two pointers, ptr1, ptr2, if both are higher than the stack_top, and ptr1 < ptr2, then ptr1 is garbage, else ptr2 is garbage. Well, basically anyways.

With the inner workings explained, I can show you the code and explain the demo:
https://gist.github.com/TheCool1Kevin/58e870a5a7b0f8868e5c8304125c9da3

The Demo and PoC

Due to the quirks of the stack-based GC, the majority of the code must be encapsulated within a Main procedure. This shouldn't be much of a burden. The basic skeleton of Main.tu is as follows:

Turing:

import Objects in "Objects.tu"
var stack_lower_limit : int % Required, don't move line location!
proc Main()
Init(addr(stack_lower_limit)) % Init GC
% Main code goes below
% blah blah blah...
end Main
Main()

One new concept is the Objects.SweepAll() call. This essentially performs a global sweep of all the variables. Usually, with Java and C#, there would be a GC worker thread that carries out the global sweep every so often, however, since Turing sucks, we'll have to manually invoke it. Due to the annoyance of manually invoking stuff, I've tried my best to avoid the usage of the Objects.SweepAll() call. Tests that do not rely on the Objects.SweepAll() call will say "no sweep."
Upon running, I am aware that there is a single object left in the heap. The GC will not clear every single object from the heap, only the majority. In the demo Main.tu, there is one stray object left in the heap because MarkStack() is only called when New() is called, and thus the stack does not get cleared, as no new object is created. To enable GC in your project, you would simply download the gist posted above as a zip file, and extract it to your project folder. From there, you can use the Objects.tu module to create, and free objects. Please note that your main program must follow the template above, and you must use the custom object creation API provided.

Edit: Added last test case to prove I'm not a fraud

Deployment and Limitations

In order for this library to be usable, you would have to implement the one very important missing feature. Let's demonstrate:

Turing:

var myInstance1 : unchecked ^MyClass New(addr(myInstance1), MyClass)
var myInstance2 := myInstance1
myInstance1 := nil
Objects.MarkAndSweep()
myInstance2 -> doStuff()

This code will fail because myInstance2 will be referring to a freed object. In order to make this a viable library, we would have to keep track of object references AND pointers references. So, we would have to keep track of myInstance2 and myInstance1. One limitation would be that instead of allowing var myInstance2: = myInstance1, we would have to do something like:

Turing:

var myInstance2 : unchecked^ MyClass Equate(myInstance2, myInstance1)

Which gets infinitly more complex when dealing with returning pointers and passing pointers as method parameters. This is probably why a GC in Turing cannot truely be possible.
The End
Again, kudos to Foundry for their work on Opcodes.tu Head Bang

Cheers, Kevin. So cool

	Computer Science Canada Programming C, C++, Java, PHP, Ruby, Turing, VB Username: Password: Register
Wiki Blog Search Turing Chat Room Members