Data Oriented Programming and Recursion

...and other lose thoughts about how to structure code. This is sort of a continuation to the page about global variables. After writing it I got to thinking about how I wanted to improve code structure. As previously stated I wanted to move away from having true globals and towards statics with getters and setters.
Though it went deeper than that, structs were also something I had problems with. When storing character data I'd have a global array of structs that contained all the data for a single character. So for example you'd have:

struct actor {
	double xPos, yPos;
	double xSpeed, ySpeed;
	int health;

And then an array storing these structs, most often as an array of pointers which we'll come back to.
Just using getter and setter I felt didn't do enough for code readability and at the same time I came across Data Oriented Programming in which instead of an array of structs you create a single struct containing arrays of the data. In C this would look like:

struct actors {
	double *xPos, *yPos;
	double *xSpeed, *ySpeed;
	int *health;

When adding a new actor you'd then realloc the arrays and put the values for the latest actor at the back of each. The main reason DOP puts forth for doing this is that all similar data is contiguous in memory. So all x positions come after each other in one long array as does all the speed and health data in their own arrays. This is supposed to help with cache misses since the processor can now preload the next number in the array and be sure it'll be used next.
Contrast this to having the data in individual structs. You could store it as an array of actual structs instead of pointers in which case the memory would be contiguous from one struct to the next but the data would still be mixed since you have doubles, ints, bools and chars in a single struct those would all come after each other in memory and you wouldn't have the clean arrays with a single data type. Also the most natural thing to do code wise is to have a constructor that you call when you need a new actor struct which then creates the struct on the heap and returns a pointer to it. So what you usually end up with is an array of struct pointers and not an array of structs.
I don't know about the performance benefits of DOP but to me it look esthetically pleasing so I thought I'd give it a try. I decided to get rid of the structs completely and just use the code files for encapsulation, grouping the related arrays into the same file.
My actor.c file then starts off something like this:

static int g_nbrActors = 0;
static double *g_xPos = NULL;
static double *g_yPos = NULL;
static double *g_xSpeed = NULL;
static double *g_ySpeed = NULL;

void actor_add( const double xPos, const double yPos )
	const int i = g_nbrActors;

	g_xPos = realloc( g_xPos, sizeof( double ) * g_nbrActors );
	g_xPos[ i ] = xPos;


Consider what this does to the way you handle the data for each actor. When using structs you'd have a function that took a pointer to a single struct and then performed all the calculations that needed to run during a single frame on it and then moved onto the next struct, usually using a for loop. So the update function might look something like this:

struct actor **g_actors;
void actor_update()
	for( int i = 0; i < g_nbrActors; i++ ) {
		actor_calculate_speed( g_actors[ i ] );
		collision( g_actors[ i ] );

I strive to only pass functions the values from the struct that they need but when you use structs pointers the temptation to just pass the entire pointer is usually too great so that's what I end up doing to the detriment of code readability.
By getting rid of the structs and using DOP it would instead become a chore to pass a function all the variables pertaining to a single actor and you are instead forced to to pass only the relevant ones.
This is what the actor_update() function looks like now instead:

void actor_update()
	actor_calculate_speed( 0, g_nbrActors, g_xSpeed, g_ySpeed, g_topSpeed, g_xAcc, g_yAcc );
	collision( 0, g_nbrActors, g_indexAI, g_xPos, g_yPos, g_dx, g_dy, g_xSpeed, g_ySpeed, g_halfSize );

This at first looks more confusing due to an increase in function arguments but there is a method to the madness. Each function is run consecutively for all the actors before moving onto the next function. So the speed for each actor will be calculated by the actor_calculate_speed() function, then the collision() function will be run for all actors. As opposed to before where a cluster of functions would be run for a single actor before moving onto the next actor. So you'd first set the speed, do collision detection, set the position e.t.c. all for a single actor before moving onto the next. I then use this basic template when creating a function:

void foo( const int counter, const int counterCutoff, double *dataToModifiy, const double *dataToReadFrom )
	if( counter == counterCutoff ) return;

	*dataToModify = *dataToReadFrom + some calculations...;

	foo( counter + 1, counterCutoff, ++dataToModify, ++dataToReadFrom );

Both dataToModify and dataToReadFrom are arrays that the function will get initialized with. Const is used to make it quick to tell what data is being changed and which is just being read from. If we take the above actor object as an example it might look something like this:

void actor_set_x_pos( const int actor, const int actorCutoff, double *xPos, const double *xSpeed )
	if( actor == actorCutoff ) return;

	*xPos += *xSpeed + calculations...;
	actor_set_x_pos( actor + 1, actorCutoff, ++xPos, ++xSpeed );

And it would get initialized like this:

void actor_update()
	actor_set_x_pos( 0, g_nbrActors, g_xPos, g_xSpeed );

This is now using recursion which I find esthetically pleasing and better for code clarity. Recursive functions can replace for loops and for loops are something that often perform very specific tasks that should be broken out into their own functions anyway and so it tends to flatten to code. Using recursion is something that's often difficult to do since the type of code flow is not something most people intuitively create when they're not used to it. Using DOP I find helps to make recursion more natural because it's easier to have a recursive function go through data that has been collated into groups of same type, rather than having different data thrown into blob structs.
It also helps to minimize the use of global variables because while the function might need to be initialized with the first element in a global array - which is a pointer - it then has no idea where that pointer came from. It just increments it before calling itself and reads the value until the counter reaches it's cutoff. In the above example actor_set_x_pos() get initialized with the globals g_xPos and g_xSpeed but then treats them as local variables passed to it by arguments and just increments them.

Most importantly however this has freed me from thinking too hard about how to structure the code. When I need to create a new function that has to be run for all actors I know it will start with a counter initialized to zero and a cutoff equal to the total number of actors. Then I just add arguments for what data needs to be modified and any additional data needed to perform the calculations gets added as pointers to consts. At the end of the function it just increments all the values and calls itself. Simple. I'm starting to think that my problem is really that I don't like programming. Modeling systems like games and something like Conway's game of life is fun but thinking about how to structure code and reading about abstract programming concepts just isn't so by severely limiting the amount of features available in the language I might have gotten to a point where I don't have to think about it because there's just one way of doing it. I immediately know how a function should function so to speak.

Another thing I've started to do is to make functions more self sufficient. Functions might have some condition attached to them that decides if they should run or not and before I might have have used an if statement from the calling function like so:

void caller()
	if( flag ) {

Instead I've now started always calling the function and then have it abort at the start if it isn't supposed to run:

void caller()
	foo( flag );
void foo( bool flag )
	if( !flag ) return;

Again this helps to flatten the code but I've also found that it helps to make functions more like modules that you can change around the order of without breaking everything. Before I'd have a very rigid logic framework at the top scope deciding how everything should be called. Refactoring the function often required not just changing the function but also the piece of code in the above framework and forgetting this sometimes led to bugs being introduced. By making all logic related to the function a single module this is avoided.
Although when using recursion this has led me to start using the dreaded goto command since you might need to jump to the next iteration like this:

void foo( const int counter, const int counterCutoff, bool flag )
	if( counter == counterCutoff ) return;
	if( flag ) goto nextIteration;

	foo( counter + 1, counterCutoff, flag )

It could be done with an if statement but I like using gotos better since again it's flatter, especially when there's multiple conditions that could abort a function.

From all of this I'm starting to think that C++ not only took a wrong turn but went in the completely wrong direction. Maybe what we need isn't a superset of C but a subset? Maybe even C has too many features? From doing this I could get rid of structs, extern, typedef and I'd say defines but I remember I need those for compiler flags. There's also one multi-dimensional array returned by get_walls() in the tutorials that uses defines as labels but I want to simplify that away by going to multiple one dimensional arrays.
Now that we're on the subject I'd like to talk about my preference for one dimensional arrays. Stuff that requires X and Y components like position and speed always have those split into two variables instead of having an array of two values holding them. This is because I find it to be clearer to have things like "xPos = 10" in the code rather than pos[ 0 ] = 10 or pos[ X ] = 10. It's also a bit clearer to see if a function modifies the values when called, so if you use "foo( xPos, yPos )" you know foo() won't modify them but if you pass an array to a function like:

double pos[2];
foo( pos );
Because arrays decay to pointers you can't tell if foo() changes the position without looking at the signature to see if it's "foo( double[] )" or "foo( const double[] )". Although this does increase the amount of arguments most functions take since anything involving x and y becomes two variables instead of one I still like it better.

Performance was an issue I ran into, most likely caused by everything being recursive. Using O2 optimizations I saw a >100% increase in performance, something I've never seen with any other project. Again I assume the problem is that recursion is slower than iteration but since most of the recursive functions are tail calls the compiler can optimize them into an iterative form which is what I assume is done when O2 is used though I don't know Assembly well enough to confirm this. In either case O2 has now become a necessity. It would be nice to know if O2 solves all the problems or if writing it in an iterative style with for loops would have been faster still. The project didn't get far enough in an iterative form before I switched to recursion for me to go back and compare it unfortunately.

Multi-threading is another thing that I haven't really worked out with this new approach. What I used to do was to make sure that no value of a struct was being modified that could also be read by another actor and then use OpenMP to automagically multi-thread a for loop. Something like this:

#pragma omp parallel for
for( int i = 0; i < nbrActors; i++ ) {
	foo( actors[ i ];
This however required storing lots of values into temporary variables since you couldn't directly change variables during the for loop that were also being read from. I haven't worked out a new way of doing this. OpenMP seems to be out of the equation now that I use recursion instead of for loops but splitting up the calculations so that one small calculation is done for all the data in a single array instead of having all different calculations done for one large struct at a time might allow you to not have to store a bunch of temp values globally. This since you now only need to store a few temp variables at a time for one function before placing their values into the proper array. But I haven't worked out a way of doing this yet.

To talk a bit more about structs I find that by avoiding them it also helps with readability because you no longer have the arrow operator everywhere, though this necessitates the use of temporary variables and getter functions which might not be everyone's cup of tea. For example I had this piece of code in the draw function before:

DrawRectangle( actor->x - actor->halfsize, actor->y - actor->halfsize, actor->size, actor->size, actor->color);

DrawRectangle() function needs a bunch of variables from an actor struct but consider how it looks like without the struct:

const double x = actor_get_x_position( i );
const double y = actor_get_y_position( i );
const double halfSize = actor_get_halfsize( i );
const double size = actor_get_size( i );
const Color color = actor_get_color( i );

DrawRectangle( x - halfSize, y - halfSize, size, size, color);

Obviously there's a lot more lines of code but at the same timer the DrawRectangle() call has shrunk significantly and it's easier to see what's happening particularly the subtractions stand out more. This becomes even more noticeable when used to avoid composition, so something like foo->bar->x goes away.
Also a small discursion about the getters: Starting out I sometimes used getters that returned two values using pointers. For example the getter for the position of an actor looked like this:

void actor_get_position( const int actor, double *x, double *y )
	*x = g_xPos[ actor ];
	*y = g_yPos[ actor ];

But I eventually decided to stop fighting C's lack of multiple return values and split those functions up like so:

double actor_get_x_position( const int actor ) { return g_xPos[ actor ]; }
double actor_get_y_position( const int actor ) { return g_yPos[ actor ]; }

This allows you to const all the variables that receive values from the getters. Before you'd first have to define two non-const variables and then pass pointers to them to the getter function which looked messy and again made them not const. I also have a suspicion that the compiler is able to optimize the getter functions better if they just consist of a single return statement.

To conclude this way to long rambling I really like this way of programming. It simplifies everything and is just a plain more fun way of programming for me. It takes some getting used to since the intuitive thing everyone does when they're beginners is group variables into structs based on "objects" and then create collections of those structs. This is not the only OOPish behavior you tend to fall into when using structs, there's also constructors/destructors and composition. All of these concepts are pretty facile and easy to gravitate towards because they appear obvious but to me the code improved every time I got rid of one of them. In fact this whole exercise has cemented my conviction that OOP is really just a simple beginners mistake lifted up into dogma by people who are way too clever for their own good.
Completely getting rid of structs isn't technically necessary for doing DOP but by getting rid of them I can have the smug satisfaction of knowing that I've taken the code even farther away from OOP.
Also when doing the refactoring all the functions that took explicit arguments instead of struct pointers where the easiest to refactor. So I feel vindicated in eschewing structs in the first place.
Now I'm not able to find any example in literature of other people doing it this way which is usually a sign that what you're doing is wrong but I dunno. I wasn't happy with how the code looked before so why not change it?