Memory Allocator

Introduction

We use memory allocators nearly every time we write software. They allow our programs to be dynamic and handle a wide variety of input, or user interactions. In languages like C the programmer interacts directly with the memory allocator through calling malloc() and free(). Higher level languages extract away the need for the programmer to worry about memory management, usually through a garbage collector. But what does a memory allocator do? Hopefully we can figure that out through exploring one possible implementation.

Headers

Here are all the headers I believe we will need to complete this project. Apologies if we don’t end up using some of them, just try to cover my bases.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "gallocc.h"

#ifndef GALLOCC_H_
#define GALLOCC_H_

#include <stdlib.h>
#include <stdbool.h>

#define HEAP_INC 1024

typedef struct heap_seg {
  size_t size;
  struct heap_seg *prev;
  struct heap_seg *next_free;
} heap_seg;

size_t round_bytes(size_t);

bool segment_free(heap_seg*);
size_t segment_size(heap_seg*);

void update_back(heap_seg*, heap_seg*, bool);

int heap_init(size_t);
void *malloc(size_t);
void free(void*);
void *calloc(size_t, size_t);
void *realloc(void*, size_t);

#endif // GALLOCC_H_

Exploration

You have hopefully heard a concept in programming called the “heap”. Each time we dynamically allocate memory in our program it ends up on the heap. We know that the computer has all the memory available, so it must have given some of this memory to our heap. For now let’s just imagine that asking for this memory in our program is super complicated, so we will use a statically allocated buffer that we manage ourselves.

Let’s start off by laying our groundwork for this exploration.

#define HEAP_SIZE 1024

void* heap_base;

typedef struct heap_seg {
  size_t size;
  struct heap_seg *prev;
} heap_seg;

For now, we will set our heap to a fixed size of 1024 bytes, and we will use a global variable (☢ I know, I know ☢) to keep track of the base of our heap. Then we define a struct for each segment of our heap, which for now holds a size_t to hold the size in bytes, and a pointer for the previous segment. We will later see how we can perform some cleaver transformations on our size to store more information.

Initialization

int heap_init(void* base) {
  if (!base) { return -1; }
  heap_base = base;
  *(heap_seg*)heap_base = (heap_seg){HEAP_INC, NULL};
  return 0;
}

From there we need to set our heap up so that users can begin to use it. In a real heap implementation this would be included in the first call to malloc, but with our static version this won’t be possible. Our only real setup here is to initialize our first heap segment, which will occupy the entire space by itself. We want to keep heap segments as large as possible so that we have a higher likelihood of being able to fit a requested allocation.

Once our heap is ready to go we can begin handing out pointers to different segments. Sounds easy right?

Allocation

We will leave error handling out for right now to focus in on the core logic.

void *heap_malloc(size_t bytes) {
  heap_seg *prev = NULL;
  heap_seg *ptr = heap_base;
  while (ptr < (heap_seg*)(heap_base + HEAP_SIZE)) {
    if ((ptr->size & 0x1) == 0 && ptr->size >= bytes + sizeof(heap_seg)) {
      // Store old capacity
      size_t old_cap = ptr->size;
      // Heap segment is free, and big enough
      ptr->size = sizeof(heap_seg) +
                      ((bytes + (sizeof(heap_seg) - 1)) / sizeof(heap_seg)) *
                          sizeof(heap_seg) |
                  0x1;
      void *user_block = (void *)(ptr + 1);
      if (old_cap > ptr->size) {
        heap_seg *next_block = ptr + ptr->size / sizeof(heap_seg);
        next_block->size = old_cap - (ptr->size ^ 0x1);
      }
      return user_block;
    }
    prev = ptr;
    ptr = (void*)ptr / (ptr->size & (~1));
  }

  return NULL;
}

Yeah, it’s a bit more going on that I would have assumed at first, but it won’t be too hard to step through. Also note that we are currently covering what is called first fit right now, where we find the first chunk able to fulfil the allocation request, we will cover a different allocation strategy later.

We first start at the base of our heap, and begin checking at each spot if it is both free, and large enough to handle the requested allocation. Once a chunk if found we note its original capacity, and change it’s size to hold the requested amount of bytes. This is also where we set its previous element. You may notice that we don’t set the size to be equal to bytes as the user requested, instead we have to factor in a few more things.

The first portion of the size is the heap_seg struct itself, which in the current implementation is 16 bytes. The next section is the data, which needs to be aligned on 8 byte boundaries. We do this for efficient memory access, as a word on modern 64-bit systems is 8 bytes. To accomplish this we round up the requested size to the nearest multiple of 8, so 16 bytes stays at 16, and 17 bytes rounds up to 24.

Once we put this all together we set the least significant bit to 1. This final bit allows us to keep track of if the heap segment is in use. We are able to do this because each size will be a multiple of 8, meaning that its lower 3 bits will always be 0. We are currently using 1 of the 3 bits, so we can potentially even store more flags if needed.

There we go, we have now produced a heap segment that can be returned to the user. We note the position where they are allowed to begin writing, which we will return at the end of our function. The last step is to see if we need to segment the heap. If the section we allocated is less than the old capacity we get a pointer to the start of the next heap segment, and then set its size to the remaining space in the heap.

That’s how we hand out segments of our heap to users! If we don’t find a suitable segment we advance to the next segment and start over again.

int main() {
  uint8_t heap[HEAP_SIZE];
  heap_init(heap);

  int *block = heap_malloc(4);
  assert(block != NULL);
  *block = 4;

  int *num = heap_malloc(4);
  assert(num != NULL);
  *num = 7;

  printf("Num holds: %d\n", *num);
  printf("Block holds: %d\n", *block);
}

Num holds: 7
Block holds: 4

It was fairly easy to hand out memory, so how does the user give that memory back?

Freeing

In our most basic implementation “freeing” memory will be nothing more than setting its least significant bit back to 0.

void heap_free(void* block) {
  if (!block) { return; }
  heap_seg *header = (heap_seg*)block - 1;
  header->size ^= 1;
}

But we may run into a bit of a problem here. What happens if a user allocated a many 8 byte segments at the start of their program, and then frees them all back once their program gets under way. Then they need to allocate larger segments of memory. With our current free function we would have to skip over all of these tiny segments till we found one large chunk to hand back to the user. What if all of those small segments added together would have been enough to handle their allocation.

This is where we have to rejoin our heap segments to ensure that we do not “fragment” the heap. Surprisingly this is also fairly easy to accomplish once you can visualize the possible states our heap may be in.

When a user requests to free a block there are only 3 possible states that it may be in.

The segment above or below is free, in which case we need to combine our segment with the adjacent free segment.
Both segments above and below are free, in which case we need to combine all 3 into a larger block.
The segments above and below are still in use, in which case we simply set the LSB to 0.

If we follow this behavior on every free call we ensure that every block in our heap will fall into 1 of these 3 states. The implementation here is just to check the block above and below the current block, then apply any needed transformations.

void heap_free(void* block) {
  if (!block) { return; }
  heap_seg *header = (heap_seg*)block - 1;
  // Get prev/next pointers
  heap_seg *prev = header->prev;
  heap_seg *next = (void*)header + (header->size & (~1));
  if (prev > (heap_seg*)heap_base && !(prev->size & 1)) {
    prev->size += header->size;
    header = prev;
  }
  if (next < heap_base + HEAP_SIZE && !(next->size & 1)) {
    header->size += next->size;
  }

  header->size ^= 1;
}

Expanding on the previous function we obtain pointers to the blocks directly before and after the current block. Then we can check each block, and see if it needs to be merged. If the previous block is free we update it’s size to include the current block, and update our header pointer to this previous block. Then the next block is checked, and we update our size accordingly. The last step is to zero out the “in use” bit of our header.

Note that we do not need to explicitly do anything with the freed memory. By updating the size of the surrounding nodes, we ensure that it will be skipped over when searching for the closest free block. Also, by updating our header pointer we ensure that the following logic will work for any of the 3 possible states of our segment.

With our allocation and freeing functions complete, that is pretty much it for the most basic of heap allocators.

Example Usage

int main() {
  uint8_t heap[HEAP_SIZE];
  heap_init(heap);

  int *block = heap_malloc(4);
  *block = 4;

  int *num = heap_malloc(4);
  *num = 7;

  int *test = heap_malloc(4);
  *test = 66;

  printf("Address of block: %p\n", block);
  printf("Address of num: %p\n", num);
  printf("Address of test: %p\n", test);

  printf("Size of block: %d\n", ((heap_seg*)block-1)->size ^ 0x1);
  printf("Size of num: %d\n", ((heap_seg*)num-1)->size ^ 0x1);
  printf("Size of test: %d\n", ((heap_seg*)test-1)->size ^ 0x1);

  heap_free(num);
  heap_free(test);

  num = heap_malloc(32);
  *num = 1;

  printf("Address of block: %p\n", block);
  printf("Address of num: %p\n", num);
  printf("Size of block: %d\n", ((heap_seg*)block-1)->size ^ 0x1);
  printf("Size of num: %d\n", ((heap_seg*)num-1)->size ^ 0x1);
}

Address of block: 0x7ffcef5b94a8  <-- Heap Base
Address of num: 0x7ffcef5b94b8    <-- Heap Base + 16
Address of test: 0x7ffcef5b94c8   <-- Heap Base + 32

Size of block: 16
Size of num: 16
Size of test: 16

Address of block: 0x7ffcef5b94a8  <-- Heap Base
Address of num: 0x7ffcef5b94b8    <-- Heap Base + 16 (same address as above)

Size of block: 16
Size of num: 40   <-- 32 byte allocation + 8 byte header

Takeaways

Implementing a naive memory allocator is fairly easy. It won’t be the fastest one in the world, nor the most secure. Either way, this small exploration shows us the basics of how we can hand out chunks of memory to a user, and allow them to be freed back to us.

The first fit algorithm that we chose to implement above is one of the quicker ways to allocate memory (we can also add a global variable that points to the first free block, to further increase speed). While speed is an important factor when it comes to allocating memory, sometimes the user might wish to be more space efficient as opposed to time efficient.

Instead of finding the first available spot we can find the best fit segment which is the smallest segment large enough to handle the requested allocation. One of the major advantages of this algorithm is that it avoids splitting large heap segments, which we should try to preserve for large allocations.

The biggest problem with splitting large heap segments is that we fragment our heap. Our next large allocation may not be able to find a contiguous segment to hold it, thus requiring our allocator to go out to the operating system and request more memory. For this reason, our next iteration will explore this algorithm.

We will see these functions used in the next section, but I will introduce them now so that you can become familiar with their behavior.

size_t round_bytes(size_t bytes) {
  return (bytes + (sizeof(void *) - 1)) / sizeof(void *) * sizeof(void *);
}

bool segment_free(heap_seg *seg) { return !(seg->size & 0x1); }

size_t segment_size(heap_seg *seg) { return seg->size & (~1); }

Our first new function will perform the byte rounding aligning us to the nearest block of 8 bytes. The next function checks if a block is free, and finally we have a function to get the true size of a block.

Our current implementation also relies on a statically allocated buffer, meaning once we fill this up we won’t be able to hand out more memory. Let’s see how we can make this behavior more dynamic by requested more memory from the operating system.

sbrk()

While the exact details of how a program is laid out in memory is not required to understand this portion, it will be very helpful.

Any of the programs that you choose run on your computer are running in user-space. The CPU scheduling that an operating system does won’t apply here, as we are more concerned with the memory layout of our program. Essentially, when our program is running the operating system provides it a virtual memory space, which our program sees as one contiguous memory space that it has control over.

A program like Firefox or OBS does not need to be concerned with the other programs running on the computer. Under the hood each program may be spread out all over physical memory, or intertwined together with other programs. For our user-space programs to work the operating system “lies” to them, providing a single contiguous section of memory. Our program then runs as if it is the only program on the computer.

When are program is run, it is loaded into memory with a layout as depicted below.

The stack holds local variables and any other data in which it’s size is known at run time. On certain architectures the stack may also be used to hold function arguments. Program Data/Code holds things like constants, uninitialized variables, and the code itself. Lastly, the heap is our program’s dynamic memory space, which can hold variables whos size or quantity is unknown at compile time.

The amount of stack space a program has is set on a operating system level on linux. Running ulimit -a will show all the current limits for your user. On my machine stack size is set to 8192 kb or ~8 mb. A program uses more of the stack by subtracting a certain number of bytes from the stack pointer, making sure to keep this aligned to the architecture specific value.

On the other hand, the heap is not something the user has direct control over from their assembly. In order to get more memory for our heap we have to ask the operating system.

I have been referring to our memory allocator as a “heap allocator”, where the heap refers to the “heap” in memory. Just like our buffer of 1024 bytes in our original implementation, the heap is a large contiguous section of memory. Yet unlike our 1024 byte buffer the heap can grow and shrink as needed. This is accomplished with the system call sbrk().

It is worth noting that modern implementations of malloc() use mmap instead of calls to sbrk. There are many reasons why this change was made, but it essentially boils down to the fact that mmap fits our use cases better, and is safer to use.

Using sbrk directly in our code is part of the Posix system API. The Posix API allows our programs to be run on any Posix compliant system, meaning it is essentially an operating system abstraction layer. sbrk (and brk) have both been deprecated in the Posix API.

Our implementation will still explore sbrk as it's way to gain more memory.

So how does sbrk work?

sbrk allows a user to specify the number of bytes by which the heap should grow. This is controlling the program break, which is the address directly beyond the data segment of our program. sbrk has 2 use cases, incrementing the program break, or when called with 0, returning the address of the program break. Let’s take a look at how these work.

#include <unistd.h>
#include <stdio.h>
int main() {
  void *brk = sbrk(0);
  sbrk(4);
  void* new_brk = sbrk(0);

  printf("Program-Break:  %p\n", brk);
  printf("Program-Break:  %p\n", new_brk);

  printf("Distance-Between: %d\n", new_brk - brk);
}

Program-Break:	0x563e57045000
Program-Break:	0x563e57045004
Distance-Between:	4

The first call to sbrk returns the heap base, equivalent to the starting address of our 1024 byte buffer. We then increase the program break by 4 bytes, which returns the address of the previous program break. Therefore, we have to call sbrk(0) again to get the new program break. We see that the distance between the 2 pointers is exactly 4 bytes, exactly what we requested.

The reason why we have to call printf after acquiring the memory address is that it allocates memory internally.

This is great! We can now increase the programs heap segment whenever we want. Let’s take a look at how we can implement this in our malloc function.

Adding sbrk()

Now that we will be using sbrk directly in our program we are going to start running into a problem. The malloc implementation on our machine will be calling sbrk or mmap to increase its heap, and we will be trying to do the same. The issue is that our heaps are intertwined. If my malloc and the system’s malloc are both being run they will step on each other’s toes.

We can get around this by “overriding” malloc for our program.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
void *malloc(size_t size) {
  exit(72);
}

int main() {
  void *memory = malloc(400);
}

jackson@jtop:/tmp$ gcc test.c && ./a.out
jackson@jtop:/tmp$ echo $?
72

Our custom malloc implementation simply exits the program with a code of 72. Of course, this isn’t very useful, but it demonstrates that we can override the malloc functionality. The first portion of our malloc function will remain the same, initializing our heap, and handing out the first segment. For our static memory allocator we had to call an initialization function in main, but we will have to encode that behavior directly in malloc now.

Heap Initialization

int heap_init(size_t bytes) {
    heap_base = sbrk(0);
    size_t increment =
        (bytes > HEAP_INC) ? sizeof(heap_seg) + round_bytes(bytes) : HEAP_INC;
    sbrk(increment);
    *heap_base = (heap_seg){increment, NULL};
    if ((int64_t)(heap_end = sbrk(0)) == -1) {
      return -1;
    }
    return 0;
}

Our first step is to set up our base pointer, then we check to see how many bytes need to be allocated. We have discussed this before, but it boils down to rounding our size to the closest multiple of 8 that can hold the allocation. Lastly we check if the allocation was successful, and return based on that.

`malloc()` with `sbrk()`

From there we scan through our heap one node at a time until we find a suitable spot. Same as before, updating links in our list as we move along.

void *malloc(size_t bytes) {
  if (!heap_base) {
    heap_init(bytes);
  }
  if (bytes == 0) {
    return NULL;
  }

  // Start from heap base, find first open segment that can hold 'bytes'
  heap_seg *prev = NULL;
  heap_seg *ptr = heap_base;
  while (ptr < (heap_seg *)heap_end) {
    if (segment_free(ptr) && ptr->size >= bytes + sizeof(heap_seg)) {
      // Store old capacity
      size_t old_cap = ptr->size;
      // Heap segment is free, and big enough
      ptr->size = sizeof(heap_seg) + round_bytes(bytes) | 0x1;
      ptr->prev = prev;
      void *user_block = ptr + 1;
      size_t next_size = old_cap - segment_size(ptr);
      if (next_size <= sizeof(heap_seg)) {
        ptr->size += next_size;
      } else if (old_cap > ptr->size) {
        heap_seg *next_block = (void *)ptr + segment_size(ptr);
        next_block->size = old_cap - segment_size(ptr);
        next_block->prev = ptr;
      }
      return user_block;
    }
    prev = ptr;
    ptr = (void *)ptr + segment_size(ptr);
  }
  // .... continued below

This shouldn’t be any different from what we saw before, except for extracting some behavior out into separate functions. Our next step is where we introduce new behavior. If we reach the end of our heap and have not found a suitable spot our heap needs to expand. The first step will be determining how many bytes we need to increase our heap by, the same decision we made when initializing our heap. From there it will go much the same.

  // .... continued from above
  size_t next_size;
  size_t increment = (bytes > HEAP_INC) ? round_bytes(bytes) : HEAP_INC;
  size_t new_size = round_bytes(bytes) | 0x1;

  sbrk(increment + sizeof(heap_seg));
  ptr->size = new_size + sizeof(heap_seg);
  ptr->prev = prev;
  next_size = increment - segment_size(ptr);
  heap_end = sbrk(0);
  // Check if the next segment should be initialized
  if (next_size <= sizeof(heap_seg)) {
    ptr->size += next_size;
  } else if (next_size > 0) {
    heap_seg *next_block = (void *)ptr + segment_size(ptr);
    next_block->size = next_size;
    next_block->prev = ptr;
  }
  // Return user writable address
  return ptr + 1;
}

We will go over how we can visualize our heap in just a moment, for now let’s go through the code. After we have determined how many bytes are needed, the pointers are updated, and the heap_end pointer is moved. Our last step is to check if the segment following our allocation should be initialized. This is to avoid allocating a space that would be too small to hold any data. If we have a space that is too small we can just include it in the previous segment, otherwise it becomes its own segment.

Before we see this code in action let’s look at how we can visualize our heap.

Observing Behavior

void heap_walk() {
  heap_seg *ptr = heap_base;
  while (ptr < (heap_seg *)heap_end) {
    printf("Segment size: %4ld, Prev: %14p, Value: %ld, Address: %p\n",
           ptr->size, ptr->prev, (ptr->size & 1) ? *(uint64_t *)(ptr + 1) : -1,
           ptr);
    ptr = (void *)ptr + (ptr->size & (~1));
  }
}

We see the same logic that we use to find the next block as we saw in malloc. Each segment is printed out as a uint64_t for now, although this isn’t the most flexible as we will see. Even if this is basic it will help us solve many bugs, and to iron out implementation details.

Now we can see how our newly upgraded malloc works!

int main() {
  int *block = malloc(4);
  *block = 4;

  uint64_t *num = malloc(8);
  *num = 7;

  uint64_t *test = malloc(69);
  *test = 66;

  uint64_t *big_num = malloc(64);
  *big_num = 64;

  char *buf;
  asprintf(&buf, "big num has the value: %ld\n", *big_num);
  puts(buf);

  heap_walk();
  return 0;
}

big num has the value: 64

Segment size:   57, Prev:          (nil), Value:                   0, Address: 0x55d231e99000
Segment size:   57, Prev: 0x55d231e99000, Value:                   0, Address: 0x55d231e99038
Segment size:   57, Prev: 0x55d231e99038, Value:                   0, Address: 0x55d231e99070
Segment size:   25, Prev: 0x55d231e99070, Value:                   4, Address: 0x55d231e990a8
Segment size:   25, Prev: 0x55d231e990a8, Value:                   7, Address: 0x55d231e990c0
Segment size:   89, Prev: 0x55d231e990c0, Value:                  66, Address: 0x55d231e990d8
Segment size:   81, Prev: 0x55d231e990d8, Value:                  64, Address: 0x55d231e99130
Segment size:   49, Prev: 0x55d231e99130, Value: 2336652897525655906, Address: 0x55d231e99180
Segment size:  592, Prev: 0x55d231e99180, Value:                  -1, Address: 0x55d231e991b0
Segment size: 1041, Prev: 0x55d231e991b0, Value: 2338615488418374995, Address: 0x55d231e99400

Woah! This is a lot going on, and way more allocations that we performed ourselves. What happened?

Well, now that we have overwritten the glibc malloc implementation, all the library functions that call malloc are calling our version! This is a great sign, and will be a much better test of our functionality than we will probably end up writing. Let’s break down each of the allocations and see how we may want to tweak some of the behavior.

The first 3 allocations are coming from a system library that makes 3 40 byte allocations on program startup, don’t worry about those.
The next block of 4 allocations comes from main, where we call malloc 4 times.
The next 48 byte allocation is coming from asprintf(). If you’re not familiar with this function I think of it as a bit of a cheat code for strings in C. The function will allocate a buffer to hold the entire string it generates, and gives you the pointer to that string, which you are responsible to free.
The chunk of 592 bytes is a result of the final allocation being larger than the available space. If you total up all the sizes of everything but the last element you will get 1024, and that final allocation is larger than 1024, therefore it forces us to add more space.
Finally, the 1040 byte allocation comes from printf allocating an internal buffer for scratch work.

You may notice something suboptimal. When printf requests a 1024 byte allocation we place it after the 592 byte free block. This means that if our next call to malloc is requesting more than 592 bytes it will be placed after the 1024 block, leaving a hole in the middle of our heap.

What if instead we move our new segment to fill that hole, leaving the empty space at the very end of our heap. That way we can avoid a large empty segment in the middle of our heap. This is beneficial for our implementation because future large allocations will now be able to make use of this space, instead of it being trapped in the middle of the heap. The changes we have to make aren’t too bad either.

Filling our Holes

  // Replacing the block above
  size_t next_size;
  size_t increment = (bytes > HEAP_INC) ? round_bytes(bytes) : HEAP_INC;
  size_t new_size = round_bytes(bytes) | 0x1;

  if (segment_free(prev)) {
    ptr = prev;
    prev = prev->prev;
    sbrk(increment);
    size_t old_size = ptr->size;
    ptr->size = new_size + sizeof(heap_seg);
    next_size = (old_size + increment) - segment_size(ptr);
  } else {
    // Previous chunk is not free
    sbrk(increment + sizeof(heap_seg));
    ptr->size = new_size + sizeof(heap_seg);
    ptr->prev = prev;
    next_size = increment - segment_size(ptr);
  }
  heap_end = sbrk(0);
  // Check if the next segment should be initialized
  if (next_size <= sizeof(heap_seg)) {
    ptr->size += next_size;
  } else if (next_size > 0) {
    heap_seg *next_block = (void *)ptr + segment_size(ptr);
    next_block->size = next_size;
    next_block->prev = ptr;
  }
  // Return user writable address
  return ptr + 1;
  // return NULL;
}

Now you can see we first check if the previous segment is free. This is easy because we have already scanned through our heap, so we know our prev pointer is on the last segment on our heap. If it is free we will slide up to fill it in, and then allocate the remaining needed space at the end. From there both paths follow the same steps to determine if we need to initialize the following block. Finally, we return a pointer to the user’s block just like before.

Now we can see how our heap deals with the situation we saw before.

big num has the value: 64

Segment size:   25, Prev:          (nil), Value:                   4, Address: 0x5620c36cb000
Segment size:   25, Prev: 0x5620c36cb000, Value:                   7, Address: 0x5620c36cb018
Segment size:   89, Prev: 0x5620c36cb018, Value:                  66, Address: 0x5620c36cb030
Segment size:   81, Prev: 0x5620c36cb030, Value:                  64, Address: 0x5620c36cb088
Segment size:  120, Prev: 0x5620c36cb088, Value:                  -1, Address: 0x5620c36cb0d8
Segment size:   49, Prev: 0x5620c36cb0d8, Value: 2336652897525655906, Address: 0x5620c36cb150
Segment size: 1041, Prev: 0x5620c36cb150, Value: 2338615488418374995, Address: 0x5620c36cb180
Segment size:  624, Prev: 0x5620c36cb180, Value:                  -1, Address: 0x5620c36cb590

There we go! The blank space is at the end of our heap, which will help us ensure the heap is being used efficiently.

At this point we have a fully featured memory allocator, it is not the fastest, or the most efficient, but it is correct. The most important part of a memory allocator is that it is correct, and will never hand out a memory address that is invalid.

Upgrades

We can stop here if we really wanted to. An entire general purpose memory allocator in ~130 lines of code, but we discussed a few details of our implementation earlier that we should address now.

First Fit vs Best Fit

first-fit

Just like our algorithm works, we find the first available block, and place our memory allocation there. The issue with this strategy is that we end up chunking our large blocks into smaller blocks. This leaves less large blocks, which will cause us to expand the heap next time we need to handle a large allocation request.

We should instead try to preserve as many large blocks as we can. This algorithm will run slower, because it causes us to scan the entire heap for every allocation. Of course there is always a tradeoff when it comes to programming, here we will see how trading a small amount of time for a more efficient use of space will be worth it.

best-fit

For this algorithm to work we will have to change the way we find a segment. The linear walking of our heap can stay the same, but we will have to maintain a pointer to the smallest block large enough to hold the allocation. If we get the end and have a block, we use it to hold our allocation, otherwise we continue on like we had done before.

Our first-fit algorithm has a higher likelihood of segmenting the heap, and we would like to avoid this as much as possible.

Segment size:   33, Prev:          (nil), Value:                   9, Address: 0x5600eb883000
Segment size:  232, Prev: 0x5600eb883000, Value:                  -1, Address: 0x5600eb883020
Segment size:  265, Prev: 0x5600eb883000, Value:                   7, Address: 0x5600eb883108
Segment size:  241, Prev: 0x5600eb883108, Value:                   2, Address: 0x5600eb883210
Segment size:   24, Prev: 0x5600eb883210, Value:                  -1, Address: 0x5600eb883300
Segment size:  265, Prev: 0x5600eb883300, Value:                  64, Address: 0x5600eb883318

If we now request a 8 byte allocation, can you see where it will be placed?

Segment size:   33, Prev:          (nil), Value:                   9, Address: 0x55a298d57000
Segment size:   25, Prev: 0x55a298d57000, Value:                  24, Address: 0x55a298d57020
Segment size:  208, Prev: 0x55a298d57020, Value:                  -1, Address: 0x55a298d57038
Segment size:  265, Prev: 0x55a298d57000, Value:                   7, Address: 0x55a298d57108
Segment size:  241, Prev: 0x55a298d57108, Value:                   2, Address: 0x55a298d57210
Segment size:   24, Prev: 0x55a298d57210, Value:                  -1, Address: 0x55a298d57300
Segment size:  265, Prev: 0x55a298d57300, Value:                  64, Address: 0x55a298d57318

Just like we said before, instead of finding the free 24 byte block (8 byte allocation, 16 byte header), we end up splitting the larger 232 byte block. How can we fix this in our malloc() function?

  // Start from heap base, find first open segment that can hold 'bytes'
  heap_seg *prev = NULL;
  heap_seg *ptr = heap_base;

  heap_seg *best_prev = NULL;
  heap_seg *best = NULL;
  while (ptr < (heap_seg *)heap_end) {
    if (segment_free(ptr) && ptr->size >= bytes + sizeof(heap_seg) &&
        (!best || segment_size(ptr) < segment_size(best))) {
      best = ptr;
      best_prev = prev;
    }
    prev = ptr;
    ptr = (void *)ptr + segment_size(ptr);
  }

  // Found a suitable block for Best-Fit
  if (best) {
    ptr = best;
    // Store old capacity
    size_t old_cap = ptr->size;
    // Heap segment is free, and big enough
    ptr->size = sizeof(heap_seg) + round_bytes(bytes) | 0x1;
    ptr->prev = prev;
    void *user_block = ptr + 1;
    size_t next_size = old_cap - segment_size(ptr);
    if (next_size <= sizeof(heap_seg)) {
      ptr->size += next_size;
    } else if (old_cap > ptr->size) {
      heap_seg *next_block = (void *)ptr + segment_size(ptr);
      next_block->size = old_cap - segment_size(ptr);
      next_block->prev = ptr;
    }
    return user_block;
  }

This boils down to a slight refactor, with 2 new pointers, and an added condition to our searching algorithm. Instead of stopping at the first free block we check to see if it is big enough, and more importantly, the best-fit we have seen. After scanning through the heap we use the best block if we have one, otherwise we continue on like before.

Comparing to the series of allocations demonstrated above, this new algorithm produces a less segmented heap.

Segment size:   33, Prev:          (nil), Value:                   9, Address: 0x5647b2660000
Segment size:  232, Prev: 0x5647b2660000, Value:                  -1, Address: 0x5647b2660020
Segment size:  265, Prev: 0x5647b2660020, Value:                   7, Address: 0x5647b2660108
Segment size:  241, Prev: 0x5647b2660108, Value:                   2, Address: 0x5647b2660210
Segment size:   25, Prev: 0x5647b2660210, Value:                  24, Address: 0x5647b2660300
Segment size:  265, Prev: 0x5647b2660300, Value:                  64, Address: 0x5647b2660318

Now we can see that our allocation occupies the 24 byte block near the end of the heap, leaving the entire 232 byte block for future allocation that may need more space.

That was a pretty easy change to handle, but you may notice another inefficiency. Every time an allocation is requested we have to start at the beginning of the heap. In these small examples that isn’t an issue, but imagine a heap the is full for the first 1-2 MB, with the first free block sitting at the end. We would spend precious time scanning over blocks that we cannot use. How can we fix this?

Implicit vs Explicit Free Lists

The way our heap is linked together is what is known as an implicit free list, meaning that each node is directly linked together, and we use a check at each step in our list to determine if the segment is free. This works well for a simple implementation, but we are slowed down by traversing a linked list, and calling a function at each step. What if instead we only looked at the free segments.

This is known as an explicit free list, where the free nodes are in their own linked list. This can be implemented with either a regular linked list, or even a circularly linked list. For our best-fit strategy, a regular singly linked list makes the most sense.

We have covered first-fit and best-fit, but there is an additional allocation algorithm called next-fit. Where the searching begins from the last block that was allocated. For this algorithm we might end up starting in the middle of the list, so a circularly linked list would be required.

Each node will now have to hold an additional pointer, adding 8 bytes of overhead. Managing these pointers will be very complicated, and can be very error-prone if we are not careful.

Groundwork

The first step is adding some bookkeeping, keeping track of the first block, and adding the next_free pointer to our struct.

heap_seg *heap_base = NULL;
heap_seg *heap_end = NULL;
heap_seg *first_free = NULL;  // New

typedef struct heap_seg {
  size_t size;
  struct heap_seg *prev;
  struct heap_seg *next_free; // New
} heap_seg;

Then we will update our free() function, as this is the easiest to understand. The function remains most same, we just need to introduce one function to update our new next_free pointers.

void update_back(heap_seg *cur, heap_seg *target, bool free_mode) {
  heap_seg *prev_free = NULL;
  if (cur) {
    prev_free = cur->prev;
  }
  while (prev_free && !segment_free(prev_free) && prev_free >= heap_base) {
    prev_free = prev_free->prev;
  }

  if (prev_free) {
    if (free_mode) {
      target->next_free = prev_free->next_free;
    }
    prev_free->next_free = target;
  } else {
    if (free_mode) {
      target->next_free = first_free;
    }
    first_free = target;
  }
}

This function takes in a current pointer, which is the location from which we will begin searching backwards. Next we take in a target pointer, and finally a boolean that tells the function which mode to use. We can use this to update our linked last so that the links between free elements remains correct.

The basic premise of how this algorithm works is fairly simple, but coming to this conclusion is difficult. It works on the principle that once a segment is in use we can simply skip over it, with the same idea working in reverse when a segment is allocated.

If we have a current node we walk it backwards until we find the previous free node. From there we do one of two things:

If we have a previous free node we know we are somewhere in the middle of the list. Therefore, we make this previous node point to our target. Additionally, if we are in free mode we will want to point ourselves at the next free node in the list (which is the previous nodes next_free).
Otherwise, we know that we are at the front of the list. If we’re in free mode then we set our next to the current first_free, and we always make ourselves the first_free.

You may wonder what the target parameter is, and why it is necessary. This is an abstraction that allows us to use the same update_back function in both allocation and free calls.

When called in an allocation context our target is the segment immediately following our newly allocated segment. If this segment is NULL it will be set as the end of our linked list, otherwise it sits inside the list and pointers will be updated accordingly.

Alternatively, when called in a freeing context, target represents the newly freed segment. In a list of [1, 2, 3] our free call uses 1 as its current, and 2 as its target. An allocation call uses 2 as its current, and 3 as its target. With 2 being the element being acted upon in both cases.

`free`

After merging adjacent segments together we need to move backwards in our last and ensure that the list now includes the current element. This is exactly the same as inserting a node in a linked list.

void free(void *block) {
  if (!block) {
    return;
  }
  heap_seg *header = (heap_seg *)block - 1;
  // Get prev/next pointers
  heap_seg *prev = header->prev;
  heap_seg *next = (void *)header + segment_size(header);
  if (prev > (heap_seg *)heap_base && segment_free(prev)) {
    prev->size += header->size;
    header = prev;
  }
  if (next < (heap_seg *)heap_end && segment_free(next)) {
    header->size += next->size;
    update_back(prev, header, true);
    header->next_free = next->next_free;
    header->size ^= 1;
    return;
  }
  update_back(prev, header, true);
  header->size ^= 1;
}

We are essentially finding the previous free element and telling it to point at the new target, and if it doesn’t exist then we know it is the first free segment.

Another thing to keep in mind is that if we free a block that is surrounded on both sides by in-use segments we need to jump over this block. We handle this in the if next block. After updating links we need to set the current blocks next_free pointer to jump over the in-use block.

When we free the 32 byte block on the heap it merges with the segment before it. Which is exactly what we want our free function to do, but now we have to deal with our explicit free list. The only change is needing to go back and point to the larger block we just made, and pointing ourselves at the next free block.

`malloc`

This is where things start to a lot more complicated in terms of explicit free list. One thing to keep in mind is that a linked list allows us to skip over segments currently in use. So there is no need for us to clean up next_free pointers each time a segment is modified. We will see how this all works in our diagrams and code.

From a basic standpoint, this is how we want malloc to work.

Instead of pointing at our original 256 byte block, we need to traverse backwards and tell the previous free element to point at the new smaller free block. Allocating a new node and updating links is similar to how free works, but we will need to spend a bit more time to get our pointers in the right places, so we can use the same update_back function.

heap_init will also have to change just slightly to ensure our first_free pointer is set.

int heap_init(size_t bytes) {
  heap_base = sbrk(0);
  first_free = heap_base;
  size_t increment =
      (bytes > HEAP_INC) ? sizeof(heap_seg) + round_bytes(bytes) : HEAP_INC;
  if ((sbrk(increment)) == (void *)-1) {
    return -1;
  }
  *heap_base = (heap_seg){increment, NULL, NULL};
  heap_end = sbrk(0);
  return 0;
}

Let’s start by seeing how we can now walk the linked list more efficiently.

void *malloc(size_t bytes) {
  if (!heap_base) {
    heap_init(bytes);
  }
  if (bytes == 0) {
    return NULL;
  }
  // Start from first free, find first open segment that can hold 'bytes'
  heap_seg *ptr = first_free;
  heap_seg *best = NULL;

  while (ptr && ptr < heap_end) {
    if (segment_free(ptr) && ptr->size >= bytes + sizeof(heap_seg) &&
        (!best || segment_size(ptr) < segment_size(best))) {
      best = ptr;
    }
    if (!ptr->next_free) {
      break;
    }
    ptr = ptr->next_free;
  }
  // Continued below ....

Now our searching loop begins at the first free element, and will skip over segments that are in use. This will reduce our searching time on a heap that has high fragmentation, or numerous in-use segments.

One consideration we have to take into mind is that we cannot walk our pointer off the end of the list. If we were to do so we would lose our pointer to the end of the list, so when we encounter this situation we break early.

We know that if we reach the end of the list without finding a best-fit segment we will have to expand our heap in order to accommodate the requested allocation. So it will be ok to leave ptr as the last element, as we will not attempt to put something in its place.

From there we follow the same 1 of 2 paths as before.

  // .... continued from above
  // Found a suitable block for Best-Fit
  if (best) {
    ptr = best;
    // Store old capacity
    size_t old_cap = ptr->size;
    // Heap segment is free, and big enough
    ptr->size = sizeof(heap_seg) + round_bytes(bytes) | 0x1;
    size_t next_size = old_cap - segment_size(ptr);
    if (next_size <= sizeof(heap_seg)) {
      ptr->size += next_size;
      update_back(best, best->next_free, false);
    } else if (old_cap > ptr->size) {
      heap_seg *next_block = (void *)ptr + segment_size(ptr);
      *next_block =
          (heap_seg){old_cap - segment_size(ptr), ptr, ptr->next_free};
      heap_seg *next_next_block = (void *)next_block + segment_size(next_block);
      if (next_next_block < heap_end) {
        next_next_block->prev = next_block;
      }

      // Scan for previous free block, update links
      update_back(best, next_block, false);
    }
    return ptr + 1;
  }
  // continued below ....

Just as we mentioned above, if we find a suitable block we will need to point our linked list to jump around this newly allocated segment. The function accepts the allocated segment, the following block, and false denoting that we are in allocation mode.

  // .... continued from above
  // Ran out of space
  heap_seg *prev = NULL;
  if (!ptr) {
    ptr = heap_base;
    while (ptr < heap_end) {
      prev = ptr;
      ptr = (void *)ptr + segment_size(ptr);
    }
    if (ptr < heap_end) {
      ptr->prev = prev;
    }
  }
  size_t next_size;
  size_t increment = (bytes > HEAP_INC) ? round_bytes(bytes) : HEAP_INC;
  size_t new_size = round_bytes(bytes) | 0x1;

  heap_seg *new_seg = ptr;

  size_t old_size = (new_seg < heap_end) ? new_seg->size : 0;
  if (new_seg < heap_end && new_seg->prev && segment_free(new_seg->prev)) {
    // Previous chunk is free
    new_seg = new_seg->prev;
    if ((sbrk(increment)) == (void *)-1) {
      return NULL;
    }
    new_seg->size = new_size + sizeof(heap_seg);
    next_size = (old_size + increment) - segment_size(new_seg);
  } else {
    // Previous chunk is not free
    size_t cur_size = new_size + sizeof(heap_seg);
    if (round_bytes(bytes) < increment) {
      if ((sbrk(increment)) == (void *)-1) {
        return NULL;
      }
      next_size = old_size + increment - (cur_size ^ 0x1);
    } else {
      if ((sbrk(increment + sizeof(heap_seg))) == (void *)-1) {
        return NULL;
      }
      next_size =
          old_size + increment - (cur_size ^ 0x1) + sizeof(heap_seg);
    }
    new_seg->size = cur_size;
  }
  heap_end = sbrk(0);

  // Check if the next segment should be initialized
  if (next_size <= sizeof(heap_seg)) {
    ptr->size += next_size;
  } else if (next_size > 0) {
    heap_seg *next_block = (void *)new_seg + segment_size(new_seg);
    *next_block = (heap_seg){next_size, new_seg, NULL};

    assert((void *)next_block + segment_size(next_block) == heap_end);
    // Scan for previous free block, update links
    update_back(new_seg, next_block, false);
  }

  // Return user writable address
  return new_seg + 1;
}
// End

One checking is being added here, if we get to this point and our first_free pointer is NULL, we must be out of memory. This also means we don’t have a pointer to the last element in the list, so by walking from the beginning of the list we land on the last element. This allows us to have a valid prev pointer, which sets us up perfectly for the remainder of the function.

The old size is also stored in a variable here, so that it can be added onto the remaining free segment after moving the heap end.

If more memory is needed, we increase our heap size, segment it, and update pointers if needed. The last step is the same as above, we find the previous free element and update our explicit free list.

`calloc()`

If you remember from my previous article on hashmaps, I like using calloc during both the exploration and optimization phase of coding. Of course in performance critical applications every CPU cycle counts, but for most purposes a few extra cycles to clear out memory is more than worth it.

Surprisingly to me, implementing calloc is as simple as a call to malloc, memset to 0, and then returning a pointer. The more interesting thing to me about calloc are the parameters it takes. The glibc calloc takes in the number of elements, and he size of each element, both being a size_t. This makes calloc easier to read in source code, and makes your intentions for the size of the allocation clearer.

void *calloc(size_t count, size_t size) {
  void *ptr = malloc(count * size);
  if (!ptr) {
    return NULL;
  }
  memset(ptr, 0, count * size);
  return ptr;
}

Just remember to include <string.h> for the memset function, and you should be good to go.

`realloc()`

I don’t get to use realloc all that often, but when you find a good use case it always comes out super clean. It shouldn’t be too hard to implement, just take a look at the behavior described in man realloc and you will be good to go.

void *realloc(void *ptr, size_t size) {
  size_t min = (size < segment_size((heap_seg *)(ptr - 1)))
                   ? size
                   : segment_size((heap_seg *)(ptr - 1)) - sizeof(heap_seg);
  free(ptr);
  if (size == 0) {
    return NULL;
  }
  heap_seg *new_ptr = malloc(size);
  if (!new_ptr) {
    return NULL;
  }
  memmove(new_ptr, ptr, min);
  return new_ptr;
}

The first step is to calculate how many bytes will need to be moved. We take the smallest of the 2 values, and store that for later. We can then free the original pointer, and stop here if the user requested a reallocation size of 0 (the same as a free call). A new block is allocated to hold the requested size, and we copy over the original data, finally returning the pointer to the user.

Testing

We are nearly there! Time to see if we can spot any bugs while testing our code. I would recommend using standard library functions that perform allocations, using custom data structures, and performing manual allocations. Of course, it is nearly impossible to catch every possible bug, and test every edge case.

Hashmap Testing

For this case I am going to use the previously implemented hashmap for testing.

The hashmap starts out with a default size of 11, and target load factor of 0.5. I made sure to insert enough elements to ensure that it would resize at least once.

glibc `malloc`

Index:   0, Key:            EMPTY, Value:            EMPTY
Index:   1, Key:           Bailey, Value:               92
Index:   2, Key:            Dylan, Value:               17
Index:   3, Key:            Billy, Value:               25
Index:   4, Key:            EMPTY, Value:            EMPTY
Index:   5, Key:          Jackson, Value:               23
Index:   6, Key:            EMPTY, Value:            EMPTY
Index:   7, Key:            EMPTY, Value:            EMPTY
Index:   8, Key:            EMPTY, Value:            EMPTY
Index:   9, Key:            EMPTY, Value:            EMPTY
Index:  10, Key:             Josh, Value:               11
Index:  11, Key:            EMPTY, Value:            EMPTY
Index:  12, Key:            Steve, Value:               36

Our `malloc`

Index:   0, Key:            EMPTY, Value:            EMPTY
Index:   1, Key:           Bailey, Value:               92
Index:   2, Key:            Dylan, Value:               17
Index:   3, Key:            Billy, Value:               25
Index:   4, Key:            EMPTY, Value:            EMPTY
Index:   5, Key:          Jackson, Value:               23
Index:   6, Key:            EMPTY, Value:            EMPTY
Index:   7, Key:            EMPTY, Value:            EMPTY
Index:   8, Key:            EMPTY, Value:            EMPTY
Index:   9, Key:            EMPTY, Value:            EMPTY
Index:  10, Key:             Josh, Value:               11
Index:  11, Key:            EMPTY, Value:            EMPTY
Index:  12, Key:            Steve, Value:               36

Performance Testing

To test the performance of our hashmap we will perform a linear set of increase size allocations. Without freeing any of the memory allocated is essentially the worst case for our implementation. The biggest weak point of our malloc function is that when it runs out of memory it has to walk the entire heap before it can expand it. This set of allocations will test to see how slow this is.

#include <stdio.h>
#include <time.h>
#include <stdlib.h>         // Swap these 2
/* #include "gallocc.h" */  // Swap these 2

int main() {
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  void *ptr;
  for (int i = 0; i < 64000; i++) {
    ptr = malloc(i);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);

  double elapsed_time = (end.tv_sec - start.tv_sec) * 1000.0 +
    (end.tv_nsec - start.tv_nsec) / 1e6;
  printf("Time: %.3f milliseconds\n", elapsed_time);
}

glibc `malloc`

13192 System calls
67ms

Our `malloc`

15247 System calls
1435ms

Analysis

Our implementation is much slower, partially explained by performing more system calls, and the walking of the list each time we run out of space. If we were to fix any bottleneck, that would be the first place to look. Another way that could be mitigated would be increased the size by which the heap increases each time.

For this speed test I used strace to see how much the glibc malloc was increasing the program break by, which turned out to be 135168 bytes. If we run our version with the original 1024 byte increment it will be much slower, so I decided to match the glibc version. Of course, this is just a trade-off of how much memory you are willing to have in-use at one time.

My goal was not to create the fastest memory allocator possible, but to use this is an opportunity to learn how they work.

Learning Moments

Now that we’ve come to the end I’d like to talk about what I learned during the writing of this article.

Using gdb

I have used gdb from a surface level before, but never to work on complex programs like this. Most of my usage would follow the same pattern, segmentation fault, compile with -g, run in gdb and get a back trace. From there it was usually easy to understand what happened. That was entirely different while writing my memory allocator.

Before overwriting the glibc malloc it was easy to debug using print statements like I was used to. However, this came at the downside of not finding as many bugs with my code. You can only think of so many edge cases before you start running out of ideas. Once we overrode the malloc the bugs were instantly exposed, but this came at the downside of no longer being able to use print debugging or asserts.

Internally the printing functions and assert allocate buffers to do their scratch work, which of course wouldn’t work if our malloc was failing. Time to break out gdb then. A small portion of the bugs could be solved by simply printing a back trace, while most others required stepping through the program. The TUI included with gdb made viewing source code and printing the value of different variable a lot easier.

Overall, I think this is a great style of project for learning gdb, with many complex operations, and the inability to use print debugging.

C Global State

We are often told to avoid global state when writing our programs, and I would generally agree when talking about application style programs. However, when it comes to low level or systems programming it is unavoidable in most situations.

In order for malloc to have as easy of a user interface as it currently does, it has to store some global state. Users expect to be able to call malloc and free anywhere in their code without having to first instantiate some sort of memory allocator object. You already know this, but malloc is the exact opposite of a pure function, it is entirely dependent on the state of the operating system below it, and the other processes running on the system.

Once memory has been allocated there is no easy way to get the start of the program’s heap, so we have to store that in a variable that will be accessible to malloc and its related function. Similarly, there is no system call to get the first free segment in our heap, because that is a concept we created ourselves, not an operating system primitive. So we also store this in a global variable, which allowed us to implement our explicit free list, greatly speeding up calls to malloc.

The last global variable is not entirely necessary, but it helps us avoid making a system call each time we need to reference the end of our heap. Of course, we can always call sbrk(0) to get the end of the heap, but system calls are not free, so we try to avoid making them.

Tradeoffs

We could have avoided tracking the end of our heap by introducing another pointer to our heap_seg struct. Instead of just using the size of each segment to traverse the list, we could have an explicit next pointer. This was our last node could just point to NULL

This would be an optimization we could look at in the future. I avoided doing it here because our heap_seg struct was already 24 bytes, which I feel is a large overhead on each allocation. The glibc version of malloc uses headers of 16 bytes (I am probably wrong here), so for a future rewrite we might consider using one of the other 2 bits we have free at the bottom of the size to store information about being at the end of the list.

There is also the possibility of having a header and a footer, which further increases overhead, but can make the implementation much easier.

Linked Lists

I am personally not the biggest fan of linked lists, but they have their place in a library like this. Arrays are great, but they’re entirely inflexible, both in total size, and size per element. In this case we need our implementation to be adaptable for changes in both, so a linked list is required.

In fact our writing our memory allocator is really just writing a linked list on top of a giant slab of memory. Most of the challenges we solved are the same challenges you encounter working on a linked list, complicated by the fact we want to try to increase the heap by a fixed size, as opposed to calling sbrk for each allocation request.

We can even solve the biggest bottleneck in our implementation by using a concept from most doubly linked lists. In addition to tracking the end of the heap we could also track the last allocated segment. Then when first_free is set to NULL the algorithm would not need to traverse the heap just to get a pointer to this last element. Each time we performed an operation on a segment we would just need to check if its address plus its size gets us past the end of the list.

Man Pages

In addition to using a LSP like clangd to get function definitions and doc comments, man pages are great for learning how to use the functions available in C. sbrk is not the easiest function to understand, but once you read up on how it works you can start by using it in a test program. Most of the printing functions have similar names, so looking up their differences can help you choose the right one.

If you’re unfamiliar with any system call you can look up its man page, which often includes example usage to help get you started.

Assert

This is my first time using asserts throughout a program as mini test cases that need to pass to ensure the implementation is correct. Every time you discover a bug in your code add an assert to ensure that it doesn’t happen again.

The most obvious one I can think of in our memory allocator is ensuring that the address of the last segment plus its size should equal the heap_end pointer. Throughout the coding of this memory allocator I ran into a nasty bug where the last segment was being left off of the heap. It turned out that when calling sbrk I forgot to note the size of the previous element if it was free. This size needed to be added onto the end of our list. The assert now ensures that if this happens the program will exit.

Another helpful thing with asserts is that they can easily be disabled with one like of code.

#define NDEBUG
#include <assert.h>

The NDEBUG line makes all assert lines a nop.

Conclusion

That’s it! A memory allocator in ~250 lines of code. This is definitely a project I will continue to improve upon when I get the time to do so. I find this level of programming fascinating, and it gives me a greater understanding of low level concepts.

Don’t be afraid to reinvent the wheel, even if your wheel ends up being far from perfect. Before writing this memory allocator I had zero knowledge of how they worked, and that made me discover the pitfalls that would be skipped over if I had just looked at completed source code. This video was my introduction to the concept, and even then I strayed very far from the basics laid out in the video.

Just like with the hashmap we worked on last time, I knew how it needed to work in the end, and this guided my implementation. There were many times in writing this where I found myself avoiding a certain implementation because it felt “wrong”. Guess what? In the end those are the exact implementations I ended up landing on. You won’t know if something is the “right way” to do it until you try.

Don’t stop yourself from solving a problem because “there has to be an easier/better way to do it”. Solve the problem first, then optimize and clean up your code. The update_back function is a perfect example of this. I had that exact behavior in 3 different spots of the code. As soon as I saw this I tried to extract it into a function, which failed, because the behavior was not nailed down. This stalled my progress because I hated seeing the same behavior in 3 places. Duplication is bad, but extracting behavior into a function without understanding the desired behavior is worse.

So just start writing the code. You will discover the pitfalls, and then work around it. Write it once, throw it away, and write it better with what you’ve learned.

Memory Allocator

Introduction

Headers

Exploration

Initialization

Allocation

Freeing

Example Usage

Takeaways

sbrk()

Adding sbrk()

Heap Initialization

malloc() with sbrk()

Observing Behavior

Filling our Holes

Upgrades

First Fit vs Best Fit

Implicit vs Explicit Free Lists

Groundwork

free

malloc

calloc()

realloc()

Testing

Hashmap Testing

glibc malloc

Our malloc

Performance Testing

glibc malloc

Our malloc

Analysis

Learning Moments

Using gdb

C Global State

Tradeoffs

Linked Lists

Man Pages

Assert

Conclusion

`malloc()` with `sbrk()`

`free`

`malloc`

`calloc()`

`realloc()`

glibc `malloc`

Our `malloc`

glibc `malloc`

Our `malloc`