## 13 February 2008

### Memory allocation 101

A brief summary of my lectures on memory allocation. The students are in their second year and have little to no exposure to Linux and C programming.

An operating system is a program that (1) adds a layer of abstraction on top of the hardware, (2) manages hardware resources, and (3) solves some problems that otherwise would be quite repetitive and boring for programmers. These three different points of view describe the same thing. They are also rather abstract, with not enough technical “meat” to be remembered.

So let's look at one resource in particular, the memory. We shall discuss two different problems, generally referred to by the names memory allocation and virtual memory. (Granted, the latter sounds like the name of a solution, not of a problem.) Let us look at memory allocation.

RAM (random access memory) is one big array of bytes. Imagine that you write a program for a machine that does not have an operating system. Then you would have to think about where in this big array to place each of your data structures, and you would have to be careful to keep them disjoint. But when programming, people think in terms of data structures—“I need a queue for the requests,” “I'll model friendship with a graph,” “each person will be a graph node,” and so on. The program should correspond to how programmers think. They should be able to say “give me some space to hold one graph node,” provided they have precisely defined what they mean by “graph node”. They shouldn't have to talk about addresses at all to accomplish such a simple task.

So that's the first problem: RAMs don't work the way programmers think. It's the operating system's job to make the memory look like it speaks a higher-level language.

The POSIX standard for operating systems says that a programmer should be able to say `Node* n = malloc(sizeof(Node))` to get space for holding a Node, and that later on he must say `free(n)` if that space is to be made available for other stuff. That's pretty good. (Better is possible. Programmers generally don't think about releasing memory, so they should not be forced to say `free(n)`. Unfortunately, no widely-used operating system is able to detect when the programmer no longer needs a data structure.)

(1) Saying “give me memory to hold a node” is more abstract than keeping track of specific memory addresses. (2) To let programmers use the more abstract language, the operating system must organize (manage) the memory. (3) The problem of keeping track which space is free and which is in use is solved once in the operating system, instead of once in each application.

Now that we know the problem, how do we solve it? As with any problem, it helps to make the details explicit. We have at our disposal a big array, and there is a sequence of requests to allocate or deallocate memory that we must honor. A request x=allocate(N) means “please tell me where in memory it is safe for me to store N bytes.” A request free(x) means “OK, I'm done with that memory you told me was safe to use a while ago.” Obviously, if the operating system gives the same answer to two consecutive allocation requests, havoc will reign—applications would try to use the same space for distinct data structures. So the operating system must keep track of which areas are free and which areas are in use. When someone asks for N bytes we deliver those from a free area and then mark those N bytes as a used area. When someone says that an area is no longer needed we mark it as free.

We see now two subproblems: (a) How do we keep track of what is free and what is in use? and (b) How do we choose which free area to allocate? In other words, we must choose the data structures and the algorithm. Before we do, let's stop and think about what we want to achieve—what makes a solution “good”? One thing that comes to mind is speed. We expect allocation requests to be somewhat frequent, since all programs need some sort of data structures. It would be bad if a call to malloc took forever. Another, more important, issue is that we do not want to waste memory. It's easy to see how that might happen. If we have lots of free areas but all of them are just slightly smaller than N, then we can't honor an allocation request although plenty of memory is available. Intuitively, a free area should either be very small (so it doesn't waste too much space) or big enough to honor allocation requests. So that's the main objective: not to waste memory. We say that we want to keep fragmentation low.

We discussed two data structures for keeping track of the free space. The simplest is to chain the free areas in a linked list. It is common to force such lists to obey the following invariants:

• p < p→next, or p→next = Λ
• p + p→size < p→next

Here p is a pointer to a free block, and Λ is a special address called “null”. The first invariant says that blocks are chained in increasing order of their addresses; the second invariant adds that there are no adjacent free blocks. Each free block has two fields, next and size; each used block has one field, size. To make things easier later we use a sentinel, a dummy free block of minimal size that starts at address 0 and that is never allocated. In the figure below, green means free, red means in use, and yellow is the sentinel. (In the figure p→next is written as next(p). You should be familiar with both notations.)

Allocation is done as in figure 2. The pseudocode implements the first-fit policy: always pick the free block at the smallest address. This policy is one way to implement the following strategy: “Do not let long-lived small used blocks prevent the allocation of bigger blocks.” Why? Because under this scheme small blocks will cluster at small addresses. Before continuing, please take a moment to read and re-read the pseudocode until it sinks in.

```
First-Fit-Alloc(N)
  p := FIRST
  WHILE p->next != NULL AND p->next->size < N+1
    DO p := p->next
  IF p->next = NULL THEN SIGNAL OVERFLOW
  r := p->next + 1                  /* skip the size */
  e := p->next->size - (N+1)
  IF e < 2
    THEN p->next := p->next->next
    ELSE q := p->next + N + 1
         q->next := p->next->next
         q->size := e
         p->next->size := N+1
         p->next := q
  RETURN r
```


 Figure 2: First-fit allocation

Deallocation is done as in figure 3.

```
Ordered-Address-Dealloc(q)
  q := q - 1                        /* go back to the size */
  p := FIRST
  WHILE p->next != NULL AND p->next < q
    DO p := p->next
  Glue(q, p->next)
  Glue(p, q)

Glue(p, q)
  IF p + p->size = q
    THEN p->size := p->size + q->size
         p->next := q->next
    ELSE p->next := q
```

 Figure 3: Deallocation

Here are a few exercises before you continue.

Exercise 1. Can the two calls to Glue be reversed?

Exercise 2. What modifications would you make to the first-fit allocation pseudocode in order to implement the best-fit policy? The best-fit policy says that you should choose the smallest free block that is big enough.

Exercise 3. Modify the deallocation procedure so that it works in constant time. Hint: use extra bookkeeping information.

Exercise 4. Suppose the memory has 10 words and the following requests are made: A=allocate(3); B=allocate(4); C=allocate(2); deallocate(B); D=allocate(1). What is the size of the biggest available block if first-fit was used? Best-fit? How much space is lost to internal fragmentation if the buddy system is used? (Ignore any space taken by bookkeeping information.)

Finally, another way of keeping track of the free blocks is to have a few lists, one for each size. This way we'll search through a smaller set when trying to find a block that is big enough. Because there are many possible block sizes, we'll restrict our attention to powers of two. So if we have 2^n words of memory we would have an array of n+1 linked lists, for the sizes 2^0, 2^1, ⋯, 2^n. This scheme is called the buddy system. It involves maintaining the following invariants:

• p→size = 2^k, for some integer k
• p mod p→size = 0

Whenever two adjacent blocks can be glued while maintaining the above invariants they are glued.

Exercise 5. Write down the pseudocode for allocation and deallocation under the buddy system.

Exercise 6. Why did you use doubly-linked lists in the answer to the previous exercise?

In the buddy system we sometimes allocate blocks bigger than we need. For example, instead of allocating 13 words we would allocate 16. This wasted space is called internal fragmentation; the space wasted because the holes are too small is called external fragmentation. How much internal fragmentation do we expect to have in a system that only allocates blocks whose size is a power of two? Let's define fragmentation as f = U/N − 1, where U is how much space is used and N is how much is needed. Let's make the following assumptions:

• we allocate blocks of sizes 1, 2, 3, ⋯, 2^n, with no preference for any of the sizes
• there is no external fragmentation; that is, if w words are requested then 2^⌈lg w⌉ words are allocated
• 2^n is large

Under these assumptions we have:

 N = 1 + 2 + ⋯ + 2^n = 2^n(2^n + 1)/2 ≈ 2^(2n−1)
 U = 1 + 2^0×2^1 + 2^1×2^2 + ⋯ + 2^(n−1)×2^n ≈ 2^(2n+1)/3
 f ≈ 1/3

Exercise 7. Fill in the holes in the previous derivation.

It's time to summarize. The operating system offers programmers an interface to memory that matches the way programmers think. In order to do so it has to keep track of used and unused memory and make placement decisions that minimize wasted space. It also has to make those decisions fast. We reviewed briefly three possible mechanisms for doing so. None of them is very good. In practice there are lots of tricks used to lower fragmentation and to increase speed, but no fundamentally different approach is known.

Further reading. You should read at least the introduction of Wilson, Johnstone, Neely, Boles, Dynamic Storage Allocation: A Survey and Critical Review because it is a very good exposition of the high-level problem. (Also, you might want to check out Knuth, Fundamental Algorithms, which inspired much of this material.)

The next episode is about virtual memory. Stay tuned.