
On Garbage Collection and Memory Management

July 20, 2012 - Programming

This is an old blog post that I seem to have forgotten to publish when I originally wrote it.
-BC_Programming

When you write a computer program, you are providing the computer with instructions on how it should work with values in memory, using the processor to perform a task. Managing the allocation and deallocation of those blocks of memory is memory management.

With some programming languages, the burden of memory management is on the programmer. For example, in C, most non-trivial applications need to allocate and free blocks of memory, which is done via the now-ubiquitous malloc() and free() functions in the C standard library. What makes this interesting is that it breeds a pattern of memory usage whereby, bugs aside, memory is only allocated while it is being used by the program; the idea is to deallocate memory blocks as soon as they are no longer needed. The advantage is that this keeps the program's memory footprint to a minimum, which is essential if the program runs on an embedded device or otherwise needs to keep a close eye on what it has allocated. There is a small drawback, however: freeing memory takes processor time away from other tasks. Usually a negligible amount, but it adds up, particularly when you perform many free() calls on small memory blocks rather than freeing one large block a single time.
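A minimal sketch of that allocate-use-free pattern (the sizes and values here are arbitrary, just for illustration):

    #include <cstdio>
    #include <cstdlib>

    int main() {
        // Allocate a block big enough for 1000 ints; malloc() can fail,
        // so the result has to be checked before use.
        int *values = static_cast<int *>(std::malloc(1000 * sizeof(int)));
        if (values == nullptr)
            return 1;

        for (int i = 0; i < 1000; ++i)
            values[i] = i * i;
        std::printf("values[42] = %d\n", values[42]);

        // Exactly one free() per malloc(), as soon as the block is no
        // longer needed; freeing twice (or never) is a bug.
        std::free(values);
        return 0;
    }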

In any case, this pattern involves a lot of repetition: every allocation needs a corresponding free, and exactly one, otherwise you end up with bugs from freeing the same block of memory twice. This bred design patterns: constructs, both within the language and via frameworks, that made the task more automatic so that developers could focus on their goal. With the advent of C++ and object-oriented constructs came RAII, or "Resource Acquisition Is Initialization". The basic idea is that resources are allocated in the constructor of an object and deallocated in the corresponding destructor; since destructors are guaranteed to run when the object's scope is exited, even by way of an exception (well, barring balls-up hardware or a power failure or something), this generally guarantees that the memory is reclaimed. On the other hand, if an error occurs, the program will probably be left in an inconsistent state, to the point where it ought to be restarted anyway; and once a process exits, all major operating systems deallocate its memory (that is, leaks cannot last past the lifetime of the application leaking), so it's questionable how important this guarantee really is. Also, all it really does is move the semantics: instead of allocating and deallocating variables, you are initializing and destroying class instances and calling members of those instances that correspond to the values you would have allocated as variables. Generally speaking, the concept is to manage the larger data structures in your application in a fashion that means destructors will deallocate them if anything goes wrong.
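To make that concrete, here is a minimal sketch of RAII (the Buffer class is an invented example, not anything standard):

    #include <cstddef>

    // A bare-bones RAII wrapper: the constructor acquires the memory,
    // the destructor releases it, so the block is reclaimed whenever
    // the owning scope is exited, including by a thrown exception.
    class Buffer {
    public:
        explicit Buffer(std::size_t n) : data_(new int[n]) {}
        ~Buffer() { delete[] data_; }

        // Forbid copies so two Buffers never delete the same block.
        Buffer(const Buffer &) = delete;
        Buffer &operator=(const Buffer &) = delete;

        int *get() { return data_; }

    private:
        int *data_;
    };

    void work() {
        Buffer buf(1000);   // memory acquired here
        buf.get()[0] = 42;
        // ...even if something in here throws, ~Buffer() still runs...
    }                       // memory released here, automatically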

Garbage Collection

PCs have, for quite a number of years, had plenty of memory; even "low-end" machines like a Pentium II with 256MB of RAM don't need programs that use the absolute minimum amount of memory at all times; there is quite a bit of leeway. As a result, techniques arose to deal with the allocation, and more specifically the deallocation, of memory so that the programmer worries less about whether they freed a variable, should free a variable, or may have forgotten to free a variable, while also gaining a slight speedup from not forcibly deallocating everything at the first opportunity. The use of a "Garbage Collector" typifies this sort of technique.

Contrary to what some might believe, Garbage Collection is not tied to your language; you can use a conservative Garbage Collector in a language like C or C++ if you want to (there is a sketch of this a little further down). However, typically, when people argue against the use of a Garbage Collector, they are really arguing against one of two things:

  • the Microsoft .NET CLR (Common Language Runtime)
  • The Java Virtual Machine

Sometimes the argument extends to any language that doesn't leave clean-up in the programmer's hands, such as Python or even D, but those two are the most common targets. Both employ a Garbage Collector for memory management. Neither has a way to explicitly deallocate a class instance you created; you generally just allow variables and instances to go out of scope and become unreachable, at which point the Garbage Collector determines it can reclaim them and does so.
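As for the earlier aside about C and C++: that is not hypothetical; the Boehm-Demers-Weiser conservative collector can be dropped into either language. A rough sketch of how that might look (GC_INIT, GC_MALLOC, and GC_get_heap_size are that library's actual entry points, though build details, typically linking with -lgc, vary by platform):

    #include <gc.h>     // Boehm-Demers-Weiser conservative collector
    #include <cstdio>

    int main() {
        GC_INIT();      // initialize the collector once, at startup

        for (int i = 0; i < 100000; ++i) {
            // Allocate from the collected heap; note there is no free().
            int *block = static_cast<int *>(GC_MALLOC(1000 * sizeof(int)));
            block[0] = i;
            // Once 'block' goes out of scope the memory is unreachable,
            // and the collector reclaims it on some later cycle.
        }

        std::printf("GC heap size: %lu bytes\n",
                    (unsigned long)GC_get_heap_size());
        return 0;
    }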

The arguments against garbage collection generally come in the form of something like "it makes programmers lazy", "it's not REAL programming", or "it's slow/wasteful".

All of these, however, are false. It doesn't make programmers lazy; programmers are lazy by definition. That's why we use functions and subroutines rather than duplicating code all over, and that's why even in C and C++ design patterns have sprung up to make manual memory management easier to deal with. Not having to manage memory manually doesn't mean a programmer is lazy, and anybody who thinks managing memory manually is somehow "better" is simply being stubborn, especially since they are already using predefined routines for allocation and deallocation anyway (malloc()/free()). What they are usually arguing for is deterministic memory usage: with manual memory management (malloc/free) you can tell practically how much memory your program is using simply by keeping track of what you allocate and deallocate. With Garbage Collection, you don't explicitly deallocate anything, and when that memory is actually deallocated is not deterministic. This results in two things:

First, people call System.gc() to force the garbage collector to run; second, people complain about how "jumpy" the allocated memory looks on a graph.

On the first: one should never run the garbage collector manually without a good reason. Running the garbage collector takes time, and that time is almost always better spent doing other things in your application; the garbage collector will run on its own when needed.

The second is over-analysis. The overall memory footprint remains much the same; the only thing that changes is how much of that memory is holding live data. For example, a game written in C++, assuming no memory leaks, might appear to use more memory only during certain scenes or intense action, then drop back down quickly once the scene is over. A Java or C# game may instead appear to have its memory balloon and stay there for some time. This is simply because they are different: C++ code typically employs the RAII pattern described above, which deallocates unused memory immediately, while C# and Java's managed memory model puts that task in the hands of the Garbage Collector, which cleans up in a generational fashion. Eventually, the memory is freed. The real question is: do you actually need that memory right away? The worst case is that it gets swapped out due to other memory demands, but by then the GC will usually have kicked in; and even if it does get swapped out, deallocation by the GC does not "touch" the page, so it won't need to be swapped back in just to be freed unless something actually reads from it.
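Coming back to the deterministic-usage point for a moment, the sort of bookkeeping manual management allows is easy to sketch. Something like the following (tracked_alloc/tracked_free are made-up names, and the size-header trick ignores alignment subtleties a real allocator would handle):

    #include <cstdio>
    #include <cstdlib>

    // Running total of bytes currently allocated; with manual
    // management this figure is exact and fully deterministic.
    static std::size_t g_live_bytes = 0;

    void *tracked_alloc(std::size_t n) {
        // Stash the block size in a header word just before the
        // pointer we hand back, so tracked_free() can find it.
        std::size_t *p = static_cast<std::size_t *>(
            std::malloc(sizeof(std::size_t) + n));
        if (p == nullptr)
            return nullptr;
        *p = n;
        g_live_bytes += n;
        return p + 1;
    }

    void tracked_free(void *ptr) {
        if (ptr == nullptr)
            return;
        std::size_t *p = static_cast<std::size_t *>(ptr) - 1;
        g_live_bytes -= *p;
        std::free(p);
    }

    int main() {
        void *a = tracked_alloc(1024);
        std::printf("live: %lu bytes\n", (unsigned long)g_live_bytes); // 1024
        tracked_free(a);
        std::printf("live: %lu bytes\n", (unsigned long)g_live_bytes); // 0
        return 0;
    }

With a garbage-collected heap there is no equivalent moment at which you can say exactly how many bytes are live; that is the trade being made, and for most desktop software it is a perfectly reasonable one.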
