OS代写:CS213 Writing A Dynamic Storage Allocator

实现malloc, free, realloc, calloc函数的底层,动态内存分配管理的部分。

malloc

Requirement

This lab requires submitting two versions of your code: one as an initial checkpoint, and the second as your final version.

Introduction

In this lab you will write a dynamic memory allocator which will consist of the malloc, free, realloc, and calloc functions. Your goal is to implement an allocator that is correct, efficient, and fast.

We strongly encourage you to start early. The total time you spend designing and debugging can easily eclipse the time you spend coding.

Bugs can be especially pernicious and difficult to track down in an allocator, and you will probably spend a significant amount of time debugging your code. Buggy code will not get any credit.

This lab has been heavily revised from previous versions. Do not rely on advice or information you may find on the Web or from people who have done this lab before. It will most likely be misleading or outright wrong1 . Be sure to read all of the documentation carefully and especially study the baseline implementation we have provided.

Logistics

This is an individual project. You should do this lab on one of the Shark machines.
To get your lab materials, click “Download Handout” on Autolab, enter your Andrew ID, and follow the instructions. Then, clone your repository on a Shark machine by running:

The only file you will turn in is mm.c, which contains your solution. The provided code allows you to evaluate your implementations. Using the command make will generate four driver programs: mdriver, mdriver-dbg, mdriver-emulate, and mdriver-uninit, as described in section 6. Your final autograded score is computed by driver.pl, as described in section 7.1.

To test your code for the checkpoint submission, run mdriver and/or driver.pl with the -C flag. To test your code for the final submission, you don’t need any additional flags.

You can get some idea of your program performance and the resulting value you will get for the throughput portion of your grade. However, the Autolab servers will generate different throughput values, and these will determine your actual score. This is discussed in more detail in Section 7.

Required Functions

Your allocator will implement the following functions, declared in mm.h and defined in mm.c:

1
2
3
4
5
6
bool mm_init(void);
void *malloc(size_t size);
void free(void *ptr);
void *realloc(void *ptr, size_t size);
void *calloc(size_t nmemb, size_t size);
bool mm_checkheap(int);

We provide you two versions of memory allocators:

mm.c: A fully-functional implicit-list allocator. We recommend that you use this code as your starting point.
Note that the provided code does not implement coalesce block(). Although it is a valid memory allocator, it will result in a high amount of external fragmentation, so you should implement coalescing. We strongly recommend considering all cases you need to implement before writing code for this function; the lecture slides should help you identify and reason about these cases.

mm-naive.c: A functional implementation that runs quickly but gets very poor utilization, because it never reuses any blocks of memory.

Your allocator must run correctly on a 64-bit machine. It must support a full 64-bit address space, even though current implementations of x86-64 machines support only a 48-bit address space.

Your submitted mm.c must implement the following functions:

  • bool mm_init(void): Performs any necessary initializations, such as allocating the initial heap area. The return value should be false if there was a problem in performing the initialization, true otherwise. You must reinitialize all of your data structures in this function, because the drivers call your mm init function every time they begin a new trace to reset to an empty heap.
  • void *malloc(size_t size): The malloc routine returns a pointer to an allocated block payload of at least size bytes. The entire allocated block should lie within the heap region and should not overlap with any other allocated block.
    Your malloc implementation must always return 16-byte aligned pointers.
  • void free(void *ptr): The free routine frees the block pointed to by ptr. It returns nothing. This routine is only guaranteed to work when the passed pointer was returned by an earlier call to malloc, calloc, or realloc and has not yet been freed. free(NULL) has no effect.
  • void realloc(void ptr, size_t size): The realloc routine returns a pointer to an allocated region of at least size bytes with the following constraints:
    • if ptr is NULL, the call is equivalent to malloc(size);
    • if size is equal to zero, the call is equivalent to free(ptr) and should return NULL;
    • if ptr is not NULL, the call is equivalent to free(ptr) followed by malloc(size), except the contents of the new block will be the same as those of the old block, up to the minimum of the old and new sizes.
      Your realloc implementation will have only minimal impact on measured throughput or utilization. A correct, simple implementation will suffice.
  • void *calloc(size_t nmemb, size_t size): Allocates memory for an array of nmemb elements of size bytes each and returns a pointer to the allocated memory. The memory is set to zero before returning.
    Your calloc will not be graded on throughput or performance. A correct, simple implementation will suffice.
  • bool mm_checkheap(int line): The mm checkheap function (the heap consistency checker, or simply heap checker) scans the heap and checks it for possible errors.

A quality heap checker is essential for debugging your malloc implementation. Many malloc bugs are too subtle to debug using conventional gdb techniques. A heap consistency checker can help you isolate the specific operation that causes your heap to become inconsistent.

Because of the importance of the consistency checker, it will be graded; section 7.2 describes the requirements for your implementation in greater detail. We may also require you to write your checkheap function before coming to office hours.

The mm checkheap function takes a single integer argument that you can use any way you want. One technique is to use this this argument to pass in the line number where it was called.

This allows you to print the line number where mm checkheap was called, if you detect a problem with the heap. Note, however, that mm checkheap will also be called with an argument of 0 by the driver.

The semantics for malloc, realloc, calloc, and free match those of the corresponding libc routines. Type man malloc in the shell for more documentation.

Support Routines

The memlib.c package simulates the memory system for your dynamic memory allocator. You can invoke the following functions in memlib.c:

  • void *mem_sbrk(intptr_t incr): Expands the heap by incr bytes, where incr is a nonnegative integer, and returns a generic pointer to the first byte of the newly allocated heap area. The semantics are identical to the Unix sbrk function, except that mem_sbrk will fail for negative arguments. (Data type intptr_t is defined to be a signed integer large enough to hold a pointer. On our machines it is 64-bits long.)
  • void *mem_heap_lo(void): Returns a generic pointer to the first valid byte in the heap.
  • void *mem_heap_hi(void): Returns a generic pointer to the last valid byte in the heap.
    Careful: the definition of mem_heap_hi() given here is correct, but it may not be intuitive!
    If your heap is 8 bytes large, then the “last valid byte” will be 7 bytes from the start and its address will not be aligned.
  • size_t mem heapsize(void): Returns the current size of the heap in bytes.
    You are also allowed to use the following libc library functions: memcpy, memset, printf, fprintf, and sprintf. Other than these functions and the support routines, your mm.c code may not call any externally-defined function.

Programming Rules

  • Any allocator that attempts to explicitly detect which trace is running will receive a penalty of 20 points. On the other hand, you should feel free to write an adaptive allocatorone that dynamically tunes itself according to the general characteristics of the different traces.
  • You should not change any of the interfaces in mm.h, and your program must compile with the provided Makefile. However, we strongly encourage you to use static helper functions in mm.c to break up your code into small, easy-to-understand segments.
  • You are not allowed to define any large global data structures such as large arrays, trees, or lists in your mm.c program. However, you are allowed to declare small global arrays, structs and scalar variables such as integers, floats, and pointers in mm.c. In total, your global data should sum to at most 128 bytes. Global values defined with the const qualifier are not counted in the 128 bytes. This will be tested by automated means, as described in Section 7.1.4.
    The reason for this restriction is that the driver cannot account for such global variables in its memory utilization measure. If you need space for large data structures, you can allocate space for them within the heap.
  • The presence of some undefined behavior is inevitable in malloclab, where we perform unsafe memory operations and generally treat memory as a giant playground. For example, we need to do things like cast between pointer types, perform pointer arithmetic, and write to arbitrary locations in memory. Outside of this lab, it is rarely appropriate to write code in this style, but it is necessary to do so here.
    However, we ask you to minimize the amount of undefined behavior by avoiding it when possible. For example, instead of directly casting between pointer types, you should explicitly alias memory through the use of unions. Additionally, you should limit the use of pointer arithmetic to a few short helper functions, as we have tried to do in the handout code.
  • In the provided baseline code, we use a zero-length array to declare a payload element in the block struct. This is a non-standard compiler extension, but we encourage you to use it for this lab as a way to explicitly declare your block payload.
    One important restriction is that a zero-length array must appear as the last element in its containing struct. However, for malloclab, we will also permit the use of a zero-length array as a member of a union, as long as that union is the last element in its containing struct. This will allow you to explicitly alias the payload with other data that you might also want to store in the same memory address. Though this is not portable, we strongly encourage its use compared to the alternative of casting between pointer types.
  • The use of macro definitions (using #define) in your code is restricted to the following:
    • Macros for debugging purposes, whose names must begin with the prefix “dbg”. See, for example, the debugging macros defined in mm.c. You may create your own, but they must be disabled in submitted code.
    • Definitions of constants. These definitions must not have any parameters.

The practice of using macros instead of function definitions is now obsolete. Modern compilers (when optimization is enabled) perform inline substitution of small functions, eliminating any inefficiencies due to the use of functions rather than macros. In addition, functions provide better type checking and enable better debugging support.

When you run make, it will run a program that checks for disallowed macro definitions in your code. This checker is overly strictit cannot determine when a macro definition is embedded in a comment or in some part of the code that has been disabled by conditional-compilation directives. Nonetheless, your code must pass this checker without any warning messages.

  • The code shown in the textbook (Section 9.9.12, and available from the CS:APP website) is a useful source of inspiration for the lab, but it does not meet the required coding standards.
    It does not handle 64-bit allocations, it makes extensive use of macros instead of functions, and it relies heavily on low-level pointer arithmetic. Similarly, the code shown in K&R does not satisfy the coding requirements. You should use the provided code mm.c as your starting point.

  • It is okay to look at any high-level descriptions of algorithms found in the textbook or elsewhere, but it is not acceptable to copy or look at any code of malloc implementations found online or in other sources, except for the allocators described in the textbook, in K&R, or as part of the provided code.

  • It is okay to adapt code for useful generic data structures and algorithms (e.g., linked lists, hash tables, search trees, and priority queues) from Wikipedia and other repositories. (This code must not have already been customized as part of a memory allocator.) You must include (as a comment) an attribution of the origins of any borrowed code.

  • Your allocator must always return pointers that are aligned to 16-byte boundaries. The driver will check this requirement.

  • Your code must compile without warnings.

Driver Programs

Three driver programs are generated when you run make.

  1. mdriver should be used for testing your code’s performance. It checks the correctness, utilization, and throughput of the standard benchmark traces.
  2. mdriver-dbg should be used when debugging your program. This is the same program as mdriver, with two notable differences:
    • (a) This program is compiled with DEBUG defined, which enables the dbg_macros at the top of mm.c. Without this defined, functions like dbg_printf and dbg_assert will not have any effect.
    • (b) Additionally, this program is compiled with optimization level -O0, which allows GDB to display more meaningful debugging information.
  3. mdriver-emulate is used to perform additional correctness checks on your program, which can result in additional deductions described in section 7.1.4.
    It uses special compilation and memory-management techniques to allow testing your program with a heap making use of the full, 64-bit address space. In addition to the standard benchmark traces, it will run a set of giant traces with very large allocation requests.
  4. mdriver-uninit can be used to detect uses of uninitialized memory, using the MemorySanitizer tool. It only raises an error when uninitialized memory is actually used, and can track initialization status down to each bit.

Trace files

The driver programs are controlled by a set of trace files that are included in the traces subdirectory. Each trace file contains a sequence of commands that instruct the driver to call your malloc, realloc, and free routines in some sequence. The drivers and the trace files are the same ones we will use when we grade your handin mm.c file.

The traces that count towards both your memory utilization and throughput are marked with a ‘*‘ in the output of mdriver. Some of the traces are short traces that do not count toward your memory utilization or throughput but are useful for detecting errors and for debugging, which will not be marked with a ‘*‘ next to them.

When the driver programs are run, they will run each trace file multiple times: once to make sure your implementation is correct, once to determine the space utilization, and between 3 and 20 times to determine the throughput.

Command-line arguments

The drivers accept the following command-line arguments. The normal operation is to run it with no arguments, but you may find it useful to use the arguments during development.

  • -C: Apply the scoring standards for the checkpoint, rather than for the final submission.
  • -f tracefile: Use one particular tracefile instead of the default set of tracefiles for testing correctness, utilization, and throughput.
  • -c tracefile: Run a particular tracefile twice, calling mm init in between each run. Unlike -t, this option tests only for correctness. This option is extremely useful if you want to print out debugging messages.
  • -v level: Set the verbosity level to the specified value. Levels 0-2 are supported, with a default level of 1. Raising the verbosity level causes additional diagnostic information to be printed as each trace file is processed. This can be useful during debugging for determining which trace file is causing your malloc package to fail.
  • -d level: Controls the amount of validity checking performed by the driver. This is entirely distinct from the DEBUG compile-time define.

At debug level 0, very little checking is done, which is useful when testing performance only.

At debug level 1, the driver checks the payload contents to ensure that they are not overwritten between calls into your code. This is the default.

At debug level 2, the driver will also call into your implementation of mm checkheap. This mode is slow, but it can help identify the exact point at which an error occurs.

Additional arguments can be listed by running mdriver -h.

Strategic Advice

You must design algorithms and data structures for managing free blocks that achieve the right balance of space utilization and speed. This involves a trade-offit is easy to write a fast allocator by allocating blocks and never freeing them, or a high-utilization allocator that carefully packs blocks as tightly as possible. You must seek to minimize wasted space while also making your program run fast.

As described in the textbook and the lectures, utilization is reduced below 100% due to fragmentation, taking two different forms:

  • External fragmentation: Unused space between allocated blocks or at the end of the heap
  • Internal fragmentation: Space within an allocated block that cannot be used for storing data, because it is required for some of the manager’s data structures (e.g., headers, footers, and free-list pointers), or because extra bytes must be allocated to meet alignment or minimum block size requirements

To reduce external fragmentation, you will want to implement good block placement heuristics. To reduce internal fragmentation, it helps to reduce the storage for your data structures as much as possible.

Maximizing throughput requires making sure your allocator finds a suitable block quickly. This involves constructing more elaborate data structures than is found in the provided code. However, your code need not use any exotic data structures, such as search trees. Our reference implementation only uses singly- and doubly-linked lists.

Here’s a strategy we suggest you follow in meeting the checkpoint and final version requirements:

  • Checkpoint. The provided code already meets the required utilization performance2 , but it has very low throughput. You can achieve the required throughput by converting to an explicit-list allocator, and then converting that into a segregated free list allocator. Both of these changes will not affect your utilization much.
    You will want to experiment with allocation policies. The provided code implements first-fit search. Some allocators attempt best-fit search, but this is difficult to do efficiently. You can find ways to introduce elements of best-fit search into a first-fit allocator, while keeping the amount of search bounded.
    Depending on whether you place newly freed blocks at the beginning or the end of a free list, you can implement either a last-in-first-out (LIFO) or a first-in-first-out (FIFO) queue discipline. You should experiment with both.
  • Final Version. Building on the checkpoint version, you must greatly increase the utilization and keep a high throughput. You must reduce both external and internal fragmentation.
    Reducing external fragmentation requires achieving something closer to best-fit allocation.
    Reducing internal fragmentation requires reducing data structure overhead. There are multiple ways to do this, each with its own challenges. Possible approaches and their associated challenges include:
  • Eliminate footers in allocated blocks. But, you still need to be able to implement coalescing. See the discussion about this optimization on page 852 of the textbook.
  • Decrease the minimum block size. But, you must then manage free blocks that are too small to hold the pointers for a doubly linked free list.
  • Set up special regions of memory for small, fixed-size blocks. But, you will need to manage these and be able to free a block when given only the starting address of its payload.