C Assignment: COMP2129 Program Structure, String Functions and Compiler Pipeline in C

Write C programs to practise working with strings and using the compiler.

Writing and structuring programs

You should keep the following principles in mind as you develop larger and more complex programs:

  1. Choose descriptive names for your variables and functions
    Self-documenting code is easier to read and interpret. Code tells you how, comments tell you why.
  2. Don’t repeat yourself
    Refactor common code into functions so you don’t need to repeat yourself many times.
  3. Avoid creating large functions
    Split up large blocks of code into smaller functions. The Unix philosophy comes into play here: aim to create small, concise functions that each focus on doing one thing and doing it well.
  4. Prefer concise code
    Bigger functions and programs generally take more effort for a human to interpret.
  5. Prefer immutability
    Use const liberally. Delegate as much work to the compiler as possible and let it check invariants for you.
  6. Don’t reinvent the wheel, use the standard library
    Do you know which functions come with the C standard library? Spend some time looking through its documentation so you don’t end up recreating a function that already exists.
  7. Use widely accepted naming and coding conventions for the language you are working in
    For example, i, j and k are typically reserved for loop variables. Functions that take non-const pointers are expected to mutate what they point to, so mark pointer parameters as const if your function only needs read access.
  8. Be consistent
    Use a single standard naming and indentation convention throughout your entire codebase.
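
As a small illustration of principles 1, 3 and 5, here is a minimal sketch. The function names count_char and is_blank_line are hypothetical and not part of the exercises. Each function does one thing and takes its input through a const pointer, so the compiler rejects any accidental write to the caller's string.

```c
#include <stdbool.h>
#include <stddef.h>

// Counts how many times `target` occurs in the string `s`.
// `s` is const because the function only reads the string.
size_t count_char(const char *s, char target) {
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == target) {
            count++;
        }
    }
    return count;
}

// Returns true if `line` contains only spaces, tabs or newlines.
bool is_blank_line(const char *line) {
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (line[i] != ' ' && line[i] != '\t' && line[i] != '\n') {
            return false;
        }
    }
    return true;
}
```

Adding a statement such as s[0] = 'x'; inside count_char would be a compile-time error, which is the kind of invariant checking the compiler can do for you.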

Reading list

  • Code Complete by Steve McConnell
  • The Art of Unix Programming by Eric Raymond
  • The Practice of Programming by Brian Kernighan and Rob Pike
  • The C Programming Language (K & R) by Brian Kernighan and Dennis Ritchie
  • Computer Systems - A Programmer’s Perspective by Randal Bryant and David O’Hallaron

String parsing in C

The C standard library provides the following string functions. Remember to compile with -std=c11.

#include <stdio.h>
char * fgets(char * str, int num, FILE * stream);
int sscanf(const char * str, const char * format, ...);
#include <stdlib.h>
void free(void * ptr);
void * malloc(size_t size);
void * calloc(size_t count, size_t size);
void * realloc(void * ptr, size_t size);
int atoi(const char * s);
long atol(const char * s);
#include <string.h>
size_t strlen(const char * s);
int strcmp(const char * s1, const char * s2);
char * strcat(char * restrict s1, const char * restrict s2);
char * strcpy(char * restrict s1, const char * restrict s2);
char * strtok(char * restrict s1, const char * restrict s2);
int memcmp(const void * s1, const void * s2, size_t n);
void * memset(void * b, int c, size_t len);
void * memmove(void * dst, const void * src, size_t len);
void * memcpy(void * restrict dst, const void * restrict src, size_t n);
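
As a sketch of how fgets and sscanf can be combined to parse line-based input, consider the helpers below. The names strip_newline, parse_command and read_command, the "<word> <integer>" line format and the buffer sizes are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

// Removes the trailing newline that fgets keeps, if there is one.
void strip_newline(char *line) {
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
    }
}

// Parses a line of the form "<word> <integer>", e.g. "delete 3".
// Returns 1 on success and 0 if the line does not match.
// `word` must have room for at least 16 bytes.
int parse_command(const char *line, char *word, int *number) {
    // %15s limits the word to 15 characters plus the terminator.
    return sscanf(line, "%15s %d", word, number) == 2;
}

// Reads one line from `stream` into a local buffer and parses it.
// Returns 1 on success and 0 on EOF or a malformed line.
int read_command(FILE *stream, char *word, int *number) {
    char buffer[128];
    if (fgets(buffer, sizeof buffer, stream) == NULL) {
        return 0;
    }
    strip_newline(buffer);
    return parse_command(buffer, word, number);
}
```

Checking the return value of sscanf against the number of expected conversions is what separates a well-formed line from a partial match such as "help" with no number.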

glibc provides some additional functions beyond ISO C; several are POSIX, others are BSD or GNU extensions. Remember to compile with -std=gnu11.

ssize_t getline(char ** lineptr, size_t * linecap, FILE * stream);
ssize_t getdelim(char ** lineptr, size_t * linecap, int delim, FILE * stream);
char * strfry(char * string);
char * strdup(const char * s);
char * stpcpy(char * dest, const char * src);
char * strsep(char ** stringp, const char * delim);
int strcasecmp(const char * s1, const char * s2);
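
The following sketch shows one way getline can read lines of any length without a fixed-size buffer. The function name count_lines is hypothetical. The _POSIX_C_SOURCE definition is an assumption that lets the example compile under -std=c11 as well; with -std=gnu11 it is not needed.

```c
#define _POSIX_C_SOURCE 200809L // exposes getline when compiling with -std=c11
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

// Counts the lines in `stream` and stores the length of the longest one
// (excluding its newline) in `*longest`. Returns the number of lines read.
size_t count_lines(FILE *stream, size_t *longest) {
    assert(stream != NULL && longest != NULL);

    char *line = NULL;   // getline allocates and grows this buffer
    size_t capacity = 0; // current size of that buffer
    ssize_t length;
    size_t lines = 0;

    *longest = 0;
    while ((length = getline(&line, &capacity, stream)) != -1) {
        if (length > 0 && line[length - 1] == '\n') {
            length--; // do not count the newline itself
        }
        if ((size_t)length > *longest) {
            *longest = (size_t)length;
        }
        lines++;
    }

    free(line); // the buffer allocated by getline belongs to the caller
    return lines;
}
```

Because getline reuses and grows the same buffer across calls, a single free after the loop is enough.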

Exercise 1: String function implementations

Write your own implementation of the atoi, strlen, strcpy, strtok and strcasecmp functions.
When you have implemented these functions, you can compare your code to the implementations in glibc.
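
As a starting point, here is one possible shape for two of these functions. The names my_strlen and my_atoi are hypothetical, chosen to avoid clashing with the library versions, and my_atoi is deliberately simplified: it does not detect overflow.

```c
#include <ctype.h>
#include <stddef.h>

// Returns the number of characters before the terminating '\0'.
size_t my_strlen(const char *s) {
    const char *end = s;
    while (*end != '\0') {
        end++;
    }
    return (size_t)(end - s);
}

// Converts the leading decimal digits of `s` to an int.
// Skips leading whitespace and accepts one optional sign.
// Simplification: overflow is not detected.
int my_atoi(const char *s) {
    while (isspace((unsigned char)*s)) {
        s++;
    }
    int sign = 1;
    if (*s == '+' || *s == '-') {
        if (*s == '-') {
            sign = -1;
        }
        s++;
    }
    int value = 0;
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (*s - '0');
        s++;
    }
    return sign * value;
}
```

The casts to unsigned char matter: passing a negative char value to isspace or isdigit is undefined behaviour.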

The C compiler pipeline

Let’s explore what the compiler does behind the scenes when we create a more complex program.
Makefile - builds the program

CC=clang
CFLAGS=-g -std=c11 -Wall -Werror
TARGET=tasks

.PHONY: clean

all: $(TARGET)

clean:
	rm -f $(TARGET)
	rm -f *.o

list.o: list.c
	$(CC) -c $(CFLAGS) $^ -o $@

tasks.o: tasks.c
	$(CC) -c $(CFLAGS) $^ -o $@

tasks: tasks.o list.o
	$(CC) $(CFLAGS) $(LDFLAGS) $^ -o $@

tasks.c - the scaffold code for the task list application

#include <stdio.h>
#include <stdlib.h>
#include "list.h"

int main(void) {
    // ...
    return 0;
}

list.c - the implementation of the circular linked list

#include "list.h"

// Initializes an empty circular linked list.
void list_init(node* head) {
    head->next = head;
    head->prev = head;
}

// ...

list.h - function prototypes for a circular linked list

#ifndef LIST_H
#define LIST_H

#include <stdbool.h>

typedef struct node node;

struct node {
    void * data;
    node* next;
    node* prev;
};

// Initializes an empty circular linked list.
void list_init(node* head);

// Inserts given node before the head.
void list_push(node* head, node* n);

// Inserts given node after the head.
void list_append(node* head, node* n);

// Removes the given node from the list.
void list_delete(node* n);

// Returns whether the list is empty.
bool list_empty(node* head);

#endif
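
The rest of list.c is left to you, but the prototypes above could be implemented along the following lines. This is only a sketch: the node type is repeated so the example compiles on its own, and the helper link_between is a hypothetical name that does not appear in the scaffold.

```c
#include <stdbool.h>
#include <stddef.h>

// Same node type as in list.h, repeated so this sketch is self-contained.
typedef struct node node;
struct node {
    void *data;
    node *next;
    node *prev;
};

// Initializes an empty circular linked list: the head points at itself.
void list_init(node *head) {
    head->next = head;
    head->prev = head;
}

// Links `n` between two adjacent nodes `before` and `after`.
static void link_between(node *before, node *n, node *after) {
    n->prev = before;
    n->next = after;
    before->next = n;
    after->prev = n;
}

// Inserts given node before the head.
void list_push(node *head, node *n) {
    link_between(head->prev, n, head);
}

// Inserts given node after the head.
void list_append(node *head, node *n) {
    link_between(head, n, head->next);
}

// Removes the given node from the list and leaves it pointing at itself.
void list_delete(node *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n;
    n->prev = n;
}

// Returns whether the list is empty.
bool list_empty(node *head) {
    return head->next == head;
}
```

Because the list is circular and the head is a sentinel node, none of these functions needs a special case for an empty list or for the first and last elements.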

Running make creates the object files tasks.o and list.o, and then finally the tasks program.

$ make
clang -c -g -std=c11 -Wall -Werror tasks.c -o tasks.o
clang -c -g -std=c11 -Wall -Werror list.c -o list.o
clang -g -std=c11 -Wall -Werror tasks.o list.o -o tasks

Preprocessor

Your code is first processed through the C preprocessor. This executes all of the preprocessor directives.
You can examine the raw output of the preprocessor by calling it directly:

$ cpp tasks.c

Or by instructing the compiler to only perform the preprocessing step.

$ clang -E tasks.c

This output is very helpful when debugging problems related to macros and other preprocessor features.

Exercise 2: C preprocessor

  1. What does the #include directive do?
  2. What are include guards and when should they be used?
  3. We have seen how the #define directive can be used to create compile time constants.

The #define directive can also be used to create macros.

#define PI 3.14
#define NUM 42
#define STR "string"
#define MIN(a,b) (((a) < (b)) ? (a) : (b))
#define MAX_BUFFER 1024

Like constants created with #define, macros are substituted into their call site in much the same way as a textual search and replace. Why are the extra brackets around a, b and a < b necessary in the macro definition for MIN? For example, what happens with MIN(a++, 1)?
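
To experiment with these questions, a sketch like the following may help. The macros DOUBLE_BAD and DOUBLE_OK and the three function names are hypothetical and exist only to make the textual substitution visible.

```c
// Fully parenthesised, because macro arguments are pasted in as text.
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

// Deliberately broken: no brackets around the parameter or the body.
#define DOUBLE_BAD(x) x + x
#define DOUBLE_OK(x) ((x) + (x))

// 3 * DOUBLE_BAD(2) expands to 3 * 2 + 2, which is 8 rather than 12.
int bad_result(void) {
    return 3 * DOUBLE_BAD(2);
}

// 3 * DOUBLE_OK(2) expands to 3 * ((2) + (2)), which is 12.
int ok_result(void) {
    return 3 * DOUBLE_OK(2);
}

// MIN(i++, 10) expands to (((i++) < (10)) ? (i++) : (10)),
// so i is incremented twice when the comparison is true.
int increments_after_min(void) {
    int i = 0;
    int smallest = MIN(i++, 10);
    (void)smallest; // only the side effect on i is of interest here
    return i;
}
```

Brackets fix the precedence problem but not the double evaluation; running clang -E on a file like this shows the expanded text directly.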

Code generation and assembly

The -c flag on clang asks the compiler to preprocess the C code, generate assembly and finally assemble the result into an object file. The object files contain machine code - assembly in binary format for the target CPU. We need to create an object file for every translation unit in our source code (every .c file is a translation unit).

You can ask the compiler to stop after assembly generation with the following command:

$ clang -S -g -std=c11 -Wall -Werror list.c

This command produces list.s, the assembly generated from list.c. clang calls the assembler behind the scenes to turn this into machine code for the object file.

You can also extract assembly from object files with objdump. Assembly files have two different syntaxes that are equivalent in functionality. objdump defaults to the AT&T syntax but can also output the Intel syntax.

$ clang -c -g -std=c11 -Wall -Werror list.c
$ objdump -M intel -S list.o

Since we have compiled with -g debugging symbols, objdump can annotate the assembly with the source. Remember that compiling with the address sanitizer changes the generated code, which affects the source code annotation and output of objdump.

Linker

Now we have two compiled object files, one for each translation unit. The linking stage merges these object files together to generate the executable. Behind the scenes, clang calls the ld linker to perform this task.

Since we often need to use variables and functions that are declared in another translation unit, C defines the concept of linkage. The job of the linker is to connect these translation units together.

  1. A variable or function has internal linkage if its name is visible only within the translation unit that defines it; other translation units cannot refer to it.
  2. A variable or function has external linkage if its name can be referred to from other translation units. This is the default for functions and for variables declared at file scope.
  3. Any file-scope variable or function that is declared static has internal linkage. It is good practice to declare every file-scope variable or function as static unless it needs to be accessible from another translation unit.
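
A minimal sketch of these rules in a single translation unit follows. All of the names (next_id, take_id, task_count, task_new_id) are hypothetical and are not part of the scaffold.

```c
// Internal linkage: visible only inside this translation unit.
static int next_id = 1;

static int take_id(void) {
    return next_id++;
}

// A declaration with external linkage. The definition may live in this
// translation unit (as it does below) or in another one.
extern int task_count;

// The definition: storage is reserved here, exactly once in the program.
int task_count = 0;

// External linkage is the default for functions at file scope, so another
// translation unit can call this after declaring
//     int task_new_id(void);
int task_new_id(void) {
    task_count++;
    return take_id();
}
```

Another .c file could use task_new_id and task_count, but any attempt to refer to next_id or take_id from outside this file would fail at link time.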

Exercise 3: Declarations, definitions and linkage

  1. Which of these are declarations and which are definitions?
  2. Classify the linkages in the above declarations as internal or external.
  3. Which definitions are accessible from another translation unit in the above C file?
  4. What happens if the linker can’t find a function that has external linkage?
  5. Header files often contain only declarations. There is nothing stopping us from putting definitions into the header as well. When would this be useful?

Exercise 4: Task list application

Create an interactive task list application from the provided scaffold.

  1. Your application should load tasks.txt from the current directory and present each line as a task.
  2. Your application should prompt for commands (help, new, delete, move, undo) which manipulate the list.
  3. Your application should save the updated task list and exit once it encounters EOF on stdin.
  4. Your application should be able to handle lines of any length.

Note: since C does not have generics, we have edited the linked list to store the void* data type, so you can use it to store any pointer type. However, this means that you now have to do more than one allocation for every element stored in the list (one for the node and one for the data it points to), which isn’t very efficient. With some preprocessor tricks you can upgrade list.h to the style used in the Linux kernel, which removes the need for double allocations.
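
The kernel-style approach embeds the list links inside the element itself and recovers the element from a link pointer with a macro. The sketch below is a simplified version of that idea built on offsetof. The names link, task, CONTAINER_OF and task_from_link are hypothetical, and the kernel's own container_of macro adds type checking that is omitted here.

```c
#include <stddef.h>

// Intrusive link: no data pointer, only the neighbours.
typedef struct link link;
struct link {
    link *next;
    link *prev;
};

// Recovers a pointer to the enclosing struct from a pointer to its member
// by subtracting the member's byte offset within the struct.
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

// Hypothetical element type: the link lives inside the task, so one
// allocation holds both the data and the list pointers.
typedef struct task {
    int id;
    link node;
} task;

// Given a link that is known to be embedded in a task, returns that task.
task *task_from_link(link *l) {
    return CONTAINER_OF(l, task, node);
}
```

The list functions then operate purely on link pointers, and the caller converts back to task only when it needs the payload.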