forked from zedz/lcthw-cn
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ex17.tex
207 lines (181 loc) · 11.2 KB
/
ex17.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
\chapter{Exercise 17: Heap And Stack Memory Allocation}
In this exercise you're going to make a big leap in difficulty and create
an entire small program to manage a database. This database isn't very
efficient and doesn't store very much, but it does demonstrate most of
what you've learned so far. It also introduces memory allocation more
formally and gets you started working with files. We use some file I/O
functions, but I won't be explaining them too well so you can try to
figure them out first.
As usual, type this whole program in and get it working, then we'll discuss:
\begin{code}{ex17.c}
<< d['code/ex17.c|pyg|l'] >>
\end{code}
In this program I am using a set of structures to create a simple
database for an address book. In it I'm using some things you've never
seen, so you should go through it line-by-line, explain what each line
does, and look up any functions you do not recognize. There are few
key things I'm doing that you should pay attention to as well:
\begin{description}
\item [\#define for constants] I use another part of the "C Pre-Processor" to
create constant settings of \ident{MAX\_DATA} and \ident{MAX\_ROWS}. I'll
cover more of what the CPP does, but this is a way to create a constant
that will work reliably. There's other ways but they don't apply in
certain situations.
\item [Fixed Sized Structs] The \ident{Address} struct then uses these
constants to create a piece of data that is fixed in size making it
less efficient, but easier to store and read. The \ident{Database} struct
is then also fixed size because it is a fixed length array of \ident{Address}
structs. That lets you write the whole thing to disk in one move later on.
\item [die function to abort with an error] In a small program like this you
can make a single function that kills the program with an error if there's
anything wrong. I call this \func{die}, and it's used after any failed
function calls or bad inputs to exit with an error using \func{exit}.
\item [errno and perror() for error reporting] When you have an error return
from a function, it will usually set an "external" variable called \ident{errno}
to say exactly what error happened. These are just numbers, so you can use
\func{perror} to "print the error message".
\item [FILE functions] I'm using all new functions like \func{fopen}, \ident{fread},
\ident{fclose}, and \ident{rewind} to work with files. Each of these functions
works on a \ident{FILE} struct that's just like your structs, but it's defined
by the C standard library.
\item [nested struct pointers] There's use of nested structures and getting the
address of array elements that you should study. Specifically code like
\verb|&conn->db->rows[i]| which reads "get the \ident{i} element of \ident{rows}, which is
in \ident{db}, which is in \ident{conn}, then get the address of (\ident{\&}) it".
\item [copying struct prototypes] best shown in \func{Database\_delete}, you can
see I'm using a temporary local \ident{Address}, initializing its \ident{id}
and \ident{set} fields, and then simply copying it into the \ident{rows} array by
assigning it to the element I want. This trick makes sure that all fields
but \ident{set} and \ident{id} are initialized to 0s and is actually
easier to write. Incidentally, you shouldn't be using \func{memcpy} to do
these kinds of struct copying operations. Modern C allows you to simply assign
one struct to another and it'll handle the copying for you.
\item [processing complex arguments] I'm doing some more complex argument parsing,
but this isn't really the best way to do it. We'll get into better option
parsing later in the book.
\item [converting strings to ints] I use the \func{atoi} function to take the
string for the id on the command line and convert it to the \ident{int id}
variable. Read up on this function and similar ones.
\item [allocating large data on the "heap"] The whole point of this program is
that I'm using \func{malloc} to ask the OS for a large amount of memory
to work with when I create the \ident{Database}. I cover this in more
detail below.
\item [NULL is 0 so boolean works] In many of the checks I'm testing that a
pointer is not NULL by simply doing \verb|if(!ptr) die("fail!")| this is
valid because NULL will evaluate to false. You could
be explicit and say \verb|if(ptr == NULL) die("fail!")| as well.\footnote{On some rare systems NULL will
be stored in the computer (represented) as something not 0, but the C standard says you should still be able to write
code as if it has a 0 value. From now on when I say "NULL is 0" I mean its value for anyone
who is overly pedantic.}
\end{description}
\section{What You Should See}
You should spend as much time as you can testing that it works, and running
it with \program{Valgrind} to confirm you've got all the memory usage
right. Here's a session of me testing it normally and then using
\program{Valgrind} to check the operations:
\begin{code}{ex17 output}
\begin{lstlisting}
<< d['code/ex17.out'] >>
\end{lstlisting}
\end{code}
The actual output of \program{Valgrind} is taken out since you should
be able to detect it.
\begin{aside}{OSX Valgrind "Leaks"}
\program{Valgrind} will report that you're leaking small blocks of memory,
but sometimes it's just over-reporting from OSX's internal APIs. If you see it
showing leaks that aren't inside your code then just ignore them.
\end{aside}
\section{Heap vs. Stack Allocation}
You kids these days have it great. You play with your Ruby or Python and
just make objects and variables without any care for where they live. You
don't care if it's on the "stack", and the heap? Fuggedaboutit. You don't
even know, and you know what, chances are your language of choice doesn't
even put the variables on stack at all. It's all heap, and you don't even
\emph{know} if it is.
C is different because it's using the real CPU's actual machinery to do its
work, and that involves a chunk of ram called the stack and another
called the heap. What's the difference? It all depends on where you
get the storage.
The heap is easier to explain as it's just all the remaining memory in
your computer, and you access it with the function \func{malloc} to
get more. Each time you call \func{malloc}, the OS uses internal
functions to register that piece of memory to you, and then returns
a pointer to it. When you're done with it, you use \func{free} to
return it to the OS so that it can be used by other programs. Failing
to do this will cause your program to "leak" memory, but \program{Valgrind}
will help you track these leaks down.
The stack is a special region of memory that stores temporary variables each
function creates as locals to that function. How it works is each argument to
a function is "pushed" onto the stack, and then used inside the function. It
is really a stack data structure, so the last thing in is the first thing out.
This also happens with all local variables like \ident{char action} and
\ident{int id} in \func{main}. The advantage of using a stack for this is
simply that, when the function exits, the C compiler "pops" these variables off
the stack to clean up. This is simple and prevents memory leaks if the
variable is on the stack.
The easiest way to keep this straight is with this mantra: If you
didn't get it from \func{malloc} or a function that got it from \func{malloc},
then it's on the stack.
There's three primary problems with stacks and heaps to watch for:
\begin{enumerate}
\item If you get a block of memory from \func{malloc}, and have that
pointer on the stack, then when the function exits, the pointer will
get popped off and lost.
\item If you put too much data on the stack (like large structs and arrays)
then you can cause a "stack overflow" and the program will abort. In
this case, use the heap with \func{malloc}.
\item If you take a pointer to something on the stack, and then pass that
or return it from your function, then the function receiving it will
"segmentation fault" (segfault) because the actual data will get
popped off and disappear. You'll be pointing at dead space.
\end{enumerate}
This is why in the program I've created a \func{Database\_open} that allocates
memory or dies, and then a \func{Database\_close} that frees everything. If you
create a "create" function, that makes the whole thing or nothing, and then a
"destroy" function that cleans up everything safely, then it's easier to keep
it all straight.
Finally, when a program exits the OS will clean up all the resources for you,
but sometimes not immediately. A common idiom (and one I use in this
exercise) is to just abort and let the OS clean up on error.
\section{How To Break It}
This program has a lot of places you can break it, so try some of these
but also come up with your own:
\begin{enumerate}
\item The classic way is to remove some of the safety checks such that you can
pass in arbitrary data. For example, if you remove the check on line 160
that prevents you from passing in any record number.
\item You can also try corrupting the data file. Open it in any editor and
change random bytes then close it.
\item You could also find ways to pass bad arguments to the program when it's
run, such as getting the file and action backwards will make it create
a file named after the action, then do an action based on the first
character.
\item There is a bug in this program because of \func{strncpy} being poorly
designed. Go read about \func{strncpy} then try to find out what happens
when the \ident{name} or \ident{address} you give is \emph{greater} than
512 bytes. Fix this by simply forcing the last character to \verb|'\0'|
so that it's always set no matter what (which is what strncpy should do).
\item In the extra credit I have you augment the program to create arbitrary
size databases. Try to see what the biggest database is before you
cause the program to die for lack of memory from \func{malloc}.
\end{enumerate}
\section{Extra Credit}
\begin{enumerate}
\item The \func{die} function needs to be augmented to let you pass the \ident{conn}
variable so it can close it and clean up.
\item Change the code to accept parameters for \ident{MAX\_DATA} and \ident{MAX\_ROWS}, store them in the \ident{Database} struct, and write that to the file, thus creating
a database that can be arbitrarily sized.
\item Add more operations you can do on the database, like \ident{find}.
\item Read about how C does it's struct packing, and then try to see why your
file is the size it is. See if you can calculate a new size after adding
more fields.
\item Add some more fields to the \ident{Address} and make them searchable.
\item Write a shell script that will do your testing automatically for you
by running commands in the right order. Hint: Use \verb|set -e| at the
top of a \program{bash} to make it abort the whole script if any command
has an error.
\item Try reworking the program to use a single global for the database connection.
How does this new version of the program compare to the other one?
\item Go research "stack data structure" and write one in your favorite language,
then try to do it in C.
\end{enumerate}