On ChatGPT’s security understanding of C programs - Part 2


In the last post, I talked about ChatGPT’s understanding of typical memory corruption bugs found in memory-unsafe languages like C/C++. In this second part, I assess ChatGPT’s ability to analyze a few properties of a program from a vulnerability analysis point of view. Before we go ahead, we should note clearly that even for a human, “developing (writing) a program” (based on a description) and “analyzing a program” are two different tasks. A good developer may not be a good analyzer, and vice versa. We know by now that generative models like ChatGPT are good at generating text/code from a description. As we will see, they are not really trained to analyze code.

As usual, when I asked ChatGPT if it understood the topic of program analysis, the answer was ‘yes’, along with a very good overview of the topic. In that overview, it specifically mentioned data dependency analysis. Such an analysis is very important for vulnerability detection and exploitation as well. So, I explored further by giving it a simple piece of code and asking about the dependent variables.

Fig. 1

As we can see in Fig. 1, it gave a pretty decent answer. In a typical analysis, we would start by computing the control-flow graph (CFG) and merge the values at the phi-node where the IF and ELSE branches meet. So, I thought of asking whether it understands the notion of a CFG. Not only did it understand the notion (definition) of a CFG, it could also generate a DOT file for this simple code. But this is pretty straightforward code with only one statement in each branch (IF and ELSE). So I gave it a somewhat larger piece of code, which I took from a GitHub repo (ffmpeg). This is when it started showing its (mis)understanding of CFGs. To give it a few hints, I asked about the concept of basic blocks (BBs). As usual, it gave a very good description of basic blocks and of the algorithm to compute them. But it could not follow the definition and algorithm that it had just given when generating BBs for the bigger code. At one point, I gave up when it started generating BBs that had ‘if’ and ‘else’ statements in a single BB node! But let’s move on.
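To make the CFG and phi-node idea concrete, here is a minimal sketch. It is a hypothetical example, not the code from Fig. 1, and the function name `select_value` and the block labels BB0 to BB3 are my own. The comments mark where each basic block starts and which edges connect them.

```c
/* Hypothetical example (not the code from Fig. 1).
 * CFG edges: BB0 -> BB1, BB0 -> BB2, BB1 -> BB3, BB2 -> BB3 */
int select_value(int a, int b, int flag)
{
    int x;

    /* BB0 (entry): evaluates the branch condition and ends at the branch. */
    if (flag) {
        /* BB1 (then): x depends on a. */
        x = a + 1;
    } else {
        /* BB2 (else): x depends on b. */
        x = b * 2;
    }

    /* BB3 (join): in SSA form this is x3 = phi(x1, x2), so the returned
     * value has a data dependency on a and b, and a control dependency
     * on flag. */
    return x;
}
```

Note that the ‘if’ and the ‘else’ bodies land in different basic blocks here: a basic block has a single entry and a single exit, so a branch always ends one.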

Fig. 2

For a good dataflow analysis, an understanding of pointers (especially points-to and aliasing) is important. I gave it a simple piece of code to see if it understands the semantics of memory LOAD/STORE operations. Fig. 2 shows the simple example and ChatGPT’s response, which is correct. However, when I provided a slightly longer piece of code, shown below, it didn’t give the right answer on the first try (the question was which variables the value of ‘d’ depends on). It considered only the direct (immediate) relationships when deriving dependencies: it said ‘d’ depends only on ‘a’ and ‘c’, missing ‘b’. In other words, if A depends on B and B depends on C, it would not say that A depends on C. Only after I provided hints could it give the full dependency chain. I continued with more complex examples and dataflow-related queries, including tainted dataflow. It could also propagate dependencies across function calls (interprocedural). However, it somehow assumed that the return value of a function depends on all of its arguments (cf. pure function), even though the called function may not use every argument to compute the return value. I think this is a reasonable over-approximation (a very conservative approach). And in the end, it was good at picking up hints.

Code:

#include <stdio.h>

int main(void){
    int *p = NULL;  int *q = NULL;
    int a, b, c, d;
    p = &a; q = &b; a = 10; *q = 20;
    c = *p + *q; d = c + *p;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);
    return 0;
}
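The missing step (if A depends on B and B depends on C, then A depends on C) is a transitive closure over the direct dependencies. Below is a minimal sketch of that idea for the code above. It assumes the points-to facts (p points to a, q points to b) have already been resolved by hand, and the names `init_direct_deps` and `close_transitively` are my own, chosen for illustration.

```c
#include <stdbool.h>

enum { VAR_A, VAR_B, VAR_C, VAR_D, VAR_COUNT };

/* dep[x][y] is true when x depends on y. This fills in only the direct
 * dependencies, with *p and *q already replaced by a and b (assumption:
 * the points-to analysis was done by hand). */
void init_direct_deps(bool dep[VAR_COUNT][VAR_COUNT])
{
    for (int i = 0; i < VAR_COUNT; i++)
        for (int j = 0; j < VAR_COUNT; j++)
            dep[i][j] = false;

    dep[VAR_C][VAR_A] = true;   /* c = *p + *q, and p points to a */
    dep[VAR_C][VAR_B] = true;   /* ...and q points to b */
    dep[VAR_D][VAR_C] = true;   /* d = c + *p */
    dep[VAR_D][VAR_A] = true;
}

/* Warshall's algorithm: if i depends on k and k depends on j,
 * record that i depends on j as well. */
void close_transitively(bool dep[VAR_COUNT][VAR_COUNT])
{
    for (int k = 0; k < VAR_COUNT; k++)
        for (int i = 0; i < VAR_COUNT; i++)
            for (int j = 0; j < VAR_COUNT; j++)
                if (dep[i][k] && dep[k][j])
                    dep[i][j] = true;
}
```

Before the closure, the entry for (d, b) is false, which matches ChatGPT’s first answer. After calling `close_transitively`, it becomes true, because d depends on c and c depends on b.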

Next, I explored ChatGPT’s ability to apply control- and data-flow analysis to vulnerability detection queries. My intention was to check whether we can ask questions in simple natural language to find possible bugs in our code.

Fig. 3

I used the code snippet shown in Fig. 3. This code has an uninitialized read bug (the value of ‘k’ when i = j). At first, I gave only the code of func(). On the first try, ChatGPT could not find the bug. Then I asked what would happen if I called func(4,4). Only then did it identify the possible uninitialized read of ‘k’. However, when I gave the whole code, as shown in Fig. 4, and asked what tainted input would trigger the bug, it got fully confused. This was strange to me, mainly because it had understood earlier that equal values of ‘i’ and ‘j’ trigger the bug! This also supports my earlier observation that ChatGPT loses track of flows across functions. But as we know, interprocedural analysis is anyway harder than intraprocedural analysis!
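For readers without the figure, here is a hypothetical stand-in for this class of bug. It is not the code from Fig. 3: the branch conditions and the name `func_checked` are assumptions. The comment shows the buggy shape, and the function makes the missing path explicit with a flag so that the example itself stays well-defined.

```c
#include <stdbool.h>

/* Buggy shape (hypothetical, shown only as a comment):
 *
 *     int k;
 *     if (i < j)      k = j - i;
 *     else if (i > j) k = i - j;
 *     return k;        // k is read uninitialized when i == j
 *
 * The version below tracks whether any branch assigned k. It returns
 * false on the path where the buggy version would read garbage. */
bool func_checked(int i, int j, int *out)
{
    int k = 0;
    bool k_assigned = false;

    if (i < j) {
        k = j - i;
        k_assigned = true;
    } else if (i > j) {
        k = i - j;
        k_assigned = true;
    }
    /* When i == j, neither branch runs and k was never assigned. */

    if (!k_assigned)
        return false;

    *out = k;
    return true;
}
```

With this sketch, a call such as `func_checked(4, 4, &out)` returns false, which corresponds to the func(4,4) hint: the only way to reach the read of ‘k’ without an assignment is through equal arguments.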

I tested my next (and last) example, which involves a heap overflow bug. One of the causes of heap memory corruption bugs is a miscalculation of the size passed to ‘malloc’, resulting in less heap memory being allocated than anticipated. This may happen when the size parameter of the malloc call is computed by an arithmetic operation that suffers from an integer issue (e.g. integer truncation, or overflow from storing a 32-bit int in a 16-bit int). So, I gave it buggy code and asked ChatGPT if it could find the bug related to the insecure call to malloc. This time, it did a very good job of finding the cause correctly!
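The two size problems mentioned above can be sketched as follows. This is an illustration under my own assumptions, not the buggy code I gave to ChatGPT; the names `truncated_len` and `alloc_array` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Truncation: storing a 32-bit length in a 16-bit variable keeps only
 * the low 16 bits, so a large request silently becomes a small one. */
uint16_t truncated_len(uint32_t requested)
{
    return (uint16_t)requested;
}

/* Overflow: count * elem_size can wrap around in size_t. Checking
 * before multiplying avoids allocating a buffer that is too short.
 * Returns NULL when the multiplication would overflow. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;
    return malloc(count * elem_size);
}
```

For example, `truncated_len(65552)` yields 16, so a later `malloc` with that value would reserve 16 bytes while the rest of the code still believes it has 65552. The check in `alloc_array` is one common way to guard the multiplication; `calloc` performs a similar check internally on most implementations.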

To summarize, as a general-purpose generative model, ChatGPT does have some understanding of data- and control-flow and of queries involving vulnerabilities. It does not (yet) know how to use such analyses to better understand vulnerabilities and their causes/effects. It seems very weak when it comes to calculating flows across functions. With my small examples, I could not test flows where the code spans multiple files (as in a real project). My guess is that with some more specialized training and fine-tuning, LLMs like ChatGPT can be useful in assisting analysts and heavy-weight program analysis techniques. Another interesting area that we can explore is letting ChatGPT describe the buggy code to the developer once we detect it. This should be highly useful to the developer in fixing the code.

So, I would say “interesting days ahead”.  

