CGI programming in C


I have recently been thinking about the BCHS stack: BSD + C + httpd + SQLite. It sounds like a joke at first, but it makes a lot of sense considering that all four components were once part of the OpenBSD base system. SQLite has since been removed from base because it is too big and messy to audit. You might not even need or want SQLite and could easily roll your own NUL-delimited CSV.

Why CGI?

CGI enables a webmaster to easily add logic to HTML documents. It is a much preferable alternative to the “just run a Python test server as a privileged user behind an nginx reverse proxy” attitude that seems so common in the server-side scripting community, and to the “just do it all in JavaScript because we’re so cheap we offload as much processing as we can onto the client” attitude that’s ubiquitous in web development.

PHP is a well-known example of doing logic with CGI.

How CGI works

CGI is fairly simple: the query string of a GET request comes in as an environment variable, the body of a POST request comes in through stdin, and anything that is sent to the client is written to stdout. This is somewhat similar to inetd. Running httpd + CGI on OpenBSD is very easy because both httpd and slowcgi (the FastCGI-to-CGI wrapper that httpd talks to) are part of the base system:

# rcctl enable slowcgi httpd
# rcctl start slowcgi httpd 
# rcctl check slowcgi httpd 

Caveats to writing CGI scripts in C

  • OpenBSD httpd runs in a chroot to improve security. This means that whatever libraries your program depends on will either need to be:
    1. placed inside of the chroot so they can be dynamically linked
    2. statically linked (easier option)
  • Debugging can be difficult to set up (write a Makefile rule and put the test string you want on stdin into a file)
  • things will behave unexpectedly or break if your query string fields are not in the order that your hardcoded parser expects (like in the demo code)
  • people will crash your server if you do not check for POST requests of unbounded size (the demo code is susceptible; do not run it in production)
  • your f_rust_rated friends will have a public meltdown at the idea of putting DANGEROUS AND UNSAFE C PROGRAMS ON THE INTERNET (largely a non-issue on OpenBSD) despite the fact that they run test servers as privileged users behind an nginx reverse proxy in production.

Benefits to writing CGI scripts in C

  • you learn a lot about static linking and the linker in general
  • avoiding shell escapes is easier in C than writing internet touching scripts in sh
  • ego++;

Programming

Source code is available in my bch-demo git repository. This code is fairly simple. index.c is an index page with two forms. One of these forms uses a POST request and the other uses a GET request. The form fields take a hex color code in order to set the background color and the text color of the document. The body text is simply a dump of all the environment variables, which can be useful for debugging.
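Because the color fields end up inside the generated page, it is worth rejecting anything that is not a hex color before using it. The following is a minimal sketch, not code from the demo repository; is_hex_color is a hypothetical helper, and it assumes the value is submitted as six hex digits without a leading '#'.

```c
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if s is exactly six hexadecimal digits (e.g. "ff8800"), else 0.
 * Hypothetical helper; assumes the form sends the color without a '#'.
 */
int
is_hex_color(const char *s)
{
	size_t i;

	if (strlen(s) != 6)
		return 0;
	for (i = 0; i < 6; i++) {
		/* isxdigit is only well defined for unsigned char values. */
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	}
	return 1;
}
```

A caller could fall back to a default color whenever is_hex_color returns 0, so that a malformed field never reaches the output.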

If you’re not already using OpenBSD, refer to The OpenBSD FAQ. OpenBSD runs very well as a virtual machine on a Linux hypervisor with libvirt + KVM if you lack hardware.

Printing a web page

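Printing a web page is the easiest part of this exercise: simply write to stdout. You should write your HTTP headers, followed by a blank line, before writing any HTML so that the page displays properly. Note that puts appends "\n" to each string, so ending a string with "\r" produces the CRLF line ending that headers use. An example header block and HTML tag look like this: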

puts("Status: 200 OK\r");
puts("Content-Type: text/html\r");
puts("\r");
puts("<h1>It works</h1>");
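Since the demo page prints request-controlled values such as the environment variables, it helps to escape them before writing them into HTML. The following is a minimal sketch under that assumption; html_escape is a hypothetical helper and is not part of the demo code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, replacing the characters that are special in HTML.
 * Returns 0 on success, or -1 if dst is too small (dst is then left empty).
 * Hypothetical helper, not part of the demo code.
 */
int
html_escape(const char *src, char *dst, size_t dstsize)
{
	size_t used = 0;

	if (dstsize == 0)
		return -1;
	for (; *src != '\0'; src++) {
		char one[2] = { *src, '\0' };	/* default: copy the byte as-is */
		const char *rep = one;
		size_t n;

		switch (*src) {
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '"': rep = "&quot;"; break;
		}
		n = strlen(rep);
		if (used + n >= dstsize) {	/* keep room for the NUL */
			dst[0] = '\0';
			return -1;
		}
		memcpy(dst + used, rep, n);
		used += n;
	}
	dst[used] = '\0';
	return 0;
}
```

The escaped buffer can then be written with puts or fputs like any other part of the page.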

Processing a GET request

The query string of a GET request comes in through the QUERY_STRING environment variable. We can read it like this:

char *req = getenv("QUERY_STRING");
if (req == NULL) return 1;      /* not set: nothing to parse */
size_t reqlen = strlen(req);    /* strlen returns size_t, not int */

It can be useful to create a copy of this query string to prevent mutilating the original string. It is also useful to create a backup pointer to the copy of the query (which we are modifying) so that we can still free the memory if the modification process mutilates pointers.

char *reqcp = malloc(reqlen + 1);   /* +1 for the terminating NUL */
if (reqcp == NULL) return 1;
strlcpy(reqcp, req, reqlen + 1);
char *ptr2reqcp = reqcp;            /* untouched pointer to pass to free() later */

The remainder of this program involves parsing the query string. This can be application specific and the demo code is not robust.
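One way to avoid depending on field order is to look each field up by name. The following is a minimal sketch, not the parser from the demo code; query_get is a hypothetical helper, the field names "fg" and "bg" in the usage note are assumptions, and percent-decoding is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find "key=value" in an &-separated query string, whatever its position,
 * and copy the raw (not percent-decoded) value into out.
 * Returns 0 on success, -1 if the key is missing or out is too small.
 * Hypothetical helper, not part of the demo code.
 */
int
query_get(const char *query, const char *key, char *out, size_t outsize)
{
	size_t keylen = strlen(key);
	const char *p = query;

	while (*p != '\0') {
		/* Each field runs up to the next '&' or the end of the string. */
		size_t fieldlen = strcspn(p, "&");

		if (fieldlen > keylen && strncmp(p, key, keylen) == 0 &&
		    p[keylen] == '=') {
			size_t vallen = fieldlen - keylen - 1;

			if (vallen >= outsize)
				return -1;
			memcpy(out, p + keylen + 1, vallen);
			out[vallen] = '\0';
			return 0;
		}
		p += fieldlen;
		if (*p == '&')
			p++;	/* skip the separator */
	}
	return -1;
}
```

With this, query_get(req, "bg", buf, sizeof(buf)) finds the background field whether the query string is "fg=ffffff&bg=000000" or "bg=000000&fg=ffffff". Because it does not modify the input, it also works directly on the string returned by getenv.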

Processing a POST request

The body of a POST request comes in through stdin. When a POST request is sent, the CONTENT_LENGTH environment variable is set to the number of bytes in the request body.

const char *lenstr = getenv("CONTENT_LENGTH");   /* NULL if the variable is not set */
unsigned int reqlen = (lenstr != NULL) ? atoi(lenstr) : 0;

After getting the length of the body we can allocate memory for it (a production program should cap the length first). A copy of the original request and a backup pointer should be made for the same reasons as in the GET example.

char *req = malloc(reqlen + 1);
char *reqcp = malloc(reqlen + 1);
if (req == NULL || reqcp == NULL) return 1;
size_t got = fread(req, 1, reqlen, stdin);   /* number of bytes actually read */
req[got] = '\0';                             /* fread does not NUL-terminate */
strlcpy(reqcp, req, reqlen + 1);
char *ptr2reqcp = reqcp;
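To guard against oversized or malformed POST requests, the length can be validated before anything is allocated. The following is a minimal sketch, not code from the demo repository; parse_content_length, read_body, and the MAX_BODY cap of 4096 bytes are all hypothetical choices.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BODY 4096	/* hypothetical cap; pick one that suits the form */

/*
 * Parse a CONTENT_LENGTH value. Returns 0 and stores the length in *out,
 * or -1 if s is NULL, not a plain decimal number, or larger than max.
 */
int
parse_content_length(const char *s, size_t max, size_t *out)
{
	char *end;
	unsigned long n;

	if (s == NULL || !isdigit((unsigned char)s[0]))
		return -1;	/* also rejects "", "-1" and leading spaces */
	errno = 0;
	n = strtoul(s, &end, 10);
	if (errno != 0 || *end != '\0' || n > max)
		return -1;
	*out = (size_t)n;
	return 0;
}

/*
 * Read exactly len bytes from in and NUL-terminate them.
 * Returns a malloc'd buffer the caller must free, or NULL on failure.
 */
char *
read_body(FILE *in, size_t len)
{
	char *buf = malloc(len + 1);

	if (buf == NULL)
		return NULL;
	if (fread(buf, 1, len, in) != len) {	/* short read: body truncated */
		free(buf);
		return NULL;
	}
	buf[len] = '\0';
	return buf;
}
```

In a CGI program these would be called as parse_content_length(getenv("CONTENT_LENGTH"), MAX_BODY, &len) followed by read_body(stdin, len), with an error page sent whenever either call fails.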

The remainder of this program is identical to the GET program. Only 9 lines differ between the GET and POST programs.

Demo video