Writing

Software, technology, sysadmin war stories, and more.
Friday, July 12, 2013

There's still some room for C in this modern world

I received a question from an anonymous reader last week:

Other than embedded systems or low level systems programming, do you think there is any room for C in today's world? What features of C++ do you use most often?

They also mentioned that some people are proud to be using C instead of C++ and asked why I thought that. There was a mention of a specific project, but I'm sure we can all think of some which work this way. The Linux kernel is a good example of a project with a leader who has very strong opinions about avoiding all things C++.

I think there is room for C, but there are specific situations which call for it and others where it would be reckless to recommend it. The problem is that the statement I just made is basically content-free, since you could say it about any programming language and still be spot-on. So, I'll try to go a little deeper with it.

First, some definitions for the sake of clarity. When I talk about C, I'm talking about writing code which specifically does things in the old school "C way" like using certain types of for loops to iterate through char* spaces. Take a look at this, for instance:

#include <stdio.h>
 
int main() {
  char* x = "foo bar";
  char* y;
   
  for (y = x; *y; ++y) {
    printf("%c\n", *y);
  }
 
  return 0;
}

That will in fact print out the characters in "x", one per line. The question is: do you "get" what's happening in that for loop? Obviously, you're initializing the pointer y to x, and you're also bumping it along one spot every time through, but what's this "*y" business?

Old-school C people sometimes use constructs like this. They know it means "keep going until this thing points at a \0". It could also look like this:

  for (y = x; *y != 0; ++y) {

That has the same effect, but it's a little more verbose about what it's up to: testing for the presence of 0 at whatever's under the pointer.

Why might you do this? Well, for one thing, it saves on a call to strlen() which would make it look like this instead:

#include <stdio.h>
#include <string.h>
 
int main() {
  char* x = "foo bar";
  size_t y, sl = strlen(x);
 
  for (y = 0; y < sl; ++y) {
    printf("%c\n", x[y]);
  }
 
  return 0;
}

This one has to include an extra system header to get the prototype for strlen(), and the resulting binary is a wee bit larger: about 200 bytes on my machine.

There's more to this, though. The strlen has to flip through the entire string until it finds that terminating \0. Then the for loop flips through it again. You now have two passes over the same data where previously you had one. This might be a performance issue... maybe.

Of course, there are plenty of horrible ways to do this, too. For instance, you could neglect to cache the result from strlen, and instead call it every time through the loop. Now you're in real trouble.

#include <stdio.h>
#include <string.h>
 
int main() {
  char* x = "foo bar";
  size_t y;
  
  for (y = 0; y < strlen(x); ++y) {
    printf("%c\n", x[y]);
  }
 
  return 0;
}

Argh! It burns! It burns! Please don't do this.

Why not? Well, now you're scanning the entire string at least once for every character in the string, and none of them are telling you anything new. It's wasted effort on the part of your system, and again, could be a performance issue.
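To put rough numbers on that, here is a minimal C++ sketch that counts how many bytes get examined under each approach. The helper counting_strlen() and the counter bytes_examined are hypothetical names made up for this illustration. A real strlen() is usually much smarter internally, and an optimizing compiler can sometimes hoist the call out of the loop on its own, so treat this as a model of the worst case and not as a measurement.

```cpp
#include <cstddef>

// Hypothetical instrumentation for this sketch only: counts every byte
// that our stand-in for strlen() has to look at, terminator included.
static std::size_t bytes_examined = 0;

std::size_t counting_strlen(const char* s) {
  std::size_t n = 0;
  for (;;) {
    ++bytes_examined;
    if (s[n] == '\0') {
      return n;
    }
    ++n;
  }
}

// Length computed once up front, then reused by the loop.
std::size_t scan_cost_cached(const char* s) {
  bytes_examined = 0;
  std::size_t len = counting_strlen(s);
  for (std::size_t i = 0; i < len; ++i) {
    // per-character work would go here
  }
  return bytes_examined;
}

// Length recomputed in the loop condition on every pass.
std::size_t scan_cost_uncached(const char* s) {
  bytes_examined = 0;
  for (std::size_t i = 0; i < counting_strlen(s); ++i) {
    // per-character work would go here
  }
  return bytes_examined;
}
```

With the 7-character "foo bar", scan_cost_cached() reports 8 bytes examined (7 characters plus the terminator) and scan_cost_uncached() reports 64, because the loop condition runs 8 times and each run walks the whole string. That gap grows roughly with the square of the string length.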

Are you going to notice this kind of mistake? Maybe, maybe not. It depends on what your workloads look like. If you have a bunch of inner loops written like this with really horrible blowup factors and they're constantly being hammered by requests, then sure, it'll probably add up.

So now let's look at what happens when we switch over to using C++ and std::string for this kind of task.

#include <stdio.h>
#include <string>
 
using std::string;

int main() {
  string x = "foo bar";
  string::const_iterator y;

  for (y = x.begin(); y != x.end(); ++y) {
    printf("%c\n", *y);
  }

  return 0;
}

We no longer use pointers or indexes and array lookups. This time, our traversal happens by way of an iterator, and there's a bunch of stuff going on behind the scenes to make it actually work.

If you want to get some idea of what really happens inside those libraries, try compiling the above with '-g', then load it into gdb, do 'start' to get it up to main, and then 'step' through it. There's a lot of stuff happening in there!

By way of comparison, the original C example, when run through the debugger in a stepwise fashion, looks like this:

4	  char* x = "foo bar";
(gdb) step
7	  for (y = x; *y; ++y) {
(gdb) step
8	    printf("%c\n", *y);
(gdb) step
f
7	  for (y = x; *y; ++y) {
(gdb) step
8	    printf("%c\n", *y);
(gdb) step
o
[...]

... you get the idea. It's just running lines 7 and 8 over and over until it eventually falls out of the loop and returns. There's stuff going on, but it doesn't involve any library calls. Here's how it ends:

(gdb) step
r
7	  for (y = x; *y; ++y) {
(gdb) step
11	  return 0;
(gdb) step
12	}

At some level, this represents less work to do. The question is: does it matter? Well, that depends on what you're doing. If you have places where such minutiae would add up to a real difference, then maybe you need to perform such micro-optimizations and use a language which allows for it.

Some languages force you to do this low-level fiddling all of the time. Other languages give you a choice between that kind of behavior and some organized higher-level functions and other helpful bits of code which handle some of it for you but might technically be less efficient.

Then there are languages which always force you down the "expensive" route and have no way around it. Even this might not be a problem if you're only doing lightweight operations with it. This kind of stuff would probably be lost in the noise of startup and shutdown if it's some program which lazily sits there and waits to be poked by a user.

On the other hand, if you have something at the core of a program which is getting thousands of queries per second, you probably won't have that kind of luxury. "Buy more servers" starts getting mighty painful at some point!

It's up to you to pick which way to go for any given part of your code. You can also start with the relatively lazy approach and then optimize later after actually finding hot spots.

In summary, I think old-school C behavior should continue to be possible in order to solve specific problems, but I don't think it's always appropriate. In particular, I think there is a danger of using it to "show off" when something simpler (in terms of code) and yet more expensive (in terms of computing resources) would be better.

...

Next, let's think about costs. I can think of a few:

Runtime (latency), CPU time, RAM footprint, disk space, initial developing programmer time, debugging time, developer documentation time, customer support time, maintenance programmer time, security auditor time, security-patching programmer time.

You might be able to afford more of some and less of others.

Maybe you don't care about latency if it's below 5 seconds, and you're running on a big box that's always plugged in, so CPU time doesn't bother you. It has tons of RAM and a huge disk, so those don't matter either. However, you have precious few programmer cycles to spare, and you have to get it out soon or your company will go out of business. That gives you one way through it.

Someone else needs super low latency with as little CPU consumption as possible since they're on a hand-held device which runs from a tiny little battery and are dealing with finger taps. People want to see it move as soon as they touch it. There's not much memory or long-term storage, either, so you can't be a pig about those.

I suspect these two situations have very different answers.

So how do you figure it out? Painful experience. That's how you justify your worth to clients and potential employers.

...

Finally, to answer the last question about "features of C++ I use", I assume this is asking about what sort of non-pure-C type things I do in my C++ code. That's a fair question, and I'll attempt to list some of the more obvious ones which come to mind, with the understanding this might not be a complete list.

I use classes to organize things. They tend to arrange themselves around the logical "seams" in a design with different moving parts.

That said, I don't usually touch multiple inheritance or any other kind of polymorphic whatsits in day to day operations. It just isn't needed.

I use the stream operator "<<" for logging. I picked this up from a former employer and it agrees with me, so I kept using it for my own work.
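For anyone who hasn't seen this style of logging, here is one way it could look. This is only a sketch: LogLine and log_retry() are hypothetical names invented for the example, not the logging library mentioned above. The idea is that a temporary object collects everything streamed into it and writes out one complete line when it is destroyed at the end of the statement.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration: buffers whatever is streamed in
// and emits it as a single line when the object goes away.
class LogLine {
 public:
  explicit LogLine(std::ostream& out) : out_(out) {}
  ~LogLine() { out_ << buffer_.str() << '\n'; }

  template <typename T>
  LogLine& operator<<(const T& value) {
    buffer_ << value;
    return *this;
  }

 private:
  LogLine(const LogLine&);             // not copyable (pre-C++11 idiom)
  LogLine& operator=(const LogLine&);

  std::ostream& out_;
  std::ostringstream buffer_;
};

// Example call site: the temporary LogLine lives until the semicolon,
// so the whole message lands in "out" as one line.
void log_retry(std::ostream& out, const std::string& host, int attempt) {
  LogLine(out) << "retrying " << host << ", attempt " << attempt;
}
```

Calling log_retry() with std::cerr as the stream would send the line to standard error. Passing a std::ostringstream instead makes it easy to check the output in a test.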

I use STL containers for storing all kinds of stuff. I'd say that maps and vectors show up the most, but there are a couple of deque, list and set users in my tree at the moment.

I also use std::string quite a bit. It's a lot harder to screw things up when you eliminate a bunch of uses of pointers. Ordinary C string-like behavior with char* means nothing but pointer wrangling, and there are decades of proven badness from failed attempts at doing it safely.
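As a small illustration of the difference, here is the same "glue two strings together" job done both ways. The function names join_c() and join_cpp() are made up for this sketch.

```cpp
#include <cstring>
#include <string>

// C style: the code has to size the buffer by hand, remember the
// terminator, and rely on the caller to free the result later.
char* join_c(const char* a, const char* b) {
  std::size_t len = std::strlen(a) + std::strlen(b) + 1;  // +1 for '\0'
  char* out = new char[len];
  std::strcpy(out, a);
  std::strcat(out, b);
  return out;  // caller must delete[] this
}

// std::string style: sizing, termination and cleanup are handled for you.
std::string join_cpp(const std::string& a, const std::string& b) {
  return a + b;
}
```

Forget the "+ 1" in join_c() and you have a buffer overflow. Forget the delete[] at the call site and you have a leak. The std::string version has no equivalent spots to get wrong.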

I do use "new" (instead of malloc) but it's usually just to stick something into a "scoped_ptr<>", which again is something I picked up at a prior job. It basically lets you hand a pointer to an object which will hang onto it and will call the right "delete" on it when that object goes out of scope.

This is great for pointers to utility classes, and that in turn lets you do some fun stuff when it comes time to test classes in isolation. For instance, imagine this:

class Foo {
 public:
  Foo();
  virtual ~Foo();
 
 private:
  scoped_ptr<Util> util_;
};

Somewhere in Foo, something will do "util_.reset(new Util)" or similar. That creates an instance of that class and hangs onto the pointer and then whacks it when the Foo goes out of scope and is deleted itself.
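A stripped-down version of the idea might look like the following. This is a sketch under assumptions and not the actual helper library described here: the scoped_ptr below only has the handful of members needed for the example, and the Util shown is a hypothetical stand-in which counts live instances so the cleanup can be observed.

```cpp
#include <cstddef>

// Minimal sketch of a scoped pointer: owns one object, deletes it on
// destruction or when reset() hands it something else.
template <typename T>
class scoped_ptr {
 public:
  explicit scoped_ptr(T* p = NULL) : ptr_(p) {}
  ~scoped_ptr() { delete ptr_; }

  // Deletes whatever was held before and takes ownership of p.
  void reset(T* p = NULL) {
    if (p != ptr_) {
      delete ptr_;
      ptr_ = p;
    }
  }

  T* get() const { return ptr_; }
  T& operator*() const { return *ptr_; }
  T* operator->() const { return ptr_; }

 private:
  scoped_ptr(const scoped_ptr&);             // not copyable
  scoped_ptr& operator=(const scoped_ptr&);

  T* ptr_;
};

// Hypothetical stand-in for Util: tracks how many instances are alive.
class Util {
 public:
  Util() { ++live_count; }
  virtual ~Util() { --live_count; }
  static int live_count;
};
int Util::live_count = 0;

class Foo {
 public:
  Foo() { util_.reset(new Util); }
  virtual ~Foo() {}  // util_ cleans up its Util without any help here

 private:
  scoped_ptr<Util> util_;
};
```

Create a Foo inside a block and Util::live_count goes to 1. Once the block ends and the Foo is destroyed, it drops back to 0 without Foo's destructor having to mention util_ at all.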

Where this gets interesting is when you start talking about testing.

class MockUtil : public Util {
  // mockable versions of functions found in Util go here
};

This MockUtil will "fit" into the scoped_ptr spot for a Util, and Foo will use it without realizing anything has changed. Now you can eliminate actual calls to Util (which might be costly) and create a bunch of interesting situations without having to set up elaborate fake versions of whatever Util is going to touch.

(Actually getting your MockUtil into Foo to replace the usual Util is left as an exercise for the reader.)

Could you do this with a plain old "Util* util_;" pointer? Of course! But now, you'd have to keep track of it and make sure it gets deleted when Foo gets destroyed. Otherwise, you'd leak it.

The important part is that you couldn't easily do this if you had it as "Util util_;" instead. In that case, you have a genuine Util bound up inside of your class and it's not going to accept anything else. You can't just swing it around to another implementation, and that can make it harder to test the calling code.

The "scoped_ptr" I use is not a part of C++ (and so it exists as my own little helper library in my tree), but I understand that something very much like it now exists in "unique_ptr". This assumes that relying on C++11 support is okay for your project, of course!
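For the curious, here is one possible way to do the mock swap, written with std::unique_ptr and therefore assuming C++11. Everything specific in it is made up for the sketch: the fetch() function on Util, the describe() function on Foo, and the extra Foo constructor which takes ownership of a Util passed in from outside. Other approaches (a setter, a factory, and so on) would work too.

```cpp
#include <memory>
#include <string>

// Hypothetical Util with one virtual function so a mock can override it.
class Util {
 public:
  virtual ~Util() {}
  virtual std::string fetch() { return "real (possibly costly) result"; }
};

class MockUtil : public Util {
 public:
  virtual std::string fetch() { return "canned result"; }
};

class Foo {
 public:
  // Normal use: Foo makes its own Util.
  Foo() : util_(new Util) {}
  // Assumed injection point for tests: takes ownership of "util".
  explicit Foo(Util* util) : util_(util) {}
  virtual ~Foo() {}

  std::string describe() { return "Foo saw: " + util_->fetch(); }

 private:
  std::unique_ptr<Util> util_;  // requires C++11
};
```

A test can then build "Foo mocked(new MockUtil);" and describe() will report the canned result, while a default-constructed Foo still goes through the real Util. Either way, the held object is deleted when the Foo goes away.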

Regarding pointers and references, it's pretty much like this: if I pass a reference to a function, it's const. I pass const pointers around if the situation really calls for it. Actual non-const pointers are usually there as a way to have an "out" argument on a function.

// there's a "using std::string" somewhere up above...
 
void log_something(const string& message);
 
bool get_value(const string& var, string* val);

This means you can get some idea of whether your data might be modified or not by looking at how a function is called.

log_something("something which will not be changed");
 
string value;
bool success = get_value("also not changed", &value);

The magic & up there basically says "hey, this function might wind up writing to it" to me.
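To round that out, here is what the other side of those declarations could look like. The bodies are assumptions for the sake of the example: the map named settings is a hypothetical backing store, and log_something() simply writes to stderr.

```cpp
#include <cstdio>
#include <map>
#include <string>

using std::string;

// Hypothetical backing store for this sketch.
static std::map<string, string> settings;

// Read-only input, so it comes in as a const reference.
void log_something(const string& message) {
  fprintf(stderr, "%s\n", message.c_str());
}

// "var" is read-only input; "val" is the out argument and is only
// written to when the lookup succeeds.
bool get_value(const string& var, string* val) {
  std::map<string, string>::const_iterator it = settings.find(var);
  if (it == settings.end()) {
    return false;  // *val is left untouched
  }
  *val = it->second;
  return true;
}
```

Inside get_value(), the compiler will reject any attempt to modify var, while val is plainly writable. The call site and the implementation end up telling the same story.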

None of this is perfect, but this is about where I am with it. I definitely do not use the whole buffalo and I'm just fine with that.