About a month ago, I dug up an old copy of my MUD language from 1990 and took a crack at refactoring some of it in C++, but I quickly got frustrated with just how much that felt like work rather than fun, especially after my recent foray into golang.
So I have this 29+ year old code, written in original K&R C, and I want to get it, say, at least compiling.
Why not take a shot at doing it in pure C?
My friend Nicolas Noble had told me how much pleasure he was getting having gone back to C from being super-cutting-edge C++, and now I see what he meant. Instead of spending time describing code, I’m literally spending time writing code.
Sure, I get pangs of wanting to be able to associate functions closely with a struct, if only for the namespacing aspect, but I also brought a few lessons of my own with me and I’m pretty happy with the clarity and intuitiveness of the code, largely inspired by lessons from golang.
My first three decisions were as follows:
– Functions that can experience errors should return errors, not values,
– Any value that a function/system conceptually “owns” should always be transferred by pointer, including pointers,
– Test.
The first decision led me to actually using errno values, and is simple enough, but the second one was slightly more interesting, because it lets me actually erase a pointer when I close something for you.
////////////////////////////////////////////////////////////////////////////////
// NewHashMap constructs a new hash map instance with a given number of buckets
//
// Use CloseHashMap() to release resources associated with the map when done.
//
// Returns:
// EINVAL if buckets is < 4 or into is NULL
// EDOM if buckets is not a power-of-2
// ENOMEM if out of memory
// 0 on success and stores the address of the map in *into.
//
error_t
NewHashMap(size_t buckets, struct HashMap **into)
{
REQUIRE(buckets >= 4);
REQUIRE(into);
CONSTRAIN(is_power_of_two(buckets));
// size for the base structure and the bucket array
size_t size = sizeof(struct HashMap) + sizeof(struct HashBucket) * buckets;
struct HashMap *instance = calloc(1, size);
CHECK_ALLOCATION(instance);
instance->capacity = buckets;
*into = instance;
return 0;
}
//////////////////////////////////////////////////////////////////////////////////////
// CloseHashMap frees all resources used by the map and invalidates the pointer to it
//
// Returns EINVAL if `map` is NULL
//
error_t
CloseHashMap(struct HashMap **map)
{
REQUIRE(map);
if (!*map) return 0; // ignore NULL pointer
free_bucket_chain(*map);
free(*map);
*map = NULL;
return 0;
}
I’ve found these patterns incredibly easy and intuitive to work with, and I was able to knock out huge quantities of code that pretty much just worked.
I particularly prefer that when I work with my file-wrapper, instead of
FILE *fp = fopen(...);
if (!fp) {
error-handle
}
...
if (fp) {
fclose(fp);
fp = NULL;
}
I take a more sanguine approach, and there is a unity between the open call and the close call, in that both take the same argument type as the last parameter (i.e. the address of my resource-pointer).
struct SourceFile *sfp;
error_t err = NewSourceFile(..., &sfp);
ON_ERROR_RETURN(err); // if the error isn't 0, return the error
...
CloseSourceFile(&sfp); // not only will it be freed, but it will be undangled for me.
I know some people hate writing tests. But first of all, I’m writing way more actual code than I would be if I was working in C++, so I don’t mind a little red tape. Beyond that, the system I’ve built feels very REPL-y to me: as I write code, I can confirm that the functions are doing what I expect, and that I’m not making them overly cumbersome or complex (which would make testing them tedious). That in turn feels like validation of my overall architecture: rather than building/composing complexity, I’m building simplicity that rapidly forms rich, deep functionality.
struct SourceFile {
char filepath[MAX_PATH_LENGTH];
void *mapping;
struct Buffer *buffer;
uint16_t lineNo;
size_t size;
};
This little beastie is the fuel behind my parser-generator’s tokenizer:
enum TokenType {
TOKEN_INVALID,
TOKEN_EOL,
TOKEN_WHITESPACE,
TOKEN_COMMENT,
TOKEN_STRING_LITERAL,
TOKEN_SYMBOL, // almost anything else, unless it is:
TOKEN_LABEL, // a prefix to a regular symbol
TOKEN_IDENTIFIER, // special case for tokens starting with non-alnum symbols
};
struct Token {
enum TokenType type : 16;
uint16_t lineNo;
uint16_t lineOffset;
uint16_t length;
const char * start;
};
// ScanParseable will read and tokenize the next unit of code from a source
// file instance and write the tokens into the provided buffer.
extern error_t ScanParseable(
struct SourceFile *file,
struct Token *tokens, size_t tokensSize,
size_t *tokensScanned /*out*/);
This lightweight layering and lack of a significant hierarchy is actually way more flexible than the C++ hindbrain would have you think.
On Saturday, I was able to build my unit testing framework, write a C-based hashmap, an intrusive linked list (s&d), create a “component” or subsystem system for controlling startup/shutdown dependency sequences, and create my sourcefile, buffer, file mapper and tokenizer — and then, in a few hours on Sunday, get it all exhaustively verified working on Windows, Linux and macOS.
I won’t deny: there are things from C++ (and Python) I miss:
- References, because testing for null pointers is tedious,
- Member functions, because scoping and – look – this structure and that function are tightly coupled,
- Destructors & automatic lifetime, or something like python’s context managers,
- “using”,
- constexpr,
- dynamic initialization (static const foo = runMeAtStartup())
In exchange for using typedef, I wouldn’t always have to type “struct” in front of my types, but… you know what, for now, I like the explicitness.
But that’s almost entirely it. I might have felt differently if I was having less fun: reinventing the list and the hashmap would have sucked if they hadn’t worked flawlessly the first time…
I definitely feel more inclined towards go now.
We’ll see how far this refactoring of “AMUL” gets, and maybe I’ll make the repos public. But so far I’m focused on the compiler, because I know the backend is going to be a harder port, having depended almost entirely on Amiga-specific quirks like a single address space and Message Ports.
The way I used Message Ports would translate nicely to Go, I think, and that’s kind of what I half have in mind…