Assembler Tricks Just for Kicks

Sat, Jan 21, 2017 3-minute read
The Knot Worldwide Tech Team

Here at XO Group, we use a number of different technology stacks (e.g. Ruby/.NET/node) and front-end frameworks (e.g. Backbone/Angular/React). I personally come from a more compiled/statically-typed background, mainly because I love messing directly with my own pointers and allocating memory without relying on some garbage collector to do the janitorial work for me (Who do garbage collectors think they are… touching my memories?). Since moving to full-stack development, I haven’t had much of an excuse to play with pointers. However, I do occasionally leverage some design patterns from my embedded programming years of yore.

A couple of years back, I had to work on a service that would serialize objects from a third-party file and then push the data into our API. My attitude initially was that serializing objects can be as cumbersome as peeling a cucumber, Son. Basically, each line in the file followed a specific format that I could map to an object, so the logic was straightforward, but keeping it DRY was a little difficult at first.

Assuming the developer knows regular expressions, the first approach most people take when parsing a file like this is a chain of if/else statements, each one testing the current line against a different pattern.
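The original listing did not survive, and the service itself was written in C#, so here is a minimal sketch of that if/else chain in Python instead. The line formats (USER, ORDER, NOTE) and the returned dictionaries are made up for illustration, and `re.match` stands in for the “matches” function.

```python
import re


def parse_line(line):
    """Map one line of the file to an object using an if/else chain.

    The three line formats here are hypothetical; the real third-party
    file layout is not described in the post.
    """
    # Each branch assigns the match inside the condition, then repeats
    # the same re.match call with a different pattern.
    if (m := re.match(r"^USER (\w+) (\d+)$", line)):
        return {"type": "user", "name": m.group(1), "age": int(m.group(2))}
    elif (m := re.match(r"^ORDER (\d+) (\d+\.\d{2})$", line)):
        return {"type": "order", "id": int(m.group(1)), "total": float(m.group(2))}
    elif (m := re.match(r"^NOTE (.+)$", line)):
        return {"type": "note", "text": m.group(1)}
    # No pattern recognized the line.
    return None
```

Every new line format means another branch and another copy of the match call, which is the repetition that gets annoying.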

Note that even with the assignments in the if statements (which some developers really disapprove of, but I’m pragmatic so whatevs), I still have to keep calling the “matches” function. It’s annoying to repeat myself so many times. In something like Ruby, you can shorten this with a regex-enabled switch statement: a case statement whose when clauses are regular expressions, so each pattern is listed once and the explicit match call disappears.

However, many languages don’t support this construct (including the one I was using for this service… C#). So what can we do in less-flexible languages? Well, I went back to my university days when I was learning assembler (yeah, waaaay back). Assembler does branching logic (if/else conditions) by comparing a register on the processor to a value and then jumping to the address you give it if the condition is met. To make programs less confusing, some creative person came up with this handy-dandy thing called a “jump address table”, which is basically the predecessor to the modern “switch” statement (indeed, compilers commonly turn integer-based switch statements into jump address tables when the case values are dense enough… I actually checked before I finished this post).

The idea is that you define a table (or array) of addresses in memory that represent the start of each function you’ve defined for that case, and then when you index your array (e.g. array[3]), you jump directly to that location without needing to compare any values.

Example:
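The original assembler listing is missing, so the following is only a rough sketch of the same idea in Python, where function references play the role of addresses. The handler names and the default case are assumptions made for illustration.

```python
# Hypothetical case handlers; each takes no arguments and returns a label.
def case_zero():
    return "zero"


def case_one():
    return "one"


def case_two():
    return "two"


def case_three():
    return "three"


def case_default():
    return "default"


# The "jump address table": position i holds the handler for case i.
JUMP_TABLE = [case_zero, case_one, case_two, case_three]


def dispatch(index):
    """Jump straight to the handler stored at the given index."""
    # The bounds check is the only comparison; no case value is tested.
    if 0 <= index < len(JUMP_TABLE):
        return JUMP_TABLE[index]()
    return case_default()
```

Calling `dispatch(3)` indexes the table and runs `case_three` directly, while an out-of-range index falls through to the default handler.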

This is equivalent to the traditional integer-based switch; it just looks different.

Note that a switch statement can only compile directly to a jump address table if it is integer-based or reference-based (pointers/references are addresses in memory, which are integers).

However, this doesn’t mean we can’t co-opt this construct for something a little more high-level. What I ended up doing was thinking of my regex evaluations as a table as well, with each regex mapping directly to a function call. As long as the functions all have the same prototype, the logic that calls them can be fairly generic.
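A minimal sketch of such a regex table, again in Python rather than the C# the service actually used. The patterns, handler names, and returned dictionaries are hypothetical; the only real requirement is that every handler accepts the same argument (here, the match object).

```python
import re


# Hypothetical handlers. They share one prototype: take a match object,
# return the parsed object.
def parse_user(m):
    return {"type": "user", "name": m.group(1), "age": int(m.group(2))}


def parse_order(m):
    return {"type": "order", "id": int(m.group(1)), "total": float(m.group(2))}


def parse_note(m):
    return {"type": "note", "text": m.group(1)}


# The table: each row pairs a compiled pattern with the function to call.
HANDLERS = [
    (re.compile(r"^USER (\w+) (\d+)$"), parse_user),
    (re.compile(r"^ORDER (\d+) (\d+\.\d{2})$"), parse_order),
    (re.compile(r"^NOTE (.+)$"), parse_note),
]


def parse_line(line):
    """Generic driver: the match call is written once, not once per format."""
    for pattern, handler in HANDLERS:
        m = pattern.match(line)
        if m:
            return handler(m)
    # No row in the table recognized the line.
    return None
```

Unlike a true jump address table, the patterns are still tried one after another, so this is not a constant-time jump. What it buys is that supporting a new line format means adding one row to the table instead of another if/else branch.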

So, I extended the jump address table concept to regexes and cleaned up my original if/else logic. This saved me a lot of time on file parsing. There may be other uses for it, but the point is that if you find yourself manually writing a bunch of conditional logic, you might be able to leverage some data structures creatively to get the same result. Thanks for reading.

Originally published at blog.eng.xogrp.com.