Game Engine Dev, Explained for Non-Programmers: Input
This series can be read out of order, but here are some navigation links for your convenience:
<<Introduction | <Previous post | Next post>
Input. Seems simple, right? Press the A button, make the marketable character jump. Easy peasy. But what if I want to press the spacebar instead? What about the right trigger? What if I only want to use one hand? What if I wanted to press the A button now, but I actually pressed it a tenth of a second ago? Everything’s falling apart. The intern has drizzled conditional input checks all over the player object. You’re using variables from 5 different files, half of which you don’t remember making. You’re tracking input buffers in the player’s state logic. Nightmare. Nightmare! NIGHTMARE!!
Like many software development tasks, building a proper input system is an exercise in abstraction. The goal is to translate signals from a controller into actions in the game, but problems arise when thinking of this as a 1:1 relationship. Instead, it’s often simpler in the long run to construct multiple abstraction layers which store information and mappings and serve to translate concise, high level requests (e.g. did the player make a “jump” input?) into questions of the lower level hardware (e.g. did the 3rd face button on the gamepad connected to OS port 2 go from unpressed to pressed within the last 10 frames?).
Now, the exact construction of these layers is going to vary significantly based on your preferences, needs, and what your framework/engine can already provide for you, but I’ll describe the system I use, which is fairly generalizable. It involves roughly 4 layers:
- The framework layer 
- The abstracted hardware layer 
- The mapping layer 
- The verb layer 
The framework layer
Premade engines like Game Maker or Godot will usually get you most of the second layer out of the box, sometimes more, but if you’re just using a framework like SDL then this is the only part that will already be done for you. In my case, SDL handles two things for me:
- Allows me to retrieve the complete input state of the keyboard or mouse at any given point in time as a big chunk of data. 
- Triggers events when a gamepad is connected/disconnected, which provide a pointer that can be used to retrieve information about the state of that specific gamepad later on. 
The abstracted hardware layer
The goal of this layer is to read the input states obtained from the previous layer and allow us to easily ask, for any given device, whether one of its keys/buttons[1] was pressed, released, or is being held. It also lets us retrieve the current value of unique stuff like analog stick axes or mouse position. I call it the “abstracted hardware” layer because, while we’re still thinking of physical devices at this point, I’ll be fully abstracting away anything that isn’t directly relevant to input: stuff like what exact USB or hardware slot we’re using, or weirdly formatted analog input values.
For the most part, we can just pass on slightly reformatted values from the framework layer, but there are two not-so-trivial problems to solve here:
- How do you tell if a button has just been pressed/released, given that we can only retrieve the current input state at any given time? 
- How do you handle multiple gamepads? 
Taking them in order, the first problem is simple. As mentioned previously, the game runs its logic once per “frame”, in my case every 1/60th of a second. While some applications will try to obtain the input state as often as they can, in my case I only have to do so at the start of each frame. But I can also store the input state from the start of the previous frame! By doing this, whenever something asks if a key/button was pressed, I just need to check the current state against the previous state. If the button was inactive last frame but is active this frame, that means it was just pressed down[2]!
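To make the current-versus-previous comparison concrete, here is a minimal sketch in Python. The class and method names (`ButtonTracker`, `begin_frame`, and so on) are made up for illustration, and a plain set of button names stands in for whatever state the framework actually hands you.

```python
class ButtonTracker:
    """Tracks one device's buttons across two consecutive frames."""

    def __init__(self):
        self.previous = set()  # buttons that were down at the start of last frame
        self.current = set()   # buttons that are down at the start of this frame

    def begin_frame(self, held_now):
        """Call once per frame with the buttons the framework reports as down."""
        self.previous = self.current
        self.current = set(held_now)

    def is_held(self, button):
        return button in self.current

    def was_pressed(self, button):
        # Down this frame, but not last frame: it was just pressed.
        return button in self.current and button not in self.previous

    def was_released(self, button):
        # Down last frame, but not this frame: it was just released.
        return button in self.previous and button not in self.current
```

The only bookkeeping is the swap in `begin_frame`; everything else is a comparison between the two stored states.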
The next problem is a little trickier. If you’re making a singleplayer game you can get away with just polling every connected gamepad for input and treating them like one weird Schrödinger’s gamepad, but if you’re doing any sort of multiplayer you need to be able to differentiate between each specific gamepad; you need to assign each a unique, persistent ID.
I see many beginners inclined to just look up a controller’s ‘hardware slot’ (the number sometimes indicated by little green lights on controllers) and use that to identify it. Unfortunately, this runs into a number of pitfalls, mainly because it leaves you completely beholden to what the OS wants to do with this ID. Did you know that on Windows it’s almost impossible to change what ID a controller has been assigned without restarting your PC? If you plug in two controllers then unplug the first one, well, congrats: barring a restart (or a 15 minute unplug-replug sesh with 3 more controllers) you are Player 2 forever now. The number of games that don’t account for this is staggering; I’ve seen some singleplayer games that refuse to read anything from any gamepad except the one in the first slot. So this is a non-starter.
My recommended alternative is to assign virtual slot numbers yourself, in order, whenever a gamepad is plugged in. If a gamepad is unplugged, its slot becomes available to the next gamepad that’s plugged in, so resetting slots is as simple for the user as unplugging everything then replugging it in the desired order. This does mean stuff can get mixed up if something is accidentally unplugged after someone leaves, but I think the gains in flexibility more than make up for this shortcoming.
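Here is one way the virtual slot idea could look in Python. It is only a sketch: `GamepadSlots` and its methods are hypothetical names, and the `device` values stand in for whatever handle the framework gives you on a connect event (assumed here to never be `None`).

```python
class GamepadSlots:
    """Hands out virtual slot numbers, reusing the lowest free one."""

    def __init__(self):
        # List index = virtual slot number; value = device handle, or None if free.
        self.slots = []

    def connect(self, device):
        """Put the device in the lowest free slot and return that slot number."""
        for slot, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[slot] = device
                return slot
        self.slots.append(device)
        return len(self.slots) - 1

    def disconnect(self, device):
        """Free the device's slot so the next connected gamepad can take it."""
        slot = self.slots.index(device)
        self.slots[slot] = None
        return slot
```

For example, if two gamepads connect (slots 0 and 1) and the first is unplugged, the next gamepad to connect lands in slot 0 again, no restart required.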
And that’s that! A layer like this is useful in and of itself when doing stuff like prototyping or simple menus, but if things need to be remapped then we need to move on…
The mapping layer
At this point, things would start to get a little more game-dependent. This part of the input system wouldn’t usually be done as part of the base engine, but the idea can be applied to almost any game so I figured I’d cover it anyway. I’ll be assuming the game is singleplayer to keep things simple, but the idea would be the same in multiplayer games, just copied across multiple profiles.
This layer handles mapping (or ‘binding’) buttons to in-game actions, i.e. defining what will be shown to the player in the in-game input remapping menu. You wouldn’t typically want to allow this for stuff like simple menus (since the player could inadvertently softlock their program) but, in my opinion, any other in-game action should be on the table here. The exact actions and default mappings will, of course, depend on the game.
There’s not much else to say here; this layer basically just exists as a big switchboard. You ask it “what’s the button for jumping?” and it knows it’s the A button, which it can then pass along to the hardware layer to know whether that button is being pressed.
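As a rough illustration of the switchboard, a mapping layer can be little more than a dictionary with a rebind function. The names below (`ActionMap`, `button_for`, `rebind`, `action_pressed`) are invented for this sketch, and a plain set of pressed button names stands in for the hardware layer.

```python
class ActionMap:
    """The switchboard: remembers which button each action is bound to."""

    def __init__(self, defaults):
        self.bindings = dict(defaults)  # action name -> button name

    def button_for(self, action):
        return self.bindings[action]

    def rebind(self, action, button):
        # Only actions that already exist can be remapped.
        if action not in self.bindings:
            raise KeyError(f"unknown action: {action}")
        self.bindings[action] = button


def action_pressed(action_map, pressed_buttons, action):
    """Look up the action's button, then ask the 'hardware layer' about it."""
    return action_map.button_for(action) in pressed_buttons
```

With default bindings of `{"jump": "A"}`, asking about `"jump"` checks the A button; after `rebind("jump", "SPACE")`, the same question checks the spacebar instead, and nothing that asks about `"jump"` has to change.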
The verb layer
Finally, we’ve reached the end point. Here’s where things get a little interesting. You might have a “jump” button mapped in the previous layer, but sometimes just knowing if the “jump” button was pressed this frame isn’t enough. The main purpose of this layer is to be hardware agnostic (i.e. to check for an action’s corresponding input regardless of what input device the player is using), but it’s also used to implement input buffering, i.e. “storing” inputs for use in the near future. A common example goes like so: in platformers, players often want to jump as soon as they hit the ground, but if they press the jump button slightly before landing then the input will be ‘eaten’ (since you can’t jump in midair), which feels terrible. But if you ‘buffer’ the input by checking (when grounded) not only if the jump button was pressed this frame but at any point in the last 10 frames, then the player character will jump as soon as they hit the ground even if the input came slightly too early. The exact amount of buffer will vary from action to action and game to game (non-action games usually have no need for it at all), but granting this sort of margin of error goes a long way in making a game feel responsive.
So the end goal of this layer is to provide a list of “verbs”[3] that can be used to make requests like “was there a [jump] input in the last 10 frames?”. You might have noticed that I already do a little bit of buffering in the abstracted hardware layer in order to check button presses and releases, and while I could handle more long-term buffering at that stage, doing it here (by storing a portion of the data from that layer) is a bit more efficient since I only have to keep track of the inputs the player has mapped to an action, rather than the hundreds of possible buttons that could be mapped.
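A request like “was there a [jump] input in the last 10 frames?” can be answered by keeping a short per-verb history. The sketch below assumes a fixed history length and uses made-up names (`VerbBuffer`, `record_frame`, `happened_within`); it is one possible shape for the buffer, not necessarily how my engine stores it.

```python
from collections import deque


class VerbBuffer:
    """Remembers, for each verb, whether it fired on each of the last few frames."""

    def __init__(self, verbs, max_frames=10):
        # One fixed-length history per verb; the rightmost entry is the newest frame.
        self.history = {
            verb: deque([False] * max_frames, maxlen=max_frames) for verb in verbs
        }

    def record_frame(self, triggered):
        """Call once per frame with the set of verbs that fired this frame."""
        for verb, frames in self.history.items():
            frames.append(verb in triggered)  # oldest entry falls off the left

    def happened_within(self, verb, frames):
        """True if the verb fired on any of the last `frames` frames (this one included)."""
        recent = list(self.history[verb])
        frames = max(0, min(frames, len(recent)))
        return any(recent[len(recent) - frames:])
```

In the platformer example, the player code would call something like `happened_within("jump", 10)` on the frame the character lands, so a press that came a few frames early still produces a jump.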
Each action from the mapping layer usually gets 1-3 verbs (depending on whether you want to track a button press, release, hold, or some combination of the three). You’d also have verbs for actions that don’t necessarily have a mapping, such as left and right movement when using a gamepad (since that’s usually locked to an analog stick). Again, the point here is to be completely hardware agnostic: the layer will handle identifying what kind of controller the player is using and making the right request to the hardware layer using the correct bindings from the mapping layer (or constant values, for actions with no bindings). It stores information for all these verbs every frame in its buffer, and uses this information to answer the aforementioned high-level requests.
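To show what “hardware agnostic” means for a single verb, here is a small sketch of a “move left” check. Everything in it is an assumption made for illustration: the function name, the `"gamepad"`/`"keyboard"` labels, the stick range of -1.0 to 1.0, and the 0.5 threshold.

```python
STICK_THRESHOLD = 0.5  # assumed: how far the stick must tilt to count as "left"


def move_left(device_kind, held_buttons, stick_x, bindings):
    """The 'move left' verb: one question, answered differently per device."""
    if device_kind == "gamepad":
        # No binding to look up: movement is locked to the analog stick,
        # so the verb compares the axis against a constant instead.
        return stick_x < -STICK_THRESHOLD
    # Keyboard: use whatever key the player bound to the action.
    return bindings["move_left"] in held_buttons
```

The caller never needs to know which branch ran; it just asks whether the player is moving left.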
Conclusion
Awesome! And with that, we’ve covered every “fundamental” feature a game engine needs to make an absolutely terrible super bare-bones “““game”””! Wow![4] From now on I’ll be talking about some important but less totally foundational engine features; join me next time to hear all about the audio system!
And by hear I mean read, I will not be including audio.
[1] Referred to henceforth just as “buttons”.
[2] And vice-versa to check if a button was released.
[3] Calling them “actions” felt too ambiguous.
[4] Well, we probably didn’t really need sprites; you could make a game with just, like, rectangles.

