Game Engine Dev, Explained for Non-Programmers: Sprites
This series can be read out of order, but here are some navigation links for your convenience:
<<Introduction | <Previous post | (This is the latest post)
Y’know, I’ve always wondered why they’re called that. Wikipedia claims it’s because they “float on top of the background image without overwriting it, much like a ghost or mythological sprite”, but that seems awfully tenuous.
Anyway, what exactly are we talking about? Well, images! This is fairly common knowledge, but ‘sprite’ is the general term for any pre-drawn 2D image you might see in a game. They’re used to depict anything from characters to scenery to UI elements. One important thing to note: self-contained animations (such as a character’s walk cycle) might contain several individual images (i.e. animation frames), but they’re usually considered a single sprite. This makes sense, for reasons we’ll see later.
Now, what exactly goes into a sprite system? The goal when building one can be summarized thusly:
1. Get the drawing from the artist’s desk.
2. Convert it into something that can be rendered to the screen efficiently by the computer.
3. Make a system the programmer can easily use to tell the computer to do so whenever they want.
Get the drawing!
Our precious, precious artists. They are beautiful, innocent, unsullied. They are not to be exposed to anything as filthy as a command line. As such, I must find a way to automatically retrieve their work. Fortunately, we can make this step a part of our build system (see the previous post for details about that).
As mentioned last time, artists will be mostly working with Aseprite files. This is great for them, since they have all sorts of editing tools at their disposal and can work on an entire animation at once, but it presents an issue: most tools can’t read Aseprite files directly. Not to worry though! Thanks to Aseprite’s scripting API, I can automate the export of these files to some pretty precise specifications. Since I need the individual image data of each animation frame for the next step anyway, I’ll export each frame stored in the Aseprite file as an individual .png[1] file and store it in a temporary folder.
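If you’re curious what that automation can look like, here’s a minimal sketch in Python that shells out to Aseprite’s command-line mode; the real build step uses the Lua scripting API and its own folder layout, so treat the paths and names here as placeholders:

```python
# A rough sketch of automated frame export, using Aseprite's command-line
# mode ("aseprite -b") rather than the Lua scripting API the build system
# actually uses. Paths and folder names are placeholders.
import subprocess
from pathlib import Path

def export_frames(source: Path, temp_dir: Path) -> None:
    out_dir = temp_dir / source.stem          # one folder per sprite
    out_dir.mkdir(parents=True, exist_ok=True)
    # -b runs Aseprite without opening its UI; the "{frame}" placeholder in
    # --save-as makes it write each animation frame as its own numbered .png.
    subprocess.run(
        ["aseprite", "-b", str(source),
         "--save-as", str(out_dir / "{frame}.png")],
        check=True,
    )

for file in Path("assets").glob("*.aseprite"):
    export_frames(file, Path("build/frames"))
```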
One additional wrinkle: I need the image data, sure, but I also need to keep track of some metadata for each frame; stuff like the name of the sprite, the order of the frame, how long it lasts in the full animation, and a custom origin point. .png files don’t exactly have a predetermined slot for this data, so it’s not obvious where it would go. One option is to give each file a unique ID and create a separate file at export time that maps those IDs to their metadata (which I’ll have to do later anyway), but, given that it’s not actually a lot of data at this point, I can use a neat trick and simplify things by storing that information in the file’s title! In an exported 4-frame animation, the folder name keeps track of the name of the sprite, and each file title stores the frame index, duration (in milliseconds), and origin point.
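As a concrete (and entirely hypothetical) illustration of the trick, suppose each exported file were titled `<frame index>_<duration in ms>_<origin x>x<origin y>.png`. Getting the metadata back out later is then just a matter of splitting the title apart:

```python
# Parsing frame metadata out of a hypothetical file title like
# "walk/2_100_16x32.png": frame 2 of the sprite "walk", shown for
# 100 ms, with its origin at pixel (16, 32).
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FrameMeta:
    sprite: str
    index: int
    duration_ms: int
    origin: tuple[int, int]

def parse_frame_title(path: Path) -> FrameMeta:
    index, duration, origin = path.stem.split("_")
    ox, oy = origin.split("x")
    return FrameMeta(
        sprite=path.parent.name,   # the folder name carries the sprite's name
        index=int(index),
        duration_ms=int(duration),
        origin=(int(ox), int(oy)),
    )

print(parse_frame_title(Path("walk/2_100_16x32.png")))
# FrameMeta(sprite='walk', index=2, duration_ms=100, origin=(16, 32))
```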
Convert the drawing!
Alright, now that I have the images, I need to be able to render them efficiently in-game! This involves retrieving the image data from the files when the game is running, storing it in a ‘texture’ (essentially a blob of image data that lives in RAM), then rendering that texture to the screen using the GPU. The framework I’m using can do this by loading .png files directly into their own individual textures and using those, but this should be avoided for two reasons:
A game can end up having thousands or even tens of thousands of individual animation frames. Loading them all into memory separately while the game is running would involve just as many filesystem calls, which as I’ve mentioned before can be quite bad for performance.
This one’s more important, and the reason we can’t use a simple asset packer to get around the first problem: rendering a texture involves the CPU moving it from RAM to the GPU and… hm, I feel I should maybe explain these terms…
Sidebar!
Well dear reader, you’ve done great so far, but it’s unfortunately time to subject you to another computer science lesson (tragic). If you know what a GPU is you can skip ahead to the next heading, but for everyone else: sorry, but I’ve already locked the doors. Now then, in the second post in this series we talked a bit about the CPU and RAM. As a quick refresher:
The CPU (Central Processing Unit) is the part of your computer that does most of the “work” by moving bits of data to and from memory and running simple operations on them.
RAM (Random Access Memory) is the type of memory in your computer that only holds data while the computer is on. It’s faster to access than permanent storage like your hard drive, and most or all of a program’s information will be stored here while that program is running.
Understanding these is enough to get an idea of what most programs are doing, but once we add certain tasks (like graphics) into the mix, we have to start talking about the GPU! If you’re unaware, that stands for ‘Graphics Processing Unit’ (big surprise). It’s a processing unit just like the CPU in the sense that it receives data from memory, does stuff to that data, then sends it on its way. In fact, older PCs didn’t have one at all; GPUs weren’t even invented until the late ’90s! So what exactly is the difference?
It all comes down to something called parallelism. Take the following computing task:
1. Read number a and number b from memory.
2. Add them together.
3. Take the result and divide it by 2.
4. Save the result to memory as number c.
This task can be described as sequential, that is, each step depends on the result of the previous step. More accurately, you would say it’s impossible to do better than the sequential approach when trying to complete this task. Now let’s take a look at a different example:
1. Read number a, number b, and number c from memory.
2. Add 4 to number a.
3. Subtract 2 from number b.
4. Multiply number c by 0.
5. Save the results to memory in-place.
One thing you might have noticed here: none of the operations on the numbers here really depend on what’s going on with the other numbers, which means this problem is parallelizable, i.e. the steps can be worked on in parallel.
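For anyone who’d like to see the two toy tasks written out, here they are in a few lines of Python; note how the first can’t help but run one line after another, while the second’s three updates never look at each other’s results:

```python
def sequential_task(a: float, b: float) -> float:
    total = a + b    # step 2 needs both numbers read in step 1
    c = total / 2    # step 3 needs step 2's result
    return c         # step 4: hand back the result to be saved

def parallelizable_task(a: float, b: float, c: float):
    # These three updates are completely independent: three workers
    # could each take one line and run them all at the same time.
    a = a + 4
    b = b - 2
    c = c * 0
    return a, b, c
```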
To use a simplified analogy, let’s say your CPU contains 3 little worker elves. If you gave the elves the first task, it doesn’t really help that there are three of them. Even if each elf is assigned its own step, the elf assigned to step 2 would still have to wait around for step 1 to finish, and the elf assigned to step 3 would have to wait for steps 1 and 2 to finish. This means it would take the same amount of time for one elf to do all the work as it would take for 3, 10, or even 100 elves! In fact, more elves might take longer, since they’d need to pass things around to each other. But when it comes to the second task, more elves help a lot! Each elf can work on its own number without caring about what the other elves are doing, and things get done 3 times faster than if one elf had to go through all three numbers itself!
So what does this have to do with anything? Well, you can think of the main difference between the CPU and GPU as being their ability to work on things in parallel. For a long time, CPUs could only work on tasks sequentially, and blazing through sequential work is still their main strength today. The CPU in your typical home computer has about 2-8 elves (or, er, ‘cores’). Now, these are really jacked elves, they eat their protein powder and drink their juice; they can churn through tasks really quickly. But there are only 8 of them. Conversely, GPUs can have thousands of cores. For most computing tasks, which are sequential, the only way you’re getting things done faster is by getting a faster single core. But certain classes of tasks (such as, surprise surprise, graphics) are referred to as ‘embarrassingly parallel’, which means more cores → faster performance.
To make this a little more salient, let’s take a typical graphics task: drawing a tinted HD image to the screen. There are 2,073,600 pixels in an image that size (1920x1080). If I were doing things sequentially, I’d have to operate on the color data of each individual pixel one at a time; millions of operations. But none of the pixels care what the other pixels look like. With a GPU, the cores can each chew through their own chunk of pixels in parallel, thousands at a time, drastically speeding up the operation. Waow!
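Here’s that same idea in miniature, using NumPy’s whole-array math as a stand-in for a GPU’s army of cores:

```python
# Tinting a 1920x1080 image: ~2 million pixels, each getting the exact same
# tiny multiply, and no pixel ever needs to look at another. NumPy applies
# the multiply across the whole array at once, standing in for the thousands
# of GPU cores that would each handle their own batch of pixels.
import numpy as np

image = np.full((1080, 1920, 3), 200, dtype=np.uint8)  # a plain light-gray image
tint = np.array([1.0, 0.6, 0.6])                       # keep red, dim green and blue
tinted = (image * tint).astype(np.uint8)               # one multiply per pixel channel
```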
Where were we?
Huh? Oh yeah, sprite packing. So, why can’t all my sprite frames stay in their own little .png file? Because in order to get the GPU to do anything, the CPU has to send it the relevant data from RAM, which can often be the slowest part of the process. If each sprite were stored as its own texture in memory, I’d have to send over a new texture every single time I wanted to draw a sprite, which is very slow.
So how does one get around this? Well, textures can get very big; many modern GPUs support textures up to 4096x4096 pixels in size, and almost all support textures up to 2048x2048 pixels. Since most sprites are much smaller than this (especially ones in a retro-style pixel-art game), I can “pack” more than one sprite onto a single texture! When drawing a sprite frame, I send this whole giant texture, then tell the GPU to only draw the part corresponding to that frame. This might seem wasteful, and it would be if I were doing it every time I wanted to draw the frame, but organizing things this way allows me to “batch” draw calls. Instead of sending the texture over every time I want to draw a sprite, I keep the drawing instructions in reserve. When it’s time to present a finished render to the screen, I send the whole texture over along with all the saved-up instructions at once. This means the GPU will happily draw a bunch of sprites all using the same texture!
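In case it helps to see the shape of it, here’s a bare-bones sketch of that batching idea; every name here is hypothetical, and the real framework’s drawing API looks different:

```python
# A bare-bones draw-call batcher (all names hypothetical). Draw requests pile
# up in a list, and the texture page only crosses over to the GPU once per
# flush instead of once per sprite.
class SpriteBatch:
    def __init__(self, texture_page):
        self.texture_page = texture_page
        self.pending = []                 # queued (source_rect, position) pairs

    def draw(self, source_rect, position):
        # No GPU work happens yet: just remember what to draw and where.
        self.pending.append((source_rect, position))

    def flush(self, gpu):
        # One texture bind plus one batched draw call for everything queued.
        gpu.bind_texture(self.texture_page)
        gpu.draw_quads(self.pending)
        self.pending.clear()
```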
So how do I get these ‘packed’ textures? It’s technically possible to load a bunch of separate .pngs into one texture when the game starts up, but, as mentioned, it’s better to do this step ahead of time. This means packing all my smaller .pngs into one big .png known as a ‘texture page’[2] (a.k.a. texture atlas, sprite atlas, or sprite sheet[3]). At this stage, since sprites are no longer split up into a bunch of different files, I also produce a separate ‘index’ file for each texture page that keeps track of sprite frame metadata (the stuff we saw earlier, like frame index and duration, as well as where that frame is positioned on the texture page). I can then read the data from this file when drawing to calculate the exact instructions I need to send to the GPU.
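And here’s a sketch of reading one of those index files at load time, again with a made-up layout (the real format is whatever the build step decides to write):

```python
# Looking up a sprite frame in a hypothetical JSON index for a texture page.
# Each entry records where the frame sits on the page, plus the metadata from
# earlier, so drawing is just a dictionary lookup away from its GPU instructions.
import json

INDEX_JSON = """
{ "walk": [ {"x": 0, "y": 0, "w": 32, "h": 32,
             "duration_ms": 100, "origin": [16, 32]} ] }
"""

index = json.loads(INDEX_JSON)
frame = index["walk"][0]
# The source rectangle tells the GPU which patch of the big texture to draw.
source_rect = (frame["x"], frame["y"], frame["w"], frame["h"])
print(source_rect)   # (0, 0, 32, 32)
```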
Make a system the programmer can easily use to tell the computer to draw sprites whenever they want!
Ok! Ok! We sort of already covered most of this in the previous section, but here are some neat things I included in my sprite system to make using it easier:
Part of the build process saves each sprite’s name to a code file in a special way that lets me refer to each sprite directly by name when scripting.
The most basic way to draw a sprite is by providing the sprite, as well as a position. I built my system to do this without the requester needing to deal with any of the texture page stuff.
Normally, when drawing a sprite at a given position, it will draw the top-left corner of the sprite there. To make things easier on me and the artists, I added the ability to specify a special “origin” layer in Aseprite, which is tracked by the exporting script. I mentioned this briefly earlier, but basically the ‘origin’ is the point on the sprite that will be drawn at the given position. Being able to change this makes it easier to reason about where to draw a sprite. For example, we usually want every entity’s position in a platformer to be set at their feet; if their sprites were of different heights, we’d have to determine where the top of each sprite would go in order to draw it, but by setting every sprite’s origin at that sprite’s feet, we can simply draw the sprite at the entity’s position with no adjustment necessary! The origin can also be used as the rotation axis when drawing the sprite at an angle.
Scaling a sprite means mapping it to a differently sized rectangular area, but it’s often easier to think about multiplying a sprite’s scale. I wrote some code that lets me use that method instead. It also scales relative to the origin, so e.g. an enemy that gets scaled up will still have its feet on the floor (there’s a little sketch of this right below).
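Here’s roughly what that origin-and-scale math boils down to, as a runnable toy (names hypothetical; the real version hands its numbers to the batching system from earlier):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    source_rect: tuple
    width: int
    height: int
    origin: tuple   # the (x, y) point pinned to the requested position

def queue_draw(source_rect, top_left, size):
    # Stand-in for the real batched drawing system.
    print(f"queue {source_rect} at {top_left}, size {size}")

def draw_sprite(frame: Frame, x: float, y: float, scale: float = 1.0):
    ox, oy = frame.origin
    # Shift the top-left corner so the origin lands exactly on (x, y).
    # Because the offset is scaled too, the sprite grows around its origin.
    top_left = (x - ox * scale, y - oy * scale)
    size = (frame.width * scale, frame.height * scale)
    queue_draw(frame.source_rect, top_left, size)

# An enemy with its origin at its feet, standing at (100, 200):
goblin = Frame(source_rect=(0, 0, 16, 24), width=16, height=24, origin=(8, 24))
draw_sprite(goblin, 100, 200)               # feet at (100, 200)
draw_sprite(goblin, 100, 200, scale=2.0)    # twice as big, feet still at (100, 200)
```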
Nice!
Yup.
Next time I’ll talk about the input system, stay tuned!
[1] We use .png files since the format is lossless: the image comes out exactly as it went in. While it’s technically possible to use a lossy format like .jpg, this would be counterproductive: the compression would damage the sprites’ image quality, and rendering a sprite would require either decompressing it ahead of time (defeating the point of compression) or slowly decompressing it whenever it needed to be rendered, which would be terrible for performance. The only time this might make sense is for extremely large image files.
[2] There are several existing software tools that can do this; google “texture packing” and pick your favorite!
[3] Not to be confused with the ‘sprite sheets’ you might have seen on websites like The Spriter’s Resource. These are usually reconstructed manually by extracting image data from game files and lining it up in an appealing way. They’re meant to serve as a clean reference for artists rather than an optimized chunk of data.