Emulator Polling vs. Scheduler Game Loop

You've moved from games to emulators, but is the standard game loop efficient enough? Let's compare the efficiency of polling and scheduler-based loops.

Written by Gregory Gaines

July 10, 2024

Algorithms & Data Structures Software Design Patterns Emulation

Introduction

If you're writing an emulator, I'm going to take a wild guess that you've tried to make your own game or game engine at some point, haven't you? If so, you've most likely written a game loop, the heartbeat that drives the game forward. Writing an emulator is no different: it needs a game loop too. I will discuss two game loop paradigms, polling-based and scheduler-based, and when to apply each.

Prerequisite

Some emulator experience
Some game dev experience

Polling-Based Game Loop

Most of us should be familiar with the traditional polling game loop:

Java
public void pollingGameLoop() {
  while (true) {
    processInput();
    update();
    renderFrame();
  }
}

A polling game loop is simple and effective! In an emulator, it's a little different. The update function for an emulator contains complex looping logic, like stepping the CPU and updating system components. That update function is our focus here. Here's what a polling game loop would look like in an emulator update function:

Java
public void pollingEmulatorRunFrame() {
  int totalCycles = 0;

  while (totalCycles < CYCLES_PER_FRAME) {
    // Progress the CPU by one instruction, each instruction
    // averages between 2-4 cycles.
    int executedCycles = cpu.executeInstruction();

    // Update subsystems.
    gpu.update(executedCycles, interrupt, dma);
    apu.update(executedCycles);
    timers.update(executedCycles, interrupt);
    dma.update(executedCycles);
    interrupts.update(executedCycles);

    totalCycles += executedCycles;
  }
}

In this method, to complete the frame, the CPU is stepped one instruction at a time, then all other subsystems are polled with the number of cycles that were just executed to keep the system in sync. Like a game update function, the emulator's "run frame" function is called 60 times a second. Unfortunately, polling game loops can be slow and wasteful.

Polling-Based Game Loops Can Be Slow

As we know, emulators can be computationally expensive. In the polling run-frame method above, we step the CPU by one instruction, then poll every other component. Constant polling can strain the host CPU because (depending on the emulator) a component may have nothing to do, or its next event may be 100 cycles away, yet it is still polled after every CPU step. Think of a polling-based game loop as running a race where, after every step, you have to look down to make sure your shoes are tied. Not very efficient, right?

Java
// Progress the CPU by one instruction, each instruction
// averages between 2-4 cycles.
int executedCycles = cpu.executeInstruction();

// Update subsystems.
gpu.update(executedCycles, interrupt, dma); // Next event in 90 cycles
apu.update(executedCycles); // Next event in 50 cycles
timers.update(executedCycles, interrupt); // Next event in 10 cycles
dma.update(executedCycles); // Next event in 90 cycles
interrupts.update(executedCycles); // Next event in 20 cycles
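
To make that waste concrete from a single component's point of view, here is a minimal sketch of a polled timer. The `PolledTimer` class, its 64-cycle period, and its counters are made up for illustration; a real timer would be configurable and would raise an interrupt when it fires.

```java
// Hypothetical polled timer: it has to count cycles itself to know when to fire.
class PolledTimer {
    // Assumed period, for illustration only.
    static final int CYCLES_PER_TICK = 64;

    private int cycleCounter = 0; // internal bookkeeping the component is forced to own
    int ticks = 0;                // how many times the timer actually fired
    int idlePolls = 0;            // polls that found nothing to do

    // Called after every CPU instruction with the cycles that instruction took.
    void update(int executedCycles) {
        cycleCounter += executedCycles;
        if (cycleCounter < CYCLES_PER_TICK) {
            idlePolls++;
            return;
        }
        // A long instruction can cross more than one period, so loop.
        while (cycleCounter >= CYCLES_PER_TICK) {
            cycleCounter -= CYCLES_PER_TICK;
            ticks++;
        }
    }
}
```

Feeding this timer 32 instructions of 4 cycles each fires it twice, and the other 30 calls return without doing any work.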

When designing polling-based components, I ran into coupling between components, and components holding state they shouldn't care about. For example, each polled component keeps its own internal cycle count so it can update itself properly. Additionally, some component events depend on each other, meaning I have to couple those components together in some fashion to keep them in sync. A component should be self-contained and not care about state or components outside of itself. Don't get me wrong, this is fine for a simple emulator, but it slowly becomes a bottleneck for more complex emulators. A more efficient alternative is a scheduler-based game loop.

Scheduler-Based Game Loop

A scheduler-based game loop depends on a scheduler that keeps a queue of emulator events and tracks how many cycles remain until each event is ready.

Java
// A scheduler to queue events.
interface Scheduler {
  // A scheduled event.
  class ScheduledEvent {
    Event event;
    int cyclesUntilReady;
  }
  // Queue an event in the scheduler.
  public void schedule(Event e, int cyclesUntilEvent);
  // Pops an event from the top of the scheduler.
  public ScheduledEvent popNextEvent();
  // Progress all events in the scheduler.
  public void progress(int cycles);
}
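
One possible way to back an interface like this is a min-heap keyed on an absolute cycle timestamp. The sketch below is an assumption about how it could be done, not the only approach: `HeapScheduler`, the trimmed-down `Event` enum, and the tie-breaking sequence number are all hypothetical names introduced here. Storing absolute timestamps means `progress` only has to bump a counter instead of touching every queued event.

```java
import java.util.PriorityQueue;

// Hypothetical subset of events, for illustration only.
enum Event { VBLANK, HBLANK, TIMER }

// Mirrors the ScheduledEvent shape from the interface above.
class ScheduledEvent {
    final Event event;
    final int cyclesUntilReady;

    ScheduledEvent(Event event, int cyclesUntilReady) {
        this.event = event;
        this.cyclesUntilReady = cyclesUntilReady;
    }
}

class HeapScheduler {
    private static final class Entry implements Comparable<Entry> {
        final Event event;
        final long readyAt;  // absolute cycle at which the event fires
        final long sequence; // insertion order, breaks ties deterministically

        Entry(Event event, long readyAt, long sequence) {
            this.event = event;
            this.readyAt = readyAt;
            this.sequence = sequence;
        }

        @Override
        public int compareTo(Entry other) {
            int byTime = Long.compare(readyAt, other.readyAt);
            return byTime != 0 ? byTime : Long.compare(sequence, other.sequence);
        }
    }

    private final PriorityQueue<Entry> queue = new PriorityQueue<>();
    private long now = 0;          // cycles elapsed so far
    private long nextSequence = 0;

    // Queue an event relative to the current cycle count.
    public void schedule(Event e, int cyclesUntilEvent) {
        queue.add(new Entry(e, now + cyclesUntilEvent, nextSequence++));
    }

    // Remove the soonest event and report how many cycles remain until it fires.
    public ScheduledEvent popNextEvent() {
        Entry next = queue.poll();
        if (next == null) {
            throw new IllegalStateException("no events scheduled");
        }
        // Clamp at zero in case the clock already moved past the event.
        long remaining = Math.max(0L, next.readyAt - now);
        return new ScheduledEvent(next.event, (int) remaining);
    }

    // Advance the clock; queued events implicitly get closer.
    public void progress(int cycles) {
        now += cycles;
    }
}
```

With this layout, scheduling and popping cost O(log n) and progressing the queue costs O(1), at the price of converting between relative and absolute cycles at the edges.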

The first immediate benefit is that components are decoupled from tracking CPU cycles; the scheduler takes care of that. Now, components only have to report how many cycles it takes for their events to fire. Here's the emulator run frame updated to use a scheduler:

Java
public void schedulerEmulatorRunFrame() {
  // Fill the scheduler with initial events.
  scheduler.schedule(Event.VBLANK, Lcd.CYCLES_PER_FRAME);
  scheduler.schedule(Event.HBLANK, Lcd.CYCLES_PER_HDRAW);
  scheduler.schedule(Event.AUDIO_READY, Apu.CYCLES_PER_SOUND_WAVE);

  while (true) {
    // Pop the next event from the scheduler.
    ScheduledEvent nextEvent = scheduler.popNextEvent();

    // Run the CPU until the next event. The CPU can run freely!
    cpu.executeInstructionsForCycles(nextEvent.cyclesUntilReady);

    // Move the queue forward.
    scheduler.progress(nextEvent.cyclesUntilReady);

    // Handle the now ready event. Enum case labels are written unqualified,
    // which Java requires before version 21.
    switch (nextEvent.event) {
      case VBLANK:
        gpu.renderFrame();
        return; // Break out of loop, frame completed!
      case HBLANK:
        gpu.enterHBlank();
        scheduler.schedule(Event.EXIT_HBLANK, Lcd.CYCLES_PER_HDRAW);
        scheduler.schedule(Event.DMA, DMA.HBLANK_CYCLES);
        scheduler.schedule(Event.INTERRUPT, Lcd.HBLANK_INTERRUPT_CYCLES);
        break;
      case EXIT_HBLANK:
        gpu.exitHBlank();
        scheduler.schedule(Event.HBLANK, Lcd.CYCLES_PER_HDRAW);
        scheduler.schedule(Event.DMA, DMA.EXIT_HBLANK_CYCLES);
        break;
      case DMA:
        dma.runDMA();
        break;
      case TIMER:
        // ...
      case INTERRUPT:
        // ...
    }
  }
}

Another improvement is that the CPU can run freely! Instead of stepping the CPU one instruction at a time, we tell the CPU how many cycles to run, handle the event that is now ready, and queue new ones. We no longer waste host CPU time needlessly polling components.
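
What "run the CPU for N cycles" might look like inside the CPU is sketched below. This is a stub under stated assumptions: the `Cpu` class and its fixed 3-cycle instruction cost are hypothetical, and a real core would decode and execute actual instructions. The detail worth noticing is that instructions are indivisible, so the CPU can overshoot the requested budget.

```java
// Hypothetical CPU stub: every instruction costs a fixed number of cycles.
class Cpu {
    static final int CYCLES_PER_INSTRUCTION = 3; // assumed, for illustration

    long totalCycles = 0;

    // Executes one instruction and returns the cycles it took.
    int executeInstruction() {
        totalCycles += CYCLES_PER_INSTRUCTION;
        return CYCLES_PER_INSTRUCTION;
    }

    // Runs whole instructions until at least `cycles` have elapsed.
    // Returns the cycles actually executed, which may exceed the request.
    int executeInstructionsForCycles(int cycles) {
        int executed = 0;
        while (executed < cycles) {
            executed += executeInstruction();
        }
        return executed;
    }
}
```

Asking this stub for 10 cycles executes 12. One way to account for that, if timing accuracy matters, is to progress the scheduler by the returned value rather than by the requested one.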

I wonder if instructing the CPU to run for a certain number of cycles generates better assembly than stepping the CPU one instruction at a time. 🤔

Back to my racing analogy: instead of stopping after every step to check and tie your shoes, you are told to run a set number of meters, and only then do you stop to tie your shoes. Another benefit is further decoupling, since components don't need to know about each other. Component events that depend on each other are scheduled and handled independently. Overall, we now have efficient CPU utilization, decoupled components, and scalable code.
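
For a rough sense of scale, the sketch below counts component calls per frame under each approach. The numbers are assumptions: 280,896 cycles per frame and 1,232 cycles per scanline are the commonly documented GBA timings, while the flat 3-cycle instruction cost, the five polled components, and the single event per scanline are simplifications made up for this example.

```java
// Back-of-the-envelope call counts for one frame; all constants are assumptions.
class LoopCostDemo {
    static final int CYCLES_PER_FRAME = 280_896;
    static final int CYCLES_PER_SCANLINE = 1_232;
    static final int CYCLES_PER_INSTRUCTION = 3;
    static final int POLLED_COMPONENTS = 5;

    // Polling: every component is updated after every instruction.
    static long pollingUpdateCalls() {
        long calls = 0;
        for (int cycles = 0; cycles < CYCLES_PER_FRAME; cycles += CYCLES_PER_INSTRUCTION) {
            calls += POLLED_COMPONENTS;
        }
        return calls;
    }

    // Scheduling: a handler runs only when an event fires, here once per scanline.
    static long scheduledHandlerCalls() {
        long calls = 0;
        for (int cycles = CYCLES_PER_SCANLINE; cycles <= CYCLES_PER_FRAME; cycles += CYCLES_PER_SCANLINE) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("polling calls:   " + pollingUpdateCalls());
        System.out.println("scheduled calls: " + scheduledHandlerCalls());
    }
}
```

Under these assumptions, polling makes 468,160 component calls per frame while the scheduler runs 228 handlers. Call counts are not frame time, though: the CPU still has to execute every instruction either way, so the real-world gain is much smaller than this ratio suggests.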

FPS Comparison

Don't just take my word for it, here are both loop implementations running on my GBA emulator (written in Golang btw):

Polling-Based Gameplay

Scheduler-Based Gameplay

The comparison shows a nice gain of about 2-4 FPS, which might not seem like much since the Game Boy Advance is a relatively old system with modest hardware. I have a hunch that newer systems will benefit more from a scheduler-based loop that uses resources efficiently. I may or may not be working on a Nintendo system emulator that is 3D, has 2 screens, and requires a scheduler.

Summary

Polling-based game loops can waste host CPU time by polling every component after every CPU step. Polling-based components are burdened with tracking and calculating cycles. They can also become coupled to each other when their events depend on each other, so updating one component requires it to know about another.

Scheduler-based game loops use a scheduler that is a queue of events. Instead of stepping the CPU one instruction at a time, we pull an event and then run the CPU for the specified number of cycles the event requires. We can handle the event and queue new events until we complete the frame. This flow improves CPU utilization by tracking how long we need to run the CPU for each event instead of needlessly polling.

Scheduling also decouples components from tracking and calculating CPU cycles. A component's only interaction with the scheduler is providing the number of cycles until an event is ready. Components are also decoupled from each other. If one component's events depend on another component's events, they can be scheduled independently. This keeps components decoupled and self-contained. In my GBA emulator, using a scheduler-based loop over a polling-based loop gave an increase of about 2-4 FPS.

That's my little summary of both update options. I hope you develop some amazing emulators! If you have one, send me a link using my contacts on my About page. Maybe I'll stop slacking and finish my emulator someday...

Consider signing up for my newsletter or supporting me if you enjoyed the article.

Thanks, and take care!