In programming, it’s very common to want to do multiple things “together”. For example, if we want to load a person’s tweets and Reddit posts, we probably want to call the Twitter API and Reddit API together instead of waiting for one to finish before calling the next.

But what does doing it “together” really mean? Is the computer really doing two things at the same time? Are we really doing anything when we’re waiting for a response from an API? What if we have multiple cores? Those are the kind of questions we’ll explore in this post.

Concurrency and Parallelism in the Kitchen

Let’s forget about programming for a moment and think about cooking. Cooking is a lot like programming. We have a list of things we want to do, and we have to do them in a certain order.

For example, if we want to cook a meal consisting of vegetables and meat, we can do the following:

  1. Chop vegetables (10 minutes)
  2. Boil water (10 minutes)
  3. Cook vegetables (10 minutes)
  4. Chop meat (10 minutes)
  5. Marinate meat (30 minutes)
  6. Cook meat (20 minutes)

If we follow these steps in sequence, this takes a total of 90 minutes.

What if we had another person in the kitchen cooking with us? Then we could split the work between us. For example, one of us could handle the vegetables while the other handles the meat:

Person 1:

  1. Chop vegetables (10 minutes)
  2. Boil water (10 minutes)
  3. Cook vegetables (10 minutes)

Person 2:

  1. Chop meat (10 minutes)
  2. Marinate meat (30 minutes)
  3. Cook meat (20 minutes)

Person 1 finishes in 30 minutes, and Person 2 finishes in 60 minutes. This takes a total of 60 minutes. We have saved 30 minutes.

We have basically divided the work into two parts, and each person is responsible for one part. At minimum, it will take 60 minutes to finish the meat. There’s no way to speed it up - the meat has to be chopped and marinated before it can be cooked. Notice that in this setup, both the vegetables and meat are being chopped simultaneously (during the first 10 minutes), even though each task requires the full attention of a person, because there are two different people doing the chopping.

But what if we didn’t have another person? We could still get things done faster:

  1. Chop meat and start marinating meat (10 minutes)
  2. Chop vegetables while boiling water (10 minutes)
  3. Cook vegetables (10 minutes)
  4. Wait for meat to finish marinating (10 minutes)
  5. Cook meat (20 minutes)

This still takes a total of just 60 minutes. We have still saved 30 minutes, even without another person.

So how did we do it?

Notice that we didn’t actually do multiple things simultaneously. For example, we didn’t chop the vegetables and the meat at the same time, since chopping requires our full attention. But we did boil the water while we were chopping the vegetables. We also started marinating the meat while we were cooking the vegetables.

The key here is that some of these tasks only required us to set something up and wait for it to finish, and that we can get other things done while we wait. Boiling the water takes 10 minutes, but it doesn’t require our full attention while it’s boiling. We can just put the water on the stove and then chop vegetables while we wait for the water to boil. Same for marinating the meat. We can just put the meat in the marinade and then let it sit while we get other things done.

So even with just one person, we can get things done faster by doing multiple things at the same time, but we can only do this when some of the tasks don’t require our full attention. It turns out that this is exactly how computers work too.

Now in everyday language, we often use the terms “parallel”, “simultaneous” and “at the same time” interchangeably, but in computer science, we have to be more precise. There are two terms that are often used: “concurrency” and “parallelism”.

Parallelism, or true parallelism, is when multiple things are actively being done simultaneously, like when we had two people in the kitchen and the full attention of two people was available to chop the vegetables and meat at the same time.

Concurrency is a broader term that means multiple things can progress independently of each other, without one thing having to wait for another, but it doesn’t necessarily mean that multiple things can be actively done at the same time. In the kitchen example, both the second and the third situation are examples of concurrency, but only the example with two people is an example of parallelism.

Concurrency and Parallelism in Computers

Kitchen analogies always work well for explaining programming concepts, but they’re not perfect, as we’ll see in a bit. But for now, let’s see how the kitchen analogy applies to computers.

The cooks in the kitchen are like the CPU cores in a computer. The CPU cores are the part of the computer that actually does the work. Modern computers usually have multiple CPU cores, and each core can do one thing at a time. So if we have two CPU cores, we can do two things at the same time. That is, we can do two things in parallel.
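
As a quick aside, most languages let us ask how many cores the machine has. Here is a tiny sketch in Python (the number it reports will of course vary from machine to machine):

    import os

    # Ask how many CPU cores are available. Each one can genuinely do
    # one thing at a time, so this is the upper limit on true parallelism.
    print(os.cpu_count())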

But you knew that already. What you might not know is that the CPU cores are not the only part of the computer that can do work. There are other parts of the computer that can do work too, and they can do work at the same time as the CPU cores. For example, the hard drive can read and write data at the same time as the CPU cores are doing work. The network card can send and receive data at the same time as the CPU cores are doing work. The GPU can render graphics at the same time as the CPU cores are doing work. And so on.

That is key to understanding the other way our kitchen analogy maps beautifully to computers - the nature of the work that can be done at the same time. In the kitchen, some tasks require our full attention, like chopping vegetables, while other tasks don’t, like boiling water. In computers, some tasks require the full attention of the CPU cores, like adding two numbers together, while other tasks don’t, like reading data from the hard drive. For example, if we want to read data from a local file, the CPU cores can ask the hard drive to read the data, and then do other things while the hard drive reads the data. But if we want to do something that requires the CPU cores, like adding two numbers, the CPU cores have to do the work themselves, and can’t do anything else while they’re adding the numbers together.

The vast majority of software out there requires a mix of both types of work, but different situations call for different proportions of each. When most of the work in a program is the type of work that requires the CPU cores, we say that the program is CPU-bound. When most of the work in a program is the type of work that doesn’t require the CPU cores, we say that the program is I/O-bound.
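
To make that distinction a bit more concrete, here is a rough Python sketch (the function names and the file path are made up for illustration). The first function keeps a CPU core busy for its entire runtime; the second spends almost all of its time waiting on the hard drive:

    # CPU-bound: the CPU core is actively computing the whole time.
    def sum_of_squares(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    # I/O-bound: the CPU core mostly waits while the hard drive does the reading.
    def read_whole_file(path):
        with open(path, "rb") as f:
            return f.read()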

So the CPU cores are only concerned with “real” work, like adding numbers, plus the small amount of work needed to set up and coordinate the I/O-bound work. The other parts of the computer, like the hard drive and the network card, then handle the bulk of that I/O-bound work in parallel with the CPU cores.

Threads and I/O and Multitasking

So how does all this translate to programming? What does it mean for writing code for a modern computer running on a modern operating system? How do we make our code do multiple things concurrently?

Well, in a modern operating system, the OS is responsible for coordinating the work of the CPU cores. The mechanism to have the CPU cores jump between different tasks, set up I/O-bound work, etc. is all handled internally by the OS. Really, anything meaningful that a program might want to do, like reading data from a file, sending data over the network, or showing something on the screen, is handled by the OS. If a program wants to do pretty much anything apart from moving numbers around in memory, it has to ask the OS to do it.

Part of the reason for this is to make things simpler for individual programs. The individual programs are better off not having to worry about these low-level, machine-specific details, but part of the intention is also to gatekeep access to these resources. If every program could directly access the hard drive, the network card, or indeed make the CPU core jump between arbitrary tasks, it would be impossible to ensure a stable system, not to mention a security nightmare!

So anyway, instead of giving programs direct control over the CPU cores, the OS defines an abstraction called a thread. Threads are basically the OS’s way of representing concurrent tasks. Each thread represents an operation that should be run concurrently with other threads. In our code, we get to create a thread and attach some logic to the thread (think passing in a function). This way, we tell the OS what code has to be run concurrently with the rest of the code of our program.

It’s like the steps to prepare the meat, which couldn’t be done any faster because the second step depends on the first, and the third on the second. But the vegetables can be prepared at the same time because those steps don’t depend on any of the steps to prepare the meat. The steps to prepare the meat would be one thread, and the steps to prepare the vegetables another thread.
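
In code, that division of work might look something like the following Python sketch. The function bodies are just stand-ins (a sleep instead of actual chopping and cooking); the point is that each thread is handed a function describing the work it should run concurrently with the rest of the program:

    import threading
    import time

    def prepare_meat():
        # Stand-in for: chop meat, marinate meat, cook meat.
        time.sleep(0.5)

    def prepare_vegetables():
        # Stand-in for: chop vegetables, boil water, cook vegetables.
        time.sleep(0.3)

    # Create a thread for each independent chunk of work and attach
    # the logic it should run.
    meat_thread = threading.Thread(target=prepare_meat)
    veg_thread = threading.Thread(target=prepare_vegetables)

    # Tell the OS both threads are ready to run concurrently.
    meat_thread.start()
    veg_thread.start()

    # Wait for both to finish before moving on.
    meat_thread.join()
    veg_thread.join()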

So by creating a thread, we get to tell the OS what work has to be done, but we don’t get to control exactly when the CPU core switches from one thread to another. Only the OS can control that. The OS decides when to switch from one thread to another, which thread to switch to, etc.

But how exactly does the process to switch between threads work?

Before we answer that question, notice that we have been using the word “concurrently” here, not “in parallel”. That’s very intentional. While threads could be executed in parallel, they don’t have to be. Even if we have only one CPU core, we could still achieve concurrency. To drive this point home, let’s only consider a single CPU core when thinking about how the thread switching process works.

So how would a single CPU core execute two threads concurrently? Well, we’ve already seen how this process works in the kitchen. A single cook is able to do multiple things concurrently by doing things that do require their attention while waiting for things that don’t. Similarly, a single CPU core can execute a thread in sequence, and when it encounters a task that doesn’t require its attention, like reading data from a file, it can ask the OS to do it. At that point, control of the CPU core is handed over to the OS, and the OS will set up that I/O task to be done by the hard drive, and then hand over control to another thread that has CPU work to be done.

Suppose the following two threads need to be executed concurrently.
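Here is a rough sketch in Python of what they might look like (the file name is hypothetical; what matters is that the first thread starts with some I/O work while the second does pure CPU work):

    import threading

    def thread_one():
        # I/O work: the CPU core just asks the OS (and the hard drive) to do
        # this, and the thread sleeps until the data is ready.
        with open("data.txt", "rb") as f:  # hypothetical file
            data = f.read()
        # CPU work: the core has to do this part itself.
        print("thread one read", len(data), "bytes")

    def thread_two():
        # Pure CPU work: the core is busy for the entire duration.
        total = 0
        for i in range(10_000_000):
            total += i
        print("thread two computed", total)

    t1 = threading.Thread(target=thread_one)
    t2 = threading.Thread(target=thread_two)
    t1.start()
    t2.start()
    t1.join()
    t2.join()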

The CPU would execute the first thread, and when it encounters the I/O work, it would hand over control of the CPU core to the OS. The OS would then set up the I/O work to be done by another part of the computer, like the hard drive, and then move on to executing the second thread. Once the I/O work is done, the OS would hand over control of the CPU core to the first thread again, and the CPU would continue executing the first thread.

The original thread is said to be “sleeping” while it’s waiting for the I/O to finish. There is nothing to be done while the I/O is happening, so it does not make sense for the OS to return the CPU core to the thread while the I/O is being done. Once the I/O is done, the OS will remember that this thread can now be continued, and once some other thread has finished its CPU work, the OS will hand over control of the CPU core to the original thread again.

Enforcing Concurrency on a Single CPU Core

We’ve seen how a single CPU core can execute multiple threads concurrently. Essentially, we take advantage of the fact that I/O work can happen “in the background” and we use that time to have the CPU core execute other threads. But this is only possible if every thread regularly does I/O work. If every thread only did CPU work, then the CPU core would never be able to execute multiple threads concurrently, because control would never be handed over to the OS.

Because of this issue, the kind of thread-switching we’ve been talking about so far is called cooperative multitasking. The OS can only switch between threads when the currently running threads voluntarily give up control of the CPU core. If a thread never gives up control of the CPU core, the OS can’t switch to another thread. Essentially, each thread has to “cooperate” with the other threads by not hogging the CPU core, otherwise this entire scheme falls apart.

That’s why modern operating systems have another trick up their sleeve. They can forcibly take control of the CPU core from a thread, even if the thread doesn’t explicitly call into the OS to do things like I/O. This is called preemptive multitasking. The OS can preemptively take control of the CPU core from a thread, and then give control to another thread. This way, the OS can ensure that multiple threads are executed concurrently, even if the threads are doing 100% CPU-bound work.

The OS achieves this by setting up a timer that fires every few milliseconds. Think of it like a kitchen timer. When the timer fires, the OS knows that a certain amount of time has passed. Before transferring control of the CPU core to a thread, the OS sets the timer to fire after a certain amount of time. Every time the timer fires, the OS automatically regains control of the CPU core, and is then able to decide what to do next. So even though the thread didn’t voluntarily give up control of the CPU core, the OS can still regain control of the CPU core after a certain amount of time has passed. We’ll explore the details of how this works in a future post, but for now, it’s enough to know that by setting up a timer, the OS can forcibly take control of the CPU core from a thread.

At this point, we’ve probably already stretched the kitchen analogy to its breaking point, but let’s try to stretch it a little further. Imagine that a single cook had to chop vegetables and meat in a restaurant kitchen where there’s a constant flow of orders. Chopping the vegetables and chopping the meat are both busy work that requires the cook’s full attention. But the cook can’t just chop vegetables for an hour and then chop meat for an hour. Both vegetables and meat are needed for every order. What the cook could do is set an alarm for 10 minutes, and then chop vegetables till the alarm goes off. Then the cook could set another alarm for 10 minutes, and then chop meat till the alarm goes off. This way, the cook is able to chop vegetables and meat concurrently, without finishing one before starting the other.

This probably sounds a bit silly, but that’s exactly how an OS ensures that multiple threads are executed concurrently, even on a single core, and even if the threads are doing 100% CPU-bound work.
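
We can even watch this happen from an ordinary program. In the Python sketch below, both threads do nothing but CPU work, yet both make steady progress instead of one having to finish before the other starts. (One caveat: CPython’s global interpreter lock adds its own thread switching on top of the OS scheduler, but the observable effect is the same: neither thread voluntarily gives up the core, and both still get turns.)

    import threading

    def count(name, n):
        # 100% CPU-bound work. The occasional print is only there so we
        # can watch the two threads take turns.
        total = 0
        for i in range(n):
            if i % (n // 4) == 0:
                print(name, "is at", i)
            total += i
        print(name, "finished with total", total)

    a = threading.Thread(target=count, args=("thread A", 20_000_000))
    b = threading.Thread(target=count, args=("thread B", 20_000_000))
    a.start()
    b.start()
    a.join()
    b.join()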

Wrapping It All Up

We’ve covered a lot of non-trivial concepts in this post, so let’s quickly recap what we’ve learned:

  • Concurrency is when multiple things can progress independently of each other, without one thing having to wait for another. It doesn’t necessarily mean that multiple things can be actively done at the same time.

  • Parallelism is when multiple things are actively being done simultaneously. It’s a subset of concurrency.

  • In programming, concurrency is achieved by using threads. Threads are the OS’s way of representing concurrent tasks. Each thread represents an operation that should be run concurrently with other threads.

  • Threads can be executed in parallel, but they don’t have to be. Even if we have only one CPU core, we could still achieve concurrency.

  • A single CPU core can execute multiple threads concurrently by taking advantage of the fact that I/O work can happen “in the background” and using that time to have the CPU core execute other threads.

  • This kind of thread-switching is called cooperative multitasking. This only works when the currently running threads voluntarily give up control of the CPU core by doing I/O work, or generally by calling into the OS.

  • Modern operating systems can also forcibly take control of the CPU core from a thread, even if the thread doesn’t explicitly call into the OS to do things like I/O. This is called preemptive multitasking.

That’s not even scratching the surface of the topic of concurrency, but it’s a good start. In my opinion, it’s one of the most important topics in programming, and among the most misunderstood and challenging problems in software engineering. Indeed, many a language and framework has been created with the express goal of dealing with the complexities of concurrency, and we’ll dive into some of those in future posts. But for now, I hope this post has helped you understand the basics of concurrency at the system level.

See you in the next post!