Python Concurrency: Threading, Multiprocessing and the GIL Explained
Python concurrency confuses many developers because of the Global Interpreter Lock (GIL). Understanding why the GIL exists, what it prevents, and how to work around it is essential for writing performant Python and for clearing technical interviews.
The Global Interpreter Lock (GIL)
The GIL is a mutex in CPython (the standard Python interpreter) that allows only one thread to execute Python bytecode at a time, even on multi-core machines.
Why does it exist? CPython's memory management (reference counting) is not thread-safe. The GIL prevents two threads from simultaneously modifying the same object's reference count, which would cause memory corruption.
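The reference counts the GIL protects are visible from Python itself. A minimal sketch using `sys.getrefcount` (note that the reported count includes one extra temporary reference: the argument passed to `getrefcount` itself):

```python
import sys

x = []  # a fresh object with a known set of references
base = sys.getrefcount(x)  # includes the temporary argument reference

y = x  # binding a second name increments the refcount
assert sys.getrefcount(x) == base + 1

del y  # dropping the name decrements it again
assert sys.getrefcount(x) == base
```

Every one of these increments and decrements is a plain read-modify-write on a C struct field; without the GIL, two threads doing them concurrently could corrupt the count.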
What it means in practice:
```python
import threading

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        counter += 1  # NOT thread-safe! counter += 1 is read-modify-write

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
# Expected: 400_000
# Actual: some number less than 400_000 (race condition)

# The GIL does NOT protect individual operations within Python code.
# It protects the interpreter's internal state, not your application logic.
```
The GIL is released periodically (every 5 ms by default in modern CPython, tunable via `sys.setswitchinterval`) and whenever a thread blocks on an I/O operation, allowing other threads to run. This is why threading works for I/O-bound tasks: threads spend most of their time waiting on I/O, so the GIL is released frequently.
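You can inspect and tune the switch interval directly. A small sketch:

```python
import sys

# The GIL "check interval" has been time-based since Python 3.2:
# a thread holding the GIL is asked to release it every 5 ms by default.
print(sys.getswitchinterval())  # 0.005 by default

# It can be tuned, though the default is rarely worth changing:
sys.setswitchinterval(0.01)
assert abs(sys.getswitchinterval() - 0.01) < 1e-9
sys.setswitchinterval(0.005)  # restore the default
```

A longer interval means fewer forced switches (less overhead, worse latency for waiting threads); a shorter one means the opposite.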
Threading: For I/O-Bound Tasks
Threading is effective when your program spends its time waiting on network responses, file reads, or database queries:
```python
import threading
import time

import requests

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# Sequential: ~4 seconds
def fetch_sequential(urls):
    return [requests.get(url).json() for url in urls]

# Threaded: ~1 second (all requests run concurrently)
def fetch_threaded(urls):
    results = [None] * len(urls)

    def fetch(url, index):
        results[index] = requests.get(url).json()

    threads = [
        threading.Thread(target=fetch, args=(url, i))
        for i, url in enumerate(urls)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

start = time.time()
fetch_threaded(urls)
print(f"Threaded: {time.time() - start:.1f}s")  # ~1.0s
```
Thread Synchronization
```python
import threading
import time
from queue import Queue

# Lock: mutual exclusion
lock = threading.Lock()
counter = 0

def safe_increment():
    global counter
    with lock:  # only one thread at a time
        counter += 1

# RLock: reentrant lock (the same thread can acquire it multiple times)
rlock = threading.RLock()

# Semaphore: limit concurrent access
semaphore = threading.Semaphore(5)  # max 5 concurrent database connections

def query_database():
    with semaphore:
        return db.execute("SELECT ...")

# Event: signal between threads
ready = threading.Event()

def producer():
    time.sleep(2)
    ready.set()  # signal that data is ready

def consumer():
    ready.wait()  # blocks until set()
    print("Data is ready, consuming...")

# Queue: thread-safe communication
task_queue = Queue()

def worker():
    while True:
        task = task_queue.get()
        if task is None:
            break
        process(task)
        task_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for task in tasks:
    task_queue.put(task)
task_queue.join()  # wait for all tasks to complete
for _ in threads:
    task_queue.put(None)  # signal workers to stop
```
Multiprocessing: For CPU-Bound Tasks
Multiprocessing bypasses the GIL by using separate processes, each with its own Python interpreter and memory space:
```python
import math
import multiprocessing
import time

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

numbers = list(range(1_000_000, 1_001_000))

# Sequential: uses 1 CPU core
def count_primes_sequential(numbers):
    return sum(1 for n in numbers if is_prime(n))

# Multiprocessing: uses all CPU cores
def count_primes_parallel(numbers):
    with multiprocessing.Pool() as pool:  # default: one process per CPU core
        results = pool.map(is_prime, numbers)
    return sum(results)

# The __main__ guard is required on platforms that spawn workers
# (Windows, macOS): child processes re-import this module.
if __name__ == "__main__":
    start = time.time()
    count_primes_sequential(numbers)
    print(f"Sequential: {time.time() - start:.2f}s")  # ~2.0s

    start = time.time()
    count_primes_parallel(numbers)
    print(f"Parallel: {time.time() - start:.2f}s")  # ~0.5s on a 4-core machine

    # Pool methods
    with multiprocessing.Pool(processes=4) as pool:
        # map: apply function to each item, returns a list
        results = pool.map(process_item, items)

        # imap: lazy version of map, memory-efficient for large datasets
        for result in pool.imap(process_item, items, chunksize=100):
            save_result(result)

        # starmap: for functions with multiple arguments
        results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])

        # apply_async: non-blocking, returns AsyncResult
        future = pool.apply_async(heavy_computation, args=(data,))
        result = future.get(timeout=30)
```
Sharing State Between Processes
Processes do not share memory, so state must be shared through explicit mechanisms:
```python
import multiprocessing
from multiprocessing import Array, Manager, Queue, Value

# Shared primitive value
counter = Value("i", 0)  # "i" = integer

def increment(counter):
    with counter.get_lock():
        counter.value += 1

# Shared array
shared_array = Array("d", [0.0] * 100)  # "d" = double

# Manager: for complex shared objects (slower, uses proxy objects)
with Manager() as manager:
    shared_dict = manager.dict()
    shared_list = manager.list()
    processes = [
        multiprocessing.Process(target=worker, args=(shared_dict, i))
        for i in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(dict(shared_dict))

# Queue for process communication
def producer(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)  # sentinel

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Processing: {item}")

q = Queue()
p1 = multiprocessing.Process(target=producer, args=(q,))
p2 = multiprocessing.Process(target=consumer, args=(q,))
p1.start(); p2.start()
p1.join(); p2.join()
```
concurrent.futures: Unified High-Level API
concurrent.futures provides a clean interface for both threading and multiprocessing:
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

import requests

urls = ["https://api.example.com/users/" + str(i) for i in range(1, 21)]

# ThreadPoolExecutor for I/O-bound work
def fetch(url):
    response = requests.get(url)
    return response.json()

with ThreadPoolExecutor(max_workers=10) as executor:
    # map: simple, preserves order, blocks until all done
    results = list(executor.map(fetch, urls))

    # submit + as_completed: process results as they arrive
    futures = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            data = future.result()
            print(f"Got data from {url}")
        except Exception as e:
            print(f"Failed {url}: {e}")

# ProcessPoolExecutor for CPU-bound work
def cpu_intensive(n):
    return sum(i * i for i in range(n))

numbers = [10_000_000] * 8

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cpu_intensive, numbers))
print(f"Sum of squares: {results}")
```
Choosing the Right Tool
| Workload | Problem | Solution |
|---|---|---|
| HTTP requests, DB queries | I/O-bound | asyncio or ThreadPoolExecutor |
| File I/O | I/O-bound | asyncio (aiofiles) or ThreadPoolExecutor |
| Image processing, ML | CPU-bound | ProcessPoolExecutor or multiprocessing.Pool |
| Mixed I/O + CPU | Both | asyncio for I/O + ProcessPoolExecutor for CPU |
| Simple parallelism | Either | concurrent.futures (unified API) |
```
GIL Impact:
  Threading + I/O-bound:  GIL released during I/O → effective concurrency ✓
  Threading + CPU-bound:  GIL prevents true parallelism → no speedup ✗
  Multiprocessing:        no GIL (separate processes) → true parallelism ✓
```
Common Interview Questions
Q: Why does the GIL exist in Python and can you remove it?
The GIL protects CPython's non-thread-safe reference counting memory management. Removing it would require making every reference count update atomic, either with fine-grained locks (which would slow down single-threaded code) or with lock-free atomic operations (complex to implement correctly). The GIL was a pragmatic design decision in 1992. Python 3.13 introduced an experimental free-threaded build (compiled with `--disable-gil`, per PEP 703), but it is not yet the default.
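You can check at runtime whether you are on a free-threaded build. A sketch using the 3.13 introspection hooks, with fallbacks so it also runs on older versions (`sys._is_gil_enabled` only exists on 3.13+):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0/None otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() (3.13+) reports the runtime state; on older
# interpreters the GIL is always enabled, hence the fallback.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```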
Q: Does threading improve performance for CPU-bound tasks in Python?
No. Due to the GIL, only one thread runs Python bytecode at a time. For CPU-bound tasks, threading adds overhead (context switching, lock acquisition) without providing parallelism. The result is often slower than single-threaded code. Use multiprocessing or ProcessPoolExecutor for CPU-bound work.
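A quick way to demonstrate this in an interview: split pure-Python CPU work across threads and observe that the answer is the same but the wall-clock time is not better. A minimal sketch (timing omitted; the point is that the threads serialize on the GIL):

```python
import threading

def busy_sum(n):
    """Pure-Python CPU work: no I/O, so the GIL is never released for long."""
    total = 0
    for i in range(n):
        total += i
    return total

N = 500_000

# Sequential baseline: 4 * N iterations on one thread
sequential = sum(busy_sum(N) for _ in range(4))

# Threaded: the same work split across 4 threads. The result is identical,
# but elapsed time is typically no better (often slightly worse) because
# only one thread holds the GIL at any instant.
results = [0] * 4

def worker(i):
    results[i] = busy_sum(N)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sum(results) == sequential
```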
Q: When would you choose asyncio over threading for I/O-bound tasks?
Asyncio can handle thousands of concurrent I/O operations on a single thread with much lower overhead than threading (no per-connection thread stack, no OS-level context switches). It is the better fit for very high concurrency (web servers, WebSocket handlers). Threading is simpler for lower-concurrency cases and works with libraries that are not async-compatible. If you are already using an async framework (FastAPI, aiohttp), use asyncio throughout.
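The single-thread concurrency model is easy to see with a sketch. Here `asyncio.sleep` stands in for a network call (with real I/O you would use an async client such as aiohttp rather than `requests`):

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.2)  # yields control while "waiting on I/O"
    return f"response {i}"

async def main():
    # 50 concurrent "requests" on one thread: no locks, no thread pool
    return await asyncio.gather(*(fake_fetch(i) for i in range(50)))

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(f"{len(results)} responses in {elapsed:.2f}s")  # ~0.2s, not 50 * 0.2s
```

All 50 coroutines overlap their waits on one event loop, so total time is roughly one sleep, not fifty.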
Practice Python on Froquiz
Concurrency is a key topic in senior Python developer interviews. Test your Python skills on Froquiz β covering async, OOP, generators, and more.
Summary
- The GIL allows only one thread to execute Python bytecode at a time; it protects CPython internals, not your application data
- Threading works well for I/O-bound tasks because the GIL is released during I/O waits
- Threading does NOT help CPU-bound tasks β use multiprocessing instead
- `multiprocessing.Pool` spawns separate processes for true parallelism that bypasses the GIL
- `concurrent.futures` provides a clean unified API: `ThreadPoolExecutor` for I/O, `ProcessPoolExecutor` for CPU
- Use `asyncio` for very high-concurrency I/O (thousands of simultaneous connections)
- Processes do not share memory: use `Queue`, `Value`, `Array`, or `Manager` for communication