Python Concurrency: Threading, Multiprocessing and the GIL Explained
Python concurrency confuses many developers because of the Global Interpreter Lock (GIL). Understanding why the GIL exists, what it prevents, and how to work around it is essential for writing performant Python and for clearing technical interviews.
The Global Interpreter Lock (GIL)
The GIL is a mutex in CPython (the standard Python interpreter) that allows only one thread to execute Python bytecode at a time, even on multi-core machines.
Why does it exist? CPython's memory management (reference counting) is not thread-safe. The GIL prevents two threads from simultaneously modifying the same object's reference count, which would cause memory corruption.
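The reference counts the GIL protects are visible from Python itself. A minimal sketch using `sys.getrefcount` (note that the reported count includes one extra temporary reference: the argument passed to `getrefcount` itself):

```python
import sys

x = []  # a fresh object with a known set of references
base = sys.getrefcount(x)  # includes the temporary argument reference

y = x  # binding a second name increments the refcount
assert sys.getrefcount(x) == base + 1

del y  # dropping the name decrements it again
assert sys.getrefcount(x) == base
```

Every one of these increments and decrements is a plain read-modify-write on a C struct field; without the GIL, two threads doing them concurrently could corrupt the count.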
What it means in practice:
```python
import threading

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        counter += 1  # NOT thread-safe! counter += 1 is read-modify-write

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
# Expected: 400_000
# Actual: some number less than 400_000 (race condition)

# The GIL does NOT protect individual operations within Python code.
# It protects the interpreter's internal state, not your application logic.
```
The GIL is released periodically (every 5 ms by default in modern CPython, tunable via `sys.setswitchinterval`) and whenever a thread blocks on an I/O operation, allowing other threads to run. This is why threading works for I/O-bound tasks: threads spend most of their time waiting on I/O, so the GIL is released frequently.
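You can inspect and tune the switch interval directly. A small sketch:

```python
import sys

# The GIL "check interval" has been time-based since Python 3.2:
# a thread holding the GIL is asked to release it every 5 ms by default.
print(sys.getswitchinterval())  # 0.005 by default

# It can be tuned, though the default is rarely worth changing:
sys.setswitchinterval(0.01)
assert abs(sys.getswitchinterval() - 0.01) < 1e-9
sys.setswitchinterval(0.005)  # restore the default
```

A longer interval means fewer forced switches (less overhead, worse latency for waiting threads); a shorter one means the opposite.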
Threading: For I/O-Bound Tasks
Threading is effective when your program spends its time waiting on network responses, file reads, or database queries:
```python
import threading
import time

import requests

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# Sequential: ~4 seconds
def fetch_sequential(urls):
    return [requests.get(url).json() for url in urls]

# Threaded: ~1 second (all requests run concurrently)
def fetch_threaded(urls):
    results = [None] * len(urls)

    def fetch(url, index):
        results[index] = requests.get(url).json()

    threads = [
        threading.Thread(target=fetch, args=(url, i))
        for i, url in enumerate(urls)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

start = time.time()
fetch_threaded(urls)
print(f"Threaded: {time.time() - start:.1f}s")  # ~1.0s
```
Thread Synchronization
```python
import threading
import time
from queue import Queue

# Lock: mutual exclusion
lock = threading.Lock()
counter = 0

def safe_increment():
    global counter
    with lock:  # only one thread at a time
        counter += 1

# RLock: reentrant lock (the same thread can acquire it multiple times)
rlock = threading.RLock()

# Semaphore: limit concurrent access
semaphore = threading.Semaphore(5)  # max 5 concurrent database connections

def query_database():
    with semaphore:
        return db.execute("SELECT ...")

# Event: signal between threads
ready = threading.Event()

def producer():
    time.sleep(2)
    ready.set()  # signal that data is ready

def consumer():
    ready.wait()  # blocks until set()
    print("Data is ready, consuming...")

# Queue: thread-safe communication
task_queue = Queue()

def worker():
    while True:
        task = task_queue.get()
        if task is None:
            break
        process(task)
        task_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for task in tasks:
    task_queue.put(task)
task_queue.join()  # wait for all tasks to complete
for _ in threads:
    task_queue.put(None)  # signal workers to stop
```
Multiprocessing: For CPU-Bound Tasks
Multiprocessing bypasses the GIL by using separate processes, each with its own Python interpreter and memory space:
```python
import math
import multiprocessing
import time

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

numbers = list(range(1_000_000, 1_001_000))

# Sequential: uses 1 CPU core
def count_primes_sequential(numbers):
    return sum(1 for n in numbers if is_prime(n))

# Multiprocessing: uses all CPU cores
def count_primes_parallel(numbers):
    with multiprocessing.Pool() as pool:  # default: one process per CPU core
        results = pool.map(is_prime, numbers)
    return sum(results)

# The __main__ guard is required on platforms that spawn workers
# (Windows, macOS): child processes re-import this module.
if __name__ == "__main__":
    start = time.time()
    count_primes_sequential(numbers)
    print(f"Sequential: {time.time() - start:.2f}s")  # ~2.0s

    start = time.time()
    count_primes_parallel(numbers)
    print(f"Parallel: {time.time() - start:.2f}s")  # ~0.5s on a 4-core machine

    # Pool methods
    with multiprocessing.Pool(processes=4) as pool:
        # map: apply function to each item, returns a list
        results = pool.map(process_item, items)

        # imap: lazy version of map, memory-efficient for large datasets
        for result in pool.imap(process_item, items, chunksize=100):
            save_result(result)

        # starmap: for functions with multiple arguments
        results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])

        # apply_async: non-blocking, returns AsyncResult
        future = pool.apply_async(heavy_computation, args=(data,))
        result = future.get(timeout=30)
```
Sharing State Between Processes
Processes do not share memory, so state must be shared through explicit mechanisms:
```python
import multiprocessing
from multiprocessing import Array, Manager, Queue, Value

# Shared primitive value
counter = Value("i", 0)  # "i" = integer

def increment(counter):
    with counter.get_lock():
        counter.value += 1

# Shared array
shared_array = Array("d", [0.0] * 100)  # "d" = double

# Manager: for complex shared objects (slower, uses proxy objects)
with Manager() as manager:
    shared_dict = manager.dict()
    shared_list = manager.list()
    processes = [
        multiprocessing.Process(target=worker, args=(shared_dict, i))
        for i in range(4)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(dict(shared_dict))

# Queue for process communication
def producer(queue):
    for i in range(10):
        queue.put(i)
    queue.put(None)  # sentinel

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Processing: {item}")

q = Queue()
p1 = multiprocessing.Process(target=producer, args=(q,))
p2 = multiprocessing.Process(target=consumer, args=(q,))
p1.start(); p2.start()
p1.join(); p2.join()
```
concurrent.futures: Unified High-Level API
concurrent.futures provides a clean interface for both threading and multiprocessing:
```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

import requests

urls = ["https://api.example.com/users/" + str(i) for i in range(1, 21)]

# ThreadPoolExecutor for I/O-bound work
def fetch(url):
    response = requests.get(url)
    return response.json()

with ThreadPoolExecutor(max_workers=10) as executor:
    # map: simple, preserves order, blocks until all done
    results = list(executor.map(fetch, urls))

    # submit + as_completed: process results as they arrive
    futures = {executor.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            data = future.result()
            print(f"Got data from {url}")
        except Exception as e:
            print(f"Failed {url}: {e}")

# ProcessPoolExecutor for CPU-bound work
def cpu_intensive(n):
    return sum(i * i for i in range(n))

numbers = [10_000_000] * 8

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cpu_intensive, numbers))
print(f"Sum of squares: {results}")
```
Choosing the Right Tool
| Workload | Problem | Solution |
|---|---|---|
| HTTP requests, DB queries | I/O-bound | asyncio or ThreadPoolExecutor |
| File I/O | I/O-bound | asyncio (aiofiles) or ThreadPoolExecutor |
| Image processing, ML | CPU-bound | ProcessPoolExecutor or multiprocessing.Pool |
| Mixed I/O + CPU | Both | asyncio for I/O + ProcessPoolExecutor for CPU |
| Simple parallelism | Either | concurrent.futures (unified API) |
```
GIL Impact:
  Threading + I/O-bound:  GIL released during I/O → effective concurrency ✓
  Threading + CPU-bound:  GIL prevents true parallelism → no speedup ✗
  Multiprocessing:        no GIL (separate processes) → true parallelism ✓
```
Common Interview Questions
Q: Why does the GIL exist in Python and can you remove it?
The GIL protects CPython's non-thread-safe reference counting memory management. Removing it would require making every reference count update atomic, either with fine-grained locks (which would slow down single-threaded code) or with lock-free atomic operations (complex to implement correctly). The GIL was a pragmatic design decision in 1992. Python 3.13 introduced an experimental free-threaded build (compiled with `--disable-gil`, per PEP 703), but it is not yet the default.
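You can check at runtime whether you are on a free-threaded build. A sketch using the 3.13 introspection hooks, with fallbacks so it also runs on older versions (`sys._is_gil_enabled` only exists on 3.13+):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0/None otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() (3.13+) reports the runtime state; on older
# interpreters the GIL is always enabled, hence the fallback.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```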
Q: Does threading improve performance for CPU-bound tasks in Python?
No. Due to the GIL, only one thread runs Python bytecode at a time. For CPU-bound tasks, threading adds overhead (context switching, lock acquisition) without providing parallelism. The result is often slower than single-threaded code. Use multiprocessing or ProcessPoolExecutor for CPU-bound work.
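A quick way to demonstrate this in an interview: split pure-Python CPU work across threads and observe that the answer is the same but the wall-clock time is not better. A minimal sketch (timing omitted; the point is that the threads serialize on the GIL):

```python
import threading

def busy_sum(n):
    """Pure-Python CPU work: no I/O, so the GIL is never released for long."""
    total = 0
    for i in range(n):
        total += i
    return total

N = 500_000

# Sequential baseline: 4 * N iterations on one thread
sequential = sum(busy_sum(N) for _ in range(4))

# Threaded: the same work split across 4 threads. The result is identical,
# but elapsed time is typically no better (often slightly worse) because
# only one thread holds the GIL at any instant.
results = [0] * 4

def worker(i):
    results[i] = busy_sum(N)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sum(results) == sequential
```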
Q: When would you choose asyncio over threading for I/O-bound tasks?
Asyncio can handle thousands of concurrent I/O operations on a single thread with much lower overhead than threading (no per-connection thread stack, no OS-level context switches). It is the better fit for very high concurrency (web servers, WebSocket handlers). Threading is simpler for lower-concurrency cases and works with libraries that are not async-compatible. If you are already using an async framework (FastAPI, aiohttp), use asyncio throughout.
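The single-thread concurrency model is easy to see with a sketch. Here `asyncio.sleep` stands in for a network call (with real I/O you would use an async client such as aiohttp rather than `requests`):

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.2)  # yields control while "waiting on I/O"
    return f"response {i}"

async def main():
    # 50 concurrent "requests" on one thread: no locks, no thread pool
    return await asyncio.gather(*(fake_fetch(i) for i in range(50)))

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(f"{len(results)} responses in {elapsed:.2f}s")  # ~0.2s, not 50 * 0.2s
```

All 50 coroutines overlap their waits on one event loop, so total time is roughly one sleep, not fifty.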
Practice Python on Froquiz
Concurrency is a key topic in senior Python developer interviews. Test your Python skills on Froquiz β covering async, OOP, generators, and more.
Summary
- The GIL allows only one thread to execute Python bytecode at a time; it protects CPython internals, not your application data
- Threading works well for I/O-bound tasks because the GIL is released during I/O waits
- Threading does NOT help CPU-bound tasks β use multiprocessing instead
- `multiprocessing.Pool` spawns separate processes for true parallelism that bypasses the GIL
- `concurrent.futures` provides a clean unified API: `ThreadPoolExecutor` for I/O, `ProcessPoolExecutor` for CPU
- Use `asyncio` for very high-concurrency I/O (thousands of simultaneous connections)
- Processes do not share memory: use `Queue`, `Value`, `Array`, or `Manager` for communication