Where do deadlocks come from?

Deadlocks are one of the most annoying problems that can occur in multi-threaded programming with parts of a program simply stopping work and waiting indefinitely.

At their simplest, deadlocks are caused when one thread of execution, "Thread A", obtains and holds a lock which another thread of execution, "Thread B", requires whilst itself being blocked from obtaining a lock that it requires because "Thread B" already holds it..
Locks are used in multi-threaded programs to protect one area of code from access by multiple threads at the same time. Locks act as a fence around a piece of code with a gate to enter and a gate to leave. Only one thread can be inside the protected area of code at a time. There are many reasons to place locks in code but generally they are used because you need update some state that can be shared between multiple threads.

It's common to talk about code being "thread safe". This generally means that multiple threads can access the code at the same time and the code still operates correctly. Often, but not always, this involves locks. For example, a simple stack class allows you to push items onto it and pop items off of it. If the stack was not "thread safe" then if multiple threads were accessing the stack at the same time you could end up with corrupted data; two threads may think they have both pushed a new item onto the stack but because the code is not "thread safe" one of the items may get lost or the stack itself may become corrupted. Often such a data structure may be made "thread safe" by adding locks to it so that inside the push method a lock is taken out to protect the stack's internal data structure. When two threads try and push items onto the stack at the same time one will succeed in obtaining the lock and one will wait until the other thread has finished and released the lock. This prevents the data structure from becoming corrupted. Assuming all methods use the lock to protect the data structure in a similar way the stack may be declared "thread safe". One thing to realise is that the stack now has a lock inside of it.
Thread Safe Stack.png Object oriented encapsulation means that it's common for thread safe objects to contain locks which the user of the object need not care about. The object presents a thread safe interface and it's up to the object how it achieves that. This is a good thing from a complexity point of view as the object is easier to use, but it could potentially be a bad thing from a deadlock perspective as the programmer may now be using locks without realising it.

As stated above, the simplest cause of deadlocks is when two locks are obtained in one order from one thread and in another order from another thread. The first thread successfully obtains lock 1 and waits to obtain lock 2. The second thread has successfully obtained lock 2 and waits to obtain lock 1. Neither thread can continue without the other thread releasing a lock.

The lock inside a simple stack object is unlikely to cause deadlocks as its only held for the duration of the calls to push and pop, etc. However, if the stack were to provide a means of iterating its contents and calling user supplied code for each item then it may be more prone to deadlocks. If we assume that the iteration needs to hold the stack's lock and the code that the programmer's code that the iteration calls also obtains a lock on another data structure then we have two locks being held for part of the iteration loop; the stacks lock and then the programmers lock. If the programmer also holds his lock whilst pushing or popping items from the stack then we have a lock inversion. The programmer is locking his lock and then the stack's lock, the stack's iteration is locking the stacks lock and then the programmer's lock. Given the right thread scheduling situation two threads in this code could deadlock.
Thread Safe Stack Deadlock.png
The original programmer might realise that this is possible and carefully work around the issue, always ensuring that he holds his own lock before ever calling the stack's iteration code, perhaps, but during maintenance it's unlikely that this rigour will be maintained. Once a lock inversion has been added to a code base a deadlock is possible. The likelihood of a deadlock occurring depends on how often each piece of code is executed, on how many threads and what hardware the program is running on. It's common for lock inversions to lurk in code and never show up in testing because the conditions aren't quite right. This can be especially bad if the potential deadlock only executes rarely as it may lurk in the code for a long period of time and never occur.

Often the first you know of a deadlock is when a customer complains that your program has "hung". Finding and fixing the problem is often made more complex by the fact that the lock inversion responsible may have been present in the code for a long time. It may not be part of recent changes, but recent changes may have subtly changed the thread scheduling within the program and now the deadlock bites where before it was dormant.

I had problems like this in code that I'd developed for a client and I wasted hours debugging, chasing through recent changes because it seemed "obvious" that only the latest changes should be responsible for the new bug. In the end I developed a Lock Inversion Analyser which could look at the program and pin point lock inversions for me, providing the exact call stacks that could cause a deadlock. This tool has saved me hours of debugging time. I then realised that a faster, more streamlined tool could be run as part of my continuous integration build and so I developed Lock Inversion Detector. With my Lock Inversion Detector running as part of my CI build I get to find out as soon as I put lock inversions into my code. I can then run the Lock Inversion Analyser to pinpoint the problems and fix the bugs before they reach my clients.

Leave a comment

About this Entry

Logo competion underway at 99designs.com was the previous entry in this blog.

Finding lock inversions before they become deadlocks is the next entry in this blog.

This is where we write about the development of Lock Explorer, a suite of tools for locating lock inversions, finding deadlocks before they happen and investigating lock usage, contention and performance in multi-threaded code.

Find recent content on the main index or look in the archives to find all content.