
Understanding Java Locks: synchronized, CAS, AQS, and the Concurrency Tools Built on Top

synchronized

The synchronized keyword exists to solve one basic problem in multithreaded code: coordinating access to shared resources. When a method or block is marked with synchronized, Java guarantees that only one thread can execute that protected section at a time.

In early Java versions, synchronized had a reputation for being a heavyweight lock with poor performance.

Why was it considered expensive?

Because monitor locks relied on the operating system’s Mutex Lock. Java threads map to native OS threads, so suspending or waking a thread required help from the operating system. That process involves switching between user mode and kernel mode, and those context switches are relatively costly.

Things improved significantly after Java 6. The JVM introduced a series of optimizations for synchronized, including spin locks, adaptive spinning, lock elimination, lock coarsening, biased locking, and lightweight locking. These changes reduced the overhead of locking enough that synchronized is now used extensively in both JDK source code and many open-source frameworks.

How synchronized is used

There are three common forms.

1. Synchronizing an instance method

When synchronized is applied to an instance method, the lock belongs to the current object instance. A thread must acquire that instance’s monitor before entering the method.

2. Synchronizing a static method

When applied to a static method, the lock belongs to the Class object for that class.

Static members are not tied to any single instance. No matter how many objects are created, the static member exists only once at the class level. That means a thread calling a synchronized instance method on an object and another thread calling a static synchronized method of the same class do not block each other, because they use different locks:

  • instance method: the current object’s lock
  • static method: the class lock

3. Synchronizing a code block

A synchronized block lets you choose the lock explicitly.

  • synchronized(this|object) means the thread must acquire the lock of the specified object before entering the block.
  • synchronized(ClassName.class) means the thread must acquire the lock of the class object.
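The three forms can be put side by side in one class. This is a minimal sketch: the class LockScopes and its method names are hypothetical, and Thread.holdsLock(obj), a standard JDK method, is used only to make visible which monitor each form acquires.

```java
final class LockScopes {
    private final Object lock = new Object();

    // Form 1: synchronized instance method -> monitor of `this`, not of the class
    synchronized boolean instanceMethod() {
        return Thread.holdsLock(this) && !Thread.holdsLock(LockScopes.class);
    }

    // Form 2: synchronized static method -> monitor of LockScopes.class
    static synchronized boolean staticMethod() {
        return Thread.holdsLock(LockScopes.class);
    }

    // Form 3a: block on an explicit object -> only that object's monitor
    boolean blockOnObject() {
        synchronized (lock) {
            return Thread.holdsLock(lock) && !Thread.holdsLock(this);
        }
    }

    // Form 3b: block on the class literal -> the same monitor as form 2
    boolean blockOnClass() {
        synchronized (LockScopes.class) {
            return Thread.holdsLock(LockScopes.class);
        }
    }

    public static void main(String[] args) {
        LockScopes s = new LockScopes();
        System.out.println(s.instanceMethod()); // true
        System.out.println(staticMethod());     // true
        System.out.println(s.blockOnObject());  // true
        System.out.println(s.blockOnClass());   // true
    }
}
```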

A few practical takeaways

  • synchronized on a static method and synchronized(SomeClass.class) both lock the class.
  • synchronized on an instance method locks the object instance.
  • Avoid synchronizing on a String (for example synchronized(someString)) whenever possible. String literals are interned in the JVM’s string constant pool, so unrelated code that happens to lock on the same literal ends up sharing one lock object by accident.

Double-checked locking and why volatile matters

A classic interview question is the singleton implemented with double-checked locking.

For that pattern, declaring uniqueInstance as volatile is essential.

The statement:

uniqueInstance = new Singleton();

is not a single indivisible action. It can be broken into three steps:

  1. Allocate memory for the Singleton object
  2. Initialize the object (run the constructor)
  3. Point uniqueInstance to the allocated memory

Because the compiler (including the JIT) and the CPU may reorder instructions, the actual execution order can become 1 -> 3 -> 2.

In a single-threaded environment that reordering is harmless, but under concurrency it can break the singleton. Imagine thread T1 performs steps 1 and 3 but has not completed step 2 yet. Thread T2 calls getUniqueInstance(), sees that uniqueInstance is not null, and returns it. What T2 gets is a reference to an object that has not been fully initialized.

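Declaring uniqueInstance as volatile prevents this reordering under the Java memory model (since Java 5): a thread that reads a non-null uniqueInstance is guaranteed to see a fully constructed object, so the pattern works correctly in a multithreaded context.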
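A minimal sketch of the double-checked locking singleton described above, using the names from the text (Singleton, uniqueInstance, getUniqueInstance()). The exact shape is an assumption, not a reference implementation.

```java
final class Singleton {
    // volatile: no thread can see a non-null reference before the constructor finishes
    private static volatile Singleton uniqueInstance;

    private Singleton() { }

    static Singleton getUniqueInstance() {
        if (uniqueInstance == null) {                 // first check: no lock on the fast path
            synchronized (Singleton.class) {
                if (uniqueInstance == null) {         // second check: another thread may have won
                    uniqueInstance = new Singleton(); // allocate, initialize, publish
                }
            }
        }
        return uniqueInstance;
    }
}
```

Once the instance exists, callers skip the lock entirely; only the first few concurrent callers ever contend for the class lock.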

What happens underneath synchronized

The implementation of synchronized lives at the JVM level.

Synchronized code blocks

If you inspect bytecode with javap, you will find that a synchronized block is implemented with the monitorenter and monitorexit instructions.

Bytecode view of synchronized block

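  • monitorenter marks the beginning of the synchronized section
  • monitorexit marks the end. javac emits two of them: one on the normal exit path and one in an exception handler, so the monitor is released even if the block throws.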

When executing monitorenter, the thread attempts to acquire the object’s monitor.

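In HotSpot, that monitor is implemented in C++ as ObjectMonitor. Every Java object can be associated with a monitor; HotSpot creates the ObjectMonitor lazily, when the lock is inflated (for example under contention or when wait() is called).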

This is also why wait() and notify() depend on monitors. The calling thread must own the object’s monitor, which in practice means calling them inside a synchronized method or block that locks that same object; otherwise Java throws java.lang.IllegalMonitorStateException.

When a thread executes monitorenter, it tries to acquire the lock. If the monitor’s counter is 0, the lock is free: the thread acquires it and the counter becomes 1. If the thread already owns the monitor, the counter is simply incremented, which is what makes synchronized reentrant. Each monitorexit decrements the counter, and the lock is released when it returns to 0.

If the lock acquisition fails, the current thread blocks until another thread releases the lock.
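Two consequences of the monitor’s per-owner count can be checked directly: a thread may re-enter a monitor it already holds, and wait() fails when the caller does not own the monitor. The class MonitorRules below is a hypothetical illustration; the counter values in the comments describe the conceptual model, not HotSpot’s internal bookkeeping.

```java
final class MonitorRules {
    private final Object lock = new Object();

    // Nested blocks on the same object: conceptually the count goes 1 -> 2 -> 1 -> 0.
    // If monitors were not reentrant, the inner block would block forever.
    boolean reenter() {
        synchronized (lock) {
            synchronized (lock) {
                return Thread.holdsLock(lock);
            }
        }
    }

    // After both blocks exit, the count is back to 0 and the lock is free.
    boolean releasedAfterExit() {
        reenter();
        return !Thread.holdsLock(lock);
    }

    // wait() requires owning the monitor; calling it outside synchronized fails fast.
    boolean waitWithoutLockFails() {
        try {
            lock.wait(10);
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        MonitorRules r = new MonitorRules();
        System.out.println(r.reenter());              // true
        System.out.println(r.releasedAfterExit());    // true
        System.out.println(r.waitWithoutLockFails()); // true
    }
}
```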

Synchronized methods

For synchronized methods, the bytecode looks different.

Bytecode view of synchronized method

There is no explicit monitorenter or monitorexit instruction in the method body. Instead, the method is marked with the ACC_SYNCHRONIZED flag. The JVM checks that access flag to determine that the method must be invoked with synchronization semantics.
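The ACC_SYNCHRONIZED flag is also visible through reflection: for methods, Modifier.isSynchronized(method.getModifiers()) reports it, while a method that only contains a synchronized block does not carry it. A small sketch with a hypothetical class SyncFlag:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class SyncFlag {
    // Carries ACC_SYNCHRONIZED in its access flags; no monitorenter in the body
    synchronized void syncMethod() { }

    // No method-level flag; the body contains monitorenter/monitorexit instead
    void syncBlock() {
        synchronized (this) { }
    }

    static boolean hasSyncFlag(String methodName) throws NoSuchMethodException {
        Method m = SyncFlag.class.getDeclaredMethod(methodName);
        return Modifier.isSynchronized(m.getModifiers());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hasSyncFlag("syncMethod")); // true
        System.out.println(hasSyncFlag("syncBlock"));  // false
    }
}
```

Running javap -v on the compiled class shows the same difference at the bytecode level.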

Pessimistic locking vs optimistic locking

These two ideas show up everywhere, not only in Java.

Pessimistic locking

Pessimistic locking assumes the worst: someone else is likely to modify the data. So every access that might conflict is protected by a lock in advance.

In databases, common examples are row locks, table locks, read locks, and write locks. In Java, exclusive locks such as synchronized and ReentrantLock reflect the same pessimistic approach.

Optimistic locking

Optimistic locking assumes conflicts are rare. It does not lock first. Instead, it checks during update whether somebody else changed the data in the meantime.

Typical implementations use:

  • a version number
  • CAS (Compare And Swap)

This approach is usually better for read-heavy workloads because it avoids the cost of blocking and waking threads.

Which one fits which scenario?

Neither is universally better.

  • Optimistic locking works well when writes are rare and conflicts are uncommon.
  • Pessimistic locking is often better when writes are frequent and contention is high.

If many threads keep colliding, optimistic retries can become a performance problem. In that kind of write-heavy workload, a pessimistic lock is often the more sensible choice.

Optimistic locking in practice

Version number mechanism

A common design is to add a version field to the table row. Every update increments it.

Suppose an account record has:

  • version = 1
  • balance = $100

Then the sequence might look like this:

  1. Operator A reads the row with version = 1 and plans to deduct $50.
  2. Before A finishes, operator B also reads the row with version = 1 and plans to deduct $20.
  3. A submits the update: balance = $50, version = 1. Since the submitted version matches the current database version, the update succeeds and the row’s version becomes 2.
  4. B then submits balance = $80, version = 1. But the database row is now at version 2, so the update is rejected.

This prevents B from overwriting A’s result with stale data.
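The account scenario can be simulated in memory. This sketch assumes a hypothetical VersionedAccount class; in a database the same check is usually written as a conditional UPDATE, shown here only as a comment with made-up table and column names.

```java
final class VersionedAccount {
    // Immutable view of one read: what an operator "sees" before deciding
    static final class Snapshot {
        final int version;
        final int balance;
        Snapshot(int version, int balance) { this.version = version; this.balance = balance; }
    }

    private int version = 1;
    private int balance = 100;

    synchronized Snapshot read() {
        return new Snapshot(version, balance);
    }

    // In-memory stand-in for something like:
    //   UPDATE account SET balance = ?, version = version + 1 WHERE id = ? AND version = ?
    // synchronized makes check-then-write atomic, as the database does for one row update.
    synchronized boolean update(int expectedVersion, int newBalance) {
        if (version != expectedVersion) {
            return false;          // someone else committed first: reject the stale write
        }
        balance = newBalance;
        version++;
        return true;
    }

    synchronized int balance() { return balance; }
    synchronized int version() { return version; }

    public static void main(String[] args) {
        VersionedAccount account = new VersionedAccount();
        Snapshot a = account.read();                                   // A sees version 1, $100
        Snapshot b = account.read();                                   // B sees version 1, $100
        System.out.println(account.update(a.version, a.balance - 50)); // true: $50, version 2
        System.out.println(account.update(b.version, b.balance - 20)); // false: B must re-read
        System.out.println(account.balance() + " " + account.version()); // 50 2
    }
}
```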

CAS algorithm

CAS means compare and swap. It is a well-known lock-free technique.

Lock-free programming tries to coordinate threads without using traditional blocking locks. No thread has to be suspended for others to proceed, so it is also called non-blocking synchronization.

CAS works with three operands:

  • V: the memory location to read and write
  • A: the expected old value
  • B: the new value to write

The update succeeds only if the current value at V equals A. If so, V is atomically set to B. Otherwise nothing is written and the CAS reports failure.

In practice, CAS usually appears inside a spin loop: keep retrying until the update succeeds.
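A sketch of that spin loop using AtomicInteger.compareAndSet. The class CasCounter is hypothetical, and in real code AtomicInteger.incrementAndGet() already provides an atomic increment; the explicit loop is there only to show V, A, and B.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0); // V: the shared memory location

    int increment() {
        while (true) {
            int expected = value.get();                // A: the value we believe is current
            int next = expected + 1;                   // B: the value we want to write
            if (value.compareAndSet(expected, next)) {
                return next;                           // V still held A, so it now holds B
            }
            // Another thread changed V first: nothing was written, so retry (spin)
        }
    }

    int get() { return value.get(); }

    static int run(int threads, int perThread) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 10_000)); // 40000: no increment is lost
    }
}
```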

Limitations of CAS

ABA problem

Suppose a thread reads a value A, and later checks again and still sees A. Can it conclude that nothing changed? Not necessarily.

The value may have changed from A to B and then back to A. CAS would see A and incorrectly assume the variable was untouched. This is the classic ABA problem.

Since JDK 1.5, AtomicStampedReference can address this by pairing the reference with an integer stamp that acts as a version. Its compareAndSet succeeds only if both the current reference and the current stamp equal the expected ones.
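The difference is easy to reproduce in a single thread by simulating the interfering thread inline. This is an illustrative sketch (the class AbaDemo is hypothetical). It uses String literals as references because both atomic classes compare references with ==, and identical literals are the same interned object.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

final class AbaDemo {
    // Plain CAS: the A -> B -> A history is invisible, so the stale CAS succeeds
    static boolean plainCasSucceeds() {
        String a = "A", b = "B";
        AtomicReference<String> ref = new AtomicReference<>(a);
        String seen = ref.get();               // thread 1 reads A
        ref.compareAndSet(a, b);               // "thread 2": A -> B
        ref.compareAndSet(b, a);               // "thread 2": B -> A
        return ref.compareAndSet(seen, "C");   // thread 1: still sees A, so it succeeds
    }

    // Stamped CAS: the stamp moved 0 -> 1 -> 2, so the stale CAS fails
    static boolean stampedCasSucceeds() {
        String a = "A", b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);
        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder);    // read reference and stamp together
        int seenStamp = stampHolder[0];        // 0
        ref.compareAndSet(a, b, 0, 1);         // "thread 2": A -> B, stamp 1
        ref.compareAndSet(b, a, 1, 2);         // "thread 2": B -> A, stamp 2
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1); // stamp 2 != 0
    }

    public static void main(String[] args) {
        System.out.println(plainCasSucceeds());   // true
        System.out.println(stampedCasSucceeds()); // false
    }
}
```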

High CPU cost under repeated failure

Spin-based CAS keeps retrying until it wins. If success takes a long time, CPU usage can become very high.

If the JVM can take advantage of the processor’s pause instruction (available on x86), spinning becomes cheaper. The instruction delays the pipeline so a spinning core consumes fewer execution resources, and it helps avoid the pipeline flush caused by a memory-order violation when the loop finally exits.
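Since Java 9, Thread.onSpinWait() gives code a portable way to issue that hint; on x86, HotSpot may implement it with PAUSE. The spin lock below is a hypothetical minimal sketch (non-reentrant, no fairness), not a replacement for a real lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Spin with CAS until this thread flips false -> true
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // Java 9+ spin hint; may become PAUSE on x86
        }
    }

    void unlock() {
        locked.set(false); // volatile write publishes the critical section's writes
    }

    static int run() throws InterruptedException {
        SpinLock lock = new SpinLock();
        int[] shared = {0};
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                lock.lock();
                try {
                    shared[0]++;
                } finally {
                    lock.unlock();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 20000
    }
}
```

While the lock is held, every waiting thread stays on a CPU running the loop, which is exactly the cost this section describes when the holder is slow.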

It only works directly on a single shared variable

CAS is naturally suited to a single memory location. If an operation spans multiple shared variables, CAS alone is not enough.

Since JDK 1.5, AtomicReference has made it possible to wrap multiple values inside one object and update the object reference atomically, but that is still a workaround. In many cases, using locks is simpler.
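The AtomicReference workaround looks like this in practice: put the related values in one immutable object and swap the whole object with CAS. NumberRange and its invariant lower <= upper are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;

final class NumberRange {
    // Immutable pair: the two values that must always change together
    private static final class Bounds {
        final int lower;
        final int upper;
        Bounds(int lower, int upper) { this.lower = lower; this.upper = upper; }
    }

    private final AtomicReference<Bounds> bounds = new AtomicReference<>(new Bounds(0, 0));

    // Invariant: lower <= upper. Each setter swaps in a whole new Bounds object.
    boolean setLower(int newLower) {
        while (true) {
            Bounds current = bounds.get();
            if (newLower > current.upper) return false;           // would break the invariant
            Bounds next = new Bounds(newLower, current.upper);
            if (bounds.compareAndSet(current, next)) return true;  // both fields change at once
        }
    }

    boolean setUpper(int newUpper) {
        while (true) {
            Bounds current = bounds.get();
            if (newUpper < current.lower) return false;
            Bounds next = new Bounds(current.lower, newUpper);
            if (bounds.compareAndSet(current, next)) return true;
        }
    }

    // Read both values from one Bounds object so they are always consistent
    int[] snapshot() {
        Bounds b = bounds.get();
        return new int[] {b.lower, b.upper};
    }
}
```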

When to use CAS and when to use synchronized

A useful rule of thumb is:

  • CAS is a good fit for read-heavy, low-conflict scenarios
  • synchronized is usually better for write-heavy, high-conflict scenarios

More specifically:

  1. When contention is light, using synchronized may waste CPU on blocking, waking, and context switching. CAS can perform better because it relies on hardware-level atomic operations and usually avoids kernel transitions.
  2. When contention is heavy, CAS may spend a lot of time spinning and burning CPU. In that case its efficiency can drop below synchronized.

It is also worth updating the old stereotype that synchronized is always too heavy. Since Java 6, with biased locking, lightweight locking, and other JVM optimizations, that is no longer generally true (biased locking itself was later deprecated and disabled by default in JDK 15). Modern synchronized implementations rely heavily on lock-free queues and a strategy that can be summarized as spin first, block later. That sacrifices some fairness but improves throughput. Under low contention, performance can be close to CAS; under heavy contention, it can outperform a CAS loop that keeps failing and retrying.

Deadlock

A deadlock happens when multiple threads are blocked, each waiting for resources held by the others, so none of them can continue.

A classic case is:

  • thread A holds resource 1 and wants resource 2
  • thread B holds resource 2 and wants resource 1

Now both wait forever.

Deadlock illustration

In a typical demo, thread A enters synchronized(resource1), then sleeps for one second so thread B can enter synchronized(resource2). After waking up, each tries to lock the other resource and both get stuck.
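The demo can be sketched as follows. To keep it runnable, this version (the class DeadlockDemo is hypothetical) marks both threads as daemons so the JVM can still exit, and uses ThreadMXBean.findMonitorDeadlockedThreads() to confirm the deadlock instead of hanging forever. The sleep timings are assumptions chosen to make the interleaving very likely, not guaranteed.

```java
import java.lang.management.ManagementFactory;

final class DeadlockDemo {
    private static Thread worker(String name, Object first, Object second) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                System.out.println(name + " holds its first resource");
                sleepQuietly(500); // give the other thread time to take its first resource
                System.out.println(name + " waits for its second resource");
                synchronized (second) {
                    System.out.println(name + " got both"); // never printed
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive after main returns
        return t;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts the two threads and returns how many threads the JVM reports as deadlocked.
    static int run() throws InterruptedException {
        Object resource1 = new Object();
        Object resource2 = new Object();
        worker("Thread A", resource1, resource2).start();
        worker("Thread B", resource2, resource1).start(); // opposite order: circular wait
        Thread.sleep(2000);
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + run()); // expected: 2
    }
}
```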

Operating systems describe four necessary conditions for deadlock:

  1. Mutual exclusion: a resource can be held by only one thread at a time.
  2. Hold and wait: a thread holding one resource waits for another.
  3. No preemption: a resource cannot be forcibly taken away.
  4. Circular wait: a cycle of waiting threads exists.

How to avoid deadlock

To prevent deadlock, break at least one of those conditions.

  • Mutual exclusion cannot really be removed when the resource itself requires exclusive access.
  • Hold and wait can be broken by requesting all needed resources at once.
  • No preemption can be weakened if a thread releases resources voluntarily when it cannot obtain the next one.
  • Circular wait can be prevented by acquiring resources in a fixed global order and releasing them in reverse order.
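A common way to break circular wait is to rank locks by a stable key and always acquire the lower rank first. In this hypothetical sketch, Account.id is assumed to be unique and serves as the global order, so transfers in opposite directions cannot deadlock.

```java
final class OrderedTransfer {
    // Hypothetical account type; `id` gives every lock a fixed global rank
    static final class Account {
        final int id;
        long balance;
        Account(int id, long balance) { this.id = id; this.balance = balance; }
    }

    // Always lock the account with the smaller id first, whatever the transfer direction
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        } // monitors are released in reverse order as the blocks exit
    }

    // Runs opposite-direction transfers concurrently and returns the final total.
    static long run() throws InterruptedException {
        Account a = new Account(1, 100);
        Account b = new Account(2, 100);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return a.balance + b.balance; // safe to read after join
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 200
    }
}
```

Swapping the two synchronized blocks so each thread locks `from` first would reintroduce exactly the circular wait from the demo above.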

AQS: the foundation under many Java synchronizers

AQS stands for AbstractQueuedSynchronizer. It is the foundation used to build locks and other synchronization utilities such as:

  • ReentrantLock
  • ReentrantReadWriteLock
  • Semaphore

It is one of the core building blocks of java.util.concurrent.

The central idea is simple:

  • if the shared resource is free, the requesting thread acquires it and becomes the active worker
  • if the resource is already taken, AQS uses a queueing and wake-up mechanism to manage waiting threads

That waiting mechanism is based on a variant of the CLH (Craig, Landin, and Hagersten) queue lock.

A CLH queue is a virtual bidirectional queue: the queue is represented by linked nodes rather than a standalone queue object. AQS wraps each waiting thread in a Node and links those nodes into the queue.

AQS structure diagram

AQS maintains:

  • an int state value to represent synchronization state
  • an internal FIFO queue for threads waiting to acquire the resource

It updates the state atomically using CAS.

AQS state management

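The synchronization state is stored in an int field named state. What the value means is up to each synchronizer. For lock-style synchronizers such as ReentrantLock:

  • state > 0 means the lock has been acquired
  • state = 0 means it is free

Semaphore and CountDownLatch interpret it as a permit count or a remaining count instead.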

AQS exposes three key methods for manipulating that state safely:

  • getState()
  • setState(int newState)
  • compareAndSetState(int expect, int update)

Subclasses usually extend AQS and implement the logic that interprets and updates this state.

The synchronization queue

When a thread fails to acquire the synchronization state, AQS packages the thread and its wait status into a node, appends it to the FIFO queue, and blocks the thread.

When the state is released, AQS wakes the appropriate waiting thread so it can try again.

The implementation relies on a volatile int state plus this waiting queue, and the state is still accessed only through getState(), setState(int newState), and compareAndSetState(int expect, int update).

Important AQS methods

AQS provides a set of core methods that synchronizers build on top of.

State operations:

  • getState()
  • setState(int newState)
  • compareAndSetState(int expect, int update)

Methods typically overridden by subclasses:

  • tryAcquire(int arg): exclusive acquisition
  • tryRelease(int arg): exclusive release
  • tryAcquireShared(int arg): shared acquisition
  • tryReleaseShared(int arg): shared release
  • isHeldExclusively(): whether the current synchronizer is held exclusively

Template methods provided by AQS:

  • acquire(int arg)
  • acquireInterruptibly(int arg)
  • tryAcquireNanos(int arg, long nanos)
  • acquireShared(int arg)
  • acquireSharedInterruptibly(int arg)
  • tryAcquireSharedNanos(int arg, long nanosTimeout)
  • release(int arg)
  • releaseShared(int arg)

These methods, together with queue-inspection helpers such as hasQueuedThreads() and getQueueLength(), roughly fall into three groups:

  1. exclusive acquisition and release
  2. shared acquisition and release
  3. queue inspection and waiting-thread management

Exclusive vs shared resource access in AQS

AQS supports two basic resource-sharing modes.

1. Exclusive

Only one thread can hold the resource at a time.

ReentrantLock is the standard example. It supports both fair and non-fair locking.

  • Fair lock: threads acquire the lock in queue order
  • Non-fair lock: threads may attempt to grab the lock immediately instead of waiting their turn

ReentrantLock uses non-fair mode by default because it usually gives better throughput.

The main differences are:

  1. In non-fair mode, a thread first attempts to grab the lock immediately with CAS.
  2. If that fails and it later reaches tryAcquire, a non-fair lock may still seize the lock as soon as it becomes available, while a fair lock first checks whether other threads are already waiting.

If a non-fair thread loses both of these opportunistic CAS attempts, it enters the wait queue exactly as a fair-mode thread would, and from then on both modes behave the same way.

Non-fair locks generally perform better, but fairness becomes less predictable and queued threads may starve.

2. Shared

Multiple threads may proceed at the same time.

Examples include:

  • Semaphore
  • CountDownLatch

ReentrantReadWriteLock is a mixed case. It supports multiple concurrent readers while still allowing exclusive writes.

When building a custom synchronizer on top of AQS, the implementer mainly needs to define how state is acquired and released. Queue management, failed acquisition, parking, and wake-up logic are already handled by AQS itself.

AQS and the template method pattern

AQS is a classic example of the template method pattern.

The usual approach is:

  1. Create a subclass of AbstractQueuedSynchronizer and override specific methods.
  2. Compose that subclass into a custom synchronization component and invoke AQS’s template methods.

The methods you override define how the resource state is acquired and released. The rest of the synchronization machinery stays in AQS.

This differs from simply implementing an interface. The overall process is fixed by the base class, while subclasses customize selected steps.

A simple analogy would be travel steps like:

buyTicket() -> securityCheck() -> ride() -> arrive()

Most steps remain the same whether you travel by plane or train. Only ride() changes. That is the essence of the template method pattern.

By default, AQS’s override points throw UnsupportedOperationException. Their implementations must be thread-safe and usually short and non-blocking. Most of the higher-level AQS methods are final, so the extension points are deliberately narrow.
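Both ideas come together in a minimal exclusive synchronizer. This Mutex is a hypothetical, non-reentrant sketch, similar in spirit to the example in the AQS Javadoc: it overrides only tryAcquire, tryRelease, and isHeldExclusively, and leaves queuing, parking, and waking to the template methods acquire() and release().

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

final class Mutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            if (compareAndSetState(0, 1)) {                   // 0 = free, 1 = held
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;                                     // AQS queues and parks the caller
        }

        @Override
        protected boolean tryRelease(int arg) {
            if (getState() == 0) throw new IllegalMonitorStateException();
            setExclusiveOwnerThread(null);
            setState(0);                                      // volatile write publishes the release
            return true;                                      // AQS then unparks a queued successor
        }

        @Override
        protected boolean isHeldExclusively() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    void lock() { sync.acquire(1); }                 // template method: tryAcquire, else queue
    boolean tryLock() { return sync.tryAcquire(1); } // one attempt, never blocks
    void unlock() { sync.release(1); }               // template method: tryRelease, then wake
    boolean isLocked() { return sync.isHeldExclusively(); }

    static int run() throws InterruptedException {
        Mutex m = new Mutex();
        int[] counter = {0};
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                m.lock();
                try {
                    counter[0]++;
                } finally {
                    m.unlock();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter[0];
    }
}
```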

ReentrantLock as an AQS example

For ReentrantLock, state = 0 means unlocked.

When thread A calls lock(), it eventually succeeds in tryAcquire() and increments state. Other threads fail to acquire until A calls unlock() enough times to bring state back to 0.

Because the same thread can acquire the lock more than once, state may increase multiple times. That is reentrancy. But every acquisition must be matched by a release, or the lock never truly becomes free.
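The public ReentrantLock API exposes this count through getHoldCount(), which in OpenJDK returns the AQS state when the current thread owns the lock and 0 otherwise. A small sketch with a hypothetical Reentrancy class:

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

final class Reentrancy {
    // Records getHoldCount() after one acquisition, after a nested one, and after full release.
    static int[] holdCounts() {
        ReentrantLock lock = new ReentrantLock(); // non-fair by default
        int[] seen = new int[3];
        lock.lock();                        // state 0 -> 1
        try {
            seen[0] = lock.getHoldCount();
            lock.lock();                    // same thread re-enters: state 1 -> 2
            try {
                seen[1] = lock.getHoldCount();
            } finally {
                lock.unlock();              // state 2 -> 1: still locked
            }
        } finally {
            lock.unlock();                  // state 1 -> 0: now actually free
        }
        seen[2] = lock.getHoldCount();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(holdCounts())); // [1, 2, 0]
    }
}
```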

CountDownLatch as an AQS example

For CountDownLatch, state is initialized to N, the number of tasks to wait for.

Each worker calls countDown() when it finishes, which decrements state by one using CAS. When state finally reaches 0, the waiting threads are unparked and await() returns.
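A stripped-down latch built on AQS shared mode shows the mechanism. SimpleLatch is a hypothetical sketch modeled on the behavior described above, not the JDK’s CountDownLatch source: tryAcquireShared succeeds only when state is 0, and tryReleaseShared decrements state with a CAS loop.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

final class SimpleLatch {
    private static final class Sync extends AbstractQueuedSynchronizer {
        Sync(int count) { setState(count); }   // state = number of pending countDown() calls

        int count() { return getState(); }

        @Override
        protected int tryAcquireShared(int ignored) {
            return getState() == 0 ? 1 : -1;   // >= 0 lets await() return; < 0 queues the caller
        }

        @Override
        protected boolean tryReleaseShared(int ignored) {
            while (true) {
                int c = getState();
                if (c == 0) return false;      // already open: nothing to do
                int next = c - 1;
                if (compareAndSetState(c, next)) {
                    return next == 0;          // true tells AQS to wake the queued waiters
                }
            }
        }
    }

    private final Sync sync;

    SimpleLatch(int count) { sync = new Sync(count); }

    void countDown() { sync.releaseShared(1); }
    void await() throws InterruptedException { sync.acquireSharedInterruptibly(1); }
    long getCount() { return sync.count(); }

    static long run(int workers) throws InterruptedException {
        SimpleLatch latch = new SimpleLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread(latch::countDown).start(); // each worker "finishes" immediately
        }
        latch.await();                            // returns once state reaches 0
        return latch.getCount();
    }
}
```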

Most custom synchronizers use either:

  • exclusive mode: tryAcquire / tryRelease
  • shared mode: tryAcquireShared / tryReleaseShared

But AQS can support both modes in one synchronizer, as seen in ReentrantReadWriteLock.

Semaphore: allowing multiple threads in at once

Unlike synchronized and ReentrantLock, which allow only one thread at a time to access a resource, Semaphore can permit several threads to proceed concurrently.

Conceptually, Semaphore maintains a number of permits.

  • acquire() blocks until a permit becomes available, then takes one
  • release() returns a permit, potentially waking a blocked thread

There is no actual permit object; the semaphore only tracks the permit count.

You can also acquire or release multiple permits at once, though that is less common.

Another commonly used method is tryAcquire(), which returns false immediately if no permit is available.

Semaphore supports both fair and non-fair modes:

  • Fair mode: permits are granted in FIFO order
  • Non-fair mode: threads may barge in and compete aggressively

Its constructors require the number of permits, and one constructor also accepts a fairness flag. The default is non-fair mode.

Like CountDownLatch, Semaphore is implemented as a shared lock on top of AQS, and its state starts at the number of permits. Each successful acquire() decrements state with CAS. If more threads arrive than there are permits, the extra threads are queued and parked by AQS rather than spinning. When a running thread calls release(), state increases by one and AQS wakes a queued thread, which retries the acquisition and proceeds if a permit is available.

That is how a semaphore limits the number of concurrently executing threads.
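A sketch that lets ten threads compete for three permits and records the highest number that were ever inside the guarded section at once. SemaphoreDemo, the thread count, and the 20 ms sleep are arbitrary illustrative choices.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class SemaphoreDemo {
    // Returns the maximum number of threads observed inside the guarded section at once.
    static int run() throws InterruptedException {
        Semaphore permits = new Semaphore(3);            // 3 permits, non-fair by default
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        Thread[] workers = new Thread[10];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    permits.acquire();                   // blocks while no permit is left
                    int now = inside.incrementAndGet();
                    try {
                        maxInside.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);                // stand-in for real work
                    } finally {
                        inside.decrementAndGet();
                        permits.release();               // hand the permit back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return maxInside.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent: " + run()); // never more than 3
    }
}
```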

CountDownLatch: a countdown gate

CountDownLatch lets one or more threads wait until a set of other operations has completed.

Internally, it is also a shared-lock style AQS implementation. Its state is initialized to count.

When a thread calls countDown(), tryReleaseShared reduces state through CAS. When another thread calls await(), it blocks as long as state != 0. Only when state reaches 0 are the waiting threads released.

Two classic uses of CountDownLatch

Waiting for several threads to finish before continuing

Initialize the latch with n:

new CountDownLatch(n)

Each worker thread calls countDown() when it finishes. The waiting thread calls await() and resumes only when the count reaches zero.

A typical example is service startup: the main thread waits for several components to load before moving on.

Making multiple threads start at the same time

To release several threads at nearly the same moment, initialize the latch with 1:

new CountDownLatch(1)

All worker threads call await() first and remain blocked. When the main thread calls countDown(), the count becomes zero and they all begin together. This is like a starting gun in a race.
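A starting-gun sketch with a hypothetical StartingGun class. Each runner checks a flag written just before countDown(); CountDownLatch guarantees that actions before countDown() happen-before a successful return from await() in another thread, so every runner sees the flag set.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class StartingGun {
    // Returns how many runners observed the gun as already fired when they started.
    static int run(int runners) throws InterruptedException {
        CountDownLatch startSignal = new CountDownLatch(1); // count 1: a single "go"
        AtomicInteger sawGun = new AtomicInteger();
        boolean[] gunFired = {false};                       // plain write, published by countDown()
        Thread[] threads = new Thread[runners];
        for (int i = 0; i < runners; i++) {
            threads[i] = new Thread(() -> {
                try {
                    startSignal.await();                    // every runner blocks here first
                    if (gunFired[0]) sawGun.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        Thread.sleep(100);          // let runners reach await(); not needed for correctness
        gunFired[0] = true;
        startSignal.countDown();    // 1 -> 0: all waiting runners are released together
        for (Thread t : threads) t.join();
        return sawGun.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5)); // 5
    }
}
```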

How the interaction works

If there are, say, 550 requests to process, the main thread can wait until all 550 have completed and only then continue with something like:

System.out.println("finish");

The typical pattern is:

  • the main thread starts worker threads
  • the main thread immediately calls await()
  • each worker holds a reference to the latch and calls countDown() when done
  • once all workers have counted down, the main thread resumes
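The 550-request flow, sketched with a fixed thread pool. WaitForAll, the pool size of 8, and the counter standing in for real request handling are assumptions for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class WaitForAll {
    static int run(int requestCount) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(requestCount);
        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int i = 0; i < requestCount; i++) {
                pool.execute(() -> {
                    try {
                        handled.incrementAndGet();   // stand-in for processing one request
                    } finally {
                        latch.countDown();           // each task counts down exactly once
                    }
                });
            }
            latch.await();                           // main thread blocks until count == 0
            System.out.println("finish");
            return handled.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(550)); // prints "finish", then 550
    }
}
```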

A common trap

Improper use of await() can easily cause deadlock-like behavior.

If the code is written in a way that prevents the count from ever reaching zero, the waiting thread will block forever. This kind of bug is especially easy to introduce in loops where some branch skips the required countDown() call.

That makes CountDownLatch simple in concept, but unforgiving if the countdown logic is incomplete.
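A defensive sketch: calling countDown() in a finally block guarantees it runs on every path, including early returns, and await(long, TimeUnit) bounds the wait instead of blocking forever. SafeCountDown and its skip condition are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SafeCountDown {
    // Returns true if the latch reached zero before the timeout.
    static boolean run(int tasks) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(tasks);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                pool.execute(() -> {
                    try {
                        if (id % 3 == 0) {
                            return;      // early exit: a countDown() at the end would be skipped
                        }
                        // ... real work would go here ...
                    } finally {
                        done.countDown(); // runs on every path out of the try block
                    }
                });
            }
            // Bounded wait: returns false on timeout instead of blocking forever
            return done.await(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10)); // true
    }
}
```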