Modern Embedded Recipes · 105/152

Compare-And-Swap Patterns: Applying CAS to Stacks, Counters, and Linked Lists

April 18, 2026 · Hawk · 4 min read

#One-line summary

“CAS = perform if (*p == old) *p = new atomically.” It is the most basic lock-free tool, and the hot path of almost every lock-free data structure is a CAS loop.

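#When to use it

When you build a lock-free counter, stack, queue, or hash table, CAS almost always ends up on the hot path. With a mutex, every thread queues up in a single line under contention; with CAS, only the threads that lose the race retry.

Another common case is a simple atomic update that a single read-modify-write operation cannot express. A saturating counter, which increments but never exceeds a given max, cannot be written exactly with fetch_add. A CAS loop handles it cleanly.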

#Core concepts

bool compare_exchange_strong(T &expected, T desired);
bool compare_exchange_weak  (T &expected, T desired);

expected is an in/out parameter: on entry it holds the value the caller believes *p contains
if *p == expected, then *p = desired, return true
otherwise expected = *p, return false (ready for a retry)

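weak may fail spuriously (return false even though *p == expected), but on LL/SC architectures such as ARM it avoids an internal retry loop and is therefore cheaper. Inside a retry loop, prefer weak.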
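The in/out behavior of expected is easiest to see in a single-threaded sketch. The function name demo_expected_update is hypothetical and exists only for illustration; strong is used so that neither call can fail spuriously.

```cpp
#include <atomic>
#include <cassert>

// Illustrative only: shows how `expected` behaves on success and on failure.
// Returns the value left in `expected` after a failed compare-exchange.
int demo_expected_update() {
    std::atomic<int> a{5};

    int expected = 5;
    // a == expected, so the store happens and the call returns true.
    bool ok = a.compare_exchange_strong(expected, 7);
    assert(ok && a.load() == 7);

    expected = 5;  // stale guess: a now holds 7
    // Mismatch: a is left untouched and expected is overwritten with 7.
    ok = a.compare_exchange_strong(expected, 9);
    assert(!ok && a.load() == 7);

    return expected;
}
```

After the failed call, expected already holds the fresh value, which is why a retry loop does not need to reload it.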

/* typical CAS loop */
T cur = atomic_load(p);
T new_val;
do {
    new_val = compute(cur);   /* a failed CAS refreshes cur, so no reload is needed */
} while (!atomic_compare_exchange_weak(p, &cur, new_val));
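As one concrete instance of the loop shape above, here is a minimal sketch of an atomic "raise to at least this value" update written in C++. The helper name fetch_max is an assumption made for this example, not something the article defines.

```cpp
#include <atomic>

// Hypothetical helper: raise `target` to at least `value`.
// Returns the value that was observed before any update.
int fetch_max(std::atomic<int>& target, int value) {
    int cur = target.load(std::memory_order_relaxed);
    // Stop early when the stored value is already large enough.
    // Otherwise try to install `value`; a failed CAS refreshes `cur`.
    while (cur < value &&
           !target.compare_exchange_weak(cur, value,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
        // Nothing to do here: the loop condition re-checks the refreshed cur.
    }
    return cur;
}
```

The early-exit test inside the loop condition is what a single fetch_add or exchange cannot express.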

Under heavy contention, add backoff (wait briefly before retrying) to reduce cache line ping-pong.

#Code / real-world examples

#Basic CAS loop: saturating counter

#include <atomic>

std::atomic<int> counter{0};

/* Increment counter, but never past max. */
void inc_saturating(int max) {
    int cur = counter.load(std::memory_order_relaxed);
    int next;
    do {
        if (cur >= max) return;   /* already saturated */
        next = cur + 1;
        /* on failure, cur is refreshed with the current value */
    } while (!counter.compare_exchange_weak(
                cur, next, std::memory_order_release, std::memory_order_relaxed));
}

fetch_add cannot make the max check and the increment a single atomic step. A CAS loop is the standard answer.
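To see that the bound holds under contention, the following sketch runs a variant of the saturating increment from several threads. The variant takes the counter by reference and returns whether it incremented; that signature and the helper stress_saturating are assumptions made for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Variant of the saturating increment, parameterized on the counter.
// Returns true if the counter was incremented, false if it was saturated.
bool inc_saturating(std::atomic<int>& counter, int max) {
    int cur = counter.load(std::memory_order_relaxed);
    do {
        if (cur >= max) return false;  // already at the ceiling
    } while (!counter.compare_exchange_weak(
                 cur, cur + 1,
                 std::memory_order_release, std::memory_order_relaxed));
    return true;
}

// Hypothetical stress helper: each worker attempts `attempts` increments.
int stress_saturating(int threads, int attempts, int max) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, attempts, max] {
            for (int i = 0; i < attempts; ++i) inc_saturating(counter, max);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

With more total attempts than max, the final value should land exactly on max; with fewer, every attempt should be counted.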

#Lock-free stack push

struct node { node *next; int val; };
std::atomic<node *> top{nullptr};

void push(int v) {
    node *n = new node{nullptr, v};
    node *cur = top.load(std::memory_order_relaxed);
    do {
        n->next = cur;   /* link to the head we expect to replace */
    } while (!top.compare_exchange_weak(cur, n,
                std::memory_order_release, std::memory_order_relaxed));
}

Because cur is an in/out argument, a failed CAS refreshes it automatically, so the loop body stays very short.
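A quick way to exercise the push path is to push from several threads and then detach the whole list at once. The sketch below assumes a caller-supplied head instead of the global top, and the helpers drain and push_from_threads are hypothetical. Detaching with exchange avoids a per-node pop, which would need ABA protection.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct node { node* next; int val; };

// Push onto a caller-supplied head; n->next doubles as the expected value.
void push(std::atomic<node*>& top, int v) {
    node* n = new node{top.load(std::memory_order_relaxed), v};
    while (!top.compare_exchange_weak(n->next, n,
               std::memory_order_release, std::memory_order_relaxed)) {
        // n->next now holds the head seen by the failed CAS; just retry.
    }
}

// Hypothetical helper: take the entire list in one step, then count and free it.
int drain(std::atomic<node*>& top) {
    node* cur = top.exchange(nullptr, std::memory_order_acquire);
    int count = 0;
    while (cur) {
        node* next = cur->next;
        delete cur;
        cur = next;
        ++count;
    }
    return count;
}

// Hypothetical test driver: concurrent pushes followed by a single drain.
int push_from_threads(int threads, int per_thread) {
    std::atomic<node*> top{nullptr};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&top, per_thread] {
            for (int i = 0; i < per_thread; ++i) push(top, i);
        });
    for (auto& th : pool) th.join();
    return drain(top);
}
```

No node should be lost: the drained count equals the number of pushes.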

#Mutex try-lock with CAS

std::atomic<int> locked{0};

bool try_lock() {
    int expected = 0;
    /* acquire on success so the critical section cannot move above the lock */
    return locked.compare_exchange_strong(
        expected, 1, std::memory_order_acquire);
}

void unlock() {
    /* release so writes made in the critical section reach the next owner */
    locked.store(0, std::memory_order_release);
}

This is the building block of the simplest spinlock. It can be developed further into a ticket lock or an MCS lock.
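Turning try_lock into a blocking lock takes only a retry loop. The sketch below is one possible shape, assuming a test-and-test-and-set wait (spin on a plain load between CAS attempts) and std::this_thread::yield as the wait primitive; the class name spinlock and the helper count_with_spinlock are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal test-and-test-and-set spinlock built on the try_lock idea above.
class spinlock {
    std::atomic<int> locked_{0};
public:
    bool try_lock() {
        int expected = 0;
        return locked_.compare_exchange_strong(
            expected, 1, std::memory_order_acquire, std::memory_order_relaxed);
    }
    void lock() {
        while (!try_lock()) {
            // Wait on a plain load so waiters do not keep issuing CAS.
            while (locked_.load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(0, std::memory_order_release); }
};

// Hypothetical usage: a plain int protected by the spinlock.
int count_with_spinlock(int threads, int per_thread) {
    spinlock lk;
    int total = 0;  // deliberately not atomic; the lock protects it
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&lk, &total, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lk.lock();
                ++total;
                lk.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return total;
}
```

Because the class exposes lock, unlock, and try_lock, it can also be used with std::lock_guard.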

#Strong vs Weak

/* outside a loop: strong (the intent is a single attempt) */
int expected = 0;
if (!locked.compare_exchange_strong(expected, 1)) {
    /* someone else already holds it */
}

/* inside a loop: weak (spurious failure is acceptable, faster on ARM) */
do {
    cur = p.load();
} while (!p.compare_exchange_weak(cur, cur + 1));

strong never fails spuriously, so it suits a single attempt; weak is the lighter choice when you are looping anyway.

#Exponential backoff

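void cas_with_backoff(std::atomic<int> *p) {
    int delay = 1;
    int cur = p->load();
    while (!p->compare_exchange_weak(cur, cur + 1)) {
        /* back off only after a failed attempt, never before the first one */
        for (int i = 0; i < delay; i++)
            __asm__ volatile("yield" ::: "memory");   /* ARM; x86 uses pause */
        if (delay < 1024) delay *= 2;
        cur = p->load();   /* re-read after waiting so the next attempt is fresh */
    }
}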

A yield instruction (ARM) or pause (x86) in the wait loop reduces cache line ping-pong, and throughput recovers under heavy contention.
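Since the spin hint differs per architecture, one option is to hide it behind a small helper. The following is a sketch under assumptions: the name cpu_relax is hypothetical, the x86 branch uses the _mm_pause intrinsic from <immintrin.h>, and targets matched by neither branch fall back to std::this_thread::yield, which is an OS-level yield and not a CPU hint.

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#endif

// Hypothetical helper: one spin-wait hint, selected at compile time.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    _mm_pause();
#elif defined(__aarch64__) || defined(__arm__)
    __asm__ volatile("yield" ::: "memory");
#else
    std::this_thread::yield();  // fallback: scheduler yield, not a CPU hint
#endif
}

// Increment with exponential backoff applied only after a failed attempt.
// Returns the value this call installed.
int increment_with_backoff(std::atomic<int>& p) {
    int delay = 1;
    int cur = p.load(std::memory_order_relaxed);
    while (!p.compare_exchange_weak(cur, cur + 1,
               std::memory_order_relaxed, std::memory_order_relaxed)) {
        for (int i = 0; i < delay; ++i) cpu_relax();
        if (delay < 1024) delay *= 2;          // cap the wait
        cur = p.load(std::memory_order_relaxed);  // refresh after waiting
    }
    return cur + 1;
}
```

The uncontended path pays nothing for the backoff, because the wait runs only after a failed CAS.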

#Avoiding ABA with a tagged pointer

struct tagged_ptr {
    node *p;
    uint64_t tag;
};
std::atomic<tagged_ptr> top;     /* needs DCAS or a 128-bit (__int128) atomic */

void push(int v) {
    node *n = new node{nullptr, v};
    tagged_ptr old, neu;
    do {
        old = top.load();
        n->next = old.p;
        neu = { n, old.tag + 1 };   /* bump the tag on every update */
    } while (!top.compare_exchange_weak(old, neu));
}

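A pointer comparison alone is not enough to avoid ABA; a tag (or version) must be compared together with it. The tag matters most on the pop path, where a stale next pointer read before the CAS would otherwise be installed.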
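When a 128-bit CAS is not available, a common alternative is to address nodes by index into a fixed pool and pack the index together with the tag into one 64-bit word. This is a swapped-in variant, not the article's tagged_ptr: the class name indexed_stack, the 32/32 bit split, and the caller-managed slot ownership are all assumptions of this sketch, and a 32-bit tag can in principle wrap around.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: a stack over a fixed node pool, addressed by 32-bit index.
// The head packs {tag, index} into 64 bits, so an ordinary 64-bit CAS suffices.
class indexed_stack {
    static constexpr uint32_t kNull = 0xFFFFFFFFu;
    struct node { std::atomic<uint32_t> next{kNull}; int val{0}; };

    std::vector<node> pool_;
    std::atomic<uint64_t> head_;

    static uint64_t pack(uint32_t tag, uint32_t idx) { return (uint64_t(tag) << 32) | idx; }
    static uint32_t idx_of(uint64_t h) { return uint32_t(h); }
    static uint32_t tag_of(uint64_t h) { return uint32_t(h >> 32); }

public:
    explicit indexed_stack(std::size_t capacity)
        : pool_(capacity), head_(pack(0, kNull)) {}

    // Link pool slot `i` (currently owned by the caller) onto the stack.
    void push(uint32_t i, int v) {
        pool_[i].val = v;
        uint64_t old = head_.load(std::memory_order_relaxed);
        uint64_t neu;
        do {
            pool_[i].next.store(idx_of(old), std::memory_order_relaxed);
            neu = pack(tag_of(old) + 1, i);  // new head, tag bumped
        } while (!head_.compare_exchange_weak(old, neu,
                     std::memory_order_release, std::memory_order_relaxed));
    }

    // Unlink the top slot and hand it to the caller; false when empty.
    bool pop(uint32_t& i, int& v) {
        uint64_t old = head_.load(std::memory_order_acquire);
        uint64_t neu;
        do {
            i = idx_of(old);
            if (i == kNull) return false;
            uint32_t next = pool_[i].next.load(std::memory_order_relaxed);
            // If slot i was popped and pushed again meanwhile, the tag in
            // head_ differs from old's tag and the CAS below fails.
            neu = pack(tag_of(old) + 1, next);
        } while (!head_.compare_exchange_weak(old, neu,
                     std::memory_order_acquire, std::memory_order_acquire));
        v = pool_[i].val;
        return true;
    }
};
```

The failure order in pop is acquire on purpose: the refreshed head is dereferenced on the next iteration, so the pusher's writes to that slot must be visible.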

#compare-exchange + sequence number (RingBuffer)

struct slot {
    std::atomic<uint64_t> seq;
    T data;
};

/* buf, mask, and enq_pos are assumed to be members of the enclosing queue */
bool try_enqueue(T v) {
    uint64_t pos = enq_pos.load();
    slot &s = buf[pos & mask];
    uint64_t seq = s.seq.load(std::memory_order_acquire);
    intptr_t diff = (intptr_t)seq - (intptr_t)pos;

    if (diff == 0) {
        /* the slot is free for this position: try to reserve it */
        if (enq_pos.compare_exchange_weak(pos, pos + 1)) {
            s.data = v;
            s.seq.store(pos + 1, std::memory_order_release);   /* publish */
            return true;
        }
    } else if (diff < 0) return false;   /* full */
    return false;   /* lost the race (or spurious failure); the caller may retry */
}

This is the core pattern of Vyukov's MPMC queue: CAS reserves the enqueue position, and the sequence number publishes the slot safely.
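For context, here is a sketch of how the enqueue side pairs with a dequeue side in a bounded queue of this style. It is written from the general shape of the algorithm, not taken from the article: the class name bounded_queue, the member names, the internal retry loop, and the power-of-two capacity requirement are assumptions of this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a bounded MPMC queue. Capacity must be a power of two.
template <typename T>
class bounded_queue {
    struct slot { std::atomic<uint64_t> seq; T data; };
    std::vector<slot> buf_;
    const uint64_t mask_;
    std::atomic<uint64_t> enq_pos_{0};
    std::atomic<uint64_t> deq_pos_{0};

public:
    explicit bounded_queue(std::size_t capacity_pow2)
        : buf_(capacity_pow2), mask_(capacity_pow2 - 1) {
        // Slot i starts out free for ticket i.
        for (std::size_t i = 0; i < capacity_pow2; ++i)
            buf_[i].seq.store(i, std::memory_order_relaxed);
    }

    bool try_enqueue(const T& v) {
        uint64_t pos = enq_pos_.load(std::memory_order_relaxed);
        for (;;) {
            slot& s = buf_[pos & mask_];
            uint64_t seq = s.seq.load(std::memory_order_acquire);
            int64_t diff = int64_t(seq) - int64_t(pos);
            if (diff == 0) {
                // Free for this ticket: try to claim it.
                if (enq_pos_.compare_exchange_weak(pos, pos + 1,
                                                   std::memory_order_relaxed)) {
                    s.data = v;
                    s.seq.store(pos + 1, std::memory_order_release);  // publish
                    return true;
                }
                // CAS failure refreshed pos; loop and look at the new slot.
            } else if (diff < 0) {
                return false;  // full
            } else {
                pos = enq_pos_.load(std::memory_order_relaxed);  // fell behind
            }
        }
    }

    bool try_dequeue(T& out) {
        uint64_t pos = deq_pos_.load(std::memory_order_relaxed);
        for (;;) {
            slot& s = buf_[pos & mask_];
            uint64_t seq = s.seq.load(std::memory_order_acquire);
            int64_t diff = int64_t(seq) - int64_t(pos + 1);
            if (diff == 0) {
                if (deq_pos_.compare_exchange_weak(pos, pos + 1,
                                                   std::memory_order_relaxed)) {
                    out = s.data;
                    // Mark the slot free for the producer one lap ahead.
                    s.seq.store(pos + mask_ + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // empty
            } else {
                pos = deq_pos_.load(std::memory_order_relaxed);
            }
        }
    }
};
```

The position counters only hand out tickets, so relaxed ordering is enough for them; the release store and acquire load on seq carry the data.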

#Measurements / performance comparison

Operation                   Time (Cortex-A72; uncontended unless noted)
atomic load (relaxed)       2 cycles
atomic store (release)      4 cycles
CAS (uncontended)           ~6 cycles
CAS (contended, 2 threads)  ~80 cycles (cache line ping-pong)
CAS (contended, 8 threads)  >300 cycles (heavy contention)

CAS gets sharply more expensive as contention grows, so backoff and sharding become essential.
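Sharding can be sketched as a counter split across several cache-line-sized slots, with each thread updating its own slot and readers summing all of them. The class name sharded_counter, the shard count of 8, and the 64-byte line size are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: spread increments over several shards so threads rarely share a line.
class sharded_counter {
    static constexpr std::size_t kShards = 8;               // assumed shard count
    struct alignas(64) shard { std::atomic<long> value{0}; };  // assumed 64-byte line
    shard shards_[kShards];
public:
    // thread_id picks the shard; threads with different ids usually do not collide.
    void add(std::size_t thread_id, long n = 1) {
        shards_[thread_id % kShards].value.fetch_add(n, std::memory_order_relaxed);
    }
    // Sum of all shards; exact once the writers have finished.
    long read() const {
        long sum = 0;
        for (const shard& s : shards_) sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }
};

// Hypothetical driver: each thread increments its own shard.
long count_sharded(int threads, int per_thread) {
    sharded_counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&c, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) c.add(static_cast<std::size_t>(t));
        });
    for (auto& th : pool) th.join();
    return c.read();
}
```

The trade-off is that read touches every shard, so this fits counters that are written often and read rarely.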

Backoff effect (8 threads, 1M iterations)
no backoff               8.2 s
linear backoff           4.1 s
exponential backoff      2.3 s

Exponential backoff cuts the run time from 8.2 s to 2.3 s, roughly a 3.6x improvement over no backoff.

#Common pitfalls

strong inside a loop

do {
    cur = p.load();
} while (!p.compare_exchange_strong(cur, cur + 1));   /* weak is faster here */

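Inside a retry loop, weak is at least as fast as strong. The difference is significant on ARM targets that implement CAS with LL/SC, where strong wraps the exclusive pair in an extra internal retry loop; on x86 both compile to the same instruction.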

Ignoring the CAS result

p.compare_exchange_weak(cur, next);    /* return value ignored: failure is never handled */

CAS can fail. Always define a retry or an explicit failure path.
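A one-shot claim is a small case where the failure path is the whole point: exactly one caller wins, and everyone else must take a different branch. The names try_claim and count_winners below are hypothetical, chosen for this sketch.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical one-shot claim: exactly one caller gets `true`.
bool try_claim(std::atomic<bool>& claimed) {
    bool expected = false;
    // strong: a single attempt must not fail spuriously.
    return claimed.compare_exchange_strong(expected, true,
               std::memory_order_acq_rel, std::memory_order_acquire);
}

// Hypothetical driver: count how many threads won the claim.
int count_winners(int threads) {
    std::atomic<bool> claimed{false};
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&claimed, &winners] {
            if (try_claim(claimed)) {
                winners.fetch_add(1, std::memory_order_relaxed);  // success path
            } else {
                // failure path: another thread already claimed the work
            }
        });
    for (auto& th : pool) th.join();
    return winners.load();
}
```

Using the return value directly as the branch condition keeps the success and failure paths visible at the call site.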

Hot CAS without backoff

while (!p.compare_exchange_weak(cur, cur + 1));   /* burns cycles under contention */

When 10 threads fight over the same cache line without backoff, throughput can drop to 1/10 or less.

Missing memory order

p.compare_exchange_weak(cur, next, std::memory_order_relaxed);   /* does not publish */

A CAS usually needs acquire or release semantics. With relaxed, other threads are not guaranteed to see the surrounding writes in the intended order.
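The publish case can be sketched with one producer and one consumer: the producer fills in a plain field and then installs the pointer with a release CAS, and the consumer picks it up with an acquire load. The struct message and the function publish_and_consume are hypothetical names for this example.

```cpp
#include <atomic>
#include <thread>

struct message { int payload = 0; };

// Hypothetical publish/consume pair using release on the CAS and acquire on the load.
int publish_and_consume() {
    message msg;
    std::atomic<message*> slot{nullptr};
    int seen = -1;

    std::thread producer([&msg, &slot] {
        msg.payload = 42;  // plain write, made before publishing
        message* expected = nullptr;
        if (!slot.compare_exchange_strong(expected, &msg,
                std::memory_order_release, std::memory_order_relaxed)) {
            // failure path: the slot was already taken, nothing was published
        }
    });
    std::thread consumer([&slot, &seen] {
        message* m;
        // acquire pairs with the release CAS, making payload visible.
        while ((m = slot.load(std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        seen = m->payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

With relaxed on both sides, the consumer could observe the pointer without being guaranteed to observe the payload write.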

Ignoring ABA

/* CAS on the pointer alone: ABA is possible */
do { cur = p.load(); } while (!p.compare_exchange_weak(cur, cur->next));

If a free and a reallocation happened during A → B → A, the CAS succeeds but the pointer no longer means the same thing. A tagged pointer or hazard pointers are needed.

#Summary

CAS is the basic tool of lock-free programming.
Use weak inside a loop and strong for a single attempt.
Under heavy contention, exponential backoff is essential.
CAS alone does not solve ABA; a tagged pointer or hazard pointers are needed.
For memory order, the usual convention is release to publish and acquire to consume.
Reducing contention itself through sharding (per-CPU counters and the like) is the best option.
Always handle the CAS result; the failure path must be explicit.

The next part covers the cost of atomic operations, including how the ARM instructions differ for each memory order.

