bjmayor's Other Blog · How to Use Caching Correctly: Common Pitfalls and Best Practices

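Preface
1. Common Cache Misuse Patterns
2. The Right Caching Strategy
3. Performance Benchmarks
4. Cache Penetration and Cache Avalanche
- 4.1 Cache Penetration
- 4.2 Cache Avalanche
5. Implementation Walkthrough
6. Best Practices
7. Conclusion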

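Preface

A cache (Redis or Memcached) is a powerful tool for improving system performance, but used incorrectly it brings no speedup and introduces new problems instead. Using real-world cases and benchmarks, this article walks junior developers through the common caching mistakes and the right way to avoid them.

1. Common Cache Misuse Patterns

1.1 Anti-pattern: caching paginated results directly

Many newcomers cache the result of a paginated query as-is, for example: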

// ❌ Wrong approach
func GetUserFollowers(userID int64, page, pageSize int) ([]User, error) {
    cacheKey := fmt.Sprintf("followers:%d:%d:%d", userID, page, pageSize)

    // Try the cache first
    if cached := cache.Get(cacheKey); cached != nil {
        return cached, nil
    }

    // Cache miss: query the database
    followers := db.Query("SELECT * FROM users WHERE ... LIMIT ? OFFSET ?",
                          pageSize, page*pageSize)

    // Cache the entire page of results
    cache.Set(cacheKey, followers, 3600)
    return followers, nil
}

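What is wrong with this approach?

1.2 Core problems: memory blow-up and a consistency nightmare

Suppose a user has 20,000 followers. Clients may request the list with different pagination parameters:

followers:123:1:20   → page 1, 20 per page
followers:123:2:20   → page 2, 20 per page
followers:123:1:50   → page 1, 50 per page
followers:123:1:100  → page 1, 100 per page
...

Problem 1: uncontrolled memory cost (fatal)

Memory is a finite, expensive resource.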

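Calculation: 20,000 followers, 500 bytes per user record

Possible pagination combinations:
- page: 1-1000 (deep pagination)
- pageSize: 10, 20, 50, 100
- sort order: follow time, activity

A conservative estimate of 100 combinations:
100 × 20 rows/page × 500 bytes = 1 MB (just the variants of page 1)
100 × 1,000 pages × 20 rows × 500 bytes = 1 GB (all pagination combinations)

A single user's follower list can occupy gigabytes of cache!
100,000 users × 1 GB = 100 TB of cache ← completely infeasible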
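
The arithmetic above can be reproduced with a few lines of Go. This is only a back-of-the-envelope sketch: `listCacheBytes` is a hypothetical helper, and every input is one of the illustrative figures from the text, not a measurement.

```go
package main

import "fmt"

// listCacheBytes estimates the memory needed when every page of every
// pagination variant is cached as a full copy of the user records.
// All inputs are the illustrative numbers from the text, not measurements.
func listCacheBytes(variants, pages, pageSize, bytesPerUser int64) int64 {
	return variants * pages * pageSize * bytesPerUser
}

func main() {
	const (
		variants     = 100  // pageSize/sort combinations (assumed)
		pages        = 1000 // 20,000 followers at 20 per page
		pageSize     = 20
		bytesPerUser = 500
	)
	firstPages := listCacheBytes(variants, 1, pageSize, bytesPerUser)
	allPages := listCacheBytes(variants, pages, pageSize, bytesPerUser)
	fmt.Printf("page-1 variants only: %d bytes\n", firstPages)
	fmt.Printf("all pages, one user:  %d bytes\n", allPages)
	fmt.Printf("100,000 users:        %d bytes\n", allPages*100_000)
}
```

Changing any single factor (for example, more sort orders) scales the total linearly, which is why the cost is hard to bound in advance.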

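Memory cost comparison:

Approach	1 user queried	10,000 users queried	100,000 users queried
Cached lists	~1 GB	10 TB	100 TB ❌
Cached entities (ID index)	~10 MB	~10 MB	~10 MB ✓
Cached entities (shared User objects)	Depends on the total number of users in the system, not on how many users are queried

Key difference:

Cached lists: memory grows linearly with the number of users whose lists are queried (100,000 users queried = 100 TB)
Cached entities: memory depends only on the total number of entities in the system, not on how many users are queried
- Assume the system has 1 million user entities of 500 bytes each
- Total memory = 1,000,000 × 500 bytes = 500 MB (fixed)
- Whether 1 person or 100,000 people query, the User entities are shared

Entity sharing example:

The data for User#5 is stored in Redis exactly once:
user:5 → {"id":5, "name":"Alice", ...}  (500 bytes)

Yet it is referenced by N lists:
- follower_ids:123 → [..., 5, ...]  (User A's followers)
- follower_ids:456 → [..., 5, ...]  (User B's followers)
- following_ids:789 → [..., 5, ...]  (User C's following list)
- recommend:999 → [..., 5, ...]     (recommendation list)

→ One copy of the data, reused everywhere!

This is the core advantage of entity caching: memory usage is predictable and controllable.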

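Problem 2: data consistency cannot be guaranteed (even more fatal)

This is the more serious problem: when data changes, you do not know which cache keys to invalidate.

Scenario: User#5 changes their nickname

Caches that need to be cleared (can you find them all?):
followers:123:1:20     ← is User#5 in here?
followers:123:1:50     ← is User#5 in here?
followers:123:2:20     ← is User#5 in here?
followers:456:1:20     ← is User#5 a follower of 456?
following:789:3:50     ← does 789 follow User#5?
recommend_users:100:1:20  ← is User#5 in the recommendations?
search_result:query1:1:20 ← is User#5 in the search results?
... hundreds or thousands of lists that might contain User#5

You cannot know exactly which cached lists contain User#5!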

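Three flawed "fixes":

// Option A: flush everything (❌ cache avalanche)
cache.FlushAll()  // every cache entry for every user is gone; the database is overwhelmed instantly

// Option B: pattern deletion (❌ slow and unsafe)
cache.DeletePattern("followers:*")  // the KEYS command blocks Redis
cache.DeletePattern("following:*")
cache.DeletePattern("recommend_*")
// 1. KEYS must not be used in production (it blocks all other operations while it scans)
// 2. Some caches are still missed (e.g. search_result)

// Option C: maintain a reverse index (❌ complexity explosion)
// Keep a mapping: user:5 → [every cache key that contains user 5]
cache.Set("user:5:in_lists", [
    "followers:123:1:20",
    "followers:123:2:20",
    ...  // this list must be maintained in real time: an engineering nightmare
])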

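A real-world case:

An e-commerce site caches its product lists:
- Homepage recommendations: cached
- Category listings: cached
- Search results: cached
- Promotions: cached

After a product's price is changed:
- The search-result cache is not cleared → users searching see the old price
- The promotion-page cache is not cleared → the price differs at checkout
- Complaints, refunds, lost business

In the end the team has to:
1. Refresh the whole cache on a schedule (wasting resources)
2. Shorten the TTL to 5 minutes (sharply reducing the cache's benefit)
3. Accept inconsistent data (hurting the user experience)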

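Why does caching entities solve this?

When User#5 changes their nickname:
Delete only: user:5  ← a single key!

Every list that references User#5:
- follower_ids:123 → stores an ID list [1,2,3,5,...], so nothing needs clearing
- The next query automatically picks up the latest user:5 data

Consistency is preserved and the maintenance cost is minimal.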
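
To make the single-key invalidation concrete, here is a minimal in-memory sketch. `Store` and `Resolve` are hypothetical names standing in for Redis and the application-layer assembly; in a real deployment you would delete `user:5` and let the next read reload it, whereas this sketch simply overwrites the map entry to stay short.

```go
package main

import "fmt"

type User struct {
	ID   int64
	Name string
}

// Store is an in-memory stand-in for Redis: ID lists plus one entry per user.
type Store struct {
	lists map[string][]int64 // e.g. "follower_ids:123" -> [1, 5]
	users map[int64]User     // e.g. user 5 -> {5, "Alice"}
}

// Resolve assembles a list by looking up each ID in the shared entity map.
func (s *Store) Resolve(listKey string) []User {
	out := make([]User, 0, len(s.lists[listKey]))
	for _, id := range s.lists[listKey] {
		if u, ok := s.users[id]; ok {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	s := &Store{
		lists: map[string][]int64{
			"follower_ids:123": {1, 5},
			"follower_ids:456": {5, 7},
		},
		users: map[int64]User{1: {1, "Bob"}, 5: {5, "Alice"}, 7: {7, "Carol"}},
	}

	// User#5 changes their nickname: exactly one entry is touched,
	// and no list has to be rewritten.
	s.users[5] = User{5, "Alicia"}

	fmt.Println(s.Resolve("follower_ids:123"))
	fmt.Println(s.Resolve("follower_ids:456"))
}
```

Both lists return the new nickname because they hold only IDs; the user record itself lives in one place.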

1.3 Diagram: cache key explosion

Follower list of user 123 (20,000 followers)

Wrong cache layout:
┌─────────────────────────────────────────┐
│  Cache                                  │
├─────────────────────────────────────────┤
│  followers:123:1:20  → [User1..User20]  │  ← stores User1-20
│  followers:123:1:50  → [User1..User50]  │  ← stores User1-20 again
│  followers:123:2:20  → [User21..User40] │
│  followers:123:1:100 → [User1..User100] │  ← stores User1-20 again
│  ... hundreds of similar keys           │
└─────────────────────────────────────────┘

When User1 updates their profile, every cache key that contains User1 must be cleared!

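2. The Right Caching Strategy

2.1 Principle: cache entities, not query results

The right approach is to cache entities at the finest granularity and assemble results in the application layer.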

// ✅ Right approach
type FollowerService struct {
    cache Cache
    db    Database
}

// Cache the follower ID list (the index) as a Redis List
func (s *FollowerService) cacheFollowerIDs(userID int64) error {
    cacheKey := fmt.Sprintf("follower_ids:%d", userID)

    // Query all follower IDs
    ids := s.db.Query("SELECT follower_id FROM follows WHERE user_id = ? ORDER BY created_at DESC", userID)

    // Store as a Redis List (not JSON)
    s.cache.Del(cacheKey)
    s.cache.RPush(cacheKey, ids...)  // bulk insert
    s.cache.Expire(cacheKey, 3600)
    return nil
}

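// Fetch the follower IDs in the given range directly
func (s *FollowerService) getFollowerIDRange(userID int64, start, end int) ([]int64, error) {
    cacheKey := fmt.Sprintf("follower_ids:%d", userID)

    // LRANGE returns only the IDs in the range (no need to load the whole list!)
    ids := s.cache.LRange(cacheKey, start, end)

    if len(ids) == 0 {
        // Treated as a cache miss: build the index, then retry.
        // Caveat: a page past the end of the list, or a user with no followers,
        // also yields an empty result and triggers a rebuild on every such request.
        s.cacheFollowerIDs(userID)
        ids = s.cache.LRange(cacheKey, start, end)
    }

    return ids, nil
}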

// Fetch users in bulk (through the cache)
func (s *FollowerService) getUsersByIDs(ids []int64) ([]User, error) {
    if len(ids) == 0 {
        return []User{}, nil
    }
    
    // 1. Build all cache keys
    cacheKeys := make([]string, len(ids))
    for i, id := range ids {
        cacheKeys[i] = fmt.Sprintf("user:%d", id)
    }
    
    // 2. Fetch from the cache with MGET (a single network round trip!)
    cachedValues := s.cache.MGet(cacheKeys...)
    
    // 3. Parse the cached values and collect the IDs that missed
    cached := make(map[int64]User)
    missingIDs := make([]int64, 0)
    
    for i, val := range cachedValues {
        if val != nil {
            var user User
            json.Unmarshal([]byte(val.(string)), &user)
            cached[ids[i]] = user
        } else {
            missingIDs = append(missingIDs, ids[i])
        }
    }
    
    // 4. Query the database once for all users that missed the cache
    if len(missingIDs) > 0 {
        dbUsers := s.db.Query("SELECT * FROM users WHERE id IN (?)", missingIDs)
        
        // Write back to the cache in bulk (using a Pipeline)
        pipe := s.cache.Pipeline()
        for _, user := range dbUsers {
            cached[user.ID] = user
            data, _ := json.Marshal(user)
            pipe.Set(ctx, fmt.Sprintf("user:%d", user.ID), data, 3600*time.Second)
        }
        pipe.Exec(ctx)  // run all queued SETs at once
    }
    
    // 5. Return the results in the original ID order
    users := make([]User, 0, len(ids))
    for _, id := range ids {
        if user, ok := cached[id]; ok {
            users = append(users, user)
        }
    }
    
    return users, nil
}

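// Paginated follower lookup (assembled in the application layer)
func (s *FollowerService) GetFollowers(userID int64, page, pageSize int) ([]User, error) {
    // 1. Compute the range (page is zero-based here; LRANGE's end index is inclusive)
    start := page * pageSize
    end := start + pageSize - 1

    // 2. Fetch only this page's follower IDs (LRANGE, without loading everything!)
    pageIDs, err := s.getFollowerIDRange(userID, start, end)
    if err != nil {
        return nil, err
    }

    // 3. Fetch the user records in bulk (served from the entity cache)
    return s.getUsersByIDs(pageIDs)
}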

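2.2 Performance trap: the hidden cost of JSON deserialization

While implementing the index + entity scheme, we ran into a serious performance problem.

❌ Flawed implementation (first version):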

// Store the index as JSON
func cacheIndex(userID string, ids []string) {
    data, _ := json.Marshal(ids)  // 10,000 IDs serialized into one JSON value
    redis.Set("follower_ids:"+userID, data)
}

func getFollowers(userID string, page, size int) {
    // Every request does this:
    data := redis.Get("follower_ids:" + userID)
    var allIDs []string
    json.Unmarshal(data, &allIDs)  // ← deserializes all 10,000 IDs!

    // Then slices
    start := page * size
    pageIDs := allIDs[start:start+size]

    return getUsersByIDs(pageIDs)
}

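Benchmark result:

Average latency: 5.0ms  ← far too slow!

Analysis:

Each request for page 1 (20 users):
1. Redis GET: a JSON payload of several hundred KB → 100µs
2. JSON-decode 10,000 IDs → 4ms ❌ the bottleneck!
3. Slice out 20 IDs in the application → 1µs
4. Redis MGET for 20 users → 300µs
Total: about 4.4ms of the measured 5.0ms average

Only 20 IDs are needed, yet 10,000 are deserialized!

✅ Correct implementation (optimized):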

// Store the index as a Redis List
func cacheIndex(userID string, ids []string) {
    redis.Del("follower_ids:" + userID)
    redis.RPush("follower_ids:"+userID, ids...)  // stored as a List
}

func getFollowers(userID string, page, size int) {
    // Fetch just the range that is needed
    start := page * size
    end := start + size - 1

    pageIDs := redis.LRange("follower_ids:"+userID, start, end)  // ← only 20 IDs come back!

    return getUsersByIDs(pageIDs)
}

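Performance after the optimization:

Average latency: 518µs  ← roughly 10× faster!

Why is it faster?

Request for page 1 (20 users):
1. Redis LRANGE 0-19: returns only 20 IDs → 100µs ✓
2. No deserialization of the index → 0µs ✓
3. Redis MGET for 20 users → 300µs
4. Assembly in the application → ~118µs
Total: 518µs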

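Lessons learned:

Scenario	Wrong approach	Right approach	Performance gap
Large array index	Serialize everything as JSON	Redis List + LRANGE	~10×
Ordered collection	JSON array	Redis ZSet + ZRANGE	Similar
Unordered collection	JSON array	Redis Set + SRANDMEMBER	Similar

Core principles:

❌ Do not store large arrays or collections as JSON
✅ Prefer Redis's native data structures (List/Set/ZSet)
✅ Fetch only what you need instead of loading everything and then slicing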

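Redis vs Memcached:

This case also shows why Redis suits modern applications better than Memcached:

Feature	Memcached	Redis
Data structures	String only (values must be serialized)	List/Set/ZSet/Hash/String
Range queries	❌ Must read everything, then slice	✓ LRANGE/ZRANGE fetch the range directly
Memory efficiency	Low (repeated serialization)	High (native data structures)
Performance	Slow (JSON deserialization overhead)	Fast (no deserialization needed)

The Memcached dilemma:

// With Memcached this is the only option:
data := memcache.Get("follower_ids:123")
var allIDs []int64
json.Unmarshal(data, &allIDs)  // ← everything must be deserialized!
pageIDs := allIDs[start:end]   // ← then sliced

The Redis advantage:

// Redis can fetch the range directly
pageIDs := redis.LRange("follower_ids:123", start, end)  // ← done in one step!

Conclusion: for lists, sets, leaderboards, and similar use cases, prefer Redis.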

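2.3 Key optimization: batch lookups with MGET

❌ Common mistake: querying Redis in a loop

// Wrong: N network round trips
for _, id := range userIDs {
    key := fmt.Sprintf("user:%d", id)
    user := cache.Get(key)  // every call is a network request!
    users = append(users, user)
}

Performance impact:

Assume 100 users are fetched and each Redis round trip takes 0.5ms
Total: 100 × 0.5ms = 50ms
That may be tolerable against a local Redis, but round trips across data centers are much slower

✅ Right approach: use MGET

// Fetch everything in one network round trip
keys := []string{"user:1", "user:2", ..., "user:100"}
values := cache.MGet(keys...)  // a single network request!
// Total: 1 × 0.5ms = 0.5ms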
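Performance comparison:

Query method	Round trips	Latency (100 keys)	When to use
GET in a loop	N	~50ms (local), ~500ms (cross-data-center)	❌ Never
MGET	1	~0.5ms (local), ~5ms (cross-data-center)	✅ The default for batch lookups

The difference in practice:

Fetching 20 followers:
GET in a loop:  20 × 0.5ms = 10ms
MGET:           1 × 0.5ms  = 0.5ms   ← 20× faster!

Fetching 100 recommended users (cross-data-center):
GET in a loop:  100 × 5ms = 500ms   ← risk of request timeouts
MGET:           1 × 5ms   = 5ms     ← acceptable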
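
The round-trip arithmetic can be checked with a small counting fake. `countingCache` is a hypothetical in-memory type that only counts calls; it does not talk to Redis, and the 0.5ms figure is the assumption from the text.

```go
package main

import "fmt"

// countingCache is a hypothetical in-memory cache that counts how many
// calls (simulated network round trips) the caller makes.
type countingCache struct {
	data       map[string]string
	roundTrips int
}

// Get fetches one key: one round trip per call.
func (c *countingCache) Get(key string) (string, bool) {
	c.roundTrips++
	v, ok := c.data[key]
	return v, ok
}

// MGet fetches all keys in a single round trip; missing keys yield "".
func (c *countingCache) MGet(keys ...string) []string {
	c.roundTrips++
	out := make([]string, len(keys))
	for i, k := range keys {
		out[i] = c.data[k]
	}
	return out
}

func main() {
	c := &countingCache{data: map[string]string{}}
	keys := make([]string, 100)
	for i := range keys {
		keys[i] = fmt.Sprintf("user:%d", i+1)
		c.data[keys[i]] = "{}"
	}

	for _, k := range keys {
		c.Get(k)
	}
	loopTrips := c.roundTrips

	c.roundTrips = 0
	c.MGet(keys...)

	const perTripMs = 0.5 // assumed round-trip time from the text
	fmt.Printf("loop GET: %d trips (~%.1fms)\n", loopTrips, float64(loopTrips)*perTripMs)
	fmt.Printf("MGET:     %d trip  (~%.1fms)\n", c.roundTrips, float64(c.roundTrips)*perTripMs)
}
```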

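Important notes:

Redis MGET / MSET each execute as a single atomic command (in Redis Cluster, all keys of one call must hash to the same slot)
Memcached clients offer a multi-get call (GetMulti in gomemcache)
A Pipeline also cuts round trips, though it is less convenient than MGET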

2.4 The optimized cache layout

Correct cache layout:
┌──────────────────────────────────────┐
│  Cache                               │
├──────────────────────────────────────┤
│  follower_ids:123 → [1,2,3,...20000]│  ← the ID list is stored only once
│                                      │
│  user:1 → {id:1, name:"Alice", ...} │  ← entity-level cache
│  user:2 → {id:2, name:"Bob", ...}   │
│  user:3 → {id:3, name:"Charlie",...}│
│  ...                                 │
└──────────────────────────────────────┘

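Advantages:
1. Every pagination variant reuses the same ID index
2. A user entity is a cache hit wherever it is used (following lists, recommendations, search)
3. When User1 updates their profile, only the single key user:1 has to be deleted

2.5 Diagram: request flow compared

Wrong approach (caching paginated results directly):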
┌──────┐    ┌───────┐    ┌──────────┐
│Client│───→│ Cache │───→│ Database │
└──────┘    └───────┘    └──────────┘
   │            │              │
   │ page=1,20  │              │
   ├───────────→│ MISS         │
   │            ├─────────────→│ SELECT * ... LIMIT 20 OFFSET 0
   │            │←─────────────┤
   │←───────────┤              │
   │            │              │
   │ page=1,50  │              │
   ├───────────→│ MISS ❌      │
   │            ├─────────────→│ SELECT * ... LIMIT 50 OFFSET 0
                     (the same data is queried again)


Right approach (index + entities):
┌──────┐    ┌───────┐    ┌──────────┐
│Client│───→│ Cache │───→│ Database │
└──────┘    └───────┘    └──────────┘
   │            │              │
   │ page=1,20  │              │
   ├───────────→│ fetch ID list│
   │            │ (1 query)    │
   │            ├─────────────→│ SELECT id FROM follows
   │            │←─────────────┤
   │            │              │
   │            │ batch fetch  │
   │            │ (20 entities)│
   │            │ HIT: 18 ✓    │
   │            ├─────────────→│ SELECT * WHERE id IN (2 ids)
   │            │←─────────────┤
   │←───────────┤              │
   │            │              │
   │ page=1,50  │              │
   ├───────────→│ fetch ID list│
   │            │ HIT ✓        │ (index reused)
   │            │ batch fetch  │
   │            │ HIT: 48 ✓    │ (previously cached entities reused)
   │            ├─────────────→│ SELECT * WHERE id IN (2 ids)

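3. Performance Benchmarks

To verify how the caching strategies differ in performance, we built a real benchmark.

3.1 Test scenario

Core design: simulate several list scenarios to show the value of entity reuse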

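Data size: 20,000 users; 3 test users with 10,000 followers each
Overlap: 50% of followers are shared between test users (mimicking a real social network)
- User1's followers: users 0-9,999
- User2's followers: users 5,000-14,999 (5,000 shared with User1)
- User3's followers: users 7,500-17,499 (by these ranges, 7,500 shared with User2 and 2,500 with User1)
Load: 9,000 mixed requests (3,000 per user)
Request distribution:
- 72% page-1 requests (simulating hot data)
- 28% deep-pagination requests (pages 2-122)
- Page size chosen at random: 20, 40, or 60 rows
Environment: PostgreSQL 18 (local Docker) + Redis 7 (local Docker)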

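Why this design?

In real systems, the same user appears in many different lists:

Zhang San's follower list
Li Si's follower list
The recommended-users list
The search-results list

Strategy 2 (naive cache): every list caches its own full copy of the user data → duplicated N times
Strategy 3 (optimized cache): each user entity is stored once and shared by all lists → zero duplication

By using the follower lists of 3 users (50% overlap), the benchmark simulates this kind of data reuse.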

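3.2 The three strategies

Strategy 1: no cache (baseline)

Query the database directly, running a JOIN on every request.

Strategy 2: naive cache (the wrong approach)

Cache whole pages of results. Key format: followers:uid:page:size

Strategy 3: optimized cache (the right approach)

Cache the follower ID index plus user entities, and assemble pages in the application layer.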

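3.3 Results

📊 Real-environment test (PostgreSQL + Redis):

Environment:

Database: PostgreSQL 18 (Docker)
Cache: Redis 7 (Docker)
Test data: 20,000 users, 3 test users (simulating several list scenarios)
Load: 9,000 mixed requests

Strategy 1: no cache
  Average latency: 4.3ms
  P95 latency: 5.2ms
  P99 latency: 6.0ms
  Database queries: 9,000 (one full JOIN per request)
  Redis memory usage: 1.2 MB (baseline)
  Cache keys: 1

Strategy 2: naive cache (whole pages cached)
  Average latency: 198µs    ← 22× faster than no cache!
  P95 latency: 253µs
  P99 latency: 306µs
  Database queries: 1,008 (89% cache hit rate)
  Redis memory usage: 6.0 MB  ← measured
  Cache keys: 1,008 list pages

Strategy 3: optimized cache (index + entities)
  Average latency: 518µs   ← 2.6× slower than the naive cache, but reasonable
  P95 latency: 637µs
  P99 latency: 731µs
  Index loads: 3 (one per test user)
  Bulk user queries: 323
  Database queries: 326
  Redis memory usage: 6.0 MB  ← same as strategy 2
  Cache keys: 14,643 (3 indexes + about 14,640 user entities)

  What makes it fast:
  - The index is stored as a Redis List (not JSON)
  - LRANGE fetches only the IDs in the requested range
  - The cost of deserializing the full index is avoided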

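Key findings:

Strategy	Avg latency	P99 latency	Memory	Cache keys	Best for
No cache	4.3ms	6.0ms	1.2 MB	1	-
Naive cache	198µs ✓	306µs ✓	6.0 MB	1,008	Concentrated hot spots
Optimized cache	518µs	731µs	6.0 MB	14,643	General use ✓

Why is memory usage the same for both (6.0 MB)?

In this test scenario there is a lot of deep pagination across several user lists:

Strategy 2: caches 1,008 list pages, which contain duplicated user data
Strategy 3: caches 14,643 keys (3 indexes + about 14,640 user entities)

Strategy 3 holds more keys, but because entities are deduplicated its total memory usage matches strategy 2.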

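3.4 Analysis

Why is strategy 3 2.6× slower than strategy 2?

Strategy 2 (naive cache) performs:
1. Redis GET of a complete page (20-60 users) → ~200µs

Strategy 3 (optimized cache) performs:
1. Redis LRANGE for the ID range → ~100µs
2. Redis MGET for the users → ~300µs
3. Assembly in the application layer → ~118µs
Total: 518µs (a 2.6× overhead is reasonable)

The extra cost comes mainly from:
- One additional Redis call (LRANGE for the index)
- Assembling the data in the application (restoring ID order)

The core advantage of strategy 3: entity reuse

In this test, the followers of the 3 users overlap by 50%:

Strategy 2: caches 1,008 list pages containing a large amount of duplicated user data
Strategy 3: caches about 14,640 user entities, but each user is stored only once

Side-by-side comparison:

Dimension	Naive cache	Optimized cache	Notes
Latency	198µs ✓	518µs	One extra Redis call (LRANGE) plus assembly
Memory	6.0 MB	6.0 MB	Currently the same, but…
Entity reuse	Duplicated ❌	Zero duplication ✓	The same User is shared across lists
Consistency	No precise invalidation ❌	Delete 1 key ✓
Extensibility	New lists rebuild everything ❌	Reuses existing entities ✓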

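Real-world projection: multiple list types

Suppose the system must support 5 different lists:

A user's follower list
A user's following list
The recommended-users list
The search-results list
The nearby-people list

Memory blow-up under strategy 2:

List types × pagination combinations × users = number of cache keys

5 list types × 200 pagination combinations × 100,000 users = 100 million cache keys
Memory usage: unbounded, possibly reaching terabytes

Controlled memory under strategy 3:

Total user entities (fixed) + list indexes

1,000,000 users × 500 bytes = 500 MB (user entities, shared)
100,000 indexes × 5 KB = 500 MB (indexes for the various lists)
Total: 1 GB (predictable)

Why is strategy 3 the better choice in production?

Entity reuse: the same User is stored once no matter how many lists contain it
Controlled memory: it depends on the number of entities, not on the number of lists
Easy maintenance: updating a User deletes 1 key, and every list picks up the change
Extensible: new list types reuse the existing entity cache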

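3.5 How to choose a caching strategy

Decision tree:

Does your system have several list types? (followers, following, recommendations, search...)
  ├─ Yes → strategy 3 (optimized cache) ✓ required
  │      Reason: avoids duplicated entities, keeps memory cost under control
  │
  └─ No: only one list type
       │
       └─ Is the data updated frequently? (profile edits, relationships added or removed)
            ├─ Yes → strategy 3 (optimized cache) ✓ required
            │      Reason: consistency is easy to maintain
            │
            └─ No: the data hardly ever changes
                 └─ Strategy 2 (naive cache) is acceptable,
                    but strategy 3 is still recommended in the long run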
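
The decision tree collapses to a single condition, which the following sketch spells out. `chooseStrategy` and its two boolean inputs are hypothetical names used only for illustration.

```go
package main

import "fmt"

// chooseStrategy encodes the decision tree: strategy 3 unless there is a
// single list type whose data almost never changes.
func chooseStrategy(multipleListTypes, frequentUpdates bool) string {
	if multipleListTypes || frequentUpdates {
		return "strategy 3 (index + entity cache)"
	}
	return "strategy 2 acceptable (strategy 3 still advised long term)"
}

func main() {
	fmt.Println(chooseStrategy(true, false))  // several list types
	fmt.Println(chooseStrategy(false, true))  // one list, frequent updates
	fmt.Println(chooseStrategy(false, false)) // one list, near-static data
}
```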

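Recommendation: about 95% of production scenarios should use strategy 3

The only scenario that suits strategy 2 (all of the following hold):

There is a single, fixed list type
The data hardly ever changes (e.g. a static leaderboard)
Users only look at page 1 (e.g. homepage recommendations)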

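4. Cache Penetration and Cache Avalanche

4.1 Cache Penetration

Definition: requests for data that does not exist in the database, so every request falls through to the database.

Scenario: an attacker deliberately queries user IDs that do not exist.

// Problematic code
func GetUser(id int64) (*User, error) {
    cacheKey := fmt.Sprintf("user:%d", id)
    if cached := cache.Get(cacheKey); cached != nil {
        return cached.(*User), nil
    }

    user := db.QueryOne("SELECT * FROM users WHERE id = ?", id)
    if user == nil {
        return nil, nil  // ❌ nonexistent users are never cached
    }

    cache.Set(cacheKey, user, 3600)
    return user, nil
}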

Solution: cache null results

// ✅ Cache the empty result, but with a shorter expiry
const NULL_CACHE_MARKER = "__NULL__"  // null marker

func GetUser(id int64) (*User, error) {
    cacheKey := fmt.Sprintf("user:%d", id)
    cached, err := cache.Get(cacheKey).Result()

    if err == nil {
        // Cache hit
        if cached == NULL_CACHE_MARKER {
            return nil, nil  // known not to exist
        }
        var user User
        json.Unmarshal([]byte(cached), &user)
        return &user, nil
    }

    // Cache miss: query the database
    user := db.QueryOne("SELECT * FROM users WHERE id = ?", id)
    if user == nil {
        // Cache the null result for 60 seconds so nonexistent data is not queried repeatedly
        cache.Set(cacheKey, NULL_CACHE_MARKER, 60*time.Second)
        return nil, nil
    }

    // Cache normal data for 1 hour
    data, _ := json.Marshal(user)
    cache.Set(cacheKey, data, 3600*time.Second)
    return user, nil
}

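Key points:

Use an explicit marker (such as __NULL__) rather than a bare nil, so a cached "does not exist" cannot be confused with a cache miss
Give null entries a shorter TTL than normal data (60 seconds vs 3,600 seconds), so a record created later does not keep reading as missing for long
Optionally use a dedicated key prefix for null entries, such as null:user:123, which makes them easier to monitor and purge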

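Other defenses:

Bloom filter: quickly tells whether an ID might exist
Parameter validation: reject obviously invalid IDs (negative numbers, absurdly long numbers)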
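
For readers who have not met a Bloom filter, here is a minimal sketch. It is an illustration under stated assumptions, not a production implementation: the `Bloom` type, the double-hashing scheme built on FNV, and the sizes chosen in `main` are all choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a minimal Bloom filter: k bit positions per key in an m-bit array.
// It can report false positives but never false negatives.
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per key
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k indexes from two FNV hashes (double hashing).
func (b *Bloom) positions(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	base, step := h1.Sum64(), h2.Sum64()|1 // force an odd, non-zero step
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (base + i*step) % b.m
	}
	return out
}

func (b *Bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// MightContain returns false only if the key was definitely never added.
func (b *Bloom) MightContain(key string) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBloom(1<<16, 4) // sizes picked for the example only
	for id := 1; id <= 1000; id++ {
		bf.Add(fmt.Sprintf("user:%d", id))
	}
	fmt.Println(bf.MightContain("user:42"))     // true: it was added
	fmt.Println(bf.MightContain("user:999999")) // usually false; true would be a false positive
}
```

A request whose ID fails `MightContain` can be rejected before it touches the cache or the database; an ID that passes still needs the normal lookup.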

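4.2 Cache Avalanche

Definition: a large number of cache entries expire at the same moment, and the resulting requests hit the database all at once.

Scenario 1: entries expire together

// ❌ All follower indexes expire at the same time
cache.Set("follower_ids:123", ids, 3600)
cache.Set("follower_ids:456", ids, 3600)
cache.Set("follower_ids:789", ids, 3600)

Solution: randomized expiry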

// ✅ Add a random offset
import "math/rand"

ttl := 3600 + rand.Intn(600)  // between 3600 and 4199 seconds
cache.Set("follower_ids:123", ids, ttl)

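Scenario 2: the cache service goes down

Solutions:

Multi-level caching: local cache + Redis
Circuit breaking and degradation: when the cache fails, return default data or fall back to a degraded query
Rate limiting: throttle database queries so the DB is not overwhelmed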

// Multi-level cache example
func GetUser(id int64) (*User, error) {
    // L1: local cache (in-process memory)
    if user := localCache.Get(id); user != nil {
        return user, nil
    }
    
    // L2: Redis
    if user := redis.Get(fmt.Sprintf("user:%d", id)); user != nil {
        localCache.Set(id, user, 60)  // backfill the local cache
        return user, nil
    }
    
    // L3: database (protected by a rate limiter)
    if !rateLimiter.Allow() {
        return nil, errors.New("too many requests")
    }
    
    user := db.QueryOne("SELECT * FROM users WHERE id = ?", id)
    if user != nil {
        redis.Set(fmt.Sprintf("user:%d", id), user, 3600)
        localCache.Set(id, user, 60)
    }
    
    return user, nil
}

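5. Implementation Walkthrough

The project includes the complete benchmark code. The full source is available in this repository:

📦 Source repository: https://github.com/d60-Lab/RelationGraph

5.0 How to run the benchmark

Prerequisites:

Go 1.21+
Docker (to run PostgreSQL and Redis)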
Step 1: clone the repository

git clone https://github.com/d60-Lab/RelationGraph.git
cd RelationGraph

Step 2: start PostgreSQL and Redis

# Start the services with docker-compose (custom ports avoid conflicts)
docker-compose up -d postgres redis

# Wait for the services to start (about 5 seconds)
sleep 5

# Check service status
docker-compose ps
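Step 3: run the benchmark

# Run the benchmark (creates 20,000 test users automatically and issues 6,000 requests)
go run cmd/cachebench/main.go

Expected output (a sample from a 6,000-request run; its numbers differ from the 9,000-request results in Section 3):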
Setting up test data...
Test data ready.
  Running benchmark... done
  Warming cache... done
  Running benchmark... done
  Warming cache... done
  Running benchmark... done

Follower list latency (6k req, mixed pages, PostgreSQL + Redis)
No cache           avg=10.7ms p95=22.5ms p99=25.2ms ...
Naive list cache   avg=191µs p95=244µs p99=317µs ...
Optimized cache    avg=9.5ms p95=10.0ms p99=10.4ms ...

Step 4: clean up (optional)

# Stop and remove the containers
docker-compose down -v

Environment variables (optional):

# Custom database connection
export DATABASE_URL="host=localhost user=postgres password=postgres dbname=postgres port=5434 sslmode=disable"

# Custom Redis connection
export REDIS_ADDR="localhost:6380"

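5.1 Core implementation excerpts

// internal/cacheperf/followers.go

// Strategy 3: optimized cache implementation
// Note: this excerpt stores the index as one JSON string and decodes it in full,
// which is the first version that Section 2.2 replaces with a Redis List + LRANGE.
func (s *FollowerService) FetchFollowersOptimized(userID, page, pageSize int64) ([]FollowerSnapshot, error) {
    // 1. Load the follower ID index (from the cache)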
    indexKey := fmt.Sprintf("follower_ids:%d", userID)
    var allIDs []int64
    
    if val := s.cache.Get(s.ctx, indexKey).Val(); val != "" {
        json.Unmarshal([]byte(val), &allIDs)
    } else {
        // Cache miss: query the database
        rows, _ := s.db.Query("SELECT follower_id FROM follows WHERE user_id = ?", userID)
        for rows.Next() {
            var id int64
            rows.Scan(&id)
            allIDs = append(allIDs, id)
        }
        
        data, _ := json.Marshal(allIDs)
        s.cache.Set(s.ctx, indexKey, data, 10*time.Minute)
        atomic.AddInt64(&s.indexLoads, 1)
    }
    
    // 2. Compute the page range
    start := page * pageSize
    end := start + pageSize
    if start >= int64(len(allIDs)) {
        return []FollowerSnapshot{}, nil
    }
    if end > int64(len(allIDs)) {
        end = int64(len(allIDs))
    }
    pageIDs := allIDs[start:end]
    
    // 3. Fetch the user entities in bulk (through the cache)
    return s.fetchUsersByIDs(pageIDs)
}

func (s *FollowerService) fetchUsersByIDs(ids []int64) ([]FollowerSnapshot, error) {
    if len(ids) == 0 {
        return []FollowerSnapshot{}, nil
    }
    
    // 1. Build all cache keys
    cacheKeys := make([]string, len(ids))
    for i, id := range ids {
        cacheKeys[i] = fmt.Sprintf("user:%d", id)
    }
    
    // 2. Fetch all cached values in one MGET (the key optimization!)
    cachedVals, _ := s.cache.MGet(s.ctx, cacheKeys...).Result()
    
    // 3. Parse the cached values and collect the IDs that missed
    cached := make(map[int64]FollowerSnapshot)
    missingIDs := make([]int64, 0)
    
    for i, val := range cachedVals {
        if val != nil {
            var snapshot FollowerSnapshot
            if str, ok := val.(string); ok && str != "" {
                json.Unmarshal([]byte(str), &snapshot)
                cached[ids[i]] = snapshot
            }
        } else {
            missingIDs = append(missingIDs, ids[i])
        }
    }
    
    // 4. Query the database once for all users that missed the cache
    if len(missingIDs) > 0 {
        atomic.AddInt64(&s.userBulkLoad, 1)
        
        query := "SELECT id, username, bio FROM users WHERE id IN (" +
                 placeholders(len(missingIDs)) + ")"
        rows, _ := s.db.Query(query, toInterfaceSlice(missingIDs)...)
        
        for rows.Next() {
            var snapshot FollowerSnapshot
            rows.Scan(&snapshot.ID, &snapshot.Username, &snapshot.Bio)
            cached[snapshot.ID] = snapshot
            
            // Write back to the cache (this can be batched further with a Pipeline, see 5.2)
            data, _ := json.Marshal(snapshot)
            s.cache.Set(s.ctx, fmt.Sprintf("user:%d", snapshot.ID), data, 10*time.Minute)
        }
    }
    
    // 5. Return the results in the original ID order (preserving order matters!)
    results := make([]FollowerSnapshot, 0, len(ids))
    for _, id := range ids {
        if snap, ok := cached[id]; ok {
            results = append(results, snap)
        }
    }
    
    return results, nil
}
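
The excerpt calls `placeholders` and `toInterfaceSlice` without showing them. The versions below are a guess at what such helpers could look like, not the repository's code. They emit `?` markers to match the excerpt; PostgreSQL drivers such as lib/pq and pgx expect numbered placeholders (`$1`, `$2`, ...), so a real implementation against PostgreSQL would need to generate those instead.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns n comma-separated "?" markers, e.g. "?, ?, ?".
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.TrimSuffix(strings.Repeat("?, ", n), ", ")
}

// toInterfaceSlice converts IDs to []interface{} so they can be passed as
// variadic query arguments.
func toInterfaceSlice(ids []int64) []interface{} {
	out := make([]interface{}, len(ids))
	for i, id := range ids {
		out[i] = id
	}
	return out
}

func main() {
	ids := []int64{7, 8, 9}
	query := "SELECT id, username, bio FROM users WHERE id IN (" +
		placeholders(len(ids)) + ")"
	fmt.Println(query)
	fmt.Println(toInterfaceSlice(ids)...)
}
```

Callers should skip the query entirely when the ID slice is empty, since `IN ()` is not valid SQL; the excerpt already guards this with `len(missingIDs) > 0`.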

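5.2 Further optimization: batched cache write-back

The code above still writes back to the cache with one SET per row. When many keys miss, the writes can be batched:

// Better write-back: queue the SETs on a Pipeline
// (MSET cannot attach a TTL to each key, so a Pipeline of SETs is used instead)
if len(missingIDs) > 0 {
    query := "SELECT id, username, bio FROM users WHERE id IN (" +
             placeholders(len(missingIDs)) + ")"
    rows, _ := s.db.Query(query, toInterfaceSlice(missingIDs)...)

    // Collect the writes on a pipeline
    pipe := s.cache.Pipeline()

    for rows.Next() {
        var snapshot FollowerSnapshot
        rows.Scan(&snapshot.ID, &snapshot.Username, &snapshot.Bio)
        cached[snapshot.ID] = snapshot

        data, _ := json.Marshal(snapshot)
        key := fmt.Sprintf("user:%d", snapshot.ID)
        pipe.Set(s.ctx, key, data, 10*time.Minute)  // queued on the pipeline
    }

    // Execute all queued SETs at once (the pipeline cuts network round trips)
    pipe.Exec(s.ctx)
}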

Pipeline vs SET in a loop:

Writing 100 users back to the cache:

SET in a loop:  100 × 0.5ms = 50ms
Pipeline:       1 × 0.5ms   = 0.5ms   ← 100× faster

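5.3 Running the benchmark

# Run the benchmark
go run cmd/cachebench/main.go

# Sample output (illustrative; from a different run than the results in Section 3)
=== Cache performance benchmark ===

[1/3] Strategy: no cache
  Average latency: 9.4ms
  P95: 10.8ms, P99: 11.2ms
  Total queries: 6000

[2/3] Strategy: naive cache
  Average latency: 64.7µs
  P95: 128.3µs, P99: 145.6µs
  Total queries: 360

[3/3] Strategy: optimized cache
  Average latency: 4.1ms
  P95: 4.5ms, P99: 4.7ms
  Index loads: 1, bulk user queries: 176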

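6. Best Practices

6.1 Cache design principles

Cache entities at the finest granularity, not query results
ID index + entity pattern: list-style queries use an ID index; entities are cached independently
❗Batch operations: use MGET/MSET (or a Pipeline); never call GET/SET in a loop
- Redis: MGET keys... / MSET key1 val1 key2 val2 ...
- Memcached: GetMulti([]string{...}); a multi-set call such as SetMulti only where the client library provides one
- Cutting network round trips is the first priority of cache optimization
Sensible TTLs: long for hot data, short for cold data, plus random jitter
Cache null results: prevents cache penetration, but keep the TTL short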

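6.2 Cache update strategy

// Scenario: a user updates their profile
func UpdateUser(user *User) error {
    // 1. Update the database
    if err := db.Update(user); err != nil {
        return err
    }

    // 2. Delete the entity's cache entry (the next read reloads it from the database)
    cache.Delete(fmt.Sprintf("user:%d", user.ID))

    // 3. Do not delete the related indexes!
    // ❌ Wrong: cache.Delete(fmt.Sprintf("followers:*"))
    // ✅ Right: clear an index only when a follow relationship is added or removed

    return nil
}

// Scenario: userID starts following targetID
func Follow(userID, targetID int64) error {
    // In the follows table, user_id is the account being followed and follower_id is the follower
    if err := db.Insert(&Follow{UserID: targetID, FollowerID: userID}); err != nil {
        return err
    }

    // Clear the affected indexes
    cache.Delete(fmt.Sprintf("follower_ids:%d", targetID))  // the target's follower list
    cache.Delete(fmt.Sprintf("following_ids:%d", userID))   // the current user's following list

    return nil
}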

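6.3 Monitoring metrics

In production, always monitor:

Cache hit rate: should stay above 90%
Average latency: under 1ms on a hit, under 50ms on a miss
P99 latency: watch the long-tail requests
Cache memory usage: to prevent OOM
Cache penetration count: abnormal growth may indicate an attack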
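
As a starting point for the first metric, a hit-rate counter can be as small as the sketch below. `CacheStats` is a hypothetical type; it relies on `atomic.Int64`, which requires Go 1.19 or later, and in production the counters would normally be exported to whatever metrics system is already in use.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CacheStats counts hits and misses; it is safe for concurrent use.
type CacheStats struct {
	hits   atomic.Int64
	misses atomic.Int64
}

func (s *CacheStats) Hit()  { s.hits.Add(1) }
func (s *CacheStats) Miss() { s.misses.Add(1) }

// HitRate returns hits / (hits + misses), or 0 before any lookup.
func (s *CacheStats) HitRate() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var stats CacheStats
	// Replay the strategy 2 figures: 9,000 requests, 1,008 of them misses.
	for i := 0; i < 9000; i++ {
		if i < 1008 {
			stats.Miss()
		} else {
			stats.Hit()
		}
	}
	fmt.Printf("hit rate: %.1f%%\n", stats.HitRate()*100)
}
```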

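6.4 When not to cache

Data that requires strong consistency (e.g. transaction amounts)
Data that changes extremely often (e.g. real-time counters)
Very small data sets (a few dozen rows that the database already serves quickly)
Very infrequent queries (a few per hour, so the hit rate would be too low)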

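7. Conclusion

A cache is not just a performance tool; it is also a challenge in resource management and data consistency.

Core principles

Why cache entities rather than query results?

Memory is a finite, expensive resource
- Caching lists makes memory cost explode (gigabytes for one user → terabytes for the system)
- With entity caching, memory usage is controllable and predictable
Data consistency must be maintainable
- Cached lists: on update you cannot tell which keys to clear, so stale data creeps in
- Cached entities: on update you delete only the entity key, and every reference picks up the latest data

Best-practice checklist

❌ Do not cache query results (especially lists, pages, and aggregates)
❌ Do not call GET/SET in a loop (100 round trips can be slower than 1 database query)
✅ Cache entities at the finest granularity (users, products, orders, and other standalone objects)
✅ Store lists as ID indexes (lightweight, easy to invalidate, highly reusable)
✅ Always batch lookups (MGET/MSET/Pipeline)
✅ Defend against penetration and avalanche (null caching, randomized TTLs, multi-level caching)
✅ Monitor the core metrics (hit rate, P99 latency, memory usage)

Remember three constraints

Memory: per gigabyte, Redis (RAM) costs far more than MySQL (disk), 10× or more by our rough estimate, so every byte must be budgeted
Consistency: a cache that cannot be invalidated precisely is a time bomb
Network: cutting round trips is the first priority of performance optimization

We hope this article helps you understand the fundamental trade-offs in cache design and find the right balance between performance, cost, and consistency!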

📦 Full source repository:

GitHub: https://github.com/d60-Lab/RelationGraph
Cache strategy implementation: internal/cacheperf/followers.go
Benchmark program: cmd/cachebench/main.go

Run the benchmark:

git clone https://github.com/d60-Lab/RelationGraph.git
cd RelationGraph
go run cmd/cachebench/main.go