Redis Cluster needs to maintain a slot-to-key mapping. Before 7.0 this was implemented with a radix tree, the same data structure that underpins the Stream type introduced in Redis 5.0. Redis 7.0 abandons that representation and instead cleverly reworks dictEntry to save memory. This post gives a brief introduction; for details see the PR:

Significant memory saving and latency improvements in cluster mode (#9356)

The overall idea is to avoid storing a key name in more than one place: keep everything key-related in the original structure, which eliminates the waste that shows up with large key counts (the radix tree had to store an extra copy of each key).
A `void *metadata[]` field is added to dictEntry.
Before:

```c
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;
```
After:

```c
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket. */
    void *metadata[];           /* An arbitrary number of bytes (starting at a
                                 * pointer-aligned address) of size as defined
                                 * by dictType's dictEntryMetadataBytes(). */
} dictEntry;
```
A slotToKeys array of length 16384 is defined:

```c
typedef struct slotToKeys {
    uint64_t count;             /* Number of keys in the slot. */
    dictEntry *head;            /* The first key-value entry in the slot. */
} slotToKeys;

struct clusterSlotToKeyMapping {
    slotToKeys by_slot[CLUSTER_SLOTS];
};
```
Each slotToKeys chains together the dictEntry objects whose keys hash to the same slot, via the following structure:

```c
typedef struct clusterDictEntryMetadata {
    dictEntry *prev;            /* Prev entry with key in the same slot */
    dictEntry *next;            /* Next entry with key in the same slot */
} clusterDictEntryMetadata;
```
(A diagram illustrating the linked structure appeared here in the original post.)
(1) Adding a key-value pair (db.c)

```c
void dbAdd(redisDb *db, robj *key, robj *val) {
    sds copy = sdsdup(key->ptr);
    dictEntry *de = dictAddRaw(db->dict, copy, NULL);
    serverAssertWithInfo(NULL, key, de != NULL);
    dictSetVal(db->dict, de, val);
    signalKeyAsReady(db, key, val->type);
    if (server.cluster_enabled) slotToKeyAddEntry(de, db);
}
```
(2) Adding the mapping (cluster.h, cluster.c)

```c
void slotToKeyAddEntry(dictEntry *entry, redisDb *db) {
    sds key = entry->key;
    unsigned int hashslot = keyHashSlot(key, sdslen(key));
    slotToKeys *slot_to_keys = &(*db->slots_to_keys).by_slot[hashslot];
    slot_to_keys->count++;

    /* Insert entry before the first element in the list. */
    dictEntry *first = slot_to_keys->head;
    dictEntryNextInSlot(entry) = first;
    if (first != NULL) {
        serverAssert(dictEntryPrevInSlot(first) == NULL);
        dictEntryPrevInSlot(first) = entry;
    }
    serverAssert(dictEntryPrevInSlot(entry) == NULL);
    slot_to_keys->head = entry;
}
```
dictEntryNextInSlot and dictEntryPrevInSlot are macros that access the prev and next fields of the clusterDictEntryMetadata:

```c
#define dictEntryNextInSlot(de) \
    (((clusterDictEntryMetadata *)dictMetadata(de))->next)
#define dictEntryPrevInSlot(de) \
    (((clusterDictEntryMetadata *)dictMetadata(de))->prev)
```
Rather than re-running the tests, I'll quote the numbers from the PR, covering both memory savings and performance changes.
(1) Test data: 1,000,000 keys with prefix key:NNNNNN

```
debug populate 1000000
```

(2) Results

Result | used_memory_human
---|---
Before | 116.37M
After | 93.51M
Diff | 22.86M (20% save)
(3) Conclusion

Effect: in small-value scenarios, about 20% of memory is saved (the longer the key names, the bigger the saving). Across 1,000,000 keys, 22.86M works out to roughly 23 bytes saved per key.
(1) Benchmark command

```
redis-benchmark --cluster -t set,get -n 10000000 --threads 8 -P 100 -q -r $r
```

(2) Benchmark cluster

3 masters, 0 replicas

(3) Benchmark results
 | | unstable | This PR
---|---|---|---
r=1,000,000,000 | SET | 1327140.00 requests per second, p50=3.351 msec | 1990049.75 requests per second, p50=2.239 msec
 | GET | 3619254.50 requests per second, p50=1.159 msec | 3317850.00 requests per second, p50=1.239 msec
r=1,000,000 | SET | 1473839.38 requests per second, p50=2.167 msec | 2094240.75 requests per second, p50=2.167 msec
 | GET | 3620564.75 requests per second, p50=1.263 msec | 3320053.00 requests per second, p50=1.367 msec
r=1,000 | SET | 2844950.25 requests per second, p50=1.623 msec | 2840909.00 requests per second, p50=1.647 msec
 | GET | 4424779.00 requests per second, p50=0.879 msec | 4422822.00 requests per second, p50=0.895 msec
(4) Conclusion

Write throughput improves markedly; reads and updates drop about 5-10%.
The idea behind this optimization is clear and simple (don't store a key name in more than one place; note that the PR also discusses the expires dict). In scenarios with small values and large numbers of keys it achieves a solid memory saving.
2022-02-23 23:26