Current mainstream KV cache optimization techniques (quantization and pruning) suffer from "one-size-fits-all" limitations and cannot fully exploit the fine-grained differences within the KV cache.
Abstract: As the scaling of memory density slows physically, a promising solution is to scale memory logically by enhancing the CPU's memory controller to encode and store data more densely in memory.
The DXE Core allocates buckets for runtime memory types, and each bucket serves allocations of its memory type. The number of buckets for a given memory type should be kept to one to reduce runtime memory ...