HashMap Memory Allocation Diagram

SOSP '25 | DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction

Current mainstream KV cache optimization techniques (quantization and pruning) suffer from "one-size-fits-all" limitations and cannot fully exploit the fine-grained differences within the KV cache.

IEEE

Memory Allocation Under Hardware Compression

Abstract: As the scaling of memory density slows physically, a promising solution is to scale memory logically by enhancing the CPU's memory controller to encode and store data more densely in memory.

GitHub

[Feature]: Flag memory allocation HOBs with RT mem allocations

The DXE Core allocates buckets for runtime memory types, and each bucket serves allocations of its memory type. The number of buckets for a given memory type should be kept to one to reduce runtime memory ...
