Tags: xlite-dev/Awesome-LLM-Inference

v2.6.20

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Update README.md (#152)

Adding Inference-Time Hyper-Scaling with KV Cache Compression

v2.6.19

🔥[SageAttention-3] Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training (#147)

v2.6.18

Flex Attention: a Programming Model for Generating Optimized Attention Kernels (#146)

v2.6.17

Add The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs (#145)

Add The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

v2.6.16

🔥[Triton-distributed] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives (#142)

v2.6.15

Add SeerAttention and SlimAttention Paper (#135)

* Add slim-attention: transform KV-cache to K cache only

Signed-off-by: sven <svenzhang@live.com>

* Add SeerAttention: learnable sparse attention like NSA(deepseek) MoBA

Signed-off-by: sven <svenzhang@live.com>

---------

Signed-off-by: sven <svenzhang@live.com>

v2.6.14

Update Mooncake-v3 paper link (#130)

v2.6.13

🔥[DeepSeek-NSA] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (#119)

v2.6.12

Add Multi-head Latent Attention(MLA) topic (#118)

v2.6.11

🔥🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving (#117)