Open
Labels: documentation (Improvements or additions to documentation)
Roadmap
Functionality
- [New feature] Fine-tune Medusa heads during SFT #36
- [New feature] More sampling schemes #39
- Distill from any model without access to the original training data
- Batched inference
- Fine-grained KV cache management
Integration
Local Deployment
- [New feature] mlc-llm support #33
- [New feature] exllama support #32
- [New feature] llama.cpp support #35
Serving
Research
- [Research] Explore tree sparsity (speed +10%-20%) #34
- Optimize the tree-based attention to reduce additional computation
- Improve the acceptance scheme to generate more diverse sequences