Model Compression - Search News

11d

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

LCLMs compress LLM context before decode — 8.8x faster at 16x compression, beating every KV cache method tested. Open-sourced by NYU and Columbia.

Nature

Context-aware implicit neural representations to compress Earth systems model data

Multiphysics, multiscale climate models, such as the Energy Exascale Earth System Model (E3SM), generate massive volumes of data over extended time periods to support long-term climate analysis. Data ...

Nature

Counterclockwise block-by-block knowledge distillation for neural network compression

Model compression is a technique for transforming large neural network models into smaller ones. Knowledge distillation (KD) is a crucial model compression technique that involves transferring ...
