BERT-Large (345 million parameters) is now faster than the much smaller DistilBERT (66 million parameters) while retaining the accuracy of the full BERT-Large model! We made this possible with Intel Labs by applying cutting-edge sparsification and quantization research from their Prune Once for All paper and running the resulting model in the DeepSparse engine. This makes BERT-Large 12x smaller while delivering an 8x latency speedup on commodity CPUs. We open-sourced the research in SparseML; run through the overview here and give it a try!
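To make the two compression steps concrete, here is a toy sketch of what sparsification and quantization each do to a weight tensor. This is an illustrative simplification, not Intel Labs' actual Prune Once for All recipe or the SparseML implementation: unstructured magnitude pruning zeroes the smallest weights, and symmetric int8 quantization maps the survivors onto 8-bit integers.

```python
# Toy illustration of the two compression techniques the post combines.
# Not the Prune Once for All method itself, just the underlying ideas.

def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0  # zeroed weights can be skipped at inference time
    return pruned

def quantize_int8(weights):
    """Symmetric quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.03]
sparse = magnitude_prune(weights, 0.5)
# The three smallest-magnitude weights are now zero:
# [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
quantized, scale = quantize_int8(sparse)
```

An engine like DeepSparse gets its speedup by exploiting exactly these two properties: skipping the zeroed weights and executing the remaining arithmetic in low-precision integer kernels.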
Related posts
- [R] New sparsity research (oBERT) enabled 175X increase in CPU performance for MLPerf submission
- [R] How well do sparse ImageNet models transfer? Prune once and deploy anywhere for inference performance speedups! (arxiv link in comments)
- [P] Compound sparsification: using pruning, quantization, and layer dropping to improve BERT performance
- Easily classify dog and cat breeds with YoLoV5
- Nebuly – The LLM Analytics Platform