Top 8 multilingual-model Open-Source Projects
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
contextualized-topic-models
A Python package for contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
-
PaddleOCR2Pytorch
PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
-
Indic-BERT-v1
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For the latest Indic-BERT v2, see: https://github.com/AI4Bharat/IndicBERT
-
ToLD-Br
Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
The dataset is based on ToLD-Br, a large dataset of tweets (or is it Xeets now?) with additional labels indicating whether the text contains homophobia, obscenity, insults, racism, misogyny, or xenophobia. The dataset for the competition, however, has only a single toxicity column.
Index
What are some of the best open-source multilingual-model projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | whisper-timestamped | 1,547 |
2 | contextualized-topic-models | 1,166 |
3 | PaddleOCR2Pytorch | 774 |
4 | Indic-BERT-v1 | 271 |
5 | kiri | 240 |
6 | mgpt | 195 |
7 | WangChanGLM | 91 |
8 | ToLD-Br | 34 |