multilingual-nlp

Open-source projects categorized as multilingual-nlp

Top 3 multilingual-nlp Open-Source Projects

multilingual-nlp
  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfer with context boundaries

    - Content vectors can differ vastly from question vectors, for this you have to use hypothetical embeddings (they generate artificial questions and store them)

    - Instead of saving just one embedding per text-chuck you should store various (text chunk, hypothetical embedding questions, meta data)

    - RAG will miserably fail with requests like "summarize the whole document"

    - to my knowledge, openAI embeddings aren't performing well, use a embedding that is optimized for question answering or information retrieval and supports multi language. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    1 https://github.com/underlines/awesome-marketing-datascience/...

  • xmtf

    Crosslingual Generalization through Multitask Finetuning

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • TheVault

    [EMNLP 2023] The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

multilingual-nlp related posts

  • List of code generation datasets (open source)

    4 projects | /r/datasets | 30 May 2023
  • [P] Fine-tuning LLaMA on TheVault by AI4Code

    2 projects | /r/LocalLLaMA | 30 May 2023

Index

What are some of the best open-source multilingual-nlp projects? This list will help you:

Project Stars
1 mteb 1,515
2 xmtf 501
3 TheVault 82

Sponsored
Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com