[R] Cross-lingual Wikipedia dataset

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • wit

    WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages. (by google-research-datasets)

  • There's the Wikipedia Image Text dataset, which has many languages (including English and Simple English) as well as a TF Datasets wrapper. https://github.com/google-research-datasets/wit

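For the cross-lingual use case in the title, one rough way to work with WIT is to group captions by image and keep the images that have a caption in both languages of interest. The sketch below is only an illustration under assumptions: it assumes the data is read as tab-separated rows with columns named `language`, `image_url` and `caption_reference_description`, and that Simple English is tagged with the code `simple`. Check both against the actual files in the repository. The sample rows are made up and are not real WIT entries.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample in the ASSUMED WIT TSV layout (not real WIT rows).
# Column names and the "simple" language code are assumptions.
SAMPLE_TSV = (
    "language\timage_url\tpage_title\tcaption_reference_description\n"
    "en\thttps://example.org/a.jpg\tCat\tA cat on a sofa\n"
    "simple\thttps://example.org/a.jpg\tCat\tA cat sitting\n"
    "de\thttps://example.org/a.jpg\tHauskatze\tEine Katze auf dem Sofa\n"
    "en\thttps://example.org/b.jpg\tDog\tA dog in a park\n"
)


def captions_by_image(tsv_file, languages):
    """Map image_url -> {language: caption} for the requested languages."""
    wanted = set(languages)
    grouped = defaultdict(dict)
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        lang = row["language"]
        caption = row["caption_reference_description"]
        # Skip other languages and rows without a caption.
        if lang in wanted and caption:
            # Keep the first caption seen per (image, language).
            grouped[row["image_url"]].setdefault(lang, caption)
    return dict(grouped)


def aligned_pairs(grouped, src, tgt):
    """Return (src_caption, tgt_caption) for images captioned in both languages."""
    return [(c[src], c[tgt]) for c in grouped.values() if src in c and tgt in c]


grouped = captions_by_image(io.StringIO(SAMPLE_TSV), ["en", "simple"])
pairs = aligned_pairs(grouped, "en", "simple")
print(pairs)
```

To run this on real data, replace `io.StringIO(SAMPLE_TSV)` with an opened TSV file from the dataset. The grouping key is the image URL here; grouping by page instead would be an equally plausible choice depending on what "aligned" should mean.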
