Methodology: I extracted 100 MB of article text from each of the different Wikipedias using https://github.com/attardi/wikiextractor and counted the character prevalences using Python. The similarity measure is just the sum of the absolute differences in character prevalences (so a lower score means more similar): e.g. if language X has distribution {A: 0.5, B: 0.3, C: 0.2} and language Y has distribution {A: 0.8, B: 0.2}, then their score is |0.5-0.8| + |0.3-0.2| + |0.2-0.0| = 0.6. The final chart was generated using graphviz and pillar.
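For anyone who wants to try the measure themselves, here is a minimal sketch of how it could be computed. This is not the original script: the function names `char_prevalences` and `distribution_distance` are made up for illustration, and a character missing from one language is assumed to count as prevalence 0.0 there, as in the worked example.

```python
from collections import Counter


def char_prevalences(text):
    """Return each character's share of the text (shares sum to 1)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def distribution_distance(p, q):
    """Sum of absolute differences in prevalence; lower means more similar.

    Characters missing from one distribution are treated as 0.0 there.
    """
    return sum(abs(p.get(ch, 0.0) - q.get(ch, 0.0)) for ch in set(p) | set(q))


# The two toy distributions from the worked example.
first = {"A": 0.5, "B": 0.3, "C": 0.2}
second = {"A": 0.8, "B": 0.2}
print(round(distribution_distance(first, second), 6))  # 0.6
```

With real data you would pass each language's extracted article text through `char_prevalences` first, then compare every pair of languages. The score ranges from 0 (identical distributions) to 2 (no characters in common).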