-
wikipedia-changes
Repository for the post and talk "Hazelcast + Kibana: best buddies for exploring and visualizing data"
Now is the time to create a data pipeline to get this data into Hazelcast. Note that if you want to follow along, the project is readily available on GitHub.
-
okhttp-eventsource
Server-sent events (SSE) client implementation for Java, based on OkHttp: http://javadoc.io/doc/com.launchdarkly/okhttp-eventsource
Wikipedia provides changes through Server-Sent Events. In short, with SSE, you register a client with the endpoint, and every time new data comes in, you are notified and can act accordingly. On the JVM, a couple of SSE-compatible clients are available, including Spring WebClient. Instead, I chose to use OkHttp EventSource because it's lightweight: it depends only on OkHttp, and its usage is relatively straightforward.
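To make the "you are notified every time new data comes in" idea concrete, here is a minimal sketch of how an SSE text stream is split into events and handed to a callback. It uses only the JDK and is not the OkHttp EventSource API; the class name `SseFrameParser` and its `feed` method are hypothetical names chosen for illustration, and the `id` and `retry` fields are deliberately ignored.

```java
import java.util.function.BiConsumer;

/** Sketch of the SSE wire format: fields accumulate until a blank line ends the event. */
final class SseFrameParser {
    private final BiConsumer<String, String> handler; // receives (event name, data)
    private final StringBuilder data = new StringBuilder();
    private String event = "message"; // default event name when no "event:" field is sent
    private boolean hasData = false;

    SseFrameParser(BiConsumer<String, String> handler) {
        this.handler = handler;
    }

    /** Feed one line of the stream, without its line terminator. */
    void feed(String line) {
        if (line.isEmpty()) {
            // A blank line terminates the current event: notify the handler, then reset.
            if (hasData) {
                handler.accept(event, data.toString());
            }
            data.setLength(0);
            event = "message";
            hasData = false;
            return;
        }
        if (line.startsWith(":")) {
            return; // comment line, commonly used as a keep-alive
        }
        int colon = line.indexOf(':');
        String field = colon < 0 ? line : line.substring(0, colon);
        String value = colon < 0 ? "" : line.substring(colon + 1);
        if (value.startsWith(" ")) {
            value = value.substring(1); // a single leading space is not part of the value
        }
        switch (field) {
            case "event":
                event = value;
                break;
            case "data":
                // Several data lines in one event are joined with newlines.
                if (hasData) {
                    data.append('\n');
                }
                data.append(value);
                hasData = true;
                break;
            default:
                // "id", "retry", and unknown fields are ignored in this sketch.
                break;
        }
    }
}
```

A caller would create the parser with a handler such as `(name, json) -> System.out.println(json)` and call `feed` for each line read from the HTTP response body. A dedicated client like OkHttp EventSource does this parsing for you and also manages the connection, which is the part that makes a library worthwhile.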
-
lingua
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike (by pemistahl)
A linguist can infer the language of the field. It's also possible to use an automated process in the pipeline. A couple of NLP libraries are available in the JVM ecosystem, but I set my eyes on Lingua, one focused on language recognition.
Related posts
-
Comparing Language Detection Libraries (& API) Using Java/ColdFusion/CFML
-
Announcing Lingua 1.2.0 - The most accurate natural language detection library for the JVM, suitable for long and short text alike
-
r/argentina is the most popular Spanish-language subreddit on the site
-
The most popular languages on Reddit, after analyzing 1M comments: English, German, Spanish, Portuguese, French, Italian, Romanian, Dutch... [OC]
-
Using the Lingua Library for Kotlin