Apache Spark
the-algorithm
| | Apache Spark | the-algorithm |
|---|---|---|
| Mentions | 101 | 265 |
| Stars | 38,378 | 10 |
| Growth | 0.6% | - |
| Activity | 10.0 | 10.0 |
| Last commit | 5 days ago | about 2 years ago |
| Language | Scala | |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
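The metrics described above can be illustrated with a small sketch. The page does not state the exact formulas, so everything here is an assumption: the function names, the exponential decay with a 30-day half-life used to weight recent commits more heavily, and the percentile-rank mapping onto a 0 to 10 scale are illustrative choices, not the site's actual method.

```python
from datetime import date


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars between two monthly snapshots."""
    if stars_last_month <= 0:
        raise ValueError("need a positive baseline star count")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum of per-commit weights; a commit's weight halves every half_life_days.

    The half-life is a hypothetical parameter chosen for illustration.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_rating(score, all_scores):
    """Map a raw score to a 0-10 rating by percentile rank.

    A rating of 9.0 means the score is at least as high as 90% of the
    tracked projects, i.e. the project is in the top 10%.
    """
    at_or_below = sum(1 for s in all_scores if s <= score)
    return round(10.0 * at_or_below / len(all_scores), 1)
```

Under these assumptions, a commit made today counts fully, one made 30 days ago counts half, and the rating depends only on how a project ranks against the other tracked projects rather than on its absolute commit count.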
Apache Spark
- "xAI will open source Grok"
- Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
  Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
- 🦿🛴Smarcity garbage reporting automation w/ ollama
  Consume data into third party software (then let Open Search or Apache Spark or Apache Pinot) for analysis/datascience, GIS systems (so you can put reports on a map) or any ticket management system
- Go concurrency simplified. Part 4: Post office as a data pipeline
  also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
- Five Apache projects you probably didn't know about
  Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
- Apache Spark VS quix-streams - a user suggested alternative
  2 projects | 7 Dec 2023
- Integrate Pyspark Structured Streaming with confluent-kafka
  Apache Spark - https://spark.apache.org/
- Spark – A micro framework for creating web applications in Kotlin and Java
  A JVM based framework named "Spark", when https://spark.apache.org exists?
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
- PySpark SparkSession Builder with Kubernetes Master
  I recently saw a pull request that was merged to the Apache/Spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment to the PR asking a question about how to use spark-on-k8s in a Python Jupyter notebook, and was told to ask my question here.
the-algorithm
- "xAI will open source Grok"
  > Wasn’t the tweet recommendation system “open sourced” as well? Does this guy know the difference between open source and “open source”?
  What do you mean? There exists only one binding definition of open source
  > https://opensource.org/osd
  and either some product does satisfy it, or it doesn't. As far as I am aware
  > https://github.com/twitter/the-algorithm
  does satisfy the open source definition, so your sarcasm looks demagogical to me, but I am very willing to learn something new.
- Recommendation algorithm manipulation via mass blocks
- Leaving Twitter
  I'm not a Twitter user so if this is a dumb question I apologize.
  This sounds like a pretty serious allegation. How do you know this is true? Is it in the source code?[1]
  [1]: https://github.com/twitter/the-algorithm
- The new X button doesn't close the website
- Twitter has officially changed its logo to ‘X’
  There is already a bug report for this: https://github.com/twitter/the-algorithm/issues/1876
- The look of a man who has royally screwed up
  There is a way for Meta to fuck this up. The Algorithm is licensed under GPL, which is a copyleft license. That means any derivative works based on it must also be licensed and open sourced under GPL. If Meta doesn't do that, they may be on the hook.
- The irony
- Twitter sends Meta cease-and-desist letter over new Threads app: Sources
  And I believe the source for that was effectively opened up to the world: https://github.com/twitter/the-algorithm
- Twitter is threatening to sue Meta over threads. Here’s the letter Twitter sent Meta
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
hn-search - Hacker News Search
PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
the-algorithm-ml - Source code for Twitter's Recommendation Algorithm
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Mastodon - Your self-hosted, globally interconnected microblogging community
Scalding - A Scala API for Cascading
apple-notes-liberator - Free your Apple Notes data from Notes.app
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
Async Ruby - An awesome asynchronous event-driven reactor for Ruby.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Finagle - A fault tolerant, protocol-agnostic RPC system