Aerospike
Apache Spark
Our great sponsors
Aerospike | Apache Spark | |
---|---|---|
15 | 101 | |
971 | 38,378 | |
3.0% | 1.3% | |
8.7 | 10.0 | |
27 days ago | 2 days ago | |
C | Scala | |
GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Aerospike
- Ask HN: Why are there no open source NVMe-native key value stores in 2023?
-
Aerospike Driver for LINQPad
Aerospike for LINQPad 7 is a data context dynamic driver for interactively querying and updating an Aerospike database using “LINQPad”. The driver is free. For more information go to this blog post. You can directly download the driver from the LINQPad NuGet manager.
-
Using In-Memory Databases in Data Science
Aerospike is a real-time cloud structured platform with good performance capabilities. This IMDB platform allows enterprises to perform their operations in real time through the hybrid memory and parallelism model.
- System Design: Caching, Content Delivery Networks (CDN) & Proxies.
-
Block and Filesystem side-by-side with K8s and Aerospike
Block storage stores a sequence of bytes in a fixed size block (page) on a storage device. Each block has a unique hash that references the address location of the specified block. Unlike a filesystem, block storage doesn't have the associated metadata such as format-type, owner, date, etc. Also, block storage doesn’t use the conventional storage paths to access data like a filesystem file. This reduction in overhead contributes to improved overall access speeds when using raw block devices. The ability to store bytes in blocks allows applications the flexibility to decide how these blocks are accessed and managed, making block storage an ideal choice for low latency databases such as Aerospike. From a developer's perspective, a block device is simply a large array of bytes, usually with some minimum granularity for reads and writes. In Aerospike this granularity is configured and referred to as the write-block-size. The Aerospike Kubernetes Operator uses the storage infrastructure software inside of Kubernetes and the need for data platforms to use raw block storage becomes ever more important.
-
Aerospike & IoT using MQTT
This example shows how the Aerospike database can be easily and scalably used to store industrial time series data made available by the MQTT ecosystem. Aerospike plus its Community Time Series Client streamlines the storage and retrieval of the data, supporting the ability to both write and read millions of data points per second if required.
-
Building Large-Scale Real-Time JSON Applications
Real-time large-scale JSON applications need reliably fast access to data, high ingest rates, powerful queries, rich document functionality, scalability with no practical limit, always-on operation, and integration with streaming and analytical platforms. They need all this at low cost. The Aerospike Real-time Data Platform provides all this functionality, making it a good choice for building such applications. The Collection Data Types (CDTs) in Aerospike provide powerful support for modeling, organizing, and querying a large JSON document store. Visit the tutorials and code sandbox on the Developer Hub to explore the capabilities of the platform, and play with the Document API and query capabilities for JSON.
- System Design: NoSQL databases
- System Design: Caching
- Aerospike named to Inc. 5000 list of fastest-growing companies in America.
Apache Spark
- "xAI will open source Grok"
-
Groovy 🎷 Cheat Sheet - 01 Say "Hello" from Groovy
Recently I had to revisit the "JVM languages universe" again. Yes, language(s), plural! Java isn't the only language that uses the JVM. I previously used Scala, which is a JVM language, to use Apache Spark for Data Engineering workloads, but this is for another post 😉.
-
🦿🛴Smarcity garbage reporting automation w/ ollama
Consume data into third party software (then let Open Search or Apache Spark or Apache Pinot) for analysis/datascience, GIS systems (so you can put reports on a map) or any ticket management system
-
Go concurrency simplified. Part 4: Post office as a data pipeline
also, this knowledge applies to learning more about data engineering, as this field of software engineering relies heavily on the event-driven approach via tools like Spark, Flink, Kafka, etc.
-
Five Apache projects you probably didn't know about
Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
-
Apache Spark VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Integrate Pyspark Structured Streaming with confluent-kafka
Apache Spark - https://spark.apache.org/
-
Spark – A micro framework for creating web applications in Kotlin and Java
A JVM based framework named "Spark", when https://spark.apache.org exists?
- Rest in Peas: The Unrecognized Death of Speech Recognition (2010)
-
PySpark SparkSession Builder with Kubernetes Master
I recently saw a pull request that was merged to the Apache/Spark repository that apparently adds initial Python bindings for PySpark on K8s. I posted a comment to the PR asking a question about how to use spark-on-k8s in a Python Jupyter notebook, and was told to ask my question here.
What are some alternatives?
dragonfly - A modern replacement for Redis and Memcached
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
yugabyte-db - YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ClickHouse - ClickHouse® is a free analytics DBMS for big data
Scalding - A Scala API for Cascading
neon - Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
ydb - YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.