Apache Arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
TA-Lib
What you need for your use case is a column-oriented store. I recommend exploring bcolz or Apache Arrow for a column-oriented, file-based system. These are very fast, support memory mapping, make near-optimal use of compression and SSD speed (and even CPU architecture, in the case of Arrow) almost out of the box, and have good interfaces to NumPy and pandas (in case you are using Python for final data consumption and analysis). The columnar structure makes it easy to add or delete a column, even dynamically. If you need a more scalable solution (albeit at the cost of speed), you can devise a schema over a regular columnar DB or a NoSQL DB; see Arctic from Man Group for an example.
I do exactly this with a CSV file. The project is open source at https://github.com/namuan/trading-utils/ if you want to have a look.