-
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
In the world of table formats, there are three competing standards: Apache Iceberg, Apache Hudi, and Delta Lake, with two out of the three being Apache projects (and there is also Apache XTable for interoperability between these and future formats). For catalogs, options include Nessie, Gravitino, Polaris, and Unity Catalog, all of which are open source but not yet Apache projects.
-
Many popular open source projects are beloved and closely tied to particular vendors. For example, web frameworks like React and Angular are associated with Meta and Google, respectively. Database software like MongoDB, Elasticsearch, and Redis are also tied to specific commercial entities but are widely used and praised for their functionality. When there is a clear driver of a project, it can offer some benefits:
-
This practice, in itself, isn't inherently bad. Many businesses maintain commercial proprietary forks of open-source projects, but usually, the commercial version has a different name than the open-source project. For example, in the world of data catalogs, Dremio is the main developer of Nessie, and Snowflake drives Polaris. Both aim to become community-driven projects over time but will also drive integrated features in their respective commercial products under different names. For instance, if you set up your own Nessie catalog, it has a distinct name compared to the Dremio Enterprise Catalog (formerly Arctic) integrated into Dremio Cloud. The Dremio Enterprise Catalog is powered by Nessie but has additional features, so the different names prevent confusion about available features or which documentation to reference.
-
In contrast, Databricks maintains internal forks of Spark, Delta Lake, and Unity Catalog, using the same names for both the open-source versions and the features specific to the Databricks platform. While they do provide separate documentation, online discussions often reflect confusion about how to use features in the open-source versions that only exist on the Databricks platform. This creates a "muddying of the waters" between what is open and what is proprietary. This isn't an issue if you are a Databricks user, but it can be quite confusing for those who want to use these tools outside of the Databricks ecosystem.
-
Redis
For developers building real-time, data-driven applications, Redis is the preferred, fastest, and most feature-rich cache, data structure server, and document and vector query engine.
-
In reality, independence isn't always crucial. Many open-source standards in web development, like React, are not Apache projects and are heavily directed by their creators, such as Meta. However, a web framework like React isn't responsible for the interoperability of web applications. Instead, long-standing standards like REST and HTTP serve as the glue that connects web applications across various backend languages, frontend frameworks, and more.
-
Apache Arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
It's this kind of certainty that underscores the vital role of the Apache Software Foundation (ASF). Many first encounter Apache through its pioneering project, the Apache HTTP Server, an open-source web server that remains ubiquitous in web operations today. The ASF was initially created to hold the intellectual property and assets of the Apache project, and it has since evolved into a cornerstone for open-source projects worldwide. The ASF enforces strict standards for diverse contributions, independence, and activity in its projects, ensuring they can withstand the test of time as standards in software development. Many open-source projects strive to become Apache projects to gain the community credibility necessary for adoption as standard software building blocks, such as Apache Tomcat for Java web applications, Apache Arrow for in-memory data representation, and Apache Parquet for data file formatting.