Splitgraph Alternatives
Similar projects and alternatives to splitgraph
-
Baserow
Baserow is an open source online database tool and Airtable alternative. Create your own database without technical experience. Our user-friendly no-code tool gives you the powers of a developer without leaving your browser. (by bramw)
-
SonarQube
Static code analysis for 29 languages. Your projects are multi-language. So is SonarQube analysis. Find Bugs, Vulnerabilities, Security Hotspots, and Code Smells so you can release quality code every time. Get started analyzing your projects today for free.
-
haystack
Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
-
django-pgviews
Fork of django-postgres that focuses on maintaining and improving support for Postgres SQL Views.
-
go-mysql-server
A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.
-
pgbouncer-rr-patch
Adds query routing and rewriting extensions to pgbouncer
-
Scout APM
Less time debugging, more time building. Scout APM allows you to find and fix performance issues with no hassle. Now with error monitoring and external services monitoring, Scout is a developer's best friend when it comes to application development.
-
sirix
SirixDB is a temporal, evolutionary database system, which uses an append-only approach to store immutable revisions. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
-
BlingFire
A lightning fast Finite State machine and REgular expression manipulation library.
-
metamask-extension
The MetaMask browser extension enables browsing Ethereum blockchain enabled websites
-
Open Food Network
Connect suppliers, distributors and consumers to trade local produce. We're recruiting paid contributors, link below.
-
grouparoo
🦘 The Grouparoo Monorepo - open source customer data sync framework
-
Stream-Framework
Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
-
Grafana
The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
-
openpilot
openpilot is an open source driver assistance system. openpilot performs the functions of Automated Lane Centering and Adaptive Cruise Control for over 150 supported car makes and models.
-
Mattermost
Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.
splitgraph reviews and mentions
-
Postgres Auditing in 150 lines of SQL
You might like what we're doing with Splitgraph. Our command line tool (sgr) installs an audit log into Postgres to track changes [0]. Then `sgr commit` can write these changes to delta-compressed objects [1], where each object is a columnar fragment of data, addressable by the LTHash of rows added/deleted by the fragment, and attached to metadata describing its index [2].
I haven't explored sirix before, but at first glance it looks like we have some similar ideas — thanks for sharing, I'm excited to learn more, especially about its application of ZFS.
[0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
[1] https://www.splitgraph.com/docs/concepts/objects
[2] https://github.com/splitgraph/splitgraph/blob/master/splitgr...
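The comment above says each object is addressable by a hash of the rows it adds and deletes. The useful property of such a hash is that it is additive: hashes of separate fragments can be combined without rehashing the rows. The following is a minimal sketch of that idea only, assuming a toy scheme (SHA-256 truncated to 64 bits, summed modulo 2^64). It is not Splitgraph's actual LTHash implementation, and the names `row_hash`, `fragment_hash` and `combine` are hypothetical.

```python
import hashlib

MODULUS = 2 ** 64  # toy modulus chosen for illustration only


def row_hash(row: tuple) -> int:
    """Hash one row to an integer in [0, MODULUS)."""
    digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fragment_hash(added, deleted=()) -> int:
    """Add the hashes of inserted rows and subtract those of deleted rows."""
    total = 0
    for row in added:
        total = (total + row_hash(row)) % MODULUS
    for row in deleted:
        total = (total - row_hash(row)) % MODULUS
    return total


def combine(*hashes: int) -> int:
    """Hash of several fragments applied together."""
    return sum(hashes) % MODULUS


rows = [(1, "apple"), (2, "banana"), (3, "cherry")]

# Row order does not matter, so the hash describes a set of changes.
assert fragment_hash(rows) == fragment_hash(list(reversed(rows)))

# Hashing fragments separately and combining them matches hashing all at once.
assert combine(fragment_hash(rows[:1]), fragment_hash(rows[1:])) == fragment_hash(rows)

# Deleting a row cancels out an earlier insert of the same row.
undo_first = fragment_hash([], deleted=rows[:1])
assert combine(fragment_hash(rows), undo_first) == fragment_hash(rows[1:])
```

A 64-bit sum like this would be far too weak for real content addressing; it is only meant to show why an additive hash suits delta-style fragments.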
-
The world of PostgreSQL wire compatibility
Shameless plug, but your list is missing Splitgraph [0] :)
We’ve been based on Postgres from the beginning, and although the backend is a bit more complex at this point, we’ve kept the wire protocol intact. We’re also heavily invested in FDWs, not only for federated queries (e.g. querying data at Snowflake – btw, you might enjoy our blog post on achieving a 100x speedup with aggregation pushdown [1]), but also for queries on warehoused data stored as Splitgraph images. By keeping Postgres compatibility as our guiding constraint, we’ve been able to build a lot of functionality on top of just a few simple abstractions. The result is something akin to a magic Postgres database – you can connect dozens of live sources to it using FDW plugins, or you can ingest from hundreds of data sources using Airbyte connectors, ultimately storing the data as immutable Splitgraph images in object storage.
As for the wire protocol, our implementation is heavily reliant on (a forked version of) PgBouncer. Basically, a query arrives, we parse it for references to tables (which look like Docker image tags), and the proxy layer performs whatever orchestration is necessary to satisfy the query. That could mean instantiating a foreign server to a saved connection, loading some data from object storage, or even lazily loading only the requisite data (we call this “layered querying” since it’s implemented similarly to AUFS). In the future, it could also mean delegating the query to a more specialized engine like Presto.
Point is, by keeping the frontend intact, we’re able to retain compatibility with all Postgres clients, but we’re free to implement the backend in more scalable or domain specific ways. For example, we’re able to horizontally scale our query capacity by simply adding more “cache nodes” that perform the layered querying.
We are definitely all-in on the Postgres wire protocol, and all the ecosystem compatibility that comes along with it. You can read our blog for more in depth discussions of this, but I don’t want to spam too many links here. :)
[0] https://www.splitgraph.com
[1] https://www.splitgraph.com/blog/postgresql-fdw-aggregation-p...
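The comment above describes parsing an incoming query for table references that look like Docker image tags. A rough sketch of that first step follows. It assumes references are written as a double-quoted schema of the form `"namespace/repository:tag"` followed by a table name, and that a missing tag means `latest`. Both are assumptions made for illustration, as are the function name and the sample query. A real proxy would use a proper SQL parser instead of a regular expression.

```python
import re

# Matches "namespace/repository[:tag]".table (quoting of the table is optional).
IMAGE_REF = re.compile(
    r'"(?P<namespace>[\w.-]+)/(?P<repository>[\w.-]+)(?::(?P<tag>[\w.-]+))?"\.'
    r'"?(?P<table>\w+)"?'
)


def find_image_references(sql: str):
    """Return (namespace, repository, tag, table) for each reference found."""
    refs = []
    for match in IMAGE_REF.finditer(sql):
        refs.append(
            (
                match["namespace"],
                match["repository"],
                match["tag"] or "latest",  # assumed default, as with Docker
                match["table"],
            )
        )
    return refs


# Hypothetical query; the repository and table names are made up.
query = (
    'SELECT o.id, c.name '
    'FROM "splitgraph/demo:v1".orders o '
    'JOIN "acme/sales".customers c ON o.customer_id = c.id'
)
references = find_image_references(query)
```

Each tuple returned here is what an orchestration layer would then resolve, for example by mounting a foreign server or fetching objects, before the query runs.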
-
Scalable PostgreSQL Connection Pooler
We are building a solution for this problem at Splitgraph [0] – it sounds like we could probably help with your use case. You can get it to work yourself with our open source code [1], but our SaaS (private beta, upcoming public) will put all your schemas on a more scalable “data delivery network,” which, incidentally, happens to be implemented with PgBouncer + rewriting + ephemeral instances. In a local engine (just a Postgres DB managed by the Splitgraph client to add extra stuff), there is no PgBouncer, but we use Foreign Data Wrappers to accomplish the same thing.
On Splitgraph, every dataset – and every version of every dataset – has an address. Think of it like tagged Docker images. The address either points to an immutable “data image” (in which case we can optionally download objects required to resolve a query on-the-fly, although loading up-front is possible too) or to a live data source (in which case we proxy directly to it via FDW translation). This simple idea of _addressable data products_ goes a long way – for example, it means that computing a diff is now as simple as joining across two tables (one with the previous version, one with the new).
Please excuse the Frankenstein marketing site – we’re in the midst of a redesign and rework of the information architecture while we build out our SaaS product.
Feel free to reach out if you’ve got questions. And if you have a business case, we have spots available in our private pilot. My email is in my profile – mention HN :)
[0] https://www.splitgraph.com/connect
[1] examples: https://github.com/splitgraph/splitgraph/tree/master/example...
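The claim above that a diff becomes "as simple as joining across two tables" can be illustrated with plain SQL. The sketch below uses an in-memory SQLite database as a stand-in for two checked-out versions of the same table. The table names, columns and data are invented for the example, and nothing here uses Splitgraph itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cities_v1 (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities_v2 (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    INSERT INTO cities_v1 VALUES (1, 'Leeds', 100), (2, 'York', 50), (3, 'Bath', 30);
    INSERT INTO cities_v2 VALUES (1, 'Leeds', 120), (2, 'York', 50), (4, 'Hull', 40);
    """
)

# Three joins cover the three kinds of change between the versions.
DIFF_SQL = """
SELECT 'removed' AS kind, prev.id AS id
  FROM cities_v1 prev LEFT JOIN cities_v2 curr ON prev.id = curr.id
 WHERE curr.id IS NULL
UNION ALL
SELECT 'added' AS kind, curr.id AS id
  FROM cities_v2 curr LEFT JOIN cities_v1 prev ON prev.id = curr.id
 WHERE prev.id IS NULL
UNION ALL
SELECT 'changed' AS kind, prev.id AS id
  FROM cities_v1 prev JOIN cities_v2 curr ON prev.id = curr.id
 WHERE prev.name <> curr.name OR prev.population <> curr.population
ORDER BY id
"""

diff = conn.execute(DIFF_SQL).fetchall()
```

On Postgres the same result could be written with a single `FULL OUTER JOIN`; the three-part form is used here only so the sketch runs on older SQLite versions.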
-
Ask HN: How to get competitors to use our open source interop-protocol?
Federated data sharing is the core use case of the magic Postgres database we’re building at Splitgraph [0]. We’d love to help you solve these problems! The ideas you’re describing are exactly what we want to achieve – data sharing should be as easy as changing a connection string in a SQL client. It sounds like your use case would be a good fit for what we’re building. If you’d like to learn more, please send me a note – email in profile.
-
Cloudera taken private for $5.3b, acquires Datacoral and Cazena
The data industry continues to hype this idea of “multi-cloud,” but then the “modern data stack” is centralized around a single warehouse and nobody sees any irony in that.
The big bet we’re making at Splitgraph [0] is that the next wave of data engineering will take a more decentralized, “data mesh” type approach to enterprise architecture. “Data gravity” really does exist: data is expensive to move, in terms of both cost and operational complexity. So instead of bringing the data to the query, why not bring the query to the data? All we need for that is a set of read-only credentials.
Cloudera mentions they bought Datacoral to help with data integration and connectors. They’ve correctly identified the problem (data sprawl and fragmentation will inevitably grow), but I’m not sure they have the right solution.
Data integration is important, but it’s a moving target, which is why it calls for a collaborative open source solution. This is why so many new startups, like Airbyte most recently, are coalescing around the Singer taps that Stitch left behind after its acquisition by Talend.
We also support using Singer taps to ingest data into versioned Splitgraph images [1], so we’re excited to see more collaboration on maintenance of taps. For us it’s a useful feature, but it should be just that — a feature. Is there really a need to replicate all of your data before you can even query it? Or would you rather experiment by directly querying its source?
[0] https://www.splitgraph.com
[1] unreleased and undocumented atm, but it does work. We’re hiring, especially on the frontend if you want to help build the web UI. See profile.
-
Google Dataset Search
On the public DDN (data.splitgraph.com:5432), we enforce a (currently arbitrary) 10k row limit on responses. You can construct multiple queries using LIMIT and OFFSET, or you can run a local Splitgraph engine without a limit. We also have a private beta program if you want a managed or self-hosted deployment. And we are planning to ship some features for "export to csv" type use cases (potentially other output formats too).
For live/external data, we proxy the query to the data source, so there is no theoretical data size limit except for any defined by the upstream.
For snapshotted data, we store the data as fragments in object storage. Any size limit depends on the machine where Splitgraph's Postgres engine is running, and how you choose to materialize the data when downloading it from object storage. You can "check out" an entire image to materialize it locally, at which point it will be like any other Postgres schema. Or you can use "layered querying" which will return a result set while only materializing the fragments necessary to answer the query.
Regarding ClickHouse, you could watch this presentation [0] my co-founder Artjoms gave at a recent ClickHouse meet-up on the topic of your question. We also have specific documentation for using the ClickHouse ODBC client with the DDN [1], as well as an example reference implementation [2].
[0] https://www.youtube.com/watch?v=44CDs7hJTho
[1] https://www.splitgraph.com/connect
[2] https://github.com/splitgraph/splitgraph/tree/master/example...
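The suggestion above to work around the row limit with multiple `LIMIT` and `OFFSET` queries could look roughly like the sketch below. It runs against an in-memory SQLite database so that it is self-contained. The helper name `fetch_all_pages` and the sample table are hypothetical. With a Postgres driver such as psycopg2 the placeholder style would be `%s` rather than `?`, and the page size would be set at or below the endpoint's limit.

```python
import sqlite3


def fetch_all_pages(conn, base_query, page_size):
    """Yield every row of base_query by requesting one page at a time.

    base_query must be trusted SQL with a deterministic ORDER BY;
    otherwise pages may overlap or skip rows between requests.
    """
    offset = 0
    while True:
        rows = conn.execute(
            f"{base_query} LIMIT ? OFFSET ?", (page_size, offset)
        ).fetchall()
        yield from rows
        if len(rows) < page_size:  # a short page means the data ran out
            break
        offset += page_size


# Stand-in data: 25 rows fetched in pages of 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [(i, i * 0.5) for i in range(25)]
)

rows = list(
    fetch_all_pages(conn, "SELECT id, value FROM readings ORDER BY id", page_size=10)
)
```

Large offsets get slower on most engines because skipped rows are still scanned, so for big exports filtering on the last seen key is often a better fit than `OFFSET`.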
-
Ask HN: Who is hiring? (April 2021)
Splitgraph (https://www.splitgraph.com) | Remote | Full-time
Splitgraph is reshaping how organizations interact with data. We provide a unified interface to discover and query data. In practice, this means we're building a data catalog (a web app) and query layer (implemented with the Postgres wire protocol).
We're a seed-stage, venture-funded startup hiring our initial team. The two co-founders are looking to grow the team by adding multiple engineers across the stack. This is an opportunity to make a big impact on an agile team while working closely with the founders.
Splitgraph is a remote-first organization. The founders are based in the UK, and the company is incorporated in both the USA and the UK. Candidates are welcome to apply from any geography. We want to work with the most talented, thoughtful and productive engineers in the world.
Open positions:
* Senior Software Engineer - Frontend. Responsible for the web stack, mainly involving TypeScript, React, Next.js, PostGraphile, etc.
* Senior Software Engineer - Backend. Responsible for a variety of core services, using Python, Poetry, Postgres, C, Lua, and a ton of other technologies.
Learn more & apply: https://www.notion.so/splitgraph/Splitgraph-is-Hiring-25b421...
- Splitgraph - a tool for building, versioning and querying reproducible datasets
-
Data Mesh – a new enterprise data architecture
We're building a data mesh at Splitgraph [0]. We provide a unified interface to query and discover data products. You can query the data using the Postgres wire protocol at a single endpoint, with any of your existing tools. And you can discover it in the catalog, using a familiar GitHub-like interface. You can try this right now on the public website, where we federate access to 40k open datasets. Every dataset is addressable with a `namespace/repository:tag` format. The `tag` can refer either to the live data, in which case we forward the query upstream, or to a versioned snapshot of data that you build with declarative, Docker-like tooling. [1]
On the enterprise side, integrating the access and discovery layers gives a lot of advantages, especially around data governance. On the web, we give users tools to connect data sources, document them, and share/audit access to them. When a query comes through the endpoint, since we're implemented as a Postgres proxy, we can rewrite/filter/drop it in accordance with rules, or we can forward it along to the upstream data source(s) and/or join across them. If you use Splitfiles to generate versioned data, we can also provide data lineage/provenance and full reproducibility.
We've been working on this for ~3 years but are still pretty early. If anyone wants to help, we just raised a seed round and are hiring a remote team -- check my comment history for links.
[0] https://www.splitgraph.com
[1] https://www.splitgraph.com/docs/working-with-data/using-spli...
Stats
splitgraph/splitgraph is an open source project licensed under the GNU General Public License v3.0 or later, which is an OSI-approved license.