Greenplum VS vitess

Compare Greenplum vs vitess and see what their differences are.

Greenplum

Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI. (by greenplum-db)

vitess

Vitess is a database clustering system for horizontal scaling of MySQL. (by vitessio)
                Greenplum             vitess
Mentions        9                     59
Stars           6,177                 17,715
Stars growth    1.2%                  1.7%
Activity        9.9                   9.9
Last commit     6 days ago            about 16 hours ago
Language        C                     Go
License         Apache License 2.0    Apache License 2.0
Mentions - the number of mentions of the project we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Greenplum

Posts with mentions or reviews of Greenplum. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-11.
  • Ask HN: It's 2023, how do you choose between MySQL and Postgres?
    7 projects | news.ycombinator.com | 11 May 2023
    Friends don't let their friends choose Mysql :)

    A super long time ago (decades) when I was using Oracle regularly I had to make a decision on which way to go. Although Mysql then had the mindshare I thought that Postgres was more similar to Oracle, more standards compliant, and more of a real enterprise type of DB. The rumor was also that Postgres was heavier than MySQL. Too many horror stories of lost data (MyIsam), bad transactions (MyIsam lacks transaction integrity), and the number of Mysql gotchas being a really long list influenced me.

    In time I actually found out that I had underestimated one of the most important attributes of Postgres that was a huge strength over Mysql: the power of community. Because Postgres has a really superb community that can be found on Libera Chat and elsewhere, and they are very willing to help out, I think Postgres has a huge advantage over Mysql. RhodiumToad [Andrew Gierth] https://github.com/RhodiumToad & davidfetter [David Fetter] https://www.linkedin.com/in/davidfetter are incredibly helpful folks.

    I don't know that Postgres' licensing made a huge difference or not but my perception is that there are a ton of 3rd party products based on Postgres but customized to specific DB needs because of the more liberalness of the PG license which is MIT/BSD derived https://www.postgresql.org/about/licence/

    Some of the PG based 3rd party DBs:

    Enterprise DB https://www.enterprisedb.com/ - general purpose PG with some variants

    Greenplum https://greenplum.org/ - Data warehousing

    Crunchydata https://www.crunchydata.com/products/hardened-postgres - high security Postgres for regulated environments

    Citus https://www.citusdata.com - Distributed DB & Columnar

    Timescale https://www.timescale.com/

    Why Choose PG today?

    If you want better ACID: Postgres

    If you want more compliant SQL: Postgres

    If you want more customizability to a variety of use-cases: Postgres using a variant

    If you want the flexibility of using NOSQL at times: Postgres (see the JSONB sketch after this list)

    If you want more product knowledge reusability for other backend products: Postgres
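
The "NOSQL at times" point above is usually about Postgres's JSONB type. Below is a minimal sketch of that idea in Python with psycopg2; the table, columns, and connection string are hypothetical, not anything from the original post.

```python
# Hedged sketch: schemaless documents in Postgres via JSONB.
# Table name, columns, and DSN are hypothetical.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=appdb user=app")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id      bigserial PRIMARY KEY,
        payload jsonb NOT NULL
    )
""")

# Insert an arbitrary document; no fixed schema required.
cur.execute("INSERT INTO events (payload) VALUES (%s)",
            [Json({"type": "signup", "plan": "pro", "tags": ["beta"]})])

# Query inside the document; a GIN index makes containment lookups fast.
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin "
            "ON events USING gin (payload)")
cur.execute("SELECT payload->>'plan' FROM events WHERE payload @> %s",
            [Json({"type": "signup"})])
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```

The `@>` containment operator plus a GIN index is what makes document-style queries practical without standing up a separate NoSQL store.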

  • Show HN: Postgres WASM
    16 projects | news.ycombinator.com | 3 Oct 2022
    I was wondering if anyone had thought about using this to experiment with the planner.

    The engineering and support teams at Greenplum, a fork of Postgres, have a tool (minirepro[0]) which, given a sql query, can grab a minimal set of DDLs and the associated statistics for the tables involved in the query that can then be loaded into a "local" GPDB instance. Having the DDL and the statistics meant the team was able to debug issues in the optimizer (example [1]), without having access to a full set of data. This approach, if my understanding is correct, could be enabled in the browser with this Postgres WASM capability.

    [0] https://github.com/greenplum-db/gpdb/blob/6X_STABLE/gpMgmt/b...
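
For readers unfamiliar with the approach, here is a hedged sketch of the underlying idea only - capture schema-only DDL plus planner statistics for the tables a query touches - not the actual minirepro tool or its CLI. The table list, connection string, use of pg_dump/pg_stats, and output files are all assumptions for illustration.

```python
# Sketch of a "minimal repro" dump: DDL plus planner statistics for the
# tables a query references, so plans can be studied without the data.
# Table list, DSN, database name, and output paths are illustrative.
import subprocess
import psycopg2

TABLES = ["public.orders", "public.customers"]  # tables referenced by the query
DSN = "dbname=proddb"                           # hypothetical connection string

# 1. Schema-only DDL for the involved tables.
ddl = subprocess.run(
    ["pg_dump", "--schema-only"] + [f"--table={t}" for t in TABLES] + ["proddb"],
    capture_output=True, text=True, check=True,
).stdout

# 2. Planner statistics, readable via the pg_stats view.
conn = psycopg2.connect(DSN)
cur = conn.cursor()
cur.execute(
    """
    SELECT schemaname, tablename, attname, null_frac, n_distinct, most_common_vals
    FROM pg_stats
    WHERE schemaname || '.' || tablename = ANY(%s)
    """,
    (TABLES,),
)
stats = cur.fetchall()

with open("repro_ddl.sql", "w") as f:
    f.write(ddl)
with open("repro_stats.txt", "w") as f:
    for row in stats:
        f.write(repr(row) + "\n")
```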

  • Amazon Aurora's Read/Write Capability Enhancement with Apache ShardingSphere-Proxy
    5 projects | dev.to | 26 May 2022
    A database solution architect at AWS, with over 10 years of experience in the database industry. Lili has been involved in the R&D of the Hadoop/Hive NoSQL database, enterprise-level database DB2, distributed data warehouse Greenplum/Apache HAWQ and Amazon’s cloud native database.
  • What’s the Database Plus concept and what challenges can it solve?
    5 projects | dev.to | 10 May 2022
    Today, it is normal for enterprises to leverage diversified databases. In my market of expertise, China, in the Internet industry, MySQL together with data sharding middleware is the go to architecture, with GreenPlum, HBase, Elasticsearch, Clickhouse and other big data ecosystems being auxiliary computing engine for analytical data. At the same time, some legacy systems (such as SQLServer legacy from .NET transformation, or Oracle legacy from outsourcing) can still be found in use. In the financial industry, Oracle or DB2 is still heavily used as the core transaction system. New business is migrating to MySQL or PostgreSQL. In addition to transactional databases, analytical databases are increasingly diversified as well.
  • Data Science Competition
    15 projects | dev.to | 25 Mar 2022
    Greenplum
  • Inspecting joins in PostgreSQL
    2 projects | dev.to | 11 Jan 2022
    PostgreSQL is a free and advanced database system with the capacity to handle a lot of data. It’s available for very large data in several forms like Greenplum and Redshift on Amazon. It is open source and is managed by an organized and very principled community.
  • Using Postgres as a Data Warehouse
    3 projects | /r/dataengineering | 11 May 2021
    There's Greenplum!

vitess

Posts with mentions or reviews of vitess. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-07.
  • Vitess 18
    2 projects | news.ycombinator.com | 7 Nov 2023
    Why would it be a Google project? https://github.com/vitessio/vitess
  • PlanetScale Scaler Pro
    3 projects | news.ycombinator.com | 6 Jul 2023
    This is great news. I strolled around https://github.com/vitessio/vitess/issues/12967.

    Are there any public discussions of more trade-offs vitess has to make to enable fks?

  • Scaling Databases at Activision [pdf]
    3 projects | news.ycombinator.com | 21 Apr 2023
  • Want to avoid MySQL but find PlanetScale really appealing
    4 projects | /r/PostgreSQL | 25 Mar 2023
    A lot of this is possible thanks to the magic of Vitess.
  • YouTube System Design
    2 projects | /r/softwarearchitecture | 5 Feb 2023
    ### YouTube
    The popular implementations of an on-demand video streaming service are the following:
    - YouTube
    - Netflix
    - Vimeo
    - TikTok

    #### Requirements
    - The user (**client**) can upload video files
    - The user can stream video content
    - The user can search for videos based on the video title

    #### Data storage
    ##### Database schema
    - The primary entities are the videos, the users, and the comments tables
    - The relationship between the users and the videos is 1-to-many
    - The relationship between the users and the comments table is 1-to-many
    - The relationship between the videos and the comments table is 1-to-many

    ##### Type of data store
    - The wide-column data store ([LSM](https://en.wikipedia.org/wiki/Log-structured_merge-tree) tree-based) such as [Apache HBase](https://hbase.apache.org/) is used to persist thumbnail images for clumping the files together, fault-tolerance, and replication
    - A cache server such as Redis is used to store the metadata of popular video content
    - A message queue such as Apache Kafka is used for the asynchronous processing (encoding) of videos
    - A relational database such as MySQL stores the metadata of the users and the videos
    - The video files are stored in a managed object storage such as AWS S3
    - A Lucene-based inverted-index data store such as Apache Solr is used to persist the video index data to provide search functionality

    #### High-level design
    - Popular video content is streamed from CDN
    - Video encoding (**transcoding**) is the process of converting a video format to other formats (MPEG, HLS) to provide the best stream possible on multiple devices and bandwidth
    - A message queue can be configured between services for parallelism and improved fault tolerance
    - Codecs (H.264, VP9, HEVC) are compression and decompression algorithms used to reduce video file size while preserving video quality
    - The popular video streaming protocols (data transfer standards) are **MPEG-DASH** (Moving Pictures Experts Group - Dynamic Adaptive Streaming over HTTP), **Apple HLS** (HTTP Live Streaming), **Microsoft Smooth Streaming**, and **Adobe HDS** (HTTP Dynamic Streaming)

    #### Video upload workflow
    1. The user (**client**) executes a DNS query to identify the server
    2. The client makes an HTTP connection to the load balancer
    3. The video upload requests are rate limited to prevent malicious clients
    4. The load balancer delegates the client's request to an API server (**web server**) with free capacity
    5. The web server delegates the client's request to an app server that handles the API endpoint
    6. The ID of the uploaded video is stored on the message queue for asynchronous processing of the video file
    7. The title and description (**metadata**) of the video are stored in the metadata database
    8. The app server queries the object store service to generate a pre-signed URL for storing the raw video file
    9. The client uploads the raw video file directly to the object store using the pre-signed URL to save the system network bandwidth
    10. The transcoding servers query the message queue using the publish-subscribe pattern to get notified on uploaded videos
    11. The transcoding server fetches the raw video file by querying the raw object store
    12. The transcoding server transcodes the raw video file into multiple codecs and stores the transcoded content on the transcoded object store
    13. The thumbnail server generates on average five thumbnail images for each video file and stores the generated images on the thumbnail store
    14. The transcoding server persists the ID of the transcoded video on the message queue for further processing
    15. The upload handler service queries the message queue through the publish-subscribe pattern to get notified on transcoded video files
    16. The upload handler service updates the metadata database with metadata of transcoded video files
    17. The upload handler service queries the notification service to notify the client of the video processing status
    18. The database can be partitioned through [consistent hashing](https://systemdesign.one/consistent-hashing-explained/) (key = user ID or video ID); a minimal sketch follows this list
    19. [Block matching](https://en.wikipedia.org/wiki/Block-matching_algorithm) or [Phase correlation](https://en.wikipedia.org/wiki/Phase_correlation) algorithms can be used to detect duplicate video content
    20. The web server (API server) must be kept stateless for scaling out through replication
    21. The video file is stored in multiple resolutions and formats in order to support multiple devices and bandwidths
    22. The video can be split into smaller chunks by the client before upload to support the resumption of broken uploads
    23. Watermarking and encryption can be used to protect video content
    24. Data centers are added to improve latency and data recovery at the expense of increased maintenance workflows
    25. A dead letter queue can be used to improve fault tolerance and error handling
    26. Chaos engineering is used to identify failures of networks, servers, and applications
    27. Load testing and chaos engineering are used to improve fault tolerance
    28. [RAID](https://en.wikipedia.org/wiki/RAID) configuration improves the hardware throughput
    29. The data store is partitioned to spread the writes and reads at the expense of difficult joins, transactions, and a fat client
    30. Federation and sharding are used to scale out the database
    31. The write requests are redirected to the leader and the read requests are redirected to the followers of the database
    32. [Vitess](https://vitess.io/) is a storage middleware for scaling out MySQL
    33. Vitess redirects read requests that require fresh data to the leader (for example, an update-user-profile operation)
    34. Vitess uses a lock server (Apache Zookeeper) for automatic sharding and leader election on the database layer
    35. Vitess supports RPC-based joins, indexing, and transactions on the SQL database
    36. Vitess allows partitioning logic to be offloaded from the application and improves database queries by caching
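
Step 18 above names consistent hashing as the partitioning scheme; here is a minimal sketch of the technique. The hash function, virtual-node count, and shard names are arbitrary choices for illustration, not anything prescribed by Vitess or the quoted post.

```python
# Minimal consistent-hashing ring, as referenced in step 18 above.
# Hash choice, virtual-node count, and shard names are arbitrary assumptions.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (h, node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Return the node owning `key` (first ring point clockwise)."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,))
        if idx == len(self._ring):  # wrap around the ring
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
print(ring.node_for("user:42"), ring.node_for("video:abc123"))
```

Keys such as a user ID or video ID map to the first ring point clockwise, so adding or removing a shard only relocates the keys adjacent to it rather than rehashing everything.
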
  • Typesafe Database Queries on the Edge
    9 projects | dev.to | 12 Nov 2022
    PlanetScale is a serverless MySQL database provider which is based on Vitess. You get the scaling benefits of Vitess without the need to manage it yourself.
  • YouTube confirms that it has removed the “sort by oldest/newest” option
    10 projects | news.ycombinator.com | 11 Nov 2022
  • Ask HN: Real-world anecdotes of MySQL at scale?
    2 projects | news.ycombinator.com | 27 Sep 2022
    Are you referring to distributed MySQL such as Vitess? It is the backend for Slack and GitHub; also was the backend for YouTube in the past.

    https://vitess.io/

    2 projects | news.ycombinator.com | 27 Sep 2022
    There’s Vitess that’s been mentioned on HN a lot recently. https://vitess.io/

What are some alternatives?

When comparing Greenplum and vitess you can also consider the following projects:

tidb - TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics.

supabase - The open source Firebase alternative.

cockroach - CockroachDB - the open source, cloud-native distributed SQL database.

citus - Distributed PostgreSQL as an extension

go-mysql-elasticsearch - Sync MySQL data into elasticsearch

kingshard - A high-performance MySQL proxy

Tile38 - Real-time Geospatial and Geofencing

migrate - Database migrations. CLI and Golang library.

TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

ClickHouse - ClickHouse® is a free analytics DBMS for big data

orchestrator - MySQL replication topology manager/visualizer