Metric | datacompy | koalas
---|---|---
Mentions | 4 | 2
Stars | 386 | 3,321
Growth | 8.8% | 0.3%
Activity | 7.5 | 4.6
Last commit | 5 days ago | about 2 months ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
datacompy
- How to Check 2 SQL Tables Are the Same
- Comparing 2 CSV files
  datacompy is a package to compare 2 pandas dataframes
- Performing Data Tests on External Data/Complex Data Quality Checks
- Best Practice When Comparing Data Across Two SQL Servers in Python
  https://github.com/capitalone/datacompy will allow you to compare two tables/dataframes against one another, and see detailed results on the difference.
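To make the idea in the mentions above concrete, here is a minimal standard-library sketch of comparing two tables joined on a key column. It is not datacompy's implementation or API: the function `compare_tables`, the helper `read_csv`, and the sample data are all hypothetical, and values are compared as plain strings.

```python
import csv
import io


def compare_tables(left_rows, right_rows, key):
    """Join two lists of dict rows on `key` and summarize how they differ."""
    left = {row[key]: row for row in left_rows}
    right = {row[key]: row for row in right_rows}

    # Keys that appear in only one of the two tables.
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())

    # For keys present on both sides, record each shared column whose values differ.
    mismatches = []
    for k in sorted(left.keys() & right.keys()):
        shared_columns = left[k].keys() & right[k].keys()
        for column in sorted(shared_columns):
            if left[k][column] != right[k][column]:
                mismatches.append((k, column, left[k][column], right[k][column]))

    return {"only_left": only_left, "only_right": only_right, "mismatches": mismatches}


def read_csv(text):
    """Parse CSV text into a list of dict rows (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical sample data standing in for two CSV exports.
old_csv = "id,name,balance\n1,Ann,10.00\n2,Bob,20.00\n3,Cy,30.00\n"
new_csv = "id,name,balance\n1,Ann,10.00\n2,Bob,25.00\n4,Di,40.00\n"

result = compare_tables(read_csv(old_csv), read_csv(new_csv), key="id")
print(result)
```

The result separates rows that exist on only one side from rows that share a key but disagree in some column, which is the kind of detailed difference report the mentions describe. For pandas dataframes, datacompy itself is the tool those posts recommend.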
koalas
- My new company uses Pyspark. I want to learn it before my starting date. Any advice?
  If they're using Databricks and you're familiar with pandas, koalas should be right up your alley.
- Spark vs Pandas
  If you like excessive use of square brackets.. I mean pandas, you might wanna check out Koalas. Koalas is supposed to provide a pandas DataFrame API implementation atop Spark.
What are some alternatives?
data-science-ipython-notebooks - Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Dask - Parallel computing with task scheduling
data-diff - Compare tables within or across databases
PandasGUI - A GUI for Pandas DataFrames
dbt-audit-helper - Useful macros when performing data audits
popmon - Monitor the stability of a Pandas or Spark dataframe
visualiza - A general-purpose dynamic data visualizer.
cape-dataframes - Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.
fastdbfs - An interactive command line client for Databricks DBFS.
diffable-sql