databooks
A CLI tool to reduce friction between data scientists by reducing git conflicts, removing notebook metadata, and gracefully resolving git conflicts.
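To make "removing notebook metadata" concrete, here is a minimal Python sketch that works directly on the notebook JSON (nbformat 4) using only the standard library. It is not databooks' implementation or CLI. The helper name `strip_metadata` is hypothetical. It clears notebook-level metadata, per-cell metadata, and execution counts, which are fields that often change between runs or machines even when the code itself has not changed.

```python
import json
from typing import Any


def strip_metadata(nb: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an nbformat-4 notebook with volatile fields cleared (hypothetical helper)."""
    clean = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    clean["metadata"] = {}  # notebook-level metadata (kernelspec, language_info, ...)
    for cell in clean.get("cells", []):
        cell["metadata"] = {}  # per-cell metadata (tags, collapsed, scrolled, ...)
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # changes every time the cell is run
            for output in cell.get("outputs", []):
                if "execution_count" in output:  # only execute_result outputs carry one
                    output["execution_count"] = None
    return clean


# Usage: a one-cell notebook with metadata that would churn in git
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": [], "scrolled": True},
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": "2"},
                    "metadata": {},
                }
            ],
        }
    ],
}
cleaned = strip_metadata(nb)
print(json.dumps(cleaned, indent=1))
```

Clearing all notebook-level metadata also drops the kernelspec. A real tool would probably let you keep selected keys instead of wiping everything.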
I looked at a few different options. The main issue is that GitLab is written in Ruby, while most options (like nbdime) are in Python. It also needs to work by default, with zero effort from the user. What I did was render each version of the notebook as Markdown, cleaning up some of the metadata and noisy outputs, and then diff the two Markdown files (https://gitlab.com/gitlab-org/incubation-engineering/mlops/rb-ipynbdiff). It's an MVP, but it works well enough and also allows diffing outputs (I will be adding some metadata back soon too). The next step is to create a semantic diff algorithm over the JSON tree and actually render the diffs per cell.
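The Markdown-then-diff approach described above can be sketched in a few lines of Python using only the standard library. This is not the rb-ipynbdiff code (which is Ruby). `notebook_to_markdown` and `diff_notebooks` are hypothetical names, and the rule for what counts as a "noisy" output (anything other than stream or plain-text output) is an assumption made for illustration.

```python
import difflib
from typing import Any

FENCE = "`" * 3  # three backticks, so the rendered code cells look like Markdown code blocks


def cell_source(cell: dict[str, Any]) -> str:
    # nbformat stores "source" either as one string or as a list of strings
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_to_markdown(nb: dict[str, Any]) -> list[str]:
    """Flatten a notebook into Markdown lines, ignoring metadata and non-text outputs."""
    lines: list[str] = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            lines.append(FENCE + "python")
            lines.extend(cell_source(cell).splitlines())
            lines.append(FENCE)
            for output in cell.get("outputs", []):
                if output.get("output_type") == "stream":
                    text = output.get("text", "")
                elif "text/plain" in output.get("data", {}):
                    text = output["data"]["text/plain"]
                else:
                    continue  # images, HTML, widgets, ... are treated as noise here
                text = "".join(text) if isinstance(text, list) else text
                # Indent outputs so they read as a separate block under the code
                lines.extend("    " + line for line in text.splitlines())
        else:
            # Markdown and raw cells are kept as-is
            lines.extend(cell_source(cell).splitlines())
        lines.append("")  # blank line between cells
    return lines


def diff_notebooks(old: dict[str, Any], new: dict[str, Any]) -> str:
    """Unified diff of the Markdown renderings of two notebooks."""
    return "\n".join(
        difflib.unified_diff(
            notebook_to_markdown(old),
            notebook_to_markdown(new),
            fromfile="old.ipynb",
            tofile="new.ipynb",
            lineterm="",
        )
    )


# Usage: the code and its output changed; the cell metadata changed too
old = {"cells": [{"cell_type": "code", "metadata": {}, "source": "x = 1\nprint(x)",
                  "outputs": [{"output_type": "stream", "name": "stdout", "text": "1\n"}]}]}
new = {"cells": [{"cell_type": "code", "metadata": {"collapsed": True}, "source": "x = 2\nprint(x)",
                  "outputs": [{"output_type": "stream", "name": "stdout", "text": "2\n"}]}]}
print(diff_notebooks(old, new))
```

Because both notebooks are flattened to text first, the metadata change does not show up at all and the diff is line-based rather than cell-aware. A semantic diff over the JSON tree would instead align cells before comparing them.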
If you're working on diffs for Jupyter Notebooks, it's worth looking into this: https://github.com/datarootsio/databooks
I recommend checking out Ploomber; it has great integration with notebooks and Git!
Related posts
- [D] What MLOps platform do you use, and how helpful are they?
- How do I number my .py file names?
- Launch HN: Ploomber (YC W22) – Quickly Deploy Data Pipelines from Jupyter/VSCode
- Show HN: JupySQL – a SQL client for Jupyter (ipython-SQL successor)
- Decent low code options for orchestration and building data flows?