-
I took a look at a few different options. The main issue is that GitLab is written in Ruby, while most options (like nbdime) are in Python. It also needs to work by default, so zero effort for the user. What I did was generate Markdown from each version of the notebook, stripping some metadata and noisy outputs, and then diff the two Markdown files (https://gitlab.com/gitlab-org/incubation-engineering/mlops/rb-ipynbdiff). It's an MVP, but it works well enough and allows for diffing output as well (I will be adding some metadata back soon too). The next step is to create a semantic diff algorithm over the JSON tree and actually render the diffs per cell.
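For anyone curious what the "convert to Markdown, then diff" idea looks like in practice, here is a rough sketch in Python. It is not the code from rb-ipynbdiff (which is Ruby); the function names `notebook_to_markdown` and `diff_notebooks` are made up for illustration, and the choice of what to drop (notebook and cell metadata, execution counts, non-text outputs) is an assumption about what counts as noise.

```python
import difflib

FENCE = "`" * 3  # Markdown code fence, built here to keep this listing readable


def notebook_to_markdown(nb: dict) -> str:
    """Flatten a parsed .ipynb dict into Markdown-like text.

    Assumption: metadata, execution counts and non-text outputs are noise
    and are simply skipped.
    """
    lines = []
    for cell in nb.get("cells", []):
        # "source" may be a string or a list of strings; join handles both.
        source = "".join(cell.get("source", []))
        if cell.get("cell_type") == "code":
            lines += [FENCE, *source.splitlines(), FENCE]
            for out in cell.get("outputs", []):
                # Keep stream text or the plain-text form of a result.
                text = out.get("text") or out.get("data", {}).get("text/plain", "")
                lines += "".join(text).splitlines()
        else:
            lines += source.splitlines()
        lines.append("")  # blank line between cells
    return "\n".join(lines)


def diff_notebooks(old: dict, new: dict) -> str:
    """Unified diff of the two flattened notebooks."""
    a = notebook_to_markdown(old).splitlines()
    b = notebook_to_markdown(new).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, "old.ipynb", "new.ipynb", lineterm="")
    )
```

Load each version with `json.load` and pass the resulting dicts to `diff_notebooks`. Re-running a notebook without changing it only bumps execution counts and metadata, so under these assumptions the diff comes back empty, while a changed cell or changed text output shows up as ordinary `-`/`+` lines.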
-
databooks
A CLI tool to reduce the friction between data scientists by reducing git conflicts: it removes notebook metadata and gracefully resolves git conflicts.
If you're working on diffs for Jupyter Notebooks, it's worth looking into this: https://github.com/datarootsio/databooks
-
I recommend checking out ploomber, great integration with notebooks and Git!