nbdev
dbt
Metric | nbdev | dbt
---|---|---
Mentions | 45 | 1
Stars | 4,740 | 3,802
Growth | 0.9% | -
Activity | 6.5 | 10.0
Last commit | about 1 month ago | over 2 years ago
Language | Jupyter Notebook | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nbdev
- The Jupyter+Git problem is now solved
- What is literate programming used for?
One example I've seen is ML/DL folks using Jupyter notebooks to develop DL libraries; see https://github.com/fastai/nbdev
- GitHub Accelerator: our first cohort and what's next
- https://github.com/fastai/nbdev: Increase developer productivity by 10x with a new exploratory programming workflow.
- Startups are in first batch of GitHub OS Accelerator
9. Nbdev: Boost developer productivity with an exploratory programming workflow - https://nbdev.fast.ai/
- Start learning python for a Statistician with SAS experience and little R experience
See if you like the nbdev way of working with data through Python and Jupyter. nbdev is an optional part that will create Python packages from Jupyter notebooks. Even the simple tutorials are opinionated and will guide you to unit test your code and write CI/CD pipelines.
- FastKafka - free open source python lib for building Kafka-based services
- isn't this just too much for a take home assignment?
You probably don’t have time for this for the purposes of your task, but I will also throw in the recommendation of nbdev, especially if you’re a Python person. I haven’t had a project to use it on yet, but I’ve gone through the docs and the walkthrough, and it seems like a great framework for starting potential projects with all the infrastructure they will need if and when they eventually get big and need packaging and so on.
- Any experience dealing with a non-technical manager?
nbdev: jupyter notebooks -> python package
- Resources to bridge the gap between jupyter notebooks and regular python development
Take a look at https://github.com/fastai/nbdev - haven't used it, but supposedly the whole of the fast.ai library was written that way. It sounds like a natural direction in your scenario, allowing you to keep working in a familiar environment and still produce production-ready code (well, at least on paper 😅)
- Rant: Jupyter notebooks are trash.
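Several of the mentions above describe nbdev as turning Jupyter notebooks into a Python package. The sketch below illustrates that general idea only and is not nbdev's implementation or API: it reads a notebook's JSON and keeps the code cells whose first line is the `#| export` marker. The function name `export_cells` and the sample notebook are hypothetical, made up for this example.

```python
import json


def export_cells(notebook_json: str) -> str:
    """Collect code cells flagged with '#| export' into module source text.

    Illustrative only: this mimics the notebook-to-module idea and does
    not call or reproduce nbdev itself.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells never reach the module
        source = cell.get("source", [])
        # The notebook format allows 'source' to be a string or a list of lines.
        if isinstance(source, str):
            lines = source.splitlines(keepends=True)
        else:
            lines = list(source)
        if lines and lines[0].strip() == "#| export":
            # Drop the marker line and keep the rest of the cell.
            chunks.append("".join(lines[1:]).strip())
    return "\n\n".join(chunks) + "\n"


# Hypothetical three-cell notebook: prose, an exported function, a scratch cell.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code",
         "source": ["#| export\n", "def add(a, b):\n", "    return a + b"]},
        {"cell_type": "code", "source": ["add(1, 2)"]},
    ]
})
module_source = export_cells(sample)
```

Only the flagged cell ends up in `module_source`; the scratch cell that calls `add(1, 2)` stays in the notebook as exploratory code.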
dbt
- Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
Due to the rise of cloud-based data warehouses, businesses can load all their raw data directly into the warehouse without prior transformations. This process is known as ELT (Extract, Load, Transform) and gives data and analytics teams the freedom to develop ad hoc transformations based on their particular needs. ELT became popular as the cloud's processing power and scale became better suited to transforming data. dbt (website, GitHub) is a popular open-source tool recommended for ELT and allows businesses to transform data in their warehouses more effectively. It's a great pairing with RudderStack's Cloud Extract ETL tool.
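To make the ELT ordering concrete, here is a minimal sketch in which an in-memory SQLite database stands in for a cloud warehouse. It is not dbt code; the table names `raw_events` and `user_revenue`, the column names, and the function `run_elt` are assumptions invented for the example. Raw rows are loaded untouched, and the transformation then runs as SQL inside the database.

```python
import sqlite3


def run_elt(raw_events):
    """Load raw rows first, then transform them with SQL inside the database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

    # Load: raw data goes in as-is, with no cleanup beforehand.
    conn.execute("CREATE TABLE raw_events (user_id TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw_events)

    # Transform: derive a reporting table from the raw table, in the warehouse.
    conn.execute(
        """
        CREATE TABLE user_revenue AS
        SELECT user_id, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_events
        GROUP BY user_id
        """
    )
    rows = conn.execute(
        "SELECT user_id, revenue FROM user_revenue ORDER BY user_id"
    ).fetchall()
    conn.close()
    return rows


result = run_elt([("a", 150), ("b", 200), ("a", 50)])
```

Because the raw table is kept, a different transformation can be written later against the same loaded data, which is the flexibility the excerpt attributes to ELT. In a dbt project the transformation would typically live in its own SQL file as a `SELECT` statement rather than inline in Python.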
What are some alternatives?
papermill - 📚 Parameterize, execute, and analyze notebooks
Apache Kafka - Mirror of Apache Kafka
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
superset - Apache Superset is a Data Visualization and Data Exploration Platform
rr - Record and Replay Framework
Snowplow - The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Jupyter-PowerShell - Jupyter Kernel for PowerShell
rudderstack-docs - Documentation repository for RudderStack - the Customer Data Platform for Developers.
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
ClickHouse - ClickHouse® is a free analytics DBMS for big data