devenv
deequ
Metric | devenv | deequ
---|---|---
Mentions | 88 | 17
Stars | 3,410 | 3,119
Growth | 15.2% | 1.5%
Activity | 9.8 | 7.5
Latest commit | 6 days ago | 7 days ago
Language | Nix | Scala
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
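As a rough illustration of how to read these numbers, the sketch below inverts the month-over-month growth figure to estimate last month's star count, and maps an activity score to a "top N%" share. Both formulas are assumptions: growth is taken to be (current - previous) / previous, and the activity mapping is a linear extrapolation from the single example above (9.0 means top 10%). The function names are hypothetical and are not part of any site API.

```python
def previous_month_stars(current_stars: int, growth_pct: float) -> float:
    """Invert month-over-month growth: current = previous * (1 + growth)."""
    return current_stars / (1 + growth_pct / 100)


def top_share_pct(activity: float) -> float:
    """Assumed linear reading of the activity score: 9.0 -> top 10%."""
    return round((10 - activity) * 10, 1)


# Figures taken from the comparison table on this page.
for name, stars, growth, activity in [
    ("devenv", 3410, 15.2, 9.8),
    ("deequ", 3119, 1.5, 7.5),
]:
    estimate = round(previous_month_stars(stars, growth))
    print(f"{name}: ~{estimate} stars a month ago, top {top_share_pct(activity)}% by activity")
```

Under these assumptions, devenv's 15.2% growth corresponds to roughly 450 new stars in a month, while deequ's 1.5% corresponds to fewer than 50.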
devenv
-
Fast, Declarative, Reproducible and Composable Developer Environments Using Nix
I gave devenv multiple tries, and I am sorry to say there are multiple annoying issues that forced me to give up every time.
Some of these 200+ issues have been unsolved for a fairly long time.
https://github.com/cachix/devenv/issues
-
Nix – A One Pager
Software developers often want to customize:
1. their home environments: for packages (some reach for brew on macOS) and configurations (dotfiles, and some reach for stow).
2. their development shells: for build dependencies (compilers, SDKs, libraries), tools (LSP, linters, formatters, debuggers), and services (runtime, database). Some reach for devcontainers here.
3. or even their operating systems: for development, for CI, for deployment, or for personal use.
Nix provisions all of the above in the same language, with Nixpkgs, NixOS, home-manager, and devShells such as https://devenv.sh/. What's more, Nix is (https://nixos.org/):
- reproducible: what works on your dev machine also works in CI and in prod,
- declarative: you version control and review your configurations and infrastructure as code, at a reasonable level of abstraction,
- reliable: all changes are atomic with easy roll back.
-
Show HN: Lapdev, a new open-source remote dev environment management software
https://devenv.sh/ and nix in general are great for setting up dev environments.
-
Show HN: Flox 1.0 – Open-source dev env as code with Nix
> but worried that the development is not moving forward
There is an open v1.0 PR: https://github.com/cachix/devenv/pull/1005
-
What's the Next Vagrant?
2) A way to run services apps depend on (databases, job runners, cache etc).
I am going to suggest one of the Nix based tools that do those things:
- https://devenv.sh/ (I use this at work)
-
Ask HN: How can I make local dev with containers hurt less?
Yup, I haven’t tried it but there is https://devenv.sh which is built on top of nix and makes it simple.
-
Flakes aren't real and cannot hurt you: using Nix flakes the non-flake way
Although Guix reads better than Nix (after all, it's Lisp), I found the support and resources available for learning severely lacking.
Plus, you have to jump through hoops to install non-free software, which goes against the ethos of Guix anyway.
IMHO, Nix is clearly "the winner" here and we'll see more and more adoption as it improves. Lots of folks are doing exciting work (see https://determinate.systems/, https://devenv.sh/, https://flakehub.com/). And the scale and organization around nixpkgs is damn impressive.
-
NixOS has one fatal flaw
I don't think you can ever get Nix as simple as PNPM, simply because native libraries are sometimes annoying, need to be configured at build time to a greater degree and because the problem space it attacks is so much larger than PNPM, which only deals with the JS/Node.js ecosystem.
However, I do think that there exist reasonable levels of abstraction that sacrifice some expressive power for simplicity and such systems could maybe expose a PNPM-like CLI. One example that comes to mind is devenv.nix [1]. While it doesn't yet have a CLI, its configuration file is YAML and relatively simple. I think there's more to be done in this space and I hope for tools that are easier to grasp in the future.
> Nix package files evaluate down to configuration for the Nix package manager, but I haven’t ever seen a good explanation for the basic essentials underneath all the abstraction. Every guide I’ve learned from and all the package defs I’ve read seem to cargo cult many layers of mysterious config composing config. Without easy to learn essentials it’s difficult to grok the system as a whole.
To me it sounds like the essential that you're referring to is the 'derivation' primitive, which is almost always hidden behind the mkDerivation abstraction from nixpkgs. This [2] blog post is an exploration of what exactly that means.
I'd also love for the documentation situation to be much better, in particular in terms of official, curated resources. But I'm not convinced that you actually need to know the difference between derivation and mkDerivation to make effective use of Nix, because in practice you would always use the latter. That said, mkDerivation and the whole of nixpkgs is essentially a huge DSL (I believe this is what you meant when you said 'config composing config') that you do need to know and is woefully underdocumented.
> I would love to adopt Nix for developer tooling for Notion’s engineers, but today it’s about infinity times easier to work around the limitations mentioned of Docker+Ubuntu+NPM than to work around the limitations of Nix.
One approach I have taken is to specify the environment in Nix, but then generate Docker devcontainers from it, so most people don't come into contact with Nix if they don't want to.
[1] https://devenv.sh
[2] https://ianthehenry.com/posts/how-to-learn-nix/derivations/
-
Development Environments with Guix, similar to devenv.sh
I do this through devenv.sh, which uses Nix. When I got into Nix I thought it was going to be easy to just make a development environment, but that was not the case. Once I found devenv.sh, I could finally make good environments... It also has other features like containers and services, so I know I can get more out of it if the time comes.
-
devenv needs help testing 1.0 release
Instructions: https://github.com/cachix/devenv/pull/745
deequ
-
[Data Quality] Deequ Feedback request
There's no straightforward way to drop and rerun a metric collection. For example, say you detect a problem in your data. You fix it, rerun the pipeline, and replace the bad data with the good. You'd want your metrics history to reflect the true state of your data. But the "bad run" cannot be dropped.
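To make the request concrete, here is a minimal sketch of the behaviour being asked for: a metrics history in which a rerun replaces the earlier bad run, or the bad run can be dropped outright. This is not Deequ's API. `MetricsHistory`, its methods, the dataset name, and the dates are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsHistory:
    """Hypothetical in-memory metrics store, keyed by (dataset, run date)."""

    runs: dict = field(default_factory=dict)

    def save(self, dataset: str, run_date: str, metrics: dict) -> None:
        # Saving under an existing key overwrites the earlier (bad) run.
        self.runs[(dataset, run_date)] = dict(metrics)

    def drop(self, dataset: str, run_date: str) -> bool:
        # Returns True if a run was removed, False if nothing matched.
        return self.runs.pop((dataset, run_date), None) is not None

    def series(self, dataset: str, metric: str) -> list:
        # Values of one metric in run-date order (ISO dates sort as strings).
        return [
            self.runs[key][metric]
            for key in sorted(self.runs)
            if key[0] == dataset and metric in self.runs[key]
        ]


history = MetricsHistory()
history.save("orders", "2024-01-01", {"completeness": 0.99})
history.save("orders", "2024-01-02", {"completeness": 0.62})  # bad run
history.save("orders", "2024-01-02", {"completeness": 0.98})  # rerun after the fix
print(history.series("orders", "completeness"))
```

Keying each run by dataset and run date, rather than by wall-clock time of the collection, is the design choice that lets a rerun land on the same slot as the run it corrects.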
-
Thoughts on a business rules engine
I had similar requirements for QA reporting on large and diverse data sets. I implemented data check pipelines, with rules in AWS Deequ (https://github.com/awslabs/deequ) running on an Apache Spark cluster. Deequ worked well for me, but there were a few cases where I opted to write the rule checks in the data store to improve throughput (i.e. SQL checks on critical data elements on the database).
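For readers unfamiliar with rule-based data checks, the following is a library-free sketch of the general pattern: named rules are applied to every row, and the report records which rows failed each rule. It does not use Deequ or Spark. `run_checks`, the rule names, and the sample rows are hypothetical, and a real pipeline would run equivalent checks on a distributed engine or in SQL as described above.

```python
from typing import Callable

Row = dict
Rule = tuple[str, Callable[[Row], bool]]  # (rule name, predicate); Python 3.9+


def run_checks(rows: list[Row], rules: list[Rule]) -> dict[str, list[int]]:
    """Return, per rule name, the indices of the rows that fail it."""
    failures: dict[str, list[int]] = {name: [] for name, _ in rules}
    for index, row in enumerate(rows):
        for name, predicate in rules:
            if not predicate(row):
                failures[name].append(index)
    return failures


# Illustrative rules and data only.
rules: list[Rule] = [
    ("customer_id is present", lambda r: r.get("customer_id") is not None),
    ("amount is non-negative", lambda r: r.get("amount", 0) >= 0),
]
rows = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 3, "amount": -2.0},
]

report = run_checks(rows, rules)
for rule_name, failed_rows in report.items():
    status = "ok" if not failed_rows else f"failed on rows {failed_rows}"
    print(f"{rule_name}: {status}")
```

Returning row indices rather than a single pass/fail flag keeps row-level detail available for whoever has to fix the data.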
-
Building a data quality solution for devs and business people
Hey all! At the companies where I've worked as a developer, I've found that business stakeholders typically want a concrete way to check and assure the quality of data that pipelines are producing, before other downstream systems and users get impacted. I've tested solutions like Deequ, but I found that it made building compliance and data rules a bit more complicated and put a greater burden on developers to correctly implement the rules the business was expecting. I also experienced issues with running checks in parallel and getting row-level details about the failures.
-
deequ VS cuallee - a user suggested alternative
2 projects | 30 Nov 2022
- November 15-19, 2022 FLiP Stack Weekly
- What are your favourite GitHub repos that shows how data engineering should be done?
- Well designed scala/spark project
-
Soda Core (OSS) is now GA! So, why should you add checks to your data pipelines?
GE is arguably the most well known OSS alternative to Soda Core. The third option is deequ, originally developed and released in OSS by AWS. Our community has told us that Soda Core is different because it’s easy to get going and embed into data pipelines. And it also allows some of the check authoring work to be moved to other members of the data team. I'm sure there are also scenarios where Soda Core is not the best option. For example, when you only use Pandas dataframes or develop in Scala.
-
Congrats on hitting the v1 milestone, whylabs! You're r/MLOps OSS tool of the month!
I wonder how this compares with tools like DeeQu (https://github.com/awslabs/python-deequ - requires Spark) or Pandas Profiling? One plus side I can see is that it doesn't require Apache Spark to run profiling (though a quick look at the code indicates that they are working on Spark support) and can work with real time systems.
-
What companies/startups are using Scala (open source projects on github)?
There are so many of them in big data, e.g. Kafka, Spark, Flink, Delta, Snowplow, Finagle, Deequ, CMAK, OpenWhisk, Snowflake, TheHive, TVM-VTA, etc.
What are some alternatives?
devbox - Instant, easy, and predictable development environments
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
nix-direnv - A fast, persistent use_nix/use_flake implementation for direnv [maintainer=@Mic92 / @bbenne10]
azure-kusto-spark - Apache Spark Connector for Azure Kusto
direnv - unclutter your .profile
dbt-data-reliability - dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
devshell - Per project developer environments
Quill - Compile-time Language Integrated Queries for Scala
rembg - Rembg is a tool to remove images background
BigDL - Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
nix - Nix, the purely functional package manager
re_data - re_data - fix data issues before your users & CEO would discover them 😊