OpenRefine
pyinfra
OpenRefine | pyinfra | Metric |
---|---|---|
45 | 30 | |
10,498 | 2,644 | Stars |
0.5% | 2.2% | Growth |
9.7 | 9.0 | Activity |
about 20 hours ago | 10 days ago | Last commit |
Java | Python | Language |
BSD 3-clause "New" or "Revised" License | MIT License | License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenRefine
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
-
What you need to know about the future of Mozilla Hubs
Yes, let's hope! The strategy has worked out sometimes - Google shut down 'Google Refine' 10 years ago, it got turned into 'Open Refine', last update 2 months ago. https://github.com/OpenRefine/OpenRefine
It's a hugely useful tool if you're working with messy Excel-scale data, i.e., most biologists or social scientists.
-
OpenRefine
It seems to be pure JS with jQuery: https://github.com/OpenRefine/OpenRefine/blob/master/main/we...
-
java string equals returns false, even for identical strings
EDIT: trim() does not remove unicode 0x200b (unicode character for zero width space). https://github.com/OpenRefine/OpenRefine/issues/5105 is worth a read.
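The comment above is about Java's `trim()`, which only removes characters at or below U+0020. As a rough illustration of the same pitfall in Python (the language used for examples here, not the one in the quoted post), `str.strip()` also leaves U+200B in place because it is a format character rather than whitespace. The helper name `strip_invisible` is hypothetical.

```python
ZERO_WIDTH_SPACE = "\u200b"  # invisible when printed, but still a real character


def strip_invisible(text: str) -> str:
    """Strip whitespace and zero-width spaces from both ends of text.

    Loops until nothing changes, so mixed runs such as " \u200b " are
    removed completely.
    """
    previous = None
    while previous != text:
        previous = text
        text = text.strip().strip(ZERO_WIDTH_SPACE)
    return text


raw = "OpenRefine" + ZERO_WIDTH_SPACE

print(raw == "OpenRefine")                   # False: lengths differ (11 vs 10)
print(raw.strip() == "OpenRefine")           # False: strip() keeps U+200B
print(strip_invisible(raw) == "OpenRefine")  # True
```

Two strings that look identical on screen can therefore still compare unequal until the invisible character is removed explicitly.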
-
UIUC MCS - CS 513 Review - Theory and Practice of Data Cleaning
There were six homework assignments. In order they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They all were basic implementation and programming assignments with autograders.
-
"We have great datasets"
OpenRefine will get you about 70% of the way there. It's FOSS.
-
Are there any tools to streamline the data cleaning process?
I’ve heard good things about https://openrefine.org/
-
What is the best approach to removing duplicate person records if the only identifier is the person's first name, middle name, and last name? These names are entered into the DB in varying ways, so they are free-formatted.
This is not well suited to SQL; use OpenRefine or the Python library fuzzywuzzy.
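A minimal sketch of the fuzzy-matching idea from the last answer. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for fuzzywuzzy so the example has no dependencies; the function names, the sample names, and the 0.85 threshold are assumptions chosen for illustration, not values from the thread.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the tokens
    so that "Smith, John" and "John Smith" normalize to the same string."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_duplicates(names, threshold=0.85):
    """Greedily group names whose similarity to a group's first member
    meets the threshold. The threshold is an assumed starting point."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


names = [
    "John Michael Smith",
    "Smith, John Michael",
    "john michael smith",
    "Jane Doe",
]

for group in group_duplicates(names):
    print(group)
```

Name-only matching can merge distinct people who share a name, so the resulting groups are better treated as candidates for review than as confirmed duplicates.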
pyinfra
- Pyinfra: Automate Infrastructure Using Python
-
Show HN: A new open-source automation tool as an alternative to Ansible/Salt
There is https://pyinfra.com/
As a side note, I also made a small experiment a while ago: https://github.com/linkdd/tricorder/
But it's a bit of a chicken-and-egg problem. Without users, I don't know how it should be used, without features I won't get any users. So for now, it's in a state of "I'll address bug reports and feature requests, but I won't actively develop it".
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
I like https://github.com/pyinfra-dev/pyinfra. "pyinfra automates infrastructure using Python"
I have only played with it a little, but it seems to be a well-designed and simpler alternative to Ansible, Chef, and other such tools.
-
Interesting Uses of Ansible's ternary filter
Haven't used it in anger yet, but I have high hopes for PyInfra: https://github.com/pyinfra-dev/pyinfra
-
How to manage multiple Wagtail sites from central point
pyinfra - https://pyinfra.com/ - Pyinfra is simpler for me than Ansible. I completed the entire deployment in one afternoon, from installing and configuring the VPS server from scratch to deploying the application and automatically restoring the database from a backup.
- Pyinfra: Pyinfra automates infrastructure super fast at scale
-
How do you guys handle server automation?
I’ve replaced Ansible with PyInfra wherever possible. https://pyinfra.com/ is very clean and fast, but it lacks the sheer amount of automation that can be found with Ansible.
-
What Ansible is capable to do that Python doesn't?
Some folks don't like YAML all that well, and I can understand where they are coming from. I wish Ansible provided a good Python API so that playbooks could be written in Python more easily. But there is a project called PyInfra that is trying to do something similar to Ansible, using Python as the configuration language. https://pyinfra.com/ It is still pretty new, so it does not have nearly as many modules written for it yet.
- Pyinfra automates infrastructure super fast at scale
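Several of the comments above contrast YAML playbooks with using Python as the configuration language. The toy sketch below illustrates that general idea only: ordinary conditionals and loops decide what runs on each host. It does not use pyinfra's API; `Host`, `plan`, `INVENTORY`, and the host names are hypothetical, and the commands are printed rather than executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A hypothetical inventory entry."""
    name: str
    role: str


def plan(host: Host) -> list:
    """Return the shell commands this toy deploy would run on one host."""
    commands = ["apt-get update"]
    # Plain Python branching takes the place of YAML conditionals.
    if host.role == "web":
        commands.append("apt-get install -y nginx")
    elif host.role == "db":
        commands.append("apt-get install -y postgresql")
    return commands


INVENTORY = [
    Host("web1.example.com", "web"),
    Host("db1.example.com", "db"),
]

for host in INVENTORY:
    for command in plan(host):
        print(f"{host.name}: {command}")
```

A real tool adds connection handling, idempotency checks, and change reporting on top of this; the sketch only shows why some commenters find plain Python easier to read and extend than a YAML-based format.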
What are some alternatives?
CQEngine - Ultra-fast SQL-like queries on Java collections
Ansible - Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.
visidata - A terminal spreadsheet multitool for discovering and arranging data
Fabric - Simple, Pythonic remote execution and deployment.
LightAdmin - [PoC] Pluggable CRUD UI library for Java web applications
psutil - Cross-platform lib for process and system monitoring in Python
Smooks - Extensible data integration Java framework for building XML and non-XML fragment-based applications
Docker Compose - Define and run multi-container applications with Docker
Jimfs - An in-memory file system for Java 7+
letsencrypt - Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
JBake - Java based open source static site/blog generator for developers & designers.
SaltStack - Software to automate the management and configuration of any infrastructure or application at scale.