OpenRefine
english-words
Metric | OpenRefine | english-words
---|---|---
Mentions | 45 | 84
Stars | 10,465 | 10,052
Growth | 1.7% | 1.8%
Activity | 9.7 | 0.0
Latest commit | 1 day ago | 21 days ago
Language | Java | Python
License | BSD 3-clause "New" or "Revised" License | The Unlicense
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenRefine
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
-
What you need to know about the future of Mozilla Hubs
Yes, let's hope! The strategy has worked out sometimes - Google shut down 'Google Refine' 10 years ago, it got turned into 'Open Refine', last update 2 months ago. https://github.com/OpenRefine/OpenRefine
It's a hugely useful tool if you're working with messy Excel-scale data, as most biologists and social scientists are.
-
OpenRefine
It seems to be pure JS with jQuery: https://github.com/OpenRefine/OpenRefine/blob/master/main/we...
-
java string equals returns false, even for identical strings
EDIT: trim() does not remove the Unicode character U+200B (zero width space). https://github.com/OpenRefine/OpenRefine/issues/5105 is worth a read.
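The thread above is about Java, but the same pitfall can be reproduced elsewhere. As a minimal sketch in Python, the built-in `str.strip()` also leaves U+200B in place, because that character is not classified as whitespace. The helper name `strip_with_zwsp` is hypothetical and only illustrates one possible workaround.

```python
import string

ZERO_WIDTH_SPACE = "\u200b"  # U+200B, invisible when printed


def strip_with_zwsp(text: str) -> str:
    """Strip ASCII whitespace and zero width spaces from both ends."""
    return text.strip(string.whitespace + ZERO_WIDTH_SPACE)


# Two strings that look identical on screen but differ by one trailing character.
padded = "refine" + ZERO_WIDTH_SPACE

print(padded == "refine")                   # False
print(padded.strip() == "refine")           # False: strip() keeps U+200B
print(strip_with_zwsp(padded) == "refine")  # True
```

Passing an explicit character set to `strip` replaces the default whitespace set, which is why `string.whitespace` is added back in alongside the zero width space.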
-
UIUC MCS - CS 513 Review - Theory and Practice of Data Cleaning
There were six homework assignments. In order they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They all were basic implementation and programming assignments with autograders.
-
"We have great datasets"
OpenRefine will get you about 70% of the way there. It's FOSS.
-
Are there any tools to streamline the data cleaning process?
I’ve heard good things about https://openrefine.org/
-
What is the best approach to removing duplicate person records if the only identifiers are the person's first name, middle name, and last name? These names are entered into the DB in varying ways, so they are free-formatted.
This isn't well suited to SQL; use OpenRefine or the Python library fuzzywuzzy.
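To illustrate the kind of fuzzy matching suggested in that reply, here is a rough sketch that uses the standard library's `difflib.SequenceMatcher` in place of fuzzywuzzy, so it runs without extra installs. The normalization step, the `0.85` threshold, and the sample records are all assumptions chosen for the example, not values taken from the thread.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the name parts."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_duplicates(names, threshold=0.85):
    """Greedy grouping: a name joins the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical records entered in different free-text styles.
records = ["John A. Smith", "smith, john a", "Jon A Smith", "Mary Jones"]
for group in group_duplicates(records):
    print(group)
```

Greedy grouping compares each name only against the first member of every group, which keeps the sketch short but scales poorly; for large tables, a blocking key (for example, the first letter of the last name) would usually be applied first.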
english-words
- The longest word you can type on the first row of a QWERTY keyboard
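A question like this reduces to a set-membership filter over a word list. The sketch below assumes the top row is the ten letters `qwertyuiop` and runs on a small hypothetical sample rather than on the full english-words list.

```python
TOP_ROW = set("qwertyuiop")


def top_row_words(words):
    """Keep only words whose letters all sit on the top letter row."""
    return [w for w in words if w and set(w.lower()) <= TOP_ROW]


def longest_top_row_words(words):
    """Return every top-row word that ties for the greatest length, sorted."""
    candidates = top_row_words(words)
    if not candidates:
        return []
    best = max(len(w) for w in candidates)
    return sorted(w for w in candidates if len(w) == best)


# Hypothetical sample; a real run would read a full word list instead.
sample = ["typewriter", "proprietor", "pepper", "keyboard", "query", "tree"]
print(longest_top_row_words(sample))  # ['proprietor', 'typewriter']
```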
-
Is there an English based word where the letter J is followed by a consonant?
From this word list, there are 88 "words" containing a J followed by a consonant. The only ones in any kind of common use (that aren't abbreviations or something like that) are from Arabic.
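A search like the one described can be sketched with a regular expression. The version below treats every letter other than a, e, i, o, u as a consonant (so y counts as one), which is an assumption, and it runs on a small hypothetical sample instead of the full list, so it will not reproduce the count of 88.

```python
import re

# "j" followed by any letter that is not a, e, i, o, or u.
J_THEN_CONSONANT = re.compile(r"j[b-df-hj-np-tv-z]", re.IGNORECASE)


def words_with_j_consonant(words):
    """Return the words that contain a J immediately followed by a consonant."""
    return [w for w in words if J_THEN_CONSONANT.search(w)]


# Hypothetical sample words chosen to show matches and non-matches.
sample = ["hajj", "jnana", "fjord", "jump", "banjo", "hadj"]
print(words_with_j_consonant(sample))  # ['hajj', 'jnana']
```

Note that "fjord" and "hadj" do not match: in the first the J is followed by a vowel, and in the second nothing follows the J at all.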
-
Is there a crate that provides a dictionary of words?
What you're looking for is not a crate but data. You can search for a list of all words in English (or your language of choice), such as this, but for a game, you probably want only the most common ones.
-
Need help importing the entire English dictionary in an iterable format. No definitions, just words.
You can find all the English words here.
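Once a plain-text word list has been downloaded, reading it into an iterable takes only a few lines. This sketch assumes a file with one word per line; the file name `words.txt` and its contents are stand-ins created on the fly so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def iter_words(path):
    """Yield one lowercase word per non-empty line of a plain-text word list."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip().lower()
            if word:
                yield word


# Demo with a tiny stand-in file; point iter_words at a real word list in practice.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "words.txt"
    demo_path.write_text("Apple\nbanana\n\ncherry\n", encoding="utf-8")
    words = list(iter_words(demo_path))

print(words)  # ['apple', 'banana', 'cherry']
```

A generator keeps memory use low for large lists; wrapping the call in `set(...)` instead of `list(...)` gives fast membership checks when the order does not matter.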
-
What is sleep paralysis? And Astral projection if you are just your physical body?
If I were to put a 30-digit integer number, and 5 random words from https://github.com/dwyl/english-words (the dictionary files, not the webpage), would you be able to tell me what they were? How much lead-up time would that take? (I'll find a location where nobody else could see the paper, and would make it so that after my publishing the location and time, nobody would have a reasonable chance of getting there.)
-
Getting an English dictionary
You can just do a join with any text file containing English words. For example a quick search shows this.
-
Most common English words containing every possible pair of letters [OC]
List of every English word: https://github.com/dwyl/english-words
- Obtaining a Word List
- Re-building Spelling Bee for fun: what dictionary should I use
-
Need help in making "crossword puzzle" for assignment in C++
Well, that seems like a really shitty task then, if you were not even given a limited list of valid words to search for. English has many thousands of different words and it might be difficult to find a complete list somewhere. Maybe https://github.com/dwyl/english-words
What are some alternatives?
CQEngine - Ultra-fast SQL-like queries on Java collections
google-10000-english - This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.
visidata - A terminal spreadsheet multitool for discovering and arranging data
SecLists - SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.
LightAdmin - [PoC] Pluggable CRUD UI library for Java web applications
Adj-Noun-Wordlist-Generator - Outputs combinations of adjectives, nouns and digits.
Smooks - Extensible data integration Java framework for building XML and non-XML fragment-based applications
Removeddit - View deleted stuff from reddit
Jimfs - An in-memory file system for Java 7+
toybox
JBake - Java based open source static site/blog generator for developers & designers.
data-police-shootings - The Washington Post is compiling a database of every fatal shooting in the United States by a police officer in the line of duty since 2015.