| | OpenRefine | CSVLint |
|---|---|---|
| Mentions | 45 | 44 |
| Stars | 10,498 | 134 |
| Growth | 0.5% | - |
| Activity | 9.7 | 7.6 |
| Last commit | 1 day ago | about 1 month ago |
| Language | Java | C# |
| License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenRefine
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
-
What you need to know about the future of Mozilla Hubs
Yes, let's hope! The strategy has worked out sometimes - Google shut down 'Google Refine' 10 years ago, it got turned into 'Open Refine', last update 2 months ago. https://github.com/OpenRefine/OpenRefine
It's a hugely useful tool if you're working with messy Excel-scale data, i.e., most biologists or social scientists.
-
OpenRefine
It seems to be pure JS with jQuery: https://github.com/OpenRefine/OpenRefine/blob/master/main/we...
-
java string equals returns false, even for identical strings
EDIT: trim() does not remove unicode 0x200b (unicode character for zero width space). https://github.com/OpenRefine/OpenRefine/issues/5105 is worth a read.
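The comment above is about Java's `trim()`, but the same pitfall can be reproduced in Python, where `str.strip()` also leaves U+200B in place because that character is not classified as whitespace. The sketch below is only an illustration of the problem and one possible workaround; `strip_invisible` and the `INVISIBLE` set are hypothetical names, not part of OpenRefine or any library.

```python
# Characters that look like nothing but are not whitespace (illustrative, not exhaustive).
INVISIBLE = {"\u200b", "\ufeff"}  # zero-width space, byte-order mark


def _is_trimmable(ch: str) -> bool:
    """True for ordinary whitespace and for the invisible characters above."""
    return ch.isspace() or ch in INVISIBLE


def strip_invisible(text: str) -> str:
    """Trim whitespace and zero-width characters from both ends of text."""
    start, end = 0, len(text)
    while start < end and _is_trimmable(text[start]):
        start += 1
    while end > start and _is_trimmable(text[end - 1]):
        end -= 1
    return text[start:end]


raw = "Alice\u200b"
print(raw.strip() == "Alice")           # False: strip() keeps the zero-width space
print(strip_invisible(raw) == "Alice")  # True
```

Two strings that print identically can therefore still compare as unequal until the invisible characters are removed.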
-
UIUC MCS - CS 513 Review - Theory and Practice of Data Cleaning
There were six homework assignments. In order they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They all were basic implementation and programming assignments with autograders.
-
"We have great datasets"
OpenRefine will get you about 70% of the way there. It's FOSS.
-
Is there any tools to streamline data cleaning process?
I’ve heard good things about https://openrefine.org/
-
What is the best approach to removing duplicate person records if the only identifier is person firstname middle name and last name? These names are entered in varying ways to the DB, thus they are free-formatted.
This isn't well suited to SQL; use OpenRefine or Python's fuzzywuzzy.
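As a rough sketch of the fuzzy-matching idea in that answer, the example below flags likely duplicate names. It deliberately uses the standard library's `difflib.SequenceMatcher` in place of fuzzywuzzy so that it runs without extra installs; the token-sorting step, the helper names, the sample names, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so word order stops mattering."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return every pair of names whose similarity reaches the threshold."""
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if similarity(first, second) >= threshold:
                pairs.append((first, second))
    return pairs


names = ["John A. Smith", "Smith, John A", "Jon Smith", "Mary Jones"]
for first, second in likely_duplicates(names):
    print(f"{first!r} may duplicate {second!r}")
```

The pairwise comparison is quadratic in the number of names, so for a large table it would need some form of blocking (for example, only comparing names that share a last-name initial). Matches should be treated as candidates for review rather than merged automatically.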
CSVLint
-
A question for the pro's, am I misusing SQL?
Also, a little self-promotion here, I've created the CSV Lint plug-in for Notepad++ to work with CSV text data files. It can reformat, validate and sort csv files, as well as convert csv to different formats including SQL. Meaning it can take a csv data file and generate INSERT INTO statements, including CREATE TABLE with the corresponding column datatypes and everything.
-
Looking for a CSV editor that doesn't modify the data like Excel does
I've created the CSV Lint plug-in for Notepad++ which can do all kinds of validation and transformations on a CSV file, processing it just as a text-file. Although it's only on Windows.
-
Best Way to Import a CSV From Into PostgreSQL
fyi Notepad++ has a CSV Lint plug-in which can convert a csv file into an SQL INSERT VALUES script, including a CREATE TABLE statement with the appropriate column datatypes (based on the content of the csv data)
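To make the idea of that conversion concrete, here is a minimal sketch of turning CSV text into a CREATE TABLE statement plus INSERT statements, with column types guessed from the data. It is not the CSV Lint plug-in's implementation: the function names, the three-type scheme (INTEGER, NUMERIC, TEXT), and the sample data are assumptions, and it presumes a rectangular file whose header names are already valid SQL identifiers.

```python
import csv
import io
import re

INTEGER_RE = re.compile(r"[+-]?[0-9]+")
NUMERIC_RE = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def infer_type(values):
    """Pick the narrowest of INTEGER, NUMERIC, TEXT that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "TEXT"  # nothing to infer from, so fall back to text
    if all(INTEGER_RE.fullmatch(v) for v in non_empty):
        return "INTEGER"
    if all(NUMERIC_RE.fullmatch(v) for v in non_empty):
        return "NUMERIC"
    return "TEXT"


def sql_literal(value, sql_type):
    """Render one CSV field as an SQL literal; empty fields become NULL."""
    if value == "":
        return "NULL"
    if sql_type == "TEXT":
        return "'" + value.replace("'", "''") + "'"  # double any single quotes
    return value


def csv_to_sql(csv_text, table):
    """Build CREATE TABLE and INSERT statements from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = [infer_type([row[i] for row in data]) for i in range(len(header))]
    columns = ",\n".join(f"  {name} {t}" for name, t in zip(header, types))
    statements = [f"CREATE TABLE {table} (\n{columns}\n);"]
    column_list = ", ".join(header)
    for row in data:
        values = ", ".join(sql_literal(v, t) for v, t in zip(row, types))
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return "\n".join(statements)


sample = "id,name,price\n1,O'Brien,9.50\n2,Lee,\n"
print(csv_to_sql(sample, "items"))
```

Dates, booleans, and quoted identifiers are left out to keep the sketch short; a real converter would need to handle them.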
-
How to Import Data (XLSX, CSV, etc) into pgadmin
Maybe you could use Notepad++ with the CSV Lint plug-in to convert a csv file to an SQL INSERT VALUES script, including a CREATE TABLE statement.
-
Problem importing CSV file
You could try opening the file as a plain text file in notepad, or maybe using Notepad++ and the CSV Lint plug-in
-
Best language/tool to work with CSV files?
I just want to mention I've created a CSV Lint plug-in for Notepad++. It's maybe not exactly what you're looking for, but it can generate initial Python scripts based on csv files, so it might be useful.
-
CSV Lint plug-in for Notepad++ to view csv files, validate and convert to SQL insert script
The CSV Lint plug-in for Notepad++ was updated recently and is available in the latest release of Notepad++ (v8.5.3). It is a useful plug-in for anyone working with csv datasets. I created the plug-in and have posted about it before; this latest update has some more improvements and bugfixes.
-
Just joined a company in a sunset industry, data is in Excel. I want to migrate from Excel to PostgreSQL. I have zero knowledge in SQL, but i have some experience in programming using MatLab. Is this possible? I am thinking of Jose Portilla's course on Udemy as starting point.
If you just want to create tables you could experiment a bit with Notepad++ and the CSV Lint plug-in (Disclaimer: I'm the author of this plugin).
-
How do you guys handle pandas and its sh*tty data type inference
There's also the CSV Lint plug-in for Notepad++ which can detect datatypes, and then you can do CSV Lint > Generate metadata > Python script. Although idk it might not work correctly for all datetime datatypes.
-
Data manipulation tools
idk if it counts as an ETL tool, but with the CSV Lint plug-in for Notepad++ you can quickly check a csv file for errors, validate a dataset or get a column summary report.
What are some alternatives?
CQEngine - Ultra-fast SQL-like queries on Java collections
datasetmultitool - CSV lint tool to validate csv files. It is a helper utility to process csv textfiles and check for data errors. It can check text width, validate and reformat date and datetime values, change point or comma decimal separator, remove thousand separator and change column order.
visidata - A terminal spreadsheet multitool for discovering and arranging data
CsvQuery - Plugin for Notepad++ that treats CSV files as (read only) SQL tables
LightAdmin - [PoC] Pluggable CRUD UI library for Java web applications
Customer-Analysis-Tableau - This repository contains the data source and the tableau workbook used in my YouTube video: https://www.youtube.com/watch?v=_qReGTOrKTk
Smooks - Extensible data integration Java framework for building XML and non-XML fragment-based applications
NppPluginLexerExample - Notepad++ Plug-in Lexer and Folder example using the C# template
Jimfs - An in-memory file system for Java 7+
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
JBake - Java based open source static site/blog generator for developers & designers.
sqlitebrowser - Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite". Website at: