Open-source projects categorized as data-wrangling

Top 9 data-wrangling Open-Source Projects

  • OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it

    Project mention: 20+ Trending and Popular Java Open Source Project | 2022-05-10


  • dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

    Project mention: Zq: An Easier (and Faster) Alternative to Jq | 2022-04-26

    I would definitely add dasel to that list. It's become my de facto serialized data converter, and I regularly use it to convert between CSV, TOML, YAML, JSON, and XML using jq-ish syntax.

  • Data-science-best-resources

    Carefully curated resource links for data science in one place

    Project mention: ⚙️ Data Science Collected Resources: A trove of carefully curated resources and links (on the topics of software, platforms, language, techniques, etc.) related to #DataScience, all in one place. h/t @Sauain | 2021-09-21

  • Optimus

    Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • prose

    The Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the SDK. (by microsoft)

    Project mention: The Flash Fill Feature in Excel | 2021-09-19

    Our program synthesis APIs are available publicly, but for non-commercial use only. You want to look for API samples corresponding to the Transformation.Text capability. This is a more powerful capability than Flash Fill. If you happen to try this out, we'd be very interested in getting your feedback on whether this is powerful enough to handle your use cases, and if not, we would love to be inspired by your use cases for a future version of this technology. You may reach us at [email protected]

  • qsacnpj

    Package that processes and organizes data from Brazil's National Registry of Legal Entities (Cadastro Nacional da Pessoa Jurídica, CNPJ)

  • qsv

    CSVs sliced, diced & analyzed.

    Project mention: Modernizing AWK, a 45-year old language, by adding CSV support | 2022-05-12

    I was using xsv a lot at work (it is so much faster than csvkit) but I've recently jumped to qsv, a fork with more features.

  • R-Fundamentals

    D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

    Project mention: R-Fundamentals: NEW Data - star count:112.0 | 2022-05-07

  • prosto

    Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

    Project mention: Excel 2.0 – Is there a better visual data model than a grid of cells? | 2022-03-31

    One idea is to use columns instead of cells. Each column has a definition in terms of other columns which might also be defined in terms of other columns. If you change value(s) in some source column then these changes will propagate through the graph of these column definitions. Some fragments of this general idea were implemented in different systems, for example, Power BI or Airtable.

    This approach was formalized in the concept-oriented model of data which relies on two basic elements: mathematical functions and mathematical sets. In contrast, most traditional data models rely on only sets. Functions are implemented as columns. The main difficulty in any formalization is how to deal with columns in multiple tables.

    This approach was implemented in the Prosto data processing toolkit.
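
The column-definition idea described in this comment can be illustrated with a minimal sketch in plain Python. This is not Prosto's API: the `ColumnTable` class, its methods, and the example column names are all hypothetical, chosen only to show how derived columns can be defined as functions of other columns and recomputed when a source column changes. It assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


class ColumnTable:
    """Hypothetical single-table model: columns are either source data
    or functions of other columns (not Prosto's actual API)."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.source = {}   # column name -> list of stored values
        self.derived = {}  # column name -> (input column names, row function)

    def add_source(self, name, data):
        if len(data) != self.n_rows:
            raise ValueError(f"column {name!r} must have {self.n_rows} rows")
        self.source[name] = list(data)

    def add_derived(self, name, inputs, func):
        # func receives one value per input column and returns one value.
        self.derived[name] = (tuple(inputs), func)

    def evaluate(self):
        """Compute every derived column in dependency order."""
        graph = {name: set(inputs) for name, (inputs, _) in self.derived.items()}
        values = {name: list(data) for name, data in self.source.items()}
        for name in TopologicalSorter(graph).static_order():
            if name in values:
                continue  # source column, nothing to compute
            inputs, func = self.derived[name]
            values[name] = [
                func(*(values[col][row] for col in inputs))
                for row in range(self.n_rows)
            ]
        return values


# Example with made-up data: "is_large" depends on "amount",
# which depends on the two source columns.
table = ColumnTable(n_rows=3)
table.add_source("price", [10, 20, 5])
table.add_source("quantity", [2, 1, 4])
table.add_derived("is_large", ["amount"], lambda a: a >= 20)
table.add_derived("amount", ["price", "quantity"], lambda p, q: p * q)

before = table.evaluate()
table.source["quantity"][2] = 1  # change a value in a source column
after = table.evaluate()         # the change propagates to both derived columns
```

The sketch recomputes everything on each `evaluate()` call for simplicity, and it covers only a single table. The comment above names columns that span multiple tables as the main difficulty, and that case is not addressed here.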

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-05-12.

What are some of the best open-source data-wrangling projects? This list will help you:

#  Project                      Stars
1  OpenRefine                   8,819
2  dasel                        3,230
3  Data-science-best-resources  1,874
4  Optimus                      1,217
5  prose                          545
6  qsacnpj                        248
7  qsv                            129
8  R-Fundamentals                 112
9  prosto                          59