OpenRefine vs Guava
Metric | OpenRefine | Guava |
---|---|---|
Mentions | 45 | 58 |
Stars | 10,498 | 49,412 |
Growth | 0.5% | 0.3% |
Activity | 9.7 | 9.6 |
Latest commit | 1 day ago | 6 days ago |
Language | Java | Java |
License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenRefine
-
Ask HN: What Underrated Open Source Project Deserves More Recognition?
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
-
What you need to know about the future of Mozilla Hubs
Yes, let's hope! The strategy has worked out sometimes - Google shut down 'Google Refine' 10 years ago, it got turned into 'Open Refine', last update 2 months ago. https://github.com/OpenRefine/OpenRefine
It's a hugely useful tool if you're working with messy Excel-scale data, i.e., most biologists or social scientists.
-
OpenRefine
It seems to be pure JS with jQuery: https://github.com/OpenRefine/OpenRefine/blob/master/main/we...
-
java string equals returns false, even for identical strings
EDIT: trim() does not remove unicode 0x200b (unicode character for zero width space). https://github.com/OpenRefine/OpenRefine/issues/5105 is worth a read.
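A minimal sketch of the behaviour described in this post, using only the JDK. The class name `ZeroWidthSpaceDemo` and the helper `clean` are hypothetical, introduced here for illustration; `String.trim()` only drops leading and trailing characters up to U+0020, so U+200B survives it.

```java
public class ZeroWidthSpaceDemo {
    // U+200B ZERO WIDTH SPACE, written as an escape so it is visible in source.
    public static final String ZWSP = "\u200B";

    // Hypothetical helper: remove zero width spaces first, then ordinary whitespace.
    public static String clean(String s) {
        return s.replace(ZWSP, "").trim();
    }

    public static void main(String[] args) {
        String expected = "guava";
        String actual = "guava" + ZWSP; // looks identical when printed

        System.out.println(expected.equals(actual));        // false
        System.out.println(expected.equals(actual.trim())); // false: trim() keeps U+200B
        System.out.println(expected.equals(clean(actual))); // true
    }
}
```

Removing only U+200B is an assumption made for brevity; real data may contain other invisible characters that need the same treatment.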
-
UIUC MCS - CS 513 Review - Theory and Practice of Data Cleaning
There were six homework assignments. In order they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They all were basic implementation and programming assignments with autograders.
-
"We have great datasets"
OpenRefine will get you about 70% there. It's FOSS.
-
Are there any tools to streamline the data cleaning process?
I’ve heard good things about https://openrefine.org/
-
What is the best approach to removing duplicate person records if the only identifier is the person's first name, middle name, and last name? These names are entered into the DB in varying ways, so they are free-formatted.
It's not suited to SQL; use OpenRefine or Python's fuzzywuzzy.
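As a rough illustration of the free-form name problem in the last post, here is a sketch of a key-collision approach in plain Java: each name is reduced to a normalized key, and records with the same key become duplicate candidates. It is loosely modeled on the fingerprint idea behind OpenRefine's clustering, but it is not OpenRefine's code, and the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

public class NameFingerprint {
    // Builds a normalized key so differently formatted names can collide.
    public static String fingerprint(String name) {
        String noAccents = Normalizer.normalize(name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");            // drop combining accent marks
        String cleaned = noAccents.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{Alnum}\\s]", "")   // drop anything but ASCII letters, digits, whitespace
                .trim();
        return Arrays.stream(cleaned.split("\\s+"))
                .filter(token -> !token.isEmpty())
                .sorted()                             // token order no longer matters
                .distinct()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(fingerprint("Smith, John A.")); // a john smith
        System.out.println(fingerprint("john a SMITH"));   // a john smith
    }
}
```

This only catches differences in case, punctuation, accents, and word order. Misspellings would still need an edit-distance style comparison, which is what a fuzzy matching library provides.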
Guava
-
Lists: do you know the nature of yours? The strange story of a data container in Java
The first problem is at the type-system level: a more correct design would let us tell from the collection type which abstraction we are operating on, in particular whether it is mutable or immutable. The JCF was born at a time when great care was taken to offer immediately usable data structures, with attention to performance, but with less attention to constructs and uses that are now seen as common. These concepts have been taken up by other libraries, among which we must mention Eclipse Collections, Guava Collections, and VAVR.
-
Google/guava: Google core libraries for Java
Even better is getting Gradle/Maven to correctly pull "plain" vs "Android" versions of the package instead of them just publishing the diverging code base as two repository packages.
https://github.com/google/guava/issues/2914
-
Guava 32.0 (released today) and the @Beta annotation
I'll admit I'm surprised to see that BOMs have been documented on maven.apache.org since mid-2008. It looks like Spring, for example, didn't adopt them until mid-2014. I don't know how widely they caught on in other areas. The first discussion of them in the context of Guava may have been in 2018, as I don't see mention of them in the various issues from 2011-2015 (#605, #1329, #1471, #1954).
-
Best Practice of Guava ImmutableList
And a quick peek at the source code for ImmutableList seems to confirm this (https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableList.java - it goes via a bunch of methods, but ends up using Arrays.copyOf(), which creates a fixed-size array).
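The point about `Arrays.copyOf()` can be illustrated without Guava. The sketch below is a hypothetical, JDK-only `FixedList` that copies its input into a fixed-size array at construction time; it is not Guava's `ImmutableList`, only a small example of the same copy-then-freeze idea.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedCopyDemo {
    // Hypothetical minimal immutable list; not Guava's implementation.
    static final class FixedList<E> extends AbstractList<E> {
        private final Object[] elements;

        FixedList(Object[] source) {
            // Arrays.copyOf allocates a new array of exactly this length,
            // so later changes to the caller's array cannot leak in.
            this.elements = Arrays.copyOf(source, source.length);
        }

        @Override
        @SuppressWarnings("unchecked")
        public E get(int index) {
            return (E) elements[index];
        }

        @Override
        public int size() {
            return elements.length;
        }
        // add/set/remove are inherited from AbstractList and throw
        // UnsupportedOperationException.
    }

    public static <E> List<E> copyOf(List<? extends E> source) {
        return new FixedList<>(source.toArray());
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("a", "b"));
        List<String> frozen = copyOf(mutable);
        mutable.add("c");
        System.out.println(frozen); // [a, b]
    }
}
```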
-
Genuine question: how do you all use Haskell IRL?
The guava library of Java has some of these data structures implemented: https://github.com/google/guava/wiki/ImmutableCollectionsExplained , but implementations of the above book in many languages can be found on github (say, this one for Haskell: https://github.com/aistrate/Okasaki )
-
MurmurHash: creating a progressive rollout via the backend
-
One example of why ChatGPT is still very far from replacing programmers, and other professions too.
Java Mask: Java Mask is a library that offers various string masking techniques for sensitive data such as credit card numbers, email addresses, and more. You can find the library at: https://github.com/miguelfreitas93/java-mask
DataMasker: DataMasker is a Java library specifically designed for masking sensitive data, including credit card numbers, using customizable masking patterns. Visit the GitHub repository for more information and usage examples: https://github.com/GDSSecurity/DataMasker
Maskify: Maskify is a simple Java library that can be used to mask credit card numbers, Social Security numbers, and other sensitive information. You can find the library at: https://github.com/jonathancarvalhoalves/maskify
CreditCardUtils: This is a lightweight Java library that provides utility methods for validating, formatting, and masking credit card numbers. Visit the GitHub repository for more information: https://github.com/malkusch/creditcardutils
Google Guava: Google Guava is a popular set of Java libraries containing a wealth of utilities for working with strings, collections, and more. While not specifically designed for masking credit card information, you can use Guava's string manipulation methods to mask sensitive data: https://github.com/google/guava
-
Twitter makes some of its source code public
I mean, I guess, technically? If you define it like that, then Microsoft has people working for them for free, as does Google, as does Apple, etc. It's not that weird, and you can try to twist it to be weird, but those of us in the software industry largely regard this as a good thing.
-
Managing unfixable CVEs
So we have https://github.com/google/guava/issues/4011
-
Java 17 migration: bias locks regression
OK, so let's implement our lazy initialization more smartly to avoid acquiring the lock every time, and use the old-fashioned but still working double-checked locking. I've found it implemented by Suppliers.memoize in the Guava library.
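For readers unfamiliar with the pattern, here is a minimal JDK-only sketch of double-checked locking for lazy initialization. `MemoizingSupplier` is a hypothetical class written for this illustration; it is not Guava's source, and in practice `Suppliers.memoize` would be used instead of hand-rolling this.

```java
import java.util.function.Supplier;

public final class MemoizingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private volatile boolean initialized; // volatile flag makes the pattern safe
    private T value;                      // published by the volatile write below

    public MemoizingSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        if (!initialized) {               // first check: no lock on the fast path
            synchronized (this) {
                if (!initialized) {       // second check: under the lock
                    value = delegate.get();
                    initialized = true;   // write the flag last
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Supplier<String> lazy = new MemoizingSupplier<>(() -> {
            System.out.println("computing once");
            return "config";
        });
        System.out.println(lazy.get());
        System.out.println(lazy.get()); // delegate is not called again
    }
}
```

After the first call, readers only pay for a volatile read, which is the point made in the post: the lock is no longer acquired on every access.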
What are some alternatives?
CQEngine - Ultra-fast SQL-like queries on Java collections
JGit - JGit project repository (jgit)
visidata - A terminal spreadsheet multitool for discovering and arranging data
javatuples - Typesafe representation of tuples in Java.
LightAdmin - [PoC] Pluggable CRUD UI library for Java web applications
Caffeine - A high performance caching library for Java
Smooks - Extensible data integration Java framework for building XML and non-XML fragment-based applications
Eclipse Collections - Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.
Jimfs - An in-memory file system for Java 7+
Hashids.java - Hashids algorithm v1.0.0 implementation in Java
JBake - Java based open source static site/blog generator for developers & designers.
Gephi - Gephi - The Open Graph Viz Platform