libu8ident
poly
| | libu8ident | poly |
|---|---|---|
| Mentions | 9 | 24 |
| Stars | 17 | 653 |
| Growth | - | 1.7% |
| Activity | 1.8 | 8.1 |
| Latest commit | 11 months ago | 21 days ago |
| Language | C | Go |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
libu8ident
- Roaring bitmaps are compressed bitmaps, can be 100x faster
-
International domain names: where does HTTPS://meßagefactory.ca lead you?
In programming languages it's much worse. Identifiers can be unidentifiable, and everybody has a different opinion on what "identifiable" means. Even the standard on identifiers, UTS #39 (TR39), is buggy and open to too many interpretations, which leads to a complete disaster. https://github.com/rurban/libu8ident/blob/master/doc/c11.md
With punycode domain names it's still quite simple.
With other names it's even worse: no one cares. Linkers don't, and neither do username systems or filesystem drivers. Apple's HFS+ did care a bit at one point, until someone higher up decided that no one needs Unicode security anymore and switched the new APFS back to unsafe.
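One of the mechanisms UTS #39 describes is mixed-script detection: flagging an identifier that combines, say, Latin and Cyrillic letters that look alike. Below is a minimal Go sketch, assuming only Latin, Greek and Cyrillic matter and ignoring Common-script characters such as digits and punctuation. Real UTS #39 checks use the full Script_Extensions data and several restriction levels. `scriptsOf` and `isMixedScript` are hypothetical names, not libu8ident's API.

```go
package main

import (
	"fmt"
	"unicode"
)

// checkedScripts is an assumption for this sketch: only three scripts are
// considered. Characters in none of them (digits, punctuation, i.e. the
// Common script) are ignored.
var checkedScripts = map[string]*unicode.RangeTable{
	"Latin":    unicode.Latin,
	"Greek":    unicode.Greek,
	"Cyrillic": unicode.Cyrillic,
}

// scriptsOf returns the set of checked scripts that occur in s.
func scriptsOf(s string) map[string]bool {
	found := map[string]bool{}
	for _, r := range s {
		for name, table := range checkedScripts {
			if unicode.Is(table, r) {
				found[name] = true
			}
		}
	}
	return found
}

// isMixedScript reports whether s mixes more than one checked script,
// e.g. a Latin "paypal" with a Cyrillic "а" (U+0430) inside.
func isMixedScript(s string) bool {
	return len(scriptsOf(s)) > 1
}

func main() {
	for _, name := range []string{"paypal", "p\u0430ypal", "me\u00dfagefactory"} {
		fmt.Printf("%q mixed-script: %v\n", name, isMixedScript(name))
	}
}
```

The ß name passes this check because ß is a Latin letter. The trouble with meßagefactory.ca is a different one: some IDNA implementations map ß to "ss" while others keep it, so the same name can lead to different hosts.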
-
Using Unicode in a compiler
No, it's definitely not safe to use unrestricted Unicode in a compiler. See https://github.com/rurban/libu8ident/ for identifier rules, and http://www.unicode.org/reports/tr55/ for much worse problems.
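The baseline for "restricted" identifiers is UAX #31, which defines ID_Start and ID_Continue. Here is a minimal Go sketch that approximates those properties from the category and property tables in Go's `unicode` package. Allowing a leading `_` is an assumption, copied from common language practice. `isIdentifier` is a hypothetical helper, and this check is only the first layer, not the full set of libu8ident rules.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIDStart approximates UAX #31 ID_Start: letters (L), letter numbers (Nl)
// and Other_ID_Start, minus Pattern_Syntax and Pattern_White_Space.
func isIDStart(r rune) bool {
	if unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space) {
		return false
	}
	return unicode.In(r, unicode.L, unicode.Nl, unicode.Other_ID_Start)
}

// isIDContinue approximates ID_Continue: ID_Start plus nonspacing and
// spacing marks, decimal digits, connector punctuation and Other_ID_Continue,
// with the same Pattern_* exclusions.
func isIDContinue(r rune) bool {
	if isIDStart(r) {
		return true
	}
	if unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space) {
		return false
	}
	return unicode.In(r, unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc, unicode.Other_ID_Continue)
}

// isIdentifier checks s against the approximated default identifier syntax.
// Assumption: '_' is also accepted as the first character.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if r != '_' && !isIDStart(r) {
				return false
			}
		} else if !isIDContinue(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, id := range []string{"caf\u00e9", "_tmp", "9lives", "x\u202Ey", "p\u0430ypal"} {
		fmt.Printf("%q identifier: %v\n", id, isIdentifier(id))
	}
}
```

The RIGHT-TO-LEFT OVERRIDE (U+202E) is rejected, but "p\u0430ypal" with a Cyrillic "а" is accepted. Syntax rules alone are therefore not enough, and the TR39 mixed-script and confusable checks have to be applied on top.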
- Ask HN: What interesting problems are you working on? ( 2022 Edition)
- Unicode Utilities: Confusables
-
How can you be fooled by the U+202E trick?
That's why Unicode published security guidelines and mechanisms to avoid such attacks, already back in 2004.
The problem is that nobody cared. Browsers went with punycode instead of following TR39, and so did email. But OK, at least that's something. Java did it, cperl did, and Rust did it.
Everybody else is vulnerable, especially most other programming languages, filesystems and login systems. https://github.com/rurban/libu8ident/blob/master/doc/c11.md
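The U+202E trick relies on the bidirectional formatting characters. Compilers such as GCC 12 now warn about them (`-Wbidi-chars`). Below is a minimal Go sketch of such a scan, assuming the code points listed for the Unicode Bidi_Control property: U+061C, U+200E, U+200F, U+202A..U+202E and U+2066..U+2069. `findBidiControls` is a hypothetical helper, and the sample line only imitates the style of the Trojan Source examples.

```go
package main

import "fmt"

// isBidiControl reports whether r is a bidirectional formatting character:
// ALM, LRM, RLM, the embeddings/overrides U+202A..U+202E (U+202E is the
// RIGHT-TO-LEFT OVERRIDE) and the isolates U+2066..U+2069.
func isBidiControl(r rune) bool {
	switch {
	case r == 0x061C, r == 0x200E, r == 0x200F:
		return true
	case r >= 0x202A && r <= 0x202E:
		return true
	case r >= 0x2066 && r <= 0x2069:
		return true
	}
	return false
}

// findBidiControls returns the byte offsets of all bidi controls in src.
func findBidiControls(src string) []int {
	var offsets []int
	for i, r := range src {
		if isBidiControl(r) {
			offsets = append(offsets, i)
		}
	}
	return offsets
}

func main() {
	// Rendered, this can look like a harmless comment; the bytes say otherwise.
	line := "if accessLevel != \"user\u202E \u2066// Check if admin\u2069 \u2066\" {"
	for _, off := range findBidiControls(line) {
		fmt.Printf("bidi control at byte %d\n", off)
	}
}
```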
- Prevent Trojan Source attacks with GCC 12
-
Unicode Normalization Forms: When ö = ö
I'm maintaining such a library.
coreutils, diff, grep, patch, sed and friends cannot find Unicode strings at all; they have no string support. They can only mimic filesystems and find binary garbage. Strings are something different from pure ASCII or binary garbage: strings have an encoding, and they are Unicode.
Filesystems are even worse because they need to treat filenames as identifiers, but do not. Nobody cares about TR31, TR39, TR36 and so on.
Here is an overview of the sad state of Unicode unsafeties in programming languages: https://github.com/rurban/libu8ident/blob/master/c11.md
- Why does Windows 10 run faster than Fedora?
poly
- Looking for an Open Source project to participate in for Google Summer of Code
-
GitHub Accelerator: our first cohort and what's next
- https://github.com/TimothyStiles/poly: Poly is a fast, well tested Go package for engineering organisms.
-
These 20 startups are in 1st ever batch of GitHub OS Accelerator
Poly: Fast Go package for engineering organisms
-
Ask HN: Burnt out from big tech. What's next?
You might want to look at computational biology. Jim Allison won the Nobel Prize back in 2018 for his work on immunotherapy for cancer and there's a lot of basic research work to be done to perfect this approach. Epigenetic clocks are really interesting too (see Steve Horvath's work). Also, there's synthetic biology, where you could, for example, explore this package that's written in Go: https://github.com/TimothyStiles/poly
- Any corner cases for Needleman-Wunsch that should be tested?
- Where can I find well-written go code to learn from?
-
High-performance language recommendation
Check out poly. It's written in Go and I'm using it for one of my projects too. The goal is to have high-performance libraries we can all use; knowing what people are working on in the forks will give the community a leg up.
-
How is GO used in bioinfo?
The most popular bioinformatics package I've seen in Go is poly.
-
Software engineers: consider working on genomics
I write synthetic biology software for a living and maintain this open-source Go package for engineering DNA, which has high test coverage and a nice little dev community around it.
https://github.com/TimothyStiles/poly
A large part of my project's community are devs that want to get into the field but can't tolerate the ridiculously low pay, laughably bad management, disrespect, and what amounts to 40+ years of technical debt that's endemic to biotech software.
I've had companies here in the Bay Area offer me 100K a year with a straight face. I've had companies during interview tell me they're looking for someone to help, "set up GitHub". I've seen job listings for low paid web dev positions require applicants to have PhDs.
The reality is that, except for a growing handful of places, management straight up won't know the difference between IT and software engineers. It's what I call the naive buyers problem.
The demand for software engineers in biotech is generated by naive buyers that don't know what they need, why they need it, or how to get it.
Benchling and Recursion Pharmaceuticals have reputations in the industry for paying "standard software salaries". So do the research divisions at places like DeepMind/Microsoft/Google, but in my experience there are even new multi-billion-dollar institutes where senior management has never heard the term devops.
Most places advertise for "data scientist" positions, or some analog, instead of software engineers. This is mostly because upper management has never met an actual practicing software engineer in a professional setting. Many come from academia, where the culture and work requirements heavily disincentivize standard software engineering practices.
It's also not uncommon for a biotech company either to have a very underqualified CTO, whose main programming experience is the ML-research-style work they did during their PhD, or not to have one at all, which has huge downstream consequences.
This week a software engineer trying to make the switch to biotech actually DM'd me to ask why they were seeing a ton of data science / ML job positions but no software engineering / devops positions.
They were worried that these companies were trying to save on costs by forcing their data scientists to create infrastructure but it's actually worse than that. Most of these companies aren't even aware that there's supposed to be infrastructure.
Despite all of this, the future is looking better, and I'm starting to find new companies and positions that are, well... reasonable. I learned about this thread last night at a party, from a friend who works at one of these companies. There's a small but strong new wave of companies and developers out there pushing biotech software forward. Hopefully some (including myself) make it big while pushing the idea that better tech equals better biotech.
-
Ask HN: What interesting problems are you working on? ( 2022 Edition)
It is more like the X Y Z W. However, I am working on the X Y Z W bits as well (https://github.com/TimothyStiles/poly , https://github.com/TimothyStiles/allbase , trilo.bio, freegenes.org). The goal is fully automated "make bacterium X produce molecule Y", which is still a while away (though surprisingly not THAT far off).
What are some alternatives?
Confusables - Simple library for matching a string to another string that is same but has letters that only *look* the same as original string
Raylib-CsLo - autogen bindings to Raylib 4.x and convenience wrappers on top. Requires use of `unsafe`
featurebase - A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
pg-mem - An in memory postgres DB instance for your unit tests
libredwg - Official mirror of libredwg. With CI hooks and nightly releases. PR's ok
linaria - Zero-runtime CSS in JS library
safeclib - safec libc extension with all C11 Annex K functions
seq - A high-performance, Pythonic language for bioinformatics
nbperf - Improved NetBSD's Perfect Hash Generation Tool v3
m4b-tool - m4b-tool is a command line utility to merge, split and chapterize audiobook files such as mp3, ogg, flac, m4a or m4b
reals - A lightweight python3 library for arithmetic with real numbers.
procedural-gl-js - Mobile-first 3D mapping engine with emphasis on user experience