There are quite a few projects that didn't originate on GitHub. Some are mirrors of projects hosted elsewhere; some accept patches through other means. If you get your Linux kernel patch accepted by emailing it to the responsible maintainer, it will end up on https://github.com/torvalds/linux, but you never agreed to the GitHub ToS; all you did was agree to publish it under the GPLv2. Linus agreed to the GitHub ToS, but he can't give away rights he doesn't have, so he can't give GitHub any rights to your patches beyond what the GPL already grants.
Here is this guy's function copy-pasted into a Stack Overflow question:
https://stackoverflow.com/questions/17913191/using-typedef-i...
Found it after 5 minutes and a couple of tweaks to the search terms.
Another:
https://vdoc.pub/documents/direct-methods-for-sparse-linear-...
Someone copied this guy's book and put it on Scribd: https://www.scribd.com/document/514019650/Direct-Methods-for...
Someone put it on a "personal" edu page:
https://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.c
A modified version of it here, copyrighted by Intel and released as open source:
https://github.com/rwl/CSparse.py/blob/master/csparse.py
More:
https://tonus.pages.math.unistra.fr/schnaps/schnaps/csparse_...
Google search used to find them:
https://www.google.com/search?q=Sparse+matrix+addition+%22ch...
Could probably find more if I looked harder.
Many open-source projects don't accept contributions from people who have worked on similar projects under incompatible licenses. I remember https://github.com/cisco/ChezScheme/pull/376#issuecomment-45... and https://wiki.winehq.org/Developer_FAQ#Copyright_Issues
As an open-source developer with a fairly popular project (https://github.com/neuml/txtai - 2.6K+ stars), I'll say GitHub Copilot doesn't concern me but I understand the rationale.
I actually think that, with some additions, it can be beneficial to the open-source community and introduce developers to libraries they couldn't quite phrase a Google search for.
For example, one good addition to Copilot, and really to any generative AI tool, would be a method of attribution. Code would be one of the easier cases. When generating a function snippet, add footnotes/citations for the top 3-5 most similar functions in the training database. This can be accomplished with a semantic index over the training set.
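A minimal sketch of that citation step, assuming a toy in-memory training set. The snippet names (`repoA/sparse.c:cs_add` and so on) and the `cite` helper are hypothetical. It also swaps in plain token-overlap cosine similarity where a real semantic index would use learned embeddings, so treat it as an illustration of the lookup, not of semantic matching:

```python
import math
import re
from collections import Counter

# Hypothetical training snippets keyed by source location; a real index would
# hold millions of functions plus repository and license metadata.
TRAINING_SET = {
    "repoA/sparse.c:cs_add": "cs *cs_add(const cs *A, const cs *B, double alpha, double beta)",
    "repoB/matrix.py:add_sparse": "def add_sparse(a, b, alpha, beta): return alpha * a + beta * b",
    "repoC/strings.py:reverse": "def reverse(s): return s[::-1]",
}


def tokenize(code):
    """Split source text into lowercase identifier/keyword tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", code.lower())


def vectorize(code):
    """Bag-of-tokens term-frequency vector (stand-in for a learned embedding)."""
    return Counter(tokenize(code))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Build the index once over the whole training set.
INDEX = {source: vectorize(code) for source, code in TRAINING_SET.items()}


def cite(generated, k=3):
    """Return the k training functions most similar to a generated snippet."""
    query = vectorize(generated)
    scored = sorted(
        ((cosine(query, vec), source) for source, vec in INDEX.items()),
        reverse=True,
    )
    return [(source, round(score, 3)) for score, source in scored[:k]]


if __name__ == "__main__":
    snippet = "def sparse_add(a, b, alpha, beta): return alpha * a + beta * b"
    for source, score in cite(snippet):
        print(f"[{score}] {source}")
```

The generated snippet would then ship with these ranked sources as footnotes. Swapping `vectorize` for an embedding model and the dictionary scan for an approximate-nearest-neighbor index is what would make this scale to a full training corpus.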