orange
ArchiveBox
Metric | orange | ArchiveBox |
---|---|---|
Mentions | 27 | 248 |
Stars | 4,604 | 19,737 |
Growth | 1.7% | 3.1% |
Activity | 9.6 | 9.7 |
Last commit | 8 days ago | 11 days ago |
Language | Python | Python |
License | | MIT |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
orange
- Hierarchical Clustering
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
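One way to sidestep the null-character problem mentioned above is to clean the corpus before loading it into any text-mining tool. The following is a minimal sketch using only the Python standard library; the function names `strip_null_chars` and `clean_corpus` and the sample strings are made up for illustration and are not part of Orange.

```python
def strip_null_chars(text: str) -> str:
    """Remove NUL characters from a single document."""
    return text.replace("\x00", "")


def clean_corpus(documents):
    """Return cleaned copies of the documents and how many contained a NUL.

    Hypothetical helper for illustration; not an Orange API.
    """
    cleaned = [strip_null_chars(doc) for doc in documents]
    affected = sum(1 for doc in documents if "\x00" in doc)
    return cleaned, affected


# Sample data invented for this sketch.
corpus = ["plain text", "bad\x00byte", "\x00leading and trailing\x00"]
cleaned, affected = clean_corpus(corpus)
print(cleaned)
print(f"{affected} document(s) contained a null character")
```

Counting the affected documents as well as cleaning them makes it easier to tell whether the source data has a wider encoding problem.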
- Orange Data Mining
- The Graph of Wikipedia [video]
For all you folks who aren't ace programmer types, the Orange3[1] platform gives you a very miniaturized[2] ability to turn out these sorts of visualizations very rapidly. It's not the most stable thing in the world, but the node-based ML workflow designer is worth the price of admission all by itself.
[1] https://orangedatamining.com/
[2] The Wikipedia extension in Text limits each search result to 25 articles, so sucking in all of Wikipedia is ... well, Orange text analytics crashes when I look at it sideways with a null character, so let's not think about what would happen.
- Ask HN: What Underrated Open Source Project Deserves More Recognition?
- Taxonomy Management?
First is identifying the "similar" things in a corpus. The best way I know to do that, for non-programmer audiences, is the Orange Data Mining tool, which gives you a node-based text mining interface for performing statistical analysis on text. Hierarchical Clustering shows, very rapidly, how similar your "modules" are and which ones are most similar. There are many other techniques (semantic viewer, similarity hash, etc.) as well; the right one will depend on how your content is laid out.
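For readers who do want to see the idea in code, here is a rough sketch of hierarchical (agglomerative) clustering over short text "modules". It is not how Orange implements its widget: it assumes Jaccard distance over lowercase word sets and single linkage, uses only the standard library, and the document names, texts, and threshold are invented for illustration.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; 0.0 means the token sets are identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(docs: dict, threshold: float):
    """Merge the two closest clusters until the closest pair exceeds `threshold`."""
    tokens = {name: set(text.lower().split()) for name, text in docs.items()}
    clusters = [[name] for name in docs]
    merges = []  # (cluster_a, cluster_b, distance) in merge order

    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(tokens[x], tokens[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        d = cluster_distance(clusters[i], clusters[j])
        if d > threshold:
            break
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical "modules" for this sketch.
docs = {
    "install_a": "install the package with pip",
    "install_b": "install the package with conda",
    "faq": "frequently asked questions about billing",
}
clusters, merges = single_linkage(docs, threshold=0.5)
print(clusters)
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.2f}")
```

The recorded merge order and distances are what a dendrogram draws; a lower threshold keeps only near-duplicates together, while a higher one groups more loosely related modules.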
- Orange: Open-source machine learning and data visualization
- What exactly is AutoGPT?
Both tools are ripoffs of a data mining framework named Orange 3
- Why don't more people use Altair for python Visualizations instead of Plotly?
You should also check out Orange Data Mining. It lets you create a lot of charts, filter data from one chart to another, build ML models, make predictions, and a lot more. And you can do it with zero code.
- Advice on Transitioning to Data Science/ML/AI without Coding Experience
You can start with Orange, a free GUI-based tool. It is a component-based data science workflow tool that you can use to handle 60-75% of traditional data science tasks, from classification and regression to basic neural networks.
- Has anybody used Orange?
ArchiveBox
- Ask HN: What Underrated Open Source Project Deserves More Recognition?
Two projects I greatly appreciate, allowing me to easily archive my bandcamp and GOG purchases (after the initial setup anyways):
https://github.com/easlice/bandcamp-downloader
https://github.com/Kalanyr/gogrepoc
And I recently learned about archivebox, which I think is going to be a fast favorite and finally let me clear out my mess of tabs/bookmarks: https://github.com/ArchiveBox/ArchiveBox
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
- Vice website is shutting down
If you really want to save the content for yourself, use something like https://archivebox.io/
I've been running a local instance for a few years now and download/save tech articles all the time. I can search and find them as needed.
- An Introduction to the WARC File
API is coming soon (relatively, it's still a one-man project)! Stay tuned https://github.com/ArchiveBox/ArchiveBox/issues/496
I have an event-sourcing refactor in progress now to allow us to pluginize functionality like the API (similar to Home Assistant with a plugin app store); it will take a month or two. Next up is the REST API using the new plugin system.
- Ask HN: How can I back up an old vBulletin forum without admin access?
I guess your best chance is to use something like https://archivebox.io/.
- ArchiveBox – open-source self-hosted web archiving
Yeah this is a cool project but it was discussed 2 days ago.
As mentioned by the maintainer there, they even maintain a list of alternatives, very classy:
https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
- ArchiveBox: Open-source self-hosted web archiving
- Linkhut: A Social Bookmarking Site
- Show HN: Rem: Remember Everything (open source)
- Bookmark manager with a focus on organization?
What are some alternatives?
glue - Linked Data Visualizations Across Multiple Files
Wallabag - wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
paimon-moe - Your best Genshin Impact companion! Help you plan what to farm with ascension calculator and database. Also track your progress with todo and wish counter.
RDKit - The official sources for the RDKit library
SingleFile - Web Extension for saving a faithful copy of a complete web page in a single HTML file
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ArchivesSpace - The ArchivesSpace archives management tool
Interactive Parallel Computing with IPython - IPython Parallel: Interactive Parallel Computing in Python
grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
NumPy - The fundamental package for scientific computing with Python.
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.