CKAN
dwc
| | CKAN | dwc |
|---|---|---|
| Mentions | 5 | 3 |
| Stars | 3,823 | 167 |
| Growth | 1.0% | 3.6% |
| Activity | 9.8 | 0.0 |
| Latest commit | 6 days ago | 27 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | Creative Commons Attribution 4.0 |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
CKAN
-
Metadata Store - Which one to choose? OpenMetadata vs Datahub?
We use Kubernetes as our deployment platform. Any feedback on one of these open source data catalogs?
- https://atlas.apache.org/#/
- https://opendatadiscovery.org/
- https://open-metadata.org/
- https://marquezproject.github.io/marquez/
- https://datahubproject.io/
- https://www.amundsen.io/
- https://ckan.org/
- https://magda.io/
-
How to start a Data Science and Machine Learning career?
Ckan
-
We are digitisers at the Natural History Museum in London, on a mission to digitise 80 million specimens and free their data to the world. Ask us anything!
We publish all our data on the [Data Portal](https://data.nhm.ac.uk), a Museum project that's been running since 2014. Instead of MediaWiki it runs on an open-source Python framework called [CKAN](https://ckan.org), which is designed for hosting datasets - though we've had to adapt it in various ways so that it can handle such large amounts of data.
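CKAN portals like the Data Portal expose their datasets through CKAN's Action API, whose read-only actions (such as `package_search`) accept plain HTTP GET requests. A minimal sketch using only the standard library - the portal URL and the query string in the commented example are illustrative:

```python
import json
import urllib.parse
import urllib.request

def build_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL.

    CKAN's read-only actions accept GET requests of the form
    <base>/api/3/action/package_search?q=<query>&rows=<n>.
    """
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"

def search_datasets(base_url, query, rows=10):
    """Fetch matching datasets; CKAN wraps results as {"success": ..., "result": ...}."""
    with urllib.request.urlopen(build_search_url(base_url, query, rows)) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError("CKAN API call failed")
    return payload["result"]["results"]

# Example (requires network access; query term is illustrative):
# for dataset in search_datasets("https://data.nhm.ac.uk", "specimens", rows=5):
#     print(dataset["name"])
```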
dwc
-
We are digitisers at the Natural History Museum in London, on a mission to digitise 80 million specimens and free their data to the world. Ask us anything!
As a community we are trying to release more benchmark datasets for different kinds of training such as the herbarium specimens described in this paper: https://doi.org/10.3897/BDJ.7.e31817 It’s not an especially large dataset (only 1,800 specimens) but collecting, curating and annotating this often ends up being a multi-person process. There are a few places you can deposit research datasets or ML models like Zenodo (https://zenodo.org/) and get a DOI. In our sector Darwin Core is one of the key data standards for describing data and when we need to extend the standard we try and use existing ones (such as those on schema.org or
With regards to longevity, when we're planning our infrastructure and how we're actually going to store our digital data we have to think in the long, long term (100+ years), much as we have to when considering how to store the physical specimens. We currently manage our own data centre, which stores all our collections and image data, but we're exploring cloud options. In terms of how we store the actual data, we try to map to well-known standards and ontologies (such as Darwin Core - https://dwc.tdwg.org/) to ensure our data is interoperable with others and can be managed using community standards. On the Data Portal specifically, we use a versioning system to make sure that data is available long term, even if it's been changed since it was originally made public (this happens regularly, as taxonomists love to reclassify specimens!). This is particularly important when users cite our data using DOIs, which should be persistent and always available.
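Darwin Core is essentially a controlled vocabulary of term names, and a specimen occurrence is commonly exchanged as a flat set of term/value pairs. A minimal sketch of such a record with a simple completeness check - the term names are real Darwin Core terms, but the specimen values and the choice of "required" terms are illustrative:

```python
# A specimen occurrence expressed as Darwin Core term/value pairs.
# Term names come from https://dwc.tdwg.org/terms/; the values are made up.
specimen = {
    "occurrenceID": "example-occurrence-0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Panthera leo",
    "eventDate": "1901-05-04",
    "country": "United Kingdom",
    "institutionCode": "NHMUK",
}

# An illustrative minimum for a usable occurrence record (not mandated by the standard).
REQUIRED_TERMS = {"occurrenceID", "basisOfRecord", "scientificName"}

def missing_terms(record, required=REQUIRED_TERMS):
    """Return which required Darwin Core terms a record lacks, sorted by name."""
    return sorted(required - record.keys())

print(missing_terms(specimen))  # → []
```

Keeping records flat like this is what makes them easy to exchange as Darwin Core Archive CSVs and to map between institutions.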
What are some alternatives?
ArchivesSpace - The ArchivesSpace archives management tool
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Archivematica - Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
Access to Memory (AtoM) - Open-source, web application for archival description and public access.
Collective Access: Providence - Cataloguing and data/media management application
Activeloop Hub - Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data real-time to PyTorch/TensorFlow. https://activeloop.ai [Moved to: https://github.com/activeloopai/deeplake]
CKAN-meta - Metadata files for the CKAN
kaggle-environments
kuwala - Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data b) points of interest from OpenStreetMap c) Google Popular Times
odd-platform - First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
datahub - The Metadata Platform for the Modern Data Stack