kylo vs datasette-scraper
Metric | kylo | datasette-scraper
---|---|---
Mentions | 1 | 1
Stars | 1,091 | 57
Growth | 0.5% | -
Activity | 10.0 | 2.5
Last commit | over 1 year ago | about 1 year ago
Language | Java | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kylo
GitHub – GSA/code-gov: An informative repo for all Code.gov repos
https://github.com/simonw/datasette-lite :
> You can use this tool to open any SQLite database file that is hosted online and served with a `access-control-allow-origin: *` CORS header. Files served by GitHub Pages automatically include this header, as do database files that have been published online using `datasette publish`.
> [...] You can paste in the "raw" URL to a file, but Datasette Lite also has a shortcut: if you paste in the URL to a page on GitHub or a Gist it will automatically convert it to the "raw" URL for you.
> To load a Parquet file, pass a URL to `?parquet=`
> [...] https://lite.datasette.io/?parquet=https://github.com/Terada...
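As a rough illustration of the `?parquet=` parameter quoted above, the following sketch builds such a link with the standard library. The helper name `datasette_lite_parquet_url` and the `example.com` file location are hypothetical, and percent-encoding the nested URL is an assumption about what the page accepts (the quoted example passes the URL unencoded).

```python
from urllib.parse import urlencode

DATASETTE_LITE = "https://lite.datasette.io/"


def datasette_lite_parquet_url(parquet_url: str) -> str:
    """Return a Datasette Lite link that asks it to load a Parquet file.

    Hypothetical helper: only the `?parquet=` parameter comes from the
    quoted README text; the encoding choice is an assumption.
    """
    # urlencode percent-escapes the nested URL so it stays a single query value.
    return DATASETTE_LITE + "?" + urlencode({"parquet": parquet_url})


# Hypothetical file location, used only for illustration.
link = datasette_lite_parquet_url("https://example.com/data.parquet")
print(link)
```

The file still has to be served with the CORS header described in the quote for the browser to be able to fetch it.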
There are various *-to-sqlite utilities that load data into a SQLite database for use with tools such as datasette. Pandas, for example, can read data with `dtype_backend='pyarrow'` and write it out to Parquet.
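To make the *-to-sqlite idea concrete, here is a minimal sketch of loading CSV text into a SQLite table using only the standard library. The function `load_csv_into_sqlite` is hypothetical and is not the interface of any particular utility; real tools also handle type detection, large files, and schema changes. The sample rows reuse the star counts from the table at the top of this page.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create `table` from the CSV header row and insert every data row.

    Hypothetical helper for illustration: columns are untyped and the
    values are stored as text, exactly as they appear in the CSV.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The reader now yields only the remaining data rows.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "repos", "name,stars\nkylo,1091\ndatasette-scraper,57\n")
rows = conn.execute("SELECT name, stars FROM repos ORDER BY name").fetchall()
print(rows)
```

Writing to a file path instead of `:memory:` would produce a database file that a tool such as datasette could then open.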
Datasette plugins are written in Python and/or JavaScript, with pluggy providing the plugin hooks.
datasette-scraper
GitHub – GSA/code-gov: An informative repo for all Code.gov repos
https://github.com/cldellow/datasette-scraper/#architecture
TIL: datasette-scraper parses HTML with selectolax, and selectolax with Modest or Lexbor is ~25x faster at HTML parsing than BeautifulSoup in the selectolax benchmark.
What are some alternatives?
extruct - Extract embedded metadata from HTML markup
code-json-generator - Automation that scrapes USEPA github and provides that metadata for code.gov
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
code-gov - An informative repo for all Code.gov repos
hugo-obsidian - simple GitHub action to parse Markdown Links into a .json file for Hugo
nifi-djl-processor - Apache NiFi 1.10 DJL
awesome-semantic-web - A curated list of various semantic web and linked data resources.
datasette-ripgrep - Web interface for searching your code using ripgrep, built as a Datasette plugin
datasette - An open source multi-tool for exploring and publishing data