At EPA we use this to keep our code.json up to date, but it just scrapes our GitHub:
https://github.com/USEPA/code-json-generator
This code.gov initiative comes from an Obama-era push to use and release open source, but the attention now seems to be on data (data.gov) and AI (ai.gov).
Here's a way to scrape URLs to JSON/YAML and then build static HTML with Hugo in a GitHub Action: https://github.com/jackyzha0/hugo-obsidian
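A rough sketch of what that kind of workflow looks like (hedged: `scrape.py` is a placeholder script, not part of hugo-obsidian, and the schedule is arbitrary):

```yaml
# Scheduled GitHub Actions workflow: scrape URLs to YAML, then build with Hugo.
name: build
on:
  schedule:
    - cron: "0 6 * * *"   # daily; adjust as needed
  workflow_dispatch: {}
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scrape.py --out data/links.yaml   # placeholder scraper step
      - uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: "latest"
      - run: hugo --minify
```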
datasette is a webapp and CLI built on SQLite and Python. datasette-lite is the pyodide + WebAssembly build of datasette which can be served as static HTML, JS, and WASM SQlite.
datasette:
https://github.com/simonw/datasette
datasette-lite:
https://github.com/simonw/datasette-lite :
> You can use this tool to open any SQLite database file that is hosted online and served with a `access-control-allow-origin: *` CORS header. Files served by GitHub Pages automatically include this header, as do database files that have been published online using `datasette publish`.
> [...] You can paste in the "raw" URL to a file, but Datasette Lite also has a shortcut: if you paste in the URL to a page on GitHub or a Gist it will automatically convert it to the "raw" URL for you
> To load a Parquet file, pass a URL to `?parquet=`
> [...] https://lite.datasette.io/?parquet=https://github.com/Terada...
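Those query parameters can be composed programmatically; a minimal sketch (the `url`/`csv`/`json`/`parquet` parameter names are from the datasette-lite README quoted above, the example file URL is made up):

```python
from urllib.parse import urlencode

BASE = "https://lite.datasette.io/?"

def lite_url(param: str, file_url: str) -> str:
    """Build a datasette-lite link that loads a remote file.

    param: "url" for a SQLite database, or "csv", "json", "parquet"
    for other formats (per the datasette-lite README).
    """
    return BASE + urlencode({param: file_url})

# e.g. lite_url("parquet", "https://example.com/data.parquet")
```

The file's host still has to serve the `access-control-allow-origin: *` header for the in-browser load to succeed.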
There are various *-to-sqlite utilities that load data into a SQLite database for use with e.g. datasette. And e.g. Pandas (with `dtype_backend='pyarrow'` for Arrow-backed dtypes) can save DataFrames to Parquet.
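The core of what such a *-to-sqlite utility does, as a stdlib-only sketch (real tools like sqlite-utils add type inference, batching, indexes, and FTS on top):

```python
import sqlite3

def rows_to_sqlite(db_path: str, table: str, rows: list, pk: str = "id") -> None:
    """Load a list of dicts into a SQLite table, replacing rows on pk conflict."""
    cols = list(rows[0])
    con = sqlite3.connect(db_path)
    # Create the table with the pk column marked PRIMARY KEY (no declared types;
    # SQLite is dynamically typed).
    con.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"({', '.join(c + (' PRIMARY KEY' if c == pk else '') for c in cols)})"
    )
    con.executemany(
        f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    con.commit()
    con.close()
```

`INSERT OR REPLACE` keyed on the primary key makes re-runs idempotent, which matters when the loader runs on a schedule.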
datasette plugins are written in Python and/or JS w/ pluggy:
https://github.com/cldellow/datasette-scraper/#architecture
(TIL datasette-scraper parses HTML with selectolax; and selectolax with Modest or Lexbor is ~25x faster at HTML parsing than BeautifulSoup, per the selectolax benchmark:
https://github.com/rushter/selectolax#simple-benchmark )
(Apache Nutch is a Java-based web crawler which supports e.g. Common Crawl (which backs various foundation LLMs): https://en.wikipedia.org/wiki/Apache_Nutch#Search_engines_bu... . But extruct extracts more types of metadata and data than Nutch, AFAIU: https://github.com/scrapinghub/extruct )
datasette-graphql adds a GraphQL HTTP API to a SQLite database:
https://github.com/simonw/datasette-graphql
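A hedged sketch of hitting such an endpoint (assumes a locally running datasette instance with datasette-graphql installed and a `repos` table; the `nodes` query shape follows the datasette-graphql README, everything else is placeholder):

```python
import json
from urllib import request

def graphql_payload(query: str) -> bytes:
    """Encode a GraphQL query as a JSON POST body."""
    return json.dumps({"query": query}).encode()

QUERY = "{ repos { nodes { id name } } }"

def fetch(endpoint: str = "http://localhost:8001/graphql") -> dict:
    # Requires a running server; e.g. `datasette data.db` with the plugin installed.
    req = request.Request(
        endpoint,
        data=graphql_payload(QUERY),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```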
datasette-ripgrep is a web UI for regex-searching files with ripgrep:
https://github.com/simonw/datasette-ripgrep
Seeing as there's already a JSON-LD @context (schema) for code.json, CSVW as JSON-LD and/or YAML-LD would be an easy way to merge Linked Data graphs of tabular data:
https://github.com/semantalytics/awesome-semantic-web#csvw
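For a sense of what CSVW metadata looks like, a minimal illustrative sketch (the `@context` URL is the W3C CSVW namespace; the file name and columns are borrowed from code.json fields, not a published schema):

```json
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "repos.csv",
  "tableSchema": {
    "columns": [
      {"name": "name", "titles": "name", "datatype": "string"},
      {"name": "repositoryURL", "titles": "repositoryURL", "datatype": "anyURI"}
    ],
    "primaryKey": "name"
  }
}
```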
A GitHub Action would run regularly, fetch each code.json, save each to a git repo, and then upsert each into a SQLite database to be published with e.g. datasette or datasette-lite.
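The fetch-and-upsert step could look roughly like this (a sketch: the endpoint URL is a placeholder, and the `releases` table layout is invented; the top-level `releases` array is from the code.gov code.json schema):

```python
import json
import sqlite3
from urllib import request

# Placeholder list of agency code.json endpoints.
CODE_JSON_URLS = [
    "https://example.gov/code.json",
]

def fetch_code_json(url: str) -> dict:
    with request.urlopen(url) as resp:
        return json.load(resp)

def upsert_releases(con: sqlite3.Connection, agency: str, doc: dict) -> None:
    """Upsert each release from a code.json document, keyed on (agency, name)."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS releases ("
        " agency TEXT, name TEXT, repositoryURL TEXT, doc TEXT,"
        " PRIMARY KEY (agency, name))"
    )
    for rel in doc.get("releases", []):
        con.execute(
            "INSERT OR REPLACE INTO releases VALUES (?, ?, ?, ?)",
            (agency, rel.get("name"), rel.get("repositoryURL"), json.dumps(rel)),
        )
    con.commit()
```

Committing the fetched code.json files to the repo first gives a diffable history for free; the SQLite database is then just a derived artifact.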