Show HN: I created a first-of-its-kind open corpus of Australian law


open-australian-legal-corpus-creator

3 57 8.3 Python

The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and judicial documents.

Hey HN, today I'm sharing my latest project, the Open Australian Legal Corpus, a first-of-its-kind multijurisdictional open corpus of Australian legislative and judicial documents. The idea behind this dataset was born a few months ago when, while attempting to pretrain a BERT model for the Australian legal domain, I discovered that there was no freely accessible, openly licensed text corpus of Australian laws and cases that I could use. This was in contrast to the US, UK and EU, which all had multiple large open legal corpora available. Thus, I set out to fill the gap in Australian legal AI research by compiling a dataset of as many in-force Australian laws, regulations, bills and decisions as I could find. The end product was a corpus of 97,750 texts totalling over forty million lines and half a billion tokens, and spanning five states, one external territory and the Commonwealth.
You can view the corpus on [Hugging Face](https://huggingface.co/datasets/umarbutler/open-australian-l...) and the code used to create it on [GitHub](https://github.com/umarbutler/open-australian-legal-corpus-c...).
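
As a rough illustration of how headline figures like the text, line and token counts above can be tallied, here is a minimal sketch in Python. It is not the project's own code: the record fields (`jurisdiction`, `type`, `text`), the sample documents and the jurisdiction labels are all hypothetical, and tokens are counted by naive whitespace splitting, which is an assumption and will not match the counts of any particular model tokenizer.

```python
from collections import Counter

# Hypothetical records for illustration only. The field names and values are
# assumptions, not the corpus's documented schema.
SAMPLE_DOCS = [
    {"jurisdiction": "commonwealth", "type": "primary_legislation",
     "text": "An Act about examples.\nSection 1 applies."},
    {"jurisdiction": "new_south_wales", "type": "decision",
     "text": "The appeal is dismissed."},
    {"jurisdiction": "commonwealth", "type": "bill",
     "text": "A Bill for an Act.\n\nClause 1."},
]


def corpus_stats(docs):
    """Count texts, lines and whitespace tokens, plus texts per jurisdiction."""
    totals = Counter()
    per_jurisdiction = Counter()
    for doc in docs:
        text = doc["text"]
        totals["texts"] += 1
        # splitlines() counts blank lines too, so empty lines add to the total.
        totals["lines"] += len(text.splitlines())
        # Whitespace splitting is a crude stand-in for a real tokenizer.
        totals["tokens"] += len(text.split())
        per_jurisdiction[doc["jurisdiction"]] += 1
    return dict(totals), dict(per_jurisdiction)


if __name__ == "__main__":
    totals, by_jurisdiction = corpus_stats(SAMPLE_DOCS)
    print(totals)
    print(by_jurisdiction)
```

On a real corpus the same loop would run over streamed records rather than an in-memory list, and the token count would depend heavily on the tokenizer chosen.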




This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com. Post date: 26 Jun 2023
