open-australian-legal-corpus-creator

The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and judicial documents. (by umarbutler)

open-australian-legal-corpus-creator reviews and mentions

Posts with mentions or reviews of open-australian-legal-corpus-creator. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-22.
  • Show HN: Mapping almost every law, regulation and case in Australia
    1 project | news.ycombinator.com | 22 Mar 2024
    Hey HN,

    After months of hard work, I am excited to share the first ever semantic map of Australian law.

    My map represents the first attempt to map Australian laws, cases and regulations across the Commonwealth, States and Territories semantically, that is, by their underlying meaning.

    Each point on the map is a unique document in the [Open Australian Legal Corpus](https://huggingface.co/datasets/umarbutler/open-australian-l...), the largest open database of Australian law (which, full disclosure, I [created](https://umarbutler.com/how-i-built-the-largest-open-database...)). The closer any two points are on the map, the more similar they are in underlying meaning.

    As I cover in my article, there’s a lot you can learn by mapping Australian law. Some of the most interesting insights to come out of this initiative are that:

    ⦁ Migration, family and substantive criminal law are the most isolated branches of case law on the map;

    ⦁ Migration, family and substantive criminal law are the most distant branches of case law from legislation on the map;

    ⦁ Development law is the closest branch of case law to legislation on the map;

    ⦁ Case law is more of a continuum than a rigidly defined structure and the borders between branches of case law can often be quite porous; and

    ⦁ The map does not reveal any noticeable distinctions between Australian state and federal law, whether it be in style, principles of interpretation or general jurisprudence.

    If you’re interested in learning more about what the map has to teach us about Australian law or if you’d like to find out how you can create semantic maps of your own, check out the full article on my blog, which provides a detailed analysis of my map and also covers the finer details of how I built it, with code examples offered along the way.
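
    The idea that closer points are more similar in meaning can be illustrated with a minimal sketch. It assumes each document has already been turned into an embedding vector; the vectors and document names below are made up for illustration, and the post does not say which embedding model or projection method the author actually used.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(name, embeddings):
    """Return (other_name, similarity) for the document closest to `name`."""
    query = embeddings[name]
    candidates = (
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    )
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    print(most_similar("doc_a", embeddings))
```

    A two-dimensional map would additionally need a dimensionality reduction step to project such vectors onto a plane, which this sketch leaves out.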

  • I built the largest open database of Australian law
    1 project | news.ycombinator.com | 29 Oct 2023
    > Just one note - the link in your Github readme to https://umarbutler.com/open-australian-legal-corpus doesn't seem to go anywhere.

    Thanks for the heads up! I've fixed that now.

    > For someone interested in using the data (and help out with bugs/issues), where would you suggest starting?

    I think the best place to start is by downloading the Corpus (visit https://huggingface.co/datasets/umarbutler/open-australian-l..., then click "Files and versions" and then "corpus.jsonl"). You can then use my Python library orjsonl to parse the dataset (you'd run `corpus = orjsonl.load('corpus.jsonl')`). At that point, there are any number of applications you could use the dataset for. You could pretrain a model like BERT, ELECTRA, etc. and share it on HuggingFace. You could connect the dataset to GPT and do RAG over it. Etc...
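
    For a file too large to hold in memory, a JSON Lines file can also be read one record at a time. The following is a minimal sketch using only the standard library `json` module rather than orjsonl; the helper names `stream_jsonl` and `count_by` and the field names `type` and `text` are assumptions made for illustration, not the Corpus's documented schema.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def stream_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by(path, key):
    """Count records by the value under `key`; records lacking it count as None."""
    return Counter(record.get(key) for record in stream_jsonl(path))


if __name__ == "__main__":
    # Write a tiny stand-in file so the sketch runs without downloading anything.
    # The 'type' and 'text' fields are hypothetical.
    sample = [
        {"type": "decision", "text": "First sample document."},
        {"type": "primary_legislation", "text": "Second sample document."},
        {"type": "decision", "text": "Third sample document."},
    ]
    with tempfile.TemporaryDirectory() as directory:
        path = Path(directory) / "sample.jsonl"
        path.write_text(
            "\n".join(json.dumps(record) for record in sample) + "\n",
            encoding="utf-8",
        )
        print(count_by(path, "type"))
```

    To try it on the real download, point `count_by` at 'corpus.jsonl' and pass whichever field name the dataset card lists.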

  • Show HN: I created a first-of-its-kind open corpus of Australian law
    2 projects | news.ycombinator.com | 26 Jun 2023
    Hey HN, today I'm sharing my latest project, the Open Australian Legal Corpus, a first-of-its-kind multijurisdictional open corpus of Australian legislative and judicial documents. The idea behind this dataset was born a few months ago, when, while attempting to pretrain a BERT model for the Australian legal domain, I discovered that there was no freely accessible, openly licensed text corpus of Australian laws and cases that I could use. This was in contrast to the US, UK and EU, which all had multiple large open legal corpora available. Thus, I set out to fill the gap in Australian legal AI research by compiling a dataset of as many in-force Australian laws, regulations, bills and decisions as I could find. The end product was a corpus of 97,750 texts totalling over forty million lines and half a billion tokens, and spanning five states, one external territory and the Commonwealth.

    You can view the corpus on [HuggingFace](https://huggingface.co/datasets/umarbutler/open-australian-l...) and the code used to create it on [GitHub](https://github.com/umarbutler/open-australian-legal-corpus-c...).

Stats

Basic open-australian-legal-corpus-creator repo stats
3
57
8.3
3 months ago
