catala
spaCy
| | catala | spaCy |
|---|---|---|
| Mentions | 44 | 122 |
| Stars | 2,304 | 33,632 |
| Growth | 1.0% | 0.3% |
| Activity | 9.8 | 8.5 |
| Last commit | 3 days ago | 19 days ago |
| Language | OCaml | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
catala
- I put all 8,642 Spanish laws in Git – every reform is a commit
- Catala – DSL for deriving algorithms producing automated legal decisions
-
Catala – Law to Code
Funnily enough, the name comes from a surname.
From their repo: https://github.com/CatalaLang/catala
> The language is named after Pierre Catala, a professor of law who pioneered the French legaltech by creating a computer database of law cases, Juris-Data.
-
Solving a Million-Step LLM Task with Zero Errors
Thank you, https://catala-lang.org/ looks very interesting. I've experimented a lot with LLMs producing formal representations of facts and rules. What I've observed is that the resulting systems usually lose much of the generalization capability offered by the current generation of LLMs (fine-tuning may help here, but is often impractical because training data is missing). Together with the usual closed-world assumption in e.g. Prolog, this leads to imho overly restrictive applications. So the approach I am taking is to let the LLM generate Prolog code that may contain predicates which are themselves interpreted by an LLM.
So one could e.g. have
is_a(dog, animal).
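The idea in this comment, ordinary facts under a closed-world assumption plus some predicates delegated to an LLM, can be sketched roughly as follows. This is a minimal Python illustration, not the commenter's system: the `resembles` predicate, the `ORACLE_PREDICATES` set, and `stub_oracle` (a fixed lookup standing in for a real LLM call) are all hypothetical names introduced here.

```python
from typing import Callable, Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, subject, object)

# Ground facts, mirroring the Prolog clause is_a(dog, animal).
FACTS: Set[Fact] = {("is_a", "dog", "animal")}

# Hypothetical: predicates listed here are not looked up in FACTS but
# handed to an external judge (an LLM in the comment's design).
ORACLE_PREDICATES = {"resembles"}


def stub_oracle(predicate: str, subject: str, obj: str) -> bool:
    """Stand-in for an LLM call; a fixed table keeps the sketch deterministic."""
    return (predicate, subject, obj) in {("resembles", "wolf", "dog")}


def holds(fact: Fact, oracle: Callable[[str, str, str], bool] = stub_oracle) -> bool:
    """Evaluate one fact, delegating oracle predicates to the judge."""
    predicate, subject, obj = fact
    if predicate in ORACLE_PREDICATES:
        return oracle(predicate, subject, obj)
    # Closed-world assumption: anything not stated in FACTS is false.
    return fact in FACTS


def is_animal_like(name: str) -> bool:
    """Rule: X is animal-like if is_a(X, animal), or X resembles a known animal."""
    if holds(("is_a", name, "animal")):
        return True
    known_animals = [s for (p, s, o) in FACTS if p == "is_a" and o == "animal"]
    return any(holds(("resembles", name, animal)) for animal in known_animals)
```

With only stored facts, `wolf` would be rejected under the closed-world assumption; routing `resembles` through the oracle is what lets the rule generalize beyond what was written down.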
- MIT study explains why laws are written in an incomprehensible style
-
On Building Git for Lawyers
Right? Or a domain specific language for law: https://github.com/CatalaLang/catala
I get that dislodging docx is an impossible job, but since docx is a terrible format for anything, it needs to be done for all the reasons the author mentions. The codes that run society should not be locked up in bad proprietary formats.
-
Open Source on its own is no alternative to Big Tech
I mean the "mainframe model" of a single large system and many dumb terminals: today the dumb terminals are called endpoints and the mainframe is someone else's computer on the other side of the world.
The trust problem is easy to solve in an open society: as long as payments are processed through open APIs and the government deals with fraud, there is no trust problem. I do not need to trust a third party with eCash, I only need to trust my State's protections.
The idea has already been attempted: see not only the historic eCash, whose modern counterpart is GNU Taler, chosen (it seems) by the EU for the digital euro https://www.ngi.eu/ngi-projects/ngi-taler/ and https://social.network.europa.eu/@EC_NGI/111499172838284606 but also https://openfisca.org and https://github.com/CatalaLang/catala or a few others like them.
That's still embryonic, but in FLOSS terms we already have more than enough; we just lack the law enforcing it and the schools teaching it to the masses.
- What does it mean to be agile?
-
Hey, Computer, Make Me a Font
Programming and law can go together tho https://github.com/CatalaLang/catala
- GitHub - CatalaLang/catala: Programming language for literate programming law specification
spaCy
-
The Sovereign Redactor — A Precision-Guided Privacy Airlock
We use spaCy’s en_core_web_lg (Large) model as the underlying NLP engine. This gives the Redactor the linguistic context to understand that "Gatsby" in a book title should stay, but "Gatsby" mentioned as a person's name in a private letter might need to go.
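The redaction step itself can be sketched independently of the model. The example below assumes entity spans arrive as `(start_char, end_char, label)` tuples, which is the shape one could build from the `start_char`, `end_char`, and `label_` attributes of spaCy's `doc.ents`. The spans here are hand-written, hypothetical model output, and `redact` is an illustrative helper rather than the article's code.

```python
from typing import FrozenSet, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def redact(
    text: str,
    spans: List[Span],
    labels_to_hide: FrozenSet[str] = frozenset({"PERSON"}),
) -> str:
    """Replace spans whose label is in labels_to_hide with a [LABEL] marker."""
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        if label in labels_to_hide:
            text = text[:start] + f"[{label}]" + text[end:]
    return text


letter = "Dear Gatsby, I reread The Great Gatsby last night."
# Hypothetical model output: the first mention is a person,
# the second is part of a book title and should be left alone.
spans = [(5, 11, "PERSON"), (22, 38, "WORK_OF_ART")]
print(redact(letter, spans))
```

Keeping the label on each span is what allows the same surface string to be removed in one context and preserved in another.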
-
NER: Gemini vs Spacy vs Compromise
For NER, if accuracy is critical, go with an LLM: even an older one like gemma-3-27b-it will outperform tools or small models trained for this task. But by using an LLM you are exposing your data, making an HTTP request, and most likely incurring a cost. If accuracy is not critical and you want to stay in JavaScript, compromise is a good package for NER. If you want an even better package and it's OK not to use JavaScript, then try spaCy.
-
Parsing Nutrition Labels with AI: From Image to Structured Data
For more advanced food label AI, combine pattern matching with Named Entity Recognition (NER). Libraries like spaCy (Python) or compromise (JavaScript) can identify amounts, units, and nutrient names even in noisy text.
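As a rough illustration of the pattern-matching half of that advice, the sketch below pulls nutrient name, amount, and unit out of OCR-style lines using only Python's `re` module. It does not use spaCy or compromise. The unit list, the `parse_label` helper, and the sample text are assumptions made for this example, and an NER model would be the fallback for lines the pattern misses.

```python
import re
from typing import Dict, Tuple

# Nutrient name, optional separator, number (dot or comma decimal), unit.
LINE_RE = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z ]*?)\s*[:\-]?\s*"
    r"(?P<amount>\d+(?:[.,]\d+)?)\s*"
    r"(?P<unit>kcal|mg|g|%)",
    re.IGNORECASE,
)


def parse_label(text: str) -> Dict[str, Tuple[float, str]]:
    """Map each recognized nutrient name to an (amount, unit) pair."""
    result: Dict[str, Tuple[float, str]] = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # no amount on this line, e.g. an ingredients list
        name = match.group("name").strip().lower()
        amount = float(match.group("amount").replace(",", "."))
        result[name] = (amount, match.group("unit").lower())
    return result


sample = "Total Fat 8g\nSodium: 160 mg\nProtein 3,5 g\nIngredients: oats"
print(parse_label(sample))
```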
-
Building a Menu Scanner with OCR and AI
For complex or highly variable menus, consider using NLP libraries like spaCy (Python) or fine-tuning a transformer-based NER model (e.g., BERT) to identify dish names and prices.
-
GSoC 2026 Predictions: 30 NEW AI/ML/Security Organizations You Should Start Contributing to NOW!
spaCy: https://github.com/explosion/spaCy ⭐ 30k+
-
Solved: Is there a better way to test subject lines besides random A/B tools?
Open-Source NLP Libraries: Python libraries like spaCy, NLTK, and Hugging Face Transformers for building custom models.
-
Strengthening Open-Source Integrity: My First Contribution to spaCy
🔗 Pull Request: #13877 — Remove spaCy Quickstart from Universe/Courses due to spam redirect
-
The AI-Native GraphDB + GraphRAG + Graph Memory Landscape & Market Catalog
spaCy - spacy.io/
-
A Simple Guide to Keyword Clustering with spaCy
spaCy is an open-source library designed for advanced NLP tasks in Python. It’s widely used because it’s:
- SpaCy: Industrial-Strength Natural Language Processing (NLP) in Python
What are some alternatives?
mlang - Compiler for the M language, used to compute the income tax of French taxpayers
Jieba - "Jieba" (结巴) Chinese word segmentation
Les-codes-en-vigueur - This repository of the legal Codes in force lets anyone consult, modify (_fork_) and propose changes (_Pull Request_), which will be systematically reviewed by the legislative bodies of the French Republic. These bodies will set up a citizen validation system (_peers_) as soon as possible in order to respond to all requests. We are working together with the GitHub team to make this platform's interface available in French.
NLTK - NLTK Source
alaptorveny - The Fundamental Law of Hungary
Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages