| | awesome-ai-safety | mentat |
|---|---|---|
| Mentions | 5 | 4 |
| Stars | 140 | 2,339 |
| Growth | 9.3% | 5.4% |
| Activity | 5.6 | 9.7 |
| Latest commit | 7 months ago | 22 days ago |
| Language | Python | |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-ai-safety
-
Ask HN: Who is hiring? (October 2023)
Giskard - Testing framework for ML models | Multiple roles | Full-time | France | https://giskard.ai/
We are building the first collaborative & open-source Quality Assurance platform for all ML models - including Large Language Models.
Founded in 2021 in Paris by ex-Dataiku engineers, we are an emerging player in the fast-growing market of AI Quality & Safety.
Giskard helps Data Scientists & ML Engineering teams collaborate to evaluate, test & monitor AI models. We help organizations increase the efficiency of their AI development workflow, eliminate risks of AI biases and ensure robust, reliable & ethical AI models. Our open-source platform is used by dozens of ML teams across industries, both at enterprise companies & startups.
In 2022, we raised our first round of 1.5 million euros, led by Elaia, with participation from Bessemer and notable angel investors including the CTO of Hugging Face. To read more about this fundraising and how it will accelerate our growth, you can read this announcement: https://www.giskard.ai/knowledge/news-fundraising-2022
In 2023, we received a strategic investment from the European Commission to build a SaaS platform to automate compliance with the upcoming EU AI regulation. You can read more here: https://www.giskard.ai/knowledge/1-000-github-stars-3meu-and...
We are assembling a team of champions: Software Engineers, Machine Learning Researchers, and Data Scientists, to build our AI Quality platform and expand it to new types of AI models and industries. We have a culture of continuous learning & quality, and we help each other achieve high standards & goals!
We aim to grow from 15 to 25 people in the next 12 months. We're hiring the following roles:
-
Ask HN: Who is hiring? (August 2023)
Giskard - Testing framework for ML models | Multiple roles | Full-time | France | https://giskard.ai/
We are building the first collaborative & open-source Quality Assurance platform for all ML models - including Large Language Models.
Founded in 2021 in Paris by ex-Dataiku engineers, we are an emerging player in the fast-growing market of AI Safety & Security.
Giskard helps Data Scientists & ML Engineering teams collaborate to evaluate, test & monitor AI models. We help organizations increase the efficiency of their AI development workflow, eliminate risks of AI biases and ensure robust, reliable & ethical AI models. Our open-source platform is used by dozens of ML teams across industries, both at enterprise companies & startups.
In 2022, we raised our first round of 1.5 million euros, led by Elaia, with participation from Bessemer and notable angel investors including the CTO of Hugging Face. To read more about this fundraising and how it will accelerate our growth, you can read this announcement: https://www.giskard.ai/knowledge/news-fundraising-2022
In 2023, we received a strategic investment from the European Commission to build a SaaS platform to automate compliance with the upcoming EU AI regulation. You can read more here: https://www.giskard.ai/knowledge/1-000-github-stars-3meu-and...
We are assembling a team of champions: Software Engineers, Machine Learning Researchers, and Data Scientists, to build our AI Quality platform and expand it to new types of AI models and industries. We have a culture of continuous learning & quality, and we help each other achieve high standards & goals!
We aim to grow from 15 to 25 people in the next 12 months. We're hiring the following roles:
* Software Engineer - https://apply.workable.com/giskard/j/AD2C90B581/ (Python, Java, Typescript, Vue.js, Cloud skills)
* Machine Learning Researcher - https://apply.workable.com/giskard/j/E89FE8E310/ (post-PhD)
* Data Science lead - https://apply.workable.com/giskard/j/E89FE8E310/ (ML + consulting experience required)
* Growth marketing intern - https://apply.workable.com/giskard/j/C8635E9B0C/
* Data Science intern - https://apply.workable.com/giskard/j/7F0B341852/
-
Show HN: Python library to scan ML models for vulnerabilities
Hi! I’ve been working on this automatic scanner for ML models to detect issues like underperforming data slices, overconfidence in predictions, robustness problems, and others. It supports all main Python ML frameworks (sklearn, torch, xgboost, …) and integrates with the quality assurance solution we are building at Giskard AI (https://giskard.ai) to systematically test models before putting them in production.
It is still in beta, and I would love to hear your feedback if you have the time to try it out.
We have quite a few tutorials in the docs with ready-made colab notebooks to make it easy to get started.
If you are interested in the code:
https://github.com/Giskard-AI/giskard/tree/main/python-clien...
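To make the "underperforming data slices" idea concrete, here is a minimal, self-contained sketch of what such a check does conceptually. This is not Giskard's actual API (the function name and signature are hypothetical): it groups predictions by a categorical feature and flags any slice whose accuracy trails the overall accuracy by more than a margin.

```python
from collections import defaultdict


def find_underperforming_slices(features, labels, predictions, margin=0.1):
    """Flag feature values whose slice accuracy trails overall accuracy by `margin`.

    features: list of categorical feature values, one per example
    labels / predictions: ground truth and model outputs, same length
    Returns a {feature_value: slice_accuracy} dict of flagged slices.
    """
    correct = [y == p for y, p in zip(labels, predictions)]
    overall = sum(correct) / len(correct)

    # Group correctness flags by the feature value of each example.
    per_slice = defaultdict(list)
    for value, ok in zip(features, correct):
        per_slice[value].append(ok)

    return {
        value: sum(oks) / len(oks)
        for value, oks in per_slice.items()
        if sum(oks) / len(oks) < overall - margin
    }


# The model is systematically wrong on the "B" slice, so it gets flagged.
features    = ["A", "A", "A", "B", "B", "B"]
labels      = [1,   0,   1,   1,   1,   1]
predictions = [1,   0,   1,   0,   0,   1]
print(find_underperforming_slices(features, labels, predictions))
```

A real scanner like the one described above additionally has to enumerate candidate slices automatically (including numeric ranges and feature combinations) and correct for multiple comparisons, but the core signal is this per-slice metric gap.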
-
[R] Awesome AI Safety – A curated list of papers & technical articles on AI Quality & Safety
Repository: https://github.com/Giskard-AI/awesome-ai-safety
- AI Safety – curated papers for safer, ethical, and reliable AI
mentat
-
Benchmarking GPT-4 Turbo – A Cautionary Tale
Hey Paul, I'm a Mentat author.
> I also notice that the instructions prompt that mentat uses seems to be inspired by the aider benchmark? Glad to see others adopting similar benchmarking approaches.
We were inspired by you to use Exercism as a benchmark, thank you! We will add attribution for that. We switched our original instruction prompts for that benchmark to be similar to Aider's to allow for a fair comparison.
> After looking around a bit, there seems to be a bunch of aider code in your repo. Some attribution would be appreciated.
We have an unused implementation of your output response format (https://github.com/AbanteAI/mentat/blob/main/mentat/parsers/...), but I'm not sure what else you're seeing. We implemented it to compare against our own response formats and didn't find much difference in performance.
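For readers outside this thread: an "output response format" is the structured text the model emits to describe file edits, which the tool then parses and applies. As a toy illustration (this is neither Aider's nor Mentat's actual syntax — the `@@ <path>` convention here is made up), a parser for such a format can be just a few lines:

```python
def parse_edits(response: str) -> dict:
    """Parse a toy edit format where a line '@@ <path>' starts a file block
    and every following line belongs to that file until the next '@@ '."""
    edits, current = {}, None
    for line in response.splitlines():
        if line.startswith("@@ "):
            current = line[3:].strip()
            edits[current] = []
        elif current is not None:
            # Lines before the first '@@ ' header (e.g. model chatter) are dropped.
            edits[current].append(line)
    return {path: "\n".join(lines) for path, lines in edits.items()}


reply = "@@ src/hello.py\nprint('hi')\n@@ README.md\n# Demo"
print(parse_edits(reply))
# → {'src/hello.py': "print('hi')", 'README.md': '# Demo'}
```

The interesting design question the comment alludes to is that different formats (whole-file replacement, search/replace blocks, unified diffs) trade off token cost against edit precision, and in their testing the choice didn't change benchmark performance much.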
-
Ask HN: Who is hiring? (August 2023)
Abante AI | Full-time | Senior Software Engineer | Remote or Hybrid SF Bay Area
Abante AI is a new startup building Mentat, an open source, GPT-4 powered coding assistant. Mentat runs on the command line, gathering project context and coordinating edits across multiple files: https://github.com/biobootloader/mentat
- work with a small, talented team on an ambitious project
- open source: share what you build
- apply research to make a real product
- competitive pay + early equity
Contact me: [email protected]
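The "coordinating edits across multiple files" step described above can be sketched as a small helper (hypothetical code, not Mentat's actual implementation): given a mapping of relative paths to proposed contents, write each file under the project root, creating parent directories as needed.

```python
from pathlib import Path


def apply_edits(root: str, edits: dict) -> list:
    """Write each {relative_path: content} edit under `root`.

    Returns the list of absolute paths that were written.
    """
    written = []
    for rel_path, content in edits.items():
        target = Path(root) / rel_path
        # Proposed edits may create new files in new directories.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(str(target))
    return written
```

A real assistant does considerably more than this whole-file replacement — it applies partial insertions and deletions, previews diffs, and asks for confirmation — but the final step reduces to writing the agreed contents back to disk like this.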
- Mentat – AI tool that assists with any coding task, right from the command line
What are some alternatives?
opentofu - OpenTofu lets you declaratively manage your cloud infrastructure.
nl-wallet - NL Public Reference Wallet
tabby - Self-hosted AI coding assistant
noria - Fast web applications through dynamic, partially-stateful dataflow
awesome-langchain - 😎 Awesome list of tools and projects with the awesome LangChain framework
reframe - LeapTable 🦘- The fastest way to build, deploy, and manage LLM-powered agents on tabular data (dataframes, SQL tables and Spreadsheets). [Moved to: https://github.com/peterwnjenga/leaptable]
giskard - 🐢 Open-Source Evaluation & Testing for LLMs and ML models
paip-lisp - Lisp code for the textbook "Paradigms of Artificial Intelligence Programming"
refact - WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding
LambdaLite - A functional, relational database in about 250 lines of Common Lisp
indradb - A graph database written in rust