SQLAlchemy vs scikit-learn

SQLAlchemy	Project	scikit-learn
123	Mentions	81
8,750	Stars	58,046
3.3%	Growth	1.0%
9.7	Activity	9.9
5 days ago	Latest Commit	5 days ago
Python	Language	Python
MIT License	License	BSD 3-clause "New" or "Revised" License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

SQLAlchemy

Posts with mentions or reviews of SQLAlchemy. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-18.

Xz/liblzma: Bash-stage Obfuscation Explained
1 project | news.ycombinator.com | 31 Mar 2024

OK - can we start considering binary files committed to a repo, even as data for tests, to be a huge red flag, and that the binary files themselves should instead be generated at testing time by source code that's stated as reviewable cleartext? This would make it much harder (though of course we can never really say "impossible") to embed a substantial payload in this way.
When binary files are part of a test suite, they are typically trying to illustrate some element of the program being tested, in this case a file that was incorrectly xz-encoded. Binary files like these weren't typed by hand; they will always ultimately come from some plaintext source.
Here's an example! My own SQLAlchemy repository has a few binary files in it! https://github.com/sqlalchemy/sqlalchemy/blob/main/test/bina... oh noes. Why are those files there? well in this case I just wanted to test that I can send large binary BLOBs into the database driver and I was lazy. This is actually pretty dumb, the two binary files here add 35K of useless crap to the source, and I could just as easily generate this binary data on the fly using a two liner that spits out random bytes. Anyone could see that two liner and know that it isn't embedding a malicious payload.
If I wanted to generate a poorly formed .xz file, I'd illustrate source code that generates random data, runs it through .xz, then applies "corruption" to it, like zeroing out the high bit of every byte. The process by which this occurs would be all reviewable in source code.
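A minimal Python sketch of the idea in the post above: build the test data from reviewable source at test time instead of committing binaries. The helper names (`make_random_blob`, `make_corrupt_xz`), the sizes, and the seed are illustrative assumptions, not code from the SQLAlchemy or xz repositories.

```python
import lzma
import random


def make_random_blob(size, seed=0):
    """Return `size` pseudo-random bytes; seeded so every run is reproducible."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def make_corrupt_xz(size=1024, seed=0):
    """Compress random data as .xz, then zero the high bit of every byte."""
    good = lzma.compress(make_random_blob(size, seed), format=lzma.FORMAT_XZ)
    return bytes(b & 0x7F for b in good)


blob = make_random_blob(1024)
good_xz = lzma.compress(blob, format=lzma.FORMAT_XZ)
corrupt_xz = make_corrupt_xz(1024)

# The corrupted stream no longer starts with the .xz magic bytes,
# so the decoder should reject it.
try:
    lzma.decompress(corrupt_xz, format=lzma.FORMAT_XZ)
    corrupt_decodes = True
except lzma.LZMAError:
    corrupt_decodes = False
```

Because both the random data and the "corruption" step are plain source, a reviewer can see exactly what ends up in the file.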
Introducing Flama for Robust Machine Learning APIs
11 projects | dev.to | 18 Dec 2023

Besides, flama also provides support for SQL databases via SQLAlchemy, an SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. Finally, flama also provides support for HTTP clients to perform requests via httpx, a next generation HTTP client for Python.
Alembic with Async SQLAlchemy
1 project | dev.to | 12 Dec 2023

Alembic is a lightweight database migration tool for use with SQLAlchemy. The term migration can be a little misleading, because in this context it doesn't mean migrating to a different database in the sense of using a different version or a different type of database. Here, migration refers to changes to the database schema: adding a new column to a table, modifying the type of an existing column, creating a new index, etc.
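To make "schema migration" concrete, here is a small sketch that uses only the standard-library `sqlite3` module rather than Alembic itself. The `users` table, the `email` column, and the index name are hypothetical; the `upgrade` function name mirrors the convention used in Alembic migration scripts, but this is not Alembic's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting schema, before the migration runs.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def upgrade(connection):
    """Apply one schema change: add a column and index it."""
    connection.execute("ALTER TABLE users ADD COLUMN email TEXT")
    connection.execute("CREATE INDEX ix_users_email ON users (email)")


upgrade(conn)

# Inspect the resulting schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
indexes = [row[1] for row in conn.execute("PRAGMA index_list(users)")]
```

The data stays in the same database throughout; only the shape of the table changes.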
Imperative vs. Declarative mapping style in Domain Driven Design project
1 project | news.ycombinator.com | 28 Oct 2023
Unlocking efficient authZ with Cerbos’ Query Plan
5 projects | dev.to | 6 Sep 2023

To simplify this process, Cerbos developers have come up with adapters for popular Object-Relational Mapping (ORM) frameworks. You can find more details in the query plan repo, which also contains adapters for Prisma and SQLAlchemy, as well as a fully functioning application using Mongoose as its ORM.
Python: Just Write SQL
21 projects | news.ycombinator.com | 14 Aug 2023

That above pattern is one I've seen people do even recently, using the "select().c" attribute, which from very early versions of SQLAlchemy is defined as "the columns from a subquery of the SELECT"; this usage began raising deprecation warnings in 1.4 and is fully removed in 2.0, as it was a remnant of a much earlier version of SQLAlchemy. It will do exactly as you say, "make a subquery for each filter condition".
The moment you see SQLAlchemy doing something that seems "asinine", send an example to https://github.com/sqlalchemy/sqlalchemy/discussions and I will clarify what's going on, correct the usage so that the query you have is what you expect, and quite often we will add new warnings or documentation when we see people doing things we didn't anticipate.
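As a rough illustration of the "select().c" point above, assuming SQLAlchemy 1.4 or later: chaining `.where()` on one `select()` keeps a single SELECT, and a subquery appears only when `.subquery()` is requested explicitly. The `users` table and its columns are hypothetical.

```python
from sqlalchemy import column, select, table

# Lightweight table description; no database connection is needed to render SQL.
users = table("users", column("id"), column("name"), column("age"))

# Two filter conditions on the same statement are combined with AND.
stmt = select(users.c.id).where(users.c.age > 18).where(users.c.name == "ann")
flat_sql = str(stmt)

# Columns of a subquery are reached through an explicit .subquery() call.
subq = stmt.subquery()
nested_sql = str(select(subq.c.id))
```

Printing `flat_sql` and `nested_sql` side by side shows where the extra SELECT comes from.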
A steering council note about making the global
3 projects | news.ycombinator.com | 29 Jul 2023

The creator and lead maintainer of SQLAlchemy, one of the most popular and most used Python libraries for accessing databases (who doesn't?), gave a rather interesting response to PEP 703.
If this doesn't ring any alarm bells I don't know what will.
> Basically for the moment the GIL-less idea would likely be burdensome for us and the fact that it's only an "option" seems to strongly imply major compatibility issues that we would not prefer.
https://github.com/sqlalchemy/sqlalchemy/discussions/10002#d...
More public SQL-queryable databases?
3 projects | /r/datasets | 10 Jul 2023

Recently I discovered BigQuery public datasets - just over 200 datasets available for directly querying via SQL. I think this is a great thing! I can connect these direct to an analytics platform (we use Apache Superset which uses Python SQLAlchemy under the hood) for example and just start dashboarding.
How useful is Python in accounting and auditing?
1 project | /r/Accounting | 27 Jun 2023

When using Python with SQL databases like Postgres, MariaDB or SQLite you would use SQLAlchemy or another ORM or, if you're feeling brave, you code it by hand. With ORMs you provide the address of your database and it connects for you, letting you use abstractions instead of writing all the SQL yourself (kind of analogous to using vlookups or index match instead of manually entering data).
Day 46-47: Beginner FastAPI Series - Part 3
2 projects | dev.to | 8 Jun 2023

The tool we're going to use for interfacing with the SQLite database is SQLAlchemy, a SQL toolkit that provides a unified API for various relational databases. If you installed FastAPI with pip install "fastapi[all]", SQLAlchemy is already part of your setup, but if you opted for FastAPI alone, you would need to install SQLAlchemy separately with pip install sqlalchemy.
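A short sketch of what talking to SQLite through SQLAlchemy can look like, assuming SQLAlchemy 1.4 or later. The in-memory database URL, the `notes` table, and its columns are assumptions chosen for illustration and are not taken from the tutorial.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select,
)

# An in-memory SQLite database; a file path could be used instead.
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

notes = Table(
    "notes",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("text", String, nullable=False),
)
metadata.create_all(engine)  # emits CREATE TABLE for every table in metadata

with engine.begin() as conn:  # commits on success, rolls back on error
    conn.execute(insert(notes), [{"text": "hello"}, {"text": "world"}])
    texts = conn.execute(
        select(notes.c.text).order_by(notes.c.id)
    ).scalars().all()
```

Swapping the URL for another supported database leaves the rest of the code unchanged, which is the "unified API" the excerpt refers to.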

scikit-learn

Posts with mentions or reviews of scikit-learn. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-09.

AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite
8 projects | news.ycombinator.com | 9 Apr 2024

Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Polars
11 projects | news.ycombinator.com | 8 Jan 2024

sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.
[D] Major bug in Scikit-Learn's implementation of F-1 score
2 projects | /r/MachineLearning | 8 Dec 2023

Wow, from the upvotes on this comment, it really seems like a lot of people think that this is the correct behavior! I have to say I disagree, but if that's what you think, don't just sit there upvoting comments on Reddit; instead go to this PR and tell the Scikit-Learn maintainers not to "fix" this "bug", which they are currently planning to do!
Contraction Clustering (RASTER): A fast clustering algorithm
1 project | news.ycombinator.com | 27 Nov 2023
Ask HN: Learning new coding patterns – how to start?
3 projects | news.ycombinator.com | 10 Nov 2023

I was in a similar boat to yours - Worked in data science and since then have made a move to data engineering and software engineering for ML services.
I would recommend you look into the Design Patterns book by the Gang of Four. I found it particularly helpful for writing extensible code that doesn't break, especially with abstract classes, builders and factories. I would also recommend looking into the book The Object Oriented Thought Process to understand why traditional OOP is built the way it is.
You can also look into the source code of popular data science libraries such as sklearn (https://github.com/scikit-learn/scikit-learn/tree/main/sklea...) and see how a lot of them have Base classes to define shared functionality between objects of the same nature.
As others mentioned, I would also encourage you to try and implement design patterns in your everyday work - maybe you can make a Factory to load models or preprocessors that follow the same Abstract class?
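The factory suggestion above could look roughly like this. It is a sketch only: `BasePreprocessor`, `MinMaxPreprocessor`, `IdentityPreprocessor`, and `make_preprocessor` are hypothetical names, not classes from scikit-learn.

```python
from abc import ABC, abstractmethod


class BasePreprocessor(ABC):
    """Shared interface that every preprocessor must implement."""

    @abstractmethod
    def transform(self, values):
        """Return a transformed copy of `values`."""


class IdentityPreprocessor(BasePreprocessor):
    def transform(self, values):
        return list(values)


class MinMaxPreprocessor(BasePreprocessor):
    def transform(self, values):
        lo, hi = min(values), max(values)
        if hi == lo:  # avoid division by zero on constant input
            return [0.0 for _ in values]
        return [(v - lo) / (hi - lo) for v in values]


# Registry mapping a config-friendly name to a concrete class.
_REGISTRY = {
    "identity": IdentityPreprocessor,
    "minmax": MinMaxPreprocessor,
}


def make_preprocessor(name):
    """Factory: build a preprocessor by name, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown preprocessor: {name!r}") from None


scaled = make_preprocessor("minmax").transform([0, 5, 10])
```

Calling code depends only on `BasePreprocessor.transform`, so new preprocessors can be added by registering one more class.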
Transformers as Support Vector Machines
1 project | news.ycombinator.com | 3 Sep 2023

It looks like you've been the victim of some misinformation. As Dr_Birdbrain said, an SVM is a convex problem with a unique global optimum. sklearn.SVC relies on libsvm which initializes the weights to 0 [0]. The random state is only used to shuffle the data to make probability estimates with Platt scaling [1]. Of the random_state parameter, the sklearn documentation for SVC [2] says:
Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.
[0] https://github.com/scikit-learn/scikit-learn/blob/2a2772a87b...
[1] https://en.wikipedia.org/wiki/Platt_scaling
[2] https://scikit-learn.org/stable/modules/generated/sklearn.sv...
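The claim about `random_state` can be checked with a small experiment, assuming scikit-learn and NumPy are installed. The synthetic data and the two seed values below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Small synthetic two-class problem.
rng = np.random.RandomState(0)
X = rng.randn(60, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# probability=False is the default, so random_state should have no effect.
clf_a = SVC(kernel="rbf", random_state=1).fit(X, y)
clf_b = SVC(kernel="rbf", random_state=999).fit(X, y)

same_decision = np.allclose(
    clf_a.decision_function(X), clf_b.decision_function(X)
)
```

If `same_decision` is true, the two fits reached the same solution despite the different seeds, which is what the quoted documentation describes.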
How to Build and Deploy a Machine Learning model using Docker
5 projects | dev.to | 30 Jul 2023

Scikit-learn Documentation
Planning to get a laptop for ML/DL, is this good enough at the price point or are there better options at/below this price point?
1 project | /r/developersIndia | 17 Jun 2023
Link Prediction With node2vec in Physics Collaboration Network
4 projects | dev.to | 16 Jun 2023

First, we need a connection to Memgraph so we can get the edges and split them into two parts (a train set and a test set). For edge splitting, we will use scikit-learn. To connect to Memgraph, we will use gqlalchemy.
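A sketch of the edge-splitting step, assuming scikit-learn is installed. The edge list here is a made-up stand-in for edges that would be fetched from Memgraph, and the 80/20 split and seed are assumptions rather than the article's settings.

```python
from sklearn.model_selection import train_test_split

# Hypothetical edges as (source_node, target_node) pairs.
edges = [(i, i + 1) for i in range(10)]

# Hold out 20% of the edges for testing; fix the seed for reproducibility.
train_edges, test_edges = train_test_split(
    edges, test_size=0.2, random_state=42
)
```

The held-out `test_edges` can then serve as positive examples when evaluating link prediction.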
WiFilter is a RaspAP install extended with a squidGuard proxy to filter adult content. Great solution for a family, schools and/or public access point
1 project | /r/raspberry_pi | 21 May 2023

The ML component is based on scikit-learn which differentiates it from purely list-based filters. It couples this with a full-featured wireless router (RaspAP) in a single device, so it fulfills the needs of a use case not entirely addressed by Pi-hole.

What are some alternatives?

When comparing SQLAlchemy and scikit-learn you can also consider the following projects:

tortoise-orm - Familiar asyncio ORM for python, built with relations in mind

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

PonyORM - Pony Object Relational Mapper

Surprise - A Python scikit for building and analyzing recommender systems

Peewee - a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb

Keras - Deep Learning for humans

Orator - The Orator ORM provides a simple yet beautiful ActiveRecord implementation.

tensorflow - An Open Source Machine Learning Framework for Everyone

prisma-client-py - Prisma Client Python is an auto-generated and fully type-safe database client designed for ease of use

gensim - Topic Modelling for Humans

pyDAL - A pure Python Database Abstraction Layer

H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.