SQLAlchemy
scikit-learn
Metric | SQLAlchemy | scikit-learn |
---|---|---|
Mentions | 123 | 81 |
Stars | 8,750 | 58,046 |
Growth | 3.3% | 1.0% |
Activity | 9.7 | 9.9 |
Latest commit | 5 days ago | 5 days ago |
Language | Python | Python |
License | MIT License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
SQLAlchemy
-
Xz/liblzma: Bash-stage Obfuscation Explained
OK -
can we start considering binary files committed to a repo, even as data for tests, to be a huge red flag, and say that the binary files themselves should instead be generated at testing time by source code that's stated as reviewable cleartext? This would make it much harder (though of course we can never really say "impossible") to embed a substantial payload in this way.
When binary files are part of a test suite, they are typically trying to illustrate some element of the program being tested, in this case a file that was incorrectly xz-encoded. Binary files like these weren't typed by hand; they will always ultimately come from some plaintext source.
Here's an example! My own SQLAlchemy repository has a few binary files in it! https://github.com/sqlalchemy/sqlalchemy/blob/main/test/bina... oh noes. Why are those files there? Well, in this case I just wanted to test that I can send large binary BLOBs into the database driver and I was lazy. This is actually pretty dumb: the two binary files here add 35K of useless crap to the source, and I could just as easily generate this binary data on the fly using a two-liner that spits out random bytes. Anyone could see that two-liner and know that it isn't embedding a malicious payload.
If I wanted to generate a poorly formed .xz file, I'd illustrate source code that generates random data, runs it through .xz, then applies "corruption" to it, like zeroing out the high bit of every byte. The process by which this occurs would be all reviewable in source code.
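The two ideas in the comment above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not code from the SQLAlchemy or xz repositories; the function names `make_blob` and `make_corrupt_xz` and the default sizes are assumptions made for the example.

```python
import lzma
import os


def make_blob(size: int = 16 * 1024) -> bytes:
    """Random bytes standing in for a checked-in binary BLOB fixture (hypothetical helper)."""
    return os.urandom(size)


def make_corrupt_xz(size: int = 4096) -> bytes:
    """Compress random data as .xz, then zero the high bit of every byte (hypothetical helper)."""
    good = lzma.compress(os.urandom(size), format=lzma.FORMAT_XZ)
    # Masking with 0x7F clears the high bit; this also breaks the .xz magic header.
    return bytes(b & 0x7F for b in good)


if __name__ == "__main__":
    bad = make_corrupt_xz()
    try:
        lzma.decompress(bad, format=lzma.FORMAT_XZ)
    except lzma.LZMAError as exc:
        print("decoder rejected the corrupted file:", exc)
```

Because the fixture is produced by a few readable lines at test time, a reviewer can see exactly where every byte comes from.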
-
Introducing Flama for Robust Machine Learning APIs
Besides, flama also provides support for SQL databases via SQLAlchemy, an SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. Finally, flama also provides support for HTTP clients to perform requests via httpx, a next generation HTTP client for Python.
-
Alembic with Async SQLAlchemy
Alembic is a lightweight database migration tool for use with SQLAlchemy. The term migration can be a little misleading, because in this context it doesn't mean migrating to a different database in the sense of using a different version or a different type of database. Here, migration refers to changes to the database schema: adding a new column to a table, modifying the type of an existing column, creating a new index, etc.
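To make "schema migration" concrete, here is a minimal sketch that uses only Python's built-in `sqlite3` module. It deliberately does not use Alembic's API; the `users` table, the `email` column, the index name, and the helper functions are all assumptions made for illustration.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """A hand-written migration step: add a column, then index it."""
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
    conn.execute("CREATE INDEX ix_users_email ON users (email)")


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table; the name is field 1 of each PRAGMA row."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")

upgrade(conn)  # the schema changes; the existing row is kept
print(column_names(conn, "users"))  # ['id', 'name', 'email']
```

A migration tool adds what this sketch lacks: an ordered, versioned history of such steps and a record of which ones have already been applied to a given database.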
- Imperative vs. Declarative mapping style in Domain Driven Design project
-
Unlocking efficient authZ with Cerbos’ Query Plan
To simplify this process, Cerbos developers have come up with adapters for popular Object-Relational Mapping (ORM) frameworks. You can check out the query plan repo for more details - it also contains adapters for Prisma and SQLAlchemy - as well as a fully functioning application using Mongoose as its ORM.
-
Python: Just Write SQL
That above pattern is one I've seen people do even recently, using the "select().c" attribute, which from very early versions of SQLAlchemy is defined as "the columns from a subquery of the SELECT"; this usage began raising deprecation warnings in 1.4 and is fully removed in 2.0, as it was a remnant of a much earlier version of SQLAlchemy. It will do exactly as you say, "make a subquery for each filter condition".
The moment you see SQLAlchemy doing something that seems "asinine", send an example to https://github.com/sqlalchemy/sqlalchemy/discussions and I will clarify what's going on, correct the usage so that the query you have is what you expect, and quite often we will add new warnings or documentation when we see people doing things we didn't anticipate.
-
A steering council note about making the global
The creator and lead maintainer of SQLAlchemy, one of the most popular and most used Python libraries for accessing databases (who doesn't?), gave a rather interesting response to PEP 703.
If this doesn't ring any alarm bells I don't know what will.
> Basically for the moment the GIL-less idea would likely be burdensome for us and the fact that it's only an "option" seems to strongly imply major compatibility issues that we would not prefer.
https://github.com/sqlalchemy/sqlalchemy/discussions/10002#d...
-
More public SQL-queryable databases?
Recently I discovered BigQuery public datasets - just over 200 datasets available for querying directly via SQL. I think this is a great thing! I can connect these directly to an analytics platform (we use Apache Superset, which uses Python SQLAlchemy under the hood), for example, and just start dashboarding.
-
How useful is Python in accounting and auditing?
When using Python with SQL databases like Postgres, MariaDB, or SQLite, you would use SQLAlchemy or another ORM or, if you're feeling brave, code it by hand. With ORMs you provide the address of your database and it connects for you, letting you use abstractions instead of writing all the SQL yourself (kind of analogous to using vlookups or index match instead of manually entering data).
-
Day 46-47: Beginner FastAPI Series - Part 3
The tool we're going to use for interfacing with the SQLite database is SQLAlchemy, a SQL toolkit that provides a unified API for various relational databases. If you installed FastAPI with pip install "fastapi[all]", SQLAlchemy is already part of your setup, but if you opted for FastAPI alone, you would need to install SQLAlchemy separately with pip install sqlalchemy.
scikit-learn
-
AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite
Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
-
Polars
sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.
-
[D] Major bug in Scikit-Learn's implementation of F-1 score
Wow, from the upvotes on this comment, it really seems like a lot of people think that this is the correct behavior! I have to say I disagree, but if that's what you think, don't just sit there upvoting comments on Reddit; instead go to this PR and tell the Scikit-Learn maintainers not to "fix" this "bug", which they are currently planning to do!
- Contraction Clustering (RASTER): A fast clustering algorithm
-
Ask HN: Learning new coding patterns – how to start?
I was in a similar boat to yours - Worked in data science and since then have made a move to data engineering and software engineering for ML services.
I would recommend you look into the Design Patterns book by the Gang of Four. I found it particularly helpful for writing extensible code that doesn't break, especially with abstract classes, builders and factories. I would also recommend looking into the book The Object Oriented Thought Process to understand why traditional OOP is built the way it is.
You can also look into the source code of popular data science libraries such as sklearn (https://github.com/scikit-learn/scikit-learn/tree/main/sklea...) and see how a lot of them have Base classes to define shared functionality between objects of the same nature.
As others mentioned, I would also encourage you to try and implement design patterns in your everyday work - maybe you can make a Factory to load models or preprocessors that follow the same Abstract class?
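As a rough sketch of that last suggestion, the snippet below pairs an abstract base class with a small factory function. Every name here (`Preprocessor`, `MinMax`, `Identity`, `make_preprocessor`) is hypothetical and chosen for the example; none of it is taken from sklearn's source.

```python
from abc import ABC, abstractmethod


class Preprocessor(ABC):
    """Shared interface that every preprocessor must follow."""

    @abstractmethod
    def transform(self, values):
        """Return a new list of transformed values."""


class MinMax(Preprocessor):
    def transform(self, values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant input
        return [(v - lo) / span for v in values]


class Identity(Preprocessor):
    def transform(self, values):
        return list(values)


# The factory maps a configuration string to a concrete class.
_REGISTRY = {"minmax": MinMax, "identity": Identity}


def make_preprocessor(name: str) -> Preprocessor:
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown preprocessor: {name!r}") from None


print(make_preprocessor("minmax").transform([0.0, 5.0, 10.0]))  # [0.0, 0.5, 1.0]
```

Calling code depends only on the `Preprocessor` interface, so adding a new preprocessor means writing one class and one registry entry.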
-
Transformers as Support Vector Machines
It looks like you've been the victim of some misinformation. As Dr_Birdbrain said, an SVM is a convex problem with a unique global optimum. sklearn.svm.SVC relies on libsvm, which initializes the weights to 0 [0]. The random state is only used to shuffle the data to make probability estimates with Platt scaling [1]. Of the random_state parameter, the sklearn documentation for SVC [2] says:
> Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.
[0] https://github.com/scikit-learn/scikit-learn/blob/2a2772a87b...
[1] https://en.wikipedia.org/wiki/Platt_scaling
[2] https://scikit-learn.org/stable/modules/generated/sklearn.sv...
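For context, Platt scaling in its standard form (as in the Wikipedia article cited as [1]) turns the SVM decision value $f(x)$ into a probability with a two-parameter sigmoid. The symbols $A$ and $B$ are the usual notation for the fitted scalars, not names from the sklearn source:

```latex
P(y = 1 \mid x) = \frac{1}{1 + \exp\bigl(A\, f(x) + B\bigr)}
```

$A$ and $B$ are fitted by maximum likelihood on decision values computed from held-out splits of the training data. The decision function $f$ itself comes from the convex SVM problem, so the shuffling seeded by random_state should only affect the probability calibration, not the trained classifier.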
-
How to Build and Deploy a Machine Learning model using Docker
Scikit-learn Documentation
- Planning to get a laptop for ML/DL, is this good enough at the price point or are there better options at/below this price point?
-
Link Prediction With node2vec in Physics Collaboration Network
First, we need a connection to Memgraph so we can get the edges and split them into two parts (a train set and a test set). For edge splitting, we will use scikit-learn. To connect to Memgraph, we will use gqlalchemy.
-
WiFilter is a RaspAP install extended with a squidGuard proxy to filter adult content. Great solution for a family, schools and/or public access point
The ML component is based on scikit-learn which differentiates it from purely list-based filters. It couples this with a full-featured wireless router (RaspAP) in a single device, so it fulfills the needs of a use case not entirely addressed by Pi-hole.
What are some alternatives?
tortoise-orm - Familiar asyncio ORM for python, built with relations in mind
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
PonyORM - Pony Object Relational Mapper
Surprise - A Python scikit for building and analyzing recommender systems
Peewee - a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
Keras - Deep Learning for humans
Orator - The Orator ORM provides a simple yet beautiful ActiveRecord implementation.
tensorflow - An Open Source Machine Learning Framework for Everyone
prisma-client-py - Prisma Client Python is an auto-generated and fully type-safe database client designed for ease of use
gensim - Topic Modelling for Humans
pyDAL - A pure Python Database Abstraction Layer
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.