Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, as an alternative to map-reduce and join-groupby.
Most of the self-service or no-code BI, ETL, and data wrangling tools I am aware of (like airtable, fieldbook, rowshare, Power BI etc.) were thought of as a replacement for Excel: working with tables should be as easy as working with spreadsheets. This problem can be solved when defining columns within one table: with ``ColumnA=ColumnB+ColumnC, ColumnD=ColumnA*ColumnE`` we get a *graph of column computations* similar to the graph of cell dependencies in spreadsheets.
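The single-table case can be sketched in plain Python. This is only an illustration of the idea, not how any of the tools mentioned implement it: the column-wise `table` dictionary, the `definitions` mapping and the `evaluate` helper are all made up for the example. Derived columns are evaluated in dependency order, much like a spreadsheet recalculates cells.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table stored column-wise: column name -> list of values.
table = {
    "ColumnB": [1, 2, 3],
    "ColumnC": [10, 20, 30],
    "ColumnE": [2, 2, 2],
}

# Derived columns: name -> (input column names, row-level function).
# ColumnD is listed first on purpose: the order of definitions does not matter.
definitions = {
    "ColumnD": (("ColumnA", "ColumnE"), lambda a, e: a * e),
    "ColumnA": (("ColumnB", "ColumnC"), lambda b, c: b + c),
}


def evaluate(table, definitions):
    """Compute derived columns in dependency order."""
    # Graph of column computations: each derived column depends on its inputs.
    graph = {name: set(inputs) for name, (inputs, _) in definitions.items()}
    result = dict(table)
    for name in TopologicalSorter(graph).static_order():
        if name not in definitions:
            continue  # a source column that is already present
        inputs, func = definitions[name]
        columns = [result[i] for i in inputs]
        result[name] = [func(*values) for values in zip(*columns)]
    return result


result = evaluate(table, definitions)
print(result["ColumnA"])  # [11, 22, 33]
print(result["ColumnD"])  # [22, 44, 66]
```

Because ``ColumnD`` depends on ``ColumnA``, the sorter evaluates ``ColumnA`` first even though it is defined second.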
Yet, the main problem is in working with multiple tables: how can we define a column in one table in terms of columns in other tables? For example: ``Table1::ColumnA=FUNCTION(Table2::ColumnB, Table3::ColumnC)``. Different systems have provided different answers to this question, but all of them are highly specific and rather limited.
Why is it difficult to define new columns in terms of columns in other tables? The short answer is that working with columns is not the relational approach. The relational model works with sets (rows of tables) and not with columns.
One generic approach to working with columns in multiple tables is provided by the concept-oriented model of data, which treats mathematical functions as first-class elements of the model. Previously it was implemented in a data wrangling tool called Data Commander. But then I decided to implement this model in the *Prosto* data processing toolkit, which is an alternative to map-reduce and SQL.
It defines data transformations as operations with columns in multiple tables. Since we use mathematical functions, no joins and no groupby operations are needed, which makes data transformations significantly simpler and more natural.
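A rough sketch of what this means, in plain Python rather than Prosto's actual API: the ``products`` and ``sales`` tables and all column names below are hypothetical. A link column is a function from rows of one table to rows of another, and the other columns are then defined through it, without building a joined table and without a groupby step.

```python
# Hypothetical tables stored column-wise: column name -> list of values.
products = {"name": ["beer", "chips"], "price": [5.0, 3.0]}
sales = {"product_name": ["beer", "chips", "beer"], "quantity": [2, 1, 4]}

# Link column: maps each Sales row to the index of its Products row.
index_by_name = {name: i for i, name in enumerate(products["name"])}
sales["product"] = [index_by_name[n] for n in sales["product_name"]]

# Calculated column in Sales that reads Products::price through the link.
sales["amount"] = [
    quantity * products["price"][p]
    for quantity, p in zip(sales["quantity"], sales["product"])
]

# Aggregate column in Products: each Sales row updates the Products row
# that its link column points to.
products["total"] = [0.0] * len(products["name"])
for p, amount in zip(sales["product"], sales["amount"]):
    products["total"][p] += amount

print(sales["amount"])    # [10.0, 3.0, 20.0]
print(products["total"])  # [30.0, 3.0]
```

Every step adds a column to an existing table, so the tables themselves never change shape. This corresponds to the ``Table1::ColumnA=FUNCTION(Table2::ColumnB, ...)`` pattern.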
Moreover, it now provides *Column-SQL*, which makes it even easier to define new columns in terms of other columns.
Functions matter – an alternative to SQL and map-reduce for data processing
1 project | reddit.com/r/datascience | 19 May 2021
NoSQL Data Modeling Techniques
1 project | news.ycombinator.com | 10 Apr 2021
[P] Open data transformations in Python, no SQL required
3 projects | reddit.com/r/MachineLearning | 1 Mar 2022
Show HN: Hamilton, a Microframework for Creating Dataframes
6 projects | news.ycombinator.com | 8 Nov 2021
Performing Data Tests on External Data/Complex Data Quality Checks
1 project | reddit.com/r/dataengineering | 7 May 2022