|Latest commit|6 days ago|5 days ago|
|License|Apache License 2.0|Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
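The page does not publish its exact formulas, so the following is only a sketch of how metrics like these could be computed. It assumes that growth is a plain percentage change in stars over one month, that "recent commits have higher weight" means an exponential decay with a hypothetical 30-day half-life, and that the 0 to 10 activity value is a percentile rank among tracked projects. All function names are illustrative.

```python
import math
from datetime import date
from typing import Iterable, Sequence


def mom_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month star growth in percent (assumed definition)."""
    if stars_month_ago == 0:
        # No baseline: treat any new stars as unbounded growth.
        return math.inf if stars_now > 0 else 0.0
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_dates: Iterable[date], today: date,
                             half_life_days: float = 30.0) -> float:
    """Each commit counts 1.0 today and half as much every `half_life_days` (hypothetical decay)."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(project_score: float, all_scores: Sequence[float]) -> float:
    """0-10 score from the share of tracked projects with a lower raw score (assumed mapping)."""
    below = sum(1 for s in all_scores if s < project_score)
    return round(10.0 * below / len(all_scores), 1)


# Usage: the top project out of ten tracked projects outranks nine others.
print(activity_score(9, list(range(10))))  # prints 9.0
```

Under this percentile reading, a score of 9.0 lands exactly on the "top 10%" interpretation given above.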
Rows.com: Spreadsheets on Steroids
5 projects | news.ycombinator.com | 10 Nov 2021
Standalone Python virtual server example https://github.com/finos/perspective/tree/master/examples/to...
JupyterLab demo on Binder https://mybinder.org/v2/gh/finos/perspective/master?urlpath=...
DuckDB-WASM: Efficient Analytical SQL in the Browser
2 projects | news.ycombinator.com | 29 Oct 2021
Show HN: Vizzu – Open-source charting library focused on animating charts
5 projects | news.ycombinator.com | 17 Oct 2021
The best example of WASM being used to render to canvas (it's also visualizations) I've seen is "Perspective":
"Perspective is an interactive analytics and data visualization component, which is especially well-suited for large and/or streaming datasets. Originally developed at J.P. Morgan and open-sourced through the Fintech Open Source Foundation (FINOS), Perspective makes it simple to build user-configurable analytics entirely in the browser, or in concert with Python and/or Jupyterlab. Use it to create reports, dashboards, notebooks and applications, with static data or streaming updates via Apache Arrow."
Open Source Is Finally Coming to Financial Services
3 projects | news.ycombinator.com | 15 Oct 2021
Man, the a16z marketing machine is working hard, unfortunately at the cost of quality.
For those interested in FS and open source today, especially through a capital markets lens, check out:
Lots of great projects; one I used recently, and a favourite, was this:
Perspective 1.0.0, an open source BI tool built on WebAssembly
2 projects | reddit.com/r/programming | 15 Oct 2021
As far as customizing the Perspective datagrid, the story on this is evolving :). With the 1.0 release, we've released an NFT demo with a more current version of the plugin API, as well as new plugin API docs. Replacing innerHTML is only costly if you trigger a relayout before the replacement, which you'd want to avoid; check the pudgy-penguins demo source for examples which replace these with without the intermediate DOM tree being rendered (though this is browser-dependent). If you can't, e.g. the replacement is async or whatever, the underlying regular-table component has an API that allows you to return the DOM elements themselves per cell, but you'd need to write a simple plugin to integrate this, as Perspective's version provides its own dataListener.
2 projects | reddit.com/r/programming | 15 Oct 2021
2 projects | reddit.com/r/Python | 13 Oct 2021
By the way, the link to the blog on the project website results in a 404.
2 projects | reddit.com/r/Python | 13 Oct 2021
1 project | news.ycombinator.com | 13 Oct 2021
Perspective, a DataVis tool powered by WASM
Stream Processing Database
4 projects | reddit.com/r/Database | 28 Nov 2021
There's ksqlDB (open source, built with Java) and Materialize (there's a standalone edition); both need Kafka/Redpanda. There's also ClickHouse (open source, with materialized views backed by a specific engine, but you need to buffer inserts using a proxy like KittenHouse or a buffering library like ch-timed-buffer). Are there any other alternatives to those three that similarly don't do a full scan to compute aggregations?
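The two mechanisms in this question, buffering inserts and aggregating without a full scan, can be sketched together in plain Python. This is a toy model under stated assumptions: `InsertBuffer` and `RunningAggregate` are hypothetical names, not the APIs of KittenHouse, ch-timed-buffer, or any ClickHouse table engine. Rows are batched until a size or age limit is reached, and each flushed batch updates per-key running totals, so reading an average never rescans raw rows.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, float]  # hypothetical event shape: (key, value)


class RunningAggregate:
    """Per-key count and sum, updated per batch so reads never rescan raw rows."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = {}
        self.total: Dict[str, float] = {}

    def apply_batch(self, batch: List[Row]) -> None:
        for key, value in batch:
            self.count[key] = self.count.get(key, 0) + 1
            self.total[key] = self.total.get(key, 0.0) + value

    def avg(self, key: str) -> float:
        return self.total[key] / self.count[key]


class InsertBuffer:
    """Collects rows and hands them to `flush_fn` as one batch (hypothetical helper)."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], max_rows: int = 10_000,
                 max_age_s: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock
        self.rows: List[Row] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: Row) -> None:
        if not self.rows:
            self.first_row_at = self.clock()
        self.rows.append(row)
        # Age is only checked here; a real buffer would also flush from a background timer.
        too_old = self.clock() - self.first_row_at >= self.max_age_s
        if len(self.rows) >= self.max_rows or too_old:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            batch, self.rows = self.rows, []
            self.first_row_at = None
            self.flush_fn(batch)


agg = RunningAggregate()
buf = InsertBuffer(agg.apply_batch, max_rows=2)
buf.add(("orders", 10.0))
buf.add(("orders", 20.0))  # second row reaches max_rows and triggers a flush
print(agg.avg("orders"))  # prints 15.0
```

Batching matters because a column store typically prefers fewer, larger inserts; the running aggregate is the same idea a materialized view applies on each inserted block.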
Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
15 projects | dev.to | 25 Nov 2021
Moreover, using open-source warehouse tools can unlock additional insights from your data in real time and at lower cost. PostgreSQL (website, repo) is a popular example of an efficient and low-cost data warehousing solution. Another example is ClickHouse (website, GitHub), an open-source, analytics-focused DBMS that allows generating analytical reports from data in real time using SQL.
Welcome to the free open-source OLAP server project
2 projects | dev.to | 15 Nov 2021
The most efficient way is to use column store databases as data sources for eMondrian. For example, ClickHouse could run as a powerful and fast query engine while eMondrian works as a proxy representing data as cubes and executing MDX queries.
How to speed up ClickHouse queries using materialized columns
1 project | dev.to | 11 Nov 2021
As of writing, there's a feature request on GitHub for adding specific commands for materializing specific columns on ClickHouse data parts.
1 project | news.ycombinator.com | 26 Oct 2021
Nice article. Materialized columns in ClickHouse are a bit like indexes in the sense that they give a faster path to the answer by reading less data.
ClickHouse devs are adding a feature called semi-structured data that will store JSON documents in columnar form and also add convenient query syntax. At that point the trade-off between stored JSON blobs and materialized columns will become a lot less stark than it is today.
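A toy illustration of why a materialized column reads less data: the value is extracted once at insert time into its own narrow column, so a filter on it never has to parse the JSON blobs. In ClickHouse this corresponds roughly to a column declared with a MATERIALIZED expression over the JSON string; the `EventsTable` class below is a hypothetical in-memory model, not ClickHouse's storage format.

```python
import json
from typing import Any, List


class EventsTable:
    """Toy column store: a JSON blob column plus one materialized column (hypothetical model)."""

    def __init__(self, materialize_key: str) -> None:
        self.materialize_key = materialize_key
        self.properties: List[str] = []    # raw JSON blobs, one per row
        self.materialized: List[Any] = []  # value extracted once, at insert time

    def insert(self, props: dict) -> None:
        self.properties.append(json.dumps(props))
        self.materialized.append(props.get(self.materialize_key))

    def count_where_blob(self, value: Any) -> int:
        # Reads and parses every blob; cost grows with the size of each document.
        return sum(1 for blob in self.properties
                   if json.loads(blob).get(self.materialize_key) == value)

    def count_where_column(self, value: Any) -> int:
        # Reads only the narrow pre-extracted column, with no JSON parsing.
        return sum(1 for v in self.materialized if v == value)


events = EventsTable("browser")
for props in ({"browser": "Firefox", "path": "/a"}, {"browser": "Chrome"}, {"path": "/b"}):
    events.insert(props)
print(events.count_where_blob("Firefox"), events.count_where_column("Firefox"))  # prints 1 1
```

Both paths return the same answer; the difference is that the blob path pays a `json.loads` per row, which is the cost the materialized column avoids, at the price of extra storage and insert-time work.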
To what extent do you use SQL in your job?
1 project | reddit.com/r/datascience | 30 Oct 2021
I'm not a business analyst but a software developer. I've worked quite a bit with event data. Think "Order Completed", "User Signed Up" and "Subscription Cancelled". When those events get channelled into a column-store database like Redshift or ClickHouse, you can answer a lot of advanced questions using SQL. In particular, ClickHouse has lots of useful functions for analysing datasets. See this analysis of GitHub events as an example.
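To make the kind of event questions above concrete, here is a sketch using Python's built-in `sqlite3` as a stand-in for a column store. The table layout and sample events are invented for illustration, and ClickHouse-specific analysis functions are not shown; the queries are plain SQL that counts distinct users per event and runs a simple two-step funnel with a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT, ts INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "User Signed Up", 1), (1, "Order Completed", 2),
    (2, "User Signed Up", 3), (2, "Subscription Cancelled", 5),
    (3, "User Signed Up", 4), (3, "Order Completed", 6), (3, "Order Completed", 7),
])

# Distinct users per event type.
per_event = conn.execute("""
    SELECT event, COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY event
    ORDER BY event
""").fetchall()

# Funnel: users who signed up and completed an order afterwards.
converted = conn.execute("""
    SELECT COUNT(DISTINCT s.user_id)
    FROM events AS s
    JOIN events AS o ON o.user_id = s.user_id AND o.ts > s.ts
    WHERE s.event = 'User Signed Up' AND o.event = 'Order Completed'
""").fetchone()[0]

print(per_event)
print(converted)  # prints 2
```

On a real column store the same questions scale to billions of events because only the `user_id`, `event` and `ts` columns need to be read.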
What is ClickHouse, how it compares to PostgreSQL and TimescaleDB for time series
11 projects | news.ycombinator.com | 21 Oct 2021
Hi Ajay! Thanks for the thoughtful response and email. I would love a direct meeting and will contact you shortly.
I don't mean to gloss over ClickHouse imperfections. There are lots of them. For my money the biggest is that it still takes way too much expertise in ClickHouse for ordinary developers to use it effectively. Part of that is SQL compatibility, part of it is lack of tools of which simple backup is certainly one. To the extent that ClickHouse is risky, the risk is finding (and retaining) staff who can use it properly. Our business at Altinity exists in large part because of this risk, so I know it's real.
The big aha! experience for me has been that things like lack of ACID transactions or weak backup mechanisms are not necessarily the biggest issues for most ClickHouse users. I came to ClickHouse from a long background in RDBMS and transactional replication. Things that would be game ending in that environment are not in analytic systems.
What's more interesting (mind-expanding even) is that techniques like deduplication of inserted blocks and async multi-master replication turn out to be just as important as ACID & backups to achieve reliable systems. Furthermore, services like Kafka that allow you to have DC-level logs are an essential part of building analytic applications that are reliable and performant at scale. We're learning about these mechanisms in the same way that IBM and others developed ACID transaction ideas in the 1970s, by solving problems in real systems. It's really fun to be part of it.
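Deduplication of inserted blocks means a client can blindly retry an insert after a timeout without double-counting rows. The sketch below models that idea: it hashes each inserted block and silently drops a block whose hash is still in a bounded window of recently seen hashes. `DedupTable`, the SHA-256-over-`repr` hashing, and the window size are illustrative assumptions, not ClickHouse's actual implementation.

```python
import hashlib
from collections import OrderedDict
from typing import List, Tuple

Block = List[Tuple]


class DedupTable:
    """Toy table that ignores a re-insert of any block it has recently seen (hypothetical model)."""

    def __init__(self, window: int = 100) -> None:
        self.rows: List[Tuple] = []
        self.window = window
        self.recent: "OrderedDict[str, None]" = OrderedDict()  # oldest hash first

    @staticmethod
    def block_hash(block: Block) -> str:
        return hashlib.sha256(repr(block).encode()).hexdigest()

    def insert_block(self, block: Block) -> bool:
        h = self.block_hash(block)
        if h in self.recent:
            return False  # duplicate (e.g. a client retry): drop without error
        self.rows.extend(block)
        self.recent[h] = None
        if len(self.recent) > self.window:
            self.recent.popitem(last=False)  # forget the oldest hash
        return True


table = DedupTable(window=2)
block = [(1, "a"), (2, "b")]
table.insert_block(block)
table.insert_block(block)  # retried after a timeout: ignored
print(len(table.rows))  # prints 2
```

The bounded window is the design trade-off: it keeps memory constant, but a retry that arrives after enough newer blocks have been inserted is no longer recognised as a duplicate.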
My comment didn't convey this clearly, for which I heartily apologize. I certainly don't intend to portray ClickHouse as perfect and still less to bash Timescale. I don't know enough about the latter to make any criticism worth reading.
P.S. Non-transactional insert (specifically, non-atomicity across blocks and tables) is an undisputed problem. It's being fixed in https://github.com/ClickHouse/ClickHouse/issues/22086. Altinity and others are working on backups. Backup comes up in my job just about every day.
11 projects | news.ycombinator.com | 21 Oct 2021
One thing I was surprised to see is that ClickHouse and Elasticsearch have the same number of contributors. That's pretty astounding given how much older and more prominent Elasticsearch has been.
Recommend a service for storing large amount of data except for Big Table
1 project | reddit.com/r/googlecloud | 30 Sep 2021
If this is analytical data, you could cheaply run your own ClickHouse system that could handle this load, though wholesale extraction of the data later on would be tricky. Otherwise, and I know you said no, Bigtable is probably the answer.
I Don't Think Elasticsearch Is a Good Logging System
8 projects | news.ycombinator.com | 28 Sep 2021
What are some alternatives?
VictoriaMetrics - VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
PostgreSQL - Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch
Adminer - Database management in a single PHP file
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
TileDB - The Universal Storage Engine
LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
MySQL - MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.