Folly
timescale-analytics
Metric | Folly | timescale-analytics |
---|---|---|
Mentions | 90 | 8 |
Stars | 27,034 | 335 |
Growth | 0.8% | 4.2% |
Activity | 9.8 | 6.2 |
Last commit | 2 days ago | about 2 months ago |
Language | C++ | Rust |
License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Folly
- Ask HN: How bad is the xz hack?
- Backdoor in upstream xz/liblzma leading to SSH server compromise
- A lock-free ring-buffer with contiguous reservations (2019)
To set a hazard pointer (HP) on Linux, Folly just does a relaxed load of the src pointer, a release store of the HP, a compiler-only barrier, and an acquire load. (This prevents the compiler from reordering the second load before the store, right? But to my understanding it does not prevent a hypothetical CPU reordering of the second load before the store, which seems potentially problematic!)
Then on the GC/reclaim side of things, after protected object pointers are stored, it does a more expensive barrier[0] before acquire-loading the HPs.
I'll admit, I am not confident I understand why this works. I mean, even on x86, loads can be reordered before earlier program-order stores. So it seems like the 2nd check on the protection side could be ineffective. (The non-Linux portable version just uses an atomic_thread_fence SeqCst on both sides, which seems more obviously correct.) And if they don't need the 2nd load on Linux, I'm unclear on why they do it.
[0]: https://github.com/facebook/folly/blob/main/folly/synchroniz...
(This uses either mprotect to force a TLB flush in process-relevant CPUs, or the newer Linux membarrier syscall if available.)
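The portable variant mentioned above, with a sequentially consistent fence on both the reader and the reclaimer side, might be sketched roughly as follows. This is a single-slot illustration under simplifying assumptions, not Folly's implementation; `Node`, `g_src`, `g_hazard`, `protect`, `clear_protection`, and `safe_to_delete` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical single-slot hazard pointer; not Folly's actual API.
struct Node { int value; };

std::atomic<Node*> g_src{nullptr};     // shared pointer that readers want to dereference
std::atomic<Node*> g_hazard{nullptr};  // one hazard-pointer slot (a single reader thread)

// Reader side: publish the pointer, then re-check that it is still current.
Node* protect() {
    Node* p = g_src.load(std::memory_order_relaxed);
    for (;;) {
        g_hazard.store(p, std::memory_order_release);
        // Full fence: orders the store above before the re-load below.
        // The comment describes the Linux path replacing this with a
        // compiler-only barrier, paired with a heavier barrier in the reclaimer.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        Node* q = g_src.load(std::memory_order_acquire);
        if (q == p) return p;  // unchanged, so the hazard slot covers p
        p = q;                 // source changed; retry with the new value
    }
}

void clear_protection() { g_hazard.store(nullptr, std::memory_order_release); }

// Reclaimer side: call only after `retired` has been unlinked from g_src.
bool safe_to_delete(Node* retired) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return g_hazard.load(std::memory_order_acquire) != retired;
}
```

With a full fence on both sides, either the reclaimer sees the published hazard pointer or the reader's re-load sees the updated source pointer, which is why this version is easier to reason about than the asymmetric one.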
- Appending to an std::string character-by-character: how does the capacity grow?
Folly provides functions to resize std::string and std::vector without initialization [0].
[0] https://github.com/facebook/folly/blob/3c8829785e3ce86cb821c...
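To observe the growth pattern the question asks about, a small probe along these lines could record each capacity change while appending one character at a time. It uses only the standard library; `capacity_steps` is a hypothetical helper name, and the exact sequence depends on the standard library implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends n characters one at a time and records every distinct capacity seen.
std::vector<std::size_t> capacity_steps(std::size_t n) {
    std::string s;
    std::vector<std::size_t> steps{s.capacity()};  // initial (small-string) capacity
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != steps.back()) {
            steps.push_back(s.capacity());  // a reallocation happened
        }
    }
    return steps;
}
```

Printing the result of `capacity_steps(1000)` on a given toolchain shows how many reallocations occurred and by what factor the capacity grew each time.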
- Can anyone explain feedback of a HFT firm regarding implementation of SPSC lock-free ring-buffer queue?
My implementation was quite similar to Boost's spsc_queue and Facebook's folly/ProducerConsumerQueue.h.
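For readers unfamiliar with the design being referenced, a bounded single-producer/single-consumer ring buffer can be sketched roughly as below. This is an illustrative minimal version, not the code of Boost's `spsc_queue` or Folly's `ProducerConsumerQueue`; the `SpscQueue` class and its method names are hypothetical, and it assumes `T` is default-constructible and copyable.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical minimal SPSC queue; one slot stays empty to tell "full" from "empty".
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}

    // Call from the producer thread only.
    bool try_push(const T& v) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[tail] = v;
        tail_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Call from the consumer thread only.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = buf_[head];
        head_.store((head + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return v;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};  // next index to read
    std::atomic<std::size_t> tail_{0};  // next index to write
};
```

A production version would typically also pad `head_` and `tail_` onto separate cache lines to avoid false sharing; that is omitted here to keep the sketch short.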
- A Compressed Indexable Bitset
> How is that relevant?
Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance.
EF (Elias-Fano) instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes: the queries are generally very selective (otherwise why would you need an index?), and if you have good intersection algorithms you can skip a large fraction of documents.
I didn't follow the rest of your comment. Select is what EF is good at; every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block: https://github.com/facebook/folly/blob/main/folly/experiment...
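The PDEP trick for select on a 64-bit block can be sketched as follows. This is an illustration rather than Folly's code: `select64` is a hypothetical name, the BMI2 path assumes a GCC- or Clang-style compiler (it relies on `__BMI2__` and `__builtin_ctzll`), and a portable fallback is used otherwise.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Returns the bit position of the k-th set bit (k is 0-based) in `block`.
// Precondition: k < popcount(block).
inline unsigned select64(std::uint64_t block, unsigned k) {
#if defined(__BMI2__)
    // PDEP scatters the low bits of its first operand into the set positions
    // of the mask, so the single bit (1 << k) lands on the k-th set bit of block.
    const std::uint64_t isolated = _pdep_u64(std::uint64_t{1} << k, block);
    return static_cast<unsigned>(__builtin_ctzll(isolated));
#else
    // Portable fallback: clear the lowest set bit k times, then locate the next one.
    for (unsigned i = 0; i < k; ++i) block &= block - 1;
    unsigned pos = 0;
    while ((block & 1) == 0) { block >>= 1; ++pos; }
    return pos;
#endif
}
```

For example, with `block = 0x16` (binary 10110), the set bits sit at positions 1, 2, and 4, so `select64(0x16, 2)` yields 4 on either path.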
- Defer for Shell
C++ with folly's SCOPE_EXIT {} construct:
https://github.com/facebook/folly/blob/main/folly/ScopeGuard...
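The idea behind a scope-exit construct can be illustrated with a small RAII guard. This is a simplified sketch that assumes C++17 class template argument deduction; `ScopeGuard` and `run_with_cleanup` here are hypothetical stand-ins written for this example and do not reproduce Folly's macro or its implementation.

```cpp
#include <string>
#include <utility>

// Hypothetical minimal scope guard: runs the stored callable when it goes out of scope.
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    ~ScopeGuard() { if (active_) f_(); }
    void dismiss() { active_ = false; }  // cancel the cleanup
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    F f_;
    bool active_ = true;
};

// Records the order of events to show that cleanups run in reverse order at scope exit.
std::string run_with_cleanup() {
    std::string log;
    {
        ScopeGuard first([&] { log += "A"; });
        ScopeGuard second([&] { log += "B"; });
        log += "body;";
    }  // destructors run here: second, then first
    return log;  // "body;BA"
}
```

As with a shell `defer`, the cleanup is declared next to the resource acquisition and runs on every exit path from the scope, including early returns and exceptions.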
- Is there any facebook/folly community for discussion and Q&A?
It seems like GitHub issues take a long time to get any response: https://github.com/facebook/folly
- How a Single Line of Code Made a 24-Core Server Slower Than a Laptop
Can't speak for abseil and tbb, but in folly there are a few solutions for the common problem of sharing state between a writer that updates it very infrequently and concurrent readers that read it very frequently (typical use case is configs).
The most performant solutions are RCU (https://github.com/facebook/folly/blob/main/folly/synchroniz...) and hazard pointers (https://github.com/facebook/folly/blob/main/folly/synchroniz...), but they're not quite as easy to use as a shared_ptr [1].
Then there is a shared_ptr-like pointer implemented with thread-local counters (https://github.com/facebook/folly/blob/main/folly/experiment...).
If you absolutely need a std::shared_ptr (which can be the case if you're working with pre-existing interfaces), there is CoreCachedSharedPtr (https://github.com/facebook/folly/blob/main/folly/concurrenc...), which uses an aliasing trick to transparently maintain per-core reference counts and scales linearly. However, it helps only when acquiring the shared_ptr; any subsequent copies would still cause contention if passed around between threads.
[1] Google has a proposal to make a smart pointer based on RCU/hazptr, but I'm not a fan of it because generally RCU/hazptr guards need to be released in the same thread that acquired them, and hiding them in a freely movable object looks like a recipe for disaster to me, especially if paired with coroutines https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p05...
- Ask HN: What are some of the most elegant codebases in your favorite language?
Not sure if it's still the case but about 6 years ago Facebook's folly C++ library was something I'd point to for my junior engineers to get a sense of "good" C++ https://github.com/facebook/folly
timescale-analytics
- Timescale raises $110M Series C
Hi! The team is over 100 at this point, but engineering effort is spread across multiple products.
The core timescaledb repo [0] has 10-15 primary engineers (although we are aggressively hiring for database internal engineers), with a few others working on DB hyperfunctions and our function pipelining [1] in a separate extension [2]. I think generally the set of folks who contribute to low-level database internals in C is just smaller than for other types of projects.
We also have our promscale product [3], which is our observability backend powered by SQL & TimescaleDB.
And then there is Timescale Cloud, which is obviously a large engineering effort (most of which does not happen in public repos).
And we are hiring. Fully remote & global.
https://www.timescale.com/careers
[0] https://github.com/timescale/timescaledb
[1] https://www.timescale.com/blog/function-pipelines-building-f...
[2] https://github.com/timescale/timescaledb-toolkit
[3] https://github.com/timescale/promscale ; https://github.com/timescale/tobs
- Function pipelines: Building functional programming into PostgreSQL
(NB: Post author here)
This is in the TimescaleDB Toolkit extension [1], which is licensed under our community license for now, so it's not available on DO. It is available fully managed on our cloud service. You can also install it and run it for free yourself.
- How percentile approximation works (and why it's more useful than averages)
- How PostgreSQL aggregation works and how it inspired our hyperfunctions’ design
Absolutely! We're actually developing a lot of that: https://github.com/timescale/timescaledb-toolkit/tree/main/d...
A number of the things you're looking for we've done experimentally and we'll be stabilizing over the next few releases. So we'd love some feedback while we're still able to futz with the API without making breaking changes.
But the two you're asking about are, I think, going to be covered by hyperloglog (we just reimplemented the internals with HLL++) and stats_agg family of functions, which have both 1D (which will give you avg, stddev, variance, etc) and 2D (co-variance, slope, intercept, x-intercept etc as well as all the 1D functions).
We would also love issues if you think we're missing other stuff; we're going to be generalizing this and want to make it useful for folks.
(NB: Post author here.)
- Postgres downsampling performance
If you know that you're going to be doing downsampling at the hourly level then a continuous aggregate on the hour is probably a good idea. We're also building some functions to make some of the continuous aggregate stuff for these sorts of cases easier/more accurate in more cases, especially if you need things like exact averages when you don't have the same number of points in an hour and want to re-aggregate on top of the continuous agg. See: https://github.com/timescale/timescale-analytics/pull/141/files
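The point about exact averages when buckets hold different numbers of points can be shown with a small sketch: keep a (sum, count) partial per bucket and combine the partials, rather than averaging the per-bucket averages. This is a generic illustration in C++, not Timescale's implementation or API; `AvgState` and `rollup` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical partial aggregate for one time bucket (for example, one hour).
struct AvgState {
    double sum = 0.0;
    std::size_t count = 0;

    void add(double x) { sum += x; ++count; }
    void combine(const AvgState& other) { sum += other.sum; count += other.count; }
    double average() const { return count ? sum / static_cast<double>(count) : 0.0; }
};

// Rolls finer-grained partials up into one state for a coarser bucket.
AvgState rollup(const std::vector<AvgState>& hourly) {
    AvgState total;
    for (const AvgState& h : hourly) total.combine(h);
    return total;
}
```

If one hour holds the points 10, 20, 30 and the next holds only 40, rolling up the partials gives the exact average 25, whereas averaging the two hourly averages (20 and 40) would give 30.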
- TimescaleDB Raises $40M
Fair point about adaptive chunking. You sound like a long-term user!
There is always a trade-off between getting features to users quickly to experiment and incrementally improve, versus doing it always very conservatively.
When we launched adaptive chunking (introduced in 0.11, deprecated in 1.2), we explicitly marked it as beta and default off, to hopefully reflect that. [1]
The approach we are now taking with Timescale Analytics [2] is to have an explicit distinction between experimental features (which will be part of a distinct "experimental" schema in the database, and must be expressly turned on with appropriate warnings) and stable features. Hopefully this can help find a good balance between stability and velocity, but feedback is welcome!
[1] https://github.com/timescale/timescaledb/releases/tag/0.11.0
[2] https://github.com/timescale/timescale-analytics/tree/main/e...
What are some alternatives?
abseil-cpp - Abseil Common Libraries (C++)
orioledb - OrioleDB – building a modern cloud-native storage engine (... and solving some PostgreSQL wicked problems)  🇺🇦
Boost - Super-project for modularized Boost
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Seastar - High performance server-side application framework
Telegraf - The plugin-driven server agent for collecting & reporting metrics.
parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.
promscale - [DEPRECATED] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.
EASTL - Obsolete repo, please go to: https://github.com/electronicarts/EASTL
pgx - Build Postgres Extensions with Rust! [Moved to: https://github.com/tcdi/pgrx]
OpenFrameworks - openFrameworks is a community-developed cross platform toolkit for creative coding in C++.
t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means