Hail VS tenzir

Compare Hail vs tenzir and see what their differences are.

                 Hail           tenzir
Mentions         5              15
Stars            935            612
Growth           0.6%           0.8%
Activity         9.8            10.0
Latest commit    2 days ago     about 3 hours ago
Language         Python         C++
License          MIT License    BSD 3-clause "New" or "Revised" License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Hail

Posts with mentions or reviews of Hail. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-09.
  • We're wasting money by only supporting gzip for raw DNA files
    6 projects | news.ycombinator.com | 9 Jan 2023
  • Software engineers: consider working on genomics
    6 projects | news.ycombinator.com | 19 Nov 2022
    I don't have any funding to hire right now, but I'm always happy to chat about the industry and my experience building Hail (https://hail.is, https://github.com/hail-is/hail), a tool widely used by folks with large collections of human sequences.

    The other posters are not wrong about compensation. Total compensation is off by a factor of two to three.

    However, it is absolutely possible to work with a group of top-notch engineers on serious distributed systems & compilers in service of an excellent scientific-user experience. I know because I do. We are lucky to have a PI who respects and hires a diversity of expertise within his lab.

    I enjoy being deeply embedded with our users. I do not have to guess what they need or want because I help them do it every day.

    I also enjoy enmeshing engineering with statistics, mathematics, and biology. Work is more interesting when so many disciplines conspire towards the end of improved human health.

  • AWS doesn't make sense for scientific computing
    1 project | news.ycombinator.com | 7 Oct 2022
    I think this post is identifying scientific computing with simulation studies and legacy workflows, to a fault. Scientific computing includes those things, but it also includes interactive analysis of very large datasets as well as workflows designed around cloud computing.

    Interactive analysis of large datasets (e.g. genome & exome sequencing studies with 100s of 1000s of samples) is well suited to low-latency, serverless, and horizontally scalable systems (like Dremel/BigQuery, or Hail [1], which we build and which is inspired by Dremel, among other systems). The load profile is unpredictable because after a scientist runs an analysis, they need an unpredictable amount of time to think about their next step.

    As for productionized workflows, if we redesign the tools used within these workflows to directly read and write data to cloud storage as well as to tolerate VM-preemption, then we can exploit the ~1/5 cost of preemptible/spot instances.
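The preemption-tolerance idea above can be sketched as a checkpoint/resume loop. This is a minimal illustration under stated assumptions, not how Hail implements it: the `store` dict is a stand-in for cloud object storage, and `run_with_checkpoints` and `Preempted` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class Preempted(Exception):
    """Simulates the cloud provider reclaiming a preemptible/spot VM."""


def run_with_checkpoints(
    items: List[int],
    process: Callable[[int], int],
    store: Dict[str, object],
    preempt_after: Optional[int] = None,
) -> List[int]:
    """Process `items`, persisting progress to `store` after every item.

    `store` is a hypothetical stand-in for cloud storage; `preempt_after`
    lets the sketch simulate a preemption after that many items.
    """
    start = int(store.get("next_index", 0))  # resume point from the last checkpoint
    results = list(store.get("results", []))
    for i in range(start, len(items)):
        if preempt_after is not None and i - start >= preempt_after:
            raise Preempted(f"VM reclaimed before item {i}")
        results.append(process(items[i]))
        # Checkpoint: a replacement VM restarts from here, not from zero.
        store["next_index"] = i + 1
        store["results"] = list(results)
    return results


# The first attempt is preempted partway through; a second attempt resumes.
store: Dict[str, object] = {}
items = list(range(10))
try:
    run_with_checkpoints(items, lambda x: x * x, store, preempt_after=4)
except Preempted:
    pass
results = run_with_checkpoints(items, lambda x: x * x, store)
```

Items 0 to 3 are processed only once; the second run starts at item 4. Checkpointing often enough that losing a VM costs little is what makes the cheaper preemptible capacity usable.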

    One last point: for the subset of scientific computing I highlighted above, speed is key. I want the scientist to stay in a flow state, receiving feedback from their experiments as fast as possible, ideally within 300 ms. The only way to achieve that on huge datasets is through rapid and substantial scale-out followed by equally rapid and substantial scale-in (to control cost).
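The cost argument behind "scale out, then scale in" can be made concrete with a small calculation. The model is idealized and the numbers are illustrative assumptions, not measurements: work splits perfectly across workers, there is no startup or coordination overhead, and each worker is released as soon as its share finishes.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def scale_out_plan(total_work_ms: int, target_ms: int) -> Tuple[int, int, int]:
    """Return (workers, wall_clock_ms, billed_worker_ms) under the idealized model."""
    workers = ceil_div(total_work_ms, target_ms)
    wall_clock_ms = ceil_div(total_work_ms, workers)
    billed_worker_ms = workers * wall_clock_ms
    return workers, wall_clock_ms, billed_worker_ms


# Hypothetical query: 5 minutes of single-core work, 300 ms latency target.
fast = scale_out_plan(300_000, 300)        # (1000, 300, 300000)
serial = scale_out_plan(300_000, 300_000)  # (1, 300000, 300000)
```

Under these assumptions the billed worker-time is the same in both cases, so the 1000x lower latency costs nothing extra, but only if the workers are released immediately; keeping them idle while the scientist thinks would multiply the bill.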

    [1] https://hail.is

  • Ask HN: Who is hiring? (July 2021)
    33 projects | news.ycombinator.com | 1 Jul 2021
    Broad Institute of MIT and Harvard | Cambridge, MA | Associate Software Engineer | Onsite

    We are seeking an associate software engineer interested in contributing to an open-source data visualization library for analyzing the biological impact of human genetic variation. You will contribute to projects like gnomAD (https://gnomad.broadinstitute.org), the world's largest catalogue of human genetic variation used by hundreds of thousands of researchers, and help us scale towards millions of genomes in the coming years. We are also developing next-generation tools for enabling genetic analyses of large biobanks across richly phenotyped individuals (https://genebass.org). In this role you will gain experience developing data-intensive web applications with TypeScript, React, Python, Terraform, and Google Cloud Platform, and will make use of the scalable data analysis library Hail (https://hail.is). Key to our success is growing a strong team with a diverse membership who foster a culture of continual learning and who support the growth and success of one another. Towards this end, we are committed to seeking applications from women and from underrepresented groups. We know that many excellent candidates choose not to apply despite their capabilities; please allow us to enthusiastically counter this tendency.

    Please provide a CV and links to previous work or projects, ideally with contributions visible on GitHub.

    email: [email protected]

tenzir

Posts with mentions or reviews of tenzir. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-17.
  • Vector: A high-performance observability data pipeline
    5 projects | news.ycombinator.com | 17 Mar 2024
    We're building something similar at Tenzir, but more for operational security workloads. https://docs.tenzir.com

    Differences to Vector:

    - An agent has optional indexed storage, so you can store your data there and pick it up later. The storage is based on Apache Feather, Parquet's little brother.

    - Pipeline operators work with either data frames (Arrow record batches) or chunks of bytes.

    - Structured pipelines are multi-schema, i.e., a single pipeline can process streams of record batches with different schemas.
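To make the multi-schema point concrete, here is a toy sketch in plain Python. `Batch` is a hypothetical stand-in for an Arrow record batch (a schema name plus named columns), not Tenzir's or pyarrow's API, and the schema names are illustrative. Each operator handles whatever schema each batch carries, so one pipeline can process a mixed stream.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Batch:
    """Toy stand-in for an Arrow record batch."""
    schema: str
    columns: Dict[str, List[object]]

    @property
    def num_rows(self) -> int:
        # All columns have equal length; a batch without columns has zero rows.
        return len(next(iter(self.columns.values()), []))


def drop_field(batches: Iterable[Batch], field: str) -> Iterator[Batch]:
    """Remove `field` from every batch whose schema has it; others pass through."""
    for b in batches:
        yield Batch(b.schema, {k: v for k, v in b.columns.items() if k != field})


def rows_per_schema(batches: Iterable[Batch]) -> Dict[str, int]:
    """Aggregate across a mixed stream, keyed by schema."""
    counts: Counter = Counter()
    for b in batches:
        counts[b.schema] += b.num_rows
    return dict(counts)


# One stream, two schemas, interleaved.
stream = [
    Batch("zeek.conn", {"ts": [1, 2], "uid": ["a", "b"], "raw": ["x", "y"]}),
    Batch("suricata.alert", {"ts": [3], "signature": ["ET SCAN"]}),
    Batch("zeek.conn", {"ts": [4], "uid": ["c"], "raw": ["z"]}),
]
cleaned = list(drop_field(stream, "raw"))
print(rows_per_schema(cleaned))  # {'zeek.conn': 3, 'suricata.alert': 1}
```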

  • Ask HN: Who is hiring? (March 2024)
    12 projects | news.ycombinator.com | 1 Mar 2024
    Tenzir | Remote (EU) or Hamburg, Germany | open-core | Full-time | https://tenzir.com

    Tenzir is hiring for several key engineering roles to meet the needs of its expanding team. Our product: security data pipelines. From the data side, think of it as an Arrow-native, multi-schema ETL tool that offers optional storage in Parquet/Feather. From the security perspective, think of it as a solution for collecting, parsing, transforming, aggregating, and routing data. We typically sit between the data sources (endpoint, network, cloud) and sinks (SIEM, data lake).
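The "collect, parse, transform, route" flow can be pictured as a chain of stages that starts from raw bytes and ends in per-sink buckets. This is a hypothetical sketch, not Tenzir's operator API: `parse_json_lines`, `route`, and the sink names `siem` and `default` are illustrative.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]


def parse_json_lines(chunks: Iterable[bytes]) -> Iterator[Event]:
    """Turn raw byte chunks (one JSON object per line) into structured events."""
    # Assumes each chunk ends on a line boundary; a real parser would buffer partial lines.
    for chunk in chunks:
        for line in chunk.splitlines():
            if line.strip():
                yield json.loads(line)


def route(events: Iterable[Event],
          rules: Dict[str, Callable[[Event], bool]]) -> Dict[str, List[Event]]:
    """Send each event to every sink whose predicate matches, else to 'default'."""
    sinks: Dict[str, List[Event]] = {name: [] for name in rules}
    sinks.setdefault("default", [])
    for event in events:
        matched = [name for name, keep in rules.items() if keep(event)]
        for name in matched or ["default"]:
            sinks[name].append(event)
    return sinks


raw = [
    b'{"source": "endpoint", "severity": 7}\n{"source": "network", "severity": 2}\n',
    b'{"source": "cloud", "severity": 9}\n',
]
out = route(parse_json_lines(raw), {"siem": lambda e: e["severity"] >= 5})
```

Here the high-severity endpoint and cloud events go to `siem`, and the network event falls through to `default`.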

    Our open-source execution engine is written in C++20 (https://github.com/tenzir/tenzir); our platform uses SvelteKit and TypeScript. Experience with data-first frontend apps is a great plus. Open positions at https://tenzir.jobs.personio.de:

        - Fullstack Engineer
  • Pql, a pipelined query language that compiles to SQL (written in Go)
    6 projects | news.ycombinator.com | 28 Feb 2024
    We're in the middle of getting TQL v2 [] out of the door with support for expressions and more advanced control flow, e.g., match-case statements. There's a blog post [#] about the core design of the engine as well.

    While it's a general-purpose ETL tool, we're primarily targeting the operational security use case, where people today use Splunk, Sentinel/ADX, Elastic, etc. So some operators are very security-ish, like Sigma, YARA, or Velociraptor.

    [] https://github.com/tenzir/tenzir/blob/64ef997d736e9416e859bf...

    [#] https://docs.tenzir.com/blog/five-design-principles-for-buil...

  • Cisco Acquires Splunk
    5 projects | news.ycombinator.com | 21 Sep 2023
    Hey, founder of Tenzir [1] here: we are building an open-core, pipeline-first security data engine that can massively reduce your Splunk costs. Even though we go to market "mid stream", we have a few users that use us as a lightweight SIEM (or more accurately, just plain log management).

    We are still in early access; feel free to browse through our docs or swing by our Discord.

    [1] https://tenzir.com | https://docs.tenzir.com

  • VAST 3.1 open-source security data pipelines released
    1 project | /r/cybersecurity | 16 May 2023
    Download VAST v3.1 here: https://github.com/tenzir/vast/releases/tag/v3.1.0
  • C++ Jobs - Q2 2022
    4 projects | /r/cpp | 3 Apr 2022
    Tenzir is a funded seed-stage startup that builds a next-generation data plane for plug-and-play security operations. Our mission is to empower defenders with an open data engineering platform to perform data-driven investigations through a combination of best-of-breed solutions. Our stack consists of the high-performance C++20 telemetry engine VAST, a Rust API, and a ReasonML-based frontend.
  • Parallel Grouped Aggregation in DuckDB
    2 projects | news.ycombinator.com | 7 Mar 2022
    I had a chat with Hannes, the DuckDB co-founder, a few weeks ago. They are building awesome stuff to become the "SQLite of OLAP". The team comes with a strong academic background and is tuned into the data engineering world.

    At Tenzir, we looked at DuckDB as an embeddable backend engine to do the heavy lifting of query execution for our engine [1]. Our idea is to hand over a set of Parquet files along with a query: initially SQL, but perhaps soon Substrait [2] if it picks up.

    We are also experimenting with a cloud deployment [3] where a different I/O path may warrant a different backend engine. Right now, we're working on a serverless approach leveraging DataFusion (and, depending on maturity, Ballista at some point).

    My hunch is that we will see more pluggability in this space moving forward. It's not only meaningful from an open-core business model perspective, but also pays dividends in UX. The company that's solving a domain problem (for us: security operations center infrastructure) can leverage a high-bandwidth drop-in engine and only needs to wire it up properly. This requires far fewer data engineers than building a poor man's version of the same in-house.
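The pluggability argument can be sketched as a narrow interface that domain code programs against, with engines swapped behind it. The `Backend` protocol and both engines below are hypothetical toys (a full scan and a lazily built hash index), not DuckDB's or DataFusion's API; a real drop-in engine would accept a query such as SQL, as the post describes.

```python
from collections import defaultdict
from typing import Dict, List, Protocol

Row = Dict[str, object]


class Backend(Protocol):
    def load(self, rows: List[Row]) -> None: ...
    def filter_eq(self, field: str, value: object) -> List[Row]: ...


class ScanBackend:
    """Answers every query with a full scan."""

    def load(self, rows: List[Row]) -> None:
        self.rows = rows

    def filter_eq(self, field: str, value: object) -> List[Row]:
        return [r for r in self.rows if r.get(field) == value]


class HashIndexBackend:
    """Builds a hash index per field on first use, then answers lookups from it."""

    def load(self, rows: List[Row]) -> None:
        self.rows = rows
        self.indexes: Dict[str, Dict[object, List[Row]]] = {}

    def filter_eq(self, field: str, value: object) -> List[Row]:
        if field not in self.indexes:
            index: Dict[object, List[Row]] = defaultdict(list)
            for r in self.rows:
                index[r.get(field)].append(r)
            self.indexes[field] = index
        return list(self.indexes[field].get(value, []))


def events_for_host(engine: Backend, rows: List[Row], host: str) -> List[Row]:
    """Domain code: unchanged no matter which engine is plugged in."""
    engine.load(rows)
    return engine.filter_eq("host", host)


rows: List[Row] = [
    {"host": "web-1", "event": "login"},
    {"host": "db-1", "event": "query"},
    {"host": "web-1", "event": "logout"},
]
```

Swapping `ScanBackend()` for `HashIndexBackend()` changes only how the answer is computed, not the domain code that asks for it.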

    We also have the R use case, e.g., to write reports in R Markdown that crunch some customer security telemetry, highlighting outliers or other noteworthy events. We're not there yet, but with the right query backend, I would expect to get this almost for free. We're close to being ready to use Arrow Flight for interop, but it's not zero-copy. DuckDB has demonstrated the zero-copy approach recently [4], going through the C API. (The story is also relevant when doing s/R/Python/, FWIW.)
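The copy versus zero-copy distinction can be illustrated generically with Python's built-in `memoryview`. This is only an analogy for the concept; it does not use Arrow Flight or DuckDB's C API. A view shares the producer's memory, so the consumer reads the data without duplicating it, while a copy is an independent buffer.

```python
producer = bytearray(b"telemetry")

# Zero-copy: the view references the producer's buffer; no bytes are duplicated.
view = memoryview(producer)[:4]

# Copy: slicing a bytearray allocates a new, independent buffer.
copied = bytes(producer[:4])

# The producer updates its data in place. The length stays the same, so no
# reallocation happens; resizing while a view is exported raises BufferError.
producer[:4] = b"TELE"

print(bytes(view))  # b'TELE' (the view sees the update)
print(copied)       # b'tele' (the copy does not)
```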

    [1] https://github.com/tenzir/vast

  • C++ Jobs - Q4 2021
    4 projects | /r/cpp | 2 Oct 2021
    To this end, we build the high-performance telemetry engine VAST, which, at its core, ingests hundreds of thousands of events per second from high-volume data sources (such as network telemetry like NetFlow, Zeek, and Suricata, and endpoint telemetry from various agents). To the user, VAST offers low-latency access through various APIs, in particular Apache Arrow for high-bandwidth data sharing with downstream tooling. A flexible plugin API enables additional security-specific use cases on top, such as real-time matching of threat intelligence or mining of asset data for passive inventorization.
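Real-time matching of threat intelligence can be pictured as a streaming lookup of event fields against an indicator set. This is a hypothetical sketch, not VAST's plugin API: `match_indicators`, the field names, and the indicators (taken from documentation address ranges plus a placeholder domain) are illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Set

Event = Dict[str, str]


def match_indicators(events: Iterable[Event], indicators: Set[str],
                     fields: List[str]) -> Iterator[Event]:
    """Yield one sighting per event whose listed fields contain a known indicator."""
    for event in events:
        for field in fields:
            value = event.get(field)
            if value is not None and value in indicators:  # O(1) set lookup
                yield {**event, "matched_field": field, "indicator": value}
                break  # report only the first match, to keep the sketch small


intel = {"203.0.113.7", "evil.example"}
stream = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
    {"src_ip": "10.0.0.6", "dst_ip": "198.51.100.2"},
    {"src_ip": "10.0.0.7", "query": "evil.example"},
]
sightings = list(match_indicators(stream, intel, ["src_ip", "dst_ip", "query"]))
```

Because the matcher is a generator, sightings are emitted as events arrive rather than after the whole stream has been read.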
  • Ask HN: Who is hiring? (October 2021)
    27 projects | news.ycombinator.com | 1 Oct 2021
    Tenzir | C++, ReasonML, Rust, Python | Remote | Open-source | Full-time | https://tenzir.com

    Tenzir is a funded seed-stage startup that builds a next-generation data plane for plug-and-play security operations. Our mission is to empower defenders with an open platform to perform automated data-driven investigations through a combination of best-of-breed solutions. Our stack consists of the high-performance C++ database VAST (https://github.com/tenzir/vast), a Rust API, and a ReasonML-based frontend.

    Our open engineering positions include:

    - Database: https://tenzir.com/career/backend-engineer/

    - DevOps: https://tenzir.com/career/devops-platform-engineer/

    - Frontend: https://tenzir.com/career/frontend-engineer/

    We are based out of Hamburg, Germany, but cultivate an agile remote-first mindset. If you live in the region and are looking for a System Administrator role, we’d love to hear from you!

    For any questions, feel free to reach out to us at [email protected].

  • Hiring: ReasonML Frontend Engineer - Remote EU
    1 project | /r/reasonml | 7 Sep 2021
    We at Tenzir (https://tenzir.com/) are an early-stage startup that builds a next-generation data plane for modern Security Operations Centers. We are looking for a frontend engineer to help us enhance the web interface to VAST (our open-core telemetry engine, https://github.com/tenzir/vast). In our stack, we use C++ for VAST, Rust and ReasonML (compiled to JS) in our API layer, and ReasonML on the frontend. Our website is written in ReasonML with the help of Gatsby. Our team cultivates a mindset of strong typing and functional programming, practiced end-to-end across the entire stack. We're a remote-first company, scattered across Europe, and ideally looking for someone within ±4 hours of our time zone.

What are some alternatives?

When comparing Hail and tenzir you can also consider the following projects:

GridScale - Scala library for accessing various file, batch systems, job schedulers and grid middlewares.

webviz - web-based visualization libraries

Vegas - The missing MatPlotLib for Scala + Spark

exo - A process manager & log viewer for dev

metorikku - A simplified, lightweight ETL Framework based on Apache Spark

dfir-orc - Forensics artefact collection tool for systems running Microsoft Windows

Scoozie - Scala DSL on top of Oozie XML

FFMpeg-Online - This repository catalogs a list of FFMpeg commands for different situations. By https://hotpot.ai.

Jupyter Scala - A Scala kernel for Jupyter

label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format

Summingbird - Streaming MapReduce with Scalding and Storm

Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins