Hail vs gnomad-browser

Compare Hail and gnomad-browser and see how they differ.

               Hail            gnomad-browser
Mentions       5               15
Stars          935             78
Growth         0.6%            -
Activity       9.8             9.7
Latest commit  2 days ago      about 2 hours ago
Language       Python          TypeScript
License        MIT License     MIT License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Hail

Posts with mentions or reviews of Hail. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-09.
  • We're wasting money by only supporting gzip for raw DNA files
    6 projects | news.ycombinator.com | 9 Jan 2023
  • Software engineers: consider working on genomics
    6 projects | news.ycombinator.com | 19 Nov 2022
    I don't have any funding to hire right now, but I'm always happy to chat about the industry and my experience building Hail (https://hail.is, https://github.com/hail-is/hail), a tool widely used by folks with large collections of human sequences.

    The other posters are not wrong about compensation. Total compensation is off by a factor of two to three.

    However, it is absolutely possible to work with a group of top-notch engineers on serious distributed systems & compilers in service of an excellent scientific-user experience. I know because I do. We are lucky to have a PI who respects and hires a diversity of expertise within his lab.

    I enjoy being deeply embedded with our users. I do not have to guess what they need or want because I help them do it every day.

    I also enjoy enmeshing engineering with statistics, mathematics, and biology. Work is more interesting when so many disciplines conspire towards the end of improved human health.

  • AWS doesn't make sense for scientific computing
    1 project | news.ycombinator.com | 7 Oct 2022
    I think this post is identifying scientific computing with simulation studies and legacy workflows, to a fault. Scientific computing includes those things, but it also includes interactive analysis of very large datasets as well as workflows designed around cloud computing.

    Interactive analysis of large datasets (e.g. genome & exome sequencing studies with hundreds of thousands of samples) is well suited to low-latency, serverless, & horizontally scalable systems (like Dremel/BigQuery, or Hail [1], which we build and which is inspired by Dremel, among other systems). The load profile is unpredictable because after a scientist runs an analysis they need an unpredictable amount of time to think about their next step.

    As for productionized workflows, if we redesign the tools used within these workflows to directly read and write data to cloud storage as well as to tolerate VM-preemption, then we can exploit the ~1/5 cost of preemptible/spot instances.

    One last point: for the subset of scientific computing I highlighted above, speed is key. I want the scientist to stay in a flow state, receiving feedback from their experiments as fast as possible, ideally within 300 ms. The only way to achieve that on huge datasets is through rapid and substantial scale-out followed by equally rapid and substantial scale-in (to control cost).

    [1] https://hail.is

  • Ask HN: Who is hiring? (July 2021)
    33 projects | news.ycombinator.com | 1 Jul 2021
    Broad Institute of MIT and Harvard | Cambridge, MA | Associate Software Engineer | Onsite

    We are seeking an associate software engineer interested in contributing to an open-source data visualization library for analyzing the biological impact of human genetic variation. You will contribute to projects like gnomAD (https://gnomad.broadinstitute.org), the world's largest catalogue of human genetic variation, used by hundreds of thousands of researchers, and help us scale towards millions of genomes in the coming years. We are also developing next-generation tools for enabling genetic analyses of large biobanks across richly phenotyped individuals (https://genebass.org). In this role you will gain experience developing data-intensive web applications with TypeScript, React, Python, Terraform, Google Cloud Platform, and will make use of the scalable data analysis library Hail (https://hail.is). Key to our success is growing a strong team with a diverse membership who foster a culture of continual learning, and who support the growth and success of one another. Towards this end, we are committed to seeking applications from women and from underrepresented groups. We know that many excellent candidates choose not to apply despite their capabilities; please allow us to enthusiastically counter this tendency.

    Please provide a CV and links to previous work or projects, ideally with contributions visible on GitHub.

    email: [email protected]

gnomad-browser

Posts with mentions or reviews of gnomad-browser. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-14.
  • All identified polymorphisms in a given gene, how to find?
    2 projects | /r/genetics | 14 Jun 2023
  • AskScience AMA Series: We're human genetics researchers here to discuss connections between people in different geographical regions. Ask us anything!
    1 project | /r/askscience | 2 May 2023
  • Converting 23&Me raw data into a format usable by Admixtools 2
    2 projects | /r/bioinformatics | 25 Dec 2022
    Try one of these: gnomAD (https://gnomad.broadinstitute.org/), 1000 Genomes (http://browser.1000genomes.org), dbSNP (http://www.ncbi.nlm.nih.gov/snp)
  • What is the maximum number of human?
    1 project | /r/estimation | 31 Oct 2022
    Maybe you can ask the opposite question: what are the bounds of a functional human being? https://gnomad.broadinstitute.org/ gnomAD is an aggregation of healthy human genetic sequences, built primarily from the aggregated control groups of many genetic sequencing studies. There are studies of this data analysing the co-occurrence of variants in gnomAD which may help.
  • Insights from personal sequencing data I can explore.
    1 project | /r/bioinformatics | 20 Oct 2022
    Maybe something like this? https://promethease.com/ ClinVar for variants that might be of clinical relevance, and https://gnomad.broadinstitute.org/ for allele frequencies & some info about variants.
  • What are some non-pathogenic alleles of the SNCA gene, or how do I find them?
    1 project | /r/genetics | 2 Apr 2022
    You could look at aggregation databases such as gnomAD (https://gnomad.broadinstitute.org/). Anything with a frequency incompatible with the disease is likely non-pathogenic.
  • Ask HN: Who is hiring? (March 2022)
    30 projects | news.ycombinator.com | 1 Mar 2022
    Broad Institute of MIT and Harvard | Cambridge, MA | Frontend Software Engineer | REMOTE or HYBRID (New England area)

    We are hiring a frontend developer to help lead the next phase of the gnomAD browser (https://gnomad.broadinstitute.org), a web application for displaying the world's largest collection of human genome/exome sequences. We are looking for applicants who are excited about data visualization and designing complex interfaces for scientific research.

    Apply here: http://broad.io/cq7dw8

  • Ask HN: Who is hiring? (February 2022)
    19 projects | news.ycombinator.com | 1 Feb 2022
    Broad Institute of MIT and Harvard | New England | Software Engineer | REMOTE/HYBRID

    Our team is focused on building the tools necessary to visualize and interpret massive data sets of human genetic variation and functional genomic information. We have developed gnomAD (https://gnomad.broadinstitute.org), the world’s largest public reference dataset of human exomes and genomes. gnomAD has become one of the most widely used resources in the field, and is now the default reference database for virtually all clinical interpretation pipelines, as well as a standard analysis resource for a wide variety of genetic and biological studies. We estimate gnomAD has contributed to the clinical diagnosis of over 2 million patients with genetic disorders.

    Your role will be to maintain the gnomAD browser, our open source web application for exploring gnomAD and related datasets, and develop new scientific functionality as we continue to grow to over 1 million human samples. You will work with a team of software engineers, computational biologists and clinical and research users to develop new features and visualizations that incorporate user feedback. Software engineering skills and an interest in user interface design and data visualization are key. Basic familiarity with genomics and DNA sequencing data is preferred, but not required. Most importantly, the ideal candidate will have enthusiasm for playing a critical role in a team-oriented project and learning new domains.

    Minimum Requirements

  • Ask HN: How to be my own genetic disease researcher for my partner?
    4 projects | news.ycombinator.com | 6 Dec 2021
  • How to check if a discovered mutation is novel or was discovered before?
    1 project | /r/genetics | 12 Oct 2021
    If you're talking about humans, start with gnomAD: https://gnomad.broadinstitute.org/

What are some alternatives?

When comparing Hail and gnomad-browser you can also consider the following projects:

GridScale - Scala library for accessing various file, batch systems, job schedulers and grid middlewares.

webviz - web-based visualization libraries

Vegas - The missing MatPlotLib for Scala + Spark

haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

metorikku - A simplified, lightweight ETL Framework based on Apache Spark

metamask-extension - The MetaMask browser extension enables browsing Ethereum blockchain enabled websites

Scoozie - Scala DSL on top of Oozie XML

aioli - Framework for building fast genomics web tools with WebAssembly and WebWorkers

Jupyter Scala - A Scala kernel for Jupyter

Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins

Summingbird - Streaming MapReduce with Scalding and Storm

FrameworkBenchmarks - Source for the TechEmpower Framework Benchmarks project