Tips for scalable workflows on AWS

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  1. aws-genomics-workflows

    Genomics Workflows on AWS (discontinued)

    Many tools built with C/C++ depend on glibc shared libraries, and the AWS CLI v2 is one of them. Workflow engines running on AWS commonly bind mount the AWS CLI from the host instance into the container so that jobs can interact with other AWS services, for example to stage data from Amazon S3. Problems arise when a tooling container is based on an image without glibc shared libraries, as is the case with ultra-minimal base images like alpine and busybox. You can still use these ultra-minimal images, but you need to take extra steps to make glibc shared libraries available. For example, the AWS CLI v2 is distributed with the shared libraries it needs, so to make it work in an alpine-based container you can modify the LD_LIBRARY_PATH environment variable in the container environment to point to where those shared libraries are installed.

  2. htslib

    C library for high-throughput sequencing data formats

    In contrast, if tooling can read bytes directly from Amazon S3, processing can start immediately and only the data that is actually needed has to be transferred. Tools based on htslib can do this, so you can run something like:

  3. aws-sdk

    Landing page for the AWS SDKs on GitHub (discontinued)

    One common pattern for integrating a workflow job with AWS is to call additional services through the AWS CLI. This generally works well, but there are a few things to keep in mind. First and foremost, a workflow job needs to know where the AWS CLI is installed and how to use it. You can either install the AWS CLI on the host compute and bind mount it into the container, or include the AWS CLI in the container image. That said, see my notes above on keeping container images small for the associated caveats. Second, while the AWS CLI is great for scripting, direct integration through an AWS SDK is a better fit for more complex operations.
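The LD_LIBRARY_PATH adjustment described in the aws-genomics-workflows excerpt can be sketched in Python to make the mechanics explicit. This is a minimal illustration, not code from that project: the library directory `/usr/local/aws-cli/v2/current/dist` is an assumed install location, and the helpers `with_library_path` and `run_aws` are hypothetical. As the excerpt says, the same effect is normally achieved by setting the variable in the container environment.

```python
import os
import subprocess

# ASSUMPTION: directory holding the shared libraries bundled with AWS CLI v2.
# This path is illustrative; point it at wherever the CLI is installed or
# bind mounted in your container.
AWS_CLI_LIB_DIR = "/usr/local/aws-cli/v2/current/dist"


def with_library_path(env, lib_dir):
    """Return a copy of `env` whose LD_LIBRARY_PATH includes `lib_dir`.

    `lib_dir` is prepended if it is not already listed. The input mapping is
    not modified, existing entries are kept, and empty entries are dropped.
    """
    new_env = dict(env)
    entries = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in entries:
        # Prepend so the bundled libraries are found before any others.
        entries.insert(0, lib_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(entries)
    return new_env


def run_aws(args, lib_dir=AWS_CLI_LIB_DIR):
    """Run the `aws` binary on PATH with the adjusted environment; return stdout."""
    env = with_library_path(os.environ, lib_dir)
    result = subprocess.run(
        ["aws", *args], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Inside such a container, `run_aws(["--version"])` would be a quick check that the CLI can find its libraries.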
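The htslib excerpt relies on reading only the bytes that are needed from an object in Amazon S3 instead of staging the whole file first. One common way to do that over HTTP is a ranged GET, which S3 supports through the standard `Range` header. The sketch below uses only Python's standard library to illustrate the idea; it is not how htslib implements S3 access, the helpers `range_header` and `read_range` are hypothetical, and unsigned requests like this only work for publicly readable objects or presigned URLs (private objects need signed requests).

```python
import urllib.request


def range_header(start, length):
    """Return an HTTP Range header value for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive, so the last byte is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def read_range(url, start, length):
    """Fetch only the requested byte range of an object over HTTP(S)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(start, length)}
    )
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # The server ignored the Range header and would send the whole object.
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

For example, `read_range("https://<bucket>.s3.<region>.amazonaws.com/<key>", 0, 65536)` would fetch only the first 64 KiB of the object; the URL is a placeholder.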
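To illustrate why an SDK can suit more complex operations than the CLI, here is a minimal Python sketch that lists objects under an S3 prefix with boto3, the AWS SDK for Python. It assumes a boto3 S3 client is passed in, so the function itself has no hard dependency on boto3; the `list_keys` helper and its `suffix` filter are hypothetical. The SDK's paginator follows continuation tokens for you and returns structured data, so there is no CLI output to parse.

```python
def list_keys(s3_client, bucket, prefix="", suffix=""):
    """Return the keys of all objects under `prefix` that end with `suffix`.

    `s3_client` is expected to be a boto3 S3 client. The list_objects_v2
    paginator follows continuation tokens, so listings longer than one page
    (up to 1,000 keys each) are handled transparently.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" field at all.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                keys.append(obj["Key"])
    return keys
```

With boto3 installed, a call might look like `list_keys(boto3.client("s3"), "my-bucket", "inputs/", suffix=".bam")`, where the bucket name and prefix are placeholders.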

