nodejs-bigquery
dbt-core
| | nodejs-bigquery | dbt-core |
|---|---|---|
| Mentions | 43 | 86 |
| Stars | 451 | 8,718 |
| Growth (stars, month over month) | 0.7% | 6.1% |
| Activity | 7.9 | 9.7 |
| Latest commit | 10 days ago | 4 days ago |
| Language | TypeScript | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
nodejs-bigquery
-
Wrangling BigQuery at Reddit
If you've ever wondered what it's like to manage a BigQuery instance at Reddit scale, know that it's exactly like managing smaller systems, just with much, much bigger numbers in the logs. Database management fundamentals are eerily similar regardless of scale or platform; BigQuery handles just about anything we throw at it, and we do indeed throw it the whole book. Our BigQuery platform holds more than 100 petabytes of data and supports data science, machine learning, and analytics workloads that drive experiments, analytics, advertising, revenue, safety, and more. As Reddit grew, so did the workload velocity and complexity within BigQuery, and thus the need for more elegant and fine-tuned workload management.
-
Building a dev.to analytics dashboard using OpenSearch
Now that I know I've got some data I can use, I need to find a platform to analyse the data coming from the Forem API. I did consider some other pieces of software, such as Google BigQuery (with Looker Studio) and Elasticsearch (with Kibana), but I ultimately went with OpenSearch, which is essentially a fork of Elasticsearch maintained by AWS. The main reason is that I could host it locally for free (unlike BigQuery). I do have some prior experience with both Elastic (back when it was called ELK) and OpenSearch, but my work with OpenSearch was far more recent, so I decided to go with that.
- Learning Excel. Is there a resource for fake data sets like retail and wholesale inventories and sales histories etc for testing and practice?
-
Data Analytics at Potloc I: Making data integrity your priority with Elementary & Meltano
Bigquery as our data warehouse
-
Designing a Video Streaming Platform 📹
Google BigQuery
-
What is data integration?
You build a data integration between all the ad service providers (e.g. Google Ads, Facebook Ads, etc.), ingesting data from those APIs and storing it in your BigQuery data warehouse.
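The core of such an integration is mapping each provider's payload onto one shared schema before loading it into the warehouse. A minimal sketch of that normalization step, with entirely hypothetical field names and payload shapes (real ad APIs differ):

```python
# Hypothetical sketch: normalize ad-spend records from different providers
# into one shared schema before loading them into a warehouse table.
# All field names and payload shapes here are illustrative assumptions.

def normalize_google_ads(row: dict) -> dict:
    return {
        "provider": "google_ads",
        "campaign": row["campaignName"],
        "spend_usd": row["costMicros"] / 1_000_000,  # assume cost arrives in micros
        "date": row["segmentsDate"],
    }

def normalize_facebook_ads(row: dict) -> dict:
    return {
        "provider": "facebook_ads",
        "campaign": row["campaign_name"],
        "spend_usd": float(row["spend"]),  # assume spend arrives as a decimal string
        "date": row["date_start"],
    }

NORMALIZERS = {
    "google_ads": normalize_google_ads,
    "facebook_ads": normalize_facebook_ads,
}

def normalize(provider: str, rows: list[dict]) -> list[dict]:
    """Map provider-specific payloads onto the shared warehouse schema."""
    return [NORMALIZERS[provider](r) for r in rows]
```

With every provider funneled through the same target schema, the warehouse side only ever sees one table layout, which is what makes cross-provider reporting queries possible.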
-
What are Firebase Extensions? How can they speed up your app development?
It also includes some extensions that integrate Firebase with Google Cloud Platform services such as BigQuery.
-
Evolutionary Data Infrastructure
In addition, batch tasks require knowledge of the data schema of each service in order to get the data correctly and save it to the corresponding warehouse table. Assuming our data warehouse is GCP BigQuery, the schema in the warehouse table also needs to be created and modified manually.
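One way to reduce that manual schema upkeep is to derive a schema definition from a sample record. The sketch below produces BigQuery-style field definitions from a Python dict; the type mapping is a simplifying assumption of mine, not anything the service provides:

```python
# Rough sketch: derive a BigQuery-style schema definition from a sample
# record, so a new warehouse table does not have to be typed out by hand.
# The Python-type -> BigQuery-type mapping below is a simplifying assumption.

_TYPE_MAP = {
    str: "STRING",
    int: "INTEGER",
    float: "FLOAT",
    bool: "BOOLEAN",
}

def infer_schema(sample: dict) -> list[dict]:
    """Return a list of {name, type, mode} field definitions."""
    schema = []
    for name, value in sample.items():
        if value is None:
            # No type information available; fall back to a nullable string.
            field_type, mode = "STRING", "NULLABLE"
        else:
            field_type, mode = _TYPE_MAP[type(value)], "REQUIRED"
        schema.append({"name": name, "type": field_type, "mode": mode})
    return schema
```

In practice you would still review the result (a single sample cannot distinguish a field that is merely absent from one that is truly nullable), but it turns schema creation from typing into checking.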
-
Moving to Google Cloud managed services, from a FinOps point of view
BigQuery has a pricing model close to Pub/Sub's: you pay for what you insert into the database (in streaming) and for storing that data. The main difference is in what you can do with the data. BigQuery is not a message queuing service; it is a data warehouse service. It offers a query service to exploit the data, and you pay for those queries. Actually, you pay not for the query itself but for the quantity of data scanned to produce the query's results. This means you do not pay directly for query compute capacity but for the data processed to produce the result, which is very different from a non-managed database such as a SQL database, where the main pricing model is the size of the nodes that store and query the data.
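A back-of-the-envelope model makes the pricing difference concrete. The sketch below uses a $5-per-TiB rate, which is a historical on-demand list price used purely for illustration; check current pricing before relying on it:

```python
# Back-of-the-envelope cost model for BigQuery on-demand queries: you are
# billed per byte scanned, not per query or per node. The $5-per-TiB rate
# is a historical list price used only for illustration.

PRICE_PER_TIB_USD = 5.0

def estimate_query_cost(bytes_scanned: int, price_per_tib: float = PRICE_PER_TIB_USD) -> float:
    """Estimate on-demand cost in USD from the bytes a query will scan."""
    tib = bytes_scanned / 2**40  # bytes -> tebibytes
    return tib * price_per_tib

# Note: a query over a large table costs the same whether it returns one
# row or a billion rows -- only the data scanned to produce it matters.
```

The Python client can report a query's scanned bytes without running it (a dry-run query job exposes the total bytes it would process), which is the usual input for this kind of estimate.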
-
Apache Kafka Use Cases: When To Use It & When Not To
A Kafka-based data integration platform will be a good fit here. The services can add events to different topics in a broker whenever there is a data update. Kafka consumers corresponding to each of the services can monitor these topics and make updates to the data in real-time. It is also possible to create a unified data store through the same integration platform. Developers can implement a unified store either using an open source data warehouse like Apache Kylin or use a cloud-based one like Redshift or Snowflake. In this instance, the organization uses BigQuery. Data to this warehouse can be loaded through a separate Kafka topic. The below diagram summarizes the complete architecture.
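Between the Kafka topic and the warehouse there is usually a buffering step, since loading row by row is inefficient. A minimal sketch of that batching layer, where `flush` stands in for whatever client call actually loads rows into the warehouse (e.g. a BigQuery load job) and the batch size is an arbitrary assumption:

```python
# Sketch of the buffering step between a Kafka consumer and a warehouse
# load job: accumulate events from a topic and flush them in batches.
# `flush` stands in for the real warehouse load call; the batch size
# threshold is arbitrary.

class WarehouseBatcher:
    def __init__(self, flush, batch_size: int = 500):
        self._flush = flush
        self._batch_size = batch_size
        self._buffer: list[dict] = []

    def add(self, event: dict) -> None:
        """Called for every message the Kafka consumer hands us."""
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.close()

    def close(self) -> None:
        """Flush whatever is buffered, e.g. on shutdown or a rebalance."""
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []
```

In a real deployment you would also flush on a timer so a slow topic does not hold rows indefinitely, and handle flush failures so consumer offsets are only committed after the warehouse load succeeds.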
dbt-core
-
Relational is more than SQL
dbt integration was one of our major goals early on, but we found that the interaction wasn't as straightforward as we had hoped.
There is an open PR in the dbt repo: https://github.com/dbt-labs/dbt-core/pull/5982#issuecomment-...
I have some ideas about future directions in this space where I believe PRQL could really shine. I will only be able to write those down in a couple of hours. I think this could be a really exciting direction for the project to grow into if anyone would like to collaborate and contribute!
-
Python: Just Write SQL
I really dislike SQL, but I recognize its importance for many organizations. I also understand that SQL is definitely testable, particularly if managed by environments such as dbt (https://github.com/dbt-labs/dbt-core). Those who arrived here with a preference for Python will note that dbt is largely implemented in Python, adds Jinja macros and iterative forms to SQL, and adds code-testing capabilities.
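The "iterative forms" point is worth illustrating. In dbt, a `{% for %}` Jinja loop in a model's `.sql` file is rendered into plain SQL before the query reaches the warehouse; the plain-Python stand-in below shows the kind of SQL such a loop compiles to, with made-up table and column names:

```python
# Plain-Python stand-in for what a dbt Jinja loop compiles to: one SELECT
# per payment method, stitched together with UNION ALL. The table and
# column names are made up for illustration.

def compile_payments_model(methods: list[str]) -> str:
    selects = [
        f"select order_id, amount, '{m}' as method from payments_{m}"
        for m in methods
    ]
    return "\nunion all\n".join(selects)
```

In dbt itself this would be a Jinja loop in the model file rather than Python, but the rendered output sent to the warehouse is the same flat SQL, which is what keeps the result testable like any other query.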
-
Transform Your Data Like a Pro With dbt (Data Build Tool)
Data Build Tool repository.
- How do I build a docker image based on a Dockerfile on github?
-
DBT core v1.5 released
Here’s the PR, which includes a what/how/why: https://github.com/dbt-labs/dbt-core/issues/7158
- Building Column Level Lineage for dbt
-
Unit testing with dbt
Hey OP! There are packages like dbt-datamocktool or dbt-unit-testing. You can check them out. You might want to check out this thread as well.
- SQL and M4 = Composable SQL
-
Interview Prep - Senior Data Integration role
RudderStack, dbt, Kafka, Headless CDP, etc. are top of mind
What are some alternatives?
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
metricflow - MetricFlow allows you to define, build, and maintain metrics in code.
n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
citus - Distributed PostgreSQL as an extension
dagster - An orchestration platform for the development, production, and observation of data assets.
argo-navis - Argo Navis repository for research, docs and misc items
streamlit - Streamlit — A faster way to build and share data apps.
targets - Function-oriented Make-like declarative workflows for R
great_expectations - Always know what to expect from your data.
nbdev - Create delightful software with Jupyter Notebooks