|3 days ago||7 days ago|
|Apache License 2.0||BSD 1-Clause License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Does anyone want to help maintain the Spark Java framework?
2 projects | reddit.com/r/java | 21 Jun 2022
Wow, this has nothing to do with Apache Spark (https://spark.apache.org/), the wildly popular JVM based data processing framework.
How-to-Guide: Contributing to Open Source
19 projects | reddit.com/r/dataengineering | 11 Jun 2022
Perform computation over 500 million vectors
1 project | reddit.com/r/bigdata | 8 Jun 2022
I would guess that Apache Spark would be an okay choice, with the data stored locally in Avro or Parquet files. Just processing the data in Python would also work, IMO.
DeWitt Clause, or Can You Benchmark %DATABASE% and Get Away With It
21 projects | dev.to | 2 Jun 2022
Apache Drill, Druid, Flink, Hive, Kafka, Spark
Optimizing Distributed Joins: The Case of Google Cloud Spanner and DataStax Astra DB
3 projects | dev.to | 31 May 2022
Shuffle and broadcast joins are more suitable for batch or near real-time analytics. For example, they are used in Apache Spark as the main join strategies. Co-located and pre-computed joins are faster and can be used for online transaction processing with real-time applications. They frequently rely on organizing data based on unique storage schemes supported by a database.
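The difference between the two join strategies named in the excerpt above can be sketched in plain Python. This is a minimal illustration of the general idea and not Spark's implementation: the function names, the list-of-partitions data layout, and the `num_buckets` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def broadcast_join(large_partitions, small_table):
    """Join by giving every partition of the large side a full copy of the small side."""
    # Build one hash table from the small side; in a cluster this table
    # would be copied ("broadcast") to every worker.
    lookup = defaultdict(list)
    for key, value in small_table:
        lookup[key].append(value)

    result = []
    for partition in large_partitions:  # each partition joins independently
        for key, value in partition:
            for other in lookup.get(key, []):
                result.append((key, value, other))
    return result


def shuffle_join(left_partitions, right_partitions, num_buckets=4):
    """Join by repartitioning both sides on the key so matching keys share a bucket."""
    left_buckets = [defaultdict(list) for _ in range(num_buckets)]
    right_buckets = [defaultdict(list) for _ in range(num_buckets)]

    # The "shuffle": every row is routed to a bucket chosen by its key hash.
    for partition in left_partitions:
        for key, value in partition:
            left_buckets[hash(key) % num_buckets][key].append(value)
    for partition in right_partitions:
        for key, value in partition:
            right_buckets[hash(key) % num_buckets][key].append(value)

    # Each bucket can now be joined locally, without looking at other buckets.
    result = []
    for left, right in zip(left_buckets, right_buckets):
        for key, left_values in left.items():
            for left_value in left_values:
                for right_value in right.get(key, []):
                    result.append((key, left_value, right_value))
    return result


# Hypothetical sample data: two partitions of orders, one small customer table.
orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
customers = [(1, "alice"), (2, "bob")]

broadcast_rows = sorted(broadcast_join(orders, customers))
shuffle_rows = sorted(shuffle_join(orders, [customers[:1], customers[1:]]))
```

Both functions return the same rows. The broadcast variant avoids moving the large side but only works when the small side fits in memory on every worker, while the shuffle variant moves both sides and so scales to two large inputs.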
What do I need to know about distributed algorithms and systems?
1 project | reddit.com/r/AskProgramming | 22 May 2022
You generally want to keep your data in memory, rather than disk, to keep things reasonably fast. A system like Apache Spark tries to do this for you, spilling to disk when needed. In general, I'd recommend researching Spark, since it will cover a lot of the concepts you care about.
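The "keep it in memory, spill to disk when needed" idea from the answer above can be sketched as a small external sort. This is an illustration of the general technique under simplifying assumptions, not how Spark does it: the `external_sort` name, the `max_in_memory` threshold, and the use of `pickle` with temporary files are choices made for the example.

```python
import heapq
import pickle
import tempfile


def _spill(sorted_items):
    """Write one sorted run to a temporary file and rewind it for reading."""
    run_file = tempfile.TemporaryFile()
    for item in sorted_items:
        pickle.dump(item, run_file)
    run_file.seek(0)
    return run_file


def _read_run(run_file):
    """Stream items back from a spilled run, one at a time."""
    while True:
        try:
            yield pickle.load(run_file)
        except EOFError:
            break
    run_file.close()


def external_sort(items, max_in_memory=1000):
    """Yield items in sorted order while buffering at most max_in_memory of them."""
    runs = []
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= max_in_memory:
            # The in-memory budget is used up: sort what we have and spill it.
            runs.append(_spill(sorted(buffer)))
            buffer = []
    buffer.sort()
    # Merge the remaining in-memory items with the on-disk runs lazily,
    # so only one item per run is held in memory during the merge.
    yield from heapq.merge(buffer, *(_read_run(run) for run in runs))


sorted_values = list(external_sort([5, 3, 9, 1, 7, 2], max_in_memory=2))
```

With a large `max_in_memory` nothing is spilled and the sort happens entirely in memory, which is the fast path the answer describes. The disk is only touched once the buffer limit is reached.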
How to use Spark and Pandas to prepare big data
3 projects | dev.to | 10 May 2022
Apache Spark is one of the most actively developed open-source projects in big data. The following code examples require that you have Spark set up and can execute Python code using the PySpark library. The examples also require that you have your data in Amazon S3 (Simple Storage Service). All this is set up on AWS EMR (Elastic MapReduce).
AWS Glue: what is it and how does it work?
1 project | dev.to | 5 May 2022
With Glue, Apache Spark runs in the background. But if this is the first time you’ve heard of the popular open-source analytics engine, it may take you a while to familiarize yourself with the cloud software.
Real-time Open Source Indexes: Databases, Headless CMSs and Static Site Generators
7 projects | dev.to | 4 May 2022
Spark SQL (302 active contributors).
Top Responsibilities of a Data Engineering Manager
1 project | reddit.com/r/dataengineering | 2 May 2022
What’s more, picking the right technology is always evolving. New tools come out all the time, often with different functionality than existing tools. So it’s important that you stay up-to-date on what technologies are available and their latest features. For example, four years ago Apache Spark was completely unknown but today it is quickly becoming the de facto standard for stream processing.
[D] Deep Learning Framework for C++.
7 projects | reddit.com/r/MachineLearning | 12 Jun 2022
[D] PyTorch processes taking up tons of GPU memory - any way to reduce this?
1 project | reddit.com/r/MachineLearning | 25 May 2022
Maybe related: https://github.com/pytorch/pytorch/issues/12873
SWAHILI TEXT CLASSIFICATION USING TRANSFORMERS
4 projects | dev.to | 24 May 2022
Let's dive into the main topic of this article: we are going to train a transformer model for Swahili news classification. Since transformers are large, we need a wrapper to keep the task simple. If you are comfortable with PyTorch, you can use PyTorch Lightning, a wrapper for high-performance AI research, to wrap the transformers, but today let's go with ktrain, a Python library that wraps TensorFlow.
AUDIO CLASSIFICATION USING DEEP LEARNING
3 projects | dev.to | 20 May 2022
The second option is to train your own model using machine learning frameworks like TensorFlow and PyTorch.
[D] My experience with running PyTorch on the M1 GPU
4 projects | reddit.com/r/MachineLearning | 19 May 2022
AUTOMATED SPEECH RECOGNITION APPROACHES AND CHALLENGES
2 projects | dev.to | 19 May 2022
The goal of this approach is to replace the intermediate steps with one algorithm. The deep learning approach has achieved state-of-the-art results in speech transcription tasks and is replacing the traditional methods used in ASR. It is also simpler because there are fewer steps involved and it does not require as much expertise. Implementing this approach requires an understanding of deep learning tools such as PyTorch, TensorFlow, DeepSpeech, etc.
Accelerated PyTorch Training on M1 Mac
6 projects | news.ycombinator.com | 18 May 2022
> Is too limited? Too hard to interact with? Not worth the effort?
IIRC the only way to access ANE is through the Accelerate framework, and it seems to have pretty severe limitations.
Apple has developed a TensorFlow plugin, but I can't tell you if it uses ANE. Earlier this year they also published a job offer talking about accelerating PyTorch with BNNS and Accelerate. Apparently PyTorch already uses Accelerate and AMX (the matrix coprocessor).
So it might indeed be that ANE is too limited and Accelerate never gets to use it.
I'm fairly new to AI and machine learning and don't know where to start, but I want to focus my efforts on studying how to create fake models for fashion (for example). What kind of technology/framework should I study?
2 projects | reddit.com/r/ItalyInformatica | 18 May 2022
Discrete Algebraic Riccati Equation Solver
4 projects | reddit.com/r/ControlTheory | 10 May 2022
What is the requirement of something to be compatible with PyTorch? You can in fact ask them about this specifically, they are always a helpful bunch https://github.com/pytorch/pytorch
Introduction to PyTorch
7 projects | dev.to | 2 May 2022
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Flux.jl - Relax! Flux is the ML library that doesn't make you tensor
mediapipe - Cross-platform, customizable ML solutions for live and streaming media.
Scalding - A Scala API for Cascading
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
tensorflow - An Open Source Machine Learning Framework for Everyone
Smile - Statistical Machine Intelligence & Learning Engine
ROCm - ROCm - Open Source Platform for HPC and Ultrascale GPU Computing
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing