|19 days ago||4 days ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
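As a rough illustration of how such metrics could be computed, here is a minimal Python sketch. The tracker's real formulas are not published in this text, so everything below is an assumption: the exponential decay, the 30-day half-life, and the function names are hypothetical and chosen only to match the description above (recent commits weigh more, and 9.0 corresponds to the top 10%).

```python
from bisect import bisect_left


def month_over_month_growth(stars_last_month, stars_now):
    """Relative change in stars between two monthly snapshots."""
    if stars_last_month <= 0:
        raise ValueError("need a positive star count for the previous month")
    return (stars_now - stars_last_month) / stars_last_month


def recency_weighted_commits(commit_ages_days, half_life_days=30.0):
    """ASSUMPTION: each commit's weight halves every `half_life_days`.

    The site only says recent commits weigh more than older ones;
    the decay curve used here is a guess for illustration.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw_score, all_raw_scores):
    """Scale a raw score to 0-10 by the share of tracked projects below it.

    With this mapping, 9.0 means 90% of tracked projects score lower,
    i.e. the project is in the top 10%.
    """
    ranked = sorted(all_raw_scores)
    below = bisect_left(ranked, raw_score)  # projects with a strictly lower score
    return round(10.0 * below / len(ranked), 1)


growth = month_over_month_growth(1000, 1100)         # 10% more stars than last month
weighted = recency_weighted_commits([0, 30, 60])     # commits made 0, 30 and 60 days ago
activity = activity_score(90, list(range(100)))      # 100 hypothetical tracked projects
```

Under these assumptions, the commit made today counts fully, the 30-day-old one counts half, and the 60-day-old one counts a quarter.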
We haven't tracked posts mentioning Scalate yet.
Tracking mentions began in Dec 2020.
1 project | reddit.com/r/dataengineering | 3 Nov 2021
Have a look at Apache Spark
Spark is lit once again
6 projects | dev.to | 29 Oct 2021
Here at Exacaster, Spark applications have been used extensively for years. We started using them on our Hadoop clusters with YARN as an application manager. However, with our recent product, we started moving towards a cloud-based solution and decided to use Kubernetes for our infrastructure needs.
What is B2D Sector?
13 projects | dev.to | 17 Oct 2021
Example tools: TensorFlow, Tableau, Apache Spark, MATLAB, Jupyter
Why should I invest in raptoreum? What makes it different
1 project | reddit.com/r/raptoreum | 25 Sep 2021
For your first question, if you are interested I encourage you to read the smart contracts paper here: https://docs.raptoreum.com/_media/Raptoreum_Contracts_EN.pdf and then to dig into what Apache Spark can do here: https://spark.apache.org/
How to use Spark and Pandas to prepare big data
3 projects | dev.to | 21 Sep 2021
Apache Spark is one of the most actively developed open-source projects in big data. The following code examples require that you have Spark set up and can execute Python code using the PySpark library. The examples also require that you have your data in Amazon S3 (Simple Storage Service). All this is set up on AWS EMR (Elastic MapReduce).
Google Colab, Pyspark, Cassandra remote cluster combine these all together
2 projects | dev.to | 13 Sep 2021
How to Run Spark SQL on Encrypted Data
3 projects | dev.to | 10 Aug 2021
For those of you who are new, Apache Spark is a popular distributed computing framework used by data scientists and engineers for processing large batches of data. One of its modules, Spark SQL, allows users to interact with structured, tabular data. This can be done through a DataSet/DataFrame API available in Scala or Python, or by using standard SQL queries. Here you can see a quick example of both below:
Machine Learning Tools and Algorithms
3 projects | reddit.com/r/u_Snoo36930 | 29 Jul 2021
Apache Spark: A massive data processing engine with built-in modules for streaming, SQL, machine learning (ML), and graph processing. It is recognized for being fast, simple to use, and general-purpose.
Strategies for running multiple Spark jobs simultaneously?
1 project | reddit.com/r/apachespark | 25 Jul 2021
Python VS Scala
2 projects | reddit.com/r/scala | 2 Jul 2021
Actually, it does. Scala has Spark for data science and some ML libs like Smile.
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Scalding - A Scala API for Cascading
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
Smile - Statistical Machine Intelligence & Learning Engine
Twirl - Twirl is Play's default template engine
PyTorch - Tensors and dynamic neural networks in Python with strong GPU acceleration
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
dpark - Python clone of Spark, a MapReduce-like framework in Python
Summingbird - Streaming MapReduce with Scalding and Storm
Deeplearning4j - Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a small modular C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.