DacFx
spark
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DacFx
- What do you guys use for version control?
For database projects, if you want the first-party (and free) Microsoft experience, you can use SQL Server Data Tools in Visual Studio 2019 or 2022 to get the full development experience on Windows. You can also use the SQL Database Projects extension in Azure Data Studio, provided you don't run into any of the known limitations. Besides being free, the underlying technology (the Data-tier Application Framework, or DacFx) is used under the hood by SSMS, SSDT, ADS, GitHub pipelines, etc., and is now open source, so you can see what's going on if you want to dig into the code.
spark
- .NET for Apache Spark appears to be abandoned
- Does anyone actually use ML.NET?
Re: DataFrames, that's good to know. There is the DataFrame API that is part of the Microsoft.Data.Analysis NuGet package; that's the API the issue is tracking and the one shown in the sample notebook I shared. That API has no dependencies on other systems. The DataFrame you're referring to is part of the .NET for Apache Spark library, which depends on Apache Spark and requires some initial setup.
- What does the .NET ecosystem offer in terms of distributed data processing frameworks?
The data engineering ecosystem is new to me, but my first impression is that everything is catered toward the JVM. The only somewhat promising option I've found for building a data pipeline in .NET is github.com/dotnet/spark.
What are some alternatives?
hack-together - Hack Together: Microsoft Graph and .NET is a hackathon for .NET developers to learn Microsoft Graph and Microsoft 365.
ParquetSharp.DataFrame - ParquetSharp.DataFrame is a .NET library for reading and writing Apache Parquet files into/from .NET DataFrames, using ParquetSharp
dotnet-webassembly - Create, read, modify, write and execute WebAssembly (WASM) files from .NET-based applications.
azure-event-hubs - ☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
ML.NET - ML.NET is an open source and cross-platform machine learning framework for .NET.
AnyDiff - A CSharp (C#) diff library that allows you to diff two objects and get a list of the differences back.
Mobius: C# API for Spark - C# and F# language binding and extensions to Apache Spark
jira-issue-analysis - Jira REST client, with emphasis on 'Time in Status' analysis and reporting
TorchSharp - A .NET library that provides access to the library that powers PyTorch.
RssDotnet - A list of my favourite dotnet RSS feeds
Akka.net - Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
databricks-end-to-end-streaming - End-to-end Kafka Streaming Examples on Databricks with Evolving Avro Schemas.