.NET for Apache® Spark™

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and SparkSQL features of Apache Spark for working with structured data, and Spark Structured Streaming for working with streaming data.

.NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code, reusing the knowledge, skills, code, and libraries you already have as a .NET developer.

.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 6, or on Windows using .NET Framework. It also runs on all major cloud providers, including Azure HDInsight Spark, Amazon EMR Spark, and Databricks on AWS and Azure.

Note: We currently have a Spark Project Improvement Proposal JIRA at SPIP: .NET bindings for Apache Spark to work with the community towards getting .NET support by default into Apache Spark. We highly encourage you to participate in the discussion.

Supported Apache Spark

<table> <thead> <tr> <th>Apache Spark</th> <th>.NET for Apache Spark</th> </tr> </thead> <tbody align="center"> <tr> <td>2.4*</td> <td rowspan=4><a href="https://github.com/dotnet/spark/releases/tag/v2.1.1">v2.1.1</a></td> </tr> <tr> <td>3.0</td> </tr> <tr> <td>3.1</td> </tr> <tr> <td>3.2</td> </tr> </tbody> </table>

*2.4.2 is <a href="https://github.com/dotnet/spark/issues/60">not supported</a>.

Releases

.NET for Apache Spark releases are published on the project's GitHub releases page, and the packages are available on NuGet.

Get Started

These instructions will show you how to run a .NET for Apache Spark app using .NET 6.
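As a rough illustration of what such an app can look like, here is a minimal sketch that touches both the DataFrame API and SparkSQL. It assumes a console project that references the `Microsoft.Spark` NuGet package; the namespace `HelloSpark`, the app name, and the view name `numbers` are made up for this example and do not come from the repo.

```csharp
using Microsoft.Spark.Sql;

namespace HelloSpark // hypothetical namespace, chosen for this sketch
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create (or reuse) a Spark session for this app.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("hello-spark") // illustrative app name
                .GetOrCreate();

            // Build a small in-memory DataFrame with a single "id" column
            // holding the values 0 through 9, so no input file is needed.
            DataFrame numbers = spark.Range(0, 10);

            // DataFrame API: keep only the even ids.
            DataFrame evens = numbers.Filter("id % 2 = 0");
            evens.Show();

            // SparkSQL: register the DataFrame as a temporary view and query it.
            numbers.CreateOrReplaceTempView("numbers");
            spark.Sql("SELECT id, id * 2 AS doubled FROM numbers WHERE id % 2 = 0").Show();

            spark.Stop();
        }
    }
}
```

An app like this is not run with `dotnet run` alone: it needs a local Apache Spark installation and is launched through `spark-submit` together with the .NET for Apache Spark jar, as the getting-started instructions describe.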

Building from Source

Building from source is straightforward: the whole process, from cloning the repo to running your app, should take less than 15 minutes.

Instructions
Windows<ul><li>Local - .NET Framework 4.6.1</li><li>Local - .NET 6</li></ul>
Ubuntu<ul><li>Local - .NET 6</li><li>Azure HDInsight Spark - .NET 6</li></ul>

<a name="samples"></a>

Samples

There are two types of samples/apps in the .NET for Apache Spark repo: getting-started samples and end-to-end apps, each marked with its own icon in the table below.

We welcome contributions to both categories!

<table> <tr> <td width="25%"> <h4><b>Analytics Scenario</b></h4> </td> <td> <h4 width="35%"><b>Description</b></h4> </td> <td> <h4><b>Scenarios</b></h4> </td> </tr> <tr> <td width="25%"> <h5>Dataframes and SparkSQL</h5> </td> <td width="35%"> Simple code snippets to help you familiarize yourself with the programmability experience of .NET for Apache Spark. </td> <td> <h5>Basic &nbsp;&nbsp;&nbsp; <a href="examples/Microsoft.Spark.CSharp.Examples/Sql/Batch/Basic.cs">C#</a> &nbsp; &nbsp; <a href="examples/Microsoft.Spark.FSharp.Examples/Sql/Basic.fs">F#</a>&nbsp;&nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-getting-started.png" alt="Getting started icon"></a></h5> </td> </tr> <tr> <td width="25%"> <h5>Structured Streaming</h5> </td> <td width="35%"> Code snippets to show you how to use Apache Spark's Structured Streaming (<a href="https://spark.apache.org/docs/2.3.1/structured-streaming-programming-guide.html">2.3.1</a>, <a href="https://spark.apache.org/docs/2.3.2/structured-streaming-programming-guide.html">2.3.2</a>, <a href="https://spark.apache.org/docs/2.4.1/structured-streaming-programming-guide.html">2.4.1</a>, <a href="https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html">Latest</a>) </td> <td> <h5>Word Count &nbsp;&nbsp;&nbsp; <a href="examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredNetworkWordCount.cs">C#</a> &nbsp;&nbsp;&nbsp;<a href="examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredNetworkWordCount.fs">F#</a> &nbsp;&nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-getting-started.png" alt="Getting started icon"></a></h5> <h5>Windowed Word Count &nbsp;&nbsp;&nbsp;<a href="examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredNetworkWordCountWindowed.cs">C#</a> &nbsp; &nbsp;<a href="examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredNetworkWordCountWindowed.fs">F#</a> &nbsp;&nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-getting-started.png" alt="Getting started icon"></a></h5> <h5>Word Count on data from <a href="https://kafka.apache.org/">Kafka</a> &nbsp;&nbsp;&nbsp;<a href="examples/Microsoft.Spark.CSharp.Examples/Sql/Streaming/StructuredKafkaWordCount.cs">C#</a> &nbsp;&nbsp;&nbsp;<a href="examples/Microsoft.Spark.FSharp.Examples/Sql/Streaming/StructuredKafkaWordCount.fs">F#</a> &nbsp; &nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-getting-started.png" alt="Getting started icon"></a></h5> </td> </tr> <tr> <td width="25%"> <h4>TPC-H Queries</h4> </td> <td width="35%"> Code to show you how to author complex queries using .NET for Apache Spark. </td> <td> <h5>TPC-H Functional &nbsp;&nbsp;&nbsp; <a href="benchmark/csharp/Tpch/TpchFunctionalQueries.cs">C#</a> &nbsp;&nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-e2e.png" alt="End-to-end app icon"></a></h5> <h5>TPC-H SparkSQL &nbsp;&nbsp;&nbsp; <a href="benchmark/csharp/Tpch/TpchSqlQueries.cs">C#</a> &nbsp;&nbsp;&nbsp;<a href="#"><img src="docs/img/app-type-e2e.png" alt="End-to-end app icon"></a></h5> </td> </tr> </table>

Contributing

We welcome contributions! Please review our contribution guide.

Inspiration and Special Thanks

This project would not have been possible without the outstanding work from the following communities:

How to Engage, Contribute and Provide Feedback

The .NET for Apache Spark team encourages contributions, both issues and PRs. The first step is to find an existing issue you want to contribute to; if you cannot find one, open a new issue.

Support

.NET for Apache Spark is an open source project under the .NET Foundation and does not come with Microsoft Support unless otherwise noted by the specific product. For issues with or questions about .NET for Apache Spark, please create an issue. The community is active and monitors submissions.

.NET Foundation

The .NET for Apache Spark project is part of the .NET Foundation.

Code of Conduct

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community. For more information, see the .NET Foundation Code of Conduct.

<a name="license"></a>

License

.NET for Apache Spark is licensed under the MIT license.