Apache Sedona

| Download statistics | Maven |
|---|---|
| Apache Sedona | 225k/month |
| Archived GeoSpark releases | 10k/month |

Download statistics are also tracked for PyPI, Conda-forge, CRAN, and DockerHub.

Join the community

Follow Sedona on Twitter for fresh news: Sedona@Twitter

Join the Sedona Discord community:

Join the Sedona monthly community office hour: Google Calendar, Tuesdays from 8 AM to 9 AM Pacific Time, every 4 weeks

Sedona JIRA: Bugs, Pull Requests, and other similar issues

Sedona Mailing Lists: dev@sedona.apache.org (project development, general questions, or tutorials)

What is Apache Sedona?

Apache Sedona™ is a spatial computing engine that enables developers to easily process spatial data at any scale within modern cluster computing systems such as Apache Spark and Apache Flink. Sedona developers can express their spatial data processing tasks in Spatial SQL, Spatial Python, or Spatial R. Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing/optimization functionality that enables users to efficiently analyze spatial data at any scale.

Features

Some of the key features of Apache Sedona include:

These are some of the key features of Apache Sedona, but it may offer additional capabilities depending on the specific version and configuration.

Click Binder and play the interactive Sedona Python Jupyter Notebook immediately!

When to use Sedona?

Use Cases:

Apache Sedona is a widely used framework for working with spatial data, and it has many different use cases and applications. Some of the main use cases for Apache Sedona include:

Code Example:

This example loads NYC taxi trip records and taxi zone information stored as CSV files on AWS S3 into Sedona spatial DataFrames. It then runs a spatial SQL query on the taxi trip dataset to keep only the records within the Manhattan area of New York. The example also shows a spatial join that matches taxi trip records to zones based on whether the pickup point lies within the geographical extent of the zone. Finally, the last code snippet converts the Sedona output to GeoPandas and plots the spatial distribution of both datasets.

Load NYC taxi trips and taxi zones data from CSV Files Stored on AWS S3

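# Assumes Sedona 1.4.1+ and configured S3 access; `sedona` is the Sedona-enabled Spark session used below
from sedona.spark import SedonaContext
sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())
taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://your-directory/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'Fare_Amt')
# The zone file is read without a header row, so Spark names its columns _c0 (WKT geometry) and _c1 (zip code)
zoneDf = sedona.read.format('csv').option("delimiter", ",").load("s3a://your-directory/data/TIGER2018_ZCTA5.csv")
zoneDf = zoneDf.selectExpr('ST_GeomFromWKT(_c0) as zone', '_c1 as zipcode')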

Spatial SQL query to only return Taxi trips in Manhattan

taxidf_mhtn = taxidf.where('ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.93,40.79), pickup)')
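
For intuition about which rows this filter keeps, the following plain-Python sketch applies the same envelope test to a few made-up pickup points. It does not use Sedona: the helper `in_envelope` and the sample coordinates are hypothetical, and the strict inequalities are an assumption meant to mirror the usual behavior of containment predicates, where a point lying exactly on the boundary is not contained.

```python
# Manhattan envelope from the query above: (min_lon, min_lat, max_lon, max_lat)
MANHATTAN = (-74.01, 40.73, -73.93, 40.79)

def in_envelope(lon, lat, envelope=MANHATTAN):
    """Return True if (lon, lat) lies strictly inside the envelope (hypothetical helper)."""
    min_lon, min_lat, max_lon, max_lat = envelope
    return min_lon < lon < max_lon and min_lat < lat < max_lat

# Hypothetical pickup points as (lon, lat); not taken from the taxi dataset
pickups = [(-73.98, 40.75), (-73.80, 40.65), (-74.00, 40.78)]

# Keep only the points that fall inside the envelope
kept = [p for p in pickups if in_envelope(*p)]
print(kept)
```

In Sedona the same test runs in parallel across the cluster as part of the `where` clause, so no rows need to be collected to the driver first.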

Spatial Join between Taxi Dataframe and Zone Dataframe to Find taxis in each zone

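zoneDf.createOrReplaceTempView('zoneDf')  # register views so the SQL below can reference them by name
taxidf.createOrReplaceTempView('taxiDf')
taxiVsZone = sedona.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)')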
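
To illustrate what the join computes, here is a minimal plain-Python sketch of a spatial join as a nested loop over zones and points, using a ray-casting point-in-polygon test. It shows only the semantics of the result: it is not how Sedona evaluates the join (Sedona relies on spatial partitioning and indexing), and the names `point_in_polygon` and `naive_spatial_join` and the sample zones and points are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point is inside polygon (a list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def naive_spatial_join(zones, points):
    """Pair each zone id with every point it contains (nested loop, no index)."""
    return [(zone_id, pt)
            for zone_id, poly in zones.items()
            for pt in points
            if point_in_polygon(pt, poly)]

# Hypothetical zones (two adjacent unit squares) and points
zones = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]

pairs = naive_spatial_join(zones, points)
print(pairs)
```

The nested loop compares every zone with every point, which is why a distributed engine with spatial indexing is preferable once the datasets grow beyond toy size.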

Show a map of the loaded Spatial Dataframes using GeoPandas

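import geopandas as gpd

# toPandas() collects every row to the driver, so sample or filter large DataFrames first
zoneGpd = gpd.GeoDataFrame(zoneDf.toPandas(), geometry="zone")
taxiGpd = gpd.GeoDataFrame(taxidf.toPandas(), geometry="pickup")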

zone = zoneGpd.plot(color='yellow', edgecolor='black', zorder=1)
zone.set_xlabel('Longitude (degrees)')
zone.set_ylabel('Latitude (degrees)')

zone.set_xlim(-74.1, -73.8)
zone.set_ylim(40.65, 40.9)

taxi = taxiGpd.plot(ax=zone, alpha=0.01, color='red', zorder=3)

Docker image

We provide a Docker image for Apache Sedona with Python JupyterLab and a single-node cluster. The images are available on DockerHub.

Building Sedona

| Name | API | Introduction |
|---|---|---|
| common | Java | Core geometric operation logics, serialization, index |
| spark | Spark RDD/DataFrame Scala/Java/SQL | Distributed geospatial data processing on Apache Spark |
| flink | Flink DataStream/Table in Scala/Java/SQL | Distributed geospatial data processing on Apache Flink |
| snowflake | Snowflake SQL | Distributed geospatial data processing on Snowflake |
| spark-shaded | No source code | shaded jar for Sedona Spark |
| flink-shaded | No source code | shaded jar for Sedona Flink |
| snowflake-tester | Java | tester program for Sedona Snowflake |
| python | Spark RDD/DataFrame Python | Distributed geospatial data processing on Apache Spark |
| R | Spark RDD/DataFrame in R | R wrapper for Sedona |
| Zeppelin | Apache Zeppelin | Plugin for Apache Zeppelin 0.8.1+ |

Documentation

Please visit the Apache Sedona website for detailed information.

Powered by

<a href="https://www.apache.org/"> <img alt="The Apache Software Foundation" src="https://www.apache.org/foundation/press/kit/asf_logo_wide.png" width="500" class="center"> </a>