Introduction
Microsoft Maps is releasing country-wide open building footprint datasets for the United States. This dataset contains 129,591,852 computer-generated building footprints derived by running our computer vision algorithms on satellite imagery. The data is freely available for download and use.
License
This data is licensed by Microsoft under the Open Data Commons Open Database License (ODbL).
Data Vintage
The vintage of the footprints depends on the vintage of the underlying imagery. Bing Imagery is a composite of multiple sources with different capture dates. Each building footprint carries a capture date tag whenever we were able to deduce the vintage of its imagery source.
Footprints inside the highlighted region on the map are from 2019-2020. There are 73,250,745 such building footprints. This is the focal area where we reran extraction for the latest release.
The rest of the footprints were extracted from older images with a wider range of capture dates, averaging approximately 2012. In this area we have reused footprints from previous releases.
FAQ
What does the data include?
129,591,852 building footprint polygon geometries in GeoJSON format, divided among the 50 US states and the District of Columbia.
Why is the data being released?
Microsoft has a continued interest in supporting a thriving OpenStreetMap ecosystem.
What is the GeoJSON format?
GeoJSON is a format for encoding a variety of geographic data structures. For detailed documentation and tutorials, refer to the GeoJSON blog.
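As a rough illustration of what working with this format looks like, the sketch below parses a small GeoJSON document with Python's standard `json` module and counts its polygon features. The `SAMPLE` string and the `summarize` helper are hypothetical stand-ins, and the sketch assumes a file is a single `FeatureCollection` of `Polygon` features; the layout of the actual downloads may differ.

```python
import json

# Hypothetical, tiny stand-in for a state file; coordinates are [lon, lat].
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-84.0, 30.0], [-83.9999, 30.0],
                                   [-83.9999, 30.0001], [-84.0, 30.0001],
                                   [-84.0, 30.0]]]}}
  ]
}
"""


def summarize(geojson_text):
    """Return (polygon count, list of vertex counts of each exterior ring)."""
    collection = json.loads(geojson_text)
    polygons = [
        feature["geometry"]
        for feature in collection["features"]
        if feature["geometry"]["type"] == "Polygon"
    ]
    # A GeoJSON ring repeats its first point at the end, so subtract one.
    vertex_counts = [len(geom["coordinates"][0]) - 1 for geom in polygons]
    return len(polygons), vertex_counts


polygon_count, vertex_counts = summarize(SAMPLE)
print(polygon_count, vertex_counts)
```

For files of several hundred MiB or more, loading everything with `json.loads` may use a lot of memory, so a streaming parser could be a better fit.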
Should we import the data into OpenStreetMap?
Maybe. Never overwrite the hard work of other contributors or blindly import data into OSM without first checking the local quality. While our metrics show that this data meets or exceeds the quality of hand-drawn building footprints, the data does vary in quality from place to place, between rural and urban, mountains and plains, and so on. Inspect quality locally and discuss an import plan with the community. Always follow the OSM import community guidelines.
Will the data be used or made available in larger OpenStreetMap ecosystem?
Yes. The Microsoft Open Buildings dataset is currently used in ml-enabler for task creation. You can try it out at the AI-assisted Tasking Manager. The data will also be made available in Facebook RapiD.
What is the creation process for this data?
The building extraction is done in two stages:
- Semantic Segmentation – Recognizing building pixels on the aerial image using DNNs
- Polygonization – Converting building pixel blobs into polygons
Stage 1: Semantic Segmentation
DNN architecture and training
The network backbone we used is EfficientNet described here. Although we have millions of labels at our disposal, we found that an effective combination of supervised and unsupervised training yields the best results.
Stage 2: Polygonization
Method description
We developed a method that approximates the prediction pixels with polygons, making decisions based on the whole prediction feature space. This is very different from standard approaches, e.g. the Douglas-Peucker algorithm, which are greedy in nature. The method tries to impose some a priori building properties, which are, at the moment, manually defined and automatically tuned. Some of these a priori properties are:
How good is the data?
Our metrics show that in the vast majority of cases the quality is at least as good as that of hand-digitized buildings in OpenStreetMap.
DNN model metrics
These are the intermediate-stage, pixel-based metrics we use to track DNN model improvements. Pixel recall/precision = 95.5%/94.0%.
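For readers unfamiliar with pixel-based precision and recall, here is a minimal sketch of how such numbers can be computed from a predicted mask and a label mask. The function name and the toy masks are illustrative assumptions, not the evaluation code behind the figures above.

```python
def pixel_precision_recall(predicted, labeled):
    """Compute pixel precision and recall for two equally sized 0/1 masks."""
    true_pos = false_pos = false_neg = 0
    for pred_row, label_row in zip(predicted, labeled):
        for pred, label in zip(pred_row, label_row):
            if pred and label:
                true_pos += 1      # building pixel correctly predicted
            elif pred and not label:
                false_pos += 1     # predicted building where there is none
            elif label and not pred:
                false_neg += 1     # missed building pixel
    predicted_total = true_pos + false_pos
    labeled_total = true_pos + false_neg
    precision = true_pos / predicted_total if predicted_total else 0.0
    recall = true_pos / labeled_total if labeled_total else 0.0
    return precision, recall


# Toy 2x3 masks (hypothetical data): 1 marks a building pixel.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
label_mask = [[1, 0, 0],
              [0, 1, 1]]
precision, recall = pixel_precision_recall(predicted_mask, label_mask)
```

Precision is the share of predicted building pixels that are truly building; recall is the share of true building pixels that were found.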
Polygon evaluation metrics
Match metrics:
Metric | Value |
---|---|
Precision | 98.5% |
Recall | 92.4% |
We evaluate the following metrics to measure the quality of the output:
- Intersection over Union – This is the standard metric measuring the overlap quality against the labels
- Shape distance – With this metric we measure the polygon outline similarity
- Dominant angle rotation error – This measures the polygon rotation deviation
Our evaluation set contains ~15k buildings. The metrics on the set are:
IoU | Shape distance | Rotation error [deg] |
---|---|---|
0.86 | 0.4 | 2.5 |
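To make the Intersection over Union metric concrete, the following sketch computes IoU for two polygons using the shoelace area formula and Sutherland-Hodgman clipping. It is a simplification under stated assumptions: both polygons must be convex with counter-clockwise vertices, which real footprints often are not, and all function names are illustrative. It is not the evaluation code used to produce the table above.

```python
def shoelace_area(polygon):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x2, y2 = polygon[(i + 1) % len(polygon)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def clip_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` by a convex, counter-clockwise `clip`."""

    def inside(p, a, b):
        # True when p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of segment p-q with the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break  # nothing left to clip
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def convex_iou(poly_a, poly_b):
    """IoU of two convex, counter-clockwise polygons."""
    overlap = clip_convex(poly_a, poly_b)
    intersection = shoelace_area(overlap) if len(overlap) >= 3 else 0.0
    union = shoelace_area(poly_a) + shoelace_area(poly_b) - intersection
    return intersection / union if union else 0.0


# Two unit squares offset by half a unit along x (hypothetical data).
predicted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
label = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]
iou = convex_iou(predicted, label)
```

For arbitrary, possibly concave footprints, a general-purpose geometry library would be the safer choice for the intersection step.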
False positive ratio in the corpus
We estimate a false positive ratio of <1%, based on 1000 randomly sampled buildings from the entire output corpus.
What is the coordinate reference system?
EPSG:4326
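Because EPSG:4326 coordinates are longitude and latitude in degrees, quantities such as footprint area cannot be read off the raw numbers directly. The sketch below shows one possible approximation: it converts a ring to local meters with an equirectangular projection around its first vertex and then applies the shoelace formula. The mean Earth radius constant and the function name are assumptions of this example; for precise work a proper projection library would be more appropriate.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def approx_area_m2(ring):
    """Approximate area in square meters of a ring of [lon, lat] degrees.

    Uses a local equirectangular projection, which is reasonable only for
    small shapes such as individual buildings.
    """
    lon_ref, lat_ref = ring[0]
    cos_lat = math.cos(math.radians(lat_ref))
    # Project relative to the first vertex to keep the numbers small.
    points = [
        (
            EARTH_RADIUS_M * math.radians(lon - lon_ref) * cos_lat,
            EARTH_RADIUS_M * math.radians(lat - lat_ref),
        )
        for lon, lat in ring
    ]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical closed ring: a 0.0001 degree square at the equator.
square = [[0.0, 0.0], [0.0001, 0.0], [0.0001, 0.0001], [0.0, 0.0001], [0.0, 0.0]]
area = approx_area_m2(square)
```

Note that GeoJSON stores each position as longitude first, then latitude, which is why the ring is unpacked in that order.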
Will there be more data coming for other geographies?
They are already available.
External References
The building data are featured in a NYTimes article.
A Vector Tile implementation of the data is hosted by Esri.
Download links
State or district | Number of Buildings | Unzipped size |
---|---|---|
Alabama | 2,455,168 | 672.58 MiB |
Alaska | 111,042 | 30.00 MiB |
Arizona | 2,738,732 | 806.59 MiB |
Arkansas | 1,571,198 | 425.40 MiB |
California | 11,542,912 | 3.35 GiB |
Colorado | 2,185,953 | 619.88 MiB |
Connecticut | 1,215,624 | 324.20 MiB |
Delaware | 357,534 | 94.00 MiB |
District of Columbia | 77,851 | 22.52 MiB |
Florida | 7,263,195 | 2.01 GiB |
Georgia | 3,981,792 | 1.04 GiB |
Hawaii | 252,908 | 64.72 MiB |
Idaho | 942,132 | 259.43 MiB |
Illinois | 5,194,010 | 1.35 GiB |
Indiana | 3,379,648 | 920.20 MiB |
Iowa | 2,074,904 | 517.95 MiB |
Kansas | 1,614,406 | 428.38 MiB |
Kentucky | 2,447,682 | 663.98 MiB |
Louisiana | 2,173,567 | 600.69 MiB |
Maine | 758,999 | 187.84 MiB |
Maryland | 1,657,199 | 410.84 MiB |
Massachusetts | 2,114,602 | 566.87 MiB |
Michigan | 4,982,783 | 1.24 GiB |
Minnesota | 2,914,016 | 762.08 MiB |
Mississippi | 1,507,496 | 394.08 MiB |
Missouri | 3,190,076 | 840.28 MiB |
Montana | 773,199 | 200.45 MiB |
Nebraska | 1,187,234 | 302.72 MiB |
Nevada | 1,006,278 | 296.10 MiB |
New Hampshire | 577,936 | 146.40 MiB |
New Jersey | 2,550,308 | 681.55 MiB |
New Mexico | 1,037,096 | 291.54 MiB |
New York | 4,972,497 | 1.25 GiB |
North Carolina | 4,678,064 | 1.22 GiB |
North Dakota | 568,213 | 143.54 MiB |
Ohio | 5,544,032 | 1.42 GiB |
Oklahoma | 2,159,894 | 582.14 MiB |
Oregon | 1,873,786 | 545.94 MiB |
Pennsylvania | 4,965,213 | 1.23 GiB |
Rhode Island | 392,581 | 105.21 MiB |
South Carolina | 2,299,671 | 612.67 MiB |
South Dakota | 661,311 | 166.31 MiB |
Tennessee | 3,212,306 | 890.22 MiB |
Texas | 10,678,921 | 2.83 GiB |
Utah | 1,081,586 | 306.98 MiB |
Vermont | 351,266 | 87.92 MiB |
Virginia | 3,079,351 | 797.04 MiB |
Washington | 3,128,258 | 884.38 MiB |
West Virginia | 1,055,625 | 260.33 MiB |
Wisconsin | 3,173,347 | 817.06 MiB |
Wyoming | 386,518 | 99.32 MiB |
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Legal Notices
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
Privacy information can be found at https://privacy.microsoft.com/en-us/
Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.