Tree Segmentation

This is the code repository for the Tree Segmentation Project: an R pipeline for processing LiDAR point cloud data and delineating individual tree canopy polygons.

[Image: lastrees output]

Dependencies
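
The tutorial and pipeline below rely on a handful of R packages; a minimal install looks like this (EBImage, which lidR's watershed() function used later depends on, comes from Bioconductor):

install.packages(c("lidR", "rlas", "rgdal", "tictoc"))
# lidR's watershed() additionally requires EBImage from Bioconductor:
# BiocManager::install("EBImage")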

How to run

The R script that processes point cloud data and outputs tree canopy polygons is lidar_processing_pipeline.R.

There should be an input directory containing one or more las/laz files.

e.g.

files <- list.files(path = "/path/to/input_las_files", pattern = "\\.la[sz]$", full.names = TRUE, recursive = FALSE) # pattern is a regex, not a glob

There should also be a path to a directory that will contain the output shapefile(s) of tree canopies.

e.g.

outws <- "/path/to/output_shps"
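
A minimal sketch of how these two paths might be wired together (illustrative only; segment_trees_to_hulls() is a hypothetical helper standing in for the segmentation steps covered in the tutorial below):

for (f in files) {
  las <- readLAS(f)
  hulls <- segment_trees_to_hulls(las) # hypothetical wrapper around the tutorial steps
  rgdal::writeOGR(hulls, dsn = outws,
                  layer = tools::file_path_sans_ext(basename(f)),
                  driver = "ESRI Shapefile")
}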

Processing LiDAR point cloud data

This tutorial builds on the lidR tutorial Segment individual trees and compute metrics by exploring in depth the process of preparing the raw point cloud for tree segmentation.

Overview

Downloading data

Let's start with a las tile from the City of Vancouver containing a nice mixture of buildings and trees. The City of Vancouver has a really nice web interface:

City of Vancouver LiDAR Web Server

For this tutorial, we are going to download and work with tile 4810E_54560N.

Before we even unzip the downloaded file, let's inspect all of the available metadata to get a sense of how much we know about the data. Luckily, the web interface has a nice metadata page that highlights a few important features of the dataset.

Inspecting the point cloud data

Now we will begin inspecting the raw point cloud data using the R package lidR.

Import packages we will use in this tutorial

require(lidR)
require(rlas) # Necessary for writelax
require(rgdal) # Writing to shp or raster
require(tictoc) # for tic() toc() function

Let's read in the las file

data <- "/path/to/your/pointclouddata.las"
las <- readLAS(data) # Read in all of the data

and inspect the data

lascheck(las)
 Checking the data
  - Checking coordinates... ✓
  - Checking coordinates type... ✓
  - Checking attributes type... ✓
  - Checking ReturnNumber validity... ✓
  - Checking NumberOfReturns validity... ✓
  - Checking ReturnNumber vs. NumberOfReturns... ✓
  - Checking RGB validity... ✓
  - Checking absence of NAs... ✓
  - Checking duplicated points...
   ⚠ 6337 points are duplicated and share XYZ coordinates with other points
  - Checking degenerated ground points... ✓
  - Checking attribute population...
   ⚠ 'ScanDirectionFlag' attribute is not populated.
 Checking the header
  - Checking header completeness... ✓
  - Checking scale factor validity... ✓
  - Checking Point Data Format ID validity... ✓
  - Checking extra bytes attributes validity... ✓
  - Checking coordinate reference system... ✓
 Checking header vs data adequacy
  - Checking attributes vs. point format... ✓
  - Checking header bbox vs. actual content... ✓
  - Checking header number of points vs. actual content... ✓
  - Checking header return number vs. actual content... ✓
 Checking preprocessing already done 
  - Checking ground classification... yes
  - Checking normalization... no
  - Checking negative outliers...
   ⚠ 137970 points below 0
  - Checking flightline classification... yes

You can see that lascheck() provides useful quality control information about the LiDAR data.
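
For example, lascheck() flagged 6337 duplicated points above; if those matter for your analysis, lidR 2.x can drop them:

las <- lasfilterduplicates(las) # remove points sharing identical XYZ coordinates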

We can also get some basic information about the point cloud using

summary(las)
class        : LAS (LASF v1.2)
point format : 1
memory       : 3.7 Gb 
extent       : 481000, 482000, 5456000, 5457000 (xmin, xmax, ymin, ymax)
coord. ref.  : +proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
area         : 1 km²
points       : 47.36 million points
density      : 47.37 points/m²
names        : X Y Z gpstime Intensity ReturnNumber NumberOfReturns ScanDirectionFlag EdgeOfFlightline Classification Synthetic_flag Keypoint_flag Withheld_flag ScanAngleRank UserData PointSourceID 
File signature:           LASF 
File source ID:           0 
Global encoding:
 - GPS Time Type: Standard GPS Time 
 - Synthetic Return Numbers: no 
 - Well Know Text: CRS is GeoTIFF 
 - Aggregate Model: false 
Project ID - GUID:        00000000-0000-0000-0000-000000000000 
Version:                  1.2
System identifier:        LAStools (c) by rapidlasso GmbH 
Generating software:      las2las (version 181119) 
File creation d/y:        7/2019
header size:              227 
Offset to point data:     323 
Num. var. length record:  1 
Point data format:        1 
Point data record length: 28 
Num. of point records:    47360009 
Num. of points by return: 33908912 9530523 3165826 660943 85597 
Scale factor X Y Z:       0.01 0.01 0.01 
Offset X Y Z:             4e+05 5e+06 0 
min X Y Z:                481000 5456000 -397.71 
max X Y Z:                482000 5457000 308.65 
Variable length records: 
   Variable length record 1 of 1 
       Description: by LAStools of rapidlasso GmbH 
       Tags:
          Key 1024 value 1 
          Key 3072 value 3157 
          Key 3076 value 9001 
          Key 4099 value 9001 

Of particular interest are the projected coordinate system and the point density.

Now let's inspect the classes

sort(unique(las@data$Classification))

[1] 1 2 3 5 6 7 9

From this, we can see which classes are missing from this las tile. An inspection of the City of Vancouver LiDAR classification and the ASPRS classification specifications suggests that the classes are not aligned:

The City classes:

[Image: City Classes]

The ASPRS class specifications

[Image: ASPRS Classes]
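
For reference, under the ASPRS specification the codes present in this tile correspond to: 1 = Unclassified, 2 = Ground, 3 = Low Vegetation, 5 = High Vegetation, 6 = Building, 7 = Low Point (noise), and 9 = Water.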

This is unusual, so let's take a look at the classified point cloud data to see what is going on.

plot(las, color = "Classification")

[Image: Classified Point Cloud]

We can select individual classes to inspect them more closely

las_class <- lasfilter(las, Classification == 5)
plot(las_class)

[Image: Class 5 Trees]

After inspecting all of the classes, it appears that the LiDAR tiles are in fact classified to ASPRS classification standards. However, when observing class 5 (High Vegetation), it became apparent that there were several outliers we will need to remove.

[Image: Outliers]

Filtering point cloud data

Here we are going to filter out all of the classes except for our classes of interest

las <- readLAS(data, filter = "-keep_class 2 5") # Keep ground and high vegetation classes

Then, normalize the data so that ground points are centered on 0.

dtm <- grid_terrain(las, algorithm = knnidw(k = 8, p = 2)) # interpolate a DTM (1 m resolution by default)
las_normalized <- lasnormalize(las, dtm) # subtract ground elevation so ground sits at Z = 0

There is an excellent example of using a filter to remove points above the 95th percentile of height in the lidR documentation. This is how we implement the filter:

# Create a filter to remove points above 95th percentile of height
lasfilternoise <- function(las, sensitivity)
{
  p95 <- grid_metrics(las, ~quantile(Z, probs = 0.95), 10)
  las <- lasmergespatial(las, p95, "p95")
  las <- lasfilter(las, Z < p95*sensitivity)
  las$p95 <- NULL
  return(las)
}

las_denoised <- lasfilternoise(las_normalized, sensitivity = 1.2)

You can see the filter does a good job removing most outliers

Before filtering

[Image: Noisy Las]

After filtering

[Image: Denoised Las]
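
You can reproduce these before/after views yourself:

plot(las_normalized) # before filtering
plot(las_denoised)   # after filtering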

Generating a canopy height model

Now that we have classes isolated and outliers filtered, we can generate a canopy height model (CHM), which will be the basis for segmenting and classifying our trees. It is important to note that we are primarily interested in surface characteristics of the tree canopy. Therefore, it is not necessary to ensure that the points are uniformly distributed, as would be the case in analyses where the vertical point distribution matters, such as grid metrics.

There have been several good tutorials on generating perfect canopy height models, including this one from the authors of lidR and this one from Martin Isenburg.

We are going to use a pitfree CHM generated in lidR.

chm <- grid_canopy(las_denoised, 0.5, pitfree(c(0, 2, 5, 10, 15), c(3, 1.5), subcircle = 0.2)) # 0.5 m pit-free CHM

lidR provides a nice way to visualize raster elevation data in 3D.

plot_dtm3d(chm)

[Image: 3D CHM]

Our objective is to generate polygons of individual tree canopies (hulls). It is often helpful to apply a 3x3 or 5x5 median filter to smooth the canopy height model prior to tree detection. Applying a median filter helps define the boundary of the tree canopy and can lead to better results when delineating individual trees, especially in areas with a closed canopy.

Here a 5x5 moving window is used to apply the median filter:

ker <- matrix(1, 5, 5) # 5x5 window of equal weights
chm_s <- focal(chm, w = ker, fun = median) # focal() comes from the raster package

Individual tree detection

We are going to use a watershed algorithm for tree detection, with a height threshold of 4 m

algo <- watershed(chm_s, th_tree = 4) # ignore pixels below the 4 m height threshold
las_watershed <- lastrees(las_denoised, algo)

# remove points that are not assigned to a tree
trees <- lasfilter(las_watershed, !is.na(treeID))

# View the results
plot(trees, color = "treeID", colorPalette = pastel.colors(100))

[Image: las trees]

Great! An initial inspection of the tree segmentation shows positive results, so it's time to delineate tree canopies.

hulls  <- tree_hulls(trees, type = "concave", concavity = 2, func = .stdmetrics)

The individual tree canopy polygons (hulls) look great.

[Image: tree hulls]
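
To save the hulls to the output directory defined at the start, we can use writeOGR() from rgdal, which we loaded earlier (the layer name here is just an example):

writeOGR(hulls, dsn = outws, layer = "tree_hulls", driver = "ESRI Shapefile")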

An added bonus is that we also summarized point cloud metrics within each polygon by including func = .stdmetrics in the tree_hulls call. This allows us to do many things, such as quickly applying statistical filters, classifying trees using machine learning approaches, and visualizing individual tree attributes.

For example, the following map shows the maximum height (zmax) within each tree hull.

[Image: tree hulls zmax]
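
A minimal sketch of how such a map can be drawn with spplot() from the sp package (in lidR 2.x, tree_hulls() returns a SpatialPolygonsDataFrame, so its metrics can be plotted directly):

library(sp)
spplot(hulls, "zmax") # color each hull by its maximum height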

Summary

Every LiDAR-based project will be different -- point cloud data may range from a csv file of xyz coordinates to fully preprocessed and classified las data. However, the fundamentals of how we approach the project remain constant. Before any analysis is conducted, it is necessary to thoroughly identify abnormalities and errors in the LiDAR dataset and to understand how these will affect the analysis. For example:

Tree segmentation and delineation accuracy will vary based on forest cover type. Generally speaking, tree segmentation in open conifer forests will yield higher accuracy than in mixed, closed-canopy forest types. Hastings et al. (2020) provide a useful accuracy assessment of individual tree crown delineation (ITCD) algorithms in temperate forests.