Usage Instructions and Documentation
Introduction
This notebook documents the Traffic Density Index (TDI) analysis workflow. The process uses medium spatial-resolution PlanetScope imagery to detect image pixels likely to contain vehicles. This work was heavily inspired by the following paper:
- Chen, Y., Qin, R., Zhang, G., & Albanwan, H. (2021). Spatial Temporal Analysis of Traffic Patterns during the COVID-19 Epidemic by Vehicle Detection Using Planet Remote-Sensing Satellite Images. Remote Sensing, 13(2), Article 2. https://doi.org/10.3390/rs13020208
The notebooks are located in the `notebooks/tdi` folder in this repository:
- The `skywatch-api-notebook` is run first, and is used to query and download PlanetScope images for a desired area-of-interest (AOI) and time-of-interest (TOI).
- Once the images have been downloaded, the `tdi-notebook` is run to generate the TDI outputs.
Python Environment Setup
Before doing anything, the Python environment needs to be set up. The recommended method to build the environment is to use Anaconda (use of Mamba is strongly recommended, as it is much faster at building the environment than regular Anaconda). To build the environment:
- Open a new terminal in the `notebooks` folder.
- Using mamba, enter the following commands into the terminal (the environment name passed to `activate` must match the name used in `env create`):

```
mamba env create --name wb-spk-env --file environment.yml
mamba activate wb-spk-env
```
Notebook 1: skywatch-api-notebook
Initial Setup
There is some initial work required by the user prior to running the notebook. This initial work includes:
- Retrieve an API key from the Skywatch Earthcache website.
- Create a polygon file which outlines the image download AOI.
- This can be created in any standard desktop GIS program (e.g. QGIS/ArcGIS) and must be in a format readable by the geopandas package (e.g. geopackage, shapefile, GeoJSON).
- The Earthcache API allows retrieved images to be cropped to complex geometry, and this is a good way to reduce image download costs by only selecting the areas needed. However, it is recommended to include some buffer around the areas of interest to avoid missing important data due to image georegistration shifts.
- Note that for the Beitbridge study AOI, a geometry file has been created already, located here.
Input Query Parameters
Open Jupyter Lab by typing `jupyter lab` into the terminal. Inside Jupyter Lab, navigate to and open `skywatch-api-notebook.ipynb`.
At the top of the notebook (2nd code cell), specify the following variables:
- Put your Earthcache API key in the `api_key` variable.
- Set the `aoi_name` variable. This is the name of the area of interest, and will be appended to the names of various files output by the program.
- Set the `o_start_date` variable to the starting date for the image download query, formatted as a string in the format `"YYYY-MM-DD"`.
- Set the `o_end_date` variable to the ending date for the image download query, formatted as a string in the format `"YYYY-MM-DD"`.
- Set the `cc_thresh` variable to the desired cloud cover threshold to use for the query. Images with a cloud cover percentage exceeding this threshold will be omitted from the returned query results.
  - Note: prior to image download it is possible to apply a more stringent cloud cover threshold, so it is fine to use a higher value in `cc_thresh` for the initial query.
- Set the `coverage_thresh` variable to the desired AOI coverage percentage. Images which cover less of the AOI than the specified threshold will be omitted from the returned query results.
  - Note: at a later stage in the notebook it is possible to apply a more stringent AOI coverage threshold, so it is fine to use a higher value in `coverage_thresh` for the initial query.
- Set `output_graphs` to `True` to output files for the graphs generated in Stage 3; otherwise keep the default setting of `False`.
- Set the `aoi_file` variable to the path of the geometry file created in the Initial Setup section.
- Set the `out_path` variable. All outputs from the program will be nested under this directory.
- If desired, change the other variables (`outpath_search_df`, `outpath_download_df`, and `outpath_images`), or leave them at their default settings.
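Put together, the top of the parameter cell might look like the following sketch. All values are illustrative placeholders (the subfolder names shown follow the default layout described in the File Outputs section):

```python
# Illustrative values only -- replace with your own settings
api_key = "YOUR_EARTHCACHE_API_KEY"
aoi_name = "beitbridge"

# Query time-of-interest, as "YYYY-MM-DD" strings
o_start_date = "2020-01-01"
o_end_date = "2020-12-31"

# Loose thresholds for the initial query; stricter values can be
# applied later, prior to image download
cc_thresh = 50        # max cloud cover percentage
coverage_thresh = 60  # min AOI coverage percentage

output_graphs = False  # set True to write Stage 3 graph files

aoi_file = "path/to/aoi_polygon.gpkg"
out_path = "path/to/outputs"

# Output subfolders (default layout)
outpath_search_df = "search_queries"
outpath_download_df = "download_dataframes"
outpath_images = "images"
```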
Stage 1: Search API
The first stage of the notebook queries the Earthcache API using the specified AOI and TOI and returns the search results. For larger time periods, the API may not return full results; to compensate, time intervals longer than 90 days are split into separate queries. At the end of the process, the results of the separate queries are combined.
The image results are stored in a pandas dataframe. This dataframe is then serialized into a pickle file, allowing the results to be retrieved if the notebook is shut down and resumed later.
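The 90-day splitting described above could be implemented along these lines (`split_date_range` is a hypothetical helper for illustration, not necessarily the notebook's actual function):

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days=90):
    """Split [start, end] into consecutive sub-ranges of at most max_days."""
    ranges = []
    chunk_start = start
    while chunk_start <= end:
        # a chunk covers max_days calendar days, inclusive of both ends
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        ranges.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return ranges

# A one-year TOI becomes five queries of at most 90 days each
chunks = split_date_range(date(2020, 1, 1), date(2020, 12, 31))
```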
Stage 2: Filter Query DataFrame
In the first stage, all image results which cover at least 50% of the AOI and are within the TOI are returned. This includes images with up to 100% cloud cover, and may include multiple images on a specific date.
In Stage 2, the results are filtered down based on the AOI coverage and cloud cover thresholds specified in the `coverage_thresh` and `cc_thresh` variables.
Finally, if multiple images are present on a given day, a selection function is applied to pick the “best” image for that day. The selection criteria are as follows:
- If any of the images have significantly lower AOI coverage than the others, they are removed.
- If there are still multiple images remaining, the program checks if any of the images are from the Dove-R or SuperDove satellites, as these are superior to the original Dove-Classic satellites. If this is the case, then any Dove-Classic scenes are removed.
- If there are still multiple images remaining, then the program simply takes the one with the highest AOI coverage.
The filtered dataframe is once again serialized into a pickle file, allowing the notebook to be shut down and resumed later.
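The selection criteria can be sketched as a small pandas function. The column names (`aoi_coverage`, `satellite`) and the 10-point cutoff for “significantly lower” coverage are assumptions for illustration only:

```python
import pandas as pd

def select_best_image(day_group, coverage_gap=10):
    """Pick the 'best' image among those captured on the same day."""
    best = day_group
    # 1. Drop images with significantly lower AOI coverage than the best one
    best = best[best["aoi_coverage"] >= best["aoi_coverage"].max() - coverage_gap]
    # 2. Prefer Dove-R / SuperDove scenes over Dove-Classic
    newer = best[best["satellite"].isin(["dove-r", "superdove"])]
    if len(newer) > 0:
        best = newer
    # 3. Of what remains, take the highest AOI coverage
    return best.sort_values("aoi_coverage", ascending=False).iloc[0]

# Three same-day candidates: the low-coverage scene is dropped first,
# then the Dove-Classic scene loses to the SuperDove scene
df = pd.DataFrame({
    "aoi_coverage": [98.0, 97.0, 70.0],
    "satellite": ["dove-c", "superdove", "superdove"],
})
best = select_best_image(df)
```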
Stage 3: Graph Generation
In this stage, a number of interactive graphs may be generated to allow the user to explore and visualize the returned query results.
For example, the below figure displays the number of images per year for a test AOI at different levels of cloud cover, along with the estimated cost to download all the images.
While this stage is optional, it is helpful for understanding the query results and can inform the final image selection criteria used for image downloading.
Stage 4: Download Imagery
The last stage of this notebook is used to download the images. It is possible to specify different criteria for downloading images than were used in the original query: for instance, a narrower date range, a stricter cloud cover or AOI coverage threshold, or only select sensor types. Otherwise, the original query criteria can be reused as-is. The following parameters can be set by the user in Stage 4:
- `start_date_download`: Starting date for image downloading, formatted as a string in the format `"YYYY-MM-DD"`. If set to `None`, the value for `o_start_date` specified at the beginning of the notebook is used.
- `end_date_download`: Ending date for image downloading, formatted as a string in the format `"YYYY-MM-DD"`. If set to `None`, the value for `o_end_date` specified at the beginning of the notebook is used.
- `max_cc_download`: Cloud cover percentage threshold for image downloading. If set to `None`, the value for `cc_thresh` specified at the beginning of the notebook is used.
- `aoi_coverage_download`: Image AOI percent coverage threshold for image downloading. If set to `None`, the value for `coverage_thresh` set at the beginning of the notebook is used.
- `sat_list`: List of satellite types to download. Options are `["dove-c", "dove-r", "superdove"]`. If set to `None`, all 3 types are selected.
Once the final selection criteria are specified, the image dataframe is filtered down, and the program displays the total number of images to download and the total cost, as shown below:
Total number of images to download: 90. Total cost: $676.67 USD
Images are downloaded using the API by setting up image download “pipelines”. Each image will create a unique pipeline, which will show up in the Earthcache online dashboard. The pipeline kicks off image retrieval and processing, which is required before the images can be downloaded. Note that once a pipeline is created, you will be billed for the images, even if you do not download them locally.
Image download proceeds as follows:
- Image download pipelines are created for each image in the filtered dataframe. The IDs for each pipeline are appended to the dataframe.
- The dataframe with the appended pipeline IDs is saved to a pickle file, allowing it to be retrieved in case the notebook session terminates.
- Each pipeline may take up to an hour to finish processing. The program will loop through the pipelines and check their status. If some pipelines are still processing, the program waits for up to 30 minutes then repeats the process. This loop repeats using progressively smaller intervals until all pipelines achieve a “completed” status.
- Once all image pipelines are finished processing, the next step downloads the images locally. Both the image file and associated metadata are downloaded and placed in a directory named after the image ID.
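The status-checking loop in step 3 might be sketched as follows; `get_status` stands in for a hypothetical caller-supplied wrapper around the Earthcache pipeline-status endpoint, and the interval values are illustrative:

```python
import time

def wait_for_pipelines(pipeline_ids, get_status, max_wait=1800, min_wait=60):
    """Poll pipelines until all report 'complete', shortening the wait each pass.

    get_status: function mapping a pipeline ID to its current status string
    max_wait:   initial wait between passes, in seconds (30 minutes)
    min_wait:   floor for the progressively smaller wait intervals
    """
    pending = set(pipeline_ids)
    wait = max_wait
    while pending:
        # keep only the pipelines that have not finished processing yet
        pending = {pid for pid in pending if get_status(pid) != "complete"}
        if pending:
            time.sleep(wait)
            # progressively smaller intervals as pipelines finish
            wait = max(wait // 2, min_wait)
    return True
```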
The first notebook is now finished, and you may move on to the second notebook to generate the TDI outputs.
File Outputs
This section provides some examples of folders and files output by the program. What is illustrated here assumes the subfolders `outpath_search_df`, `outpath_download_df`, and `outpath_images` are left at their default settings.

```
out_path
├───search_queries
│   └───{aoi_name}_{o_start_date}_to_{o_end_date}_search_df.pkl    (serialized image search dataframe)
├───figures
│   ├───PlanetScope_Annual_Image_Count_and_Cost_{start_year}_to_{end_year}_{aoi_name}.html    (interactive plot)
│   └───PlanetScope_Annual_Image_Count_and_Cost_{start_year}_to_{end_year}_{aoi_name}.png     (static plot)
├───download_dataframes
│   └───{aoi_name}_{start_date_download}_to_{end_date_download}_download_df.pkl    (serialized image download dataframe)
└───images
    └───{aoi_name}_{start_date_download}_to_{end_date_download}
        ├───{Image_ID_1}
        │   ├───{Image_ID_1}_analytic.tif      (image file)
        │   └───{Image_ID_1}_metadata.json     (image metadata file)
        ├───{Image_ID_2}
        │   ├───{Image_ID_2}_analytic.tif
        │   └───{Image_ID_2}_metadata.json
        └───...
```
Notebook 2: tdi-notebook
Initial Setup
As with Notebook 1, there is some initial work required prior to running Notebook 2. Firstly, if images were downloaded in multiple batches (i.e. multiple date ranges were used), then all downloaded images must be placed under a common directory. It is alright if images are nested in various sub-directories, as long as all of the sub-directories are located under a common base directory. For example:
```
images
├───query_1
│   ├───q1_subfolder_1
│   │   └───q1_image1.tif
│   ├───q1_subfolder_2
│   │   └───q1_image2.tif
│   └───...
├───query_2
│   ├───q2_subfolder_1
│   │   └───q2_image1.tif
│   ├───q2_subfolder_2
│   │   └───q2_image2.tif
│   └───...
└───...
```
Secondly, the pickle files containing the image download dataframes created in Stage 4 of Notebook 1 must be placed together in a common directory. These contain the image information corresponding to your downloaded images. For example:
```
download_dataframes
├───query_1_dataframe.pkl
├───query_2_dataframe.pkl
└───query_3_dataframe.pkl
```

Lastly, a geometry polygon file must be created which outlines the roads and parking areas in the AOI within which the vehicles will be counted. Unlike the polygon file used for image downloading, this geometry file should be narrower and specific to the desired detection areas, to reduce the issue of false positives.
For the Beitbridge study area, an example geometry file has been created already, located here.
Setting Run Parameters
In Jupyter Lab, navigate to and open `tdi-notebook.ipynb`.
At the top of the notebook (2nd code cell), the run variables may be specified. Some must be set by the user, while others can be left set to the default settings.
- `aoi_name`: Name of the area of interest. Should match the AOI name used in Notebook 1. This name will be appended to certain files output by the program. Must be set by the user.
- `out_path`: Path to the top-level directory for the project. All other folders and files will be nested under this folder. Must be set by the user.
- `road_geom_file`: Path to the road and parking area geometry file. Must be set by the user.
- `img_path`: Path to the directory containing the image files. By default, a folder named `images` nested under the `out_path` directory. Leave set to the default or change if desired.
- `img_df_path`: Path to the directory containing the image download dataframes. By default, a folder named `download_dataframes` nested under the `out_path` directory. Leave set to the default or change if desired.
- `outpath_aligned`: Path where the co-aligned images from Stage 1 will be placed. The default is a folder called `processed_images` nested under `out_path`. Leave set to the default or change if desired.
- `outpath_tdi`: Directory where the generated TDI outputs will be placed. The default is a folder called `tdi_outputs` nested under `out_path`. Leave set to the default or change if desired.
- `band_subset`: Determines whether the vehicle detection algorithm uses all image bands or just the visible Blue, Green, Red, and Near-Infrared bands (BGRN). Options are `allbands` to use all of them, or `4band` to use only the BGRN bands. Note that this option only impacts 8-band SuperDove images; Dove-Classic and Dove-R sensors only have 4 bands, so both options are equivalent for them. If using images from multiple sensor types, set this option to `4band` for consistency. The default setting is `allbands`.
- `kernel_size`: Size (in pixels) of the kernel used by the tophat filtering process. Options are `3`, `5`, and `7`; the default value is `7`. Using a smaller kernel size reduces the size of the detected objects, which may reduce false-positive detections at the cost of reducing true positives as well. Leave set to the default or change if desired.
- `sieve_thresh`: Minimum object size (in pixels). Groupings of vehicle detection pixels smaller than this threshold are removed, which is useful for reducing noise in the detection layers. Leave set to the default or change if desired.
- `min_thresh`: Minimum threshold applied to the tophat filter layers for vehicle detection. During the vehicle detection stage, the program applies iteratively smaller thresholds to the tophat-filtered layers until it reaches the threshold value specified here; the default value is `0.015`. Using a larger threshold will reduce both true-positive and false-positive detections, while a smaller threshold will have the opposite effect. Leave set to the default or change if desired.
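A filled-in parameter cell for Notebook 2 might look like the following sketch (the paths and the `sieve_thresh` value are illustrative placeholders, not documented defaults):

```python
import os

# Illustrative values only -- replace with your own settings
aoi_name = "beitbridge"  # must match the AOI name used in Notebook 1
out_path = "path/to/outputs"
road_geom_file = "path/to/road_polygons.gpkg"

# Default folder layout, nested under out_path
img_path = os.path.join(out_path, "images")
img_df_path = os.path.join(out_path, "download_dataframes")
outpath_aligned = os.path.join(out_path, "processed_images")
outpath_tdi = os.path.join(out_path, "tdi_outputs")

# Detection settings
band_subset = "allbands"  # use "4band" when mixing sensor types
kernel_size = 7           # tophat kernel size in pixels: 3, 5, or 7
sieve_thresh = 3          # minimum object size in pixels (illustrative value)
min_thresh = 0.015        # final tophat detection threshold (default)
```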
Stage 1: Align Images
In the first stage of the process, the PlanetScope images are all resampled so their pixels line up through time. Examination of the images showed that data downloaded directly from the Skywatch API can have small pixel misalignments from image to image. This analysis workflow requires all pixels from each image to be perfectly aligned, which is what this stage accomplishes.
The aligned images are placed in the folder specified in the `outpath_aligned` variable.
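The notebook's exact alignment method is not described here, but the core resampling step can be illustrated with `scipy`, assuming the image-to-image offset has already been measured (e.g. from the image geotransforms). `align_to_reference` is a hypothetical helper name:

```python
import numpy as np
from scipy import ndimage

def align_to_reference(image, col_offset, row_offset):
    """Shift an image of shape (bands, rows, cols) by a known pixel offset so
    its grid lines up with the reference grid; offsets may be fractional."""
    return np.stack([
        # order=1 -> bilinear resampling for sub-pixel offsets
        ndimage.shift(band, (row_offset, col_offset), order=1, mode="nearest")
        for band in image
    ])

# Example: an image displaced one whole pixel down is shifted back up
img = np.zeros((1, 5, 5))
img[0, 3, 2] = 1.0
aligned = align_to_reference(img, col_offset=0, row_offset=-1)
```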
Stage 2: Computing Reference Median Image
In the second stage of the program, a reference image is created. This reference image is used for vehicle detection in Stage 3.
The `reference image` is the pixel-wise and band-wise median calculated from the full PlanetScope image time series. The result is a synthetic image with the same number of pixels and bands as the original images, where each pixel contains the median band values for that pixel location over the entire time series.
The `median reference image` is used to represent a “vehicle-free” view of the roads and parking lots in the AOI. Since vehicles are mobile assets, their locations are expected to differ from image to image; over a long enough time series, the median value should therefore represent the invariant features in the images, i.e. the underlying pavement. Please refer to the paper cited in the Introduction for more details.
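With NumPy, the median reference image reduces to a single reduction over the time axis of the co-aligned image stack (random data stands in for real imagery here):

```python
import numpy as np

# Time series of co-aligned images: shape (time, bands, rows, cols)
rng = np.random.default_rng(0)
stack = rng.random((24, 4, 128, 128), dtype=np.float32)

# Pixel-wise, band-wise median over time: a synthetic "vehicle-free"
# reference image with the same bands and pixel grid as the inputs
median_image = np.median(stack, axis=0)
```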
Important Caveat: Since the `reference image` assumes that non-vehicle pixels are unchanging, any major, long-term changes to the roads and parking areas (such as construction) may violate this assumption. It is therefore recommended to limit the range of dates covered by the images in any given run, for example by processing images in batches of 6-month or 1-year intervals.
Stage 3: Tophat Filtering, Vehicle Detection, & Computing TDI
In the final stage of the process, the image analysis and vehicle detection are performed, and the TDI is calculated.
The core of the process is based on subtracting a target image from the generated `median reference image`, then applying a multi-directional tophat filter to extract small objects from the calculated difference image. To learn more about the process, refer to the paper cited in the Introduction, or see here.
The program iterates through all images in the time series, applying the following processes:
- Each image band is subtracted from the corresponding band in the `reference image`. This creates the `difference image`.
- A multi-directional tophat filter is applied to each image band in the `difference image`. This creates the `multi-band tophat image`.
- A `single-band tophat image` is generated by taking the maximum value across all bands in the `multi-band tophat image`.
- A thresholding method is applied to the `single-band tophat image`, creating the `vehicle detection image`. Pixels with a tophat value exceeding the threshold are given a value of `1` in the `vehicle detection image`, indicating a detection, while pixels with tophat values less than the threshold are given a value of `0`, indicating no detection. The `min_thresh` variable determines the sensitivity of the vehicle detection thresholding.
- Groupings of detection pixels in the `vehicle detection image` assigned a value of `1` are sieved. Detection pixels bordering other detection pixels are grouped together into objects, and the number of pixels in each object is counted. Objects containing fewer pixels than the value specified in the `sieve_thresh` variable are removed, with all of their pixels reassigned a value of `0`.
- The `vehicle detection image` is masked using the `road_geom_file` polygons. Any pixels outside the polygons are masked out and assigned a value of `0`. We will refer to this image as the `masked detection image`.
- The `Traffic Density Index (TDI)` is calculated from the `masked detection image` as follows:
\[ TDI_i = \frac{N_{(v, i)}}{N_{(r, i)}} \times S \]
Where \({N_{(v, i)}}\) is the total number of detection pixels in the `masked detection image` for PlanetScope image \(i\), \({N_{(r, i)}}\) is the total number of unmasked pixels for PlanetScope image \(i\), and \(S\) is a scale factor to avoid numerically small `TDI` values (currently set to \(S = 100\)).
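The per-image steps above can be sketched end-to-end as follows. This is a simplified stand-in, not the notebook's implementation: a plain white tophat replaces the multi-directional filter, and `road_mask` is assumed to be a precomputed raster of the `road_geom_file` polygons (1 inside, 0 outside):

```python
import numpy as np
from scipy import ndimage

def compute_tdi(image, reference, road_mask,
                min_thresh=0.015, sieve_thresh=3, kernel_size=7, scale=100):
    """Sketch of the per-image detection steps; arrays are (bands, rows, cols)."""
    # 1. Difference image: target subtracted from the reference
    diff = reference - image

    # 2-3. Tophat-filter each band (plain white tophat as a stand-in for the
    # multi-directional filter), then take the per-pixel max across bands
    tophat = np.stack([ndimage.white_tophat(b, size=kernel_size) for b in diff])
    single_band = tophat.max(axis=0)

    # 4. Threshold into a binary vehicle detection image
    detections = (single_band > min_thresh).astype(np.uint8)

    # 5. Sieve: remove connected groups smaller than sieve_thresh pixels
    labels, n = ndimage.label(detections)
    sizes = ndimage.sum(detections, labels, range(1, n + 1))
    for obj_id, size in enumerate(sizes, start=1):
        if size < sieve_thresh:
            detections[labels == obj_id] = 0

    # 6. Mask to the road/parking polygons
    masked = detections * road_mask

    # 7. TDI = detection pixels / unmasked pixels, times the scale factor
    return masked.sum() / road_mask.sum() * scale
```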
After the `TDI` has been calculated for all images, the TDI values are appended to the image dataframe, and the dataframe is output as a CSV file which contains the image information as well as the corresponding TDI value for each image.

As a final step in the process, a time series graph showing `TDI` over time can also be generated.
File Outputs
This section provides some examples of folders and files output by the program. What is illustrated here assumes the output subfolder names are left at their default settings.

```
out_path
├───processed_images
│   ├───{Image_ID_1}_aligned.tif    (co-aligned image file)
│   ├───{Image_ID_2}_aligned.tif
│   ├───...
│   └───median_image.tif            (reference median image file)
└───tdi_outputs
    ├───{aoi_name}_TDI_DF_{first_image_date}_to_{last_image_date}.csv       (image information and corresponding TDI values)
    ├───{aoi_name}_TDI_Chart_{first_image_date}_to_{last_image_date}.html   (interactive TDI time series plot)
    ├───{aoi_name}_TDI_Chart_{first_image_date}_to_{last_image_date}.png    (static TDI time series plot)
    ├───tophat_rasters
    │   ├───{Image_ID_1}_vehicle_detection_raster.tif    (tophat filter raster)
    │   ├───{Image_ID_2}_vehicle_detection_raster.tif
    │   └───...
    └───detection_rasters
        ├───{Image_ID_1}_vehicle_detection_raster.tif    (masked vehicle detection raster)
        ├───{Image_ID_2}_vehicle_detection_raster.tif
        └───...
```
A small set of demonstration files is included in this repository in the following directory: `data/tdi_demo_files`.