SIMILARITY ASSESSMENT OF LINEAR HYDROGRAPHIC FEATURES USING HIGH PERFORMANCE COMPUTING
L. V. Stanislawski, J. Wendel, E. Shavers
U.S. Geological Survey, Center of Excellence for Geospatial Information Science, 1400 Independence Road, Rolla, Missouri 65401, United States (lstan, jwendel, eshavers)@usgs.gov
University of Illinois at Urbana-Champaign, CyberGIS Center, 2046 Natural History Building M-C 150, 1301 W. Green Street, Urbana, Illinois 61801, United States firstname.lastname@example.org
KEY WORDS: line similarity assessment, hydrography, coefficient of line correspondence, National Hydrography Dataset, NHD
This work discusses a current open source implementation of a line similarity assessment workflow that compares elevation-derived drainage lines with the surface-water flow network of the high-resolution National Hydrography Dataset (NHD). The process identifies matching and mismatching lines in each dataset. This helps focus subsequent validation on the areas of the NHD that most need updates.
The National Hydrography Dataset (NHD) is a comprehensive vector dataset of surface-water features within the United States (U.S. Geological Survey 2000). It is often used to support hydrologic and watershed research (Liu et al. 2016, Maceyka and Hansen 2016, Schneider et al. 2017, Sheng et al. 2007, Vanderhoof et al. 2017, Wu and Lane 2017). Two NHD datasets are available. The medium-resolution NHD, compiled from 1:100,000-scale source data, is a legacy dataset that the U.S. Geological Survey no longer maintains. The high-resolution NHD (NHD HR), compiled from 1:24,000-scale or larger source data, is the most up-to-date and detailed hydrography dataset for the nation (U.S. Geological Survey 2018). Multiple efforts are in progress to improve the accuracy of the NHD HR, both to update the NHD and to enhance its usability. For instance, drainage lines or other hydrographic features may be derived from a recent digital elevation model (DEM), lidar, or other data to update the content of the NHD HR (Poppenga et al. 2013, Stanislawski et al. 2016, Vanderhoof et al. 2017).
Newly derived hydrographic content must be validated by comparing it with existing data or other independent sources, such as high-resolution aerial photography and other imagery, or field data. This paper describes an automated process for identifying matching and mismatching lines between two sets of lines that represent the same kind of features. In this case, elevation-derived drainage lines are compared with the linear features of the NHD HR surface-water flow network. The process identifies the matching and mismatching lines in each dataset. These locations show where subsequent verification efforts may be focused and where NHD content may be improved.
The process was first employed by Stanislawski and others (2016). It computes the Coefficient of Line Correspondence (CLC) similarity metric for each pair of compared line datasets. It is typically run on subbasin datasets delineated by eight-digit hydrologic unit codes (HUC8), each containing roughly 10,000 to 100,000 linear features. Early versions of the process were implemented as ArcGIS® tools with Python. They required between 10 minutes and an hour per subbasin, depending on subbasin size and machine capacity. Because the United States contains more than 2,000 subbasins, processing the whole nation this way is cumbersome. This paper describes a new implementation of the workflow that uses Python, C, and open source tools with parallel processing in a Linux high-performance computing environment.
In brief, the CLC computation involves the following steps:

1) computing a line-density raster for each of the two sets of lines being compared;
2) differencing the two line-density rasters and computing a confidence interval to identify cells whose difference is not significantly different from zero (i.e., matching cells);
3) also treating as matching the cells within polygonal water features that contain more than a threshold amount of lines from both datasets; and
4) identifying the parts of the lines that lie within matching or mismatching cells.

Cell size depends on the accuracy of the input lines and is set to twice their positional accuracy. The CLC is the total length of matching lines from both datasets divided by the total length of all lines in both datasets. That is, CLC = (M_A + M_B) / (T_A + T_B), where M_A and M_B are the matching line lengths in datasets A and B, and T_A and T_B are their total line lengths.
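The cell-size rule, the matching-cell test of step 2, and the CLC ratio can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code. The confidence interval is assumed to be plus or minus z standard deviations of the density differences around zero (z = 1.96 for roughly 95%). Only cells containing lines from at least one dataset enter that statistic. Step 3, the polygonal water-feature rule, is omitted. The function names `cell_size`, `matching_cells`, and `clc` are hypothetical.

```python
import numpy as np


def cell_size(positional_accuracy):
    """Cell size is twice the positional accuracy of the input lines."""
    return 2.0 * positional_accuracy


def matching_cells(density_a, density_b, z=1.96):
    """Flag cells whose density difference is not significantly different from zero.

    Assumed test (hypothetical): |difference| <= z * (standard deviation of the
    differences), computed over cells that contain lines from at least one dataset.
    """
    a = np.asarray(density_a, dtype=float)
    b = np.asarray(density_b, dtype=float)
    diff = a - b
    occupied = (a > 0) | (b > 0)
    if not occupied.any():
        return occupied  # no lines at all, so no cell can match
    sd = diff[occupied].std()
    return occupied & (np.abs(diff) <= z * sd)


def clc(matched_a, total_a, matched_b, total_b):
    """Coefficient of Line Correspondence: matching length over total length of both datasets."""
    total = total_a + total_b
    return (matched_a + matched_b) / total if total > 0 else 0.0
```

For example, with a hypothetical positional accuracy of 10 m, `cell_size(10)` gives 20 m cells. If 80 of 100 km of drainage lines and 70 of 100 km of NHD HR lines fall in matching cells, `clc(80, 100, 70, 100)` returns 0.75.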
The line-density rasterization is performed by a custom C program. For input, output, and projection operations, it uses the C application programming interface of the Open Source Geospatial Foundation (OSGeo) Geospatial Data Abstraction Library (GDAL, http://gdal.org). Input vector data are stored in shapefiles. For each cell, the program computes the density of lines within a circle centered on that cell. The user specifies the circle radius, which controls the smoothness of the resulting raster. The remaining computations are performed by custom Python procedures. These use GDAL, through its Python bindings, and other open source libraries (Shapely, NumPy, scikit-image). Links to the C, C++, and Python libraries are managed through a virtual environment. A line tracking algorithm identifies the segments of lines that intersect raster cells. This determines which parts of the linear features pass through matching and mismatching cells.
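The circle-based density and the split of line length into matching and mismatching parts can be approximated with Shapely. The sketch below is an assumption-laden stand-in, not the C program or the authors' line tracking algorithm.

- It divides the length of line inside each cell's circle by the circle's area. The exact density units used in the process are not stated.
- It does not track lines cell by cell. Instead, it intersects the lines with the square footprint of each matching cell, which is simpler but slower.

`line_density` and `matched_length` are hypothetical names. The raster is assumed to be north-up, with its upper-left corner at (x0, y0).

```python
import numpy as np
from shapely.geometry import MultiLineString, Point, box


def line_density(lines, x0, y0, cell, nrows, ncols, radius):
    """Line length inside a circle centered on each cell, divided by the circle area.

    lines: list of coordinate sequences, e.g. [[(x1, y1), (x2, y2)], ...].
    Assumes a north-up raster with its upper-left corner at (x0, y0).
    """
    geom = MultiLineString(lines)
    density = np.zeros((nrows, ncols))
    for r in range(nrows):
        for c in range(ncols):
            center = Point(x0 + (c + 0.5) * cell, y0 - (r + 0.5) * cell)
            circle = center.buffer(radius)  # polygon approximating the circle
            density[r, c] = geom.intersection(circle).length / circle.area
    return density


def matched_length(lines, match_mask, x0, y0, cell):
    """Return (length inside matching cells, total length) for one dataset.

    Brute-force stand-in for a line tracking algorithm: intersect the lines
    with the square footprint of every cell flagged in match_mask.
    """
    geom = MultiLineString(lines)
    matched = 0.0
    for r, c in zip(*np.nonzero(match_mask)):
        square = box(x0 + c * cell, y0 - (r + 1) * cell,
                     x0 + (c + 1) * cell, y0 - r * cell)
        matched += geom.intersection(square).length
    return matched, geom.length
```

A line lying exactly on a shared cell edge would be counted in both adjacent cells by this sketch. A production version would need to handle that case, and would use a spatial index rather than testing every cell.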
The software is deployed on a 12-node Linux cluster in which each node has 20 processing cores and 128 gigabytes of RAM. Fast access to file storage is provided by a parallel shared Lustre file system on a high-speed InfiniBand network. On this system, the open source version of the CLC program has processed individual subbasins in less than two minutes each on a single processing core. Simultaneous processing of up to 240 subbasin datasets (12 nodes × 20 cores), each on a single core, is being tested. Computational resources (nodes, processors, memory, processing time, etc.) and job execution are managed through the Slurm Workload Manager.
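The benefit of running one subbasin per core can be estimated with simple arithmetic. The sketch assumes exactly 2,000 subbasins, identical per-subbasin runtimes, and no file-system or scheduler contention. It therefore gives an idealized bound, not a measured result.

The helper `subbasin_for_task` is hypothetical. It shows one way a Python driver could select its subbasin inside a Slurm job array (for example, one submitted with `sbatch --array`). It reads the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for each array task.

```python
import math
import os


def wall_clock_minutes(n_subbasins, minutes_each, n_cores):
    """Idealized makespan: subbasins run in waves of up to n_cores independent jobs.

    Assumes identical runtimes and no I/O or scheduling contention.
    """
    waves = math.ceil(n_subbasins / n_cores)
    return waves * minutes_each


def subbasin_for_task(huc8_codes, task_id=None):
    """Hypothetical helper: pick the subbasin assigned to one Slurm array task."""
    if task_id is None:
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return huc8_codes[task_id]


if __name__ == "__main__":
    cores = 12 * 20  # 12 nodes x 20 cores
    print(wall_clock_minutes(2000, 60, 1) / 60)  # serial hours at 60 min per subbasin
    print(wall_clock_minutes(2000, 10, 1) / 60)  # serial hours at 10 min per subbasin
    print(wall_clock_minutes(2000, 2, cores))    # parallel minutes at 2 min per subbasin
```

Under these assumptions, 2,000 subbasins at 10 to 60 minutes each take roughly 333 to 2,000 hours when run one at a time. By contrast, 240 simultaneous two-minute jobs need 9 waves, or about 18 minutes.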
Liu, Y.Y., D.R. Maidment, D.G. Tarboton, X. Zheng, A. Yildirim, N.S. Sazib, and S. Wang, 2016. A CyberGIS approach to generating high-resolution height above nearest drainage (HAND) raster for national flood mapping. The Third International Conference on CyberGIS and Geospatial Data Science, July 26–28, 2016, Urbana, Illinois. http://dx.doi.org/10.13140/RG.2.2.24234.41925/1
Maceyka, A., and W.F. Hansen, 2016. Enhancing hydrologic mapping using lidar and high resolution aerial photos on the Francis Marion National Forest in coastal South Carolina. In: Stringer, C.E., K.W. Krauss, and J.S. Latimer, eds., Headwaters to estuaries: advances in watershed science and management. Proceedings of the Fifth Interagency Conference on Research in the Watersheds, March 2–5, 2015, North Charleston, South Carolina. e-Gen. Tech. Rep. SRS-211. Asheville, NC: U.S. Department of Agriculture Forest Service, Southern Research Station. 302 p.
Poppenga, S.K., D.B. Gesch, and B.B. Worstell, 2013. Hydrography change detection: The usefulness of surface channels derived from LiDAR DEMs for updating mapped hydrography. Journal of the American Water Resources Association 49(2): 371–389. doi:10.1111/jawr.12027.
Schneider, A., A. Jost, C. Coulon, M. Silvestre, S. Théry, and A. Ducharne, 2017. Global-scale river network extraction based on high-resolution topography and constrained by lithology, climate, slope, and observed drainage density. Geophysical Research Letters 44: 2773–2781. doi:10.1002/2016GL071844.
Sheng, J., J.P. Wilson, N. Chen, J.S. Devinny, and J.M. Sayre, 2007. Evaluating the quality of the National Hydrography Dataset for watershed assessments in metropolitan regions. GIScience & Remote Sensing 44(3): 283–304. doi:10.2747/1548-1603.44.3.283.
Stanislawski, L.V., B.P. Buttenfield, and A. Doumbouya, 2015. A rapid approach for automated comparison of independently derived stream networks. Cartography and Geographic Information Science 42(5): 435–448. doi:10.1080/15230406.2015.1060869.
U.S. Geological Survey, 2000. The National Hydrography Dataset: Concepts and Contents (February 2000). United States Geological Survey. https://nhd.usgs.gov/chapter1/chp1_data_users_guide.pdf, last accessed March 1, 2018.
U.S. Geological Survey, 2018. Hydrography: NHDPlus High Resolution, National Hydrography Dataset, Watershed Boundary Dataset. United States Geological Survey. https://nhd.usgs.gov/index.html, last accessed March 1, 2018.
Vanderhoof, M.K., H.E. Distler, D.T.G. Mendiola, and M. Lang, 2017. Integrating Radarsat-2, lidar, and Worldview-3 imagery to maximize detection of forested inundation extent in Delmarva Peninsula, USA. Remote Sensing 9(2): 105. doi:10.3390/rs9020105.
Wu, Q., and C.R. Lane, 2017. Delineating wetland catchments and modelling hydrologic connectivity using lidar and aerial imagery. Hydrology and Earth System Sciences 21: 3579–3595. doi:10.5194/hess-21-3579-2017.