GIS-ential Knowledge with Austin Jennings: GIS4930 - Data Quality

We kicked off the Special Topics in GIS course with a lab focused on data accuracy and precision, two important concepts for evaluating GPS data quality. In this exercise, we worked with a set of 50 waypoints collected using a handheld GPS unit. The goal was to figure out how tightly the points clustered together (precision) and how close they were to a known reference point (accuracy). Using ArcGIS Pro, we calculated both horizontal and vertical precision, created buffer zones to visualize the spread of the data, and compared our results to the reference point. To wrap up, we explored error metrics and built a cumulative distribution function (CDF) to get a clearer picture of how GPS errors are distributed across the dataset.

Horizontal Accuracy: 3.24 m
Horizontal Precision (68%): 4.5 m

Horizontal precision is a measure of how closely repeated GPS measurements agree with one another. In this lab, it was calculated as the distance from the average waypoint location within which 68% of the collected waypoints fall.
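For anyone who wants to reproduce the precision number outside ArcGIS Pro, here is a minimal Python sketch. It assumes the waypoints are already in a projected coordinate system measured in meters (such as UTM). It averages the points, measures each point's distance from that average, and takes the 68th percentile of those distances. The function name `horizontal_precision` and the sample coordinates are hypothetical, not the lab's data. The linear-interpolation percentile is also just one convention; picking a value straight from a sorted list can give a slightly different number.

```python
import math
import statistics


def horizontal_precision(points, pct=68):
    """Distance (m) from the average position within which `pct` percent of points fall.

    Assumes `points` is a list of (x, y) tuples in a projected, meter-based
    coordinate system and that 1 <= pct <= 99.
    """
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    # Distance of every waypoint from the average waypoint location.
    dists = [math.hypot(x - mean_x, y - mean_y) for x, y in points]
    # 99 cut points with linear interpolation; index pct - 1 is the pct-th percentile.
    return statistics.quantiles(dists, n=100, method="inclusive")[pct - 1]


# Hypothetical waypoints (UTM-style meters), for illustration only.
waypoints = [
    (500010.0, 3300020.0),
    (500012.0, 3300018.0),
    (500008.0, 3300021.0),
    (500011.0, 3300024.0),
    (500009.0, 3300017.0),
]
print(f"68% precision: {horizontal_precision(waypoints):.2f} m")
```

Swapping in `pct=50` or `pct=95` gives the radii for other buffer rings around the average location.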

Horizontal accuracy measures how close the average GPS location is to a known reference point. In this lab, it was determined as the distance between the average waypoint location and a surveyed reference point.
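Accuracy can be sketched the same way: average the waypoints, then measure the straight-line distance from that average to the reference point. This again assumes projected coordinates in meters; `horizontal_accuracy`, the sample waypoints, and the reference coordinates are all hypothetical.

```python
import math
import statistics


def horizontal_accuracy(points, reference):
    """Distance (m) between the average waypoint position and a reference point.

    Assumes (x, y) tuples in a projected, meter-based coordinate system.
    """
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    return math.hypot(mean_x - reference[0], mean_y - reference[1])


# Hypothetical inputs in projected meters; not the lab's coordinates.
waypoints = [(500010.0, 3300020.0), (500012.0, 3300018.0), (500008.0, 3300022.0)]
reference = (500013.0, 3300024.0)
print(f"accuracy: {horizontal_accuracy(waypoints, reference):.2f} m")
```

Note that accuracy only looks at the average position, so a widely scattered (imprecise) set of points can still score well here if the scatter is centered on the reference.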

In simple terms:

Precision = consistency (how tightly grouped the points are)
Accuracy = closeness to the “true” location

After finishing our GIS work in ArcGIS Pro, we switched to Excel to calculate detailed error metrics for the GPS points. Using formulas, we compared each collected point to a benchmark location and calculated the distance error for every point. From these values, we determined statistics like the minimum, maximum, mean, and percentiles.
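The spreadsheet steps translate directly into code. In the sketch below, each point's error is the straight-line distance to the benchmark (the same calculation as an Excel formula like =SQRT((X-Xb)^2+(Y-Yb)^2), where X, Y, Xb, and Yb are placeholder cell names), and the percentiles use linear interpolation, which matches Excel's PERCENTILE.INC. The function name `error_summary` and all coordinates are hypothetical, and the points are assumed to be in projected meters.

```python
import math
import statistics


def error_summary(points, benchmark):
    """Summary statistics of each point's horizontal distance from a benchmark (m)."""
    bx, by = benchmark
    errors = [math.hypot(x - bx, y - by) for x, y in points]
    # Linear-interpolation percentiles (PERCENTILE.INC-style); needs at least 2 points.
    cuts = statistics.quantiles(errors, n=100, method="inclusive")
    return {
        "min": min(errors),
        "max": max(errors),
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
        "p68": cuts[67],
        "p90": cuts[89],
        "p95": cuts[94],
    }


# Hypothetical benchmark and points in projected meters.
benchmark = (500000.0, 3300000.0)
points = [
    (500003.0, 3300004.0),  # 5 m away
    (500006.0, 3300008.0),  # 10 m away
    (500000.0, 3300001.0),  # 1 m away
    (500000.0, 3300002.0),  # 2 m away
]
for name, value in error_summary(points, benchmark).items():
    print(f"{name:>6}: {value:.2f} m")
```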

One key metric we calculated was the Root Mean Square Error (RMSE): the square root of the average of the squared point errors, which condenses the error across the whole dataset into a single number. In this lab, our RMSE was 3.06 meters, meaning the GPS points were typically about three meters from the benchmark location. Because each error is squared before averaging, RMSE gives extra weight to large errors and is always at least as large as the plain mean error. RMSE is especially useful for quickly evaluating and comparing the overall quality of GPS data.
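Written out, RMSE for n point errors e1 through en is sqrt((e1² + e2² + ... + en²) / n); in Excel, something like =SQRT(SUMSQ(range)/COUNT(range)) would do it, with "range" standing in for the error column. Here is a minimal Python version, using made-up error values chosen to show how one large error pulls RMSE above the mean.

```python
import math
import statistics


def rmse(errors):
    """Root mean square error: square each error, average, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical per-point errors in meters (not the lab's values).
errors = [1.0, 2.0, 5.0, 10.0]
print(f"mean error: {statistics.fmean(errors):.2f} m")
print(f"RMSE:       {rmse(errors):.2f} m")
```

For these made-up values the mean is 4.50 m while the RMSE is about 5.70 m, because the single 10 m error counts as 100 once it is squared.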

The Cumulative Distribution Function (CDF) scatter plot can be used to visually determine data metrics, such as the minimum, median, and maximum error, as well as percentiles (68%, 90%, and 95%). However, it cannot be used to directly determine the mean or RMSE.
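The points on a CDF scatter plot come from sorting the errors and pairing each one with the percentage of points at or below it. The sketch below uses the simple rank/n plotting position, which is an assumption (the lab's spreadsheet may have used a slightly different convention), and `empirical_cdf` and `read_error_at` are hypothetical helper names. Reading a percentile off the curve means finding the first point that reaches that cumulative percentage, which is why a value read from the plot can differ a little from an interpolated percentile.

```python
def empirical_cdf(errors):
    """(error, cumulative percent) pairs, sorted by error, for a CDF scatter plot."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, 100.0 * (i + 1) / n) for i, e in enumerate(ordered)]


def read_error_at(cdf, pct):
    """Smallest error whose cumulative percent reaches `pct`, like reading the plot."""
    for err, cum_pct in cdf:
        if cum_pct >= pct:
            return err
    return cdf[-1][0]


# Hypothetical errors in meters.
cdf = empirical_cdf([5.0, 10.0, 1.0, 2.0])
for err, cum_pct in cdf:
    print(f"{err:6.2f} m  {cum_pct:6.1f}%")
print("min:", cdf[0][0], "max:", cdf[-1][0], "68%:", read_error_at(cdf, 68))
```

Nothing in these pairs gives the mean or RMSE directly; both need the sum of all the error values, not a single position on the curve.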

The CDF provides a clearer picture of the dataset than individual metrics alone because it shows how all of the errors are distributed. It helps identify whether most GPS points have similarly small errors or whether a few outliers with much larger errors stretch the curve out into a long “tail.” This insight goes beyond summary numbers and helps in evaluating the overall quality of the GPS data.

Wednesday, September 3, 2025

GIS4930 - Data Quality - Fundamentals
