Are you interested in taking your counting project to the next level? Use TRAFx Insights to help you get there.
Using big data to understand trail use: Strava
Version July 27, 2023, Jacob Herrero (MEDes.), TRAFx Research
In this updated article I introduce two Strava Inc. software tools relevant to those who plan, manage or advocate for trails and trail systems. Strava Global Heatmap is free to all, and Strava Metro is free to qualified organizations. First, I explore the opportunities and limitations associated with Global Heatmap, using trails in Canmore, Canada, as an example, while recognizing its primary purpose is to help people find popular cycling, running and other routes. I then introduce Strava Metro, a tool that explicitly targets the planning needs of government organizations.
I wrote the first version of this article in 2016, soon after Strava Global Heatmap and Strava Metro were released. This updated 2023 version includes some key insights researchers and others have gained since then about Strava data. Particularly, over the intervening years, researchers and practitioners have more thoroughly identified limitations and biases associated with Strava data when used for planning in the public domain. Further, despite Strava providing a good “no GIS…no technical experience required” user interface for Strava Metro, it is increasingly clear that a solid understanding of data science — and indeed social science — is required to avoid misleading conclusions. Overall, Strava data should be used with greater care and caution than originally thought, despite the total number of Strava GPS data points increasing from 1 trillion to 13 trillion globally since 2016. Strava data has become impressively “big” — the true definition of big data — but not necessarily better for planning purposes.
In this article the term “trail” is interchangeable with path, route or track.
“One thing that the process made clear is that rather than replacing conventional bike data sources and count programs, big data sources like Strava and StreetLight actually make the old “small” data even more important.”
Strava product |
Identifying regional hotspots |
Identifying trail networks |
Choosing trails to count |
Trail use type
|
Counts |
Direction of travel |
Speed |
Origin, destination, route |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Strava Metro and trail counter data
Appendix 1 - Suggested reading
Crowdsourcing Bicycle Volumes (2014)
Exploring Data Fusion Techniques to Estimate Network-Wide Bicycle Volumes (2022)
Bias and precision of crowdsourced recreational activity data from Strava (2023)
Screenshots of the Strava app; Image source: Strava
“Big data makes very little sense without small data to benchmark against”
@abudhabichris, cyclist
Big data—billions of data points—for trail planning, management and advocacy purposes sounds both futuristic and unattainable. In reality it has been a potential option for over a decade through data tools that utilize crowdsourced GPS tracks.
These tools potentially show how people move over trail networks and large areas better than trail counters or surveys at fixed locations do. Strava is the 800-pound gorilla in this regard.
The objective of this article is two-fold:
(1) to introduce Strava Global Heatmap and Strava Metro, in the context of trails and trail networks, and
(2) to point readers to several noteworthy additional readings about Strava data and its use (see Appendix 1).
In short, this article is an introduction to Strava data and how it might be used by people and organizations who plan, manage or advocate for trail systems. The ultimate objective of this article is simply to 'get the information out there' so others can investigate and use Strava data if they wish. It is a start point, not an end point.
I used Strava Global Heatmap to explore trail use patterns in the Calgary-Canmore region to identify 'real world' opportunities and limitations associated with this tool. I particularly focussed on Canmore, a city of approximately 15,000 people, next to Banff National Park, which is both an outdoor recreation mecca (trail running, mountain and road biking, cross-country skiing, hiking, climbing, etc.) and a place where there is important wildlife habitat, including critical wildlife movement corridors for grizzly bears, wolves, elk, etc. I then explored the potential of Strava Metro, the Strava data service intended for government organizations.
Image source: Strava
First, let’s look at some maps generated using the free Strava Global Heatmap. I’ve simply taken snips of the heatmap from the Strava website, labeled them and pasted them into this document.
As we move through the various heatmaps keep in mind that Strava created Global Heatmap to help people who bike, run, nordic ski, etc. find popular routes, and its interface and options reflect this. It was not created for trail or land use planning purposes per se. Further, the maps are based on Strava users only.
In the heatmaps that follow, the wider, red lines indicate that the trail or path is more popular among Strava app users and blue lines less so.
A good place to start is with a 'big picture' view of Canada and the US. Map 1 shows that most Strava users are where the most people live in North America: the eastern and western portions of the US and Canada. The northernmost large red nodes on the map are the Edmonton and Calgary areas in Alberta, Canada. The area of focus for this article is Calgary / Canmore / Banff. Strava use is mainly correlated with population concentrations and interest in and opportunity for cycling and running.
Map 1 - Canada-US Heatmap (red higher, blue lower Strava app use)
As we look at various heatmaps, here are some points of caution regarding Strava Global Heatmap. The snip below is from the Strava Global Heatmap website in July 2023.
Further:
|
Bike |
road, mountain, gravel, e-bike, etc. |
|
Foot |
run, trail run, hike, walk, virtual run |
|
Water |
canoe, kayak, paddle board, swim, etc. |
|
Snow |
nordic, backcountry and downhill skiing; ice skating; snowshoeing |
For Map 1 above, and all other Strava Global Heatmaps that follow, I have used the Global Heatmap settings above unless otherwise mentioned. Particularly, “All” was selected for activity type. Note that each activity type map has its own heat scaling (e.g., a trail on the ‘Bike’ heatmap with 1000 Strava users could have equal brightness as a trail on the ‘Foot’ heatmap with 200 Strava users). Also, note that although the foot icon includes various foot-based activities such as running, hiking and walking, many more runners use the Strava app than do hikers and walkers.
Map 2 below presents a regional view of recreational trail and road use in the Calgary-Canmore region. On the map it is very easy to identify the regional recreation hotspots, based on Strava users.
Map 2 - Calgary-Banff Region Heatmap (red higher, blue lower Strava app use)
Having lived in this region for over 50 years, and being an avid trail user myself, my general impression is that the heatmap does a good job of identifying key regional recreation 'hotspots': Calgary, Canmore, Bragg Creek, and Banff. Zooming in on central Calgary (Map 3), the popular multi-use, shared paths along the Bow and Elbow rivers, as well as certain designated bicycle routes are visible. Calgary has one of the most extensive urban pathway and bikeway networks in North America; many paths parallel water features (rivers, reservoirs and creeks).
Map 3 - Heatmap: routes / paths / trails in the Calgary area (red higher, blue lower Strava app use)
The Bragg Creek area, with its extensive network of trails favored by mountain bikers also shows up well.
Map 4 - Heatmap: Bragg Creek area trail network (red higher, blue lower Strava app use)
Let’s now zoom in on the Canmore area (see Map 5) where I live. I know these trails very well.
The Canmore Nordic Centre’s extensive system of nordic skiing and mountain biking trails is a very obvious cluster, particularly the area near the day lodge. It makes sense that this area is associated with the most Strava app use; it is the outdoor recreation heart of Canmore.
The network of heavily-used multi-use mountain biking, running and walking trails on the periphery of Canmore are also obvious (Highline, G8, Horseshoe, and Montane trails). The very popular paved multi-use path (the Legacy Trail), used mainly by cyclists, connecting Canmore and Banff town is noticeable, too, and makes perfect sense.
Map 5 - Heatmap: trails in the Canmore area (red higher, blue lower Strava app use)
Some of the trails leading up the local peaks (Ha Ling and Lady MacDonald) are easily identified.
Recall, these heatmaps do not show total trail use; they only show the relative popularity of trails among Strava app users.
It is also possible to view the Strava Global Heatmap in 3D as shown below. Certain things become more obvious using the 3D view. For example, several pirate / illegal mountain bike trails within a designated wildlife corridor are more noticeable using 3D view. Importantly, the 3D view shows why these pirate / illegal trails are attractive to certain mountain bikers: they offer a long, steep, sustained descent from the official, designated trail above (Highline trail).
Map 6 is noteworthy: it gives the false impression that the Horseshoe trails receive more use than the popular paved urban trails and paths noted above in Map 6. Personal experience and trail counter data does not support this. The map above simply shows that the Horseshoe trails are more popular among Strava users than are the mellower paved urban trails and paths despite their high total use. This highlights a limitation of Strava GPS data.
Map 6 - Heatmap: trails in the Canmore area (red higher, blue lower Strava app use)
Overall, the six heatmaps above simply show where Strava use is relatively high and where it is not. It is not possible to infer total use with a heatmap alone; another source of data, such as trail counter data, would be required.
I find Map 7 below interesting. It shows which trails and routes are used by Strava users who nordic ski, ice skate and snowshoe. However, as a longtime local resident I know that people do not nordic ski, ice skate or snowshoe on the urban roads in Canmore, as shown below. What is going on? When Strava adds a new activity type (e.g., roller skiing, e-biking, etc.), some existing users still select their historical activity type (e.g., skiing not roller skiing; biking not e-biking) because switching to the new one messes up their fitness statistics or reclassifying past activities is tedious to do. Strava makes no effort to correct this for the heatmap and indeed it would be difficult to do so.
So, it is important to understand that Strava Global Heatmaps—and indeed Strava Metro data, too— can contain oddities and anomalies and should be used carefully; local knowledge is essential.
Map 7 - Heatmap: Strava winter activity types in the Canmore Area
Here are some of the potential opportunities Strava Global Heatmap offers in the context of trying to understand trail use better. It facilitates a 'quick and dirty' identification of:
And here are some of the Strava Global Heatmap limitations:
A 2015 quote from Brian Riordan, of Strava about Strava Global Heatmap are good words of warning.
Whereas Strava Global Heatmap is only suitable for very cursory, high-level purposes, Strava Metro’s rich data allows for more detailed analysis and insights including specific activity types (mountain bike, run, walk, hike, nordic ski, snowshoe, etc.), direction of travel, gender, age, trail network patterns, speed, duration, routes, etc.
Image source: Strava
Strava Metro is a free service for qualified organizations. It provides the aggregated, anonymized GPS data and other information associated with Strava users for a particular location, generally for a 24-month period.
Strava Metro was a paid service from 2014 to 2020 but in late 2020 Strava graciously made it free. Why would Strava do this? The below is from a 2021 Strava blog post.
It is worth highlighting that Strava mentions “both urban and rural”. Whereas Strava Metro data has historically been used in the urban domain, there is potential to use it in rural areas, too, though there are few such published examples. One note of caution about rural areas: Strava data sets might be significantly smaller than those from typical urban areas simply because there are fewer people, particularly fewer Strava users, and hence the risk of errors and inaccuracies related to small sample size are higher.
Assuming it is used properly, Strava Metro data is suitable for public planning purposes, and in early 2020 Strava stated that over 300 communities have used it.
Strava Metro data is generally a good match for larger networks of higher use:
Once Strava data has been validated (see next section), some of the potential opportunities Strava Metro offers includes:
What are some Strava Metro data limitations and cautions? Some of these include:
This section on Strava Metro barely scratches the surface. For further information, see Appendix 1 for several key readings regarding Strava Metro. Also, see Strava Metro’s Getting Started guide.
Because the percentage of Strava users varies from trail to trail and route to route (est. 1 to 20%), trail counters and/or surveys are necessary for validation and 'truthing' purposes, as was done by Griffin and Jiao (2014) in Austin, Texas and in many other studies since. That is, you need to mathematically establish that Strava user volumes correlate well with actual trail user volumes on a variety of trails. If not, faulty and inaccurate conclusions and decisions can arise.
To gain the confidence of the public and decision makers, it is necessary to deploy counters and/or conduct surveys at a sufficient number of locations, for a sufficient duration of time to 'truth' the Strava counts.
A counting program involving Strava Metro data might have these main aspects:
Data from the counting program can then be used with Strava data to determine:
As mentioned earlier, Strava Global Heatmap can potentially be used to assist in locating counters, though Strava Metro is even better for this.
Once Strava data has been validated and truthed on a sufficient number and variety of trails for a sufficient time period, it is then possible to attempt inferences about the larger trail network, keeping in mind the Strava data limitations and biases mentioned earlier. This potentially reduces the number of trail counters required, and their associated cost (e.g., purchase price, staff time, etc.). The savings could potentially mitigate or offset the indirect costs associated with Strava Metro data (e.g., the time to query, analyze and integrate Stava data, pay a data scientist, etc.).
An advantage of using Strava Metro data is that simple, lower cost volumetric trail traffic counters can potentially be used rather than higher cost ($1000 to $3000) specialty counters that classify (bike vs non-bike) and/or provide direction of travel data. That is, Strava Metro data provides a basis to estimate, for example, bike vs non-bike use, direction of travel, speed, etc., allowing simpler, lower cost volumetric counters to be considered.
In short, Strava Metro creates the potential to reduce overall program costs while providing richer and more complete data (use type, direction of travel, route selection, origin - destination, etc.), over a wider area.
It is also possible to use Strava data retroactively. That is, if there are existing counters and counter data, it is generally possible to use this with historic Strava data.
The data world is rapidly changing. Today, there are types of trail data that were unavailable in past years. Strava data, a type of crowdsourced, ‘big data’, is an excellent example. It is unlikely that any one source of trail data will replace all others. Instead, the different sources (crowdsourced GPS-based, trail counter, surveys, etc.) have the potential to complement and strengthen each other.
Because Strava data is based upon GPS tracks, it is particularly good at showing where Strava users go, and providing other rich information, and has potential to be used as a powerful lens for better understanding trail use in a particular area, provided its limitations are kept firmly in mind. In this article I suggested that Strava Heatmap works best at the very coarse level, and Strava Metro at the finer, more detailed level. Both work better when trail counter data and/or survey is available.
Whereas most published use of Strava Metro data to date has been in urban areas in the context of bicycle-related transportation planning, Strava Metro data has potential to be used in the context of trail networks, particularly larger ones used by people who run or ride (e.g., multi-use trail networks in urban and non-urban areas).
Strava app use, as a portion of total trail use, varies from trail to trail (est. 1 to 20%). For accuracy, and to gain the confidence of the public and decision makers, it is necessary to determine the percentage of Strava users on a suitable selection of trails. This can be done using lower cost volume-only trail traffic counters in conjunction with the Strava data, or by conducting surveys. The combination of Strava data, volumetric counter data and/or surveys opens the door to cost-effective user classification (bike, gender, etc.) and other insights over a large area.
There is potential, with Strava data, to reduce overall project costs while at the same time providing richer and more complete information for planning, management, and advocacy purposes over a much wider area. In Appendix 1 I list some key references for those wanting to learn more.
A search of Google Scholar with the key word “Strava Metro” currently (2023) yields 481 results. Below are three references that I found particularly helpful and insightful, and flag some of the key issues associated with Strava data. Again, this is simply a start point. For more references regarding Strava data, use Google Scholar or other similar services.
Whereas a number of cities have used Strava Metro data, in my opinion, the case of Austin, Texas is a good example to showcase because of its rigour and detailed analyses.
Full title, authors, and date
Crowdsourcing Bicycle Volumes: Exploring the role of volunteered geographic information and established monitoring methods; Griffin and Jiao; 2014.
Abstract
The recent interest in performance measures and new bicycle infrastructure development has triggered rapid advancements in monitoring methods for active transportation, but comprehensive monitoring programs for the bicycle mode are far from ubiquitous. This study evaluates the use of GPS survey data and a new crowdsourced volume dataset that may offer promise to extend the reach of limited counting programs. The authors integrated count data from five separate trail locations in Austin, Texas, with a previous survey using the CycleTracks smartphone application and a new data product derived from a larger-scale use of the Strava fitness application. New crowdsourced methods offer a prospect of expanding the relative time and geography of bicycle traffic monitoring, but do not currently offer many other attributes about trips obtainable from other methods. Further studies involving the combination of high-accuracy monitoring points with crowdsourced datasets may improve the efficiency of monitoring programs over large areas.
Some salient points
Some of the salient points, particularly about the use of counters and validation, are below.
This paper is a research tour de force by a team of PhDs. It includes very valuable and insightful information about crowd-sourced GPS data such as Strava as well as about conventional data sources (counters, surveys, etc.) and AI / machine learning options. If I had to select a single authoritative reference about big data (e.g., Strava), small data (e.g., trail counter data) and AI, this would be it. Importantly, the authors include a very comprehensive summary of available literature and some excellent summary tables regarding Strava data. Surprisingly, despite visiting the lofty realms of big data and AI one of the paper’s key findings is that rather than replacing conventional data sources (counts, surveys, etc.), “...big data sources like Strava and StreetLight actually make the old “small” data even more important.” This echoes the quote on page 3 of my article: “Big data makes very little sense without small data to benchmark against”.
Full title, authors, and date
Exploring Data Fusion Techniques to Estimate Network-Wide Bicycle Volumes; Kothuri, Broach, McNeil, Hyun, Mattingly, Miah, Nordback and Proulx; 2022
Abstract
This research developed a method for evaluating and integrating emerging sources (Strava, StreetLight, and Bikeshare) of bicycle activity data with conventional demand data (permanent counts, short-duration counts) using traditional (Poisson) and advanced machine learning techniques. First, a literature review was conducted, along with cataloging and evaluating available third-party data sources and existing applications. Next, six sites (Boulder, Charlotte, Dallas, Portland, Bend, and Eugene) that represented a variety of contexts (urban, suburban) and geographical diversity were selected. Of these, Boulder, Charlotte and Dallas constituted the basic sites, where one year of data (i.e., 2019) was used for modeling. Portland, Bend, and Eugene in Oregon were considered enhanced sites, where three years of data (2017-2019) were used for model estimation. Demographic, network, count and emerging data were gathered for these sites. Using these data, Poisson and Random Forest models were estimated. The model estimation process was designed to allow for comparison of the relative accuracy and value added by different data sources and modeling techniques. Three sets of models were specified – All City Pooled, Oregon Pooled and city-specific models. In general, the three data sources (static, Strava, and StreetLight) appeared to be complementary to one another; that is, adding any two data sources together tended to outperform each data source on its own. Low-volume sites proved challenging, with the best-performing models still demonstrating considerable prediction error. City-specific models generally displayed better model fit and prediction performance. Using Strava or StreetLight counts to predict annual average daily bicycle traffic (AADBT) without static adjustment variables increased expected prediction error by a factor of about 1.4 (i.e., a 40% increase in %RMSE). That rule of thumb figure of 1.4 times was only slightly lower when combining Strava plus StreetLight without static variables (1.3x). Tests of transferability showed that transferring the model specifications without reestimating the model parameters resulted in 10-50% increase in error rate across models. Performance of machine learning models was comparable to count models. The findings from this study indicate that rather than replacing conventional bike data sources and count programs, big data sources like Strava and StreetLight actually make the old “small” data even more important.
Some salient points
I feel this is an important paper because it highlights some Strava data weaknesses that might otherwise be overlooked, and it also considers Strava data in the context of general recreation, not just cycling.
Full title, authors, and date
Bias and precision of crowdsourced recreational activity data from Strava; Venter, Gundersen, Scott and Barton; 2023
Abstract
Recreational activity is the single most valuable ecosystem service in many developed countries with a range of benefits for public health. Crowdsourced recreational activity data is increasingly being adopted in management and monitoring of urban landscapes, however inherent biases in the data make it difficult to generalize patterns to the total population. We used in-situ observations and questionnaires to quantify accuracy in Strava data - a widely used outdoor activity monitoring app – in Oslo, Norway. The precision with which Strava data captured the spatial (R2 = 0.9) and temporal variation (R2 = 0.51) in observed recreational activity (cyclist and pedestrian) was relatively high for monthly time series during summer, although precision degraded at weekly and daily resolutions and during winter. Despite the precision, Strava exhibits significant biases relative to the total recreationist population. Strava activities represented 2.5 % of total recreationist activity in 2016, a proportion that increased steadily to 5.7 % in 2020 due to a growing usership. Strava users are biased toward cyclists (8 % higher than observed), males (15.7 % higher) and middle-aged people (20.4 % higher for ages 35–54). Strava pedestrians that were able to complete a questionnaire survey (>19 years) were biased to higher income brackets and education levels. Future studies using Strava data need to consider these biases – particularly the under-representation of vulnerable age (children/elderly) and socio-economic (poor/uneducated) groups. The implementation of Strava data in urban planning processes will depend on accuracy requirements of the application purpose and the extent to which biases can be corrected for.
Some salient points