Using Python to Connect to Strava’s API and Analyse Your Activities — Dummies Guide

Benji Knights Johnson
Published in The Startup · 8 min read · May 3, 2020

I couldn’t find an all-in-one guide that goes through every step of connecting to Strava’s API from start to finish, so after piecing everything together I thought I’d write one myself.

This was basically the first time I’d connected to a REST API using Python, which is why I’ve gone for the dummies version: I was basically a dummy myself when I was trying all this.

Create your App/API Connection

Firstly, you need to create/register your App on your Strava profile, by going to this link: www.strava.com/settings/api (after signing in). I’ll explain the fields below.


For non-techies, or first-timers (like I was), some of the terms/fields here may be confusing, so here’s some explanation.

The term App is slightly confusing, as you are not actually creating a full application, just getting a bit of data out. Don’t worry: creating the App just registers that you’re doing something with the API, and you don’t need to build a proper application from it (though some people do…). So your Application Name can be whatever you like.

The next confusing field is Website. Again, this doesn’t need to be used at all, and doesn’t even need to be real or live. It does need to be a correctly structured URL, but you can just make up the bits in between, like I did.

And finally, the Authorization Callback Domain sounds ominous, but for our simple job of exporting some data we can keep this very straightforward and just use localhost.

These fields are mainly used for display in the Authentication stage, and you can update them whenever you like if you want that stage to look different.

Once this is done, you will have a few details to note down: the Client ID and the Client Secret (you need to click to show the secret first). Keep both for later.
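Rather than pasting the Client ID and Client Secret into every script, one option is to keep them in environment variables. This is a minimal sketch, assuming you have set two variables with the hypothetical names STRAVA_CLIENT_ID and STRAVA_CLIENT_SECRET (my own choice of names, not anything Strava requires):

```python
import os


def load_strava_credentials():
    """Read the Client ID and Client Secret from environment variables.

    STRAVA_CLIENT_ID and STRAVA_CLIENT_SECRET are hypothetical names chosen
    for this example; use whatever names you set in your own environment.
    """
    try:
        client_id = int(os.environ['STRAVA_CLIENT_ID'])  # the Client ID is a number
        client_secret = os.environ['STRAVA_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Environment variable {missing} is not set') from None
    return client_id, client_secret
```

You could then write `client_id, client_secret = load_strava_credentials()` and use those two variables in the token requests later on, so the secret never sits in the script itself.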

Authentication

The next step is authorising your App so you can get access tokens to use in your script.

I had some trouble with this. The access token displayed after creating the App only lets you connect to your profile and get some top-line profile data; to get down to your activity-level data, you need a broader level of access, which took some research to find out about.

Copy and paste this link into your browser, replacing [REPLACE_WITH_YOUR_CLIENT_ID] with the Client ID you noted down earlier:

https://www.strava.com/oauth/authorize?client_id=[REPLACE_WITH_YOUR_CLIENT_ID]&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=profile:read_all,activity:read_all

FYI, I had to specify the scope of access explicitly to be able to get the activity-level data: activity:read_all lets you read all of your activities, including those whose visibility is set to Only You. Strava’s authentication documentation details the various levels of access, in a bit of a tech-y way.
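If you’d rather not edit that long link by hand, you can build it in Python. This is a small sketch that assembles the same parameters with the standard library’s `urlencode`; the function name is my own, and 12345 is only a placeholder Client ID:

```python
from urllib.parse import urlencode


def build_authorize_url(client_id, scope='profile:read_all,activity:read_all'):
    """Build the Strava authorisation link with the same parameters as above."""
    params = {
        'client_id': client_id,
        'response_type': 'code',
        'redirect_uri': 'http://localhost/exchange_token',
        'approval_prompt': 'force',
        'scope': scope,
    }
    # urlencode percent-encodes characters such as ':' and ',' in the values,
    # which is still a valid URL for the browser to open
    return 'https://www.strava.com/oauth/authorize?' + urlencode(params)


print(build_authorize_url(12345))  # 12345 is a placeholder: use your own Client ID
```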

After you click Authorise on the page that link opens, you will be taken to a page that doesn’t load properly (because we don’t have a real domain/website), but that doesn’t matter at all. We only need the code that appears in the URL of that page, which will look like this:

http://localhost/exchange_token?state=&code=[THIS_IS_THE_CODE_YOU_NEED_TO_COPY]&scope=read,activity:read_all,profile:read_all

Note down the code from that URL.
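Instead of picking the code out of the URL by eye, you can paste the whole URL into Python and let the standard library parse it. A minimal sketch (the function name is hypothetical):

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirected_url):
    """Pull the one-time authorisation code out of the URL you were redirected to."""
    query = parse_qs(urlparse(redirected_url).query)
    if 'code' not in query:
        # Raise a clear error rather than a KeyError if there is no code in the URL
        raise ValueError('No code found in URL: ' + redirected_url)
    return query['code'][0]
```

For example, `extract_code('http://localhost/exchange_token?state=&code=abc123&scope=read')` returns `'abc123'`.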

Access Tokens

Now we get to the Python (although this can also be done in a Postman-style application).

To exchange the code you noted down for your access and refresh tokens, you need to make a slightly more complicated request: a POST to Strava’s token endpoint. Here is the Python for it:

import requests
import json

# Make the Strava auth API call with your
# client_id, client_secret and code
response = requests.post(
    url='https://www.strava.com/oauth/token',
    data={
        'client_id': 12345,  # replace 12345 with your Client ID (a number, so no quotes)
        'client_secret': '[INSERT_CLIENT_SECRET_KEY]',
        'code': '[INSERT_CODE_FROM_URL_HERE]',
        'grant_type': 'authorization_code'
    }
)
# Stop here if Strava returned an error (e.g. the code has already been used)
response.raise_for_status()
# Save the JSON response as a variable
strava_tokens = response.json()
# Save tokens to file
with open('strava_tokens.json', 'w') as outfile:
    json.dump(strava_tokens, outfile)
# Open the JSON file and print its contents
# to check it's worked properly
with open('strava_tokens.json') as check:
    data = json.load(check)
print(data)

Note that in the data sent with the request, the client_id doesn’t need inverted commas, as it’s a number rather than a string. The other values are strings, so they do need inverted commas.

If everything has worked, you should see the contents of the JSON file printed, with some tokens and some details about your profile.

Note: your code only works once, so if you run the script again, you will need to go through the Authentication process above again to get a new code.
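The saved token file includes access_token, refresh_token and expires_at (a Unix timestamp for when the access token stops working; Strava access tokens last six hours). Printing the whole file puts your tokens on screen in plain text, so here is a small sketch, with a hypothetical function name, that summarises the file with the tokens masked:

```python
import time


def describe_tokens(tokens, now=None):
    """Summarise a Strava token response without showing the tokens in full."""
    now = time.time() if now is None else now
    minutes_left = (tokens['expires_at'] - now) / 60
    return {
        'access_token': tokens['access_token'][:4] + '...',    # masked
        'refresh_token': tokens['refresh_token'][:4] + '...',  # masked
        # expires_at converted to a readable local date and time
        'expires_at': time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(tokens['expires_at'])),
        'minutes_left': round(minutes_left),
    }
```

You could call it as `print(describe_tokens(data))` in place of `print(data)` in the script above.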

Eventually Getting Your Activities!

We now have all the relevant info to be able to get your actual activity data!

First, you’ll want to find out which fields are included in the output, and which of them you actually want to keep. To do this, you can request the first page of your activities and save every field to a CSV:

import requests
import pandas as pd
import json

# Get the tokens from file to connect to Strava
with open('strava_tokens.json') as json_file:
    strava_tokens = json.load(json_file)

# Get the first page of activities from Strava with all fields
# (Strava's documented "List Athlete Activities" endpoint)
url = "https://www.strava.com/api/v3/athlete/activities"
access_token = strava_tokens['access_token']
response = requests.get(url, headers={'Authorization': 'Bearer ' + access_token})
response.raise_for_status()
r = response.json()

# Flatten the nested JSON into columns and save it to look through
# (pd.json_normalize replaces the old pandas.io.json import)
df = pd.json_normalize(r)
df.to_csv('strava_activities_all_fields.csv')
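If you just want a quick list of field names in the console, this stdlib-only sketch (the function name is my own) walks one activity and joins nested keys with dots, in the same way `json_normalize` names its columns, for example `map.summary_polyline`:

```python
def flatten_keys(record, prefix=''):
    """List the field names in one activity, joining nested keys with dots."""
    keys = []
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Nested objects (e.g. 'map') become 'map.id', 'map.summary_polyline', ...
            keys.extend(flatten_keys(value, name + '.'))
        else:
            keys.append(name)
    return keys
```

Running `print(sorted(flatten_keys(r[0])))` after the request above would show the fields available in your first activity.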

Go through the CSV that is output to decide which fields you want to use (or just use the ones I chose). Then set up a DataFrame with the desired fields and iterate through all pages of your activities. If you want to output any new fields, make sure you add them to your DataFrame set-up, and match the iterated data load to the DataFrame for each field.

import pandas as pd
import requests
import json

# Get the tokens from file to connect to Strava
with open('strava_tokens.json') as json_file:
    strava_tokens = json.load(json_file)

url = "https://www.strava.com/api/v3/athlete/activities"
access_token = strava_tokens['access_token']
headers = {'Authorization': 'Bearer ' + access_token}

# The fields to keep for each activity (these become the DataFrame columns)
columns = [
    "id",
    "name",
    "start_date_local",
    "type",
    "distance",
    "moving_time",
    "elapsed_time",
    "total_elevation_gain",
    "end_latlng",
    "external_id"
]

# Loop through all pages of activities, collecting one row per activity
rows = []
page = 1
while True:
    # Get a page of activities from Strava (200 per page is the maximum)
    response = requests.get(url, headers=headers, params={'per_page': 200, 'page': page})
    response.raise_for_status()
    r = response.json()

    # If there are no results then exit the loop
    if not r:
        break

    # Otherwise pull the desired fields out of each activity
    for activity in r:
        rows.append({
            'id': activity['id'],
            'name': activity['name'],
            'start_date_local': activity['start_date_local'],
            'type': activity['type'],
            'distance': activity['distance'],
            'moving_time': activity['moving_time'],
            'elapsed_time': activity['elapsed_time'],
            'total_elevation_gain': activity['total_elevation_gain'],
            'end_latlng': activity['end_latlng'],
            'external_id': activity['external_id'],
        })

    # Increment page
    page += 1

# Create the DataFrame from the collected rows
activities = pd.DataFrame(rows, columns=columns)

# Export your activities file as a csv
# to the folder you're running this script in
activities.to_csv('strava_activities.csv')
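The loop above always makes one extra request that comes back empty before it stops. One alternative, sketched here under the assumption that Strava only returns a page with fewer than per_page activities at the end of your history, is to stop as soon as a page comes back short. `iterate_pages` and `fetch_page` are hypothetical names:

```python
def iterate_pages(fetch_page, per_page=200):
    """Yield activities page by page, stopping as soon as a page comes back short.

    fetch_page is a hypothetical callable taking (page, per_page) and returning
    the list of activities for that page.
    """
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        yield from batch
        if len(batch) < per_page:
            # A short (or empty) page means there is nothing after it
            return
        page += 1
```

With 450 activities this makes three requests rather than four. In the script, `fetch_page` could be something like `lambda page, per_page: requests.get(url, params={'access_token': access_token, 'per_page': per_page, 'page': page}).json()`.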

Something to note here: now that we’re actually using the API, we need to take its request limits into account. At the time of writing, the rate limits are:

100 requests every 15 minutes, 1000 daily

Each page of activities is a new request. We ask for the maximum number of activities per page, which is 200, so if you have 400 activities that’s two full pages plus one more request that comes back empty and ends the loop: three requests in total. All good if you’re doing some casual running. However, if you’re a serious weekend warrior like some of my Strava friends, you may start hitting the limits. Also, if you run this code a few times for testing, you will reach the limits and have to wait 15 minutes.
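To keep an eye on this, my understanding of Strava’s API documentation is that each response carries X-RateLimit-Limit and X-RateLimit-Usage headers, each holding two comma-separated numbers (the 15-minute figure first, then the daily one), and that going over a limit returns HTTP status 429. A small sketch, with a hypothetical function name, that turns those headers into something readable:

```python
def parse_rate_limits(headers):
    """Turn Strava's rate-limit headers into requests remaining.

    Each header holds two comma-separated numbers: the 15-minute figure
    first, then the daily figure.
    """
    limit_15min, limit_daily = (int(n) for n in headers['X-RateLimit-Limit'].split(','))
    usage_15min, usage_daily = (int(n) for n in headers['X-RateLimit-Usage'].split(','))
    return {
        '15min_remaining': limit_15min - usage_15min,
        'daily_remaining': limit_daily - usage_daily,
    }
```

You would call it on the response object returned by `requests.get`, before converting it with `.json()`, e.g. `print(parse_rate_limits(response.headers))`.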

Automating the Retrieval of the Tokens

You’re not quite finished yet! There’s one last step: automating the retrieval of new access tokens once your current one expires (Strava access tokens expire after six hours).

import requests
import json
import time

# Get the tokens from file to connect to Strava
with open('strava_tokens.json') as json_file:
    strava_tokens = json.load(json_file)

# If the access_token has expired then
# use the refresh_token to get a new access_token
if strava_tokens['expires_at'] < time.time():
    # Make the Strava auth API call with the current refresh token
    response = requests.post(
        url='https://www.strava.com/oauth/token',
        data={
            'client_id': 12345,  # replace 12345 with your Client ID (a number, so no quotes)
            'client_secret': '[INSERT_CLIENT_SECRET_KEY]',
            'grant_type': 'refresh_token',
            'refresh_token': strava_tokens['refresh_token']
        }
    )
    # Stop before overwriting the token file if the refresh failed
    response.raise_for_status()
    # Save the response as JSON in a new variable
    new_strava_tokens = response.json()
    # Save the new tokens to file
    with open('strava_tokens.json', 'w') as outfile:
        json.dump(new_strava_tokens, outfile)
    # Use the new Strava tokens from now on
    strava_tokens = new_strava_tokens

# Open the JSON file and print its contents
# to check it's worked properly
with open('strava_tokens.json') as check:
    data = json.load(check)
print(data)
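The check above only refreshes once the token has already expired. A variation, sketched here with a hypothetical function name and an arbitrary five-minute leeway, refreshes slightly early so the token can’t run out partway through a long run of paged requests. As I read Strava’s docs, a refresh only issues a new access token when the current one expires within an hour, so refreshing a little early is harmless:

```python
import time


def token_needs_refresh(tokens, leeway_seconds=300, now=None):
    """Return True if the access token has expired or will within leeway_seconds."""
    now = time.time() if now is None else now
    return tokens['expires_at'] - leeway_seconds <= now
```

To use it, swap the line `if strava_tokens['expires_at'] < time.time():` for `if token_needs_refresh(strava_tokens):`.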

And now for the final script, with all the required parts together.

import pandas as pd
import requests
import json
import time

# Get the tokens from file to connect to Strava
with open('strava_tokens.json') as json_file:
    strava_tokens = json.load(json_file)

# If the access_token has expired then use the refresh_token to get a new access_token
if strava_tokens['expires_at'] < time.time():
    # Make the Strava auth API call with the current refresh token
    response = requests.post(
        url='https://www.strava.com/oauth/token',
        data={
            'client_id': 12345,  # replace 12345 with your Client ID (a number, so no quotes)
            'client_secret': '[INSERT_CLIENT_SECRET_KEY]',
            'grant_type': 'refresh_token',
            'refresh_token': strava_tokens['refresh_token']
        }
    )
    # Stop before overwriting the token file if the refresh failed
    response.raise_for_status()
    # Save the response as JSON in a new variable
    new_strava_tokens = response.json()
    # Save the new tokens to file
    with open('strava_tokens.json', 'w') as outfile:
        json.dump(new_strava_tokens, outfile)
    # Use the new Strava tokens from now on
    strava_tokens = new_strava_tokens

url = "https://www.strava.com/api/v3/athlete/activities"
access_token = strava_tokens['access_token']
headers = {'Authorization': 'Bearer ' + access_token}

# The fields to keep for each activity (these become the DataFrame columns)
columns = [
    "id",
    "name",
    "start_date_local",
    "type",
    "distance",
    "moving_time",
    "elapsed_time",
    "total_elevation_gain",
    "end_latlng",
    "external_id"
]

# Loop through all pages of activities, collecting one row per activity
rows = []
page = 1
while True:
    # Get a page of activities from Strava (200 per page is the maximum)
    response = requests.get(url, headers=headers, params={'per_page': 200, 'page': page})
    response.raise_for_status()
    r = response.json()

    # If there are no results then exit the loop
    if not r:
        break

    # Otherwise pull the desired fields out of each activity
    for activity in r:
        rows.append({
            'id': activity['id'],
            'name': activity['name'],
            'start_date_local': activity['start_date_local'],
            'type': activity['type'],
            'distance': activity['distance'],
            'moving_time': activity['moving_time'],
            'elapsed_time': activity['elapsed_time'],
            'total_elevation_gain': activity['total_elevation_gain'],
            'end_latlng': activity['end_latlng'],
            'external_id': activity['external_id'],
        })

    # Increment page
    page += 1

# Create the DataFrame from the collected rows and export it as a csv
activities = pd.DataFrame(rows, columns=columns)
activities.to_csv('strava_activities.csv')

Now the whole process of getting your activity details from Strava’s API is fully automated: every time you want to update your activities, you just run this final script. There’s no need to repeat the Authentication steps from the start, because the token refresh is built into the script. The only extra thing I may try is getting the code to append only the new activities, rather than bringing back all of them every time, but for now it doesn’t take long to bring them all back, so I’m leaving it at that.
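For anyone who wants to try appending only new activities, Strava’s activities listing accepts an `after` parameter (an epoch timestamp) that filters out older activities. This is a sketch under a couple of assumptions: the function name is hypothetical, and because start_date_local is local wall-clock time (despite its trailing "Z"), it steps back a one-day safety margin and relies on dropping duplicates by id afterwards:

```python
from datetime import datetime, timedelta, timezone


def after_timestamp(start_dates_local, margin=timedelta(days=1)):
    """Epoch seconds to pass as Strava's `after` parameter for fetching new activities.

    start_date_local is local time, so the latest date is moved back by a
    safety margin; activities fetched twice can then be dropped by id.
    """
    latest = max(datetime.strptime(d, '%Y-%m-%dT%H:%M:%SZ') for d in start_dates_local)
    return int((latest.replace(tzinfo=timezone.utc) - margin).timestamp())
```

One way to use it: load the previous file with `existing = pd.read_csv('strava_activities.csv', index_col=0)`, add `'after': after_timestamp(existing['start_date_local'])` to the request parameters, then combine with `pd.concat([existing, activities], ignore_index=True).drop_duplicates(subset='id')` before saving.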

I’m not going to go into exactly what I did to visualise this data (in Power BI, FYI), as that’s another kettle of fish. But from here, you can make some standard Excel visuals, or import the data into any kind of data visualisation platform (I got Power BI to execute the Python code directly, using print(activities) at the end of the script instead).

I really hope this all made sense, and you got what you needed out of it. Feel free to leave comments and I will answer them as best I can.

Lastly, this is my first time writing a post on Medium, or generally writing pretty much anything in a format like this… Go easy, but do let me know how I’ve done, and any improvements you think I could make, and hopefully I’ll write more in the future!

Strava profile: www.strava.com/athletes/benjikj
LinkedIn: www.linkedin.com/benji-knights-johnson
