Releasing Harrier League Data

Author: Jonny Law

Published: March 4, 2020

The North East Harrier League is a series of cross country running races held in the North East of England over the winter, from September to March. Results are available online, in HTML format, from the 2012-13 season to the current 2019-20 season. I have downloaded and cleaned the data so that it can be used for analysis or exploration. The results for senior men and women are available in a tabular format in my blog package; the file containing the parsing functions gives an insight into what it takes to parse this kind of data.
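Finishing times arrive from the HTML pages as text, so one typical cleaning step is converting them to numbers. The following is a minimal base R sketch, not the parsing code from the package. It assumes times are written as `mm:ss` or `h:mm:ss` strings, and the helper name `time_to_seconds` is hypothetical:

```r
# Hypothetical helper (not part of the package): convert "mm:ss" or
# "h:mm:ss" strings to seconds. Assumes the results pages format times
# this way; anything else (e.g. "DNF") becomes NA.
time_to_seconds <- function(x) {
  vapply(strsplit(as.character(x), ":", fixed = TRUE), function(parts) {
    n <- suppressWarnings(as.numeric(parts))
    if (length(n) < 2 || length(n) > 3 || anyNA(n)) return(NA_real_)
    # Weight components by 60^k, from seconds (k = 0) upwards
    sum(n * 60^(rev(seq_along(n)) - 1))
  }, numeric(1))
}

time_to_seconds(c("35:12", "1:02:05", "DNF"))
# c(2112, 3725, NA)
```

Returning `NA` rather than raising an error keeps non-finishers in the table, so they can be filtered out explicitly later.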

I used the following R packages to download the pages, parse the HTML and clean the resulting data:

The data can be accessed by installing my R package, which contains a selection of R code relating to this blog:

# install.packages("remotes")
# remotes::install_github("jonnylaw/jonnylaw")
data("harrier_league_results", package = "jonnylaw")

Determining the most difficult course

As a quick example of what can be done with the data, I will consider running time by course. The data can be split by sex; however, men and women don't compete over the same distance, with women completing two laps and men completing three. To compare courses we can therefore plot the time for a single lap of each course (obviously this doesn't account for changes in pace throughout the race). The hardest (or longest) course appears to be Aykley Heads, which has the highest median time per lap.
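The per-lap normalisation can be sketched in base R. This assumes a data frame with columns `course`, `sex` and `time` (in seconds); these column names and the toy values below are illustrative assumptions, not the actual structure or contents of `harrier_league_results`. Only the lap counts (two for women, three for men) come from the text above.

```r
# Toy results table: column names and values are made up for illustration
results <- data.frame(
  course = c("Course A", "Course A", "Course B", "Course B"),
  sex    = c("Female", "Male", "Female", "Male"),
  time   = c(1800, 2400, 1700, 2250)  # finishing time in seconds
)

# Women run two laps and men run three
laps <- ifelse(results$sex == "Female", 2, 3)
results$lap_time <- results$time / laps

# Median lap time per course, sorted from slowest to fastest
lap_medians <- tapply(results$lap_time, results$course, median)
sort(lap_medians, decreasing = TRUE)
```

Using the median rather than the mean keeps the comparison less sensitive to a few very slow finishers on any one course.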

Citation

BibTeX citation:
@online{law2020,
  author = {Jonny Law},
  title = {Releasing {Harrier} {League} {Data}},
  date = {2020-03-04},
  langid = {en}
}
For attribution, please cite this work as:
Jonny Law. 2020. “Releasing Harrier League Data.” March 4, 2020.