Data workflow

library(HealthDataScotland)

Introduction

The data used in HealthDataScotland is derived from a variety of sources on the NHS open data API, which is maintained by Public Health Scotland. This vignette outlines exactly which sources are used and the workflow by which the data is manipulated into a form usable by the application.

Data sources

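Data is obtained from the following sources, each described in the sections below: the NHS open data sets ‘gp-practice-contact-details-and-list-sizes’, ‘gp-practice-populations’, ‘hospital-codes’ and ‘annual-hospital-beds-information’, together with spatial data from the Spatial Hub.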

To aid data transparency and reproducibility, I have outlined how the above data sets are retrieved and manipulated into a usable format by HealthDataScotland. The processed data set is stored in an Azure blob. This blob is updated periodically using a GitHub action.

GP data processing

GP metadata: GP metadata was obtained from the NHS open data set titled ‘gp-practice-contact-details-and-list-sizes’. This data set contains metadata for each GP practice, including name, address and phone number.
GP demography data: GP demographic data was obtained from the NHS open data set titled ‘gp-practice-populations’. This data set contains the number of registered patients at each GP practice, split by gender and age.
Combined data: The intersection of GP practice IDs between the metadata and demographic data was determined. This master set of IDs was used to combine the two data sets and this combined data set is used by the application.
Map: GP spatial data was derived from the Spatial Hub and manipulated into a format appropriate for visualisation. Please note the spatial data may be out of date and have missing information.
Summary: Unlike the interactive map, the ‘Summary’ section of the application uses the full combined GP data set derived from NHS open data. Therefore, demographic data for GP practices with missing spatial data can be viewed in this section. This section summarises GP demographic data at the national and health board level by grouping all GP practices within a particular group (e.g. the health board ‘Greater Glasgow and Clyde’) and summing the total number of registered patients. Please note that there may be unknown caveats with this approach, so please refer to the published data sets before drawing any conclusions.
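The combine-and-summarise steps described above can be sketched in base R. This is a minimal illustration rather than the package's actual code: the data frames, column names (`practice_code`, `health_board`, `patients`) and values are hypothetical stand-ins for the two NHS open data sets.

```r
# Hypothetical stand-in for the GP metadata set
gp_meta <- data.frame(
  practice_code = c(1001, 1002, 1003, 1005),
  practice_name = c("Practice A", "Practice B", "Practice C", "Practice E"),
  health_board  = c("Greater Glasgow and Clyde", "Greater Glasgow and Clyde",
                    "Lothian", "Lothian")
)

# Hypothetical stand-in for the GP demographic set (one row per practice and sex)
gp_demo <- data.frame(
  practice_code = c(1001, 1001, 1002, 1002, 1003, 1003, 1004, 1004),
  sex           = rep(c("Female", "Male"), times = 4),
  patients      = c(2100, 1900, 3200, 3000, 1500, 1400, 500, 450)
)

# Master set of IDs: practices present in BOTH data sets
common_ids <- intersect(gp_meta$practice_code, gp_demo$practice_code)

# Combine the two data sets, restricted to the master set of IDs
gp_combined <- merge(
  gp_meta[gp_meta$practice_code %in% common_ids, ],
  gp_demo[gp_demo$practice_code %in% common_ids, ],
  by = "practice_code"
)

# Health board level: sum registered patients across all practices in each board
board_totals <- aggregate(patients ~ health_board, data = gp_combined, FUN = sum)

# National level: sum registered patients across all practices
national_total <- sum(gp_combined$patients)
```

In this toy example, practice 1004 (no metadata) and practice 1005 (no demographic data) are both dropped by the intersection, so neither contributes to the totals.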

Hospital data processing

Hospital metadata: Hospital metadata was obtained from the NHS open data set titled ‘hospital-codes’. This data set contains metadata for hospitals including name, address and health board.
Hospital bed occupancy data: Hospital bed occupancy data was obtained from the NHS open data set titled ‘annual-hospital-beds-information’. This data set was filtered to the PHS data set ID ‘d719af13-5fb3-430f-810e-ab3360961107’ to obtain bed capacity per specialty per hospital location.
Combined data: The intersection of hospital codes between the metadata and bed capacity data was determined. This master set of IDs was used to combine the two data sets and this combined data set is used by the application.
Map: Hospital spatial data was derived from the Spatial Hub and manipulated into a format appropriate for visualisation. Please note the spatial data may be out of date and have missing information.
Summary: Unlike the interactive map, the ‘Summary’ section of the application uses the full combined hospital data set derived from NHS open data. Therefore, bed capacity data for hospitals with missing spatial data can be viewed in this section. This section summarises hospital bed capacity data at the national and health board level per specialty by grouping all hospitals within a particular group (e.g. the health board ‘Greater Glasgow and Clyde’) and taking the average percentage bed occupancy for a selected specialty (whilst filtering out non-applicable data points). Please note that there may be unknown caveats with this approach, so please refer to the published data sets before drawing any conclusions.
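The hospital summary described above can be sketched in the same way. Again, this is an illustration under stated assumptions, not the package's own implementation: the hospital codes, the column names (`hospital_code`, `specialty`, `percent_occupancy`) and the values are hypothetical, and non-applicable data points are assumed to be represented as `NA`.

```r
# Hypothetical stand-in for the hospital metadata set
hosp_meta <- data.frame(
  hospital_code = c("H001", "H002", "H003", "H004"),
  hospital_name = c("Hospital A", "Hospital B", "Hospital C", "Hospital D"),
  health_board  = c("Greater Glasgow and Clyde", "Greater Glasgow and Clyde",
                    "Lothian", "Lothian")
)

# Hypothetical stand-in for the bed occupancy set (one row per hospital and specialty)
beds <- data.frame(
  hospital_code     = c("H001", "H002", "H003", "H003", "H004", "H999"),
  specialty         = c("General Medicine", "General Medicine", "General Medicine",
                        "Cardiology", "General Medicine", "General Medicine"),
  percent_occupancy = c(80, 90, NA, 70, 50, 60)
)

# Master set of IDs: hospital codes present in BOTH data sets
common_codes <- intersect(hosp_meta$hospital_code, beds$hospital_code)

# Combine the two data sets, restricted to the master set of IDs
hosp_combined <- merge(
  hosp_meta[hosp_meta$hospital_code %in% common_codes, ],
  beds[beds$hospital_code %in% common_codes, ],
  by = "hospital_code"
)

# Keep one specialty and drop non-applicable (NA) data points
is_selected <- hosp_combined$specialty == "General Medicine" &
  !is.na(hosp_combined$percent_occupancy)
selected <- hosp_combined[is_selected, ]

# Health board level: mean percentage occupancy across hospitals in each board
board_means <- aggregate(percent_occupancy ~ health_board, data = selected, FUN = mean)

# National level: mean percentage occupancy across all remaining hospitals
national_mean <- mean(selected$percent_occupancy)
```

Note that the mean in this sketch is unweighted: every hospital counts equally regardless of its number of beds. This is one example of the kind of caveat to keep in mind when interpreting summary figures.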

Continuous integration

This data workflow creates a series of data.frames which are stored as a single .RDS file in an Azure blob. This Azure blob is accessed when the Shiny application launches. The data set is updated every 6 months using a GitHub action.
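As a rough illustration of the storage step, several data.frames can be bundled into one list and round-tripped through a single .RDS file with base R. The object names and the file location below are hypothetical, and a local temporary file stands in for the Azure blob. The upload and download steps are not shown.

```r
# Hypothetical processed data sets, bundled into one named list
app_data <- list(
  gp       = data.frame(practice_code = c(1001, 1002), patients = c(4000, 6200)),
  hospital = data.frame(hospital_code = c("H001", "H002"), percent_occupancy = c(80, 90))
)

# Write the whole list to a single .RDS file (a temp file stands in for the blob)
rds_path <- tempfile(fileext = ".RDS")
saveRDS(app_data, rds_path)

# On launch, the application would read the same file back into memory
restored <- readRDS(rds_path)
```

Storing a single list keeps the related data.frames together, so the application needs only one read at start-up.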