General starter file
The starter file uses the UK Government’s gender pay gap data set. The data has been cleaned to remove some columns, such as the mean and median bonus percentages, the male and female bonus percentages and the responsible person. Removing the responsible person’s details allows us to follow the Data Ethics Framework and GDPR principles.
The UK Government data set included address and postcode details, which I knew I could combine with the postcode analysis file. This would help me segregate the data by location, such as the individual countries within the UK, and I could also have used it to define regions.
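As a rough illustration of that postcode join, the sketch below attaches a country to each employer by looking up a normalised postcode. It uses only the Python standard library. The field names (`EmployerName`, `PostCode`, `Country`), the sample rows and the helper functions are assumptions made for the example, not the actual layout of the starter file or the postcode analysis file.

```python
def normalise_postcode(postcode: str) -> str:
    """Upper-case a postcode and remove whitespace so both files share one key."""
    return "".join(postcode.split()).upper()


# Hypothetical rows standing in for the cleaned starter file.
employers = [
    {"EmployerName": "Alpha Ltd", "PostCode": "eh1 1yz"},
    {"EmployerName": "Beta plc", "PostCode": "SW1A 1AA"},
    {"EmployerName": "Gamma LLP", "PostCode": "CF10 1EP"},
]

# Hypothetical lookup standing in for the postcode analysis file.
postcode_to_country = {
    normalise_postcode("EH1 1YZ"): "Scotland",
    normalise_postcode("SW1A 1AA"): "England",
}


def add_country(rows, lookup):
    """Return copies of the rows with a Country field; None marks an unmatched postcode."""
    return [
        {**row, "Country": lookup.get(normalise_postcode(row["PostCode"]))}
        for row in rows
    ]


located = add_country(employers, postcode_to_country)

# Keep unmatched employers visible rather than dropping them silently.
unmatched = [row["EmployerName"] for row in located if row["Country"] is None]
```

Normalising the postcode on both sides avoids losing rows to differences in spacing or capitalisation, and listing the unmatched employers makes it easier to judge how much of the data could not be placed in a country.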
Kaggle datasets
In the planning stage, I identified that Kaggle has several data sets on the UK gender pay gap. On closer inspection, none of them appeared to add value to my investigation into SMEs and Scotland vs the United Kingdom. Some of them reused data from the ONS or the UK Government, which I could access through alternative means.
ONS datasets
The Office for National Statistics data sets provided further gender pay gap data and were very comprehensive. Each year had approximately 20 data sets covering different aspects of the gender pay gap, including breakdowns by public/private sector, industry, age, location, occupation and travel to work.
The ONS data sets include guidance on the statistical robustness and accuracy of the gender pay gap estimates. The guidance further notes that the data do not cover self-employed individuals, which might affect my findings.

