Outcome 2 Write-Up

Outcome 2 focuses on collecting data safely and securely.

(a) Identify the data required, and appropriate sources, for the chosen approach 

Several open-source datasets relate to the gender pay gap, including the Office for National Statistics (2022) dataset. The UK government provides datasets monitoring the gender pay gap (UK Government, 2023). Kaggle also hosts several open-source datasets on the topic. The Scottish Government offers open-access statistics on the gender pay gap for each Scottish council area (Scottish Government, 2021).

I chose to use the Office for National Statistics (ONS) and UK Government datasets as my sources of gender pay gap data. I chose not to use Kaggle datasets because I could not find any that added value beyond what I already had from the ONS and UK Government.

(b) Assemble the required dataset. 

I launched Power BI and imported the datasets so that I could combine them into the single dataset needed for this project. The datasets were all Excel spreadsheet files, a file type that Power BI supports through a built-in data connector. After connecting to each file, I selected which sheets to import. Once the data was loaded, I checked it for errors and reviewed the data relationships, which would be needed during analysis later on.

At this point, I transformed and cleaned the data so that it was consistent and accurate, which should keep the data quality high throughout the project. This stage included removing duplicates and resolving missing values.
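Although I did this work in Power BI, the same cleaning steps can be sketched in Python with pandas. This is a minimal illustration only: the sample rows and the column names (`employer`, `year`, `median_gap_pct`) are assumptions made up for the example, not the real ONS or UK Government schema, and dropping incomplete rows is just one possible way to resolve missing values.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions, not the real schema.
raw = pd.DataFrame(
    {
        "employer": ["A Ltd", "A Ltd", "B plc", "C LLP"],
        "year": [2022, 2022, 2022, 2022],
        "median_gap_pct": [12.5, 12.5, None, 8.0],
    }
)

# Remove exact duplicate rows (comparable to removing duplicates in Power BI).
deduplicated = raw.drop_duplicates()

# Count missing values per column before deciding how to resolve them.
missing_per_column = deduplicated.isna().sum()

# One possible resolution: drop rows with no pay gap figure rather than guess a value.
cleaned = deduplicated.dropna(subset=["median_gap_pct"]).reset_index(drop=True)

print(missing_per_column)
print(cleaned)
```

Counting the missing values before removing anything keeps a record of how much data the cleaning step discarded.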

Additionally, I explored appending datasets by using the Append Queries feature. This was a new skill that I wanted to investigate and familiarise myself with.
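Appending stacks the rows of tables that share the same columns. As a rough equivalent outside Power BI, the sketch below uses pandas `concat`; the two yearly tables, their values and their column names are hypothetical and stand in for the real files.

```python
import pandas as pd

# Hypothetical yearly extracts with matching columns (assumed names and values).
gaps_2021 = pd.DataFrame({"employer": ["A Ltd", "B plc"], "median_gap_pct": [13.0, 9.5]})
gaps_2022 = pd.DataFrame({"employer": ["A Ltd", "C LLP"], "median_gap_pct": [12.5, 8.0]})

# Tag each table with its year so the source of every row survives the append.
gaps_2021["year"] = 2021
gaps_2022["year"] = 2022

# Stack the rows of both tables into one, renumbering the index from zero.
appended = pd.concat([gaps_2021, gaps_2022], ignore_index=True)

print(appended)
```

Adding the year column before appending matters because, once the rows are stacked, nothing else records which file each row came from.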

I also connected tables by defining relationships between specific datasets. This would allow me to compare the same measures across years in my visualisations.
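A relationship links rows in one table to rows in another through a shared key. The pandas sketch below imitates that idea with a merge. It is an analogy, not how Power BI implements relationships, and the `employer_id` key, table names and figures are all hypothetical.

```python
import pandas as pd

# Hypothetical lookup table: one row per employer.
employers = pd.DataFrame({"employer_id": [1, 2], "employer": ["A Ltd", "B plc"]})

# Hypothetical yearly figures: many rows per employer.
gaps = pd.DataFrame(
    {
        "employer_id": [1, 1, 2, 2],
        "year": [2021, 2022, 2021, 2022],
        "median_gap_pct": [13.0, 12.5, 9.5, 10.0],
    }
)

# Join on the shared key; validate raises an error if the lookup key is not unique.
linked = gaps.merge(employers, on="employer_id", how="left", validate="many_to_one")

# Put the years side by side so each employer can be compared across years.
by_year = linked.pivot(index="employer", columns="year", values="median_gap_pct")
by_year["change"] = by_year[2022] - by_year[2021]

print(by_year)
```

The `validate="many_to_one"` argument plays a similar role to checking the cardinality of a relationship: it fails early if the lookup table contains duplicate keys.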

(c) Employ secure methods for storing the data sample 

I uploaded all datasets to OneDrive. My OneDrive can only be accessed with my username and password plus two-factor authentication (2FA). Further protection could be added by password-protecting the folder. However, this causes problems for Power BI, because the extra security can block its access to the datasets and prevent the dashboard from updating.

(d) Recognise privacy considerations with the required data 

The datasets chosen for the project are open-source and/or public datasets. No private company datasets have been used during the investigation and development of this project. If personal and/or confidential data were used, the Data Ethics Framework (2020) would have to be followed to ensure appropriate and responsible use of the data. This would minimise the risk of accidental or intentional bias occurring.

(e) Identify sources of bias in the collected data sample

Fewer Scottish, Irish and Welsh companies submit gender pay gap data than English companies. This could introduce bias, as the data does not represent all parts of the UK equally.

Additionally, for some datasets, data submission was purely voluntary unless the business had 250 or more employees; therefore, the sample may not accurately represent businesses smaller than this. As a result, establishing the gender pay gap in SMEs may prove difficult and/or inaccurate.
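One way to make this kind of imbalance visible is to calculate each group's share of the submissions. The sketch below does this in pandas for nation and for employer size. The rows are invented for illustration, the `nation` and `employees` columns are assumed names, and the `THRESHOLD` value is an assumption chosen to mirror the reporting threshold described above.

```python
import pandas as pd

# Invented submissions for illustration only; not real reporting data.
submissions = pd.DataFrame(
    {
        "nation": ["England", "England", "England", "Scotland", "Wales", "England"],
        "employees": [300, 1200, 450, 260, 80, 5000],
    }
)

# Assumed headcount at which reporting stops being voluntary.
THRESHOLD = 250


def share_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return each group's share of the rows as a percentage."""
    return frame[column].value_counts(normalize=True).mul(100).round(1)


nation_share = share_by(submissions, "nation")

# Band employers by size to see how few smaller businesses appear in the sample.
submissions["size_band"] = submissions["employees"].apply(
    lambda n: "at or above threshold" if n >= THRESHOLD else "below threshold"
)
size_share = share_by(submissions, "size_band")

print(nation_share)
print(size_share)
```

Comparing these shares with each group's share of all UK businesses would show how far the sample departs from the population, although that comparison needs a reference dataset that is not included here.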

I also needed to be aware of confirmation bias, as my own expectations and beliefs about the gender pay gap in SMEs could lead me to draw incorrect insights from the data.

(f) Review the impacts of data quality issues

Some of the datasets contained submissions with missing values. Missing values are a form of poor-quality data and could lead to inaccurate analysis, so any insights drawn from visualisations using this data could be incorrect.
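The effect of missing values on a summary figure can be illustrated with a small pandas sketch. The numbers below are invented and the `median_gap_pct` name is an assumption; the point is only that an average is calculated from fewer rows than the table appears to contain.

```python
import pandas as pd

# Invented pay gap figures; None marks a submission with no value.
gaps = pd.Series([12.5, None, 8.0, None, 20.0], name="median_gap_pct")

# Percentage of submissions with no value.
missing_share = gaps.isna().mean() * 100

# pandas skips missing values by default, so this mean uses only the rows that have data.
mean_available = gaps.mean()

# Number of rows that actually contributed to the mean.
rows_used = gaps.count()

print(f"{missing_share:.0f}% missing; mean of {rows_used} rows = {mean_available:.1f}")
```

Reporting the number of rows used alongside any average makes it clear when a visualisation rests on only part of the data.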

As mentioned, certain areas of the UK are underrepresented compared to England. Conclusions drawn from the data could therefore be biased or skewed.

Data quality issues could also lead to missed opportunities for creating insights on the gender pay gap. The investigation aims to identify important trends and patterns, which could go unnoticed as a result of data quality issues.
