Cleaning and filtering data reflection

May 15, 2023 | Blog | Kerry

General starter file

There were several steps to preparing the general starter file for analysis. Several columns needed reformatting: date columns such as the due date were changed to a year format, and the quartile data columns were set to zero decimal places.

It was also important to remove unnecessary data by deleting columns such as Company Number, Employer ID, Current Name and Address.
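For readers working outside Power Query, the formatting and column-removal steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the column names, the date layout and the sample row are hypothetical and are not taken from the real starter file.

```python
from datetime import datetime

# Hypothetical sample row; the column names and date layout are assumptions.
rows = [
    {
        "EmployerId": "101",
        "CompanyNumber": "00123456",
        "CurrentName": "Example Ltd",
        "Address": "1 High Street",
        "DueDate": "2023/04/05 00:00:00",
        "MaleLowerQuartile": "45.6",
        "FemaleLowerQuartile": "54.4",
    },
]

# Columns that are not needed for the analysis.
UNNEEDED = {"CompanyNumber", "EmployerId", "CurrentName", "Address"}


def clean_row(row):
    # Drop the unnecessary columns.
    kept = {key: value for key, value in row.items() if key not in UNNEEDED}
    # Reduce the due date to its year.
    kept["DueDate"] = datetime.strptime(kept["DueDate"], "%Y/%m/%d %H:%M:%S").year
    # Round every quartile column to zero decimal places.
    for key in kept:
        if key.endswith("Quartile"):
            kept[key] = round(float(kept[key]))
    return kept


cleaned = [clean_row(row) for row in rows]
```

Doing the work in one function per row keeps the three changes (drop, reformat date, round) together, which mirrors applying them as successive steps on one query.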

Some columns I wished to use from the original government data were missing from the general starter file, so I combined the required columns into the starter file dataset.

Due to the type of analysis I wanted to complete, I split the SIC column into separate segments so that I could later filter the column to segregate industries. This was achieved by converting the SIC column to text and splitting it by number of characters using the ‘Once, as far left as possible’ setting.
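The same kind of split can be sketched in Python as a single cut taken from the left of the text. The width of two characters and the sample code 85310 are assumptions chosen for illustration; the post does not state the width that was used.

```python
def split_left_once(text, width):
    """Split text once, after the first `width` characters from the left."""
    return text[:width], text[width:]


# Convert the code to text first, then split it a single time.
sic_code = str(85310)  # illustrative value, not taken from the dataset
leading_segment, remainder = split_left_once(sic_code, 2)
```

Here `leading_segment` holds the first two characters and `remainder` holds everything after them, so the leading segment can be used as a filter column.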

Power Query was also used to add the postcode file as a separate dataset that could be referred to. The postcode in the general starter file was split to keep its first two characters, and I then deleted the remaining columns as these were not required.
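As a rough Python equivalent, the postcode can be trimmed to its first two characters and then matched against a separate lookup table. The lookup contents and the function names below are hypothetical stand-ins for the postcode file, not the actual data.

```python
# Hypothetical lookup table standing in for the separate postcode dataset.
postcode_lookup = {"SW": "London", "M1": "North West"}


def postcode_prefix(postcode):
    # Keep only the first two characters; the rest is discarded.
    return postcode.strip().upper()[:2]


def lookup_region(postcode):
    # Refer to the separate dataset by prefix; unmatched prefixes are flagged.
    return postcode_lookup.get(postcode_prefix(postcode), "Unknown")
```

Keeping the lookup as its own table, rather than merging it into every row up front, matches the idea of a separate dataset that is referred to when needed.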

ONS dataset

The ONS dataset was split into twenty sub-datasets. I filtered these down to the six necessary for my project. Datasets focused on work travel and parliamentary areas were unnecessary for my investigation.

I reviewed the datasets for accuracy and completeness. At the time of the project, the 2022 figures were provisional and had not been finalised.

I again needed to convert several columns across the datasets from their original text format to decimal. I also had to delete several empty columns that had been imported into Power BI.

Where there was data missing, I replaced the values or removed the rows as needed.
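The text-to-decimal conversion and the two ways of handling missing data can be sketched in Python as follows. The column name, sample values and the "x" placeholder are assumptions made for illustration, and the choice between replacing and removing is left to a parameter because the post does not say which rule applied to which column.

```python
def to_decimal(text):
    """Convert a text cell to a float; return None if it is blank or not numeric."""
    try:
        return float(text.replace(",", ""))
    except (ValueError, AttributeError):
        return None


def clean(rows, column, default=None):
    """Convert `column` to decimal, replacing missing values or dropping the row."""
    cleaned = []
    for row in rows:
        value = to_decimal(row[column])
        if value is None:
            if default is None:
                continue  # no replacement given: remove the row
            value = default  # replacement given: fill the gap
        cleaned.append({**row, column: value})
    return cleaned


# Hypothetical sample rows with one blank and one non-numeric cell.
sample = [
    {"area": "A", "median_pay": "512.4"},
    {"area": "B", "median_pay": ""},
    {"area": "C", "median_pay": "x"},
]

rows_removed = clean(sample, "median_pay")
rows_replaced = clean(sample, "median_pay", default=0.0)
```

With no default, only the row for area A survives; with a default, all three rows are kept and the two gaps are filled with the replacement value.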
