4.2. Data Clean
To make the data from the previous page suitable for analysis, we reshape and clean it as follows:
Extract years and months from the date columns.
Join data together.
Recast the data types.
Group and aggregate the rows to get counts.
Calculate crime rates.
Convert text to lower case and replace whitespace with underscore characters.
Remove duplicates.
Remove NAs.
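As a rough illustration of the steps above, here is a minimal pandas sketch. It is not the code from the book's scripts: the tables, the column names (`Report Date`, `Area Name`, `Population`), the values, and the helper `tidy_columns` are all hypothetical. It also assumes that the lower-casing step applies to column names and that the crime rate is expressed per 1,000 residents. The order of the steps is chosen for convenience and may differ from the actual scripts.

```python
import pandas as pd

# Hypothetical inputs: every column name and value here is invented for illustration.
crimes = pd.DataFrame({
    "Report Date": ["2019-01-15", "2019-01-20", "2019-01-20", "2019-02-03", None],
    "Area Name": ["North Side", "North Side", "North Side", "South Side", "South Side"],
})
population = pd.DataFrame({
    "Area Name": ["North Side", "South Side"],
    "Population": ["10000", "20000"],  # stored as text, so it needs recasting
})


def tidy_columns(df):
    """Lower-case column names and replace whitespace with underscores."""
    out = df.copy()
    out.columns = out.columns.str.lower().str.replace(r"\s+", "_", regex=True)
    return out


# Tidy the names, then remove duplicate rows and rows with missing values.
crimes = tidy_columns(crimes).drop_duplicates().dropna()
population = tidy_columns(population)

# Extract year and month from the date column.
dates = pd.to_datetime(crimes["report_date"])
crimes = crimes.assign(year=dates.dt.year, month=dates.dt.month)

# Recast the population column from text to integers.
population["population"] = population["population"].astype("int64")

# Group and aggregate the rows to get one count per area and month.
counts = (
    crimes.groupby(["area_name", "year", "month"], as_index=False)
    .size()
    .rename(columns={"size": "crime_count"})
)

# Join the counts to the population table.
merged = counts.merge(population, on="area_name", how="left")

# Crime rate per 1,000 residents (the scaling factor is an assumption).
merged["crime_rate"] = merged["crime_count"] / merged["population"] * 1000

print(merged)
```

Dropping exact duplicate rows is only safe when the records carry enough detail to tell genuine repeat incidents apart; with columns as coarse as the ones in this toy table, a real pipeline would want an incident identifier before de-duplicating.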
The code for cleaning and reshaping the data can be viewed in the book by un-hiding the code, and is also available as Python scripts in the src/make_data/ directory of the repository.