Welcome to Week 7

Please note: in the following video, where reference is made to a study ‘week’, this corresponds to Weeks 7 and 8 of this course.

RUTH ALEXANDER: The ability to make comparisons between different things lies at the heart of statistics. Is this number bigger than that number? Is the average of this group more or less than the average of that group? Or are these different things correlated in some way? Combining data sets is one way of getting data into a form that makes comparisons possible. But we might also be able to look within a particular set for groups of elements that we can compare with each other.

For example, when it comes to totting up the bills each month, if the items on our bank and credit card statements were tagged with a category such as food, transport, utilities or entertainment, we could more easily find out just how much we're spending on one aspect of our lives compared to another. Or if you were in a sales team, you might find it useful to compare sales by month, by region or by sales rep.

Ordering large data sets, then grouping them into sets of related items and treating each group as a small data set in its own right, is another of those things that's hard for us but easy for a program to do. And as well as processing chunks of a data set as separate groups of items, we can also twist and contort a data set, making new rows from columns or many columns from one column, and bending it into a shape where we can group it, summarise it, or generate interesting data visualisations directly from it.

So that's what we'll be covering in this final week of the course: how to work with groups of data in a single data set, and how to get our data sets doing data gymnastics.



In Week 6 you saw how to merge two datasets containing a common column to create a single, combined dataset. Combining datasets allows us to make comparisons across datasets, as you discovered when looking for correlations between GDP and life expectancy.
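As a reminder of what that kind of merge looks like, here is a minimal sketch using pandas. The table names, column names and values are invented purely for illustration (they are not the course's GDP or life expectancy data), and an inner join is assumed.

```python
import pandas as pd

# Two small, made-up tables that share a 'country' column (illustrative values only).
gdp = pd.DataFrame({'country': ['A', 'B', 'C'], 'gdp': [10, 20, 30]})
life = pd.DataFrame({'country': ['A', 'B', 'D'], 'life_expectancy': [70, 75, 80]})

# Keep only the rows whose 'country' value appears in both tables (inner join).
combined = pd.merge(gdp, life, on='country', how='inner')
print(combined)
```

Only countries A and B appear in both tables, so the combined table has two rows, each carrying a column from both sources.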

This week, you’ll learn how to go the other way, separating out distinct ‘subsets’ or groups of data, before summarising them individually.
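To give a flavour of what splitting and summarising can look like, here is a minimal sketch assuming pandas and a small made-up table of tagged spending items, echoing the bank statement example in the video. The column names and amounts are hypothetical.

```python
import pandas as pd

# Hypothetical spending items, each tagged with a category (invented values).
spending = pd.DataFrame({
    'category': ['food', 'transport', 'food', 'utilities', 'transport'],
    'amount': [25.0, 12.5, 40.0, 60.0, 7.5],
})

# Split the rows into one group per category, then sum the amounts within each group.
totals = spending.groupby('category')['amount'].sum()
print(totals)
```

Each category is treated as a small dataset in its own right, and the result holds one summary value per group.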

As well as splitting out different groups of data, you can rearrange row and column values to reshape a dataset, which makes it possible to create a wide range of pivot table style reports from a single data table.

In this week’s exercises, you’ll learn how a single line of code can be used to generate a wide variety of pivot table style reports of your own.
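As a rough idea of what such a line might look like, the sketch below assumes pandas and its `pivot_table` function, applied to a small made-up sales table. The column names and figures are hypothetical and chosen only to echo the sales team example in the video.

```python
import pandas as pd

# Hypothetical sales records (invented values for illustration).
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb', 'Jan'],
    'amount': [100, 150, 80, 120, 50],
})

# One line builds the report: one row per region, one column per month,
# and each cell holds the total sales for that region and month.
report = pd.pivot_table(sales, index='region', columns='month', values='amount', aggfunc='sum')
print(report)
```

Swapping the `index`, `columns` or `aggfunc` arguments gives a differently shaped report from the same table, for example average sales per month rather than totals per region.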