Jacquie Tran



The countdown is well and truly on.

We are less than 1 week away from the first bounce of AFLW 3.0, the 3rd season of the women’s national Australian football league. As an Aussie living abroad, a footy fan, and a supporter of the increasing professionalisation of women’s sport, I’m excited to watch closely as the next chapter of the AFLW story unfolds.

How will Geelong and North Melbourne, new teams to the national league in 2019, fare against the 8 teams who have participated in AFLW seasons 1 and 2?

What will new draftees like Maddie Prespakis and Nina Morrison bring to the mix at senior level?

Which international players and cross-coders will show the potential success to be had in talent transfers?

How will the conference format and rule changes play out?

Fascinating possibilities abound, but there’s lots to be learned from looking back too.

The analysis questions

Following on from my last post, I wanted to work with the fitzRoy package to obtain and explore match data from the 2017 and 2018 AFLW seasons.

In this post, Part I, I provide a codethrough that demonstrates how to obtain publicly available AFLW match data and prepare it for further analysis.

Over a series of posts, I’ll show how the available match data can be used to:

  1. Create team-by-team matchplay profiles; and
  2. Investigate the game statistics that are associated with successful team outcomes.

As an aside, I hope to demonstrate what can be gleaned from small samples, while also acknowledging their limitations.

Getting started

In a new R session, load up the libraries (packages) to be used in this codethrough.

library(fitzRoy)
library(dplyr)
library(stringr)
library(knitr)
library(kableExtra)

Obtaining match data

Thanks to the excellent fitzRoy package, retrieving AFLW match data is a cinch with the get_aflw_match_data() function.

aflw_match_data <- get_aflw_match_data()

# I've suppressed warnings from this code chunk, so if you run this code
# locally, don't be alarmed - warnings are returned for the listed Match.Id
# values that correspond to matches yet to be played in 2019.

Inspecting match data

If you’re working in RStudio, the View() function is useful for looking at rectangular data sets in a spreadsheet-like format.

# Run this in RStudio to produce the screenshot below
# which is interactive: you can sort and filter on columns
# within the tab that opens up
View(aflw_match_data)

There is a range of useful R functions for retrieving meta-data about, well, the data you’re working with. For instance, we can use str() to find out useful details like the number of observations (rows), the number of variables (columns), the name of each variable, and the data type for each variable (e.g., character string, integer, numeric, and so on).

str(aflw_match_data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    93 obs. of  30 variables:
##  $ Match.Id            : chr  "CD_M20172640101" "CD_M20172640102" "CD_M20172640103" "CD_M20172640104" ...
##  $ Round.Id            : chr  "CD_R201726401" "CD_R201726401" "CD_R201726401" "CD_R201726401" ...
##  $ Competition.Id      : chr  "CD_S2017264" "CD_S2017264" "CD_S2017264" "CD_S2017264" ...
##  $ Venue               : chr  "Ikon Park" "Thebarton Oval" "VU Whitten Oval" "Casey Fields" ...
##  $ Local.Start.Time    : POSIXct, format: "2017-02-03 19:45:00" "2017-02-04 16:35:00" ...
##  $ Round.Number        : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ Round.Abbreviation  : chr  "Rd 1" "Rd 1" "Rd 1" "Rd 1" ...
##  $ Weather.Type        : chr  "CLEAR_NIGHT" "RAIN" "RAIN" "RAIN" ...
##  $ Weather.Description : chr  "Clear" "Rain at times" "Partly cloudy" "Rain Possible storm" ...
##  $ Temperature         : num  18 18 18 18 18 18 18 18 18 18 ...
##  $ Home.Team           : chr  "Carlton" "Adelaide Crows" "Western Bulldogs" "Melbourne" ...
##  $ Home.Goals          : int  7 7 6 1 2 7 4 3 7 4 ...
##  $ Home.Behinds        : int  4 6 8 4 11 5 1 5 1 3 ...
##  $ Home.Points         : int  46 48 44 10 23 47 25 23 43 27 ...
##  $ Home.Left.Behinds   : int  1 2 2 1 2 1 0 0 0 0 ...
##  $ Home.Right.Behinds  : int  2 2 4 2 5 2 1 1 0 2 ...
##  $ Home.Left.Posters   : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Home.Right.Posters  : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Home.Rushed.Behinds : int  1 2 2 1 3 2 0 2 1 1 ...
##  $ Home.Touched.Behinds: int  0 0 0 0 1 0 0 0 0 0 ...
##  $ Away.Team           : chr  "Collingwood" "GWS Giants" "Fremantle" "Brisbane Lions" ...
##  $ Away.Goals          : int  1 1 1 4 7 5 7 5 6 3 ...
##  $ Away.Behinds        : int  5 6 6 1 6 4 2 6 7 5 ...
##  $ Away.Points         : int  11 12 12 25 48 34 44 36 43 23 ...
##  $ Away.Left.Behinds   : int  0 1 2 1 2 0 0 1 0 1 ...
##  $ Away.Right.Behinds  : int  3 4 3 0 2 3 1 3 3 3 ...
##  $ Away.Left.Posters   : int  0 0 0 0 0 0 0 1 1 0 ...
##  $ Away.Right.Posters  : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ Away.Rushed.Behinds : int  1 1 1 0 2 1 0 1 2 1 ...
##  $ Away.Touched.Behinds: int  0 0 0 0 0 0 1 0 1 0 ...

From visually inspecting the raw data and obtaining a meta-data summary, we know the following things about the AFLW match data set:

  • The data is structured so that the information pertaining to each match is contained on one row. This means that the statistics for any two opposing teams are stored along the same row.
  • Statistics for the home and away teams are distinguished by the prefixes ‘Home.’ and ‘Away.’.
  • This data set includes ‘high-level’ information about each match, such as descriptors of the weather, the number of goals and behinds scored by each team, the total points scored by each team, and more.
  • The data frame is wide, with each of the ‘measured’ variables presented in distinct columns. (For more on this, have a read of this article by Matti Vuorre on wide and long data.)
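The points above can be checked in code as well as by eye. Here is a minimal sketch on a toy data frame; the teams and scores are made up, and only the column naming pattern is borrowed from the real data set.

```r
# Toy stand-in for the match data: two matches, one row per match (wide format).
# All values are invented for illustration.
toy_matches <- data.frame(
  Match.Id    = c("M1", "M2"),
  Home.Team   = c("Team A", "Team C"),
  Home.Points = c(46L, 12L),
  Away.Team   = c("Team B", "Team D"),
  Away.Points = c(11L, 48L),
  stringsAsFactors = FALSE
)

dim(toy_matches)            # number of rows, then number of columns
sapply(toy_matches, class)  # data type of each variable

# Pick out the columns that belong to each side by their prefix
home_cols <- grep("^Home\\.", names(toy_matches), value = TRUE)
away_cols <- grep("^Away\\.", names(toy_matches), value = TRUE)
home_cols
away_cols
```

The same grep() calls should work on aflw_match_data, since they only rely on the ‘Home.’ and ‘Away.’ prefixes.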

Scrolling through the ‘Temperature’ values, I also noticed that the recorded temperature for every match in Rounds 1 to 3 of the 2017 season is 18. I am sceptical about the accuracy of this data, given that weather is quite variable around Australia (as per the Bureau of Meteorology) and the AFLW season takes place in the summer months.
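A quick tabulation is one way to spot a suspiciously constant variable like this without scrolling. The sketch below uses made-up temperatures rather than the real data, so treat it as an illustration of the check only.

```r
# Made-up temperatures: five matches recorded at 18 and one at 24
toy_weather <- data.frame(
  Round.Number = c(1L, 1L, 2L, 2L, 3L, 4L),
  Temperature  = c(18, 18, 18, 18, 18, 24)
)

# How often does each temperature value appear?
temp_counts <- table(toy_weather$Temperature)
temp_counts

# Share of matches that carry the single most common value;
# a share close to 1 suggests a default or placeholder value
share_most_common <- max(temp_counts) / sum(temp_counts)
share_most_common
```

Running table(aflw_match_data$Temperature) would give the equivalent count for the real data set.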

Obtaining detailed match data

To obtain game statistics for each team in each match (e.g., number of kicks, handballs, tackles, marks), the fitzRoy package again has a simple function for retrieving this data, which requires a vector of Match.Id values. Since I want game statistics data from all AFLW matches played to date, I’ll obtain 2017 and 2018 match IDs from the aflw_match_data set that we just retrieved.

# Vector of match IDs to retrieve detailed game statistics for
# Excluding match IDs for the 2019 season - yet to be played
match_id_vec <- aflw_match_data %>%
    select(Match.Id) %>%
    filter(!str_detect(Match.Id, "2019"))

# Retrieve detailed statistics for match IDs specified in match_id_vec
aflw_detailed <- get_aflw_detailed_data(match_id_vec$Match.Id)

Inspecting the detailed match data

Again, the View() and str() functions are helpful for initially inspecting the data set so you can take a look at its structure and contents. Using these functions, we learn that the AFLW detailed match data set has a similar structure to the ‘high-level’ match data we retrieved before (i.e., it’s in wide format, with one row per match, so that both teams’ statistics are presented on the same row). This is handy because it means merging the data sets will be fairly straightforward.

The str() output includes information about all variables in the data set, including variable names, but sometimes you may be interested in looking at variable names only - without additional meta-data such as the type of data stored in each variable. To do this, use the names() function like so:

names(aflw_detailed)
##   [1] "Match.Id"                                                   
##   [2] "Round.Id"                                                   
##   [3] "Competition.Id"                                             
##   [4] "away.stats.averages.behinds"                                
##   [5] "away.stats.averages.bounces"                                
##   [6] "away.stats.averages.clangers"                               
##   [7] "away.stats.averages.clearances.centreClearances"            
##   [8] "away.stats.averages.clearances.stoppageClearances"          
##   [9] "away.stats.averages.clearances.totalClearances"             
##  [10] "away.stats.averages.contestedMarks"                         
##  [11] "away.stats.averages.contestedPossessions"                   
##  [12] "away.stats.averages.disposalEfficiency"                     
##  [13] "away.stats.averages.disposals"                              
##  [14] "away.stats.averages.dreamTeamPoints"                        
##  [15] "away.stats.averages.freesAgainst"                           
##  [16] "away.stats.averages.freesFor"                               
##  [17] "away.stats.averages.goalAccuracy"                           
##  [18] "away.stats.averages.goalAssists"                            
##  [19] "away.stats.averages.goalEfficiency"                         
##  [20] "away.stats.averages.goals"                                  
##  [21] "away.stats.averages.handballs"                              
##  [22] "away.stats.averages.hitouts"                                
##  [23] "away.stats.averages.inside50s"                              
##  [24] "away.stats.averages.intercepts"                             
##  [25] "away.stats.averages.interchangeCounts.interchangeCap"       
##  [26] "away.stats.averages.interchangeCounts.interchangeCountQ1"   
##  [27] "away.stats.averages.interchangeCounts.interchangeCountQ2"   
##  [28] "away.stats.averages.interchangeCounts.interchangeCountQ3"   
##  [29] "away.stats.averages.interchangeCounts.interchangeCountQ4"   
##  [30] "away.stats.averages.interchangeCounts.totalInterchangeCount"
##  [31] "away.stats.averages.kicks"                                  
##  [32] "away.stats.averages.lastUpdated"                            
##  [33] "away.stats.averages.marks"                                  
##  [34] "away.stats.averages.marksInside50"                          
##  [35] "away.stats.averages.metresGained"                           
##  [36] "away.stats.averages.onePercenters"                          
##  [37] "away.stats.averages.ranking"                                
##  [38] "away.stats.averages.ratingPoints"                           
##  [39] "away.stats.averages.rebound50s"                             
##  [40] "away.stats.averages.scoreInvolvements"                      
##  [41] "away.stats.averages.shotEfficiency"                         
##  [42] "away.stats.averages.shotsAtGoal"                            
##  [43] "away.stats.averages.superGoals"                             
##  [44] "away.stats.averages.tackles"                                
##  [45] "away.stats.averages.tacklesInside50"                        
##  [46] "away.stats.averages.totalPossessions"                       
##  [47] "away.stats.averages.turnovers"                              
##  [48] "away.stats.averages.uncontestedPossessions"                 
##  [49] "away.stats.totals.behinds"                                  
##  [50] "away.stats.totals.bounces"                                  
##  [51] "away.stats.totals.clangers"                                 
##  [52] "away.stats.totals.clearances.centreClearances"              
##  [53] "away.stats.totals.clearances.stoppageClearances"            
##  [54] "away.stats.totals.clearances.totalClearances"               
##  [55] "away.stats.totals.contestedMarks"                           
##  [56] "away.stats.totals.contestedPossessions"                     
##  [57] "away.stats.totals.disposalEfficiency"                       
##  [58] "away.stats.totals.disposals"                                
##  [59] "away.stats.totals.dreamTeamPoints"                          
##  [60] "away.stats.totals.freesAgainst"                             
##  [61] "away.stats.totals.freesFor"                                 
##  [62] "away.stats.totals.goalAccuracy"                             
##  [63] "away.stats.totals.goalAssists"                              
##  [64] "away.stats.totals.goalEfficiency"                           
##  [65] "away.stats.totals.goals"                                    
##  [66] "away.stats.totals.handballs"                                
##  [67] "away.stats.totals.hitouts"                                  
##  [68] "away.stats.totals.inside50s"                                
##  [69] "away.stats.totals.intercepts"                               
##  [70] "away.stats.totals.interchangeCounts.interchangeCap"         
##  [71] "away.stats.totals.interchangeCounts.interchangeCountQ1"     
##  [72] "away.stats.totals.interchangeCounts.interchangeCountQ2"     
##  [73] "away.stats.totals.interchangeCounts.interchangeCountQ3"     
##  [74] "away.stats.totals.interchangeCounts.interchangeCountQ4"     
##  [75] "away.stats.totals.interchangeCounts.totalInterchangeCount"  
##  [76] "away.stats.totals.kicks"                                    
##  [77] "away.stats.totals.lastUpdated"                              
##  [78] "away.stats.totals.marks"                                    
##  [79] "away.stats.totals.marksInside50"                            
##  [80] "away.stats.totals.metresGained"                             
##  [81] "away.stats.totals.onePercenters"                            
##  [82] "away.stats.totals.ranking"                                  
##  [83] "away.stats.totals.ratingPoints"                             
##  [84] "away.stats.totals.rebound50s"                               
##  [85] "away.stats.totals.scoreInvolvements"                        
##  [86] "away.stats.totals.shotEfficiency"                           
##  [87] "away.stats.totals.shotsAtGoal"                              
##  [88] "away.stats.totals.superGoals"                               
##  [89] "away.stats.totals.tackles"                                  
##  [90] "away.stats.totals.tacklesInside50"                          
##  [91] "away.stats.totals.totalPossessions"                         
##  [92] "away.stats.totals.turnovers"                                
##  [93] "away.stats.totals.uncontestedPossessions"                   
##  [94] "away.team.teamAbbr"                                         
##  [95] "away.team.teamId"                                           
##  [96] "away.team.teamName"                                         
##  [97] "away.team.teamNickname"                                     
##  [98] "home.stats.averages.behinds"                                
##  [99] "home.stats.averages.bounces"                                
## [100] "home.stats.averages.clangers"                               
## [101] "home.stats.averages.clearances.centreClearances"            
## [102] "home.stats.averages.clearances.stoppageClearances"          
## [103] "home.stats.averages.clearances.totalClearances"             
## [104] "home.stats.averages.contestedMarks"                         
## [105] "home.stats.averages.contestedPossessions"                   
## [106] "home.stats.averages.disposalEfficiency"                     
## [107] "home.stats.averages.disposals"                              
## [108] "home.stats.averages.dreamTeamPoints"                        
## [109] "home.stats.averages.freesAgainst"                           
## [110] "home.stats.averages.freesFor"                               
## [111] "home.stats.averages.goalAccuracy"                           
## [112] "home.stats.averages.goalAssists"                            
## [113] "home.stats.averages.goalEfficiency"                         
## [114] "home.stats.averages.goals"                                  
## [115] "home.stats.averages.handballs"                              
## [116] "home.stats.averages.hitouts"                                
## [117] "home.stats.averages.inside50s"                              
## [118] "home.stats.averages.intercepts"                             
## [119] "home.stats.averages.interchangeCounts.interchangeCap"       
## [120] "home.stats.averages.interchangeCounts.interchangeCountQ1"   
## [121] "home.stats.averages.interchangeCounts.interchangeCountQ2"   
## [122] "home.stats.averages.interchangeCounts.interchangeCountQ3"   
## [123] "home.stats.averages.interchangeCounts.interchangeCountQ4"   
## [124] "home.stats.averages.interchangeCounts.totalInterchangeCount"
## [125] "home.stats.averages.kicks"                                  
## [126] "home.stats.averages.lastUpdated"                            
## [127] "home.stats.averages.marks"                                  
## [128] "home.stats.averages.marksInside50"                          
## [129] "home.stats.averages.metresGained"                           
## [130] "home.stats.averages.onePercenters"                          
## [131] "home.stats.averages.ranking"                                
## [132] "home.stats.averages.ratingPoints"                           
## [133] "home.stats.averages.rebound50s"                             
## [134] "home.stats.averages.scoreInvolvements"                      
## [135] "home.stats.averages.shotEfficiency"                         
## [136] "home.stats.averages.shotsAtGoal"                            
## [137] "home.stats.averages.superGoals"                             
## [138] "home.stats.averages.tackles"                                
## [139] "home.stats.averages.tacklesInside50"                        
## [140] "home.stats.averages.totalPossessions"                       
## [141] "home.stats.averages.turnovers"                              
## [142] "home.stats.averages.uncontestedPossessions"                 
## [143] "home.stats.totals.behinds"                                  
## [144] "home.stats.totals.bounces"                                  
## [145] "home.stats.totals.clangers"                                 
## [146] "home.stats.totals.clearances.centreClearances"              
## [147] "home.stats.totals.clearances.stoppageClearances"            
## [148] "home.stats.totals.clearances.totalClearances"               
## [149] "home.stats.totals.contestedMarks"                           
## [150] "home.stats.totals.contestedPossessions"                     
## [151] "home.stats.totals.disposalEfficiency"                       
## [152] "home.stats.totals.disposals"                                
## [153] "home.stats.totals.dreamTeamPoints"                          
## [154] "home.stats.totals.freesAgainst"                             
## [155] "home.stats.totals.freesFor"                                 
## [156] "home.stats.totals.goalAccuracy"                             
## [157] "home.stats.totals.goalAssists"                              
## [158] "home.stats.totals.goalEfficiency"                           
## [159] "home.stats.totals.goals"                                    
## [160] "home.stats.totals.handballs"                                
## [161] "home.stats.totals.hitouts"                                  
## [162] "home.stats.totals.inside50s"                                
## [163] "home.stats.totals.intercepts"                               
## [164] "home.stats.totals.interchangeCounts.interchangeCap"         
## [165] "home.stats.totals.interchangeCounts.interchangeCountQ1"     
## [166] "home.stats.totals.interchangeCounts.interchangeCountQ2"     
## [167] "home.stats.totals.interchangeCounts.interchangeCountQ3"     
## [168] "home.stats.totals.interchangeCounts.interchangeCountQ4"     
## [169] "home.stats.totals.interchangeCounts.totalInterchangeCount"  
## [170] "home.stats.totals.kicks"                                    
## [171] "home.stats.totals.lastUpdated"                              
## [172] "home.stats.totals.marks"                                    
## [173] "home.stats.totals.marksInside50"                            
## [174] "home.stats.totals.metresGained"                             
## [175] "home.stats.totals.onePercenters"                            
## [176] "home.stats.totals.ranking"                                  
## [177] "home.stats.totals.ratingPoints"                             
## [178] "home.stats.totals.rebound50s"                               
## [179] "home.stats.totals.scoreInvolvements"                        
## [180] "home.stats.totals.shotEfficiency"                           
## [181] "home.stats.totals.shotsAtGoal"                              
## [182] "home.stats.totals.superGoals"                               
## [183] "home.stats.totals.tackles"                                  
## [184] "home.stats.totals.tacklesInside50"                          
## [185] "home.stats.totals.totalPossessions"                         
## [186] "home.stats.totals.turnovers"                                
## [187] "home.stats.totals.uncontestedPossessions"                   
## [188] "home.team.teamAbbr"                                         
## [189] "home.team.teamId"                                           
## [190] "home.team.teamName"                                         
## [191] "home.team.teamNickname"

That loooooonnnng list of variables shows us that we now have a range of game statistics to analyse, thanks to the fitzRoy::get_aflw_detailed_data() function. This data set primarily includes simple counts of game actions (e.g., total number of kicks), or derived variables that convey proportion and provide measures of accuracy or ‘efficiency’ (e.g., disposal efficiency is the number of disposals that are effectively received by a team-mate, divided by the total number of disposals).
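To make the ‘efficiency’ idea concrete, here is the disposal efficiency arithmetic as a small sketch. The count of effective disposals is a hypothetical figure, and expressing the result as a percentage is my assumption based on how the values appear in the data.

```r
# Hypothetical counts for one team in one match
effective_disposals <- 120
total_disposals     <- 198

# Effective disposals as a percentage of all disposals
disposal_efficiency <- 100 * effective_disposals / total_disposals
round(disposal_efficiency, 1)  # 60.6
```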

There are also some variables here like metres gained - a measure of the accumulated forward movement of the ball towards a team’s goals. This variable and a few others in this data set are routinely measured in the AFL (the men’s national competition) but not in the AFLW, so we will need to disregard these variables as we progress the analysis.
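Dropping variables like these by name pattern is simple in base R. The sketch below uses a toy one-row data frame with invented values; the pattern lists two of the variables I exclude later in this post.

```r
# Toy one-row stand-in for the detailed data; all values are invented
toy_detailed <- data.frame(
  Match.Id                       = "M1",
  home.stats.totals.kicks        = 112,
  home.stats.totals.metresGained = 0,
  home.stats.totals.ratingPoints = 0,
  away.stats.totals.kicks        = 115,
  away.stats.totals.metresGained = 0
)

# Flag the columns whose names match any of the unwanted statistics
drop_pattern <- "metresGained|ratingPoints"
keep <- !grepl(drop_pattern, names(toy_detailed))

# drop = FALSE keeps the result a data frame even if one column remains
toy_trimmed <- toy_detailed[, keep, drop = FALSE]
names(toy_trimmed)
```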

By combining the ‘high-level’ information stored in the match data with the detailed game statistics data sets, we can use common match analysis approaches to understand AFLW team performance, by examining which game actions are associated (or not!) with game outcomes.

Data preparation

Some data wrangling is needed to modify the data structure to suit this analysis, and tidy it up so it contains only the variables we will need.

The code below wrangles aflw_match_data so that every match has two rows, with one row for the home team’s game statistics and one row for the away team’s game statistics.

# Create a subset of aflw_match_data for HOME teams
home_teams <- aflw_match_data %>%
  select(Match.Id, Round.Id, Competition.Id, Venue, Local.Start.Time,
         Round.Number, Round.Abbreviation, Weather.Type, Weather.Description,
         Temperature, starts_with("Home"), Away.Points) %>%
    # Rename variables for ease of merging and analysis
  rename(team = Home.Team, goals = Home.Goals, behinds = Home.Behinds,
         points_for = Home.Points, points_against = Away.Points,
         behinds_left = Home.Left.Behinds, behinds_right = Home.Right.Behinds,
         posters_left = Home.Left.Posters, posters_right = Home.Right.Posters,
         behinds_rushed = Home.Rushed.Behinds,
         behinds_touched = Home.Touched.Behinds) %>%
  mutate(home_or_away = "Home")

# Create a subset of aflw_match_data for AWAY teams
away_teams <- aflw_match_data %>%
  select(Match.Id, Round.Id, Competition.Id, Venue, Local.Start.Time,
         Round.Number, Round.Abbreviation, Weather.Type, Weather.Description,
         Temperature, starts_with("Away"), Home.Points) %>%
    # Rename variables for ease of merging and analysis
  rename(team = Away.Team, goals = Away.Goals, behinds = Away.Behinds,
         points_for = Away.Points, points_against = Home.Points,
         behinds_left = Away.Left.Behinds, behinds_right = Away.Right.Behinds,
         posters_left = Away.Left.Posters, posters_right = Away.Right.Posters,
         behinds_rushed = Away.Rushed.Behinds,
         behinds_touched = Away.Touched.Behinds) %>%
  mutate(home_or_away = "Away")

# Row-bind the home and away subsets to create new data structure
# where one row == one team in each match, so each match will have two rows
aflw_match_data_clean <- rbind(home_teams, away_teams)

# Re-order the data set so the two rows for each match sit together,
# with the home team listed first ("Home" sorts after "Away", hence desc())
aflw_match_data_clean <- aflw_match_data_clean %>%
  arrange(Match.Id, desc(home_or_away))

# Drop NA rows for fixtured matches that have not been played yet
aflw_match_data_clean <- aflw_match_data_clean %>%
  filter(!is.na(goals))

# Create new variables for match outcomes
aflw_match_data_clean <- aflw_match_data_clean %>%
  mutate(score_margin = points_for - points_against, # continuous variable
         match_outcome = case_when(                  # categorical variable
           score_margin > 0 ~ "Win",
           score_margin < 0 ~ "Loss",
           TRUE             ~ "Draw"))
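After a reshape like this, a couple of sanity checks can catch mistakes early. The sketch below rebuilds the two outcome variables on a toy long-format data frame (invented teams and scores, base R only) and then checks that every match has exactly two rows and that the two margins in a match cancel out.

```r
# Toy long-format data: one row per team per match; values are invented
toy_long <- data.frame(
  Match.Id       = c("M1", "M1", "M2", "M2"),
  team           = c("Team A", "Team B", "Team C", "Team D"),
  points_for     = c(46, 11, 30, 30),
  points_against = c(11, 46, 30, 30),
  stringsAsFactors = FALSE
)

toy_long$score_margin  <- toy_long$points_for - toy_long$points_against
toy_long$match_outcome <- ifelse(toy_long$score_margin > 0, "Win",
                          ifelse(toy_long$score_margin < 0, "Loss", "Draw"))

# Check 1: every match appears exactly twice
rows_per_match <- table(toy_long$Match.Id)
stopifnot(all(rows_per_match == 2))

# Check 2: the winner's margin and the loser's margin sum to zero
margin_sums <- tapply(toy_long$score_margin, toy_long$Match.Id, sum)
stopifnot(all(margin_sums == 0))
```

The same two checks can be pointed at aflw_match_data_clean by swapping in its name for toy_long.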

We follow the same logic to wrangle aflw_detailed into the same structure as aflw_match_data_clean.

# Subset to variables that will be useful for the analysis
# Variables are excluded here because they don't contain useful or correct data
aflw_detailed_selected <- aflw_detailed %>%
  select(Match.Id, contains("stats.totals"), contains("away.team"),
         contains("home.team"), -contains("behinds"), -contains("goals"),
         -contains("interChange"), -contains("lastUpdated"),
         -contains("metresGained"), -contains("ranking"),
         -contains("ratingPoints"), -contains("scoreInvolvements"),
         -contains("superGoals"))

# Create a subset of aflw_detailed_selected for HOME teams
aflw_detailed_home <- aflw_detailed_selected %>%
  select(Match.Id, contains("home.team"),
         contains("stats.totals"), -contains("away")) %>%
    # Rename variables for ease of merging and analysis
  rename(team_abbr = home.team.teamAbbr, team_id = home.team.teamId,
         team = home.team.teamName, team_nickname = home.team.teamNickname,
         bounces = home.stats.totals.bounces,
         clangers = home.stats.totals.clangers,
         clearances_centre = home.stats.totals.clearances.centreClearances,
         clearances_stoppage = home.stats.totals.clearances.stoppageClearances,
         clearances_total = home.stats.totals.clearances.totalClearances,
         marks_contested = home.stats.totals.contestedMarks,
         possessions_contested = home.stats.totals.contestedPossessions,
         disposals_efficiency = home.stats.totals.disposalEfficiency,
         disposals = home.stats.totals.disposals,
         dream_team_points = home.stats.totals.dreamTeamPoints,
         frees_against = home.stats.totals.freesAgainst,
         frees_for = home.stats.totals.freesFor,
         goal_accuracy = home.stats.totals.goalAccuracy,
         goal_assists = home.stats.totals.goalAssists,
         goal_efficiency = home.stats.totals.goalEfficiency,
         handballs = home.stats.totals.handballs,
         hitouts = home.stats.totals.hitouts,
         inside50s = home.stats.totals.inside50s,
         intercepts = home.stats.totals.intercepts,
         kicks = home.stats.totals.kicks, marks = home.stats.totals.marks,
         marks_inside50 = home.stats.totals.marksInside50,
         one_percenters = home.stats.totals.onePercenters,
         rebound50s = home.stats.totals.rebound50s,
         shot_efficiency = home.stats.totals.shotEfficiency,
         shots_at_goal = home.stats.totals.shotsAtGoal,
         tackles = home.stats.totals.tackles,
         tackles_inside50 = home.stats.totals.tacklesInside50,
         possessions_total = home.stats.totals.totalPossessions,
         turnovers = home.stats.totals.turnovers,
         possessions_uncontested = home.stats.totals.uncontestedPossessions)

# Create a subset of aflw_detailed_selected for AWAY teams
aflw_detailed_away <- aflw_detailed_selected %>%
  select(Match.Id, contains("away.team"),
         contains("stats.totals"), -contains("home")) %>%
    # Rename variables for ease of merging and analysis
  rename(team_abbr = away.team.teamAbbr, team_id = away.team.teamId,
         team = away.team.teamName, team_nickname = away.team.teamNickname,
         bounces = away.stats.totals.bounces,
         clangers = away.stats.totals.clangers,
         clearances_centre = away.stats.totals.clearances.centreClearances,
         clearances_stoppage = away.stats.totals.clearances.stoppageClearances,
         clearances_total = away.stats.totals.clearances.totalClearances,
         marks_contested = away.stats.totals.contestedMarks,
         possessions_contested = away.stats.totals.contestedPossessions,
         disposals_efficiency = away.stats.totals.disposalEfficiency,
         disposals = away.stats.totals.disposals,
         dream_team_points = away.stats.totals.dreamTeamPoints,
         frees_against = away.stats.totals.freesAgainst,
         frees_for = away.stats.totals.freesFor,
         goal_accuracy = away.stats.totals.goalAccuracy,
         goal_assists = away.stats.totals.goalAssists,
         goal_efficiency = away.stats.totals.goalEfficiency,
         handballs = away.stats.totals.handballs,
         hitouts = away.stats.totals.hitouts,
         inside50s = away.stats.totals.inside50s,
         intercepts = away.stats.totals.intercepts,
         kicks = away.stats.totals.kicks, marks = away.stats.totals.marks,
         marks_inside50 = away.stats.totals.marksInside50,
         one_percenters = away.stats.totals.onePercenters,
         rebound50s = away.stats.totals.rebound50s,
         shot_efficiency = away.stats.totals.shotEfficiency,
         shots_at_goal = away.stats.totals.shotsAtGoal,
         tackles = away.stats.totals.tackles,
         tackles_inside50 = away.stats.totals.tacklesInside50,
         possessions_total = away.stats.totals.totalPossessions,
         turnovers = away.stats.totals.turnovers,
         possessions_uncontested = away.stats.totals.uncontestedPossessions)

# Row-bind the home and away subsets to create new data structure
# where one row == one team in each match, so each match will have two rows
aflw_detailed_clean <- rbind(aflw_detailed_home, aflw_detailed_away)
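The two rename() calls above are long because every variable is renamed by hand. One possible shortcut is to derive the new names programmatically. The helper below is hypothetical (it is not part of fitzRoy or dplyr), and it produces names like contested_marks rather than the marks_contested style I chose above, so it is an alternative sketch and not a drop-in replacement.

```r
# Hypothetical helper: strip the "<side>.stats.totals." prefix,
# then convert camelCase to snake_case
clean_stat_names <- function(x, side) {
  prefix <- paste0("^", side, "\\.stats\\.totals\\.")
  x <- sub(prefix, "", x)
  # Insert an underscore wherever a lower-case letter or digit meets a capital
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  tolower(x)
}

raw_names <- c("home.stats.totals.kicks",
               "home.stats.totals.contestedMarks",
               "home.stats.totals.marksInside50")

new_names <- clean_stat_names(raw_names, side = "home")
new_names
```

Because the same function handles both sides, calling it with side = "away" on the away columns would give matching names, which is what rbind() needs.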

We have now reshaped both aflw_match_data_clean and aflw_detailed_clean into the same structure and trimmed them to include only the variables that are useful for this analysis. Now we can use the left_join() function from the dplyr package to merge the data sets.

# Join on the two columns the data sets share: match ID and team name
aflw_merged <- left_join(aflw_match_data_clean, aflw_detailed_clean,
                         by = c("Match.Id", "team"))

# Check the first few rows to see what the data looks like now
head(aflw_merged) %>%
    kable("html") %>%
    kable_styling() %>%
    scroll_box(width = "800px", height = "600px") 
Match.Id Round.Id Competition.Id Venue Local.Start.Time Round.Number Round.Abbreviation Weather.Type Weather.Description Temperature team goals behinds points_for behinds_left behinds_right posters_left posters_right behinds_rushed behinds_touched points_against home_or_away score_margin match_outcome team_abbr team_id team_nickname bounces clangers clearances_centre clearances_stoppage clearances_total marks_contested possessions_contested disposals_efficiency disposals dream_team_points frees_against frees_for goal_accuracy goal_assists goal_efficiency handballs hitouts inside50s intercepts kicks marks marks_inside50 one_percenters rebound50s shot_efficiency shots_at_goal tackles tackles_inside50 possessions_total turnovers possessions_uncontested
CD_M20172640101 CD_R201726401 CD_S2017264 Ikon Park 2017-02-03 19:45:00 1 Rd 1 CLEAR_NIGHT Clear 18 Carlton 7 4 46 1 2 0 0 1 0 11 Home 35 Win CARL CD_T8096 Blues 4 44 1 16 17 7 106 60.6 198 847 20 9 58.3 4 28.0 86 30 25 56 112 26 6 15 25 48.0 12 59 15 197 47 91
CD_M20172640101 CD_R201726401 CD_S2017264 Ikon Park 2017-02-03 19:45:00 1 Rd 1 CLEAR_NIGHT Clear 18 Collingwood 1 5 11 0 3 0 1 1 0 46 Away -35 Loss COLL CD_T8097 Magpies 3 37 4 15 19 4 94 52.8 163 926 9 20 16.7 0 3.7 48 28 27 48 115 35 4 23 18 22.2 6 87 8 163 57 69
CD_M20172640102 CD_R201726401 CD_S2017264 Thebarton Oval 2017-02-04 16:35:00 1 Rd 1 RAIN Rain at times 18 Adelaide Crows 7 6 48 2 2 0 0 2 0 12 Home 36 Win ADEL CD_T8098 Crows 0 46 5 9 14 5 108 51.9 185 825 23 21 46.7 3 30.4 66 23 23 56 119 31 8 15 19 65.2 15 55 9 186 52 78
CD_M20172640102 CD_R201726401 CD_S2017264 Thebarton Oval 2017-02-04 16:35:00 1 Rd 1 RAIN Rain at times 18 GWS Giants 1 6 12 1 4 0 0 1 0 48 Away -36 Loss GWS CD_T7889 Giants 0 45 7 13 20 2 92 46.1 165 783 21 23 10.0 1 4.8 52 9 21 52 113 37 6 13 15 47.6 10 62 9 169 55 77
CD_M20172640103 CD_R201726401 CD_S2017264 VU Whitten Oval 2017-02-04 19:40:00 1 Rd 1 RAIN Partly cloudy 18 Western Bulldogs 6 8 44 2 4 0 0 2 0 12 Home 32 Win WB CD_T7387 Bulldogs 5 29 6 24 30 1 96 57.4 190 853 10 17 35.3 6 21.4 80 33 28 46 110 21 7 20 13 60.7 17 59 13 188 48 92
CD_M20172640103 CD_R201726401 CD_S2017264 VU Whitten Oval 2017-02-04 19:40:00 1 Rd 1 RAIN Partly cloudy 18 Fremantle 1 6 12 2 3 0 0 1 0 44 Away -32 Loss FRE CD_T7886 Freo 4 39 3 11 14 2 74 56.4 156 680 17 10 12.5 1 6.7 54 14 15 50 102 35 3 20 22 53.3 8 44 2 151 46 77
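Because this join relies on team names matching exactly across the two data sets, it is worth checking for rows that failed to find a partner. Here is a minimal sketch using dplyr's anti_join() on toy data, where one team name is deliberately misspelt; all names and values are invented.

```r
library(dplyr)

# Toy match-level data (left side of the join)
toy_match <- data.frame(
  Match.Id   = c("M1", "M1"),
  team       = c("Team A", "Team B"),
  points_for = c(46, 11),
  stringsAsFactors = FALSE
)

# Toy detailed data, with "Team B" deliberately misspelt
toy_stats <- data.frame(
  Match.Id = c("M1", "M1"),
  team     = c("Team A", "Team Bee"),
  kicks    = c(112, 115),
  stringsAsFactors = FALSE
)

# left_join keeps every row of toy_match; unmatched rows get NA for kicks
merged <- left_join(toy_match, toy_stats, by = c("Match.Id", "team"))

# anti_join returns the rows of toy_match that found no match in toy_stats
unmatched <- anti_join(toy_match, toy_stats, by = c("Match.Id", "team"))
unmatched
```

An empty result from anti_join() on the real data sets would suggest every team-match row picked up its detailed statistics.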

Side note: Every time I use dplyr to merge data sets, I feel immeasurable gratitude to have moved beyond the copy-paste approach of my Excel-heavy past - that approach is manual, tedious, and most importantly, prone to errors that can be difficult if not impossible to trace. In contrast, dplyr joins are elegant, efficient, and the type of join is clear from the name of the function. My go-to reference on this topic is the dplyr cheatsheet by Jenny Bryan.

Where to next?

Following this codethrough produces a tidy data set that contains detailed information about every match played in the 2017 and 2018 AFLW seasons.

For the next step in this analysis, we’ll begin exploring this data set using data summaries and visualisations to compare match statistics between AFLW teams.

[ puts on Ira Glass voice ]

Stay. Tuned.