Project author: da-cali

Project description:
Analyzing soccer data in R.
Language: R
Repository: git://github.com/da-cali/exploring_the_beautiful_game.git
Created: 2020-02-12T09:41:50Z
Project page: https://github.com/da-cali/exploring_the_beautiful_game

License: GNU General Public License v3.0



Exploring The Beautiful Game

I came across this soccer dataset and decided to explore it using R.

We begin by loading the data.

    library(DBI)
    library(RSQLite)
    library(dplyr)

    ##
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ##
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ##
    ##     intersect, setdiff, setequal, union

    # Connect to the SQLite database.
    con <- dbConnect(SQLite(), "database.sqlite")
    # Names of the tables (stored as `tableNames` so it is not confused with base::names()).
    tableNames <- as.data.frame(dbListTables(con))
    # Read the two tables used below.
    player <- dbReadTable(con, "Player")
    playerAttr <- dbReadTable(con, "Player_Attributes")
    # Disconnect from the database.
    dbDisconnect(con)
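In the block above, dbDisconnect() is only reached if every read succeeds. A minimal sketch of one way to guarantee cleanup, assuming the same DBI and RSQLite packages: wrap the reads in a function and register the disconnect with on.exit(). The helper readTables() and the throwaway table are hypothetical, for illustration only.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: read the named tables from an SQLite file and
# close the connection on exit, even if one of the reads fails.
readTables <- function(path, tables) {
  con <- dbConnect(SQLite(), path)
  on.exit(dbDisconnect(con), add = TRUE)
  setNames(lapply(tables, function(t) dbReadTable(con, t)), tables)
}

# Usage with a throwaway database file holding one made-up table.
path <- tempfile(fileext = ".sqlite")
setup <- dbConnect(SQLite(), path)
dbWriteTable(setup, "Player",
             data.frame(player_api_id = 1:2, player_name = c("Player A", "Player B"),
                        stringsAsFactors = FALSE))
dbDisconnect(setup)

tables <- readTables(path, "Player")
str(tables$Player)
```

With the real file, a call such as readTables("database.sqlite", c("Player", "Player_Attributes")) would return both tables in one named list.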

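Let’s arrange the players by rating and find out who were the 50 players
with the highest FIFA ratings in the period covered by the data. Since
Player_Attributes holds several rating records per player, we keep only
each player’s highest-rated record.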

    # Returns the name of the player described by row p.
    getName <- function(p) filter(player, player_api_id == p$player_api_id)$player_name
    # Sort records by FIFA rating, drop rows with any NA, keep each player's
    # highest-rated record, and take the top 50.
    top50 <- playerAttr %>%
      arrange(desc(overall_rating)) %>%
      na.omit() %>%
      distinct(player_api_id, .keep_all = TRUE) %>%
      slice(1:50)
    # Print the names of the 50 highest-rated players in descending order.
    sapply(seq_len(nrow(top50)), function(i) paste(i, "-", getName(top50[i, ]), "\n")) %>% message()

    ## 1 - Lionel Messi
    ## 2 - Cristiano Ronaldo
    ## 3 - Gianluigi Buffon
    ## 4 - Wayne Rooney
    ## 5 - Gregory Coupet
    ## 6 - Xavi Hernandez
    ## 7 - Alessandro Nesta
    ## 8 - Andres Iniesta
    ## 9 - Iker Casillas
    ## 10 - John Terry
    ## 11 - Ronaldinho
    ## 12 - Thierry Henry
    ## 13 - Arjen Robben
    ## 14 - David Trezeguet
    ## 15 - Francesco Totti
    ## 16 - Franck Ribery
    ## 17 - Frank Lampard
    ## 18 - Kaka
    ## 19 - Luis Suarez
    ## 20 - Manuel Neuer
    ## 21 - Neymar
    ## 22 - Radamel Falcao
    ## 23 - Ze Roberto
    ## 24 - Zlatan Ibrahimovic
    ## 25 - Adriano
    ## 26 - Carles Puyol
    ## 27 - Cesc Fabregas
    ## 28 - Cris
    ## 29 - David Villa
    ## 30 - Eden Hazard
    ## 31 - Julio Cesar
    ## 32 - Luca Toni
    ## 33 - Lucio
    ## 34 - Nemanja Vidic
    ## 35 - Petr Cech
    ## 36 - Robin van Persie
    ## 37 - Samuel Eto'o
    ## 38 - Steven Gerrard
    ## 39 - Andrea Pirlo
    ## 40 - Bastian Schweinsteiger
    ## 41 - Carlos Tevez
    ## 42 - David Silva
    ## 43 - Dida
    ## 44 - Didier Drogba
    ## 45 - Diego
    ## 46 - Fernando Torres
    ## 47 - Gerard Pique
    ## 48 - Hernan Crespo
    ## 49 - Jamie Carragher
    ## 50 - Joaquin

As expected, Messi and Ronaldo occupy positions 1 and 2.
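getName() scans the whole Player table once per player. The base-R sketch below, using made-up toy data and a hypothetical topN() helper, shows the same two steps another way: after sorting, duplicated() keeps each player's highest-rated record, and a single match() call looks up all the names at once. Unlike na.omit() on the full table, it only drops rows with a missing overall_rating, so on the real data it could keep records that the code above discards.

```r
# Hypothetical toy data: several rating snapshots per player.
attrs <- data.frame(
  player_api_id  = c(1, 1, 2, 2, 3),
  overall_rating = c(90, 94, 88, NA, 91)
)
players <- data.frame(
  player_api_id = c(1, 2, 3),
  player_name   = c("Player A", "Player B", "Player C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n highest-rated players, one record each.
topN <- function(attrs, n) {
  rated <- attrs[!is.na(attrs$overall_rating), ]                    # drop unrated rows
  rated <- rated[order(rated$overall_rating, decreasing = TRUE), ]
  best  <- rated[!duplicated(rated$player_api_id), ]                # first row = best rating
  head(best, n)
}

top2 <- topN(attrs, 2)
# One vectorized lookup instead of one filter() call per player.
topNames <- players$player_name[match(top2$player_api_id, players$player_api_id)]
print(topNames)  # "Player A" "Player C"
```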

Which of these players are over 190 centimeters tall?

    # Show the top-50 players who are over 190 cm tall.
    player %>%
      filter(player_api_id %in% top50$player_api_id, height > 190) %>%
      select(player_api_id, player_name, height, weight)

    ##   player_api_id        player_name height weight
    ## 1         30728    David Trezeguet 190.50    176
    ## 2         30720               Dida 195.58    187
    ## 3         37482       Gerard Pique 193.04    187
    ## 4         30717   Gianluigi Buffon 193.04    201
    ## 5         30709          Luca Toni 193.04    194
    ## 6         27299       Manuel Neuer 193.04    203
    ## 7         30865      Nemanja Vidic 190.50    194
    ## 8         30859          Petr Cech 195.58    198
    ## 9         35724 Zlatan Ibrahimovic 195.58    209
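Every height printed above is a whole multiple of 2.54 cm, which suggests (an inference from the output, not something the dataset documents) that heights were stored in whole inches and converted to centimeters. The sketch below checks this for the nine heights copied from the table. If it holds, the `height > 190` filter in practice selects players listed at 75 inches (6 ft 3 in) or more, since 74 inches is 187.96 cm.

```r
# Heights (cm) of the nine players listed above, copied from the output.
heights <- c(190.50, 195.58, 193.04, 193.04, 193.04, 193.04, 190.50, 195.58, 195.58)

inches <- heights / 2.54
# TRUE if every height is a whole number of inches, up to floating-point error.
wholeInches <- all(abs(inches - round(inches)) < 1e-9)
print(wholeInches)           # TRUE
print(table(round(inches)))  # 75 in: 2 players, 76 in: 4, 77 in: 3
```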

Which of these players weigh under 160 pounds?

    # Show the top-50 players who weigh under 160 pounds.
    player %>%
      filter(player_api_id %in% top50$player_api_id, weight < 160) %>%
      select(player_api_id, player_name, height, weight)

    ##    player_api_id      player_name height weight
    ## 1          30731     Andrea Pirlo 177.80    150
    ## 2          30955   Andres Iniesta 170.18    150
    ## 3          38817     Carlos Tevez 172.72    157
    ## 4          37459      David Silva 170.18    148
    ## 5          30909      David Villa 175.26    152
    ## 6          30924    Franck Ribery 170.18    159
    ## 7          30981     Lionel Messi 170.18    159
    ## 8          19533           Neymar 175.26    150
    ## 9          22543   Radamel Falcao 177.80    159
    ## 10         30843 Robin van Persie 187.96    157
    ## 11         39854   Xavi Hernandez 170.18    148
    ## 12         38843       Ze Roberto 172.72    159

Surprisingly, Robin van Persie is among these players, even though at
187.96 cm he is about 10 cm taller than anyone else in this group.
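What makes van Persie stand out among the light players is his height. As a quick check, the sketch below copies the twelve names and heights printed above into a small data frame (no other data is assumed) and measures how much taller the tallest player is than the next one.

```r
# Names and heights (cm) of the twelve players under 160 lb, copied from the output.
light <- data.frame(
  player_name = c("Andrea Pirlo", "Andres Iniesta", "Carlos Tevez", "David Silva",
                  "David Villa", "Franck Ribery", "Lionel Messi", "Neymar",
                  "Radamel Falcao", "Robin van Persie", "Xavi Hernandez", "Ze Roberto"),
  height = c(177.80, 170.18, 172.72, 170.18, 175.26, 170.18,
             170.18, 175.26, 177.80, 187.96, 170.18, 172.72),
  stringsAsFactors = FALSE
)

tallest <- light$player_name[which.max(light$height)]
sorted  <- sort(light$height, decreasing = TRUE)
gap     <- sorted[1] - sorted[2]  # tallest minus second tallest, in cm
print(tallest)                    # "Robin van Persie"
print(gap)                        # 10.16
```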

What are the average height and weight of the 50 highest-rated
players?

    # Show the mean height (cm) and mean weight (lb) of the top 50.
    summarise(filter(player, player_api_id %in% top50$player_api_id), mean(height), mean(weight))

    ##   mean(height) mean(weight)
    ## 1     182.4228       175.02
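The Player table mixes units: heights are in centimeters and weights in pounds. A small sketch converting the two means printed above, using the exact definitions 1 lb = 0.45359237 kg and 1 in = 2.54 cm; the helper names lbToKg() and cmToIn() are hypothetical.

```r
# Exact definitions: 1 lb = 0.45359237 kg, 1 in = 2.54 cm.
lbToKg <- function(lb) lb * 0.45359237
cmToIn <- function(cm) cm / 2.54

# Means copied from the summarise() output above.
meanHeightCm <- 182.4228
meanWeightLb <- 175.02

print(round(lbToKg(meanWeightLb), 1))  # 79.4 (kg)
print(round(cmToIn(meanHeightCm), 2))  # 71.82 (in)
```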

What is the ratio of left-footed to right-footed players?

    # Count left- and right-footed players.
    count(top50, preferred_foot)

    ## # A tibble: 2 x 2
    ##   preferred_foot     n
    ##   <chr>          <int>
    ## 1 left              10
    ## 2 right             40
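The counts translate into a 1:4 ratio, that is, 20% left-footed and 80% right-footed players. A base-R sketch using the printed counts; inside the dplyr pipeline, the same shares could be added by following count() with mutate(share = n / sum(n)).

```r
# Counts copied from the count() output above.
foot <- c(left = 10, right = 40)

shares <- prop.table(foot)                  # left 0.2, right 0.8
ratio  <- foot[["right"]] / foot[["left"]]  # right-footed players per left-footed one
print(shares)
print(ratio)                                # 4
```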

Everything looks reasonable so far. To further evaluate the quality of
the data, let’s test the idea that players with similar attributes play
in similar positions. We group these players into four clusters with
k-means, intended to represent the four main positions in the game:
goalkeeper, defender, midfielder, and striker.

    # K-means on columns 10 to 42 of top50 (the numeric player attributes).
    # kmeans() uses random starts; call set.seed() first for reproducible cluster labels.
    clusters <- kmeans(top50[, 10:42], centers = 4, iter.max = 20, nstart = 100)
    # Returns the names of the players in the nth cluster.
    clusterNames <- function(n) {
      clust <- split(top50, clusters$cluster)[[n]]
      sapply(seq_len(nrow(clust)), function(i) getName(clust[i, ]))
    }
    # List of player names per cluster (a list, since cluster sizes differ).
    playersPerCluster <- lapply(seq_along(clusters$size), clusterNames)
    # Show the names of the players in each cluster.
    playersPerCluster

    ## [[1]]
    ## [1] "Gianluigi Buffon" "Gregory Coupet" "Iker Casillas" "Manuel Neuer"
    ## [5] "Julio Cesar" "Petr Cech" "Dida"
    ##
    ## [[2]]
    ## [1] "Wayne Rooney" "Xavi Hernandez" "Andres Iniesta"
    ## [4] "Ronaldinho" "Thierry Henry" "Francesco Totti"
    ## [7] "Frank Lampard" "Kaka" "Ze Roberto"
    ## [10] "Cesc Fabregas" "David Villa" "Samuel Eto'o"
    ## [13] "Steven Gerrard" "Andrea Pirlo" "Bastian Schweinsteiger"
    ## [16] "Diego" "Joaquin"
    ##
    ## [[3]]
    ## [1] "Alessandro Nesta" "John Terry" "Carles Puyol" "Cris"
    ## [5] "Lucio" "Nemanja Vidic" "Gerard Pique" "Jamie Carragher"
    ##
    ## [[4]]
    ## [1] "Lionel Messi" "Cristiano Ronaldo" "Arjen Robben"
    ## [4] "David Trezeguet" "Franck Ribery" "Luis Suarez"
    ## [7] "Neymar" "Radamel Falcao" "Zlatan Ibrahimovic"
    ## [10] "Adriano" "Eden Hazard" "Luca Toni"
    ## [13] "Robin van Persie" "Carlos Tevez" "David Silva"
    ## [16] "Didier Drogba" "Fernando Torres" "Hernan Crespo"

We can see that two of these clusters consist exclusively of goalkeepers
and of defenders, respectively, while the other two also seem to
distinguish between midfielders and strikers.
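Above, the match between clusters and positions is judged by reading the names. If position labels were available (none appear among the columns used here), a cross-tabulation of cluster against position would make the check explicit. The sketch below uses made-up ratings and labels (`toy`, `gk_skill` and `tackling` are hypothetical), fixes the random seed because kmeans() picks random starting centers, and treats a cluster as pure when its row of the table has a single non-zero cell.

```r
# Hypothetical toy data: two made-up ratings for players with known positions.
set.seed(42)  # kmeans() starts from random centers, so fix the seed
toy <- data.frame(
  position = rep(c("goalkeeper", "defender", "striker"), each = 5),
  gk_skill = c(rnorm(5, 85, 2), rnorm(5, 15, 2), rnorm(5, 15, 2)),
  tackling = c(rnorm(5, 20, 2), rnorm(5, 85, 2), rnorm(5, 30, 2)),
  stringsAsFactors = FALSE
)

fit <- kmeans(toy[, c("gk_skill", "tackling")], centers = 3, nstart = 25)

# Rows are clusters, columns are the known positions.
tab <- table(cluster = fit$cluster, position = toy$position)
print(tab)

# A cluster is "pure" if its row has exactly one non-zero cell.
pure <- rowSums(tab > 0) == 1
print(pure)
```

On real data, mixed rows in such a table would point to the players whose attribute profiles do not match their listed position.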