Using rvest to pick my 2018 World Cup fantasy team

Today is the first day of the 2018 FIFA World Cup in Russia. While I do not know a lot about football, I do enjoy cheering for the Red Devils at both the European Championship and the World Cup, and so do most of my friends. That’s why, for the second time in a row, we are organising a small fantasy football league. This year we will be using the Sporza WK manager, provided by our national sports news network. The rules of the game are simple:

  1. Pick 11 players
  2. Stay within a budget of 300 million euro
  3. A maximum of 3 players from the same team are allowed
  4. Transfers are possible in certain phases of the tournament

You earn points when your players score goals or keep clean sheets, and you lose points when they score own goals or get cautioned.
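
The first three rules are easy to express as a check in R. What follows is only a minimal sketch: the helper `is_valid_squad()`, the `team` and `price` columns, and the example squad are all made up for illustration and do not come from the Sporza game or from the scraped data (which contains no prices).

```r
# Hypothetical helper: checks a squad against rules 1 to 3.
# `squad` is assumed to have a `team` column and a `price` column
# (price in millions of euros).
is_valid_squad <- function(squad, budget = 300, squad_size = 11, max_per_team = 3) {
  nrow(squad) == squad_size &&
    sum(squad$price) <= budget &&
    max(table(squad$team)) <= max_per_team
}

# Made-up example: 11 players from 4 teams, 25 million each.
squad <- data.frame(
  player = paste("Player", 1:11),
  team   = rep(c("A", "B", "C", "D"), times = c(3, 3, 3, 2)),
  price  = 25
)

is_valid_squad(squad)
```

Moving a fourth player into team "A", or raising every price to 30 million, would make the same call return `FALSE`.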

Like I said, I do not know a lot about football, so I would not be able to put together a team from scratch. Therefore I will be using R, the tidyverse and rvest to help me with this serious task!

1. Data collection

The first step of our journey is, of course, data collection. Searching for a decent dataset was probably one of the hardest tasks. I wanted to use up-to-date player and team ratings, but could not find a decent source. Since I didn’t have a lot of time, I settled for this ‘2018 FIFA World Cup squads’ Wikipedia page.

The page has a section for every team that contains the following information:

  • The group they are in
  • The coach of the team
  • A table with player stats such as
    • Position
    • Date of birth
    • Caps (= matches played)
    • Goals
    • Club

[Figure: screenshot of the Wikipedia squads page]

I’ve never done this before, but this seems like a pretty easy web scraping job. For this I will try out rvest, using this tutorial as a guide. In the end I parsed the whole page using the following script:

https://gist.github.com/swuyts/99f34b6041565672b022e0d8b686afed
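
For readers who just want the gist of the approach, here is a minimal sketch of how such a page can be parsed with rvest. It is not the script linked above. To keep it runnable offline it parses a tiny made-up HTML snippet; the `h3` headings, the `table.wikitable` selector, and the column names are assumptions about how the real page is laid out.

```r
library(rvest)
library(dplyr)

# Made-up stand-in for two team sections of the squads page.
html <- '
<html><body>
  <h3>Team X</h3>
  <table class="wikitable">
    <tr><th>Pos.</th><th>Player</th><th>Caps</th><th>Goals</th></tr>
    <tr><td>GK</td><td>Player 1</td><td>10</td><td>0</td></tr>
    <tr><td>FW</td><td>Player 2</td><td>20</td><td>5</td></tr>
  </table>
  <h3>Team Y</h3>
  <table class="wikitable">
    <tr><th>Pos.</th><th>Player</th><th>Caps</th><th>Goals</th></tr>
    <tr><td>MF</td><td>Player 3</td><td>30</td><td>2</td></tr>
  </table>
</body></html>'

# For the live page, the page URL would be passed to read_html() instead.
page <- read_html(html)

# One heading and one table per team (assumed to line up one-to-one).
teams  <- page %>% html_nodes("h3") %>% html_text()
tables <- page %>% html_nodes("table.wikitable") %>% html_table()

# Stack the per-team tables and record which team each row came from.
names(tables) <- teams
players <- bind_rows(tables, .id = "team")
players
```

On a real page the headings and tables will rarely line up this neatly, so expect to filter out extra headings and clean up the columns before binding.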

2. Exploring the players and teams

Before I hire anyone, I’d like to get to know them a little bit more. So let’s do some exploratory analysis. For example, who are the youngest players in the cup and how many goals have they scored?

https://gist.github.com/swuyts/492f52d2890ae1d7101ff859f4bac4a6
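
As a rough idea of what this kind of query looks like in dplyr, here is a minimal sketch on made-up rows. The column names (`player`, `date_of_birth`, `goals`) are assumptions, not necessarily those used in the script above.

```r
library(dplyr)

# Made-up players, for illustration only.
players <- tibble(
  player        = c("Player 1", "Player 2", "Player 3", "Player 4"),
  date_of_birth = as.Date(c("1998-12-20", "1985-03-01", "1999-01-15", "1979-06-30")),
  goals         = c(4, 10, 0, 48)
)

# First day of the tournament, used to compute each player's age in whole years.
tournament_start <- as.Date("2018-06-14")

youngest <- players %>%
  mutate(age = floor(as.numeric(tournament_start - date_of_birth) / 365.25)) %>%
  arrange(desc(date_of_birth)) %>%   # latest birth date first
  head(2)

youngest
```

Sorting with `arrange(date_of_birth)` instead gives the oldest players.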

[Figure: youngest players]

All right, 19-year-old Kylian Mbappé from France seems to be doing a great job! Who knows, he might end up in my team!

Now let’s do the same for the oldest players:

[Figure: oldest players]

Cool! There seem to be 4 goalkeepers in this list. None of them has scored a goal, which is perfectly normal for a player in that position, I guess. In addition, Tim Cahill is one of the oldest players in the World Cup and, with almost 50 goals, has scored the most goals of these 10 players.

Next, let’s have a look at the teams. I wanted to figure out which team was the most experienced. For this, I will just look at the number of caps that each player has:

[Figure: caps per team]

It’s a little bit overplotted, but wow, this is good news! Belgium tops the list and thus has the highest median number of caps. While one could probably interpret this graph in many different ways, to me this means that we’re the most experienced team and thus have a high chance of doing a pretty good job. Or at least, that’s what I choose to believe.
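
The ranking behind such a graph boils down to a grouped median. A minimal sketch, assuming a `team` and a `caps` column and using made-up numbers:

```r
library(dplyr)

# Made-up caps for two teams.
players <- tibble(
  team = c("Team X", "Team X", "Team X", "Team Y", "Team Y", "Team Y"),
  caps = c(10, 50, 90, 5, 20, 100)
)

caps_per_team <- players %>%
  group_by(team) %>%
  summarise(median_caps = median(caps)) %>%
  arrange(desc(median_caps))   # most experienced team first

caps_per_team
```

Note that Team Y has the single most-capped player yet still ranks below Team X, which is exactly why the median is a reasonable summary here.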

3. Team selection

For my team I’ve decided to go for a 3-4-3 formation because well eeeeuhh…. Just for no reason actually.

I will also start by recruiting midfield players, because back in the days when I played football, I always played that position.

3.1 Midfield

In total there are 234 players to choose from. Let’s go for the obvious ones, the ones that scored the most:

https://gist.github.com/swuyts/0e578e0f7eef28513bb320294d702149
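
The same "top scorers for one position" query comes back for every line of the team, so it can be wrapped in a small function. This is a sketch only: `top_scorers()` is a hypothetical helper, and the `position` codes and column names are assumptions.

```r
library(dplyr)

# Hypothetical helper: the n highest-scoring players for a given position code.
top_scorers <- function(players, pos, n = 10) {
  players %>%
    filter(position == pos) %>%
    arrange(desc(goals)) %>%
    head(n)
}

# Made-up players.
players <- tibble(
  player   = paste("Player", 1:5),
  position = c("MF", "FW", "MF", "DF", "MF"),
  goals    = c(3, 30, 12, 1, 7)
)

midfielders <- top_scorers(players, "MF", n = 2)
midfielders
```

Calling it with `"DF"` or `"FW"` gives the equivalent list for defenders and forwards.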

[Figure: top-scoring midfielders]

The first player I wanted to recruit was the top scorer, Thomas Müller, but apparently he’s listed as a forward in the Sporza game. So unfortunately we can’t use him in this position.

  • So let’s recruit the second in line: Keisuke Honda! Welcome to the team!
  • The third in line is also from Japan, which does not seem like a good choice in this phase of the game, so let’s go for the captain of the Mexican squad Andrés Guardado!
  • Up next, from Costa Rica, also the captain: Bryan Ruiz!
  • Our final spot does not go to Özil, as Germany is in the same group as Mexico and I do not want too much competition within the team, so give a round of applause for Denmark’s own Christian Eriksen!

All players are in the top 8 teams regarding their median number of caps (see above), so that’s a good sign!

3.2 Defence

Up next: 3 defenders! Again, I will go for the defenders that scored the highest number of goals. This is actually almost the only option I’ve got with the data I’ve collected.

[Figure: top-scoring defenders]

  • The top goalscorer is again a Mexican: Rafael Márquez. Of course, just as a good scientist always BLASTs a sequence first, a good coach always googles his players before picking them for the team. Doing this, I found that Márquez had been banned from playing for his club Atlas for the last 2 months. So maybe that’s not such a strategic choice. Sorry Márquez!
  • Instead, I will go for Sergio Ramos!
  • Unlucky for Branislav Ivanovic, but I will skip a Serbian player, as they are at the bottom of our “Number caps per team”-graph.
  • Although Bruno Alves (Portugal) is in the same group as Sergio Ramos (Spain; group B), I will still go for him, as I think that both Spain and Portugal have a chance of getting through the first round!
  • My final player will not be from Panama as they are in the same group as Belgium and I do not think Panama will survive.

So excluding all the players above and the teams from which I’ve already picked a player, I will now focus only on the players with 8 goals and see whether there’s a good fit there:
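
This kind of narrowing down is a plain `filter()` with a few exclusion lists. A minimal sketch with made-up defenders; the vectors `rejected_players` and `taken_teams` are hypothetical names for the choices made by hand above.

```r
library(dplyr)

# Made-up defenders.
defenders <- tibble(
  player = paste("Defender", 1:5),
  team   = c("Team X", "Team Y", "Team Z", "Team W", "Team V"),
  goals  = c(8, 8, 8, 6, 8)
)

rejected_players <- c("Defender 5")  # players ruled out by hand
taken_teams      <- c("Team X")      # teams already represented in the squad

candidates <- defenders %>%
  filter(
    goals == 8,
    !player %in% rejected_players,
    !team %in% taken_teams
  )

candidates
```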

[Figure: remaining defenders with 8 goals]

Well, I’m not going to lie here as this is a little bit more arbitrary, but as a Belgian, I can’t ignore Jan Vertonghen on this list. So welcome to the team, Jan Vertonghen!

3.3 Forwards

That leaves space for three forwards!

[Figure: top-scoring forwards]

Oh, this is definitely a super difficult list to choose from. Let’s make it a little bit easier by looking at what groups we already chose our players in:

https://gist.github.com/swuyts/91e88ca6ff2a51fb7b3119677df40c94

[Figure: selected players per group]

Players from groups A and D are missing!
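
The check itself is a count per group plus a set difference. The sketch below is not the script linked above: it hard-codes the seven players picked so far, with group letters filled in from the tournament draw, and the column names are assumptions.

```r
library(dplyr)

all_groups <- LETTERS[1:8]  # the tournament has groups A to H

# The midfielders and defenders picked so far, with their group.
selection <- tibble(
  player = c("Keisuke Honda", "Andrés Guardado", "Bryan Ruiz", "Christian Eriksen",
             "Sergio Ramos", "Bruno Alves", "Jan Vertonghen"),
  group  = c("H", "F", "E", "C", "B", "B", "G")
)

# How many selected players each group has.
per_group <- selection %>% count(group)
per_group

# Groups without any selected player.
missing_groups <- setdiff(all_groups, selection$group)
missing_groups
```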

  • Argentina is in group D, and Lionel Messi plays for Argentina. So welcome to him!
  • Uruguay is in group A, and Luis Suárez plays for Uruguay. I’ll happily make space for him!
  • Now that every group is represented, we’ll have to pick a player that’s already in one of these groups. Starting back at the top, we can’t go for Ronaldo as the count in group B would then be 3. So the next in line is Neymar from Brazil!

3.4 Goalkeeper

Only one final slot to fill: our goalkeeper!

[Figure: goalkeepers]

It makes a lot of sense that of all goalkeepers, not a single one has scored a goal. This means that choosing a keeper will be practically impossible with the data we have.

Therefore I’ll just go for our national hero: Thibaut Courtois!

4. Closing remark

So finally, I proudly present my official team lineup!

[Figure: final team lineup]

I’m pretty sure I will do a terrible job in our competition, but at least I’ve learned how to scrape a Wikipedia page using R.

All code can be found on my GitHub page as an R Markdown file.
