Do you know what my favorite thing about Deckstats is? The stats.
Do you know what my most hated thing about The Command Zone is? Their stats.
First off, their data set is super incomplete. There are some instances where the number of lands was just left blank. This doesn't mean that there were no lands (I checked a couple of the videos), it just wasn't recorded. There were two games where mass land destruction was involved (I included those games). I also excluded games where there was no winner, because in all cases we are comparing who won.
But this is still an amazing data set to work with, and I applaud everyone who put this together. It's a big data set, so short of cEDH games, the sample is a good representation of the population.
Question 1: Does having more lands in a game cause you to win?
Null Hypothesis: There is no relation between the number of lands you play and whether you won (-0.7 < Correlation coefficient < 0.7)
Alternate Hypothesis 1: Decks with more lands in play are more likely to win the game (Correlation coefficient > 0.7)
Alternate Hypothesis 2: Decks with fewer lands in play are more likely to win the game (Correlation coefficient < -0.7)
There is an expression among statisticians: if you torture the data enough, you can make it talk. That is why you want to avoid torturing data, lest you show that green jellybeans cause acne.
Believe me, I tortured this data for a long time. I could not get it to say that the players with more lands in play were more likely to win.
First I just ran the correlation of "Mana producing Lands at end of the Game" versus "Player Won?". This checks, across all games (n=304), whether the players with more lands were the ones who won.
Correlation coefficient = 0.204, well below the 0.7 threshold, so there is no correlation between number of lands in play and who won.
But then I did some things I wasn't supposed to (I tortured the data). I started by averaging the number of lands within games, to make a proxy for game length. So if a game had players with 15, 19, 14, and 16 lands, the average was 16, so the game was about 16 turns long. This is unlikely to be the actual game length (keep in mind I'm not supposed to be doing this), but it's a proxy. I then ran the correlation again, this time controlling for game length, to see if players ahead of the mana curve did better.
Correlation coefficient = 0.275, so again, no correlation.
Finally (really pushing it this time) I did a within-game correlation: within each game, did the winning player have the most lands?
Correlation coefficient = 0.218, once more no correlation!
Conclusion: I failed to reject the null hypothesis. I can say with confidence that there is no relation between the number of lands you play and whether you win.
Interpretation: I think the problem with this analysis is that it only looked at lands. As I said before, mana sources would give a different result. Also, there are a lot of cEDH decks (namely Flash Hulk and Godo) that can easily win with only two lands, but with that early a win, everyone would have 2 lands.
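For anyone who wants to try the same kind of check, here is a minimal Python sketch of the first two steps in Question 1: the raw lands-versus-win correlation, and the version that uses average lands per game as a game-length proxy. The games in it are made up for illustration (they are not the Command Zone data), the `pearson` helper and the variable names are my own, and subtracting the game average is only one possible reading of "controlling for game length", so treat it as an assumption and not the exact method used above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up example games (NOT the real data set). Each game is a list of
# (lands at end of game, won? 1/0) pairs, one pair per player.
games = [
    [(15, 0), (19, 1), (14, 0), (16, 0)],
    [(9, 0), (8, 1), (10, 0), (9, 0)],
    [(12, 1), (11, 0), (12, 0), (10, 0)],
]

# Step 1: raw correlation across every player in every game.
lands = [n_lands for game in games for n_lands, _ in game]
won = [win for game in games for _, win in game]
raw_r = pearson(lands, won)

# Step 2: game-length proxy. Average the lands within each game, then
# measure each player's lands relative to that average (assumed approach).
relative_lands = []
for game in games:
    game_avg = sum(n_lands for n_lands, _ in game) / len(game)
    relative_lands.extend(n_lands - game_avg for n_lands, _ in game)
adjusted_r = pearson(relative_lands, won)

print(f"raw correlation:      {raw_r:.3f}")
print(f"adjusted correlation: {adjusted_r:.3f}")
```

With a 0/1 win column, this Pearson coefficient is the same thing as a point-biserial correlation, which is why a plain correlation function is enough here.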
Question 2: Does having Sol Ring or Mana Crypt within your first 3 turns cause you to win more often?
Null Hypothesis: Sol Ring and/or Mana Crypt in your first 3 turns does not have an effect on you winning (-0.7 < Correlation coefficient < 0.7).
Alternative 1: Players with Sol Ring and/or Mana Crypt in their first 3 turns are more likely to win (Correlation coefficient > 0.7).
Alternative 2: Players with Sol Ring and/or Mana Crypt in their first 3 turns are less likely to win (Correlation coefficient < -0.7).
So I should get this out of the way: this null hypothesis sucks. I just can't think of a better way to phrase it. We know that Sol Ring improves the power of your deck, that's why everyone uses it. So this is more measuring the strength of having this early game fast mana.
Running the simple correlation of "If there was a Sol Ring/Mana Crypt" versus "Did that player win?" gives a correlation coefficient of -0.019, so no correlation. But because Sol Ring is such a common card, I frequently saw games where 3 players all had Sol Ring/Mana Crypt, but only one person can win. So this time around, I think it's fair to transform the data. Next I compared "Did the player that won have a Mana Crypt/Sol Ring?" and this is something for a Chi^2 test to handle. A Chi^2 test compares what was expected due to chance (the null hypothesis) with what actually happened. The math bit is a little complicated for me to explain, but if you're interested, this was the result.
| | | Win | Loss | Total |
|---|---|---|---|---|
| Had a ring | Actual | 25 | 87 | 112 |
| | Expected | 27.91 | 84.09 | |
| No ring | Actual | 278 | 826 | 1104 |
| | Expected | 275.09 | 828.91 | |
| Total | | 303 | 913 | 1216 |
So instead of me describing how I got to the p-value (0.505 by the way, so not significant), we can just look at the numbers. All the numbers we expected to get are very close to what we actually got.
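If you do want to see the math bit, here is a rough Python sketch of a Chi^2 test on the table above. It takes the four "Actual" counts, works out each expected count as row total times column total divided by the grand total, and sums (actual - expected)^2 / expected. The variable names are mine, and I'm assuming a plain Pearson Chi^2 test with 1 degree of freedom and no continuity correction; run that way, the p-value should land close to the 0.505 quoted above.

```python
from math import erfc, sqrt

# Actual counts from the table: rows are "had a ring" / "no ring",
# columns are "win" / "loss".
observed = [
    [25, 87],
    [278, 826],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under the null hypothesis (ring and winning unrelated):
# row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi^2 statistic: sum of (actual - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2x2 table has 1 degree of freedom. For 1 degree of freedom the
# upper-tail probability of the Chi^2 distribution is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

The `erfc` shortcut only works for a 2x2 table; for a bigger table you would need a proper Chi^2 distribution function, for example from a statistics library.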
Conclusion: I failed to reject the null hypothesis. There is no relation between you winning and whether you played a Sol Ring in the first 3 turns.
Interpretation: I think this question was asked the wrong way. What it actually should have been is "Do decks with Sol Ring win more often than those without?" The issue is that budget would have an effect (most of the time people don't use Sol Ring because they just don't have one).
Question 3: Which color is the strongest?
This is the point where I really get mad at the way this data set is organised. I'll be back in a few hours to finish this post off.