Author Topic: Breaking down The Command Zone Stats (Read 8193 times)

Morganator 2.0 · « **on:** April 16, 2019, 02:39:49 am »

Previously on Deckstats...

Quote from: MustaKotka on April 12, 2019, 10:43:22 am

Looking at this from a statistical point of view: being short on spells is better than being short on lands (if you need to choose) because usually there are more spells than lands in your deck. Idk if you guys watch Command Zone but they actually did the math on this so it's not just my gut feeling about this: winning players tend to have most land in play.

Quote from: Morganator 2.0 on April 13, 2019, 04:28:52 am

I would like to see the stats that The Command Zone has.

Quote from: Red_Wyrm on April 14, 2019, 06:46:53 am

They go over sample size etc at the beginning, but they don't exactly give all of the data they get. By this I mean if they ran an ANOVA, t-test, chi-square (absolutely no reason to run this one with this type of data) or similar, which I assume they did as they hired a statistician to analyze the data, they didn't give us the correlation coefficient, or a p-value to describe if the results were statistically significant or insignificant. They just present the final data. For example: They state having white in your deck leads to your chance to win decreasing by 1% (assuming you start with a 25% chance to win in a 4 player game) and playing red increases it by 3%, and blue green and black were around 8% I think.

So here is the link to part 1: https://www.youtube.com/watch?v=Iwdb_kPCwNU

This is the second video: https://www.youtube.com/watch?v=ttGjuNXWxpY

Oh and they cover the price of the decks and their win% too.

Quote from: Morganator 2.0 on April 14, 2019, 08:02:47 pm

Do you know what my favorite thing about Deckstats is? The stats.

Do you know what my most hated thing about The Command Zone is? Their stats.

First off, their data set is super incomplete. There are some instances where the number of lands was just left blank. This doesn't mean that there were no lands (I checked a couple of the videos), it just wasn't recorded. There were two games where mass land destruction was involved (I included those games). I also excluded games where there was no winner, because in all cases we are comparing who won.

But this is still an amazing data set to work with, and I applaud everyone who put this together. It's a big data set, so short of cEDH games, the sample is a good representation of the population.

Question 1: Does having more lands in a game cause you to win?
Null Hypothesis: There is no relation between the number of lands you play and if you won (-0.7 < correlation coefficient < 0.7)
Alternate Hypothesis 1: Decks with more lands in play are more likely to win the game (correlation coefficient > 0.7)
Alternate Hypothesis 2: Decks with fewer lands in play are more likely to win the game (correlation coefficient < -0.7)

There is an expression among statisticians; If you torture the data enough, you can make it talk. Which is why you want to avoid torturing data, lest you show that green jellybeans cause acne.

Believe me, I tortured this data for a long time. I could not get it to say that the players with more lands in play were more likely to win.

First I just ran the correlation of "Mana producing Lands at end of the Game" versus "Player Won?". So this is comparing across all games (n=304) if the player who won had the most lands. Correlation coefficient= 0.204, so there is no correlation between number of lands in play and who won. But then I did some things I wasn't supposed to (I tortured the data). I started by averaging the number of lands within games, to make a proxy for game length. So if a game had players with 15, 19, 14, and 16 lands, the average was 16, so the game was about 16 turns long. This is unlikely to be the actual game length (keep in mind I'm not supposed to be doing this), but it's a proxy. I then ran the correlation again, this time controlling for game length, to see if players ahead of the mana curve did better. Correlation coefficient= 0.275, so again, no correlation. Finally (really pushing it this time) I did within game correlation. So within each game, did the winning player have the most lands. Correlation coefficient= 0.218, once more no correlation!
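As a rough illustration of the correlations described above, here is a minimal Python sketch. The rows in `records` are invented (only the first game reuses the 15, 19, 14, 16 example from the post), and the names `pearson`, `records`, and `center_within_game` are hypothetical. Centering each player's land count on the game average is one way to approximate the "controlling for game length" step; it is an assumption, not necessarily what was done in the original analysis.

```python
from math import sqrt
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Each row is (game_id, lands_at_end, won). Invented data, for illustration only.
records = [
    (1, 15, 0), (1, 19, 1), (1, 14, 0), (1, 16, 0),
    (2, 8, 0), (2, 7, 1), (2, 9, 0), (2, 8, 0),
    (3, 11, 0), (3, 12, 0), (3, 13, 1), (3, 12, 0),
]

def center_within_game(rows):
    """Subtract each game's average land count (the game-length proxy)."""
    by_game = defaultdict(list)
    for game, lands, _ in rows:
        by_game[game].append(lands)
    means = {g: sum(v) / len(v) for g, v in by_game.items()}
    return [lands - means[game] for game, lands, _ in rows]

lands = [r[1] for r in records]
won = [r[2] for r in records]

raw_r = pearson(lands, won)
centered_r = pearson(center_within_game(records), won)
print(f"raw r = {raw_r:.3f}, game-centered r = {centered_r:.3f}")
```

With the real data set, `records` would hold one row per player per game.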

Conclusion: I failed to reject the null hypothesis. I can say with confidence that there is no relation between the number of lands you play and if you win.
Interpretation: I think the problem with this analysis is that it only looked at lands. As I said before, mana sources would give a different result. Also, there are a lot of cEDH decks (namely Flash Hulk and Godo) that can easily win with only two lands, but with that early a win, everyone would have 2 lands.
Question 2: Does having Sol Ring or Mana Crypt within your first 3 turns cause you to win more often?
Null Hypothesis: Sol Ring and/or Mana Crypt in your first 3 turns does not have an effect on you winning (-0.7 < correlation coefficient < 0.7).
Alternative 1: Players with Sol Ring and/or Mana Crypt in their first 3 turns are more likely to win (Correlation coefficient>0.7).
Alternative 2: Players with Sol Ring and/or Mana Crypt in their first 3 turns are less likely to win (Correlation coefficient<-0.7).

So I should get this out of the way; this null hypothesis sucks. I just can't think of a better way to phrase it. We know that Sol Ring improves the power of your deck, that's why everyone uses it. So this is more measuring the strength of having this early game fast mana.

Running the simple correlation of "If there was a Sol Ring/Mana Crypt" versus "Did that player win?" gives a correlation coefficient of -0.019, so no correlation. But because Sol Ring is such a common card, I frequently saw games where 3 players all had Sol Ring/Mana Crypt, but only one person can win. So this time around, I think it's fair to transform the data. Next I compare "Did the player that won have a Mana Crypt/Sol Ring?" and this is something for a Chi^2 test to handle. A Chi^2 test compares what was expected due to chance (the null hypothesis) compared to what actually happened. The math bit is a little complicated for me to explain, but if you're interested, this was the result.

| | | Win | Loss | Total |
|---|---|---|---|---|
| Had a ring | Actual | 25 | 87 | 112 |
| | Expected | 27.91 | 84.09 | |
| No ring | Actual | 278 | 826 | 1104 |
| | Expected | 275.09 | 828.91 | |
| Total | | 303 | 913 | 1216 |

So instead of me describing how I got to the p-value (0.505 by the way, so not significant), we can just look at the numbers. All the numbers we expect to get are very close to what we actually got.
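For anyone who does want the math bit, here is a minimal Python sketch of a Pearson chi-squared test on the four actual counts from the table. It assumes no continuity correction was applied, and the function name `chi_square_2x2` is hypothetical. Under that assumption it should reproduce the expected counts in the table and land on a p-value of roughly 0.505.

```python
from math import erfc, sqrt

# Observed counts from the table above: (wins, losses)
observed = {
    "Had a ring": (25, 87),
    "No ring": (278, 826),
}

def chi_square_2x2(table):
    """Pearson chi-squared statistic, p-value, and expected counts for a 2x2 table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    expected = []
    for r, row in enumerate(rows):
        exp_row = []
        for c, obs in enumerate(row):
            # Expected count under the null: row total * column total / grand total
            exp = row_totals[r] * col_totals[c] / grand_total
            exp_row.append(exp)
            stat += (obs - exp) ** 2 / exp
        expected.append(exp_row)
    # With 1 degree of freedom, the chi-squared tail probability reduces to erfc.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value, expected

stat, p_value, expected = chi_square_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

The `erfc` shortcut only works for a 2x2 table; larger tables (like the color comparisons later on) need the general chi-squared distribution from a statistics package.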

Conclusion: I failed to reject the null hypothesis. There is no relation between you winning and if you played a Sol Ring in the first 3 turns.
Interpretation: I think this question was asked the wrong way. What it actually should have been is "Do decks with Sol Ring win more often than those without?" The issue is that budget would have an effect (most of the time people don't use Sol Ring because they just don't have one).
Question 3: Which color is the strongest?

This is the point where I really get mad at the way this data set is organised. I'll be back in a few hours to finish this post off.

Tonight on Deckstats, I present the analysis to figure out which is the best color in Commander, based on the data from The Command Zone and I also compare which is the best color combination out of all 32.
You know, I just realized… The Command zone paid people to do these stats, and I’m doing it for free.
Step 1: I simplified all of the data so that it made sense.
Decks containing…
White= 537
Blue= 578
Black= 594
Red= 549
Green= 584
Colorless= 7
Number of each deck in each color identity
Colorless= 7
Mono-White= 41
Mono-Blue= 58
Mono-Black= 75
Mono-Red= 80
Mono-Green= 59
Azorius= 38
Dimir= 45
Rakdos= 42
Gruul= 42
Selesnya= 44
Orzhov= 34
Izzet= 29
Golgari= 46
Boros= 27
Simic= 48
Esper= 35
Grixis= 41
Jund= 25
Naya= 31
Bant= 47
Abzan= 30
Jeskai= 29
Sultai= 36
Mardu= 37
Temur= 27
Anti-Green= 14
Anti-White= 19
Anti-Blue= 18
Anti-Black= 15
Anti-Red= 24
5-color= 73
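The two lists above can be cross-checked against each other. Here is a minimal Python sketch that rebuilds the "Decks containing…" totals from the color identity counts. The letter codes (W, U, B, R, G) and the names `identity_counts` and `decks_containing` are just conventions picked for this sketch; the counts themselves are copied from the list.

```python
# Deck counts per color identity, copied from the list above.
# W = white, U = blue, B = black, R = red, G = green; "" = colorless.
identity_counts = {
    "": 7,
    "W": 41, "U": 58, "B": 75, "R": 80, "G": 59,
    # Azorius, Dimir, Rakdos, Gruul, Selesnya
    "WU": 38, "UB": 45, "BR": 42, "RG": 42, "GW": 44,
    # Orzhov, Izzet, Golgari, Boros, Simic
    "WB": 34, "UR": 29, "BG": 46, "RW": 27, "GU": 48,
    # Esper, Grixis, Jund, Naya, Bant
    "WUB": 35, "UBR": 41, "BRG": 25, "RGW": 31, "GWU": 47,
    # Abzan, Jeskai, Sultai, Mardu, Temur
    "WBG": 30, "URW": 29, "BGU": 36, "RWB": 37, "GUR": 27,
    # Anti-Green, Anti-White, Anti-Blue, Anti-Black, Anti-Red
    "WUBR": 14, "UBRG": 19, "WBRG": 18, "WURG": 15, "WUBG": 24,
    "WUBRG": 73,
}

def decks_containing(color, counts):
    """Total decks whose color identity includes the given color."""
    return sum(n for identity, n in counts.items() if color in identity)

total_decks = sum(identity_counts.values())
per_color = {c: decks_containing(c, identity_counts) for c in "WUBRG"}
print(total_decks, per_color)
```

The identity counts add up to 1216 decks, which is 304 games times 4 players, and the per-color totals match the first list.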
Part 1: Which color is the best.
I hate this question. If we’re defining the best by number of wins per game played, then the order goes Blue, Green, Black, Red, White. But we don’t know if this is a significant difference or not; these numbers are really close to each other. For that we use the Chi squared test again. I’m going to skip over most of the math bits this time (but I will show them if someone asks). The p-value of the chi-squared test was 0.162, which is not significant. Therefore, with the data presented, we cannot say which is the best color. And I think this comes down to how I had to do this. Because decks can be more than 1 color, things get wonky. So instead, let's look at color identity.
Part 2: Which color identity is best.
This is literally the same thing as before, just with 32 levels instead of 5. But same deal, the p-value of the chi-squared test was 0.216, so not significant.

In case you're interested, I've attached the graphs that show the number of wins per game. Keep in mind that because these results were non-significant, if we played another 300 games of commander, we would see different results.

I know this doesn't look like much, but this took hours (mostly just rearranging the raw data so it makes sense). I'll leave interpretation for later, because we kinda know from experience that white is the worst color.

Soren841 · « **Reply #1 on:** April 16, 2019, 02:43:03 am »

Notice the order based on the data is exactly what I said. The differences may not be significant, but they reinforce the order that we all know, which I think gives it credibility. Also, most of the better performing color combos had black, blue, green, or some combination of them.. I think red and white are close to each other and the sultai colors are close to each other but red and black are pretty far apart, from experience.

Also..
SANS-Green= 14
SANS-White= 19
SANS-Blue= 18
SANS-Black= 15
SANS-Red= 24

btw r u like some kind of statistician lmao

Morganator 2.0 · « **Reply #2 on:** April 16, 2019, 03:47:48 am »

Quote from: Soren841 on April 16, 2019, 02:43:03 am

btw r u like some kind of statistician lmao

Amateur statistician I guess. My field of study requires me to know about statistics, and I translate my knowledge of statistics to card games.

In other words, I'm not sexually active.

But I'm not done yet. There is still one other thing that's been bothering me.

Quote from: Red_Wyrm on April 14, 2019, 06:46:53 am

For example: They state having white in your deck leads to your chance to win decreasing by 1% (assuming you start with a 25% chance to win in a 4 player game) and playing red increases it by 3%, and blue green and black were around 8% I think.

I'm still not quite sure how they came to this conclusion. Once I get some proper sleep I'll lock myself in a dark room to figure this out.

WWolfe · « **Reply #3 on:** April 16, 2019, 02:10:58 pm »

Love this thread and the series of posts in the other thread that led to it!

Nothing surprising here as far as wide-spread perception of the best/worst colors. I'm curious to see what happens when you break down the CZ's numbers of increased/decreased win probability based on color inclusion.

Morganator 2.0 · « **Reply #4 on:** April 17, 2019, 03:56:17 am »

So I did something new now; I watched the first video.

At least, the first 20 minutes. These videos are extremely boring. Are there actually people who like this?

But anyway, I found out that the person they hired does have experience with Magic, but not with commander. Honestly.. good enough.
Here is the issue though. The two hosts of this video did not present the stats correctly. At all. I just know that this is going to come back to bite me in the future. At some point, I am going to spend a long time explaining to someone at my local game store that they shouldn't make a Planeswalker deck just because of The Command Zone's stats.

But I digress. What really caught my attention was that the numbers were extracted from the data set with a Python script. That worries me. For something like this you really should use proper statistical software like RStudio (my preference), SPSS, Minitab, or even Microsoft Excel (for the simple analyses). Still, this guy has a Harvard education, so he should know what he's doing (I now also understand why he got paid).

But the other thing that caught my attention; none of these graphs have error bars. So while a bar graph shows where the data did land, the error bars show where the data could have landed, which is important for statistical significance. As a general rule, if the error bars cross, the data is non-significant. Here's an example:

Continuing with the trend of people wondering what I do in my spare time, this is a graph I made showing the height of goldenrod flowers, where some have been parasitized (the ones with galls). You can sort of make out that the plants that have been parasitized are slightly shorter, but because the error bars cross, we can't conclude anything. If I did these measurements again, with the same number of plants, I could just as easily see that the parasitized plants were slightly taller.

Now this might seem hypocritical, because the last two graphs I posted to this thread didn't have error bars. That's because it was late for me, and while Excel can put error bars on graphs, it is not good at it... like... at all. You really have to smack it around to make it work. Instead, I ran the Chi squared test for significance.
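To make the error bar idea concrete, here is a minimal Python sketch using the Sol Ring/Mana Crypt win counts from the table earlier in the thread. It assumes the error bars are approximate 95% intervals from the normal approximation for a proportion; other choices (standard error, exact binomial intervals) would give bars of different widths. The function name `win_rate_interval` is hypothetical.

```python
from math import sqrt

def win_rate_interval(wins, games, z=1.96):
    """Win rate with an approximate 95% interval (normal approximation)."""
    rate = wins / games
    half_width = z * sqrt(rate * (1 - rate) / games)
    return rate, rate - half_width, rate + half_width

# Counts from the Sol Ring / Mana Crypt table earlier in the thread.
ring = win_rate_interval(25, 112)
no_ring = win_rate_interval(278, 1104)

for label, (rate, low, high) in [("Had a ring", ring), ("No ring", no_ring)]:
    print(f"{label}: {rate:.1%} (error bar {low:.1%} to {high:.1%})")

# The bars "cross" when the two intervals overlap.
bars_cross = ring[1] <= no_ring[2] and no_ring[1] <= ring[2]
print("Error bars cross:", bars_cross)
```

For these counts the two bars overlap, which lines up with the non-significant Chi squared result.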

Okay, enough ranting (for now). Here are the graphs that The Command Zone showed.

So while I haven't checked, I have a hard time believing that all the people without Sol Ring landed exactly on 25%. That seems like a fudged number.

So this picture isn't actually a statistic, it's just showing how each deck was defined in terms of play style. So you are an enchantment deck if you have 20 or more enchantments.

The more I see of these graphs, the less convinced I am that these are well-tuned decks. No way does combat damage do better than combo.

Now this might be me being nit-picky, but I'm pretty sure the numbers on this graph are wrong. 18%*3+42% makes a total of 96%. You can't just have 4% go missing.

Without p-values or correlation coefficients, these numbers mean nothing. But I'm going to leave these graphs here. I'm going to see if I can re-create them in the future, with error bars.

Soren841 · « **Reply #5 on:** April 17, 2019, 04:00:58 am »

Speaking of stats, the probability calculator on here sucks. For a 1 of in EDH it goes 7/100 for the probability in the opening hand, and adds 1/100 for each subsequent turn. Not only are there only 99 cards in the actual deck, but the denominator decreases by 1 each time. For example, it says a 7% chance of drawing a card in the opening hand. Well I did the math and it's actually 7.3%. It also says that there's a 17% chance for turn 10. Well I did the math again ((1/99)+(1/98)+(1/97)+(1/96)+(1/95)+(1/94)+(1/93)+(1/92)+(1/91)+(1/90)+(1/89)+(1/88)+(1/87)+(1/86)+(1/85)+(1/84)+(1/83)) and it's actually 17.5%.

p.s. I hope Nils reads this and gets a real hypergeometric calculator put into deckstats.

p.p.s. this is obviously nothing compared to Morganator's math but I found this very annoying

p.p.p.s. correct me if this is completely wrong but the numbers are close so I'm gonna assume it is right. I hope I haven't forgotten basic math bc it hasn't been THAT long.
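Since the post asks for a check: for a single copy in a 99-card deck, the hypergeometric chance of having seen it after n cards works out to n/99, so about 7.1% for a 7-card opening hand and about 17.2% after 17 cards. Adding 1/99 + 1/98 + ... overshoots, because each later term only applies if the card hasn't already been drawn. A minimal Python sketch, where `chance_seen` is a hypothetical helper and the 17 cards seen is an assumption matching the 17 terms in the sum above:

```python
from math import comb

def chance_seen(cards_seen, deck_size=99, copies=1):
    """Hypergeometric chance of at least one copy among the cards seen."""
    miss = comb(deck_size - copies, cards_seen) / comb(deck_size, cards_seen)
    return 1 - miss

opening_hand = chance_seen(7)
# Assumption: 17 cards seen, matching the 17 terms in the sum above.
by_seventeen = chance_seen(17)
# The running sum 1/99 + 1/98 + ... + 1/83 from the post, for comparison.
naive_sum = sum(1 / (99 - k) for k in range(17))

print(f"{opening_hand:.2%} {by_seventeen:.2%} {naive_sum:.2%}")
```

Changing `copies` handles cards you run more than one of, and `deck_size=100` reproduces a calculator that counts the commander as drawable.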

Morganator 2.0 · « **Reply #6 on:** April 17, 2019, 04:35:49 am »

Quote from: Soren841 on April 17, 2019, 04:00:58 am

Speaking of stats, the probability calculator on here sucks. For a 1 of in EDH it goes 7/100 for the probability in the opening hand, and adds 1/100 for each subsequent turn. Not only are there only 99 cards in the actual deck, but the denominator decreases by 1 each time. For example, it says a 7% chance of drawing a card in the opening hand. Well I did the math and it's actually 7.3%. It also says that there's a 17% chance for turn 10. Well I did the math again ((1/99)+(1/98)+(1/97)+(1/96)+(1/95)+(1/94)+(1/93)+(1/92)+(1/91)+(1/90)+(1/89)+(1/88)+(1/87)+(1/86)+(1/85)+(1/84)+(1/83)) and it's actually 17.5%.

p.s. I hope Nils reads this and gets a real hypergeometric calculator put into deckstats.

I noticed this a while ago, but I didn't think that there were this many people that cared about statistics. The hypergeometric calculator is fine, but with commander decks, it actually counts the commander(s) as being cards you could possibly draw. I've wondered if this is why so many people put their commanders in the sideboard.

Soren841 · « **Reply #7 on:** April 17, 2019, 04:46:54 am »

It's not even just the out of 100. Even with that, it should add (1/100)+(1/99) etc for the statistics but it just multiplies 1/100 by however many cards have been drawn, which is NOT how it works.

Side note, I do add the probabilities together right.. lol I despise stats

dokepa · « **Reply #8 on:** April 17, 2019, 04:59:50 am »

I think I just watched "A Beautiful Mind" on paper, but 1 quick question. I might have missed this in the previous posts, but is this all based off 1 vs 1 or 3 player, 4 player etc., and how much would having more or fewer people playing when the data was recorded alter all this information?

Red_Wyrm · « **Reply #9 on:** April 17, 2019, 06:06:35 am »

Quote

The more I see of these graphs, the less convinced I am that these are well-tuned decks. No way does combat damage do better than combo.

Yeah I agree with you here Morganator. From my interactions with you guys about the more competitive side of EDH, I have noticed that all these videos of EDH games are casual players with casual level decks. The only reason they might be better than other casual player decks is because they have all 10 original duals, and they have Force of Will, etc.

Quote

p.p.s. this is obviously nothing compared to Morganator's math but I found this very annoying

Haha. You said P P.

Quote

Side note, I do add the probabilities together right

It is funny how numbers work sometimes. Like a coin coming up heads/tails is 50/50, but if you get heads the first flip, it doesn't make the odds of the next flip being tails 100%. It is still 50/50 because the previous flip(s) don't affect the later one(s), but when drawing from a deck, each draw does affect the probability of what the next card will be. For example, we have a two card deck with an ace of spades and ace of hearts. The probability that the first card drawn is an ace of spades is 50/50. If we draw the first card, regardless of what it is, we know what the next card will be (or rather won't be, and since it is a two card deck, we can use deductive reasoning because we is smart.)

Quote

is this all based off 1 vs 1 or 3 player, 4 player

All the data from the Command Zone episode, which is what this thread is about (see, staying on topic Morganator, proud of me?), was from 4 player games if I am not mistaken.

Okay, so I couldn't figure out how to quote the image, without getting the entire post, so I cheated and took a screenshot.

So what is non-combat non-combo? Is that killing with an ability like Purphoros, God of the Forge, with his ETB effect?

Soren841 · « **Reply #10 on:** April 17, 2019, 12:33:52 pm »

Quote from: Red_Wyrm on April 17, 2019, 06:06:35 am

Quote
Side note, I do add the probabilities together right

It is funny how numbers work sometimes. Like a coin coming up heads/tails is 50/50, but if you get heads the first flip, it doesn't make the odds of the next flip being tails 100%. It is still 50/50 because the previous flip(s) don't affect the later one(s), but when drawing from a deck, each draw does affect the probability of what the next card will be. For example, we have a two card deck with an ace of spades and ace of hearts. The probability that the first card drawn is an ace of spades is 50/50. If we draw the first card, regardless of what it is, we know what the next card will be (or rather won't be, and since it is a two card deck, we can use deductive reasoning because we is smart.)

It's the total probability of drawing it BY a certain turn. For the probability of drawing it ON a certain turn you don't add them, but u would add them to see the total probability of having drawn it by that turn.

Morganator 2.0 · « **Reply #11 on:** April 17, 2019, 04:34:57 pm »

Non-combat non-combo is exactly like Purphoros (that was even the example that was used). It could also be something like Gray Merchant of Asphodel.

And the "Other" category is a card like Felidar Sovereign or Helix Pinnacle. It's worrisome that those were the examples used instead of Laboratory Maniac and Approach of the Second Sun.

Soren841 · « **Reply #12 on:** April 17, 2019, 04:39:13 pm »

I mean I have no doubt they're completely casual decks..

Red_Wyrm · « **Reply #13 on:** April 17, 2019, 08:11:43 pm »

Quote from: Morganator 2.0 on April 17, 2019, 04:34:57 pm

Non-combat non-combo is exactly like Purphoros (that was even the example that was used). It could also be something like Gray Merchant of Asphodel.

And the "Other" category is a card like Felidar Sovereign or Helix Pinnacle. It's worrisome that those were the examples used instead of Laboratory Maniac and Approach of the Second Sun.

Isn't Laboratory Maniac a combo win condition? Or I guess by combo you guys mean specifically infinite combos instead of like some Doomsday pile combo.

Morganator 2.0 · « **Reply #14 on:** April 21, 2019, 01:31:07 am »

I can't believe this.

I can't believe how long it took for me to figure this out.

You want to see what overthinking something looks like, scroll up to the top of this page and re-read all of it. I completely overthought all of this. Had I just remembered my first day of stats class, none of this would have happened. This is literally the first thing you learn.

Let me explain...

The first thing you learn about in statistics is the difference between a sample, and the population. The population is everything. If you are looking at the average height of people worldwide, the population is everyone. Every person on Earth. If you are looking at the average height of people in the United States, then the population is everyone living in the United States. And if you are examining the win percentages of commander games, the population is every possible game you could ever have. Every commander, every deck variation that commander could have, every possible combination of decks that could go against each other, and every possible outcome of those games.

As you can imagine, it is impossible (or at least hugely impractical) to measure the population.

So instead, you take a sample. The sample is where you only measure a group of people, or you only look at some commander games, not all of them. But with a good enough sample, you can accurately estimate the properties of the population. Now I thought that with 304 commander games, the Command Zone had a good sample size. And they do; 300 or so games is a very good sample. I've said this already.

Where I messed up, is that I thought that the sample was estimating all commander games in existence. But I watched a few of the games, I saw the decks that were being used. And then it hit me:

These are YouTube videos, they are being done for entertainment. That's why combat damage was the most common win condition; watching someone win with a combo is boring, but with combat damage, it's more entertaining to see someone pull ahead. That's why Helix Pinnacle is a more common win condition than Approach of the Second Sun; it's more entertaining to watch. Those 304 games aren't a sample of all possible commander games, they are only a sample of the possible games made by YouTubers.

I can't believe I didn't think of this sooner.

Summary

Wrapping this up now, I'm not going to spend any more time on it.

All results for The Command Zone's stats were found to be insignificant. Just because the correlation coefficient was -0.019 doesn't mean that Sol Ring decreases your win chance.
None of the results found by The Command Zone will reflect your games. I mean, they might, but it's not likely.

Right, I'm going to lay off the stats for a bit now. I'll wait until something else catches my eye.

So, give it an hour.

