deckstats.net

Author Topic: Power Level of Commander decks: Empirical Data  (Read 1182 times)

Morganator 2.0

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2633
  • Karma: 2503
  • Decks
Power Level of Commander decks: Empirical Data
« on: November 12, 2019, 01:41:12 am »
"What is the power level of your deck?" It's a common phrase you hear. A question asked by both casual and competitive players alike. No rational person wants to walk into a commander game with a deck that isn't on the same level as everyone else. The common metric is to give the deck a rating from 1 to 10 on how powerful it is. The problem; everyone's opinion is different. I've seen people who think their deck is a 10, because they've never seen a Flash Hulk or storm deck before. I've seen people give a 7 to decks that are much more powerful than they think. How well does this rating system actually work? And is there a way to unify the rating system?



You know what my favorite thing about Deckstats is? The stats.

Over the last month or two, I've been collecting data from the people in my playgroup. The task was simple: I gave them a sheet of paper that listed all of the active decks in our meta, and asked them to give each deck a rating from 1 to 10. If they didn't know the deck well enough, they would leave the entry blank.

Here are the decks, ordered from highest average rating to lowest. Names have been changed in order to keep identities a secret. Commanders with (c) next to them denote a deck at competitive commander power.

Commander | Player | Self-rate | Rating | St. dev | Number of votes
Urza, Lord High Artificer (c) | Gohan | 10 | 8.89 | 0.928 | 9
Rashmi, Eternities Crafter (c) | Krillin | 9 | 8.79 | 0.699 | 7
Krenko, Mob Boss | Morganator 2.0 | 8 | 8.75 | 0.849 | 14
Edric, Spymaster of Trest (c) | Morganator 2.0 | 10 | 8.61 | 1.147 | 14
Brago, King Eternal | Piccolo | 9 | 8.33 | 0.816 | 6
Brago, King Eternal (c) | Android 18 | 8 | 8.14 | 1.464 | 7
Marath, Will of the Wild | Piccolo | 9 | 8.00 | 0.000 | 3
Marwyn, the Nurturer (c) | Gohan | 10 | 8.00 | 0.866 | 9
Oona, Queen of the Fae (c) | Frieza | 9 | 7.83 | 0.764 | 3
Vilis, Broker of Blood | Krillin | 7 | 7.80 | 0.837 | 5
Atla Palani, Nest Tender (c) | Morganator 2.0 | 9 | 7.78 | 1.856 | 9
K'rrik, Son of Yawgmoth (c) | Cell | 8 | 7.75 | 1.500 | 4
Kruphix, God of Horizons | Krillin | 8 | 7.63 | 0.479 | 4
Selvala, Heart of the Wilds (c) | Goku | 8 | 7.58 | 1.114 | 6
Prime Speaker Vannifar (c) | Gohan | 8 | 7.57 | 1.134 | 7
Jodah, Archmage Eternal | Bulma | 8 | 7.46 | 1.030 | 13
Najeela, the Blade-Blossom (c) | Gohan | 7 | 7.45 | 1.066 | 10
The Scarab God | Morganator 2.0 | 7 | 7.43 | 1.072 | 14
Alesha, Who Smiles at Death | Piccolo | 6.5 | 7.33 | 0.577 | 3
Jhoira, Weatherlight Captain | Tien | 7 | 7.25 | 0.957 | 4
Meren of Clan Nel Toth | Goku | 7 | 7.10 | 0.894 | 5
Gaddock Teeg | Goku | 7 | 7.08 | 1.114 | 6
Ayula, Queen Among Bears | Bulma | 7 | 7.07 | 1.158 | 14
Karlov of the Ghost Council | Frieza | 7 | 7.00 | 0.816 | 4
The First Sliver | Frieza | 8 | 7.00 | 1.785 | 9
Marrow-Gnawer | Yamcha | 6 | 7.00 | 2.160 | 4
Elsha of the Infinite (c) | Gohan | 8 | 6.94 | 1.321 | 8
Tasigur, the Golden Fang | Piccolo | 8 | 6.83 | 0.764 | 3
Marchesa, the Black Rose | Goku | 7 | 6.79 | 1.075 | 7
Elenda, the Dusk Rose | Piccolo | 7 | 6.75 | 0.500 | 4
Garna, the Bloodflame | Trunks | 8 | 6.75 | 1.500 | 4
Yarok, the Desecrated | Gohan | 8 | 6.73 | 0.876 | 11
Tuvasa the Sunlit | Gohan | 7 | 6.63 | 1.188 | 8
Gonti, Lord of Luxury | Piccolo | 6 | 6.60 | 0.548 | 5
Rhys the Redeemed | Android 17 | 7.5 | 6.60 | 1.673 | 5
Grenzo, Dungeon Warden | Goku | 5 | 6.58 | 1.497 | 6
Kadena, Slinking Sorcerer | Gohan | 6 | 6.57 | 0.787 | 7
Gargos, Vicious Watcher | Gohan | 7 | 6.50 | 1.000 | 4
Marchesa, the Black Rose | Yamcha | 7.5 | 6.50 | 1.291 | 4
Hapatra, Vizier of Poisons | Android 18 | 7 | 6.50 | 1.323 | 7
Marwyn, the Nurturer | Trunks | 7 | 6.50 | 1.378 | 6
Krav+Regna | Chi-Chi | 5 | 6.44 | 1.116 | 8
Nekusar, the Mindrazer | Android 17 | 6 | 6.40 | 0.894 | 5
Golos, Tireless Pilgrim | Chi-Chi | 4 | 6.36 | 1.180 | 7
Breya, Etherium Shaper | Piccolo | 9.5 | 6.33 | 0.577 | 3
The Scorpion God | Piccolo | 5.5 | 6.33 | 0.577 | 3
Greven, Predator Captain | Gohan | 6 | 6.29 | 1.380 | 7
Arcades, the Strategist | Chi-Chi | 5 | 6.25 | 1.541 | 6
Edgar Markov | Vegeta | 4 | 6.14 | 0.690 | 7
Anje Falkenrath (c) | Cell | 9 | 6.00 | 2.449 | 4
Gahiji, Honored One | Yamcha | 7.5 | 5.75 | 0.500 | 4
Momir Vig, Simic Visionary | Trunks | 4 | 5.33 | 1.528 | 3
Meren of Clan Nel Toth | Tien | 5 | 5.00 | 2.000 | 3
Gisela, Blade of Goldnight | Chiaotzu | 6.67 | 4.92 | 1.379 | 12
Golos, Tireless Pilgrim | Piccolo | 2 | 4.40 | 2.074 | 5
Yargle, Glutton of Urborg | Bulma | 2 | 4.09 | 2.023 | 11

Pre-analysis

I did some asking around about why some people voted the way they did. A lot of these outcomes conflicted with how I perceived the strength of decks. In particular, Krenko, Mob Boss versus Edric, Spymaster of Trest. As the creator of both these decks, I am certain that Edric is more powerful, no contest. Edric can take on powerful cEDH decks, while Krenko is limited to high-power decks. So this prompted me to ask about how people rate decks. Two in particular caught my attention. Chi-Chi would rate decks entirely based on how fast they could win. She claims that she used to be in a cEDH league, so she's seen much faster decks before. The next one was Android 17. He rated decks based on how much of a threat they were to him. For example, he considered Yamcha's Marrow-Gnawer deck to be a 10, because his Rhys the Redeemed deck can't deal with an army of Rat Colony with fear.

The difference of opinions is something I've noticed. The two Brago, King Eternal decks both made it to the top 10, but only one of them is a cEDH deck... and it was rated lower than the non-cEDH deck. Now if you know how standard deviations work, you'll know that there isn't a statistically significant difference between the two Brago decks, and there also isn't a significant difference between my Edric and Krenko decks. But what is important is that some people did rate Krenko higher than Edric, and some people did rate the cEDH Brago lower than the casual Brago.
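
To make the "no significant difference" point a bit more concrete, here is a rough sketch that computes Welch's t-statistic from the means, standard deviations, and vote counts in the table. The helper name `welch_t` is made up for illustration, and treating 1-to-10 ratings from such small groups as roughly normal is an assumption, so read it as a sanity check rather than a proper test.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t-statistic and approximate degrees of freedom from summary stats."""
    var_a = sd_a ** 2 / n_a   # squared standard error of mean A
    var_b = sd_b ** 2 / n_b   # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Values copied from the ratings table (mean, st. dev, number of votes)
krenko_vs_edric = welch_t(8.75, 0.849, 14, 8.61, 1.147, 14)
brago_casual_vs_cedh = welch_t(8.33, 0.816, 6, 8.14, 1.464, 7)

print("Krenko vs Edric: t = %.2f, df = %.1f" % krenko_vs_edric)
print("Brago (casual) vs Brago (cEDH): t = %.2f, df = %.1f" % brago_casual_vs_cedh)
```

Both t-values come out well under 1, while a value of roughly 2 or more would be needed for significance at the usual 5% level with this few votes.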

I'm not the only one that noticed this. When I told Gohan that his Urza, Lord High Artificer was the strongest deck, he quite comically said "What? No it isn't."

Error ratings
I made sure that people would rate their own decks, for two reasons. One, it gave them a baseline to compare deck powers to. Two, I wanted to see how well they could guess the power of their decks compared to everyone else. Here's the data for that. Names with (c) next to them denote players who have experience with cEDH deck-building.

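Player | Number of decks | Average deck power | Self-rate avg | Highest power | Lowest power | Deck range | Self-guess error | Other-deck guess error | # of ratings
Android 17 | 2 | 6.50 | 6.75 | 6.60 | 6.40 | 0.20 | 10% | 16% | 10
Android 18 (c) | 4 | 7.32 | 7.50 | 8.14 | 6.50 | 1.64 | 5% | 11% | 28
Bulma | 3 | 6.21 | 5.67 | 7.46 | 4.09 | 3.37 | 20% | 22% | 10
Cell (c) | 4 | 6.88 | 8.50 | 7.75 | 6.00 | 1.75 | 27% | 16% | 4
Chiaotzu | 1 | 4.92 | 6.67 | 4.92 | 4.92 | 0.00 | 36% | 19% | 32
Chi-Chi | 3 | 6.35 | 4.67 | 6.44 | 6.25 | 0.19 | 26% | 24% | 24
Frieza (c) | 4 | 7.28 | 8.00 | 7.83 | 7.00 | 0.83 | 10% | 15% | 11
Gohan (c) | 10 | 7.16 | 7.70 | 8.89 | 6.29 | 2.60 | 11% | 18% | 47
Goku (c) | 8 | 7.03 | 6.80 | 7.58 | 6.58 | 1.00 | 7% | 17% | 29
Krillin (c) | 3 | 8.07 | 8.00 | 8.79 | 7.63 | 1.16 | 6% | 10% | 19
Piccolo | 10 | 6.77 | 6.94 | 8.33 | 4.40 | 3.93 | 20% | 8% | 10
Tien | 3 | 6.13 | 6.00 | 7.25 | 5.00 | 2.25 | 2% | 13% | 26
Morganator 2.0 (c) | 4 | 8.14 | 8.50 | 8.75 | 7.43 | 1.32 | 12% | 16% | 48
Trunks | 4 | 6.19 | 6.33 | 6.75 | 5.33 | 1.42 | 17% | 8% | 10
Vegeta | 1 | 6.14 | 4.00 | 6.14 | 6.14 | 0.00 | 35% | 22% | 2
Yamcha | 3 | 6.42 | 7.00 | 7.00 | 5.75 | 1.25 | 20% | 16% | 12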
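
The post doesn't say exactly how the self-guess error was calculated. One guess that lines up with the table for at least a couple of players is the average of |self-rate - group rating| / group rating across a player's decks. The sketch below assumes that formula (the function name `self_guess_error` is invented for illustration) and feeds it the self-rates and ratings listed in the first table.

```python
def self_guess_error(decks):
    """Mean absolute percentage gap between self-rate and group rating.

    `decks` is a list of (self_rate, group_rating) pairs. The formula is an
    assumption; the original post does not state how the error was computed.
    """
    gaps = [abs(self_rate - rating) / rating for self_rate, rating in decks]
    return 100 * sum(gaps) / len(gaps)

# Bulma's decks from the first table: Jodah, Ayula, Yargle
bulma = [(8, 7.46), (7, 7.07), (2, 4.09)]
# Chiaotzu's single deck: Gisela
chiaotzu = [(6.67, 4.92)]

print("Bulma: %.0f%%" % self_guess_error(bulma))
print("Chiaotzu: %.0f%%" % self_guess_error(chiaotzu))
```

Under this assumption the results round to 20% and 36%, which matches the table's entries for those two players.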

Damn. I'm getting close to Deckstats' character limit. It's a little much to ask you guys to draw conclusions from this, so instead, ask me questions. Ask for details about the group, and suggest tests that I could do on this data. I know for sure that I'm going to test the predictive power of cEDH players versus casual players.

What I do know right now: the 1 to 10 system for rating commander decks is very inaccurate. It is a metric based entirely on personal experience, and who am I to say that I know better than everyone else? I really want to find a better way for people to rate their commander decks, similar to what Judaspriester and Dexflux were doing a while back.
https://deckstats.net/forum/index.php/topic,49777.0.html

Slyvester12

  • Hero Member
  • *****
  • Posts: 844
  • Karma: 540
  • Decks
Re: Power Level of Commander decks: Empirical Data
« Reply #1 on: November 12, 2019, 04:45:20 am »
I think asking people to rate decks with no guidelines is mostly a waste of time (which is why most people are unhappy with the 1-10 system).
Judaspriester and Dexflux have a good idea with establishing metrics to rate, like mana base and individual card power, but they're also basing it on arbitrary levels within those metrics.
The easiest way to measure power would be to check clearly definable categories like "by what turn does the deck have a 90% chance of producing mana equal to its average CMC in all of its colors?" This would be an easy number to calculate, and it wouldn't be biased against decks the way demanding a certain amount of mana or other questions would be. Fast, low CMC decks would have low numbers, and shambling rainbow monstrosities with an average CMC of 6 would have huge numbers.
Taking 5-10 categories like this, then playing in a league-style format that makes every deck play every other deck, then checking the results to see which categories had the most predictive power, and repeating this process 10 or so times should give a reasonably reliable system, in my opinion.
But that sounds like a crazy amount of work that no one wants to do. Sorry for the long post.
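
For what it's worth, the mana question above can be roughed out with a hypergeometric calculation. The sketch below is heavily simplified and every simplification is an assumption: a 99-card library, a 7-card opening hand with no mulligans, one draw and one land drop per turn starting on turn 1, no ramp or mana rocks, and colors ignored entirely. The function names are made up for illustration.

```python
from math import ceil, comb

def p_at_least_lands(deck_size, lands, cards_seen, needed):
    """Hypergeometric probability of at least `needed` lands in `cards_seen` cards."""
    total = comb(deck_size, cards_seen)
    p_fewer = sum(
        comb(lands, k) * comb(deck_size - lands, cards_seen - k)
        for k in range(needed)
    ) / total
    return 1 - p_fewer

def turn_for_mana(avg_cmc, lands, deck_size=99, threshold=0.90, max_turn=30):
    """First turn with >= `threshold` chance of having drawn enough lands
    to produce mana equal to the average CMC (rounded up).

    Simplifying assumptions: lands only, one land drop per turn, colors ignored.
    """
    needed = ceil(avg_cmc)
    # With one land drop per turn, `needed` mana is impossible before turn `needed`.
    for turn in range(needed, max_turn + 1):
        cards_seen = 7 + turn  # opening hand plus one draw per turn
        if p_at_least_lands(deck_size, lands, cards_seen, needed) >= threshold:
            return turn
    return None  # threshold not reached within max_turn

# Hypothetical decks: (label, average CMC, land count)
for label, avg_cmc, lands in [("low-curve deck", 2.0, 32), ("top-heavy deck", 6.0, 38)]:
    print(label, "->", turn_for_mana(avg_cmc, lands))
```

A real version would also have to count mana rocks, dorks, and color requirements, which is where most of the work is.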
« Last Edit: November 12, 2019, 06:12:11 am by Slyvester12 »
Elves and infect are the best things in Magic.

robort

  • Patron
  • Hero Member
  • *****
  • Posts: 1733
  • Karma: 429
  • Decks
Re: Power Level of Commander decks: Empirical Data
« Reply #2 on: November 12, 2019, 03:32:58 pm »
All right, I will ask questions. First, Gohan said, even comically, "No it isn't," yet he rated it a 10. So why the hint of surprise that it was actually rated the strongest?

Yet the Urza deck also got fewer votes than Krenko and Edric. That brings me to the Marath deck, which has 3 votes with an average of 8 and a self-rating of 9. So 2 others have played against this deck, and 1 rated it a 7 while the other rated it an 8. Another vote or two could significantly change the rating in either direction. Is it because that deck isn't brought out often enough for others to rate it? Or something else?

Then I wonder how many 9-10s were given out and how many 1-3s. I am wondering how much a deck is either overrated or underrated. I say this because the owners of the Yargle deck and one of the Golos decks both gave their own decks a 2, yet both decks average over 4.

My last question for now: nothing is really rated under a 4. Just 3 decks are rated under 5 and another 3 are rated under 6. Is there something so bad about having a deck lower than 6? If you were to play against, say, 3 random players, and you have nothing lower than a 6 while none of them have anything higher than a 4 or 5, why aren't there more lower-end decks to handle this, if everyone is aiming to get the majority of their decks well over 6?
A legend in my own mind or so what the voices keep telling me

Morganator 2.0

Re: Power Level of Commander decks: Empirical Data
« Reply #3 on: November 12, 2019, 06:06:05 pm »
I like all of these questions. I'll address them one at a time.

Quote from: robort
All right, I will ask questions. First, Gohan said, even comically, "No it isn't," yet he rated it a 10. So why the hint of surprise that it was actually rated the strongest?

Gohan gave two other decks a rating of 10; his own Marwyn, the Nurturer deck and my Edric, Spymaster of Trest deck. So while he knew Urza was strong and deserved a 10 rating, seeing it as #1 overall confused him. He's confident that both Marwyn and Edric are stronger decks, a sentiment I agree with.

Quote from: robort
Yet the Urza deck also got fewer votes than Krenko and Edric. That brings me to the Marath deck, which has 3 votes with an average of 8 and a self-rating of 9. So 2 others have played against this deck, and 1 rated it a 7 while the other rated it an 8. Another vote or two could significantly change the rating in either direction. Is it because that deck isn't brought out often enough for others to rate it? Or something else?

Damn, I forgot to bring up two important details. First off, the self-rate doesn't count towards the deck's overall rating. I did this initially because I wanted to compare how well people guessed the strength of their deck compared to what everyone else (excluding them) thought.

For that Marath deck, all three people gave the deck a rating of 8, which is why it has no standard deviation. When you see a deck with a low number of votes, it's hard to draw anything conclusive from it, because you're absolutely right: one or two votes could swing this deck away from the average of 8. Be wary about the decks with a low number of votes (it's why I included that part of the table). I eliminated all decks that only had 1 or 2 votes.

Quote from: robort
Then I wonder how many 9-10s were given out and how many 1-3s. I am wondering how much a deck is either overrated or underrated. I say this because the owners of the Yargle deck and one of the Golos decks both gave their own decks a 2, yet both decks average over 4.

Here are the counts for each number.
1: 2
2: 3
3: 4
4: 17
5: 31
6: 67
7: 111
8: 72
9: 45
10: 11

As you can see, the most common vote was 7. Most decks (even the ones near the top) got at least one vote of 7. People weren't too inclined to give out votes of less than 4, although I'm not sure why. I certainly don't think any of these decks are deserving of 3 or less, because none of them are all that bad. Even I didn't give out anything less than a 6.
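
The vote counts listed above can be summarised with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are made up, and the numbers are copied straight from the list.

```python
# Vote counts copied from the list above: rating -> number of votes
counts = {1: 2, 2: 3, 3: 4, 4: 17, 5: 31, 6: 67, 7: 111, 8: 72, 9: 45, 10: 11}

total_votes = sum(counts.values())
mean_vote = sum(rating * n for rating, n in counts.items()) / total_votes
mode_vote = max(counts, key=counts.get)  # rating with the most votes
share_6_to_8 = sum(counts[r] for r in (6, 7, 8)) / total_votes
share_below_4 = sum(counts[r] for r in (1, 2, 3)) / total_votes

print(f"total votes: {total_votes}")
print(f"mean vote: {mean_vote:.2f}")
print(f"most common vote: {mode_vote}")
print(f"share of votes from 6 to 8: {share_6_to_8:.1%}")
print(f"share of votes below 4: {share_below_4:.1%}")
```

That works out to 363 votes with a mean of about 6.9, and roughly 69% of all votes being a 6, 7, or 8, so the scale is effectively being used as a 6-to-8 scale.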

This could be because people think of 7 as being a pretty average deck. If you have a strong mana base, include some interaction, and have a solid game-plan, you pretty much have a deck at a rating of 7.

Of course, saying you have an average deck is a pretty vague answer, because as we've seen, everyone has a different opinion of average. Same thing if you say your deck is pretty strong. That could mean anything.

Quote from: robort
My last question for now: nothing is really rated under a 4. Just 3 decks are rated under 5 and another 3 are rated under 6. Is there something so bad about having a deck lower than 6? If you were to play against, say, 3 random players, and you have nothing lower than a 6 while none of them have anything higher than a 4 or 5, why aren't there more lower-end decks to handle this, if everyone is aiming to get the majority of their decks well over 6?

This could be something exclusive to my meta. Compared to some of the other playgroups I've been to, it's actually pretty strong for a non-cEDH group. A friend (Bulma) and I went to a recent commander tournament hosted by an outside group. We weren't using cEDH decks, and we still won every game we played. Oops.

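This also has to do with the way averaging works. You'll also notice that no deck is rated above a 9. 9 to 10 is supposed to be the score for a competitive deck, so does that mean none of these decks are competitive? Certainly not. An average only lands at 9 or higher when nearly every voter gives a 9 or 10, so a handful of 7s and 8s is enough to pull even a cEDH deck below that line.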
« Last Edit: November 13, 2019, 05:10:56 pm by Morganator 2.0 »

Morganator 2.0

Re: Power Level of Commander decks: Empirical Data
« Reply #4 on: November 14, 2019, 01:15:26 am »


I'm still trying to answer this question. I'm going to work on a new way of rating decks, but before that, I want to see what the differences between cEDH players and casual players are when it comes to rating decks. I'm not going to be doing any actual stats this time around, because the sample sizes are very low so nothing conclusive can be drawn (also I'm lazy). Instead this will be purely visual.

Step 1, I cut down the number of decks. The table only includes decks that had at least 5 votes, which removed 19 of the 56 decks that were originally here. Step 2, make two columns, one for the average score cEDH players gave to the deck, and one for the casual players. Here's the table, ordered from highest cEDH score to lowest.

Commander | Player | cEDH | Casual | cEDH votes | Casual votes | Self-rate | Original rating
Edric, Spymaster of Trest (c) | Morganator 2.0 | 9.33 | 8.06 | 6 | 8 | 10 | 8.61
Brago, King Eternal (c) | Android 18 | 9.00 | 7.80 | 2 | 5 | 8 | 8.14
Rashmi, Eternities Crafter (c) | Krillin | 8.80 | 8.75 | 5 | 2 | 9 | 8.79
Urza, Lord High Artificer (c) | Gohan | 8.67 | 9.33 | 6 | 3 | 10 | 8.89
Krenko, Mob Boss | Morganator 2.0 | 8.33 | 9.06 | 6 | 8 | 8 | 8.75
Brago, King Eternal | Piccolo | 8.33 | 8.33 | 3 | 3 | 9 | 8.33
Marwyn, the Nurturer (c) | Gohan | 8.20 | 7.75 | 5 | 4 | 10 | 8.00
Najeela, the Blade-Blossom (c) | Gohan | 8.00 | 7.08 | 4 | 6 | 7 | 7.45
Selvala, Heart of the Wilds (c) | Goku | 8.00 | 7.17 | 3 | 3 | 8 | 7.58
Vilis, Broker of Blood (c) | Krillin | 8.00 | 7.00 | 4 | 1 | 7 | 7.80
Prime Speaker Vannifar (c) | Gohan | 7.75 | 7.33 | 4 | 3 | 8 | 7.57
The Scarab God | Morganator 2.0 | 7.58 | 7.31 | 6 | 8 | 7 | 7.43
Hapatra, Vizier of Poisons | Android 18 | 7.50 | 6.10 | 2 | 5 | 7 | 6.50
Atla Palani, Nest Tender (c) | Morganator 2.0 | 7.40 | 8.25 | 5 | 4 | 9 | 7.78
Gaddock Teeg | Goku | 7.33 | 6.83 | 3 | 3 | 7 | 7.08
Marwyn, the Nurturer | Trunks | 7.33 | 5.67 | 3 | 3 | 7 | 6.50
Jodah, Archmage Eternal | Bulma | 7.17 | 7.71 | 6 | 7 | 8 | 7.46
Yarok, the Desecrated | Gohan | 7.08 | 6.30 | 6 | 5 | 8 | 6.73
Rhys the Redeemed | Android 17 | 7.00 | 6.00 | 3 | 2 | 7.5 | 6.60
Meren of Clan Nel Toth | Goku | 7.00 | 7.25 | 3 | 2 | 7 | 7.10
Elsha of the Infinite (c) | Gohan | 6.70 | 7.33 | 5 | 3 | 8 | 6.94
Grenzo, Dungeon Warden | Goku | 6.67 | 6.50 | 3 | 3 | 5 | 6.58
The First Sliver | Frieza | 6.58 | 7.83 | 6 | 3 | 8 | 7.00
Ayula, Queen Among Bears | Bulma | 6.57 | 7.57 | 7 | 7 | 7 | 7.07
Gonti, Lord of Luxury | Piccolo | 6.50 | 7.00 | 4 | 1 | 6 | 6.60
Tuvasa the Sunlit | Gohan | 6.40 | 7.00 | 5 | 3 | 7 | 6.63
Edgar Markov | Vegeta | 6.33 | 6.00 | 3 | 4 | 4 | 6.14
Kadena, Slinking Sorcerer | Gohan | 6.25 | 7.00 | 4 | 3 | 6 | 6.57
Krav+Regna | Chi-Chi | 6.20 | 6.83 | 5 | 3 | 5 | 6.44
Arcades, the Strategist | Chi-Chi | 6.13 | 6.50 | 4 | 2 | 5 | 6.25
Golos, Tireless Pilgrim | Chi-Chi | 6.10 | 7.00 | 5 | 2 | 4 | 6.36
Greven, Predator Captain | Gohan | 6.00 | 6.67 | 4 | 3 | 6 | 6.29
Marchesa, the Black Rose | Goku | 6.00 | 7.38 | 3 | 4 | 7 | 6.79
Nekusar, the Mindrazer | Android 17 | 6.00 | 7.00 | 3 | 2 | 6 | 6.40
Golos, Tireless Pilgrim | Piccolo | 5.67 | 2.50 | 3 | 2 | 2 | 4.40
Gisela, Blade of Goldnight | Chiaotzu | 5.17 | 4.67 | 6 | 6 | 6.67 | 4.92
Yargle, Glutton of Urborg | Bulma | 4.00 | 4.20 | 6 | 5 | 2 | 4.09
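
A quick consistency check on the split: if the cEDH and casual columns are just the original votes divided into two groups, then weighting each column's average by its vote count should give back the original rating. The sketch below tries that on three rows from the table; the helper name `combined_rating` is made up for illustration.

```python
def combined_rating(cedh_avg, casual_avg, cedh_votes, casual_votes):
    """Vote-weighted mean of the cEDH and casual group averages."""
    total = cedh_votes + casual_votes
    return (cedh_avg * cedh_votes + casual_avg * casual_votes) / total

# (cEDH avg, casual avg, cEDH votes, casual votes, original rating) from the table
rows = {
    "Edric (c)": (9.33, 8.06, 6, 8, 8.61),
    "Brago (c)": (9.00, 7.80, 2, 5, 8.14),
    "Krenko": (8.33, 9.06, 6, 8, 8.75),
}

for name, (cedh, casual, n_cedh, n_casual, original) in rows.items():
    rebuilt = combined_rating(cedh, casual, n_cedh, n_casual)
    print(f"{name}: rebuilt {rebuilt:.2f} vs original {original:.2f}")
```

All three rebuilt values agree with the original rating to within 0.01, which is about what rounding the inputs to two decimals would allow.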

I'm a little happier with how this table looks than the last one, and not just because it was my deck that made the top spot. The competitive Brago deck is ranked higher than the non-competitive one. I'm still not sure about some things though. Like why is Elsha ranked so low?

But here's the real question, and it's one that I still can't answer:

Who is right about the strength of decks?

Are cEDH players better at rating decks? Are casual players better? Does it even matter if everyone has a different view?

I still hate that this 1 to 10 rating system is based entirely on past experience. There has got to be a way to measure the strength of a deck where it isn't based on opinion. I'd really like "What turn can you win on?" to be a good way, but it just isn't. From my experience people tend to exaggerate how fast their deck can win, and not all decks are based on speed. There is also the issue that you don't really count how many turns go by in a casual game. After 6 you kinda stop counting.

Alright, brainstorming time. I'm determined to find a better rating system.

Who's with me?

dexflux

  • Jr. Member
  • **
  • Posts: 75
  • Karma: 26
  • Decks
Re: Power Level of Commander decks: Empirical Data
« Reply #5 on: November 16, 2019, 01:26:09 am »
Rating complex structures is quite interesting.

As you already know, I am a fan of rating decks according to metrics. As Slyvester12 said, we used arbitrary levels for those metrics (which is mainly because the people we play with can't be bothered to calculate relevant numbers for their decks and because it's fairly simple), but we could do better.

Quote from: Slyvester12
The easiest way to measure power would be to check clearly definable categories like "by what turn does the deck have a 90% chance of producing mana equal to its average CMC in all of its colors?" This would be an easy number to calculate, and it wouldn't be biased against decks the way demanding a certain amount of mana or other questions would be. Fast, low CMC decks would have low numbers, and shambling rainbow monstrosities with an average CMC of 6 would have huge numbers.

That is what I would try. The first problem is obviously to define the metrics properly and map the numbers that the calculations spit out to a more readable format.

Essentially, have a function that takes metrics data and spits out a number that then can be mapped to human-readable [1..5].

"by what turn does the deck have a 90% chance of producing mana equal to its average CMC in all of its colors?"

I'd take the data the common cEDH decklists produce with such a calculation as a ceiling, hence mapping it to 5 and going down from there.
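
As a sketch of what such a mapping function could look like: the code below linearly maps a raw metric onto [1..5], using the value that cEDH lists produce as the ceiling. Everything concrete here is an assumption for illustration, including the function name `to_scale`, the linear interpolation, and the example turn numbers, none of which come from real decklists.

```python
def to_scale(value, ceiling, floor, low=1, high=5):
    """Linearly map a raw metric onto [low..high], clamped at both ends.

    `ceiling` is the raw value that should score `high` (e.g. what typical
    cEDH lists produce); `floor` is the raw value that should score `low`.
    Works whether smaller or larger raw values are better.
    """
    if ceiling == floor:
        raise ValueError("ceiling and floor must differ")
    fraction = (value - floor) / (ceiling - floor)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the ends of the scale
    return round(low + fraction * (high - low))

# Hypothetical numbers: suppose cEDH lists hit the mana target by turn 2,
# and anything at turn 10 or later counts as the bottom of the scale.
CEDH_TURN, WORST_TURN = 2, 10
for turn in (2, 4, 6, 12):
    print(f"turn {turn} -> {to_scale(turn, ceiling=CEDH_TURN, floor=WORST_TURN)}")
```

With these made-up endpoints, turn 2 maps to 5, turn 6 maps to 3, and anything slower than turn 10 is clamped to 1. The same function could be reused per metric, with each metric getting its own ceiling from cEDH data.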