I admit I read the article but didn't click on the "Mulligan policy, gameplay logic, etc." tab, which has all of this and a better explanation of where he is coming from. It's a long read for sure, but it does add more insight.
Mulligan policy: The first seven-card hand is kept if it has three, four or five lands and no more than five combined lands and Signets. Any hand with Sol Ring and one, two, three, four or five lands is kept as well. All other hands are mulliganed. The second seven-card hand is kept under the same conditions, with one difference: two-land hands are now kept as well. For a mulligan to six or five, after bottoming, we keep if we hold two, three or four lands or if we hold one land and a Sol Ring. For a mulligan to four, after bottoming, we always keep.
Bottoming is necessary for mulligans to six or lower. To that end, any Signets beyond the first are considered superfluous, so we start by bottoming superfluous Signets as much as necessary and possible. Afterwards, we bottom lands or spells depending on the mix in our hand, counting Sol Ring as a land that we never put on the bottom. Specifically, we try to get as close as possible to three lands. We first bottom lands until we have at most three lands remaining, and then we get rid of spells if we still must put more cards on the bottom. When we bottom spells this way, the most expensive ones are always bottomed first.
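For concreteness, the keep/mulligan decision could be sketched in Python roughly as follows. This is an illustrative sketch, not the actual code behind the article; the function and argument names are mine.

```python
def keep_hand(lands, signets, has_sol_ring, mulligans_taken):
    """Keep/mulligan decision as described above (post-bottoming counts).

    lands, signets: counts in the hand; has_sol_ring: Sol Ring in hand;
    mulligans_taken: 0 or 1 for the seven-card hands, 2 for six cards,
    3 for five cards, 4 for the always-kept four-card hand.
    """
    if mulligans_taken <= 1:
        min_lands = 3 if mulligans_taken == 0 else 2  # second hand keeps two-landers
        if min_lands <= lands <= 5 and lands + signets <= 5:
            return True
        return has_sol_ring and 1 <= lands <= 5  # Sol Ring hands are keeps
    if mulligans_taken in (2, 3):  # six or five cards, after bottoming
        return 2 <= lands <= 4 or (lands == 1 and has_sol_ring)
    return True  # four cards: always keep
```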
This mulligan policy was manually defined by me based on what seemed most reasonable and realistic. It need not be optimal. In fact, the specific composition of your deck should have a meaningful impact on your mulligan policy. Optimizing a mulligan policy for every possible deck can be done via stochastic dynamic programming, and while this would make for an interesting future research direction, it’s outside the computational scope of this work.
Gameplay logic: On each turn, we start by playing a land if possible, and then we cast Sol Ring if possible. On turns one and two, we then cast an Arcane Signet if possible. After a turn-one Sol Ring, we're done for the turn and can't cast anything else. Otherwise, on any turn, suppose at this point we have N mana available from lands and mana rocks. Then on turns three and four, we cast a mana rock and an (N-1)-drop if possible. Subsequently, on any turn, if we don't hold an N-drop but do hold a two-drop and a distinct (N-2)-drop, then we cast both of those spells. This is done so that, e.g., casting a two-drop and a three-drop is favored over casting a four-drop and wasting a mana.
Subsequently, on any turn, we play the highest mana value spell that we can cast, starting with six-drops, then five-drops, and so on, and we repeat this process until there's nothing we can cast anymore. At the end, if we notice that we have mana left over and could've snuck in a mana rock, then we do so retroactively.
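The spell-selection part of a turn could be sketched like this. Again, this is an illustrative simplification (it ignores the Sol Ring and Signet special cases on the early turns), and the names are mine, not the actual code's.

```python
def spells_to_cast(mana, hand_drops):
    """Pick spells for one turn, following the logic described above.

    mana: available mana N from lands and mana rocks
    hand_drops: mana values of castable spells in hand, e.g. [4, 3, 2]
    """
    hand = sorted(hand_drops, reverse=True)
    casts = []
    # No N-drop, but a two-drop plus a distinct (N-2)-drop? Cast both,
    # so e.g. 2 + 3 is favored over a four-drop that wastes a mana.
    if mana not in hand and 2 in hand and (mana - 2) in hand:
        if mana - 2 != 2 or hand.count(2) >= 2:  # must be two distinct cards
            casts = [2, mana - 2]
            hand.remove(2)
            hand.remove(mana - 2)
            mana -= sum(casts)
    # Then repeatedly cast the most expensive spell we can still afford.
    for mv in list(hand):
        if 0 < mv <= mana:
            casts.append(mv)
            mana -= mv
            hand.remove(mv)
    return casts
```

With five mana and a hand of a four-drop, a three-drop and a two-drop, this casts the 2 + 3 pair rather than the four-drop.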
This gameplay strategy was manually defined by me based on what seemed good to me. It need not be optimal, and the same comments as I made for the mulligan policy also apply here.
Justification of the optimization criterion – expected compounded mana spent. Games are usually won by whoever spends the most mana over the course of a game. If you curve out while the opposition is failing to affect the board, then you will usually win, and on-board advantages are compounded over time. The way I view the game of Magic, permanents generate some kind of advantage or value every turn, for example in the form of attacking creatures, planeswalker activations or valuable triggers, all measured by the card's mana cost, and that adds up every turn they stay on the battlefield.
To illustrate this way of thinking, a Lightning Greaves on turn two will contribute two mana per turn for the rest of the game. A Mayhem Devil on turn three will contribute three mana on turn three, three mana on turn four and so on. Likewise, a Parallel Lives will contribute four mana for every turn it stays on the battlefield, and a Venser, the Sojourner will contribute five mana per turn. In my experience, thinking in this way often leads to well-crafted Magic decks, which is why I find this criterion appealing.
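Put concretely: under the seven-turn horizon discussed below, a permanent with mana value v cast on turn t contributes v on each of turns t through seven, so v * (8 - t) in total. A one-line sketch of the criterion (the function name is mine):

```python
def compounded_mana(casts, last_turn=7):
    """Each spell of mana value v cast on turn t contributes v mana on
    every turn from t through last_turn, i.e. v * (last_turn - t + 1)."""
    return sum(v * (last_turn - t + 1) for t, v in casts)

# Mayhem Devil as a three-drop on turn three: 3 mana on turns 3-7 = 15 total.
```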
Justification of the optimization criterion – the relevant length of a typical game is seven turns. Maybe it takes a few more turns for the game to truly end, but in my experience, usually one player will have an insurmountable board presence by turn seven, at least in Limited or Standard. In any case, the first seven turns represent the early-to-mid-game, which is the part of the game where curving out matters the most. And given that you generally start with seven cards in hand, if you play a land and a spell on every turn, then turn seven represents the last turn before you run out of cards. With these ideas in mind, I applied this turn seven assumption for Standard and Limited previously. For Commander, I watched the last five matches on The Command Zone’s Game Knights and saw that on average, the first player lost by turn eight or nine. This closely matched the average game length in Limited, and therefore I kept the relevant game length for Commander the same as for single player 40-card or 60-card formats. Again, cEDH where players have Vintage-level turn-three combo decks is a different animal, and my work won’t be very useful there.
Permanents don’t draw cards, tap for mana or act as mana sinks: Real decks may play cards like
Llanowar Elves,
Beast Whisperer or
Shalai, Voice of Plenty, and their abilities might influence your mana curve. However, given the computational limitations of simulation optimization, we must keep the model simple.
When applying my model framework to real decks, you may view Llanowar Elves as a combination of a mana rock and a one-drop, and you may view Beast Whisperer or Shalai as regular four-drops. However, if your deck contains a large number of card draw effects and/or mana sinks, then you may still be able to efficiently use your mana with one fewer five-drop and/or one fewer six-drop than the tables in this article will indicate.
Mana rocks don’t contribute towards compounded mana spent: Mana rocks are treated as lands that cost two mana to cast. If you have
Arcane Signet on the battlefield, then it does not contribute two mana towards the compounded mana criterion. Indeed, it doesn’t attack, block, or provide any beneficial triggers or abilities. Its value lies in letting your cast relevant N-drops, all of which do contribute towards the compounded mana criterion, more quickly.
Six-drops count as 6.2 mana: The power of spells tends to increase disproportionately beyond five mana. For example, in Limited you have vanilla 3/3s for three, vanilla 4/4s for four and vanilla 5/5s for five, but the classic six-drop is Colossal Dreadmaw, which has trample as a bonus. That's fair, because reaching six or more mana is not trivial and won't happen every game. It's a bit of guesswork, but based on 20+ years of competitive Magic experience, I pegged a six-drop as being worth 6.2 mana.
We play against a goldfish: Opponents are assumed not to interact, which also means that we never have to recast our Commander. This assumption facilitates the analysis. Yet in real games, if a good curve-out forces opponents to spend mana to interact with us, then that curve-out was still worthwhile.
Mana rock modeling: Our deck always contains one Sol Ring and an adjustable number of Arcane Signets. In real Commander decks, we can of course run only one actual Arcane Signet, but its effect is very similar to Three Visits, Signets, Talismans, Fellwar Stone, Nature's Lore and other Commander staples. They can all be reasonably modeled as an Arcane Signet for the purpose of this work.
I considered the addition of three-mana ramp spells like Cultivate or Kodama's Reach as a separate category, but I decided against that because an extra card type would complicate the optimization and gameplay logic considerably. Moreover, it would make the results applicable only to green decks, which wouldn't be right. If you would like to apply my model framework to real decks, then I'd view spells like Cultivate or Kodama's Reach as a combination of a mana rock and a three-drop.
No color requirements or tapped lands: Incorporating these features would make the model far too complicated. Instead, all lands are basically Command Towers. In reality, well-built mana bases generally shouldn’t run into color screw very often and should limit the number of tapped lands, so I expect that the impact of this assumption is limited.
If your actual deck has a large number of tapped lands, then you may want to run slightly fewer one-drops than my results would suggest. That’s because a tapped land is kind of like a one-drop, especially when you would often play such a land on the first turn of the game.
All cards are on-board effects: This assumption facilitates the analysis. But even interactive spells like Swords to Plowshares or counterspells could be seen as one-drops and two-drops, respectively, because subtracting something from the opposition is akin to adding to your own battlefield.
No card draw spells: Card draw spells or cantrips are not included in the model at the moment, but I did consider them. After all, cards like Brainstorm, Read the Bones, Harmonize and so on are popular inclusions, and a bunch of cheap cantrips could reduce your land count slightly. So initially, I incorporated card draw spells as an option. Unfortunately, in my first runs (which were based on four-mana Commanders), my optimization algorithm never chose to put Divination or Harmonize in the final deck. Given these initial results, I removed card draw spells from consideration altogether to speed up the optimization.
The fact that my optimization algorithm avoided Divination and Harmonize makes sense, because my simplified model ends on turn seven and has no specific cards, combos, colors or synergies to dig for. Nor does it feature sweepers or other spells that can regain lost tempo. In my model, casting Divination on turn six would often draw you into a three-drop and a superfluous land, in which case you would generally have been better off just drawing that three-drop instead of that Divination, and thus cutting Divination from the deck and adding a three-drop. In real Commander decks, card draw spells would be more valuable than my single-minded mana curve results suggest.
Evaluation is via simulation: In a model with nine different card types, the number of permutations of even the top 15 cards is astronomical, which means that exact evaluation using multivariate hypergeometric probabilities is not feasible. Instead, I used pseudo-random number generators to shuffle decks and simulate each deck’s performance a bunch of times to estimate its expected compounded mana spent over the course of the first seven turns.
The number of simulations per deck is a function of the iteration that the optimization algorithm is currently in. I start in the first iteration with merely 10,000 simulations per deck in an attempt to explore quickly. In every iteration, I move to the best deck in the neighborhood and then increase the number of simulations per deck by 1,000. If we have to reevaluate a deck that we've seen in one or more prior iterations, then we combine the simulations from the current iteration with those from prior iterations. These exact numbers of simulations were set mostly for practical reasons: to ensure that the algorithm would finish in hours rather than weeks or months.
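A stripped-down version of this evaluation loop might look like the following. This is illustrative only: it skips mulligans, mana rocks and most of the gameplay logic described above, and the names are mine rather than the actual code's.

```python
import random

def estimate_value(deck, n_sims, seed=0):
    """Estimate expected compounded mana spent over seven turns via
    simulation. `deck` is a list of card mana values with 0 for lands.
    Heavily simplified sketch: always keeps the hand, plays one land
    per turn and casts greedily, biggest affordable spell first.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_sims):
        library = deck[:]
        rng.shuffle(library)
        hand, library = library[:7], library[7:]
        lands = 0
        for turn in range(1, 8):
            hand.append(library.pop(0))            # draw a card
            if 0 in hand:                          # play a land if possible
                hand.remove(0)
                lands += 1
            mana = lands
            for mv in sorted(hand, reverse=True):  # greedy: biggest spell first
                if 0 < mv <= mana:
                    hand.remove(mv)
                    mana -= mv
                    total += mv * (8 - turn)       # compounded contribution
    return total / n_sims
```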
Optimization is via local search: The number of possible decks is also astronomical, so exhaustive enumeration is not feasible. Instead, I used a basic local search heuristic: start at an initial reasonable solution (based on multiplications of optimal 40-card or 60-card decks) and then keep moving to better points in a neighborhood until no better point exists. At that point, we have reached a local optimum. If the number of simulations for the current best deck then exceeds 200,000, we terminate the algorithm in the hope that the local optimum is also a global one. I have no general concavity results, but based on deck-building intuition, I expect the structure of the criterion function to be conducive to a local search heuristic.
The neighborhood in question, in the terminology of Sklenar and Popela (2012), was a cross neighborhood at first and a star neighborhood at the end, plus the best deck from all previous iterations. The cross neighborhood, used when the number of simulations for the current best deck is less than 150,000, consists of all decks that are obtained by cutting at most one card and adding at most one card in total. For example, cut one two-drop and add one six-drop. Once the best deck from the previous iteration, let's call it D, has been simulated for at least 150,000 games, we switch to the star neighborhood of D, which consists of all 99-card decks where the number of copies of any individual card type differs by at most one from D. For example, cut a one-drop, a two-drop, and a three-drop and add a five-drop, a mana rock and a land.
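To make the cross neighborhood concrete, here is an illustrative sketch for a deck represented as counts per card type. The names are mine; the star neighborhood would instead let every count move by one simultaneously.

```python
from itertools import product

def cross_neighborhood(deck):
    """All decks reachable from `deck` (a dict of card-type counts) by
    cutting one card of one type and adding one card of another type,
    which keeps the total deck size fixed."""
    neighbors = []
    types = list(deck)
    for cut, add in product(types, types):
        if cut == add or deck[cut] == 0:
            continue  # can't cut a card type we don't run
        new = dict(deck)
        new[cut] -= 1
        new[add] += 1
        neighbors.append(new)
    return neighbors
```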
No guarantee of optimality: Since I used simulation and local search, true optimality cannot be guaranteed. For some Commander mana values, repeated runs from different starting conditions gave the same results. But for most Commander mana values, I got very similar yet slightly different outcomes in different runs. In the latter case, I re-ran the algorithm at least three times and picked the deck with the highest criterion value as the final optimum. I didn't run into these issues for the Standard/Limited models with six distinct card types, but moving to nine distinct card types in Commander exploded the difficulty. Based on observed deck changes across iterations, I don't think there is a major risk of getting stuck in a local optimum, but there is an issue with random variance, especially with Sol Ring in the mix.
Intuitively, when the neighborhood consists of hundreds of decks, it's quite likely that in the simulations, one of them is lucky enough to draw Sol Ring far more often than average, which, due to the power of that card, would skew the results in that deck's favor, even if with infinite simulations it would turn out worse. While I understand the theory of simulation and optimization well enough to leisurely set up something fun, interesting and functional for Magic, it's not my specific area of expertise, and I'd say that this is currently one of the weaker parts of this study.
I considered several options to deal with this Sol Ring variance problem. First, I could delete Sol Ring from the model altogether, but that felt wrong because it's the most-played Commander card overall on EDHREC and because it directly influences curves and land counts. Another approach would be to increase the sample size, but my computer was already running for several days, so there are practical limits. Perhaps optimizing the code by using something more clever than the random.shuffle method in Python, or switching to a faster programming language, could help, but it probably won't be the end-all either. Perhaps the most promising approach would be to use variance reduction techniques for rare event simulation, but I simply did not have the time available to familiarize myself with the underlying theory and apply it to this problem. It's something to consider for future updates, though, and I am open to suggestions.
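One standard variance-reduction idea that could plausibly help when comparing decks within a neighborhood is common random numbers: evaluate every candidate with the same per-game seeds, so that shuffle luck such as an early Sol Ring hits all candidates alike and the comparison between decks becomes less noisy. A hypothetical sketch (`evaluate_with_crn` and `simulate_game` are illustrative names, not part of the actual code):

```python
import random

def evaluate_with_crn(decks, simulate_game, n_sims, master_seed=42):
    """Common random numbers: score every candidate deck with the same
    sequence of per-game seeds, so that random luck affects all decks
    alike and differences between their scores are less noisy.

    decks: dict of name -> deck
    simulate_game(deck, rng): returns one simulated game's score
    """
    seeds = random.Random(master_seed).sample(range(2**32), n_sims)
    return {
        name: sum(simulate_game(deck, random.Random(s)) for s in seeds) / n_sims
        for name, deck in decks.items()
    }
```

Because both decks see identical random streams, the noise cancels in the difference between their scores, which is exactly the quantity the local search compares.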
I also welcome any discourse on my various modeling assumptions, especially from players with more Commander experience than me. Twitter is an easy way to reach me.