Gathering Team Data

Team data is gathered by automatically watching Showdown matches. All 6 Pokemon on a team are generally revealed in team preview, and more details of the team composition are revealed as the match progresses. Most commonly these are one Pokemon's tera type and some of the moves on each Pokemon. Sometimes items and abilities are revealed as well. Finally, a Showdown match reveals the winner and the elo of both players. Each turn of the match is recorded and could be used for further analysis, such as which Pokemon are led and brought.

Chi-Yu 1 1 Chi-Yu Ghost 0 ['Heat Wave']
Flutter Mane 1 1 Flutter Mane 1 Choice Specs ['Dazzling Gleam', 'Power Gem']
Iron Bundle 1 0 Iron Bundle 0 ['Protect', 'Icy Wind']
Landorus-Therian 0.0 0 0 []
Ogerpon-Hearthflame 1 0 Ogerpon 0 ['Wood Hammer']
Tornadus 0.0 0 0 []

Missing Data

Because we use match data instead of full teamsheets, our data can often be biased. For example, because moves are only recorded when they are used, we may miss certain patterns, such as moves that exist on a Pokemon, but are not clicked very often, or moves that are not often clicked together (e.g., choice item Pokemon).

  • Pokemon Brought: Because we do not always see all 4 Pokemon a player brings to the match, we don’t always know this information. This is handled by dividing the number of unseen slots by the number of remaining Pokemon and distributing it equally. For example, if a player reveals only 3 of their Pokemon before winning, the 3 Pokemon they revealed will have a brought value of 1, and the three they did not reveal will have a brought value of 1/3 each. This way, all players always ‘bring’ 4 Pokemon to keep things balanced.
  • Hidden Forms: Some Pokemon do not reveal their form in team preview, such as Urshifu. We only know which form it is when they bring it. To handle this, we consider all unknown forms to have a probability equal to the distribution of forms when the Pokemon is brought.
  • Item: Items are often revealed when they are used, but not all items have this property. If we simply reported items as they are used, we would heavily over-count items that activate. Instead, we use other, less biased, ways of revealing items.
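
The first two rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the list, not the actual PokeStats code: the function names, the constants, and the idea of passing forms as a plain list are all assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction

BROUGHT_SLOTS = 4  # Pokemon a player brings to a match (from the text above)


def brought_values(team, revealed):
    """Return a brought value for every Pokemon on a team.

    Revealed Pokemon get 1. The unseen bring slots are split equally
    among the Pokemon that were never revealed, so the values always
    sum to BROUGHT_SLOTS. (Hypothetical helper, for illustration only.)
    """
    revealed = set(revealed)
    hidden = [name for name in team if name not in revealed]
    unseen_slots = BROUGHT_SLOTS - len(revealed)
    share = Fraction(unseen_slots, len(hidden)) if hidden else Fraction(0)
    return {name: Fraction(1) if name in revealed else share for name in team}


def form_distribution(forms_seen_when_brought):
    """Estimate form probabilities from the forms observed when brought.

    An unknown form in team preview would then be assigned these
    probabilities. (Hypothetical helper, for illustration only.)
    """
    counts = Counter(forms_seen_when_brought)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


team = ["Chi-Yu", "Flutter Mane", "Iron Bundle",
        "Landorus-Therian", "Ogerpon-Hearthflame", "Tornadus"]
values = brought_values(team, revealed=["Chi-Yu", "Flutter Mane", "Iron Bundle"])
```

With 3 of 6 Pokemon revealed, each revealed Pokemon has a brought value of 1 and each of the other three has 1/3, matching the example in the list. Using `Fraction` keeps the total at exactly 4 without floating-point drift.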

Bot

The games are collected by a bot that watches Showdown matches 24/7. The bot directly connects to the WebSocket as its own client so it only needs to ask Showdown for the necessary information (only text, no images). The bot takes a break for 20 minutes every 4 hours to give Showdown a break, which means roughly 1 out of every 12 matches will be skipped. The bot is also unable to view private matches.

The bot also manually uploads replays if the match meets some high-elo criteria. Specifically, if a match contains a specific Pokemon, and is the highest-elo match that the bot has seen that day that contains that Pokemon, it will save it. At the end of the day, you can see the highest-elo match that was played for each Pokemon.
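
The selection rule for saved replays can be sketched as follows. This is a hedged illustration rather than the bot's real logic: `update_daily_best`, the `best` dictionary, and the idea of summarizing a match with a single elo number are assumptions, since the text does not say how the two players' ratings are combined.

```python
def update_daily_best(best, match_id, match_elo, pokemon_in_match):
    """Update the day's highest-elo match for each Pokemon in a match.

    `best` maps a Pokemon name to (elo, match_id) and is assumed to be
    reset at the start of each day. `match_elo` is assumed to be a single
    number already derived from both players' ratings. Returns the names
    for which this match became the new daily best.
    """
    improved = []
    for name in pokemon_in_match:
        if name not in best or match_elo > best[name][0]:
            best[name] = (match_elo, match_id)
            improved.append(name)
    return improved


best = {}
update_daily_best(best, "match-1", 1500, ["Chi-Yu", "Tornadus"])
update_daily_best(best, "match-2", 1600, ["Chi-Yu"])
newly_best = update_daily_best(best, "match-3", 1550, ["Chi-Yu", "Tornadus"])
```

In this sketch a replay would only need to be uploaded when the returned list is non-empty, because the match then holds the daily record for at least one Pokemon.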

Calculating the Stats

PokeStats is unique because of how frequently the data is processed. Twice a day, between 4 and 5 am and between 4 and 5 pm, the stats are re-computed. This means that no matter when you view PokeStats, the numbers are at most about 12 hours old. We hope this will allow players to notice trends quicker and react to the meta. However, once a massive database of matches has been collected, how do we determine the statistics we show?

Graph

We begin by creating a massive graph where the nodes are Pokemon, Tera Types, Moves and Items. The nodes contain individual stats such as number of instances and number of wins, while the edges between the nodes contain the number of times those two nodes occurred together.

Graph of Stats
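
A graph like the one described above can be represented with plain counters. The sketch below is one possible layout under stated assumptions: each observation is taken to be the set of labels revealed for one team in one match plus whether that team won, and the `(kind, name)` tuples and `build_graph` function are invented for the example.

```python
from collections import Counter
from itertools import combinations


def build_graph(observations):
    """Build node and edge counts from (labels, won) observations.

    Nodes are labels such as ("pokemon", "Chi-Yu") or ("move", "Heat Wave").
    `games` and `wins` hold per-node counts; `edges` counts how often two
    nodes occurred together. Edge keys are sorted pairs so that (a, b)
    and (b, a) share one entry.
    """
    games, wins, edges = Counter(), Counter(), Counter()
    for labels, won in observations:
        unique = sorted(set(labels))
        for label in unique:
            games[label] += 1
            wins[label] += int(won)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return games, wins, edges


observations = [
    ([("pokemon", "Chi-Yu"), ("tera", "Ghost"), ("move", "Heat Wave")], True),
    ([("pokemon", "Chi-Yu"), ("move", "Heat Wave")], False),
]
games, wins, edges = build_graph(observations)
```

Storing counts on nodes and co-occurrence counts on edges means most per-Pokemon statistics can later be read off with a few dictionary lookups instead of rescanning every match.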

Statistics

We then use the constructed graph to compute the statistics for each Pokemon. Most of these can be computed simply. For example, win rates are computed with

WINRATE = (WINS / GAMES)

However, in many cases there are issues of statistical significance. When we ask about win rate, we aren’t directly interested in the number of times a Pokemon won divided by the number of matches it played. What we really want to know is how good a Pokemon is, but a Pokemon can achieve an unrealistically high win rate by playing a small number of matches. For example, if a top player uses Squirtle as a meme and wins 4/5 games, Squirtle will have a win rate of 80%, since nobody else really uses Squirtle. Meanwhile, Pokemon that are known to be very good typically have a win rate of 52-55% at best. One way of handling this is to simply not report stats that have a small sample size (<30), but we also use another technique to make stats more comparable across all Pokemon.

Statistical Significance

This phenomenon happens because of sample size. Because each Pokemon has been used a different number of times, it isn’t fair to directly compare Pokemon that have been used very little to Pokemon that have been used extensively. The statistics we compute from these samples are intended to represent the “true” statistic. In this case, the true statistic would be the win rate of the Pokemon if an infinite number of games were played at all elos. A statistic like this would be much more useful than a statistic of a sample, because it would give a much better idea of how the Pokemon would perform in the future. Our goal then becomes to approximate this ideal stat as closely as possible. The statistic of a sample is one approximation, but we can do better.

What we really want to report is the largest win rate that we are relatively confident would represent the true value, if an infinite number of games were played. We can compute this value using the confidence interval. For example, we might compute a win rate of 52% with a margin of ±0.4% for an 80% confidence interval. This means we can be 80% confident that the “true” win rate is between 51.6% and 52.4%. Then there is a 10% chance of the win rate being above that interval and a 10% chance of it being below. On PokeStats, we choose to report the lower end of this interval. Thus, the statistics shown here roughly translate to, “We can be 90% confident that the true win rates (or other statistics) are at least this high.” The confidence interval is computed with

±(z_.8 / √n) * √((w/n)(1-w/n))

where w is the number of wins, n is the number of games, and z_.8 is the z-score for an 80% confidence interval.
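
The lower end of the interval can be computed with the Python standard library. The following is a minimal sketch of the normal-approximation interval described in this section, not the site's actual implementation; the function name and the default confidence level parameter are assumptions for the example.

```python
import math
from statistics import NormalDist


def lower_bound_win_rate(wins, games, confidence=0.80):
    """Return the lower end of a two-sided confidence interval.

    Uses the normal approximation: (z / sqrt(n)) * sqrt(p * (1 - p)),
    where p = wins / games and z is the z-score for the given level.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    # For an 80% two-sided interval, 10% lies in each tail,
    # so z is the 90th percentile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = (z / math.sqrt(games)) * math.sqrt(p * (1 - p))
    return p - margin


squirtle = lower_bound_win_rate(4, 5)        # raw win rate 80%
more_games = lower_bound_win_rate(40, 50)    # same raw win rate, 10x the games
```

For the Squirtle example, the 4/5 record is reported as roughly 57% instead of 80%, and the same 80% raw win rate over 50 games is penalized less. The normal approximation is least reliable at very small sample sizes, which is one reason a minimum sample size filter is still useful alongside it.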

Brought vs Team Preview

When considering synergy between Pokemon on the same team, and how Pokemon match up against Pokemon on the other team, should we consider the team preview or only the set of Pokemon that were brought? For stats where this question is applicable, we average these two values. Pokemon that are seen often together in team preview will have a high synergy, but it will be higher if they are also brought together often as well.

Synergy

The synergy between two Pokemon is computed as the frequency that combination occurs divided by the expected frequency the combination would occur if they were completely uncorrelated. Here is the equation:

(freq(a,b) / (freq(a)*freq(b)))

Pokemon that are truly uncorrelated (having Pokemon a on the team does not increase the likelihood of Pokemon b occurring on the team) will have a synergy of 1. Pokemon that occur together very frequently will have a synergy greater than 1, and infrequent combinations will have a synergy less than 1.
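
As a worked illustration of the synergy equation, the sketch below computes it from raw counts and also averages a team preview value with a brought value, as described under Brought vs Team Preview. The function names and the choice of counting per team are assumptions made for this example.

```python
def synergy(pair_count, count_a, count_b, total_teams):
    """Observed pair frequency divided by the frequency expected
    if the two Pokemon were completely uncorrelated."""
    freq_ab = pair_count / total_teams
    freq_a = count_a / total_teams
    freq_b = count_b / total_teams
    return freq_ab / (freq_a * freq_b)


def combined_synergy(preview_synergy, brought_synergy):
    """Average the team preview and brought values (hypothetical helper)."""
    return (preview_synergy + brought_synergy) / 2


# Pokemon a appears on 20 of 100 teams and Pokemon b on 30 of 100.
# If they were uncorrelated we would expect them together on 6 teams.
uncorrelated = synergy(pair_count=6, count_a=20, count_b=30, total_teams=100)
paired_often = synergy(pair_count=12, count_a=20, count_b=30, total_teams=100)
```

Here 6 shared teams gives a synergy of about 1, while 12 shared teams gives about 2, meaning the pair shows up twice as often as chance alone would predict.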