How data turned me into a millionaire
After a long time, I downloaded and installed World of Warcraft again. Probably influenced by my current job position, this time I approached the game with much more focus on economics and have soon set the goal to reach 1 Million gold.
To put things into perspective, in-game gold can be converted into real money and, at the time of this post, one million gold is enough to pay for 6 months of WoW’s subscription cost.
Initially I tried to earn gold by simply relying on my intuition and (outdated) knowledge of the game. However, the development of my liquid gold was not that promising and I could not even reach a third of my goal. Tired of poor results, I transitioned to a more analytical approach and decided to gather data that could support my decisions. As shown in the chart below, I soon started to increase my available capital and finally reached the desired one million mark.
Data collection
In this notebook we will replicate my results by reading from a copy of the same data that I have used. If you want to use up-to-date data that reflect your server, please use TradeSkillMaster’s API [1] by uncommenting the relevant block code.
import requests
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
# Option 2: read from csv
df = pd.read_csv("data_2018-09-26.csv")
list(df)
#df.head(2)
['Class',
'HistoricalPrice',
'Id',
'Level',
'MarketValue',
'MinBuyout',
'Name',
'NumAuctions',
'Quantity',
'RegionAvgDailySold',
'RegionHistoricalPrice',
'RegionMarketAvg',
'RegionMinBuyoutAvg',
'RegionQuantity',
'RegionSaleAvg',
'RegionSaleRate',
'SubClass',
'VendorBuy',
'VendorSell',
'RegionQuantity_Log']
#'''
#Documentation:
#http://support.tradeskillmaster.com/display/KB/TSM+Public+Web+API
#'''
#region = 'EU' #insert your region
#realm = 'nemesis' #insert your realm
#TSM_KEY = '' #insert your API key. This is available from the TSM website after creating a free account.
#r = requests.get('http://api.tradeskillmaster.com/v1/item/{}/{}?format=json&apiKey={}'.format(region, realm, TSM_KEY))
#df = pd.DataFrame(r.json())
#df.head(10)
Most of the names returned by the TSM API are quite intuitive once we understand the following nomenclature.
- Global: this includes data from every active WoW server.
- Region: data for a specific region, i.e. NA or EU.
- Market: every region has a multitude of “realms”. A realm auction house (or group of realms if they have low population) is what defines a market.
For example, the difference between Quantity and RegionQuantity is that the first case only counts items in Nemesis (the realm I have selected), whereas the second one includes all the listings in Europe. Notice that NumAuctions sometimes differs from Quantity because each auction can include multiple stacks of the same item. E.g. 10x Linen Cloth can all be sold in one single auction, or posted as smaller stacks over multiple auctions.
Some rows are not interesting and/or usable. In this case we remove items that are not currently present in the action house (i.e. RegionQuantity = 0). Moreover, we remove items with a sale rate of 1 because unrealistic.
#Remove data errors / outliers
df_clean = df[(df["RegionQuantity"] > 0) & (df["RegionSaleRate"] < 1)]
print('Before clean-up: ', df.shape)
print('After clean-up', df_clean.shape)
Before clean-up: (17428, 21)
After clean-up (16569, 21)
Data exploration
First, let us look at the data from a high level. For this purpose we will apply t-SNE algorithm [2].
# Relevant numerical columns
feat_cols = ['Level', 'MarketValue', 'MinBuyout', 'NumAuctions', 'Quantity',
'RegionAvgDailySold', 'RegionHistoricalPrice', 'RegionMarketAvg', 'RegionMinBuyoutAvg', 'RegionQuantity',
'RegionSaleAvg', 'RegionSaleRate', 'VendorBuy', 'VendorSell']
#https://medium.com/@luckylwk/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b
from sklearn.manifold import TSNE
tsne = TSNE(verbose=0, n_iter=300, perplexity=60)
tsne_results = tsne.fit_transform(df_clean[feat_cols])
df_tsne = df_clean.copy()
df_tsne['x-tsne'] = tsne_results[:,0]
df_tsne['y-tsne'] = tsne_results[:,1]
Let us plot the two t-SNE dimensions and assign a different color based on the Class category.
sns.scatterplot(data=df_tsne, x='x-tsne', y='y-tsne', hue='Class', alpha=.8,
palette='Dark2')
Given that we have reduced 14 dimensions into 2 dimensions, t-SNE plots are at times difficult to interpret. Nevertheless, we can still notice that there are patterns categorizing different classes. In particular, the Tradeskill class clearly stands out from all the others.
Next step is to dig further into our dataset and gain some valuable insights about WoW’s economics.
Now we group items based on their sale rate and their quantity in the auction house. For a better readibility we will use a log scale for the x-axis (RegionQuantity).
ax = sns.scatterplot(x="RegionQuantity", y="RegionSaleRate",
hue="Class", size=df_clean.Level,
alpha=.8,
palette='Dark2',
#palette='viridis',
legend="brief", data=df_clean)
ax.set(xscale="log")
The bottom-left corner covers item that are present in low quantities and have a low sale rate. This does not always mean that they are less valuable. A real-life example of such case could be a Ferrari, since they are rare and not “traded” very often. Still, they are definitely worth a lot of money.
In WoW there are two main categories of items that fit this quarter of the graph: junk and transmog/rare items. We name junk all these items that can easily be bought from a NPC (non-player character) vendor, which scammers try to dump on inexperienced players by tricking them into thinking that these are unobtainable valuable items. Without going too much into details, the second category are items mainly bought by collectors, who are often willing to pay a high price, but who only cover a small part of the World of Warcraft population.
How do we choose a cutoff for the minimum RegionQuantity to remove the majority of value-inefficient items? A simple approach would be to arbitrarily choose a value based on the chart above. Yet, it also seems appropriate to pick different value based on the item class.
g = sns.catplot(x="Class", y="RegionQuantity", kind="box", data=df_clean, palette="Dark2");
g.set_xticklabels(rotation=90)
g.set(yscale="log")
As expected, each class has different quantities posted at the auction house. The graph above gives a clear picture, but before deciding on our RegionQuantity cutoff, let us go through one more step and figure out if any of the classes can be completely discarded.
We introduce a new variable Worth, measured as: RegionSaleAvg * RegionQuantity. This should given another indication of the items that are sold for a good price, but also “often enough”.
df_clean['Worth'] = df_clean['RegionSaleAvg'] * df_clean['RegionQuantity']
g = sns.FacetGrid(data=df_clean, col="Class",
col_wrap=4, hue="SubClass")
g.map(sns.scatterplot, "RegionSaleRate", "Worth", legend="brief")
Keys are definitely not interesting for our gold making goal. Gems, Glyphs and Quests can be profitable but we will only start considering them after we have set up a steady business in the other categories, which appear to be better (at least based on our simple metrics).
Finally, we can considerably reduce our dataset by removing the unwanted classes and by setting the following RegionQuantiy threshold:
- Containers, Weapons, Armors, Item Enhancements, Miscellaneous, Recipes: 5
- Consumables: 10
- Tradeskills: 75
#Consider replacing this block with a function that instead filters out everything below the k-th RegionQuantity percentile (by Class)
list_low = ['Container','Weapon','Armor','Item Enhancements', 'Miscellaneous', 'Recipe']
list_mid = ['Consumable']
list_high = ['Tradeskill']
df_clean = df_clean[((df_clean['Class'].isin(list_low)) & (df_clean['RegionQuantity'] >= 5)) |
((df_clean['Class'].isin(list_mid)) & (df_clean['RegionQuantity'] >= 10)) |
((df_clean['Class'].isin(list_high)) & (df_clean['RegionQuantity'] >= 75))]
print('Size of new dataset: ', df_clean.size)
print('Number of remaining items per category: ')
df_clean['Class'].value_counts()
Size of new dataset: 32384
Number of remaining items per category:
Consumable 572
Tradeskill 381
Miscellaneous 226
Armor 130
Item Enhancements 109
Weapon 39
Container 11
Recipe 4
Name: Class, dtype: int64
At this point one could argue that we have a reasonable dataset size and that we could start looking into the single items.
After quickly scanning through the item names, we see that the presence for some of these is questionable. For example, [Weak Flux] has still not been filtered out since it shows an ok MarketValue and a reasonable RegionQuantity. This is a clear case of scammers trying to speculate on items that do not have any intrinsic value. [Weak Flux] can be bought from NPCs located at convenient locations, and for a very low price. Still, scammers frequently post such items at a very high price to skew the market value.
How do we protect ourselves from these cheap shots? Our solution is to introduce R1 = MarketValue / RegionMarketAvg. This value shows how different is the average market price of an item in our server, compared to the average price across Europe. Small differences are normal and should be expected, but they should not be as extreme as the 11.7 R1 value for [Weak Flux]. For these reasons, we will remove everything with a local price 75% higher than the regional value, i.e. R1 > 1.75.
df_clean['R1'] = df_clean['MarketValue'] / df_clean['RegionMarketAvg']
df_clean[df_clean['Id'] == 2880].R1
13931 11.708622
Name: R1, dtype: float64
df_clean = df_clean[df_clean['R1'] <= 1.75]
print('Size of new dataset: ', df_clean.size)
Size of new dataset: 27669
On a lighter note: crafted Linen Shirts are more expensive in the Italian server compared to the European average. This is probably explained by the strong Italian fashion culture, rather than a higher presence of scammers in this market sector.
df_LS = df[df['Name'].str.contains('Linen Shirt')]
df_LS['R1'] = df_LS['MarketValue'] / df_LS['RegionMarketAvg']
pd.DataFrame.mean(df_LS.R1)
3.4070546175433356
Results
Finally, we take all the items within the top k% Worth per category (diversification is key) and this will be our final portfolio to invest on!
How do we make money from this list? The long explanation is for another post, but a very simple approach is to buy these items every time that they are posted below 85% their market value, and then repost them at full price.
k = 0.05 # k=5
def quantile_function(df, threshold, col_name):
qnt = df.quantile(threshold)[col_name]
return df[df[col_name] >= qnt]
df_clean.groupby('Class').apply(quantile_function, 1-k, 'Worth').Name
Class
Armor 5215 Darkmoon Deck: Fathoms
9472 Antiseptic Specimen Handlers
11233 Iron-Grip Specimen Handlers
12668 Crushproof Vambraces
12669 Fluid-Resistant Specimen Handlers
Consumable 248 Coastal Healing Potion
386 Flask of the Currents
387 Flask of Endless Fathoms
388 Flask of the Vast Horizon
389 Flask of the Undertow
960 Swamp Fish 'n Chips
969 Galley Banquet
970 Bountiful Captain's Feast
1279 Goblin Glider Kit
1331 Elixir of the Rapid Mind
1333 Medallion of the Legion
1468 Potion of Rising Death
1469 Potion of Bursting Blood
1471 Monel-Hardened Hoofplates
1472 Monel-Hardened Stirrups
1484 Coarse Leather Barding
1485 Drums of the Maelstrom
1493 War-Scroll of Intellect
1494 War-Scroll of Battle Shout
1495 Incendiary Ammunition
1503 Battle-Scarred Augment Rune
1516 Battle Potion of Intellect
1517 Battle Potion of Agility
1518 Battle Potion of Strength
1542 Vantus Rune: Uldir
...
Item Enhancements 14512 Enchant Ring - Pact of Versatility
14518 Enchant Weapon - Quick Navigation
Miscellaneous 16658 Giant Dinosaur Bone
16825 Huge Ogre Cache
17222 Big Bag of Booty
17263 Coin of Many Faces
17274 Pocket Fel Spreader
17281 Tome of the Tranquil Mind
17282 Tome of the Clear Mind
17288 Tome of the Quiet Mind
17315 Sky Golem
17316 Garn Nighthowl
17319 Steelbound Harness
Recipe 14689 Scribe's Research Notes
Tradeskill 13352 Tidespray Linen
13433 Living Steel
13461 Monelite Ore
13463 Storm Silver Ore
13860 Riverbud
13861 Star Moss
13863 Winter's Kiss
13864 Siren's Pollen
13865 Anchor Weed
13866 Sea Stalk
14104 Midnight Salmon
14149 Gloom Dust
14151 Veiled Crystal
14207 Viridescent Ink
Weapon 2380 Precision Attitude Adjuster
3200 Laribole Staff of Alacrity
Name: Name, Length: 64, dtype: object
References
[1] TradeSkillMaster
[2] t-SNE algorithm