You can't simply calculate it and compare it to reality. I'm sure someone who paid better attention in statistics than I did knows what to do here. We're probably gonna need at least 20 more weeks of Xur to be sure.
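To see why more weeks help, look at the standard error of an observed proportion, sqrt(p(1 - p) / n). Here is a minimal sketch that assumes each observed Xur pick is an independent trial. It uses p = 0.6 from the hypothesis below, and the n values are made-up sample sizes for illustration.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion, assuming n independent picks with true rate p."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.6 comes from the null hypothesis; these n values are hypothetical sample sizes.
for n in (10, 20, 40, 80):
    print(n, round(standard_error(0.6, n), 3))
```

Because n sits under a square root, quadrupling the number of observed picks only halves the standard error. That is why a handful of weeks isn't enough.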
What we're gonna have to do is get a large set of data, enough to get that standard error as low as we can. Then use a test that I should know by now (an exact binomial test fits a single proportion like this) to test
H0 (the null hypothesis): p(warlock item choice) = 0.6
Against
HA (the alternative hypothesis): p(warlock item choice) ≠ 0.6
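Here is a rough sketch of that test as an exact two-sided binomial test, written with only the standard library. It assumes each observed pick is an independent trial. The p-value adds up the probabilities of every outcome that is no more likely than the observed one under H0. The function names and the tally (14 warlock picks out of 30) are hypothetical, not real Xur data.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_test_two_sided(k: int, n: int, p0: float) -> float:
    """Exact two-sided p-value for H0: p = p0, given k successes out of n."""
    observed = binomial_pmf(k, n, p0)
    # Tiny relative tolerance so outcomes tied with the observed one still count despite float rounding.
    total = sum(
        binomial_pmf(i, n, p0)
        for i in range(n + 1)
        if binomial_pmf(i, n, p0) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)

# Hypothetical tally: 14 warlock picks out of 30 observed picks (made-up numbers).
p_value = binomial_test_two_sided(14, 30, 0.6)
print(f"p-value = {p_value:.3f}")  # reject H0 at the 5% level only if this is below 0.05
```

If SciPy is installed, `scipy.stats.binomtest(k, n, p)` should give the same two-sided p-value without hand-rolling it.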
Then we can at least say, at whatever significance level we pick, whether the data are consistent with that split or not.
Whether they use RNG on top of weighting the classes differently? We'll never know.