Sunday, July 15, 2012
Intro to Predictive Modeling
A predictive model is a way to estimate future outcomes based on a data set. A model is a representation of some data set. Anyone can represent a given data set perfectly, but how well a model matches future data is what makes it good or bad. Almost anything can serve as a model, so our goal is to find the best model we can.
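As a rough illustration of this point, here is a minimal Python sketch. The data points and both model functions (the hypothetical names `memorizing_model` and `rule_model`) are made up for this example and do not come from any real data set. The first model reproduces the known data perfectly but has nothing to say about new inputs, while the second assumes a pattern that may carry over to future data.

```python
# Toy data, invented for illustration: each input x maps to an output y.
known_data = {1: 2, 2: 4, 3: 6}

def memorizing_model(x):
    """Represents the known data perfectly by looking answers up."""
    return known_data.get(x)  # None for any input it has not seen

def rule_model(x):
    """Assumes the pattern y = 2 * x also holds for future data."""
    return 2 * x

# Both models match every known data point.
fits_known_memo = all(memorizing_model(x) == y for x, y in known_data.items())
fits_known_rule = all(rule_model(x) == y for x, y in known_data.items())

# Only the rule-based model produces a prediction for an unseen input.
unseen_memo = memorizing_model(4)
unseen_rule = rule_model(4)
print(fits_known_memo, fits_known_rule, unseen_memo, unseen_rule)
```

Whether the rule-based prediction is actually right depends on the future data, which is exactly what separates a good model from a bad one.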
Data is made up of one or more features. For a person, these can include age, eye color, hair color, height, race and more. Data may represent objects, situations, and countless other things.
Another important question is what data we should collect; often the data has already been collected and we need to decide how to use it for our model.
Probability obviously plays an important role in predictive modeling, especially conditional probability. Before we go further, let's go over conditional probability a bit.
Conditional probability is the probability of an event occurring given that another event has occurred.
Wikipedia gives an example (http://en.wikipedia.org/wiki/Posterior_probability)
P(X|Y) = the probability that X happens given that Y occurred = P(X ^ Y) / P(Y), where X ^ Y is the intersection of X and Y (both events happen)
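The definition can be checked by counting outcomes. The sketch below is only an illustration: it assumes a fair six-sided die (an example that is not part of the original post), and the helper name `conditional_probability` is hypothetical.

```python
from fractions import Fraction

def conditional_probability(outcomes, is_x, is_y):
    """Return P(X|Y) = P(X ^ Y) / P(Y) for equally likely outcomes."""
    total = len(outcomes)
    p_y = Fraction(sum(1 for o in outcomes if is_y(o)), total)
    p_x_and_y = Fraction(sum(1 for o in outcomes if is_x(o) and is_y(o)), total)
    if p_y == 0:
        raise ValueError("P(Y) is zero, so P(X|Y) is undefined")
    return p_x_and_y / p_y

die = [1, 2, 3, 4, 5, 6]
# X: the roll is a 2. Y: the roll is even.
p_two_given_even = conditional_probability(
    die, lambda o: o == 2, lambda o: o % 2 == 0
)
print(p_two_given_even)  # 1/3
```

Knowing the roll is even narrows the possibilities to three outcomes, one of which is a 2, so the probability rises from 1/6 to 1/3.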
Given that a child in a group is wearing trousers, what is the probability that the child is a girl?
The probability that a girl wears trousers = .5
The probability that a boy wears trousers = 1
40% of the group is girls and 60% is boys
The probability that any child wears trousers = .8
^ is the intersection.
t = trousers
p(g) = .4
p(b) = .6
p(t | b) = 1 (given the person is a boy, the probability he has trousers)
p(t | g) = .5 (given the person is a girl, the probability she has trousers)
p(g | t) = p(g ^ t) / p(t) (given the person has trousers, the probability it is a girl)
p(t) = p(t | g)p(g) + p(t | b)p(b) = .5*.4 + 1*.6 = .8 (probability anyone has trousers)
p(t | g) = p(g ^ t) / p(g) = .5
p(g ^ t) = p(g)p(t | g) = .4*.5 = .2 (rearrange to get the intersection)
p(g | t) = .2/.8 (plug in the intersection to get the answer)
= (1/5) * (5/4) = 5/20 = 1/4
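The arithmetic above can be double-checked with a few lines of Python. This is just a sketch: the variable names are chosen here for readability, and exact fractions are used instead of decimals so the result comes out as 1/4 without rounding.

```python
from fractions import Fraction

# Values given in the example.
p_g = Fraction(4, 10)          # p(g): a child is a girl
p_b = Fraction(6, 10)          # p(b): a child is a boy
p_t_given_g = Fraction(1, 2)   # p(t | g)
p_t_given_b = Fraction(1, 1)   # p(t | b)

# Intersection: p(g ^ t) = p(g) * p(t | g)
p_g_and_t = p_g * p_t_given_g

# Probability anyone has trousers: p(t) = p(t | g)p(g) + p(t | b)p(b)
p_t = p_t_given_g * p_g + p_t_given_b * p_b

# Answer: p(g | t) = p(g ^ t) / p(t)
p_g_given_t = p_g_and_t / p_t

print(p_g_and_t, p_t, p_g_given_t)  # 1/5 4/5 1/4
```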
The example above illustrates posterior probability: the probability that some quality is true given some evidence. In the example, the evidence was that the child was wearing trousers, and the quality was gender.
More to come ...
Labels: math, predictive modeling