Distributions in Data Science
1. Binomial
C(n, x) p^x q^(n−x) — x successes in n trials, each with success probability p,
with μ = np and σ² = npq
- x = number of times the selected outcome occurs within n trials
- C(n, x) = number of combinations
- p = probability of success on one trial
- q = 1 − p = probability of failure on one trial
- n = number of trials
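These binomial formulas can be checked with a short standard-library sketch; n = 10 and p = 0.5 below are illustrative values, not from the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.5
print(binomial_pmf(3, n, p))   # C(10, 3) / 2**10 = 0.1171875
print(n * p, n * p * (1 - p))  # μ = np = 5.0, σ² = npq = 2.5
```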
2. Geometric
The geometric distribution is a type of probability distribution based on three key assumptions:
- The trials performed are independent.
- There can be only two outcomes for each trial – success or failure.
- The probability of success, denoted by p, is the same for every trial.
First success with probability p on the nth trial:
q^(n−1) p, with μ = 1/p and σ² = (1 − p)/p²
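A quick sketch of the PMF and moments above; p = 0.2 is an arbitrary example value.

```python
def geometric_pmf(n, p):
    """P(first success on trial n) = q**(n - 1) * p."""
    return (1 - p)**(n - 1) * p

p = 0.2
mean, var = 1 / p, (1 - p) / p**2     # μ = 1/p, σ² = (1 − p)/p²
print(round(geometric_pmf(3, p), 3))  # 0.8 * 0.8 * 0.2 = 0.128
```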
3. Negative Binomial
- A negative binomial distribution (also called the Pascal distribution) describes random variables in a negative binomial experiment.
- number of failures before r successes
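In the failure-count form described above, the PMF is C(k + r − 1, k) p^r q^k; a minimal sketch (the numbers in the example are arbitrary):

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """P(X = k): probability of k failures before the r-th success,
    each trial succeeding with probability p."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

# with r = 1 this reduces to "failures before the first success"
print(round(neg_binomial_pmf(2, 1, 0.5), 3))  # 0.5**2 * 0.5 = 0.125
```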
4. Hypergeometric
- The hypergeometric distribution is very similar to the binomial distribution. In fact, the binomial distribution is a good approximation of the hypergeometric distribution as long as you sample 5% or less of the population.
- K is the number of successes within the population
- k is the number of observed successes
- N is the population size
- n is the number of draws
- X is the number of items with that feature
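Using the symbols defined above, the PMF is C(K, k)·C(N − K, n − k)/C(N, n); the deck-of-cards numbers below are only an illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k observed successes in n draws, without replacement,
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# e.g. exactly one ace (K = 4) in a 5-card hand from a 52-card deck
print(round(hypergeom_pmf(1, 52, 4, 5), 3))  # ≈ 0.299
```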
5. Poisson
number of successes x in a fixed interval, where success occurs at a mean rate λ
µ = σ2 = λ
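The Poisson PMF is λ^x e^(−λ)/x!; a standard-library sketch, with λ = 2 chosen purely for illustration:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x): x events in a fixed interval with mean rate lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0                               # illustrative rate; μ = σ² = λ
print(f"{poisson_pmf(3, lam):.3f}")     # 2**3 · e**−2 / 3! ≈ 0.180
```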
6. Uniform
all values between a and b are equally likely:
f(x) = 1/(b − a) for a ≤ x ≤ b, where f(x) is the probability density function
The mean and standard deviation are:
μ = (a + b)/2 and σ = √((b − a)²/12)
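A quick check of these formulas against simulated draws; a, b, the seed, and the sample size are arbitrary.

```python
import random
from math import sqrt

a, b = 2.0, 10.0
mean = (a + b) / 2            # μ = 6.0
sd = sqrt((b - a)**2 / 12)    # σ ≈ 2.309

random.seed(0)
samples = [random.uniform(a, b) for _ in range(100_000)]
sim_mean = sum(samples) / len(samples)   # should land near μ
```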
7. Normal
Central Limit Theorem – the sample mean of i.i.d. data approaches a Gaussian distribution as the sample size grows.
Empirical Rule – 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean.
Normal Approximation – discrete distributions like the Binomial and Poisson may be approximated using z-scores when np, nq, and λ are greater than 10.
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)) is the probability density function
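The Empirical Rule can be verified by simulation; the seed and sample size below are arbitrary.

```python
import random

random.seed(0)
mu, sigma = 0.0, 1.0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

for k in (1, 2, 3):
    within = sum(abs(x - mu) <= k * sigma for x in samples) / len(samples)
    print(f"within {k} sd: {within:.3f}")   # ≈ 0.683, 0.954, 0.997
```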
8. Exponential
memoryless time between independent events occurring at a mean rate λ → λe^(−λx), with μ = 1/λ and σ² = 1/λ²
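The memoryless property — P(X > s + t | X > s) = P(X > t) — follows from the survival function e^(−λx) and can be checked directly; λ, s, and t below are arbitrary.

```python
from math import exp

lam = 0.5

def survival(x):
    """P(X > x) = e**(-lam * x) for an Exponential(lam) variable."""
    return exp(-lam * x)

s, t = 2.0, 3.0
lhs = survival(s + t) / survival(s)   # P(X > s + t | X > s)
rhs = survival(t)                     # P(X > t)
print(abs(lhs - rhs) < 1e-12)         # True: the elapsed time s does not matter
```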
9. Gamma
time until n independent events occur at a mean rate λ
where p and x are continuous random variables
Γ(α) = Gamma function
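In the shape/rate parameterization (shape α, rate λ — one common convention, assumed here since the text does not give the formula), the density is λ^α x^(α−1) e^(−λx)/Γ(α); with α = 1 it reduces to the exponential:

```python
from math import gamma, exp

def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and rate lam."""
    return lam**alpha * x**(alpha - 1) * exp(-lam * x) / gamma(alpha)

# alpha = 1 recovers the exponential density lam * e**(-lam * x)
x, lam = 2.0, 0.5
print(abs(gamma_pdf(x, 1, lam) - lam * exp(-lam * x)) < 1e-12)  # True
```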
Prediction Error = Bias² + Variance + Irreducible Noise
Bias – wrong assumptions when training → can’t capture underlying patterns → underfit
Variance – sensitive to fluctuations when training → can’t generalize to unseen data → overfit
The bias-variance tradeoff attempts to minimize these two sources of error, through methods such as:
– Cross-validation to generalize to unseen data
– Dimension reduction and feature selection
In general, as variance decreases, bias increases.
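The decomposition can be seen in a small simulation: an unbiased sample mean versus a shrunk estimator that accepts some bias in exchange for lower variance. The true mean, noise level, and shrinkage factor below are made up for illustration.

```python
import random
import statistics

random.seed(1)
TRUE_MU = 3.0

def bias2_and_variance(estimator, trials=20_000, n=10):
    """Monte-Carlo estimate of bias² and variance of an estimator of TRUE_MU."""
    estimates = []
    for _ in range(trials):
        sample = [random.gauss(TRUE_MU, 2.0) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - TRUE_MU
    return bias**2, statistics.pvariance(estimates)

b2_mean, var_mean = bias2_and_variance(statistics.fmean)                # unbiased
b2_shrunk, var_shrunk = bias2_and_variance(lambda s: 0.8 * statistics.fmean(s))
# shrinking trades higher bias² for variance that is only 0.64× that of the mean
```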
ML models can be divided into two types:
– Parametric – uses a fixed number of parameters with respect to sample size
– Non-Parametric – uses a flexible number of parameters and doesn’t make particular assumptions about the data
Cross-validation estimates test error with a subset of training data, and selects parameters to maximize average performance:
– k-fold – divide data into k groups, and use one to validate
– leave-p-out – use p samples to validate and the rest to train
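The k-fold split can be sketched in pure Python (index-based; any model-fitting code would consume the train/validation indices):

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each of the k folds validates exactly once."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # spread any remainder across the first folds
        stop = start + fold_size + (1 if fold < remainder else 0)
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val
        start = stop

for train, val in k_fold_splits(10, 5):
    print(val)   # [0, 1], then [2, 3], and so on
```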