StatusNeo

Distribution in Data Science

Discrete

1. Binomial

P_{x} = {n \choose x} p^{x} q^{n-x}

Probability of x successes in n independent trials, each with success probability p,

with \mu = np and \sigma^{2} = npq

P_{x} = binomial probability
x = number of times a selected outcome occurs within n trials
{n \choose x} = number of combinations
p = probability of success on a single trial
q = probability of failure on a single trial (q = 1 - p)
n = number of trials

Note: If n = 1, this is a Bernoulli distribution.
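
As a minimal sketch, the binomial formula can be evaluated directly with Python's standard library. The function name binomial_pmf and the values n = 10 and p = 0.3 are assumptions chosen for illustration. The check at the end compares the mean and variance computed from the probabilities with np and npq.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure on a single trial
    return comb(n, x) * p**x * q ** (n - x)

n, p = 10, 0.3  # illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the probabilities should match np and npq.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
print(mean, n * p)                # both close to 3.0
print(variance, n * p * (1 - p))  # both close to 2.1
```

With n = 1 the same function covers the Bernoulli case: binomial_pmf(1, 1, p) is the success probability p itself.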


2. Geometric

The geometric distribution is a type of probability distribution based on three key assumptions, listed below.

  • The trials performed are independent.
  • There are only two possible outcomes for every trial – success or failure.
  • The probability of success, denoted by p, is the same for every trial.

The first success occurs on the nth trial, with success probability p on each trial:

P(X = n) = q^{n-1} p, with \mu = 1/p and \sigma^{2} = (1-p)/p^{2}
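
A short sketch of the geometric probabilities, assuming p = 0.25 for illustration and a hypothetical function name geometric_pmf. The infinite sums for the mean and variance are truncated at 2000 trials, where the remaining tail is negligible for this p.

```python
def geometric_pmf(n: int, p: float) -> float:
    """Probability that the first success occurs on trial n (n = 1, 2, 3, ...)."""
    q = 1.0 - p
    return q ** (n - 1) * p

p = 0.25  # illustrative value
trials = range(1, 2001)  # truncated support; the tail beyond this is negligible

mean = sum(n * geometric_pmf(n, p) for n in trials)
variance = sum((n - mean) ** 2 * geometric_pmf(n, p) for n in trials)
print(mean, 1 / p)               # both close to 4.0
print(variance, (1 - p) / p**2)  # both close to 12.0
```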


3. Negative Binomial

  • The negative binomial distribution (also called the Pascal distribution) describes a random variable in a negative binomial experiment.
  • It models the number of failures before the rth success.
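
The text gives no formula here, so the sketch below assumes the standard form for k failures before the rth success, {k+r-1 \choose k} p^{r} q^{k}, and the standard mean rq/p. The function name and the values r = 3, p = 0.5 are illustrative.

```python
from math import comb

def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """Probability of exactly k failures before the r-th success."""
    q = 1.0 - p
    return comb(k + r - 1, k) * p**r * q**k

r, p = 3, 0.5  # illustrative values
ks = range(500)  # truncated support; the tail beyond this is negligible

total = sum(negative_binomial_pmf(k, r, p) for k in ks)
mean = sum(k * negative_binomial_pmf(k, r, p) for k in ks)
print(total)                  # close to 1
print(mean, r * (1 - p) / p)  # both close to 3.0
```

With r = 1 this reduces to a geometric distribution counted in failures rather than trials.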


4. Hypergeometric

  • The hypergeometric distribution is very similar to the binomial distribution. In fact, the binomial distribution is a very good approximation of the hypergeometric distribution as long as the sample is 5% or less of the population.

P(X = k) = {K \choose k} {N-K \choose n-k} / {N \choose n}

  • K is the number of successes in the population
  • k is the number of observed successes
  • N is the population size
  • n is the number of draws
  • P(X = k) is the probability of k successes in n draws, without replacement, from a population of size N that contains K items with that feature
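
A hedged sketch of the hypergeometric probability, using the standard formula {K \choose k}{N-K \choose n-k} / {N \choose n} with the symbols defined above. The population values are made up for illustration, and the comparison against a binomial with p = K/N shows how close the two are when the sample is a small share of the population.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Probability of k observed successes in n draws without replacement."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability, used here only as the approximation."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 1000, 300, 20  # illustrative: the sample is 2% of the population

# Largest difference between the exact and approximate probabilities.
max_gap = max(
    abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, K / N))
    for k in range(n + 1)
)
print(max_gap)  # small when n is a small fraction of N
```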

5. Poisson

Number of successes x in a fixed interval, where successes occur at an average rate \lambda

P(X = x) = \lambda^{x} e^{-\lambda} / x!, with \mu = \sigma^{2} = \lambda
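
A minimal sketch of the Poisson probabilities, assuming the standard formula \lambda^{x} e^{-\lambda} / x! and an illustrative rate of 4. It checks numerically that the mean and the variance both equal the rate.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x successes when the average rate is lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 4.0  # illustrative rate
xs = range(101)  # truncated support; the tail beyond this is negligible

mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(mean, variance)  # both close to 4.0
```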



Continuous

1. Uniform

all values between a and b are equally likely

f(x)=1/(b−a)

for a ≤ x ≤ b

The theoretical mean and standard deviation are

\mu = (a+b)/2 and \sigma = \sqrt{(b-a)^{2}/12}
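
As a rough check, the sketch below draws samples from a uniform distribution and compares their mean and standard deviation with (a + b)/2 and \sqrt{(b-a)^{2}/12}. The endpoints, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import fmean, pstdev

a, b = 2.0, 8.0  # illustrative interval endpoints

def uniform_pdf(x: float) -> float:
    """f(x) = 1 / (b - a) on [a, b], and 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

theoretical_mean = (a + b) / 2
theoretical_sd = math.sqrt((b - a) ** 2 / 12)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.uniform(a, b) for _ in range(100_000)]
print(fmean(samples), theoretical_mean)  # sample mean close to 5.0
print(pstdev(samples), theoretical_sd)   # sample sd close to 1.732
```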


2. Normal/Gaussian

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}

f(x) = probability density function
\sigma = standard deviation
\mu = mean

Central Limit Theorem – sample mean of i.i.d. data approaches Gaussian distribution.

Empirical Rule – 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean.

Normal Approximation – discrete distributions like Binomial and Poisson may be approximated using z-scores when np, nq, and λ are greater than 10
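
The empirical rule and the normal approximation can both be checked with the standard library's NormalDist. This is a sketch under stated assumptions: the binomial parameters are illustrative, and the 0.5 continuity correction is a common refinement that the text above does not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def within_k_sd(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))  # roughly 0.68, 0.95, and 0.997

# Normal approximation to a binomial with np = nq = 50 (both greater than 10).
n, p = 100, 0.5
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(56))  # P(X <= 55)
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(55.5)    # continuity correction
print(exact, approx)  # the two values should be close
```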


3. Exponential

f(x;\lambda) = \lambda e^{-\lambda x} for x \geq 0

f(x;\lambda) = probability density function
\lambda = rate parameter
x = random variable

Memoryless time between independent events occurring at an average rate \lambda, with \mu = 1/\lambda
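
"Memoryless" means that having already waited s units does not change the distribution of the remaining wait: P(X > s + t | X > s) = P(X > t). The sketch below checks this with the survival function e^{-\lambda x} and estimates the mean by simulation. The rate, the times s and t, and the seed are illustrative.

```python
import math
import random

def exponential_pdf(x: float, lam: float) -> float:
    """f(x; lam) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def survival(x: float, lam: float) -> float:
    """P(X > x) = exp(-lam * x) for x >= 0."""
    return math.exp(-lam * x)

lam, s, t = 2.0, 0.7, 1.3  # illustrative values

# Memorylessness: P(X > s + t | X > s) equals P(X > t).
conditional = survival(s + t, lam) / survival(s, lam)
print(math.isclose(conditional, survival(t, lam)))  # True

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean, 1 / lam)  # sample mean close to 0.5
```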


4. Gamma

Time until a fixed number of independent events have occurred, at a mean rate λ

where p and x are continuous random variables.

Γ(α) = Gamma function
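
The density itself is not shown above, so this sketch assumes the common shape and rate parameterization, f(x) = \lambda^{\alpha} x^{\alpha-1} e^{-\lambda x} / \Gamma(\alpha) for x > 0, where \alpha counts the events being waited for. The parameter values and the integration grid are illustrative, and the mean \alpha/\lambda is the standard result for this parameterization.

```python
import math

def gamma_pdf(x: float, alpha: float, lam: float) -> float:
    """Gamma density with shape alpha and rate lam (an assumed parameterization)."""
    if x <= 0:
        return 0.0
    return lam**alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)

alpha, lam = 3.0, 2.0  # illustrative: waiting time until the 3rd event at rate 2
step = 0.001
grid = [(i + 0.5) * step for i in range(20_000)]  # midpoints covering (0, 20)

# Midpoint-rule integration of the density and of x times the density.
total = sum(gamma_pdf(x, alpha, lam) * step for x in grid)
mean = sum(x * gamma_pdf(x, alpha, lam) * step for x in grid)
print(total)              # close to 1
print(mean, alpha / lam)  # both close to 1.5
```

With alpha = 1 the density reduces to the exponential density with the same rate.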



Concepts

Prediction Error = Bias^{2} + Variance + Irreducible Noise

1. Bias

Wrong assumptions during training → can’t capture underlying patterns → underfit

2. Variance

Sensitive to fluctuations during training → can’t generalize to unseen data → overfit

The bias-variance tradeoff attempts to minimize these two sources of error, through methods such as:

– Cross-validation to generalize to unseen data

– Dimension reduction and feature selection

In all cases, as variance decreases, bias increases.
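
The prediction error decomposition can be illustrated with a small simulation. This is a toy sketch, not a general recipe: the "model" is a sample mean deliberately shrunk toward zero, and the true mean, noise level, training size, and shrinkage factor are all hypothetical values picked to make the bias visible.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the run is repeatable
mu, sigma = 2.0, 1.0    # hypothetical true mean and noise level
n, shrink = 5, 0.5      # hypothetical training size and shrinkage factor

def fit(sample):
    """A deliberately biased model: the sample mean shrunk toward zero."""
    return shrink * fmean(sample)

predictions, squared_errors = [], []
for _ in range(100_000):
    train = [rng.gauss(mu, sigma) for _ in range(n)]
    y_hat = fit(train)
    y_new = rng.gauss(mu, sigma)  # unseen observation
    predictions.append(y_hat)
    squared_errors.append((y_new - y_hat) ** 2)

avg_prediction = fmean(predictions)
bias_sq = (avg_prediction - mu) ** 2
variance = fmean((y - avg_prediction) ** 2 for y in predictions)
noise = sigma**2  # irreducible noise

print(fmean(squared_errors))       # simulated prediction error
print(bias_sq + variance + noise)  # decomposition, close to the line above
```

Raising shrink toward 1 lowers the bias term and raises the variance term in this toy setup.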

ML models can be divided into two types:

Parametric – uses a fixed number of parameters with respect to sample size

Non-Parametric – uses a flexible number of parameters and doesn’t make particular assumptions about the data

3. Cross-Validation

Validates test error with a subset of training data, and selects parameters to maximize average performance.

k-fold – divide data into k groups, and use one to validate

leave-p-out – use p samples to validate and the rest to train
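
Both splitting schemes can be sketched in a few lines of plain Python. The function names are hypothetical, the splits work on sample indices only, and no shuffling is done, which a real pipeline would usually add before k-fold.

```python
from itertools import combinations

def k_fold_splits(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def leave_p_out_splits(n_samples: int, p: int):
    """Yield (train_indices, validation_indices) for every choice of p validation samples."""
    all_indices = set(range(n_samples))
    for validation in combinations(range(n_samples), p):
        train = sorted(all_indices - set(validation))
        yield train, list(validation)

for train, validation in k_fold_splits(10, 3):
    print(train, validation)
```

Leave-p-out produces one split per combination of p samples, so the number of splits grows quickly with the data size, while k-fold always produces k.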

Reference:- https://github.com/aaronwangy/Data-Science-Cheatsheet

Comments

  • Ravi Kumar

    December 17, 2021

    Awesome

  • Shreyas Baksi

    December 17, 2021

    Very informative

  • Shubham Gupta

    December 17, 2021

    Concepts are well explained and easy to understand…!!

  • Aman

    December 17, 2021

    Great👍👍

  • Vansh Gupta

    December 17, 2021

    Nice work👍

  • Jitendr

    December 18, 2021

    Good work 👍

  • Chirag kr vasav

    December 18, 2021

    nice and informative blog

  • Kishore kumar

    December 18, 2021

    Very well explained. Great work keep it up

  • Mritunjay Kumar Singh

    December 18, 2021

    Very well conceptualized. Great work sir👍

  • Piyush Gupta

    December 23, 2021

    Very detailed and explained. Good job