Descriptive-Statistics-Measures-of-Shape-Skewness-and-Kurtosis.pptx (1).pdf

Scene 1 (0s)

Descriptive Statistics: Measures of Shape (Skewness & Kurtosis)
Subject: Data Science
Mentor: Dr. Dipti Patil
Students: Esha Belowo, Manushri Deshpande (UIT2022801, UIT20238222)

Scene 2 (26s)

Introduction to Descriptive Statistics
Before understanding skewness and kurtosis, we must first understand descriptive statistics. Descriptive statistics help us summarize and understand data using numerical measures and visualizations. There are three main types of measures. First, measures of central tendency (mean, median, and mode), which tell us where the center of the data lies. Second, measures of dispersion (variance and standard deviation), which tell us how spread out the data is. Third, measures of shape (skewness and kurtosis), which describe the distribution pattern of the data. In data science, descriptive statistics help us understand the data before applying machine learning models.

Scene 3 (1m 15s)

Measures of Shape
Measures of shape describe how data are distributed around the centre: symmetry, tail weight and peak. They complement central tendency and dispersion by revealing asymmetry and extremal behaviour.
• Summarise asymmetry (skewness).
• Quantify tail weight and peakedness (kurtosis).
• Guide transformations and outlier handling.

Scene 4 (2m 27s)

Skewness: Definition
Skewness measures the asymmetry of a probability distribution. Positive skew: long right tail; negative skew: long left tail. Skewness indicates whether the mean is pulled away from the median.
Relationship: for moderately skewed data, mean > median suggests positive skew and mean < median suggests negative skew. This is a rule of thumb rather than a guarantee.

Scene 5 (3m 7s)

Types of Skewness
• Symmetrical: mean ≈ median ≈ mode; skewness ≈ 0.
• Positive skew: right tail heavier; mean pulled to the right; common in income and waiting times.
• Negative skew: left tail heavier; mean pulled to the left; seen in exam scores with ceiling effects.

Scene 6 (4m 33s)

Skewness: Formulae
Population skewness (third central moment) = E[(X - μ)^3] / σ^3, where μ is the population mean and σ is the population standard deviation.
Pearson's second coefficient (sample): Sk = 3 (mean - median) / s, where s is the sample standard deviation. Useful for quick approximation and teaching intuition.
Notes: Moment-based skewness is sensitive to outliers; Pearson's formula is robust for small samples but approximate.
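The two formulae above can be sketched in Python with the standard library alone. The function names `moment_skewness` and `pearson_second_skewness` are illustrative and do not come from the slides. The moment version uses plain divide-by-n moments with no bias correction, which is an assumption about how the population formula would be applied to a finite dataset.

```python
import statistics

def moment_skewness(values):
    """Moment-based skewness: E[(X - mu)^3] / sigma^3, using divide-by-n moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

def pearson_second_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return 3 * (mean - median) / s

symmetric = [1, 2, 3, 4, 5]
print(moment_skewness(symmetric))          # 0.0
print(pearson_second_skewness(symmetric))  # 0.0
```

Both functions divide by a measure of spread, so they fail on constant data (zero variance); a real pipeline would need to guard against that case.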

Scene 7 (5m 13s)

Worked Example: Skewness
Dataset: 2, 3, 4, 5, 20
Step-by-step
• Mean = (2+3+4+5+20)/5 = 34/5 = 6.8
• Median = 4 (middle value)
• Sample SD s ≈ 7.46 (square root of the sample variance, 222.8/4 = 55.7)
• Pearson's skewness: Sk ≈ 3(6.8 - 4) / 7.46 = 8.4/7.46 ≈ 1.13
Interpretation: Sk ≈ 1.13 indicates moderate positive skew; the value 20 creates a long right tail and pulls the mean above the median.
Visual: histogram and boxplot emphasise the outlier (20) driving skew and increasing variance.
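The worked example can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module, and the helper name `pearson_second_skewness` is illustrative. Computed directly from the data with the n - 1 denominator, the sample standard deviation comes out at about 7.46, giving Sk of about 1.13. The last step drops the value 20 to show how much of the skew that single point contributes.

```python
import statistics

def pearson_second_skewness(values):
    """Sk = 3 * (mean - median) / s, with s the sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)
    return 3 * (mean - median) / s

data = [2, 3, 4, 5, 20]
s = statistics.stdev(data)
sk = pearson_second_skewness(data)
print(f"mean={statistics.mean(data)}, median={statistics.median(data)}")
print(f"s={s:.2f}, Sk={sk:.2f}")  # s=7.46, Sk=1.13

# Dropping the outlier (20) leaves 2, 3, 4, 5, which is symmetric,
# so mean and median coincide and the coefficient is zero.
trimmed = [x for x in data if x != 20]
sk_trimmed = pearson_second_skewness(trimmed)
print(f"Sk without 20 = {sk_trimmed:.2f}")  # 0.00
```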

Scene 8 (6m 8s)

Kurtosis: Definition
Kurtosis measures tail weight and peak sharpness of a distribution relative to a normal distribution. It quantifies the propensity for extreme values (outliers).
Informal: higher kurtosis means heavier tails and a sharper peak; lower kurtosis means lighter tails and a flatter peak.
[Figure: heavy-tailed vs light-tailed distributions]

Scene 9 (6m 37s)

Types of Kurtosis
• Mesokurtic: reference is the normal distribution; kurtosis ≈ 3 (excess kurtosis ≈ 0).
• Leptokurtic: heavy tails, more frequent extreme values; excess kurtosis > 0, so greater outlier risk.
• Platykurtic: light tails, fewer extremes; excess kurtosis < 0, so the distribution is flatter than normal.

Scene 10 (7m 13s)

Kurtosis: Formula & Interpretation
Population kurtosis (fourth central moment): K = E[(X - μ)^4] / σ^4.
Standard kurtosis for normal = 3. Excess kurtosis = K - 3.
Interpretation:
1. Excess > 0: leptokurtic; heavier tails, higher outlier probability.
2. Excess ≈ 0: mesokurtic; similar to normal.
3. Excess < 0: platykurtic; lighter tails, fewer extremes.
Note: sample estimators include bias corrections; moment estimators can be unstable in small samples.
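As a rough illustration of the kurtosis formula and the three-way interpretation, the sketch below computes divide-by-n moment kurtosis on simulated samples. Everything beyond the formula itself is an assumption made for the example: the function names, the fixed random seed, the sample size, and especially the tolerance `tol` used to decide what counts as "close to zero" excess kurtosis.

```python
import random

def kurtosis(values):
    """Moment kurtosis K = E[(X - mu)^4] / sigma^4, using divide-by-n moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m4 / m2 ** 2

def classify(excess, tol=0.5):
    """Label excess kurtosis; tol is an arbitrary illustrative cut-off."""
    if excess > tol:
        return "leptokurtic"
    if excess < -tol:
        return "platykurtic"
    return "mesokurtic"

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
samples = {
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(-1, 1) for _ in range(n)],
    # The difference of two exponentials is Laplace-distributed (heavy tails).
    "laplace": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
}
for name, xs in samples.items():
    excess = kurtosis(xs) - 3
    print(f"{name}: excess kurtosis = {excess:+.2f} -> {classify(excess)}")
```

In theory the excess kurtosis is 0 for the normal, -1.2 for the uniform and 3 for the Laplace distribution; the simulated values should land near those figures but will vary with the seed and sample size.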

Scene 11 (7m 47s)

Role of Skewness & Kurtosis in the Data Science Pipeline
• Used during Exploratory Data Analysis (EDA)
• Helps detect non-normal data distributions
• Flags the presence of outliers
• Guides data transformation (log, square root, Box-Cox)
• Improves feature engineering decisions
• Helps check model assumptions
Pipeline: Data Collection → Data Cleaning → EDA (Skewness & Kurtosis) → Feature Engineering → Model Building
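The transformations named above (log, square root, Box-Cox) can be compared by measuring skewness before and after each one. The sketch below is only an illustration: the dataset is made up, the function names are hypothetical, and the Box-Cox parameter is set by hand instead of being estimated from the data as a statistics library would normally do.

```python
import math

def skewness(values):
    """Moment-based skewness using divide-by-n moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def box_cox(values, lam):
    """Box-Cox transform for positive data: (x^lam - 1) / lam, or log(x) when lam == 0."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]

data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]  # made-up right-skewed values

raw_skew = skewness(data)
sqrt_skew = skewness([math.sqrt(x) for x in data])
log_skew = skewness([math.log(x) for x in data])

# lam = 1 leaves the shape unchanged, lam = 0.5 matches the square root
# up to a shift and scale, and lam = 0 is the log transform.
for lam in (1.0, 0.5, 0.0):
    print(f"lambda={lam}: skewness = {skewness(box_cox(data, lam)):.2f}")
```

Because skewness is unaffected by shifting and positive rescaling, the Box-Cox results for lambda 1 and 0.5 equal the raw and square-root skewness respectively; stronger transformations pull the long right tail in further.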

Scene 12 (9m 14s)

Importance in Machine Learning
• Many statistical and ML methods assume approximately normal data or errors
• Linear regression inference assumes normally distributed residuals
• Skewed data can reduce prediction accuracy
• High kurtosis increases the impact of outliers
• Data transformation can improve performance
Example: if income data is highly right-skewed, apply a log transformation.
[Figure: right-skewed income data (x-axis: Income) and log-transformed income data (x-axis: Log(Income))]
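The income example can be imitated with synthetic data. The sketch below draws incomes from a log-normal distribution, which is an assumption chosen because its logarithm is exactly normal; real income data will not behave this cleanly. The parameters, seed and sample size are arbitrary, and `skewness` is an illustrative helper using divide-by-n moments.

```python
import math
import random

def skewness(values):
    """Moment-based skewness using divide-by-n moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic right-skewed "incomes" (log-normal; parameters are arbitrary).
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(50_000)]
log_incomes = [math.log(x) for x in incomes]

skew_raw = skewness(incomes)
skew_log = skewness(log_incomes)
print(f"skewness of income:      {skew_raw:.2f}")
print(f"skewness of log(income): {skew_log:.2f}")
```

The raw incomes should show a large positive skewness while the log-transformed values sit close to zero, which is the effect the two histograms on the slide illustrate.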

Scene 13 (9m 45s)

Real-World Applications of Skewness & Kurtosis
• Finance: stock return volatility; risk assessment (high kurtosis = high risk)
• Healthcare: disease outbreak patterns
• Business Analytics: customer income distribution
• Fraud Detection: unusual transaction detection

Scene 14 (10m 12s)

Conclusion
• Skewness measures asymmetry of data
• Kurtosis measures tail heaviness
• Both are essential in EDA
• Help detect outliers
• Improve model reliability
• Crucial for accurate predictive modelling

Scene 15 (10m 48s)

Thank you
Understanding data shape leads to better data-driven decisions.