Descriptive-Statistics-Measures-of-Shape-Skewness-and-Kurtosis.pptx

Scene 1 (0s)

[Audio] Our topic is Descriptive Statistics: Measures of Shape – Skewness and Kurtosis. This presentation is for the subject Data Science, under the guidance of Dr. Dipti Patil. In this presentation, we will understand what skewness and kurtosis are, how they are mathematically calculated, and, most importantly, how they are used in real data science applications and machine learning.

Scene 2 (25s)

[Audio] Descriptive statistics help us summarize and understand data using numerical measures and visual tools. There are three main types of statistical measures. First, measures of central tendency such as mean, median, and mode, which tell us the center of the dataset. Second, measures of dispersion such as variance and standard deviation, which tell us how spread out the data is. Third, measures of shape, which include skewness and kurtosis. Before applying machine learning models, understanding these measures is very important because they help us understand the behavior and structure of the data.
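As a quick illustration of the first two groups of measures, here is a minimal Python sketch using only the standard library (the dataset is hypothetical):

```python
import statistics

# Hypothetical dataset: daily customer counts for a small shop
data = [12, 15, 15, 18, 20, 22, 35]

# Measures of central tendency
print("mean:  ", statistics.mean(data))
print("median:", statistics.median(data))
print("mode:  ", statistics.mode(data))

# Measures of dispersion
print("variance:", statistics.variance(data))  # sample variance
print("stdev:   ", statistics.stdev(data))     # sample standard deviation
```

The measures of shape for this same dataset are computed in the later scenes.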

Scene 3 (1m 7s)

[Audio] Measures of shape describe how the data is distributed around its center. They help us understand whether the data is symmetrical or not and whether it contains extreme values. Even if two datasets have the same mean and standard deviation, their shapes can be very different. Skewness measures asymmetry in the data distribution. Kurtosis measures the heaviness of tails or the presence of outliers. These measures are especially important during exploratory data analysis.

Scene 4 (1m 41s)

[Audio] Skewness measures the asymmetry of a probability distribution. If a distribution is perfectly symmetrical, its skewness is zero. If the right tail is longer, the distribution has positive skewness; if the left tail is longer, it has negative skewness. There is also a simple relationship between the mean and the median: if the mean is greater than the median, the distribution is positively skewed, and if the mean is less than the median, it is negatively skewed. This happens because extreme values pull the mean towards the tail.

Scene 5 (2m 18s)

[Audio] There are three types of skewness. First is the symmetrical distribution, where mean, median, and mode are approximately equal and skewness is close to zero. Second is positive skew. In this case, the right tail is heavier, and the mean shifts to the right. Income distribution is a common example. Third is negative skew. Here the left tail is heavier and the mean shifts to the left. This may occur in exam results where most students score high marks.

Scene 6 (2m 49s)

[Audio] The population skewness formula is based on the third central moment. It is calculated as the expected value of X minus the mean, raised to the power of three, divided by the standard deviation cubed. For sample data, we often use Pearson's second coefficient of skewness, which is three times the difference between the mean and the median, divided by the standard deviation. The moment-based formula is more precise but highly sensitive to outliers; Pearson's formula is simpler and useful for interpretation.
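The two formulas described above can be sketched in Python as follows (a minimal illustration using the standard library; the function names are our own):

```python
import math
import statistics

def moment_skewness(xs):
    """Population skewness: E[(X - mu)^3] / sigma^3 (third standardized moment)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # population std
    m3 = sum((x - mu) ** 3 for x in xs) / n                # third central moment
    return m3 / sigma ** 3

def pearson_skewness(xs):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)
```

Both return positive values for right-skewed data; as the narration notes, the moment-based version reacts more strongly to outliers.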

Scene 7 (3m 24s)

[Audio] Now let us understand skewness with a simple example. Consider the dataset: 2, 3, 4, 5, 20. First, we calculate the mean. The sum of all values is 34, and dividing by 5 gives a mean of 6.8. Next, we find the median: the middle value is 4. The sample standard deviation is approximately 7.46. Now applying Pearson's formula: skewness equals 3 multiplied by the mean minus the median, divided by the standard deviation. So we calculate 3 multiplied by 6.8 minus 4, divided by 7.46. This gives approximately 1.13. Since the skewness is positive and greater than zero, the distribution is positively skewed. This happens because the value 20 is an extreme value that pulls the mean towards the right. This example clearly shows how even one outlier can significantly affect skewness.
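The arithmetic in this worked example can be checked with a short Python snippet (standard library only):

```python
import statistics

data = [2, 3, 4, 5, 20]

mean = statistics.mean(data)      # 34 / 5 = 6.8
median = statistics.median(data)  # middle value = 4
std = statistics.stdev(data)      # sample standard deviation, about 7.46

# Pearson's second coefficient of skewness
skew = 3 * (mean - median) / std
print(round(skew, 2))             # positive, so the data is right-skewed
```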

Scene 8 (4m 27s)

[Audio] Kurtosis measures the tail heaviness and peak sharpness of a distribution. It tells us how extreme the values in the dataset are. Higher kurtosis means heavier tails and a greater probability of extreme values. Lower kurtosis means lighter tails and fewer extreme values. Unlike skewness, kurtosis does not measure direction. It only measures extremity.

Scene 9 (4m 55s)

[Audio] There are three types of kurtosis. A mesokurtic distribution is similar to the normal distribution and has a kurtosis approximately equal to 3. A leptokurtic distribution has heavy tails and a higher probability of extreme values; its excess kurtosis is greater than zero. A platykurtic distribution has light tails and fewer extreme values; its excess kurtosis is less than zero. In financial data, leptokurtic distributions indicate higher risk.

Scene 10 (5m 26s)

[Audio] The population kurtosis formula is based on the fourth central moment. It is calculated as the expected value of X minus the mean, raised to the power of four, divided by the standard deviation raised to the power of four. For a normal distribution, kurtosis equals 3. Excess kurtosis is calculated by subtracting 3 from the kurtosis value. If the excess kurtosis is positive, the distribution is leptokurtic; if zero, mesokurtic; if negative, platykurtic.
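A minimal sketch of this fourth-moment calculation in Python (the function name is our own):

```python
def excess_kurtosis(xs):
    """Population kurtosis E[(X - mu)^4] / sigma^4, minus 3 (excess kurtosis)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n   # population variance = sigma^2
    m4 = sum((x - mu) ** 4 for x in xs) / n    # fourth central moment
    return m4 / var ** 2 - 3

# Evenly spread (uniform-like) data has light tails,
# so its excess kurtosis is negative (platykurtic)
print(excess_kurtosis([1, 2, 3, 4, 5]))
```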

Scene 11 (6m 0s)

[Audio] In the data science pipeline, skewness and kurtosis are mainly used during exploratory data analysis. They help detect non-normal distributions and identify outliers. They guide decisions about data transformation techniques such as log transformation or Box-Cox transformation. These steps improve feature engineering and ensure that model assumptions are satisfied before training.
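As a small illustration of the transformation step, a log transformation applied to a hypothetical right-skewed dataset (incomes in thousands) reduces the skewness measured by Pearson's coefficient:

```python
import math
import statistics

def pearson_skew(xs):
    """Pearson's second coefficient of skewness."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)

# Hypothetical right-skewed incomes (in thousands)
incomes = [20, 22, 25, 28, 30, 35, 40, 60, 120, 300]
logged = [math.log(x) for x in incomes]   # log transformation

print("skewness before:", round(pearson_skew(incomes), 2))
print("skewness after: ", round(pearson_skew(logged), 2))
```

The transformed values remain right-skewed here, but less so; in practice one compares several transformations and keeps the one that brings the data closest to symmetry.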

Scene 12 (6m 26s)

[Audio] Many machine learning algorithms assume that data follows a normal distribution. For example, linear regression assumes normally distributed residuals. If the data is highly skewed, predictions may become biased, and high kurtosis increases the impact of outliers on the model. Therefore, transforming skewed data improves model accuracy and stability.

Scene 13 (6m 51s)

[Audio] Skewness and kurtosis are widely used in real-world scenarios. In finance, they help measure stock return volatility and risk. In healthcare, they help analyze disease outbreak patterns. In business analytics, they describe the income distribution of customers. In fraud detection, unusual transaction patterns often show high kurtosis.

Scene 14 (7m 14s)

Conclusion: Skewness measures asymmetry of data. Kurtosis measures tail heaviness. Both are essential in EDA. They help detect outliers, improve model reliability, and are crucial for accurate predictive modelling.

Scene 15 (7m 26s)

[Audio] To conclude, skewness measures asymmetry in data, while kurtosis measures tail heaviness. Both are essential tools in exploratory data analysis. They help detect outliers, improve preprocessing decisions, and increase model reliability. Understanding the shape of data is very important before applying machine learning techniques. Thank you.