House Price Prediction Using Hedonic Price Model and XGBoost; A Comparative Study - Department of Mathematics, KNUST

Scene 1 (0s)

House Price Prediction Using Hedonic Price Model and XGBoost; A Comparative Study
Department of Mathematics, KNUST
Bedu-Addo Doreen, Adu Boahene Theophilus, Korang Samuel
September 12, 2025

Scene 2 (10s)

Table of Contents
1. Introduction
2. Problem Statement
3. Objectives of Study
4. Literature Review
5. Methodology
6. Results and Analysis
7. Conclusion
8. Recommendations
9. References

Scene 3 (23s)

Introduction
• Housing represents both shelter and investment.
• The real estate market is a reliable indicator of economic growth.
• In 2023, real estate in Ghana contributed approximately 1.6 billion Ghanaian cedis, roughly 121.2 million U.S. dollars, to the country's GDP (Statista, 2024).

Scene 4 (39s)

Problem Statement
• Residential property valuation in Ghana currently relies on the replacement cost method (Local Government Act, 1993).
• This method estimates value based on the cost of replacing the asset with a similar one.
• It does not consider critical market factors such as location, neighborhood amenities, and demand.
• There is a growing need to adopt modern, data-driven valuation techniques.
• This study aims to model property values using multiple variables and predictive algorithms to improve valuation accuracy.

Scene 5 (1m 1s)

Objectives of the Study
• Develop a hedonic pricing model explaining the internal and external factors that affect house prices.
• Run the XGBoost algorithm with the same factors.
• Compare the performance of XGBoost and the hedonic price model.

Scene 6 (1m 14s)

Literature Review
• Structural Attributes:
  • Number of rooms, floor space, age, condition.
  • Garrod and Willis (1992) found a 7% increase in value per added room and a 14% increase for a second bathroom.
• Locational Attributes:
  • Distance to the city center and hospitals, and a nice view.
  • Accessibility to transportation.
  • Oceanfront properties were valued 147% higher, followed by those with a partial ocean view at 30% higher (Benson et al., 1998).
• Neighborhood Attributes:
  • Crime rates and noise levels in particular.
  • Wilhelmsson (2000) concluded that properties in noisy environments are worth 30% less.
  • In Stockholm, Ceccato and Wilhelmsson (2011) found that residential burglary had the most negative impact on prices, followed by other criminal activities.

Scene 7 (1m 43s)

Methodology
The starting point in hedonic price modelling is the assumption that the price $P$ of a property is a function of $K$ characteristics, measured by "quantities" $x_k$, and an error term $\epsilon$:

$$P = f(x_1, x_2, \ldots, x_K) + \epsilon$$

The two best-known hedonic model specifications, where $\beta_0, \ldots, \beta_K$ are parameters, are the fully linear model

$$P = \beta_0 + \sum_{k=1}^{K} \beta_k x_k + \epsilon$$

and the logarithmic-linear model

$$\ln P = \beta_0 + \sum_{k=1}^{K} \beta_k x_k + \epsilon$$

Hedonic models are most commonly estimated using regression analysis ("Hedonic regression", 2025).
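
As an illustration of the logarithmic-linear specification, the sketch below estimates the parameters by ordinary least squares with NumPy. The data are synthetic, and the two characteristics (bedrooms, distance in km), the coefficient values, and the function name `fit_log_linear` are assumptions made for the example, not values from the study.

```python
import numpy as np

def fit_log_linear(X, price):
    """Estimate beta in ln P = beta_0 + sum_k beta_k x_k + eps by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, np.log(price), rcond=None)
    return beta

# Synthetic data (hypothetical): bedrooms and distance to the center in km
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(1, 6, 500), rng.uniform(0, 30, 500)])
true_beta = np.array([10.0, 0.25, -0.03])  # assumed intercept and slopes
log_price = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, 500)

beta_hat = fit_log_linear(X, np.exp(log_price))
print(beta_hat)  # estimates should lie close to true_beta
```

In the log-linear form, a one-unit increase in $x_k$ multiplies the price by $e^{\beta_k}$, which is roughly a $100\,\beta_k$ percent change when $\beta_k$ is small.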

Scene 8 (2m 4s)

eXtreme Gradient Boosting (XGBoost) is a scalable and improved version of the gradient boosting algorithm, designed for efficacy, computational speed and model performance (Malik et al., 2020).
[Figure: flow diagram of the gradient boosting method. Source: adapted from Elmatary et al., 2021]

Scene 9 (2m 17s)

XGBoost Notation
• $x_i$: Input feature vector for the $i$-th data point.
• $y_i$: True output (target value) corresponding to $x_i$.
• $\hat{y}_i$: Predicted output for $x_i$ from the model.
• $f_k(x_i)$: Prediction from the $k$-th tree for instance $i$.
• $K$: Total number of boosting rounds (trees).
• $L(y_i, \hat{y}_i)$: Loss function (e.g., mean squared error).
• $\Omega(f_k)$: Regularization function to penalize complexity.
• $T$: Number of leaves in a tree.
• $\omega_j$: Weight of leaf $j$.
• $\gamma$: Minimum split loss in a regression tree.
• $\lambda$: L2 regularization parameter.

Scene 10 (2m 44s)

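Model: Assuming we have $K$ trees, the model can be expressed as the sum of the predictions from each tree:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where $\mathcal{F}$ is the space of all regression trees. After $t$ boosting rounds the prediction is

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$

Objective Function:

$$\mathrm{Obj} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

Loss Function:

$$L(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2$$

Regularization Term:

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$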
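
The additive update and the regularized objective can be made concrete with a toy booster. The sketch below is not the XGBoost library and not the study's code: it is a simplified stand-in with a single feature, two-leaf trees (stumps) and no learning rate. The leaf-weight and gain expressions are the standard results for this squared-error loss, which the slides do not derive; the function names and test data are hypothetical.

```python
import numpy as np

def fit_stump(x, residual, lam=1.0, gamma=0.0):
    """Best single split on one feature for the loss 1/2 (y - y_hat)^2.

    The gradient of that loss is -residual and the hessian is 1, so the
    regularized optimal leaf weight is sum(residual) / (count + lam).
    """
    def score(r):
        # G^2 / (H + lam) for one leaf
        return r.sum() ** 2 / (len(r) + lam)

    best = None
    for threshold in np.unique(x)[:-1]:  # skip the last value so the right leaf is never empty
        left, right = residual[x <= threshold], residual[x > threshold]
        gain = 0.5 * (score(left) + score(right) - score(residual)) - gamma
        if best is None or gain > best[0]:
            best = (gain, threshold,
                    left.sum() / (len(left) + lam),
                    right.sum() / (len(right) + lam))
    return best[1:]  # threshold, left leaf weight, right leaf weight

def boost(x, y, rounds=50, lam=1.0, gamma=0.0):
    """Build the prediction additively: y_hat^(t) = y_hat^(t-1) + f_t(x)."""
    y_hat = np.zeros_like(y, dtype=float)
    trees = []
    for _ in range(rounds):
        threshold, w_left, w_right = fit_stump(x, y - y_hat, lam, gamma)
        y_hat = y_hat + np.where(x <= threshold, w_left, w_right)
        trees.append((threshold, w_left, w_right))
    return trees, y_hat

# Hypothetical one-dimensional data
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x)
trees, y_hat = boost(x, y, rounds=50)
mse_start = float(np.mean(y ** 2))            # error of the all-zero starting prediction
mse_final = float(np.mean((y - y_hat) ** 2))  # error after 50 boosting rounds
```

Here $\gamma$ is only subtracted from the gain when ranking splits; a fuller implementation would also refuse splits whose gain is negative, which is what makes $\gamma$ act as a minimum split loss.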

Scene 11 (3m 2s)

Data
• The dataset used in this project was obtained from Kaggle.
• It originates from a web scraping effort targeting an online property listing platform called Tonaton.com.
• It contains 17,890 property listings with 17 features such as size, bedrooms, bathrooms, location, and amenities.
• Exploratory data analysis was performed to understand feature distributions and relationships.

Scene 12 (3m 19s)

Price Distribution
• Prices were right-skewed.
• Log transformation normalized the distribution for modeling.
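
The effect of the log transformation on a right-skewed variable can be checked with a quick sketch. The prices below are synthetic draws from a log-normal distribution (an assumption chosen because it is right-skewed by construction), not the Tonaton listings, and `skewness` is a helper written for this example.

```python
import numpy as np

def skewness(values):
    """Sample skewness: mean of the cubed standardized deviations."""
    v = np.asarray(values, dtype=float)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(42)
prices = rng.lognormal(mean=12.0, sigma=0.8, size=5000)  # synthetic right-skewed prices
log_prices = np.log(prices)

skew_raw = skewness(prices)      # strongly positive for the raw prices
skew_log = skewness(log_prices)  # near zero after the log transform
print(skew_raw, skew_log)
```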

Scene 13 (3m 28s)

Price vs Distance to Center
• Cantonments was used as the reference point (city center).
• Prices decline with increasing distance from Cantonments.
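
The slides do not say how the distance to Cantonments was computed. One plausible approach, sketched here, is the great-circle (haversine) distance from each listing's latitude and longitude. The center coordinates below are an approximate, assumed location for Cantonments used only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Approximate coordinates for Cantonments, Accra: an assumption for illustration only
CENTER_LAT, CENTER_LON = 5.58, -0.17

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_to_center(lat, lon):
    """Distance from a listing to the assumed reference point."""
    return haversine_km(lat, lon, CENTER_LAT, CENTER_LON)

print(distance_to_center(5.65, -0.10))  # a hypothetical listing north-east of the center
```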

Scene 14 (3m 38s)

Correlation of Numerical Features
• Bedrooms and bathrooms were strongly correlated.
• Latitude and longitude were correlated with distance to center.
• When two features were found to be highly correlated, one of them was dropped to avoid multicollinearity in regression.
• Categorical features were one-hot encoded for modeling.
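
The two preprocessing steps above can be sketched with pandas. The column names, the toy rows, the 0.8 correlation threshold, and the choice to drop the first dummy level are all assumptions for illustration; the slides do not state which threshold or encoding options were used.

```python
import pandas as pd

def drop_correlated(df, threshold=0.8):
    """Drop the later column of each numeric pair whose absolute correlation exceeds threshold."""
    corr = df.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, a in enumerate(cols):
        if a in to_drop:
            continue
        for b in cols[i + 1:]:
            if b not in to_drop and corr.loc[a, b] > threshold:
                to_drop.add(b)
    return df.drop(columns=sorted(to_drop))

# Hypothetical listings
listings = pd.DataFrame({
    "bedrooms": [1, 2, 3, 4, 5, 3],
    "bathrooms": [1, 2, 3, 4, 5, 3],
    "distance_km": [12.0, 3.5, 8.0, 1.0, 20.0, 6.5],
    "furnishing": ["furnished", "unfurnished", "semi", "furnished", "unfurnished", "semi"],
})

reduced = drop_correlated(listings)  # bathrooms duplicates bedrooms here, so it is dropped
encoded = pd.get_dummies(reduced, columns=["furnishing"], drop_first=True)
print(list(encoded.columns))
```

Dropping one dummy level keeps the encoded columns from summing to the intercept column in a linear regression; tree-based models such as XGBoost do not need this.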

Scene 16 (4m 0s)

Regression Model Results

Scene 17 (4m 6s)

XGBoost Model Results
• Predictions align more closely with actual values (R² ≈ 0.88).
• Stronger fit compared to regression.

Scene 18 (4m 16s)

XGBoost Residual Analysis
• Residuals are symmetrically distributed around zero.
• Less variance than regression, confirming improved predictability.

Scene 19 (4m 25s)

XGBoost Feature Importance
• Top driver: air conditioning (dominant influence).
• Other important features: furnishing status, bathrooms, bedrooms, distance to center.

Scene 20 (4m 36s)

Comparative Analysis: HPM vs XGBoost

Metric   HPM     XGBoost
R²       0.752   0.876
MSE      0.319   0.164
RMSE     0.565   0.405
MAE      0.433   0.301

• High predictive accuracy in XGBoost.
• Lower errors in XGBoost compared to HPM.
[Figure: Actual vs Predicted Prices (HPM left, XGBoost right)]
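
For reference, the four metrics in the comparison can be computed as in the sketch below. The input values are made up to show the arithmetic and are not the study's predictions; `regression_metrics` is a helper named for this example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                       # 1 - SS_res / SS_tot
    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}

# Made-up values for illustration
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
print(metrics)
```

Because RMSE is the square root of MSE, the reported figures can be cross-checked: the square roots of 0.319 and 0.164 round to 0.565 and 0.405, matching the RMSE values listed for HPM and XGBoost.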

Scene 21 (4m 50s)

Conclusion
• The hedonic model successfully quantified the impact of the environmental factors on house prices.
• The XGBoost model demonstrated superior predictive accuracy.
• This comparison highlights the balance between theory-driven interpretability (HPM) and data-driven predictive accuracy (XGBoost), which guides our evaluation of model performance.

Scene 22 (5m 5s)

Recommendations
• Incorporate neighborhood factors in future studies.
• Include climate risk factors such as flood zones to assess their influence on house prices.

Scene 23 (5m 16s)

References
Benson, E. D., Hansen, J. L., Schwartz, A. L., & Smersh, G. T. (1998). Pricing residential amenities: The value of a view. The Journal of Real Estate Finance and Economics, 16(1), 55–73.
Ceccato, V., & Wilhelmsson, M. (2011). The impact of crime on apartment prices: Evidence from Stockholm, Sweden. Geografiska Annaler: Series B, Human Geography, 93(1), 81–103.
Elmatary, A., Ammar, T., Elkashlan, M., & Elsayed, H. (2021). Flow diagram of gradient boosting machine learning method [Accessed: June 21, 2025].

Scene 24 (5m 46s)

References
Garrod, G. D., & Willis, K. G. (1992). Valuing goods' characteristics: An application of the hedonic price method to environmental attributes. Journal of Environmental Management, 34(1), 59–76.
Hedonic regression. (2025). https://en.wikipedia.org/wiki/Hedonic_regression [Accessed: 2025-09-07].
Local Government Act. (1993). The Local Government Act, 1993 (Act 462) and the replacement cost approach. https://thebftonline.com/2023/09/19/property-valuation-rate-imposition-and-collection/ [Accessed: 2025-06-10].

Scene 25 (6m 12s)

References
Malik, S., Harode, R., & Singh, A. (2020). XGBoost: A deep dive into boosting (introduction documentation) (Technical Report). Simon Fraser University and Chitkara University. https://www.researchgate.net/publication/339499154
Statista. (2024). Annual contributions of real estate to GDP in Ghana 2013-2023. https://www.statista.com/statistics/1271600/annual-contributions-of-real-estate-to-gdp-in-ghana/ [Accessed: June 10, 2025].
Wilhelmsson, M. (2000). The impact of traffic noise on the values of single-family houses. Journal of Environmental Planning and Management, 43(6), 799–815.

Scene 26 (6m 43s)

Thank You