Analysis of Determinants of Housing Sales and Establishment of a Forecasting Model Using Machine Learning

기계학습을 활용한 주택매도 결정요인 분석 및 예측모델 구축

  • Kim, Eun-mi (김은미; Real Estate Economics major, Department of Economy Real Estate, Hansung University) ;
  • Kim, Sang-Bong (김상봉; Department of Economics, Hansung University) ;
  • Cho, Eun-seo (조은서; Real Estate Economics major, Department of Economy Real Estate, Hansung University)
  • Received : 2020.05.03
  • Accepted : 2020.06.12
  • Published : 2020.06.30

Abstract

This study applies an OLS model to estimate the determinants of the housing holding period, and then compares the predictive power of SVM, Decision Tree, Random Forest, Gradient Boosting, XGBoost, and LightGBM models. It differs from prior studies in that it takes the models with the highest predictive power as base models and applies Stacking, an ensemble method, to build a model with still higher predictive power, which can be used to gauge transaction volume in the housing market. The OLS analysis shows that sales profit, housing price, the number of household members, and the type of residence (detached house, apartment) affect the housing holding period. A comparison based on RMSE shows that the machine learning models have higher predictive power than the OLS model. The data were then rebuilt using only the influential variables and each machine learning model was applied again; Random Forest showed the best predictive power. Finally, the most predictive models (Random Forest, Decision Tree, Gradient Boosting, and XGBoost) were used as individual models, and a Stacking model was built with Linear, Ridge, and Lasso models as meta-models. The Stacking model with the Ridge meta-model had the lowest RMSE, 0.5181, and therefore the highest predictive power.
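The stacking procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses synthetic data from `make_regression` in place of the study's housing data, leaves out XGBoost to avoid an extra dependency, and picks all hyperparameters (`n_estimators`, `max_depth`, `alpha`, `cv`) arbitrarily. The helper names `rmse` and `make_base_models` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error, the comparison metric named in the abstract."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic stand-in data (assumption): the study's housing data is not used here.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def make_base_models():
    # Individual (base) models; XGBoost is omitted in this sketch.
    return [
        ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
        ("gb", GradientBoostingRegressor(n_estimators=30, random_state=0)),
    ]


# Candidate meta-models, matching the three named in the abstract.
meta_models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

results = {}
for name, meta in meta_models.items():
    # The meta-model is fit on cross-validated predictions of the base models.
    stack = StackingRegressor(
        estimators=make_base_models(), final_estimator=meta, cv=3
    )
    stack.fit(X_train, y_train)
    results[name] = rmse(y_test, stack.predict(X_test))

best_meta = min(results, key=results.get)
for name, score in results.items():
    print(f"{name}: RMSE = {score:.4f}")
print("lowest RMSE:", best_meta)
```

On synthetic data the ranking of meta-models carries no meaning for the housing application; the sketch only shows how the three stacks would be compared on held-out RMSE.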

