Structuring Risk Factors of Industrial Incidents Using Natural Language Process

Korean title: 자연어 처리 기법을 활용한 산업재해 위험요인 구조화

  • Kang, Sungsik (Department of Safety Engineering, Pukyong National University)
  • Chang, Seong Rok (Department of Safety Engineering, Pukyong National University)
  • Lee, Jongbin (Laboratory of Disaster Management, Pukyong National University)
  • Suh, Yongyoon (Department of Safety Engineering, Pukyong National University)
  • Received: 2021.02.03
  • Accepted: 2021.02.24
  • Published: 2021.02.28

Abstract

The narrative texts of industrial accident reports help to identify accident risk factors. They relate the accident triggers to the sequence of events and the outcomes of an accident. In particular, a set of related keywords in the context of the narrative can represent how the accident proceeded. Previous studies on text analytics for structuring accident reports have been limited to extracting individual keywords without context. To remedy this shortcoming, we propose a context-based analysis using a Natural Language Processing (NLP) algorithm. Specifically, this study applies Word2Vec, a word embedding technique, to extract adjacent keywords; the embedding is learned by a neural network based on supervised learning. During training, Word2Vec uses adjacent keywords in the narrative texts as inputs for this supervised learning, and the resulting keyword weights form vectors that represent how closely keywords neighbor one another. Similar keyword weights mean that the keywords appear close together within sentences in the narrative text. Consequently, a set of keywords with similar weights represents similar accidents. We extracted ten accident processes containing related keywords and used them to understand the risk factors that determine how an accident proceeds. This information helps identify how a checklist for an accident report should be structured.
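The idea of treating adjacent keywords as training input can be illustrated with a minimal Python sketch. It is not the authors' pipeline and it does not train Word2Vec: it only builds the (target, context) keyword pairs that a skip-gram style model would consume, then uses plain co-occurrence counts with cosine similarity as a simple stand-in for learned embedding weights. The narratives, the window size, and all function names are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical, pre-tokenized accident narratives (not taken from the paper's data).
NARRATIVES = [
    ["worker", "ladder", "slip", "fall", "fracture"],
    ["worker", "scaffold", "slip", "fall", "fracture"],
    ["operator", "press", "hand", "caught", "amputation"],
    ["operator", "conveyor", "hand", "caught", "amputation"],
]


def context_pairs(tokens, window=2):
    """Yield (target, context) pairs for keywords within `window` positions."""
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]


def cooccurrence_vectors(narratives, window=2):
    """Count, for each keyword, how often every other keyword appears nearby."""
    vectors = defaultdict(Counter)
    for tokens in narratives:
        for target, context in context_pairs(tokens, window):
            vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(NARRATIVES)
# Keywords that share neighbors ("ladder", "scaffold") score high;
# keywords from unrelated accident narratives ("ladder", "press") score low.
sim_same = cosine(vectors["ladder"], vectors["scaffold"])
sim_diff = cosine(vectors["ladder"], vectors["press"])
print(sim_same, sim_diff)
```

In this toy setting, keywords that appear in the same neighborhoods end up with similar vectors, which mirrors the abstract's claim that similar keyword weights indicate keywords arranged closely in the narrative. A real application would replace the count vectors with embeddings from a trained Word2Vec model and then group the keywords, for example by clustering.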

Acknowledgement

This work was supported by a Research Grant of Pukyong National University (2019).
