A Big Data Learning for Patent Analysis

Big Data Learning for Patent Analysis (Korean title: 특허분석을 위한 빅 데이터학습)

  • Jun, Sunghae (Department of Statistics, Cheongju University)
  • Received : 2013.08.19
  • Accepted : 2013.09.15
  • Published : 2013.10.25

Abstract

Big data has become an issue in diverse fields, and big data learning is now required in areas ranging from engineering to social science. Statistics and machine learning algorithms are the representative tools for big data learning. In this paper, we study learning tools for big data and propose an efficient methodology that covers the whole process, from legacy data to practical application. We apply this big data learning to patent analysis, because patent documents are a typical form of big data, and we use the results of the patent analysis for technology forecasting. To illustrate how the proposed methodology can be applied in a real domain, we retrieve patents related to big data from patent databases around the world. Using the retrieved patent data, we perform a case study based on text mining preprocessing and multiple linear regression.

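Big data is used as a concept in many fields, with varying meanings. For example, computer science and sociology approach big data differently, but from the standpoint of data analysis they share common ground. That is, whether in engineering or in social science, analysis of big data is indispensable. Statistics and machine learning are the representative tools for analyzing big data. In this paper, we examine learning tools for big data analysis and propose an efficient big data learning procedure that covers the whole process, from the retrieved big data source through analysis to the final use of the analysis results. In particular, we apply big data learning to patent documents, which have a typical big data structure, to perform patent analysis, and we study how to apply the results to technology forecasting. To apply the proposed method in practice, we retrieved big data related patent documents from patent offices around the world and carried out a case study of big data learning using text mining preprocessing and multiple linear regression from statistics.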
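The two steps named in the abstract, text mining preprocessing followed by multiple linear regression, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the reference list suggests the study used R and its tm package, while this sketch uses plain Python. The toy documents, the stop-word list, the choice of terms, and the function names `preprocess`, `term_matrix`, and `fit_ols` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real study would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_matrix(docs, terms):
    """Return one row of term counts per document (a document-term matrix)."""
    rows = []
    for doc in docs:
        counts = Counter(preprocess(doc))
        rows.append([counts[t] for t in terms])
    return rows


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented system [A'A | A'y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]


# Toy patent-like texts, invented purely for illustration.
docs = [
    "Big data analysis for data storage",
    "Cloud storage system",
    "Analysis of big data and data mining analysis",
    "Data processing method",
]
# Assumption: regress the count of one term on the counts of other terms.
X = term_matrix(docs, ["data", "storage"])
y = [row[0] for row in term_matrix(docs, ["analysis"])]
coefficients = fit_ols(X, y)  # [intercept, slope for "data", slope for "storage"]
print(coefficients)
```

The normal equations are solved directly here to keep the sketch dependency-free. With a realistic patent corpus the document-term matrix is large and sparse, so a numerical library and some form of term selection would be more appropriate.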
