Science Research Management (科研管理), 2022, Vol. 43, Issue 9: 32-40.


Machine-learning-assisted intelligent decision analysis: the case of "green innovation"

Zhang Chao (张潮)1,2,3, Leng Fuhai (冷伏海)2

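  1. School of Foreign Studies, Capital University of Economics and Business, Beijing 100070; 2. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190; 3. University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2022-06-09 Revised: 2022-08-11 Online: 2022-09-20 Published: 2022-09-19
  • Corresponding author: Leng Fuhai (冷伏海)
  • Funding:
    Major Research Task of the Institutes of Science and Development, Chinese Academy of Sciences: "Theory and Application of the Think Tank Double Helix Method" (E2X0111Z, 2022.01-2022.12); Chinese Academy of Sciences research on science and education infrastructure planning for the 14th Five-Year Plan period (E1X0191601, 2020.07-2021.03); Research on policy evaluation and optimization recommendations for national laboratory construction (E2X1371Z01, 2022.05-2022.12).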

An analysis of machine-learning-assisted intelligent decision-making: a study taking "green innovation" as an example

Zhang Chao1,2,3, Leng Fuhai2   

  1. School of Foreign Studies, Capital University of Economics and Business, Beijing 100070, China;
    2. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China;
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-06-09 Revised: 2022-08-11 Online: 2022-09-20 Published: 2022-09-19
  • Contact: Leng Fuhai

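Abstract:     Policy informatics is an emerging discipline, and using data-driven intelligent methods to improve the quality of decisions is one of its important research directions. Scientific research and government policy making influence each other and develop together. Multidisciplinary, cross-domain knowledge computing based on scientific and technical literature can supply knowledge analysis and knowledge discovery for policy research, and in turn support needs analysis and the construction of policy scenarios. Language is the most representative mark of human intelligence, and the text of academic literature written by scholars captures the essence of human thought. Natural language processing algorithms that learn textual context on their own can summarize this body of thought into topics that carry probabilities and are rich and multidimensional, thereby supporting evidence-based decision-making. Taking the cross-disciplinary field of "green innovation" as an example, this paper applies the unsupervised latent Dirichlet allocation (LDA) topic model to 6891 abstracts of Chinese and English papers. The English literature focuses on general issues of green innovation, such as "enterprise technology costs and government subsidies", "ecological and environmental governance", and "carbon dioxide emissions and renewable energy", whereas the Chinese literature focuses on issues specific to China, such as "economic factors and high-quality development" and "spatial spillover of green innovation and regional development". Topics addressed in both literatures include "China's economic development and environmental regulation". As an ex ante method of policy analysis, applying machine learning algorithms to scientific literature data is scientific and efficient, yields high-quality and valid content, and is easy to communicate. The method offers a new approach to well-informed, evidence-based and intelligent government decision-making.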

Keywords: decision intelligence, policy informatics, machine learning, think tank problem, topic modeling

Abstract:    Recent years have seen the rise of policy informatics. Think tank problem analytics is a typical approach in the field, and it often involves knowledge computing across multiple disciplines and domains. Large-scale, cross-disciplinary opinion gathering and literature information processing are the initial building blocks of policy analytics. The policy simulations, case studies, empirical analyses and scientific discoveries reported in journal papers can supply professional analysis for decision intelligence. This paper applies a topic modeling tool to parse journal abstracts on the think tank problem of "green innovation". Latent Dirichlet allocation (LDA), a Bayesian hierarchical topic model, can uncover the latent themes underlying large collections of documents. As an example, we use LDA to analyze 6891 journal paper abstracts on "green innovation", which yields a set of sub-problems. The 6098 English abstracts were retrieved from the Web of Science core database, and the 793 Chinese abstracts from the CNKI core collection.
    Decision-making processes at every scale depend on expert opinion. Over time, scholars have developed and applied a variety of techniques for gathering, analyzing, and aggregating expert opinions. Because language is the primary characteristic that sets humans apart from other species, and the primary means by which human culture, ideas, and wisdom are transmitted, natural language processing (NLP) serves as the most fundamental component of artificial intelligence. This makes it possible to use NLP to provide pathways to decision intelligence.
    The procedure of this machine-learning-assisted decision intelligence method is as follows: 1) The stakeholders identify the research area, and the researcher compiles a list of search terms accordingly. 2) The researcher downloads the academic papers (or their abstracts). 3) The researcher computes the perplexity for candidate numbers of topics and selects the most appropriate number of LDA topics. 4) The LDA model analyzes the text and produces a bag of words for each topic. 5) Domain professionals read each topic and deduce its label. Each topic represents a sub-problem of the think tank problem, which can be parsed repeatedly with language processing models to ultimately build a framework for an understandable and workable solution.
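    Steps 3 and 4 of the procedure can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes a collapsed Gibbs sampler for LDA written with the standard library only, a four-document toy corpus invented for the example, and perplexity computed on the training documents themselves. The function names (fit_lda, perplexity, top_words) and the hyperparameters alpha and beta are hypothetical choices made for this sketch.

```python
import math
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word v assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed point estimates: theta = topics per document, phi = words per topic.
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    phi = [[(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return vocab, theta, phi


def perplexity(docs, vocab, theta, phi):
    """exp of the negative mean log-likelihood per token (lower is better)."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            v = word_id[w]
            log_lik += math.log(sum(theta[d][k] * phi[k][v] for k in range(len(phi))))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def top_words(vocab, phi, n=3):
    """Return the n most probable words of each topic (its 'bag of words')."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


# Toy stand-ins for tokenized abstracts (invented for illustration).
docs = [
    "carbon emissions renewable energy carbon".split(),
    "renewable energy emissions policy".split(),
    "firm subsidy cost technology subsidy".split(),
    "technology cost firm government subsidy".split(),
]

# Step 3: score candidate topic numbers by perplexity and keep the lowest.
scores = {k: perplexity(docs, *fit_lda(docs, k)) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)

# Step 4: fit the chosen model and list each topic's top words for expert labeling.
vocab, theta, phi = fit_lda(docs, best_k)
for k, words in enumerate(top_words(vocab, phi)):
    print(k, words)
```

    Training-set perplexity tends to keep falling as topics are added, so in practice the score would more likely be computed on held-out documents, or the curve inspected for the point where it flattens. Step 5, assigning labels to the topics, remains a human judgment and is not automated here.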
    The main themes that emerge from parsing the Chinese literature include "economic determinants and high-quality development", "spatial spillover of green innovation and regional development" and "corporate relationships and supply chain networks". In contrast, "ecological and environmental governance" and "carbon dioxide emissions and renewable energy" carry substantially more weight in the English LDA output than in the Chinese LDA output. The topics "technology costs of enterprises and government subsidies", "innovation policies and sustainable development", "efficiency of science and technology innovation", "green financial markets and enterprise financing", "corporate competition and social responsibility", and "economic development and environmental regulation" appear in the parsing results for both the English and the Chinese literature. There is considerably less Chinese literature than English literature on green innovation. Overall, the LDA topics derived from the English and Chinese abstracts provide a suitable basis for future research.
    One contribution of this study is that it analyzes a collection of academic literature to provide knowledge extraction and human-computer collaborative decision-making for think tank problems, which bridges the gap between academia and other policy stakeholders (e.g., politicians, business practitioners, and news reporters). Natural language processing methods such as the LDA model can handle rich textual information and retain more information when reducing the dimensionality of textual data. A number of other studies applying LDA to literature analysis have emerged in recent years, with some researchers using LDA as an intelligent literature review approach and others using it to identify research hotspots and follow new trends. This study presents a novel approach to improving evidence-based decisions and expanding the field of policy informatics research.
    Rather than being the outcome of one individual's careful deliberation, policy making is a process of mutual communication and coordination among numerous interested organizations, groups, and individuals, including the policy makers themselves. Text analysis with machine learning models offers an analytical approach to think tank problems that converts a large volume of complex text into simple, understandable, and meaningful topics. It is well suited to policy analysis, because it helps stakeholders communicate and work together toward decisions grounded in expert wisdom.

Key words: decision-making intelligence, policy informatics, machine learning, think tank problem, topic modeling