Science Research Management ›› 2024, Vol. 45 ›› Issue (2): 1-11. DOI: 10.19571/j.cnki.1000-2995.2024.02.001


Identification of China's S&T policy properties based on deep learning

Li Munan1,2, Wang Liang1, Lai Huapeng1   

  1. School of Business Administration, South China University of Technology, Guangzhou 510641, Guangdong, China; 
    2. Guangdong Key Lab on Innovation Methods & Decision Management System, Guangzhou 510641, Guangdong, China
  • Received: 2022-08-09 Revised: 2023-03-14 Online: 2024-02-20 Published: 2024-01-23

Abstract: Current deep-learning text analysis focuses mainly on short texts, such as public opinion monitoring and sentiment analysis of microblogs, online comments and news headlines. Few studies address property identification and long-text classification for policy texts, full-text papers and full-text patents, which leaves significant room for exploration. Compared with traditional machine learning models, deep learning models have significant advantages in natural language processing (NLP) and text feature extraction. Through pre-trained language models, deep learning algorithms can reduce manual intervention in feature engineering, and thus have promising application prospects in fields such as policy property identification and policy-instrument recognition. This paper aims at the automatic identification of the properties of science and technology policies, where policy properties are divided into types such as guiding, compulsory and encouraging. The main approach is a comparative analysis of several popular deep learning models. The paper also analyzes related computing problems theoretically, including (1) the impact of policy text length on property identification; (2) the impact of text data augmentation; and (3) the information estimation of policy texts. To further enrich the application of deep learning models in scientometrics and informetrics, especially in text analysis of science and technology policies, experiments on property identification of science and technology policies from Chinese local governments were conducted with the selected deep learning models, which are widely used in recent studies on text classification.
The theoretical and empirical analysis showed that representative deep learning models identify the properties of science and technology policies significantly better after data augmentation with the EDA (Easy Data Augmentation) method, whose strong performance has so far been reported for English text processing in related studies. The identification accuracy of EDA + Bi-LSTM-Attention exceeded 88%, and the average recognition accuracy of the other deep learning models (TextCNN, Bi-LSTM, RCNN, CapsNet, FastText, etc.) also exceeded 80% after EDA-based text augmentation. However, increasing the text truncation length from 500 words to 2000 words had no significant effect on the property identification of Chinese science and technology policies. This result could be useful for subsequent studies on policy-text analysis, because it implies that the full text of a policy may be unnecessary in similar long-text processing tasks. This research offers insight and reference value for quantitative analysis in science and technology management, such as automatic identification of science and technology policy properties, classification of long Chinese texts and identification of policy tools. Meanwhile, the findings could be open to dispute because of the limited policy-text sample. Whether the deep learning models discussed in this paper remain effective on other sources of policy text, e.g. energy, environmental and financial policies, should be further explored in future work.

Key words: deep learning, science and technology policy, property identification, data augmentation, text classification
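
The abstract refers to two preprocessing steps: truncating each policy text to a fixed length (500 to 2000 words) and augmenting it with EDA. The sketch below illustrates only the general idea and is not the authors' implementation. It covers two of the four standard EDA operations (random swap and random deletion) and leaves out synonym replacement and random insertion, which need a synonym dictionary. It assumes the text has already been segmented into a list of tokens; the function names, the `alpha` and `n_aug` parameters and the alternation scheme are hypothetical choices made for illustration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p; never return an empty list."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def truncate(tokens, max_len):
    """Keep only the first max_len tokens (the truncation length in the abstract)."""
    return tokens[:max_len]


def augment(tokens, n_aug=4, alpha=0.1, max_len=500, seed=0):
    """Truncate, then build n_aug variants by alternating swap and deletion.

    alpha (assumed value) sets both the share of tokens swapped and the
    per-token deletion probability.
    """
    rng = random.Random(seed)
    tokens = truncate(tokens, max_len)
    n_swaps = max(1, int(alpha * len(tokens)))
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            variants.append(random_swap(tokens, n_swaps, rng))
        else:
            variants.append(random_deletion(tokens, alpha, rng))
    return variants
```

Each variant would keep the label of the original policy text and be added to the training set only, so that evaluation still runs on unmodified texts. Changing `max_len` from 500 to 2000 corresponds to the truncation lengths compared in the abstract.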