Science Research Management ›› 2022, Vol. 43 ›› Issue (1): 176-183.
Li Hailin, Lin Chunpei
Abstract: Keywords are an important part of scientific research literature. A small set of keywords can clearly describe key aspects of a research achievement, including its main research objects, the problems to be solved, the methods used, and the conclusions, and to some extent it also reflects the research theme. Academic papers are currently the main form of scientific research achievement. By analyzing the keywords of academic papers, we can identify the themes that a research field is concerned with at a given time and how those themes evolve. Keyword-based topic analysis requires establishing similarities between keywords and combining them with cluster analysis to find and track hot topics. To establish the similarity between keywords, co-word analysis is often used, which assumes that keywords appearing in the same document are related. Statistical analysis is then used to define a similarity measure between co-occurring keywords: the more frequently two keywords appear together, the greater their similarity. Hierarchical clustering is one of the most commonly used methods in scientific literature analysis. It makes the similarity between keywords or themes easy to observe, so that hot issues and topics can be delineated. However, hierarchical clustering requires the subject categories to be delimited manually, which makes the clustering results vulnerable to subjective factors. Moreover, the traditional method neglects time: it partitions all keywords statistically, considering only their frequency and position, and ignores how time affects the classification of keywords.
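The co-word idea described above can be sketched in a few lines. This is only an illustration: the abstract does not give the exact similarity formula, so the raw co-occurrence count is used as a stand-in, and the function name `cooccurrence_counts` and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how often each unordered keyword pair shares a paper."""
    counts = Counter()
    for keywords in papers:
        # sorted(set(...)) ignores repeats and gives each pair one canonical order
        for first, second in combinations(sorted(set(keywords)), 2):
            counts[(first, second)] += 1
    return counts


# Toy keyword lists, invented for illustration only
papers = [
    ["innovation", "patent", "network"],
    ["innovation", "patent"],
    ["network", "clustering"],
]
counts = cooccurrence_counts(papers)
print(counts[("innovation", "patent")])  # pairs are keyed in alphabetical order
```

Under this assumed measure, a pair that shares more papers gets a larger count and is treated as more similar; a normalized index could be substituted without changing the structure.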
To address the limitation that traditional methods fail to consider the time factor of keywords, this paper proposes a keyword analysis of research achievements based on time series clustering. First, keywords are collected from a large number of academic papers on a given topic, parsed, and stored in a database. Based on the order in which keywords are listed and the frequency with which they appear across all relevant research achievements, the importance of each keyword is obtained by calculating a weight according to its order. The importance values are then arranged chronologically to form a time series of keyword importance. Second, dynamic time warping (DTW) is used to measure the distance between keyword time series, and the resulting distance matrix is transformed into a corresponding similarity matrix that can be used as input to affinity propagation (AP) clustering. Both AP, based on the similarity matrix, and the hierarchical clustering method of traditional literature analysis, based on the distance matrix, are used to cluster the keyword time series. Once keyword time series with the same trend are clustered together, the trends of keywords within a cluster are highly similar, while the trends of keywords in different clusters are dissimilar. We also compare the clustering results obtained by the two methods for keyword analysis. Finally, the clustering results are combined with visualization technology to analyze the keywords of scientific research achievements. The keywords of research achievements published from 2008 to 2017 in an important journal of scientific and technological innovation management are taken as the research object.
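A minimal sketch of the first two steps (order-based importance series, then DTW distance) might look as follows. The abstract does not state the weighting formula, so the reciprocal of the list position is an assumption made here for illustration, as are the names `importance_series` and `dtw` and the toy data. The DTW routine is the textbook dynamic-programming form with absolute difference as the local cost.

```python
from collections import defaultdict


def importance_series(papers, years):
    """papers: iterable of (year, keywords in author order).
    Returns {keyword: [importance per year]}."""
    column = {year: i for i, year in enumerate(years)}
    series = defaultdict(lambda: [0.0] * len(years))
    for year, keywords in papers:
        for position, keyword in enumerate(keywords, start=1):
            # Assumed weight: keywords listed earlier count more (1, 1/2, 1/3, ...)
            series[keyword][column[year]] += 1.0 / position
    return dict(series)


def dtw(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Toy records, invented for illustration only
papers = [
    (2008, ["innovation", "patent"]),
    (2009, ["patent", "innovation", "network"]),
]
series = importance_series(papers, years=[2008, 2009])
distance = dtw(series["innovation"], series["patent"])
```

Computing `dtw` for every pair of keywords would give the distance matrix that the clustering step starts from.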
The effectiveness of the proposed method is tested, and its application to the analysis of keywords of scientific research achievements is further elaborated. On the one hand, the proposed method adaptively partitions the keywords into multiple clusters while taking into account the influence of time on the partition. It places keywords with similar trends in the same cluster. Keywords in a cluster may interact with or promote one another, and close relationships among them can be found. More importantly, keywords belonging to the same cluster may jointly describe the same topic. The effect of the proposed method on keyword analysis can thus be tested and observed from the new perspective of change over time. On the other hand, the trends of the main keywords in the papers published by the target journal are analyzed. This reveals the journal's main research topics from 2008 to 2017, the trends of the main keywords and the subjects to which they belong, and the overall trend of these topic categories. This knowledge is helpful for identifying how much attention the published topics receive and how they evolve. The main contributions of this study are as follows: (1) Authors usually list more important keywords first, which means that the order of keywords reflects their importance. This study confirms the conjecture that the importance of a keyword is related to the position in which the author lists it. Based on the order of keywords given in the literature, keyword weights are designed and converted into time series data, and the correlation between keywords is studied from the perspective of time series.
(2) Affinity propagation is used to cluster keyword time series data adaptively, which avoids the subjective influence on the clustering results of specifying the number of clusters in advance. The center object (exemplar) of each cluster represents the theme reflected by all members of the cluster well, which provides a theoretical basis for topic extraction and evolutionary analysis. (3) Dynamic time warping is used to measure the similarity of keyword importance time series. The changing themes are studied through time series data from the perspectives of both numerical value and shape. Furthermore, the numerical characteristics and trends of different clusters can be analyzed, which provides a feature description method and visualization technique for the analysis of scientific research topics.
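The clustering step in contributions (2) and (3) can be sketched as below. This is a simplified pure-Python rendering of the standard affinity propagation message updates, not the authors' implementation. The abstract does not say how distances are turned into similarities, so negating the distance and placing the median similarity on the diagonal as the preference are assumptions, and the function names and the toy distance matrix (a stand-in for a DTW distance matrix) are hypothetical.

```python
from statistics import median


def similarity_from_distance(D):
    """Assumed conversion: similarity = -distance, median similarity as preference."""
    n = len(D)
    S = [[-D[i][j] for j in range(n)] for i in range(n)]
    preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')), damped
        for i in range(n):
            total = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(total[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) collects the positive responsibilities sent to candidate k, damped
        for k in range(n):
            positive = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = positive
                else:
                    new = min(0.0, R[k][k] + positive - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:  # fallback if no point qualifies: use the best-scoring one
        exemplars = [max(range(n), key=lambda k: score[k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Toy stand-in for a DTW distance matrix: two well-separated groups
values = [0, 1, 2, 10, 11, 12]
D = [[abs(x - y) for y in values] for x in values]
S = similarity_from_distance(D)
labels = affinity_propagation(S)
print(labels)
```

Each entry of `labels` is the index of a cluster's center object, so no cluster count has to be chosen in advance; the preference value controls how many exemplars emerge. The sketch runs a fixed number of iterations and does not check convergence. In practice a library routine such as scikit-learn's `AffinityPropagation` with `affinity="precomputed"` could be given a similarity matrix of this kind.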
Key words: keyword analysis, research achievement, time series, clustering analysis, affinity propagation
Li Hailin, Lin Chunpei. An analysis of keywords of research achievements based on time series clustering[J]. Science Research Management, 2022, 43(1): 176-183.
URL: https://www.kygl.net.cn/EN/
https://www.kygl.net.cn/EN/Y2022/V43/I1/176