Patent content analysis method based on LDA topic model

Wang Bo, Liu Shengbo, Ding Kun, Liu Zeyuan

Science Research Management, 2015, Vol. 36, Issue 3: 111-117.

Abstract

Topic models are an effective way to extract the latent topics of large-scale text collections. This paper introduces the Latent Dirichlet Allocation (LDA) topic model into patent content analysis to partition patents by topic, addressing the weaknesses of earlier patent topic classification: categories that were too coarse, poor timeliness, and a lack of scientific grounding. Building on the original model, the paper then constructs an LDA institution-topic model that jointly models the holders of patent knowledge (institutions) and the knowledge itself (patent content), making it possible to analyze the internal relationships between patent topics and institutions. Finally, a case study of LTE technology in the telecommunications industry verifies that the models can be used effectively to partition patents by topic and to measure the competitive situation of the institutions holding patent knowledge under each topic.
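
The abstract does not spell out how either model is fitted, so the following is only a sketch under stated assumptions. It treats the LDA institution-topic model as analogous to the author-topic model of Rosen-Zvi et al., with each patent's institutions playing the role of authors, and fits it by collapsed Gibbs sampling: every word token is jointly assigned an institution (drawn from that patent's institutions) and a topic. The function name `institution_topic_gibbs`, its arguments, and the default hyperparameters are hypothetical illustration choices, not values from the paper. Giving each patent its own unique pseudo-institution reduces the model to plain LDA, so the same function also covers the first model.

```python
import random


def institution_topic_gibbs(docs, doc_insts, num_topics, vocab_size,
                            alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for an author-topic-style institution-topic model.

    Hypothetical sketch: institutions are treated like authors in the
    author-topic model (Rosen-Zvi et al.); not the paper's own procedure.
    docs      -- list of patents, each a list of word ids in range(vocab_size)
    doc_insts -- per patent, a non-empty list of distinct institution ids (0-based)
    Returns (theta, phi): theta[a][t] = P(topic t | institution a),
                          phi[t][w]   = P(word w | topic t).
    """
    rng = random.Random(seed)
    T, V = num_topics, vocab_size
    A = 1 + max(a for insts in doc_insts for a in insts)
    n_tw = [[0] * V for _ in range(T)]   # topic-word counts
    n_t = [0] * T                        # tokens assigned to each topic
    n_at = [[0] * T for _ in range(A)]   # institution-topic counts
    n_a = [0] * A                        # tokens assigned to each institution

    def update(a, t, w, delta):
        n_tw[t][w] += delta
        n_t[t] += delta
        n_at[a][t] += delta
        n_a[a] += delta

    # Random initial (institution, topic) pair for every token.
    assign = []
    for words, insts in zip(docs, doc_insts):
        row = []
        for w in words:
            a, t = rng.choice(insts), rng.randrange(T)
            update(a, t, w, +1)
            row.append((a, t))
        assign.append(row)

    for _ in range(iters):
        for d, (words, insts) in enumerate(zip(docs, doc_insts)):
            for i, w in enumerate(words):
                a, t = assign[d][i]
                update(a, t, w, -1)  # exclude the current token from the counts
                # Joint full conditional over this patent's (institution, topic) pairs.
                pairs, weights = [], []
                for a2 in insts:
                    for t2 in range(T):
                        word_term = (n_tw[t2][w] + beta) / (n_t[t2] + V * beta)
                        inst_term = (n_at[a2][t2] + alpha) / (n_a[a2] + T * alpha)
                        pairs.append((a2, t2))
                        weights.append(word_term * inst_term)
                a, t = rng.choices(pairs, weights=weights)[0]
                update(a, t, w, +1)
                assign[d][i] = (a, t)

    # Point estimates from the final sample's counts.
    theta = [[(n_at[a][t] + alpha) / (n_a[a] + T * alpha) for t in range(T)]
             for a in range(A)]
    phi = [[(n_tw[t][w] + beta) / (n_t[t] + V * beta) for w in range(V)]
           for t in range(T)]
    return theta, phi


if __name__ == "__main__":
    # Toy corpus: institution 0 files patents using words 0-1, institution 1 words 2-3.
    docs = [[0, 1, 0, 1], [1, 0, 1], [2, 3, 2], [3, 2, 3, 3]]
    theta, phi = institution_topic_gibbs(docs, [[0], [0], [1], [1]],
                                         num_topics=2, vocab_size=4, iters=100)
    for a, row in enumerate(theta):
        print("institution", a, [round(p, 3) for p in row])
```

Here `theta[a]` is institution `a`'s topic mixture; comparing these mixtures across institutions within one topic is one plausible input to a per-topic competition measure, though the paper's own measure is not described in this abstract. Passing `[[d] for d in range(len(docs))]` as `doc_insts` instead yields per-patent LDA topic proportions.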

Key words

topic model (LDA) / LDA institution-topic model / patent content analysis / LTE

Cite this article

Wang Bo, Liu Shengbo, Ding Kun, Liu Zeyuan. Patent content analysis method based on LDA topic model[J]. Science Research Management. 2015, 36(3): 111-117
CLC number: G350

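Funding

National Natural Science Foundation of China project "Evolution analysis of SPIOD patent knowledge based on Chinese text mining technology" (61272370, 2013.1-2013.12); Specialized Research Fund for the Doctoral Program of Higher Education project "Patent knowledge measurement system based on the SIPO database and its application" (doctoral supervisor category) (20110041110034, 2012-2014).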

