Science Research Management ›› 2015, Vol. 36 ›› Issue (3): 111-117.

• Article •

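Patent Content Analysis Method Based on the LDA Topic Model

Wang Bo, Liu Shengbo, Ding Kun, Liu Zeyuan

  1. WISE Lab, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2013-10-12 Revised: 2014-06-19 Online: 2015-03-25 Published: 2015-03-20
  • About the authors: Wang Bo (b. 1983), female, Han, from Changchun, Jilin; doctoral candidate in Science of Science and Management of Science and Technology, Dalian University of Technology; research interests: patentometrics and industrial innovation.
    Liu Shengbo (b. 1983), male, Han, from Dashiqiao, Liaoning; postdoctoral researcher in Science of Science and Management of Science and Technology, Dalian University of Technology; research interest: knowledge metrics.
    Ding Kun (b. 1962), female, Han, from Haicheng, Liaoning; professor, School of Public Administration and Law, Dalian University of Technology; research interests: disciplinary knowledge management and innovation management.
    Liu Zeyuan (b. 1940), male, Han, from Enshi, Hubei; professor, School of Public Administration and Law, Dalian University of Technology; research interests: theory of the science of science and scientometrics.
  • Funding:

    National Natural Science Foundation of China project "SPIOD Patent Knowledge Evolution Analysis Based on Chinese Text Mining Technology" (61272370, 2013.1-2013.12); Specialized Research Fund for the Doctoral Program of Higher Education, "A Patent Knowledge Measurement System Based on the SIPO Database and Its Application" (doctoral supervisor category) (20110041110034, 2012-2014).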

Abstract: A topic model is an effective method for extracting latent topics from large-scale text collections. This paper introduces the Latent Dirichlet Allocation (LDA) topic model into patent content analysis to partition patents by topic, addressing the shortcomings of earlier patent topic classifications, which were too coarse, slow to reflect new developments, and lacking in scientific rigor. Building on the original model, the paper constructs an LDA institution-topic model that jointly models the subjects of patent knowledge (the institutions) and its objects (the patent content), so that the underlying relationships between patent topics and institutions can be analyzed. Finally, a case study of LTE technology in the telecommunications industry verifies that the model can effectively partition patents by topic and measure the competitive position of patent knowledge subjects under each topic.

Key words: topic model (LDA), LDA institution-topic model, patent content analysis, LTE
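
As a rough illustration of the kind of analysis the abstract describes, the sketch below fits plain LDA to a toy corpus by collapsed Gibbs sampling and then averages the per-patent topic shares for each institution. It is not the paper's LDA institution-topic model, which models institutions and topics jointly; the averaging step is a simplified stand-in. The function names (`fit_lda`, `institution_topic_shares`), the hyperparameters, the token lists, and the institution labels "Org A" and "Org B" are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Fit plain LDA by collapsed Gibbs sampling; return per-document topic shares."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                           # token counts per topic
    assign = []                                              # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic distributions; each row sums to 1.
    return [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def institution_topic_shares(theta, institutions):
    """Average the topic shares of each institution's documents (a simplification)."""
    grouped = defaultdict(list)
    for dist, inst in zip(theta, institutions):
        grouped[inst].append(dist)
    return {
        inst: [sum(col) / len(dists) for col in zip(*dists)]
        for inst, dists in grouped.items()
    }


# Hypothetical toy corpus: tokenized patent abstracts and their assignees.
docs = [
    "mimo antenna precoding antenna channel".split(),
    "antenna mimo channel feedback precoding".split(),
    "handover cell mobility handover measurement".split(),
    "cell handover measurement mobility signaling".split(),
    "precoding channel feedback mimo".split(),
    "mobility signaling cell handover".split(),
]
institutions = ["Org A", "Org A", "Org B", "Org B", "Org A", "Org B"]

theta = fit_lda(docs, num_topics=2)
shares = institution_topic_shares(theta, institutions)
for inst, dist in sorted(shares.items()):
    print(inst, [round(p, 2) for p in dist])
```

Comparing the averaged shares across institutions gives a crude view of which organization concentrates on which topic. A joint model such as the one the abstract describes would instead estimate the institution-topic relationship during inference rather than after it.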

CLC Number: