Journal of Jilin University (Information Science Edition)


Algorithm for Social Network Recommendation Service Based on Hadoop

LI Ling, REN Qing, FU Yuan, CHEN He, MEI Sheng-min

  1. College of Communication Engineering, Jilin University, Changchun 130012, China
  • Online: 2013-07-20 Published: 2013-08-23
  • About the author: LI Ling (born 1965), female, from Qiqihar, Heilongjiang Province; associate professor and master's supervisor at Jilin University. Her research covers distributed computing, mobile computing, and the analysis and design of computer network protocols. (Tel) 86-13596491550 (E-mail) liling2002@jlu.edu.cn.
  • Supported by:

    Natural Science Foundation of Jilin Province (201215016)

Abstract:

To process the massive data generated by social networks efficiently while keeping the social network scalable, we adapt the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to the MapReduce model and implement the resulting distributed TF-IDF algorithm on the Hadoop cloud platform. The algorithm extracts keywords from a user's weibo posts; the keywords reveal the user's interests, and corresponding services are then recommended to the user. To verify the validity and scalability of the distributed TF-IDF algorithm, its results are compared with those of the TextRank algorithm. The experimental results show that the keywords extracted by the distributed TF-IDF algorithm describe the user's characteristics more accurately. A comparison of response times shows that the distributed TF-IDF algorithm also scales well.

Key words: Hadoop cloud platform, distributed TF-IDF algorithm, MapReduce model, TextRank algorithm
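
The abstract does not spell out how TF-IDF is split into MapReduce jobs, so the following is only a minimal sketch of one common decomposition, not the authors' implementation. It assumes three chained jobs (per-user word counts, term frequency, then inverse document frequency), treats all posts of one user as a single document, and tokenizes on whitespace; real weibo text would need a Chinese word segmenter. The helper `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job, and every function name here is illustrative.

```python
import math
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: count occurrences of each (word, user) pair.
def map_count(record):
    user, text = record
    for word in text.split():  # assumption: whitespace tokenization
        yield (word, user), 1


def reduce_count(key, values):
    yield key, sum(values)


# Job 2: regroup by user and turn counts into term frequencies.
def map_by_user(record):
    (word, user), count = record
    yield user, (word, count)


def reduce_tf(user, values):
    total = sum(count for _, count in values)
    for word, count in values:
        yield (word, user), count / total


# Job 3: regroup by word, derive document frequency, emit TF-IDF.
def make_tfidf_job(num_users):
    def map_by_word(record):
        (word, user), tf = record
        yield word, (user, tf)

    def reduce_tfidf(word, values):
        idf = math.log(num_users / len(values))
        for user, tf in values:
            yield (user, word), tf * idf

    return map_by_word, reduce_tfidf


def tfidf(posts):
    """posts: list of (user, text) pairs. Returns {(user, word): score}."""
    counts = run_mapreduce(posts, map_count, reduce_count)
    tfs = run_mapreduce(counts, map_by_user, reduce_tf)
    mapper, reducer = make_tfidf_job(len({user for user, _ in posts}))
    return dict(run_mapreduce(tfs, mapper, reducer))


def top_keywords(scores, user, k=2):
    """Return the k highest-scoring words for one user."""
    ranked = sorted(((s, w) for (u, w), s in scores.items() if u == user),
                    reverse=True)
    return [word for _, word in ranked[:k]]


posts = [
    ("alice", "hadoop cluster hadoop job"),
    ("bob", "football match football goal"),
    ("carol", "hadoop football news today"),
]
scores = tfidf(posts)
print(top_keywords(scores, "alice"))
```

In this toy data, a word shared by several users (such as `hadoop`) is down-weighted by the IDF term, which is the property the abstract relies on when it uses keywords to characterize an individual user's interests. On an actual cluster each job would read and write HDFS files rather than Python lists.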

CLC Number:

  • TN915