Posted on June 3, 2024 (author: )
Jilin University Graduation Thesis
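Microblog Content Extraction
Abstract
As the number of people using microblogs and other social software has grown in
recent years, privacy on microblogs has become a matter of increasing concern.
However, because microblog posts are not bound by any fixed format, research on
microblogs is disturbed by meaningless "noise". This thesis addresses the problem
of filtering that noise: it implements a small program that filters posts
downloaded to a local machine from Sina Weibo and similar microblog services,
removes the noise, and extracts the clean page content. The main work covers the
following aspects:
(1) Implementation of string search and string splitting functions.
(2) Implementation of a function that searches across multiple files.
(3) Filtering of fixed-string "noise", that is, emoticons.
(4) Filtering of "noise" that follows a regular grammar.
Keywords: Chinese microblog, microblog, filtering, noise, regular expressions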
Microblog Content Extraction
Author: Liudi
Tutor: Yangkexin
Abstract
As the number of people using microblogs and other social software has
grown in recent years, privacy on microblogs has become a growing concern.
However, because microblog posts follow no fixed format, research on
microblogs is disturbed by meaningless "noise". The purpose of this thesis
is to address this noise filtering problem by implementing a small program
that filters posts downloaded to a local machine from Sina Weibo and
similar microblog services, removing the noise and extracting the clean
page content. The main work covers the following aspects:
(1) Implementation of string search and string splitting functions.
(2) Implementation of a function that searches across multiple files.
(3) Filtering of fixed-string "noise", that is, emoticons.
(4) Filtering of "noise" that follows a regular grammar.
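
The two filtering steps named in the list above, removal of fixed-string
emoticon noise and removal of noise that follows a regular pattern, might
be sketched roughly as follows. This is only an illustration in Python, not
the thesis's own implementation: the emoticon codes, the three patterns
(links, @mentions, #topic# tags), and all function names are assumptions
introduced here.

```python
import re

# Hypothetical fixed-string noise: bracketed emoticon codes.
# The thesis does not list its emoticons; these are placeholders.
FIXED_NOISE = ["[doge]", "[哈哈]", "[泪]"]

# Hypothetical pattern-based noise, written as regular expressions.
PATTERN_NOISE = [
    re.compile(r"https?://\S+"),  # links
    re.compile(r"@[\w-]+"),       # @mentions
    re.compile(r"#[^#\s]+#"),     # #topic# tags
]


def remove_fixed_noise(text, noise=FIXED_NOISE):
    """Delete every occurrence of each fixed noise string."""
    for token in noise:
        text = text.replace(token, "")
    return text


def remove_pattern_noise(text, patterns=PATTERN_NOISE):
    """Delete every substring matched by one of the noise patterns."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text


def clean_post(text):
    """Apply both filters, then collapse leftover runs of whitespace."""
    text = remove_fixed_noise(text)
    text = remove_pattern_noise(text)
    return " ".join(text.split())
```

Calling `clean_post` on the text of one downloaded post returns the text
with both kinds of noise removed. Note that in Python 3 `\w` also matches
Chinese characters, so the @mention pattern assumes that a mention is
followed by whitespace or punctuation.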

