First, load recharts:
library(recharts)
The word cloud has only one type: wordCloud.
The keys are:
- x represents the words
- y represents the frequency of the words
- series is not linked with the legend, but is linked with colors

echartr(data, x, y, <t>, <type>)
Arg | Requirement |
---|---|
data | source data in the form of data.frame |
x | character independent variable. Only the first one is accepted if multiple variables are provided. |
y | numeric dependent variable. Only the first one is accepted. |
series | data series variable which will be coerced to factors. Only the first one is accepted if multiple variables are provided. |
t | timeline variable which will be coerced to factors. Only the first one is accepted if multiple variables are provided. |
type | ‘wordCloud’ |
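As a quick illustration of the argument requirements in the table, the sketch below builds a small data.frame with one character column and one numeric column and passes it to echartr. The data (`words` and its values) are made up for this example, and the chart is only built when recharts happens to be installed.

```r
# Toy data, invented for illustration: a character x and a numeric y.
words <- data.frame(
  Keyword = c("alpha", "beta", "gamma", "delta"),
  Freq    = c(40, 25, 15, 5),
  stringsAsFactors = FALSE
)

# Check the column types that the table asks for.
stopifnot(is.character(words$Keyword), is.numeric(words$Freq))

# Build the chart only if recharts is available in this session.
if (requireNamespace("recharts", quietly = TRUE)) {
  library(recharts)
  chart <- echartr(words, Keyword, Freq, type = "wordCloud")
  # Type `chart` at the console to display it.
}
```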
Fetch the Baidu buzz hot word web page and parse it into a data.frame with the columns Keyword, Freq and Trend.
For this purpose, we wrote a function, getBaiduHot, that parses the Baidu Hot Word Trend web page.
getBaiduHot <- function(url, top=30, HTMLencoding=NULL){
    # Read the whole page into a single string
    baiduhot <- paste0(readLines(url), collapse="")
    # Take the charset declared in the page when HTMLencoding is not given
    charset <- gsub('^.+charset=([[:alnum:]-]+?)[^[:alnum:]-].+$', "\\1",
                    baiduhot)
    if (is.null(HTMLencoding)) if (!is.null(charset)) HTMLencoding <- charset
    baiduhot <- stringr::str_conv(baiduhot, HTMLencoding)
    # Rewrite each entry as "keyword\tfrequency\ttrend\t"
    hotword <- gsub(".+?<a class=\"list-title\"[^>]+?>([^<>]+?)</a>.+?<span class=\"icon-(rise|fair|fall)\">(\\d+?)</span>.+?","\\1\t\\3\t\\2\t", baiduhot)
    hotword <- enc2native(gsub("^(.+?)\t{4,}.+$","\\1", hotword))
    # Reshape the tab-separated fields into a three-column data.frame
    hotword <- t(matrix(unlist(strsplit(hotword,"\t")), nrow=3))
    hotword <- as.data.frame(hotword, stringsAsFactors=FALSE)
    names(hotword) <- c("Keyword", "Freq", "Trend")
    hotword$Freq <- as.numeric(hotword$Freq)
    # Sort by frequency and keep at most `top` rows
    hotword <- hotword[order(hotword$Freq, decreasing=TRUE),]
    return(head(hotword, top))
}
hotword <- getBaiduHot("http://top.baidu.com/buzz?b=1", HTMLencoding = 'GBK')
knitr::kable(hotword)
| | Keyword | Freq | Trend |
---|---|---|---|
11 | 小姑娘你火了 | 116955 | rise |
10 | 曝美女兵裸照丑闻 | 115900 | fair |
12 | 试探男友谎称绑架 | 106834 | fall |
9 | 富二代玩枪建工厂 | 76881 | rise |
13 | 男子谋生杀猫卖钱 | 42903 | fall |
16 | 儿生日父亲送毒品 | 39328 | fall |
14 | 男子撞脸达尔文 | 38389 | rise |
17 | 老太眼内8条活虫 | 33053 | rise |
5 | 铁路运行图将调整 | 27491 | rise |
1 | 清洁工擦窗困楼外 | 25871 | rise |
30 | 两架小型飞机相撞 | 21520 | fall |
46 | 耐克气垫门曝光 | 21170 | rise |
15 | 敖厂长被威胁事件 | 20979 | rise |
6 | 蒙冤16年回老家 | 18929 | rise |
7 | 离婚冷静期通知书 | 17932 | rise |
24 | 小学课文被指杜撰 | 12642 | fall |
2 | 贾静雯三胎再产女 | 12292 | rise |
27 | 偷上万元发红包 | 11621 | fall |
19 | 香港旺角暴乱罪成 | 11488 | fall |
45 | 安以轩宣布结婚 | 10655 | fall |
23 | 陈妍希短裙秀美腿 | 10652 | fall |
3 | 秘鲁洪灾泥石流 | 10441 | rise |
22 | 10亿建豪华校区 | 10137 | fall |
41 | 沈梦辰揭澡戏内幕 | 9333 | rise |
20 | 李维嘉终于笑了 | 9205 | fall |
21 | 洋妞街头脱衣暴走 | 9023 | rise |
18 | 男婴出生就18岁 | 8981 | rise |
25 | 火锅底料用560次 | 8295 | rise |
31 | 李维嘉被经纪人骗 | 8161 | rise |
26 | 捡到钱包要求陪睡 | 6526 | rise |
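The page that getBaiduHot parses may change or become unavailable, so it can help to see what the regular expression extracts from a fixed input. The following is a minimal offline sketch using only base R. The HTML fragment is hypothetical: it merely imitates the structure that the pattern in getBaiduHot expects and is not real Baidu output. It also uses regmatches with gregexpr instead of the gsub-and-split approach above.

```r
# Hypothetical HTML fragment imitating the structure the pattern expects.
html <- paste0(
  '<tr><td><a class="list-title" href="#a">alpha</a></td>',
  '<td><span class="icon-rise">300</span></td></tr>',
  '<tr><td><a class="list-title" href="#b">beta</a></td>',
  '<td><span class="icon-fall">120</span></td></tr>',
  '<tr><td><a class="list-title" href="#c">gamma</a></td>',
  '<td><span class="icon-fair">450</span></td></tr>'
)

# Same capture groups as in getBaiduHot: keyword, trend, frequency.
pattern <- paste0(
  '<a class="list-title"[^>]+?>([^<>]+?)</a>.+?',
  '<span class="icon-(rise|fair|fall)">(\\d+?)</span>'
)

# Pull out every full match, then split each match into its capture groups.
hits <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]
demo <- data.frame(
  Keyword = sub(pattern, "\\1", hits, perl = TRUE),
  Freq    = as.numeric(sub(pattern, "\\3", hits, perl = TRUE)),
  Trend   = sub(pattern, "\\2", hits, perl = TRUE),
  stringsAsFactors = FALSE
)

# Sort by frequency, highest first, as getBaiduHot does.
demo <- demo[order(demo$Freq, decreasing = TRUE), ]
```

The resulting `demo` has the same three columns as the table above, which makes it a convenient stand-in for trying the charts without network access.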
Provide only x and y.
echartr(hotword, Keyword, Freq, type='wordCloud') %>%
setTitle('Baidu Hot Word Top30 (realtime)', as.character(Sys.time()))
We want to group the hot words. Let’s assign ‘Trend’ as the series variable. Each trend level (‘rise’, ‘fair’, ‘fall’) is then colored differently.
echartr(hotword, Keyword, Freq, Trend, type='wordCloud') %>%
setTitle('Baidu Hot Word Top30 (realtime)', as.character(Sys.time()))
Let’s compare the realtime, today, and 7-day hot words.
First, fetch the other two web pages and rbind the datasets.
hotword$t <- 'Realtime'
hotword1 <- getBaiduHot("http://top.baidu.com/buzz?b=341&fr=topbuzz_b1",
HTMLencoding = 'GBK')
hotword1$t <- 'Today'
hotword2 <- getBaiduHot("http://top.baidu.com/buzz?b=42&c=513&fr=topbuzz_b341",
HTMLencoding = 'GBK')
hotword2$t <- '7-days'
hotword <- do.call('rbind', list(hotword, hotword1, hotword2))
hotword$t <- factor(hotword$t, levels=c('Realtime', 'Today', '7-days'))
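The explicit levels in the last line matter because factor sorts its labels by default. The sketch below shows this with toy stand-ins for the three scraped data frames; all names and values are made up for illustration. That the timeline steps follow the level order is an assumption based on the table above, which says t is coerced to a factor.

```r
# Toy stand-ins for the three scraped data frames (values are made up).
realtime <- data.frame(Keyword = c("alpha", "beta"), Freq = c(30, 10),
                       stringsAsFactors = FALSE)
today    <- data.frame(Keyword = c("alpha", "gamma"), Freq = c(50, 20),
                       stringsAsFactors = FALSE)
week     <- data.frame(Keyword = c("beta", "gamma"), Freq = c(90, 60),
                       stringsAsFactors = FALSE)

# Label each data frame with its time window.
realtime$t <- "Realtime"
today$t    <- "Today"
week$t     <- "7-days"

# Stack the three data frames row-wise.
stacked <- do.call("rbind", list(realtime, today, week))

# Without explicit levels, factor() sorts the unique labels.
default_order <- levels(factor(stacked$t))

# Explicit levels pin the intended Realtime -> Today -> 7-days order.
stacked$t <- factor(stacked$t, levels = c("Realtime", "Today", "7-days"))
```

Comparing `default_order` with `levels(stacked$t)` shows the difference between the sorted and the intended order.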
Then build the chart.
g <- echartr(hotword, Keyword, Freq, t=t, type='wordCloud') %>%
setTitle('Baidu Hot Word Top30')
g
You can then configure the widgets, add markLines and/or markPoints, and further polish the chart.
setTheme
g %>% setTheme('dark', palette='manyeyes')
Refer to the related functions to experiment on your own.