Date last run: 27Sep2019
Introduction
While browsing the internet I found 21 Recipes for Mining Twitter Data with rtweet by Bob Rudis and Paul Campbell. The corresponding GitHub repository points to a blog entry with some background material. While trying to reproduce some of the recipes I wondered which urls were generated and how the authorisation structure was used to request the data. I wrote about the generation of urls in url generation in the rtweet package, where the setup of the authorisation structure was also briefly discussed.
In this entry I will describe how the authorisation structure is used in combination with a generated url.
Set up the library and authorisation structure
I described how to set up the authorisation structure in that earlier document. If I don’t specify a token in a function call, the token created during that setup is used.
Program flow in rtweet package
In the second recipe of 21 Recipes for Mining Twitter Data with rtweet the authors mention the rtweet::get_trends function. I studied this function to see how the data is retrieved with API calls.
Following the description in this recipe and looking at the code, I see that it calls the internal function rtweet:::get_trends_, which makes two requests to the Twitter API.
The relevant parts of this function (concentrating on the second API call) are:
rtweet:::get_trends_ <-
  function(woeid = 1, lat = NULL, lng = NULL, exclude = FALSE,
           token = NULL, parse = TRUE) {
    ...
    query <- "trends/place"
    token <- check_token(token)
    ...
    params <- list(id = woeid, exclude = exclude)
    url <- make_url(query = query, param = params)
    gt <- TWIT(get = TRUE, url, token)
    ...
  }
Because the token argument defaults to NULL, rtweet:::check_token retrieves the token created at installation time.
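The convention used here, where a NULL token argument means “use the token stored during setup”, is a common R pattern. A minimal sketch of that pattern follows. It is not rtweet’s actual check_token(); the function resolve_token() and the option name demo.token are hypothetical.

```r
# Hypothetical sketch of the "NULL means use the stored default" pattern.
# Not rtweet's implementation; `demo.token` is an illustrative option name.
resolve_token <- function(token = NULL, default = getOption("demo.token")) {
  # An explicitly supplied token always wins
  if (!is.null(token)) return(token)
  # Otherwise fall back to the token stored during setup
  if (is.null(default)) stop("no token supplied and none stored")
  default
}

options(demo.token = "token-from-setup")
resolve_token()             # falls back to the stored token
resolve_token("my-token")   # an explicit token is used as-is
```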
The rtweet:::make_url function uses the query variable (here “trends/place”) to build the url used in the GET function. Note that it returns a list of class “url” (the parsed-url structure used by httr) rather than a string:
rtweet:::make_url <-
  function(restapi = TRUE, query, param = NULL) {
    # REST calls and streaming calls use different hosts
    if (restapi) {
      hostname <- "api.twitter.com"
    } else {
      hostname <- "stream.twitter.com"
    }
    # an httr-style "url" object, not yet a plain string
    structure(list(scheme = "https", hostname = hostname, port = NULL,
                   path = paste0("1.1/", query, ".json"), query = param,
                   params = NULL, fragment = NULL, username = NULL, password = NULL),
              class = "url")
  }
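The “url” object is only turned into a string when httr builds the request (httr::build_url() does this). The base-R sketch below mimics that step for the fields make_url fills in; url_to_string() is a hypothetical helper, not part of rtweet or httr, and its escaping is simplified. It also shows why a NULL query element such as exclude = NULL does not appear in the final url.

```r
# Hypothetical helper: a simplified base-R stand-in for httr::build_url(),
# handling only the fields that rtweet:::make_url() fills in.
url_to_string <- function(u) {
  stopifnot(inherits(u, "url"))
  # Drop NULL query elements (httr does the same), so exclude = NULL vanishes
  q <- Filter(Negate(is.null), u$query)
  query <- ""
  if (length(q) > 0) {
    values <- vapply(q, function(v) utils::URLencode(as.character(v), reserved = TRUE),
                     character(1))
    query <- paste0("?", paste0(names(q), "=", values, collapse = "&"))
  }
  paste0(u$scheme, "://", u$hostname, "/", u$path, query)
}

# The same structure make_url(query = "trends/place", param = ...) returns
u <- structure(list(scheme = "https", hostname = "api.twitter.com", port = NULL,
                    path = paste0("1.1/", "trends/place", ".json"),
                    query = list(id = "727232", exclude = NULL),
                    params = NULL, fragment = NULL, username = NULL, password = NULL),
               class = "url")
url_to_string(u)
#> [1] "https://api.twitter.com/1.1/trends/place.json?id=727232"
```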
In this case the actual API call is made by rtweet:::TWIT, which passes its arguments on to GET or POST:
rtweet:::TWIT <-
  function(get = TRUE, url, ...) {
    # the token (and anything else) is passed on through `...`
    if (get) {
      GET(url, ...)
    } else {
      POST(url, ...)
    }
  }
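TWIT forwards the token through `...` without naming it. Because the second formal argument of httr::GET() and httr::POST() is config, a token passed positionally ends up as the request configuration. This can be checked without any network access:

```r
library(httr)

# The second positional argument of GET() and POST() is `config`,
# so TWIT(get = TRUE, url, token) amounts to GET(url, config = token)
names(formals(GET))
names(formals(POST))[1:2]

# An explicit equivalent (needs a valid token, so not run here):
# GET(url, config(token = token))
```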
With this information we can now write our own function to obtain the Amsterdam trends:
get_Adam_trends <- function(parse = TRUE, resonly = FALSE) {
  query <- "trends/place"
  token <- rtweet:::check_token(NULL)           # token created during setup
  param <- list(id = '727232', exclude = NULL)  # 727232 = WOEID of Amsterdam
  url <- rtweet:::make_url(query = query, param = param)
  trd <- rtweet:::TWIT(get = TRUE, url, token)  # the actual API call
  if (resonly)
    return(trd)                                 # raw httr response
  trd <- rtweet:::from_js(trd)
  if (parse)
    trd <- rtweet:::parse_trends(trd)           # tidy data frame
  trd
}
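The WOEID is the only Amsterdam-specific part of get_Adam_trends. A hypothetical generalisation, get_place_trends(), takes it as an argument. Like the original, it relies on the unexported rtweet 0.6.9 internals used above (check_token, make_url, TWIT, from_js, parse_trends), which may differ in other rtweet versions, and it needs a valid token to return data.

```r
# Hypothetical generalisation of get_Adam_trends(): trends for any WOEID.
# Relies on unexported rtweet 0.6.9 internals, as the original does.
get_place_trends <- function(woeid, parse = TRUE, resonly = FALSE) {
  token <- rtweet:::check_token(NULL)
  param <- list(id = as.character(woeid), exclude = NULL)
  url <- rtweet:::make_url(query = "trends/place", param = param)
  trd <- rtweet:::TWIT(get = TRUE, url, token)
  if (resonly)
    return(trd)                        # raw httr response
  trd <- rtweet:::from_js(trd)         # rtweet internal: process the JSON response
  if (parse)
    trd <- rtweet:::parse_trends(trd)
  trd
}

# Usage (needs a token): get_place_trends(727232) should match get_Adam_trends()
```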
Adam <- get_Adam_trends()
str(head(Adam,1))
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 9 variables:
#> $ trend : chr "#klimaatstaking"
#> $ url : chr "http://twitter.com/search?q=%23klimaatstaking"
#> $ promoted_content: logi NA
#> $ query : chr "%23klimaatstaking"
#> $ tweet_volume : int NA
#> $ place : chr "Amsterdam"
#> $ woeid : int 727232
#> $ as_of : POSIXct, format: "2019-09-27 12:12:56"
#> $ created_at : POSIXct, format: "2019-09-27 12:05:56"
With this function we can also see which url string is created and used to access the Twitter data. We can use this url string in the httr::GET function, but only in combination with the token:
(url <- get_Adam_trends(resonly=TRUE)$url)
#> [1] "https://api.twitter.com/1.1/trends/place.json?id=727232"
res = httr::GET(url)
print(httr::content(res, as = "text"))
#> [1] "{\"errors\":[{\"code\":215,\"message\":\"Bad Authentication data.\"}]}"
token <- rtweet:::check_token(NULL)
(res = httr::GET(url,token))
#> Response [https://api.twitter.com/1.1/trends/place.json?id=727232]
#> Date: 2019-09-27 12:12
#> Status: 200
#> Content-Type: application/json;charset=utf-8
#> Size: 7.13 kB
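The request without a token does not raise an R error; the failure is reported in the JSON body. A minimal sketch of detecting it, using jsonlite (listed in the session info below) on the body text shown above:

```r
library(jsonlite)

# Body text returned by the request without a token (copied from above)
body <- "{\"errors\":[{\"code\":215,\"message\":\"Bad Authentication data.\"}]}"

parsed <- fromJSON(body)
# fromJSON simplifies the array of error objects to a data frame
if (!is.null(parsed$errors)) {
  message("Twitter API error ", parsed$errors$code, ": ", parsed$errors$message)
}
```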
The API reference pages give details about the various API endpoints. For these particular endpoints see https://developer.twitter.com/en/docs/trends/locations-with-trending-topics/api-reference/get-trends-available and https://developer.twitter.com/en/docs/trends/trends-for-location/api-reference/get-trends-place.
A list with all endpoints is here.
SessionInfo
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.2 knitr_1.25 magrittr_1.5 HOQCutil_0.1.11
#> [5] R6_2.4.0 rlang_0.4.0 stringr_1.4.0 httr_1.4.1
#> [9] tools_3.6.0 rtweet_0.6.9 xfun_0.8 htmltools_0.3.6
#> [13] askpass_1.1 openssl_1.4.1 digest_0.6.20 tibble_2.1.3
#> [17] crayon_1.3.4 purrr_0.3.2 curl_4.0 glue_1.3.1
#> [21] evaluate_0.14 rmarkdown_1.15 stringi_1.4.3 compiler_3.6.0
#> [25] pillar_1.4.2 jsonlite_1.6 pkgconfig_2.0.2