Date last run: 08Dec2019
Introduction
Recently I tried to reproduce an example in a vignette of my package odataR.
See here. To my frustration this did not work any more. After some searching I found out that the layout of the table had changed :(
But this gave me reason to study the structure of OData version 3 once again.
And off course to adapt the vignette on the new layout.
Catalog
The available tables can be found with the following code. Some explanation:
- package
dplyr
is attached because it will be used later on HOQCutil::silent_library
avoids package startup messages. PackageHOQCutil
can be found on Github- without arguments the
odataR_get_cat
function returns information about all available tables - because we know the identifier of the table we can use the query to retrieve only the information of that specific table. Annoying that the identifier of the table is
03759ned
: most tables have the indicationNED
but sometimes it isned
.
HOQCutil::silent_library(dplyr)
HOQCutil::silent_library(odataR)
cbscat = odataR::odataR_get_cat()
dim(cbscat)
#> [1] 4704 26
cbscat = odataR::odataR_get_cat(
query="?$filter=startswith(toupper(Identifier),'03759NED')"
)
tibble::glimpse(cbscat,width=getOption('width')+10)
#> Observations: 1
#> Variables: 26
#> $ Updated <chr> "2019-05-17T02:00:00"
#> $ ID <int> 407
#> $ Identifier <chr> "03759ned"
#> $ Title <chr> "Bevolking op 1 januari en gemiddeld; geslacht, leeftijd ...
#> $ ShortTitle <chr> "Bevolking; geslacht, leeftijd, regio"
#> $ ShortDescription <chr> "\nDeze tabel bevat cijfers over de bevolking op 1 januar...
#> $ Summary <chr> "Bevolking op 1 januari en gemiddelde jaarbevolking naar ...
#> $ Modified <chr> "2019-05-17T02:00:00"
#> $ MetaDataModified <chr> "2019-05-17T02:00:00"
#> $ ReasonDelivery <chr> "Correctie"
#> $ ExplanatoryText <chr> "De cijfers over 2018 van de gemiddelde bevolking in de g...
#> $ OutputStatus <chr> "Regulier"
#> $ Source <chr> "CBS."
#> $ Language <chr> "nl"
#> $ Catalog <chr> "CBS"
#> $ Frequency <chr> "Perjaar"
#> $ Period <chr> "1988 - 2018"
#> $ SummaryAndLinks <chr> "Bevolking op 1 januari en gemiddelde jaarbevolking naar ...
#> $ ApiUrl <chr> "https://opendata.cbs.nl/ODataApi/OData/03759ned"
#> $ FeedUrl <chr> "https://opendata.cbs.nl/ODataFeed/OData/03759ned"
#> $ DefaultPresentation <chr> "ts=1557230974927&graphtype=Table&r=RegioS,Perioden&k=Top...
#> $ DefaultSelection <chr> "$filter=((Geslacht eq 'T001038')) and ((Leeftijd eq '100...
#> $ GraphTypes <chr> "Table,Map"
#> $ RecordCount <int> 45982080
#> $ ColumnCount <int> 7
#> $ SearchPriority <chr> "2"
The general information about the table can be found in these catalog fields. The specifics about the coding of columns can be found in tables described in the following section.
Columns in the data tables
With the following code we retrieve the (first 7) attributes of the various dimensions of the table and the topics:
odataR::odataR_get_meta(table_id='03759ned',metatype='DataProperties') %>%
select(1:7) %>%
knitr::kable()
odata.type | ID | Position | ParentID | Type | Key | Title |
---|---|---|---|---|---|---|
Cbs.OData.Dimension | 0 | 0 | NA | Dimension | Geslacht | Geslacht |
Cbs.OData.Dimension | 1 | 1 | NA | Dimension | Leeftijd | Leeftijd |
Cbs.OData.Dimension | 2 | 2 | NA | Dimension | BurgerlijkeStaat | Burgerlijke staat |
Cbs.OData.GeoDimension | 3 | 3 | NA | GeoDimension | RegioS | Regio’s |
Cbs.OData.TimeDimension | 4 | 4 | NA | TimeDimension | Perioden | Perioden |
Cbs.OData.Topic | 5 | 5 | NA | Topic | BevolkingOp1Januari_1 | Bevolking op 1 januari |
Cbs.OData.Topic | 6 | 6 | NA | Topic | GemiddeldeBevolking_2 | Gemiddelde bevolking |
The various subtables that contain the information for these dimensions and topics have (always) the following structure:
odataR::odataR_get_meta(table_id='03759ned',metatype=NULL) %>%
knitr::kable()
The information in TableInfos
looks similar (but not identical?) to the information about the table in the catalog:
odataR::odataR_get_meta(table_id='03759ned',metatype='TableInfos') %>%
tibble::glimpse()
#> Observations: 1
#> Variables: 21
#> $ ID <int> 0
#> $ Title <chr> "Bevolking op 1 januari en gemiddeld; geslacht,...
#> $ ShortTitle <chr> "Bevolking; geslacht, leeftijd, regio"
#> $ Identifier <chr> "03759ned"
#> $ Summary <chr> "Bevolking op 1 januari en gemiddelde jaarbevol...
#> $ Modified <chr> "2019-05-17T02:00:00"
#> $ ReasonDelivery <chr> "Correctie"
#> $ ExplanatoryText <chr> "De cijfers over 2018 van de gemiddelde bevolki...
#> $ Language <chr> "nl"
#> $ Catalog <chr> "CBS"
#> $ Frequency <chr> "Perjaar"
#> $ Period <chr> "1988 - 2018"
#> $ ShortDescription <chr> "\nDeze tabel bevat cijfers over de bevolking o...
#> $ Description <chr> "INHOUDSOPGAVE\r\n\r\n1. Toelichting\r\n2. Defi...
#> $ DefaultPresentation <chr> "ts=1557230974927&graphtype=Table&r=RegioS,Peri...
#> $ DefaultSelection <chr> "$filter=((Geslacht eq 'T001038')) and ((Leefti...
#> $ GraphTypes <chr> "Table,Map"
#> $ OutputStatus <chr> "Regulier"
#> $ Source <chr> "CBS."
#> $ MetaDataModified <chr> "2019-05-17T02:00:00"
#> $ SearchPriority <chr> "2"
The information in UntypedDataset
and TypedDataSet
(the difference is described in the pdf
Handleiding CBS Open Data Services on pages 22 and 23) contains the actual data in coded form. We display the number of ‘records’ in TypedDataSet
and the contents of the ‘records’ 101 up to 103 (numbering starts with 0) :
odataR::odataR_get_table(table_id='03759ned',query='$count')
#> [1] 45982080
odataR::odataR_get_table(table_id='03759ned',decode=F,
query='?$skip=100&$top=3') %>%
knitr::kable()
ID | Geslacht | Leeftijd | BurgerlijkeStaat | RegioS | Perioden | BevolkingOp1Januari_1 | GemiddeldeBevolking_2 |
---|---|---|---|---|---|---|---|
100 | T001038 | 10000 | T001019 | LD03 | 1992JJ00 | 7089671 | NA |
101 | T001038 | 10000 | T001019 | LD03 | 1993JJ00 | 7143917 | NA |
102 | T001038 | 10000 | T001019 | LD03 | 1994JJ00 | 7190422 | NA |
For each of the coded dimensions in UntypedDataset
and TypedDataSet
there exist a code table
odataR::odataR_get_meta(table_id='03759ned',metatype='Geslacht') %>%
knitr::kable()
Key | Title | Description | CategoryGroupID |
---|---|---|---|
T001038 | Totaal mannen en vrouwen | NA | |
3000 | Mannen | NA | |
4000 | Vrouwen | NA |
odataR::odataR_get_meta(table_id='03759ned',metatype='Leeftijd',query='?$top=3') %>%
knitr::kable()
Key | Title | Description | CategoryGroupID |
---|---|---|---|
10000 | Totaal | 1 | |
10010 | 0 jaar | 2 | |
10100 | 1 jaar | 2 |
odataR::odataR_get_meta(table_id='03759ned',metatype='CategoryGroups',query='?$top=3') %>%
knitr::kable()
ID | DimensionKey | Title | Description | ParentID |
---|---|---|---|---|
0 | Leeftijd | Afzonderlijke leeftijden per 10 jaar | NA | |
1 | Leeftijd | Totaal | NA | 0 |
2 | Leeftijd | 0 tot 10 jaar | 0 |
odataR::odataR_get_meta(table_id='03759ned',metatype='BurgerlijkeStaat') %>%
dplyr::mutate(Description=gsub('\r\n',' ',Description)) %>%
dplyr::mutate(Description=substr(Description,1,40)) %>%
knitr::kable()
Key | Title | Description | CategoryGroupID |
---|---|---|---|
T001019 | Totaal burgerlijke staat | Burgerlijke staat: Formele positie van | NA |
1010 | Ongehuwd | Vanaf 2010: burgerlijke staat die aangee | NA |
1020 | Gehuwd | Vanaf 1998: wettig gehuwd plus partnersc | NA |
1050 | Verweduwd | Vanaf 2010: verweduwd na wettig huwelijk | NA |
1080 | Gescheiden | Vanaf 2010: gescheiden na wettig huwelij | NA |
odataR::odataR_get_meta(table_id='03759ned',metatype='RegioS',query='?$top=3') %>%
dplyr::mutate(Description=gsub('\r\n',' ',Description)) %>%
dplyr::mutate(Description=substr(Description,1,40)) %>%
knitr::kable()
Key | Title | Description | CategoryGroupID |
---|---|---|---|
NL01 | Nederland | Grenswijziging per 01-01-2018: - Ontvang | NA |
LD01 | Noord-Nederland (LD) | LD = Landsdeel Noord-Nederland: Groning | NA |
LD02 | Oost-Nederland (LD) | LD = Landsdeel Oost-Nederland: Overijss | NA |
odataR::odataR_get_meta(table_id='03759ned',metatype='Perioden',query='?$top=3') %>%
knitr::kable()
Key | Title | Description | Status |
---|---|---|---|
1988JJ00 | 1988 | NA | Definitief |
1989JJ00 | 1989 | NA | Definitief |
1990JJ00 | 1990 | NA | Definitief |
Example
To find for the year 2016 the number of single men living in Amstelveen that are in their thirties you could use the following query. Note that the keys are all character and therefore the spaces following the actual code are necessary. An alternative is using the startswith
function: e.g. and startswith(Geslacht,'3000')
.
odataR::odataR_get_table(
table_id='03759NED',
query=paste0(
"?$filter=startswith(RegioS,'GM0362')", # Amstelveen
" and Geslacht eq '3000 '", # men only
" and BurgerlijkeStaat eq '1010 '", # not married
" and startswith(Leeftijd,'13')", # age 30, 31, ...39
" and Perioden eq '2016JJ00'", # in 2016
"&$select=Leeftijd,BevolkingOp1Januari_1,GemiddeldeBevolking_2"
) ) %>%
dplyr::select(Leeftijd,BevolkingOp1Januari_1,GemiddeldeBevolking_2) %>%
knitr::kable()
Leeftijd | BevolkingOp1Januari_1 | GemiddeldeBevolking_2 |
---|---|---|
30 jaar | 349 | 372.5 |
31 jaar | 303 | 345.0 |
32 jaar | 272 | 293.0 |
33 jaar | 271 | 261.5 |
34 jaar | 257 | 262.5 |
35 jaar | 278 | 258.5 |
36 jaar | 261 | 271.0 |
37 jaar | 233 | 253.0 |
38 jaar | 216 | 233.0 |
39 jaar | 224 | 210.5 |
Session Info
This document was produced on 08Dec2019 with the following R environment:
#> R version 3.6.0 (2019-04-26)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] odataR_0.1.3 dplyr_0.8.3
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.3 knitr_1.26 magrittr_1.5 tidyselect_0.2.5
#> [5] HOQCutil_0.1.15 R6_2.4.1 rlang_0.4.2 fansi_0.4.0
#> [9] highr_0.8 stringr_1.4.0 httr_1.4.1 tools_3.6.0
#> [13] xfun_0.10 utf8_1.1.4 cli_1.1.0 htmltools_0.4.0
#> [17] digest_0.6.23 assertthat_0.2.1 tibble_2.1.3 crayon_1.3.4
#> [21] purrr_0.3.3 captioner_2.2.3 vctrs_0.2.0 zeallot_0.1.0
#> [25] curl_4.3 glue_1.3.1 evaluate_0.14 rmarkdown_1.18
#> [29] stringi_1.4.3 compiler_3.6.0 pillar_1.4.2 backports_1.1.5
#> [33] jsonlite_1.6 pkgconfig_2.0.3