library(raster)
## Loading required package: sp
The input and output data will be downloaded and generated in the 'data' folder of the working directory.
work.dir<-getwd()
ifelse(!dir.exists(file.path(work.dir, 'data')), dir.create(file.path(work.dir, 'data')), FALSE)
## [1] TRUE
The input data is available along with the following publication https://doi.org/10.1038/s41597-020-00675-z . To generate the LUCAS Copernicus polygons, the LUCAS point geometries are downloaded from https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/1_table/lucas_harmo_uf_2018.zip along with the table containing the 2018 harmonised survey data from https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/2_geometry/LUCAS_th_geom.zip .
Read the shapefile LUCAS_th_geom.shp containing the LUCAS theroretical points locations (it could take take about 5 min as there are 0.7 1.351.293 points with 3 columns) and select only the points surveyed in 2018 (337.854 points).
# Read the shapefile file with the theroretical points locations
lucas_geom
## class : SpatialPointsDataFrame
## features : 337854
## extent : -10.45588, 34.5606, 34.57025, 70.08408 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 3
## names : ID, POINT_ID, YEAR
## min values : 1, 26381954, 2018
## max values : 99996, 65241758, 2018
Read the table lucas_harmo_uf_2018.csv containing the LUCAS 2018 survey data (337.854 points with 117 columns) and merge it with the geometries loaded previously.
# Read the shapefile file with the theroretical points locations
str(lucas_table)
## 'data.frame': 337854 obs. of 117 variables:
## $ id : int 254889 228881 359812 247225 226038 339864 301836 282775 368367 399979 ...
## $ point_id : int 50883322 47785074 43583324 41302926 40503126 43982352 47601610 44803216 46404066 50722882 ...
## $ year : int 2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
## $ nuts0 : chr "PL" "SE" "DE" "DE" ...
## $ nuts1 : chr "PL9" "SE3" "DE9" "DEB" ...
## $ nuts2 : chr "PL91" "SE33" "DE93" "DEB3" ...
## $ nuts3 : chr "PL912" "SE332" "DE93A" "DEB3G" ...
## $ th_lat : num 52.5 68.5 53 49.4 51.2 ...
## $ th_long : num 21.34 21.09 10.55 7.37 6.12 ...
## $ office_pi : int 0 1 0 0 1 0 0 0 0 0 ...
## $ ex_ante : int 0 0 0 0 0 0 0 0 0 0 ...
## $ survey_date : chr "16/06/18" "19/04/18" "23/08/18" "06/06/18" ...
## $ car_latitude : num 52.5 88.9 53 49.4 88.9 ...
## $ car_ew : chr "East" "Not relevant" "East" "East" ...
## $ car_longitude : num 21.34 88.89 10.55 7.37 88.89 ...
## $ gps_proj : chr "WGS84" "Not relevant" "WGS84" "WGS84" ...
## $ gps_prec : int 0 8888 9 3 8888 3 3 7 6 3 ...
## $ gps_altitude : int 131 8888 72 325 8888 730 66 104 80 358 ...
## $ gps_lat : num 52.5 88.9 53 49.4 88.9 ...
## $ gps_ew : chr "East" "Not relevant" "East" "East" ...
## $ gps_long : num 21.34 88.89 10.55 7.37 88.89 ...
## $ obs_dist : int 1 NA 0 1 NA 35 3 0 0 2 ...
## $ obs_direct : chr "On the point" "On the point" "On the point" "On the point" ...
## $ obs_type : chr "In situ < 100 m" "In office PI" "In situ < 100 m" "In situ < 100 m" ...
## $ obs_radius : logi NA NA NA NA NA NA ...
## $ letter_group : chr "F" "H" "B" "B" ...
## $ lc1 : chr "F40" "H12" "B11" "B51" ...
## $ lc1_label : chr "Other bare soil" "Peatbogs" "Common wheat" "Clovers" ...
## $ lc1_spec : chr "8" "8" "8" "8" ...
## $ lc1_spec_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lc1_perc : chr "> 75 %" "> 75 %" "10 - 25 %" "> 75 %" ...
## $ lc2 : chr "8" "8" "8" "8" ...
## $ lc2_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lc2_spec : chr "8" "8" "8" "8" ...
## $ lc2_spec_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lc2_perc : chr "Not relevant" " 0" " 0" " 0" ...
## $ lu1 : chr "U112" "U350" "U111" "U111" ...
## $ lu1_label : chr "Fallow land" "Community services" "Agriculture (excluding fallow land and kitchen gardens)" "Agriculture (excluding fallow land and kitchen gardens)" ...
## $ lu1_type : chr "8" "8" "8" "8" ...
## $ lu1_type_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lu1_perc : chr "> 90 %" "> 90 %" "> 90 %" "> 90 %" ...
## $ lu2 : chr "8" "8" "8" "8" ...
## $ lu2_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lu2_type : chr "8" "8" "8" "8" ...
## $ lu2_type_label : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lu2_perc : chr " 0" " 0" " 0" " 0" ...
## $ parcel_area_ha : chr "1 - 10 ha" "1 - 10 ha" "1 - 10 ha" "1 - 10 ha" ...
## $ tree_height_maturity : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ tree_height_survey : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ feature_width : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ lm_stone_walls : chr "No" "No" "No" "No" ...
## $ crop_residues : chr "No" "No" "Yes" "No" ...
## $ lm_grass_margins : chr "No" "No" "> 1 m width" "No" ...
## $ grazing : chr "No signs of grazing" "No signs of grazing" "No signs of grazing" "No signs of grazing" ...
## $ special_status : chr "No special status" "No special status" "Hunting" "Hunting" ...
## $ lc_lu_special_remark : chr "No remark" "No remark" "No remark" "No remark" ...
## $ cprn_cando : chr "Yes" "Not relevant" "Yes" "Yes" ...
## $ cprn_lc : chr "F4" "8" "B1" "B5" ...
## $ cprn_lc_label : chr "Other bare soil" "Not relevant" "Cereals" "Fodder crops" ...
## $ cprn_lc1n : int 23 88 51 25 88 88 88 51 51 20 ...
## $ cprnc_lc1e : int 43 88 51 30 88 88 88 51 51 15 ...
## $ cprnc_lc1s : int 23 88 51 20 88 88 88 16 51 51 ...
## $ cprnc_lc1w : int 16 88 51 25 88 88 88 51 51 40 ...
## $ cprn_lc1n_brdth : int 30 888 888 100 888 888 888 888 888 0 ...
## $ cprn_lc1e_brdth : int 100 888 888 50 888 888 888 888 888 0 ...
## $ cprn_lc1s_brdth : int 40 888 888 100 888 888 888 40 888 888 ...
## $ cprn_lc1w_brdth : int 95 888 888 50 888 888 888 888 888 100 ...
## $ cprn_lc1n_next : chr "C1" "8" "8" "C1" ...
## $ cprn_lc1s_next : chr "B5" "8" "8" "E2" ...
## $ cprn_lc1e_next : chr "B5" "8" "8" "C1" ...
## $ cprn_lc1w_next : chr "C3" "8" "8" "E2" ...
## $ cprn_urban : chr "No" "No" "No" "No" ...
## $ cprn_impervious_perc : int 88 888 85 0 888 0 0 0 4 0 ...
## $ inspire_plcc1 : int 0 888 888 888 0 0 0 90 57 0 ...
## $ inspire_plcc2 : int 0 888 888 888 6 100 5 0 26 9 ...
## $ inspire_plcc3 : int 0 888 888 888 9 0 9 3 4 35 ...
## $ inspire_plcc4 : int 8 888 888 888 85 0 86 3 3 56 ...
## $ inspire_plcc5 : int 0 888 888 888 0 0 0 4 6 0 ...
## $ inspire_plcc6 : int 0 888 888 888 0 0 0 0 1 0 ...
## $ inspire_plcc7 : int 92 888 888 888 0 0 0 0 3 0 ...
## $ inspire_plcc8 : int 0 888 888 888 0 0 0 0 0 0 ...
## $ eunis_complex : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ grassland_sample : int 0 0 0 0 0 0 0 0 0 0 ...
## $ grass_cando : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ wm : chr "No visible water management" "Not relevant" "No visible water management" "No visible water management" ...
## $ wm_source : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ wm_type : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ wm_delivery : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ erosion_cando : chr "Yes" "Not relevant" "Yes" "Yes" ...
## $ soil_stones_perc : chr "Not relevant" "Not relevant" " 0" " 0" ...
## $ bio_sample : int 0 0 0 0 0 0 0 0 0 0 ...
## $ soil_bio_taken : int 8 8 8 8 8 0 0 8 8 8 ...
## $ bulk0_10_sample : int 0 0 0 0 0 0 0 0 0 0 ...
## $ soil_blk_0_10_taken : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ bulk10_20_sample : int 0 0 0 0 0 0 0 0 0 0 ...
## $ soil_blk_10_20_taken : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ bulk20_30_sample : int 0 0 0 0 0 0 0 0 0 0 ...
## $ soil_blk_20_30_taken : chr "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
## $ standard_sample : int 1 0 1 1 0 1 1 1 1 1 ...
## [list output truncated]
The results is a SpatialPointsDataFrame of 337854 records with 119 variables. For a description of all the variables, see the record descriptor https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/3_supporting/lucas_harmo_record_descriptor.xls.
spdf
## class : SpatialPointsDataFrame
## features : 337854
## extent : -10.45588, 34.5606, 34.57025, 70.08408 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 119
## names : POINT_ID, ID, YEAR, ID.1, YEAR.1, NUTS0, NUTS1, NUTS2, NUTS3, TH_LAT, TH_LONG, OFFICE_PI, EX_ANTE, SURVEY_DATE, CAR_LATITUDE, ...
## min values : 26381954, 1, 2018, 1, 2018, AT, AT1, AT11, AT111, 34.57024918, -10.45587972, 0, 0, 01/05/18, -4.167545, ...
## max values : 65241758, 99996, 2018, 572960, 2018, UK, UKN, UKN1, UKN16, 70.084081637, 34.560604893, 1, 1, 31/10/18, 88.888888, ...
The polygon geometries are generated for 63.364 points for which the Copernicus module was collected. Each point need to be projected in ETRS89-LAEA to add the N, E, S, W planimetric distance to generate the polygon.
This part of the code could take a lot of time and is probably not optimized properly.
The first filter removes 67 polygons for shich GPS information is wrong (i.e. GPS_PREC==8888, GPS_LAT==0,GPS_LONG==0). The second filter removes 10 polygons for which the difference between TH_LAT and GPS_LAT or TH_LONG and GPS_LONG is greater than 0.1 degree. This occurs when GPS_EW is wrongly encoded.
The resulting spdf
copernicus_polygon
## class : SpatialPolygonsDataFrame
## features : 63287
## extent : -10.18257, 34.06011, 34.59811, 69.95665 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 121
## names : POINT_ID, ID, YEAR, ID.1, YEAR.1, NUTS0, NUTS1, NUTS2, NUTS3, TH_LAT, TH_LONG, OFFICE_PI, EX_ANTE, SURVEY_DATE, CAR_LATITUDE, ...
## min values : 26461768, 100007, 2018, 22534, 2018, AT, AT1, AT11, AT111, 34.598316652, -10.18181545, 0, 0, 01/05/18, -4.167545, ...
## max values : 65021668, 99974, 2018, 438280, 2018, UK, UKN, UKN1, UKN16, 69.956514696, 34.059950365, 0, 0, 31/10/18, 69.956338, ...
In order to ineritate he LUCAS core point to the LUCAS Copernicus polygons, two checks are being done. First, the land cover level-2 should be the same for both LUCAS core (LC1) and LUCAS copernicus (CPRN_LC). Second, the LUCAS core point geometry (th_long,th_lat) should be within the LUCAS Copernicus polygon. These two conditions (i.e. CPRN_LC_SAME_LC1 and LUCAS_CORE_INTERSECT) are sumarized in the COPERNICUS_CLEANED attribute.
table(copernicus_polygon@data$COPERNICUS_CLEANED)
##
## FALSE TRUE
## 4861 58426
The outputs are saved in ./data as a Rdata (LUCAS_2018_Copernicus_polygons.RData) but are also exported as a shapefile in ./LUCAS_2018_Copernicus/LUCAS_2018_Copernicus_polygons.shp along with the attribute table as a csv ./LUCAS_2018_Copernicus/LUCAS_2018_Copernicus_attributes.csv. The sapefile along with the attribute table are compressed in ./data/LUCAS_2018_Copernicus.zip.
## [1] TRUE