Genereate the polygons

Create the directory

library(raster)
## Loading required package: sp

The input and output data will be downloaded and generated in the 'data' folder of the working directory.

work.dir<-getwd()
ifelse(!dir.exists(file.path(work.dir, 'data')), dir.create(file.path(work.dir, 'data')), FALSE)
## [1] TRUE

Download the input data and load them

The input data is available along with the following publication https://doi.org/10.1038/s41597-020-00675-z . To generate the LUCAS Copernicus polygons, the LUCAS point geometries are downloaded from https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/1_table/lucas_harmo_uf_2018.zip along with the table containing the 2018 harmonised survey data from https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/2_geometry/LUCAS_th_geom.zip .

Read the shapefile LUCAS_th_geom.shp containing the LUCAS theroretical points locations (it could take take about 5 min as there are 0.7 1.351.293 points with 3 columns) and select only the points surveyed in 2018 (337.854 points).

# Read the shapefile file with the theroretical points locations
lucas_geom
## class       : SpatialPointsDataFrame 
## features    : 337854 
## extent      : -10.45588, 34.5606, 34.57025, 70.08408  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 3
## names       :    ID, POINT_ID, YEAR 
## min values  :     1, 26381954, 2018 
## max values  : 99996, 65241758, 2018

Read the table lucas_harmo_uf_2018.csv containing the LUCAS 2018 survey data (337.854 points with 117 columns) and merge it with the geometries loaded previously.

# Read the shapefile file with the theroretical points locations
str(lucas_table)
## 'data.frame':    337854 obs. of  117 variables:
##  $ id                   : int  254889 228881 359812 247225 226038 339864 301836 282775 368367 399979 ...
##  $ point_id             : int  50883322 47785074 43583324 41302926 40503126 43982352 47601610 44803216 46404066 50722882 ...
##  $ year                 : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
##  $ nuts0                : chr  "PL" "SE" "DE" "DE" ...
##  $ nuts1                : chr  "PL9" "SE3" "DE9" "DEB" ...
##  $ nuts2                : chr  "PL91" "SE33" "DE93" "DEB3" ...
##  $ nuts3                : chr  "PL912" "SE332" "DE93A" "DEB3G" ...
##  $ th_lat               : num  52.5 68.5 53 49.4 51.2 ...
##  $ th_long              : num  21.34 21.09 10.55 7.37 6.12 ...
##  $ office_pi            : int  0 1 0 0 1 0 0 0 0 0 ...
##  $ ex_ante              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ survey_date          : chr  "16/06/18" "19/04/18" "23/08/18" "06/06/18" ...
##  $ car_latitude         : num  52.5 88.9 53 49.4 88.9 ...
##  $ car_ew               : chr  "East" "Not relevant" "East" "East" ...
##  $ car_longitude        : num  21.34 88.89 10.55 7.37 88.89 ...
##  $ gps_proj             : chr  "WGS84" "Not relevant" "WGS84" "WGS84" ...
##  $ gps_prec             : int  0 8888 9 3 8888 3 3 7 6 3 ...
##  $ gps_altitude         : int  131 8888 72 325 8888 730 66 104 80 358 ...
##  $ gps_lat              : num  52.5 88.9 53 49.4 88.9 ...
##  $ gps_ew               : chr  "East" "Not relevant" "East" "East" ...
##  $ gps_long             : num  21.34 88.89 10.55 7.37 88.89 ...
##  $ obs_dist             : int  1 NA 0 1 NA 35 3 0 0 2 ...
##  $ obs_direct           : chr  "On the point" "On the point" "On the point" "On the point" ...
##  $ obs_type             : chr  "In situ < 100 m" "In office PI" "In situ < 100 m" "In situ < 100 m" ...
##  $ obs_radius           : logi  NA NA NA NA NA NA ...
##  $ letter_group         : chr  "F" "H" "B" "B" ...
##  $ lc1                  : chr  "F40" "H12" "B11" "B51" ...
##  $ lc1_label            : chr  "Other bare soil" "Peatbogs" "Common wheat" "Clovers" ...
##  $ lc1_spec             : chr  "8" "8" "8" "8" ...
##  $ lc1_spec_label       : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lc1_perc             : chr  "> 75 %" "> 75 %" "10 - 25 %" "> 75 %" ...
##  $ lc2                  : chr  "8" "8" "8" "8" ...
##  $ lc2_label            : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lc2_spec             : chr  "8" "8" "8" "8" ...
##  $ lc2_spec_label       : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lc2_perc             : chr  "Not relevant" "   0" "   0" "   0" ...
##  $ lu1                  : chr  "U112" "U350" "U111" "U111" ...
##  $ lu1_label            : chr  "Fallow land" "Community services" "Agriculture (excluding fallow land and kitchen gardens)" "Agriculture (excluding fallow land and kitchen gardens)" ...
##  $ lu1_type             : chr  "8" "8" "8" "8" ...
##  $ lu1_type_label       : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lu1_perc             : chr  "> 90 %" "> 90 %" "> 90 %" "> 90 %" ...
##  $ lu2                  : chr  "8" "8" "8" "8" ...
##  $ lu2_label            : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lu2_type             : chr  "8" "8" "8" "8" ...
##  $ lu2_type_label       : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lu2_perc             : chr  "   0" "   0" "   0" "   0" ...
##  $ parcel_area_ha       : chr  "1 - 10 ha" "1 - 10 ha" "1 - 10 ha" "1 - 10 ha" ...
##  $ tree_height_maturity : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ tree_height_survey   : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ feature_width        : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ lm_stone_walls       : chr  "No" "No" "No" "No" ...
##  $ crop_residues        : chr  "No" "No" "Yes" "No" ...
##  $ lm_grass_margins     : chr  "No" "No" "> 1 m width" "No" ...
##  $ grazing              : chr  "No signs of grazing" "No signs of grazing" "No signs of grazing" "No signs of grazing" ...
##  $ special_status       : chr  "No special status" "No special status" "Hunting" "Hunting" ...
##  $ lc_lu_special_remark : chr  "No remark" "No remark" "No remark" "No remark" ...
##  $ cprn_cando           : chr  "Yes" "Not relevant" "Yes" "Yes" ...
##  $ cprn_lc              : chr  "F4" "8" "B1" "B5" ...
##  $ cprn_lc_label        : chr  "Other bare soil" "Not relevant" "Cereals" "Fodder crops" ...
##  $ cprn_lc1n            : int  23 88 51 25 88 88 88 51 51 20 ...
##  $ cprnc_lc1e           : int  43 88 51 30 88 88 88 51 51 15 ...
##  $ cprnc_lc1s           : int  23 88 51 20 88 88 88 16 51 51 ...
##  $ cprnc_lc1w           : int  16 88 51 25 88 88 88 51 51 40 ...
##  $ cprn_lc1n_brdth      : int  30 888 888 100 888 888 888 888 888 0 ...
##  $ cprn_lc1e_brdth      : int  100 888 888 50 888 888 888 888 888 0 ...
##  $ cprn_lc1s_brdth      : int  40 888 888 100 888 888 888 40 888 888 ...
##  $ cprn_lc1w_brdth      : int  95 888 888 50 888 888 888 888 888 100 ...
##  $ cprn_lc1n_next       : chr  "C1" "8" "8" "C1" ...
##  $ cprn_lc1s_next       : chr  "B5" "8" "8" "E2" ...
##  $ cprn_lc1e_next       : chr  "B5" "8" "8" "C1" ...
##  $ cprn_lc1w_next       : chr  "C3" "8" "8" "E2" ...
##  $ cprn_urban           : chr  "No" "No" "No" "No" ...
##  $ cprn_impervious_perc : int  88 888 85 0 888 0 0 0 4 0 ...
##  $ inspire_plcc1        : int  0 888 888 888 0 0 0 90 57 0 ...
##  $ inspire_plcc2        : int  0 888 888 888 6 100 5 0 26 9 ...
##  $ inspire_plcc3        : int  0 888 888 888 9 0 9 3 4 35 ...
##  $ inspire_plcc4        : int  8 888 888 888 85 0 86 3 3 56 ...
##  $ inspire_plcc5        : int  0 888 888 888 0 0 0 4 6 0 ...
##  $ inspire_plcc6        : int  0 888 888 888 0 0 0 0 1 0 ...
##  $ inspire_plcc7        : int  92 888 888 888 0 0 0 0 3 0 ...
##  $ inspire_plcc8        : int  0 888 888 888 0 0 0 0 0 0 ...
##  $ eunis_complex        : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ grassland_sample     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ grass_cando          : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ wm                   : chr  "No visible water management" "Not relevant" "No visible water management" "No visible water management" ...
##  $ wm_source            : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ wm_type              : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ wm_delivery          : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ erosion_cando        : chr  "Yes" "Not relevant" "Yes" "Yes" ...
##  $ soil_stones_perc     : chr  "Not relevant" "Not relevant" " 0" " 0" ...
##  $ bio_sample           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ soil_bio_taken       : int  8 8 8 8 8 0 0 8 8 8 ...
##  $ bulk0_10_sample      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ soil_blk_0_10_taken  : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ bulk10_20_sample     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ soil_blk_10_20_taken : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ bulk20_30_sample     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ soil_blk_20_30_taken : chr  "Not relevant" "Not relevant" "Not relevant" "Not relevant" ...
##  $ standard_sample      : int  1 0 1 1 0 1 1 1 1 1 ...
##   [list output truncated]

The results is a SpatialPointsDataFrame of 337854 records with 119 variables. For a description of all the variables, see the record descriptor https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/LUCAS/LUCAS_harmonised/3_supporting/lucas_harmo_record_descriptor.xls.

spdf
## class       : SpatialPointsDataFrame 
## features    : 337854 
## extent      : -10.45588, 34.5606, 34.57025, 70.08408  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 119
## names       : POINT_ID,    ID, YEAR,   ID.1, YEAR.1, NUTS0, NUTS1, NUTS2, NUTS3,       TH_LAT,      TH_LONG, OFFICE_PI, EX_ANTE, SURVEY_DATE, CAR_LATITUDE, ... 
## min values  : 26381954,     1, 2018,      1,   2018,    AT,   AT1,  AT11, AT111,  34.57024918, -10.45587972,         0,       0,    01/05/18,    -4.167545, ... 
## max values  : 65241758, 99996, 2018, 572960,   2018,    UK,   UKN,  UKN1, UKN16, 70.084081637, 34.560604893,         1,       1,    31/10/18,    88.888888, ...

Build the polygon geometries

The polygon geometries are generated for 63.364 points for which the Copernicus module was collected. Each point need to be projected in ETRS89-LAEA to add the N, E, S, W planimetric distance to generate the polygon.

This part of the code could take a lot of time and is probably not optimized properly.

Quality check and filter wrong polygons

The first filter removes 67 polygons for shich GPS information is wrong (i.e. GPS_PREC==8888, GPS_LAT==0,GPS_LONG==0). The second filter removes 10 polygons for which the difference between TH_LAT and GPS_LAT or TH_LONG and GPS_LONG is greater than 0.1 degree. This occurs when GPS_EW is wrongly encoded.

The resulting spdf

copernicus_polygon
## class       : SpatialPolygonsDataFrame 
## features    : 63287 
## extent      : -10.18257, 34.06011, 34.59811, 69.95665  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs 
## variables   : 121
## names       : POINT_ID,     ID, YEAR,   ID.1, YEAR.1, NUTS0, NUTS1, NUTS2, NUTS3,       TH_LAT,      TH_LONG, OFFICE_PI, EX_ANTE, SURVEY_DATE, CAR_LATITUDE, ... 
## min values  : 26461768, 100007, 2018,  22534,   2018,    AT,   AT1,  AT11, AT111, 34.598316652, -10.18181545,         0,       0,    01/05/18,    -4.167545, ... 
## max values  : 65021668,  99974, 2018, 438280,   2018,    UK,   UKN,  UKN1, UKN16, 69.956514696, 34.059950365,         0,       0,    31/10/18,    69.956338, ...

Linking LUCAS Copernicus polygons to LUCAS core

In order to ineritate he LUCAS core point to the LUCAS Copernicus polygons, two checks are being done. First, the land cover level-2 should be the same for both LUCAS core (LC1) and LUCAS copernicus (CPRN_LC). Second, the LUCAS core point geometry (th_long,th_lat) should be within the LUCAS Copernicus polygon. These two conditions (i.e. CPRN_LC_SAME_LC1 and LUCAS_CORE_INTERSECT) are sumarized in the COPERNICUS_CLEANED attribute.

table(copernicus_polygon@data$COPERNICUS_CLEANED)
## 
## FALSE  TRUE 
##  4861 58426

Save output results

The outputs are saved in ./data as a Rdata (LUCAS_2018_Copernicus_polygons.RData) but are also exported as a shapefile in ./LUCAS_2018_Copernicus/LUCAS_2018_Copernicus_polygons.shp along with the attribute table as a csv ./LUCAS_2018_Copernicus/LUCAS_2018_Copernicus_attributes.csv. The sapefile along with the attribute table are compressed in ./data/LUCAS_2018_Copernicus.zip.

## [1] TRUE