Overview: Vegetation tree species sample points
Traceability (lineage): This is an original dataset produced with a machine learning framework that used a combination of point datasets and raster datasets as inputs. The point dataset is a harmonized collection of tree occurrence data, comprising observations from National Forest Inventories (EU-Forest), GBIF and LUCAS. The complete dataset is available on Zenodo. The raster datasets used as input are: harmonized and gap-filled time series of seasonal aggregates of the Landsat GLAD ARD dataset (bands and spectral indices); monthly time series of air and surface temperature and precipitation from a reprocessed version of the Copernicus ERA5 dataset; long-term averages of bioclimatic variables from CHELSA; tree species distribution maps from the European Atlas of Forest Tree Species; elevation, slope and other elevation-derived metrics; and long-term monthly averages of snow probability and cloud fraction from MODIS. For a more comprehensive list, refer to Bonannella et al. (2022) (in review, preprint available at: https://doi.org/10.21203/rs.3.rs-1252972/v1).
Scientific methodology: Probability and uncertainty maps are the output of a spatiotemporal ensemble machine learning framework based on stacked regularization. Three base models (random forest, gradient boosted trees and generalized linear models) were first trained on the input dataset, and their predictions were then used to train an additional model (logistic regression), which provided the final predictions. More details on the whole workflow are available in the listed publication.
Usability: Probability maps can be used to detect potential forest degradation and compositional change across the time period analyzed. Some possible applications are explained in the listed publication.
Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
Data validation approaches: Distribution maps were validated using spatial 5-fold cross-validation, following the workflow detailed in the listed publication.
Completeness: The raster files fully cover the Geo-harmonizer region as defined by the project's landmask raster dataset.
Consistency: Areas outside the calibration area of the point dataset (Iceland, Norway) usually have high uncertainty values. This reflects not only spatial extrapolation but also the poor representation, in the feature space available to the model, of the conditions present in these countries.
Positional accuracy: The rasters have a spatial resolution of 30 m.
Temporal accuracy: The maps cover the period 2000–2020. Each map covers several years, according to the following scheme: (1) 2000–2002, (2) 2002–2006, (3) 2006–2010, (4) 2010–2014, (5) 2014–2018 and (6) 2018–2020.
Thematic accuracy: Both probability and uncertainty maps contain values from 0 to 100. In probability maps, the value indicates the probability of occurrence of a single individual of the target species; in uncertainty maps, it indicates the standard deviation of the ensemble model.
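The uncertainty calculation described in this record can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: the function name is hypothetical, and it assumes the population standard deviation and a simple rescaling of probabilities from 0-1 to the 0 to 100 range used by the maps. The record does not state either choice.

```python
from statistics import pstdev


def ensemble_uncertainty(p_rf, p_gbt, p_glm):
    """Spread of three base-model probabilities for one pixel, on a 0-100 scale.

    Assumptions (not stated in the record): inputs are probabilities in [0, 1],
    the population standard deviation is used, and the result is rounded.
    """
    probabilities = [p_rf, p_gbt, p_glm]
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return round(100 * pstdev(probabilities))


# Full agreement between the three models gives zero uncertainty.
print(ensemble_uncertainty(0.5, 0.5, 0.5))
# Strong disagreement gives a high value.
print(ensemble_uncertainty(0.0, 0.0, 1.0))
```

With three values bounded between 0 and 1, the population standard deviation cannot exceed about 0.47, so under these assumptions the uncertainty would stay well below 100 in practice.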
Overview: corine: Landcover sample points
Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3).
Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022. The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble.
Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy: 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when classes are aggregated to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). For classes that were overshadowed by unequal sample point distributions, the single-class probabilities may represent actual patterns more closely than the hard-class maps. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.
Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation, as detailed in the accompanying publication.
Completeness: The dataset has chunks of empty predictions in regions with complex coastlines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.
Consistency: The accuracy of the predictions was compared per year and per 30 km × 30 km tile across Europe, and temporal and spatial consistency were derived by calculating the standard deviation. The standard deviation of the annual weighted F1-score was 0.135, while the standard deviation of the weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.
Positional accuracy: The raster layers have a resolution of 30 m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted them.
Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019.
Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints, leaving 43 classes at the third level.
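The hard-class rule and the aggregation to higher CLC levels described in this record can be sketched for one pixel as follows. This is an illustrative sketch only: the function names and the sample probabilities are hypothetical. It relies on the CLC convention that a three-digit code contains its level 2 and level 1 parents as prefixes (111 and 112 both belong to 11, which belongs to 1). Summing the single-class probabilities of sibling classes is an assumption made here as one possible postprocessing choice, not the method used to produce the published layers.

```python
def hard_class(probabilities):
    """Return the class code with the highest probability (the HCL rule)."""
    return max(probabilities, key=probabilities.get)


def aggregate_probabilities(probabilities, level):
    """Sum per-class probabilities over their CLC parent at level 1 or 2.

    Assumption: codes are three-character strings such as "112", so the
    parent at a given level is simply the first `level` characters.
    """
    if level not in (1, 2, 3):
        raise ValueError("CLC level must be 1, 2 or 3")
    totals = {}
    for code, value in probabilities.items():
        parent = code[:level]
        totals[parent] = totals.get(parent, 0) + value
    return totals


# Hypothetical probabilities (0-100) for one pixel at CLC level 3.
pixel = {"111": 30, "112": 25, "211": 40, "311": 5}

print(hard_class(pixel))                              # level 3 hard class
print(hard_class(aggregate_probabilities(pixel, 2)))  # level 2 hard class
```

In this made-up pixel the two urban fabric classes split the probability between them, so arable land (211) wins at level 3 while urban fabric (11) wins after aggregation to level 2. That is one way a user-defined postprocessing step could change the resulting map.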
This dataset contains the administrative boundaries of the world at country level. It is based on geometry from EBM v12.x of EuroGeographics for the members of EuroGeographics, the Global Administrative Units Layer (2015) from FAO (UN), and geometry from the Turkish National Statistical Office. The dataset consists of 2 feature classes (regions, boundaries) per scale level, and there are 6 scale levels (100K, 1M, 3M, 10M, 20M and 60M). The public dataset (1M - 60M) is available under the Download link indicated below. The full dataset (100K - 60M), GISCO.CNTR_2016, is available via the EC restricted download link.
To meet the demand for statistics at a local level, Eurostat maintains a system of Local Administrative Units (LAUs) compatible with NUTS. These LAUs are the building blocks of the NUTS, and comprise the municipalities and communes of the European Union. The LAUs are:
- administrative, for reasons such as the availability of data and policy implementation capacity;
- a subdivision of the NUTS 3 regions, covering the whole economic territory of the Member States;
- appropriate for the implementation of the local-level typologies included in TERCET, namely the coastal area and DEGURBA classifications.
Since there are frequent changes to the LAUs, Eurostat publishes an updated list towards the end of each year. The LAUs are currently available from 2011 onwards. The NUTS regulation makes provision for EU Member States to send the lists of their LAUs to Eurostat. Where available, Eurostat also receives basic administrative data through the annual LAU lists, namely total population and total area for each LAU.
The Nomenclature of Territorial Units for Statistics (NUTS) is a hierarchical classification of statistical regions that subdivides the EU economic territory into regions of three different levels (NUTS 1, 2 and 3, moving respectively from larger to smaller territorial units). NUTS 1 is the most aggregated level. An additional country level (NUTS 0) is also available for countries where the nation at statistical level does not coincide with the administrative boundaries, for example Mt Athos in Greece, and Mellum and Minsener Oog in Germany. The NUTS classification has been officially established through Regulation (EC) No 1059/2003 of the European Parliament and of the Council and its amendments. A non-official NUTS-like classification has been defined for the EFTA countries and candidate countries. An introduction to the NUTS classification is available here: http://ec.europa.eu/eurostat/web/nuts/overview. The datasets are based on: EuroBoundaryMap (EBM) from EuroGeographics (scale of 1:100 000), Global Administrative Unit Layer (GAUL) country data from UN/FAO, and data from the National Statistical Institute of Turkey (TurkStat); the sources might vary for different years. The different scale levels were derived by generalisation of the 100K scale. The public datasets are available under the Download link indicated below. Available scales are: 100K, 1M, 3M, 10M, 20M, 60M. Data for the NUTS regions are currently available for the years 2003, 2006, 2010, 2013, 2016 and 2021. The full datasets are available via the EC restricted download link. Here six scale ranges (100K, 1M, 3M, 10M, 20M and 60M) are available. Coverage is the economic territory of the EU, EFTA countries and candidate countries as in the respective year.
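The hierarchy described in this record is reflected in the NUTS codes themselves: a code starts with a two-letter country code (NUTS 0) and gains one character per level, so a NUTS 3 code has five characters. The sketch below uses that convention to derive the level and the parent regions of a code. The function names are hypothetical, and the code does not check whether a code actually exists in any given NUTS version.

```python
def nuts_level(code):
    """Return the NUTS level (0-3) implied by the length of a NUTS code."""
    level = len(code) - 2
    if not 0 <= level <= 3 or not code[:2].isalpha():
        raise ValueError(f"not a plausible NUTS code: {code!r}")
    return level


def nuts_ancestors(code):
    """List the parent regions of a code, from NUTS 0 down to its direct parent."""
    nuts_level(code)  # validate the code before slicing it
    return [code[:length] for length in range(2, len(code))]


print(nuts_level("DE212"))      # a NUTS 3 region in Germany
print(nuts_ancestors("DE212"))  # country, NUTS 1 and NUTS 2 parents
```

A lookup of this kind is typically enough to aggregate NUTS 3 statistics to coarser levels without a separate correspondence table, provided all codes come from the same NUTS version.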
This dataset contains areas by degree of urbanisation (revised definition, 2018). The degree of urbanisation classifies local administrative units (LAUs) in Europe into three categories: thinly populated areas (rural), intermediate density areas (towns and suburbs, or small urban areas) and densely populated areas (cities, or large urban areas). The classification is based on a population distribution grid with raster cells of 1 km². Data are available for EU countries, Norway, Switzerland, Serbia and Iceland. The data are available at 1:100 000 resolution for internal Commission users, and a generalised 1:1 000 000 resolution is available to the public via the GISCO Dedicated Section on the Eurostat website. The classification is based on the method described in the manual on Territorial typologies, in consultation with Member States.
When a natural disaster or disease outbreak occurs, there is a rush to establish accurate health care location data that can be used to support people on the ground. This has been demonstrated by events such as the Haiti earthquake and the Ebola epidemic in West Africa. As a result, valuable time is wasted establishing accurate and accessible baseline data. Healthsites.io establishes this data and the tools necessary to upload, manage and make the data easily accessible. Global scope: The Global Healthsites Mapping Project is an initiative to create an online map of every health facility in the world and make the details of each location easily accessible. Open data collaboration: Through collaborations with users, trusted partners and OpenStreetMap, the Global Healthsites Mapping Project will capture and validate the location and contact details of every facility and make this data freely available under the Open Database License (ODbL). Accessible: The Global Healthsites Mapping Project will make the data accessible over the Internet through an API and in other formats such as GeoJSON, Shapefile, KML and CSV. Focus on health care location data: The Global Healthsites Mapping Project's design philosophy is the long-term curation and validation of health care location data. The healthsites.io map will enable users to discover what healthcare facilities exist at any global location, along with the associated services and resources.
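As an illustration of how facility data delivered as GeoJSON might be used, the sketch below filters point features by a bounding box. It does not call the healthsites.io API: the sample records, the facility names and the `name` property are made up for the example, and the function name is hypothetical. The only convention taken from the GeoJSON format itself is that point coordinates are ordered longitude, latitude.

```python
import json

# Hypothetical sample data, not taken from healthsites.io.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-72.33, 18.54]},
     "properties": {"name": "Example Clinic A"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-13.23, 8.48]},
     "properties": {"name": "Example Hospital B"}}
  ]
}
"""


def facilities_in_bbox(geojson_text, west, south, east, north):
    """Return the names of point features that fall inside a bounding box."""
    collection = json.loads(geojson_text)
    names = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch ignores polygons and other geometry types
        lon, lat = geometry["coordinates"][:2]
        if west <= lon <= east and south <= lat <= north:
            names.append((feature.get("properties") or {}).get("name"))
    return names


# Rough bounding box around Haiti (west, south, east, north).
print(facilities_in_bbox(SAMPLE_GEOJSON, -75.0, 17.0, -71.0, 20.5))
```

A plain bounding-box test is the simplest possible spatial filter. A real workflow would more likely load the file into a GIS library and query against an administrative boundary.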