
EUCP: CORDEX-FPS Convection

ETH Contribution to km-resolution Climate Simulations: The Full Datasets

Documentation by Alicia Engelmann & Luna Lehmann

1. Introduction

CORDEX-FPS Convection is a so-called Flagship Pilot Study (FPS) of the Coordinated Regional Climate Downscaling Experiment (CORDEX). CORDEX addresses regional downscaling approaches by embedding regional climate models (RCMs) into general circulation models (GCMs) and aims to advance and coordinate this research field through global partnerships. To improve regional downscaling, the FPS were introduced. These pilot studies focus on regions of high research interest (such as islands, convective systems or high mountain environments) and define dedicated experimental setups. This aims to encourage smaller community initiatives to target key scientific questions on specific regional climate processes in sub-continental regions.

This project focuses on FPS Convection. This initiative targets convective phenomena over the Alpine and Mediterranean regions, with a focus on the Alps. Convective processes are of high research interest due to their role in driving extreme events like heavy precipitation, floods, landslides and wind storms. Climate models that parameterize convection contribute errors to climate simulations (Coppola et al. (2020)). Within this framework, only convection-permitting regional climate models (CP-RCMs) are used for regional downscaling. FPS Convection aims to provide a spatial resolution at km scale and a high temporal resolution at hourly to monthly scale. The initiative is shared between 23 institutes, among them the group for Climate and Water Cycle of the Institute for Atmospheric and Climate Science at ETH Zurich (hereafter ETH) [1]. FPS Convection builds on the first initiative paper by Coppola et al. (2020), in which more than 20 institutes set up simulations for three test sites over the Alpine region, implementing the first multi-model ensemble investigating convective events with CP-RCMs. Within that study, the simulation of convective events could be improved, especially for large-scale driven events. Coppola et al. (2020) thus gave a strong argument for investigating high-impact convective processes with an ensemble-based approach.

CORDEX-FPS Convection is coordinated by the EUCP (European Climate Prediction system), a research project funded by EU Horizon 2020 that aims to support climate scientists in providing reliable climate information at local to regional scales. Experiments of CORDEX-FPS Convection also contribute to EUCP projects. The first EUCP papers were published by Ban et al. (2021) and Pichelli et al. (2021). Ban et al. (2021) evaluated precipitation simulated by multi-model CP-RCMs. The ensemble approach allows comparing numerous simulations that are based on different RCMs and run by different institutes but investigate the same variables, periods and domains. The results show that simulations at km scale (2-3 km) produce a more realistic representation of precipitation than coarse-resolution RCMs (12 km); the largest improvement was found for heavy precipitation and precipitation frequency in the summer season. Pichelli et al. (2021) focused on historical and future simulations of precipitation and provide an overview of the representation of precipitation characteristics and their projected changes over the greater Alpine domain. The “kilometer-scale ensemble is able to improve the representation of fine scale details of mean daily, wet-day/hour frequency, wet-day/hour intensity and heavy precipitation on a seasonal scale, reducing uncertainty over some regions”. In this study, too, the representation of the summer diurnal cycle is improved, showing a more realistic onset and peak of convection. These results encourage the use of convection-permitting model ensembles (like CORDEX-FPS Convection) to produce robust assessments of local impacts of future climate change. The contribution of multiple institutions to FPS Convection provides a multi-model ensemble with standardized and well-structured datasets.

This documentation explains the contribution of ETH to CORDEX-FPS Convection. It contains information about the data preparation according to the CORDEX requirements (post-processing, output rewriting, data publishing) and gives an overview of the simulation setup, as well as details about variables, spatial and temporal resolutions, missing data, etc. The ETH contribution consists of a set of 20 experiments that focus on present-day and future conditions over three regions: the Alps, the European continent, and the Canary Islands and Madeira (hereafter Macaronesia). Historical, evaluation and future time series are covered, with a period of 10 years each. Within each domain, 81 variables are provided at certain time resolutions (1-hourly, 3-hourly, 6-hourly, daily, monthly). [2]

For each region, climate simulations were carried out using the Regional Climate Model (RCM) COSMO-crCLIM-v1-1 (hereafter COSMO-crCLIM), the convection-resolving climate version of the state-of-the-art COSMO (Consortium for Small-scale Modeling) weather prediction model, running on GPUs (Leutwyler et al. (2016)). The COSMO-crCLIM model dynamically downscales climate information from a GCM (MPI-ESM-LR) or a reanalysis dataset (ERA-Interim). ERA-Interim is a global atmospheric reanalysis by ECMWF covering the period from 1979 to 2019, with a horizontal resolution of 0.75° and 60 vertical levels (Dee et al. (2011)). The Max Planck Institute Earth system model (MPI-ESM, hereafter MPI) combines coupled general circulation models for atmosphere and ocean with subsystem models for land and vegetation. This enables the model to represent the carbon cycle within the model system and thus to reflect the influence of future carbon concentrations, e.g. under RCP8.5. In our setup, MPI was applied in its low-resolution (LR) configuration, as also used for CMIP5; the spatial resolution for the atmosphere (ocean) is 1.9° (1.5°) with 47 (40) vertical levels (Giorgetta et al. (2013)). In dynamical downscaling, high-resolution regional models are forced laterally or internally by coarse-resolution simulations/analyses/forecasts. In contrast to statistical downscaling, this approach relies on explicit representations of physical principles, like the laws of thermodynamics and fluid mechanics (Giorgi and Gutowski (2015), Hong and Kanamitsu (2014)). In the ETH setup, the Pseudo Global Warming (PGW) approach is used by adding the global warming delta from MPI simulations to the ERA-Interim reanalysis data [3]; climate change is represented by RCP8.5. The climate simulations consist of two one-way nested domains. The outer domain is at 12 km spatial resolution and covers a larger area to capture the synoptic-scale features. The inner domain is at 2.2 km spatial resolution for the Alpine and European domains and at 1.1 km for the Macaronesian domain. The downscaling approach thus results in five different domains: EUR-11, ALP-3, REU-3, CAN-11 and CAN-1. An overview of the setups is given in the following.

Model Setup

Table 1: In total, 20 simulations were run; however, the final dataset covers only 19 simulations, as simulation 2 is replaced by simulation 12 (see text for more information). For simplicity, the simulations are from now on referred to by their numbers in the table. All future scenarios were carried out under RCP8.5. Simulations 11-12 and 15-16 were carried out by Nikolina Ban; all other simulations were run by Jesús Vergara-Temprado. All simulations were run in 2019/20. The European and Alpine experiments use the same 12-km domain.

Nr. Domain Spatial resolution [km] Period Experiment name
ALPINE SIMULATIONS DRIVEN BY ERA-INTERIM - PGW
1 ALP-3 2.2 2000-2009 Evaluation
2 EUR-11 12 1999-2008 Evaluation
3 ALP-3 2.2 2092-2101 PGW
4 EUR-11 12 2092-2101 PGW
ALPINE SIMULATIONS DRIVEN BY MPI
5 ALP-3 2.2 1996-2005 historical
6 EUR-11 12 1996-2005 historical
7 ALP-3 2.2 2041-2050 Near future
8 EUR-11 12 2041-2050 Near future
9 ALP-3 2.2 2090-2099 Far future
10 EUR-11 12 2090-2099 Far future
EUROPEAN SCALE SIMULATIONS DRIVEN BY ERA-INTERIM - PGW
11 REU-3 2.2 2000-2009 Evaluation
12 EUR-11 12 2000-2009 Evaluation
13 REU-3 2.2 2044-2053 Near future (with PGW)
14 EUR-11 12 2044-2053 Near future (with PGW)
15 REU-3 2.2 2080-2089 Far future (with PGW)
16 EUR-11 12 2080-2089 Far future (with PGW)
MACARONESIAN DOMAIN (CANARIES AND MADEIRA) DRIVEN BY ERA-INTERIM - PGW
17 CAN-1 1.1 2006-2015 Evaluation
18 CAN-11 12 2006-2015 Evaluation
19 CAN-1 1.1 2106-2115 PGW
20 CAN-11 12 2106-2115 PGW

Note that the European-scale simulations (EUR-11 and REU-3 driven by ERA-Interim and PGW) have different settings, specifically parametrizations and tuning parameters, between Jesús’ and Nikolina’s simulation runs. In terms of experimental setup, experiments 2 and 12 are equivalent (same domain, same resolution and essentially the same time period). Therefore, only experiment 12 is used afterwards and replaces simulation 2. The decision is based on the number of missing files and the unintended one-year shift of the time frame in simulation 2. Consequently, care has to be taken when investigating "simulation 2", as simulation 1 is not directly derived from simulation 12 by dynamical downscaling (more details are found in section 3, "Challenges and Issues/Simulation 2 is replaced by Simulation 12"). Simulation 12 was run by Nikolina and simulation 2 by Jesús, so the results are to be interpreted with caution, as differences appear. The namelists of Jesús’ and Nikolina’s simulations can be found in section “12.3 Namelist Files”.

All raw data of these 20 simulations have to be provided according to the common CORDEX-FPS Convection framework. A standardized framework is of high importance because FPS Convection is a collaboration between multiple institutions. The data preparation proceeds in multiple steps, explained below: 1. post-processing, 2. Climate Model Output Rewriting (hereafter CMOR or CMORizing), 3. quality check, and 4. publishing on the ESGF node.

The two main steps 1 and 2 were done with the CMOR tool at the Swiss National Supercomputing Centre (CSCS). The tool works with the NCO (netCDF Operators) and CDO (Climate Data Operators) software, as well as with the Python packages datetime, netCDF4 (version 1.4) [4], cftime and numpy. It was developed and improved at ETH and can be found on the ETH GitLab under the following link: https://gitlab.ethz.ch/hymet/CCLM2CMOR_hymet. With some exceptions mentioned below, all of the work for steps 1 and 2 was done on nodes of the GPU partition of CSCS. Step 3 was done on the Jülich Supercomputing Centre (JSC) server.

2. Post-processing

The raw data are located in /store/c2sm/pr04 on CSCS and are linked to the workspace $SCRATCH. To eliminate the relaxation zone of the raw data, a certain number of grid points is removed on each side of the domain in every file. The number of grid points differs per domain (see table 2).

Table 2: Overview of the five domains, their number of grid points, cut-off (grid points removed per side; total removed per direction in parentheses), relaxation zone and resulting domain size. t depends on the final time resolution (1hr: t=8760; 3hr: t=2928; 6hr: t=1464; day: t=365; mon: t=12).

Domain Domain size/grid points (h x w x v) Cut off Relaxation zone [km] Final domain size (h x w x t)
ALP-3 801 x 801 x 60 25 (50) 170 751 x 751 x t
EUR-11 361 x 361 x 60 13 (26) 170 335 x 335 x t
REU-3 1542 x 1542 x 60 25 (50) Not applicable 1492 x 1492 x t
CAN-1 1000 x 1000 x 60 100 (200) not given 800 x 800 x t
CAN-11 165 x 165 x 60 13 (26) 170 139 x 139 x t
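
The cut itself is performed by the post-processing scripts (see below). For illustration only, the ALP-3 cut in table 2 corresponds to an NCO hyperslab of the following form (a sketch; the dimension names rlon/rlat are assumed from the rotated COSMO grid):

# remove 25 grid points on each side of the 801 x 801 ALP-3 domain -> 751 x 751
ncks -d rlon,25,775 -d rlat,25,775 infile.nc outfile.nc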

The variables (raw data) are processed at certain temporal frequencies. Variables that share the same frequency are stored in one netCDF file; for example, geopotential height and relative humidity are both given at 6-hourly resolution and are thus found in the same file for a given time step. In addition to the numerous climate variables, two constant variables that do not change with time are provided per simulation: orography and land fraction.

Next, the files are split into their individual variables, which are placed in separate folders to fulfill the requirements of the second step (CMORizing). The script “master_post.sh” proceeds in two parts: (i) via the script “first.sh”, the directory /outputpost is created and the variables are split into monthly files and put into monthly directories (YYYY_MM directories in /outputpost, see figure 1); (ii) the script “second.sh” then concatenates the variables into yearly files and puts them into one directory per variable (folder /input_CMORlight). The running process is documented in the folder /logs/shell with .log files, whereas .err files record failed processes.

Before running the “master_post.sh” script, some initial information needs to be set up: which variables are processed, how large the boundary cut-off is, and the paths to the raw and output data (“settings.sh”). The file “timeseries.sh” defines which variables get processed and in which directory the raw input data are found (e.g. 1h, 1h_second, 24h). The variables wind, geopotential height, specific and relative humidity, air temperature and upward air velocity are additionally defined on six pressure levels p (200, 500, 700, 850, 925, 1000 hPa). Furthermore, northward and eastward winds are provided at the 100 m height level z. A variable is processed on a pressure or height level via the command timeseriesp or timeseriesz (or timeseries for variables that are not defined on one of the mentioned levels). If a line is commented out in “timeseries.sh”, the respective variable will not be processed by the CMOR tool. This means that not all variables are calculated in every simulation; for example, the snowfall flux is only available for simulations 11, 12, 15 and 16.

After setting everything up, we started the post-processing by calling “sbatch master_post.sh -g GCM -x EXP -s YYYY -e YYYY” from the terminal, where -g names the GCM of the simulation, -x refers to the experiment (e.g. evaluation or ff-RCP85), and -s (-e) refers to the start (end) year of the processing.
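
For example, a call for the evaluation period of simulation 1 could look as follows (the labels ERAINT and evaluation are illustrative placeholders; the actual names are those defined in “settings.sh”):

sbatch master_post.sh -g ERAINT -x evaluation -s 2000 -e 2009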

(Image: run_master_post_new.png)

Figure 1: Folder structure and directories created by running the post-processing step.

Challenges and Issues during the Post-processing

The post-processing came with some issues and challenges; the main ones are described in the following.

Missing Files

To replace a corrupted or missing file, we performed the following steps. First, the file before or after the faulty one was copied. This copy was then converted into a .cdl file using the following command:

ncdump file.nc > file.cdl

In the .cdl file, the time and time_bnds variables were manually modified according to the date of the missing/corrupted file. To change the time, given in the format ‘seconds since xxxx’, 3600 seconds were added or subtracted to shift the time by 1 hr (depending on whether the file before or after the missing file was copied). Other time resolutions were treated accordingly. Additionally, the name at the top of the .cdl file was changed to the correct name (according to the missing time step). Then a netCDF file with the correct name was created with the following command:

ncgen -o file.nc file.cdl

Lastly, the .cdl-file was removed with

rm file.cdl
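
Alternatively, assuming NCO's ncap2 is available, the one-hour shift can be applied directly to the copied netCDF file, without the .cdl round trip (a sketch with placeholder file names):

# copy the file preceding the gap and shift its time axis forward by one hour
cp file_before.nc file_missing.nc
ncap2 -O -s 'time=time+3600; time_bnds=time_bnds+3600' file_missing.nc file_missing.nc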

The replaced files are tracked in the following document (sheet ‘Missing Data’): Overview Corrupted Data

Precipitation Handling

The model outputs precipitation at six-minute intervals. Some of these files are available only in daily format, where the original 6-min data have been concatenated into daily files. These files need to be split up again and re-concatenated into hourly files for further processing by the CMOR tool. Because this process is very time-intensive, especially for very-high-resolution simulations, it is done on a monthly basis so that the months can be processed in parallel via sbatch on Piz Daint.

The script that splits the 24h files into 6-min files is called split_up_monthly.sh and can be found in CCLM2CMOR_hymet/precipitation (GitLab ETHZ). The matching script that concatenates the data into hourly files is pp_TOT_PREC.py; to run it, the Python packages datetime, matplotlib and numpy need to be installed. This script has been adapted from a script graciously provided by Ruoyi Cui. A minimal submission loop is sketched below.
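
A minimal submission loop could look as follows (a sketch; we assume, hypothetically, that split_up_monthly.sh takes the year and month as arguments; check the script header in the repository for the actual interface):

# submit one splitting job per month of a given year
for mm in $(seq -w 1 12); do
    sbatch split_up_monthly.sh 2005 $mm
done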

Wind Correction from Staggered to Mass Grid

The raw output files of the simulation runs provide the following wind variables on the staggered grid (“luvmasspoint=FALSE”): U200p-U1000p, U100z, V200p-V1000p and V100z (the exceptions are U_10M and V_10M). U and V denote the eastward and northward winds, 200p-1000p the six pressure levels, and 100z the height level the winds refer to. U_10M and V_10M are the eastward and northward near-surface winds. The CORDEX requirements expect the wind variables on the mass grid; thus, the variables are transferred to the mass grid. The applied script can be found here: CCLM2CMOR_hymet/winds (GitLab ETHZ).

The modified data are then used as input for the post-processing. Note that these modified data have to replace the old input data for the wind variables to be processed correctly. To avoid any confusion, we post-processed the modified wind variables separately from the other variables.

Memory Issues during 2nd Step of Post-processing

This issue only applies to the very memory-intensive simulations (REU-3 and CAN-1). The memory available on a GPU node is restricted to 60 GB, which was insufficient to process all variables of these two domains. Thus, we asked CSCS for access to the multicore nodes, which can process data up to 120 GB. Due to the limited multicore node hours, we used them only for the variables of the CAN-1 and REU-3 domains. For the variables ASWD_S, RUNOFF_T and TQW (rsds, mrro and clwvi after CMORizing, respectively) [5], with a size of 80 GB per file, this was a necessary decision. However, these three variables (hereafter memory variables) still ran out of memory during the post-processing.

To avoid these memory issues, we proceeded differently for the memory variables. First, we modified the call to master_post.sh by appending “--first”, so that only the script “first.sh” is carried out. To prevent the memory error, we processed all other variables separately from the three memory variables: in “timeseries.sh” we commented the memory variables out and then called “sbatch master_post.sh … --second” to process all other variables with the script “second.sh”. Once this was done, all other variables were commented out in “timeseries.sh” and the memory variables were re-enabled and processed. For this, we replaced the script “second.sh” by the modified script “second_memory.sh”, a rewrite of “second.sh” that handles the memory variables on a monthly instead of a yearly basis and thereby avoids the “out-of-memory” error. However, running this script takes a long time (sometimes more than 12 hours per variable); we therefore advise starting each variable in a separate batch job to make sure the job does not run out of time. To process the memory variables individually, two of the three function calls in lines 164, 167 and 170 were commented out, keeping only one per run.
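
Put together, the workflow for one simulation looks roughly as follows (the labels ERAINT and evaluation are placeholders; the edits to “timeseries.sh” are done by hand between the calls):

# phase 1: split all variables into monthly files
sbatch master_post.sh -g ERAINT -x evaluation -s 2000 -e 2009 --first
# phase 2a: comment the memory variables out in timeseries.sh, then concatenate the rest
sbatch master_post.sh -g ERAINT -x evaluation -s 2000 -e 2009 --second
# phase 2b: re-enable one memory variable in timeseries.sh, comment out all others,
# replace second.sh by second_memory.sh, and submit one job per memory variable
sbatch master_post.sh -g ERAINT -x evaluation -s 2000 -e 2009 --second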

Storage Capacity

Post-processing all data of the 20 simulations creates a huge amount of data, approximately 500 TB in total. Half of the data (250 TB) are created in the first post-processing part (found in /outputpost), the other half in the second part (found in /input_CMORlight). The data of the two parts contain the same information but are structured differently (important for CMORizing); thus the data double the required disk space to 500 TB. We decided to keep all the data in case of incorrect operations during the post-processing, as especially the first part took considerable time and effort. After CSCS reminded us of the rules and guidelines against storing all data on $SCRATCH, we moved part of the data to $PROJECT (limited storage space) to save disk space on $SCRATCH. We deleted the post-processed data as soon as the CMORizing (section 3) and the quality check (section 4) had completed successfully.

Scratch Filesystem Quota

The filesystem on Piz Daint has a quota of a maximum of one million files per user on their personal $SCRATCH at any given time. For instance, the precipitation 6-min data sum up to about 10 years * 365 days * 24 h * 10 files/hour = 876,000 files (more with leap years). Note that soft-link files also count towards the file-number quota, even though they do not affect the storage quota; this needs to be kept in mind in the next step, i.e., the CMOR input directory. We regularly exceeded the $SCRATCH quota since we ran most parts of the post-processing (section 2) and the CMORizing (section 3) in parallel for multiple simulations. This blocked any further work until the file count dropped below the quota again, resulting in a slowdown. We therefore requested an increased file quota from CSCS for the post-processing and the precipitation preparation, which was granted at three million files; this allowed more work in parallel and sped up the process.
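
To keep an eye on the quota, the current file count can be checked with a standard find command (a generic sketch; both regular files and soft links count towards the quota):

find $SCRATCH \( -type f -o -type l \) | wc -l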

3. CMORize (Climate Model Output Rewriting)

The CMORizing is the second main part of the tool. It transforms the data to meet the official CORDEX-FPS Convection standard [6]. This standard brings all data of the project into a common format, which is required in an internationally shared project to assure uniformity of the data. This refers not only to the netCDF format, but also to metadata such as global and variable attributes, the file names and the directory structure. With the strict CORDEX requirements, all data can be easily differentiated, compared and used in future research studies. Furthermore, end users get a standardized data source and do not have to prepare all the data themselves before working with them.

To prepare the first step of the CMORizing, we adapted the tool's initial files. We had to provide separate coordinate files: in case a file was missing its coordinates, they could be added from the corresponding coordinate file. Every domain needed one coordinate file, so in our case we had to prepare five files, for EUR-11, ALP-3, REU-3, CAN-11 and CAN-1. Secondly, a CSV list contains information about all variables and how they are processed (see FPS_Conv_CMOR_COSMO-crCLIM_variables_table_final_updated.csv in GitLab ETHZ). The list encompasses how variables are renamed [7], at which time resolution(s) they are provided, whether they are accumulated in time or instantaneous, and whether a variable is calculated as its minimum, mean or maximum. Further, it defines whether wind variables get derotated. The list also shows which variables are derived from other variables; in our case, four variables are calculated as follows:

Table 3: Calculation of additional variables

Calculation of additional variables
Variable names before CMORizing (as they will be found in the scripts)
ASWD_S = ASOB_S + ASWDIFU_S
RUNOFF_T = RUNOFF_G + RUNOFF_S
TQW = TQC + TQI
FR_SNOW = Max(0.01,Min(1.,W_SNOW/0.015))*H(x)

Note that we did not keep all of the variables used to derive other ones, e.g. ASOB_S, RUNOFF_G or TQC, since they were not required for the final dataset. For example, ASOB_S was only needed to calculate ASWD_S (the surface downwelling shortwave radiation, “rsds”). A more thorough explanation of the CSV table can be found here: CCLM2CMOR/Documentation/Explaining_CSV_table.pdf (GitLab ETHZ).
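
The derivations in table 3 are performed internally by the tool. For illustration only, the first sum could also be reproduced manually with CDO:

# derive ASWD_S from its two components (illustration only; normally done by the tool)
cdo expr,'ASWD_S=ASOB_S+ASWDIFU_S' infile.nc outfile.nc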

We also adapted the “ini-file” (short for “control_cmor.ini”). This file defines the naming of the data, the directory structure, and the global and variable attributes, as well as all information about the experiment, the domain, the model setup, the nesting level and the institution. For example, for simulations run by ETH, the folder structure, the metadata and the file names will include the name “CLMcom-ETH”, our institution ID within this project.

The CMOR process is started by running “master_cmor.sh”. Figure 2 shows the directories created during this process. The extensive folder structure on the right of the figure reflects the organization required by the CORDEX standard. The directories make it possible to differentiate and select between the various simulations: first, the project of interest (here CORDEX-FPS Convection) is chosen via the project ID; then the domain, the institute that ran the simulations, the general circulation model and the experiment can be picked. The driving model ensemble member and the model ID are also given. The driving model ensemble member differentiates between r0i0p0 and r1i1p1, where the former holds the constant variables and the latter the climate variables. Finally, the requested temporal resolution and the variable can be selected. This sophisticated structure provides unambiguous information about every single file, which is also reflected in the filename.

The final script is run by calling “sbatch master_cmor.sh -g GCM_input -x EXP_input -G GCM_CMOR -X EXP_CMOR -v/--all -s startdate -e enddate -M 24”. Note that -g and -x recall the GCM and experiment names from part 1, the post-processing; this information is necessary for the tool to find the input data. The additional arguments -G and -X define the names of the final outputs in accordance with the CORDEX standard. If -G and -X are not set in the script “settings.sh” (GCM = ... and EXP = ...), they have to be given explicitly on the command line. The argument -v specifies the variables to be CMORized.

If -v is replaced by --all, all variables are processed automatically. With -M, multiprocessing is used by specifying the number of available cores. CMORizing is mainly based on the script “cmorlight.py”, in which all directories are created, files are renamed and each variable is prepared at the required resolution(s); global and variable attributes and time bounds are also added. The files are processed yearly.
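
As an example, a call for the ALP-3 evaluation run could look as follows (the -g/-x/-G/-X labels are illustrative placeholders, not the exact names used in production):

sbatch master_cmor.sh -g ERAINT -x evaluation -G ECMWF-ERAINT -X evaluation --all -s 2000 -e 2009 -M 24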

(Image: run_master_cmor.png)

Figure 2: Schematic depiction of the folder structure after CMORizing. The final directory structure is created, including the correct labeling and the required time resolutions. The left side shows the directory structure of the CMOR tool, including where the tool's log files (.out and .err files) are placed; the right side shows the directory structure created by the CMOR tool to store the variables after CMORization is finished.

Challenges and Issues

Simulation 2 is replaced by Simulation 12

Due to the CORDEX standard and thus the uniform labeling of the files, simulations 2 and 12 would have the same names and directories. They were essentially run under the same experimental boundary conditions; however, the two simulations are not identical (simulation 2 was run under Jesús' setup, simulation 12 under Nikolina's). We briefly addressed this topic in section 1, "Introduction/Model Setup". Replacing simulation 2 with simulation 12 removed many limitations, such as the missing data in simulation 2, particularly in the precipitation information. When interpreting "simulation 2", keep in mind that it is actually simulation 12. Consequently, we only proceeded with 19 instead of 20 simulations from here on.

Memory Error for Variables of REU-3 Domain

After solving the memory issue in the post-processing, the variables RUNOFF_T and RUNOFF_S (mrro and mrros in CORDEX format, respectively) caused memory errors for the REU-3 and CAN-1 domains during CMORizing. Even on the multicore partition, these two variables could not be processed. Thus, we had to process them monthly instead of yearly; the monthly files were then merged back into yearly files. The monthly CMORization is explained in detail in the following.

The CMOR tool has an option to CMORize the variables on a monthly instead of a yearly basis. If the user does not want to CMORize all months from January until December, the start and end month can be set individually. However, this option comes with restrictions: if only the start (end) month is selected, the end (start) month is automatically set to December (January), and if both start and end month are changed manually, the tool automatically tries to process the whole year from January to December. The workaround, without major changes to the code, is to manually assign the desired months in the script “cmorlight.py” in line 125 with the variable “firstlast=[start,end]”, after the if-else statement. Start and end refer to the months in numbered form, e.g. 1 for January, 2 for February, etc. This way, the variables can be CMORized in chunks of two to three months at a time, which does not cause a memory error; a file is created that contains only the data of the selected months. While it is possible to submit jobs for all years for one chunk (e.g. Jan-Feb), these jobs have to finish before the “firstlast” variable is modified again for the next chunk; otherwise the script fails to run through. For example, first the months Jan-Feb are CMORized for all years of the simulation; only after that is finished (!) are the months Mar-Apr CMORized. The calculation worked well for Jan-Feb, Mar-Apr, May-Jun, Jul-Aug, Sep-Oct and Nov-Dec. When all processes are finished, the two-monthly files are found in the /temp folder and have to be merged in time into a yearly file:

cdo mergetime *year* outfile

Finally, the resulting files need to be compressed again, as merging decompresses them:

nccopy -d1 ifile ofile
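
Looping over the years of a simulation, the merge and compression for, e.g., mrro could look as follows (a sketch with placeholder paths and file patterns):

# merge the two-monthly chunks of each year and re-compress the result
for year in $(seq 2000 2009); do
    cdo mergetime temp/mrro_*${year}*.nc mrro_${year}_merged.nc
    nccopy -d1 mrro_${year}_merged.nc mrro_${year}.nc
done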

Corrupted Reference Times

Some files within a year were corrupted, particularly their reference times, which caused issues during CMORizing because the time series did not fit together. These post-processed data therefore had to be deleted and recreated. First, the time unit had to be set to the correct unit (here: seconds):

cdo settunits,seconds infile.nc outfile.nc

Second, the reference time was updated to the correct date, e.g. from “1999-02-01T00:00:00” to “1946-01-01T00:00:00”:

cdo -setreftime,'1946-01-01','00:00:00' infile.nc outfile.nc

Unfortunately, the command had been updated a few years ago so that the new reference time is written not as “1946-01-01T00:00:00” but as “1946-1-1 00:00:00”, which prevented a successful post-processing. The reference time therefore had to be overwritten manually. Manually overwriting is only reasonable/appropriate here because the reference time had already been adapted with the commands above. Without the preceding steps, overwriting would change the reference time string but corrupt the timestamps of the edited file: the timestamps are counted in seconds relative to the reference time (which marks the starting point at 0 seconds), and overwriting the attribute alone does not adjust this counting, effectively setting the timestamps to the same date as the reference time. It was therefore important to keep the correct order of the commands. The manual overwrite was done with:

ncatted -a 'units,time,o,c,seconds since 1946-01-01T00:00:00' infile.nc outfile.nc

These commands were looped over all affected files. Afterwards, the corrected files could be successfully post-processed and CMORized.
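
Such a loop could look as follows (a sketch; the file pattern is a placeholder):

for f in *.nc; do
    cdo settunits,seconds $f tmp_$f
    cdo -setreftime,'1946-01-01','00:00:00' tmp_$f fixed_$f
    ncatted -a 'units,time,o,c,seconds since 1946-01-01T00:00:00' fixed_$f
    rm tmp_$f
done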

4. Quality Check

The Quality Assurance Tool (“QA-Checker”) is available at the Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich in Germany; it is therefore necessary to get access to the JSC [8]. The QA-Checker has been adapted from the widely used CORDEX QA-Checker by Andreas Dobler (Norwegian Meteorological Institute) to meet the CORDEX-FPS Convection structure. It is used by all institutions involved in the FPS Convection project and aims to ensure that everyone followed the CORDEX standard, creating a uniform dataset. The QA-Checker verifies, for example, the data structure, time gaps, coordinate values and suspicious min/max outliers. It is designed to detect irregularities in the data even across files and writes warning or error messages in case of deviations. Error messages have to be addressed, whereas warnings might be ignored, depending on the individual warning; we discuss this further in section 4, “Challenges and Issues/Examples of warning/error messages by the QA-Checker”, below.

After creating an account at Jülich and uploading the data to the Jülich server (for example via the “rsync” command), the QA-Checker can be installed via the provided GitHub repository (QA-Checker Repository) [9]. The readme file in the repository gives a good overview of the first steps needed to run the QA-Checker; we also give a brief outline here. Following the installation, the QA-Checker task file “qa-test.task” can be set up. The most important things to adjust are:

1. The path to the CMORized netCDF files to be checked, on jsc-cordex (replace $USER and $FPSCONV_ROOT accordingly); the checker goes through the directory hierarchy recursively, for example:

PROJECT_DATA=$FPSCONV_ROOT/CORDEX-FPSCONV/output/ALP-3/FZJ-IDL/SMHI-EC-Earth/rcp85

2. The directory for the results of the check; the output is highly structured:

QA_RESULTS=$FPSCONV_ROOT/tmp/$USER/QA/results/simALP3_010122

Note that this result output directory has to differ for each new check! Otherwise, running the checker leads to a segmentation fault. We strongly recommend including the simulation, the checked variables (e.g. all or pr) and perhaps the date of the QA-Check run in the directory name, to later differentiate between check runs.

3. Which variables and which time interval to check; more options are possible. Everything the QA-Checker recursively finds under $PROJECT_DATA can be selected, as shown in the following examples:

SELECT .*/1hr/* # check all variables at 1hr resolution
SELECT .*/* # check all variables
SELECT .*/1hr/cll # check all files of the cll-variable at 1hr resolution

To run the checker the following command is executed in the qa-dkrz-check directory:

./$FPSCONV_ROOT/Software/adobler/git/QA-DKRZ_FPSCONV/scripts/qa-dkrz -f qa-test.task

Custom QA-tables

The official tables did not include every variable that needed to be tested. The solution is to add the missing variables to the custom tables, specifically to the file /home/$USER/qa-dkrz-check/QA_Tables/tables/projects/CORDEX-FPSCONV/CORDEX_variables_requirement_table.csv. We added the variables clh, cll, clm, hurs, rlut, snw, hurs1000, hurs200, hurs500, hurs700, hurs850 and hurs925, based on the existing entry for cape. Our modified table can be found here. Likewise, the tables did not include all of our domains that needed to be checked, so we added them to the file “CORDEX_DRS_CV.csv”.

Checking all variables at the same time may take a very long time. To ensure that the QA-Checker is not interrupted, use “screen”, a Linux tool that keeps processes running when their window is not visible, even if the window is closed entirely or the connection is otherwise lost. This is also helpful to later check how many files were processed.
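
A typical session could look as follows (standard screen usage; the session name is arbitrary):

screen -S qa-run      # open a named session
./$FPSCONV_ROOT/Software/adobler/git/QA-DKRZ_FPSCONV/scripts/qa-dkrz -f qa-test.task
# detach with Ctrl-a d; the check keeps running in the background
screen -r qa-run      # reattach later to inspect the progress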

Challenges and Issues

Data Transfer from CSCS to JSC

Since the size of the files often amounts to multiple GB, most simulations sum up to many TB. These data have to be transferred from CSCS to JSC, which is possible because both supercomputers can send and receive such amounts of data; however, it still takes a long time. For the ALP-3 and CAN-1 domains, the transfer was interrupted multiple times and had to be restarted. We recommend using the rsync command; even nohup and screen could not help transfer a whole domain at once to the Jülich server, and so far there is no better option than rsync for the complete data transfer. Note that the transfer can only be initiated from the Jülich server; copying data from the CSCS terminal is blocked by the JSC.

The transfer of simulation 11 to Jülich took approximately 4 days, with several restarts of rsync in between. By now, all REU-3 simulations have been uploaded to Jülich.
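
A resumable transfer, started from the Jülich side, could look as follows (host and paths are placeholders; --partial keeps partially transferred files so that an interrupted transfer can be resumed):

rsync -av --partial --progress <user>@<cscs-host>:/path/to/CORDEX-FPSCONV/ $FPSCONV_ROOT/CORDEX-FPSCONV/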

Storage Capacity on JSC

Due to the limited storage capacity assigned to the CORDEX-FPS Convection project on the Jülich server, it would be convenient to check the simulations locally on Piz Daint. The REU-3 simulations require around 20 TB of disk space each, i.e., circa 60 TB in total. In order to check these simulations locally on CSCS instead of uploading them to the Jülich server, we attempted to install the Quality Checker on Piz Daint with the help of C2SM (contact: Michael Jähn). The setup of the Quality Checker was not fully compatible with the structure of Piz Daint: it required python2 and a specific version of the GNU Compiler Collection, neither of which was available on Piz Daint at that time. The environment (e.g. libraries) for running the Quality Checker on Piz Daint was set up. The overall goal of installing the Quality Checker on CSCS is to use it not only for FPS Convection but also for further CORDEX initiatives, which will work with a related Quality Checker version based on the general CORDEX QA-Checker. However, all attempts to run the QA-Checker after installation have so far failed due to an unresolved “segmentation fault” error.

netCDF Format: netCDF4 classic vs. netCDF4

The CORDEX standard requires submitting all data in netCDF4 classic format; our final data were all in netCDF4 format. netCDF4 is the newest version and has “new features such as groups, multiple unlimited dimensions, and new types, including user-defined types” [10]. The rationale for keeping the classic format is backward compatibility with older netCDF versions like netCDF3. However, since the costs of converting to the classic model would exceed the benefit (e.g. HDF5 compatibility), we internally decided to keep the netCDF4 model, which is already common in today's research, widely used by the scientific community, and unlikely to cause problems for end users.

Examples of Warning/Error Messages by the QA-Checker (and Possibilities to solve them)

  • (i)
    • "Unmatched CORDEX boundaries for the Table 1 domain."
    • "CORDEX domain Table 1: Dimension does not match the grid definition."
    • "CORDEX domain Table 1: Dimension does not match the grid definition."
    • "Rotated N.Pole of CORDEX domain Table 1 does not match."
  • Error/Warning because the domain is missing in the CSV table “CORDEX_DRS_CV.csv”
  • Solution: Add the domain if it is not already given (in table 1 of the CSV table “CORDEX_DRS_CV.csv”)

  • (ii)
    • "CORDEX requires netCDF4 classic deflated (compressed), found netCDF4, deflated."
  • Error/Warning because the files are not in netCDF4 classic format. The classic format is requested in case users want to convert to netCDF3 format, etc.
  • Solution: Convert the netCDF format into netCDF4 classic, e.g.: nccopy -d 1 -k 4 -s infile.nc outfile.nc. Note: our data are provided in netCDF4; see the discussion above.

  • (iii)
    • "Resolution of CORDEX domain=Table 1 does not match." * "Domain does not match Table 2 of the CORDEX Archive Design."
  • Error/Warning because domain of CSV table “CORDEX_DRS_CV.csv” does not correspond in its listed settings of table1 (coordinate data) to the characteristics of the file.
  • Solution: The table 1 has to be adapted in the CSV file according to the checked file data, implying that the coordinate data of the file are correct.

  • (iv)
    • "Gap between time bounds ranges across files."
    • "Gap between time values across files."
  • Error/Warning: A time gap is found in the time series between two (or more) files. It needs to be checked whether the data are incorrect or incomplete.
  • Solution: Sometimes one directory contained two or three simulations, as experiments were run for different periods. The error arose from the gaps between a near-future experiment (2044-2053), a far-future experiment (2080-2089) and a third future experiment (2092-2101). Thus the warning can be ignored in our case. However, the QA tool will stop after this error message and not run through all files. To prevent this, the following command can be added to the end of the task file “qa-test.task”: note={6_13,L1}

  • (v)
    • "Pressure level value=100m in the filename is inappropriate."
    • "Auxiliary variable is missing."
  • Error/Warning: The filename is flagged as inappropriate because it refers to the variables ua100m and va100m. The tool checks wind variables against the pressure levels 200, 500, 700, 850, 925 and 1000 hPa, whereas ua100m and va100m are given at a height level in meters (100 m). This is not yet considered in the tool but might be added in a future version.
  • Solution: Ignore the warning, since it merely reflects a current limitation of the QA-Checker.

5. Data Publishing

The data will be published on the ESGF node, where the data from all institutes will be uploaded; users can thus find the different simulations, run with different models, etc. For information on accessing the data, see here.

6. Contact for CORDEX Simulations at ETH

Official contact for CORDEX at ETH = cordex-eth@env.ethz.ch

7. Contribution Statement

Alicia Engelmann and Luna Lehmann equally carried out the preprocessing, CMORization and quality checking. Marie-Estelle Demory provided support on technical issues. Alicia Engelmann and Luna Lehmann prepared the first version of the documentation; Alicia Engelmann, Luna Lehmann and Patricio Velasquez extended it further. The work was supervised by Patricio Velasquez and Christoph Schär.

8. Acknowledgment

Special thanks to Jesús Vergara-Temprado and Nikolina Ban for setting everything up, running the simulations and providing all raw data, which sum up to multiple TB. Thanks to Ruoyi Cui for providing the main script and the environment to convert the wind variables from the staggered to the mass grid, as well as the original script to rearrange the 6-min precipitation data into hourly files.

Thanks to Michael Jähn (C2SM) for the technical support concerning the installation of the QA-Checker on Piz Daint (CSCS).

Thanks to Andreas Dobler and Klaus Görgen for technical and administrative support regarding the QA-Checker on JSC.

Thanks to Christian Zeman for reviewing this documentation and for the constructive input.

9. Footnotes

[1] For an overview of the contributions of other institutions see: https://hymex.org/cordexfps-convection/wiki/doku.php?id=modellist (last access: 07.02.2023)

[2] An overview of all variables (including missing data) is found here or in section “12.2. Missing Data”.

[3] For more information see: Schär et al. (1996) and Brogli et al. (2023)

[4] Note that the netCDF4 module version has to be 1.4, as there were issues running the script with newer versions of the module.

[5] ASWD_S = rsds = Surface Downwelling Shortwave Radiation, RUNOFF_T = mrro = Total Runoff, TQW = clwvi = Condensed Water Path, snc = Snow Area Fraction; For more information about the variable names see: Naming Of Variables

[6] Find more information of the CORDEX requirements regarding the FPS Convection project here: https://www.hymex.org/cordexfps-convection/wiki/doku.php?id=protocol (last access: 20.02.2023)

[7] Note: the variable names of the raw output and the post-processing differ from the variable names after the CMORizing

[8] Contact person: Klaus Görgen; general information: https://www.fz-juelich.de/en/ias/jsc

[9] Note that the QA-Checker for FPS Convection might have been moved to an official CORDEX repository in the meantime

[10] https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html (last check: 02.02.2023)

10. References

Ban et al., 2021, Climate Dynamics, The first multi-model ensemble of regional climate simulations at kilometer-scale resolution, part I: evaluation of precipitation, doi: 10.1007/s00382-021-05708-w

Brogli et al., 2023, Geoscientific Model Development, The pseudo-global-warming (PGW) approach: methodology, software package PGW4ERA5 v1.1, validation and sensitivity analyses, doi: 10.5194/gmd-16-907-2023

Coppola et al., 2020, Climate Dynamics, A first-of-its-kind multi-model convection permitting ensemble for investigating convective phenomena over Europe and the Mediterranean, doi: 10.1007/s00382-018-4521-8

Dee et al., 2011, Quarterly Journal of the Royal Meteorological Society, The ERA-Interim reanalysis: configuration and performance of the data assimilation system, doi: 10.1002/qj.828

Giorgetta et al., 2013, Journal of Advances in Modeling Earth Systems, Climate and carbon cycle changes from 1850 to 2100 in MPI‐ESM simulations for the Coupled Model Intercomparison Project phase 5, doi: 10.1002/jame.20038

Giorgi and Gutowski, 2015, The Annual Review of Environment and Resources, Regional Dynamical Downscaling and the CORDEX Initiative, doi: 10.1146/annurev-environ-102014-021217

Hong and Kanamitsu, 2014, Asia-Pacific Journal of Atmospheric Sciences, Dynamical Downscaling: Fundamental Issues from an NWP Point of View and Recommendations, doi: 10.1007/s13143-014-0029-2

Leutwyler et al., 2016, Geoscientific Model Development, Towards European-scale convection-resolving climate simulations with GPUs: a study with COSMO 4.19, doi: 10.5194/gmd-9-3393-2016

Pichelli et al., 2021, Climate Dynamics, The first multi-model ensemble of regional climate simulations at kilometer-scale resolution part 2: historical and future simulations of precipitation, doi: 10.1007/s00382-021-05657-4

Schär et al., 1996, Geophysical Research Letters, Surrogate climate-change scenarios for regional climate models, 23(6), 669-672, doi: 10.1029/96GL00265

11. List of Abbreviations

CDO Climate Data Operators
CMIP5 Coupled Model Intercomparison Project (phase 5)
CMOR Climate Model Output Rewriting
CORDEX Coordinated Regional Climate Downscaling Experiment
COSMO Consortium for Small-scale Modeling
CP-RCM Convection Permitting Regional Climate Model
CSCS Swiss National Supercomputing Centre
ECMWF European Centre for Medium-Range Weather Forecasts
EUCP European Climate Prediction System
ESGF Earth System Grid Federation
FPS Flagship Pilot Studies
GCM General Circulation Model
JSC Jülich Supercomputing Centre
LR Low Resolution
MPI-ESM Max Planck Institute Earth system model
NCO netCDF Operators
PGW Pseudo Global Warming
RCM Regional Climate Model
RCP8.5 Representative Concentration Pathway 8.5 W/m2

12. Supplementary Information

12.1. Slides of the presentation on 06.02.2023 (Group Meeting Hymet)

Date Speaker Title File
06 Feb 2023 Alicia Engelmann and Luna Lehmann ETH Contribution to km-resolution Climate Simulations: The Full Datasets - CORDEX-FPS Convection PDF

12.2. Missing Data

To get an overview of all missing data, see the table Missing Data (sheets: “Missing Data” and/or “Missing Data - All Vars”)

12.3. Namelist Files

Simulation 1:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/GA_fine_ERA/4_lm_f/YUSPECIF
  • txt file: SIM1_YUSPECIF.txt

Simulation 2:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/GA_fine_ERA/2_lm_c/YUSPECIF
  • txt file: SIM2_YUSPECIF.txt

Simulation 3:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/GA_fine_PGW_ff/4_lm_f/YUSPECIF
  • txt file: SIM3_YUSPECIF.txt

Simulation 4:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/PGW_ff_12km/2_lm_c/YUSPECIF
  • txt file: SIM4_YUSPECIF.txt

Simulation 5:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_present/4_lm_f/YUSPECIF
  • txt file: SIM5_YUSPECIF.txt

Simulation 6:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_present/2_lm_c/YUSPECIF
  • txt file: SIM6_YUSPECIF.txt

Simulation 7:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_near_future/4_lm_f/YUSPECIF
  • txt file: SIM7_YUSPECIF.txt

Simulation 8:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_near_future/2_lm_c/YUSPECIF
  • txt file: SIM8_YUSPECIF.txt

Simulation 9:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_far_future/4_lm_f/YUSPECIF
  • txt file: SIM9_YUSPECIF.txt

Simulation 10:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MPI_far_future/2_lm_c/YUSPECIF
  • txt file: SIM10_YUSPECIF.txt

Simulation 11:

  • Path on CSCS: /store/c2sm/pr04/banni/results_pompa_crCLIMrun2/debug/YUSPECIF_lm_f
  • txt file: SIM11_YUSPECIF.txt

Simulation 12:

  • Path on CSCS: /store/c2sm/pr04/banni/results_pompa_crCLIMrun2/debug/YUSPECIF_lm_c
  • txt file: SIM12_YUSPECIF.txt

Simulation 13:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/EU_fine_PGW_nf_join/scripts_part1/4_lm_f/YUSPECIF
  • txt file: SIM13_YUSPECIF.txt

Simulation 14:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/PGW_nf_12km/2_lm_c/YUSPECIF
  • txt file: SIM14_YUSPECIF.txt

Simulation 15:

  • Path on CSCS: /store/c2sm/pr04/banni/results_pompa_crCLIMrun2_pgw/debug/YUSPECIF_lm_f
  • txt file: SIM15_YUSPECIF.txt

Simulation 16:

  • Path on CSCS: /store/c2sm/pr04/banni/results_pompa_crCLIMrun2_pgw/debug/YUSPECIF_lm_c
  • txt file: SIM16_YUSPECIF.txt

Simulation 17:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MAC1/4_lm_f/YUSPECIF
  • txt file: SIM17_YUSPECIF.txt

Simulation 18:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MAC12/2_lm_c/YUSPECIF
  • txt file: SIM18_YUSPECIF.txt

Simulation 19:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MAC_PGW/4_lm_f/YUSPECIF
  • txt file: SIM19_YUSPECIF.txt

Simulation 20:

  • Path on CSCS: /store/c2sm/pr04/jvergara/RUNS_IN_SCRATCH/MAC_PGW/2_lm_c/YUSPECIF
  • txt file: SIM20_YUSPECIF.txt
