function [data] = read_woce_netcdf(file_name) % read_woce_netcdf.m % % General purpose script to load all of a netcdf file % using getnc. This is supplementary to the help provided in the % netCDF primer on the WOCE V3 DVDs. It calls the matlab % libraries described in the primer. These must be set up first. % % Example: % % data = read_woce_netcdf('my_file.nc'); % % will return a structure array named data. % % It is necessary to include the trailing .nc in the filename. % % % If the data have been read into a structure array named data % as in the example above % then this can be unpacked to a set of single variables with % the following code: % % sname = 'data' % or whatever the structure array is called. % eval(['struct_var_names = fieldnames(' sname ') ']); % for i = 1:length(struct_var_names) % eval([struct_var_names{i} ' = ' sname '.' struct_var_names{i} ' ;']); % end % % % This script has some extra code to catch a few special cases. % If this code is not required, it could be commented out to increase % speed. The extra code is only required for % sss, pfloat, woce-ssf, sat_sl, sat_sst, sat_mwf % % Brian King, SOC, 23 Sepetember 2002. % % There has not been time for anyone else to review these notes % between BAK drafting them and their inclusion on the WOCE DVDs. % The author therefore apologies for any innacuracies. I hope % they are more of a help than a hindrance. As a netCDF novice, % it took me several; days to figure out quite why some things % weren't working as I expected. % There are two sets of optional code to catch cases where % the basic script doesn't work correctly. By default, these are % left active. So the full script works on all files on which % I have tested it. The optional code can be commented out % for better speed. But there coudl be unexpected combinations % of variables and attributes which don't work. Proceed with % caution when first loading new data types. % Apart from the optional code, there are very few % executable lines. These are reproduced here in a block % to make it easy to see what is going on. The error traps % have also been left out of this summary. %%%%% %%%%% function [data] = br(file_name) %%%%% ncdf_obj = netcdf(file_name); %%%%% var_names = ncnames(var(ncdf_obj)); %%%%% close(ncdf_obj); %%%%% num_vars = length(var_names); %%%%% disp(['file_name = ''' file_name '''']) %%%%% for ivar = 1:num_vars %%%%% name = var_names{ivar}; %%%%% command = ['data.' name ' = getnc(file_name,''' name ''');']; %%%%% disp(command) %%%%% eval(command); %%%%% end %%%%% % % General notes on use of the script: % % The variable file_name is the argument of the function. % % File_name should be of the form 'example.nc' and include % the trailing .nc % % This script worked OK for me in Matlab 6 on % a Sun unix workstation. I'm afraid there hasn't been time % to test it on any other platforms. % % One way to read the data from a file is to use the % ncload command. This is illustrated below. But ncload % doesn't take account of the variable attributes, such % as valid_min/max/range, _FillValue or missing_value. % % % Once we have the list of variable names, use getnc to load % them properly; getnc is supposed to take account of the variable % attributes, and substitute NaN where data match a fill vlaue % or missing value, or are outside a valid range. % % ------------------------------------------------- % See end of script for some important comments about % getnc, which concern the following directories % % sss ) Problems swapping missing values in char arrays to NaN. % pfloat ) % woce-ssf ) % % sat_sl ) Issues concerning use of scale factor in some variables. % sat_sst ) % sat_mwf1 ) % sat_mwf2 ) % ------------------------------------------------- % % % % If getnc is causing problems that can't be solved, % for example incorrectly changing % out_of_valid_range or missing_values to NaN , % then an alternative strategy is to use ncload % to read in the 'raw' values, and write further simple % code to fix particular variables. % % If speed, efficiency, or memory is important, then you could find % out the list of variables that you require, and insert % the particular getnc commands that you require, rather than % loading all variables. % % The relevant getnc comamnds are displayed to the screen when the % program is run. This could be used for building new scripts. % %--------------------------------------------------------- % script starts here with some error traps %--------------------------------------------------------- if nargin ~= 1 %file_name was not supplied. Ask for it. file_name = input('Type the file name : ','s') if length(file_name) == 0 disp('You must supply precisely one argument') disp('in the function call or in response to the prompt.') disp('Type') disp(' ') disp(['help ' mfilename]) disp(' ') return end end if exist(file_name,'file') ~= 2 disp(['The file named ' file_name ' does not exist']) return end %--------------------------------------------------------- % First step is to obtain variable names from the file. % One option would be %var_names = ncload(file_name); % This would load all the variables from the file, and put the % list of variable names into var_names. But this would make no % use of the fill values or valid ranges. % Instead, use the netcdf utility. %--------------------------------------------------------- % var_names = ncnames(var(netcdf(file_name))); % % The above construct works fine to get var_names, but it seems to leave the % netcdf file open at the end of the function. % If the script is used on a large number of files, % (218 in my case) matlab eventually complains. Therefore we must % use a netcdf close command. ncdf_obj = netcdf(file_name); var_names = ncnames(var(ncdf_obj)); close(ncdf_obj); % var_names is a cell array with a list of variable names % as character strings % Also loads the variables num_vars = length(var_names); % number of variables disp(['file_name = ''' file_name '''']) for ivar = 1:num_vars name = var_names{ivar}; % take each name in turn command = ['data.' name ' = getnc(file_name,''' name ''');']; % Use getnc to load the data into a structure % array called data. The structure array makes it easy % to pass data out of the function. % A simpler construct would be % command = ['data.' name ' = getnc(file_name,name);']; % but the echo of the command to the screen wouldn't be so helpful % The basic simple form of the getnc command would be % depth = getnc('example.nc','depth'); % to read a variable called 'depth' from a % file called 'example.nc' % There are two sets of conditions for which the above command % doesn't produce the correct results. These are discussed extensively % in comments at the end of this script. To handle these without % error, some non-default options are required on getnc. In this % default version of this script, these options are included. When not % using the particualr datasets, the extra lines can be commented out, % which will increase speed and efficiency. %--------------------------------------------------------- % % Special case 1. % % Some data files define an attribute scale_factor for one % or more variables. % getnc recognises this attribute, and then has to decide whether % to scale the data, and possibly other attributes, such as % fill value or valid range. The default action is % to scale data AND attributes. See help getnc for details. % For all the WOCE V3 datasets identified by BAK, which have defined % both scale_factor and valid range attributes, valid range attributes % are always in scaled units rather than unsacled units. % Therefore these attributes must NOT be scaled again by getnc. % Therefore, we need to set non-default action for getnc. % The datasets concerned are: SAT_SL, SAT_SST and SAT_MWF. % Note that woceflux uses scale_factor, but not valid range, % so the default values of getnc are OK. % Where needed, change rescale_opts in getnc from [1 1] to [1 0]. % command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1 0]);']; %--------------------------------------------------------- % The next segment of code can be commented out for some data files. [att_vals att_names] = attnc(file_name,name); att_num = length(att_names); clear i_scale i_min i_max i_range for i_att = 1:att_num if strcmp(att_names{i_att},'scale_factor') i_scale = 1; end if strcmp(att_names{i_att},'valid_min') i_min = 1; end if strcmp(att_names{i_att},'valid_max') i_max = 1; end if strcmp(att_names{i_att},'valid_range') i_range = 1; end end if exist('i_scale') & (exist('i_min') | exist('i_max') | exist('i_range')) command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1 0]);']; end clear i_scale i_min i_max i_range %end of code fragment %--------------------------------------------------------- % % Special case 2. % % getnc searches for attributes called _FillValue or missing_value. % % Where these attributes occur, the data values are compared to the % attribute value. When a match is found, the data value is set to NaN. % If the variable or attribute are of class char (ie character), then the % logic is flawed. The Matlab result could be an error, or a warning with % unpredictable matlab action. In some cases tested by BAK, the % character variable was set to a blank, which was not always % the intention. % The solution in these cases (_FillValue or missing value of type char) % is to suspend checking against fill or missing values. % command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);']; % % Directories requiring the fix: % % sss % pfloat (trajectory files) % woce-ssf %--------------------------------------------------------- % The next segment of code can be commented out for some data files. [att_vals att_names] = attnc(file_name,name); att_num = length(att_names); clear i_fill_miss for i_att = 1:att_num if strcmp(att_names{i_att},'_FillValue') & strcmp(class(att_vals{i_att}),'char') i_fill_miss = 1; end if strcmp(att_names{i_att},'missing_value') & strcmp(class(att_vals{i_att}),'char') i_fill_miss = 1; end end if exist('i_fill_miss') command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);']; end clear i_fill_miss %end of code fragment %--------------------------------------------------------- %--------------------------------------------------------- % end of special cases %--------------------------------------------------------- disp(command) % echo to the screen the commands used eval(command); % execute command to load this variable end %--------------------------------------------------------- % end of script. The remainder is comment and explanation. %--------------------------------------------------------- %------------------------------------------------------ % The script above was tested for at least one file from every % WOCE DAC. Where a Data Centre has created files of several % different types for their various data sets, I have tried % to test one file of each type. But I cannot claim % that testing has been exhaustive. % In particular, The above script read files from % the following directories. As far as I can tell the results were % correct. Reading from the other directories will require small % modifications, described below. % % bathymetry % cmdac % pfloat (profiles & trajectories file) % sadcp % sat_mwf1 % sat_mwf2 % sat_sl % sat_sst % sl_fast % slevel_dm % sss % svp % whp (ctd & bottle file) % woce-ssf % woce-uot % woceflux (several file types) % wocemet % % There is extra code to catch two special cases where the % default action of getnc was not satisfactory. These were: % % 1) correct handling of scaling % 2) Correct handling of 'fill values' on character strings % % % ------------------------------------ % Correct handling of scaling % ------------------------------------ % % Directories involved % % sat_mwf1 % sat_mwf2 % sat_sl % sat_sst % woceflux % % ***** The background: % % Some datasets are stored as integers, with a scaling factor. % The scaling factor is stored as an attribute of the variable. % For example, sat_sl stores sea_level as an integer % in the range +/- 32767, with a scaling factor of 0.001 % This saves on storage. % % ***** The problem: % % getnc must take the correct action when reading in the data. % % When getnc reads data, it has separate swithces that tell % it whether or not to scale the data, and whether or not to scale % the attributes. The default action is to scale both data and attributes. % This is not always correct. % % Variable attributes may include missing_value, _FillValue, valid_min % valid_max, valid_range. % % After the data and attributes have been scaled (or not, depending % on the switch settings) the data are compared with the attributes % and set to NaN if they match the fill/missing value or are outside % the valid range. % % Different DACs have used different conventions about whether or not % attributes are scaled, so special action is required. % % ***** The solution: % % getnc controls scaling with an argument rescale_opts. This % is an array of length 2. The first element controls % scaling of data. The second element controls scaling of % attributes. The default is [1 1] = yes and yes. % % Unfortunately, rescale_opts is the 10th argument of getnc. % Fortunately, the unused options can be defaulted to -1. % % Thus the default action is equivalent to % % rescale_opts = [1 1] % sea_level = getnc('example.nc','sea_level',-1,-1,-1,-1,-1,-1,-1,rescale_opts); % % Here is the convention used by DACs for scaling % and the required setting of rescale_opts: % % *** sat_mwf: use rescale_opts = [1 0] % Many variables use scaling % Valid_range in attributes is in scaled units % Fill and missing values in attributes are in unscaled units % Example % short zonal_wind_speed(latitude, longitude) ; % zonal_wind_speed:data_min = -9.75 ; % zonal_wind_speed:data_max = 11.3 ; % zonal_wind_speed:scale_factor = 0.01 ; % zonal_wind_speed:valid_min = -60. ; % zonal_wind_speed:valid_max = 60. ; % zonal_wind_speed:missing_value = 32767s ; % zonal_wind_speed:_FillValue = 32767s ; % % % The use of a scaled valid_range means that the attributes % MUST NOT be scaled again when reading in by getnc. % Otherwise the valid range becomes +/- 0.60, % and nearly all data fall outside the % new valid_range and are replaced by NaNs. % % getnc therefore compares scaled data (eg 32.767) % with unscaled fill and missing values, % and fails to match them. However, the data that are filled % or missing values (read in and converted to 327.67) % lie outside the valid range (+/- 60), so they are caught that way. % % % *** sat_sl and sat_sst: use rescale_opts = [1 0] % One variable in each dataset uses scaling % Valid range in attributes is in scaled units % Fill and missing values in attributes are in scaled units % Example % short sea_level(latitude, longitude) ; % sea_level:data_min = -0.259f ; % sea_level:data_max = 0.37f ; % sea_level:valid_min = -1.5f ; % sea_level:valid_max = 1.5f ; % sea_level:_FillValue = 32.767f ; % sea_level:missing_value = 32.766f ; % sea_level:scale_factor = 0.001f ; % % % The use of a scaled valid_range means that the attributes % MUST NOT be scaled again when reading in by getnc. % Otherwise the valid range becomes +/- 0.0015, % and nearly all data fall outside the % new valid_range and are replaced by NaNs. % % getnc compares scaled data with fill and missing values of 32.767 % and 32.766. Unfortunately, as far as I can tell, the match does not % work properly in matlab. The data are read in as integers (type 'short' % in netCDF) and then scaled by 0.001. The attribute is read in as % type 'float', and results in a slightly different value due to machine % error. However, the data that are filled or missing % values (read in and converted to 32.767 and 32.766) % lie outside the valid range, so they are caught that way. % % % *** woceflux: use rescale_opts = [1 1], so default is OK. % In some datasets, two variables use scaling % Valid_range is not used % Fill and missing values in attributes are in unscaled units % Example % short u(lat, lon) ; % u:scale_factor = 0.1f ; % u:missing_value = 32767s ; % % Since there is no valid_range to be checked, the attributes % MUST be scaled when they are read in by getnc. % % If a data value is read in as an integer 'short' value % of 32767, this is scaled by 0.1 and compared with % the missing_value of 32767, which must also be scaled by 0.1. % % Therefore the default value of rescale_opts is correct for these % files. % % % ----------------------------------- % Correct handling of fill values on character strings % ------------------------------------ % The solution, given in detail after the description of the problems, is to switch % off missing value checking in getnc for these variables. This solution % works for all the problems of this nature I have discovered. % % Directories affected % % sss % pfloat (trajectory files) % woce-ssf % *** woce-ssf % % The problem here concerns attributes of type character, where % a DAC has defined a missing value or a fill value. % % First, we must understand how character variables are stored in matlab % and (I believe) netCDF, though I'm not a netCDF expert. % % For example, in a file from woce-ssf % % dimensions: % time = UNLIMITED ; // (424 currently) % kstatus = 6 ; % variables: % char cstatus(time, kstatus) ; % cstatus:long_name = "status/method values" ; % cstatus:missing_value = " " ; % % % So variable cstatus is a 2-D array. Each element % of the 2-D array is a SINGLE CHARACTER. % One dimension is time. The intention is that the other % dimension allows character strings of length 6 to % provide a status variable. % % When getnc attempts ro read the attribute cstatus:missing_value, % it crashes. It complains that the missing value must be a single % character. I suggest that the problem here is in the netCDF file. % If the dimension is 424x6, the missing value is a single blank % not a string of 6 blanks. % % % *** sss and pfloat trajectory files % % % There is a more general problem with how getnc handles character variables and % missing/fill values. % % sss and pfloat trajectory files both attempt to % define a _FillValue for character variables. % % Example % % dimensions: % n_measurement = UNLIMITED ; // (1628 currently) % string4 = 4 ; % n_history = 1 ; % variables: % char history_software_release(n_measurement, n_history, string4) ; % history_software_release:_FillValue = " " ; % % % The problem here is that the variable is read in to a matlab char array % of dimension 1628 x 1 x 4. These values are then tested against the % fill value of " ". If any values match, getnc attempts to set them % to NaN, which is an error for a single element of a char array. % My version of matlab gives a warning, and sets the element in the % char array to blank. If the _FillValue was blank anyway, this is not % a problem. However, if the _FillValue was a character zero or nine, it becomes % character blank, which is clearly not what the DAC intends. % % *** Solution: % Read in these variables using ncload, or % using getnc with the option to substitute % missing values switched off. see help getnc for details. Note the use % of 1, rather than the default, in the seventh argument. % Example for pfloat trajectory file: % % temp_corrected_qc = getnc(file_name,'temp_corrected_qc',-1,-1,-1,-1,1); % % some code has been inserted in the script which catches all % the cases I know about. It can be commented out if not required. % % Here is a list of attributes that showed up in my test files that % would cause a problem if any elements of the char arrays matched the FillValue. % I therefore recommend users of these files to substitute appropriate getnc commands % of the form illustrated above. % % pfloat/trajectory file % trajectory_parameters:_FillValue = " " ; % inst_reference:_FillValue = " " ; % positioning_system:_FillValue = " " ; % juld_qc:_FillValue = "0" ; % position_accuracy:_FillValue = "9" ; % position_qc:_FillValue = "0" ; % temp_corrected_qc:_FillValue = "0" ; % psal_corrected_qc:_FillValue = "0" ; % cndc_corrected_qc:_FillValue = "0" ; % pres_corrected_qc:_FillValue = "0" ; % history_software_release:_FillValue = " " ; % sss % woce_date:_FillValue = " " ; % woce_time:_FillValue = " " ; % position_qc:_FillValue = "0" ; % sst_qc:_FillValue = "0" ; % sss_qc:_FillValue = "0" ; % woce-ssf % cstatus:missing_value = " " ; % % % ------------------------------------------------------