|
Appendix 2: Example matlab
script
function [data] = read_woce_netcdf(file_name)
% read_woce_netcdf.m
%
% General purpose script to load all of a netcdf file
% using getnc. This is supplementary to the help provided in the
% netCDF primer on the WOCE V3 DVDs. It calls the matlab
% libraries described in the primer. These must be set up first.
%
% Example:
%
% data = read_woce_netcdf('my_file.nc');
%
% will return a structure array named data.
%
% It is necessary to include the trailing .nc in the filename.
%
%
% If the data have been read into a structure array named data
% as in the example above
% then this can be unpacked to a set of single variables with
% the following code:
%
% sname = 'data' % or whatever the structure array is called.
% eval(['struct_var_names = fieldnames(' sname ') ']);
% for i = 1:length(struct_var_names)
% eval([struct_var_names{i} ' = ' sname '.' struct_var_names{i} ' ;']);
% end
%
%
% This script has some extra code to catch a few special cases.
% If this code is not required, it could be commented out to increase
% speed. The extra code is only required for
% sss, pfloat, woce-ssf, sat_sl, sat_sst, sat_mwf
%
% Brian King, SOC, 23 Sepetember 2002.
%
% There has not been time for anyone
else to review these notes
% between BAK drafting them and their inclusion on the WOCE DVDs.
% The author therefore apologies for any innacuracies. I hope
% they are more of a help than a hindrance. As a netCDF novice,
% it took me several; days to figure out quite why some things
% weren't working as I expected.
% There are two sets of optional
code to catch cases where
% the basic script doesn't work correctly. By default, these are
% left active. So the full script works on all files on which
% I have tested it. The optional code can be commented out
% for better speed. But there coudl be unexpected combinations
% of variables and attributes which don't work. Proceed with
% caution when first loading new data types.
% Apart from the optional code, there
are very few
% executable lines. These are reproduced here in a block
% to make it easy to see what is going on. The error traps
% have also been left out of this summary.
%%%%%
%%%%% function [data] = br(file_name)
%%%%% ncdf_obj = netcdf(file_name);
%%%%% var_names = ncnames(var(ncdf_obj));
%%%%% close(ncdf_obj);
%%%%% num_vars = length(var_names);
%%%%% disp(['file_name = ''' file_name ''''])
%%%%% for ivar = 1:num_vars
%%%%% name = var_names{ivar};
%%%%% command = ['data.' name ' = getnc(file_name,''' name ''');'];
%%%%% disp(command)
%%%%% eval(command);
%%%%% end
%%%%%
%
% General notes on use of the script:
%
% The variable file_name is the argument of the function.
%
% File_name should be of the form 'example.nc' and include
% the trailing .nc
%
% This script worked OK for me in Matlab 6 on
% a Sun unix workstation. I'm afraid there hasn't been time
% to test it on any other platforms.
%
% One way to read the data from a file is to use the
% ncload command. This is illustrated below. But ncload
% doesn't take account of the variable attributes, such
% as valid_min/max/range, _FillValue or missing_value.
%
%
% Once we have the list of variable names, use getnc to load
% them properly; getnc is supposed to take account of the variable
% attributes, and substitute NaN where data match a fill vlaue
% or missing value, or are outside a valid range.
%
% -------------------------------------------------
% See end of script for some important comments about
% getnc, which concern the following directories
%
% sss ) Problems swapping missing values in char arrays to NaN.
% pfloat )
% woce-ssf )
%
% sat_sl ) Issues concerning use of scale factor in some variables.
% sat_sst )
% sat_mwf1 )
% sat_mwf2 )
% -------------------------------------------------
%
%
%
% If getnc is causing problems that can't be solved,
% for example incorrectly changing
% out_of_valid_range or missing_values to NaN ,
% then an alternative strategy is to use ncload
% to read in the 'raw' values, and write further simple
% code to fix particular variables.
%
% If speed, efficiency, or memory is important, then you could find
% out the list of variables that you require, and insert
% the particular getnc commands that you require, rather than
% loading all variables.
%
% The relevant getnc comamnds are displayed to the screen when the
% program is run. This could be used for building new scripts.
%
%---------------------------------------------------------
% script starts here with some error traps
%---------------------------------------------------------
if nargin ~= 1
%file_name was not supplied. Ask for it.
file_name = input('Type the file name : ','s')
if length(file_name) == 0
disp('You must supply precisely one argument')
disp('in the function call or in response to the prompt.')
disp('Type')
disp(' ')
disp(['help ' mfilename])
disp(' ')
return
end
end
if exist(file_name,'file') ~= 2
disp(['The file named ' file_name ' does not exist'])
return
end
%---------------------------------------------------------
% First step is to obtain variable names from the file.
% One option would be
%var_names = ncload(file_name);
% This would load all the variables from the file, and put the
% list of variable names into var_names. But this would make no
% use of the fill values or valid ranges.
% Instead, use the netcdf utility.
%---------------------------------------------------------
% var_names = ncnames(var(netcdf(file_name)));
%
% The above construct works fine to get var_names, but it seems to leave the
% netcdf file open at the end of the function.
% If the script is used on a large number of files,
% (218 in my case) matlab eventually complains. Therefore we must
% use a netcdf close command.
ncdf_obj = netcdf(file_name);
var_names = ncnames(var(ncdf_obj));
close(ncdf_obj);
% var_names is a cell array with
a list of variable names
% as character strings
% Also loads the variables
num_vars = length(var_names);
% number of variables
disp(['file_name = ''' file_name
''''])
for ivar = 1:num_vars
name = var_names{ivar};
% take each name in turn
command = ['data.' name ' = getnc(file_name,''' name ''');'];
% Use getnc to load the data into a structure
% array called data. The structure array makes it easy
% to pass data out of the function.
% A simpler construct would be
% command = ['data.' name ' = getnc(file_name,name);'];
% but the echo of the command to the screen wouldn't be so helpful
% The basic simple form of the getnc
command would be
% depth = getnc('example.nc','depth');
% to read a variable called 'depth' from a
% file called 'example.nc'
% There are two sets of conditions
for which the above command
% doesn't produce the correct results. These are discussed extensively
% in comments at the end of this script. To handle these without
% error, some non-default options are required on getnc. In this
% default version of this script, these options are included. When not
% using the particualr datasets, the extra lines can be commented out,
% which will increase speed and efficiency.
%---------------------------------------------------------
%
% Special case 1.
%
% Some data files define an attribute scale_factor for one
% or more variables.
% getnc recognises this attribute, and then has to decide whether
% to scale the data, and possibly other attributes, such as
% fill value or valid range. The default action is
% to scale data AND attributes. See help getnc for details.
% For all the WOCE V3 datasets identified by BAK, which have defined
% both scale_factor and valid range attributes, valid range attributes
% are always in scaled units rather than unsacled units.
% Therefore these attributes must NOT be scaled again by getnc.
% Therefore, we need to set non-default action for getnc.
% The datasets concerned are: SAT_SL, SAT_SST and SAT_MWF.
% Note that woceflux uses scale_factor, but not valid range,
% so the default values of getnc are OK.
% Where needed, change rescale_opts in getnc from [1 1] to [1 0].
% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1
0]);'];
%---------------------------------------------------------
% The next segment of code can be commented out for some data files.
[att_vals att_names] = attnc(file_name,name);
att_num = length(att_names);
clear i_scale i_min i_max i_range
for i_att = 1:att_num
if strcmp(att_names{i_att},'scale_factor')
i_scale = 1;
end
if strcmp(att_names{i_att},'valid_min')
i_min = 1;
end
if strcmp(att_names{i_att},'valid_max')
i_max = 1;
end
if strcmp(att_names{i_att},'valid_range')
i_range = 1;
end
end
if exist('i_scale') & (exist('i_min')
| exist('i_max') | exist('i_range'))
command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,-1,-1,-1,[1
0]);'];
end
clear i_scale i_min i_max i_range
%end of code fragment
%---------------------------------------------------------
%
% Special case 2.
%
% getnc searches for attributes called _FillValue or missing_value.
%
% Where these attributes occur, the data values are compared to the
% attribute value. When a match is found, the data value is set to NaN.
% If the variable or attribute are of class char (ie character), then the
% logic is flawed. The Matlab result could be an error, or a warning with
% unpredictable matlab action. In some cases tested by BAK, the
% character variable was set to a blank, which was not always
% the intention.
% The solution in these cases (_FillValue or missing value of type char)
% is to suspend checking against fill or missing values.
% command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);'];
%
% Directories requiring the fix:
%
% sss
% pfloat (trajectory files)
% woce-ssf
%---------------------------------------------------------
% The next segment of code can be commented out for some data files.
[att_vals att_names] = attnc(file_name,name);
att_num = length(att_names);
clear i_fill_miss
for i_att = 1:att_num
if strcmp(att_names{i_att},'_FillValue') & strcmp(class(att_vals{i_att}),'char')
i_fill_miss = 1;
end
if strcmp(att_names{i_att},'missing_value') & strcmp(class(att_vals{i_att}),'char')
i_fill_miss = 1;
end
end
if exist('i_fill_miss')
command = ['data.' name ' = getnc(file_name,''' name ''',-1,-1,-1,-1,1);'];
end
clear i_fill_miss
%end of code fragment
%---------------------------------------------------------
%---------------------------------------------------------
% end of special cases
%---------------------------------------------------------
disp(command)
% echo to the screen the commands used
eval(command);
% execute command to load this variable
end
%---------------------------------------------------------
% end of script. The remainder is comment and explanation.
%---------------------------------------------------------
%------------------------------------------------------
% The script above was tested for at least one file from every
% WOCE DAC. Where a Data Centre has created files of several
% different types for their various data sets, I have tried
% to test one file of each type. But I cannot claim
% that testing has been exhaustive.
% In particular, The above script read files from
% the following directories. As far as I can tell the results were
% correct. Reading from the other directories will require small
% modifications, described below.
%
% bathymetry
% cmdac
% pfloat (profiles & trajectories file)
% sadcp
% sat_mwf1
% sat_mwf2
% sat_sl
% sat_sst
% sl_fast
% slevel_dm
% sss
% svp
% whp (ctd & bottle file)
% woce-ssf
% woce-uot
% woceflux (several file types)
% wocemet
%
% There is extra code to catch two special cases where the
% default action of getnc was not satisfactory. These were:
%
% 1) correct handling of scaling
% 2) Correct handling of 'fill values' on character strings
%
%
% ------------------------------------
% Correct handling of scaling
% ------------------------------------
%
% Directories involved
%
% sat_mwf1
% sat_mwf2
% sat_sl
% sat_sst
% woceflux
%
% ***** The background:
%
% Some datasets are stored as integers, with a scaling factor.
% The scaling factor is stored as an attribute of the variable.
% For example, sat_sl stores sea_level as an integer
% in the range +/- 32767, with a scaling factor of 0.001
% This saves on storage.
%
% ***** The problem:
%
% getnc must take the correct action when reading in the data.
%
% When getnc reads data, it has separate swithces that tell
% it whether or not to scale the data, and whether or not to scale
% the attributes. The default action is to scale both data and attributes.
% This is not always correct.
%
% Variable attributes may include missing_value, _FillValue, valid_min
% valid_max, valid_range.
%
% After the data and attributes have been scaled (or not, depending
% on the switch settings) the data are compared with the attributes
% and set to NaN if they match the fill/missing value or are outside
% the valid range.
%
% Different DACs have used different conventions about whether or not
% attributes are scaled, so special action is required.
%
% ***** The solution:
%
% getnc controls scaling with an argument rescale_opts. This
% is an array of length 2. The first element controls
% scaling of data. The second element controls scaling of
% attributes. The default is [1 1] = yes and yes.
%
% Unfortunately, rescale_opts is the 10th argument of getnc.
% Fortunately, the unused options can be defaulted to -1.
%
% Thus the default action is equivalent to
%
% rescale_opts = [1 1]
% sea_level = getnc('example.nc','sea_level',-1,-1,-1,-1,-1,-1,-1,rescale_opts);
%
% Here is the convention used by DACs for scaling
% and the required setting of rescale_opts:
%
% *** sat_mwf: use rescale_opts = [1 0]
% Many variables use scaling
% Valid_range in attributes is in scaled units
% Fill and missing values in attributes are in unscaled units
% Example
% short zonal_wind_speed(latitude, longitude) ;
% zonal_wind_speed:data_min = -9.75 ;
% zonal_wind_speed:data_max = 11.3 ;
% zonal_wind_speed:scale_factor = 0.01 ;
% zonal_wind_speed:valid_min = -60. ;
% zonal_wind_speed:valid_max = 60. ;
% zonal_wind_speed:missing_value = 32767s ;
% zonal_wind_speed:_FillValue = 32767s ;
%
%
% The use of a scaled valid_range means that the attributes
% MUST NOT be scaled again when reading in by getnc.
% Otherwise the valid range becomes +/- 0.60,
% and nearly all data fall outside the
% new valid_range and are replaced by NaNs.
%
% getnc therefore compares scaled data (eg 32.767)
% with unscaled fill and missing values,
% and fails to match them. However, the data that are filled
% or missing values (read in and converted to 327.67)
% lie outside the valid range (+/- 60), so they are caught that way.
%
%
% *** sat_sl and sat_sst: use rescale_opts = [1 0]
% One variable in each dataset uses scaling
% Valid range in attributes is in scaled units
% Fill and missing values in attributes are in scaled units
% Example
% short sea_level(latitude, longitude) ;
% sea_level:data_min = -0.259f ;
% sea_level:data_max = 0.37f ;
% sea_level:valid_min = -1.5f ;
% sea_level:valid_max = 1.5f ;
% sea_level:_FillValue = 32.767f ;
% sea_level:missing_value = 32.766f ;
% sea_level:scale_factor = 0.001f ;
%
%
% The use of a scaled valid_range means that the attributes
% MUST NOT be scaled again when reading in by getnc.
% Otherwise the valid range becomes +/- 0.0015,
% and nearly all data fall outside the
% new valid_range and are replaced by NaNs.
%
% getnc compares scaled data with fill and missing values of 32.767
% and 32.766. Unfortunately, as far as I can tell, the match does not
% work properly in matlab. The data are read in as integers (type 'short'
% in netCDF) and then scaled by 0.001. The attribute is read in as
% type 'float', and results in a slightly different value due to machine
% error. However, the data that are filled or missing
% values (read in and converted to 32.767 and 32.766)
% lie outside the valid range, so they are caught that way.
%
%
% *** woceflux: use rescale_opts = [1 1], so default is OK.
% In some datasets, two variables use scaling
% Valid_range is not used
% Fill and missing values in attributes are in unscaled units
% Example
% short u(lat, lon) ;
% u:scale_factor = 0.1f ;
% u:missing_value = 32767s ;
%
% Since there is no valid_range to be checked, the attributes
% MUST be scaled when they are read in by getnc.
%
% If a data value is read in as an integer 'short' value
% of 32767, this is scaled by 0.1 and compared with
% the missing_value of 32767, which must also be scaled by 0.1.
%
% Therefore the default value of rescale_opts is correct for these
% files.
%
%
% -----------------------------------
% Correct handling of fill values on character strings
% ------------------------------------
% The solution, given in detail after the description of the problems, is to
switch
% off missing value checking in getnc for these variables. This solution
% works for all the problems of this nature I have discovered.
%
% Directories affected
%
% sss
% pfloat (trajectory files)
% woce-ssf
% *** woce-ssf
%
% The problem here concerns attributes of type character, where
% a DAC has defined a missing value or a fill value.
%
% First, we must understand how character variables are stored in matlab
% and (I believe) netCDF, though I'm not a netCDF expert.
%
% For example, in a file from woce-ssf
%
% dimensions:
% time = UNLIMITED ; // (424 currently)
% kstatus = 6 ;
% variables:
% char cstatus(time, kstatus) ;
% cstatus:long_name = "status/method values" ;
% cstatus:missing_value = " " ;
%
%
% So variable cstatus is a 2-D array. Each element
% of the 2-D array is a SINGLE CHARACTER.
% One dimension is time. The intention is that the other
% dimension allows character strings of length 6 to
% provide a status variable.
%
% When getnc attempts ro read the attribute cstatus:missing_value,
% it crashes. It complains that the missing value must be a single
% character. I suggest that the problem here is in the netCDF file.
% If the dimension is 424x6, the missing value is a single blank
% not a string of 6 blanks.
%
%
% *** sss and pfloat trajectory files
%
%
% There is a more general problem with how getnc handles character variables
and
% missing/fill values.
%
% sss and pfloat trajectory files both attempt to
% define a _FillValue for character variables.
%
% Example
%
% dimensions:
% n_measurement = UNLIMITED ; // (1628 currently)
% string4 = 4 ;
% n_history = 1 ;
% variables:
% char history_software_release(n_measurement, n_history, string4) ;
% history_software_release:_FillValue = " " ;
%
%
% The problem here is that the variable is read in to a matlab char array
% of dimension 1628 x 1 x 4. These values are then tested against the
% fill value of " ". If any values match, getnc attempts to set them
% to NaN, which is an error for a single element of a char array.
% My version of matlab gives a warning, and sets the element in the
% char array to blank. If the _FillValue was blank anyway, this is not
% a problem. However, if the _FillValue was a character zero or nine, it becomes
% character blank, which is clearly not what the DAC intends.
%
% *** Solution:
% Read in these variables using ncload, or
% using getnc with the option to substitute
% missing values switched off. see help getnc for details. Note the use
% of 1, rather than the default, in the seventh argument.
% Example for pfloat trajectory file:
%
% temp_corrected_qc = getnc(file_name,'temp_corrected_qc',-1,-1,-1,-1,1);
%
% some code has been inserted in the script which catches all
% the cases I know about. It can be commented out if not required.
%
% Here is a list of attributes that showed up in my test files that
% would cause a problem if any elements of the char arrays matched the FillValue.
% I therefore recommend users of these files to substitute appropriate getnc
commands
% of the form illustrated above.
%
% pfloat/trajectory file
% trajectory_parameters:_FillValue = " " ;
% inst_reference:_FillValue = " " ;
% positioning_system:_FillValue = " " ;
% juld_qc:_FillValue = "0" ;
% position_accuracy:_FillValue = "9" ;
% position_qc:_FillValue = "0" ;
% temp_corrected_qc:_FillValue = "0" ;
% psal_corrected_qc:_FillValue = "0" ;
% cndc_corrected_qc:_FillValue = "0" ;
% pres_corrected_qc:_FillValue = "0" ;
% history_software_release:_FillValue = " " ;
% sss
% woce_date:_FillValue = " " ;
% woce_time:_FillValue = " " ;
% position_qc:_FillValue = "0" ;
% sst_qc:_FillValue = "0" ;
% sss_qc:_FillValue = "0" ;
% woce-ssf
% cstatus:missing_value = " " ;
%
%
% ------------------------------------------------------
|