This Manual has been produced within the context of the Global Temperature-Salinity Pilot Project (GTSPP). Because the work of assuring the quality of data handled by the Project is shared amongst data centres, it is important to have both consistent and well documented procedures. This Manual describes the means by which data quality is assessed and the actions taken as a result of the procedures.
The GTSPP handles all temperature and salinity profile data. This includes observations collected using water samplers, continuous profiling instruments such as CTDs, thermistor chain data and observations acquired using thermosalinographs. These data will reach data processing centres of the Project through the real-time channels of the IGOSS program or in delayed mode through the IODE system.
The procedures described here are intended to cover only the above-mentioned data types and specifically for data sent through the IGOSS system. However, there are obvious generalizations that can be made to other data types. Because of this, it is expected that this Manual will serve as a base on which to build more extensive procedures for the aforementioned data types and to broaden to other types, as well. Indeed, in some cases, tests of data types that are not strictly part of this Project are incorporated into this Manual simply because they are of obvious use and because these data types are often associated with the data of interest to the GTSPP.
Updates to this Manual are carried out as new procedures are recommended to the GTSPP and as these are accepted by the project Steering Group. Readers are encouraged to make suggestions on both how to improve existing tests, and of new tests that should be considered. In both cases, it is important to explain how the suggestion improves or expands upon the existing suite of tests. Suggestions may be forwarded to any participants of the GTSPP and these will be directed to the Steering Group. As tests are suggested but before incorporation, they will be documented in a section of the Manual. This will provide a means to accumulate suggestions, to disseminate them and solicit comments.
This Manual describes procedures that make extensive use
of flags to indicate data quality. To make full use of this
effort, participants of the GTSPP have agreed that data access
based on quality flags will be available. That is, GTSPP participants
will permit the selection of data from their archives based
on quality flags as well as other criteria. These flags are
always included with any data transfers that take place. Because
the flags are always included, and because of the policy regarding
changes to data, as described later, a user can expect the participants
to disseminate data at any stage of processing. Furthermore,
GTSPP participants have agreed to retain copies of the data
as originally received and to make these available to the user
if requested.
The implementation of the tests in this Manual requires interactive software to be written. The operator is consulted in the setting of flags or possibly in changing data values. In each case, information is provided to the operator to help them decide what action to take. In the descriptions of the tests, certain specific items of information and data displays are included. So, for example, when a station position fails a test of platform speed, a track chart of the platform is used. The amount of information displayed and the presentation technique are dependent upon the hardware and software capabilities at the implementation site. For this reason, the information to be displayed and the method of presentation should be treated as recommendations.
2.0 QUALITY FLAGGING
The purpose of this Manual is to set standards for quality control of real-time data and to describe exactly the screening process that is employed. By reading this document, users may assess the applicability of the procedures to their requirements and thereby judge whether they need do further work before using the data.
Attached to every profile is a number indicating the version of the Quality Control Manual which describes the tests employed. As the procedures documented by this Manual are expanded to include others or to refine the older tests, a new version flag will be assigned. It is recognized that the suite of tests performed will undergo modifications with time. For this reason it is necessary to record which version of the quality control procedures has been applied to the data. This version number is associated with updates to this Manual. The version applied is to be assigned to each profile as it is processed and to be carried thereafter with the data. This document constitutes version 1.0.
Also attached to every profile is a number that indicates
which tests have been employed. This number is constructed as
follows. Each test of the Quality Control Manual is assigned
an index number that is a power of 2 (1, 2, 4, 8, and so on).
The number that describes the suite
of tests employed against a profile is the sum of the index
numbers of the tests used. The index number is given with every
test documented in this Manual. This number is then written
in base 16. So the digits 0 through 9 represent numbers from
0 through 9, A=10 through to F=15. As an example, if there are
10 tests, and all are employed, the Test Number is then 3FF.
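The construction of the Test Number can be illustrated with a minimal sketch. The function names `encode_test_number` and `decode_test_number` are hypothetical and are not part of the Manual; the sketch assumes only that each test's index number is a distinct power of 2, as in the example of 10 tests above.

```python
def encode_test_number(applied_indices):
    """Return the hexadecimal Test Number for the tests applied to a profile.

    Each entry of applied_indices is the index number of one test,
    assumed to be a distinct power of two (1, 2, 4, 8, ...).
    """
    total = 0
    for index in applied_indices:
        # A positive power of two has exactly one bit set.
        if index <= 0 or index & (index - 1):
            raise ValueError(f"index number {index} is not a power of two")
        total |= index  # same as summing, since the indices are distinct
    return format(total, "X")


def decode_test_number(hex_number):
    """Recover the index numbers of the tests from a hexadecimal Test Number."""
    value = int(hex_number, 16)
    return [1 << bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Ten tests with index numbers 1, 2, 4, ..., 512, all employed.
all_ten = [2 ** k for k in range(10)]
print(encode_test_number(all_ten))  # 3FF
```

Decoding works in the other direction: `decode_test_number("5")` yields the index numbers 1 and 4, showing that only those two tests were run against the profile.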
If a participating Data Centre applies tests other than those described in this Manual, it should supply documentation with the data to explain the other tests. The use of other tests is indicated by a version number for the Manual that has a digit in the hundredths place. So, for example, a Version of 1.02 indicates that a Data Centre has used the tests described in version 1.0 of the QC Manual but has also applied other tests (indicated by the digit 2) of its own. Each Data Centre may assign this last digit in a fashion suitable to its own operations.
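As an illustration of this convention, the sketch below separates a version number into the Manual version and the digit assigned by the Data Centre. It assumes the version is carried as text (for example "1.02"); the function name `split_version` is hypothetical.

```python
def split_version(version):
    """Split a version such as '1.02' into the Manual version and local digit.

    The version is assumed to be stored as text so that the hundredths
    digit is not lost to floating-point rounding.
    """
    whole, _, fraction = version.partition(".")
    fraction = fraction.ljust(2, "0")      # '1.0' is treated as '1.00'
    manual_version = f"{whole}.{fraction[0]}"
    local_digit = int(fraction[1])         # 0 means no additional tests
    return manual_version, local_digit


print(split_version("1.02"))  # ('1.0', 2)
print(split_version("1.0"))   # ('1.0', 0)
```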
The second type of flag is used to indicate the quality of
the data. It is considered unproductive to attach a flag describing
the result of each test performed to every observation since
this may result in numerous flags that generally would not be
used. Instead, it is deemed necessary to be able to assign flags
to individual or groups of data values to indicate the confidence
in the value. Participants of the GTSPP have agreed that the
following rules shall apply.
1. Both independent and dependent variables can have a flag
assignment.
2. Data aggregations (in the case here these are entire profiles)
can also be assigned a flag. So the word element used later
implies aggregations as well.
3. The flags indicating data quality are those currently used
in IGOSS processing with one extension.
The test descriptions allow for inferring values for those that have failed the test procedures. The inference of a correct value is done at the discretion of the person doing the quality control. It should be based on information which is not available to the test procedure but which the operator has at hand and which helps establish what the correct value should be. Values should be changed only when the correct value is known with certainty. In the instance where data values are changed, the original value is also preserved and is available to users or to other tests if needed.
Finally, because quality assessment is shared over processing centres, it is possible that data flagged as doubtful by one centre will be considered acceptable by another or vice versa. Flags can be changed by any processing centre as long as a record is kept of what the changes are.
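One way to honour both requirements (preserving original values and recording changes) is sketched below. This is an illustration only, not the storage scheme of any GTSPP participant. The class `Observation`, its method names and the history layout are hypothetical. The flag value 0 (no quality control performed) is taken from this Manual; the value 5 for "changed" is an assumption made for the example.

```python
from dataclasses import dataclass, field

FLAG_NO_QC = 0    # from the Manual: no quality control has been performed
FLAG_CHANGED = 5  # assumed value for "value was changed"


@dataclass
class Observation:
    value: float
    flag: int = FLAG_NO_QC
    history: list = field(default_factory=list)

    def change_value(self, new_value, centre):
        """Replace the value, keeping the old one in the history."""
        self.history.append(
            {"what": "value", "centre": centre, "old": self.value, "new": new_value}
        )
        self.value = new_value
        self.flag = FLAG_CHANGED

    def change_flag(self, new_flag, centre):
        """Change only the flag, recording which centre made the change."""
        self.history.append(
            {"what": "flag", "centre": centre, "old": self.flag, "new": new_flag}
        )
        self.flag = new_flag

    @property
    def original_value(self):
        """The value as first received, before any change."""
        for entry in self.history:
            if entry["what"] == "value":
                return entry["old"]
        return self.value


obs = Observation(value=12.3)
obs.change_value(21.3, centre="centre A")  # operator corrects a transposition
print(obs.value, obs.original_value, obs.flag)
```

Because every change appends to `history` rather than overwriting it, a second centre can alter a flag without losing the record of what the first centre did.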
The use of the flagging scheme described here will meet the stated requirements of the GTSPP. It is recognized that as new testing procedures are developed, it will be necessary to re-examine data. With version flags preserved with the data, it will be possible to identify what has been done, and therefore how best to approach the task of passing data through newer quality control procedures.
It is recognized that knowledge of the instrumentation used to make an observation can be useful in the assessment of the quality of the data. Likewise, knowledge of the platform from which the data were collected can also be used. Where available, this instrumentation knowledge should be sent with the data to the GTSPP participants. The present version of this Manual suggests tests that make use of instrumentation knowledge if available. It is expected that subsequent versions of the Manual will improve on this.
All processing centers should monitor the performance of their quality control tests. In this way, deficiencies can be identified and recommendations made to improve procedures. These recommendations should be sent to the Steering Group designated to maintain this Manual. They will be discussed and included as appropriate in subsequent versions of the Manual.
5.0 PRE AND POST PROCESSING
The quality control tests described in the appendix assume
a basic scrutiny has been applied to the data. Explicitly, the
data have passed a format checking procedure which ensures that
alphanumerics occur where expected and no illegal characters
are present. It does not assume that values of variables have
been checked to see if they are physically possible.
None of the tests described here automatically assigns a
quality flag without the approval of the person doing the quality
assessment. When a value or element fails a test, a recommendation
of the flag to be assigned is made. The person doing the quality
assessment then must decide the appropriate flag to use from
a list of recommendations. The tests do restrict the flags that
may be assigned, in that a user is not permitted to assign an
arbitrary flag to a value or element failing a test.
There is a need to find and remove data duplications. A check
for duplicate reports is necessary to eliminate statistical
biases which would arise in products incorporating the same
data more than once. In searching, the distinction between exact
and inexact duplicates should be kept in mind. An exact duplicate
is a report in which all the physical variable groups (including
space-time coordinates) are identical to those of a previous
report of the same type from the same platform. An inexact duplicate
will have at least one difference.
Annex A contains the algorithm proposed by the Marine Environmental
Data Service for the identification of duplicates. It discusses
the implementation of the technique for data received in both
real-time and delayed mode. In the context of this Manual, only
the discussion of the handling of real-time data is relevant.
The algorithm is based on near coincidences of position and
time. This means that tests 1.1 to 1.4 and test 2.1 of this
Manual must be applied before duplications are sought. The basic
criterion for a possible duplication is based on the experience
of the TOGA Subsurface Data Centre. So, if stations are collected
within 15 minutes or 5 km of each other, they may be duplicates.
The identifications of the potential duplicate stations are
then examined, as well as the data, to resolve whether or not
a duplication exists. Then, other tests of the quality control
are run on the output of the duplicates test. In this way, as
little as possible is done before duplications are tested for.
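The basic criterion for a possible duplication can be sketched as follows. This is not the Annex A algorithm, only an illustration of the 15 minute and 5 km thresholds quoted above, combined with "or" as the text states. The station layout (a dictionary with `time`, `lat` and `lon`), the function names and the use of a spherical Earth of radius 6371 km are assumptions.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0               # assumed mean radius, spherical Earth
MAX_TIME_GAP = timedelta(minutes=15)   # threshold quoted in the Manual
MAX_DISTANCE_KM = 5.0                  # threshold quoted in the Manual


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def may_be_duplicates(a, b):
    """Return True if two stations are candidates for closer examination."""
    close_in_time = abs(a["time"] - b["time"]) <= MAX_TIME_GAP
    close_in_space = (
        distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= MAX_DISTANCE_KM
    )
    return close_in_time or close_in_space


first = {"time": datetime(1990, 1, 1, 12, 0), "lat": 45.00, "lon": -30.0}
second = {"time": datetime(1990, 1, 1, 12, 10), "lat": 45.01, "lon": -30.0}
print(may_be_duplicates(first, second))  # True
```

A `True` result only marks the pair as potential duplicates; the station identifications and the data themselves still have to be examined before a duplication is declared.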
There will also be a need for scientific assessment of the
data quality. This would involve subjecting the data to a different
set of tests by applying knowledge of the characteristics of
the processes from which observations have been collected. It
may also be that more data may be gathered together so that
more sophisticated statistical tests can be applied. As such
tests become generally accepted and an established application
procedure developed, they could be incorporated into the context
of this Manual and become part of the regular screening process
conducted by participants of this project.
The complete set of tests is included in Annex B. Each description
has a number of sections that are always present. A description
of the information that each contains follows:
Test Name: This is the short name of the test. Each
test is numbered for ease of reference.
The tests have been grouped according to stages. The first
stage is concerned with determining that the position, the time,
and the identification of a profile are sensible. The second
stage is concerned with resolving impossible values for variables.
The next stage examines the consistency of the incoming data
with respect to references such as climatologies. The next section
looks at the internal consistency within the data set.
The grouping of the tests suggests a logical order of implementation
in that the simpler, more basic tests occur before more complicated
ones. The order of presentation of tests within a stage does
not imply an order in implementation. In fact, should a value
be changed as a result of a test, the new value should be retested
by all of the tests within the stage. Indeed, since data values
can be changed, the implementation of these tests cannot take
place in a strictly sequential fashion.
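The retesting requirement can be sketched as a loop that repeats all tests of a stage until a complete pass changes nothing. This is a minimal illustration under stated assumptions: each test is a hypothetical function that takes the profile and returns True if it changed a value, and the operator interaction described elsewhere in this Manual is omitted. The names `run_stage` and `fix_negative_depth` and the `max_passes` limit are not part of the Manual.

```python
def run_stage(profile, tests, max_passes=10):
    """Apply every test of a stage, repeating while any value is changed.

    Returns the number of passes made over the tests of the stage.
    """
    for passes in range(1, max_passes + 1):
        changed = False
        for test in tests:
            if test(profile):
                changed = True  # a new value exists, so retest the stage
        if not changed:
            return passes
    raise RuntimeError("stage did not settle; values keep changing")


def fix_negative_depth(profile):
    """Hypothetical test: correct a sign error in the depth."""
    if profile["depth"] < 0:
        profile["depth"] = -profile["depth"]
        return True
    return False


profile = {"depth": -50.0}
print(run_stage(profile, [fix_negative_depth]), profile["depth"])  # 2 50.0
```

The second pass is what confirms that the corrected value itself passes every test in the stage. The cap on passes guards against two tests that keep undoing each other's changes.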
The tests detailed by this Manual cannot be mutually exclusive
in examining the various properties and characteristics of the
data. As much as possible, each test should focus on a particular
property to test if the data value or profile conforms to expectations.
Modifications to old tests will be incorporated as they refine
the focus of the test. New tests will be added to examine properties
of data that are not adequately covered by this version.
Each of the tests has been written from the point of view
that the data being examined have not been examined before. The
difference this makes is that quality flag assignments do not check if
the flag has already been set to something other than 0 (meaning
no quality control has been performed). If the data have been
examined before, the rules as written will need modifications to
check if the flag has previously been set. If a flag has been set and
indicates the value was changed, the user should be informed
of the original value of the data before another change is performed.
Then, if the flag is reset to anything else, the changed value
should be preserved in the history of the station.
In other cases, where a flag is changed but the observation
is untouched, it is not necessary to record the old flag, but
simply to record that data have passed through a second organization
and the quality tests done there.
The tests described in stage 5 represent a visual inspection
of the data as received and usually after all other tests have
been completed. This stage is necessary to ensure that no questionable
data values pass through the suite of tests employed without
being detected. The testing and flagging procedure of this stage
relies upon the experience and knowledge of the person conducting
the test. As experience is gained with the tests contained within
this Manual, the processes used in the visual inspection of
stage 5 will be converted to objective tests included in other
sections of the Manual. However, there will always be a need
to conduct this visual inspection as the final judgement of
the validity of the data.
7.0 SUGGESTED ADDITIONAL TESTS
Other tests that have been suggested are listed in Annex
C. These have not yet reached the stage of being incorporated
into the Manual but have been suggested as worthy of consideration.
They are noted here so that participants may record their experiences
with their use and so that they may be considered for future
versions.
Annex D contains some details of how certain of the tests
are implemented in particular cases. The purpose of their inclusion
is to provide further details that may assist others in understanding
the details of a test procedure.
Contributions to the contents of the manual were made by
J.R. Keeley, S. Levitus, D. McLain, N. Mikhailov, C. Noe, J.-P.
Rebert, B. Searle, and W. White. Others have assisted in suggestions
of how to improve tests and clarify the text. Information describing
test procedures carried out by various organizations is noted
in the Reference section. This Manual reflects the knowledge
described by the references.
Prerequisites: This describes what tests are assumed
to have been applied before and what preparation of the data set
is suggested before application of the test. It will also describe
what information files are required.
Description: This section describes how the test is implemented
and what actions are taken based on the results of the test.
History: This records any changes that have taken place
in the test procedure and the date on which they were recorded.
This section will record the evolution of a test procedure through
the various versions of the Manual.
Rules: This section lists the rules that are applied
to effect the various tests. Their numbering is for reference
value only since they have been written so that they may be
implemented in any order.
| 1. | Guidelines for evaluating and screening bathythermograph data, ICES Working Group on Marine Data Management, September, 1986. |
| 2. | Note sur les contrôles effectuées à Paris sur les données BATHY et TESAC par le centre SMISO, P. LeLay, Member of IGOSS OTA, 5 July, 1988. |
| 3. | Guide to Data Collection and Location Services Using Service Argos, Marine Meteorology and Related Oceanographic Activities Report 10, WMO/TD-No.262, 1988, Revised edition, 104pp. |
| 4. | Guide to Drifting Buoys, IOC/WMO Manuals and Guides 20, 1988, 69pp. |
| 5. | Ocean Temperature Fields, Northern Hemisphere Grid, 1985-1988, Office of Ocean Services, National Ocean Service, NOAA, June, 1988. |
| 6. | Ocean Temperature Fields, Southern Hemisphere Grid, 1985-1988, Office of Ocean Services, National Ocean Service, NOAA, July, 1989. |
| 7. | Guide to Operational Procedures for the Collection and Exchange of IGOSS Data, IOC/WMO Manuals and Guides 3, Revised June, 1989, 68pp. |
| 8. | Personal Communication, N. Mikhailov, 19 September, 1989. |
| 9. | Quality Improvement Profile System (QUIPS), Functional Description, R. Bauer, Compass Systems, 1987. |
| 10. | Seasonal Anomalies of Temperature and Salinity in the Northwest Atlantic in 1983, Canadian Technical Report of Hydrography and Ocean Sciences #74, March, 1988. |
| 11. | Reineger and Ross Interpolation Method, in Oceans IV: A Processing, Archiving and Retrieval System for Oceanographic Station Data, Marine Environmental Data Service Manuscript Report Series #15, 1970, pp40-41. |
| 12. | Marine Data Platforms - An Interactive Inventory, G. Soneira, W. Woodward and C. Noe, 7pp. |
| 13. | Guidelines for Evaluating and Screening Bathythermographic Data, ICES Working Group on Data Management, September, 1986, 4pp. |
| 14. | Data Monitoring and Quality Control of Marine Observations, W.S. Richardson and P.T. Reilly. |
| 15. | IOC/IODE "Manual of Quality Control Algorithms and Procedures for Oceanographic Data Going into International Oceanographic Data Exchange", draft, 1989. |
| 16. | IOC/WMO Guide to Operational Procedures for the Collection and Exchange of IGOSS Data, Manuals & Guides #3, 68pp, 1988. |
| 17. | UNESCO Technical Papers in Marine Science #44, Algorithms for the Computation of Fundamental Properties of Seawater, UNESCO, 1983. |