There are two methods for reducing dataset size: packing
and compression. By packing we mean altering the data
in a way that reduces its precision. By compression we
mean techniques that store the data more efficiently
and result in no precision loss. Compression only
works in certain circumstances, e.g., when a variable
contains a significant amount of missing or repeated
data values. In this case it is possible to make use of
standard utilities, e.g., UNIX compress or GNU gzip, to
compress the entire file after it has been written. In
this section we offer an alternative compression method
that is applied on a variable-by-variable basis. This
has the advantage that only one variable need be
uncompressed at a given time. The disadvantage is that
generic utilities that don't recognize the CF conventions
will not be able to operate on compressed variables.
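
The whole-file approach mentioned above can be illustrated with a
minimal sketch. It uses Python's standard gzip module on a raw byte
buffer rather than on a real netCDF file; the fill value and the
array contents are hypothetical and chosen only to show that
repeated values compress well and that nothing is lost.

```python
import gzip
import struct

# Hypothetical variable: 10,000 32-bit floats, mostly one repeated fill value.
FILL = -9999.0
values = [FILL] * 9900 + [float(i) for i in range(100)]
raw = struct.pack(f"<{len(values)}f", *values)

# Compress the whole buffer, then restore it.
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

# The round trip is exact, and the compressed buffer is smaller.
print(len(raw), len(compressed), restored == raw)
```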

At the current time the netCDF interface does
not provide for packing data. However, a simple
packing may be achieved through the use of the
optional NUG-defined attributes scale_factor and
add_offset. After the data values of a variable
have been read, they are to be multiplied by
the scale_factor, and have add_offset added to
them. If both attributes are present, the data
are scaled before the offset is added. When
scaled data are written, the application should
first subtract the offset and then divide by the
scale factor. The units of a variable should be
representative of the unpacked data.
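
The order of operations described above can be sketched in a few
lines of Python. The function names pack and unpack are hypothetical,
as are the sample values, and rounding to the nearest integer when
packing is an assumption of this sketch rather than something the
text prescribes.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """On read: multiply by scale_factor first, then add add_offset."""
    return [p * scale_factor + add_offset for p in packed]


def pack(unpacked, scale_factor=1.0, add_offset=0.0):
    """On write: subtract add_offset first, then divide by scale_factor.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [round((u - add_offset) / scale_factor) for u in unpacked]


# Hypothetical values chosen to be exactly representable in binary.
data = [10.0, 12.5, 9.0]
stored = pack(data, scale_factor=0.5, add_offset=10.0)
print(stored)                                          # [0, 5, -2]
print(unpack(stored, scale_factor=0.5, add_offset=10.0))  # [10.0, 12.5, 9.0]
```

In general the round trip is only accurate to within half a
scale_factor, which is the precision loss that packing implies.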

This standard is more restrictive than the NUG
with respect to the use of the scale_factor and
add_offset attributes; ambiguities and precision
problems related to data type conversions
are resolved by these restrictions. If the
scale_factor and add_offset attributes are of
the same data type as the associated variable,
the unpacked data are assumed to be of the
same data type as the packed data. However,
if the scale_factor and add_offset attributes
are of a different data type from the variable
(containing the packed data) then the unpacked
data should match the type of these attributes,
which must both be of type float or both be of
type double. An additional restriction in this
case is that the variable containing the packed
data must be of type byte, short or int. It is
not advised to unpack an int into a float as
there is a potential precision loss.
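
The type rules above can be written out as a small check. This is a
sketch only: the helper names unpacked_type and through_float32 are
hypothetical, data types are represented by their netCDF names as
plain strings, and the second helper simply round-trips a number
through a 32-bit float to show why unpacking an int into a float can
lose precision.

```python
import struct

INTEGER_TYPES = {"byte", "short", "int"}
FLOATING_TYPES = {"float", "double"}


def unpacked_type(var_type, attr_types):
    """Return the data type of the unpacked data.

    var_type   -- type name of the variable holding the packed data
    attr_types -- type names of the scale_factor/add_offset attributes present
    """
    # Attributes of the same type as the variable: type is unchanged.
    if all(t == var_type for t in attr_types):
        return var_type
    # Otherwise the attributes must agree and be float or double ...
    if len(set(attr_types)) != 1 or attr_types[0] not in FLOATING_TYPES:
        raise ValueError("attributes must both be float or both be double")
    # ... and the packed variable must be byte, short or int.
    if var_type not in INTEGER_TYPES:
        raise ValueError("packed variable must be byte, short or int")
    return attr_types[0]


def through_float32(n):
    """Round-trip a value through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


print(unpacked_type("short", ("float", "float")))  # float
print(through_float32(16777217))                   # 16777216.0
```

The second result shows the problem: a 32-bit float has a 24-bit
significand, so the int value 16777217 cannot be represented exactly.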

When data to be packed contain missing values,
the attributes that indicate missing values
(_FillValue, valid_min, valid_max, valid_range)
must be of the same data type as the packed
data. See Section 2.5.1, “Missing Data, valid and
actual range” for a discussion of how
applications should treat variables that have
attributes indicating both missing values and
transformations defined by a scale and/or offset.
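
The type requirement on the missing-value attributes can be checked
mechanically. The following is a minimal sketch: the function name
mismatched_missing_attrs is hypothetical, and the variable's
attributes are assumed to be available as a mapping from attribute
name to netCDF type name.

```python
MISSING_ATTRS = ("_FillValue", "valid_min", "valid_max", "valid_range")


def mismatched_missing_attrs(var_type, attr_types):
    """List missing-value attributes whose type differs from the packed data.

    var_type   -- type name of the variable holding the packed data
    attr_types -- mapping of attribute name to its type name
    """
    return [
        name
        for name in MISSING_ATTRS
        if name in attr_types and attr_types[name] != var_type
    ]


# Hypothetical packed variable of type short.
attrs = {
    "scale_factor": "float",
    "add_offset": "float",
    "_FillValue": "short",
    "valid_range": "float",  # wrong: should be short, like the packed data
}
print(mismatched_missing_attrs("short", attrs))  # ['valid_range']
```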