[Requests] RFC: OGC Discrete Global Grid System (DGGS) Core Standard (15-104r3)

Daniel Lee Daniel.Lee at dwd.de
Wed Feb 3 10:04:22 EST 2016


Dear colleagues,

I would like to offer DWD's perspective on the proposed standard as a heavy user 
of georeferenced data from the field of numerical weather prediction (NWP).

In general, the proposal contains a lot of good ideas, but it also has some 
issues that in my opinion can lead to problems in the future, particularly for 
use in the field of meteorology. I've described my concerns below, which are 
also attached in reformatted form to match the comment template.

Best regards,
Daniel Lee

**************
General issues
**************

The proposal is a great idea for standardizing grids optimized for area-based 
analytics and storage. It has the potential to streamline a lot of distributed 
operations on multi-computer and multi-institute infrastructures, which are 
expected to gain increasing importance over the next years. It is mostly 
compatible with current grids used in operational NWP systems. In fact, some NWP 
models use grids that would be completely compatible with DGGS without 
adaptation (one example is the ICON model, which we at DWD began using 
operationally last year). Our research users who designed the ICON grid weren't 
familiar with the term "DGGS," but through convergent development they produced 
results that match it quite nicely.

This doesn't make DGGS perfect, or even well suited, for many NWP applications, 
so I don't expect the use of DGGS to increase significantly in the field of NWP 
even if this standard is adopted (although it would surely be useful in the 
area of model output post-processing).

Replacement of "legacy coordinate systems"
==========================================

Not all NWP systems use DGGS-compatible grids, and computational grids are often 
optimized for different criteria, so future NWP grids may never comply with 
this specification. Despite the advantages of DGGS, I don't see the need for 
other coordinate systems to disappear - DGGS is a referencing system that is 
specialized to solve certain problems particularly well.

Every optimization has advantages and disadvantages; for example, DGGS 
discretizes all features, allowing for efficient sorting and searching. However, 
there are many applications where the original data should be stored for the 
sake of data consistency. This can be seen easily using the example of 
satellites on geosynchronous orbits. The size of each pixel in such an 
observation is dependent on the latitude at which it was viewed, so that the 
original observations would never fit into a DGGS without lots of resampling.

Additionally, NWP systems often are more concerned with conservation of 
vorticity than conservation of area. Many computational grids will be optimized 
for different purposes and this is unlikely to change in order to accommodate 
DGGS. So calling traditional CRS "legacy", with the long-term implication that 
they may be replaced by DGGS, makes me pretty uncomfortable.

Scope of the standard
=====================

The standard is both very specific and very general. On the one hand, it 
attempts to be general enough to accommodate a wide range of applications, but 
it's also very specific about some things. I'll talk about some of those below 
in the technical issues, but in general I feel like the standard mixes 
requirements for simple data storage with those of applications designed to 
manage and analyze that data, which muddies the waters unnecessarily.

Surely, nobody likes a plethora of standards, but a strength of OGC is having a 
plug-and-play stance to standards which makes them much cleaner without 
sacrificing standardization. I recommend separating storage and processing into 
separate standards and restricting this core standard to the actual reference 
system.

n-dimensionality
================

In NWP, 2D data is a rarity. Almost all data has a vertical and a temporal 
dimension.

This raises interesting questions regarding extensions to the current proposal. 
Does each refinement affect all dimensions? Currently, non-discrete coordinate 
systems allow for compact descriptions of points in 4D space because each 
dimension can be traversed individually and with an arbitrary precision. If we 
require cells to have a unique identifier in all dimensions, however, every 
possible combination of positions in each dimension needs to be addressable. 
This means that a huge address space is needed, whereas it might be more 
efficient to be able to traverse different dimensions independently (especially 
since many meteorological data sets are sparsely populated).  Also, refinement 
may be desired in some dimensions but not in others.
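
To put rough numbers on the question of whether refinement affects all 
dimensions, here is a minimal sketch. The initial cell count, the factor of 4 
per horizontal refinement, and the halving of the vertical and temporal extents 
are illustrative assumptions, not values taken from the proposal.

```python
def cell_count(base_cells, level, factor_per_level):
    """Cells at a refinement level when each step multiplies the count by a fixed factor."""
    return base_cells * factor_per_level ** level


BASE_CELLS = 20             # assumed size of the initial tessellation
HORIZONTAL_ONLY = 4         # assumed: each cell splits into 4 horizontally
ALL_DIMENSIONS = 4 * 2 * 2  # assumed: also halve the vertical and time extents

for level in (0, 5, 10):
    print(level,
          cell_count(BASE_CELLS, level, HORIZONTAL_ONLY),
          cell_count(BASE_CELLS, level, ALL_DIMENSIONS))
```

Under these assumptions the two counts differ by a factor of 4**level, which 
hints at how quickly a jointly refined address space could grow compared with 
refining only the dimensions that need it.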

There will definitely be difficulties with conservation of area in a geocentric 
coordinate system that extends vertically. With increasing distance from the 
ellipsoid's center, a cell with the same horizontal extent mapped onto the 
ellipsoid's surface will have a greater area. How is area (or n-dimensional 
volume) conserved in this case? Would a different DGGS be required for each 
vertical layer? Also, vertical coordinates in NWP are often based on pressure 
and therefore vary in height over time. This is essentially the same as using a 
bumpy, mutable datum. What would be done in this case?
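
The growth of cell area with height can be estimated with a short sketch. It 
assumes a spherical Earth with a mean radius of 6371 km (a simplification of 
the ellipsoid) and cells bounded by the same radial lines at every height; the 
function name and the sample heights are hypothetical.

```python
EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def area_scale_factor(height_m, radius_m=EARTH_RADIUS_M):
    """Ratio of a cell's area at height_m to its area on the reference surface.

    The cell is bounded by the same radial lines at both heights, so its
    area scales with the square of the distance from the center.
    """
    return ((radius_m + height_m) / radius_m) ** 2


for height in (0.0, 10_000.0, 30_000.0, 80_000.0):
    print(f"{height:>8.0f} m  factor {area_scale_factor(height):.5f}")
```

With these assumptions the effect is on the order of a few tenths of a percent 
at 10 km, small but nonzero, so strict equal-area cells cannot hold at every 
vertical level at once.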

I am certain that extending this standard to other dimensions will be 
complicated, and therefore recommend that the standard state explicitly that it 
is defined for two dimensions, with the problems accompanying its extension to 
other dimensions to be solved at a later time.

Test suite
==========

Aside from the issues mentioned above, I don't think that the conformance tests 
do a good job of testing compliance with the standard in its current form. 
Details are listed below.

****************
Technical issues
****************

Very specific requirements that limit the standard's extensibility
==================================================================

Many requirements go deep into implementation details, while the standard tries 
to stay as general as possible in other areas in order to accommodate a large 
number of different and unforeseen DGGS. Particularly, requirements 13-17 go 
into software implementation issues which I think are inappropriate for a 
standard that is ostensibly for a referencing system and not software to work 
with data described using such a system.

Here are some further comments about some individual requirements.

Requirement 6: Equal area precision
-----------------------------------

Why is the equal area precision defined as part of the DGGS? You could require 
that a given implementation of that DGGS specify its precision, but as the 
uncertainty is due to computational precision it shouldn't be part of the DGGS 
itself.

Requirement 9: Cells referenced by centroids
--------------------------------------------

Although defining a cell by its centroid is intuitive in many instances, it's 
probably not intuitive for all instances. It's definitely very specific, where 
the standard leaves certain other things open. For example, the standard does 
not specify that cells must be convex. Although I can't imagine how non-convex 
cells could be tessellated properly, I would not claim that this is impossible 
until I'd seen proof. If somebody creates a DGGS using non-convex cells, the 
centroids might lie outside of cells. It makes sense in many use cases to 
identify a cell at its centroid, but I think that that implementation detail 
should be left to the user and not predetermined in the standard.
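
The concern about centroids falling outside non-convex cells can be checked 
with a small planar sketch. It ignores the curvature of the Earth, and the 
U-shaped polygon and the function names are hypothetical illustrations rather 
than anything defined in the proposal.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple planar polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def contains(vertices, point):
    """Ray-casting point-in-polygon test for a simple planar polygon."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical U-shaped (non-convex) cell and a convex square for comparison.
u_cell = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(contains(u_cell, polygon_centroid(u_cell)))  # False: centroid is in the notch
print(contains(square, polygon_centroid(square)))  # True
```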

Additionally, as no provision is made in the standard for n-dimensional data, I 
suggest using the centroid in 2D space, if this clause is kept.

Requirement 10: Initial tessellation
------------------------------------

Specifying the entire initial tessellation and making it a part of the DGGS 
itself, rather than allowing users to specify certain parameters that apply to 
the initial and following tessellations, reduces flexibility. As I understand 
the proposal, if I want to use the same tessellation pattern but partition the 
Earth initially using other borders, this would require an entirely separate 
DGGS.

Here the proposal is somewhat unclear and I could be interpreting it wrong, but 
it seems to me that the initial tessellation would define the nodes that the 
borders of the top-level tessellation would run along. I'm not sure what 
benefit restricting the specification to this extent brings.

Requirement 11: Maximum refinement
----------------------------------

Why specify the maximum refinement? A given implementation could specify what 
refinements are available, but the benefit of restricting the refinement levels 
as part of the coordinate system is unclear. A practical restriction could be 
the address space, but there could be many cases where the required resolution 
is higher than originally thought. It would be a pity if reaching a desired 
resolution in the future required creating an entirely new DGGS and 
registering it with the relevant normative authorities.

Requirement 17: Standard data formats
--------------------------------------

The standard requires the DGGS to be able to export results to "standard data 
formats" without specifying what those are. How would this work? Would it be up 
to the user to decide what formats are standard? Would there be a list of 
standard formats that a DGGS would have to support? Would it break old 
implementations to expand that list? And, if we take one step back from that 
requirement, why should a standard for a coordinate system require that a 
software implementation be able to serialize data in a given format? I think 
this would be cleaner if the DGGS standard remained a standard for the reference 
system alone and not for software implementations designed to work on data 
described with that system.

Abstract tests
==============

A.1.3: Cell boundary overlap
----------------------------

This should also test all resolutions, not just a single one.

A.1.8: Spatial reference
------------------------

This should test all cells and ensure that querying for any of their IDs returns 
only one cell. Also, querying for a nonexistent ID should fail.
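
A test along these lines could be sketched as follows. The `lookup` interface 
(a callable that returns a list of matching cells and raises `KeyError` for 
an unknown ID) and the toy cell table are assumptions made for illustration; 
the proposal does not define such an API.

```python
from collections import Counter


def find_id_problems(cell_ids, lookup):
    """Return descriptions of IDs that are duplicated or do not resolve to exactly one cell."""
    problems = []
    for cell_id, count in Counter(cell_ids).items():
        if count != 1:
            problems.append(f"{cell_id!r} is listed {count} times")
        matches = lookup(cell_id)
        if len(matches) != 1:
            problems.append(f"{cell_id!r} resolves to {len(matches)} cells")
    return problems


def unknown_id_is_rejected(lookup, bogus_id):
    """True if looking up an ID that does not exist raises KeyError."""
    try:
        lookup(bogus_id)
    except KeyError:
        return True
    return False


# Toy stand-in for a DGGS: a dict from cell ID to a cell description.
toy_cells = {"A0": "cell A0", "A1": "cell A1", "B0": "cell B0"}


def toy_lookup(cell_id):
    return [toy_cells[cell_id]]  # raises KeyError for unknown IDs


print(find_id_problems(list(toy_cells), toy_lookup))
print(unknown_id_is_rejected(toy_lookup, "does-not-exist"))
```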

A.1.9: Cells referenced at their centroid
-----------------------------------------

This test passes even if the software merely includes a skeleton method as 
boilerplate. It should require a real implementation or be left out.

A.1.10: Initial tessellation
----------------------------

See comment for A.1.9.

A.1.11: Cell refinement
-----------------------

See comment for A.1.9.

A.1.12: Cell addressing
-----------------------

See comment for A.1.9.

A.1.13: Quantization Operations
-------------------------------

See comment for A.1.9.

A.1.14: Cell navigation
-----------------------

See comment for A.1.9.

Also, this test will fail for a correct DGGS at the highest and lowest 
refinement levels, because the lowest resolution cells will have no parents and 
the highest resolution cells will have no children.

I suggest a test along the following lines:

1. Beginning with the initial tessellation, iterate through all cells at the 
present refinement level. For each cell, identify and navigate to all sibling 
cells. Then identify and navigate to all child cells.
2. Repeat the procedure from (1) until the finest (desired) refinement level is 
reached, at which point child cells are no longer identified or navigated to.
3. At each successive refinement level, verify that each cell has exactly one 
parent, which is identified and navigated to. Continue until the initial 
tessellation is reached.
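
The three steps above can be sketched against a toy hierarchy. `ToyDGGS`, its 
string IDs, the four-children-per-cell rule, and the method names are all 
hypothetical stand-ins; a real conformance test would call whatever navigation 
interface the implementation under test provides.

```python
class ToyDGGS:
    """Hypothetical stand-in: string IDs, four children per cell, fixed depth."""

    def __init__(self, initial_ids=("A", "B"), max_level=2):
        self.initial_ids = tuple(initial_ids)
        self.max_level = max_level

    def level(self, cell_id):
        return len(cell_id) - 1

    def children(self, cell_id):
        if self.level(cell_id) >= self.max_level:
            return []  # finest level: no children, and that is not an error
        return [cell_id + digit for digit in "0123"]

    def parent(self, cell_id):
        return cell_id[:-1] if self.level(cell_id) > 0 else None

    def siblings(self, cell_id):
        parent = self.parent(cell_id)
        group = self.initial_ids if parent is None else self.children(parent)
        return [c for c in group if c != cell_id]


def check_navigation(dggs):
    """Walk down from the initial tessellation, then verify parents level by level."""
    visited_by_level = []
    current = list(dggs.initial_ids)
    while current:  # steps 1 and 2: siblings and children at every level
        visited_by_level.append(current)
        next_level = []
        for cell in current:
            for sibling in dggs.siblings(cell):
                assert cell in dggs.siblings(sibling)
            next_level.extend(dggs.children(cell))
        current = next_level
    for cells in reversed(visited_by_level[1:]):  # step 3: one parent per cell
        for cell in cells:
            parent = dggs.parent(cell)
            assert parent is not None and cell in dggs.children(parent)
    # Cells of the initial tessellation legitimately have no parent.
    assert all(dggs.parent(c) is None for c in dggs.initial_ids)
    return sum(len(cells) for cells in visited_by_level)


print(check_navigation(ToyDGGS()))  # 42 cells visited: 2 + 8 + 32
```

The point of the design is that missing parents at the coarsest level and 
missing children at the finest level are treated as expected boundary cases 
rather than failures.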

A.1.15: Spatial analysis
------------------------

This test is so generally formulated that it is difficult to verify. Also, it 
delves much more deeply into the software implementation of an analysis system 
using a DGGS than IMHO is necessary, rather than specifying characteristics of 
the DGGS itself.

A.1.16: Interoperability query operations
-----------------------------------------

See comment for A.1.15.

A.1.17: Interoperability broadcast operations
---------------------------------------------

See comment for A.1.15.

-- 
______________________________

Daniel Lee
Deutscher Wetterdienst (DWD)
Referat TI12b: Entschlüsselung und Datenaufbereitung
Daniel.Lee at dwd.de
Tel: +49 (69) 8062-2706
Fax: +49 (69) 8062-3829
http://www.dwd.de
______________________________
-------------- next part --------------
Part A to be completed once.  Iterate Part B as needed.


PART A


1. Evaluator:
        Daniel Lee
        Deutscher Wetterdienst (DWD)
        Referat TI12b: Entschlüsselung und Datenaufbereitung
        Daniel.Lee at dwd.de
        Tel: +49 (69) 8062-2706
        Fax: +49 (69) 8062-3829
        http://www.dwd.de

2. Submission: [15-104r3, OGC Discrete Global Grid System (DGGS) Core Standard]



PART B


1. Requirement: General

2. Implementation Specification Section number: General

3. Criticality: Editorial

4. Comments/justifications for changes:

    The proposal is a great idea for standardizing grids optimized for area-based
    analytics and storage. It has the potential to streamline a lot of distributed
    operations on multi-computer and multi-institute infrastructures, which are
    expected to gain increasing importance over the next years. It is mostly
    compatible with current grids used in operational NWP systems. In fact, some
    NWP models use grids that would be completely compatible with DGGS without
    adaptation (one example is the ICON model, which we at DWD began using
    operationally last year). Our research users who designed the ICON grid weren't
    familiar with the term "DGGS," but through convergent development they produced
    results that match it quite nicely.

    This doesn't make DGGS perfect, or even well suited, for many NWP
    applications, so I don't expect the use of DGGS to increase significantly in
    the field of NWP even if this standard is adopted (although it would surely be
    useful in the area of model output post-processing).

1. Requirement: General

2. Implementation Specification Section number: General

3. Criticality: Editorial

4. Comments/justifications for changes:

    Not all NWP systems use DGGS-compatible grids, and computational grids are
    often optimized for different criteria, so future NWP grids may never comply
    with this specification. Despite the advantages of DGGS, I don't see the need
    for other coordinate systems to disappear - DGGS is a referencing system that
    is specialized to solve certain problems particularly well.

    Every optimization has advantages and disadvantages; for
    example, DGGS discretizes all features, allowing for efficient sorting and
    searching. However, there are many applications where the original data should
    be stored for the sake of data consistency. This can be seen easily using the
    example of satellites on geosynchronous orbits. The size of each pixel in such
    an observation is dependent on the latitude at which it was viewed, so that the
    original observations would never fit into a DGGS without lots of resampling.

    Additionally, NWP systems often are more concerned with conservation of
    vorticity than conservation of area. Many computational grids will be optimized
    for different purposes and this is unlikely to change in order to accommodate
    DGGS. So calling traditional CRS "legacy", with the long-term implication that
    they may be replaced by DGGS, makes me pretty uncomfortable.

1. Requirement: General

2. Implementation Specification Section number: General

3. Criticality: Major

4. Comments/justifications for changes:

    The standard is both very specific and very general. On the one hand, it
    attempts to be general enough to accommodate a wide range of applications, but
    it's also very specific about some things. I'll talk about some of those below
    in the technical issues, but in general I feel like the standard mixes
    requirements for simple data storage with those of applications designed to
    manage and analyze that data, which muddies the waters unnecessarily.

    Surely, nobody likes a plethora of standards, but a strength of OGC is having a
    plug-and-play stance to standards which makes them much cleaner without
    sacrificing standardization. I recommend separating storage and processing into
    separate standards and restricting this core standard to the actual reference
    system.

1. Requirement: General

2. Implementation Specification Section number: General

3. Criticality: Major

4. Comments/justifications for changes:

    In NWP, 2D data is a rarity. Almost all data has a vertical and a temporal
    dimension.

    This raises interesting questions regarding extensions to the current proposal.
    Does each refinement affect all dimensions? Currently, non-discrete coordinate
    systems allow for compact descriptions of points in 4D space because each
    dimension can be traversed individually and with an arbitrary precision. If we
    require cells to have a unique identifier in all dimensions,
    however, every possible combination of positions in each dimension needs to be
    addressable. This means that a huge address space is needed, whereas it might
    be more efficient to be able to traverse different dimensions independently
    (especially since many meteorological data sets are sparsely populated).
    Also, refinement may be desired in some dimensions but not in others.

    There will definitely be difficulties with conservation of area in a geocentric
    coordinate system that extends vertically. With increasing distance from the
    ellipsoid's center, a cell with the same horizontal extent mapped onto the
    ellipsoid's surface will have a greater area. How is area (or n-dimensional
    volume) conserved in this case? Would a different DGGS be required for each
    vertical layer? Also, vertical coordinates in NWP are often based on pressure
    and therefore vary in height over time. This is essentially the same as using a
    bumpy, mutable datum. What would be done in this case?

    I am certain that extending this standard to other dimensions will be
    complicated, and therefore recommend that the standard state explicitly that it
    is defined for two dimensions, with the problems accompanying its extension to
    other dimensions to be solved at a later time.

1. Requirement: General

2. Implementation Specification Section number: General

3. Criticality: Major

4. Comments/justifications for changes:

    Aside from the issues mentioned above, I don't think that the conformance
    tests do a good job of testing compliance with the standard in its current
    form.

1. Requirement: 6

2. Implementation Specification Section number: 6.2.4.1

3. Criticality: Major

4. Comments/justifications for changes:

    Why is the equal area precision defined as part of the DGGS? You could require
    that a given implementation of that DGGS specify its precision, but as the
    uncertainty is due to computational precision it shouldn't be part of the DGGS
    itself.

1. Requirement: 9

2. Implementation Specification Section number: 6.2.5.1

3. Criticality: Major

4. Comments/justifications for changes:

    Although defining a cell by its centroid is intuitive in many instances, it's
    probably not intuitive for all instances. It's definitely very specific, where
    the standard leaves certain other things open. For example, the standard does
    not specify that cells must be convex. Although I can't imagine how non-convex
    cells could be tessellated properly, I would not claim that this is impossible
    until I'd seen proof. If somebody creates a DGGS using non-convex cells, the
    centroids might lie outside of cells. It makes sense in many use cases to
    identify a cell at its centroid, but I think that that implementation detail
    should be left to the user and not predetermined in the standard.

    Additionally, as no provision is made in the standard for n-dimensional data, I
    suggest using the centroid in 2D space, if this clause is kept.

1. Requirement: 10

2. Implementation Specification Section number: 6.2.6.1

3. Criticality: Major

4. Comments/justifications for changes:

    Specifying the entire initial tessellation and making it a part of the DGGS
    itself, rather than allowing users to specify certain parameters that apply to
    the initial and following tessellations, reduces flexibility. As I understand
    the proposal, if I want to use the same tessellation pattern but partition the
    Earth initially using other borders, this would require an entirely separate
    DGGS.

    Here the proposal is somewhat unclear and I could be interpreting it wrong, but
    it seems to me that the initial tessellation would define the nodes that the
    borders of the top-level tessellation would run along. I'm not sure what
    benefit restricting the specification to this extent brings.

1. Requirement: 11

2. Implementation Specification Section number: 6.2.6.2

3. Criticality: Major

4. Comments/justifications for changes:

    Why specify the maximum refinement? A given implementation could specify what
    refinements are available, but the benefit of restricting the refinement levels
    as part of the coordinate system is unclear. A practical restriction could be
    the address space, but there could be many cases where the required resolution
    is higher than originally thought. It would be a pity if reaching a desired
    resolution in the future required creating an entirely new DGGS and
    registering it with the relevant normative authorities.

1. Requirement: 17

2. Implementation Specification Section number: 6.3.3.2

3. Criticality: Major

4. Comments/justifications for changes:

    The standard requires the DGGS to be able to export results to "standard data
    formats" without specifying what those are. How would this work? Would it be up
    to the user to decide what formats are standard? Would there be a list of
    standard formats that a DGGS would have to support? Would it break old
    implementations to expand that list? And, if we take one step back from that
    requirement, why should a standard for a coordinate system require that a
    software implementation be able to serialize data in a given format? I think
    this would be cleaner if the DGGS standard remained a standard for the
    reference system alone and not for software implementations designed to work on
    data described with that system.

1. Requirement: 3

2. Implementation Specification Section number: 6.2.1

3. Criticality: Major

4. Comments/justifications for changes:

    This should also test all resolutions, not just a single one.

1. Requirement: 8

2. Implementation Specification Section number: 6.2.5

3. Criticality: Major

4. Comments/justifications for changes: 

    This should test all cells and ensure that querying for any of their IDs
    returns only one cell. Also, querying for a nonexistent ID should fail.

1. Requirement: 9

2. Implementation Specification Section number: 6.2.5.1

3. Criticality: Major

4. Comments/justifications for changes: 

    This test passes even if the software merely includes a skeleton method as
    boilerplate. It should require a real implementation or be left out.

1. Requirement: 10

2. Implementation Specification Section number: 6.2.6.1

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 9.

1. Requirement: 11

2. Implementation Specification Section number: 6.2.6.2

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 9.

1. Requirement: 12

2. Implementation Specification Section number: 6.2.6.3

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 9.

1. Requirement: 13

2. Implementation Specification Section number: 6.3.1

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 9.

1. Requirement: 14

2. Implementation Specification Section number: 6.3.2

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 9.

    Also, this test will fail for a correct DGGS at the highest and lowest
    refinement levels, because the lowest resolution cells will have no parents and
    the highest resolution cells will have no children.

    I suggest a test along the following lines:

    1. Beginning with the initial tessellation, iterate through all cells at the
       present refinement level. For each cell, identify and navigate to all
       sibling cells. Then identify and navigate to all child cells.
    2. Repeat the procedure from (1) until the finest (desired) refinement level is
       reached, at which point child cells are no longer identified or navigated
       to.
    3. At each successive refinement level, verify that each cell has exactly one
       parent, which is identified and navigated to. Continue until the initial
       tessellation is reached.

1. Requirement: 15

2. Implementation Specification Section number: 6.3.2

3. Criticality: Major

4. Comments/justifications for changes: 

    This test is so generally formulated that it is difficult to verify. Also, it
    delves much more deeply into the software implementation of an analysis system
    using a DGGS than IMHO is necessary, rather than specifying characteristics of
    the DGGS itself.

1. Requirement: 16

2. Implementation Specification Section number: 6.3.3.1

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 15.

1. Requirement: 17

2. Implementation Specification Section number: 6.3.3.2

3. Criticality: Major

4. Comments/justifications for changes: 

    See comment for requirement 15.

