We already have the NC (full) location and the NC basic location. They have these raster maps in common:
These maps are in NC basic:
But in the full NC location they have different names:
Of course, the longer names are justified by the need to distinguish them from similar raster maps in the same location. Still, I would say that unified names are more valuable for teaching and test datasets than perfectly unambiguous names. The details belong in the metadata anyway.
One can also argue about the unified names themselves (e.g. elevation vs. dtm, or the use of underscores), but most choices are fairly clear because the names have to be as general as possible.
The names obviously must be in English. If somebody wants the data in a different language, a derived dataset must be created. Perhaps it would be possible to provide some batch version of g.rename (but there are also attribute columns and other things to translate).
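A batch rename could be sketched with the GRASS Python scripting library, which runs g.rename once per map using its `raster=old,new` and `vector=old,new` options. This is only an illustration under stated assumptions: the `RENAMES` mapping and the map names in it are hypothetical placeholders, since the actual list of unified names has not been decided yet, and the helper function names are made up for this sketch. It also does not touch attribute column names.

```python
"""Sketch of a batch rename for unifying map names in a sample location.

All map names below are hypothetical placeholders, not an agreed-on list.
"""

from typing import Dict, List

# Hypothetical mapping: {element type: {current name: unified name}}
RENAMES: Dict[str, Dict[str, str]] = {
    "raster": {"example_long_raster_name": "elevation"},
    "vector": {"example_long_vector_name": "roads"},
}


def rename_options(renames: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Build one option dict per map in the form g.rename expects (type="old,new")."""
    options = []
    for element, pairs in renames.items():
        for old, new in sorted(pairs.items()):
            if old != new:  # skip maps that already have the unified name
                options.append({element: f"{old},{new}"})
    return options


def apply_renames(renames: Dict[str, Dict[str, str]]) -> None:
    """Run g.rename for each pair; this must be executed inside a GRASS session."""
    import grass.script as gs  # importable only from within GRASS GIS

    for opts in rename_options(renames):
        gs.run_command("g.rename", **opts)
```

Inside a GRASS session, `apply_renames(RENAMES)` would rename the maps in the current mapset. Translating attribute columns or other parts of a dataset would need separate handling.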
A remaining issue is what to do if the area contains nothing that fits a given map, or whether a dam or a pond counts as a lake. However, we can tolerate some inaccuracies when creating a training dataset.
The other locations which can be unified are Piemonte and Spearfish.
So, what are the next steps? Decide which maps to include and which names to use? Let's start with the NC basic location.
I'm not sure whether geology and soils would be available in other locations, so we could leave them out. However, they are available for Spearfish and maybe for Piemonte (my Italian is not really usable).
We would need to have at least one map of each type. I'm not sure which ones are crucial and broadly available, but training datasets usually seem to cover areas near some civilization, so roads or schools might be available. Buildings would be nice to have.
Attribute data, time series, 3D rasters, and (real) 3D vectors are of course a whole new level. So I would start with rasters and (mostly 2D) vectors.