Re: Retain attributes using 'blockmedian'
Paul Wessel <pwessel <at> hawaii.edu>
2012-05-14 03:32:53 GMT
Hi Jan Erik and others-
I have added a preliminary -Er option to blockmedian and blockmode that appends the record number of the
median/mode as the last output column. It has not been tested much, so please give feedback. It is available in the GMT5
Subversion repository only. Because the record number is written as a double, the maximum number of records is 2^53, or
9,007,199,254,740,992, which should be plenty. If your SOURCE_IDs are text, or are not record numbers, you
can use the output record number to look up the source id in the corresponding input record.
Paul
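
A minimal sketch of that lookup step, assuming the SOURCE_IDs have been read into an array in input order. The function name, the array layout, and the first_record parameter are illustrative and not part of GMT; whether record numbers start at 0 or 1 should be checked against the actual output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2^53: every whole number from 0 up to this value is exactly
 * representable as an IEEE-754 double. */
#define MAX_EXACT_RECORD 9007199254740992.0

/* Map a record number read back as a double to the SOURCE_ID of that
 * input record. source_ids[i] holds the id of the i-th input record;
 * first_record is the number given to the first record (0 or 1), which
 * is an assumption to check against the actual output. */
const char *source_id_for_record(double rec, const char *const *source_ids,
                                 size_t n_records, uint64_t first_record)
{
    assert(rec >= (double)first_record && rec <= MAX_EXACT_RECORD);
    uint64_t whole = (uint64_t)rec;
    assert((double)whole == rec);            /* reject fractional values */
    uint64_t idx = whole - first_record;
    assert(idx < n_records);
    return source_ids[idx];
}
```

Because the record number stays at or below 2^53, the conversion from double back to an integer index loses nothing.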
On May 11, 2012, at 12:53 PM, Paul Wessel wrote:
> On May 11, 2012, at 5:48 AM, Joaquim Luis wrote:
>
>> On 11-05-2012 16:14, Walter (HF) Smith wrote:
>>> Joaquim is a developer.
>>>
>>> Walter
>>
>> Which doesn't make my idea the 'official one'. It was just a thought for a quick way to solve this issue.
>>
>> Joaquim
>>
>>>
>>> On May 11, 2012, at 11:09 AM, Jan Erik Arndt wrote:
>>>
>>>> Thanks for your inputs,
>>>>
>>>> Joaquim, I think this is a nice approach. I am just wondering: if it is possible to obtain BLK_W (which I imagine is the weight value corresponding to BLK_Z), it should also be possible to obtain the fifth attribute (SOURCE_ID) of the blockmedian value directly. I will take a look at the source code soon, see what I can do, and keep you informed if I find a nicer solution.
>>>>
>>>> Until then I am of course keen to know what the developers think and whether there might already be an easier way.
>>>>
>>>> Jan
>>>>
>>>> On 11.05.2012 16:07, Joaquim Luis wrote:
>>>>> Jan,
>>>>>
>>>>> I think a very pedestrian but easy to hack solution may be implemented by you right away.
>>>>> The idea is to embed the SOURCE_ID into the weight and save the weights instead of the Z's
>>>>>
>>>>> Considering that attributing weights to data is always a somewhat subjective assignment, you can use weights with high numbers.
>>>>> Let's say the minimum weight is 10000.
>>>>> Next, use small numbers for the source_id (probably 0-255 will be enough)
>>>>> add weights and source_ids (for example)
>>>>> 10000 + 255 = 10255
>>>>> this, in terms of weighting is barely distinguishable from, let's say, 15000
>>>>>
>>>>> Now see the blockmedian.c code (current SVN version GMT5) and replace line 168
>>>>>
>>>>> extra[k] = 0.5 * (data[node].a[BLK_Z] + data[node1].a[BLK_Z]);
>>>>>
>>>>> by
>>>>>
>>>>> extra[k] = data[node1].a[BLK_W] - 10000; // or 'node' if you prefer
>>>>>
>>>>>
>>>>> and line 175
>>>>>
>>>>> extra[k] = data[node].a[BLK_Z];
>>>>>
>>>>> by
>>>>>
>>>>> extra[k] = data[node].a[BLK_W] - 10000;
>>>>>
>>>>>
>>>>> Of course, the 10000 in the above lines assumes that your minimum weight starts at 10000, AND this change will break the normal functioning of blockmedian, so you had better make it a copy under another name.
>>>>>
>>>>> Joaquim
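
The packing idea above can be sketched as follows. Subtracting a fixed 10000, as in the patched lines, recovers the id only when every record uses the same base weight. The version below is a variation, not what the thread proposes: it assumes every real weight is a multiple of a hypothetical WEIGHT_STEP and takes the remainder instead. All names and constants are illustrative.

```c
#include <assert.h>

/* Hypothetical constants for illustration only. */
#define WEIGHT_STEP   1000.0  /* every real weight is a multiple of this */
#define MAX_SOURCE_ID 255     /* must stay below WEIGHT_STEP */

/* Pack a small integer source id into the low digits of a weight. */
double embed_source_id(double base_weight, int source_id)
{
    long steps = (long)(base_weight / WEIGHT_STEP);
    assert(base_weight >= 0.0);
    assert((double)steps * WEIGHT_STEP == base_weight); /* whole steps only */
    assert(source_id >= 0 && source_id <= MAX_SOURCE_ID);
    return base_weight + (double)source_id;
}

/* Recover the source id: what is left after removing whole steps. */
int extract_source_id(double packed_weight)
{
    long steps = (long)(packed_weight / WEIGHT_STEP);
    return (int)(packed_weight - (double)steps * WEIGHT_STEP);
}
```

With a base of 10000 and an id of 255 this gives the 10255 from the example, and extract_source_id(10255.0) returns 255. The arithmetic is exact because all values involved are whole numbers well within the range of a double.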
>>>>>
>>>>>> I could send you my code but you would have to hack it much more than you would have to hack the C source
for blockmedian in GMT.
>>>>>>
>>>>>> Mine was called "imgbm" for img block median, and it worked on data using native binary ints for the
depth, source id, and position. The position was recorded as an integer index to a cell in an 'img' style
file (Smith and Sandwell type bathymetry/gravity stuff). So it read and wrote a list of native binary
ints, and it simply sorted the list in a structured way.
>>>>>>
>>>>>> So it did not have the functionality of blockmedian for I/O and such.
>>>>>>
>>>>>> I don't have time to do this myself right now, so I hope the development team is looking at this. It
looks easy to add but there might be some side effects with blockmode and blockmean. In principle we would
have optionally 4 or 5, rather than 3 or 4, elements per data structure. There would be an option switch
similar to -W which would have to increment the number of elements, and that would have to control the I/O so
that the Source_ID got read and written. Then it could be carried along in the structure.
>>>>>>
>>>>>> One would have to be careful that, if both the -W and Source_ID options were turned on, the weight got
loaded in the right place, so that the machinery that does weighted CDFs would continue to work right.
>>>>>>
>>>>>> Walter
>>>>>>
>>>>>>
>>>>>> On May 11, 2012, at 4:07 AM, Jan Erik Arndt wrote:
>>>>>>
>>>>>>> The SOURCE_IDs represent cruises, compilations or even digitized nautical charts. Each of them has a WEIGHT attribute, which is used with the 'blockmedian' option '-W' to prioritize high-quality data sets over lower-quality ones.
>>>>>>>
>>>>>>> What I am aiming for is a unique ID for every grid cell, to visualize which source is responsible for the one value extracted by 'blockmedian'. This will show which source is finally used in the subsequent gridding process. In cells with an even number of points this of course means losing the SOURCE_ID of the deeper value, but that's no problem since I only want one unique value per cell. Getting the SOURCE_ID of the shallower point when the number of points is even would be the best solution for my purposes, I think.
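
A rough sketch of that selection rule for the unweighted case, assuming z is an elevation that is negative below sea level, so the shallower point is the one with the larger z. The struct and function names are illustrative and are not taken from blockmedian.c.

```c
#include <assert.h>
#include <stdlib.h>

/* One sounding inside a block; the field names are illustrative. */
struct sounding {
    double z;        /* elevation, negative below sea level (assumed) */
    int source_id;
};

/* qsort comparator: ascending elevation, i.e. deepest point first. */
static int by_elevation(const void *a, const void *b)
{
    double za = ((const struct sounding *)a)->z;
    double zb = ((const struct sounding *)b)->z;
    return (za > zb) - (za < zb);
}

/* Sort the block by elevation and return the record that supplies the
 * single source id: the middle record for odd n, the shallower (higher z)
 * of the two central records for even n. */
struct sounding block_median_source(struct sounding *pts, size_t n)
{
    assert(n > 0);
    qsort(pts, n, sizeof *pts, by_elevation);
    return pts[n / 2];
}
```

For an even count the returned z is that of the shallower central point, not the mean of the two central values that blockmedian itself reports, so only the source_id should be taken from this record.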
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Jan
>>>>>>>
>>>>>>> On 11.05.2012 09:32, Florian Wobbe wrote:
>>>>>>>> Jan,
>>>>>>>>
>>>>>>>> is your SOURCE_ID unique for each data point, or is it an attribute that multiple points share (e.g. date or cruise name)? In the latter case, how would you expect SOURCE_ID to be retained, e.g. when 2 ship tracks with different IDs cross? Walter's blockmedian keeps the SOURCE_ID with the lowest elevation, so one ID would be lost in this case. I would prefer this output instead:
>>>>>>>>
>>>>>>>> x, y, numPointsInThisBlock,<list of all SOURCE_ID's (n=numPointsInThisBlock) ordered by weight>
>>>>>>>>
>>>>>>>> This way, you know which SOURCE_ID's contributed to a certain output point of blockmedian.
>>>>>>>>
>>>>>>>> For blockmedian, numPointsInThisBlock may be 1 or 2, so at most 2 SOURCE_IDs are listed.
>>>>>>>>
>>>>>>>> An interesting feature would be to prefer certain SOURCE_IDs. Say you know that the data quality of ship track 0815 is worse than that of other tracks. You may wish to discard the information from 0815 in a block where other points contribute as well, but retain it if it's the only one available.
>>>>>>>>
>>>>>>>> Florian
>>>>>>>>
>>>>>>>>> Yes, that is the point, Walter. This would be the easiest and in fact the only way to determine which source is ultimately responsible for the value of a specific grid cell. If that extension to 'blockmedian' exists, generating the source id grid would become much easier. So I am excited to hear the developers' opinion.
>>>>>>>>>
>>>>>>>>> Walter, just in case this extension is not available yet, would you be so kind as to share your special blockmedian version with me? That would speed up my work a bit and I would definitely appreciate it a lot.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dipl.-Ing. Jan Erik Arndt
>>>>>>>>>
>>>>>>>>> Email:
>>>>>>>>> Jan.Erik.Arndt <at> awi.de
>>>>>>>>>
>>>>>>>>> Address:
>>>>>>>>> Alfred Wegener Institute
>>>>>>>>> Van-Ronzelen-Str. 2
>>>>>>>>> D-27568 Bremerhaven
>>>>>>>>>
>>>>>>>>> Telephone: +49(471)4831-1369
>>>>>>>>> Fax: +49(471)4831-1977
>>>>>>>>>
>>>>>>>>> On 10.05.2012 22:38, Walter (HF) Smith wrote:
>>>>>>>>>> I understand exactly what Jan Erik wants to do. We do the same thing in the SRTM30PLUS gridded
bathymetry process, where each cell is assigned the median depth and also the source identification
number that contributed that value.
>>>>>>>>>>
>>>>>>>>>> Keith is right that if there are an even number of data points (unweighted case) or if the
cumulative distribution midpoint falls between two values (weighted case), then there can be an
ambiguity. For a project like bathymetry it may make sense in the ambiguous cases to default one way or the
other, for example to always choose the ID number of the shallower point.
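
One way the weighted case with that default could look is sketched below. This is not the GMT implementation. It assumes positive weights, a z that is negative below sea level (so shallower means larger z), and weights for which the exact equality test on the cumulative sum is meaningful (whole numbers, for instance). All names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* One weighted sounding; the field names are illustrative. */
struct wsounding {
    double z;        /* elevation, negative below sea level (assumed) */
    double w;        /* positive weight */
    int source_id;
};

/* qsort comparator: ascending elevation, i.e. deepest point first. */
static int wby_elevation(const void *a, const void *b)
{
    double za = ((const struct wsounding *)a)->z;
    double zb = ((const struct wsounding *)b)->z;
    return (za > zb) - (za < zb);
}

/* Walk the cumulative weight distribution and return the source id of the
 * point where it passes half of the total weight. If the midpoint falls
 * exactly between two points, take the shallower one (the next in order). */
int weighted_median_source(struct wsounding *pts, size_t n)
{
    double total = 0.0, cum = 0.0, half;
    size_t i;

    assert(n > 0);
    qsort(pts, n, sizeof *pts, wby_elevation);
    for (i = 0; i < n; i++) total += pts[i].w;
    half = 0.5 * total;
    for (i = 0; i < n; i++) {
        cum += pts[i].w;
        if (cum > half) return pts[i].source_id;            /* unambiguous */
        if (cum == half && i + 1 < n)
            return pts[i + 1].source_id;                    /* ambiguous: shallower */
    }
    return pts[n - 1].source_id;
}
```

With all weights equal to 1 this reduces to the unweighted rule: the middle point for an odd count, the shallower of the two central points for an even count.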
>>>>>>>>>>
>>>>>>>>>> I built a special version of blockmedian to do this for our project. But I think the GMT developers discussed adding this extension to blockmedian. I wrote the original but I haven't looked under the hood in a while to see what it now does.
>>>>>>>>>>
>>>>>>>>>> Developers, do you want to comment? Is this something blockmedian and blockmode can do, or
could do with an extension, and if so, is it worth adding this?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Walter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Walter H F Smith
>>>>>>>>>> Chairman, GEBCO TSCOM/SCDB
>>>>>>>>>> Geophysicist, Laboratory for Satellite Altimetry
>>>>>>>>>> NOAA NESDIS code E/RA-31
>>>>>>>>>> 1335 East West Hwy, room 5408
>>>>>>>>>> Silver Spring MD 20910-3226
>>>>>>>>>> tel 301-713-7212 (NEW 26-01-2012)
>>>>>>>>>> fax 301-713-3136
>>>>>>>>>> Walter.HF.Smith <at> noaa.gov
>>>>>>>>>> http://www.star.nesdis.noaa.gov/star/Smith_WHF.php
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On May 10, 2012, at 11:36 AM, Keith Pickering wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm not sure this is logically possible. If the number of data sources is odd, you will always
have a single source from which the resulting gridblock is taken; but if the number of data sources is even,
then the gridblock result should be the mean of the two central data sources. Therefore it is always
possible to have more than one data source for any resulting grid block.
>>>>>>>>>>>
>>>>>>>>>>> Keith Pickering
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 10, 2012 at 9:05 AM, Jan Erik Arndt<Jan.Erik.Arndt <at> awi.de> wrote:
>>>>>>>>>>> Dear Paul, Walter and all others,
>>>>>>>>>>>
>>>>>>>>>>> I am currently using some GMT programs to generate a bathymetric grid from data from several sources. One of these steps is 'blockmedian'. Before this step, every point used in the calculation has the values 'LON LAT DEP WEIGHT SOURCE_ID'. I intend to create a source id grid afterwards. Unfortunately, after 'blockmedian' the information about the SOURCE_ID is lost, and I have not found a way of retaining that information within the standard 'blockmedian' program.
>>>>>>>>>>>
>>>>>>>>>>> So my question is:
>>>>>>>>>>> Is it possible to retain the attributes, like 'SOURCE_ID', corresponding to the blockmedian
value using the standard commands or would I have to work on the 'blockmedian' code itself to keep this information?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks in advance,
>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>> To unsubscribe, send the message "signoff gmt-help" to listserv <at> lists.hawaii.edu