Gkclient problem ( regression ) in h323plus 1.25
Francisco Olarte <folarte <at> peoplecall.com>
2013-11-12 12:00:54 GMT
I think h323plus gkclient.cxx does not correctly process
reregistration after a gatekeeper reboot in 1.25.
My scenario is the following. We have several h323 endpoint programs (
yate instances and homebrew ones ) registered to a gatekeeper. As part
of a scheduled maintenance we rebooted the gatekeeper. Every endpoint
compiled against h323plus 1.24 reregistered correctly and kept working,
but the single one compiled with h323plus 1.25 ( upgrading to 1.25 is
part of the maintenance ) did not.
The problem was this: the endpoint sent a keepalive RRQ with the old
gatekeeper data. The new gk instance replied with an RRJ with a full
registration required cause. The endpoint sent a full registration and
the gatekeeper replied with an RCF with a NEW ENDPOINT IDENTIFIER.
All the 1.24 programs then proceeded to send the new Ep Id, but the
1.25 one kept sending the old one. I suspect it also sends the old one
in ARQs, so it was unable to call ( will verify it later ).
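
The exchange above can be modelled with a toy gatekeeper. This is only a
sketch to illustrate the sequence, not h323plus or gatekeeper code:
ToyGatekeeper, Reply and the identifier format are invented names, and it
assumes the gatekeeper forgets every registration on reboot and issues a
fresh identifier on each full registration.

```cpp
#include <set>
#include <string>

// Toy model of the RRQ/RRJ/RCF exchange; all names here are hypothetical.
enum class Reply { RCF, RRJ_FullRegistrationRequired };

struct ToyGatekeeper {
    std::set<std::string> known;  // endpoint identifiers issued since last boot
    int boot = 1;                 // bumped on every reboot
    int next = 1;                 // per-boot counter for new identifiers

    // Full RRQ: always accepted here, and a fresh identifier is issued.
    std::string FullRegister() {
        std::string id = "gk" + std::to_string(boot) + "-ep" + std::to_string(next++);
        known.insert(id);
        return id;
    }

    // Keepalive RRQ: only accepted for an identifier this instance issued.
    Reply KeepAlive(const std::string& id) const {
        return known.count(id) ? Reply::RCF : Reply::RRJ_FullRegistrationRequired;
    }

    // A reboot loses all registration state.
    void Reboot() {
        known.clear();
        ++boot;
        next = 1;
    }
};
```

In this model a keepalive carrying the pre-reboot identifier is rejected
every time, so an endpoint that keeps the old identifier after the new RCF
never gets back to a working registration.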
I've been debugging the source code and didn't notice any difference
in the clients, so I looked at the 1.24 to 1.25 diffs. In gkclient.cxx
I noticed a difference in endpoint identifier handling, which I
narrowed to this commit:
which does not store a new endpoint identifier from the reply to an RRQ
if one is already set, but does no further tests, and seems to be what
is causing my problem. I assume there is a reason for not storing the
new endpoint identifier, but I think it should at least be checked.
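
To make the suspected difference concrete, here is a minimal sketch. It is
not h323plus code: ToyEndpoint and its methods are invented names, and the
"keep if set" rule is only a reading of the commit as described above.

```cpp
#include <string>

// Hypothetical stand-in for the client's stored registration state.
struct ToyEndpoint {
    std::string endpointIdentifier;

    // Suspected 1.25 behaviour: keep the stored identifier if there is one.
    void OnConfirmKeepExisting(const std::string& idFromRcf) {
        if (endpointIdentifier.empty())
            endpointIdentifier = idFromRcf;
    }

    // What the 1.24 builds appear to do on the wire: adopt the RCF value.
    void OnConfirmAlwaysStore(const std::string& idFromRcf) {
        endpointIdentifier = idFromRcf;
    }

    // The check suggested above: is a stored identifier out of step with
    // the one the gatekeeper just confirmed?
    bool DiffersFrom(const std::string& idFromRcf) const {
        return !endpointIdentifier.empty() && endpointIdentifier != idFromRcf;
    }
};
```

With "old" stored and "new" confirmed, the first policy leaves "old" in
place and DiffersFrom("new") reports the inconsistency. What to do once
the mismatch is detected is a separate question.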
Also, I think the stored fields should be reset when a registration is
rejected with fullRegistrationRequired. Actually I've seen it's
processed like this:
    case H225_RegistrationRejectReason::e_fullRegistrationRequired :
      registrationFailReason = GatekeeperLostRegistration;
      // Set timer to retry registration
      reregisterNow = TRUE;
but I've been unable to find any action other than signalling an
immediate retry. I think that in this case, if not in the general
registration reject case, the endpoint identifier should be cleared, as
it shouldn't be valid after a failed registration.
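
A rough sketch of the change I have in mind, written against a toy type
rather than the real class. ToyRegistration, RejectReason and OnReject are
invented for illustration. Only the reregisterNow flag mirrors the snippet
above, and I have not checked what else in the client depends on the
identifier.

```cpp
#include <string>

// Hypothetical reject reasons; the real ones live in the H225 classes.
enum class RejectReason { FullRegistrationRequired, Other };

// Hypothetical stand-in for the client's registration state.
struct ToyRegistration {
    std::string endpointIdentifier;
    bool reregisterNow = false;

    void OnReject(RejectReason reason) {
        if (reason == RejectReason::FullRegistrationRequired) {
            // Proposed addition: the old identifier is no longer valid.
            endpointIdentifier.clear();
            // Existing behaviour: retry the registration at once.
            reregisterNow = true;
        }
    }
};
```

If the suspicion about the "keep if set" rule is right, clearing the
identifier here would also let the next RCF store the new one without
touching the confirm path.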
I've also studied the differences between tag v1.25 and HEAD
and haven't found any differences in the relevant areas.
Summarising, I think the endpoint identifier, and maybe some more
registration data, should be cleared on registration rejects. Also, the
stored endpoint identifier should be checked for a potential
inconsistency against the registration confirm data, although this can
be problematic ( what to do if they do not match ? ) and can be left
out, as it's not going to interfere with normal operations. I can try
to make a patch. In my case clearing endpointIdentifier on registration
reject will probably work, but I fear it may break other parts.
P.S. I cannot provide detailed packet traces at the moment, as they
need some rather involved setup, but I can try to do it if someone
thinks it's necessary. F.O.