Steven McCoy | 1 Oct 03:43 2011

Re: [BUG] zmq_assert causes BOOM if you breath on OSX Lion kqueue wrong.

On 30 September 2011 18:22, Zed Shaw <zed.shaw <at> gmail.com> wrote:

<rant>
And, having to troll through C++ code to debug why I'm
getting the error is annoying.  At a minimum, add a 3rd parameter that
gives an error message that's other than something like "No such file
or directory".  WTF does that even mean for kqueue?  I sure as hell
didn't do anything to cause that. How could I possibly fix that?
</rant>


What do you think of the OpenPGM method of error handling?  I followed the GLib route because a single error code is just annoying and tedious, but you don't want to add too much overhead or unnecessary confusion, since every extra field adds to the learning curve.
 
typedef struct { int domain; int code; char* message; } pgm_error_t;

pgm_error_t* err = NULL;

if (!pgm_getaddrinfo (network, NULL, &res, &err)) {
   fprintf (stderr, "Parsing network parameter: %s\n",
            (err && err->message) ? err->message : "(null)");
   pgm_error_free (err);
   ...
}

-- 
Steve-o
_______________________________________________
zeromq-dev mailing list
zeromq-dev <at> lists.zeromq.org
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
Mikko Koppanen | 1 Oct 03:47 2011

Re: [BUG] zmq_assert causes BOOM if you breath on OSX Lion kqueue wrong.

On Fri, Sep 30, 2011 at 11:22 PM, Zed Shaw <zed.shaw <at> gmail.com> wrote:
> I was asked to report all asserts I encounter.  I went to the JIRA to
> submit this as a bug, but it looks like I have to create an account,
> or something, can't really figure it out even though I've used JIRA
> before.  I'm guessing this is the next place to report a bug, so here
> you go.
>
> If I run ZeroMQ 2.1.9 for even a reasonably complex load on OSX Lion I get this:
>
> No such file or directory
> rc != -1 (kqueue.cpp:76)
> Abort trap: 6
>
> I'll just report it here and then maybe someone can fix this.

Hi Zed,

I think I ran into the same issue:
https://zeromq.jira.com/browse/LIBZMQ-261. This usually happened to me
if the other peer disconnected. I haven't seen this issue after
applying the patch mentioned in the issue.

-- 
Mikko Koppanen
Elliot Saba | 1 Oct 10:14 2011

Re: [BUG] zmq_assert causes BOOM if you breath on OSX Lion kqueue wrong.

I have created a small test case that produces this error, please see the JIRA issue.


It occurs when I create a ROUTER socket, send a message, then disconnect, many many times over again.
-E

Martin Sustrik | 1 Oct 10:58 2011

Re: [BUG] zmq_assert causes BOOM if you breath on OSX Lion kqueue wrong.

Hi Zed,

> I was asked to report all asserts I encounter.  I went to the JIRA to
> submit this as a bug, but it looks like I have to create an account,
> or something, can't really figure it out even though I've used JIRA
> before.  I'm guessing this is the next place to report a bug, so here
> you go.

As Mikko says the bug is already reported.

> On another note:  Causing a full assert abort in *my* program from
> *your* library because of a little hiccup in an external resource is
> stupid.

This is not a hiccup in an external resource. It looks more like a
synchronisation issue.

>  I've been saying for close to a year now that *all* of the
> zmq_asserts need to go away.

Asserts check for bugs. To get rid of them we have to fix bugs. The 
other option is to ignore the bugs and allow 0MQ to continue operating 
in a broken state. That's OK as long as you are happy with undefined 
behaviour.

If you really want that, I can add a compile-time option to ignore all 
the asserts. It's just a few lines of code, so let me know.

> libzmq needs to return valid error codes
> and stop aborting *my* servers.  Until they're gone completely I can't
> trust that some random socket error I have no control over won't abort
> my whole world. And, having to troll through C++ code to debug why I'm
> getting the error is annoying.

The errors can happen asynchronously.

What can be done is setting a global handler function that will be 
called if a bug is hit. It's not clear what the application should do 
then though. Maybe it can save its state and restart itself?
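
To make the handler idea concrete, here is a minimal Python sketch of the control flow only. Nothing in it is a libzmq API: `set_failure_handler` and `check` are hypothetical names invented for illustration, and the question of what the application can safely do after the handler returns stays open.

```python
import sys

_failure_handler = None  # hypothetical process-wide hook, not a libzmq API


def set_failure_handler(handler):
    """Install a callable taking (expression, location)."""
    global _failure_handler
    _failure_handler = handler


def check(condition, expression, location):
    """Assert analogue: call the hook if one is installed, otherwise exit."""
    if condition:
        return
    if _failure_handler is not None:
        # The library state may already be inconsistent at this point;
        # the handler can only log, save state, or arrange a restart.
        _failure_handler(expression, location)
    else:
        sys.exit("%s (%s)" % (expression, location))


# Usage: record the failure instead of aborting the process.
seen = []
set_failure_handler(lambda expr, loc: seen.append((expr, loc)))
check(False, "rc != -1", "kqueue.cpp:76")
```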

> At a minimum, add a 3rd parameter that
> gives an error message that's other than something like "No such file
> or directory".

The asserts can be enhanced with longer messages, like, in this case, 
"kqueue has returned an unexpected error: no such file or directory". I 
am not sure how helpful that would be though.

> WTF does that even mean for kqueue?  I sure as hell
> didn't do anything to cause that. How could I possibly fix that?

[ENOENT] The event could not be found to be modified or deleted.

What's happening, I guess, is that an event is referenced that was 
already removed from the kqueue.
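
The text in the assert output appears to be nothing more than the C library's generic string for errno ENOENT, which would explain why it reads like a file error even though no file is involved. A small Python check, given only as an illustration of where the message comes from (it does not touch kqueue or libzmq):

```python
import errno
import os

# ENOENT is the errno kevent() sets when asked to modify or delete an
# event that is not registered; strerror() renders it with generic text.
message = os.strerror(errno.ENOENT)
print(errno.errorcode[errno.ENOENT], "->", message)
```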

In any case, I have no OSX system to reproduce the problem. If anyone 
cares to give me remote access, I can try to fix it.

Martin
Daniel Hyams | 1 Oct 14:13 2011

Re: Fundamentals of a REQ/REP

Ah, I see, that indeed works!


Can I suggest, though, an API that isn't quite so clumsy (zmq_reset(), say?).  It's certainly not the end of the world to have to close and reconnect, but that does mean that you have to carry along all of the socket setup information and redo it.  It seems that  just having a zmq_reset call to reset the communication pattern to its initial state would be appropriate for this rather common use case.  I assume that it's just a flag somewhere....

On Fri, Sep 30, 2011 at 5:03 PM, Chuck Remes <cremes.devlist <at> mac.com> wrote:

On Sep 30, 2011, at 3:53 PM, Daniel Hyams wrote:

> What I'm about to ask is so basic (very green to sockets and ZMQ), I have to be misunderstanding something fundamental :(  But I have been through the guide and searched the archives, and have not come up with a good answer for this.
>
> All I'm trying to do is have a server and client talk back and forth with a REQ/REP pattern.  (Client REQ, Server REP).  As long as the server side is up, everything is fine.
>
> But, what happens on the client end is unsavory, if the server is down.  What I would like to happen is for the client to be able to tell the user "hey, the server is down, and your request isn't going to happen" if the server does not respond to a REQ.  I cannot seem to make this happen....
>
> If the client does (in Python):
>   sock.send("the request")
>   reply = sock.recv()
> then this deadlocks when the server is down, obviously.
>
> If the client does:
>   sock.send("the request")
>   poller = zmq.Poller()
>   poller.register(sock, zmq.POLLIN)
>   socks = dict(poller.poll(1000))
>   if sock in socks:
>       reply = sock.recv()
>   else:
>       print("The server is not alive")
>       # oops, now the lockstep send/recv required for a REQ/REP setup is broken!
>       # socket is in the wrong state for a send.
>
> Then I get into trouble also, because the socket gets in the wrong state...I want to send next, not receive.  But I don't see any way of forcing the socket back into a "sending" state.
>
> I also tried a PAIR instead of REQ/REP, and did much better with that one, but since ZMQ queues messages on the sending end, my server gets all of the (undesired) queued up messages upon startup, and the client is not ready to receive the resulting replies.
>
> So I'm mystified.  Is the answer a ROUTER/DEALER setup?

The answer is to close the socket in the client and reopen it. This will "reset" its state. Use that in conjunction with the polling method you wrote about and it will be fine.
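
A minimal sketch of that close-and-reopen pattern, assuming a recent pyzmq. The names `request_with_retry` and `make_req_socket`, the endpoint, and the retry count are assumptions made for illustration, not anything from the thread. The retry helper only needs an object with `send`, `poll`, `recv` and `close`, so the socket setup lives in one factory function and is simply re-run for each attempt.

```python
def request_with_retry(make_socket, payload, timeout_ms=1000, retries=3):
    """Send payload on a fresh socket per attempt; return the reply or None."""
    for _ in range(retries):
        sock = make_socket()
        try:
            sock.send(payload)
            # poll() returns a non-zero event mask once a reply is readable.
            if sock.poll(timeout_ms):
                return sock.recv()
        finally:
            # Closing discards the stuck REQ state along with the socket.
            sock.close(linger=0)
    return None


def make_req_socket(endpoint="tcp://localhost:5555"):
    """Build a connected REQ socket (endpoint is a made-up example)."""
    import zmq  # pyzmq, imported lazily so the helper above has no dependency

    sock = zmq.Context.instance().socket(zmq.REQ)
    sock.connect(endpoint)
    return sock
```

Calling `request_with_retry(make_req_socket, b"the request")` returns the reply bytes, or `None` when every attempt times out, which is the point where the client can tell the user the server is down. For simplicity the sketch also closes the socket after a successful reply; a longer-lived client would probably keep it open.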

cr





--
Daniel Hyams
dhyams <at> gmail.com
Chuck Remes | 1 Oct 19:40 2011

Re: Fundamentals of a REQ/REP

On Oct 1, 2011, at 7:13 AM, Daniel Hyams wrote:

> Ah, I see, that indeed works!
> 
> Can I suggest, though, an API that isn't quite so clumsy (zmq_reset(),
> say?).  It's certainly not the end of the world to have to close and
> reconnect, but that does mean that you have to carry along all of the
> socket setup information and redo it.  It seems that just having a
> zmq_reset call to reset the communication pattern to its initial state
> would be appropriate for this rather common use case.  I assume that
> it's just a flag somewhere....

Daniel,

this is indeed clumsy. There has been discussion on this list about
adding a timeout for zmq_send()/zmq_recv(). Obviously that would be a
better way of detecting that a response has not arrived "in time."

However, it isn't as simple as just resetting the internal FSM. What if
the timeout occurs, the socket resets itself to *only* allow a send, and
then the response arrives late? Right now that would probably cause an
EFSM error. Alternately, perhaps it could just queue the message so the
next call to zmq_recv() will return it and any "duplicate" response
would just get dropped.

Anyway, just wanted to point out that there are still complexities to be
addressed even if there were such a thing as a timeout (or a zmq_reset()
function).
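
One way an application could cope with late or duplicate replies today is to number its own requests and ignore any reply whose number does not match. The sketch below is only an illustration of that idea: `RequestTracker` is a hypothetical helper, and it assumes the server echoes the sequence frame back unchanged in front of its reply.

```python
import itertools


class RequestTracker:
    """Tags each request with a sequence number; accepts only its reply."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._pending = None  # sequence number currently awaited, if any

    def wrap(self, payload):
        """Return [sequence, payload] frames for a new request."""
        self._pending = next(self._counter)
        return [str(self._pending).encode(), payload]

    def accept(self, frames):
        """Return the reply payload, or None for a late or duplicate reply."""
        seq, payload = frames
        if self._pending is None or int(seq.decode()) != self._pending:
            return None
        self._pending = None
        return payload
```

The frames from `wrap()` could be passed to pyzmq's `send_multipart()`, and the result of `recv_multipart()` handed to `accept()`. A reply to a request that already timed out then comes back as `None` instead of being mistaken for the answer to the retry.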

cr
Matthias Wächter | 1 Oct 20:08 2011

Re: Fundamentals of a REQ/REP

On 01.10.2011 19:40, Chuck Remes wrote:
> On Oct 1, 2011, at 7:13 AM, Daniel Hyams wrote:
>
>> Ah, I see, that indeed works!
>>
>> Can I suggest, though, an API that isn't quite so clumsy (zmq_reset(),
>> say?).  It's certainly not the end of the world to have to close and
>> reconnect, but that does mean that you have to carry along all of the
>> socket setup information and redo it.  It seems that just having a
>> zmq_reset call to reset the communication pattern to its initial state
>> would be appropriate for this rather common use case.  I assume that
>> it's just a flag somewhere....
>
> Daniel,
>
> this is indeed clumsy. There has been discussion on this list about
> adding a timeout for zmq_send()/zmq_recv(). Obviously that would be a
> better way of detecting that a response has not arrived "in time."
>
> However, it isn't as simple as just resetting the internal FSM. What if
> the timeout occurs, the socket resets itself to *only* allow a send, and
> then the response arrives late? Right now that would probably cause an
> EFSM error. Alternately, perhaps it could just queue the message so the
> next call to zmq_recv() will return it and any "duplicate" response
> would just get dropped.
>
> Anyway, just wanted to point out that there are still complexities to be
> addressed even if there were such a thing as a timeout (or a zmq_reset()
> function).

I think what Daniel means is that requiring user code to keep track of all the setup 
details, when the library already knows them, is clumsy. A simple zmq_reset() could take the 
socket's known connection setup and simply re-establish the connection from it.
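
Until something like that exists in the library, the bookkeeping can be hidden in a thin wrapper on the application side. This is a rough sketch under that assumption: `ResettableSocket` is a made-up class, and it only relies on the context offering `socket()` and the socket offering `setsockopt()`, `connect()` and `close()`, as pyzmq's do.

```python
class ResettableSocket:
    """Remembers socket setup so the socket can be rebuilt with one call."""

    def __init__(self, context, socket_type):
        self._context = context
        self._type = socket_type
        self._setup = []  # recorded (method name, args) calls, in order
        self._sock = context.socket(socket_type)

    @property
    def socket(self):
        """The current underlying socket, for send/recv/poll."""
        return self._sock

    def setsockopt(self, option, value):
        self._setup.append(("setsockopt", (option, value)))
        self._sock.setsockopt(option, value)

    def connect(self, endpoint):
        self._setup.append(("connect", (endpoint,)))
        self._sock.connect(endpoint)

    def reset(self):
        """Close the old socket and replay the recorded setup on a new one."""
        self._sock.close()
        self._sock = self._context.socket(self._type)
        for name, args in self._setup:
            getattr(self._sock, name)(*args)
```

With pyzmq this would be constructed as `ResettableSocket(zmq.Context.instance(), zmq.REQ)`. After a timeout the client calls `reset()` rather than repeating its own setup code.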

– Matthias
Pieter Hintjens | 1 Oct 21:34 2011

Re: ZMQ occupies random TCP ports on Windows

I've created an issue for this:
https://zeromq.jira.com/browse/LIBZMQ-265 and attached the patch to
it.

On Thu, Sep 29, 2011 at 2:49 PM, Martin Sustrik <sustrik <at> 250bpm.com> wrote:
> Hi guys,
>
> Here's a patch that could in theory solve the problem. It uses only a
> single port to create all internal connections (port 5905).
>
> I haven't tested it though.
>
> Please let me know whether it solves the problem.
>
> Martin
Sean Ochoa | 2 Oct 01:22 2011

multiprocessing + testing with pyzmq

Hey all.  I'm thinking of testing a design for our concurrent application using zeromq, but I'm wondering what type(s) of sockets I should use.  I need to communicate using IPC from multiple processes back into one process so that I can track the number of processes doing stuff in stages as work gets done.  Any ideas? 


--
Sean
Martin Sustrik | 2 Oct 08:56 2011

Re: multiprocessing + testing with pyzmq

On 10/02/2011 01:22 AM, Sean Ochoa wrote:
> Hey all.  I'm thinking of testing a design for our concurrent
> application using zeromq, but I'm wondering what type(s) of sockets I
> should use.  I need to communicate using IPC from multiple processes
> back into one process so that I can track the number of processes doing
> stuff in stages as work gets done.  Any ideas?

It's status monitoring. I would opt for PUB/SUB.
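
A rough sketch of what that could look like with pyzmq, assuming each worker process connects a PUB socket and the tracking process binds one SUB socket over IPC. The endpoint path, the JSON message layout, and all function and class names here are invented for the example.

```python
import json
from collections import Counter

ENDPOINT = "ipc:///tmp/stage-status.ipc"  # hypothetical IPC endpoint


def encode_status(worker_id, stage):
    """Serialise one status update as bytes."""
    return json.dumps({"worker": worker_id, "stage": stage}).encode()


class StageTracker:
    """Keeps the latest stage per worker and counts workers per stage."""

    def __init__(self):
        self.stage_of = {}

    def update(self, raw):
        msg = json.loads(raw.decode())
        self.stage_of[msg["worker"]] = msg["stage"]

    def counts(self):
        return Counter(self.stage_of.values())


def publish_status(worker_id, stages):
    """Worker side: connect a PUB socket and announce each stage."""
    import zmq  # pyzmq, imported lazily

    sock = zmq.Context.instance().socket(zmq.PUB)
    sock.connect(ENDPOINT)
    for stage in stages:
        sock.send(encode_status(worker_id, stage))
    sock.close()


def monitor():
    """Tracker side: bind a SUB socket, subscribe to everything, count."""
    import zmq  # pyzmq, imported lazily

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, b"")
    sock.bind(ENDPOINT)
    tracker = StageTracker()
    while True:
        tracker.update(sock.recv())
        print(dict(tracker.counts()))
```

One caveat with PUB/SUB: messages published before the subscription is established are silently dropped, so a worker that publishes immediately after connecting may lose its first updates. If every stage transition must be counted, a PUSH/PULL pair would be the safer choice.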

Martin
