Vitaly Kruglikov | 7 Jul 2010 22:15

Huge problem with ares_library_init() and ares_library_cleanup()

Dear c-ares developers,

prior to upgrading to the newer version of c-ares, I consulted c-ares documentation and discovered two new
functions that pose a huge problem to users of the library: ares_library_init() and ares_library_cleanup().

Both functions are defined as 'not thread-safe' and have to be executed before starting any other threads
(ares_library_init) and after terminating all threads (ares_library_cleanup).

This makes it impossible for plug-ins in a multi-threaded process to use the c-ares library safely.  A
multi-threaded process that employs plug-ins (e.g., dynamically-loaded shared libraries) doesn't
know which libraries its dynamically-discovered/dynamically-loaded plug-ins will use, so there is no
safe way to initialize c-ares.

Please, please provide a thread-safe, re-entrant initialization/cleanup mechanism that allows this
incredibly useful library to be used under the above-described, fairly common circumstances (without
forcing the overhead of static linking).

>> ares_library_init:
This function is not thread safe. You have to call it once the program has started, but this call must be done
before the program starts any other thread. This is required to avoid potential race conditions in
library initialization, and also due to the fact that ares_library_init(3)  might call functions from
other libraries that are thread unsafe, and could conflict with any other thread that is already using
these other libraries. 
<<

>> ares_library_cleanup:
This function is not thread safe. You have to call it once the program is about to terminate, but this call
must be done once the program has terminated every single thread that it could have initiated. This is
required to avoid potential race conditions in library deinitialization, and also due to the fact that
ares_library_cleanup(3)  might call functions from other libraries that are thread unsafe, and could

Vitaly Kruglikov | 7 Jul 2010 22:48

How to cancel a single gethostbyname request without canceling any others?

My program starts multiple ares_gethostbyname() requests, but then needs to cancel a specific one
(because the corresponding transaction was canceled by the user), while letting the other requests
continue to be processed on the same channel.  Is this possible without resorting to multiple channels?
I like the efficiency of sharing one channel.  I didn't find such a feature in the online doc; is it
perhaps in the works?

Thank you,

Vitaly
Neeraj | 12 Jul 2010 15:11

How to Use CHAOS as query class

Hi,

Can anybody shed some light on using CHAOS as the query class with the c-ares lib? I have seen the adig.c file included in the source, but it does not seem to be completely ready for CHAOS queries.

I would appreciate it if anyone could share a small piece of code so that I can get started.

--
Thanks,
Neeraj
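
[Editorial sketch, not part of the original thread.] In c-ares the query class is simply the dnsclass argument of ares_query(), so a CHAOS lookup should only need the CHAOS class value (3, usually spelled C_CHAOS in <arpa/nameser.h>) together with a record type such as TXT. The block below does not call c-ares at all; it builds the wire-format question by hand to show where the class field ends up. build_query(), put_u16() and the two enum names are hypothetical helpers chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric values as assigned in RFC 1035. */
enum { DNS_CLASS_CHAOS = 3, DNS_TYPE_TXT = 16 };

/* Store a 16-bit value in network byte order. */
static void put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

/* Build a single-question DNS query. Returns the packet length, 0 on error. */
size_t build_query(const char *name, uint16_t id, uint16_t qtype,
                   uint16_t qclass, unsigned char *buf, size_t buflen)
{
    size_t off = 12;                      /* fixed header size */

    assert(name != NULL && buf != NULL);
    if (buflen < off)
        return 0;
    memset(buf, 0, off);
    put_u16(buf, id);
    buf[2] = 0x01;                        /* RD (recursion desired) flag */
    put_u16(buf + 4, 1);                  /* QDCOUNT = 1 */

    /* Encode each dot-separated label as <length><bytes>. */
    while (*name) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);

        if (len == 0 || len > 63 || off + 1 + len > buflen)
            return 0;
        buf[off++] = (unsigned char)len;
        memcpy(buf + off, name, len);
        off += len;
        name += len;
        if (*name == '.')
            name++;
    }

    if (off + 5 > buflen)                 /* root label + QTYPE + QCLASS */
        return 0;
    buf[off++] = 0;                       /* root label terminates the name */
    put_u16(buf + off, qtype);
    off += 2;
    put_u16(buf + off, qclass);           /* this is where CHAOS (3) goes */
    off += 2;
    return off;
}
```

With an initialised channel, the corresponding c-ares request would presumably be ares_query(channel, "version.bind", 3 /* CHAOS */, 16 /* TXT */, callback, NULL); "version.bind" is the name BIND servers conventionally answer in the CHAOS class. Parsing the TXT answer in the callback is left out here.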

Daniel Stenberg | 12 Jul 2010 18:08

Re: Huge problem with ares_library_init() and ares_library_cleanup()

On Wed, 7 Jul 2010, Vitaly Kruglikov wrote:

> prior to upgrading to the newer version of c-ares, I consulted c-ares 
> documentation and discovered two new functions that pose a huge problem to 
> users of the library: ares_library_init() and ares_library_cleanup().
>
> Both functions are defined as 'not thread-safe' and have to be executed 
> before starting any other threads (ares_library_init) and after terminating 
> all threads (ares_library_cleanup).
>
> This makes it impossible to use the c-ares library safely by plug-ins in a 
> multi-threaded process.  Since a multi-threaded process that employs 
> plug-ins (e.g., dynamically-loaded shared libraries) doesn't know which 
> libraries will be used by dynamically-discovered/dynamically-loaded 
> plug-ins, there is no safe way to initialize c-ares.
>
> Please, please provide a thread-safe, re-entrant initialization/cleanup 
> mechanism that allows this incredibly useful library to be used under the 
> above-described, fairly common circumstances (without forcing the overhead 
> of static linking).

These functions are declared non-thread-safe as this is very common and typical 
of initialization/cleanup mechanisms. We can see this in lots of other libs.

We're of course always interested in improving the lib so feel free to bring 
on your code/patches and we'll discuss them!

-- 

  / daniel.haxx.se

Daniel Stenberg | 12 Jul 2010 18:10

Re: How to cancel a single gethostbyname request without canceling any others?

On Wed, 7 Jul 2010, Vitaly Kruglikov wrote:

> My program starts multiple ares_gethostbyname() requests, but then needs to 
> cancel a specific one (because the corresponding transaction was canceled by 
> user), while letting other requests continue to be processed on the same 
> channel.  Is this possible without resorting to multiple channels?  I like 
> the efficiency of being able to share the same channel.  I didn't find it in 
> the online doc, but would like to know if such a feature may be in the 
> works?

No, I can't see any way to do this. I consider it a shortcoming, but it's not 
easily fixed without changing the API for ares_gethostbyname() and adding a 
new function similar to ares_cancel() but for a specific query.
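
[Editorial sketch, not part of the original thread.] One workaround that needs no API change is to give every request its own context through the arg pointer and let the callback drop the result when that context has been flagged. This does not stop c-ares from doing the lookup; it only discards the answer. The struct and function names below are hypothetical, and the callback merely mirrors the shape of ares_host_callback so the block compiles without c-ares.

```c
#include <assert.h>
#include <stdlib.h>

struct hostent;  /* only used through a pointer in this sketch */

/* Hypothetical per-request context, passed to the resolver as `arg`. */
struct request {
    int cancelled;  /* set by the application when the transaction is dropped */
    int done;       /* set once the callback has fired */
    int delivered;  /* set only if the result was handed to the application */
    int status;     /* status code reported to the callback */
};

/* Allocate a zeroed context; returns NULL if out of memory. */
struct request *request_new(void)
{
    return calloc(1, sizeof(struct request));
}

/* Abandon one request; the channel and all other requests are untouched. */
void request_cancel(struct request *req)
{
    assert(req != NULL);
    req->cancelled = 1;
}

/* Same parameter list as c-ares's ares_host_callback. */
void host_callback(void *arg, int status, int timeouts, struct hostent *host)
{
    struct request *req = arg;

    (void)timeouts;
    (void)host;
    assert(req != NULL);
    req->done = 1;
    req->status = status;
    if (req->cancelled)
        return;             /* answer arrived too late: drop it silently */
    req->delivered = 1;     /* real code would consume `host` here */
}
```

The context has to stay allocated until the callback has run (done is set), even for a cancelled request, because the resolver still holds the pointer; the application frees it afterwards.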

-- 

  / daniel.haxx.se

Vitaly Kruglikov | 13 Jul 2010 21:18

Re: Huge problem with ares_library_init() and ares_library_cleanup()


>>>>>>
Message: 2
Date: Mon, 12 Jul 2010 18:08:18 +0200 (CEST)
From: Daniel Stenberg <daniel@...>
To: c-ares hacking <c-ares@...>
Subject: Re: Huge problem with ares_library_init() and
        ares_library_cleanup()
Message-ID: <alpine.DEB.2.00.1007121806340.6011@...>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Wed, 7 Jul 2010, Vitaly Kruglikov wrote:

> prior to upgrading to the newer version of c-ares, I consulted c-ares
> documentation and discovered two new functions that pose a huge problem to
> users of the library: ares_library_init() and ares_library_cleanup().
>
> Both functions are defined as 'not thread-safe' and have to be executed
> before starting any other threads (ares_library_init) and after terminating
> all threads (ares_library_cleanup).
>
> This makes it impossible to use the c-ares library safely by plug-ins in a
> multi-threaded process.  Since a multi-threaded process that employs
> plug-ins (e.g., dynamically-loaded shared libraries) doesn't know which
> libraries will be used by dynamically-discovered/dynamically-loaded
> plug-ins, there is no safe way to initialize c-ares.
>
> Please, please provide a thread-safe, re-entrant initialization/cleanup
> mechanism that allows this incredibly useful library to be used under the
> above-described, fairly common circumstances (without forcing the overhead
> of static linking).

These functions are declared non-thread-safe as this is very common and typical
of initialization/cleanup mechanisms. We can see this in lots of other libs.

We're of course always interested in improving the lib so feel free to bring
on your code/patches and we'll discuss them!

--

  / daniel.haxx.se
<<<<

With respect to "These functions are declared non-thread-safe as this is very common and typical of
initialization/cleanup mechanisms. We can see this in lots of other libs.":

It's true that several mainstream libs do this, including openssl and libcurl, but this unfortunately
makes those otherwise very useful libs either unusable or not safely usable in multi-threaded apps
with plug-ins, or forces implementations to break down their abstraction mechanisms (thus pushing the
problem onto their users).  Once a shared module imposes this kind of constraint, every outer layer
ends up having to propagate it. I am told that there is already a good discussion going on about this
type of problem on the openssl forum, as engineers have realized the limitations of this practice.

Before investing effort in a patch, I'd like to explore our options a little bit.  A mechanism that readily
comes to mind is to make use of the atomic library constructor/destructor to initialize the library's
dependencies.  However, the c-ares doc states that doing so on Windows may cause a deadlock.  Is this a
common problem for Windows and Linux/Unix or just Windows?  What is the specific logic that would cause a
deadlock on Windows if executed from a dll's "constructor"?
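
[Editorial sketch, not part of the original thread.] To make the request concrete, one possible shape for a re-entrant mechanism is a reference-counted wrapper serialised by a statically initialised mutex, so that independent plug-ins can each call init and cleanup without coordinating. This is only an illustration under assumptions: lib_global_init() and lib_global_cleanup() are hypothetical stand-ins for ares_library_init() and ares_library_cleanup(), and the wrapper does nothing about the documented concern that the real init may call into other thread-unsafe libraries.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_refs = 0;            /* number of outstanding safe_init() calls */
static int global_init_calls = 0;    /* counters so the behaviour can be observed */
static int global_cleanup_calls = 0;

/* Hypothetical stand-ins for the library's real global init/cleanup. */
static int lib_global_init(void)
{
    global_init_calls++;
    return 0;                        /* 0 means success in this sketch */
}

static void lib_global_cleanup(void)
{
    global_cleanup_calls++;
}

/* May be called from any thread, any number of times. Returns 0 on success. */
int safe_init(void)
{
    int rc = 0;

    pthread_mutex_lock(&init_lock);
    if (init_refs == 0)
        rc = lib_global_init();      /* only the first caller does real work */
    if (rc == 0)
        init_refs++;
    pthread_mutex_unlock(&init_lock);
    return rc;
}

/* Must be paired with a successful safe_init(). */
void safe_cleanup(void)
{
    pthread_mutex_lock(&init_lock);
    assert(init_refs > 0);
    if (--init_refs == 0)
        lib_global_cleanup();        /* only the last caller tears down */
    pthread_mutex_unlock(&init_lock);
}
```

Each plug-in would call safe_init() when loaded and safe_cleanup() when unloaded; the underlying global routines then run once for the first user and once for the last.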

Thank you,

Vitaly
Vitaly Kruglikov | 13 Jul 2010 21:26

Re: How to cancel a single gethostbyname request without canceling any others?

>>>>
Message: 3
Date: Mon, 12 Jul 2010 18:10:56 +0200 (CEST)
From: Daniel Stenberg <daniel@...>
To: c-ares hacking <c-ares@...>
Subject: Re: How to cancel a single gethostbyname request without
        canceling       any others?
Message-ID: <alpine.DEB.2.00.1007121809160.6011@...>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Wed, 7 Jul 2010, Vitaly Kruglikov wrote:

> My program starts multiple ares_gethostbyname() requests, but then needs to
> cancel a specific one (because the corresponding transaction was canceled by
> user), while letting other requests continue to be processed on the same
> channel.  Is this possible without resorting to multiple channels?  I like
> the efficiency of being able to share the same channel.  I didn't find it in
> the online doc, but would like to know if such a feature may be in the
> works?

No, I can't see any way to do this. I consider it a shortcoming, but it's not
easily fixed without changing the API for ares_gethostbyname() and adding a
new function similar to ares_cancel() but for a specific query.

--

  / daniel.haxx.se
<<<<

Is this something that you plan to tackle, and if so, when?

Thank you,

Vitaly
Daniel Stenberg | 13 Jul 2010 22:52

Re: How to cancel a single gethostbyname request without canceling any others?

On Tue, 13 Jul 2010, Vitaly Kruglikov wrote:

> No, I can't see any way to do this. I consider it a shortcoming, but it's not
> easily fixed without changing the API for ares_gethostbyname() and adding a
> new function similar to ares_cancel() but for a specific query.
>
> Is this something that you plan to tackle, and if so, when?

The idea isn't really new:

 	http://c-ares.haxx.se/mail/c-ares-archive-2006-02/0007.shtml

My situation in this project has been "half-baked" for quite some time:

 	http://c-ares.haxx.se/mail/c-ares-archive-2006-02/0006.shtml

... so no, I would love to see this problem addressed but I'm most probably 
not going to work on it myself.

I'm just the maintainer and some kind of leader here, I certainly do not write 
c-ares myself and I never actually did most of the implementation work. Feel 
free to step up and bring on your ambition and energy. We need all the help we 
can get!

-- 

  / daniel.haxx.se

Daniel Stenberg | 13 Jul 2010 22:58

Re: Huge problem with ares_library_init() and ares_library_cleanup()

On Tue, 13 Jul 2010, Vitaly Kruglikov wrote:

> With respect to "These functions are declared non-thread-safe as this is very 
> common and typical of initialization/cleanup mechanisms. We can see this in 
> lots of other libs.":
>
> It's true that several mainstream libs do this, including openssl and 
> libcurl, but this unfortunately makes those, otherwise very useful libs, 
> either unusable or not safely-usable in multi-threaded apps with plug-ins, 
> or forces implementations to break down their abstraction mechanisms (thus 
> forcing the problem to be propagated by users).  This is one of those 
> problems that once done by a shared module, it ends up forcing a propagation 
> of that problem by all outer layers. I am told that there is already a good 
> discussion going on about this type of problem on the openssl forum, as 
> engineers have realized the limitations of this practice.

Yes. libcurl, for example, does it only because its underlying libs do it, such 
as the aforementioned OpenSSL...

> Before investing effort in a patch, I'd like to explore our options a little 
> bit.  A mechanism that readily comes to mind is to make use of the atomic 
> library constructor/destructor to initialize the library's dependencies. 
> However, the c-ares doc states that doing so on Windows may cause a 
> deadlock.

What "atomic library constructor/destructor" are you talking about here?

> Is this a common problem for Windows and Linux/Unix or just Windows?

Right now the only global init that is going on in ares_library_init() is 
Windows-specific...

> What is the specific logic that would cause a deadlock on Windows if 
> executed from a dll's "constructor"?

I know next to nothing about Windows, so I can't answer that!

-- 

  / daniel.haxx.se

William Ahern | 13 Jul 2010 23:53

Re: How to cancel a single gethostbyname request without canceling any others?

On Tue, Jul 13, 2010 at 10:52:38PM +0200, Daniel Stenberg wrote:
<snip>
> I'm just the maintainer and some kind of leader here, I certainly do not 
> write c-ares myself and I never actually did most of the implementation 
> work. Feel free to step up and bring on your ambition and energy. We need 
> all the help we can get!

IMHO the whole channel thing causes more headaches than it's worth. It
obscures how each request is actually processed, and adding layer on top of
layer of interfaces to return some control back to the caller just makes for
more obfuscation. It was all premature optimization, IMO. ADNS made the
same mistake. I'm not even sure it made sense back in the day when c-ares
and ADNS were originally written, though socket creation has gotten orders
of magnitude more performant in the intervening 10 or 15 years.

That's why in my DNS library there's a single socket object per resolver
object, and a single outstanding query per resolver. The caller owns that
resolver exclusively. The caller can of course cache and reuse the resolver,
but that's entirely up to the application. Usually it's far easier to
create, destroy, forget. (Or if caching the resolver, resetting its state.)
Configuration objects have their own lifetime and are immutable after
creation, so they can be cached and shared between resolvers. If you need
to do a million queries a second, all the underlying routines for socket I/O
and packet parsing are maintained as part of the API.

Also, the trend for DNS best-practice has been to randomize the source port
on a per-request basis. In that context, you lose nothing and gain a
significant amount of simplicity by keeping queries one-to-one with resolver
objects.

As for cancelling a c-ares query, the answer is simple: use one channel per
request. A typical libc getaddrinfo() implementation will call res_init()
every query anyhow, which will stat(2) /etc/resolv.conf. An unmodified
ares_init() might do superfluous work, but it's still a win over
getaddrinfo() and friends because it's async.

