Leo L. Schwab | 1 Aug 2005 04:10

Re: Toward runtime power management in Linux

On Sat, Jul 30, 2005 at 10:36:56PM -0400, Alan Stern wrote:
> An example will make this clearer.  A PCI bridge is a parent, with a
> PCI device as its child.  The set of device states for both the parent and 
> the child is {D0, D1, D2, D3}.  (Maybe some variants of D3 for special
> situations; let's not worry about the details.)  The link states will
> also be D0 - D3.  When the child wants to go from D0 to D3, it first
>                                                             ^^^^^^^^
> changes the device's actual state and then notifies the parent about
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> the link change.
> ^^^^^^^^^^^^^^^

	Strong disagreement.  Power state changes must be allowed to fail
("Spin up the 15K RPM drive?  I'm sorry, there's only 3 Watts of power left
to spare").  So you must first ask the parent for a power state change
before you perform your own so it has the opportunity to deny the request.
Besides, in the case of USB, you may not have any power at all until you
notify the parent bus/hub manager to wake up.

> These notifications are one-way, child-to-parent only.  We don't need
> pre- and post-notifications; each message will inform the parent of a
> single link-state change, which the parent will then carry out.

	I don't see how this will work.  Bringing up power/resuming must
happen in parent-to-child order, otherwise endpoint devices may not have any
power at all when you try to bring them up.  Cutting off power/suspending
must happen in child-to-parent order, since parents can't know when it's
safe to cut off power until the child is completely quiesced.
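
The ordering constraint described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct node`, `resume_tree()` and `suspend_tree()` are hypothetical names, not kernel APIs, and a real implementation would also need locking and error handling.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical device node; not the kernel's struct device. */
struct node {
	int id;
	int powered;
	struct node *child[MAX_CHILDREN];
	size_t nchildren;
};

/* Order in which nodes changed state, recorded for inspection. */
static int order_log[MAX_LOG];
static size_t order_len;

static void record(int id)
{
	if (order_len < MAX_LOG)
		order_log[order_len++] = id;
}

/* Resume: power the parent first, then its children (pre-order). */
static void resume_tree(struct node *n)
{
	size_t i;

	n->powered = 1;
	record(n->id);
	for (i = 0; i < n->nchildren; i++)
		resume_tree(n->child[i]);
}

/* Suspend: quiesce all children first, then the parent (post-order). */
static void suspend_tree(struct node *n)
{
	size_t i;

	for (i = 0; i < n->nchildren; i++)
		suspend_tree(n->child[i]);
	n->powered = 0;
	record(n->id);
}
```

With a bridge (id 1) and two devices below it (ids 2 and 3), resuming visits 1, 2, 3 and suspending visits 2, 3, 1, so no child is ever touched while its parent is unpowered.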

> Idle-timeout RTPM: We certainly should have an API whereby userspace

Shaohua Li | 1 Aug 2005 06:48

[PATCH] Trivial patch for swsusp

A trivial patch for swsusp.

Signed-off-by: Shaohua Li<shaohua.li <at> intel.com>
---

 linux-2.6.13-rc4-root/kernel/power/disk.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletion(-)

diff -puN kernel/power/disk.c~swsusp-trival kernel/power/disk.c
--- linux-2.6.13-rc4/kernel/power/disk.c~swsusp-trival	2005-08-01 11:02:47.706757448 +0800
+++ linux-2.6.13-rc4-root/kernel/power/disk.c	2005-08-01 11:11:13.363885880 +0800
@@ -233,9 +233,12 @@ static int software_resume(void)
 {
 	int error;

+	down(&pm_sem);
 	if (!swsusp_resume_device) {
-		if (!strlen(resume_file))
+		if (!strlen(resume_file)) {
+			up(&pm_sem);
 			return -ENOENT;
+		}
 		swsusp_resume_device = name_to_dev_t(resume_file);
 		pr_debug("swsusp: Resume From Partition %s\n", resume_file);
 	} else {
@@ -248,6 +251,7 @@ static int software_resume(void)
 		 * FIXME: If noresume is specified, we need to find the partition
 		 * and reset it back to normal swap space.
 		 */
+		up(&pm_sem);

Pavel Machek | 1 Aug 2005 09:07

Re: [PATCH] Trivial patch for swsusp

Hi!

> A trivial patch for swsusp.

Aha, it is trying to protect swsusp_resume_device from two users
banging on it from userspace at the same time. OK, although the changelog
could be better. ACK. Will you push it or do you want me to?

								Pavel

> Signed-off-by: Shaohua Li<shaohua.li <at> intel.com>
> ---
> 
>  linux-2.6.13-rc4-root/kernel/power/disk.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff -puN kernel/power/disk.c~swsusp-trival kernel/power/disk.c
> --- linux-2.6.13-rc4/kernel/power/disk.c~swsusp-trival	2005-08-01 11:02:47.706757448 +0800
> +++ linux-2.6.13-rc4-root/kernel/power/disk.c	2005-08-01 11:11:13.363885880 +0800
> @@ -233,9 +233,12 @@ static int software_resume(void)
>  {
>  	int error;
>  
> +	down(&pm_sem);
>  	if (!swsusp_resume_device) {
> -		if (!strlen(resume_file))
> +		if (!strlen(resume_file)) {
> +			up(&pm_sem);
>  			return -ENOENT;
> +		}

Amit Kucheria | 1 Aug 2005 13:44

Re: Toward runtime power management in Linux

On Sun, 2005-07-31 at 19:10 -0700, ext Leo L. Schwab wrote:

<snip>

> 
> > Idle-timeout RTPM: We certainly should have an API whereby userspace
> > can inform the kernel of an idle-timeout value to use for
> > autosuspending.  (In principle there could be multiple timeout values,
> > for successively deeper levels of power saving.)  This cries out to be
> > managed in a centralized way rather than letting each driver have its
> > own API.  It's not so clear what the most efficient implementation
> > will be.  Should every device have its own idle-timeout kernel timer?
> > (That's a lot of kernel timers.)

Per-device idle-timeouts are IMHO important in embedded space where one
wants finer-grained control. e.g. Sometimes it might be better to just
let a device be active to avoid wake-up latencies from power-save mode.
Also, some application usage profiles might dictate changes in
idle-timeout values to allow optimisation for the common use-cases.

> 	Whether you do it in user space or kernel space, you're going to
> potentially schedule a lot of timers.
> > Or should the RTPM kernel thread
> > wake up every second to scan a list of devices that may have exceeded
> > their idle timeouts?
> >
> 	This could potentially make performance-conscious apps "hiccup"
> once every second as this thread goes walking the list looking for
> candidates to shut off.  Try to avoid this; if nothing is happening, nothing
> should be running.

Alan Stern | 1 Aug 2005 16:07

Re: Toward runtime power management in Linux

Evidently between us there is a tremendous communication gap.  No doubt 
it's mostly my fault for not making the original document sufficiently 
detailed.  Let me try to clear things up...

On Sun, 31 Jul 2005, Leo L. Schwab wrote:

> On Sat, Jul 30, 2005 at 10:36:56PM -0400, Alan Stern wrote:
> > An example will make this clearer.  A PCI bridge is a parent, with a
> > PCI device as its child.  The set of device states for both the parent and 
> > the child is {D0, D1, D2, D3}.  (Maybe some variants of D3 for special
> > situations; let's not worry about the details.)  The link states will
> > also be D0 - D3.  When the child wants to go from D0 to D3, it first
> >                                                             ^^^^^^^^
> > changes the device's actual state and then notifies the parent about
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > the link change.
> > ^^^^^^^^^^^^^^^
> 
> 	Strong disagreement.  Power state changes must be allowed to fail
> ("Spin up the 15K RPM drive?  I'm sorry, there's only 3 Watts of power left
> to spare").  So you must first ask the parent for a power state change
> before you perform your own so it has the opportunity to deny the request.

Yes, power state changes must be allowed to fail.  I omitted discussing
error handling, but clearly the parent notification must be able to return
an error code, which might force the child to abort a state change.
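
A minimal sketch of a notification that can fail, assuming a hypothetical parent that tracks a fixed power budget in milliwatts. The names `struct parent`, `struct child`, `parent_notify()` and `child_set_draw()` are invented for illustration, and the choice of `-EBUSY` is arbitrary. The sketch shows only the error propagation; it does not settle whether the child should change its own state before or after notifying the parent.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parent with a fixed power budget, in milliwatts. */
struct parent {
	int budget_mw;
	int used_mw;
};

/* Hypothetical child drawing power through that parent. */
struct child {
	struct parent *parent;
	int draw_mw;
};

/*
 * Notify the parent that a child's draw changes from old_mw to new_mw.
 * Returns 0 on success, or -EBUSY if the budget would be exceeded.
 */
static int parent_notify(struct parent *p, int old_mw, int new_mw)
{
	int used = p->used_mw - old_mw + new_mw;

	if (used > p->budget_mw)
		return -EBUSY;
	p->used_mw = used;
	return 0;
}

/* Change the child's draw; on refusal, abort and keep the old state. */
static int child_set_draw(struct child *c, int new_mw)
{
	int err = parent_notify(c->parent, c->draw_mw, new_mw);

	if (err)
		return err;	/* state change aborted, nothing modified */
	c->draw_mw = new_mw;
	return 0;
}
```

In this model a request that lowers the draw can never exceed the budget, so only requests for more power can be refused.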

However, you have badly misunderstood this example.  In the example the
device goes from D0 to D3, thereby _reducing_ its power consumption.  
Hence it doesn't want to ask the parent to reduce the available power

Alan Stern | 1 Aug 2005 16:16

Re: Toward runtime power management in Linux

On Mon, 1 Aug 2005, Amit Kucheria wrote:

> Per-device idle-timeouts are IMHO important in embedded space where one
> wants finer-grained control. e.g. Sometimes it might be better to just
> let a device be active to avoid wake-up latencies from power-save mode.
> Also, some application usage profiles might dictate changes in
> idle-timeout values to allow optimisation for the common use-cases.

There's no question about having per-device idle timeouts, or about
letting userspace set the timeout values (or make the value be infinite so
there is no idle timeout ever).  The question is how to implement them:
using kernel timers or using a kernel thread?  Note that having some sort
of thread available will be necessary in any case, since the actual
suspend/resume function calls require a process context.

> Whether the idle-timeout is implemented using kernel timers or a kernel
> thread checking for timeout values, is a more difficult question.
> Ideally, we would want to avoid unnecessary (processor) wakeups simply
> to check a list of timeout values. The dynamic tick patch allows us to
> wake up the processor only when there is work to do - timers set to go
> off. So, that's one strike against the kernel thread approach. 
> 
> But it needs more thinking.

Yes, it's complex.  Constantly updating the timer expirations every time 
a device carries out an activity can also be expensive.
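
One possible middle ground between the two costs mentioned here can be sketched as follows. It is an assumption for illustration, not something proposed in this thread: each device only records a timestamp on activity, and a single scanner reports the earliest pending expiry so its caller can sleep until then instead of waking every second. `struct idle_dev`, `idle_dev_touch()` and `idle_scan()` are hypothetical names, and tick wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define NO_TIMEOUT 0UL	/* hypothetical: 0 means "never autosuspend" */

struct idle_dev {
	unsigned long last_activity;	/* time of last I/O, in ticks */
	unsigned long timeout;		/* idle timeout in ticks, or NO_TIMEOUT */
	int suspended;
};

/* Called on every I/O: a single store, no timer is re-armed. */
static void idle_dev_touch(struct idle_dev *d, unsigned long now)
{
	d->last_activity = now;
}

/*
 * Mark every device whose timeout has expired at time `now` as suspended.
 * Returns the tick of the earliest pending expiry, or 0 if no device is
 * waiting.  Tick wraparound is not handled in this sketch.
 */
static unsigned long idle_scan(struct idle_dev *devs, size_t n,
			       unsigned long now)
{
	unsigned long next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct idle_dev *d = &devs[i];
		unsigned long expiry;

		if (d->suspended || d->timeout == NO_TIMEOUT)
			continue;
		expiry = d->last_activity + d->timeout;
		if (expiry <= now) {
			d->suspended = 1;
			continue;
		}
		if (next == 0 || expiry < next)
			next = expiry;
	}
	return next;
}
```

The trade-off is that the scanner may wake up and find that a device was touched in the meantime, in which case it simply computes a later expiry and sleeps again.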

> If I tell a bus to power down, couldn't the PM framework take care of
> recursively sending 'power down' message to all children, wait for
> confirmation, and then power itself down?

Jordan Crouse | 1 Aug 2005 17:10

Re: Toward runtime power management in Linux

> > 	This could potentially make performance-conscious apps "hiccup"
> > once every second as this thread goes walking the list looking for
> > candidates to shut off.  Try to avoid this; if nothing is happening, nothing
> > should be running.
> 
> I don't understand this comment at all.  Lots of things happen 
> periodically in the kernel: threads wake up, timers go off...  Are you 
> suggesting that, for example, the page-flush thread shouldn't wake up 
> from time to time either?

While I don't agree that it will be a horrible drain on performance, I do
see a large potential for abuse with a big kernel thread.  Things like the
page-flush thread are well known and (hopefully) optimized entities -
the RTPM thread will have to depend on hundreds of driver writers to be kind
to not suck time and resources from the system.  About the time that somebody
puts a large udelay into their AC97 driver to turn off the DAC, then I'm sure
we will question our motives in this regard.

That said, I think I tend to favor the big kernel thread, or at least timeout
threads on a bus level.  The single entity handling the idle math timeout
would facilitate future issues such as priority in handling idle timeouts 
(do we address certain buses/devices before others, for example), plus it
would help centralize the functionality, and make it easier to control with
any future power management policy concepts.

Jordan

-- 
Jordan Crouse
Senior Linux Engineer

Alan Stern | 1 Aug 2005 17:23

Re: Re: Toward runtime power management in Linux

On Mon, 1 Aug 2005, Jordan Crouse wrote:

> While I don't agree that it will be a horrible drain on performance, I do
> see a large potential for abuse with a big kernel thread.  Things like the
> page-flush thread are well known and (hopefully) optimized entities -
> the RTPM thread will have to depend on hundreds of driver writers to be kind
> to not suck time and resources from the system.  About the time that somebody
> puts a large udelay into their AC97 driver to turn off the DAC, then I'm sure
> we will question our motives in this regard.

Hmmm...  A large delay in a suspend pathway will cause problems no matter 
how it gets invoked, right?  If we have a separate kernel thread for RTPM, 
then at least the only things affected will be other RTPM activities.  
Whereas if we use keventd instead to provide a process context, lots of 
other things would be affected as well.

> That said, I think I tend to favor the big kernel thread, or at least timeout
> threads on a bus level.  The single entity handling the idle math timeout
> would facilitate future issues such as priority in handling idle timeouts 
> (do we address certain buses/devices before others, for example), plus it
> would help centralize the functionality, and make it easier to control with
> any future power management policy concepts.

Those are good ideas.

Alan Stern

On Mon, 1 Aug 2005, Jordan Crouse wrote:

Alan Stern | 2 Aug 2005 16:35

Re: Toward runtime power management in Linux

On Mon, 1 Aug 2005, Patrick Mochel wrote:

> > I agree that a parent may have to cope with situations where two children
> > are trying to change state at the same time.  struct device->semaphore
> > should help there.  This doesn't affect what I wrote, however.  Link-state
> > changes don't involve races, because a link state describes the
> > connection between one specific parent and one specific child.  If two
> > different links change state at the same time, that's not a race.
> 
> For two children changing state at the same time, it is not a race. But,
> there could be potentially racy conditions when notifying the parent.
> Maybe. As I think about it more, I'm not sure it's possible to get mixed
> up, since there should always be _some_ delay between a parent receiving a
> notification that a child has suspended and the parent actually suspending
> itself.

This is why we want the driver to lock the parent before notifying it.  
And since the driver will already be holding the child's lock, this is a 
case where we need to acquire locks in the wrong order (going up the 
tree).

> > Me neither.  In fact, I would go so far as to say that this is the main
> > impediment to RTPM implementations at the moment.  If I knew the answer to
> > the locking-order problem, I could fix up the USB RTPM code right now.
> 
> What exactly is the impediment? The locking constraints?

Yes.  In a separate message you worried about what happens when there are 
multiple parents.  I don't see that being a problem here, because for 
power-state notifications we will only need to lock one parent at a time:

Jesse Brandeburg | 2 Aug 2005 18:41

Re: [patch 2.6.13-rc3] pci: restore BAR values after D3hot->D0 for devices that need it

On 7/27/05, John W. Linville <linville <at> tuxdriver.com> wrote:
> Some PCI devices (e.g. 3c905B, 3c556B) lose all configuration
> (including BARs) when transitioning from D3hot->D0.  This leaves such
> a device in an inaccessible state.  The patch below causes the BARs
> to be restored when enabling such a device, so that its driver will
> be able to access it.
> 

Is it just me or will this stuff help the kexec guys as well?  They seem 
to have lots of problems because drivers put the device into D3 before the 
reload of the new kernel.  I think this might help.

Jesse

