Aneesh Kumar K.V | 19 Jun 2013 08:44

[PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

When we collapse normal pages into a hugepage, we first clear the pmd, then invalidate all
the PTE entries. The assumption here is that any low-level page fault will see the pmd as
none and take the slow path that waits on mmap_sem. But we could very well be in
a hash_page with a local ptep pointer value. Such a hash_page can result in adding new
HPTE entries for normal subpages/small pages. That means we could be modifying the
page content as we copy it to a huge page. Fix this by waiting for hash_page to finish
after marking the pmd none and before invalidating HPTE entries. We use the heavy
kick_all_cpus_sync(). This should be OK as we do this in the background khugepaged
thread and not in application context, but we do block page fault handling for that time.
Also, if we find collapse slow, we can increase the scan rate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index bbecac4..4bb44c3 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
 		pmd = *pmdp;
 		pmd_clear(pmdp);
 		/*
+		 * Wait for all pending hash_page to finish
+		 * We can do this by waiting for a context switch to happen on
+		 * the cpus. Any new hash_page after this will see pmd none
+		 * and fallback to code that takes mmap_sem and hence will block

Aneesh Kumar K.V | 19 Jun 2013 08:44

[PATCH] powerpc/kvm: Handle transparent hugepage in KVM

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We can find ptes that are splitting while walking page tables. Return
a none pte in that case.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 51 ++++++++++++++++++--------------
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  7 +++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  4 +--
 3 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9c1ff33..ce20f7e 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -162,33 +162,40 @@ static inline int hpte_cache_flags_ok(unsigned long ptel, unsigned long io_type)
  * Lock and read a linux PTE.  If it's present and writable, atomically
  * set dirty and referenced bits and return the PTE, otherwise return 0.
  */
-static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *ptep, int writing,
+						 unsigned int hugepage)
 {
-	pte_t pte, tmp;
-
-	/* wait until _PAGE_BUSY is clear then set it atomically */
-	__asm__ __volatile__ (
-		"1:	ldarx	%0,0,%3\n"
-		"	andi.	%1,%0,%4\n"

Aneesh Kumar K.V | 19 Jun 2013 08:34

[PATCH -V2] powerpc: Fix bad pmd error with book3E config

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Book3E uses the hugepd at the PMD level and doesn't encode the pte directly
at the pmd level. So it will find the lower bits of the pmd set
and the pmd_bad check throws an error. In fact, the current code
will never make the free_hugepd_range call at all, because it will
clear the pmd if it finds a hugepd pointer. Fix this by clearing a
bad pmd only if it is not a hugepd pointer.
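
The control flow described above can be illustrated with a minimal user-space
sketch. This is not the kernel code: the toy_* names, the marker bit and the
"low bits set means bad" test are assumptions made up for illustration, and the
real is_hugepd(), pmd_bad() and hugepd encoding differ. It only shows why
testing for a hugepd pointer first lets the free path be reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy encoding, NOT the real Book3E layout. */
typedef uint64_t toy_pmd_t;
#define TOY_HUGEPD_FLAG 0x1u   /* assumed marker for a hugepd pointer */
#define TOY_LOW_BITS    0xfffu /* assumed: any low bit set looks "bad" */

static bool toy_is_hugepd(toy_pmd_t pmd)
{
	return (pmd & TOY_HUGEPD_FLAG) != 0;
}

static bool toy_pmd_bad(toy_pmd_t pmd)
{
	return (pmd & TOY_LOW_BITS) != 0;
}

/* Returns true if the entry is empty, or was bad and has been cleared. */
static bool toy_none_or_clear_bad(toy_pmd_t *pmd)
{
	if (*pmd == 0)
		return true;
	if (toy_pmd_bad(*pmd)) {
		*pmd = 0;
		return true;
	}
	return false;
}

/* Old order: the bad check runs first, so a hugepd entry is wiped. */
static bool toy_old_walk(toy_pmd_t *pmd)
{
	if (toy_none_or_clear_bad(pmd))
		return false; /* skipped, free path never reached */
	return true;
}

/* New order: only non-hugepd entries go through the bad check. */
static bool toy_new_walk(toy_pmd_t *pmd)
{
	if (!toy_is_hugepd(*pmd)) {
		toy_none_or_clear_bad(pmd);
		return false;
	}
	return true; /* hugepd entry would be handed to the free path */
}

/* Small self-check showing the difference on one hugepd-style entry. */
static void toy_selfcheck(void)
{
	toy_pmd_t a = 0x1001, b = 0x1001;

	assert(!toy_old_walk(&a) && a == 0);     /* wiped, never freed */
	assert(toy_new_walk(&b) && b == 0x1001); /* kept, free path reached */
}
```

In this toy model, a genuinely bad non-hugepd entry is still cleared by
toy_new_walk(), which matches the intent stated above.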

This is a regression introduced by e2b3d202d1dba8f3546ed28224ce485bc50010be
"powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"

Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f2f01fd..5555778 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -536,8 +536,14 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
 	do {
 		pmd = pmd_offset(pud, addr);
 		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
+		if (!is_hugepd(pmd)) {
+			/*
+			 * if it is not hugepd pointer, we should already find

Gavin Shan | 19 Jun 2013 08:18

Re: [PATCH 01/31] powerpc/eeh: Move common part to kernel directory

On Wed, Jun 19, 2013 at 02:11:53PM +0800, Gavin Shan wrote:
>On Wed, Jun 19, 2013 at 01:58:06PM +1000, Michael Neuling wrote:
>>Bunch of whitespace issues here:
>>
>>% git am ~/Mail/linuxppc/31202
>>Applying: powerpc/eeh: Move common part to kernel directory
>>/home/mikey/src/powerpc-test/.git/rebase-apply/patch:437: trailing whitespace.
>>
>>/home/mikey/src/powerpc-test/.git/rebase-apply/patch:594: space before tab in indent.
>>  	 */
>>/home/mikey/src/powerpc-test/.git/rebase-apply/patch:607: trailing whitespace.
>>	
>>/home/mikey/src/powerpc-test/.git/rebase-apply/patch:608: trailing whitespace.
>>	/* We might get hit with another EEH freeze as soon as the 
>>/home/mikey/src/powerpc-test/.git/rebase-apply/patch:673: trailing whitespace.
>>	
>>error: patch failed: arch/powerpc/platforms/pseries/eeh_pe.c:1
>>error: arch/powerpc/platforms/pseries/eeh_pe.c: patch does not apply
>>Patch failed at 0001 powerpc/eeh: Move common part to kernel directory
>>When you have resolved this problem run "git am --resolved".
>>If you would prefer to skip this patch, instead run "git am --skip".
>>To restore the original branch and stop patching run "git am --abort".
>>
>
>Sorry for the inconvenience, Mikey. Please apply the updated [01/31] and [02/31] in
>the attachment. The remaining patches, except [17/31], are unchanged. The updated [17/31]
>will be in the attachment of the corresponding thread (the original message
>about [17/31]). Something like:
>


Scott Wood | 18 Jun 2013 22:14

Pull request: scottwood/linux.git for-3.10

This fixes a regression that causes 83xx to oops on boot if a
non-express PCI bus is present.

The following changes since commit 17858ca65eef148d335ffd4cfc09228a1c1cbfb5:

  Merge tag 'please-pull-fixia64' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
(2013-06-18 06:29:19 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git for-3.10

for you to fetch changes up to 2383ea94854bcf5a0df3c6803b980868cef95418:

  powerpc/pci: Fix setup of Freescale PCI / PCIe controllers (2013-06-18 14:44:57 -0500)

----------------------------------------------------------------
Rojhalat Ibrahim (1):
      powerpc/pci: Fix setup of Freescale PCI / PCIe controllers

 arch/powerpc/sysdev/fsl_pci.c |   24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

Gavin Shan | 18 Jun 2013 10:33

[PATCH v5 00/31] EEH Support for PowerNV platform

The series is based on 3.10-rc1 and does not intend to enable EEH
functionality for PHB3 for now. PHB3 EEH support on the PowerNV platform is
left for the future.

The series intends to support EEH on the PowerNV platform. The EEH
core already supports multiple probe methods: device tree nodes and PCI
devices. For EEH on PowerNV, we use PCI devices for the EEH probe, which
differs from the probe type used on the pSeries platform. Another point I
should mention is that EEH overall is split into 3 layers: EEH
core, platform layer and I/O chip layer. This gives EEH on the PowerNV
platform more flexibility and lets it support more I/O chips in future.
Besides, an EEH event can be produced by detecting 0xFF's when reading
PCI config or I/O registers, or by interrupts dedicated to EEH error
reporting, so we have to handle the EEH error interrupts. On the other hand,
the EEH events are processed by the EEH core, as on the pSeries platform.

We export debugfs entries ("/sys/kernel/debug/powerpc/PCIxxxx/err_injct"),
which allow you to control the 0xD10 register in order to force errors like
frozen PE and fenced PHB for testing purposes. The following example is usually
what I use to control that register. The patchset has been verified on a
Firebird-L machine where I have 2 Emulex ethernet cards on PHB#0. I keep pinging
one of the ethernet cards (eth0) from outside and then use the following commands
to produce frozen PE or fenced PHB errors. Eventually, the errors are recovered
and the ethernet card is reachable again after a temporary loss of connection.

Trigger frozen PE:

	echo 0x0000000002000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct
	sleep 1
	echo 0x0 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Paul Gortmaker | 17 Jun 2013 22:10

[PATCH] powerpc: delete __cpuinit usage from all users

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the powerpc uses of the __cpuinit macros.

[1] https://lkml.org/lkml/2013/5/20/589

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[This was generated against today's linux-next tree ; I'm assuming all
 pending powerpc changes are in there currently.]

 arch/powerpc/include/asm/rtas.h        |  4 ++--
 arch/powerpc/include/asm/vdso.h        |  2 +-
 arch/powerpc/kernel/cacheinfo.c        | 19 +++++++++++--------
 arch/powerpc/kernel/rtas.c             |  4 ++--
 arch/powerpc/kernel/smp.c              |  2 +-
 arch/powerpc/kernel/sysfs.c            |  6 +++---
 arch/powerpc/kernel/time.c             |  1 -
 arch/powerpc/kernel/vdso.c             |  2 +-
 arch/powerpc/mm/44x_mmu.c              |  6 +++---

Benjamin Herrenschmidt | 16 Jun 2013 10:37

Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH

On Sun, 2013-06-16 at 15:27 +0800, Gavin Shan wrote:

> Thanks for the review, Ben.
> 
> >Getting better.... but:
> >
> > - I still don't like having a kthread for that. Why not use schedule_work() ?
> >
> 
> Ok. Will update it with schedule_work() in next revision :-)
> 
> > - We already have an EEH thread, why not just use it ? IE send it a special
> >type of message that makes it query the backend for error info instead ?
> >
> 
> Ok. I'll try to do as you suggested in next revision. Something like:
> 
> 	- Interrupt comes in
> 	- OPAL notifier callback
> 	- Mark all PHB and its subordinate PEs "isolated" since we don't know
> 	  which PHB/PE has problems (Note: we still need eeh_serialize_lock())

No, don't mark anything. It wouldn't be good to start marking "isolated"
things that aren't. It doesn't matter if we don't "know" they are
isolated just yet. Just "poke" the EEH thread with a different type of
message from the current one.

> 	- Create an EEH event without binding PE to EEH core.
> 	- EEH core starts new kthread and calls to next_error() backend
> 	  and handle the EEH errors accordingly.

Gavin Shan | 15 Jun 2013 11:02

[PATCH v4 00/27] EEH Support for PowerNV platform

The series is based on 3.10-rc1 and does not intend to enable EEH
functionality for PHB3 for now. PHB3 EEH support on the PowerNV platform is
left for the future.

The series intends to support EEH on the PowerNV platform. The EEH
core already supports multiple probe methods: device tree nodes and PCI
devices. For EEH on PowerNV, we use PCI devices for the EEH probe, which
differs from the probe type used on the pSeries platform. Another point I
should mention is that EEH overall is split into 3 layers: EEH
core, platform layer and I/O chip layer. This gives EEH on the PowerNV
platform more flexibility and lets it support more I/O chips in future.
Besides, an EEH event can be produced by detecting 0xFF's when reading
PCI config or I/O registers, or by interrupts dedicated to EEH error
reporting, so we have to handle the EEH error interrupts. On the other hand,
the EEH events are processed by the EEH core, as on the pSeries platform.

We export debugfs entries ("/sys/kernel/debug/powerpc/PCIxxxx/err_injct"),
which allow you to control the 0xD10 register in order to force errors like
frozen PE and fenced PHB for testing purposes. The following example is usually
what I use to control that register. The patchset has been verified on a
Firebird-L machine where I have 2 Emulex ethernet cards on PHB#0. I keep pinging
one of the ethernet cards (eth0) from outside and then use the following commands
to produce frozen PE or fenced PHB errors. Eventually, the errors are recovered
and the ethernet card is reachable again after a temporary loss of connection.

Trigger frozen PE:

	echo 0x0000000002000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct
	sleep 1
	echo 0x0 > /sys/kernel/debug/powerpc/PCI0000/err_injct

Benjamin Herrenschmidt | 15 Jun 2013 04:13

[PATCH] powerpc: Fix missing/delayed calls to irq_work

When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This changes the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.
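
As a rough illustration of the before/after decision, here is a user-space
sketch rather than the kernel function. The dec_replay_* helpers are
hypothetical names, the PACA_IRQ_DEC value is assumed, and the overflow
argument stands in for the result of decrementer_check_overflow(). The 0x900
return value is the one used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; the kernel header is authoritative. */
#define PACA_IRQ_DEC    0x08u
/* Vector returned by the patched code for a decrementer replay. */
#define VEC_DECREMENTER 0x900u

/*
 * Old decision: replay only if the timer target was passed.
 * A latched decrementer interrupt on its own is ignored.
 */
static unsigned int dec_replay_old(unsigned int happened, bool overflow)
{
	(void)happened;
	return overflow ? VEC_DECREMENTER : 0;
}

/*
 * New decision: replay if a decrementer interrupt was latched while
 * soft-disabled, or if the timer target was passed.
 */
static unsigned int dec_replay_new(unsigned int happened, bool overflow)
{
	if ((happened & PACA_IRQ_DEC) || overflow)
		return VEC_DECREMENTER;
	return 0;
}

/* Self-check for the case the patch is about: latched DEC, no overflow. */
static void dec_replay_selfcheck(void)
{
	assert(dec_replay_old(PACA_IRQ_DEC, false) == 0);
	assert(dec_replay_new(PACA_IRQ_DEC, false) == VEC_DECREMENTER);
}
```

The case that differs is a latched decrementer interrupt with no overflow,
which is where a decrementer raised only for irq_work would previously have
been dropped.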

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5cbcf4d..ea185e0 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -162,7 +162,7 @@ notrace unsigned int __check_irq_replay(void)
 	 * in case we also had a rollover while hard disabled
 	 */
 	local_paca->irq_happened &= ~PACA_IRQ_DEC;
-	if (decrementer_check_overflow())
+	if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
 		return 0x900;

	/* Finally check if an external interrupt happened */

Sebastien Bessiere | 14 Jun 2013 17:57

[PATCH] trivial: powerpc: fix typo in ioei_interrupt() description

Signed-off-by: Sebastien Bessiere <sebastien.bessiere@gmail.com>
---
 arch/powerpc/platforms/pseries/io_event_irq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/io_event_irq.c b/arch/powerpc/platforms/pseries/io_event_irq.c
index ef9d9d8..5ea88d1 100644
--- a/arch/powerpc/platforms/pseries/io_event_irq.c
+++ b/arch/powerpc/platforms/pseries/io_event_irq.c
@@ -115,7 +115,7 @@ static struct pseries_io_event * ioei_find_event(struct rtas_error_log *elog)
  *   by scope or event type alone. For example, Torrent ISR route change
  *   event is reported with scope 0x00 (Not Applicatable) rather than
  *   0x3B (Torrent-hub). It is better to let the clients to identify
- *   who owns the the event.
+ *   who owns the event.
  */

 static irqreturn_t ioei_interrupt(int irq, void *dev_id)
--

-- 
1.7.9.5
