Matteo Bertozzi (JIRA | 3 Jul 02:46 2015

[jira] [Resolved] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely


     [ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi resolved HBASE-14016.
-------------------------------------
    Resolution: Duplicate

Sorry, closing as a duplicate of HBASE-14017
(we don't need a full lock).

> Procedure V2: NPE in a delete table follow by create table closely
> ------------------------------------------------------------------
>
>                 Key: HBASE-14016
>                 URL: https://issues.apache.org/jira/browse/HBASE-14016
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>
> In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table
> would leak the ZK lock due to an NPE in ProcedureFairRunQueues
> {noformat}
> Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
> 	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
> 	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)

Matteo Bertozzi (JIRA | 3 Jul 02:42 2015

[jira] [Created] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion

Matteo Bertozzi created HBASE-14017:
---------------------------------------

             Summary: Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
                 Key: HBASE-14017
                 URL: https://issues.apache.org/jira/browse/HBASE-14017
             Project: HBase
          Issue Type: Sub-task
          Components: proc-v2
    Affects Versions: 1.1.0.1, 2.0.0, 1.2.0
            Reporter: Matteo Bertozzi
            Assignee: Matteo Bertozzi
            Priority: Blocker
             Fix For: 2.0.0, 1.2.0, 1.1.2

[~syuanjiang] found a concurrency issue in the procedure queue delete: we don't hold an exclusive lock
before deleting the table queue.
{noformat}
Thread 1: create table is running; tryWrite() acquires the lock but has not yet set wlock=true
Thread 2: markTableAsDeleted() sees the queue empty and wlock=false
Thread 1: sets wlock=true, too late
Thread 2: deletes the queue
Thread 1: is never able to release the lock
{noformat}
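
As a rough illustration of the interleaving above, here is a minimal sketch; it is not the actual HBASE-14017 patch. SketchTableQueue and SketchQueueRegistry are hypothetical names, and guarding both paths with one monitor is an assumption made for simplicity (the real fix may use something narrower than a full lock). The point is only that "check empty and unlocked, then delete" must be atomic with respect to "acquire and set wlock".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-table run queue; not the HBase class.
class SketchTableQueue {
    private boolean wlock = false;
    private int pending = 0; // queued procedures; always 0 in this sketch

    boolean isLocked() { return wlock; }
    boolean isEmpty() { return pending == 0; }
    void setWriteLock(boolean locked) { wlock = locked; }
}

// Hypothetical registry of table queues. Every method synchronizes on the
// same monitor, so a delete can never slip in between acquiring the lock
// and recording wlock=true.
class SketchQueueRegistry {
    private final Map<String, SketchTableQueue> queues = new HashMap<>();

    // Acquire the table write lock; the flag is set before the monitor is released.
    synchronized boolean tryWrite(String table) {
        SketchTableQueue q = queues.computeIfAbsent(table, t -> new SketchTableQueue());
        if (q.isLocked()) {
            return false;
        }
        q.setWriteLock(true);
        return true;
    }

    // Release the table write lock; tolerate a queue that no longer exists.
    synchronized void releaseWrite(String table) {
        SketchTableQueue q = queues.get(table);
        if (q != null) {
            q.setWriteLock(false);
        }
    }

    // Remove the queue only if it is empty and unlocked, checked atomically.
    synchronized boolean markTableAsDeleted(String table) {
        SketchTableQueue q = queues.get(table);
        if (q == null) {
            return true;
        }
        if (q.isEmpty() && !q.isLocked()) {
            queues.remove(table);
            return true;
        }
        return false;
    }
}
```

With this arrangement, markTableAsDeleted() refuses to drop a queue whose write lock is held, so the create table thread always finds its queue when it releases the lock.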

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Stephen Yuan Jiang (JIRA | 3 Jul 02:37 2015

[jira] [Created] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

Stephen Yuan Jiang created HBASE-14016:
------------------------------------------

             Summary: Procedure V2: NPE in a delete table follow by create table closely
                 Key: HBASE-14016
                 URL: https://issues.apache.org/jira/browse/HBASE-14016
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang

In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table
would leak the ZK lock due to an NPE in ProcedureFairRunQueues
{noformat}
Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
{noformat}

Here is the code that causes the race condition:
{code}
protected boolean markTableAsDeleted(final TableName table) {
    TableRunQueue queue = getRunQueue(table);
    if (queue != null) {
        ...
        if (queue.isEmpty() && !queue.isLocked()) {

stack (JIRA | 3 Jul 02:10 2015

[jira] [Created] (HBASE-14015) Allow setting a richer state value when toString a pv2

stack created HBASE-14015:
-----------------------------

             Summary: Allow setting a richer state value when toString a pv2
                 Key: HBASE-14015
                 URL: https://issues.apache.org/jira/browse/HBASE-14015
             Project: HBase
          Issue Type: Improvement
          Components: proc-v2
            Reporter: stack
            Assignee: stack
            Priority: Minor

While debugging, my procedure was loaded out of the store after a crash and its state was RUNNING. It would help to
know at which state of the StateMachineProcedure it was going to resume running.

Chatting with Matteo, he suggested allowing Procedures to customize the String.

Here is a patch that makes StateMachineProcedure print the base state (RUNNING, FINISHED)
followed by a ':' and then the StateMachineProcedure state, e.g. SimpleStateMachineProcedure state=RUNNABLE:SERVER_CRASH_ASSIGN
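
To make the idea concrete, here is a small sketch of how a subclass could extend the state portion of toString(). It is not the attached patch: SketchProcedure, SketchServerCrashProcedure, and the toStringState() hook are hypothetical names chosen for illustration, and the state names simply mirror the example string above.

```java
// Hypothetical base class; stands in for a procedure with a coarse base state.
abstract class SketchProcedure {
    enum BaseState { RUNNABLE, FINISHED }

    private BaseState state = BaseState.RUNNABLE;

    // Hook (assumed name) that subclasses override to append richer state.
    protected void toStringState(StringBuilder sb) {
        sb.append(state);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(getClass().getSimpleName());
        sb.append(" state=");
        toStringState(sb);
        return sb.toString();
    }
}

// Hypothetical state-machine procedure that appends its current step
// after the base state, separated by ':'.
class SketchServerCrashProcedure extends SketchProcedure {
    enum Step { SERVER_CRASH_START, SERVER_CRASH_ASSIGN }

    private Step step = Step.SERVER_CRASH_ASSIGN;

    @Override
    protected void toStringState(StringBuilder sb) {
        super.toStringState(sb);
        sb.append(':').append(step);
    }
}
```

Printing a new SketchServerCrashProcedure then yields a string ending in state=RUNNABLE:SERVER_CRASH_ASSIGN, which is the shape of output described above.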


stack (JIRA | 2 Jul 23:42 2015

[jira] [Resolved] (HBASE-14013) Retry when RegionServerNotYetRunningException rather than go ahead with assign so for sure we don't skip WAL replay


     [ https://issues.apache.org/jira/browse/HBASE-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-14013.
---------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed

Pushed to branch-1.1+

> Retry when RegionServerNotYetRunningException rather than go ahead with assign so for sure we don't skip WAL replay
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14013
>                 URL: https://issues.apache.org/jira/browse/HBASE-14013
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Region Assignment
>            Reporter: stack
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: hbase-13895_addendum3-branch-1.1.patch, hbase-13895_addendum3-branch-1.patch, hbase-13895_addendum3-master.patch
>
>
> Patches are copied from parent. They were done by [~enis]  +1 from. They continue the theme of the parent
> applying it to RegionServerNotYetRunningException as well as the new region aborting exception ..
> added in parent issue.

stack (JIRA | 2 Jul 23:27 2015

[jira] [Created] (HBASE-14013) Retry when RegionServerNotYetRunningException rather than go ahead with assign so for sure we don't skip WAL replay

stack created HBASE-14013:
-----------------------------

             Summary: Retry when RegionServerNotYetRunningException rather than go ahead with assign so for sure we don't skip WAL replay
                 Key: HBASE-14013
                 URL: https://issues.apache.org/jira/browse/HBASE-14013
             Project: HBase
          Issue Type: Sub-task
          Components: Region Assignment
            Reporter: stack
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0

Patches are copied from parent. They were done by [~enis]  +1 from. They continue the theme of the parent
applying it to RegionServerNotYetRunningException as well as the new region aborting exception ..
added in parent issue.


Lars Hofhansl (JIRA | 2 Jul 23:27 2015

[jira] [Created] (HBASE-14014) Explore row-by-row grouping options

Lars Hofhansl created HBASE-14014:
-------------------------------------

             Summary: Explore row-by-row grouping options
                 Key: HBASE-14014
                 URL: https://issues.apache.org/jira/browse/HBASE-14014
             Project: HBase
          Issue Type: Sub-task
            Reporter: Lars Hofhansl

See discussion in parent.

We need to consider the following attributes of WALKey:
* The cluster ids
* Table Name
* write time (here we could use the latest of any batch)
* seqNum

As long as we preserve these, we can rearrange the cells between WALEdits. Since seqNum is unique, this will be
a challenge. Currently it is not used, but we shouldn't design anything that prevents us from providing
better ordering guarantees using seqNum.


Vladimir Rodionov | 2 Jul 23:00 2015

Backup/Restore (HBASE-7912) design doc

Hi, folks

Kindly soliciting feedback on the latest design doc:

https://issues.apache.org/jira/browse/HBASE-7912

-Vlad

Sean Busbey (JIRA | 2 Jul 22:10 2015

[jira] [Resolved] (HBASE-13861) BucketCacheTmpl.jamon has wrong bucket free and used labels


     [ https://issues.apache.org/jira/browse/HBASE-13861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey resolved HBASE-13861.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.2
                   1.0.2

Pushed to branch-1.1 and branch-1.0.

{quote}
+1. We don't define our UI's as API.
{quote}

If this is the case, what does the "Web page APIs" bullet under the Operational Compatibility section refer to?

> BucketCacheTmpl.jamon has wrong bucket free and used labels
> -----------------------------------------------------------
>
>                 Key: HBASE-13861
>                 URL: https://issues.apache.org/jira/browse/HBASE-13861
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, UI
>    Affects Versions: 1.1.0
>            Reporter: Lars George
>            Assignee: Matt Warhaftig
>              Labels: beginner

stack (JIRA | 2 Jul 22:06 2015

[jira] [Created] (HBASE-14012) Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover

stack created HBASE-14012:
-----------------------------

             Summary: Double Assignment and Dataloss when ServerCrashProcedure runs during Master failover
                 Key: HBASE-14012
                 URL: https://issues.apache.org/jira/browse/HBASE-14012
             Project: HBase
          Issue Type: Bug
          Components: master, Region Assignment
    Affects Versions: 2.0.0, 1.2.0
            Reporter: stack
            Assignee: stack
            Priority: Critical

ITBLL. The Master comes up and joins a running cluster (all servers up except the Master, with most regions
assigned out on the cluster). The ProcedureStore has two unfinished ServerCrashProcedures (RUNNABLE state).
In SCP, we only check for failover in the first step, not in every step, which means a ServerCrashProcedure will
run if, on reload, it is already beyond the first step.
{code}
    // Is master fully online? If not, yield. No processing of servers unless master is up
    if (!services.getAssignmentManager().isFailoverCleanupDone()) {
      throwProcedureYieldException("Waiting on master failover to complete");
    }
{code}

There is no definitive logging, but it looks like we start running at the assign step. The regions to assign
were persisted before the master crash, and they may not make sense post-crash; i.e. here we
double-assign. Still checking. Regardless, we shouldn't run until the master is fully up.
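
To illustrate "check in every step", here is a minimal sketch under stated assumptions; it is not the SCP code. SketchCrashProcedure, SketchYieldException, and the step names are hypothetical, and the boolean parameter stands in for services.getAssignmentManager().isFailoverCleanupDone(). The gate sits at the top of the per-step method, so a procedure reloaded at a later step still yields until the master is ready.

```java
// Hypothetical stand-in for a yield signal; not the HBase exception class.
class SketchYieldException extends Exception {
    SketchYieldException(String message) {
        super(message);
    }
}

// Hypothetical state-machine procedure that can be reloaded at any step.
class SketchCrashProcedure {
    enum Step { START, SPLIT_LOGS, ASSIGN, FINISH }

    private Step step;

    // resumeAt models the step persisted in the store before the crash.
    SketchCrashProcedure(Step resumeAt) {
        this.step = resumeAt;
    }

    Step currentStep() {
        return step;
    }

    // Runs exactly one step. The readiness check happens on every call,
    // not only when step == START.
    void executeStep(boolean failoverCleanupDone) throws SketchYieldException {
        if (!failoverCleanupDone) {
            throw new SketchYieldException("Waiting on master failover to complete");
        }
        switch (step) {
            case START:
                step = Step.SPLIT_LOGS;
                break;
            case SPLIT_LOGS:
                step = Step.ASSIGN;
                break;
            case ASSIGN:
                step = Step.FINISH;
                break;
            case FINISH:
                break;
        }
    }
}
```

A procedure constructed with Step.ASSIGN and driven with failoverCleanupDone=false throws and stays at ASSIGN; it only advances once the flag is true.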


Vladimir Rodionov | 2 Jul 22:05 2015

Backup/Restore (HBASE-7912) design doc

Hi, folks

Kindly soliciting feedback on the latest design doc:

https://issues.apache.org/jira/browse/HBASE-7912

-Vlad
