Busby.Cheung | 1 Feb 02:42

Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes

Hi Alasdair,
It works as you described in your email. Thanks!

Best regards,
Busby
 
> -----Original Message-----
> From: "Alasdair G Kergon" <agk <at> redhat.com>
> Sent: Tuesday, 31 January 2012
> To: "LVM general discussion and development" <linux-lvm <at> redhat.com>
> Cc: dm-devel <at> redhat.com, lvm-devel <at> redhat.com
> Subject: Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes
>
> On Tue, Jan 31, 2012 at 05:52:21PM +0800, Busby.Cheung wrote:
> > I tried to use this LVM2 to create a thin pool and thin LV, but it said PE was required. The VG I used is free; can anyone help me? Are any more args needed? Is there a more detailed HowTo than the man page?
> >
> For now, you need either 2 PVs in the VG, or use --alloc anywhere with 1 PV, or
> split the allocation manually like:
>
> [root <at> host2 ~]# lvcreate -L100M -T vg_pool/pool -V 1T --name thin_lv /dev/sdl:0 /dev/sdl:1-
>
> I have a patch that fixes this problem, but I'm still testing it.
>
> Alasdair
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm <at> redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Busby.Cheung | 1 Feb 09:40

Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes


When I used a 1.82 TB VG to create a 100 GB pool, it failed.
The error message was "Cannot allocate memory", yet my host's meminfo shows plenty of free memory:

[root <at> host2 persistent-data]# cat /proc/meminfo
MemTotal:       10234416 kB
MemFree:         9685956 kB

Is there something I need to configure to make this work (i.e. to create a bigger pool)?

Here is the output:

---------------------
[root <at> host2 ~]# pvs
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  PV         VG         Fmt  Attr PSize   PFree  
  /dev/sda2  VolGroup00 lvm2 a--  931.41G      0 
  /dev/sdb   vg01       lvm2 a--  931.51G 193.51G
  /dev/sdc   vg01       lvm2 a--  931.51G 931.51G
  /dev/sdd   vg02       lvm2 a--  931.51G 927.51G
  /dev/sdg   vg_pool    lvm2 a--  931.51G 931.51G
  /dev/sdl   vg_pool    lvm2 a--  931.51G 931.51G
[root <at> host2 ~]# vgs
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  VG         #PV #LV #SN Attr   VSize   VFree  
  VolGroup00   1   2   0 wz--n- 931.41G      0 
  vg01         2   7   0 wz--n-   1.82T   1.10T
  vg02         1   1   0 wz--n- 931.51G 927.51G
  vg_pool      2   0   0 wz--n-   1.82T   1.82T
[root <at> host2 ~]# lvcreate -L 100G -T vg_pool/thin_lv 
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  Rounding up size to full physical extent 4.00 MB
  device-mapper: resume ioctl on  failed: Cannot allocate memory
  Unable to resume vg_pool-thin_lv-tpool (253:15)
  Aborting. Failed to activate thin thin_lv.
 

Busby


Alasdair G Kergon | 1 Feb 13:41

Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes

On Wed, Feb 01, 2012 at 04:40:19PM +0800, Busby.Cheung wrote:
> This is the mesgs:

Can you also look in the kernel message log and see whether or not it gives any
additional reason there?

Alasdair

Busby.Cheung | 2 Feb 02:36

Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes

Hi Alasdair,

The kernel message log says "failed to resize data device", whereas creating the pool with dmsetup commands works fine.

Output when using the LVM2 commands:
---------------------------
[root <at> host2 ~]# pvs
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  PV         VG         Fmt  Attr PSize   PFree 
  /dev/sda2  VolGroup00 lvm2 a--  931.41G      0
  /dev/sdg   vg_pool    lvm2 a--  931.51G 931.51G
  /dev/sdl   vg_pool    lvm2 a--  931.51G 931.51G
[root <at> host2 ~]# vgs
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  VG         #PV #LV #SN Attr   VSize   VFree 
  VolGroup00   1   2   0 wz--n- 931.41G      0
  vg_pool      2   0   0 wz--n-   1.82T   1.82T
[root <at> host2 ~]# lvcreate -L65G -T vg_pool/pool
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  Rounding up size to full physical extent 4.00 MB
  device-mapper: resume ioctl on  failed: Cannot allocate memory
  Unable to resume vg_pool-pool-tpool (253:12)
  Aborting. Failed to activate thin pool.
[root <at> host2 ~]# dmesg
device-mapper: space map checker: Loading debug space map from disk.  This may take some time
device-mapper: space map checker: Load complete
device-mapper: thin: failed to resize data device
[root <at> host2 ~]# cat /var/log/messages
Feb  2 09:14:45 host2 kernel: device-mapper: space map checker: Loading debug space map from disk.  This may take some time
Feb  2 09:14:45 host2 kernel: device-mapper: space map checker: Load complete
Feb  2 09:14:45 host2 kernel: device-mapper: thin: failed to resize data device
 --------------------------
Using dmsetup commands instead:

[root <at> host2 ~]# lvcreate -n metadata_lv -L40M  vg_pool
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  Logical volume "metadata_lv" created
[root <at> host2 ~]# lvcreate -n data_lv -L300G vg_pool
  Ignoring too small pv_min_size 512KB, using default 2048KB.
  Logical volume "data_lv" created
[root <at> host2 ~]# dmsetup create pool --table "0 209715200 thin-pool /dev/vg_pool/metadata_lv /dev/vg_pool/data_lv  1024  20000"
[root <at> host2 ~]# dmsetup status
vg_pool-metadata_lv: 0 81920 linear
VolGroup00-LogVol01: 0 24510464 linear
vg_pool-pool-tpool: 0 136314880 thin-pool 0 76/1024 0/0 -
vg_pool-pool_tdata: 0 136314880 linear
VolGroup00-LogVol00: 0 1928790016 linear
vg_pool-pool_tmeta: 0 8192 linear
pool: 0 209715200 thin-pool 0 21/10240 0/204800 -
vg_pool-data_lv: 0 629145600 linear

[root <at> host2 ~]# dmesg
device-mapper: space map checker: Loading debug space map from disk.  This may take some time
device-mapper: space map checker: Load complete
device-mapper: thin: failed to resize data device
device-mapper: space map checker: Loading debug space map from disk.  This may take some time
device-mapper: space map checker: Load complete
device-mapper: space map checker: free block counts differ, checker 1020, sm-disk:948
device-mapper: space map checker: free block counts differ, checker 10236, sm-disk:10219
 
 
Best regards,
Busby

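The sizes in the dmsetup output above agree with each other once the units are known: table lengths are in 512-byte sectors, the thin-pool data block size argument (1024) is in sectors, the low water mark (20000) is in data blocks, and the pool status reports used/total metadata blocks (4 KiB each) followed by used/total data blocks. The helpers below are a hypothetical worked check of those figures, not part of LVM2 or device-mapper.

```c
#include <assert.h>
#include <stdint.h>

/* Device-mapper table lengths are expressed in 512-byte sectors. */
#define SECTOR_BYTES 512ULL
/* dm-thin manages its metadata device in 4 KiB blocks. */
#define THIN_METADATA_BLOCK_BYTES 4096ULL

/* Hypothetical helper: GiB -> 512-byte sectors. */
static uint64_t gib_to_sectors(uint64_t gib)
{
	return gib * 1024 * 1024 * 1024 / SECTOR_BYTES;
}

/* Hypothetical helper: MiB -> 512-byte sectors. */
static uint64_t mib_to_sectors(uint64_t mib)
{
	return mib * 1024 * 1024 / SECTOR_BYTES;
}

/* Total data blocks in a pool of pool_sectors, with the block size
 * given in sectors (the "1024" argument of the thin-pool table). */
static uint64_t data_blocks(uint64_t pool_sectors, uint64_t block_sectors)
{
	return pool_sectors / block_sectors;
}

/* Total 4 KiB metadata blocks on a metadata device of meta_sectors. */
static uint64_t metadata_blocks(uint64_t meta_sectors)
{
	return meta_sectors * SECTOR_BYTES / THIN_METADATA_BLOCK_BYTES;
}
```

For example, gib_to_sectors(100) gives 209715200 (the pool table length), data_blocks(209715200, 1024) gives 204800 (the 0/204800 in the "pool" status line), and metadata_blocks(mib_to_sectors(40)) gives 10240 (the 21/10240). For the LVM-created pool, gib_to_sectors(65) gives 136314880 and metadata_blocks(8192) gives 1024 (the 76/1024); its 0/0 data figure appears consistent with the "failed to resize data device" kernel message.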

Zdenek Kabelac | 2 Feb 10:41

Re: [linux-lvm] New LVM2 release 2.02.89: Thinly-provisioned logical volumes

On 2.2.2012 02:36, Busby.Cheung wrote:
> [root <at> host2 ~]# dmesg
> device-mapper: space map checker: Loading debug space map from disk.  This may take some time
> device-mapper: space map checker: Load complete
> device-mapper: thin: failed to resize data device
> device-mapper: space map checker: Loading debug space map from disk.  This may take some time
> device-mapper: space map checker: Load complete
> device-mapper: space map checker: free block counts differ, checker 1020, sm-disk:948
> device-mapper: space map checker: free block counts differ, checker 10236, sm-disk:10219
>

Have you tried building the kernel without CONFIG_DM_DEBUG_SPACE_MAPS and
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING?

These two options exist only for debugging and have a major impact on
performance (and possibly on memory usage as well).

User-space tools are now available for checking metadata consistency.

Zdenek

Joe Thornber | 2 Feb 17:39

[PATCH 00/11] Latest dm-thin patches

Clean patches that merge together various fixes in the thin-dev tree.

Joe Thornber (11):
  Unlock the superblock on an error path for new metadata dev creation.
  Remove redundant arg from value_ptr()
  [dm-thin] [bio prison] Don't use the bi_next field for the holder of
    a cell.
  [dm-thin] dm_thin_remove_block() wasn't decrementing the
    mapped_blocks counter.
  [dm-thin] btree-remove - fix rebalancing of 3 nodes.
  Remove entries from the ref_count tree if they're no longer needed.
  [dm-thin] Commit every second to prevent too much of a position
    building up.
  [dm-thin] Add support for external origins.
  [dm-thin] Discard support part 1
  [dm-thin] Add support for REQ_DISCARD
  [dm-thin] some tidy ups of the __open_device() error path (Mike
    Snitzer)

 Documentation/device-mapper/thin-provisioning.txt |   38 ++-
 drivers/md/dm-thin-metadata.c                     |   25 +-
 drivers/md/dm-thin.c                              |  442 ++++++++++++++++-----
 drivers/md/persistent-data/dm-btree-internal.h    |    7 +-
 drivers/md/persistent-data/dm-btree-remove.c      |  202 ++++++----
 drivers/md/persistent-data/dm-btree.c             |   27 +-
 drivers/md/persistent-data/dm-space-map-common.c  |    3 -
 7 files changed, 531 insertions(+), 213 deletions(-)

--

-- 
1.7.5.4

Joe Thornber | 2 Feb 17:39

[PATCH 02/11] Remove redundant arg from value_ptr()

Now that the value_size is held within every node of the btrees, we can
remove this argument from value_ptr().

For the last few months a BUG_ON has been checking that this argument
matches the value held in the node.  No issues were reported, so this
is a safe change.
---
 drivers/md/persistent-data/dm-btree-internal.h |    7 +----
 drivers/md/persistent-data/dm-btree-remove.c   |   28 ++++++++++++------------
 drivers/md/persistent-data/dm-btree.c          |   27 +++++++++++------------
 3 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/drivers/md/persistent-data/dm-btree-internal.h b/drivers/md/persistent-data/dm-btree-internal.h
index d279c76..5709bfe 100644
--- a/drivers/md/persistent-data/dm-btree-internal.h
+++ b/drivers/md/persistent-data/dm-btree-internal.h
@@ -108,12 +108,9 @@ static inline void *value_base(struct node *n)
 	return &n->keys[le32_to_cpu(n->header.max_entries)];
 }

-/*
- * FIXME: Now that value size is stored in node we don't need the third parm.
- */
-static inline void *value_ptr(struct node *n, uint32_t index, size_t value_size)
+static inline void *value_ptr(struct node *n, uint32_t index)
 {
-	BUG_ON(value_size != le32_to_cpu(n->header.value_size));
+	uint32_t value_size = le32_to_cpu(n->header.value_size);
 	return value_base(n) + (value_size * index);
 }

diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c
index 023fbc2..8576d56 100644
--- a/drivers/md/persistent-data/dm-btree-remove.c
+++ b/drivers/md/persistent-data/dm-btree-remove.c
@@ -61,20 +61,20 @@ static void node_shift(struct node *n, int shift)
 	if (shift < 0) {
 		shift = -shift;
 		BUG_ON(shift > nr_entries);
-		BUG_ON((void *) key_ptr(n, shift) >= value_ptr(n, shift, value_size));
+		BUG_ON((void *) key_ptr(n, shift) >= value_ptr(n, shift));
 		memmove(key_ptr(n, 0),
 			key_ptr(n, shift),
 			(nr_entries - shift) * sizeof(__le64));
-		memmove(value_ptr(n, 0, value_size),
-			value_ptr(n, shift, value_size),
+		memmove(value_ptr(n, 0),
+			value_ptr(n, shift),
 			(nr_entries - shift) * value_size);
 	} else {
 		BUG_ON(nr_entries + shift > le32_to_cpu(n->header.max_entries));
 		memmove(key_ptr(n, shift),
 			key_ptr(n, 0),
 			nr_entries * sizeof(__le64));
-		memmove(value_ptr(n, shift, value_size),
-			value_ptr(n, 0, value_size),
+		memmove(value_ptr(n, shift),
+			value_ptr(n, 0),
 			nr_entries * value_size);
 	}
 }
@@ -91,16 +91,16 @@ static void node_copy(struct node *left, struct node *right, int shift)
 		memcpy(key_ptr(left, nr_left),
 		       key_ptr(right, 0),
 		       shift * sizeof(__le64));
-		memcpy(value_ptr(left, nr_left, value_size),
-		       value_ptr(right, 0, value_size),
+		memcpy(value_ptr(left, nr_left),
+		       value_ptr(right, 0),
 		       shift * value_size);
 	} else {
 		BUG_ON(shift > le32_to_cpu(right->header.max_entries));
 		memcpy(key_ptr(right, 0),
 		       key_ptr(left, nr_left - shift),
 		       shift * sizeof(__le64));
-		memcpy(value_ptr(right, 0, value_size),
-		       value_ptr(left, nr_left - shift, value_size),
+		memcpy(value_ptr(right, 0),
+		       value_ptr(left, nr_left - shift),
 		       shift * value_size);
 	}
 }
@@ -120,8 +120,8 @@ static void delete_at(struct node *n, unsigned index)
 			key_ptr(n, index + 1),
 			nr_to_copy * sizeof(__le64));

-		memmove(value_ptr(n, index, value_size),
-			value_ptr(n, index + 1, value_size),
+		memmove(value_ptr(n, index),
+			value_ptr(n, index + 1),
 			nr_to_copy * value_size);
 	}

@@ -175,7 +175,7 @@ static int init_child(struct dm_btree_info *info, struct node *parent,
 	if (inc)
 		inc_children(info->tm, result->n, &le64_type);

-	*((__le64 *) value_ptr(parent, index, sizeof(__le64))) =
+	*((__le64 *) value_ptr(parent, index)) =
 		cpu_to_le64(dm_block_location(result->block));

 	return 0;
@@ -496,7 +496,7 @@ static int remove_raw(struct shadow_spine *s, struct dm_btree_info *info,
 		 */
 		if (shadow_has_parent(s)) {
 			__le64 location = cpu_to_le64(dm_block_location(shadow_current(s)));
-			memcpy(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(__le64)),
+			memcpy(value_ptr(dm_block_data(shadow_parent(s)), i),
 			       &location, sizeof(__le64));
 		}

@@ -553,7 +553,7 @@ int dm_btree_remove(struct dm_btree_info *info, dm_block_t root,

 		if (info->value_type.dec)
 			info->value_type.dec(info->value_type.context,
-					     value_ptr(n, index, info->value_type.size));
+					     value_ptr(n, index));

 		delete_at(n, index);
 	}
diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c
index bd1e7ff..d12b2cc 100644
--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -74,8 +74,7 @@ void inc_children(struct dm_transaction_manager *tm, struct node *n,
 			dm_tm_inc(tm, value64(n, i));
 	else if (vt->inc)
 		for (i = 0; i < nr_entries; i++)
-			vt->inc(vt->context,
-				value_ptr(n, i, vt->size));
+			vt->inc(vt->context, value_ptr(n, i));
 }

 static int insert_at(size_t value_size, struct node *node, unsigned index,
@@ -281,7 +280,7 @@ int dm_btree_del(struct dm_btree_info *info, dm_block_t root)

 				for (i = 0; i < f->nr_children; i++)
 					info->value_type.dec(info->value_type.context,
-							     value_ptr(f->n, i, info->value_type.size));
+							     value_ptr(f->n, i));
 			}
 			f->current_child = f->nr_children;
 		}
@@ -320,7 +319,7 @@ static int btree_lookup_raw(struct ro_spine *s, dm_block_t block, uint64_t key,
 	} while (!(flags & LEAF_NODE));

 	*result_key = le64_to_cpu(ro_node(s)->keys[i]);
-	memcpy(v, value_ptr(ro_node(s), i, value_size), value_size);
+	memcpy(v, value_ptr(ro_node(s), i), value_size);

 	return 0;
 }
@@ -432,7 +431,7 @@ static int btree_split_sibling(struct shadow_spine *s, dm_block_t root,

 	size = le32_to_cpu(ln->header.flags) & INTERNAL_NODE ?
 		sizeof(uint64_t) : s->info->value_type.size;
-	memcpy(value_ptr(rn, 0, size), value_ptr(ln, nr_left, size),
+	memcpy(value_ptr(rn, 0), value_ptr(ln, nr_left),
 	       size * nr_right);

 	/*
@@ -443,7 +442,7 @@ static int btree_split_sibling(struct shadow_spine *s, dm_block_t root,
 	pn = dm_block_data(parent);
 	location = cpu_to_le64(dm_block_location(left));
 	__dm_bless_for_disk(&location);
-	memcpy_disk(value_ptr(pn, parent_index, sizeof(__le64)),
+	memcpy_disk(value_ptr(pn, parent_index),
 		    &location, sizeof(__le64));

 	location = cpu_to_le64(dm_block_location(right));
@@ -529,8 +528,8 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)

 	size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ?
 		sizeof(__le64) : s->info->value_type.size;
-	memcpy(value_ptr(ln, 0, size), value_ptr(pn, 0, size), nr_left * size);
-	memcpy(value_ptr(rn, 0, size), value_ptr(pn, nr_left, size),
+	memcpy(value_ptr(ln, 0), value_ptr(pn, 0), nr_left * size);
+	memcpy(value_ptr(rn, 0), value_ptr(pn, nr_left),
 	       nr_right * size);

 	/* new_parent should just point to l and r now */
@@ -545,12 +544,12 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
 	val = cpu_to_le64(dm_block_location(left));
 	__dm_bless_for_disk(&val);
 	pn->keys[0] = ln->keys[0];
-	memcpy_disk(value_ptr(pn, 0, sizeof(__le64)), &val, sizeof(__le64));
+	memcpy_disk(value_ptr(pn, 0), &val, sizeof(__le64));

 	val = cpu_to_le64(dm_block_location(right));
 	__dm_bless_for_disk(&val);
 	pn->keys[1] = rn->keys[0];
-	memcpy_disk(value_ptr(pn, 1, sizeof(__le64)), &val, sizeof(__le64));
+	memcpy_disk(value_ptr(pn, 1), &val, sizeof(__le64));

 	/*
 	 * rejig the spine.  This is ugly, since it knows too
@@ -595,7 +594,7 @@ static int btree_insert_raw(struct shadow_spine *s, dm_block_t root,
 			__le64 location = cpu_to_le64(dm_block_location(shadow_current(s)));

 			__dm_bless_for_disk(&location);
-			memcpy_disk(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(uint64_t)),
+			memcpy_disk(value_ptr(dm_block_data(shadow_parent(s)), i),
 				    &location, sizeof(__le64));
 		}

@@ -710,12 +709,12 @@ static int insert(struct dm_btree_info *info, dm_block_t root,
 		    (!info->value_type.equal ||
 		     !info->value_type.equal(
 			     info->value_type.context,
-			     value_ptr(n, index, info->value_type.size),
+			     value_ptr(n, index),
 			     value))) {
 			info->value_type.dec(info->value_type.context,
-					     value_ptr(n, index, info->value_type.size));
+					     value_ptr(n, index));
 		}
-		memcpy_disk(value_ptr(n, index, info->value_type.size),
+		memcpy_disk(value_ptr(n, index),
 			    value, info->value_type.size);
 	}

--

-- 
1.7.5.4
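Why the third argument was redundant can be modelled in a few lines of userspace C: each node stores max_entries keys followed by the values, and value_size lives in the node header, so value_ptr() can compute the offset on its own. This is a simplified sketch under stated assumptions: the header fields are host-endian, the checksum, flags and block-number fields of the real header in dm-btree-internal.h are omitted, and alloc_node() is a hypothetical helper for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node header: host-endian, and without the
 * checksum, flags and block-number fields the kernel header carries. */
struct node_header {
	uint32_t nr_entries;
	uint32_t max_entries;
	uint32_t value_size;
};

struct node {
	struct node_header header;
	uint64_t keys[]; /* max_entries keys, immediately followed by the values */
};

/* The value area begins right after the full-capacity key array. */
static void *value_base(struct node *n)
{
	return &n->keys[n->header.max_entries];
}

/* Post-patch shape: the value size comes from the node, not the caller.
 * char * arithmetic is used here because void * arithmetic (as in the
 * kernel) is a GNU extension. */
static void *value_ptr(struct node *n, uint32_t index)
{
	uint32_t value_size = n->header.value_size;
	return (char *)value_base(n) + (size_t)value_size * index;
}

/* Hypothetical helper: allocate a zeroed node with room for
 * max_entries keys and max_entries values of value_size bytes. */
static struct node *alloc_node(uint32_t max_entries, uint32_t value_size)
{
	struct node *n = calloc(1, sizeof(*n) +
				(size_t)max_entries * (sizeof(uint64_t) + value_size));
	if (n) {
		n->header.max_entries = max_entries;
		n->header.value_size = value_size;
	}
	return n;
}
```

Callers that previously passed sizeof(__le64) or info->value_type.size now rely on the node's own header, which is exactly what the removed BUG_ON had been asserting.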

Joe Thornber | 2 Feb 17:39

[PATCH 10/11] [dm-thin] Add support for REQ_DISCARD

---
 drivers/md/dm-thin.c |  173 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 159 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index c5e3102..304a934 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -508,10 +508,12 @@ struct pool {
 	struct bio_list deferred_bios;
 	struct bio_list deferred_flush_bios;
 	struct list_head prepared_mappings;
+	struct list_head prepared_discards;

 	struct bio_list retry_on_resume_list;

 	struct deferred_set shared_read_ds;
+	struct deferred_set all_io_ds;

 	struct new_mapping *next_mapping;
 	mempool_t *mapping_pool;
@@ -609,6 +611,7 @@ static struct pool *__pool_table_lookup_metadata_dev(struct block_device *md_dev
 struct endio_hook {
 	struct thin_c *tc;
 	struct deferred_entry *shared_read_entry;
+	struct deferred_entry *all_io_entry;
 	struct new_mapping *overwrite_mapping;
 };

@@ -718,11 +721,12 @@ struct new_mapping {

 	unsigned quiesced:1;
 	unsigned prepared:1;
+	unsigned pass_discard:1;

 	struct thin_c *tc;
 	dm_block_t virt_block;
 	dm_block_t data_block;
-	struct cell *cell;
+	struct cell *cell, *cell2;
 	int err;

 	/*
@@ -867,7 +871,30 @@ static void process_prepared_mapping(struct new_mapping *m)
 	mempool_free(m, tc->pool->mapping_pool);
 }

-static void process_prepared_mappings(struct pool *pool)
+static void process_prepared_discard(struct new_mapping *m)
+{
+	int r;
+	struct thin_c *tc = m->tc;
+
+	r = dm_thin_remove_block(tc->td, m->virt_block);
+	if (r)
+		DMERR("dm_thin_metadata_remove() failed");
+
+	/*
+	 * Pass the discard down to the underlying device?
+	 */
+	if (m->pass_discard)
+		remap_and_issue(tc, m->bio, m->data_block);
+	else
+		bio_endio(m->bio, 0);
+
+	cell_defer_except(tc, m->cell, m->bio);
+	cell_defer_except(tc, m->cell2, m->bio);
+	mempool_free(m, tc->pool->mapping_pool);
+}
+
+static void process_prepared(struct pool *pool, struct list_head *head,
+			     void (*fn)(struct new_mapping *))
 {
 	unsigned long flags;
 	struct list_head maps;
@@ -875,21 +902,27 @@ static void process_prepared_mappings(struct pool *pool)

 	INIT_LIST_HEAD(&maps);
 	spin_lock_irqsave(&pool->lock, flags);
-	list_splice_init(&pool->prepared_mappings, &maps);
+	list_splice_init(head, &maps);
 	spin_unlock_irqrestore(&pool->lock, flags);

 	list_for_each_entry_safe(m, tmp, &maps, list)
-		process_prepared_mapping(m);
+		fn(m);
 }

 /*
  * Deferred bio jobs.
  */
-static int io_overwrites_block(struct pool *pool, struct bio *bio)
+static int io_overlaps_block(struct pool *pool, struct bio *bio)
 {
-	return ((bio_data_dir(bio) == WRITE) &&
-		!(bio->bi_sector & pool->offset_mask)) &&
+	return !(bio->bi_sector & pool->offset_mask) &&
 		(bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
+
+}
+
+static int io_overwrites_block(struct pool *pool, struct bio *bio)
+{
+	return (bio_data_dir(bio) == WRITE) &&
+		io_overlaps_block(pool, bio);
 }

 static void save_and_set_endio(struct bio *bio, bio_end_io_t **save,
@@ -1127,6 +1160,86 @@ static void no_space(struct cell *cell)
 		retry_on_resume(bio);
 }

+static void process_discard(struct thin_c *tc, struct bio *bio)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	struct cell *cell, *cell2;
+	struct cell_key key, key2;
+	dm_block_t block = get_bio_block(tc, bio);
+	struct dm_thin_lookup_result lookup_result;
+	struct new_mapping *m;
+
+	build_virtual_key(tc->td, block, &key);
+	if (bio_detain(tc->pool->prison, &key, bio, &cell))
+		return;
+
+	r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
+	switch (r) {
+	case 0:
+		/*
+		 * Check nobody is fiddling with this pool block.  This can
+		 * happen if someone's in the process of breaking sharing
+		 * on this block.
+		 */
+		build_data_key(tc->td, lookup_result.block, &key2);
+		if (bio_detain(tc->pool->prison, &key2, bio, &cell2)) {
+			cell_release_singleton(cell, bio);
+			break;
+		}
+
+		if (io_overlaps_block(pool, bio)) {
+			/*
+			 * IO may still be going to the destination block.  We must
+			 * quiesce before we can do the removal.
+			 */
+			m = get_next_mapping(pool);
+			m->tc = tc;
+			m->pass_discard = !lookup_result.shared;
+			m->virt_block = block;
+			m->data_block = lookup_result.block;
+			m->cell = cell;
+			m->cell2 = cell2;
+			m->err = 0;
+			m->bio = bio;
+
+			if (!ds_add_work(&pool->all_io_ds, &m->list)) {
+				list_add(&m->list, &pool->prepared_discards);
+				wake_worker(pool);
+			}
+		} else {
+			/*
+			 * This path is hit if people are ignoring
+			 * limits->discard_granularity.  It ignores any
+			 * part of the discard that is in a subsequent
+			 * block.
+			 */
+			sector_t offset = bio->bi_sector - (block << pool->block_shift);
+			unsigned remaining = (pool->sectors_per_block - offset) << 9;
+			bio->bi_size = min(bio->bi_size, remaining);
+
+			cell_release_singleton(cell, bio);
+			cell_release_singleton(cell2, bio);
+			remap_and_issue(tc, bio, lookup_result.block);
+		}
+		break;
+
+	case -ENODATA:
+		/*
+		 * It isn't provisioned, just forget it.
+		 */
+		cell_release_singleton(cell, bio);
+		bio_endio(bio, 0);
+		break;
+
+	default:
+		DMERR("discard: find block unexpectedly returned %d\n", r);
+		cell_release_singleton(cell, bio);
+		bio_io_error(bio);
+		break;
+	}
+}
+
 static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
 			  struct cell_key *key,
 			  struct dm_thin_lookup_result *lookup_result,
@@ -1272,6 +1385,7 @@ static void process_bio(struct thin_c *tc, struct bio *bio)

 	default:
 		DMERR("dm_thin_find_block() failed, error = %d", r);
+		cell_release_singleton(cell, bio);
 		bio_io_error(bio);
 		break;
 	}
@@ -1313,7 +1427,11 @@ static void process_deferred_bios(struct pool *pool)

 			break;
 		}
-		process_bio(tc, bio);
+
+		if (bio->bi_rw & REQ_DISCARD)
+			process_discard(tc, bio);
+		else
+			process_bio(tc, bio);
 	}

 	/*
@@ -1349,7 +1467,8 @@ static void do_worker(struct work_struct *ws)
 {
 	struct pool *pool = container_of(ws, struct pool, worker);

-	process_prepared_mappings(pool);
+	process_prepared(pool, &pool->prepared_mappings, process_prepared_mapping);
+	process_prepared(pool, &pool->prepared_discards, process_prepared_discard);
 	process_deferred_bios(pool);
 }

@@ -1392,6 +1511,7 @@ static struct endio_hook *thin_hook_bio(struct thin_c *tc, struct bio *bio)

 	h->tc = tc;
 	h->shared_read_entry = NULL;
+	h->all_io_entry = bio->bi_rw & REQ_DISCARD ? NULL : ds_inc(&pool->all_io_ds);
 	h->overwrite_mapping = NULL;

 	return h;
@@ -1410,7 +1530,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio,
 	struct dm_thin_lookup_result result;

 	map_context->ptr = thin_hook_bio(tc, bio);
-	if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
+	if (bio->bi_rw & (REQ_DISCARD | REQ_FLUSH | REQ_FUA)) {
 		thin_defer_bio(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -1586,10 +1706,12 @@ static struct pool *pool_create(struct mapped_device *pool_md,
 	bio_list_init(&pool->deferred_bios);
 	bio_list_init(&pool->deferred_flush_bios);
 	INIT_LIST_HEAD(&pool->prepared_mappings);
+	INIT_LIST_HEAD(&pool->prepared_discards);
 	pool->low_water_triggered = 0;
 	pool->no_free_space = 0;
 	bio_list_init(&pool->retry_on_resume_list);
 	ds_init(&pool->shared_read_ds);
+	ds_init(&pool->all_io_ds);

 	pool->next_mapping = NULL;
 	pool->mapping_pool =
@@ -1830,7 +1952,8 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	pt->low_water_blocks = low_water_blocks;
 	pt->zero_new_blocks = pf.zero_new_blocks;
 	ti->num_flush_requests = 1;
-	ti->num_discard_requests = 0;
+	ti->num_discard_requests = 1;
+	ti->discards_supported = 1;
 	ti->private = pt;

 	pt->callbacks.congested_fn = pool_is_congested;
@@ -2223,6 +2346,17 @@ static int pool_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
 	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
 }

+static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
+{
+	limits->max_discard_sectors = pool->sectors_per_block;
+
+	/*
+	 * This is just a hint, and not enforced.  We have to cope with
+	 * bios that overlap 2 blocks.
+	 */
+	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+}
+
 static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 {
 	struct pool_c *pt = ti->private;
@@ -2230,6 +2364,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)

 	blk_limits_io_min(limits, 0);
 	blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
+	set_discard_limits(pool, limits);
 }

 static struct target_type pool_target = {
@@ -2346,8 +2481,8 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)

 	ti->split_io = tc->pool->sectors_per_block;
 	ti->num_flush_requests = 1;
-	ti->num_discard_requests = 0;
-	ti->discards_supported = 0;
+	ti->num_discard_requests = 1;
+	ti->discards_supported = 1;

 	dm_put(pool_md);

@@ -2403,6 +2538,14 @@ static int thin_endio(struct dm_target *ti,
 		spin_unlock_irqrestore(&pool->lock, flags);
 	}

+	if (h->all_io_entry) {
+		INIT_LIST_HEAD(&work);
+		ds_dec(h->all_io_entry, &work);
+		list_for_each_entry_safe(m, tmp, &work, list)
+			list_add(&m->list, &pool->prepared_discards);
+	}
+
+	mempool_free(h, pool->endio_hook_pool);
 	return 0;
 }

@@ -2479,9 +2622,11 @@ static int thin_iterate_devices(struct dm_target *ti,
 static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 {
 	struct thin_c *tc = ti->private;
+	struct pool *pool = tc->pool;

 	blk_limits_io_min(limits, 0);
-	blk_limits_io_opt(limits, tc->pool->sectors_per_block << SECTOR_SHIFT);
+	blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
+	set_discard_limits(pool, limits);
 }

 static struct target_type thin_target = {
--

-- 
1.7.5.4
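The two geometric checks in process_discard() can be exercised outside the kernel. A discard takes the quiesce-and-remove path only when io_overlaps_block() reports that it starts on a block boundary and covers exactly one block; because set_discard_limits() advertises discard_granularity only as a hint, a misaligned discard instead has its length clipped to the end of the block containing its first sector. The sketch below models only that arithmetic; struct pool_geom, struct discard_req and the helper names are hypothetical, and, as in the patch, sectors_per_block is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-ins for the fields the patch reads from struct pool. */
struct pool_geom {
	uint32_t sectors_per_block;
	uint32_t block_shift;   /* log2(sectors_per_block) */
	uint64_t offset_mask;   /* sectors_per_block - 1 */
};

/* Hypothetical stand-in for the bio fields used (bi_sector, bi_size). */
struct discard_req {
	uint64_t sector;        /* first sector */
	uint32_t size;          /* length in bytes */
};

/* Derive shift and mask; assumes sectors_per_block is a power of two. */
static struct pool_geom make_geom(uint32_t sectors_per_block)
{
	struct pool_geom g = { sectors_per_block, 0, sectors_per_block - 1 };
	while ((1u << g.block_shift) < sectors_per_block)
		g.block_shift++;
	return g;
}

/* Mirrors io_overlaps_block(): aligned start and exactly one block long. */
static int overlaps_block(const struct pool_geom *g, const struct discard_req *d)
{
	return !(d->sector & g->offset_mask) &&
	       d->size == (g->sectors_per_block << SECTOR_SHIFT);
}

/* Mirrors the else-branch of process_discard(): keep only the part of the
 * discard inside the block that contains its first sector. */
static void clip_to_block(const struct pool_geom *g, struct discard_req *d)
{
	uint64_t block = d->sector >> g->block_shift;
	uint64_t offset = d->sector - (block << g->block_shift);
	uint32_t remaining = (uint32_t)(g->sectors_per_block - offset) << SECTOR_SHIFT;

	if (d->size > remaining)
		d->size = remaining;
}
```

With 128-sector (64 KiB) blocks, a 64 KiB discard at sector 256 counts as a whole-block discard, while one starting at sector 100 is clipped to the 28 sectors (14336 bytes) left in block 0; the rest of that request is dropped, which is the behaviour the comment in the patch describes.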

Joe Thornber | 2 Feb 17:39

[PATCH 09/11] [dm-thin] Discard support part 1

This patch contains a lot of the ground work needed for supporting
discard.

- The thin target now has an endio function that replaces
  shared_read_endio.

- An explicit 'quiesced' flag has been introduced into the new_mapping
  structure.  Before, this was implicitly indicated by m->list being
  empty.

- The map_info->ptr remains constant for the duration of a bio's trip
  through thinp, which makes it easier to reason about.
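The explicit flag turns the readiness test into two independent events that may complete in either order: the I/O the mapping must wait for has drained (quiesced), and the new block has been prepared, for example by a copy completing. Only when both bits are set does __maybe_add_mapping() put the mapping on the prepared list. The sketch below is a simplified, single-threaded model: struct mapping, mark_quiesced(), mark_prepared() and the queued flag are hypothetical, and the real code does this under pool->lock and then wakes the worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of struct new_mapping: the two state bits from the
 * patch, plus a flag standing in for "linked onto prepared_mappings". */
struct mapping {
	unsigned quiesced:1;
	unsigned prepared:1;
	bool queued;
};

/* Equivalent of __maybe_add_mapping(): queue once both conditions hold. */
static void maybe_add_mapping(struct mapping *m)
{
	if (m->quiesced && m->prepared)
		m->queued = true;
}

/* Called when the in-flight I/O the mapping depends on has drained. */
static void mark_quiesced(struct mapping *m)
{
	m->quiesced = 1;
	maybe_add_mapping(m);
}

/* Called when the new data block is ready (e.g. copy completed). */
static void mark_prepared(struct mapping *m)
{
	m->prepared = 1;
	maybe_add_mapping(m);
}
```

Compared with testing list_empty(&m->list), a dedicated bit means m->list is free to be used for other queues (such as the deferred set) without changing what "quiesced" means.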
---
 drivers/md/dm-thin.c |  125 ++++++++++++++++++++++++++++----------------------
 1 files changed, 70 insertions(+), 55 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 922ebf2..c5e3102 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -511,7 +511,7 @@ struct pool {

 	struct bio_list retry_on_resume_list;

-	struct deferred_set ds;	/* FIXME: move to thin_c */
+	struct deferred_set shared_read_ds;

 	struct new_mapping *next_mapping;
 	mempool_t *mapping_pool;
@@ -606,6 +606,12 @@ static struct pool *__pool_table_lookup_metadata_dev(struct block_device *md_dev

 /*----------------------------------------------------------------*/

+struct endio_hook {
+	struct thin_c *tc;
+	struct deferred_entry *shared_read_entry;
+	struct new_mapping *overwrite_mapping;
+};
+
 static void __requeue_bio_list(struct thin_c *tc, struct bio_list *master)
 {
 	struct bio *bio;
@@ -616,7 +622,8 @@ static void __requeue_bio_list(struct thin_c *tc, struct bio_list *master)
 	bio_list_init(master);

 	while ((bio = bio_list_pop(&bios))) {
-		if (dm_get_mapinfo(bio)->ptr == tc)
+		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+		if (h->tc == tc)
 			bio_endio(bio, DM_ENDIO_REQUEUE);
 		else
 			bio_list_add(master, bio);
@@ -706,16 +713,11 @@ static void wake_worker(struct pool *pool)
 /*
  * Bio endio functions.
  */
-struct endio_hook {
-	struct thin_c *tc;
-	bio_end_io_t *saved_bi_end_io;
-	struct deferred_entry *entry;
-};
-
 struct new_mapping {
 	struct list_head list;

-	int prepared;
+	unsigned quiesced:1;
+	unsigned prepared:1;

 	struct thin_c *tc;
 	dm_block_t virt_block;
@@ -737,7 +739,7 @@ static void __maybe_add_mapping(struct new_mapping *m)
 {
 	struct pool *pool = m->tc->pool;

-	if (list_empty(&m->list) && m->prepared) {
+	if (m->quiesced && m->prepared) {
 		list_add(&m->list, &pool->prepared_mappings);
 		wake_worker(pool);
 	}
@@ -760,7 +762,8 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
 static void overwrite_endio(struct bio *bio, int err)
 {
 	unsigned long flags;
-	struct new_mapping *m = dm_get_mapinfo(bio)->ptr;
+	struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+	struct new_mapping *m = h->overwrite_mapping;
 	struct pool *pool = m->tc->pool;

 	m->err = err;
@@ -771,31 +774,6 @@ static void overwrite_endio(struct bio *bio, int err)
 	spin_unlock_irqrestore(&pool->lock, flags);
 }

-static void shared_read_endio(struct bio *bio, int err)
-{
-	struct list_head mappings;
-	struct new_mapping *m, *tmp;
-	struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
-	unsigned long flags;
-	struct pool *pool = h->tc->pool;
-
-	bio->bi_end_io = h->saved_bi_end_io;
-	bio_endio(bio, err);
-
-	INIT_LIST_HEAD(&mappings);
-	ds_dec(h->entry, &mappings);
-
-	spin_lock_irqsave(&pool->lock, flags);
-	list_for_each_entry_safe(m, tmp, &mappings, list) {
-		list_del(&m->list);
-		INIT_LIST_HEAD(&m->list);
-		__maybe_add_mapping(m);
-	}
-	spin_unlock_irqrestore(&pool->lock, flags);
-
-	mempool_free(h, pool->endio_hook_pool);
-}
-
 /*----------------------------------------------------------------*/

 /*
@@ -934,9 +912,7 @@ static int ensure_next_mapping(struct pool *pool)
 static struct new_mapping *get_next_mapping(struct pool *pool)
 {
 	struct new_mapping *r = pool->next_mapping;
-
 	BUG_ON(!pool->next_mapping);
-
 	pool->next_mapping = NULL;

 	return r;
@@ -952,6 +928,7 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	struct new_mapping *m = get_next_mapping(pool);

 	INIT_LIST_HEAD(&m->list);
+	m->quiesced = 0;
 	m->prepared = 0;
 	m->tc = tc;
 	m->virt_block = virt_block;
@@ -960,7 +937,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	m->err = 0;
 	m->bio = NULL;

-	ds_add_work(&pool->ds, &m->list);
+	if (!ds_add_work(&pool->shared_read_ds, &m->list))
+		m->quiesced = 1;

 	/*
 	 * IO to pool_dev remaps to the pool target's data_dev.
@@ -969,9 +947,10 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	 * bio immediately. Otherwise we use kcopyd to clone the data first.
 	 */
 	if (io_overwrites_block(pool, bio)) {
+		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+		h->overwrite_mapping = m;
 		m->bio = bio;
 		save_and_set_endio(bio, &m->saved_bi_end_io, overwrite_endio);
-		dm_get_mapinfo(bio)->ptr = m;
 		remap_and_issue(tc, bio, data_dest);
 	} else {
 		struct dm_io_region from, to;
@@ -1018,6 +997,7 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	struct new_mapping *m = get_next_mapping(pool);

 	INIT_LIST_HEAD(&m->list);
+	m->quiesced = 1;
 	m->prepared = 0;
 	m->tc = tc;
 	m->virt_block = virt_block;
@@ -1035,9 +1015,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 		process_prepared_mapping(m);

 	else if (io_overwrites_block(pool, bio)) {
+		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+		h->overwrite_mapping = m;
 		m->bio = bio;
 		save_and_set_endio(bio, &m->saved_bi_end_io, overwrite_endio);
-		dm_get_mapinfo(bio)->ptr = m;
 		remap_and_issue(tc, bio, data_block);

 	} else {
@@ -1124,7 +1105,8 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result)
  */
 static void retry_on_resume(struct bio *bio)
 {
-	struct thin_c *tc = dm_get_mapinfo(bio)->ptr;
+	struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+	struct thin_c *tc = h->tc;
 	struct pool *pool = tc->pool;
 	unsigned long flags;

@@ -1190,13 +1172,9 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
 	if (bio_data_dir(bio) == WRITE)
 		break_sharing(tc, bio, block, &key, lookup_result, cell);
 	else {
-		struct endio_hook *h;
-		h = mempool_alloc(pool->endio_hook_pool, GFP_NOIO);
+		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;

-		h->tc = tc;
-		h->entry = ds_inc(&pool->ds);
-		save_and_set_endio(bio, &h->saved_bi_end_io, shared_read_endio);
-		dm_get_mapinfo(bio)->ptr = h;
+		h->shared_read_entry = ds_inc(&pool->shared_read_ds);

 		cell_release_singleton(cell, bio);
 		remap_and_issue(tc, bio, lookup_result->block);
@@ -1320,7 +1298,9 @@ static void process_deferred_bios(struct pool *pool)
 	spin_unlock_irqrestore(&pool->lock, flags);

 	while ((bio = bio_list_pop(&bios))) {
-		struct thin_c *tc = dm_get_mapinfo(bio)->ptr;
+		struct endio_hook *h = dm_get_mapinfo(bio)->ptr;
+		struct thin_c *tc = h->tc;
+
 		/*
 		 * If we've got no free new_mapping structs, and processing
 		 * this bio might require one, we pause until there are some
@@ -1405,6 +1385,18 @@ static void thin_defer_bio(struct thin_c *tc, struct bio *bio)
 	wake_worker(pool);
 }

+static struct endio_hook *thin_hook_bio(struct thin_c *tc, struct bio *bio)
+{
+	struct pool *pool = tc->pool;
+	struct endio_hook *h = mempool_alloc(pool->endio_hook_pool, GFP_NOIO);
+
+	h->tc = tc;
+	h->shared_read_entry = NULL;
+	h->overwrite_mapping = NULL;
+
+	return h;
+}
+
 /*
  * Non-blocking function called from the thin target's map function.
  */
@@ -1417,11 +1409,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio,
 	struct dm_thin_device *td = tc->td;
 	struct dm_thin_lookup_result result;

-	/*
-	 * Save the thin context for easy access from the deferred bio later.
-	 */
-	map_context->ptr = tc;
-
+	map_context->ptr = thin_hook_bio(tc, bio);
 	if (bio->bi_rw & (REQ_FLUSH | REQ_FUA)) {
 		thin_defer_bio(tc, bio);
 		return DM_MAPIO_SUBMITTED;
@@ -1601,7 +1589,7 @@ static struct pool *pool_create(struct mapped_device *pool_md,
 	pool->low_water_triggered = 0;
 	pool->no_free_space = 0;
 	bio_list_init(&pool->retry_on_resume_list);
-	ds_init(&pool->ds);
+	ds_init(&pool->shared_read_ds);

 	pool->next_mapping = NULL;
 	pool->mapping_pool =
@@ -2392,6 +2380,32 @@ static int thin_map(struct dm_target *ti, struct bio *bio,
 	return thin_bio_map(ti, bio, map_context);
 }

+static int thin_endio(struct dm_target *ti,
+		      struct bio *bio, int err,
+		      union map_info *map_context)
+{
+	unsigned long flags;
+	struct endio_hook *h = map_context->ptr;
+	struct list_head work;
+	struct new_mapping *m, *tmp;
+	struct pool *pool = h->tc->pool;
+
+	if (h->shared_read_entry) {
+		INIT_LIST_HEAD(&work);
+		ds_dec(h->shared_read_entry, &work);
+
+		spin_lock_irqsave(&pool->lock, flags);
+		list_for_each_entry_safe(m, tmp, &work, list) {
+			list_del(&m->list);
+			m->quiesced = 1;
+			__maybe_add_mapping(m);
+		}
+		spin_unlock_irqrestore(&pool->lock, flags);
+	}
+
+	return 0;
+}
+
 static void thin_postsuspend(struct dm_target *ti)
 {
 	if (dm_noflush_suspending(ti))
@@ -2477,6 +2491,7 @@ static struct target_type thin_target = {
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
 	.map = thin_map,
+	.end_io = thin_endio,
 	.postsuspend = thin_postsuspend,
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
--

-- 
1.7.5.4
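
With patch 09, a `new_mapping` becomes ready only when two independent conditions both hold. First, the data block must be ready (`prepared`); the completion callbacks such as `copy_complete()` and `overwrite_endio()`, only partly shown in the diff, are assumed to set this bit. Second, earlier shared reads of the block must have drained (`quiesced`). `schedule_zero()` starts with `quiesced = 1`, `schedule_copy()` sets it when `ds_add_work()` reports nothing in flight, and `thin_endio()` sets it when the last shared read completes. `__maybe_add_mapping()` queues the mapping once both bits are set, whichever event comes second. The userspace sketch below models only that rule. `struct mapping_model`, `set_quiesced()`, `set_prepared()` and the `nr_prepared` counter (a stand-in for `pool->prepared_mappings`) are hypothetical, and the sketch does no locking.

```c
#include <assert.h>

/* Hypothetical model of the two readiness bits in struct new_mapping. */
struct mapping_model {
	unsigned quiesced:1; /* no earlier shared reads still in flight */
	unsigned prepared:1; /* data block copied, zeroed or overwritten */
	unsigned queued:1;   /* guard specific to this sketch */
};

static unsigned nr_prepared; /* stands in for pool->prepared_mappings */

/* Like __maybe_add_mapping(): queue the mapping once both bits are set. */
static void maybe_add_mapping(struct mapping_model *m)
{
	if (m->quiesced && m->prepared && !m->queued) {
		m->queued = 1;
		nr_prepared++;
	}
}

static void set_quiesced(struct mapping_model *m)
{
	m->quiesced = 1;
	maybe_add_mapping(m);
}

static void set_prepared(struct mapping_model *m)
{
	m->prepared = 1;
	maybe_add_mapping(m);
}
```

Because the two bits are independent, the order of events does not matter: a copy that finishes before the shared reads drain simply waits for the quiesce step, and vice versa. The old `list_empty(&m->list)` test coupled quiescence to list membership, which is the implicit state this patch removes.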

Joe Thornber | 2 Feb 17:39

[PATCH 08/11] [dm-thin] Add support for external origins.

---
 Documentation/device-mapper/thin-provisioning.txt |   38 ++++++++++-
 drivers/md/dm-thin.c                              |   81 +++++++++++++++++----
 2 files changed, 105 insertions(+), 14 deletions(-)

diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 801d9d1..60fc5cf 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -167,6 +167,38 @@ ii) Using an internal snapshot.

     dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"

+External snapshots
+------------------
+
+You can use an external, _read only_, device as an origin for a thin
+device.  Any read to an unprovisioned area of the thin device will be
+passed through to the origin.  Writes trigger allocation of new blocks
+as usual.
+
+One possible use case for this would be VM hosts who want to run
+guests on thinp volumes, but have the base image on another device
+(possibly shared between many VMs).
+
+You must not write to the origin device if you use this technique!  Of
+course you can write to the thin device, and take internal snapshots
+of the thin.
+
+i) Creating an external snapshot
+
+  Same as creating a thin device.  You don't need to mention the
+  origin at this stage.
+
+    dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+ii) Using an external snapshot.
+
+  Add an extra parameter to the thin target specifying the origin:
+
+    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
+
+  All descendants (internal snapshots) of an external snapshot will
+  need the extra origin argument.
+
 Deactivation
 ------------

@@ -262,7 +294,7 @@ iii) Messages

 i) Constructor

-    thin <pool dev> <dev id>
+    thin <pool dev> <dev id> [external origin dev]

     pool dev:
 	the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
@@ -271,6 +303,10 @@ i) Constructor
 	the internal device identifier of the device to be
 	activated.

+    external origin dev:
+        a block device; reads to unprovisioned areas of the thin target
+        will be mapped to here.
+
 The pool doesn't store any size against the thin devices.  If you
 load a thin target that is smaller than you've been using previously,
 then you'll have no access to blocks mapped beyond the end.  If you
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 19de11a..922ebf2 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -537,6 +537,7 @@ struct pool_c {
  */
 struct thin_c {
 	struct dm_dev *pool_dev;
+	struct dm_dev *origin_dev;
 	dm_thin_id dev_id;

 	struct pool *pool;
@@ -654,14 +655,16 @@ static void remap(struct thin_c *tc, struct bio *bio, dm_block_t block)
 		(bio->bi_sector & pool->offset_mask);
 }

-static void remap_and_issue(struct thin_c *tc, struct bio *bio,
-			    dm_block_t block)
+static void remap_to_origin(struct thin_c *tc, struct bio *bio)
+{
+	bio->bi_bdev = tc->origin_dev->bdev;
+}
+
+static void issue(struct thin_c *tc, struct bio *bio)
 {
 	struct pool *pool = tc->pool;
 	unsigned long flags;

-	remap(tc, bio, block);
-
 	/*
 	 * Batch together any FUA/FLUSH bios we find and then issue
 	 * a single commit for them in process_deferred_bios().
@@ -676,6 +679,19 @@ static void remap_and_issue(struct thin_c *tc, struct bio *bio,
 	}
 }

+static void remap_to_origin_and_issue(struct thin_c *tc, struct bio *bio)
+{
+	remap_to_origin(tc, bio);
+	issue(tc, bio);
+}
+
+static void remap_and_issue(struct thin_c *tc, struct bio *bio,
+			    dm_block_t block)
+{
+	remap(tc, bio, block);
+	issue(tc, bio);
+}
+
 /*
  * wake_worker() is used when new work is queued and when pool_resume is
  * ready to continue deferred IO processing.
@@ -927,7 +943,8 @@ static struct new_mapping *get_next_mapping(struct pool *pool)
 }

 static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
-			  dm_block_t data_origin, dm_block_t data_dest,
+			  struct dm_dev *origin, dm_block_t data_origin,
+			  dm_block_t data_dest,
 			  struct cell *cell, struct bio *bio)
 {
 	int r;
@@ -959,7 +976,7 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	} else {
 		struct dm_io_region from, to;

-		from.bdev = tc->pool_dev->bdev;
+		from.bdev = origin->bdev;
 		from.sector = data_origin * pool->sectors_per_block;
 		from.count = pool->sectors_per_block;

@@ -977,6 +994,22 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	}
 }

+static void schedule_internal_copy(struct thin_c *tc, dm_block_t virt_block,
+				   dm_block_t data_origin, dm_block_t data_dest,
+				   struct cell *cell, struct bio *bio)
+{
+	schedule_copy(tc, virt_block, tc->pool_dev,
+		      data_origin, data_dest, cell, bio);
+}
+
+static void schedule_external_copy(struct thin_c *tc, dm_block_t virt_block,
+				   dm_block_t data_dest,
+				   struct cell *cell, struct bio *bio)
+{
+	schedule_copy(tc, virt_block, tc->origin_dev,
+		      virt_block, data_dest, cell, bio);
+}
+
 static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 			  dm_block_t data_block, struct cell *cell,
 			  struct bio *bio)
@@ -1123,8 +1156,8 @@ static void break_sharing(struct thin_c *tc, struct bio *bio, dm_block_t block,
 	r = alloc_data_block(tc, &data_block);
 	switch (r) {
 	case 0:
-		schedule_copy(tc, block, lookup_result->block,
-			      data_block, cell, bio);
+		schedule_internal_copy(tc, block, lookup_result->block,
+				       data_block, cell, bio);
 		break;

 	case -ENOSPC:
@@ -1198,7 +1231,10 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	r = alloc_data_block(tc, &data_block);
 	switch (r) {
 	case 0:
-		schedule_zero(tc, block, data_block, cell, bio);
+		if (tc->origin_dev)
+			schedule_external_copy(tc, block, data_block, cell, bio);
+		else
+			schedule_zero(tc, block, data_block, cell, bio);
 		break;

 	case -ENOSPC:
@@ -1249,7 +1285,11 @@ static void process_bio(struct thin_c *tc, struct bio *bio)
 		break;

 	case -ENODATA:
-		provision_block(tc, bio, block, cell);
+		if (bio_data_dir(bio) == READ && tc->origin_dev) {
+			cell_release_singleton(cell, bio);
+			remap_to_origin_and_issue(tc, bio);
+		} else
+			provision_block(tc, bio, block, cell);
 		break;

 	default:
@@ -2235,6 +2275,8 @@ static void thin_dtr(struct dm_target *ti)
 	__pool_dec(tc->pool);
 	dm_pool_close_thin_device(tc->td);
 	dm_put_device(ti, tc->pool_dev);
+	if (tc->origin_dev)
+		dm_put_device(ti, tc->origin_dev);
 	kfree(tc);

 	mutex_unlock(&dm_thin_pool_table.mutex);
@@ -2243,21 +2285,22 @@ static void thin_dtr(struct dm_target *ti)
 /*
  * Thin target parameters:
  *
- * <pool_dev> <dev_id>
+ * <pool_dev> <dev_id> [origin_dev]
  *
  * pool_dev: the path to the pool (eg, /dev/mapper/my_pool)
  * dev_id: the internal device identifier
+ * origin_dev: a device external to the pool that should act as the origin
  */
 static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
 {
 	int r;
 	struct thin_c *tc;
-	struct dm_dev *pool_dev;
+	struct dm_dev *pool_dev, *origin_dev;
 	struct mapped_device *pool_md;

 	mutex_lock(&dm_thin_pool_table.mutex);

-	if (argc != 2) {
+	if (argc != 2 && argc != 3) {
 		ti->error = "Invalid argument count";
 		r = -EINVAL;
 		goto out_unlock;
@@ -2270,6 +2313,15 @@ static int thin_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		goto out_unlock;
 	}

+	if (argc == 3) {
+		r = dm_get_device(ti, argv[2], FMODE_READ, &origin_dev);
+		if (r) {
+			ti->error = "Error opening origin device";
+			goto bad_origin_dev;
+		}
+		tc->origin_dev = origin_dev;
+	}
+
 	r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &pool_dev);
 	if (r) {
 		ti->error = "Error opening pool device";
@@ -2322,6 +2374,9 @@ bad_pool_lookup:
 bad_common:
 	dm_put_device(ti, tc->pool_dev);
 bad_pool_dev:
+	if (tc->origin_dev)
+		dm_put_device(ti, tc->origin_dev);
+bad_origin_dev:
 	kfree(tc);
 out_unlock:
 	mutex_unlock(&dm_thin_pool_table.mutex);
--

-- 
1.7.5.4
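
With patch 08 applied, what the thin target does with a bio depends on three things: whether the virtual block is mapped, whether that mapping is shared, and whether an external origin was given. The sketch below condenses the decisions visible in `process_bio()`, `provision_block()` and `break_sharing()` into one pure function so the cases can be read side by side. It is a simplification under stated assumptions. The enum, `route_bio()` and its boolean inputs are hypothetical, and the sketch ignores cell locking, out-of-space handling, FLUSH/FUA batching and the whole-block overwrite shortcut.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the paths taken in the patched dm-thin.c. */
enum thin_action {
	ACT_REMAP_TO_DATA,    /* mapped block: remap to the pool's data device */
	ACT_REMAP_TO_ORIGIN,  /* unprovisioned read: remap_to_origin_and_issue() */
	ACT_COPY_FROM_POOL,   /* write to a shared block: schedule_internal_copy() */
	ACT_COPY_FROM_ORIGIN, /* unprovisioned write: schedule_external_copy() */
	ACT_PROVISION_ZERO    /* no origin: provision_block() -> schedule_zero() */
};

static enum thin_action route_bio(bool is_write, bool mapped, bool shared,
				  bool has_origin)
{
	if (mapped) {
		/* Only writes break sharing; shared reads go straight to the data block. */
		if (shared && is_write)
			return ACT_COPY_FROM_POOL;
		return ACT_REMAP_TO_DATA;
	}

	/* -ENODATA: the virtual block has no mapping yet. */
	if (!is_write && has_origin)
		return ACT_REMAP_TO_ORIGIN;

	return has_origin ? ACT_COPY_FROM_ORIGIN : ACT_PROVISION_ZERO;
}
```

Note that `schedule_external_copy()` passes `virt_block` as the source block, so the origin is read at the same block offset as the thin device. Without an origin, an unprovisioned read still goes through `provision_block()` in this version, exactly as before the patch.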

