Ian Hinder | 27 May 01:04 2015

du vs qgroup

Hi,

I have a 350 GB btrfs filesystem in which I am storing backups of virtual machine disk images.  These are
rsynced periodically from the VM host to a "current" subvolume, followed by a snapshot operation to a
dated subvolume.  One disk image is about 50 GB in size, as reported by ls -l and du.  However, the qgroup
assigned to the "current" subvolume reports a "refr" size of 75 GB.  I have performed a "btrfs quota rescan"
to make sure the quota is up-to-date.  I have tried defragmenting the file, but this did not significantly help.

Is there an explanation for the discrepancy between the logical file size and the data used in btrfs to store it?

I have one theory, which is that when the updated VM image is synced into the current subvolume, the changes
affect only a small part of each data storage node (I'm not sure if I'm using the terminology "node"
correctly here), but each node needs to be duplicated due to the COW nature of the filesystem, and the fact
that the nodes are shared with the existing snapshots, so they cannot be rewritten to be more efficient. 
This means that most of the data in such a node is actually duplicated, even though it only counts once
toward the logical size of the file.  I do not know how to determine the node size of my filesystem, but as far
as I can tell from searching, the node size is never more than 65 K.  It seems to me unlikely that such a small
node size could cause the problems I am seeing,
but I suppose it's not impossible, especially because this virtual machine disk image hosts a number of git
repositories, typically containing large numbers of small files, which have undergone significant
churn in the past.  If this were the problem, would deduplication help, or does it operate only at the level
of nodes?
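
The theory above can be tried out on a toy model. The following Python sketch assumes that data extents are immutable, that an overwrite writes a new extent and narrows the file's references to the old one, and that an extent is charged in full for as long as any slice of it is still referenced. The names Extent, FileRef, overwrite, logical_size and referenced_size are hypothetical and are not btrfs APIs; sizes are in KiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """An immutable on-disk data extent in this toy model (size in KiB)."""
    ident: int
    size: int


@dataclass
class FileRef:
    """A file's reference to the slice [offset, offset + length) of an extent."""
    extent: Extent
    offset: int
    length: int


def overwrite(refs, next_id, pos, length):
    """Copy-on-write overwrite of `length` KiB at logical position `pos`.

    New data goes into a fresh extent. Old extents are never modified;
    the file merely references narrower slices of them.
    """
    new_refs = []
    cursor = 0
    for ref in refs:
        start, end = cursor, cursor + ref.length
        cursor = end
        if start < pos:
            # Keep the part of this reference that lies before the overwrite.
            keep = min(end, pos) - start
            new_refs.append(FileRef(ref.extent, ref.offset, keep))
        if start <= pos < end:
            # The overwrite begins inside this reference: add the new extent once.
            new_refs.append(FileRef(Extent(next_id, length), 0, length))
        if end > pos + length:
            # Keep the part of this reference that lies after the overwrite.
            skip = max(start, pos + length) - start
            new_refs.append(FileRef(ref.extent, ref.offset + skip, ref.length - skip))
    return new_refs


def logical_size(refs):
    """File size as ls -l would see it: the sum of the referenced slices."""
    return sum(ref.length for ref in refs)


def referenced_size(refs):
    """Space charged under the stated assumption: every distinct extent in full."""
    return sum({ref.extent.ident: ref.extent.size for ref in refs}.values())


# One 1024 KiB extent, then a 4 KiB overwrite in the middle of it.
refs = [FileRef(Extent(0, 1024), 0, 1024)]
refs = overwrite(refs, next_id=1, pos=512, length=4)
print(logical_size(refs), referenced_size(refs))
```

In this model the logical size stays at 1024 KiB while the charged size grows with every partial overwrite, and the old extent is released only once no slice of it is referenced any more. Whether this accounts for the 25 GB gap reported above is not something the sketch can establish.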

I am using Linux version 3.16.0-38-generic, Ubuntu 14.04.2 LTS.  This is from August 2014.  I know that it is
preferable to use the latest kernel for btrfs; Ubuntu provides up to 3.19.0-18, and I would consider
upgrading if this is likely to help the problem.

What is the most up-to-date description of how btrfs stores data? I have found this,
https://oss.oracle.com/~mason/btrfs/btrfs-design.html, for example.

Lennert Buytenhek | 26 May 19:36 2015

intermittent -ENOSPC errors on btrfs filesystem with 170G free

Hi!

The btrfs filesystem on my newly installed laptop has managed to
hose itself rather thoroughly, and it's now in a state where it
works okay if you don't write too much to it, but if you do, it
starts returning -ENOSPC on a random subset of your filesystem
operations until you let it cool down again.

This was a fresh Fedora 21 install, upgraded to F22, installed
about a month ago, with a ~250G btrfs filesystem on a 256G SSD,
and this system has only ever run 4.0, and it has never had more
than ~60G on it.  It's currently running:

# uname -a
Linux foobox 4.0.1-300.fc22.x86_64 #1 SMP Wed Apr 29 15:48:25 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.0

# btrfs fi show
Label: 'foobox'  uuid: [...]
        Total devices 1 FS bytes used 58.87GiB
        devid    1 size 229.97GiB used 229.97GiB path /dev/[...]

# btrfs fi df /
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
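
The figures above can be tallied mechanically. This is a minimal Python sketch that assumes only the `btrfs fi df` line format shown above; the helper names parse_fi_df and slack_gib are made up for illustration, and the sample text is copied from this message.

```python
import re

# Binary unit suffixes as they appear in the output above.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
LINE = re.compile(
    r"^(\w+), (\w+): total=([\d.]+)([A-Za-z]+), used=([\d.]+)([A-Za-z]+)$"
)

SAMPLE = """\
Data, single: total=227.94GiB, used=58.16GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=730.80MiB
Metadata, single: total=8.00MiB, used=0.00B
"""


def parse_fi_df(text):
    """Return (kind, profile, total_bytes, used_bytes) for each recognised line."""
    rows = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a "Kind, profile: ..." line
        kind, profile, total, total_unit, used, used_unit = match.groups()
        rows.append(
            (kind, profile, float(total) * UNITS[total_unit], float(used) * UNITS[used_unit])
        )
    return rows


def slack_gib(row):
    """Space allocated to chunks of this kind but not used, in GiB."""
    _kind, _profile, total, used = row
    return (total - used) / 2**30


for row in parse_fi_df(SAMPLE):
    print(f"{row[0]:8s} {row[1]:6s} allocated but unused: {slack_gib(row):8.2f} GiB")
```

On these numbers almost all of the slack sits inside data chunks, while `btrfs fi show` reports the whole device (229.97GiB of 229.97GiB) as allocated. One possible reading, offered as an assumption rather than a diagnosis, is that no unallocated space is left for new metadata chunks.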

Timofey Titovets | 26 May 13:23 2015

RAID1: system stability

Hi list,
I'm a regular on this list and I like btrfs very much. I want to use it on a production server, where it would replace hardware RAID.

Test case: a server with N SCSI disks.
2 SAS disks are used for the RAID 1 root fs.
If I just physically remove one disk, all is okay: the kernel shows me write errors and the system continues to work for some
time. But after the first sync call, for example
# sync
# dd if=/dev/zero of=/zero

the kernel crashes and the system freezes.
Yes, after a reboot I can mount with the degraded and recovery options, add the failed disk again, and btrfs
will rebuild the array.
But are the kernel crash and the reboot expected in this case, or can I avoid them? How?
# mount -o remount,degraded -> kernel crash
Inserting the failed disk again -> kernel crash

Maybe I am missing something? I just want to avoid downtime and/or a reboot =.=

fdmanana | 26 May 01:55 2015

[PATCH] Btrfs: fix hang during inode eviction due to concurrent readahead

From: Filipe Manana <fdmanana@suse.com>

Zygo Blaxell and other users have reported occasional hangs while an
inode is being evicted, leading to traces like the following:

[ 5281.972322] INFO: task rm:20488 blocked for more than 120 seconds.
[ 5281.973836]       Not tainted 4.0.0-rc5-btrfs-next-9+ #2
[ 5281.974818] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5281.976364] rm              D ffff8800724cfc38     0 20488   7747 0x00000000
[ 5281.977506]  ffff8800724cfc38 ffff8800724cfc38 ffff880065da5c50 0000000000000001
[ 5281.978461]  ffff8800724cffd8 ffff8801540a5f50 0000000000000008 ffff8801540a5f78
[ 5281.979541]  ffff8801540a5f50 ffff8800724cfc58 ffffffff8143107e 0000000000000123
[ 5281.981396] Call Trace:
[ 5281.982066]  [<ffffffff8143107e>] schedule+0x74/0x83
[ 5281.983341]  [<ffffffffa03b33cf>] wait_on_state+0xac/0xcd [btrfs]
[ 5281.985127]  [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31
[ 5281.986715]  [<ffffffffa03b4b71>] wait_extent_bit.constprop.32+0x7c/0xde [btrfs]
[ 5281.988680]  [<ffffffffa03b540b>] lock_extent_bits+0x5d/0x88 [btrfs]
[ 5281.990200]  [<ffffffffa03a621d>] btrfs_evict_inode+0x24e/0x5be [btrfs]
[ 5281.991781]  [<ffffffff8116964d>] evict+0xa0/0x148
[ 5281.992735]  [<ffffffff8116a43d>] iput+0x18f/0x1e5
[ 5281.993796]  [<ffffffff81160d4a>] do_unlinkat+0x15b/0x1fa
[ 5281.994806]  [<ffffffff81435b54>] ? ret_from_sys_call+0x1d/0x58
[ 5281.996120]  [<ffffffff8107d314>] ? trace_hardirqs_on_caller+0x18f/0x1ab
[ 5281.997562]  [<ffffffff8123960b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 5281.998815]  [<ffffffff81161a16>] SyS_unlinkat+0x29/0x2b
[ 5281.999920]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
[ 5282.001299] 1 lock held by rm/20488:
[ 5282.002066]  #0:  (sb_writers#12){.+.+.+}, at: [<ffffffff8116dd81>] mnt_want_write+0x24/0x4b

sri | 26 May 10:45 2015

btrfs on disk stability

Hi,

According to the btrfs wiki page, under "Stability status", it is written that

"The filesystem disk format is no longer unstable".

Does this mean that, while I/O is in progress on a btrfs file system, a
copy of the entire disk (all disk blocks) gives a stable disk?

To elaborate: if a btrfs file system is created on 2 disks,
/dev/sda and /dev/sdb, and I copy the blocks of sda and sdb to sdc
and sdd respectively, just by opening file handles on sda and sdb, will
mounting the new copy via /dev/sdc and /dev/sdd give a consistent file
system?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Felix Koop | 25 May 18:06 2015

Help needed to recover a RAID5 btrfs

Hello,

I have a RAID5 filesystem where one disk has crashed. Now the filesystem is not
recognized any more. Any help available?

Here is some info:

root@server:~# uname -a
Linux server 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2015-05-11) x86_64 GNU/Linux

root@server:~# btrfs --version
btrfs-progs v4.0

root@server:~# btrfs f sh /dev/Data1vg/afs0
warning, device 3 is missing
warning devid 3 not found already
checksum verify failed on 1111813750784 found 18019A1D wanted FCE227AB
checksum verify failed on 1111813750784 found 18019A1D wanted FCE227AB
bytenr mismatch, want=1111813750784, have=65536
Couldn't read tree root
Label: none uuid: 7d4b023a-a1ef-4991-b01d-31e7c4fdfbcf
 Total devices 3 FS bytes used 213.96GiB
 devid 2 size 150.00GiB used 108.53GiB path /dev/mapper/Data1vg-afs0
 devid 4 size 150.00GiB used 108.53GiB path /dev/mapper/Data3vg-afs0
 *** Some devices missing

btrfs-progs v4.0

root@server:~# mount -o degraded -r /dev/Data1vg/afs0 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/mapper/Data1vg-afs0,

Kai Krakow | 25 May 17:33 2015

booting btrfs RAID with dracut/systemd results in open_ctree failed

Hi!

I need to boot with dracut to get my btrfs root partition properly 
initialized (because it is a multi-device btrfs). Today, after upgrading to 
systemd v220, I tracked a booting issue down to what looks like a general 
problem with the btrfs udev rules distributed with systemd:

If I drop down to an emergency shell through rd.break=pre-mount, when trying 
to mount sysroot, I get the error "open_ctree failed" and "BTRFS: failed to 
read the system array". This is generally a problem when probing for btrfs 
devices hasn't been done yet.

So I looked into the dracut sources to find that it brings its own udev 
rule which properly does this. The caveat however is: If it already finds a 
udev rule for btrfs, it won't install its own rule. The rule in question 
is:

$ cat 64-btrfs.rules
# do not edit this file, it will be overwritten on update

SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"

# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

Liu Bo | 25 May 11:30 2015

[PATCH 1/2 RESEND] Btrfs: add missing free_extent_buffer

read_tree_block may take a reference on the 'eb', a following
free_extent_buffer is necessary.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
This is based on the latest for-linus-4.1.

 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0ec3acd..a129254 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7922,6 +7922,7 @@ walk_down:
 			eb = read_tree_block(root, child_bytenr, child_gen);
 			if (!eb || !extent_buffer_uptodate(eb)) {
 				ret = -EIO;
+				free_extent_buffer(eb);
 				goto out;
 			}

-- 
2.1.0

Liu Bo | 25 May 11:16 2015

[PATCH] Btrfs: add missing free_extent_buffer

read_tree_block may take a reference on the 'eb', a following
free_extent_buffer is necessary.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8b353ad..bb8a221 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7649,6 +7649,7 @@ walk_down:
 			eb = read_tree_block(root, child_bytenr, child_gen);
 			if (!eb || !extent_buffer_uptodate(eb)) {
 				ret = -EIO;
+				free_extent_buffer(eb);
 				goto out;
 			}

-- 
2.1.0

Liu Bo | 25 May 05:20 2015

[PATCH] Btrfs: remove csum_bytes_left

After commit 8407f553268a
("Btrfs: fix data corruption after fast fsync and writeback error"),
during wait_ordered_extents(), we wait for ordered extent setting
BTRFS_ORDERED_IO_DONE or BTRFS_ORDERED_IOERR, at which point we've
already got checksum information, so we don't need to check
(csum_bytes_left == 0) in the whole logging path.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ordered-data.c | 7 -------
 fs/btrfs/ordered-data.h | 3 ---
 fs/btrfs/tree-log.c     | 6 ------
 3 files changed, 16 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 157cc54..e3c7540 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -198,9 +198,6 @@ static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 	entry->file_offset = file_offset;
 	entry->start = start;
 	entry->len = len;
-	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) &&
-	    !(type == BTRFS_ORDERED_NOCOW))
-		entry->csum_bytes_left = disk_len;
 	entry->disk_len = disk_len;
 	entry->bytes_left = len;
 	entry->inode = igrab(inode);
@@ -286,10 +283,6 @@ void btrfs_add_ordered_sum(struct inode *inode,
 	tree = &BTRFS_I(inode)->ordered_tree;

Anthony Plack | 25 May 04:42 2015

Additional Debug and other various pr_ additions

Would I step on anyone’s toes if I started submitting some extra patches to increase the verbosity of the
BTRFS code in the kernel log?

I would probably start with most things as pr_debug just to keep it quiet on non-debug kernels, but I just
thought that it might add a great deal of clarity to the code base and maybe help sysadmins figure out what is
a BTRFS issue and what is some other issue.

I have read through the Developer, SubmittingPatches, and Coding Style.  I am at peace with git.

I am not promising everything done, but what I can help with doing, I will do.

As of right now there are:

36 	pr_ 
323 	WARN_ON
506 	BUG_ON

107,197 lines of code
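
A tally like the one above could be reproduced with a short script. This is a sketch only: how the figures in this message were obtained is not stated, so the regular expressions, the choice to count only call sites in .c and .h files, and the fs/btrfs path in the usage note are all assumptions, and the function name count_calls is made up.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed definitions of what counts as a hit; a plain grep would differ.
PATTERNS = {
    "pr_": re.compile(r"\bpr_\w+\s*\("),
    "WARN_ON": re.compile(r"\bWARN_ON(?:_ONCE)?\s*\("),
    "BUG_ON": re.compile(r"\bBUG_ON\s*\("),
}


def count_calls(root):
    """Count pattern hits and newline-terminated lines in .c/.h files under root."""
    counts = Counter({name: 0 for name in PATTERNS})
    lines = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in {".c", ".h"}:
            continue
        text = path.read_text(errors="replace")
        lines += text.count("\n")
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts, lines
```

Run from the top of a kernel tree, `count_calls("fs/btrfs")` would return the three counters and a line total; the results need not match the numbers quoted above, since they depend on the kernel version and on the matching rules.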
