Ian Chard | 31 Jul 16:13 2014

cleanup process is 13GB and takes >16 hours at 100% CPU

Hi,

I'm running a cleanup of a large target, and the duplicity process grows to
13GB in size and uses 100% CPU for many hours.  So far it's been running for
over 16 hours, and apart from reading a signatures file every few hours
it's giving no indication of progress.  It's spinning at 100% on the
CPU, so this isn't the remote end being slow.  I'm having to run it on a
spare machine because it's using so much memory.

My command line is:

duplicity -v9 --archive-dir /data/duplicity-archive/ \
  --gpg-options --homedir=/data/gpg-home/ \
  --encrypt-key xxxxxxxx --asynchronous-upload \
  --full-if-older-than 10D --allow-source-mismatch \
  --num-retries 1 cleanup --force cf+http://my.target.name

I'm using duplicity 0.6.24, python 2.7.3, and Debian Wheezy.

Is this a scalability problem in duplicity, or is it more likely a bug?

I've logged a Launchpad bug
(https://bugs.launchpad.net/duplicity/+bug/1350404), but I see there are
many bugs in 'New' status, so I thought I'd try the mailing list too.

Thanks for any help
- Ian

-- 
Ian Chard   <ian <at> mysociety.org>
mySociety systems administrator   http://www.mysociety.org/

a.grandi@gmail.com | 25 Jul 11:19 2014

ImportError: cannot import name _librsync

Hi,

I'm trying to re-create the development environment for Duplicity on
two different machines and I always get the same error when I try to
run duplicity.

I've installed all the requirements, checked out the source code, and finally I run:

python setup.py build

to build the libs. Everything runs without any errors. Then I do:

PYTHONPATH=. ./bin/duplicity

but I get this error:

Traceback (most recent call last):
  File "./bin/duplicity", line 41, in <module>
    from duplicity import collections
  File "/home/andrea/Documents/development/duplicity-sx/duplicity/collections.py",
line 32, in <module>
    from duplicity import path
  File "/home/andrea/Documents/development/duplicity-sx/duplicity/path.py",
line 38, in <module>
    from duplicity import librsync
  File "/home/andrea/Documents/development/duplicity-sx/duplicity/librsync.py",
line 29, in <module>
    from . import _librsync
ImportError: cannot import name _librsync
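For what it's worth, a common cause of this error (an assumption here, not a confirmed diagnosis of this setup) is that `python setup.py build` leaves the compiled `_librsync` extension under `build/lib.*` rather than inside the `duplicity/` package directory, so `PYTHONPATH=.` never finds it; building in place with `python setup.py build_ext --inplace` usually resolves it. A small sketch to see what filename to look for next to `librsync.py` (on the Python 2.7 used here it is simply `_librsync.so`):

```python
# Sketch: print the platform-specific filename the build step gives the
# compiled _librsync extension, so you can check whether it is present
# in the duplicity/ package directory at all.
import sysconfig

suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
print("_librsync" + suffix)
```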


Sergei G | 24 Jul 10:02 2014

FYI: pip freeze of my duplicity on mac and installation of duplicity with it

Quick installation instructions for OS X (and not only OS X).

If you have virtualenvwrapper configured then:

mkproj MyProjNameLikeBackup

This will create a virtual environment with Python 2.7. Then run:

pip install -r requirements.txt

The content of requirements.txt is the following block of lines:

duplicity==0.6.24
ecdsa==0.11
lockfile==0.9.1
paramiko==1.14.0
pycrypto==2.6.1
wsgiref==0.1.2

Install Pretty Good Privacy.

Install other protocol-specific dependencies, for example ncftp.

I think it is worth including a requirements file with the duplicity source code.

Sergei G | 24 Jul 09:47 2014

Feedback based on restore experience

Background: My old Mac Mini failed, so I had to restore data from 
duplicity backup. Unfortunately, I experienced a set of mostly unrelated 
issues with hardware and FreeBSD 10 handling of external USB drives. 
These issues made me try restoring multiple times. My experience ended 
up focused on sftp and later ftp.

Absolutely every attempt was delayed by a single type of error message:
     CollectionsError: No backup chains found
I provide a full stack trace at the end.

The problem with this error is that it reports a consequence, not the 
cause. Duplicity failed to reach the desired source directory, but it 
doesn't report the error that preceded this one.

For example, if the FTP user's root is /home/userName but the backup is at 
/mnt/extufs/, the FTP client will fail to cd to the target directory, while 
an interactive console has no problem.  FTP's failure to change path is 
not properly reported, and that's the preceding error.

From what I remember, the absolute form of sftp URLs was one of the causes 
of the same error message. I think sftp properly reports a failure to cd 
to the remote directory. In my case, I was using 
sftp://user <at> host/abs/path/to/dir, but I should've used 
sftp://user <at> host//abs/path/to/dir (note the double slash). RTFM, but 
reporting a hint would be nice.
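The single versus double slash can be seen directly in how the URL parses: the path component keeps the extra slash, which is what lets a client distinguish a path relative to the login directory from an absolute one. A quick illustrative sketch (host and paths are placeholders):

```python
# Illustrative only: one vs. two slashes after the host survive URL
# parsing as different path components.
from urllib.parse import urlparse

rel = urlparse("sftp://user@host/abs/path/to/dir")
absolute = urlparse("sftp://user@host//abs/path/to/dir")
print(rel.path)       # /abs/path/to/dir   -> relative to login directory
print(absolute.path)  # //abs/path/to/dir  -> absolute path on the server
```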

I usually had to use other programs to answer questions like:

1. What's my remote login directory? Is it the same as in the other 
program I used?
2. What was my current remote directory when the error occurred?

Rubin Abdi | 23 Jul 03:57 2014

Re: Backing up desktop virtual machines?

Laurence Perkins (OE) wrote on 2014-07-22 09:52:
> My attempts to reply to the list don't seem to be getting through, so
> I'll send you my best guess as to the problem directly and you can
> forward it on to the list if I am correct.

Maybe the list admin is around and will see this.

> Duplicity only backs up changed parts of the file, but must scan the
> entire file to determine which parts have changed.  If your .VDIs for
> your VMs are large, this will take a while.  Try snapshotting the VMs.
> This will cause all changes to the disks to be written to a separate,
> potentially much smaller file.  If you snapshot after each incremental,
> then all the changes for the VM will be in their own, small file that
> duplicity can just pick up and add to the archive without having to scan
> the big, unchanged root VDI.

Thanks to both you and Edgar for pointing that out. I had no idea.
That's kind of awesome. It now makes sense that the slowdown is simply
from Duplicity having to scan a large file in order to find the diffs.

> Then, before running your next full backup, merge all the snapshots back
> into the main image.

I'll try using VirtualBox with snapshots and see how that goes. I've
only been using Duplicity for a month. How often should I do a full
backup if I'm running it for my whole laptop?

Thanks!


Aaron Whitehouse | 21 Jul 23:06 2014

Re: Behaviour/Man page of --verify and --compare-restore (was Big bandwidth bill from Rackspace)

Sending this again as it doesn't seem the first one made it to the list.
From: edgar.soldin <at> web.de

> On 10.06.2013 21:56, James Patterson wrote:
>> Thanks. I would suggest: Enter verify mode instead of restore. This
>> will restore each file from the latest backup and compare it to the
>> local copy. If the --file-to-restore option is given, restrict verify
>> to that file or directory. duplicity will exit with a non-zero error
>> level if any files are different. On verbosity level 4 or higher, log
>> a message for each file that has changed.
>
> it's taken a year but good things etc. here is the change's branch:
> https://code.launchpad.net/~ed.so/duplicity/manpage.verify

That does seem a better description of how I understand the behaviour to work, but I am still not convinced that the description is completely correct, or even that it is what verify should actually be doing. Kenneth explained the design of verify here (comment #13): https://answers.launchpad.net/duplicity/+question/116587

"Duplicity does verify the contents of the archives *as they were*, it does not do a comparison with the contents on the filesystem. Verify is done by comparing the archive contents with the stored signatures, i.e. the original file with its hash value."

So on that basis, saying that it will restore each file and compare it to the local copy is a little misleading, though the latest manpage does clarify that you need the --compare-data option to enable data comparison. I have suggested some text below.

This has also reminded me of a discussion that we have had a few times now. In my view, by default verify should not be concerned with the current contents of the filesystem at all, whether that is actual file contents or timestamps. This is particularly the case now that we have the --compare-data option that people can use if they want this functionality. If you agree, feel free to fast-forward to the end and let me know - the rest of this traverses the various comments we have had on this topic to date, so that we don't go over the ground again.

As per Kenneth (again, comment #13): https://answers.launchpad.net/duplicity/+question/116587

"The assumption is that the filesystem will probably change shortly after backup. What you look for in a verify is a check to see if the backup is stored properly and can be restored. If you want a comparison function, you'll need to restore and compare the original with the restored files, or provide a direct comparison function for us to integrate into duplicity. If you want to test verify, backup to a local file system, hexedit one of the archives and try to verify. It will fail to verify. You can modify the original files at will, and verify will succeed, as it is designed to do."

I agree with that design decision, though I don't believe verify will in fact succeed if one modifies original files - even though the original file contents are not checked (unless the new --compare-data option is used), the timestamps are.

As per Edgar (comment #1): https://bugs.launchpad.net/duplicity/+bug/644816

"we really should remove the functionality that verify in addition to checking the backups integrity is comparing dates/modtimes with the backups source. here a citation from the mailing list lately: -->

2. Why we get 'Difference found: File etc/resolv.conf has mtime Wed Jan 19 09:49:14 2011, expected Wed Jan 19 00:21:25 2011' lines on this process?

confusing isn't it. For reasons not transparent to me, additionally to verifying the backed up data, verify also compares the date with the source. This should be removed from my point of view. It could be part of a new command compare, which actually really compares backup with source. <--"

Peter Schuller echoed this sentiment (https://answers.launchpad.net/duplicity/+question/116587 comment #16):

"If the intent of verify is just to verify internal integrity, why is a file system even involved in the process (i.e., why even compare a file system hierarchy at all)?"

As I mentioned here (http://nongnu.13855.n7.nabble.com/Big-bandwidth-bill-from-Rackspace-td169114.html#a169136), this causes me issues because duplicity errors when the file system changes shortly after a backup (Kenneth's "assumption" mentioned above). We now have that new separate option (--compare-data). Consistent with the various comments to date, I therefore propose that the comparison of dates/modtimes is only carried out if --compare-data is used. On that basis, verify would not give an error if the file system changes after the backup, so long as it can restore the files and they match the signatures from the time of the backup. If we are all agreed conceptually, I will file a bug and have a go at making this work.

I would also then suggest the above man page read:

"Enter verify mode instead of restore. Verify tests the integrity of the backup archives at the remote location by checking that each file can restore and that the restored file matches the signature of that file stored in the backup, i.e. compares the archived file with its hash value at archival time. Verify does not actually restore and will not overwrite any local files. If the --file-to-restore option is given, it will restrict verify to that file or directory. The --time option allows the selection of a specific backup to verify. Duplicity will exit with a non-zero error level if any files do not match the signature stored in the archive for that file. On verbosity level 4 or higher, it will log a message for each file that differs from the stored signature. Files must be downloaded to the local machine in order to compare them. Verify does not compare the backed-up version of the file to the current local copy of the files unless the --compare-data option is used (see below)."

Aaron
_______________________________________________
Duplicity-talk mailing list
Duplicity-talk <at> nongnu.org
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
Rubin Abdi | 21 Jul 07:50 2014

Backing up desktop virtual machines?

Hello.

So I've been using Duplicity for the last week (wrapped with duply) and
it's been great. My only current sore spot is that if I touch any of my
virtual machines (through VirtualBox), a backup jumps from 15 minutes
to over 2 hours. And so I have three questions.

The first one I'm pretty sure the answer to is no: is there any way to
back up only the changes within the .vdi volume container file in my
Duplicity session?

Given that the answer to the first question is no, is there any sane way
of ignoring my virtual machine directory for incremental backups until I
decide it's time to also include the new changes to the virtual machine
directory?

And so, is there any way to have Duplicity only maintain two versions of
files in a particular directory while still having that be included in a
full system backup?

If the answer to question two is either no or too much of a pain, I'm
guessing my only real solution is to have a separate Duplicity session
for the virtual machines that I run once in a while, maintaining only
two revisions.

Am I thinking correctly about all this? Thanks!

-- 
Rubin
rubin <at> starset.net

Erik Romijn | 20 Jul 19:39 2014

duplicity collection status slowness

Hello all,

I'm using duplicity to run a few backups for my servers, and generally found it to work very well. However,
although my data is incredibly tiny, duplicity has become incredibly slow, which I think I've narrowed
down to the collection status process.

My source file size is only 20MB, but running this backup takes about 7 minutes and is
almost completely CPU bound. Running collection-status takes nearly the same amount of
time, so it would seem that this is where the slowness comes from.

I make incremental backups every 15 minutes, with a full backup after 23 hours, so 92
sets per day. I currently have 19 backup chains, according to collection-status, and
there are no orphaned or incomplete sets. In total the destination volume is now 154MB.
Running verify confirms that the backups are correct.
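Back-of-the-envelope arithmetic from the figures in this post (a rough sketch, not a measurement of duplicity internals) suggests how much metadata collection-status has to enumerate on every run:

```python
# Incrementals every 15 minutes, a new full every 23 hours, 19 chains kept.
sets_per_chain = (23 * 60) // 15      # 92 sets per chain, matching the post
chains = 19
total_sets = chains * sets_per_chain  # sets collection-status must walk
print(sets_per_chain, total_sets)     # 92 1748
```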

These numbers are for the backups of my /var/log, but I have another backup of an unrelated directory of
about 300MB on the same backup schema, which shows similar numbers for collection status.

One workaround would be for me to move the files away from the duplicity destination,
so that the total collection appears smaller. But that leaves me wondering: why does
collection-status take so much time, particularly considering it's CPU bound?

I'm running duplicity 0.6.23 with python 2.7.6 on an Ubuntu 14.04 VPS.

The full duplicity command line I use is:
/usr/bin/duplicity --full-if-older-than 23h --encrypt-sign-key [...] \
  --verbosity info --ssh-options=-oIdentityFile=/root/.ssh/backup_rsa \
  --exclude-globbing-filelist /root/duplicity_log_exclude_filelist.txt \
  /var/log sftp://[...] <at> [...]/[...]/backups/log

Can anyone here provide insights into what might be the issue, and what would be the best approach to tackle this?

cheers,
Erik
a.grandi@gmail.com | 3 Jul 22:32 2014

local_path.exists() fails when testing a new backend

Hi guys,

As you probably know, I'm working on the Duplicity backend for the
Skylable service. I'm basing my work on your "devel" branch (0.7.x) and
you can find my work in progress here:
https://github.com/andreagrandi/duplicity-sx

Now, let's get to the issue. It looks like I have problems implementing
the _get method of my backend.

At a high level, this is the error I get when I execute a command like this:

PYTHONPATH=. ./bin/duplicity -v9 sx://indian.skylable.com/vol-andy80/ ./test-bkp

output here: http://pastebin.com/JF4LuWXZ

Debugging my code with ipdb I can see this:

> /home/andrea/Downloads/duplicity/duplicity/backends/sxbackend.py(39)_get()
     38         commandline = "sxcp {0} {1}".format(remote_path,
local_path.name)
---> 39         self.subprocess_popen(commandline)
     40

ipdb> n
> /home/andrea/Downloads/duplicity/duplicity/backend.py(541)get()
    540             self.backend._get(remote_filename, local_path)
--> 541             if not local_path.exists():
    542                 raise BackendException(_("File %s not found
locally after get "

ipdb> local_path.exists()
ipdb> local_path
(() /tmp/duplicity-V8v6rE-tempdir/mktemp-0tsmWi-2 None)

As you can see, the .exists() method doesn't return anything, while
inspecting the object shows it refers to a file that I've verified
exists (the only two things I don't understand are: is that empty
tuple at the beginning OK? Is the None at the end OK?):

andrea-Inspiron-660:duplicity andrea [master] $ ls -al
/tmp/duplicity-V8v6rE-tempdir/mktemp-0tsmWi-2
-rw-rw-r-- 1 andrea andrea 4455795 lug  3 21:20
/tmp/duplicity-V8v6rE-tempdir/mktemp-0tsmWi-2

The code of course fails because the exists() check fails:

> /home/andrea/Downloads/duplicity/duplicity/backend.py(542)get()
    541             if not local_path.exists():
--> 542                 raise BackendException(_("File %s not found
locally after get "
    543                                          "from backend") %
util.ufn(local_path.name))

Now my question is: why does .exists() fail if the path exists? What's
wrong with my code?

Thank you so much. Cheers.

-- 
Andrea Grandi -  Software Engineer / Qt Ambassador / Nokia Developer Champion
website: http://www.andreagrandi.it
Benoit Tigeot | 24 Jun 16:10 2014

File listed but impossible to restore

Hello

I'm trying to restore a config file but have a few problems. I'm using Duplicity through
the Zertrin script (https://github.com/zertrin/duplicity-backup). When I list the files:

~/duplicity-backup# ./duplicity-backup.sh -c duplicity-backup.conf --list-current-files | grep default.cfg

Tue May 20 14:59:05 2014 home/martin/.willie/default.cfg

So OK, I've got the file.

When I try to do a restore: ./duplicity-backup.sh -c duplicity-backup.conf --restore-file
home/martin/.willie/default.cfg /home/martin/.willierestore/

YOU ARE ABOUT TO...
>> RESTORE: home/martin/.willie/default.cfg
>> TO: /home/martin/.willierestore/

Are you sure you want to do that ('yes' to continue)?
yes
Restoring now ...

But .willierestore is empty. If I do ./duplicity-backup.sh -c duplicity-backup.conf --restore-dir
home/martin/.willie /home/martin/.willierestore/
I get lots of files, but not 'default.cfg'.

What's happening?

Thanks
Henri Salo | 19 Jun 17:21 2014

CVE-2014-3495 duplicity: improper verification of SSL certificates

Eric Christensen of Red Hat Product Security reported [1] that Duplicity did not
handle wildcard certificates properly.  If Duplicity connected to a remote host
that used a wildcard certificate and the hostname did not match the wildcard,
it would still consider the connection valid.

1: https://bugs.launchpad.net/duplicity/+bug/1314234
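To illustrate the class of bug (a simplified sketch, not duplicity's actual code): under the usual RFC 6125 rules a wildcard may only stand in for the left-most DNS label, so "*.example.com" should match "host.example.com" but neither "host.other.com" nor "a.b.example.com":

```python
# Simplified RFC 6125-style hostname matching: "*" may replace the
# left-most DNS label only; the remaining labels must match exactly.
def wildcard_match(pattern, hostname):
    p = pattern.lower().split(".")
    h = hostname.lower().split(".")
    if len(p) != len(h):
        return False
    head_ok = (p[0] == "*") or (p[0] == h[0])
    return head_ok and p[1:] == h[1:]

print(wildcard_match("*.example.com", "host.example.com"))  # True
print(wildcard_match("*.example.com", "host.other.com"))    # False
print(wildcard_match("*.example.com", "a.b.example.com"))   # False
```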

Why is that upstream bug report still embargoed? Is there a fix for this
security issue already? If yes - what version or source control revision?

Debian: https://bugs.debian.org/751902
RedHat: https://bugzilla.redhat.com/show_bug.cgi?id=1109999

---
Henri Salo
