Sage Weil | 1 May 01:07 2012

v0.46 released

Another sprint, and v0.46 is ready.  Big items in this release include:

 * rbd: new caching mode (see below)
 * rbd: trim/discard support
 * cluster naming
 * osd: new default locations (slimmer .conf files, see below)
 * osd: various journal replay fixes for non-btrfs file systems
 * log: async and multi-level logging (see below)

The biggest new item here is the new RBD (librbd) cache mode that Josh has 
been working on.  This reuses a caching module that ceph-fuse and 
libcephfs have used for ages, so the cache portion of the code is 
well-tested, but the integration with librbd is new, and there are some 
(rare) failure cases that are not yet handled in this version. We 
recommend it for performance and failure testing at this stage, but not 
for production use just yet; wait for v0.47.  librbd also got trim/discard 
support.  Patches to wire it up to qemu are still working their way 
upstream (and won't work for virtio until virtio gets discard support).
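If you want to experiment with the cache, a minimal client-side snippet along 
these lines should turn it on (the option name here is an assumption based on 
the librbd cache work; check the documentation for your version):

    [client]
        rbd cache = true

As noted above, keep this to test setups for now and leave it off for 
production images until v0.47.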

We've revamped some of the default locations for data directories and log 
files and incorporated a configurable cluster name.  By default, the 
cluster name is `ceph', and the config file is /etc/ceph/$cluster.conf (so 
ceph.conf is still the default).  The $cluster substitution variable is 
used in the other default locations, allowing the same host to contain 
daemons participating in different clusters.  All data defaults to 
/var/lib/ceph/$type/$cluster-$id (e.g., /var/lib/ceph/osd/ceph-123 for 
osd_data), and logs go to /var/log/ceph/$cluster.$type.$id.  You can, of 
course, still override these with your own locations as before.
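For example (using a hypothetical second cluster named `backup'), osd.12 
would by default read its config from /etc/ceph/backup.conf, store its data 
in /var/lib/ceph/osd/backup-12, and log to /var/log/ceph/backup.osd.12, 
while the default `ceph' cluster on the same host keeps using 
/etc/ceph/ceph.conf, /var/lib/ceph/osd/ceph-12, and /var/log/ceph/ceph.osd.12.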

There is also new logging code that allows the daemons to gather debug 
(Continue reading)

Greg Farnum | 1 May 18:36 2012

Re: global_init fails when only specifying monitor address

What we ended up merging into master is a change so that if you don't specify a configuration file, you'll get
a small warning printed out, but the program will continue executing. We may remove the warning in the future,
but for now we thought it was important to leave it in, since using the default parameters can result in some
truly bizarre error messages if they don't match what the rest of the system is using. 
-Greg
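
As a rough sketch of the behavior described above (an illustration only, not 
the actual merged patch; in particular, the -ENOENT value is an assumption 
about what global_init sees when no config file is found):

    if (ret == -ENOENT) {
        /* no config file: warn, then keep going with built-in defaults */
        dout_emergency("global_init: no config file found, using defaults.\n");
    } else if (ret == -EDOM) {
        /* a config file was found but could not be parsed: still fatal */
        dout_emergency("global_init: error parsing config file.\n");
        _exit(1);
    }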

On Saturday, April 28, 2012 at 6:10 PM, Colin McCabe wrote:

> There's no technical reason why you can't simply remove these lines:
> 
> > if (ret == -EDOM) {
> >   dout_emergency("global_init: error parsing config file.\n");
> >   _exit(1);
> > }
> 
> 
> 
> Without a configuration file, you'll just get the defaults for
> everything. Wido, it sounds like that is what you're already doing by
> passing in "-c /dev/null"
> 
> So why didn't we do that in the first place? It was basically a
> philosophical reason. We have a section in the configuration file for
> clients. If clients don't use that section, then it would seem kind
> of pointless to have it. If I were a system administrator, I might be
> annoyed by accidentally failing to set CEPH_CONF in my .bashrc, and
> getting a bunch of defaults flooding in when I ran /usr/bin/ceph (or
> some other tool).
> 
> With that being said, there is a case to be made that we should let
(Continue reading)

Yehuda Sadeh Weinraub | 1 May 19:59 2012

Re: [PATCH] Wireshark dissector updated to work with the current development tree of wireshark. The way I patched it is not really clean, but it can be useful if some people quickly need to inspect ceph network flows.

Hi,

This is really awesome, thanks.

Sorry for the late response.  We got another wireshark update
out-of-list a few weeks ago, and I assumed it was the same one.
Comparing them both now, it is obvious that these are two completely
different implementations.  Both require cleanups, and from what I can
tell both need extra work to get them upstream (that is, upstream to
wireshark).  I'll try to push both into branches so that the work is
not lost and is readily useful.

The extra work that still needs to be done (and that the original dissector
suffered from as well, AFAIR) is that we dereference structures directly,
which shouldn't be done according to the wireshark style guide.  I'm
not too sure now whether we do it everywhere or just on packed
structures, and whether it matters in the latter case, but that should
be looked into.  It'd be nice if someone could pick that up and work
through whatever is needed to get it upstream.
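
As a rough illustration of that style-guide point (hypothetical field and
variable names, not code from either dissector): instead of casting the
capture buffer to a packed struct and dereferencing its members, a dissector
is expected to pull values out through the tvb accessor functions, e.g.

    /* read a little-endian u32 out of the capture buffer ... */
    guint32 seq = tvb_get_letohl(tvb, offset);
    /* ... or let the registered field handle decoding and display */
    proto_tree_add_item(tree, hf_ceph_seq, tvb, offset, 4, ENC_LITTLE_ENDIAN);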

Thanks,
Yehuda

On Wed, Apr 25, 2012 at 7:23 AM, Pierre Rognant <prognant <at> oodrive.fr> wrote:
>
> From: Pierre Rognant <prognant <at> oodrive.com>
>
> ---
>  wireshark/ceph/ceph_fs.h     |  213 +++++++++++++++-----
>  wireshark/ceph/crc32c.h      |    2 +-
(Continue reading)

Nick Bartos | 2 May 01:24 2012

Disabling logs in the mon data directory?

I'm trying to get all logs going to syslog, and disable all ceph log
files.  I added the following to [global]:

    log file = ""
    log to syslog = true

I thought this would do the trick; however, I see that logs are still being
generated in the monitor data directory.  I also tried adding the
following to [global], but it doesn't seem to help:

    clog to monitors = false
    clog to syslog = true

What is the recommended way to get all logs going to syslog and to
disable any ceph-specific log files?
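
Putting the two snippets above together, the [global] section in question is:

    [global]
        log file = ""
        log to syslog = true
        clog to monitors = false
        clog to syslog = true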

Colin McCabe | 2 May 04:06 2012

Re: rados import/export

(sorry for repost, vger was cranky about the first message)

The code you are interested in is in src/rados_sync.cc.

It would have been impossible to preserve rados object names fully,
since they can be looong (it was either 2k or 4k, I forget), and most
Linux local filesystems like ext3 can only hold 256 bytes in a single
path component.

The solution was to store the true name in an extended attribute of
the locally exported file, and make the local name simply an
approximation of that true name.  If things get hairy, it appends a
hash of the true name to the end of the mangled name.  Rados import
ignores the mangled names and checks the true name stored in the
user.rados_full_name xattr.
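
A rough sketch of that scheme (illustrative only; the real code lives in
src/rados_sync.cc and differs in the details -- only the xattr key and the
general approach are taken from the description above):

    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    /* Shorten an object name so it fits in one path component.  If it is
     * too long, keep a prefix and append a hash of the full name.  The
     * caller picks out_len comfortably under the ~256-byte limit. */
    static void mangle_name(const char *full, char *out, size_t out_len)
    {
        unsigned long h = 5381;
        const char *p;

        if (strlen(full) < out_len) {
            strcpy(out, full);
            return;
        }
        for (p = full; *p; p++)              /* djb2 string hash */
            h = h * 33 + (unsigned char)*p;
        snprintf(out, out_len, "%.*s_%lx", (int)(out_len - 20), full, h);
    }

    /* Record the true object name on the exported file; import reads this
     * xattr back and ignores the mangled file name. */
    static int store_true_name(const char *path, const char *full)
    {
        return setxattr(path, "user.rados_full_name", full, strlen(full), 0);
    }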

Is this a complete solution?  No.  Two objects could have names that
mangle to the same short name, and also have the same hash code.  If
you are interested in implementing the complete solution, simply use
an incrementing counter rather than a hash.

Another note.  If these omaps you speak of are big, you are probably
stuck using a separate file.  Don't forget the rather short limits
that ext3/ext4 puts on xattrs.  It should be straightforward enough
just to create a $FOO.omap for every $FOO.

cheers,
Colin

On Mon, Apr 30, 2012 at 2:20 PM, Tommi Virtanen
(Continue reading)

Vladimir Bashkirtsev | 3 May 00:49 2012

Possible memory leak in mon?

Dear devs,

I have three mons, and two of them suddenly consumed around 4G of RAM 
while the third one happily lived with 150M. This immediately prompts a 
few questions:

1. What is the expected memory use of a mon? I believed that the mon merely 
directs clients to the relevant OSDs and should not consume a lot of 
resources - please correct me if I am wrong.
2. In both cases where a mon consumed a lot of memory, it was preceded by a 
disk-full condition, and both machines where the incidents happened are 64 
bit, while the rest of the cluster is 32 bit. The mon fs and log files 
happened to be in the same partition: the OSD produced a lot of log 
messages and filled up the disk, the mon crashed (no core, as the disk was 
full), I manually deleted the logs and restarted the mon without any issue, 
and some time later I found the mon using 4G of RAM. This is running 0.45. 
Should I deliberately recreate the conditions and crash the mon to get more 
debug info (if you need it of course, and if yes, then what)?
3. Does the 4G-per-process figure come from 32-bit pointers in the mon, or 
can the mon potentially consume more than 4G?
4. I guess it is a good idea to keep the mon fs on a separate partition so 
it would not experience a disk-full state. Currently it is around 80Mb, 
while the whole ceph cluster is 42% full of 2100Gb, with 6 OSDs and 600 pgs. 
Can you give me some idea of how to estimate the mon fs size?

Regards,
Vladimir
(Continue reading)

Clint Byrum | 3 May 00:42 2012

Re: weighted distributed processing.

Excerpts from Joseph Perry's message of Wed May 02 15:05:23 -0700 2012:
> Hello All,
> First off, I'm sending this email to three discussion groups:
> gearman <at> googlegroups.com - distributed processing library
> ceph-devel <at> vger.kernel.org - distributed file system
> archivematica <at> googlegroups.com - my project's discussion list, a 
> distributed processing system.
> 
> I'd like to start a discussion about something I'll refer to as weighted 
> distributed task based processing.
> Presently, we are using gearman's libraries to meet our distributed 
> processing needs. The majority of our processing is file based, and our 
> processing stations are accessing the files over an nfs share. We are 
> looking at replacing the nfs server share with a distributed file 
> systems, like ceph.
> 
> It occurs to me that our processing times could theoretically be reduced 
> by assigning tasks to processing clients where the file resides, over 
> places where it would need to be copied over the network. In order for 
> this to happen, the gearman server would need to get file location 
> information from the ceph system.
> 

> If I understand the design of Ceph correctly, it spreads I/O at the
block level, not the file level.

So there is little point in weighting since it seeks to spread the whole
file across all the machines/block devices in the cluster. Even if you
do ask ceph "which servers is file X on", which I'm sure it could tell
> you, you will end up with high weights for most of the servers, and no
(Continue reading)

Greg Farnum | 3 May 01:26 2012

Re: weighted distributed processing.


On Wednesday, May 2, 2012 at 3:42 PM, Clint Byrum wrote:

> Excerpts from Joseph Perry's message of Wed May 02 15:05:23 -0700 2012:
> > Hello All,
> > First off, I'm sending this email to three discussion groups:
> > gearman <at> googlegroups.com - distributed processing library
> > ceph-devel <at> vger.kernel.org - distributed file system
> > archivematica <at> googlegroups.com - my project's discussion list, a
> > distributed processing system.
> >  
> > I'd like to start a discussion about something I'll refer to as weighted  
> > distributed task based processing.
> > Presently, we are using gearman's libraries to meet our distributed  
> > processing needs. The majority of our processing is file based, and our  
> > processing stations are accessing the files over an nfs share. We are  
> > looking at replacing the nfs server share with a distributed file  
> > systems, like ceph.
> >  
> > It occurs to me that our processing times could theoretically be reduced  
> > by assigning tasks to processing clients where the file resides, over  
> > places where it would need to be copied over the network. In order for  
> > this to happen, the gearman server would need to get file location  
> > information from the ceph system.
>  
>  
>  
> If I understand the design of Ceph correctly, it spreads I/O at the
> block level, not the file level.
(Continue reading)

Greg Farnum | 3 May 01:30 2012

Re: weighted distributed processing.

(Trimmed CC:) Apparently neither the Gearman nor the Archivematica list allows posting from non-members, which
leads to some wonderful spam from Google and is going to make holding a cross-list
conversation…difficult.  

On Wednesday, May 2, 2012 at 4:26 PM, Greg Farnum wrote:

>  
>  
> On Wednesday, May 2, 2012 at 3:42 PM, Clint Byrum wrote:
>  
> > Excerpts from Joseph Perry's message of Wed May 02 15:05:23 -0700 2012:
> > > Hello All,
> > > First off, I'm sending this email to three discussion groups:
> > > gearman <at> googlegroups.com - distributed processing library
> > > ceph-devel <at> vger.kernel.org - distributed file system
> > > archivematica <at> googlegroups.com - my project's discussion list, a
> > > distributed processing system.
> > >  
> > > I'd like to start a discussion about something I'll refer to as weighted  
> > > distributed task based processing.
> > > Presently, we are using gearman's libraries to meet our distributed  
> > > processing needs. The majority of our processing is file based, and our  
> > > processing stations are accessing the files over an nfs share. We are  
> > > looking at replacing the nfs server share with a distributed file  
> > > systems, like ceph.
> > >  
> > > It occurs to me that our processing times could theoretically be reduced  
> > > by assigning tasks to processing clients where the file resides, over  
> > > places where it would need to be copied over the network. In order for  
(Continue reading)

