Tom Lane | 1 Sep 01:21 2007

Re: WAL to RAW devices ?

Alex Vinogradovs <AVinogradovs <at> clearpathnet.com> writes:
>  The idea is to have say 2 raw devices which would be used as 2 WAL
> segments (round-robin). RO servers will go after the one that's not used
> at a given time with something like xlogdump utility and produce INSERT
> statements to be then executed locally. After that import is done, a
> command will be issued to the WO server to switch to the other segment
> so that the cycle can repeat.

Why would you insist on these being raw devices?  Do you enjoy writing
filesystems from scratch?

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo <at> postgresql.org so that your
       message can get through to the mailing list cleanly

Alex Vinogradovs | 1 Sep 01:49 2007

Re: WAL to RAW devices ?

WAL segments already have their structure. A filesystem would just be
overhead, plus I mentioned access to the same storage from
multiple hosts: no filesystem mounting, synchronization, or
other problems.

I figured PG folks aren't interested in adding enterprise-level storage
functionality (movable tablespaces, raw devices for tablespaces, etc),
thus I foresee the model described as the only way to achieve somewhat
decent performance in a stressed environment.

On Fri, 2007-08-31 at 19:21 -0400, Tom Lane wrote:
> Alex Vinogradovs <AVinogradovs <at> clearpathnet.com> writes:
> >  The idea is to have say 2 raw devices which would be used as 2 WAL
> > segments (round-robin). RO servers will go after the one that's not used
> > at a given time with something like xlogdump utility and produce INSERT
> > statements to be then executed locally. After that import is done, a
> > command will be issued to the WO server to switch to the other segment
> > so that the cycle can repeat.
> 
> Why would you insist on these being raw devices?  Do you enjoy writing
> filesystems from scratch?
> 
> 			regards, tom lane

Tom Lane | 1 Sep 02:08 2007

Re: WAL to RAW devices ?

Alex Vinogradovs <AVinogradovs <at> Clearpathnet.com> writes:
> WAL segments already have their structure. Filesystem would be an
> overhead,

Just because you'd like that to be true doesn't make it true.  We have
to manage a variable number of active segments; track whether a given
segment is waiting for future use, active, waiting to be archived, etc;
manage status signaling to the archiver process; and so on.  Now I'll
freely admit that using a filesystem is only one of the ways that those
problems could be attacked, but that's how they've been attacked in
Postgres.  If you want to not have that functionality present then
you'd need to rewrite all that code and provide some other
infrastructure for it to use.

			regards, tom lane
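
The bookkeeping listed above can be pictured with a small in-memory model. This is only an illustrative sketch in Python, not PostgreSQL code: the class, method, and state names are invented here, and the ARCHIVED state is an assumption standing in for the "etc" in the message.

```python
from enum import Enum


class SegmentState(Enum):
    """Segment states named in the message; ARCHIVED is an assumed extra."""
    WAITING_FOR_USE = "waiting for future use"
    ACTIVE = "active"
    WAITING_FOR_ARCHIVE = "waiting to be archived"
    ARCHIVED = "archived"


class SegmentTracker:
    """Toy tracker for a variable number of WAL segments (hypothetical)."""

    def __init__(self):
        self.states = {}  # segment name -> SegmentState

    def preallocate(self, name):
        self.states[name] = SegmentState.WAITING_FOR_USE

    def activate(self, name):
        self.states[name] = SegmentState.ACTIVE

    def fill(self, name):
        # A filled segment must reach the archiver before it can be reused.
        self.states[name] = SegmentState.WAITING_FOR_ARCHIVE

    def archived(self, name):
        self.states[name] = SegmentState.ARCHIVED

    def pending_archive(self):
        """Names of segments the archiver still has to copy, in name order."""
        return sorted(name for name, state in self.states.items()
                      if state is SegmentState.WAITING_FOR_ARCHIVE)
```

In PostgreSQL itself this state lives in the filesystem rather than in a table like this (segment files in the WAL directory, plus marker files for the archiver), which is the code a raw-device design would have to replace.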

Alex Vinogradovs | 1 Sep 02:18 2007

Re: WAL to RAW devices ?

But would it be a problem to have only 1 active segment at all times ?
My inspiration pretty much comes from Oracle, where redo logs are
pre-configured and can be switched by a command issued to the instance.
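
A minimal sketch of the switching scheme being proposed, in Python. It models the idea only and is not how PostgreSQL or Oracle implement it; the class and slot names are hypothetical.

```python
class TwoSlotLog:
    """Two preconfigured log slots; exactly one is written to at any time."""

    def __init__(self, slots=("slot0", "slot1")):
        if len(slots) != 2:
            raise ValueError("this sketch models exactly two slots")
        self.slots = tuple(slots)
        self.active = 0  # index of the slot the writer appends to

    def writer_slot(self):
        return self.slots[self.active]

    def reader_slot(self):
        # Readers only ever touch the slot the writer is not using.
        return self.slots[1 - self.active]

    def switch(self):
        """The 'switch' command: swap roles and return the new writer slot."""
        self.active = 1 - self.active
        return self.writer_slot()
```

The sketch says nothing about what the writer should do when its slot fills before the readers are done with the other one; that case is where the segment bookkeeping quoted below comes in.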

> Just because you'd like that to be true doesn't make it true.  We have
> to manage a variable number of active segments; track whether a given
> segment is waiting for future use, active, waiting to be archived, etc;
> manage status signaling to the archiver process; and so on.  Now I'll
> freely admit that using a filesystem is only one of the ways that those
> problems could be attacked, but that's how they've been attacked in
> Postgres.  If you want to not have that functionality present then
> you'd need to rewrite all that code and provide some other
> infrastructure for it to use.
> 
> 			regards, tom lane

Alvaro Herrera | 1 Sep 02:23 2007

Re: WAL to RAW devices ?

Alex Vinogradovs wrote:
> WAL segments already have their structure. Filesystem would be an
> overhead,

In this case you can choose a filesystem with lower overhead.  For
example with WAL you don't need a journalling filesystem at all, so
using ext2 is not a bad idea.  For Pg data files, you need journalling
of metadata only, not of data; the latter is provided by WAL.  So you
can mount the data filesystem with the option data=writeback.
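
As a rough illustration, the two mounts could be built like this in Python. The device names and mount points are made up, and ext3 is assumed for the data volume because data=writeback is one of its journalling modes; the function only builds the mount(8) argument lists and does not run them.

```python
def mount_command(device, mount_point, fstype, options=None):
    """Build (but do not run) a Linux mount(8) argument list."""
    cmd = ["mount", "-t", fstype]
    if options:
        cmd += ["-o", ",".join(options)]
    return cmd + [device, mount_point]


# Hypothetical devices and mount points.
# WAL volume: no journalling needed, so plain ext2.
wal_mount = mount_command("/dev/sdb1", "/pg/wal", "ext2")
# Data volume: journal metadata only; WAL already protects the data itself.
data_mount = mount_command("/dev/sdc1", "/pg/data", "ext3", ["data=writeback"])
```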

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"All rings of power are equal,
But some rings of power are more equal than others."
                                 (George Orwell's The Lord of the Rings)

Alex Vinogradovs | 1 Sep 02:31 2007

Re: WAL to RAW devices ?

Probably you missed that part... In my setup, I need at least
2 boxes going after those files, while 3rd box keeps on writing
to them... I can't mount ext2 even in R/O mode while it's being
written to by another guy. I can't unmount it before mounting
exclusively on any of them either, since PG will be writing to
that location. The only way is to do the WAL shipping, which
probably wouldn't be that bad since the copying would be done
via DMA, but still isn't as good as it could be since that would
utilize the same spindles...

On Fri, 2007-08-31 at 20:23 -0400, Alvaro Herrera wrote:
> Alex Vinogradovs wrote:
> > WAL segments already have their structure. Filesystem would be an
> > overhead,
> 
> In this case you can choose a filesystem with lower overhead.  For
> example with WAL you don't need a journalling filesystem at all, so
> using ext2 is not a bad idea.  For Pg data files, you need journalling
> of metadata only, not of data; the latter is provided by WAL.  So you
> can mount the data filesystem with the option data=writeback.
> 

Joshua D. Drake | 1 Sep 02:45 2007

Re: WAL to RAW devices ?


Alex Vinogradovs wrote:
> Hi guys,
> 
> 
> I've got a bunch of PostgreSQL servers connected to external storage,
>  where a single server needs to be serving as WO database dealing with
> INSERTs only, and bunch of other guys need to obtain a copy of that
> data for RO serving, without taking resources on WO server.

You can't do that with PostgreSQL without replication, unless you are
willing to have outages on your RO servers to apply the logs.

Further, you are considering the wrong logs. It is not the WAL logs, but
the archive logs that you need.

Sincerely,

Joshua D. Drake
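
For context, archive logs are completed WAL segments that the server hands to a user-supplied archive_command, substituting %p with the segment's path and %f with its file name, and treating a zero exit status as success. A minimal Python sketch of such a copy step follows; the function name and the temporary-file convention are assumptions, not part of PostgreSQL.

```python
import shutil
from pathlib import Path


def archive_segment(src_path, file_name, archive_dir):
    """Copy one completed WAL segment; return a shell-style exit status."""
    dest = Path(archive_dir) / file_name
    if dest.exists():
        # Never overwrite an existing archive file; report failure instead.
        return 1
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copyfile(src_path, tmp)
    tmp.rename(dest)  # publish under the final name only once fully copied
    return 0
```

A wrapper script would call this with the values substituted for %p and %f and exit with the returned status; a standby then restores from the archive directory rather than reading the live WAL.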

>  The idea is to have say 2 raw devices which would be used as 2 WAL
> segments (round-robin). RO servers will go after the one that's not used
> at a given time with something like xlogdump utility and produce INSERT
> statements to be then executed locally. After that import is done, a
> command will be issued to the WO server to switch to the other segment
> so that the cycle can repeat.
>  The objective of that replication model is to ensure that SELECT
> queries won't ever affect the performance of the WO server,
> which may experience uneven loads.
> 

Alex Vinogradovs | 1 Sep 02:45 2007

Re: WAL to RAW devices ?

Oh well, I guess I will just use some trigger to invoke a C
function and store the statements in a raw device with some
proprietary format, while the actual inserts don't take place
at all.
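
One way such a format could look, sketched in Python rather than the C function mentioned above. The record layout (a 4-byte length followed by the statement text, with a zero length marking the end of data on a zeroed device) is invented for this illustration.

```python
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def append_record(dev, statement):
    """Append one statement as a length-prefixed UTF-8 record."""
    payload = statement.encode("utf-8")
    dev.write(HEADER.pack(len(payload)))
    dev.write(payload)


def read_records(dev):
    """Yield statements until end of data or a zero-length record."""
    while True:
        header = dev.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        (length,) = HEADER.unpack(header)
        if length == 0:
            return  # zeroed space: nothing further has been written
        yield dev.read(length).decode("utf-8")
```

Here dev is any binary file-like object opened on the device. The sketch leaves out checksums, torn-write detection, and coordination between the writer and the readers, all of which a real design would need.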

In case anyone has more ideas, please let me know.

On Fri, 2007-08-31 at 17:45 -0700, Joshua D. Drake wrote:
> Alex Vinogradovs wrote:
> > Hi guys,
> > 
> > 
> > I've got a bunch of PostgreSQL servers connected to external storage,
> >  where a single server needs to be serving as WO database dealing with
> > INSERTs only, and bunch of other guys need to obtain a copy of that
> > data for RO serving, without taking resources on WO server.
> 
> You can't do that with PostgreSQL without replication, unless you are
> willing to have outages on your RO servers to apply the logs.
> 
> Further, you are considering the wrong logs. It is not the WAL logs, but
> the archive logs that you need.
> 
> Sincerely,
> 
> Joshua D. Drake
> 

Alvaro Herrera | 1 Sep 03:01 2007

Re: WAL to RAW devices ?

Alex Vinogradovs wrote:
> Probably you missed that part... In my setup, I need at least
> 2 boxes going after those files, while 3rd box keeps on writing
> to them... I can't mount ext2 even in R/O mode while it's being
> written to by another guy. I can't unmount it before mounting
> exclusively on any of them either, since PG will be writing to
> that location. The only way is to do the WAL shipping, which
> probably wouldn't be that bad since the copying would be done
> via DMA, but still isn't as good as it could be since that would
> utilize the same spindles...

Oh, I see.

What I've seen described is to put a PITR slave on a filesystem with
snapshotting ability, like ZFS on Solaris.

You can then have two copies of the PITR logs.  One gets a postmaster
running in "warm standby" mode, i.e. recovering logs in a loop.  The
other one, in a sort of jail (I don't know the Solaris terminology for
this) stops the recovery and enters normal mode.  You can query it all
you like at that point.

Periodically you stop the server in normal mode, resync the snapshot
(which basically resets the "modified" block list in the filesystem),
take a new snapshot, create the jail and stop the recovery mode again.
So you have a fresher postmaster for queries.

It's not as good as having a true hot standby, for sure.  But it seems
it's good enough while we wait.
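
The periodic refresh described above could be scripted roughly as follows. This Python sketch fixes only the order of the steps; the five callables are hypothetical placeholders for whatever site-specific commands stop the server, manage the snapshot, and build the jail.

```python
def refresh_query_copy(stop_query_server, resync_snapshot, take_snapshot,
                       create_jail, end_recovery):
    """Run one refresh of the queryable copy, in the order described."""
    stop_query_server()  # stop the postmaster that was serving queries
    resync_snapshot()    # discard the blocks it modified since the snapshot
    take_snapshot()      # capture the warm standby's current state
    create_jail()        # isolated environment for the new copy
    end_recovery()       # leave recovery; the copy now accepts queries
```

How stale the query copy gets is bounded by how often this function runs, while the warm standby keeps recovering logs throughout.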

Alex Vinogradovs | 1 Sep 03:10 2007

Re: WAL to RAW devices ?

Yeah, that's the trick... I need high availability with
high performance and nearly real-time synchronization ;-)
Also, I've got FreeBSD here... ZFS will be out with 7.0
release, plus UFS2 has snapshotting capability too. But
the whole method isn't good enough anyway. 

> Oh, I see.
> 
> What I've seen described is to put a PITR slave on a filesystem with
> snapshotting ability, like ZFS on Solaris.
> 
> You can then have two copies of the PITR logs.  One gets a postmaster
> running in "warm standby" mode, i.e. recovering logs in a loop.  The
> other one, in a sort of jail (I don't know the Solaris terminology for
> this) stops the recovery and enters normal mode.  You can query it all
> you like at that point.
> 
> Periodically you stop the server in normal mode, resync the snapshot
> (which basically resets the "modified" block list in the filesystem),
> take a new snapshot, create the jail and stop the recovery mode again.
> So you have a fresher postmaster for queries.
> 
> It's not as good as having a true hot standby, for sure.  But it seems
> it's good enough while we wait.
> 
