Edgar Fuß | 24 May 18:26

raw/block device disc throughput

It seems that I have to update my understanding of raw and block devices
for discs.

Using a (non-recent) 6.0_BETA INSTALL kernel and an ST9146853SS 15k SAS disc
behind an LSI SAS 1068E (i.e. mpt(4)), I did a
	dd if=/dev/zero of=/dev/[r]sd0b bs=nn count=xxx.
For the raw device, the throughput dramatically increased with the block size:
	Block size		16k	64k	256k	1M
	Throughput (MByte/s)	4	15	49	112
For the block device, throughput was around 81MByte/s independent of block size.
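
One way to read the raw-device figures is to convert them into write requests per second. A minimal sketch in C, assuming that "MByte" means 2^20 bytes and "k" means 1024 bytes (the post does not say which), and that dd issues one write per block; the function name is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert a measured throughput into write requests per second.
 * Assumptions (not stated in the post): 1 MByte = 1048576 bytes, and
 * dd issues exactly one write(2) per block of block_bytes bytes.
 */
double
requests_per_second(double mbyte_per_s, size_t block_bytes)
{
	assert(block_bytes > 0);
	return mbyte_per_s * 1048576.0 / (double)block_bytes;
}
```

Under those assumptions, the table corresponds to 256, 240, 196 and 112 requests per second for 16k, 64k, 256k and 1M blocks respectively.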

This surprised me in two ways:
1. I would have expected the raw device to outperform the block device
   with not too small block sizes.
2. I would not have expected increasing the block size above MAXPHYS to
   improve the performance.

So obviously, my understanding is wrong.

I then built a RAID 1 with SectorsPerSU=128 (i.e. a 64k stripe size) on two
of these discs, and, after the parity initialisation was complete, wrote
to [r]raid0b.
On the raw device, throughput ranged from 4MByte/s to 97MByte/s depending on bs.
On the block device, it was always 3MByte/s. Furthermore, dd's WCHAN was
"vnode" for the whole run. Why is that so and why is throughput so low?

Christos Zoulas | 24 May 01:37

lwp resource limit

Hello,

This is a new resource limit to prevent users from exhausting kernel
resources that lwps use.

- The limit is per uid
- The default is 1024 per user unless the architecture overrides it
- The kernel is never prohibited from creating threads
- Exceeding the thread limit does not prevent process creation, but
  it will prevent processes from creating additional threads. So the
  effective thread limit is nlwp + nproc
- The name NTHR was chosen to follow prior art
- There could be atomicity issues for setuid and lwp exits
- This diff also adds a sysctl kern.uidinfo.* to show the user the uid
  limits
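
A userland program could inspect such a limit with getrlimit(2). The following is only a sketch: it assumes the limit is exposed as RLIMIT_NTHR (the post gives just the name "NTHR"), and the helper name get_thread_limit is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Assumption: the new limit is exposed as RLIMIT_NTHR.  Where that macro
 * is missing, fall back to RLIMIT_NPROC so the sketch still compiles;
 * note that the fallback is a different limit (processes, not threads).
 */
#ifdef RLIMIT_NTHR
#define THREAD_LIMIT_RESOURCE RLIMIT_NTHR
#else
#define THREAD_LIMIT_RESOURCE RLIMIT_NPROC
#endif

/* Fetch the soft and hard limit; returns 0 on success, -1 on error. */
int
get_thread_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	assert(soft != NULL && hard != NULL);
	if (getrlimit(THREAD_LIMIT_RESOURCE, &rl) == -1)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}
```

Since the post describes the limit as per uid, the values returned here would presumably be compared against the combined thread count of all of that user's processes.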

comments?

christos
Index: kern/init_main.c
===================================================================
RCS file: /cvsroot/src/sys/kern/init_main.c,v
retrieving revision 1.442
diff -u -p -u -r1.442 init_main.c
--- kern/init_main.c	19 Feb 2012 21:06:47 -0000	1.442
+++ kern/init_main.c	23 May 2012 23:19:31 -0000
@@ -256,6 +256,7 @@ int	cold = 1;			/* still working on star
 struct timespec boottime;	        /* time at system startup - will only follow settime deltas */

 int	start_init_exec;		/* semaphore for start_init() */

Martin Husemann | 23 May 18:15

mlockall() and small memory systems

In the regular sparc test runs on qemu the emulated sparc machine only
has 32MB of ram. In this setup the /usr/tests/lib/libc/sys/t_mincore test
"mincore_resid" fails. If we allow qemu to provide more memory, the test
succeeds.

The part of the test that fails in "low" memory environments is:

An anonymous mapping of 128 pages is created:

        addr = mmap(NULL, npgs * page, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, (off_t)0);

Nothing in this mapping is touched, so nothing gets paged in/modified.
Via mincore() the test now validates that 0 of the mapped pages are
resident. So far, this works.

Now an mlockall(MCL_CURRENT|MCL_FUTURE) call is made.

The test now expects all pages in the above mapping to be resident and
again checks via mincore().

This is the part where the test fails: some of the pages are not resident.
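
The sequence described above can be sketched roughly as follows. This is not the actual t_mincore source; the function names are made up for illustration, and the vector is passed to mincore() as void * because its element type differs between systems.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for mincore() and MAP_ANON on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many of the npgs pages starting at addr are resident
 * according to mincore(2).  Returns -1 on error.
 */
long
resident_pages(void *addr, size_t npgs, size_t page)
{
	unsigned char *vec;
	long resident = 0;
	size_t i;

	assert(npgs > 0 && page > 0);
	if ((vec = calloc(npgs, 1)) == NULL)
		return -1;
	if (mincore(addr, npgs * page, (void *)vec) == -1) {
		free(vec);
		return -1;
	}
	for (i = 0; i < npgs; i++)
		if (vec[i] & 1)		/* low bit set: page is in core */
			resident++;
	free(vec);
	return resident;
}

/*
 * Replay the sequence from the test: map npgs untouched anonymous pages,
 * lock everything, then report how many of those pages are resident.
 * Returns -1 if any step fails (mlockall() may fail for lack of privilege
 * or because of a locked-memory limit).
 */
long
resident_after_mlockall(size_t npgs)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	long resident;
	void *addr;

	addr = mmap(NULL, npgs * page, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, (off_t)0);
	if (addr == MAP_FAILED)
		return -1;
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
		munmap(addr, npgs * page);
		return -1;
	}
	resident = resident_pages(addr, npgs, page);
	munlockall();
	munmap(addr, npgs * page);
	return resident;
}
```

The test expects the equivalent of resident_after_mlockall(128) to return 128; on the 32MB sparc setup it comes back with a smaller number.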

I do believe this is not a bug in the sparc pmap.

Am I wrong? Is this an MI bug? Or a bogus test?

Martin


Brian Buhrow | 21 May 19:52

Re: Breaking out of the emulation dir

	Hello. My understanding is that if namei can't find a file in the
/emul tree, it looks in the real root tree.  So, if you just remove all
traces of the thing you want in the real tree from the emulation tree,
you'll achieve the results you seek.  The downside is that if you're
dealing with binary files which are potentially nonportable, you won't be
able to have different versions for different emulations.
-Brian
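
The lookup order described above can be mimicked in userland as a rough sketch. This is not the kernel's namei code; the function name emul_lookup is hypothetical, and it only checks the complete path, so it says nothing about what happens when just a prefix of the path exists in the emulation tree.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Try the path below the emulation root first, and fall back to the real
 * root if nothing exists there.  Returns 0 and fills out on success, -1
 * if neither path exists or out is too small.
 */
int
emul_lookup(const char *emul_root, const char *path, char *out, size_t outlen)
{
	struct stat st;
	int n;

	assert(path[0] == '/');
	n = snprintf(out, outlen, "%s%s", emul_root, path);
	if (n >= 0 && (size_t)n < outlen && lstat(out, &st) == 0)
		return 0;		/* found in the emulation tree */
	n = snprintf(out, outlen, "%s", path);
	if (n >= 0 && (size_t)n < outlen && lstat(out, &st) == 0)
		return 0;		/* fell back to the real tree */
	return -1;
}
```

With the emulation root from the original question, emul_lookup("/usr/pkg/emul/linux32", "/usr/pkg/etc/tsm/dsm.opt", buf, sizeof(buf)) would yield the real file as long as no file of that name exists below the emulation root.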

On May 21,  3:44pm, Edgar Fuß wrote:
} Subject: Breaking out of the emulation dir
} What's the suggested method for breaking out of the emulation directory?
} I want <EMULDIR>/opt/tivoli/tsm/client/ba/bin/dsm.opt to be a symlink to
} <no-emulation>/usr/pkg/etc/tsm/dsm.opt.
} I can achieve this with a considerable amount of ../, but that amount depends
} on the value of <EMULDIR>, more precisely, its expanded value, which happens
} to be /usr/pkg/emul/linux32 in the case in question.
} I was thinking about a /emul/none -> / symlink.
} 
} I'm not sure how the behaviour of the emulation sort-of-chroot is defined
} to be if it, for example, looks up /usr/local/foo/bar and e.g. a "real"
} /usr/local (or /usr/local/foo) exists, but /usr/local/foo
} (or /usr/local/foo/bar) doesn't.
} Is this documented anywhere? Has it changed since 4.0?
>-- End of excerpt from Edgar Fuß

Edgar Fuß | 21 May 15:44

Breaking out of the emulation dir

What's the suggested method for breaking out of the emulation directory?
I want <EMULDIR>/opt/tivoli/tsm/client/ba/bin/dsm.opt to be a symlink to
<no-emulation>/usr/pkg/etc/tsm/dsm.opt.
I can achieve this with a considerable amount of ../, but that amount depends
on the value of <EMULDIR>, more precisely, its expanded value, which happens
to be /usr/pkg/emul/linux32 in the case in question.
I was thinking about a /emul/none -> / symlink.

I'm not sure how the behaviour of the emulation sort-of-chroot is defined
to be if it, for example, looks up /usr/local/foo/bar and e.g. a "real"
/usr/local (or /usr/local/foo) exists, but /usr/local/foo
(or /usr/local/foo/bar) doesn't.
Is this documented anywhere? Has it changed since 4.0?

Edgar Fuß | 21 May 15:36

"bad tag" message

What does this mean:

May 21 15:32:12 trave /netbsd: /emul/linux32/usr/bin/dsmc: bad tag 1: [5 4, 2 4, SuSE  PaX]

Edgar Fuß | 20 May 18:10

libquota units

I was somewhat surprised to learn that with libquota, qv_usage etc.
were still in units of what someone called "a constant of nature introduced
by DEC", i.e., 512-byte "blocks".
Given the fields are 64 bits wide, I would have expected them to be in
bytes instead, but there are probably reasons not to do that.
So, I would at least expect that to be documented or machine-readable.
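
Until the unit is documented or exported, a caller would have to hard-code the conversion. A minimal sketch, where the constant and function names are hypothetical and not part of libquota:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUOTA_BLOCK_BYTES 512u	/* assumed unit: 512-byte "blocks" */

/*
 * Convert a count of 512-byte blocks to bytes.
 * Returns 0 on success, -1 if the result would not fit in 64 bits.
 */
int
quota_blocks_to_bytes(uint64_t blocks, uint64_t *bytes)
{
	assert(bytes != NULL);
	if (blocks > UINT64_MAX / QUOTA_BLOCK_BYTES)
		return -1;
	*bytes = blocks * QUOTA_BLOCK_BYTES;
	return 0;
}
```

The overflow check matters only for values above 2^55 blocks, but it keeps the conversion honest for a field that is a full 64 bits wide.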

Edgar Fuß | 19 May 18:51

accessing another process' resource limits

Is there an interface for reading (or even writing) another process' ulimits?

Mouse | 18 May 20:32

Re: choosing the file system block size

>> I'd suggest reading over the source to newfs and/or fsck; they know
>> a good deal about that stuff, and are much smaller and more
>> comprehensible than the filesystem kernel code.
> Basically, I gave up on that after realising that both "number of
> blocks" and "number of data blocks" were actually in units of --
> fragments!  There seems to be a lot of stuff in that which is
> probably perfectly clear for those actually dealing with FS code, but
> close to incomprehensible for a newcomer in that area.

Heh.  The FFS code is full of delightful little surprises like that.
In fsresize.c, the source to my program which became resize_ffs, there
are a number of minor rants about other filesystem programs, such as
fsck and newfs/mkfs.  Most/all of them are still present in resize_ffs
source as of 4.0.1; I haven't bothered checking anything more recent.

> Is there any good book on the subject?

I don't know.  I don't know of any such book, but I've never looked; my
own knowledge of such things comes from experimentation and code
reading.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse <at> rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

David Laight | 18 May 18:54

Re: Allocating inodes (was: choosing the file system block size)

On Fri, May 18, 2012 at 02:42:37PM +0200, Edgar Fuß wrote:
> > possibly also relevant is that inode space is not available for ordinary
> > data storage even if the inodes in question are not being used.

> Oops, is that still true for FFSv2?

Yes.

> What does "dynamic inodes" mean, then?

Probably that FFSv2 doesn't initialise all the on-disk inode structures
until they are needed.
This significantly reduces the number of writes (and hence time) to
initialise a filesystem.

IIRC FFS has the following overheads per 'cylinder group':
- 8k for a copy of the superblock (+boot area in the first one).
- 1 filesystem-sized block for CG info and the per-cg inode and allocation
  maps. One bit per inode and 1 bit per fragment.
- space for all the inodes, 128 bytes each for FFSv1, 256 for FFSv2.
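
The list above can be turned into a rough per-group estimate. This is only a sketch of the arithmetic as recalled in this message, not taken from the FFS sources; the function name is made up, and the boot area in the first group is not counted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-cylinder-group overhead as listed above: 8k for the superblock copy,
 * one filesystem block for the CG info and maps, plus the inode area
 * (128 bytes per inode for FFSv1, 256 for FFSv2).
 */
uint64_t
cg_overhead_bytes(uint32_t fs_block_size, uint32_t inodes_per_cg,
    int ffs_version)
{
	uint64_t inode_size;

	assert(ffs_version == 1 || ffs_version == 2);
	inode_size = (ffs_version == 1) ? 128 : 256;
	return 8192 + (uint64_t)fs_block_size +
	    (uint64_t)inodes_per_cg * inode_size;
}
```

For example, with 16k blocks and 1024 inodes per group (hypothetical numbers), this gives 155648 bytes for FFSv1 and 286720 bytes for FFSv2.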

Although 'cylinder groups' were originally made up of logical cylinders
(with an array of per-cylinder info), they are now just a big chunk of disk.
IIRC they are initialised as if they are a single cylinder (ie one copy
of the original per-cylinder info).

Since the per-cylinder allocation maps are constrained to fit in a single
fs block (FFSv2 should probably have removed this restriction!) a large
disk needs a lot of cylinder groups.
If you double the FS fragment and block size, you divide the number of

Edgar Fuß | 14 May 17:58

Editing (new) quota for a new user

I just started to play around with the new quota system.
I tried to set quotas for a user that didn't own any files on the
file system in question.
With an "interactive" edquota, I got
	edquota: /export/test (ufs/ffs quota v2):
	: bad format
With an edquota -h ... -s ..., I got a zero exit status, a following
"interactive" edquota showing what I entered, but repquota not showing
anything for that user.

When I allocate a file for that user, edquota -h ... -s ... works, but an
"interactive" edquota throws the same error and disables that user's quota.

Is this expected behaviour?

