Brian Dessent | 1 Nov 2007 04:45

Re: copying a million tiny files?

sam reckoner wrote:

> I'm not exaggerating. I have over one million small files that like to
> move between disks. The problem is that even getting a directory
> listing takes forever.
> 
> Is there a best practice for this?

I know it's heresy but if you just want to copy files why not use the
native XCOPY?  It will not suffer the performance degradation of having
to emulate the POSIX stat semantics on every file, just like the native
DIR command in a large directory does not take ages because it simply
uses FindFirstFile/FindNextFile which are fairly efficient (but do not
provide sufficient information to emulate POSIX.)
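
As a rough illustration of the difference described above, here is a minimal POSIX-side sketch in C. It is an analogy rather than the Win32 code path: count_entries_names_only() only walks the directory stream (roughly what a FindFirstFile/FindNextFile loop does), while count_entries_with_stat() also makes one metadata call per file, which is the part that becomes expensive under emulation. Both function names and the 4096-byte path buffer are assumptions made for this example.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Skip the "." and ".." entries that every directory stream may return. */
static int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Count entries using only the names the directory stream hands back.
   No per-file metadata lookup is made. Returns -1 if dir cannot be opened. */
long count_entries_names_only(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!is_dot_entry(e->d_name))
            n++;
    }
    closedir(d);
    return n;
}

/* Same walk, but stat() every entry, as a tool that needs sizes, modes
   or timestamps would. Counts only entries whose stat() call succeeds. */
long count_entries_with_stat(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long n = 0;
    struct dirent *e;
    char path[4096];            /* illustrative fixed-size buffer */
    struct stat st;
    while ((e = readdir(d)) != NULL) {
        if (is_dot_entry(e->d_name))
            continue;
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;           /* path would not fit; skip it */
        if (stat(path, &st) == 0)
            n++;
    }
    closedir(d);
    return n;
}
```

Timing the two functions on the same large directory gives a feel for how much of the wall-clock time goes into the per-file metadata calls rather than into the enumeration itself.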

Brian

Larry Hall (Cygwin | 1 Nov 2007 05:00

Re: copying a million tiny files?

Brian Dessent wrote:
> sam reckoner wrote:
> 
>> I'm not exaggerating. I have over one million small files that like to
>> move between disks. The problem is that even getting a directory
>> listing takes forever.
>>
>> Is there a best practice for this?
> 
> I know it's heresy but if you just want to copy files why not use the
> native XCOPY?  It will not suffer the performance degradation of having
> to emulate the POSIX stat semantics on every file, just like the native
> DIR command in a large directory does not take ages because it simply
> uses FindFirstFile/FindNextFile which are fairly efficient (but do not
> provide sufficient information to emulate POSIX.)

I'm sorry Brian.  We put up with a lot from you but I think we have to
draw the line at heresy.  What is the penalty for heresy around here
anyway?  Expulsion?  Flogging?  Burning at the stake?  Whatever it is,
I think we need to make an example of Brian.

-- 
Larry Hall                              http://www.rfk.com
RFK Partners, Inc.                      (508) 893-9779 - RFK Office
216 Dalton Rd.                          (508) 893-9889 - FAX
Holliston, MA 01746

_____________________________________________________________________

A: Yes.

Gary R. Van Sickle | 1 Nov 2007 05:13

RE: copying a million tiny files?

> From: Brian Dessent
> 
> sam reckoner wrote:
> 
> > I'm not exaggerating. I have over one million small files 
> that like to 
> > move between disks. The problem is that even getting a directory 
> > listing takes forever.
> > 
> > Is there a best practice for this?
> 
> I know it's heresy but if you just want to copy files why not 
> use the native XCOPY?  It will not suffer the performance 
> degradation of having to emulate the POSIX stat semantics on 
> every file, just like the native DIR command in a large 
> directory does not take ages because it simply uses 
> FindFirstFile/FindNextFile which are fairly efficient (but do 
> not provide sufficient information to emulate POSIX.)
> 
> Brian
> 

I have a similar situation to the OP (copying many thousands of small files
over a fairly slow link), and actually timed using XCOPY vs. Cygwin methods
(cp in my case).  It didn't make a significant difference.  Ultimately what
I think you run into in these sorts of situations is that you bump up
against the slowness of the link (or physical disk) because, POSIX emulation
or not, all your caches do is thrash.


zirtik | 1 Nov 2007 07:43

Re: can't read sequential files


After adding the line:

	if (fp==NULL)
	{
	   printf("error, NULL pointer!\n");
	   return(1);
	} 

and then rebuilding the code, everything worked. But it's strange that if I
delete that code segment and never check whether the fp pointer is NULL or
not, I always get a segmentation fault. Can this be some kind of an
optimization problem? I don't know why it happens. Thank you for the
comments. 
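
A note on the check above: passing a NULL FILE pointer to fscanf() is undefined behaviour in C, so a crash there does not require any optimizer involvement. One plausible cause, not confirmed in this thread, is that fopen() failed because the program ran from a different working directory than expected. The sketch below shows the usual pattern of checking both fopen() and fscanf(); the function name read_ints() and its parameters are hypothetical, chosen only for this example.

```c
#include <stdio.h>

/* Read up to max_count integers from the text file at path into out.
   Returns how many values were read, or -1 if the file cannot be opened. */
int read_ints(const char *path, int *out, int max_count)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);           /* report why the open failed */
        return -1;
    }

    int n = 0;
    /* fscanf returns the number of items converted; stop on EOF or bad input. */
    while (n < max_count && fscanf(fp, "%d", &out[n]) == 1)
        n++;

    fclose(fp);
    return n;
}
```

With the code from the original post this would be called as read_ints("phi.txt", original_phi, 51), and the caller can compare the return value against 51 to detect a short or missing file.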

zirtik wrote:
> 
> Hi, I'm using cygwin and windows XP together with Eclipse IDE and CDT. I
> have a following piece of code:
> 
> ....
> 
> 	int i;
> 	fp = fopen ("phi.txt","r");
> 	
> 	for( i = 0; i < 51; i++ ) {
> 		fscanf(fp, "%d\n", &original_phi[i]);	
> 	}
> 
> ...

Andrew DeFaria | 1 Nov 2007 07:47

Re: copying a million tiny files?

Larry Hall (Cygwin) wrote:
> I'm sorry Brian.  We put up with a lot from you but I think we have to 
> draw the line at heresy.  What is the penalty for heresy around here 
> anyway?  Expulsion?  Flogging?  Burning at the stake?
They all sound good! But we must make sure that whatever we choose is 
done publicly!
>   Whatever it is, I think we need to make an example of Brian.
I agree! ;-)
-- 
Andrew DeFaria <http://defaria.com>
If vegetarians eat vegetables, what do humanitarians eat?

Andrew DeFaria | 1 Nov 2007 07:52

Re: copying a million tiny files?

Gary R. Van Sickle wrote:
> I have a similar situation to the OP (copying many thousands of small 
> files over a fairly slow link), and actually timed using XCOPY vs. 
> Cygwin methods (cp in my case). It didn't make a significant 
> difference. Ultimately what I think you run into in these sorts of 
> situations is that you bump up against the slowness of the link (or 
> physical disk) because, POSIX emulation or not, all your caches do is 
> thrash.
I think your situation is vastly different. Granted, copying over a 
network connection will be so much slower that the additional overhead 
of Cygwin emulation fades away. But the OP was talking about copying 
between two supposedly local disks. With the network link out of the way 
the overhead of Cygwin emulation could become significant.

As for your cp across a link... At one time I was using cp to copy our 
release images (about 9 files of about 50-150 meg mind you) across a WAN 
link (from Santa Clara, California -> Shanghai, China). The link was 
real slow and it was taking ~30-45 minutes per image IIRC!

Then I changed it to an ncftp and the time per image dropped down to ~5 
minutes.

Moral of the story? SMB is perhaps the worst way to transfer data across 
network links.

YMMV.
-- 
Andrew DeFaria <http://defaria.com>
I used to have an open mind but my brains kept falling out.


d.henman | 1 Nov 2007 07:57

RE: copying a million tiny files?


From what Gary mentions, rsync is indeed the best way to go.
At least for thinking, on time backups.

With rsync, only the first time is slow.

For one-shot backups of many files, using tar to group them into one 
archive and then sending that is a good idea.

Using xcopy is kind of silly and won't get you compatibility, especially in scripts.

regards

Gary R. Van Sickle <g.r.vansickle <at> worldnet.att.net> wrote:

> > From: Brian Dessent
> > 
> > sam reckoner wrote:
> > 
> > > I'm not exaggerating. I have over one million small files 
> > that like to 
> > > move between disks. The problem is that even getting a directory 
> > > listing takes forever.
> > > 
> > > Is there a best practice for this?
> > 
> > I know it's heresy but if you just want to copy files why not 
> > use the native XCOPY?  It will not suffer the performance 
> > degradation of having to emulate the POSIX stat semantics on 
> > every file, just like the native DIR command in a large 

Brian Dessent | 1 Nov 2007 08:14

Re: copying a million tiny files?

"d.henman" wrote:

> From what Gary mentions.....   indeed rsync is the best way to go.
> At least for thinking, on time backups.
> 
> With rsync, only the first time is slow.

Did you even *read* the original question?  He didn't say anything about
doing incremental backups, he just wanted to "move some files between
disks".  He also explicitly said that he's currently using rsync but
that it was unsatisfactorily slow in even just coming up with the
candidate list of files to transfer, let alone actually doing anything. 
The rsync algorithm won't do anything to help in this case.

> Using xcopy is kind of silly and won't get you compatibility, especially in scripts.

Portability to non-Windows systems is of course a problem but xcopy is
present on every install of Windows that has ever existed going back to
some very old version of MS-DOS so it is probably one of the most
portable commands in existence on this platform.

Brian

michael.vogt | 1 Nov 2007 09:43

RE: cygwin stable and cvs snapshot - fork() bug

> If you want anything like this to be looked at faster, the best thing
> you can do is http://www.cygwin.com/acronyms/#PPAST. Apparently the
> cygwin developers have not so far been interested to download mpd,
> make unspecified changes to the mpd sources to get them to compile
> (the changes you listed on the bug report were not sufficient), and
> then setup the configuration files for mpd, figure out what mpd is
> supposed to do, and THEN debug the problem.
...
> Lev

hey, thanks for your reply

ok, i tried to create an STC, but i don't really know ipc and shared
mem. anyway, i traced down the part which causes trouble

(init code, called before the fork occurs)
if ((shmid = shmget(IPC_PRIVATE, allocationSize, IPC_CREAT | 0777)) < 0)
	FATAL("problems shmget'ing\n");
//if ((shmid = shmget(IPC_PRIVATE, allocationSize, IPC_CREAT | 0600)) < 0)	FATAL("problems shmget'ing\n");
if (!(playerData_pd = shmat(shmid, NULL, 0))) FATAL("problems shmat'ing\n");
if (shmctl(shmid, IPC_RMID, NULL) < 0) 	FATAL("problems shmctl'ing\n");

when i comment out 
  if (shmctl(shmid, IPC_RMID, NULL) < 0) 	FATAL("problems shmctl'ing\n");

mpd works like a charm. 
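
For anyone trying to reduce this to a simple test case, a minimal sketch of the same sequence (shmget, shmat, optional IPC_RMID, then fork) might look like the following. It is not the mpd code: the function name shm_fork_roundtrip(), the 0600 mode and the value 42 are made up for the example. Two details are worth noting. shmat() signals failure by returning (void *)-1, not NULL, so the check differs from the one quoted above. And on Linux a segment marked with IPC_RMID stays usable by processes that already have it attached, including children created by fork(), until the last detach; whether Cygwin behaves the same way is the open question here.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a private segment, attach it, optionally mark it for removal,
   fork, let the child store a value, and return what the parent reads.
   Returns -1 if any step fails. */
int shm_fork_roundtrip(int remove_before_fork)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;

    int *shared = shmat(shmid, NULL, 0);
    if (shared == (void *)-1) {         /* shmat fails with (void *)-1 */
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    int removed = 0;
    if (remove_before_fork) {
        /* Same ordering as the quoted init code: remove, then fork. */
        if (shmctl(shmid, IPC_RMID, NULL) < 0) {
            shmdt(shared);
            return -1;
        }
        removed = 1;
    }

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        *shared = 42;                   /* child writes through the mapping */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;               /* parent reads the child's value */

    shmdt(shared);
    if (!removed)
        shmctl(shmid, IPC_RMID, NULL);  /* clean up in the non-removed case */
    return result;
}
```

Calling shm_fork_roundtrip(0) and shm_fork_roundtrip(1) and comparing the results on a Linux box and under Cygwin would show whether the early IPC_RMID alone is enough to reproduce the problem.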


Erich Dollansky | 1 Nov 2007 10:33

Re: copying a million tiny files?

Hi,

Brian Dessent wrote:
> 
>> Using xcopy is kind of silly and won't get you compatibility, especially in scripts.
> 
> Portability to non-Windows systems is of course a problem but xcopy is
> present on every install of Windows that has ever existed going back to
> some very old version of MS-DOS so it is probably one of the most
> portable commands in existence on this platform.
> 
If I remember right, XCOPY is older than any networking stuff on this 
platform. It should have been there ever since the first hard disks appeared.

Erich

