Sean Cavanaugh | 1 May 04:37 2011

Re: How about a Hash template?

On 4/29/2011 6:19 PM, Alexander wrote:
> On 29.04.2011 21:58, Andrei Alexandrescu wrote:
>
>> You need to replace the assert and compile with -O -release -inline. My results:
>
> [snip]
>
> Still, straight comparison wins - 2x faster ;)
>
> /Alexander

When getting to know the CPU platform you are on, one benchmark worth 
running is to measure how many linear integer comparisons you can do 
versus a binary search of the same size, and graph the results.

There is a crossover point beyond which the binary search is faster, but 
with modern CPUs the number of items you can search linearly before 
reaching it increases every year.

The linear search also makes extremely good use of the cache and 
hardware prefetching, and its branches (as well as the loop itself) will 
be predicted correctly until the terminating condition is found, whereas 
the binary search's comparison branch is mispredicted about 50% of the 
time. The last time I measured, the crossover point was around 60 integer 
values, and it wouldn't surprise me at all if it's over 100 on newer 
chipsets (Sandy Bridge, Bulldozer, etc.).
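A minimal sketch of that benchmark in D, assuming a sorted array of ints and a modern Phobos (`std.datetime.stopwatch`). The function names are hypothetical, and no crossover figure is claimed: the result depends on the CPU and on compiler flags such as `-O -release -inline`.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Linear scan: index of the first element equal to `key`, or -1.
ptrdiff_t linearFind(const(int)[] a, int key)
{
    foreach (i, x; a)
        if (x == key)
            return cast(ptrdiff_t) i;
    return -1;
}

/// Binary search (lower bound) over a sorted array: index of `key`, or -1.
ptrdiff_t binaryFind(const(int)[] a, int key)
{
    size_t lo = 0, hi = a.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < a.length && a[lo] == key) ? cast(ptrdiff_t) lo : -1;
}

/// Prints timings of both searches for growing sizes; graphing the output
/// shows where the two curves cross on the machine at hand.
void printCrossoverTable(uint reps = 10_000)
{
    foreach (n; [8, 16, 32, 64, 128, 256])
    {
        auto a = new int[n];
        foreach (i, ref x; a)
            x = cast(int)(2 * i); // sorted, even values only: odd keys miss

        ptrdiff_t sink; // accumulate results so the calls are not dead code

        auto sw = StopWatch(AutoStart.yes);
        foreach (r; 0 .. reps)
            foreach (k; 0 .. 2 * n)
                sink += linearFind(a, k);
        immutable linear = sw.peek();

        sw.reset();
        foreach (r; 0 .. reps)
            foreach (k; 0 .. 2 * n)
                sink += binaryFind(a, k);
        immutable binary = sw.peek();

        writefln("n=%3d  linear=%s  binary=%s  (sink=%s)", n, linear, binary, sink);
    }
}
```

Calling `printCrossoverTable()` from `main` and plotting both columns against `n` gives the graph described above; searching for both hits and misses keeps the comparison fair to the linear scan.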

Tyro[a.c.edwards] | 1 May 06:25 2011

htmlget.d example and unicode parsing

Hello all,

I am trying to learn how to parse, modify, and redisplay a Japanese 
webpage passed to me in a form, and am wondering if anyone has an example 
of how to do this.

I looked at htmlget and found that it has a couple of problems: namely, it 
does not conform to current D2 practices. I am not sure that my hack can 
be considered a fix, but I have attached it nonetheless. It now works 
correctly on ASCII-based URLs but not on UTF-8 ones.

My lack of knowledge of how to properly parse Unicode documents has 
left me stumped. I am therefore requesting some assistance in updating 
the code so that it works with any URL. I have taken a look at std.utf, 
and there are a few things there that could possibly help me; however, 
without examples I'm somewhat at a loss.

I'm assuming that the problem exists here:

    for (iw = 0; iw != line.length; iw++)
    {
        if (!icmp("</html>", line[iw .. line.length]))
            break print_lines;
    }

From what I understand, one cannot index a UTF sequence the same way 
one indexes ASCII. What is the proper way to rewrite this so that it 
parses the UTF characters correctly? An example would do wonders.

Thanks

Sean Kelly | 1 May 07:18 2011

Re: Old comments about Java

They kinda already do. Look into how core.sync.mutex works. 


On Apr 30, 2011, at 1:43 PM, Peter Alexander <peter.alexander.au <at> gmail.com> wrote:

> On 30/04/11 8:29 PM, Walter Bright wrote:
>> On 4/23/2011 4:43 PM, bearophile wrote:
>>> First, they impose a full word of overhead on each and every object,
>>> just in case someone somewhere sometime wants to grab a lock on that
>>> object. What, you say that you know that nobody outside of your code
>>> will ever get a pointer to this object, and that you do your locking
>>> elsewhere, and you have a zillion of these objects so you'd like them
>>> to take up as little memory as possible? Sorry. You're screwed. [I
>>> have not yet understood why D shared this Java design choice.]
>> 
>> The extra pointer slot is a handy place for all kinds of things, not
>> just a mutex. Currently, it is also used for the "signals and slots"
>> implementation. Andrei and I have discussed using it for a ref counting
>> system (though we decided against that for other reasons).
> 
> That may be so, but it would be nice if the programmer had control over whether or not they want to use that slot.

maarten van damme | 1 May 11:07 2011

Re: link from a dll to another function in another dll?

Wow, thanks for the help

The first thing I did was add extern(Windows){ ... } to the .di file,
and now compiling doesn't give errors, and when examining the DLL with dllexp I can see that it exports the same functions as the real kernel32.dll :D

Now I'm going to implement all other suggested changes, thanks a lot

2011/4/30 Rainer Schuetze <r.sagitario <at> gmx.de>
I'm not sure your wrapping will work with kernel32.dll, but in general here are a few tips:

- most functions in the Windows API use the __stdcall calling convention in C/C++, which translates to D as "extern(Windows)"

- this will usually add the number of bytes passed on the stack as an "@NN" postfix to the function name. This postfix does not exist in kernel32.dll, but it does in the import library kernel32.lib that you find in the dmd lib folder. Maybe you can use the standard import library, or use the translation shown below.

- as the exported function and the function you want to chain to have identical names, you have to rename at least one of them in some build step. I'd suggest doing this in the def file:

The symbols in the d-source file containing:

----
extern(Windows) HANDLE imported_GetCurrentProcess();

export extern(Windows) HANDLE internal_GetCurrentProcess()
{
 return imported_GetCurrentProcess();
}
----

can be mapped to other symbols in the def file:

----
EXPORTS
 GetCurrentProcess = internal_GetCurrentProcess

IMPORTS
 imported_GetCurrentProcess = kernel33.GetCurrentProcess
----

- if you don't know the number of arguments, you should not call the wrapped function, as this will change the call stack. Instead, you should just jump to it:

----
void internal_hread()
{
 asm
 {
   naked;
   jmp imported_hread;
 }
}
----

I haven't tried all that, though, so there might be some mistakes...

Rainer



Denis Koroskin wrote:
On Sat, 30 Apr 2011 13:47:53 +0400, maarten van damme <maartenvd1994 <at> gmail.com> wrote:

I've changed this; I think I'm still kinda confused about lib files. I was
told you can't do anything with them without a .di file,
so I went ahead and made a kernel33.di file. I now import it in kernel32.d,
and my declaration is
System(C){
export void * exportedfunctionblablabal(){
  return exportedfunctionblablablal();
}
...
}

The files in the directory are:
kernel32.d : http://dl.dropbox.com/u/15024434/d/kernel32.d
kernel33.di : http://dl.dropbox.com/u/15024434/d/kernel33.di
kernel33.lib : http://dl.dropbox.com/u/15024434/d/kernel33.lib
kernel33.dll : http://dl.dropbox.com/u/15024434/d/kernel33.dll

I've tried to compile using dmd -d kernel32.d kernel33.di kernel33.lib but
it throws errors like
"Error 42: Symbol Undefined _Dkernel1336_hreadfzpV"
I have literally no clue why this is the case, can someone help me out or
look at the files?

2011/4/27 maarten van damme <maartenvd1994 <at> gmail.com>

I'm afraid I've been a little unclear.
I copied kernel32.dll from the Windows dir, renamed it to kernel33.dll
and generated a .lib from it using implib.
Then I created a D file with a correct DllMain (stolen from examples) and
between

system(C){
export void * exportedfunctionfromkernel33.dll();
export void * exportedfunction2fromkernel33.dll();
...
}

But it looks like you can't both declare a function from another lib and
export it at the same time.


In your kernel33.di, try making it extern (C) export void* _hread(); etc. Your functions get D mangling otherwise.
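To see the difference Denis describes, here is a small sketch (the module name and both function names are hypothetical) that prints the mangled names at compile time. The D-linkage name starts with `_D`, like the undefined symbol in the linker error above, while the `extern(C)` one is just the bare identifier.

```d
module kernel33;

// Hypothetical prototype with default (D) linkage: the symbol name encodes
// the module, the function name and its type.
void* _hread();

// Hypothetical prototype with C linkage: the symbol is the bare name.
extern (C) void* c_hread();

// Evaluated at compile time; nothing is called, so nothing has to link.
pragma(msg, _hread.mangleof);  // _D8kernel336_hreadFZPv
pragma(msg, c_hread.mangleof); // c_hread
```

For the real Windows API, extern(Windows) is the appropriate linkage; on 32-bit Windows it adds its own stdcall decoration in the object file, which this sketch does not show.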

I'd also suggest starting with a less complex example, e.g. export only one function, make sure it works, then add the rest.

If you think your .lib file doesn't do its job, try using a .def file instead. I find them extremely helpful, and they are a lot easier to edit/extend.

Hope that helps.

lenochware | 1 May 15:23 2011

Re: A few general thoughts

== Quote from KennyTM~ (kennytm <at> gmail.com)'s article

> You could use x"" string, or just escape those characters
>      auto x = x"f1f2f3 f4";
>      auto y = "\xf1\xf2\xf3\xf4";
> (And if your "string" is not a UTF-8 string at all, you should use a
> ubyte[], not char[].
>      const(ubyte)[] z = [0xf1, 0xf2, 0xf3, 0xf4];
>      auto t = cast(const(ubyte)[]) x"f1f2f3f4";
> )

It would be very unclean to write strings this way, but it is not so important. The
point is that I don't like features which cannot be disabled. For example,
variables in D are initialized, which is good, but you can write int i = void; and
disable it. The final decision is up to the programmer. You are not forced to do it
the one "good" way.
This is a philosophy which I like.
Of course, I understand that it is not possible to make everything optional; it has
negatives, like everything does, but if there are serious doubts about some feature
and it is not a big deal to make it optional, it SHOULD be optional. At least I
think so.

maarten van damme | 1 May 15:28 2011

Re: link from a dll to another function in another dll?

Number overflow?

So I implemented the suggested changes (you can check them out at http://dl.dropbox.com/u/15024434/version2.zip)
But now when I compile it I get:
"kernel32.def(738) : Error 12: Number Overflow: (strange symbol over here)"

I do agree I should've picked a simpler example, but I think the satisfaction will be even bigger if I succeed :p

Rainer Schuetze | 1 May 18:48 2011

Re: link from a dll to another function in another dll?

It seems you have hit another of those dreaded optlink bugs.

With fewer symbols, it works if you declare the imports like this 
(because of the described name mangling):

IMPORTS
	_imported_hread@0 = kernel33._hread

Two more notes:
- you don't need to import kernel33.di
- you should not use "SINGLE" in the DATA statement of the def file, as it 
will share the memory across processes.

maarten van damme wrote:
> Number overflow?
> So I implemented the suggested changes (you can check them out 
> at http://dl.dropbox.com/u/15024434/version2.zip)
> But now I get when I compile it : 
> "kernel32.def(738) : Error 12: Number Overflow: (strange symbol over here)"
> 
> I do agree I should've picked a simpler example but I think the 
> statisfaction will be even bigger if I were to succeed :p

KennyTM~ | 1 May 19:22 2011

Re: A few general thoughts

On May 1, 11 21:23, lenochware wrote:
> == Quote from KennyTM~ (kennytm <at> gmail.com)'s article
>
>> You could use x"" string, or just escape those characters
>>       auto x = x"f1f2f3 f4";
>>       auto y = "\xf1\xf2\xf3\xf4";
>> (And if your "string" is not a UTF-8 string at all, you should use a
>> ubyte[], not char[].
>>       const(ubyte)[] z = [0xf1, 0xf2, 0xf3, 0xf4];
>>       auto t = cast(const(ubyte)[]) x"f1f2f3f4";
>> )
>
> It would be very unclean write strings this way, but it is not so important.

It's very *unportable* to write a string your way. Not every 
editor/OS defaults to ISO-8859-1 when no encoding is specified (say, 
Notepad.exe), and your source file could easily be destroyed: when I 
"Save As..." it, all those 'åéüîø' become '?????', or the file is 
re-encoded into UTF-8 and the program will think you've actually 
written 'åéüîø'.

(Of course that can also happen to UTF-8. Therefore it's best to restrict 
yourself to ASCII only.)

> The point is that I don't like features which cannot be disabled. For example
> variables in D are initialized, which is good, but you can write int i = void; and
> disable it. The final decision is on the programmer. You are not forced to do it
> only one "good" way.

If D allowed non-UTF encodings without error, a string in those settings 
could get misinterpreted, and it would not be easy to tell when.

That's different from '= void' or 'cast', as those are *explicit*.
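A sketch of the *explicit* route, assuming the bytes really are ISO-8859-1: keep the source file ASCII, hold the data as `ubyte`s, check it with `std.utf.validate`, and convert it with `std.encoding.transcode`. Both helper names are hypothetical.

```d
import std.encoding : Latin1String, transcode;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `raw` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        validate(cast(const(char)[]) raw); // throws UTFException if malformed
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper: explicitly converts ISO-8859-1 (Latin-1) bytes
/// into a UTF-8 string.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string result;
    transcode(cast(Latin1String) raw, result);
    return result;
}
```

For example, the Latin-1 bytes `E5 E9` ("åé") fail UTF-8 validation, but after `latin1ToUtf8` they become the 4-byte UTF-8 string "åé", and the conversion is visible in the code rather than hidden in the file's encoding.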

> This is philosophy which I like.
> Of course, I understand that it is not possible make everything optional, it has
> negatives, like everything has, but if there are serious doubts about some feature
> and it is not big deal to make it optional, it SHOULD be optional. At least I
> think so.

Yes, it's possible that DMD could add a -wno-utf-warning switch, but you'd have 
a better chance convincing Walter to split up the different -w options :).

maarten van damme | 1 May 20:00 2011

Re: link from a dll to another function in another dll?

Great, now the error in kernel32.def is resolved, but I get the same problem in kernel33.def.

Here is the start of the exports from kernel33.def:
EXPORTS
_hread @1334
How can I change this to resolve that?

2011/5/1 Rainer Schuetze <r.sagitario <at> gmx.de>
It seems you have hit another of those dreaded optlink bugs.

With less symbols, it works if you declare the imports like this (because of the described name mangling):

IMPORTS
       _imported_hread@0 = kernel33._hread

2 more notes:
- you don't need to import kernel33.di
- you should not use "SINGLE" in the DATA statement of the def file, it will share the memory across processes.




Nick Sabalausky | 1 May 20:54 2011

Re: htmlget.d example and unicode parsing

"Tyro[a.c.edwards]" <nospam <at> home.com> wrote in message 
news:ipinj3$1c77$1 <at> digitalmars.com...
> Hello all,
>
> I am trying to learn how to parse, modify, and redisplay a Japanese
> webpage passed to me in a form and am wondering if anyone has an example
> of how to do this.
>
> I looked at htmlget and found that it has a couple problems: namely, it
> is not conform to current D2 practices. I am not sure that my hack can
> be considered a fix but have attached it nonetheless. It now works
> correctly on ascii based urls but not utf-8.
>
> My lack of knowledge on how to properly parsing unicode documents has
> left me stumped. I am therefore requesting some assistance in updating
> the code such that it works with any url. I have taken a look at std.utf
> and there are a few things there that could possibly assist me however
> without examples I'm somewhat at a loss.
>
> I'm assuming that the problem exists here:
>
> for (iw = 0; iw != line.length; iw++)
>         {
>             if (!icmp("</html>", line[iw .. line.length]))
>                 break print_lines;
>         }
>
> From what I understanding, one cannot index a utf sequence the same as
> you index ASCII.

Depends on what exactly you're doing. There are many cases where indexing 
utf like ASCII works fine, and your code above looks like one of the cases 
where it should work (Unless icmp throws or asserts on invalid code-unit 
sequences. Anyone know offhand if it does?).

But you do have a non-utf-related bug in that loop. If there's anything in 
'line' after the "</html>" tag, then it won't detect the tag because you're 
slicing with the length of 'line' instead of the length of "</html>".

So it should be:

immutable endTag = "</html>";
for (iw = 0; iw + endTag.length <= line.length; iw++)
{
    if (!icmp(endTag, line[iw .. iw + endTag.length]))
        break print_lines;
}
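An alternative to the manual loop, as a sketch: `std.string.indexOf` accepts a `CaseSensitive` flag and returns an offset in code units, which can be used directly to slice a UTF-8 string. The helper name `beforeEndTag` is hypothetical.

```d
import std.string : indexOf;
import std.typecons : No;

/// Hypothetical helper: returns the part of `line` before the first
/// "</html>" (matched case-insensitively), or all of `line` if absent.
string beforeEndTag(string line)
{
    // indexOf returns an offset in code units (bytes for UTF-8), not in
    // characters, so it can be used as-is to slice `line`.
    immutable pos = line.indexOf("</html>", No.caseSensitive);
    return pos < 0 ? line : line[0 .. pos];
}
```

With Japanese text in front of the tag, the offset counts bytes: each of the three characters in "日本語" takes 3 bytes in UTF-8, so the tag is found at offset 9.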

On the topic of unicode, this is a really good introduction to the details 
of it:
http://www.joelonsoftware.com/articles/Unicode.html

But once you read that, keep in mind there are a few important details he 
failed to mention: a code point is made up of code units, yes, but a single 
code point is *not* always an entire character (aka "grapheme"). Because of 
combining codes, a character can be made up of multiple code points (just 
like how a code point can be made up of multiple code units). Also, there 
are certain characters that can be represented by more than one specific 
sequence of code points (and that gets into Unicode normalization).
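A small sketch of the three levels, assuming a modern Phobos (the grapheme and normalization functions in std.uni may not exist in the 2011-era library): 'é' written as 'e' plus U+0301 COMBINING ACUTE ACCENT is 3 UTF-8 code units, 2 code points and 1 grapheme, and NFC normalization turns it into the single code point U+00E9. The helper names are hypothetical.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;
import std.utf : count;

/// 'e' followed by U+0301 COMBINING ACUTE ACCENT: renders as "é".
enum decomposed = "e\u0301";

/// Length in UTF-8 code units (bytes).
size_t codeUnits(string s) { return s.length; }

/// Number of code points (decoded dchars).
size_t codePoints(string s) { return s.count; }

/// Number of graphemes (user-perceived characters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }
```

In the same terms, `foreach (dchar c; s)` walks code points, while `byGrapheme` walks whole characters; comparing two strings that may use different sequences for the same character is safer after passing both through `normalize!NFC`.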


Gmane