James Clark | 1 Oct 2003 05:24

Re: Bug fix release

Bruce D'Arcus wrote:
> Question re: encodings:
> 
> I had my .emacs file set up so that I got proper display of utf-8 files 
> a while back.  I just realized that I've lost that, though I'm not sure 
> when.  Specifically, my em-dashes and similar characters are now blank 
> boxes.  I wonder if this has something to do with interactions with 
> recent changes in nxml?

Unlikely.  This has not changed significantly in nXML in the last two weeks.

James

------------------------ Yahoo! Groups Sponsor ---------------------~-->
Upgrade to 128-Bit SSL Security!
http://us.click.yahoo.com/p7cEmB/s7qGAA/yigFAA/2U_rlB/TM
---------------------------------------------------------------------~->

To unsubscribe from this group, send an email to:
emacs-nxml-mode-unsubscribe <at> yahoogroups.com

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 

James Clark | 1 Oct 2003 07:17

Character entities

Sebastian Rahtz wrote:

> indeed. it seems much better to me to leave that work
> entirely to Mule. I use an input method, for instance, which watches me
> type, and turns &eacute; into the right symbol on the fly

I think you're right that it's better for the solution to the problem of 
entering and displaying Unicode characters to be independent of 
nxml-mode, since it isn't an XML-specific problem.  However, I'm not yet 
ready to declare the whole problem of non-ASCII characters in nxml-mode 
solved.  I think there are legitimate reasons for not using Unicode 
characters directly:

- you might want to use an encoding that cannot encode a character that 
you want

- Emacs doesn't support all of Unicode yet (Unicode CJK and non-BMP 
ranges are not supported)

- you might not have a font that contains an appropriate glyph for a 
character

- you might have a font with an appropriate glyph but it might be too 
hard to distinguish from other characters (for example, in a fixed width 
font an emdash might be hard to distinguish from an endash)

All these problems can be solved using either an entity or character 
reference.  Character references have the big advantage of not requiring 
a DTD.  But they also have two big disadvantages compared to entities:

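As a rough illustration of the first reason above (an encoding that cannot represent a character), here is a minimal Python sketch, not anything nxml-mode itself does. It relies only on the standard `xmlcharrefreplace` error handler; the function name and sample string are made up for the example, and it covers only the encoding step, not escaping of `&` or `<`.

```python
def encode_with_char_refs(text, encoding):
    """Encode text; characters the target encoding lacks become &#NNNN; references."""
    return text.encode(encoding, errors="xmlcharrefreplace")


# Sample text (hypothetical): an e-acute and an em dash.
sample = "caf\u00e9 \u2014 ok"

# ISO-8859-1 can encode the e-acute directly, but not the em dash.
latin1 = encode_with_char_refs(sample, "iso-8859-1")

# ASCII can encode neither, so both become decimal character references.
ascii_ = encode_with_char_refs(sample, "ascii")

print(latin1)
print(ascii_)
```

No DTD is needed to read the result back, which is the advantage of character references mentioned above.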

Sebastian Rahtz | 1 Oct 2003 08:51

Re: Character entities

> I think there are legitimate reasons for not using Unicode 
> characters directly:

agreed in principle, but I am not as convinced as you

> - you might want to use an encoding that cannot encode a character that 
> you want

that is the author's choice, to make life hard

> - Emacs doesn't support all of Unicode yet (Unicode CJK and non-BMP 
> ranges are not supported)

that's true, and beyond our control

> - you might not have a font that contains an appropriate glyph for a 
> character

so we get a font ...

> - you might have a font with an appropriate glyph but it might be too 
> hard to distinguish from other characters (for example, in a fixed width 
> font an emdash might be hard to distinguish from an endash)

maybe

> Problem (a) is easily soluble by providing a command that allows you to 
> enter a character reference by specifying an entity name. 
Love's sgml-input could obviously be varied to do this in two shakes of
a lamb's tail

James Clark | 1 Oct 2003 10:43

Re: Character entities

Sebastian Rahtz wrote:

> I find this a bit desperate, to be honest. it perpetuates those
> short names which, while familiar to SGML english-speaking long-timers,
> mean nothing to people new to the field. 

In an ideal world, everybody would have their environment set up to 
display all the characters they want and XML files wouldn't use 
character references. I still think this is some way off.  Even if you 
have got yourself properly set up, you may need to exchange files with 
somebody who hasn't.  So I would like nxml mode to help people out here.

I think I would prefer to point people in the direction of character 
references rather than entities.  Your point about the short SGML names 
is well-taken. So what's the alternative?  One is to try and show a 
glyph for the referenced character as well as/instead of the reference. 
  Another possibility is to take advantage of the Unicode names.  These 
are a bit long to display inline, but they could be used for input and 
for providing a tooltip over a character reference.
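The tooltip idea can be sketched outside Emacs in a few lines of Python. This is only an illustration under assumptions: the function name `describe_char_ref` is invented here, and the names come from Python's bundled copy of the Unicode character database rather than from anything in nxml-mode.

```python
import re
import unicodedata

# Matches one decimal or hexadecimal character reference, e.g. &#8212; or &#x2014;
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def describe_char_ref(ref):
    """Return tooltip-style text (code point plus Unicode name) for a reference."""
    m = CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
    if code > 0x10FFFF:
        raise ValueError(f"code point out of range: {ref!r}")
    # Fall back to a placeholder for code points that have no name.
    name = unicodedata.name(chr(code), "<no name>")
    return f"U+{code:04X} {name}"


print(describe_char_ref("&#x2014;"))
print(describe_char_ref("&#233;"))
```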

> by the way, are there emacs commands to switch from character entities
> to UTF-8 and vice-versa?

Nope.  This is not entirely trivial since character references/entities 
aren't recognized in all contexts. Is this a feature request?

James


Sebastian Rahtz | 1 Oct 2003 09:43

Re: Character entities

> > One is to try and show a 
> >glyph for the referenced character as well as/instead of the reference. 
> >  Another possibility is to take advantage of the Unicode names.  These 
> >are a bit long to display inline, but they could be used for input and 
> >for providing a tooltip over a character reference.

I quite like the idea of seeing &#2016;[X] where X is the
actual character if available. but how do you tell if it will
be available, and not just be a white box?
The full Unicode name as tooltip would be good, though.

> > by the way, are there emacs commands to switch from character entities
> > to UTF-8 and vice-versa?
> 
> Nope.  This is not entirely trivial since character references/entities 
> aren't recognized in all contexts.
ah, I had not considered that. this is attribute and element names?

>  Is this a feature request?
Perhaps. The same thing can be done with an identity transform
using some XML language, so it's not vital; but an emacs solution
would be nice. If my file is full of white boxes, toggling  to a
display full of number codes might be a good alternative view.
--

-- 
Sebastian Rahtz <sebastian.rahtz <at> computing-services.oxford.ac.uk>
OUCS


Lars Marius Garshol | 1 Oct 2003 11:56

Re: Character entities


* James Clark
| 
| In an ideal world, everybody would have their environment set up to
| display all the characters they want and XML files wouldn't use
| character references. I still think this is some way off.  Even if
| you have got yourself properly set up, you may need to exchange
| files with somebody who hasn't.  So I would like nxml mode to
| help people out here.
| 
| I think I would prefer to point people in the direction of character 
| references rather than entities. 

I definitely agree with all of this. This is a real problem for
people, and character entities are the wrong solution. Character
references are much better, and the job of making them user-friendly
effectively rests with the editor.

| Your point about the short SGML names is well-taken. So what's the
| alternative?

I think having some form of name-to-character mapping is the way to
go, but perhaps there should be support for different kinds of names?
Some people might prefer the SGML entity names, others the LaTeX macro
names, and still others the names from the Unicode character database.

If there is a configurable mapping list with tab-completion I think
that might do the trick. I'd be perfectly happy to insert &#x2014; by
typing C-c something EM SPC S TAB RET, for example. 
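A configurable name table with prefix completion might look roughly like the following Python sketch. Everything here is hypothetical: the table `USER_NAMES`, its entries, and the function names are invented for illustration, and the fallback to the Unicode character database uses Python's `unicodedata.lookup`, not any Emacs facility.

```python
import unicodedata

# Hypothetical user-configurable table: typist-friendly names -> code points.
USER_NAMES = {
    "mdash": 0x2014,       # SGML-style short name
    "ndash": 0x2013,       # SGML-style short name
    "textemdash": 0x2014,  # LaTeX-style macro name
}


def complete(prefix, names):
    """Return the sorted names that start with prefix (case-insensitive)."""
    prefix = prefix.lower()
    return sorted(n for n in names if n.lower().startswith(prefix))


def char_ref_for(name, names=USER_NAMES):
    """Map a name to a hexadecimal character reference.

    The user table is consulted first; otherwise the name is treated as a
    Unicode character name. unicodedata.lookup raises KeyError if unknown.
    """
    code = names.get(name)
    if code is None:
        code = ord(unicodedata.lookup(name))
    return f"&#x{code:X};"


print(complete("m", USER_NAMES))
print(char_ref_for("mdash"))
print(char_ref_for("EM DASH"))
```

Keeping the user table separate from the Unicode fallback lets SGML, LaTeX, or personal names coexist without any of them being privileged.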


James Clark | 1 Oct 2003 12:12

Re: Character entities

Sebastian Rahtz wrote:

> I quite like the idea of seeing &#2016;[X] where X is the
> actual character if available. but how do you tell if it will
> be available, and not just be a white box?

As far as I know, Emacs doesn't provide a way to tell, so nxml mode 
would have to guess. Based on the window-system, you can make a 
reasonable guess at a minimum set of Unicode characters that should be 
displayable.  The user could customize this to augment with particular 
Unicode blocks.  If the guess is occasionally wrong, it's not a big problem.
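The guessing approach could be sketched as below. This is a Python illustration, not Emacs code, and every specific in it is an assumption: the `BASELINE` ranges per window system are placeholders rather than measured facts, and the function names are invented. The `&#x...;[X]` output follows the display format suggested earlier in the thread.

```python
# Assumed baseline: inclusive code point ranges guessed to be displayable
# for each kind of display. These values are illustrative placeholders.
BASELINE = {
    "tty": [(0x20, 0x7E)],              # printable ASCII only
    "x": [(0x20, 0x7E), (0xA0, 0xFF)],  # ASCII plus Latin-1 Supplement
}


def probably_displayable(code, window_system, extra_ranges=()):
    """Guess whether a code point has a glyph; extra_ranges is the user's addition."""
    ranges = list(BASELINE.get(window_system, BASELINE["tty"]))
    ranges.extend(extra_ranges)
    return any(lo <= code <= hi for lo, hi in ranges)


def annotate(code, window_system, extra_ranges=()):
    """Show the reference, followed by [glyph] only when it is guessed displayable."""
    ref = f"&#x{code:X};"
    if probably_displayable(code, window_system, extra_ranges):
        return f"{ref}[{chr(code)}]"
    return ref


# A user who knows their font covers General Punctuation (U+2000..U+206F)
# could add that block to the guess.
general_punctuation = [(0x2000, 0x206F)]
print(annotate(0x2014, "x"))
print(annotate(0x2014, "x", general_punctuation))
```

A wrong guess costs only a white box or a missing glyph next to the reference, which matches the remark that an occasional miss is not a big problem.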

>>>by the way, are there emacs commands to switch from character entities
>>>to UTF-8 and vice-versa?
>>
>>Nope.  This is not entirely trivial since character references/entities 
>>aren't recognized in all contexts.
> 
> ah, I had not considered that. this is attribute and element names?

And comments and processing instructions.
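To make the context problem concrete, here is a simplified Python sketch of the references-to-characters direction. It is an assumption-laden illustration, not a proposal for how nxml-mode would do it: it uses regular expressions instead of a real XML parser, assumes well-formed input, ignores the DOCTYPE internal subset, and leaves ASCII references such as `&#60;` alone so that markup characters and attribute whitespace are not altered. All names are invented.

```python
import re

# Comments, processing instructions and CDATA sections: references are not
# recognized inside these, so such spans are copied through untouched.
OPAQUE = re.compile(r"<!--.*?-->|<\?.*?\?>|<!\[CDATA\[.*?\]\]>", re.DOTALL)
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _expand(match):
    """Replace one reference with its character, unless that would be unsafe."""
    code = int(match.group(1), 16) if match.group(1) else int(match.group(2))
    # Keep ASCII (may be markup or significant whitespace), surrogates,
    # and out-of-range values as references.
    if code < 0x80 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
        return match.group(0)
    return chr(code)


def refs_to_chars(xml_text):
    """Expand non-ASCII character references outside comments, PIs and CDATA."""
    out = []
    pos = 0
    for m in OPAQUE.finditer(xml_text):
        out.append(CHAR_REF.sub(_expand, xml_text[pos:m.start()]))
        out.append(m.group(0))  # opaque span: leave exactly as written
        pos = m.end()
    out.append(CHAR_REF.sub(_expand, xml_text[pos:]))
    return "".join(out)


src = '<p title="a&#x2014;b">x &#8212; y<!-- &#x2014; --><?pi &#x2014;?></p>'
print(refs_to_chars(src))
```

In well-formed XML a reference can only appear inside a tag as part of an attribute value, which is why the sketch does not treat element and attribute names separately. The opposite direction (characters to references) is harder, because a non-ASCII character in a name, comment, or processing instruction cannot be replaced by a reference at all.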

James


Bruce D'Arcus | 1 Oct 2003 15:44

Re: Bug fix release


On Tuesday, September 30, 2003, at 11:24  PM, James Clark wrote:

>> I wonder if this has something to do with interactions with
>> recent changes in nxml?
>
> Unlikely.  This has not changed significantly in nXML in the last two 
> weeks.

You're right.  The problem, FYI, was slightly different builds of emacs 
on the two machines.  Problem solved.  Ugh...

Bruce


Bruce D'Arcus | 2 Oct 2003 15:52

new xml weblog system


Only slightly off-topic, but has anyone seen this new xml/xslt-based 
weblog system called Syncato?

http://www.syncato.org/WK/blog/508?t=page

It uses the new Sleepycat XML DB library for Berkeley DB.

This got me thinking: it probably wouldn't be too hard to turn emacs + 
nxml into a weblog editor.  I don't exactly understand how this system 
works, except that it uses "http requests" to communicate with the DB.  
Might there be some existing emacs mode that could handle this sort of 
thing?

Also, the system is only trivially a weblog system, and could easily be 
extended to cover other XML data (apparently most of the logic is 
implemented via xsl).  I'm interested in using it for bibliographic 
data and notes.

Bruce


Xavier Cazin | 3 Oct 2003 09:16

Re: Character entities

Lars Marius Garshol <larsga <at> garshol.priv.no> writes:
> 
> I think having some form of name-to-character mapping is the way to
> go, but perhaps there should be support for different kinds of names?
> Some people might prefer the SGML entity names, others the LaTeX macro
> names, and still others the names from the Unicode character database.

I agree, provision for such 

[any typist-readable string] => [character reference]

mappings would be great. That would imply that users may provide nxml
with arbitrary maps.

> If there is a configurable mapping list with tab-completion I think
> that might do the trick. I'd be perfectly happy to insert &#x2014; by
> typing C-c something EM SPC S TAB RET, for example. 

Me too.

> Yep. If the glyph is missing your best reference is really the Unicode
> code point and the Unicode name. The names are usually accurate and
> usually quite helpful.

Agreed again. But since code/names may be inconveniently long, I'd
rather see a toggle for missing glyphs that either displays a one
character long default glyph or the whole code+unicode name.

Would it be bad to represent those unicode names as empty elements
like <uni:zero-width-no-break-space code="65279" type="sepchar"/>. By