Differences between xhtml.k and xml.k
(1) Uses the Latin-1 (ISO-8859-1) character set, based on the following parsed documents:
Latin-1 characters http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
Special characters http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
(2) &#dddd; &#xHHHH; &#xhhhh; numerical entities are translated to &entity;
references. Internally all Latin-1 entity references will be translated to
8-bit "\311"-style octals and back again to &entity; references.
- characters in the 16-bit range, e.g. λ λ,
are not translated and stay as they are.
- "unknown &entities;" will be translated to "unknown &entities;"
(3) The octal representation of xml.k was discarded because the W3C specs only
allow &#ddd; or &#xhh; numerical references. Besides that, reportedly
Netscape and MS Word cannot cope with hex references. HTML editors like
DreamWeaver also generate those opcode-like numerical references. Therefore
all numerical references will be parsed by DX but never emitted by XD.
Only *readable* &entity; references are to be found in a resulting XHTML text.
(4) attribute=value pairs containing space chars in the value string are parsed
correctly now. Also, value strings containing attr=value pairs themselves
will be recognized.
Valid white space after the last attr=value pair is handled correctly, i.e.,
xhtml.k no longer creates dummy attribute tuples.
(5) Mixed-content as well as elements-only models are supported. If there is plain
text as well as container tags on the same axis, the plain text part will be
represented by an empty-symbol tuple equivalent to <>text</>.
(6) Although w3c-xhtml1.0 is not clear about the distinction between an empty
string in an empty container tag and a NULL string in an empty tag,
this implementation reduces empty container tags to empty tags according to the
requirements of older browsers.
(7) All empty tags emitted get an additional space between the tag name and the
right terminating angle bracket, e.g. <br />.
(8) will be suppressed as required
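
The reference handling of item (2) can be illustrated with a small sketch. This is Python, not the K code of xhtml.k, and the function name normalize_refs is made up for the illustration. It assumes that "Latin-1 entity" means code points 160..255 and relies on Python's standard html.entities.codepoint2name table for the names. Decimal and hex references in that range become named references; everything else (for example the 16-bit lambda) is left untouched.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#233;) and hexadecimal (&#xE9; / &#XE9;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def normalize_refs(text):
    """Rewrite numeric Latin-1 references as named &entity; references.

    Hypothetical helper: only code points 160..255 are translated,
    anything outside that range is returned unchanged.
    """
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 160 <= code <= 255 and code in codepoint2name:
            return "&" + codepoint2name[code] + ";"
        return match.group(0)  # not Latin-1: keep the reference as written

    return _NUMERIC_REF.sub(repl, text)
```

For instance, normalize_refs("&#201;") and normalize_refs("&#xC9;") both give "&Eacute;" (code point 201, i.e. octal \311), while "&#955;" stays as it is.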
(Not defined in the xhtml specs but nevertheless useful:)
ASP/JSP/KSP inline code <%code%>, <%=expr%>, <%@directive%>, <%!stmt%> will
be mapped to (`_;"[=@!]*code") tuples. This could also be rewritten for
, but don't confuse it w/ xml PIs. The feature can easily be disabled
by removing the cloaking pp function.
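
To make the mapping concrete, here is a minimal Python sketch (again not the K implementation; split_inline_code is a hypothetical name). It assumes that a tuple is modelled as a Python pair ("_", code), with the optional =, @ or ! prefix kept as the first character of the code string, and that plain text between the inline blocks is passed through as ordinary strings. Like the parser described here, it does no escaping, so a literal %> inside the code would end the block early.

```python
import re

# Non-greedy match of one <% ... %> block; DOTALL lets code span several lines.
_INLINE = re.compile(r"<%(.*?)%>", re.DOTALL)


def split_inline_code(text):
    """Split text into plain strings and ("_", code) pairs for <%...%> blocks."""
    parts, pos = [], 0
    for match in _INLINE.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])  # plain text before the block
        parts.append(("_", match.group(1)))        # prefix char stays in the string
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])                   # trailing plain text
    return parts
```

With this sketch, split_inline_code('<p><%=name%></p>') returns ['<p>', ('_', '=name'), '</p>'].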
Not implemented yet
escaping for <% code %>, , etc.
This is not a validating parser; however, the KML layer (see below) already
knows which tags are valid with each XHTML DTD.
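
Items (4), (6) and (7) above can be sketched together as a tiny round trip. The following Python is only an illustration under assumptions: parse_attrs and emit_tag are hypothetical names, attribute values are assumed to be quoted, and the data layout (a list of (name, value) pairs) is not the internal structure of xhtml.k. It shows values containing spaces or embedded attr=value text, trailing white space that produces no dummy pair, and empty content being emitted as an empty tag with a space before the closing "/>".

```python
import re

_WS = re.compile(r"\s*")
# name = "value" or name = 'value'; the quotes delimit the value, so spaces
# and attr=value text inside the value are simply part of it.
_ATTR = re.compile(r"""([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_attrs(source):
    """Parse the attribute part of a start tag into (name, value) pairs."""
    pairs, pos = [], 0
    while True:
        pos = _WS.match(source, pos).end()  # skip white space, also at the end
        if pos == len(source):
            return pairs                    # trailing space adds no dummy pair
        match = _ATTR.match(source, pos)
        if match is None:
            raise ValueError("malformed attribute at offset %d" % pos)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((match.group(1), value))
        pos = match.end()


def emit_tag(name, attrs, content):
    """Serialize one element; "" and None content both give an empty tag.

    Values are written back as-is (no escaping of embedded double quotes).
    """
    head = name + "".join(' %s="%s"' % (key, value) for key, value in attrs)
    if not content:
        return "<" + head + " />"           # extra space before "/>"
    return "<" + head + ">" + content + "</" + name + ">"
```

For example, parse_attrs('href="a b.html" title="x=1 y=2"   ') yields two pairs only, and emit_tag("br", [], "") gives "<br />".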
Why I wrote xhtml.k
Last year I wrote KSP as a technology testbed in order to try out some web
technologies w/o the penalty of large piles of conventional code or, even
worse, no code at all. I never published the thing because it finally became what
it was, ironically: just another feature-complete but boring template engine.
KSP basically consists of a standalone web application server written entirely
in K with features like HTTP/1.1 support, request / response / server /
session "objects", appl/page-level-security / url-rewrite, error trapping,
multiple cookies handling, embedded browser w/ chunked transfer-encoding,
mime-types as usual. The only part still missing is an accelerator cache.
Now, by the end of last year I discovered LAML (Kurt Nørmark's Lisp abstracted markup
language) and decided to write a piece of software based on its principles in
K ("KML"), this being much more in the spirit of functional languages - and also
much more fun to use.
After a while I redesigned KML to support validating XHTML code. Whereas previous
versions generated xhtml pages directly (by text manipulation), the next version
should use an xhtml parser, which makes page generation much more elegant and faster.
xhtml.k as well as kml.k (coming soon) use the same internal data structures
as proposed by Arthur Whitney's original xml.k parser.
Thanks to Christian Langreiter from Austria (of the KXR alias XML-RPC for K fame)
who participated in mail debates and tested all this stuff across European borders.