SuikaWiki Markup Language (SWML)

SuikaWiki Project, 18 November 2023

Latest version
https://suikawiki.github.io/spec-swml/spec/
Version history
https://github.com/suikawiki/spec-swml/commits/gh-pages

Abstract

This document defines the SWML syntax and the SWML vocabulary.

Table of contents

  1. 1 Introduction
    1. 1.1 History
  2. 2 Terminology
    1. 2.1 Namespaces
    2. 2.2 Definitions
  3. 3 The SWML text serialization
    1. 3.1 Document structure and header
    2. 3.2 Body part blocks
    3. 3.3 Inline contents
    4. 3.4 Images
    5. 3.5 Lexical structures
  4. 4 Parsing documents in the SWML text serialization
    1. 4.1 Tokenization of lines
      1. 4.1.1 The "initial" mode
      2. 4.1.2 The "body" mode
      3. 4.1.3 The "preformatted" mode
      4. 4.1.4 The "preformatted block" mode
      5. 4.1.5 The "image data" mode
    2. 4.2 Tokenization of a table row
    3. 4.3 Tokenization of a text
    4. 4.4 Parsing a magic line
    5. 4.5 Tree construction
      1. 4.5.1 The "in section" insertion mode
      2. 4.5.2 The "in table row" insertion mode
      3. 4.5.3 The "in paragraph" insertion mode
  5. 5 Serializing SWML text serialization documents
  6. 6 Element definitions for the SWML text serialization
  7. 7 The SWML XML serialization
  8. 8 SWML MIME types
  9. 9 Semantics of elements and attributes
    1. 9.1 Document structures
      1. 9.1.1 The document element in the SuikaWiki/0.9 namespace
      2. 9.1.2 The Name attribute in the SuikaWiki/0.9 namespace
      3. 9.1.3 The Version attribute in the SuikaWiki/0.9 namespace
      4. 9.1.4 The parameter element in the SuikaWiki/0.9 namespace
      5. 9.1.5 The value element in the SuikaWiki/0.9 namespace
      6. 9.1.6 The class attribute
      7. 9.1.7 The id attribute
      8. 9.1.8 The itemprop attribute
      9. 9.1.9 The xml:lang attribute
    2. 9.2 Blocks
      1. 9.2.1 The dr element in the SuikaWiki/0.9 namespace
      2. 9.2.2 The comment-p element in the SuikaWiki/0.10 namespace
      3. 9.2.3 The history element in the SuikaWiki/0.9 namespace
      4. 9.2.4 The example element in the SuikaWiki/0.9 namespace
      5. 9.2.5 The preamble element in the SuikaWiki/0.9 namespace
      6. 9.2.6 The postamble element in the SuikaWiki/0.9 namespace
      7. 9.2.7 The box element in the SuikaWiki/0.9 namespace
      8. 9.2.8 The sw-items element in the SuikaWiki/0.9 namespace
      9. 9.2.9 The sw-itemtypes element in the SuikaWiki/0.9 namespace
    3. 9.3 Dialogues
      1. 9.3.1 The dialogue element in the SuikaWiki/0.9 namespace
      2. 9.3.2 The talk element in the SuikaWiki/0.9 namespace
      3. 9.3.3 The speaker element in the SuikaWiki/0.9 namespace
    4. 9.4 Hyperlinks
      1. 9.4.1 The anchor element in the SuikaWiki/0.9 namespace
      2. 9.4.2 The anchor-internal element in the SuikaWiki/0.9 namespace
      3. 9.4.3 The anchor-end element in the SuikaWiki/0.9 namespace
      4. 9.4.4 The anchor attribute in the SuikaWiki/0.9 namespace
      5. 9.4.5 The anchor-external element in the SuikaWiki/0.9 namespace
      6. 9.4.6 The resScheme attribute in the SuikaWiki/0.9 namespace
      7. 9.4.7 The resParameter attribute in the SuikaWiki/0.9 namespace
    5. 9.5 Embedded objects
      1. 9.5.1 The form element in the SuikaWiki/0.9 namespace
      2. 9.5.2 The image element in the SuikaWiki/0.9 namespace
      3. 9.5.3 The replace element in the SuikaWiki/0.9 namespace
      4. 9.5.4 The text element in the SuikaWiki/0.9 namespace
    6. 9.6 Citations
      1. 9.6.1 The csection element in the SuikaWiki/0.10 namespace
      2. 9.6.2 The src element in the SuikaWiki/0.10 namespace
      3. 9.6.3 The refs element in the SuikaWiki/0.9 namespace
      4. 9.6.4 The sw-see element in the SuikaWiki/0.9 namespace
    7. 9.7 Writing directions
      1. 9.7.1 The sw-l element in the SuikaWiki/0.9 namespace
      2. 9.7.2 The sw-lt element in the SuikaWiki/0.9 namespace
      3. 9.7.3 The sw-r element in the SuikaWiki/0.9 namespace
      4. 9.7.4 The sw-rt element in the SuikaWiki/0.9 namespace
      5. 9.7.5 The sw-v element in the SuikaWiki/0.9 namespace
      6. 9.7.6 The sw-vt element in the SuikaWiki/0.9 namespace
      7. 9.7.7 The sw-vb element in the SuikaWiki/0.9 namespace
      8. 9.7.8 The sw-vbt element in the SuikaWiki/0.9 namespace
      9. 9.7.9 The sw-tate element in the SuikaWiki/0.9 namespace
      10. 9.7.10 The yoko element in the SuikaWiki/0.9 namespace
      11. 9.7.11 The sw-mirrored element in the SuikaWiki/0.9 namespace
      12. 9.7.12 The sw-left element in the SuikaWiki/0.9 namespace
      13. 9.7.13 The sw-right element in the SuikaWiki/0.9 namespace
      14. 9.7.14 The sw-vlr element in the SuikaWiki/0.9 namespace
      15. 9.7.15 The sw-vrl element in the SuikaWiki/0.9 namespace
      16. 9.7.16 The sw-leftbox element in the SuikaWiki/0.9 namespace
      17. 9.7.17 The sw-rightbox element in the SuikaWiki/0.9 namespace
      18. 9.7.18 The sw-leftbtbox element in the SuikaWiki/0.9 namespace
      19. 9.7.19 The sw-rightbtbox element in the SuikaWiki/0.9 namespace
      20. 9.7.20 The sw-vlrbox element in the SuikaWiki/0.9 namespace
      21. 9.7.21 The sw-vrlbox element in the SuikaWiki/0.9 namespace
    8. 9.8 Inline structures
      1. 9.8.1 The fenced element in the SuikaWiki/0.9 namespace
      2. 9.8.2 The openfence element in the SuikaWiki/0.9 namespace
      3. 9.8.3 The fencedtext element in the SuikaWiki/0.9 namespace
      4. 9.8.4 The closefence element in the SuikaWiki/0.9 namespace
      5. 9.8.5 The lines element in the SuikaWiki/0.9 namespace
      6. 9.8.6 The line element in the SuikaWiki/0.9 namespace
    9. 9.9 Editorial annotations
      1. 9.9.1 The insert element in the SuikaWiki/0.9 namespace
      2. 9.9.2 The delete element in the SuikaWiki/0.9 namespace
      3. 9.9.3 The ed element in the SuikaWiki/0.10 namespace
      4. 9.9.4 The asis element in the SuikaWiki/0.9 namespace
      5. 9.9.5 The snip element in the SuikaWiki/0.9 namespace
    10. 9.10 Inline annotations
      1. 9.10.1 The emph element in the SuikaWiki/0.9 namespace
      2. 9.10.2 The rubyb element in the SuikaWiki/0.9 namespace
      3. 9.10.3 The okuri element in the SuikaWiki/0.9 namespace
      4. 9.10.4 The weak element in the SuikaWiki/0.9 namespace
      5. 9.10.5 The title element in the SuikaWiki/0.10 namespace
    11. 9.11 Mathematical representations
      1. 9.11.1 The dotabove element in the SuikaWiki/0.9 namespace
      2. 9.11.2 The sw-macron element in the SuikaWiki/0.9 namespace
      3. 9.11.3 The vector element in the SuikaWiki/0.9 namespace
      4. 9.11.4 The subsup element in the SuikaWiki/0.9 namespace
      5. 9.11.5 The subscript element in the SuikaWiki/0.9 namespace
      6. 9.11.6 The superscript element in the SuikaWiki/0.9 namespace
    12. 9.12 Values
      1. 9.12.1 The f element in the SuikaWiki/0.9 namespace
      2. 9.12.2 The key element in the SuikaWiki/0.10 namespace
      3. 9.12.3 The n element in the SuikaWiki/0.9 namespace
      4. 9.12.4 The lat element in the SuikaWiki/0.9 namespace
      5. 9.12.5 The lon element in the SuikaWiki/0.9 namespace
      6. 9.12.6 The tz element in the SuikaWiki/0.9 namespace
      7. 9.12.7 The cc element in the SuikaWiki/0.9 namespace
      8. 9.12.8 The cn element in the SuikaWiki/0.9 namespace
      9. 9.12.9 The ch element in the SuikaWiki/0.9 namespace
      10. 9.12.10 The sw-value element in the SuikaWiki/0.9 namespace
      11. 9.12.11 The attrvalue element in the SuikaWiki/0.10 namespace
    13. 9.13 Conformance keywords
      1. 9.13.1 The MUST element in the SuikaWiki/0.9 namespace
      2. 9.13.2 The SHOULD element in the SuikaWiki/0.9 namespace
      3. 9.13.3 The MAY element in the SuikaWiki/0.9 namespace
    14. 9.14 Physical representations
      1. 9.14.1 The sw-cursive element in the SuikaWiki/0.9 namespace
      2. 9.14.2 The smallcaps element in the SuikaWiki/0.9 namespace
      3. 9.14.3 The sw-br element in the SuikaWiki/0.9 namespace
    15. 9.15 Qualified names
      1. 9.15.1 The qn element in the SuikaWiki/0.10 namespace
      2. 9.15.2 The qname element in the SuikaWiki/0.10 namespace
      3. 9.15.3 The nsuri element in the SuikaWiki/0.10 namespace
    16. 9.16 Fallback elements
      1. 9.16.1 Uppercase elements in the SuikaWiki/0.10 namespace
  10. 10 Security
  11. References
    1. Normative references
  12. Tests and implementation
  13. Author

1 Introduction

This section is non-normative.

This specification defines SuikaWiki Markup Language (SWML). SWML is the markup language developed and implemented for the SuikaWiki hypertext system. It is also used by other systems as a human-friendly document authoring format.

1.1 History

SuikaWiki's Wiki syntax (now known as SWML text serialization) derived from WalWiki, which derived from YukiWiki, in .

The first specification of the extended language, SuikaWiki/0.9 Document Markup Format: Syntax Specification, was published in and was frequently updated until .

Then several updates to the language, known as SuikaWiki/0.10, were (incompletely) defined by the following documents:

These "two" versions of the language were merged and rewritten as the SWML specification in 2008.

Previous versions of the SWML specification were published at https://suika.suikawiki.org/www/markup/suikawiki/spec/swml-work.

There is an obsolete list of possible new features that might have been introduced in a revision of this specification.

Revisions of the SWML specification are now available in the GitHub repository.

2 Terminology

This specification depends on the Infra Standard.

2.1 Namespaces

For historical reasons, different elements and attributes defined or used in this specification belong to different namespaces.

The AA namespace is http://pc5.2ch.net/test/read.cgi/hp/1096723178/aavocab#. The preferred prefix is aa. The aa element is defined by Strict-HTML スレッド24 >>16.

The HTML namespace's preferred prefix is html.

The following elements are defined by the HTML Standard:

The following attributes are defined by the HTML Standard:

The HTML3 namespace is urn:x-suika-fam-cx:markup:ietf:html:3:draft:00:. The note element is defined in the HyperText Markup Language Specification Version 3.0 and belongs to the HTML3 namespace for the purpose of this specification.

The MathML namespace's preferred prefix is math. The following elements are defined by the MathML specification:

The SuikaWiki/0.9 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:9:. The preferred prefix is sw.

The SuikaWiki/0.10 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:10:. The preferred prefix is sw10.

The XHTML2 namespace is http://www.w3.org/2002/06/xhtml2/.

The XML namespace's preferred prefix is xml.

The urn:x-suika-fam-cx: URN namespace is defined by The <urn:x-suika-fam-cx:> Namespace.

2.2 Definitions

The terms byte sequence, code point, string, starts with, split on ASCII whitespace, ASCII case-insensitive, concatenate, list, is empty, append, ordered map, key, value, exists, for each, forgiving-base64 encode, forgiving-base64 decode, HTML namespace, MathML namespace, XML namespace, and XMLNS namespace are defined by the Infra Standard.

The term Vertical_Orientation is defined by The Unicode Standard and UAX #50.

Terms node tree, node, parent, child, descendant, ancestor, append, element, create an element, element's local name, element's namespace, element's namespace prefix, descendant text content, attribute, has an attribute, get an attribute value, set an attribute value, attribute's local name, attribute's namespace, attribute's namespace prefix, attribute's value, attribute's element, and list of elements with namespace and local name are defined by the DOM Standard.

Terms applicable specification, expected, content attribute, IDL attribute, valid integer, rules for parsing integers, valid e-mail address, set of space-separated tokens, HTML element, represents, inter-element whitespace, text, flow content, phrasing content, script-supporting elements, nothing, item, item types, and property names are defined by the HTML Standard.

Terms convert, UTF-8, and UTF-8 decode are defined by the Encoding Standard.

Terms URL and valid URL string are defined by the URL Standard.

Terms MIME type, essence, and XML MIME type are defined by the MIME Sniffing Standard.

White space characters are U+0009 CHARACTER TABULATION and U+0020 SPACE.

Digits are characters in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE.

Uppercase letters are characters in the range U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z.

Lowercase letters are characters in the range U+0061 LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z.

Language tag characters are digits, uppercase letters, lowercase letters, and U+002D HYPHEN-MINUS.

Scheme characters are digits, uppercase letters, lowercase letters, U+0025 PERCENT SIGN, U+002B PLUS SIGN, U+002D HYPHEN-MINUS, U+002E FULL STOP and U+005F LOW LINE.

A language specification is a string consisting of a @ character followed by zero or more language tag characters. The body of a language specification is the substring of the language specification after the first @ character. It might be the empty string.

Semantically, the body of a language specification represents a language tag, similar to the xml:lang attribute.
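
As a rough illustration of the definition above, the following Python sketch checks whether a string is a language specification and, if so, returns its body. The function name language_specification_body is hypothetical and is not defined by this specification.

```python
import re

# "@" followed by zero or more language tag characters
# (ASCII digits, ASCII letters, and U+002D HYPHEN-MINUS).
_LANGUAGE_SPECIFICATION = re.compile(r"@([0-9A-Za-z-]*)")


def language_specification_body(s):
    """Return the body of a language specification, or None if s is not one."""
    m = _LANGUAGE_SPECIFICATION.fullmatch(s)
    return m.group(1) if m else None
```

With this sketch, the string @ on its own is accepted and yields the empty string as its body, which matches the remark that the body might be empty.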

Terms document white space and preserved white space are defined by the CSS Text specification.

3 The SWML text serialization

This section is non‐normative.

Obviously, this section is incomplete; some prose definitions are not yet available; some cross-references do not work yet. It should be specified why this section is non-normative. The ABNF definitions and character set considerations need to be addressed.

Both prose and ABNF descriptions are non-normative. The conformance of a SWML text serialization document is defined in terms of the parser and its output.

Conformance checking steps

3.1 Document structure and header

A document in the SWML text serialization consists of three parts: header part, body part, and optional image.

Several constructs in a document refer to pages. A page is a unit of data in a hypertext database. The name of a page is sometimes referred to as a WikiName. A page sometimes represents or is associated with an image. How to implement these concepts, including how to resolve WikiNames, is not defined in this specification.

document
= header-part body-part [obs-image]

The header part has to be empty. In previous versions of SWML, a magic line could be contained, and in fact was required in some versions, in the header part of a document.

A magic line has to contain a string #?, followed by the format name, followed by a / character, followed by the format version. Together they identify the version of the markup language in which the document is written. Historically, only the two combinations of format name and format version shown in the table below were defined, used, and implemented:

Format name      Format version   Description
SuikaWiki        0.9              The SuikaWiki/0.9 markup language.
SuikaWikiImage   0.9              The SuikaWikiImage/0.9 markup language.

A magic line can contain zero or more parameters after the format version. A parameter consists of one or more white space characters, followed by the name, followed by a = character, followed by a quoted string whose value represents zero or more values separated by a , character. A parameter value consists of zero or more text characters except for the separator character ,. Historically, the following combinations of parameter names and values were defined and used:

Name Values Description
default-name Zero or more text characters except for , The value represents the default user name for WikiForm input fields. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent.
import Zero or more text characters except for , A value represents the WikiName by which definitions for entity references are imported. When this parameter is not specified, no definition is imported.
interactive yes or no Value yes means that the document contains interactive content such as a WikiForm. Value no, the default value used when the parameter is not specified, means the document does not contain such content. It was intended to be used for the convenience of cache control mechanisms.
obsolete yes or no Value yes means the content of the document is obsolete, and value no, the default value used when the parameter is not specified, means the content is not obsolete.

The parameter name obsolete was defined in the SuikaWiki/0.9 specification, but the parameter name that had been actually implemented in SuikaWiki2 and used was the parameter name obsoleted.

obsoleted
page-icon Zero or more text characters except for , The value represents the WikiName by which the page icon is imported. The page icon can be used as favicon @@ [ref], for example. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent.
image-alt Zero or more text characters except for , The value represents the alternative text for the image embedded in the document. Exactly one value can be specified. The default when this parameter is not specified is the empty string.
image-type An Internet Media Type with no parameter, white spaces, comments The value represents the type of the image embedded in the document. Exactly one value can be specified. This parameter has to be specified when the document contains an image.

The order in which parameters are specified is not significant. The parameter name of a parameter has to be different from the parameter name of any other parameter.

A magic line has to be terminated by zero or more white space characters followed by a newline.

header-part
= [obs-magic-line]
obs-magic-line
= "#?" format-name "/" format-version *(1*white-space parameter) *white-space newline
format-name
= identifier
format-version
= identifier
parameter
= parameter-name "=" quoted-string
parameter-name
= identifier
parameter-value-list
= [parameter-value *("," parameter-value)]
parameter-value
= *(char − ",")
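
The informal grammar above can be approximated with regular expressions. The following Python sketch is an illustration only: the definitions of identifier and quoted-string are not given in this section, so the sketch assumes that an identifier is a run of characters other than white space, /, = and ", and that a quoted string is delimited by " characters with no escape mechanism. The function name parse_magic_line is hypothetical. The parsing rules actually used for historical documents are those in the section on parsing a magic line.

```python
import re

# Assumptions of this sketch (not taken from this specification):
# an identifier is a run of characters other than white space, "/", "="
# and '"'; a quoted string is delimited by '"' and has no escapes.
_IDENT = r'[^\s/="]+'
_PARAM = _IDENT + r'="[^"]*"'
_MAGIC_LINE = re.compile(
    r'#\?(' + _IDENT + r')/(' + _IDENT + r')'
    + r'((?:[ \t]+' + _PARAM + r')*)[ \t]*\n?'
)
_NAME_VALUE = re.compile(r'(' + _IDENT + r')="([^"]*)"')


def parse_magic_line(line):
    """Return a dict describing the magic line, or None if it does not match."""
    m = _MAGIC_LINE.fullmatch(line)
    if m is None:
        return None
    params = {}
    for name, value in _NAME_VALUE.findall(m.group(3)):
        # The quoted string holds zero or more values separated by ",".
        params[name] = value.split(",") if value else []
    return {"name": m.group(1), "version": m.group(2), "params": params}
```

The sketch does not reject duplicate parameter names; a later parameter simply replaces an earlier one with the same name, which is a simplification.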

3.2 Body part blocks

The body part of a document consists of zero or more blocks.

There are several kinds of blocks: paragraphs, headings, lists, labeled lists, quotations, preformatted paragraphs, edited sections, tables, editorial notes, comment paragraphs, hrs, and empty blocks. In addition, forms and entity references can also be used as blocks.

An empty block, which is represented by an empty line, can be inserted between any two blocks. This is sometimes necessary to prevent a block from being interpreted as a part of the previous block.

For example, consider the following fragment:

- List item.
This line is part of the list item.

The second line is part of the list, by definition. If this is not desired, an empty block can be inserted between the two lines:

- List item.

This line is not part of the list item.

... such that the third line represents a paragraph.

body-part
= *block
block
= paragraph / heading / list / labeled-list / quotation / preformatted-paragraph / section-block / table / editorial-note / comment-paragraph / hr / empty-block / form / entity-reference
empty-block
= newline

A paragraph represents a unit of the text, similar to HTML's p element. It consists of an optional destination anchor number, followed by line contents, followed by a newline, followed by zero or more block children.

A paragraph cannot begin with a form or entity reference, since it is treated as a block when it appears at the beginning of a line. A paragraph cannot begin with a white space character, since it is treated as a preformatted paragraph then.

A block child is one of an optional destination anchor number followed by line contents followed by a newline, a list, a labeled list, a preformatted paragraph, a section block, a table, an editorial note, a comment paragraph, or an hr.

An editorial note represents an editorial note. It is represented by a string @@, followed by zero or more white space characters, followed by zero or more block children.

A comment paragraph represents a note. It is represented by a string ;;, followed by zero or more white space characters, followed by zero or more block children.

An hr represents a break in the run of blocks in which it occurs, similar to the HTML hr element. It is represented by a string -*-*-, followed by an optional class specification, followed by zero or more white space characters, finally followed by a newline.

paragraph
= [destination-anchor-number] line-contents newline *block-child
comment-paragraph
= ";;" *white-space [destination-anchor-number] [line-contents] newline *block-child
editorial-note
= "@@" *white-space [destination-anchor-number] [line-contents] newline *block-child
hr
= "-*-*-" [class-specification] *white-space newline
block-child
= [destination-anchor-number] line-contents newline / list / labeled-list / preformatted-paragraph / section-block / table / editorial-note / comment-paragraph / hr

A heading introduces a section. It is represented by one or more * characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline. The number of * characters represents the depth of the section. A heading with only one * character begins a larger section than a heading with more than one * character. The line contents represent the name or caption for the section.

heading
= 1*"*" *white-space [destination-anchor-number] [line-contents] newline
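
The heading syntax above can be sketched in Python as follows. This is a minimal illustration, not the tokenizer defined by this specification: the function name parse_heading is hypothetical, and the line contents are returned as an unparsed string.

```python
import re

# 1*"*" *white-space [destination-anchor-number] [line-contents]
_HEADING = re.compile(r"(\*+)[ \t]*(?:\[([0-9]+)\])?(.*)")


def parse_heading(line):
    """Return (depth, anchor number or None, raw line contents), or None."""
    m = _HEADING.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    depth = len(m.group(1))  # number of "*" characters
    anchor = int(m.group(2)) if m.group(2) is not None else None
    return depth, anchor, m.group(3)
```

A smaller depth value corresponds to a larger section, so a line starting with a single * character opens a section that contains any following sections introduced with two or more * characters.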

There are three kinds of lists: ordered lists, unordered lists, and labeled lists. Ordered lists and unordered lists are called lists in this specification.

A list consists of zero or more items. An item in the list is represented by one or more - or = characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline, followed by zero or more block children. The number of - or = characters at the beginning of the item represents the depth of the list. In a list, all items have to have the same depth. If there is another list in the block children, its items' depth has to be greater than the depth of the parent item. The last character that represents the depth of an item indicates the type of the list: - indicates an unordered list while = indicates an ordered list. In a list, all items have to be of the same type.

A labeled list consists of one or more labeled list items. A labeled list item is represented by a : character, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by zero or more white space characters, followed by a : character, followed by a destination anchor number, followed by zero or more white space characters, optionally followed by line contents, followed by a newline, followed by zero or more block children. The former line contents, if any, represent the label. Block children cannot contain a labeled list.

list
= 1*list-item
list-item
= 1*("-" / "=") *white-space [destination-anchor-number line-contents] newline *block-child
labeled-list
= 1*labeled-list-item
labeled-list-item
= ":" *white-space [destination-anchor-number] [line-contents] *white-space [destination-anchor-number] [line-contents] newline *block-child
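
To illustrate how the depth and type of a list item follow from its leading characters, here is a small Python sketch. The function name parse_list_item is hypothetical, the destination anchor number is left inside the returned contents, and a real tokenizer would first have to rule out other constructs that begin with the same characters, such as an hr (-*-*-).

```python
import re

# 1*("-" / "=") *white-space, then the rest of the line.
_LIST_ITEM = re.compile(r"([-=]+)[ \t]*(.*)")


def parse_list_item(line):
    """Return (depth, "ordered" or "unordered", raw contents), or None."""
    m = _LIST_ITEM.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    markers = m.group(1)
    # The last marker character decides the list type.
    kind = "ordered" if markers[-1] == "=" else "unordered"
    return len(markers), kind, m.group(2)
```

For example, a line beginning with -= is an item of depth 2 in an ordered list, because only the last of the two marker characters determines the type.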

The following example contains no quotation:

>>1 This is a reference, not a quote.
quotation
= 1*quoted-block
quoted-block
= 1*">" *white-space (paragraph / editorial-note / comment-paragraph / newline)
preformatted-paragraph
= preformatted-paragraph-block / obs-preformatted-paragraph
preformatted-paragraph-block
= '[PRE[' [class-specification] "[" *white-space newline *([destination-anchor-number] [line-contents] newline) ']PRE]' *white-space
obs-preformatted-paragraph
= white-space [line-contents] newline *([destination-anchor-number] [line-contents] newline)

A section block is a marked section of zero or more blocks, preceded by a section block start tag and followed by a section block end tag.

A section block start tag is a [ character, followed by a section block tag name, followed by an optional class specification, followed by a [ character, followed by zero or more white space characters, optionally followed by line contents, followed by a newline.

Whether the line contents component is allowed or not and its semantics depends on the section block tag name.

For example, the line contents component of a FIG block represents a caption (i.e. a short form of a FIGCAPTION child).

A section block end tag is a ] character, followed by a section block tag name, followed by a ] character, followed by zero or more white space characters, followed by a newline.

A section block tag name represents the type of the section block. The section block tag names in the start tag and the end tag of a section block have to be the same. Their semantics are described by the Block Element Table.

section-block
= '[' tag-name [class-specification] "[" *white-space [destination-anchor-number] [line-contents] newline body-part ']' tag-name ']' *white-space newline

A table represents two-dimensional tabular data. It is similar to the HTML table element, but what can be represented is even narrower than the HTML table model. A table consists of one or more table rows. A table row consists of one or more table cells. Syntactically a table row is followed by a newline.

There are three kinds of table cells: data cells, header cells, and colspan cells. The first cell in a row has to be a data cell or a header cell. Syntactically a cell is preceded by a , character followed by zero or more white space characters, and is followed by zero or more white space characters.

A data cell represents a cell that contains data, like the HTML td element. Likewise, a header cell represents a cell that contains a header, like the HTML th element. The data of a header cell has to be preceded by a * character. The cell consists of an optional destination anchor number, optionally followed by line contents. Syntactically, the cell can be provided as a quoted string, in which case its value is interpreted as an optional destination anchor number, optionally followed by line contents.

A colspan cell represents that the cell that would be placed there forms an integrated part of the cell just before that cell. The cell just before that cell might also be a colspan cell.

table
= 1*table-row
table-row
= "," data-cell *("," cell) newline
cell
= data-cell / header-cell / colspan-cell
data-cell
= *white-space ([cstartchar *cchar] / quoted-string) *white-space
header-cell
= *white-space "*" *white-space ([cstartchar *cchar] / quoted-string) *white-space
cstartchar
= char − ("," / %x22 / white-space)
cchar
= char − ","
colspan-cell
= "=="
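
The three kinds of cells can be told apart with a simple split, as in the Python sketch below. It is deliberately simplified: quoted-string cells are not handled (a , character inside a quoted string would be split incorrectly), white space around a colspan cell is tolerated as the prose describes, and the rule about the first cell of a row is not enforced. The function name parse_table_row is hypothetical.

```python
def parse_table_row(line):
    """Split an unquoted table row into (kind, text) pairs, or return None."""
    line = line.rstrip("\n")
    if not line.startswith(","):
        return None
    cells = []
    for raw in line[1:].split(","):
        text = raw.strip(" \t")
        if text == "==":
            # Merged into the cell just before it.
            cells.append(("colspan", ""))
        elif text.startswith("*"):
            cells.append(("header", text[1:].lstrip(" \t")))
        else:
            cells.append(("data", text))
    return cells
```

A consumer of this output would fold each colspan entry into the preceding cell, which might itself be a colspan cell.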

3.3 Inline contents

Need prose definitions...

line-contents
= 1*(text / anchor-internal / anchor-external / anchor / tagged-inline-element / form / strong / emphasis / entity-reference)
text
= 1*char
External reference scheme Syntax of external reference parameter Semantics
IMG valid URL string URL of an image
IW (identifier / quoted-string) ":" (identifier / quoted-string) InterWiki reference (An InterWikiName followed by a parameter)
M valid URL string URL of an embedded object
MAIL valid e-mail address Internet mail address
URI valid URL string URL
URL valid URL string URL

InterWiki is a hyperlinking mechanism in which the combination of an InterWikiName and a parameter identifies the destination of the link. The interpretation of an InterWiki link is implementation dependent.

External reference schemes URI and URL ought not to be used.

destination-anchor-number
= "[" 1*DIGIT "]"
anchor-internal
= ">>" 1*DIGIT
anchor-external
= "<" external-reference ">"
external-reference
= URL / external-reference-scheme ":" external-reference-parameter
URL
= 1*uschar ":" external-reference-parameter
external-reference-scheme
= 1*xschar
external-reference-parameter
= *(char − ("<" / ">" / %x22) / quoted-string)
uschar
= char − (":" / UALPHA)
xschar
= char − (":" / LALPHA)
anchor
= "[[" [line-contents] [inline-middle-tag [line-contents]] inline-end-tag
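
The two numeric anchor forms defined above, destination-anchor-number and anchor-internal, are simple enough to sketch with regular expressions. The Python below is an illustration under the assumption that DIGIT means the digits defined in the Definitions section; the function names are hypothetical.

```python
import re

_ANCHOR_INTERNAL = re.compile(r">>([0-9]+)")   # ">>" 1*DIGIT
_DESTINATION_ANCHOR = re.compile(r"\[([0-9]+)\]")  # "[" 1*DIGIT "]"


def find_internal_anchors(text):
    """Return the numbers of all >>N references found in text."""
    return [int(n) for n in _ANCHOR_INTERNAL.findall(text)]


def leading_destination_anchor(text):
    """Split a leading [N] off text; return (N or None, remaining text)."""
    m = _DESTINATION_ANCHOR.match(text)
    if m is None:
        return None, text
    return int(m.group(1)), text[m.end():]
```

A >>N reference is meant to point at the construct that carries the destination anchor number [N], so the two functions are typically used together when resolving references within a page.
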
Tag name Number of middle tags Internal reference source anchor External reference source anchor Semantics
AA 0 Not allowed Not allowed So-called ASCII-art (aa element)
ABBR 0 or 1 Not allowed Not allowed Abbreviation (HTML abbr element)
ASIS 1 Not allowed Not allowed “As-is” annotation (asis element)
B 0 Not allowed Not allowed Bold (or sans-serif) text (HTML b element)
BR 0 Not allowed Not allowed Explicit line break (sw-br element)
CC 0 or 1 Not allowed Not allowed Character code (sw-cc element)
CH 0 Not allowed Not allowed Character (sw-ch element)
CITE 0 Not allowed Not allowed Title of a work (HTML cite element)
CN 0 or 1 Not allowed Not allowed Character name (sw-cn element)
CODE 0 or 1 Not allowed Not allowed Code (HTML code element)
CURSIVE 0 Not allowed Not allowed Cursive text (sw-cursive element)
CSECTION 0 Not allowed Not allowed Title of a section in a work (csection element)
DATA 0 or 1 Allowed Allowed Data (HTML data element)
DEL 0 Allowed Allowed Removal (HTML del element)
DFN 0 or 1 Not allowed Not allowed Defined term (HTML dfn element)
DOTABOVE 0 Not allowed Not allowed Text with dot above (dotabove element)
EMPH 0 Not allowed Not allowed Emphasized text (emph element)
F 0 Not allowed Not allowed Field name (f element)
FENCED 0, 1, or 2 Not allowed Not allowed Text enclosed by parentheses (fenced element)
FRAC 1 Not allowed Not allowed Fraction (MathML mfrac element)
I 0 Not allowed Not allowed Italic text (HTML i element)
INS 0 Allowed Allowed Insertion (HTML ins element)
KBD 0 Not allowed Not allowed User input (HTML kbd element)
KEY 0 Not allowed Not allowed Keyboard's key (key element)
L 0 Not allowed Not allowed Horizontal left-to-right text (sw-l element)
LAT 0 or 1 Not allowed Not allowed Latitude (lat element)
LINES 1 or more Not allowed Not allowed Set of sublines (lines element)
LON 0 or 1 Not allowed Not allowed Longitude (lon element)
LT 0 Not allowed Not allowed Horizontal left-to-right, turned text (sw-lt element)
MACRON 0 Not allowed Not allowed Text with macron (sw-macron element)
MAY 0 Not allowed Not allowed RFC 2119 keyword "MAY" (MAY element)
MIRRORED 0 Not allowed Not allowed Mirrored text (sw-mirrored element)
MUST 0 Not allowed Not allowed RFC 2119 keyword "MUST" (MUST element)
N 0 or 1 Not allowed Not allowed Number (n element)
OKURI 0 or 1 Not allowed Not allowed Okuri-gana annotations in Kambun (okuri element)
Q 0 Allowed Allowed Quotation (HTML q element)
QN 0 or 1 Not allowed Not allowed Qualified name (qn element)
R 0 Not allowed Not allowed Horizontal right-to-left text (sw-r element)
ROOT 1 Not allowed Not allowed Root (MathML mroot element)
RT 0 Not allowed Not allowed Horizontal right-to-left, turned text (sw-rt element)
RUBY 1 or 2 Not allowed Not allowed Ruby annotation (HTML ruby element)
RUBYB 1 Not allowed Not allowed Secondary ruby annotation (rubyb element)
SAMP 0 Not allowed Not allowed Sample (HTML samp element)
SEE 0 Not allowed Not allowed "See also" references (sw-see element)
SHOULD 0 Not allowed Not allowed RFC 2119 keyword "SHOULD" (SHOULD element)
SMALLCAPS 0 Not allowed Not allowed Small-caps text (smallcaps element)
SNIP 0 Not allowed Not allowed Snipped point annotation (snip element)
SPAN 0 or 1 Not allowed Not allowed Span of text (HTML span element)
SQRT 0 Not allowed Not allowed Square root (MathML msqrt element)
SRC 0 Not allowed Not allowed Short annotation for citation (src element)
SUBSUP 1 Not allowed Not allowed Pair of subscript and superscript (subsup element)
SUB 0 Not allowed Not allowed Subscript (HTML sub element)
SUP 0 Not allowed Not allowed Superscript (HTML sup element)
TATE 0 Not allowed Not allowed Vertical text (sw-tate element)
TIME 0 or 1 Not allowed Not allowed Date or time (HTML time element)
TZ 0 or 1 Not allowed Not allowed Time zone offset (tz element)
U 0 Not allowed Not allowed Underlined text (HTML u element)
UNDER 0 Not allowed Not allowed Text with an annotation under it (MathML munder element)
UNDEROVER 1 Not allowed Not allowed Text with annotations under and over it (MathML munderover element)
V 0 Not allowed Not allowed Vertical top-to-bottom text (sw-v element)
VB 0 Not allowed Not allowed Vertical bottom-to-top text (sw-vb element)
VT 0 Not allowed Not allowed Vertical top-to-bottom, turned text (sw-vt element)
VBT 0 Not allowed Not allowed Vertical bottom-to-top, turned text (sw-vbt element)
VAR 0 Not allowed Not allowed Variable (HTML var element)
VECTOR 0 Not allowed Not allowed Text with vector symbol (vector element)
WEAK 0 Not allowed Not allowed Small print (weak element)
YOKO 0 Not allowed Not allowed Horizontal text (yoko element)

A future revision to this specification might define more tag names.

An inline start tag whose tag name is INS or DEL cannot be placed at the beginning of a line contents construct, since it would be interpreted as a block start tag.

A class specification represents properties for the construct including it, such as class names. The class specification syntactically consists of a ( character followed by the body of the class specification followed by a ) character. The body of a class specification consists of zero or more text characters excluding (, ), and \.

A body of a class specification is a set of space-separated tokens with the following tokens:

A # character followed by zero or more characters
It is a # character followed by an ID value (which is semantically equivalent to the HTML id attribute). At most one ID value is allowed in a class specification.
A . character followed by zero or more characters
It is a . character followed by an item type value (which is semantically equivalent to a token in the HTML itemprop attribute).
Any other value
It is a class value (which is semantically equivalent to a token in the HTML class attribute).
tagged-inline-element
= inline-start-tag [line-contents] *(inline-middle-tag [line-contents]) inline-end-tag
inline-start-tag
= "[" tag-name [class-specification] [language-specification] "["
tag-name
= 1*LALPHA
class-specification
= "(" *clchar ")"
clchar
= char − ("(" / ")" / "\")
language-specification
= "@" *ltchar
ltchar
= ALPHA / DIGIT / "-"
inline-middle-tag
= "]" *white-space [language-specification] "["
inline-end-tag
= "]" [anchor-internal / anchor-external] "]"
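
The token rules for the body of a class specification described above can be sketched in Python as follows. This is a minimal, non-normative illustration: the names ClassSpec and parse_class_spec_body are hypothetical, tokens are assumed to be separated by whitespace as understood by str.split(), and keeping only the first ID value when several are given is an assumption of this sketch rather than behavior defined by this specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassSpec:
    """Hypothetical container for the three kinds of tokens."""
    id: Optional[str] = None
    itemprops: List[str] = field(default_factory=list)
    classes: List[str] = field(default_factory=list)


def parse_class_spec_body(body: str) -> ClassSpec:
    """Classify each space-separated token of a class specification body."""
    spec = ClassSpec()
    for token in body.split():
        if token.startswith("#"):
            # ID value; at most one is allowed, so later ones are ignored here
            # (assumption of this sketch).
            if spec.id is None:
                spec.id = token[1:]
        elif token.startswith("."):
            # Item type value (a token of the HTML itemprop attribute).
            spec.itemprops.append(token[1:])
        else:
            # Any other value is a class value.
            spec.classes.append(token)
    return spec
```

For example, the body of (#intro .name note warning) would yield the ID value intro, the item type value name, and the class values note and warning.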

The form name specification, if any, defines the name of the form. It has to be different from any other form name defined in the document. A form name specification is syntactically a class specification and the body of it is the form name. A form name cannot contain white space characters.

Specific form name | Syntax of specific form parameters | Semantics
comment | Empty | Comment input form.
embed | ['IMG:'] identifier | Embedding another page. The parameter specifies the WikiName of the page embedded. If the parameter begins with a string IMG:, the page is embedded as an image and the string does not form the part of the WikiName.
form | N/A | Reserved.
rcomment | Empty | Comment input form; a new comment is inserted after the form.
searched | identifier | Insert a search result for the parameter.

The form is an extension mechanism for the SWML text serialization. ...

The generic form can be used to embed a WikiForm specification. WikiForm provides a generic framework for describing user input forms and templates used for processing form inputs.

The three form fields in a form represent the input template, the output template, and the options. Interpretation and processing of these fields are implementation dependent.

The name form cannot be used.

Names embed, rcomment, and searched are obsolete and cannot be used.

form
= generic-form / specific-form
generic-form
= "[[#" 'form' [form-name-specification] ":" form-field ":" form-field [":" form-field] "]]"
form-name-specification
= class-specification
form-field
= "'" *(char − ("'" / "\") / quoted-pair) "'"
specific-form
= "[[#" specific-form-name [":" specific-form-parameters] "]]"
specific-form-name
= 1*(LALPHA / "-")
specific-form-parameters
= identifier *(":" identifier)
strong
= "'''" [line-contents] "'''"
emphasis
= "''" [line-contents] "''"

3.4 Images

A document can contain an image by including a string __IMAGE__ followed by a newline followed by Base64-encoded image data, at the end of the document. Parameters image-type and image-alt provide metadata for the image.

obs-image
= '__IMAGE__' *char

3.5 Lexical structures

An entity reference is a string __&&, followed by an entity reference name, followed by a string &&__.

An entity reference name is a string of one or more text characters that does not contain substring &&__.

If the Character Reference Table [entity reference name] exists, the entity reference represents the character Character Reference Table [entity reference name].

Otherwise, the entity reference is an obsolete entity reference, which was expected to be replaced by a fragment imported from another document. It is no longer supported.

entity-reference
= "__&&" 1*char "&&__"
entity-reference-name
= 1*char − *char "&&__" *char

A text character is a code point that is not U+000A or U+000D.

A newline is a U+000D character, a U+000A character, or a U+000D character followed by a U+000A character.

A quoted string is zero or more text characters preceded by a " character and followed by a " character. In a quoted string, the character \ can only be used as part of a quoted pair. A quoted pair is \ followed by a text character. The value of a quoted string is the string obtained by removing the first and last " characters and removing the \ character at the beginning of each quoted pair.

identifier
= 1*(ALPHA / DIGIT / "-" / non-ascii)
non-ascii
= char − %x00-7f
char
= <Any character> − (%x0d / %x0a)
quoted-string
= %x22 *(char − ("\" / %x22) / quoted-pair) %x22
quoted-pair
= "\" char
newline
= %x0d %x0a / %x0d / %x0a
white-space
= %x09 / %x20
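
The definition of the value of a quoted string can be illustrated with a short Python sketch. It is non-normative: the function name quoted_string_value is hypothetical, and raising ValueError for input that does not match the quoted-string production is a choice of this sketch, since this section does not define error handling.

```python
def quoted_string_value(quoted: str) -> str:
    """Return the value of a quoted string such as "a\\"b"."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not enclosed in double quotes")
    inner = quoted[1:-1]
    value = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\":
            # Quoted pair: drop the backslash, keep the following character.
            i += 1
            if i >= len(inner):
                raise ValueError("backslash is not part of a quoted pair")
            value.append(inner[i])
        elif ch == '"':
            raise ValueError("unescaped double quote inside quoted string")
        else:
            value.append(ch)
        i += 1
    return "".join(value)
```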

4 Parsing documents in the SWML text serialization

This section specifies how to convert a string into a node tree, assuming the string is written in the SWML text serialization. This process is referred to as parsing, and an implementation that performs this process is referred to as a parser.

How to convert a byte sequence into a string is outside of the scope of this specification.

The parser MUST run these steps to parse a string string:

  1. Let document be a new Document.
  2. Let html be the result of creating an html element in the HTML namespace in document.
  3. Set the xmlns attribute in the XMLNS namespace of html to the HTML namespace.
  4. Append html to document.
  5. Let head be the result of creating a head element in the HTML namespace.
  6. Append head to html.
  7. Let body be the result of creating a body element in the HTML namespace.
  8. Append body to html.
  9. Let tokens be the result of running the tokenization stage with string.
  10. Run the tree construction stage with tokens and document.
  11. Return document.
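
Steps 1 to 8 above can be sketched with the Python standard library DOM as follows. This is a non-normative illustration: the function name create_initial_document is hypothetical, the sketch assumes that the xmlns attribute is set on the html element, and the tokenization and tree construction stages (steps 9 and 10) are intentionally left out.

```python
from xml.dom.minidom import Document

HTML_NS = "http://www.w3.org/1999/xhtml"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def create_initial_document() -> Document:
    """Build the document skeleton: html containing head and body."""
    document = Document()
    html = document.createElementNS(HTML_NS, "html")
    # Assumption: the xmlns attribute is set on the html element.
    html.setAttributeNS(XMLNS_NS, "xmlns", HTML_NS)
    document.appendChild(html)
    head = document.createElementNS(HTML_NS, "head")
    html.appendChild(head)
    body = document.createElementNS(HTML_NS, "body")
    html.appendChild(body)
    return document
```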

The following definitions are used to describe the parsing algorithm:

4.1 Tokenization of lines

When a string of characters is tokenized, the string s MUST be processed as follows:

  1. Let pos be zero (0). It represents the index in s. The index of the first character in s is zero (0).
  2. If pos is greater than or equal to the length of s, then emit an end-of-file token and abort these steps.
  3. Let line be the empty string.
  4. If the posth character of s is U+000D CARRIAGE RETURN, process line. Set line to the empty string. If the (pos + 1)th character of s is U+000A LINE FEED, increment pos by one (1).
  5. Otherwise, if the posth character of s is U+000A LINE FEED, process line. Set line to the empty string.
  6. Otherwise, append the posth character of s to line.
  7. Increase pos by one (1).
  8. Go back to the fourth step of these steps.
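
The way the steps above divide a string into lines can be sketched in Python as follows. This is a non-normative illustration: the function name split_lines is hypothetical, lines are collected into a list instead of being processed by the current mode, and handing over a final line that has no trailing newline is an assumption of this sketch.

```python
from typing import List


def split_lines(s: str) -> List[str]:
    """Split s at CR, LF, or CR LF, treating CR LF as a single newline."""
    lines: List[str] = []
    line = ""
    pos = 0
    while pos < len(s):
        ch = s[pos]
        if ch == "\r":
            lines.append(line)
            line = ""
            # A LINE FEED right after the CARRIAGE RETURN belongs to the
            # same newline, so it is skipped.
            if pos + 1 < len(s) and s[pos + 1] == "\n":
                pos += 1
        elif ch == "\n":
            lines.append(line)
            line = ""
        else:
            line += ch
        pos += 1
    if line:
        # Assumption of this sketch: an unterminated last line is kept.
        lines.append(line)
    return lines
```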

The steps above emit a sequence of one or more tokens, which are the inputs to the tree construction stage. A token can have zero or more properties, depending on the kind of the token. There are several kinds of tokens and properties as follows:

Block start tag token
Classes and tag name properties.
Block end tag token
Tag name property.
Character token
Data property.
Comment paragraph start token
No property.
Editorial note start token
No property.
Element token
Local name, namespace, anchor attribute, by attribute, resScheme attribute, resParameter attribute, and content attribute. The default for these properties is null.
Emphasis token
No property.
Empty line token
No property.
End-of-file token
No property.
Form token
Name, id, and parameters properties.
Heading start token
Depth property.
Heading end token
No property.
Inline start tag token
Tag name, classes, and language properties. Default for these properties is null.
Inline middle tag token
language property, whose default is null.
Inline end tag token
Anchor attribute, resScheme attribute, and resParameter attribute properties. Default for these properties is null.
Labeled list start token
No property.
Labeled list middle token
No property.
List start token
Depth property.
Preformatted start token
No property.
Preformatted end token
No property.
Quotation start token
Depth property.
Strong token
No property.
Table row start token
No property.
Table row end token
No property.
Table cell start token
Header property.
Table cell end token
No property.
Table colspan cell token
No property.
Block element token
classes property, whose default is null.

Mode is a state of the tokenizer and is one of "initial" (the initial value used when the tokenization starts), "body", "preformatted", "preformatted block", and "image data".

Continuous line flag is another flag of the tokenizer, representing whether a new line character should be appended to the data, and takes either true or false. This flag is mainly used in the "body" mode.

When a line is processed, the rules specified in the following subsections are used according to the appropriate mode. Rules below sometimes require that the line be reprocessed. In such cases, the rules for the appropriate mode MUST be followed with the same line.

4.1.1 The "initial" mode

In the "initial" mode, line MUST be processed as follows:

If line starts with #?
Parse a magic line line.
Otherwise
  1. Set the continuous line flag to false.
  2. Switch to the "body" mode and reprocess line.

4.1.2 The "body" mode

In the "body" mode, line MUST be processed as follows:

If line is empty
  1. Set the continuous line flag to false.
  2. Emit an empty line token.
If line starts with a white space character
  1. Emit a preformatted start token.
  2. Run the algorithm to tokenize a text with line.
  3. Switch to the "preformatted" mode.
If line starts with *
  1. Let data be line.
  2. Let depth be zero (0).
  3. While the first character of data, if any, is *, run the following substeps:
    1. Increase depth by one (1).
    2. Remove the first character of data. (The removed character will be *.)
  4. Remove white space characters at the beginning of data, if any.
  5. Emit a heading start token whose depth is set to depth.
  6. Run the algorithm to tokenize a text with data.
  7. Emit a heading end token.
  8. Finally, set the continuous line flag to false.
If line is a string consisting of -*-*-, optionally followed by a class specification, followed by zero or more white space characters
  1. Let classes be the body of the class specification in line, if any, or null, otherwise.
  2. Emit a block element token whose classes is set to classes.
  3. Set the continuous line flag to false.
If line starts with - or =
  1. Let data be line.
  2. Let depth be the empty string.
  3. While the first character of data, if any, is - or =, run the following substeps:
    1. Append the first character of data to depth.
    2. Remove the first character of data.
  4. Remove white space characters at the beginning of data, if any.
  5. Emit a list start token whose depth is set to depth.
  6. Run the algorithm to tokenize a text with data.
  7. Finally, set the continuous line flag to true.
If line starts with :
  1. Let name be the empty string.
  2. Let data be line.
  3. Remove the first character of data. (The removed character will be :.)
  4. While data is not empty and the first character of data is not :, run the following substeps:
    1. Append the first character of data to name.
    2. Remove the first character of data.
  5. If data is the empty string (that is, no second : character was found), run the following substeps:
    1. Emit a character token whose data is a : character.
    2. Run the algorithm to tokenize a text with name.

    In this case, line does not represent a description list.

  6. Otherwise, run the following substeps:
    1. Remove white space characters at the beginning of name, if any.
    2. Remove white space characters at the end of name, if any.
    3. Emit a labeled list start token.
    4. Run the algorithm to tokenize a text with name.
    5. Remove the first character of data. (The removed character will be :.)
    6. Remove white space characters at the beginning of data, if any.
    7. Emit a labeled list middle token.
    8. Run the algorithm to tokenize a text with data.
  7. Finally, set the continuous line flag to true.
If line starts with >
  1. Let data be line.
  2. Let depth be zero (0).
  3. While the first character of data, if any, is >, run the following substeps:
    1. Increase depth by one (1).
    2. Remove the first character of data. (The removed character will be >.)
  4. If depth is two (2), data is not empty, and the first character of data is one of digits, run the following substeps:
    1. Prepend two > characters to data.
    2. If the continuous line flag is true, prepend a U+000A LINE FEED character to data.
    3. Run the algorithm to tokenize a text with data.
    4. Set the continuous line flag to true.
  5. Otherwise, run the following substeps:
    1. Emit a quotation start token whose depth is set to depth.
    2. Remove white space characters at the beginning of data, if any.
    3. If the length of data is greater than one (1) and the first two characters of data are @@, run the following substeps:
      1. Remove the first two characters of data. (The removed characters will be @@).
      2. Emit an editorial note start token.
      3. Remove white space characters at the beginning of data, if any.
      4. Set the continuous line flag to true.
    4. If the length of data is greater than one (1) and the first two characters of data are ;;, run the following substeps:
      1. Remove the first two characters of data. (The removed characters will be ;;).
      2. Emit a comment paragraph start token.
      3. Remove white space characters at the beginning of data, if any.
      4. Set the continuous line flag to true.
    5. Otherwise, if data is not empty, set the continuous line flag to true.
    6. Otherwise, set the continuous line flag to false.
    7. In any case, run the algorithm to tokenize a text with data.
If line is a string consisting of a [ character, followed by a section block tag name, optionally followed by a class specification, followed by a [ character, followed by zero or more white space characters, followed by zero or more characters
  1. Emit a block start tag token whose tag name is the section block tag name, and classes is the body of the class specification, if any, or null otherwise.
  2. Remove the substring of line, from the beginning of the string, to the [ character after the section block tag name and class specification (if any), from line.
  3. Remove white space characters at the beginning of line, if any.
  4. If line is not the empty string:
    1. Set tag name to FIGCAPTION.
    2. If the section block tag name is TALK, set tag name to SPEAKER.
    3. Emit a block start tag token whose tag name is tag name.
    4. Run the algorithm to tokenize a text with line.
    5. Emit a block end tag token whose tag name is tag name.
  5. Set the continuous line flag to false.
If line is a string consisting of [PRE, optionally followed by a class specification, followed by a [ character, followed by zero or more white space characters
  1. Emit a block start tag token whose tag name is PRE and classes is the body of the class specification, if any, or null otherwise.
  2. Set the continuous line flag to false.
  3. Switch to the "preformatted block" mode.
If line starts with @@
  1. Let data be line.
  2. Remove the first two characters of data. (The removed characters will be @@.)
  3. Remove white space characters at the beginning of data, if any.
  4. Emit an editorial note start token.
  5. Run the algorithm to tokenize a text with data.
  6. Set the continuous line flag to true.
If line starts with ;;
  1. Let data be line.
  2. Remove the first two characters of data. (The removed characters will be ;;.)
  3. Remove white space characters at the beginning of data, if any.
  4. Emit a comment paragraph start token.
  5. Run the algorithm to tokenize a text with data.
  6. Set the continuous line flag to true.
If line is a string consisting of a ] character, followed by a section block tag name, followed by a ] character, followed by zero or more white space characters
  1. Emit a block end tag token whose tag name is the section block tag name.
  2. Set the continuous line flag to false.
If line starts with ,
  1. Run the algorithm to tokenize a table row with line.
  2. Set the continuous line flag to false.
If line is __IMAGE__
Switch to the "image data" mode.
Otherwise
  1. If the continuous line flag is true, emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.
  3. Set the continuous line flag to true.
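
Two of the "body" mode rules above, the one for lines starting with * and the one for lines starting with - or =, can be sketched in Python as follows. This is a non-normative illustration: the function name and the tuple representation of tokens are hypothetical, the text tokenization algorithm is replaced by a ("text", data) placeholder, and the continuous line flag is not modeled. The caller is assumed to have already checked the rules that come earlier in the list, such as the -*-*- rule.

```python
from typing import List, Optional, Tuple, Union

WHITE_SPACE = " \t"
Token = Tuple[str, Optional[Union[int, str]]]


def tokenize_heading_or_list(line: str) -> List[Token]:
    """Emit placeholder tokens for a heading line or a list line."""
    tokens: List[Token] = []
    if line.startswith("*"):
        data = line.lstrip("*")
        depth = len(line) - len(data)          # number of leading * characters
        data = data.lstrip(WHITE_SPACE)
        tokens.append(("heading-start", depth))
        tokens.append(("text", data))          # stands in for text tokenization
        tokens.append(("heading-end", None))
    elif line.startswith(("-", "=")):
        data = line.lstrip("-=")
        depth_marks = line[: len(line) - len(data)]  # depth is a string of - and =
        data = data.lstrip(WHITE_SPACE)
        tokens.append(("list-start", depth_marks))
        tokens.append(("text", data))
    return tokens
```

Note that the depth of a heading is a number while the depth of a list item is the string of - and = characters itself, which is how the steps above define them.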

4.1.3 The "preformatted" mode

In the "preformatted" mode, line MUST be processed as follows:

If line is the empty string
  1. Emit a preformatted end token.
  2. Switch to the "body" mode and reprocess line.
If line is a string consisting of a ] character, followed by a section block tag name, followed by a ] character, followed by zero or more white space characters
  1. Emit a preformatted end token.
  2. Emit a block end tag token whose tag name is the section block tag name.
  3. Set the continuous line flag to false.
  4. Switch to the "body" mode.
Otherwise
  1. Emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.

4.1.4 The "preformatted block" mode

In the "preformatted block" mode, line MUST be processed as follows:

If line is a string consisting of ]PRE] followed by zero or more white space characters
  1. Emit a block end tag token whose tag name is PRE.
  2. Set the continuous line flag to false.
  3. Switch to the "body" mode.
Otherwise
  1. If the continuous line flag is true, emit a character token whose data is a U+000A LINE FEED character.
  2. Run the algorithm to tokenize a text with line.
  3. Set the continuous line flag to true.
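
The "preformatted block" mode can be sketched in Python as a small loop over the lines that follow the PRE start tag. This is a non-normative illustration: the function name, the tuple tokens, and the ("text", line) placeholder for text tokenization are hypothetical, and the continuous line flag is assumed to be false on entry, as set by the "body" mode rule that switches to this mode.

```python
import re
from typing import List, Tuple

# ]PRE] followed by zero or more white space characters, and nothing else.
PRE_END = re.compile(r"\]PRE\][ \t]*\Z")


def tokenize_pre_block(lines: List[str]) -> Tuple[List[Tuple[str, str]], int]:
    """Return the tokens of the block and the number of lines consumed."""
    tokens: List[Tuple[str, str]] = []
    continuous = False  # the continuous line flag
    for index, line in enumerate(lines):
        if PRE_END.match(line):
            tokens.append(("block-end", "PRE"))
            return tokens, index + 1  # the caller switches back to "body" mode
        if continuous:
            tokens.append(("char", "\n"))
        tokens.append(("text", line))  # stands in for text tokenization
        continuous = True
    return tokens, len(lines)
```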

4.1.5 The "image data" mode

In the "image data" mode, line MUST be processed as follows:

  1. If the image element is null, then create an image element in the SuikaWiki/0.9 namespace and set the image element to that element. Append the image element to the document element.
  2. Otherwise, append a character U+000A LINE FEED to the image element.
  3. Then, append each character in line in the same order to the image element.

4.2 Tokenization of a table row

The algorithm to tokenize a table row data is as follows:

  1. Let pos be zero (0). It represents the index in data. The index of the first character in data is zero (0).
  2. Emit a table row start token.
  3. LOOP: If pos is greater than or equal to the length of data, emit a table row end token and abort this algorithm.
  4. Increase pos by one (1).
  5. Let cell be the empty string.
  6. Let cell quoted be null.
  7. If pos is greater than or equal to the length of data, emit a table row end token and abort this algorithm.
  8. If the posth character in data is a white space character, increase pos by one (1) and go back to the previous step.
  9. If the posth character in data is a * character, set the header cell flag and increase pos by one (1).
  10. If the posth character in data is ", set cell quoted to the empty string and follow the substeps below:
    1. Increase pos by one (1).
    2. If pos is greater than or equal to the length of data, abort these substeps.
    3. Otherwise, if the posth character in data is ", abort these substeps.
    4. Otherwise, if the posth character in data is \, follow the substeps below:
      1. Increase pos by one (1).
      2. If pos is greater than or equal to the length of data, abort these substeps.
      3. Otherwise, append the posth character in data to cell quoted.
    5. Otherwise, append the posth character in data to cell quoted.
    6. Go back to the first substep in these substeps.
  11. While pos is less than the length of data, run the following substeps:
    1. If the posth character in data is ,, abort these substeps.
    2. Append the posth character in data to cell.
    3. Increase pos by one (1).
  12. Remove white space characters at the end of cell, if any.
  13. If header cell flag is not set, cell quoted is null, and cell is equal to ==, then emit a table colspan cell token and go back to the step labeled LOOP.
  14. Emit a table cell start token whose header is whether header cell flag is set or not.
  15. If cell quoted is not null, run the algorithm to tokenize a text with cell quoted.
  16. Run the algorithm to tokenize a text with cell.
  17. Emit a table cell end token.
  18. Go back to the step labeled LOOP.
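
The table row algorithm above can be sketched in Python as follows. This is a non-normative illustration with several assumptions that go beyond the text of the steps: the function name and tuple tokens are hypothetical, the header cell flag is reset for each cell, the closing " character of a quoted cell is skipped rather than copied into cell, trailing white space is removed from cell, and a ("text", value) placeholder is emitted only for non-empty strings in place of the text tokenization algorithm.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, object]


def tokenize_table_row(data: str) -> List[Token]:
    """Tokenize a line that starts with a , character into table tokens."""
    tokens: List[Token] = [("row-start", None)]
    pos, n = 0, len(data)
    while pos < n:                          # LOOP
        pos += 1                            # skip the , that starts this cell
        cell = ""
        quoted: Optional[str] = None
        header = False                      # assumption: reset per cell
        while pos < n and data[pos] in " \t":
            pos += 1
        if pos >= n:
            break
        if data[pos] == "*":
            header = True
            pos += 1
        if pos < n and data[pos] == '"':
            quoted = ""
            pos += 1
            while pos < n and data[pos] != '"':
                if data[pos] == "\\":       # backslash escapes the next character
                    pos += 1
                    if pos >= n:
                        break
                quoted += data[pos]
                pos += 1
            if pos < n and data[pos] == '"':
                pos += 1                    # assumption: skip the closing quote
        while pos < n and data[pos] != ",":
            cell += data[pos]
            pos += 1
        cell = cell.rstrip(" \t")
        if not header and quoted is None and cell == "==":
            tokens.append(("colspan", None))
            continue
        tokens.append(("cell-start", header))
        if quoted:
            tokens.append(("text", quoted))
        if cell:
            tokens.append(("text", cell))
        tokens.append(("cell-end", None))
    tokens.append(("row-end", None))
    return tokens
```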

4.3 Tokenization of a text

The algorithm to tokenize a text data is as follows:

  1. Let nest level be zero (0).
  2. If data begins with [ followed by one or more digits followed by ], run the following steps:
    1. Let number be the digits in the matched substring.
    2. Remove the matched substring from data.
    3. Emit an element token whose local name is anchor-end, namespace is the SuikaWiki/0.9 namespace, anchor attribute is number, and content is [ followed by number followed by ].
  3. While the length of data is not zero (0), run the appropriate steps:
    If data begins with [[#, followed by one or more lowercase letters or U+002D HYPHEN-MINUS
    1. Let name be the lowercase letters and U+002D HYPHEN-MINUS in the matched substring.
    2. Remove the matched substring from data.
    3. Let id be null.
    4. Let parameters be an empty list.
    5. If data begins with a class specification, run the following substeps:
      1. Set the id to the body of the class specification.
      2. Remove the class specification from data.
    6. While the first character of data is :, run the following substeps:
      1. Remove the first character of data.
      2. If the length of data is greater than one (1) and the first two characters of data are ]], abort these substeps.
      3. Let parameter be the empty string.
      4. If data is empty, append parameter to parameters and abort these substeps.
      5. If the first character of data is ', run the following steps:
        1. Remove the first character of data.
        2. If data is empty, abort these substeps.
        3. If the first character of data is ', abort these substeps.
        4. If the first character of data is \, run the following substeps:
          1. Remove the first character of data.
          2. If data is empty, abort these substeps.
          3. Append the first character of data to parameter.
        5. Otherwise, append the first character of data to parameter.
        6. Go back to the first substep in these substeps.
      6. Otherwise, run the following steps:
        1. If data is empty, or if the first character of data is :, abort these substeps.
        2. Append the first character of data to parameter.
        3. Remove the first character of data.
        4. Go back to the first substep of these substeps.
      7. Append parameter to parameters.
    7. If the length of data is greater than one (1) and the first two characters of data are ]], remove these characters from data.
    8. Emit a form token whose name is name, id is id, and parameters is parameters.
    Otherwise, if the data begins with [[
    1. Remove the matched substring from data.
    2. Emit an inline start tag token.
    3. Increase nest level by one (1).
    If data begins with [, followed by one or more uppercase letters, optionally followed by a class specification, optionally followed by a language specification, followed by [
    1. Let tag name be the uppercase letters in the matched substring of data.
    2. Let classes be the body of the class specification in the matched substring of data, if any, or null, otherwise.
    3. Let language be the body of the language specification in the matched substring of data, if any, or null, otherwise.
    4. Remove the matched substring from data.
    5. Emit an inline start tag token whose tag name is tag name, classes is classes, and language is language.
    6. Increase nest level by one (1).
    If data begins with ]]
    1. Remove the matched substring from data.
    2. Emit an inline end tag token.
    3. If nest level is greater than zero (0), decrease nest level by one (1).
    If data begins with ]<, followed by one or more scheme characters, followed by :
    1. Remove the matched substring from data and then act as if the first two characters of the original data before the removal were < instead of ]<, except that the emitted token is an inline end tag token instead of an element token. The resScheme attribute of the token MUST be the resScheme attribute of the token that would be emitted if the first two characters were <. The resParameter attribute of the token MUST be the resParameter attribute of the token that would be emitted if the first two characters were <.
    2. If data begins with ], remove the character from data.
    3. If nest level is greater than zero (0), decrease nest level by one (1).
    If data begins with ]>> followed by one or more digits, followed by ]
    1. Let number be the digits in the matched substring.
    2. Remove the matched substring from data.
    3. Emit an inline end tag token whose anchor is number.
    4. If nest level is greater than zero (0), decrease nest level by one (1).
    If nest level is greater than zero (0) and data begins with ] followed by zero or more white space characters followed by [
    If nest level is greater than zero (0) and data begins with ] followed by zero or more white space characters followed by a language specification followed by [
    1. Let lang be the body of the language specification in the matched substring of data, if any, or null, otherwise.
    2. Remove the matched substring from data.
    3. Emit an inline middle tag token whose language is lang.
    If data begins with <, followed by one or more scheme characters, followed by :
    1. Let scheme be the scheme characters part of the matched substring.
    2. Remove the matched substring from data.
    3. Let value be the empty string.
    4. Run the following steps:
      1. If data is empty, abort these steps.
      2. If the first character of data is >, remove the first character of data and abort these steps.
      3. If the first character of data is ", append " to value and run the following substeps:
        1. Remove the first character of data.
        2. If data is empty, abort these steps.
        3. If the first character of data is ", append " to value, remove the first character of data, and abort these substeps.
        4. If the first character of data is \, run the following substeps:
          1. Append \ to value.
          2. Remove the first character of data.
          3. If data is empty, abort these steps.
          4. Append the first character of data to value.
        5. Otherwise, append the first character of data to value.
        6. Go back to the first substep of these substeps.
      4. Otherwise, run the following substeps:
        1. Append the first character of data to value.
        2. Remove the first character of data.
      5. Go back to the first step of these steps.
    5. Let content be scheme followed by : followed by value.
    6. If scheme does not contain one of uppercase letters, set value to content and set scheme to URI.
    7. Emit an element token whose local name is anchor-external, namespace is the SuikaWiki/0.9 namespace, resScheme attribute is scheme, resParameter attribute is value, and content is content.
    If data begins with '''
    1. Remove the matched substring from data.
    2. Emit a strong token.
    Otherwise, if data begins with ''
    1. Remove the matched substring from data.
    2. Emit an emphasis token.
    If data begins with >> followed by one or more digits
    1. Emit an element token whose local name is anchor-internal, namespace is the SuikaWiki/0.9 namespace, anchor attribute is the digits part of the matched substring, and content is the matched substring.
    2. Remove the matched substring from data.
    If data begins with __&&
    1. Remove the matched substring from data.
    2. If data begins with &&__, or if data does not contain &&__ as a substring:
      1. Emit four character tokens whose data are _, _, &, and &, respectively.
      2. Remove the first four characters from data.
    3. Otherwise:
      1. Let name be the substring of data, from the start of the string to the first occurrence of &&__, exclusive.
      2. Remove the substring of data, from the start of the string to the first occurrence of &&__, inclusive, from data.
      3. If Character Reference Table [name] exists:
        1. Emit a character token whose data is Character Reference Table [name].
      4. Otherwise:
        1. Emit an element token whose local name is replace, namespace is the SuikaWiki/0.9 namespace, by attribute is name.
    Otherwise
    1. Emit a character token whose data is set to the first character of data.
    2. Remove the first character of data.
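
A small subset of the text tokenization rules above (strong, emphasis, internal anchors, and the fallback character rule) can be sketched in Python as follows. This is a non-normative illustration: the function name and tuple tokens are hypothetical, "digits" is assumed to mean the ASCII digits 0 to 9, and all other rules of the algorithm (forms, tags, external anchors, entity references) are omitted. The sketch mainly shows why the ''' rule has to be tested before the '' rule.

```python
import re
from typing import List, Optional, Tuple

ANCHOR = re.compile(r">>([0-9]+)")  # assumption: ASCII digits only


def tokenize_inline_subset(data: str) -> List[Tuple[str, Optional[str]]]:
    """Tokenize data using only four of the text tokenization rules."""
    tokens: List[Tuple[str, Optional[str]]] = []
    while data:
        if data.startswith("'''"):
            # Checked first so that ''' is not read as '' followed by '.
            tokens.append(("strong", None))
            data = data[3:]
        elif data.startswith("''"):
            tokens.append(("emphasis", None))
            data = data[2:]
        else:
            match = ANCHOR.match(data)
            if match:
                # Corresponds to an anchor-internal element token.
                tokens.append(("anchor-internal", match.group(1)))
                data = data[match.end():]
            else:
                tokens.append(("char", data[0]))
                data = data[1:]
    return tokens
```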

4.4 Parsing a magic line

To parse a magic line data, the following steps MUST be used:

  1. Remove the first two characters of data. (They will be #?.)
  2. If there are one or more characters that are not white space characters at the beginning of data, run the following substeps:
    1. Let name be those characters.
    2. Let version be null.
    3. Remove those characters from data.
    4. If name contains /, set the substring after the first occurrence of the character to version. Note that version might become the empty string. Remove the / character and the substring after the character from name.
    5. Set the Name content attribute of the document element in the SuikaWiki/0.9 namespace to name.
    6. If version is not null, set the Version content attribute of the document element in the SuikaWiki/0.9 namespace to version.
  3. Run the following substeps:
    1. If data is empty, abort these substeps.
    2. If the first character of data is a white space character, remove the character from data and go back to the first substep of these substeps.
    3. Let name be the empty string.
    4. If data begins with one or more characters that are not =, set name to those characters and remove those characters from data.
    5. Let parameter be a newly created parameter element in the SuikaWiki/0.9 namespace and set the name content attribute of parameter to name.
    6. Remove the first character of data. (It will be =.)
    7. If the first character of data, if any, is ", remove that character from data.
    8. Run the following substeps:
      1. Let value be the empty string.
      2. If data is empty, or if the first character of data is ", create a value element in the SuikaWiki/0.9 namespace, set the textContent IDL attribute of the node to value, and append the node to parameter.
      3. Otherwise, if the first character of data, if any, is \, run the following substeps:
        1. Remove the first character of data. (The removed character will be \.)
        2. If the first character of data, if any, is ,, abort these substeps.
        3. Otherwise, append the first character of data, if any, to value.
      4. In any case, if the first character of data is ,, create a value element in the SuikaWiki/0.9 namespace, set the textContent IDL attribute of the node to value, append the node to parameter, and go back to the first substep of these substeps.
      5. Otherwise, append the first character of data to value.
      6. Go back to the second substep of these substeps.
    9. If the first character of data, if any, is ", remove that character from data.
    10. Append parameter to the head element.
    11. Go back to the first substep of these substeps.
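
Steps 1 and 2 above (format name and version extraction) can be sketched as follows. This is a minimal illustration, not a normative implementation: the function name split_format_name, the WHITE_SPACE constant (an assumed set of white space characters), and the sample magic line are all hypothetical. Parameter parsing (step 3) is deliberately left out.

```python
# Assumed set of white space characters; the normative definition is given
# elsewhere in this specification.
WHITE_SPACE = " \t\n\f\r"

def split_format_name(data):
    """Return (name, version, rest) for a magic line string."""
    data = data[2:]  # step 1: drop the leading "#?"
    end = 0
    while end < len(data) and data[end] not in WHITE_SPACE:
        end += 1
    if end == 0:
        # No format name is present; Name and Version are not set.
        return None, None, data
    name, rest = data[:end], data[end:]
    version = None
    if "/" in name:
        # Split at the first "/" only; version may be the empty string.
        name, version = name.split("/", 1)
    return name, version, rest
```

Under these assumptions, `split_format_name('#?SuikaWiki/0.9 a="b"')` yields the name `SuikaWiki`, the version `0.9`, and leaves ` a="b"` for the parameter steps.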

4.5 Tree construction

The tree construction stage constructs a node tree from a series of tokens emitted by the tokenization stage. The tree construction stage has two state variables: insertion mode and stack of open elements.

The insertion mode is one of "in section", "in table row", or "in paragraph". The default that MUST be used when the tree construction begins is the "in section" insertion mode. The rules for these insertion modes are described in the subsections below.

When the algorithm below says that the parser is to do something “using the rules for the m insertion mode”, the parser MUST use the rules described under the m insertion mode's section, but MUST leave the insertion mode unchanged.

The stack of open elements contains tuples of (element node, section depth, quotation depth, list depth). This stack grows downwards; the topmost entry on the stack is the first one added to the stack, and the bottommost entry of the stack is the most recently added entry in the stack. It initially contains only one tuple: (the body element, 0, 0, 0). When an entry is pushed to the stack of open elements, the items of the new tuple are set to the same values as those of the bottommost tuple unless otherwise specified.

The current element is the element node of the bottommost entry in the stack of open elements.
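
The stack of open elements described above might be modelled as in the following sketch. The names Entry and OpenElements are hypothetical, and a plain string stands in for an element node; the point is only to show how a pushed entry inherits the depths of the bottommost entry unless they are overridden.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    element: str          # stands in for an element node (assumption)
    section_depth: int
    quotation_depth: int
    list_depth: int

class OpenElements:
    def __init__(self):
        # Initially the stack holds only (the body element, 0, 0, 0).
        self.entries = [Entry("body", 0, 0, 0)]

    def push(self, element, **overrides):
        # Items not given explicitly are copied from the bottommost
        # (most recently added) entry.
        bottommost = self.entries[-1]
        self.entries.append(bottommost._replace(element=element, **overrides))

    def pop(self):
        return self.entries.pop()

    @property
    def current_element(self):
        return self.entries[-1].element
```

For example, pushing a section with `section_depth=1` and then a ul with `list_depth=1` leaves a bottommost entry of `("ul", 1, 0, 1)`.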


To process classes with element element and string classes, run these steps:

  1. Let class list be an empty list.
  2. Let itemprop list be an empty list.
  3. Let tokens be the result of splitting classes on ASCII whitespace.
  4. For each token in tokens:
    1. If token starts with #:
      1. Let id be token.
      2. Remove the first character from id.
      3. If element does not have an attribute whose namespace is null and local name is id:
        1. Set an attribute value for element with id and id.
    2. Otherwise, if token starts with .:
      1. Let itemprop be token.
      2. Remove the first character from itemprop.
      3. Append itemprop to itemprop list.
    3. Otherwise:
      1. Append token to class list.
  5. If class list is not empty:
    1. Let class be the concatenation of class list with a U+0020 SPACE character.
    2. Set an attribute value for element with class and class.
  6. If itemprop list is not empty:
    1. Let itemprop be the concatenation of itemprop list with a U+0020 SPACE character.
    2. Set an attribute value for element with itemprop and itemprop.
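
The steps to process classes can be sketched as below. This is an illustration under simplifying assumptions: the element is represented only by a dict of its null-namespace attributes, and the function name process_classes is hypothetical.

```python
import re

# ASCII whitespace: TAB, LF, FF, CR, and SPACE.
ASCII_WHITESPACE = re.compile(r"[\t\n\f\r ]+")

def process_classes(attributes, classes):
    """Apply the steps to a dict of null-namespace attributes and return it."""
    class_list, itemprop_list = [], []
    tokens = [t for t in ASCII_WHITESPACE.split(classes) if t]
    for token in tokens:
        if token.startswith("#"):
            # Only the first #token takes effect if no id attribute exists yet.
            if "id" not in attributes:
                attributes["id"] = token[1:]
        elif token.startswith("."):
            itemprop_list.append(token[1:])
        else:
            class_list.append(token)
    if class_list:
        attributes["class"] = " ".join(class_list)
    if itemprop_list:
        attributes["itemprop"] = " ".join(itemprop_list)
    return attributes
```

With the input string `note #intro .name wide #other`, this sketch sets id to `intro`, class to `note wide`, and itemprop to `name`; the second `#` token is ignored because an id attribute already exists.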

4.5.1 The "in section" insertion mode

In the "in section" insertion mode, a token MUST be processed as follows:

A heading start token
  1. If the local name of the current element is not one of body, section, and block elements, then pop the element off the stack of open elements and follow this substep again.
  2. Let current depth be the section depth of the bottommost entry in the stack of open elements.
  3. If depth of the token is less than or equal to the current depth, pop the element off the stack of open elements and go back to the first substep of these substeps.
  4. Otherwise, if depth of the token is greater than current depth + 1, create a section element in the HTML namespace, append the element created to the current element, push the element created to the stack of open elements with section depth set to current depth + 1, quotation depth set to zero (0), and list depth set to zero (0), and go back to the first substep of these substeps.
  5. Create a section element in the HTML namespace.
  6. Append the element created to the current element.
  7. Push the element created to the stack of open elements with section depth set to depth, quotation depth set to zero (0), and list depth set to zero (0).
  8. Create a h1 element in the HTML namespace.
  9. Append the element created to the current element.
  10. Push the element created to the stack of open elements.
  11. Switch to the "in paragraph" insertion mode.
A block start tag token whose tag name is not PRE
  1. If the token's tag name is TALK:
    1. If the current element's local name is not dialogue:
      1. Let element be a dialogue element in the SuikaWiki/0.9 namespace.
      2. Append element to the current element.
      3. Push element to the stack of open elements.
  2. Otherwise:
    1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  3. Let row be the table row of the Block Element Table whose tag name is the token's tag name.
  4. Let element be the result of creating an element whose namespace is row's namespace and whose local name is row's local name.
  5. Append element to the current element.
  6. Push element to the stack of open elements with section depth set to zero (0), quotation depth set to zero (0), and list depth set to zero (0).
  7. Run the steps to process classes with element and token's classes.
A block end tag token whose tag name is not PRE
  1. Let row be the table row of the Block Element Table whose tag name is the token's tag name.
  2. Let local name be row's local name.
  3. If the stack of open elements contains an element whose local name is local name, pop the current element off the stack of open elements until an element whose local name is local name has been popped from the stack of open elements.
  4. Set the continuous line to false.
A block element token
  1. Let element be the result of creating an hr element in the HTML namespace.
  2. Append element to the current element.
  3. Run the steps to process classes with element and token's classes.
A quotation start token
  1. If the local name of the current element is not one of blockquote, body, section, and block elements, then pop the element off the stack of open elements and follow this substep again.
  2. Let current depth be the quotation depth of the bottommost entry in the stack of open elements.
  3. If depth of the token is less than the current depth, pop the element off the stack of open elements and go back to the first substep of these substeps.
  4. Otherwise, if depth of the token is greater than current depth, create a blockquote element in the HTML namespace, append the element created to the current element, push the element created to the stack of open elements with section depth set to zero (0), quotation depth set to current depth + 1, and list depth set to zero (0), and go back to the first substep of these substeps.
A list start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Let current depth be the list depth of the current element.
  3. Let inserted depth be the length of depth of the token.
  4. Let local name be ul, if the last character in depth is -, or ol, otherwise.
  5. If current depth is greater than inserted depth, pop the current element off the stack of open elements and go back to the first substep of these substeps.
  6. If the list depth of the current element is equal to inserted depth and the local name of the current element is not local name, pop the current element off the stack of open elements and go back to the first substep of these substeps.
  7. If current depth is less than inserted depth, run the following substeps:
    1. Let type be the character at the index equal to current depth in depth of the token, where the index of the first character in depth is zero (0).
    2. If type is -, create a ul element in the HTML namespace.
    3. Otherwise, create an ol element in the HTML namespace.
    4. Append the element created to the current element.
    5. Push the element created to the stack of open elements, with list depth set to current depth + 1.
    6. If current depth + 1 is less than inserted depth, run the following substeps:
      1. Create a li element in the HTML namespace.
      2. Append the element created to the current element.
      3. Push the element created to the stack of open elements.
    7. Go back to the first substep for the list start token.
  8. Create a li element in the HTML namespace.
  9. Append the element created to the current element.
  10. Push the element created to the stack of open elements.
  11. Switch to the "in paragraph" insertion mode.
A labeled list start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. If the local name of the current element is dd, pop the element off the stack of open elements.
  3. If the local name of the current element is not dl, create a dl element in the HTML namespace, append the element created to the current element, and push the element created to the stack of open elements.
  4. Create a dt element in the HTML namespace.
  5. Append the element created to the current element.
  6. Push the element created to the stack of open elements.
  7. Switch to the "in paragraph" insertion mode.
A table row start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Create a table element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
  5. Create a tbody element in the HTML namespace.
  6. Append the element created to the current element.
  7. Push the element created to the stack of open elements.
  8. Create a tr element in the HTML namespace.
  9. Append the element created to the current element.
  10. Push the element created to the stack of open elements.
  11. Switch to the "in table row" insertion mode.
A block start tag token whose tag name is PRE
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Let element be the result of creating a pre element in the HTML namespace.
  3. Append element to the current element.
  4. Push element to the stack of open elements.
  5. Run the steps to process classes with element and token's classes.
  6. Switch to the "in paragraph" insertion mode.
A preformatted start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Create a pre element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
  5. Switch to the "in paragraph" insertion mode.
A comment paragraph start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Create a comment-p element in the SuikaWiki/0.10 namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
  5. Switch to the "in paragraph" insertion mode.
An editorial note start token
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. Create an ed element in the SuikaWiki/0.10 namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
  5. Switch to the "in paragraph" insertion mode.
An empty line token
  1. If the current element's local name is not one of body, section, dialogue, and block elements, then pop the element off the stack of open elements and follow this substep again.
A form token
An element token whose local name is replace
Process the token using the rules for the "in paragraph" insertion mode.
An end-of-file token
Now the Document has been constructed. Abort the parser.
Any other block start tag token
A labeled list middle token, heading end token, preformatted end token, table row end token, table cell start token, table cell end token, or table colspan cell token
Ignore the token.
Anything else
  1. If the current element's local name is dialogue, pop the current element off the stack of open elements.
  2. If the current element's local name is not one of p, li, dd, figcaption, comment-p, or ed, or if the current element's local name is figcaption or speaker and the current element has any child, run the following substeps:
    1. Create a p element in the HTML namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
  3. Switch to the "in paragraph" insertion mode and reprocess the token.

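The steps for a heading start token in the "in section" insertion mode can be sketched as follows. This is a simplified model, not a full tree builder: stack entries are hypothetical (local name, section depth) pairs, nothing is appended to a node tree, CONTAINERS omits the block elements for brevity, and the depth of the token is assumed to be at least 1.

```python
# Hypothetical stand-in: a complete implementation would also include the
# local names of the block elements here.
CONTAINERS = {"body", "section"}

def heading_start(stack, depth):
    """Update a stack of (local name, section depth) pairs for a heading."""
    while True:
        while stack[-1][0] not in CONTAINERS:      # step 1
            stack.pop()
        current_depth = stack[-1][1]               # step 2
        if depth <= current_depth:                 # step 3: close a section
            stack.pop()
        elif depth > current_depth + 1:            # step 4: fill a skipped level
            stack.append(("section", current_depth + 1))
        else:
            break
    stack.append(("section", depth))               # steps 5 to 7
    stack.append(("h1", depth))                    # steps 8 to 10
    return stack
```

Starting from `[("body", 0)]`, a heading of depth 2 opens an implied depth-1 section before the depth-2 section and its h1. A following heading of depth 1 pops back to body and opens a fresh depth-1 section.
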
4.5.2 The "in table row" insertion mode

In the "in table row" insertion mode, a token MUST be processed as follows:

A table cell start token
  1. Let local name be th if the header of the token is true, or td otherwise.
  2. Create a local name element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
  5. Switch to the "in paragraph" insertion mode.
A table colspan cell token
  1. If the local name of the node returned by the lastChild IDL attribute of the current element, if any, is td or th, increase the value of the colSpan IDL attribute of that node by one (1) and abort these substeps.
  2. Create a td element in the HTML namespace.
  3. Append the element created to the current element.
A table row end token
If the local name of the current element is tr, pop the element off the stack of open elements.
A table row start token
  1. Create a tr element in the HTML namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
Anything else
Switch to the "in section" insertion mode and reprocess the token.
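
The handling of a table colspan cell token can be sketched as below. The Cell class and the colspan_cell function are hypothetical stand-ins; the list row_children models the children of the current tr element, and col_span mirrors the colSpan value, which defaults to 1 in HTML.

```python
class Cell:
    """Minimal stand-in for a td or th element (assumption)."""
    def __init__(self, local_name):
        self.local_name = local_name
        self.col_span = 1  # default column span

def colspan_cell(row_children):
    """Process one table colspan cell token for the current row."""
    last = row_children[-1] if row_children else None
    if last is not None and last.local_name in ("td", "th"):
        # Widen the preceding cell instead of creating a new one.
        last.col_span += 1
        return
    # No preceding cell: an empty td is appended (and not pushed).
    row_children.append(Cell("td"))
```

In this model, a row holding a single th followed by two colspan cell tokens ends up with one th spanning three columns.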

4.5.3 The "in paragraph" insertion mode

In the "in paragraph" insertion mode, a token MUST be processed as follows:

A character token
Append the character in data of the token to the current element.
An inline start tag token whose tag name is null
  1. Create an anchor element in the SuikaWiki/0.9 namespace.
  2. Append the element created to the current element.
  3. Push the element created to the stack of open elements.
Any other inline start tag token
  1. Let element be the result of creating an element. The namespace and local name of element are determined according to the tag name of the inline start tag token as shown in the following table:
    Tag name Namespace Local name
    AA The AA namespace aa
    ABBR The HTML namespace abbr
    ASIS The SuikaWiki/0.9 namespace asis
    B The HTML namespace b
    BR The SuikaWiki/0.9 namespace sw-br
    CC The SuikaWiki/0.9 namespace sw-cc
    CH The SuikaWiki/0.9 namespace sw-ch
    CITE The HTML namespace cite
    CN The SuikaWiki/0.9 namespace sw-cn
    CODE The HTML namespace code
    CURSIVE The SuikaWiki/0.9 namespace sw-cursive
    CSECTION The SuikaWiki/0.10 namespace csection
    DEL The HTML namespace del
    DATA The HTML namespace data
    DFN The HTML namespace dfn
    DOTABOVE The SuikaWiki/0.9 namespace dotabove
    EMPH The SuikaWiki/0.9 namespace emph
    F The SuikaWiki/0.9 namespace f
    FENCED The SuikaWiki/0.9 namespace fenced
    FRAC The MathML namespace mfrac
    I The HTML namespace i
    INS The HTML namespace ins
    KBD The HTML namespace kbd
    KEY The SuikaWiki/0.10 namespace key
    L The SuikaWiki/0.9 namespace sw-l
    LAT The SuikaWiki/0.9 namespace lat
    LINES The SuikaWiki/0.9 namespace lines
    LON The SuikaWiki/0.9 namespace lon
    LT The SuikaWiki/0.9 namespace sw-lt
    MACRON The SuikaWiki/0.9 namespace sw-macron
    MAY The SuikaWiki/0.9 namespace MAY
    MIRRORED The SuikaWiki/0.9 namespace sw-mirrored
    MUST The SuikaWiki/0.9 namespace MUST
    N The SuikaWiki/0.9 namespace n
    OKURI The SuikaWiki/0.9 namespace okuri
    Q The HTML namespace q
    QN The SuikaWiki/0.10 namespace qn
    R The SuikaWiki/0.9 namespace sw-r
    ROOT The MathML namespace mroot
    RT The SuikaWiki/0.9 namespace sw-rt
    RUBY The HTML namespace ruby
    RUBYB The SuikaWiki/0.9 namespace rubyb
    SAMP The HTML namespace samp
    SEE The SuikaWiki/0.9 namespace sw-see
    SHOULD The SuikaWiki/0.9 namespace SHOULD
    SMALLCAPS The SuikaWiki/0.9 namespace smallcaps
    SNIP The SuikaWiki/0.9 namespace snip
    SPAN The HTML namespace span
    SQRT The MathML namespace msqrt
    SRC The SuikaWiki/0.10 namespace src
    SUB The HTML namespace sub
    SUBSUP The SuikaWiki/0.9 namespace subsup
    SUP The HTML namespace sup
    TATE The SuikaWiki/0.9 namespace sw-tate
    TIME The HTML namespace time
    TZ The SuikaWiki/0.9 namespace tz
    U The HTML namespace u
    UNDER The MathML namespace munder
    UNDEROVER The MathML namespace munderover
    V The SuikaWiki/0.9 namespace sw-v
    VAR The HTML namespace var
    VB The SuikaWiki/0.9 namespace sw-vb
    VBT The SuikaWiki/0.9 namespace sw-vbt
    VECTOR The SuikaWiki/0.9 namespace vector
    VT The SuikaWiki/0.9 namespace sw-vt
    WEAK The SuikaWiki/0.9 namespace weak
    YOKO The SuikaWiki/0.9 namespace yoko
    Anything else The SuikaWiki/0.10 namespace Same as tag name
  2. Run the steps to process classes with element and token's classes.
  3. If the token's language is not null, set the lang attribute in the XML namespace of element to the token's language.
  4. Append element to the current element.
  5. Push element to the stack of open elements.
  6. Switch by token's tag name:
    FRAC, ROOT, SQRT, UNDER, or UNDEROVER
    1. Create an mtext element in the MathML namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
    LINES
    1. Create a line element in the SuikaWiki/0.9 namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
    SUBSUP
    1. Create a subscript element in the SuikaWiki/0.9 namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
    FENCED
    1. Create an openfence element in the SuikaWiki/0.9 namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
    OKURI
    1. Create a rt element in the HTML namespace.
    2. Append the element created to the current element.
    3. Push the element created to the stack of open elements.
An inline middle tag token
  1. Let local name be title.
  2. Let namespace be the SuikaWiki/0.10 namespace.
  3. Switch by the local name of the current element:
    data
    1. Set local name to sw-value.
    2. Set namespace to the SuikaWiki/0.9 namespace.
    rt
    1. Set local name to rt.
    2. Set namespace to the HTML namespace.
    3. Pop the current element off the stack of open elements.
    title, nsuri, tz, n, lat, lon, closefence, superscript, or sw-value
    1. Set local name to attrvalue.
    2. Set namespace to the SuikaWiki/0.10 namespace.
    3. Pop the current element off the stack of open elements.
    qn
    1. Set local name to nsuri.
    2. Set namespace to the SuikaWiki/0.10 namespace.
    ruby or rubyb
    1. Set local name to rt.
    2. Set namespace to the HTML namespace.
    mtext
    1. Set local name to mtext.
    2. Set namespace to the MathML namespace.
    3. Pop the current element off the stack of open elements.
    openfence
    1. Set local name to fencedtext.
    2. Set namespace to the SuikaWiki/0.9 namespace.
    3. Pop the current element off the stack of open elements.
    fencedtext
    1. Set local name to closefence.
    2. Set namespace to the SuikaWiki/0.9 namespace.
    3. Pop the current element off the stack of open elements.
    subscript
    1. Set local name to superscript.
    2. Set namespace to the SuikaWiki/0.9 namespace.
    3. Pop the current element off the stack of open elements.
    line
    1. Set local name to line.
    2. Set namespace to the SuikaWiki/0.9 namespace.
    3. Pop the current element off the stack of open elements.
  4. Create an element whose local name is local name and whose namespace is namespace.
  5. If the token's language is not null, set the lang attribute in the XML namespace of the element created to the token's language.
  6. Append the element created to the current element.
  7. Push the element created to the stack of open elements.
An inline end tag token
  1. If the local name of the current element is one of rt, title, nsuri, mtext, line, subscript, superscript, fencedtext, openfence, closefence, sw-value, or attrvalue, pop the element off the stack of open elements.
  2. If the current element is one of structural elements, or if the local name of the current element is strong or em, run the following substeps:
    1. If both resScheme attribute and anchor attribute of the token are null, append characters ]] to the current element, push the current element to the stack of open elements, and abort these substeps.

      As a result, the bottommost and second bottommost entries become equal, but one of them is popped from the stack of open elements soon.

    2. If resScheme attribute of the token is not null, create an anchor-external element in the SuikaWiki/0.9 namespace.
    3. Otherwise, create an anchor-internal element in the SuikaWiki/0.9 namespace.
    4. Append the element created to the current element.
    5. Set the textContent IDL attribute of the element created to ]].
    6. Push the element created to the stack of open elements.
  3. If anchor attribute of the token is not null, set the anchor content attribute in the SuikaWiki/0.9 namespace of the current element to anchor attribute of the token.
  4. If resScheme attribute of the token is not null, set the resScheme content attribute in the SuikaWiki/0.9 namespace of the current element to resScheme attribute of the token.
  5. If resParameter attribute of the token is not null, set the resParameter content attribute in the SuikaWiki/0.9 namespace of the current element to resParameter attribute of the token.
  6. Pop the current element off the stack of open elements.
A strong token
  1. If the local name of the current element is strong, pop the element off the stack of open elements and abort these substeps.
  2. Create a strong element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
An emphasis token
  1. If the local name of the current element is em, pop the element off the stack of open elements and abort these substeps.
  2. Create an em element in the HTML namespace.
  3. Append the element created to the current element.
  4. Push the element created to the stack of open elements.
A form token whose name is form
  1. Create a form element in the SuikaWiki/0.9 namespace.
  2. If id of the form token is not null, set the id attribute of the element created to id of the form token.
  3. Set the input content attribute of the element created to the first item in parameters of the form token, if any, or the empty string otherwise.
  4. Set the template content attribute of the element created to the second item in parameters of the form token, if any, or the empty string otherwise.
  5. Set the option content attribute of the element created to the third item in parameters of the form token, if any, or the empty string otherwise.
  6. If parameters of the form token contains four or more items, set the parameter content attribute of the element created to the concatenation of items in parameters, separated by a : character, in the same order.
  7. Append the element created to the current element.
Any other form token
  1. Create a form element in the SuikaWiki/0.9 namespace.
  2. Set the ref content attribute of the element created to name of the form token.
  3. If the id of the form token is not null, set the id attribute of the element created to id of the form token.
  4. If parameters of form token is not empty, set the parameter content attribute of the element created to the concatenation of items in parameters, separated by a : character, in the same order. The result value might be the empty string.
  5. Append the element created to the current element.
An element token
  1. Create an element whose local name is local name of the element token and namespace is namespace of the element token.
  2. If anchor attribute of the element token is not null, set the anchor content attribute in the SuikaWiki/0.9 namespace of the element created to anchor attribute of the element token.
  3. If by attribute of the element token is not null, set the by content attribute of the element created to by attribute of the element token.
  4. If resScheme attribute of the element token is not null, set the resScheme content attribute in the SuikaWiki/0.9 namespace of the element created to resScheme attribute of the element token.
  5. If resParameter attribute of the element token is not null, set the resParameter content attribute in the SuikaWiki/0.9 namespace of the element created to resParameter attribute of the element token.
  6. If content of the element token is not null, set the textContent IDL attribute of the element created to content of the element token.
  7. Append the element created to the current element.
A labeled list middle token
  1. If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is dt, pop the element off the stack of open elements.
  3. Create a dd element in the HTML namespace.
  4. Append the element created to the current element.
  5. Push the element created to the stack of open elements.
A heading end token
  1. If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is h1, pop the element off the stack of open elements.
  3. Switch to the "in section" insertion mode.
A table cell end token
  1. If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is td or th, pop the element off the stack of open elements.
  3. Switch to the "in table row" insertion mode.
A block end tag token whose tag name is PRE
A preformatted end token
  1. If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again.
  2. If the local name of the current element is pre, pop the element off the stack of open elements.
  3. Switch to the "in section" insertion mode.
Anything else
  1. If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again.
  2. Switch to the "in section" insertion mode and reprocess the token.
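
The switch performed for an inline middle tag token can be read as a lookup table keyed by the local name of the current element. The sketch below is illustrative only: the stack holds bare local names, the namespace is returned as a short label rather than a namespace URL, and the names MIDDLE_TAG_RULES, DEFAULT_RULE, and inline_middle_tag are hypothetical. Appending to the tree and the language attribute are omitted.

```python
# Each rule: (new local name, namespace label, pop the current element first?)
MIDDLE_TAG_RULES = {
    "data": ("sw-value", "SuikaWiki/0.9", False),
    "rt": ("rt", "HTML", True),
    "qn": ("nsuri", "SuikaWiki/0.10", False),
    "ruby": ("rt", "HTML", False),
    "rubyb": ("rt", "HTML", False),
    "mtext": ("mtext", "MathML", True),
    "openfence": ("fencedtext", "SuikaWiki/0.9", True),
    "fencedtext": ("closefence", "SuikaWiki/0.9", True),
    "subscript": ("superscript", "SuikaWiki/0.9", True),
    "line": ("line", "SuikaWiki/0.9", True),
}
for _name in ("title", "nsuri", "tz", "n", "lat", "lon",
              "closefence", "superscript", "sw-value"):
    MIDDLE_TAG_RULES[_name] = ("attrvalue", "SuikaWiki/0.10", True)

# Used when the current element matches none of the cases above.
DEFAULT_RULE = ("title", "SuikaWiki/0.10", False)

def inline_middle_tag(stack):
    """stack is a list of local names; the last item is the current element."""
    local_name, namespace, pop_first = MIDDLE_TAG_RULES.get(stack[-1], DEFAULT_RULE)
    if pop_first:
        stack.pop()
    stack.append(local_name)
    return namespace
```

For instance, with subscript as the current element inside subsup, one middle tag replaces it with superscript, and a second middle tag replaces that with attrvalue.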

5 Serializing SWML text serialization documents

...

6 Element definitions for the SWML text serialization

The following ordered map known as Character Reference Table is referenced from the parser:

Key Value
# #
& &
' '
* *
- -
: :
< <
= =
> >
_ _
[ [
] ]

The following Block Element Table is referenced from the parser:

Tag name Local name Namespace Semantics (non-normative) Semantics of start tag's line contents (non-normative)
BOX box SuikaWiki/0.9 namespace A group of page content. Not allowed.
DEL delete SuikaWiki/0.9 namespace Removal. Not allowed.
EG example SuikaWiki/0.9 namespace Example. Not allowed.
FIG figure HTML namespace Figure. Figure caption.
FIGCAPTION figcaption HTML namespace Figure caption. Not allowed.
HISTORY history SuikaWiki/0.9 namespace Historical notes. Not allowed.
INS insert SuikaWiki/0.9 namespace Insertion. Not allowed.
ITEMS sw-items SuikaWiki/0.9 namespace Items. Item types.
ITEMTYPES sw-itemtypes SuikaWiki/0.9 namespace Item types. Not allowed.
LEFT sw-left SuikaWiki/0.9 namespace A left-to-right defaulted horizontal content where lines are stacked top-to-bottom. Not allowed.
LEFTBOX sw-leftbox SuikaWiki/0.9 namespace A box for left-to-right defaulted horizontal content where lines are stacked top-to-bottom. Not allowed.
LEFTBTBOX sw-leftbtbox SuikaWiki/0.9 namespace A box for left-to-right defaulted horizontal content where lines are stacked bottom-to-top. Not allowed.
NOTE note HTML3 namespace Note. Not allowed.
POSTAMBLE postamble SuikaWiki/0.9 namespace Postamble. Not allowed.
PREAMBLE preamble SuikaWiki/0.9 namespace Preamble. Not allowed.
REFS refs SuikaWiki/0.9 namespace References and quotations. Not allowed.
RIGHT sw-right SuikaWiki/0.9 namespace A right-to-left defaulted horizontal content where lines are stacked top-to-bottom. Not allowed.
RIGHTBOX sw-rightbox SuikaWiki/0.9 namespace A box for right-to-left defaulted horizontal content where lines are stacked top-to-bottom. Not allowed.
RIGHTBTBOX sw-rightbtbox SuikaWiki/0.9 namespace A box for right-to-left defaulted horizontal content where lines are stacked bottom-to-top. Not allowed.
SPEAKER speaker SuikaWiki/0.9 namespace Speaker name of a talk. Not allowed.
TALK talk SuikaWiki/0.9 namespace A single talk part in a dialogue. Speaker name of the talk.
VLR sw-vlr SuikaWiki/0.9 namespace A top-to-bottom defaulted vertical content where lines are stacked left-to-right. Not allowed.
VLRBOX sw-vlrbox SuikaWiki/0.9 namespace A box for top-to-bottom defaulted vertical content where lines are stacked left-to-right. Not allowed.
VRL sw-vrl SuikaWiki/0.9 namespace A top-to-bottom defaulted vertical content where lines are stacked right-to-left. Not allowed.
VRLBOX sw-vrlbox SuikaWiki/0.9 namespace A box for top-to-bottom defaulted vertical content where lines are stacked right-to-left. Not allowed.

A future revision to this specification might define more tag names.

The semantics of an element is formally defined in terms of corresponding DOM elements.

Block elements are elements whose local name is one of local names in the Block Element Table.

Structural elements are block elements and elements whose local name is one of body, section, blockquote, h1, ul, ol, dl, li, dt, dd, table, tbody, th, tr, td, p, comment-p, ed, box, and pre.

These definitions are referenced from the parser. The parser does not have to check elements' namespaces.
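
Because the parser only compares local names, the two definitions above can be sketched as plain sets. The constant and function names below are hypothetical, and the stack is modelled as a list of local names whose first item is body, so the loop always terminates.

```python
# Local names taken from the Block Element Table.
BLOCK_ELEMENT_LOCAL_NAMES = {
    "box", "delete", "example", "figure", "figcaption", "history", "insert",
    "sw-items", "sw-itemtypes", "sw-left", "sw-leftbox", "sw-leftbtbox", "note",
    "postamble", "preamble", "refs", "sw-right", "sw-rightbox", "sw-rightbtbox",
    "speaker", "talk", "sw-vlr", "sw-vlrbox", "sw-vrl", "sw-vrlbox",
}

# Structural elements: block elements plus the names listed explicitly.
STRUCTURAL_LOCAL_NAMES = BLOCK_ELEMENT_LOCAL_NAMES | {
    "body", "section", "blockquote", "h1", "ul", "ol", "dl", "li", "dt", "dd",
    "table", "tbody", "th", "tr", "td", "p", "comment-p", "ed", "box", "pre",
}

def pop_to_structural(stack):
    """Pop local names until the current element is a structural element."""
    while stack[-1] not in STRUCTURAL_LOCAL_NAMES:
        stack.pop()
    return stack
```

This corresponds to the recurring step "If the current element is not one of structural elements, pop the element off the stack of open elements and follow this substep again."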

7 The SWML XML serialization

A document using SWML elements and attributes can be serialized using the XML syntax.

8 SWML MIME types

An SWML MIME type is any MIME type whose essence is text/x-suikawiki or text/x.suikawiki.image.

The SWML text serialization can be identified by an SWML MIME type. However, MIME type text/x.suikawiki.image MUST NOT be used.

An implementation that recognizes any SWML MIME type MUST interpret all the SWML MIME types as equivalent.

Historically, a document whose format name is SuikaWiki was expected to be labeled as text/x-suikawiki, while a document whose format name is SuikaWikiImage was expected to be labeled as text/x.suikawiki.image.


If a string is labeled by an SWML MIME type, it MUST be an SWML text serialization document.

If a byte sequence is labeled by an SWML MIME type, it MUST be an SWML text serialization document, encoded in UTF-8.

When a string labeled by an SWML MIME type is processed, the parser MUST be used with the string as the input.

When a byte sequence labeled by an SWML MIME type is processed, the parser MUST be used with the UTF-8 decoded I/O-queue-converted byte sequence as the input.
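
The two processing requirements above can be sketched as follows. This is a minimal, non-normative Python sketch: the function name process_labeled_input and the parse callable (standing in for the parser of section 4) are hypothetical, and Python's "utf-8-sig" codec with errors="replace" is used as an approximation of UTF-8 decode (it drops a leading byte order mark and replaces malformed sequences with U+FFFD).

```python
from typing import Callable, TypeVar, Union

T = TypeVar("T")


def process_labeled_input(data: Union[str, bytes], parse: Callable[[str], T]) -> T:
    """Feed input labeled by an SWML MIME type to a parser.

    `parse` is a hypothetical stand-in for the SWML parser; it receives a string.
    """
    if isinstance(data, bytes):
        # Byte sequence: decode as UTF-8 first (approximation of "UTF-8 decode").
        text = data.decode("utf-8-sig", errors="replace")
    else:
        # String: used as the parser input as-is.
        text = data
    return parse(text)
```

The decoding step does not consult any charset parameter, since the input is always treated as UTF-8.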


An SWML MIME type MAY have a charset parameter, unless it is not allowed (e.g. when it is used to label a string). If specified, its value MUST be UTF-8, ASCII case-insensitive.

An SWML MIME type MUST NOT have any other parameter.

An implementation that recognizes any SWML MIME type MUST ignore any parameter (including the charset parameter).
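
As an illustration of the recognition rule, the following Python sketch tests whether a label is an SWML MIME type while ignoring all parameters. It is a sketch under stated assumptions, not a normative algorithm: the function name is_swml_mime_type is hypothetical, and the essence is obtained by naive splitting at the first ";" rather than by a full MIME type parser.

```python
SWML_ESSENCES = {"text/x-suikawiki", "text/x.suikawiki.image"}


def is_swml_mime_type(mime_type: str) -> bool:
    """Return True if the essence of `mime_type` is one of the SWML essences."""
    # Everything after the first ";" is parameters (charset, version, ...),
    # which are ignored for recognition purposes.
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence in SWML_ESSENCES
```

Both essences return the same answer, which reflects the requirement that all SWML MIME types be interpreted as equivalent.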

Historically, there was the version parameter whose value is 0.9 or 0.10. It was used to encode format version in the magic line.

fragment identifier


The SWML XML serialization has no dedicated MIME type. It can be identified by any XML MIME type.

9 Semantics of elements and attributes

This specification is the specification for the SuikaWiki/0.9 namespace and the SuikaWiki/0.10 namespace. Anything belonging to those namespaces is defined in this specification.

Elements and attributes in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, as well as attributes in no namespace for elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, MUST NOT be used in context where they are not allowed explicitly.

Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace defined in this specification MUST conform to their content model.

Inter-element whitespace, comment nodes, and processing instruction nodes MUST be ignored when establishing whether an element matches its content model or not.

Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace MAY be orphan nodes (i.e. without a parent node).

In the following subsections, attributes listed in the allowed attributes entry MAY be set to an element described in that subsection.

Some elements belong to categories such as flow content and phrasing content.

9.1 Document structures

9.1.1 The document element in the SuikaWiki/0.9 namespace

Category
None.
Content model
A head element in the XHTML2 namespace, followed by a body element in the XHTML2 namespace, optionally followed by an image element in the SuikaWiki/0.9 namespace.
Allowed attributes
None.

This element MUST NOT be used.

...

9.1.2 The Name attribute in the SuikaWiki/0.9 namespace

This attribute MUST NOT be used.

...

9.1.3 The Version attribute in the SuikaWiki/0.9 namespace

This attribute MUST NOT be used.

...

9.1.4 The parameter element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Zero or more value elements in the SuikaWiki/0.9 namespace.
Allowed attributes
name

This element MUST NOT be used.

... name

9.1.5 The value element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used.

...

9.1.6 The class attribute

The class attribute of an element in the SuikaWiki/0.9 namespace or in the SuikaWiki/0.10 namespace has the same semantics and requirements as the class attribute of the HTML elements.

Unless otherwise specified by other applicable specification, the class attribute of an element in the AA namespace, in the HTML3 namespace, or in the XHTML2 namespace has the same semantics and requirements as the class attribute of the HTML elements.

9.1.7 The id attribute

The id attribute of an element in the SuikaWiki/0.9 namespace or in the SuikaWiki/0.10 namespace has the same semantics and requirements as the id attribute of the HTML elements.

Unless otherwise specified by other applicable specification, the id attribute of an element in the AA namespace, in the HTML3 namespace, or in the XHTML2 namespace has the same semantics and requirements as the id attribute of the HTML elements.

9.1.8 The itemprop attribute

The itemprop attribute of an element in the SuikaWiki/0.9 namespace or in the SuikaWiki/0.10 namespace has the same semantics and requirements as the itemprop attribute of the HTML elements.

Unless otherwise specified by other applicable specification, the itemprop attribute of an element in the AA namespace, in the HTML3 namespace, or in the XHTML2 namespace has the same semantics and requirements as the itemprop attribute of the HTML elements.

9.1.9 The xml:lang attribute

The lang attribute in the XML namespace (xml:lang) MAY be set to any element in the SuikaWiki/0.9 namespace or in the SuikaWiki/0.10 namespace.

Unless otherwise specified by other applicable specification, the lang attribute in the XML namespace MAY be set to any element in the AA namespace, in the HTML3 namespace, or in the XHTML2 namespace.


The space attribute in the XML namespace (xml:space) has no effect for the elements in the SuikaWiki/0.9 namespace or in the SuikaWiki/0.10 namespace and MUST NOT be used.

Unless otherwise specified by other applicable specification, the space attribute in the XML namespace has no effect for the elements in the AA namespace, in the HTML3 namespace, or in the XHTML2 namespace and MUST NOT be used.

The xml:base attribute and the xml:id attribute MUST NOT be used or supported.

9.2 Blocks

9.2.1 The dr element in the SuikaWiki/0.9 namespace

Category
None.
Content model
A dt element in the XHTML2 namespace, followed by a dd element in the XHTML2 namespace.
Allowed attributes
None.

The dr element is semantically equivalent to the HTML div element that is a child of the HTML dl element.

This element MUST NOT be used.

9.2.2 The comment-p element in the SuikaWiki/0.10 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
None.

The comment-p element represents a note.

Historically, the p suffix in the element name implied that it represented a paragraph. As its content is any flow content, it now can contain any number of paragraphs.

9.2.3 The history element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The history element represents a description of history or out-of-date content.

9.2.4 The example element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The example element represents an example.

9.2.5 The preamble element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The preamble element represents a preamble or preface.

9.2.6 The postamble element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The postamble element represents a postamble.

9.2.7 The box element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The box element represents a physically grouped chunk of contents.

The class attribute and CSS can be used to specify presentations of the element.

This element can be used to mark up a chunk of text quoted from other materials when it is not apparently mapped into any other element's semantics, e.g. indented text, center-aligned text, right-aligned text, and vertical text.

9.2.8 The sw-items element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
An optional sw-itemtypes element followed by flow content, optionally intermixed with script-supporting elements.
Allowed attributes
class

The sw-items element represents a set of items.

Any HTML li element descendant of an sw-items element is considered as an item (i.e. an itemscope attribute is implied for the purpose of element semantics) whose item types is its nearest ancestor sw-items element's item item types.

Any HTML li element descendant of HTML li element descendant of an sw-items element is considered as having a property name child (i.e. an itemprop=child attribute is implied for the purpose of element semantics).

The item item types of an sw-items element element is the result of the following steps:

  1. Let types be an empty list.
  2. Let container be the first sw-itemtypes element child of element, if any, or null otherwise.
  3. If container is null, return types.
  4. Let anchors be the list of elements with the SuikaWiki/0.9 namespace and the local name anchor.
  5. For each anchor in anchors:
    1. Append anchor's destination to types.
  6. Return types.
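
The steps above can be sketched in Python with xml.etree.ElementTree. This is a non-normative sketch under several assumptions: the namespace URL in SW09 is a placeholder (the real URLs are those defined in section 2.1), the function name item_item_types is hypothetical, the anchors of step 4 are taken to be the anchor descendants of container (the steps do not say where the list is collected from), and the computation of an anchor's destination is passed in as a callable.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Placeholder namespace URL for illustration only (assumption).
SW09 = "urn:example:suikawiki-0.9"


def item_item_types(element: ET.Element,
                    destination: Callable[[ET.Element], str]) -> List[str]:
    """Collect the item types declared by an sw-items element."""
    types: List[str] = []                                  # step 1
    # Step 2: find() with a plain tag matches the first direct child only.
    container = element.find(f"{{{SW09}}}sw-itemtypes")
    if container is None:                                  # step 3
        return types
    # Step 4 (assumption: anchors are descendants of container).
    anchors = container.iter(f"{{{SW09}}}anchor")
    for anchor in anchors:                                 # step 5
        types.append(destination(anchor))
    return types                                           # step 6
```

For example, a caller could pass `lambda a: "".join(a.itertext())` as the destination callable when no title children are involved.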

The item ID of an HTML li element descendant of an sw-items element element is the result of the following steps:

  1. Let end be the first anchor-end element child of element, if any, or null otherwise.
  2. If end is null, return null.
  3. Return a URL that is a unique identifier that is associated with end.

For the purpose of the conversion from an sw-items element to a Microdata subtree, the item ID of an HTML li element can be used as the itemid attribute value.

9.2.9 The sw-itemtypes element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The sw-itemtypes element represents a set of item types.

9.3 Dialogues

9.3.1 The dialogue element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Zero or more talk elements, optionally intermixed with script-supporting elements.
Allowed attributes
None.

The dialogue element represents a conversation between one or more persons.

Each piece of the conversation is represented by child talk elements.

A dialogue element SHOULD have at least one talk element child.

9.3.2 The talk element in the SuikaWiki/0.9 namespace

Category
None.
Content model
One speaker element followed by flow content, optionally intermixed with script-supporting elements.
Allowed attributes
class

The talk element represents a group of sentences by a person (or a specific group of persons) in the dialogue.

The speaker of a talk element is the first speaker child element of the element, if any, or null. If the speaker is not null, it describes the speaker(s) of the talk. Otherwise, the speaker is not explicitly described.

Interviewer's questions are often identified by lack of explicit speaker name.

The class attribute can be used to style talks in a dialogue based on their speakers (e.g. use different colors for different speakers).

9.3.3 The speaker element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The speaker element represents a short string used to credit the person (or a group of persons) of the piece of the conversation.

It can also contain metadata other than the person's name, such as the affiliation of the person or the timestamp of the talk, if desired.

9.4 Hyperlinks

Some of the elements defined by this specification or used in SWML documents are considered as implicit link elements. Elements abbr, cite, code, and kbd in the HTML namespace are implicit link elements.

9.4.1 The anchor element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content, optionally followed by a title element, optionally intermixed with script-supporting elements.
Allowed attributes
anchor in the SuikaWiki/0.9 namespace

An anchor element represents a source anchor of hyperlink.

The hyperlink has a destination, which is identified in the application-dependent manner by the element's destination. The destination of an anchor element element is the result of the following steps:

  1. Let title be the first title element child, if any, or null otherwise.
  2. If title is null, return element's text content.
  3. Otherwise, return title's text content.
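
The destination steps can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the namespace URL in SW10 is a placeholder, the function name anchor_destination is hypothetical, and the title child is assumed to be the title element in the SuikaWiki/0.10 namespace described in section 9.10.5.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URL for illustration only (assumption).
SW10 = "urn:example:suikawiki-0.10"


def anchor_destination(element: ET.Element) -> str:
    """Return the destination string of an anchor element."""
    # Step 1: first title child (find() only looks at direct children).
    title = element.find(f"{{{SW10}}}title")
    if title is None:
        # Step 2: no title, so the anchor's own text content is used.
        return "".join(element.itertext())
    # Step 3: the title's text content overrides the visible text.
    return "".join(title.itertext())
```

How the returned string is mapped to an actual page or URL is application dependent and is outside the scope of this sketch.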

9.4.2 The anchor-internal element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
anchor in the SuikaWiki/0.9 namespace

...

9.4.3 The anchor-end element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
anchor in the SuikaWiki/0.9 namespace

...

9.4.4 The anchor attribute in the SuikaWiki/0.9 namespace

The anchor attribute in the SuikaWiki/0.9 namespace, when set to an anchor-end element, defines an anchor number for the parent element of the anchor-end element, if any.

The attribute MUST be present and its value MUST be a valid integer. The integer MUST have different value from any other anchor attribute in the SuikaWiki/0.9 namespace of an anchor-end element in the SuikaWiki/0.9 namespace that belongs to the same tree as the first attribute.
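
A conformance check for these requirements might look like the following Python sketch. It is illustrative only: the namespace URL in SW09 is a placeholder, the function name anchor_end_errors and its message strings are hypothetical, and "valid integer" is assumed to mean an optional "-" followed by one or more ASCII digits.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Set

# Placeholder namespace URL for illustration only (assumption).
SW09 = "urn:example:suikawiki-0.9"

# Assumed shape of a valid integer: optional minus sign, then ASCII digits.
VALID_INTEGER = re.compile(r"-?[0-9]+\Z")


def anchor_end_errors(root: ET.Element) -> List[str]:
    """Report anchor-end elements whose anchor attribute is missing,
    not a valid integer, or not unique within the tree rooted at `root`."""
    seen: Set[int] = set()
    errors: List[str] = []
    for end in root.iter(f"{{{SW09}}}anchor-end"):
        value = end.get(f"{{{SW09}}}anchor")
        if value is None:
            errors.append("missing")
        elif not VALID_INTEGER.match(value):
            errors.append(f"invalid: {value}")
        elif int(value) in seen:
            errors.append(f"duplicate: {value}")
        else:
            seen.add(int(value))
    return errors
```

Values are compared as integers rather than as strings, so "1" and "01" count as the same anchor number in this sketch.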


The anchor attribute in the SuikaWiki/0.9 namespace MAY be set to q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be present to anchor and anchor-internal elements in the SuikaWiki/0.9 namespace.

In these cases, the attribute represents the anchor number of the element referenced. If the element on which the attribute is found is an anchor element, the element referenced might be found in the document referenced by the element. Otherwise, the element is in the tree the element belongs to.

If the element on which the attribute is found is not an anchor or anchor-internal element, the attribute has similar semantics to that of the cite attribute on the element. In such cases, the anchor attribute in the SuikaWiki/0.9 namespace MUST NOT be present when there is a cite attribute. A user agent MUST ignore the anchor attribute in the SuikaWiki/0.9 namespace if there is a cite attribute.

The attribute value MUST be a valid integer. Unless the element is anchor, the integer MUST be equal to one of the integers represented by the anchor attribute in the SuikaWiki/0.9 namespace set to an anchor-internal element in the SuikaWiki/0.9 namespace that belongs to the same tree.

9.4.5 The anchor-external element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
resParameter in the SuikaWiki/0.9 namespace
resScheme in the SuikaWiki/0.9 namespace

...

9.4.6 The resScheme attribute in the SuikaWiki/0.9 namespace

The resScheme attribute in the SuikaWiki/0.9 namespace MAY be used for q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be used for an anchor-external element in the SuikaWiki/0.9 namespace.

...

9.4.7 The resParameter attribute in the SuikaWiki/0.9 namespace

The resParameter attribute in the SuikaWiki/0.9 namespace MAY be used for q elements in the HTML namespace and in the XHTML2 namespace, as well as ins and del elements in the HTML namespace. The attribute can also be used for an anchor-external element in the SuikaWiki/0.9 namespace.

...

The element's resScheme of a resParameter attribute in the SuikaWiki/0.9 namespace attr is the result of the following steps:

  1. If attr's element is null, return the empty string.
  2. Return the result of getting attr's element's resScheme attribute in the SuikaWiki/0.9 namespace.

If the element's resScheme is MAIL, the attribute value MUST be a valid e-mail address.

If the element's resScheme is M, IMG, URL, or URI, the attribute value MUST be a valid URL string.
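
The two value requirements above amount to a dispatch on the element's resScheme. The Python sketch below only selects which kind of validation applies; it does not validate e-mail addresses or URL strings itself. The function name required_value_kind and the returned labels are hypothetical, and the scheme names are assumed to be compared as exact, case-sensitive strings.

```python
from typing import Optional

# Schemes whose resParameter value has to be a valid URL string.
URL_SCHEMES = {"M", "IMG", "URL", "URI"}


def required_value_kind(res_scheme: str) -> Optional[str]:
    """Return "email", "url", or None (no requirement stated) for a resScheme."""
    if res_scheme == "MAIL":
        return "email"
    if res_scheme in URL_SCHEMES:
        return "url"
    # Includes the empty string returned when the attribute has no element.
    return None
```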

9.5 Embedded objects

Unless otherwise specified by other applicable specification, the aa element is a phrasing content and its content model is phrasing content.

9.5.1 The form element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Nothing.
Allowed attributes
id
input
option
parameter
ref
template

... ref, parameter.


... input, template, option

9.5.2 The image element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used. It was used to embed an image data associated with the document.

An image element element in the SuikaWiki/0.9 namespace represents the result of the following steps:

  1. Let encoded be element's text content.
  2. Let decoded be the result of decoding encoded.
  3. If decoded is failure, return nothing.
  4. If decoded is not a valid PNG or JFIF image data, return nothing.
  5. Return an image represented by decoded.
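
The steps above can be sketched in Python as follows. This is a hedged sketch, not a definitive implementation: the function name decode_image_data is hypothetical, "decoding" is assumed to mean Base64 decoding after whitespace removal, and the PNG/JFIF check only inspects the leading signature bytes, which is far weaker than full image validation.

```python
import base64
import binascii
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG file signature
JPEG_SOI = b"\xff\xd8\xff"             # JPEG start-of-image marker prefix


def decode_image_data(encoded: str) -> Optional[bytes]:
    """Return the decoded image bytes, or None where the steps return nothing."""
    try:
        # Steps 1-2: drop whitespace (line breaks are common), then decode.
        compact = "".join(encoded.split())
        decoded = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None                     # step 3: decoding failed
    # Step 4 (approximation): accept only data that starts like PNG or JPEG.
    if decoded.startswith(PNG_SIGNATURE) or decoded.startswith(JPEG_SOI):
        return decoded                  # step 5
    return None
```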

Historically, at most one image element was allowed to be inserted after the body element in the HTML namespace. Its content has to be a Base64-encoded PNG or JPEG (JFIF) image data.

9.5.3 The replace element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Nothing.
Allowed attributes
by

This element MUST NOT be used.

... by

9.5.4 The text element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Text.
Allowed attributes
None.

This element MUST NOT be used.

...

9.6 Citations

9.6.1 The csection element in the SuikaWiki/0.10 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

...

9.6.2 The src element in the SuikaWiki/0.10 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

...

9.6.3 The refs element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The refs element represents a list of referenced documents.

9.6.4 The sw-see element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-see element represents a text that contains references to other relevant parts (typically other pages in the same Wiki or other pieces in the same document) associated with the surrounding texts.

9.7 Writing directions

A node's inline direction mode is either horizontal mode or vertical mode. Unless otherwise specified, the node's inline direction mode is the same as the node's parent's inline direction mode if the node's parent is not null, or horizontal mode otherwise.

In horizontal mode, characters are intended to be stacked horizontally, left-to-right or right-to-left. Vertical glyphs are not used.

In vertical mode, characters are intended to be stacked vertically, top-to-bottom or bottom-to-top. Vertical glyphs are used.

Vertical glyphs are glyphs used in vertical writings. For Unicode characters, they are described by the Vertical_Orientation property of the code points.

When an element is defined as turned, it is rotated 180 degrees. For example, a left-to-right text becomes a right-to-left text with 180-degree rotated glyphs, a top-to-bottom text becomes a bottom-to-top text with 180-degree rotated glyphs, and an underline becomes an overline.

When an element is defined as mirrored, it is reversed at the horizontal center line in horizontal mode or at the vertical center line in vertical mode. For example, a left-to-right text becomes a right-to-left text with horizontally mirrored glyphs and a top-to-bottom text becomes a bottom-to-top text with vertically mirrored glyphs.

A node has horizontal orientation and vertical orientation. Unless otherwise specified, a horizontal mode node's horizontal orientation is its natural orientation and vertical orientation is its 90-degree clockwise rotation (that is, the left of the context becomes the bottom for the node's content). Unless otherwise specified, a vertical mode node's vertical orientation is its natural orientation and the node's horizontal orientation is its 270-degree clockwise rotation (that is, the right of the context becomes the top for the node's content). These rotations are applied after any turning.

For a horizontal mode node, its children represent their horizontal orientation. For a vertical mode node, its children represent their vertical orientation.


A directional group is a sequence of zero or more flow content with a specific set of inline-direction/block-direction. It is expected to be rendered in a way that is appropriate for the directions. For example, a right-to-left/top-to-bottom directional group is styled such that texts are right-aligned and blocks are top-aligned.

A directional box is a directional group. A directional box's horizontal orientation and vertical orientation are its natural orientation.

9.7.1 The sw-l element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-l element represents a left-to-right, horizontal mode text.

9.7.2 The sw-lt element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-lt element represents a turned left-to-right, horizontal mode text.

9.7.3 The sw-r element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-r element represents a right-to-left, horizontal mode text.

9.7.4 The sw-rt element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-rt element represents a turned right-to-left, horizontal mode text.

9.7.5 The sw-v element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-v element represents a top-to-bottom, vertical mode text.

9.7.6 The sw-vt element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vt element represents a turned top-to-bottom, vertical mode text.

9.7.7 The sw-vb element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vb element represents a bottom-to-top, vertical mode text.

9.7.8 The sw-vbt element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vbt element represents a turned bottom-to-top, vertical mode text.

9.7.9 The sw-tate element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-tate element represents a vertical mode text embedded in horizontal mode.

The sw-tate element's horizontal orientation and vertical orientation are its natural orientation.

The sw-tate element MUST NOT be used in vertical mode.

9.7.10 The yoko element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The yoko element represents a horizontal mode text embedded in vertical mode.

The yoko element's horizontal orientation and vertical orientation are its natural orientation.

The yoko element MUST NOT be used in horizontal mode.

The element can be used to mark up a 縦中横 text span or an upright Latin letter in a vertical text.

9.7.11 The sw-mirrored element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-mirrored element represents a mirrored text.

9.7.12 The sw-left element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-left element represents a left-to-right/top-to-bottom directional group in horizontal mode.

9.7.13 The sw-right element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-right element represents a right-to-left/top-to-bottom directional group in horizontal mode.

9.7.14 The sw-vlr element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vlr element represents a top-to-bottom/left-to-right directional group in vertical mode.

9.7.15 The sw-vrl element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vrl element represents a top-to-bottom/right-to-left directional group in vertical mode.

9.7.16 The sw-leftbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-leftbox element represents a left-to-right/top-to-bottom directional box in horizontal mode.

9.7.17 The sw-rightbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-rightbox element represents a right-to-left/top-to-bottom directional box in horizontal mode.

9.7.18 The sw-leftbtbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-leftbtbox element represents a left-to-right/bottom-to-top directional box in vertical mode.

Though it is desired that the element be in horizontal mode, it is defined as in vertical mode, such that the bottom-to-top block direction can be emulated by CSS 'writing-mode: vertical-lr' with 270-degree rotation.

9.7.19 The sw-rightbtbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-rightbtbox element represents a right-to-left/bottom-to-top directional box in vertical mode.

Though it is desired that the element be in horizontal mode, it is defined as in vertical mode, such that the bottom-to-top block direction can be emulated by CSS 'writing-mode: vertical-lr' with 270-degree rotation.

9.7.20 The sw-vlrbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vlrbox element represents a top-to-bottom/left-to-right directional box in vertical mode.

9.7.21 The sw-vrlbox element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-vrlbox element represents a top-to-bottom/right-to-left directional box in vertical mode.
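
The inheritance rule for the inline direction mode, combined with the modes stated in sections 9.7.1 to 9.7.21, can be sketched in Python as follows. This is an illustrative sketch only: the namespace URL in SW09 is a placeholder, the function name inline_direction_modes is hypothetical, and the two sets are a reading of the element definitions in this section (sw-mirrored is in neither set and therefore inherits its parent's mode).

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Placeholder namespace URL for illustration only (assumption).
SW09 = "urn:example:suikawiki-0.9"

HORIZONTAL = {"sw-l", "sw-lt", "sw-r", "sw-rt", "yoko",
              "sw-left", "sw-right", "sw-leftbox", "sw-rightbox"}
VERTICAL = {"sw-v", "sw-vt", "sw-vb", "sw-vbt", "sw-tate",
            "sw-vlr", "sw-vrl", "sw-leftbtbox", "sw-rightbtbox",
            "sw-vlrbox", "sw-vrlbox"}


def inline_direction_modes(root: ET.Element) -> Dict[ET.Element, str]:
    """Map every element under `root` to "horizontal" or "vertical"."""
    modes: Dict[ET.Element, str] = {}

    def visit(node: ET.Element, inherited: str) -> None:
        mode = inherited                       # default: same as the parent
        prefix = f"{{{SW09}}}"
        if node.tag.startswith(prefix):
            local = node.tag[len(prefix):]
            if local in HORIZONTAL:
                mode = "horizontal"
            elif local in VERTICAL:
                mode = "vertical"
        modes[node] = mode
        for child in node:
            visit(child, mode)

    visit(root, "horizontal")                  # a parentless node is horizontal
    return modes
```

Note that sw-leftbtbox and sw-rightbtbox are listed as vertical, following their definitions above.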

9.8 Inline structures

9.8.1 The fenced element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
An openfence element, followed by a fencedtext element, followed by an optional closefence element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The fenced element represents a chunk of content with parentheses enclosing it.

The fenced element's open fence is the first child openfence element, if any, or null instead.

The fenced element's fenced content is the first child fencedtext element, if any, after any open fence of the element.

The fenced element's close fence is the first child closefence element, if any, after any fenced content of the element.

Any other child represents nothing.
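
The selection of the open fence, fenced content, and close fence can be sketched in Python as follows. This is one possible reading, not a normative algorithm: the namespace URL in SW09 is a placeholder, the function names are hypothetical, and when an earlier part is absent the search for the next part is assumed to start from the first child.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Placeholder namespace URL for illustration only (assumption).
SW09 = "urn:example:suikawiki-0.9"

Part = Optional[ET.Element]


def _first_index(children: List[ET.Element], local_name: str,
                 start: int) -> Optional[int]:
    """Index of the first child at or after `start` with the given local name."""
    tag = f"{{{SW09}}}{local_name}"
    for i in range(start, len(children)):
        if children[i].tag == tag:
            return i
    return None


def fenced_parts(fenced: ET.Element) -> Tuple[Part, Part, Part]:
    """Return (open fence, fenced content, close fence); each may be None."""
    children = list(fenced)
    o = _first_index(children, "openfence", 0)
    # Fenced content: first fencedtext child after the open fence, if any.
    t = _first_index(children, "fencedtext", 0 if o is None else o + 1)
    # Close fence: first closefence child after the fenced content, if any.
    c = _first_index(children, "closefence", 0 if t is None else t + 1)

    def pick(i: Optional[int]) -> Part:
        return None if i is None else children[i]

    return pick(o), pick(t), pick(c)
```

Children that are not selected by these three lookups are simply not returned, matching the rule that any other child represents nothing.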

The element's non-null open fence or close fence's children MAY be empty. This is semantically equivalent to the case where the element's open fence or close fence, respectively, is null.

It is expected that the open fence and close fence, if any, are rendered by the same line height as the maximum of the height of the fenced content and the height of the line box usually generated by the open fence or close fence.

9.8.2 The openfence element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Nothing.
Phrasing content.
Allowed attributes
class

The openfence element represents the parenthesis before the fenced content if it is an open fence. Otherwise, it represents its children.

9.8.3 The fencedtext element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The fencedtext element represents the content enclosed by parentheses if it is a fenced content. Otherwise, it represents its children.

9.8.4 The closefence element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Nothing.
Phrasing content.
Allowed attributes
class

The closefence element represents the parenthesis after the fenced content if it is a close fence. Otherwise, it represents its children.

9.8.5 The lines element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Two or more line elements, optionally intermixed with script-supporting elements.
Allowed attributes
class

The lines element represents zero or more spans of parallel contents embedded within a line.

9.8.6 The line element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The line element represents a piece of content in the parent lines element, if any. Otherwise, it represents its content.

9.9 Editorial annotations

9.9.1 The insert element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The insert element is equivalent to the ins element.

9.9.2 The delete element in the SuikaWiki/0.9 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
class

The delete element is equivalent to the del element.

9.9.3 The ed element in the SuikaWiki/0.10 namespace

Category
Flow content.
Content model
Flow content.
Allowed attributes
None.

...

9.9.4 The asis element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content, optionally followed by a title element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The asis element represents a chunk of content with an annotation that the content is repeated from the source material as-is. This annotation is typically used within a quotation to describe that the content has an error that occurs in the original text (and is not produced by the author of the document).

If there is a last child element that is a title element, it is the annotation (and is not part of the annotated content). A user agent is expected to show the annotation as such. If there is no title element child, a user-agent dependent annotation SHOULD be used instead.

Typically a short text such as “(sic)” or “ママ” is used as the annotation.

9.9.5 The snip element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Nothing.
Phrasing content.
Allowed attributes
class

The snip element represents the point where some content in the source material is omitted. It is typically used within a quotation.

If the element's children are not empty, they represent the annotation placed instead of the original text.

It is expected that the element's children are rendered. If the element has no children, a user-agent dependent annotation SHOULD be used instead.

Typically, a short text such as “...” or “(中略)” is used as the annotation.

9.10 Inline annotations

9.10.1 The emph element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The emph element is equivalent to the em element.

The class attribute and CSS can be used to specify presentations of emphasized texts, such as italic and various kinds of annotation marks.

9.10.2 The rubyb element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content, followed by an rt element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The rubyb element is equivalent to the ruby element except that it only contains the base text (phrasing content) and the secondary ruby text (an rt element).

It is expected that the first rt element child, if any, is rendered after the other children before it.

9.10.3 The okuri element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
An rt element, optionally followed by an rt element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The okuri element represents okuri-ganas (送り仮名) annotated after the base character in Kambun (漢文).

Its first rt element child, if any, represents the primary ruby text. Its second rt element child, if any, represents the secondary ruby text. The base character is the character just before the okuri element (ignoring out-of-flow content such as other ruby texts).

It is expected that the primary ruby text, if any, is rendered at the right bottom of the base character and the secondary ruby text, if any, is rendered at the left bottom of the base character, in a vertical text. Note that the base character might have its own ruby texts (i.e. it is part of a ruby element).

9.10.4 The weak element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

...

9.10.5 The title element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

The title element represents a text that would be specified to the title attribute of the parent element, if applicable. Otherwise, it represents nothing.

A title element MAY be the last non-script-supporting child element of an HTML abbr, dfn, or span element parent when there is no title attribute in parent.

The title element MUST NOT be used when it represents nothing.

9.11 Mathematical representations

9.11.1 The dotabove element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content
Allowed attributes
class

The dotabove element represents content with a dot above it. This is used in mathematical expressions, such as recurring decimals.

It is expected that a dot symbol is rendered above the element's children.

9.11.2 The sw-macron element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content
Allowed attributes
class

The sw-macron element represents content with an overline (macron) above it. This is used in mathematical expressions, such as sample mean variables.

It is expected that an overline is rendered above the element's children.

9.11.3 The vector element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content
Allowed attributes
class

The vector element represents content with a right-pointing arrow above it. This is used in mathematical expressions to denote a vector.

It is expected that a right-pointing arrow is rendered above the element's children.

This element SHOULD NOT be used for other possible presentations of vectors, such as bold face without arrow.

9.11.4 The subsup element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
A subscript element, followed by a superscript element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The subsup element represents a pair of subscript and superscript chunks of contents.

The subscript content of a subsup element is the first subscript element child, if any.

The superscript content of a subsup element is the first superscript element child, if any, that comes after the subscript content, if there is one.

Any other child represents nothing.

It is expected that the subscript content is rendered as subscript and the superscript content is rendered as superscript sharing the same horizontal spaces.
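The selection rules above can be sketched in Python. This is only an illustration, not part of the specification: it assumes the document has been parsed with xml.etree.ElementTree, and the namespace URL constant SW09_NS and the function name subsup_contents are assumptions made for this example (see section 2.1 for the actual namespace URL).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URL for SuikaWiki/0.9; see section 2.1 for the real value.
SW09_NS = "urn:x-suika-fam-cx:markup:suikawiki:0:9:"
SUBSCRIPT = f"{{{SW09_NS}}}subscript"
SUPERSCRIPT = f"{{{SW09_NS}}}superscript"


def subsup_contents(subsup):
    """Return (subscript content, superscript content); either may be None."""
    children = list(subsup)
    # The subscript content is the first subscript element child, if any.
    sub_index = next(
        (i for i, child in enumerate(children) if child.tag == SUBSCRIPT), None
    )
    # The superscript content is searched only after the subscript content,
    # or among all children when there is no subscript content.
    start = 0 if sub_index is None else sub_index + 1
    superscript = next(
        (child for child in children[start:] if child.tag == SUPERSCRIPT), None
    )
    subscript = None if sub_index is None else children[sub_index]
    return subscript, superscript
```

Under this reading, a superscript element that precedes the subscript content is one of the "other" children and represents nothing.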

9.11.5 The subscript element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The subscript element is semantically equivalent to the sub element.

9.11.6 The superscript element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The superscript element is semantically equivalent to the sup element.

9.12 Values

Some elements are defined as elements with value. For an element with value, referred to below as element, the element value is the value returned by the following steps:

  1. If element has a child attrvalue element:
    1. Let value element be the first attrvalue element child of element.
    2. Return the text content of value element.
  2. Otherwise, return the text content of element.
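As a rough illustration, the steps can be written in Python on top of xml.etree.ElementTree. This sketch assumes that step 1 is meant to return the text content of the attrvalue child (the value element). The namespace URL constant SW10_NS and the function name element_value are assumptions made for this example (see section 2.1 for the actual namespace URL).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URL for SuikaWiki/0.10; see section 2.1 for the real value.
SW10_NS = "urn:x-suika-fam-cx:markup:suikawiki:0:10:"
ATTRVALUE = f"{{{SW10_NS}}}attrvalue"


def text_content(node):
    """Concatenate all descendant text, like the DOM textContent attribute."""
    return "".join(node.itertext())


def element_value(element):
    """Return the element value of an element with value."""
    # Step 1: use the first attrvalue element child, if there is one.
    for child in element:
        if child.tag == ATTRVALUE:
            return text_content(child)
    # Step 2: otherwise, use the text content of the element itself.
    return text_content(element)
```

With this sketch, an n element whose visible text is "about three" and whose attrvalue child contains "3" has the element value "3".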

9.12.1 The f element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Content model
Phrasing content.
Allowed attributes
class

The f element represents a field name or key of some structure, such as a field name of a C data structure, a key of a Perl hash, or a property name of an XML information item.

9.12.2 The key element in the SuikaWiki/0.10 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Content model
Phrasing content.
Allowed attributes
class

...

9.12.3 The n element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by an attrvalue element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The n element represents the number given as the element value.

9.12.4 The lat element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by an attrvalue element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The lat element represents a latitude given as the element value.

9.12.5 The lon element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by an attrvalue element, optionally intermixed with script-supporting elements.
Allowed attributes
class

The lon element represents a longitude given as the element value.

9.12.6 The tz element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by an attrvalue element, optionally intermixed with script-supporting elements.
Allowed attributes
class

A tz element represents a time-zone offset given as the element value.

9.12.7 The cc element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by a title element, optionally intermixed with script-supporting elements.
Allowed attributes
class

A cc element represents a character code (or a sequence of character codes) given as the element value.

A character code is a string that identifies a sequence of one or more characters.

The element value of a cc element MUST be a characters string.

A characters string is one of the following:

  • A Unicode code point string.
  • A U+003C LESS-THAN SIGN (<) followed by two or more Unicode code point strings, separated by a U+002C COMMA character (,), followed by a U+003E GREATER-THAN SIGN (>). There MAY be zero or more space characters before and after any COMMA character.

A characters string represents the concatenation of the code points represented by the Unicode code point strings within the characters string, in order.

A Unicode code point string is a string that represents a code point as specified by the Infra Standard. It represents the code point.

If the element value of a cc element is not a characters string, it represents an invalid character code (or sequence of character codes).

The cc element is not a character escaping mechanism. It represents a character code(s), not a character.
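The definition of a characters string can be checked mechanically. The following Python sketch is illustrative only and rests on two assumptions: a Unicode code point string is taken to be "U+" followed by four to six uppercase hexadecimal digits (the notation described by the Infra Standard), and "space characters" are taken to be space, tab, LF, FF, and CR. The function name parse_characters_string is hypothetical.

```python
import re

# Assumed notation: "U+" followed by four to six ASCII upper hex digits.
CODE_POINT = r"U\+[0-9A-F]{4,6}"
# Assumed set of space characters allowed around a COMMA.
SPACES = r"[ \t\n\f\r]*"

SINGLE_RE = re.compile(CODE_POINT)
# "<", two or more code point strings separated by commas, then ">".
SEQUENCE_RE = re.compile(rf"<{CODE_POINT}(?:{SPACES},{SPACES}{CODE_POINT})+>")


def parse_characters_string(value):
    """Return the list of code points, or None if value is not valid."""
    if not (SINGLE_RE.fullmatch(value) or SEQUENCE_RE.fullmatch(value)):
        return None
    code_points = [int(s[2:], 16) for s in re.findall(CODE_POINT, value)]
    # Code points are limited to the range U+0000 to U+10FFFF.
    if any(cp > 0x10FFFF for cp in code_points):
        return None
    return code_points
```

The function returns code points as integers rather than a text string, in keeping with the statement that the cc element represents character codes, not characters. A return value of None corresponds to an invalid character code.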

9.12.8 The cn element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content, optionally followed by a title element, optionally intermixed with script-supporting elements.
Allowed attributes
class

A cn element represents a character name given as the element value.

A character name is a string that identifies a sequence of one or more characters. It MUST be a name specified by The Unicode Standard or named in a convention similar to The Unicode Standard and ISO/IEC 6429.

If the element value of a cn element is not a character name, it represents an invalid character name.

The cn element is not a character escaping mechanism. It represents a character name, not a character.

9.12.9 The ch element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Elements with value.
Content model
Phrasing content.
Allowed attributes
class

A ch element represents a single character string given as the element value.

A single character string is a sequence of one or more Unicode code points. It MUST be a string of one or more code points.

If the element value of a ch element is the empty string, it represents nothing.

Though usually a single character string is a single code point, it might contain two or more code points, e.g. combining characters and ZWJ.
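To make the last point concrete, the following Python sketch lists the code points of a string in "U+XXXX" notation. The helper name code_points is hypothetical and is used only for this illustration.

```python
def code_points(text):
    """Return the code points of text, each formatted as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "e" followed by COMBINING ACUTE ACCENT: one visible character, two code points.
print(code_points("e\u0301"))
# MAN, ZERO WIDTH JOINER, WOMAN: a ZWJ sequence of three code points.
print(code_points("\U0001F468\u200D\U0001F469"))
```

Either string would be a reasonable element value for a ch element, even though each contains more than one code point.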

9.12.10 The sw-value element in the SuikaWiki/0.9 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

The sw-value element represents a string that would be specified to the value attribute of the parent data element, if applicable. Otherwise, it represents nothing.

A sw-value element MAY be the last non-script-supporting child element of an HTML data element parent when there is no value attribute in parent.

The sw-value element MUST NOT be used when it represents nothing.

9.12.11 The attrvalue element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

The attrvalue element represents the element value of the parent element, if applicable, or the text that would be set to the datetime attribute of the parent element, if it is a time element without a datetime attribute. Otherwise, it represents nothing.

An attrvalue MAY be the last non-script-supporting child element of a time element.

An attrvalue MUST NOT be used when it represents nothing.

9.13 Conformance keywords

Elements in this section represent keywords defined in RFC 2119.

9.13.1 The MUST element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Content model
Phrasing content.
Allowed attributes
class

The MUST element represents an RFC 2119 keyword "MUST". It can also be used for keywords "REQUIRED", "SHALL", "MUST NOT", and "SHALL NOT".

9.13.2 The SHOULD element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Content model
Phrasing content.
Allowed attributes
class

The SHOULD element represents an RFC 2119 keyword "SHOULD". It can also be used for keywords "RECOMMENDED", "SHOULD NOT", and "NOT RECOMMENDED".

9.13.3 The MAY element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Implicit link elements.
Content model
Phrasing content.
Allowed attributes
class

The MAY element represents an RFC 2119 keyword "MAY". It can also be used for keyword "OPTIONAL".
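The mapping from keywords to the three elements in this section can be summarized as a lookup table. The Python sketch below only restates the definitions above. The names KEYWORD_ELEMENT and element_for_keyword are hypothetical.

```python
# Keyword -> local name of the element that can represent it,
# as listed in the three element definitions above.
KEYWORD_ELEMENT = {
    "MUST": "MUST",
    "REQUIRED": "MUST",
    "SHALL": "MUST",
    "MUST NOT": "MUST",
    "SHALL NOT": "MUST",
    "SHOULD": "SHOULD",
    "RECOMMENDED": "SHOULD",
    "SHOULD NOT": "SHOULD",
    "NOT RECOMMENDED": "SHOULD",
    "MAY": "MAY",
    "OPTIONAL": "MAY",
}


def element_for_keyword(keyword):
    """Return the element local name for a keyword, or None if unknown."""
    return KEYWORD_ELEMENT.get(keyword)
```

Note that negative forms such as "MUST NOT" share an element with their positive counterparts, so the keyword itself has to be carried by the element's content.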

9.14 Physical representations

9.14.1 The sw-cursive element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The sw-cursive element represents a text in a cursive font.

9.14.2 The smallcaps element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content.
Allowed attributes
class

The smallcaps element represents a text in small-capitals.

The characters in the smallcaps element, including but not limited to uppercase and lowercase Latin letters, are expected to be rendered using small-capital glyphs, whenever possible.

9.14.3 The sw-br element in the SuikaWiki/0.9 namespace

Category
Phrasing content.
Flow content.
Content model
Nothing.
Phrasing content.
Allowed attributes
class

The sw-br element represents an explicit line break.

The element's children, if any, represent the hyphenation or other symbols indicating the end of the line, placed just before the line break.

The sw-br element is expected to be rendered as if there were a U+000A LINE FEED character at the end of the element.

White space characters in the sw-br element and the U+000A character at the end of the element are expected to be processed as preserved white spaces.

9.15 Qualified names

9.15.1 The qn element in the SuikaWiki/0.10 namespace

Category
Phrasing content.
Flow content.
Content model
Phrasing content, optionally followed by a nsuri element, optionally intermixed with script-supporting elements.
Allowed attributes
class

...

9.15.2 The qname element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

This element MUST NOT be used.

...

9.15.3 The nsuri element in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
None.

...

9.16 Fallback elements

9.16.1 Uppercase elements in the SuikaWiki/0.10 namespace

Category
None.
Content model
Phrasing content.
Allowed attributes
class

Uppercase elements are elements in the SuikaWiki/0.10 namespace whose local name consists of one or more uppercase letters.

These elements MUST NOT be used.

These elements might be inserted into a node tree by a parser when an inline start tag with unknown tag name is found.

10 Security

The node tree returned by the parsing algorithm does not contain any script element. The following nodes might contain URLs, which might be potentially dangerous (e.g. javascript: URL):

The image element in the SuikaWiki/0.9 namespace might contain Base64-encoded binary data that is not in fact an image (e.g. an executable binary).
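One precaution a consumer might take is to decode the data and compare its leading bytes against well-known image signatures before handing it to an image decoder. The Python sketch below is an illustration only and is not required by this specification. The function name sniff_image_type and the choice of formats are assumptions, and the input is assumed to have any line breaks already removed.

```python
import base64
import binascii

# Leading byte signatures of a few common image formats.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(base64_text):
    """Return a format name if the decoded data looks like an image, else None."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode(base64_text, validate=True)
    except (binascii.Error, ValueError):
        return None
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

A matching signature does not prove that the data is a well-formed or harmless image. It only filters out data that is obviously something else.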

References

Normative references

MANAKAI
manakai's DOM extensions.
XHTML2
...

Tests and implementation

There are test data.

There is a Perl implementation.

Author

This document is written by Wakaba <wakaba@suikawiki.org>.

This document is developed as part of the SuikaWiki project.

Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.