Summary of Modifications to Textile 2.0 beta
Colin Brown, June 2004 (Last Updated: 24th June 2004)
Introduction
This document describes the enhancements I have made to the PHP version of Textile to implement some of the features documented in Brad Choate’s Perl version.
The changes are described in detail below, but in summary I have:
- Implemented extended block syntax (BC’s Documentation)
- Improved handling of preformatted text
- Implemented a new block modifier, bc (BC’s Documentation) and block escaping (BC’s Documentation)
The source code files in various stages can be downloaded here. classTextile.php includes all the changes; the other versions are referred to in each section below.
Known issues, bugs and fixes implemented since this document was written are described here: Issues, Bugs and Fixes
1. Extended Block Syntax
Source code for this revision: classTextile.cbv3.php
These changes were implemented in a clean copy of classTextile.php as found in textpattern_g118a.zip
My revision numbering starts at cbv3 because cbv1 and cbv2 were early attempts that I discarded.
The first set of changes I made was to implement extended block modifier syntax. For example:
bq.. Block Quote

Another block quote

p. A paragraph

would be translated into:
<blockquote>
<p>Block Quote</p>
<p>Another block quote</p>
</blockquote>
<p>A paragraph</p>
The initial block modifier applies to all subsequent blocks until another block modifier or the end of the file is found.
Summary of Changes Made
- added code to block() to set new class variables (properties) 'ext' and 'xend':
  - ext: boolean indicating whether we are within an extended block
  - xend: the closing tag for the current extended block
- modified the regex in the block() method that calls the fBlock method to match and capture the optional extended block indicator '.'
- modified fBlock to allow for the additional captured string and assign it to the variable $extb
- modified fBlock to test the value of $extb - if a '.' character is found, we set the new class variables $this->ext = true, $this->xend = closing tag, and $end = ''.
- modified fBlock - adapted the special code that deals with blockquotes so that if we are dealing with an extended block, we only add </p> to the current line, and set $this->xend = </blockquote>
- modified the loop through the array of lines in the block() method to assign the current array key to $key, which we then use to (a) append the closing block tag to the previous line when a new block modifier is found and (b) set the key in the out array to ensure the keys of $text and $out are synchronised (they should be anyway, but it seemed a good idea to make sure)
- modified fBlock return value - changed to work the same way as Brad Choate's version - the original code assigned attributes for a blockquote to the contained <p> tag; the modified code now assigns the attributes to the blockquote tag.
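As a rough illustration of the flag handling described above, here is a minimal sketch. It is not the code from classTextile.cbv3.php: the function name renderExtendedBlocks is hypothetical, it only understands the p and bq modifiers, and it ignores attributes entirely. The variables $ext, $xend and $extb play the roles described in the list above.

```php
<?php
// Hypothetical, simplified sketch of extended block handling (p and bq only).
function renderExtendedBlocks(string $text): string
{
    $ext  = false; // true while we are inside an extended block
    $xend = '';    // closing tag owed by the current extended block
    $out  = [];

    // Blocks are separated by one or more blank lines.
    foreach (preg_split('/\n{2,}/', trim($text)) as $block) {
        // Capture the modifier and the optional second '.' marking an extended block.
        if (preg_match('/^(bq|p)\.(\.?) (.*)$/s', $block, $m)) {
            [, $tag, $extb, $content] = $m;

            // A new modifier ends any open extended block:
            // append its closing tag to the previous output entry.
            if ($ext) {
                if ($xend !== '') {
                    $out[count($out) - 1] .= "\n" . $xend;
                }
                $ext  = false;
                $xend = '';
            }

            if ($tag === 'bq') {
                $line = "<blockquote>\n<p>" . $content . '</p>';
                $end  = '</blockquote>';
            } else {
                $line = '<p>' . $content . '</p>';
                $end  = '';
            }

            if ($extb === '.') {
                // Extended block: remember the closing tag instead of emitting it now.
                $ext  = true;
                $xend = $end;
            } elseif ($end !== '') {
                $line .= "\n" . $end;
            }
            $out[] = $line;
        } else {
            // No modifier: a plain paragraph, inside or outside an extended block.
            $out[] = '<p>' . $block . '</p>';
        }
    }

    // The end of the input also terminates an extended block.
    if ($ext && $xend !== '') {
        $out[count($out) - 1] .= "\n" . $xend;
    }

    return implode("\n", $out);
}
```

Fed the bq.. example from the start of this section (with blank lines between the blocks), the sketch produces the blockquote markup shown there.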
2. Improved Handling of <pre>
Source code for this revision: classTextile.cbv4.php.
The original version of Textile 2.0 beta created paragraphs inside blocks of text marked up with <pre> tags. This seems wrong, since the text is (pre)formatted. Lines not starting with, but containing <pre> tags, would also be wrapped in <p></p> tags, which seems right, since <pre> is defined as inline by default. The modified version always wraps a <pre> block inside paragraph tags, but no longer inserts <p> tags within the <pre> block.
Details of changes made to the block() method
- added $pril - boolean flag to indicate whether the current line contains both opening and closing <pre> tags - (pr)e (i)n (l)ine.
- set $pril to true if the current line (a) contains <pre> and (b) contains an even number of opening and closing tags
- added $popen - a boolean flag to indicate when an opening paragraph tag has been added but not yet closed
- changed the line that does the preg_replace which wraps lines with <p></p> so that the replacement depends on the current state:
  - if we are not within a pre block (!$pre), the line is wrapped with <p></p> as before. In this instance, the regex has been simplified - we don't check for lines starting with <pre> or </pre> since they cannot appear when $pre is false. Note that the final line of a pre block which contains the closing pre tag will not get wrapped in <p></p> as we are still considered inside the pre block until the end of the loop
  - if we are not within a pre block, but a paragraph tag for a preceding pre block is still open, we append </p> to the previous line and set $popen to false
  - if the current line contains opening and closing pre tags, the replacement is executed as normal
  - if we are within a pre block, the current line does not contain opening and closing tags, and the para has not yet been opened, we only insert the opening <p> tag. The regex has been modified so that lines beginning in <pre> are also matched. We also set $popen to true
- modified the regex that sets $pre to false at the end of the loop - we now check that the closing </pre> is not followed by a subsequent opening <pre>
- added some code after the loop through all the lines to add a closing </p> if $popen is still true (for the instance where the last line contains a closing </pre>)
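The interplay of the three flags can be sketched as follows. This is an illustrative simplification, not the code in classTextile.cbv4.php: the function name wrapPreAware is hypothetical, tags are matched case-sensitively and without attributes, $pril is approximated as "equal counts of <pre> and </pre> on the line", and the closing </p> is appended on the line that closes the pre block rather than when the following line is processed.

```php
<?php
// Hypothetical sketch: wrap lines in <p></p> without touching the inside of <pre> blocks.
function wrapPreAware(array $lines): array
{
    $pre   = false; // inside a multi-line <pre> block
    $popen = false; // a <p> was opened for a <pre> block and is not yet closed
    $out   = [];

    foreach ($lines as $line) {
        $opens  = substr_count($line, '<pre>');
        $closes = substr_count($line, '</pre>');

        // (pr)e (i)n (l)ine: every <pre> on this line is closed on the same line.
        $pril = $opens > 0 && $opens === $closes;

        if ($pril || (!$pre && $opens === 0)) {
            // Ordinary line, or <pre> used inline: wrap as usual.
            $line = '<p>' . $line . '</p>';
        } elseif (!$pre && !$popen) {
            // First line of a multi-line pre block: insert only the opening <p>.
            $line  = '<p>' . $line;
            $popen = true;
        }
        // Otherwise we are inside a pre block and leave the line untouched.

        // Update the block state after handling the line.
        if ($opens > $closes) {
            $pre = true;
        } elseif ($closes > $opens) {
            $pre = false;
        }

        // The pre block has just ended: close the paragraph that wraps it.
        if (!$pre && $popen) {
            $line .= '</p>';
            $popen = false;
        }

        $out[] = $line;
    }

    // Input ended with the paragraph still open (e.g. an unclosed <pre>).
    if ($popen) {
        $out[count($out) - 1] .= '</p>';
    }

    return $out;
}
```

With the input lines 'intro', '<pre>a', 'b', 'c</pre>', 'outro', only the first and last lines get their own paragraph; the three pre lines share a single <p>...</p> pair and no tags are inserted between them.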
3. Implementing bc. and bc.. “block code” modifiers, and block escaping
Source code for this revision: classTextile.cbv5.php
Summary of changes to implement bc
This set of changes implemented “bc.” and “bc..”. Fairly substantial modifications were required, because code blocks have to be excluded from white space cleaning and paragraph creation. Support for extended block modifiers which terminate in a new line rather than a space was also required, as this is commonly used when including code using “bc..” (and is implemented in Brad Choate’s Perl version).
Details of changes to implement bc
- Added bc to the list of possible blockmodifiers in block()
- Modified fBlock() to wrap <pre><code></code></pre> around blocks marked with the bc modifier
- Added the $xbc boolean variable to indicate we are in an extended block of code. False initially; we set it to true if bc.. is found at the start of a line. It is reset to false in the conditional statement where we set $this->ext = false when a new block modifier is found. The code that checks for "bc.." and sets $xbc is below the check to see if a current extended block has ended, as otherwise $xbc gets set to false if a bc.. is preceded by, say, a bq.. block.
- Added a check to ensure $xbc is false before doing the paragraph regex replacement in block()
- Moved definition of the block modifier codes to a class variable, $abm - (a)ll (b)lock (m)odifiers.
- Modified block() to get the block modifier codes from $this->abm
- Added two new methods, shelveAllPreformatted() and shelveBlock() which prevent white space in preformatted blocks from being stripped out
- shelveAllPreformatted() is called by textileThis prior to cleanWhiteSpace (but after fixEntities). It contains a couple of regular expression replacements that pass blocks marked up with bc.. or <pre> to the shelveBlock method, which in turn replaces them with a referenced marker. It uses $this->abm to get block modifier codes for use in the regex that pulls out the bc.. blocks, which can be delimited by subsequent block modifiers. Shelved blocks are retrieved by the call in textileThis() to retrieve() - we wouldn't want to retrieve them any earlier, because retrieving them at this point neatly avoids any processing of inline text within the preformatted blocks.
- shelveBlock is a callback function for the preg_replace_callback used in shelveAllPreformatted - very similar to the shelve() method, but accepts the 'match' array passed from preg_replace_callback and doesn't insert a space before the referenced marker returned.
- Moved the clean up of line endings out of cleanWhiteSpace() and into a new method, fixLineEndings() - this is called before shelveAllPreformatted() because we want to ensure consistent line endings even in preformatted blocks
- Moved the call in TextileThis() to encodeEntities() so that it happens after shelveAllPreformatted - this is required to prevent angle brackets in preformatted blocks getting converted twice. Angle brackets appearing elsewhere get converted back by fixEntities - not so for preformatted blocks, since fixEntities is called after shelveAllPreformatted(). I did try allowing fixEntities to run on the content of the preformatted blocks, but this introduced more problems with escaped blocks containing already encoded entities (they were 'fixed' by fixEntities). I think we now "do the right thing" by initially skipping all entity conversion on escaped / preformatted blocks, then converting entities in the content of bc.. blocks in the block() and fBlock() methods. Note that the glyphs() method (which converts entities found outside of preformatted blocks) is not suitable for converting the content of code blocks, because html tags are deliberately omitted to allow html tags to be included in textile source text. Within code blocks we want to convert any tags found to use entities.
- Added a check into the conditional statement in the block() method that checks for an opening <pre> tag and sets $pre so that it is not executed if we are inside an extended bc.. block (i.e. $xbc is true) - this allows <pre> tags to be included within bc..'s. Without this check, a lone <pre> tag within a block of code can mess up subsequent formatting of paragraphs.
- Modified fBlock() to convert the content of a bc. block using htmlspecialchars so that <, > etc are converted to html entities
- Modified block() so that if we are _within_ an extended code block (i.e. not on the first line) the whole line is converted using htmlspecialchars. This is required because fBlock only converts content on the first line following a bc.. modifier.
- Added an additional preg_replace_callback statement to block() to allow for extended block modifiers that are followed by a newline instead of the usual space. This matches Brad Choate's implementation and is useful for blocks of code (bc..). Currently my version also allows bq.. blocks to be followed by a newline, resulting in an empty paragraph tag at the beginning of the blockquote (Brad C's version fails to insert the starting blockquote tag in this case, so I don't know whether bq.. blocks should allow this or not!). This regex is only executed if the ".." string is found (the check uses the fast strpos() function), so that performance is not affected too badly when we are not processing an extended block.
- Modified the preg_match statement in block() that checks for the start of a new block when we are within an extended block to match extended block modifiers that are followed by a newline, so that one extended block can be followed by another.
- Added "Comment Author: CRB" to the end of all of my comments to allow them to be easily stripped out if required to reduce the size of the code.
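The shelving idea can be sketched in isolation. The following is a minimal illustration rather than the code from classTextile.cbv5.php: the class name PreformattedShelf, the marker format @@shelfN@@ and the stand-in function collapseSpaces() are all assumptions made for the example, and only <pre> blocks are shelved (the real method also handles bc.. blocks via $this->abm). The method names are borrowed from the list above, but their bodies here are guesses at the general technique.

```php
<?php
// Hypothetical sketch of shelving preformatted blocks so that later
// clean-up passes cannot alter their white space.
final class PreformattedShelf
{
    /** @var array<string, string> marker => original block */
    private array $shelf = [];

    // Replace every <pre>...</pre> block with a referenced marker.
    public function shelveAllPreformatted(string $text): string
    {
        $result = preg_replace_callback(
            '/<pre\b[^>]*>.*?<\/pre>/s',
            [$this, 'shelveBlock'],
            $text
        );
        return $result ?? $text; // preg_replace_callback returns null on regex error
    }

    // Callback: receives the match array, stores the block, returns its marker.
    public function shelveBlock(array $match): string
    {
        $marker = '@@shelf' . (count($this->shelf) + 1) . '@@';
        $this->shelf[$marker] = $match[0];
        return $marker;
    }

    // Put the shelved blocks back once all other processing is done.
    public function retrieve(string $text): string
    {
        return $this->shelf === [] ? $text : strtr($text, $this->shelf);
    }
}

// Stand-in for a white space clean-up pass: collapses runs of spaces and tabs.
function collapseSpaces(string $text): string
{
    return preg_replace('/[ \t]+/', ' ', $text);
}
```

Usage follows the order described above: shelve first, run the clean-up passes on the text containing only markers, then call retrieve() last so that nothing in between can touch the preformatted content.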
Summary of changes to implement escaped blocks
Escaped blocks allow you to switch off textile formatting for a whole block of text. Documented here.
Details of changes to implement escaped blocks
- In the TextileThis() method, I have moved the line "$text = $this->noTextile($text);" higher up in the order of execution so that this is one of the first things we do (moved it from immediately before the line "$text = $this->links($text);" to before "$text = $this->fixLineEndings($text);"). This allows the shelveAllPreformatted method to shelve notextile blocks too (required to prevent entity encoding).
- Have added $notb into the block() method - a boolean for tracking whether we are within a <notextile>...</notextile> block. Initially set to false; set to true at the beginning of the loop through lines if <notextile> is found at the start of the line and followed by one or more line breaks. This means that the escaped block _must_ be preceded by a blank line, e.g.
paragraph 1

==
escaped stuff
==
paragraph 2
will work, but
paragraph 1
==
escaped stuff
==
paragraph 2
will not.
This seems reasonable since it is not easy to figure out where the end of paragraph 1 is in this scenario, and blocks are generally separated by two newlines elsewhere in textile.
- Have modified shelveAllPreformatted() to shelve <notextile>...</notextile> blocks.
- $notb is set to false at the end of the loop through lines when the closing </notextile> tag is found on a line on its own - this should always be the case since line break codes are replaced with newlines when we are within a <notextile> block.
- Added check so that the code that calls fBlock is skipped if $notb is true
- Added check so that paragraph wrapping is skipped if $notb is true
- Added check so that line break codes are replaced with newlines when $notb is true
- Replaced <br /> with ~slbr~ as the line break code used to initially replace single newlines in the block() method - this allows <br />'s to appear within escaped blocks without being replaced with newlines. Added code to replace ~slbr~ with <br />'s once all processing is done, just before returning the text. Note: originally I used <slbr> as the placeholder; however, since we convert content in bc.. blocks using htmlspecialchars, the <> characters were getting replaced and preventing the <slbr> from being replaced with real <br /> tags at the end of the method.
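The placeholder collision mentioned in the last point is easy to reproduce. The helper below is purely illustrative (escapeThenRestoreBreaks is a hypothetical name, and the real block() method does considerably more between the two replacements); it only shows why a placeholder containing angle brackets does not survive htmlspecialchars while ~slbr~ does.

```php
<?php
// Hypothetical helper: mark single newlines with a placeholder, escape the
// text as code, then swap the placeholder for a real <br /> tag.
function escapeThenRestoreBreaks(string $text, string $placeholder): string
{
    $marked  = str_replace("\n", $placeholder, $text);
    $escaped = htmlspecialchars($marked); // converts <, > and & to entities
    return str_replace($placeholder, '<br />', $escaped);
}

// ~slbr~ contains no characters that htmlspecialchars touches, so it is restored.
$ok = escapeThenRestoreBreaks("a < b\nc", '~slbr~');

// <slbr> is itself escaped to &lt;slbr&gt;, so the final replacement finds nothing.
$broken = escapeThenRestoreBreaks("a < b\nc", '<slbr>');
```

Here $ok ends up as "a &lt; b<br />c", whereas $broken keeps the escaped placeholder, "a &lt; b&lt;slbr&gt;c", and no line break is produced.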