webhelpers2.html.builder

HTML/XHTML/HTML 5 tag builder.

HTML Builder provides:

  • an HTML object that creates (X)HTML tags in a Pythonic way.
  • a literal class used to mark strings containing intentional HTML markup.
  • a smart escaping mechanism that preserves literals but escapes other strings that may accidentally contain markup characters ("<", ">", "&", '"', "'") or malicious JavaScript tags. Escaped strings are returned as literals to prevent them from being double-escaped later.

The builder uses markupsafe and follows Python's unofficial .__html__ protocol, which Mako, Chameleon, Pylons, and some other packages also follow. These features are explained in the next section.

Literals

class webhelpers2.html.builder.literal(s, encoding=None, errors='strict')

An HTML literal string, which will not be further escaped.

I'm a subclass of markupsafe.Markup, which itself is a subclass of unicode in Python 2 or str in Python 3. The main difference from ordinary strings is the .__html__ method, which allows smart escapers to recognize it as a "safe" HTML string that doesn't need to be escaped.
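
The .__html__ protocol can be sketched with the standard library alone. This is a minimal illustration, not the markupsafe implementation: SafeText and smart_escape are hypothetical names, and html.escape renders quotes as "&quot;" and "&#x27;" rather than the "&#34;" and "&#39;" that markupsafe produces.

```python
from html import escape as stdlib_escape


class SafeText(str):
    """Hypothetical stand-in for literal: a str already known to be HTML."""

    def __html__(self):
        # A safe string is its own HTML representation.
        return self


def smart_escape(value):
    """Escape value unless it declares itself safe through __html__."""
    if hasattr(value, "__html__"):
        return SafeText(value.__html__())
    return SafeText(stdlib_escape(str(value)))


once = smart_escape("<b>")   # plain string: markup characters are escaped
twice = smart_escape(once)   # already safe: passes through unchanged
```

Because the escaper returns a safe string, running it a second time over its own output changes nothing, which is how double-escaping is avoided.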

All my string methods preserve literal arguments and escape plain strings. However, in expressions you must pay attention to which value "controls" the expression. I seem to be able to control all combinations of the + operator, but with % and .join I must be on the left side. So these all work:

"A" + literal("B")
literal(", ").join(["A", literal("B")])
literal("%s %s") % (16, literal("kg"))
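
The underlying reason is Python's operator dispatch: .join and % are looked up on the left operand, while + also consults the right operand's __radd__ when that operand is a subclass of the left one. A minimal sketch with a hypothetical Tagged class standing in for literal (it tracks the type only and does no escaping):

```python
class Tagged(str):
    """Hypothetical str subclass that tries to keep its own type."""

    def join(self, items):
        return Tagged(super().join(items))

    def __mod__(self, args):
        return Tagged(super().__mod__(args))

    def __radd__(self, other):
        # Tried first for "str + Tagged" because Tagged subclasses str.
        return Tagged(str(other) + str(self))


# Subclass in control: its overrides run and the type is preserved.
kept_add = "A" + Tagged("B")
kept_join = Tagged(", ").join(["A", Tagged("B")])
kept_mod = Tagged("%s %s") % (16, Tagged("kg"))

# Plain str on the left: str.join and str.__mod__ run and return plain str.
lost_join = "\n".join([Tagged("A"), Tagged("B")])
lost_mod = "%s %s" % (Tagged("16"), Tagged("kg"))
```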

But these return plain strings which are vulnerable to double-escaping later:

"\n".join([literal("<span>A</span>"), literal("Bar!")])
"%s %s" % (literal("16"), literal("&lt;&gt;"))

static __new__(base=u'', encoding=None, errors=u'strict')

Constructor.

I convert my first argument to a string like str() does. However, I convert None to the empty string, which is usually what's desired in templates. (In contrast, raw Markup(None) returns "None".)

Examples:

>>> literal("A")   # => literal("A")
>>> literal(">")   # => literal(">")
>>> literal(None)  # => literal("")
>>> literal(11)    # => literal("11")
>>> literal(datetime.date.today())   # => literal("2014-08-31")

The default encoding is "ascii".

classmethod escape(s)

Escape the argument and return a literal.

This is a class method. The result depends on the argument type:

  • literal: return unchanged.
  • an object with an .__html__ method: call it and return the result. The method should take no arguments and return the object's preferred HTML representation as a string.
  • plain string: escape any HTML markup characters in it, and wrap the result in a literal to prevent double-escaping later.
  • non-string: call str(), escape the result, and wrap it in a literal.
  • None: convert to the empty string and return a literal.

Literals pass through unchanged because their .__html__ method returns the literal itself. Examples:

>>> literal.escape(">")            # => literal("&gt;")
>>> literal.escape(literal(">"))   # => literal(">")
>>> literal.escape(None)           # => literal("")

I call markupsafe.escape_silent(). It escapes double quotes as "&#34;", single quotes as "&#39;", "<" as "&lt;", ">" as "&gt;", and "&" as "&amp;".

lit_join(iterable)

Like the .join string method but don't escape elements in the iterable.

striptags()

Unescape markup into a text_type string and strip all tags. This also resolves known HTML4 and XHTML entities. Runs of whitespace are normalized to a single space:

>>> Markup("Main &raquo;  <em>About</em>").striptags()
u'Main \xbb About'

unescape()

Convert escaped markup back into a text_type string. This also resolves known HTML4 and XHTML entities:

>>> Markup("Main &raquo; <em>About</em>").unescape()
u'Main \xbb <em>About</em>'

The HTML generator

The HTML global is an instance of the HTMLBuilder class. Normally you use the global rather than instantiating it yourself.

class webhelpers2.html.builder.HTMLBuilder

An HTML tag generator.

__call__(*args, **kw)

Escape the string args, concatenate them, and return a literal.

This is the same as literal.escape(s) but accepts multiple strings. Multiple arguments are useful when mixing child tags with text, such as:

html = HTML("The king is a >>", HTML.strong("fink"), "<<!")

Keyword args:

nl
If true, append a newline to the value. (Default False.)
lit
If true, don't escape the arguments. (Default False.)

__getattr__(attr)

Same as the tag method but using attribute access.

HTML.a(...) is equivalent to HTML.tag("a", ...).

tag(tag, *args, **kw)

Create an HTML tag.

tag is the tag name. The other positional arguments become the content for the tag, and are escaped and concatenated.

Keyword arguments are converted to HTML attributes, except for the following special arguments:

c
Specifies the content. This cannot be combined with content in positional args. The purpose of this argument is to position the content at the end of the argument list to match the native HTML syntax more closely. Its use is entirely optional. The value can be a string, a tuple, or a tag.
_closed
If present and false, do not close the tag. Otherwise the tag will be closed with a closing tag or an XHTML-style trailing slash.
_nl

If present and true, insert a newline before the first content element, between each content element, and at the end of the tag.

Note that this argument has a leading underscore while the same argument to __call__ doesn't. That's because this method has so many other complex arguments, and for backward compatibility.

_bool
Additional HTML attributes to consider boolean beyond those listed in .boolean_attrs. See "Class Attributes" below.

Other keyword arguments are converted to HTML attributes after undergoing several transformations:

  • Ignore attributes whose value is None.
  • Delete trailing underscores in attribute names. ('class_' -> 'class').
  • Replace non-trailing underscores with hyphens. ('data_foo' -> 'data-foo').
  • If the attribute is boolean (by default "checked", "defer", "disabled", "multiple", "readonly", or "selected"), render it as an HTML 5 boolean attribute. If the value is true, copy the attribute name to the value. If the value is false, don't render the attribute at all. See self.boolean_attrs and _bool to customize which attributes are considered boolean.
  • If the attribute is "class" or "class_" and the value is a list or tuple, convert the value to a space-delimited string. If the value is an empty list/tuple, don't render the attribute at all. If the value contains elements that are 2-tuples, the first subelement is the string item, and the second subelement is a boolean flag; render only subelements whose flag is true. This allows users to programmatically set the parts of a composable attribute in a template without extra loops or logic code.
  • Likewise for "style", if the value is a list/tuple, convert it to a semicolon-delimited string, with a space after the semicolon. See self.compose_attrs to customize which attributes have list/tuple conversion and what their delimiter is.

Examples:

>>> HTML.tag("div", "Foo", class_="important")
literal(u'<div class="important">Foo</div>')
>>> HTML.tag("div", "Foo", class_=None)
literal(u'<div>Foo</div>')
>>> HTML.tag("div", "Foo", class_=["a", "b"])
literal(u'<div class="a b">Foo</div>')
>>> HTML.tag("div", "Foo", class_=[("a", False), ("b", True)])
literal(u'<div class="b">Foo</div>')
>>> HTML.tag("div", "Foo", style=["color:black", "border:solid"])
literal(u'<div style="color:black; border:solid">Foo</div>')
>>> HTML.tag("br")
literal(u'<br />')
>>> HTML.tag("input", disabled=True)
literal(u'<input disabled="disabled" />')
>>> HTML.tag("input", disabled=False)
literal(u'<input />')
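
The attribute transformations listed before these examples can be sketched in plain Python. This illustrates the documented rules only and is not the library's code: normalize_attrs, BOOLEAN_ATTRS, and COMPOSE_ATTRS are hypothetical names, escaping uses the standard library's html.escape, and dropping a composed attribute whose flags are all false is an assumption.

```python
from html import escape

BOOLEAN_ATTRS = {"checked", "defer", "disabled", "multiple", "readonly", "selected"}
COMPOSE_ATTRS = {"class": " ", "style": "; "}  # attribute name -> delimiter


def normalize_attrs(**kw):
    """Return a dict mapping rendered attribute names to escaped values."""
    result = {}
    for name, value in kw.items():
        if value is None:
            continue  # None: the attribute is ignored
        # class_ -> class, data_foo -> data-foo
        name = name.rstrip("_").replace("_", "-")
        if name in BOOLEAN_ATTRS:
            if not value:
                continue  # false boolean: not rendered
            value = name  # true boolean: the name becomes the value
        elif name in COMPOSE_ATTRS and isinstance(value, (list, tuple)):
            parts = []
            for item in value:
                if isinstance(item, tuple):
                    text, flag = item  # (text, flag) pair
                    if not flag:
                        continue
                    item = text
                parts.append(item)
            if not parts:
                continue  # nothing left to render (assumption for all-false flags)
            value = COMPOSE_ATTRS[name].join(parts)
        result[name] = escape(str(value))
    return result


attrs = normalize_attrs(
    class_=[("a", False), ("b", True)],
    data_foo="x",
    disabled=True,
    id=None,
    style=["color:black", "border:solid"],
)
```

Here attrs holds "class", "data-foo", "disabled", and "style" entries, and the id key is absent because its value was None.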

To generate opening and closing tags in isolation:

>>> HTML.tag("div", _closed=False)
literal(u'<div>')
>>> HTML.tag("/div", _closed=False)
literal(u'</div>')

comment(*args)

Wrap the content in an HTML comment.

Escape and concatenate the string arguments.

Example:

>>> HTML.comment("foo", "bar")
literal(u'<!-- foobar -->')

cdata(*args)

Wrap the content in a "<![CDATA[ ... ]]>" section.

Plain strings will not be escaped because CDATA itself is an escaping syntax. Concatenate the arguments:

>>> HTML.cdata(u"Foo")
literal(u'<![CDATA[Foo]]>')
>>> HTML.cdata(u"<p>")
literal(u'<![CDATA[<p>]]>')

render_attrs(attrs)

Format HTML attributes into a string of ' key="value"' pairs.

You don't normally need to call this because the tag method calls it for you. However, it can be useful for lower-level formatting in string templates like this:

Click <a href="http://example.com/"{attrs1}>here</a>
or maybe <a{attrs2}>here</a>.

attrs is a list of attributes. The values will be escaped if they're not literals, but no other transformation will be performed on them.

The return value will have a leading space if any attributes are present. If no attributes are specified, the return value is the empty string literal. This allows it to render prettily in the interpolations above regardless of whether attrs contains anything.
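
The effect of the leading space can be seen in a small stand-alone sketch. render_attrs_sketch is a hypothetical helper, not the library method; passing the attributes as a list of (name, value) pairs and escaping with html.escape are assumptions made for illustration.

```python
from html import escape


def render_attrs_sketch(attrs):
    """Render (name, value) pairs as ' name="value"' text, or '' if empty."""
    return "".join(
        ' {}="{}"'.format(name, escape(str(value))) for name, value in attrs
    )


template = (
    'Click <a href="http://example.com/"{attrs1}>here</a> '
    "or maybe <a{attrs2}>here</a>."
)
page = template.format(
    attrs1=render_attrs_sketch([("class", "external")]),
    attrs2=render_attrs_sketch([]),
)
```

Each rendered pair carries its own leading space, so the template needs no conditional spacing: the first link gains ' class="external"' and the second stays a bare <a>.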

The following class attributes are literal constants:

EMPTY

The empty string as a literal.

SPACE

A single space as a literal.

TAB2

A 2-space tab as a literal.

TAB4

A 4-space tab as a literal.

NL

A newline ("\n") as a literal.

NL2

Two newlines as a literal.

BR

A literal consisting of one "<br />" tag.

BR2

A literal consisting of two "<br />" tags.

The following class attributes affect the behavior of the tag method:

void_tags

The set of tags which can never have content. These are rendered in self-closing style; e.g., '<br />'. See About XHTML and HTML below.

boolean_attrs

The set of attributes which are rendered as booleans. E.g., disabled=True renders as 'disabled="disabled"', while disabled=False is not rendered at all.

The default set is conservative; it includes only "checked", "defer", "disabled", "multiple", "readonly", and "selected". We may add to this later as more standard boolean attributes are identified.

compose_attrs

A dict of attributes whose value may have string-delimited components. The keys are attribute names and the values are delimiter literals. The default configuration supports the "class" and "style" attributes.

literal

The literal class that will be used internally to generate literals. Changing this does not automatically affect the constant attributes (EMPTY, NL, BR, etc).

About XHTML and HTML

This builder always produces tags that are valid as both HTML and XHTML. "Void" tags -- those which can never have content like <br> and <input> -- are written like <br />, with a space and a trailing /.

Only void tags get this treatment. The library will never, for example, produce <script src="..." />, which is invalid HTML and legacy browsers misinterpret it as still being open. Instead the builder will produce <script src="..."></script>.

The W3C HTML validator validates these constructs as valid HTML Strict. It does produce warnings, but those warn about the ambiguity that arises if the same XML-style self-closing syntax is used for HTML elements that are allowed to take content (<script>, <textarea>, etc). This library never produces markup like that.

Rather than add options to generate different kinds of behavior, we felt it was better to create markup that could be used in different contexts without any real problems and without the overhead of passing options around or maintaining different contexts, where you'd have to keep track of whether markup is being rendered in an HTML or XHTML context.

If you _really_ want void tags without trailing slashes (e.g., <br>), you can abuse _closed=False to produce them.
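
The closing rules in this section can be summarized as a short sketch. render_tag and VOID_TAGS are hypothetical names, the set is only an illustrative subset of the void tags, and attributes and escaping are left out.

```python
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # illustrative subset


def render_tag(name, content="", closed=True):
    """Render a tag using the closing rules described in this section."""
    if not closed:
        # Opening tag only, as with _closed=False.
        return "<{}>{}".format(name, content)
    if name in VOID_TAGS:
        # Void tag: space and trailing slash, never any content.
        return "<{} />".format(name)
    # Any other tag: explicit closing tag, even when empty.
    return "<{0}>{1}</{0}>".format(name, content)
```

With these rules render_tag("br") gives "<br />", render_tag("script") gives "<script></script>", and render_tag("br", closed=False) gives "<br>".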

Functions

webhelpers2.html.builder.escape(s)

Same as literal.escape(s).

webhelpers2.html.builder.lit_sub(*args, **kw)

Literal-safe version of re.sub. If the string to be operated on is a literal, return a literal result. All arguments are passed directly to re.sub.
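
The idea can be sketched as a thin wrapper around re.sub that restores the input's type. SafeText and sub_keeping_type are hypothetical names used in place of literal and lit_sub; this is not the library's implementation.

```python
import re


class SafeText(str):
    """Hypothetical stand-in for literal."""


def sub_keeping_type(pattern, repl, string, *args, **kw):
    """Call re.sub and re-wrap the result if the input was SafeText."""
    result = re.sub(pattern, repl, string, *args, **kw)
    return SafeText(result) if isinstance(string, SafeText) else result


safe_result = sub_keeping_type(r"\s+", " ", SafeText("<b>a   b</b>"))
plain_result = sub_keeping_type(r"\s+", " ", "a   b")
```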

webhelpers2.html.builder.url_escape(s, safe='/')

Urlencode the path portion of a URL. This is the same function as urllib.quote in the Python 2 standard library (urllib.parse.quote in Python 3). It's exported here with a name that's easier to remember.
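
Since the function is described as the standard library's quote, its behavior can be previewed with the standard library directly. The sample paths below are made up for illustration.

```python
from urllib.parse import quote

# "/" is kept because it is listed in safe; spaces and "?" are percent-encoded.
escaped = quote("/docs/my file?.html", safe="/")

# Non-ASCII characters are encoded as UTF-8 and then percent-encoded.
escaped_unicode = quote("/café", safe="/")
```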

The markupsafe package has a function soft_unicode (renamed soft_str in MarkupSafe 2.0) which converts a string to Unicode if it's not already. Unlike the Python builtin unicode(), it will not convert Markup (literal) to plain Unicode, to avoid overescaping. This is not included in webhelpers2 but you may find it useful.
