[lxml-dev] naming the lxml.html parse functions

Stefan Behnel stefan_ml at behnel.de
Sun Jul 8 11:15:07 CEST 2007



Stefan Behnel wrote:
> HTML is a factory function, so what about
> calling the string parser functions "HTML()", "HTMLFragment()" and
> "HTMLFragments()"?

That would also make the semantics pretty simple:

HTML() will always return a complete HTML document, i.e. wrapped by html/body
if necessary.

HTMLFragment() will always return a fragment, i.e. a single element that can
be pasted into a body. This means: remove html/body if they are present and
wrap the content in a <div> if there are multiple elements. Maybe check whether
there actually are any block tags and just wrap the fragments in a <p> if
there are none, but that's more of an optimisation.

HTMLFragments() will always return a list of fragments, i.e. text and/or
elements, and will remove any html/body parts that come from the document or
were added by the parser.
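
To make the three behaviours concrete, here is a rough toy model of the
semantics described above. It is only a sketch under loud assumptions: it uses
the standard library's xml.etree.ElementTree instead of lxml's HTML parser, so
it only accepts well-formed markup; it ignores <head>; it skips the optional
<p> optimisation; and _attach() is a hypothetical helper invented for this
illustration. It is not meant as the actual lxml.html implementation.

```python
import xml.etree.ElementTree as ET


def HTMLFragments(markup):
    """Return top-level text strings and elements, without html/body."""
    # Dummy root so that several top-level nodes parse as one tree.
    container = ET.fromstring("<root>" + markup + "</root>")
    children = list(container)
    # Step inside <html> and <body> if the input brought them along.
    if len(children) == 1 and children[0].tag == "html":
        container = children[0]
    body = container.find("body")
    if body is not None:
        container = body
    nodes = []
    if container.text and container.text.strip():
        nodes.append(container.text)
    for child in list(container):
        # Detach trailing text so it becomes its own list entry.
        tail, child.tail = child.tail, None
        nodes.append(child)
        if tail and tail.strip():
            nodes.append(tail)
    return nodes


def _attach(parent, nodes):
    """Hypothetical helper: put text strings and elements under parent."""
    last = None
    for node in nodes:
        if isinstance(node, str):
            if last is None:
                parent.text = (parent.text or "") + node
            else:
                last.tail = (last.tail or "") + node
        else:
            parent.append(node)
            last = node
    return parent


def HTMLFragment(markup):
    """Return a single element, wrapping in <div> when needed."""
    nodes = HTMLFragments(markup)
    if len(nodes) == 1 and not isinstance(nodes[0], str):
        return nodes[0]
    return _attach(ET.Element("div"), nodes)


def HTML(markup):
    """Return a complete document: content wrapped by html/body."""
    html = ET.Element("html")
    _attach(ET.SubElement(html, "body"), HTMLFragments(markup))
    return html
```

With this model, HTMLFragment("<p>a</p><p>b</p>") gives a <div> holding both
paragraphs, while HTMLFragment("<html><body><p>a</p></body></html>") gives
just the <p>. A real version would sit on top of the HTML parser, so it would
also have to cope with broken markup and with html/body elements that the
parser adds on its own.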

Does that sound like a suitable API?

Stefan

