[lxml-dev] naming the lxml.html parse functions
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 9 21:44:15 CEST 2007
Stefan Behnel wrote:
> Stefan Behnel wrote:
>> HTML is a factory function, so what about
>> calling the string parser functions "HTML()", "HTMLFragment()" and
>> "HTMLFragments()"?
>
> That would also make the semantics pretty simple:
>
> HTML() will always return a complete HTML document, i.e. wrapped by html/body
> if necessary.
>
> HTMLFragment() will always return a fragment, i.e. a single element that can
> be pasted into a body. This means: remove html/body if they are present and
> add a <div> if there are multiple elements. Maybe check if there actually are
> any block tags and just wrap the fragments in a <p> otherwise, but that's more
> of an optimisation.
>
> HTMLFragments() will always return a list of fragments, i.e. text and/or
> elements and remove any html/body parts that come from the document or were
> added by the parser.
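
A minimal sketch of the wrapping rule quoted above for "HTMLFragment()",
assuming html/body have already been stripped and the top-level content arrives
as a list of strings and elements. It uses the standard library's
xml.etree.ElementTree rather than lxml, and the helper name "wrap_fragments" is
hypothetical, not something from the branch. It always wraps in a <div> and
skips the <p> optimisation:

```python
import xml.etree.ElementTree as ET


def wrap_fragments(items):
    """Hypothetical helper: collapse top-level items into a single element.

    `items` is a list of str (text) and ET.Element objects, assumed to be
    what remains after any html/body wrappers were removed.
    """
    # A lone element can be pasted into a body as-is.
    if len(items) == 1 and not isinstance(items[0], str):
        return items[0]

    # Otherwise wrap everything in a <div>.
    parent = ET.Element("div")
    last = None
    for item in items:
        if isinstance(item, str):
            if last is None:
                # Text before the first element belongs to the parent.
                parent.text = (parent.text or "") + item
            else:
                # Text after an element is stored as that element's tail.
                last.tail = (last.tail or "") + item
        else:
            parent.append(item)
            last = item
    return parent


first = ET.Element("p")
second = ET.Element("p")
single = wrap_fragments([first])
wrapped = wrap_fragments(["intro ", first, "between", second])
```

Here "single" is the original <p> itself, while "wrapped" is a new <div> that
holds both paragraphs plus the surrounding text.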

I changed this on the branch and also renamed the current do-what-I-mean
"parse()" function to "fromstring()".

This means that "HTML()" now behaves differently from "fromstring()", although
"XML()" and "fromstring()" behave the same in etree. But I find that OK, since
both behave as you would expect: "HTML()" gives you an HTML page (including
html/body), while "fromstring()" more or less gives you back what you passed in
as a string, be it with or without <html>.
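
A small sketch of that difference, assuming a released lxml is installed. To my
knowledge the released lxml.html package exposes the always-a-document parser
as "document_fromstring()"; treating it as the counterpart of the "HTML()"
described here is an assumption, since the names on the branch could still
change:

```python
from lxml import html

snippet = "<p>Hello</p>"
page = "<html><body><p>Hello</p></body></html>"

# Always a complete document: html/body are added if they are missing.
doc = html.document_fromstring(snippet)

# Do-what-I-mean: a bare snippet comes back as the element that was passed in.
elem = html.fromstring(snippet)

# A string that already is a full page comes back as a full page.
full = html.fromstring(page)

print(doc.tag, elem.tag, full.tag)
```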

So, that makes the API complete (for now), I think. I'll double check the
modules to see if everything looks nice and consistent and will then try to
merge the branch back into the trunk soon to get out a "2.0alpha1". The API
may still change during the alpha cycle, but this will hopefully get us some
broader feedback on the new package.

Stefan