[lxml-dev] [objectify] patch/changes proposal: xsiannotate, deannotate
Stefan Behnel
stefan_ml at behnel.de
Tue Apr 10 20:45:56 CEST 2007
Hi Holger,
thanks a lot for the patch. I took a deeper look at it this morning and it
doesn't really look like the cleanest one on earth to me. I applied it anyway
and cleaned it up to match my idea of what you were going after. The new patch
is attached, please verify that this is what you wanted.
jholg at gmx.de wrote:
> Hi all, I suggest
>
> 1. adding two functions to lxml.objectify:
>
> def xsiannotate(element_or_tree, ignore_old=True):
>     """Recursively annotates the elements of an XML tree with 'xsi:type'
>     attributes.
>
>     If the 'ignore_old' keyword argument is True (the default), current
>     'xsi:type' attributes will be ignored and replaced. Otherwise, they
>     will be checked and only replaced if they no longer fit the current
>     text value.
>     """
>     [...]
Sure. I think that's helpful as objectify supports two annotations after all.
> Note: Will simply take the first schema type in PyType.xmlSchemaTypes list.
>
Hmmm. I guess that should do, but I'd prefer having that documented.
> def deannotate(element_or_tree, pytype=True, xsi=True):
>     """Recursively de-annotate the elements of an XML tree by removing
>     'pytype' and/or 'type' attributes.
>
>     If the 'pytype' keyword argument is True (the default), 'pytype'
>     attributes will be removed. If the 'xsi' keyword argument is True (the
>     default), 'xsi:type' attributes will be removed.
>     """
>     [...]
Sure, definitely helpful for cleanup purposes.
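To illustrate what de-annotation amounts to, here is a rough standard-library
sketch. It uses xml.etree.ElementTree rather than lxml, the helper name
strip_annotations is made up for the example, and it is not the code from the
patch. The two namespace URIs are the ones behind the 'py:pytype' and
'xsi:type' attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs behind the 'py:pytype' and 'xsi:type' attributes.
PYTYPE_NAMESPACE = "http://codespeak.net/lxml/objectify/pytype"
XML_SCHEMA_INSTANCE_NS = "http://www.w3.org/2001/XMLSchema-instance"

# ElementTree spells namespaced attribute names as "{uri}localname".
PYTYPE_ATTRIBUTE = "{%s}pytype" % PYTYPE_NAMESPACE
XSI_TYPE_ATTRIBUTE = "{%s}type" % XML_SCHEMA_INSTANCE_NS


def strip_annotations(root, pytype=True, xsi=True):
    """Hypothetical helper: drop type annotations from root and all descendants."""
    for element in root.iter():  # iter() yields root itself, then its subtree
        if pytype:
            element.attrib.pop(PYTYPE_ATTRIBUTE, None)
        if xsi:
            element.attrib.pop(XSI_TYPE_ATTRIBUTE, None)


root = ET.fromstring(
    '<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<a py:pytype="int">5</a><b xsi:type="xsd:string">x</b></root>'
)
strip_annotations(root)
```

Passing pytype=False or xsi=False leaves the respective attribute alone, which
mirrors the two keyword arguments proposed above.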
> 2. Patching annotate() so that it allows for leaving pytype="str" as is if
> ignore_old=False. Currently it will start type-guessing/xsi-type lookup as
> PyType(str,...) uses no type_check function.
I think that's the right thing to do.
> 3. Modifying the objectify.Element() factory to default nsmap to nsmap = {
> "py": PYTYPE_NAMESPACE, "xsi": XML_SCHEMA_INSTANCE_NS } if it is None. This
> keeps namespace-information in non-root nodes nice and clean with the cool
> new 1.3 lookup-if-ns-is-defined-up-in-the-tree functionality.
Cool, hum? 8o]
Ok, although not everyone will use annotations, we already add them internally
in DataElement() if we can figure out the type, so this is also helpful.
> 4. Patch DataElement so that it allows s.o. using an _xsitype argument that
> is not registered (or even plain wrong). Currently, this raises a KeyError,
> whereas using an unknown pytype defaults to StringElement.
Sure, why not. We're all adults, right?
> 5. Restructure pytype<-->XML Schema type mapping a bit, as e.g. XML Schema
> type integer fits better to a Python long than a Python int regarding value
> space.
Definitely. And since Python transmogrifies ints into longs already if it has
to, assuming longs can never hurt.
> (Anything that fits in 32bit becomes a Python int, everything else a Python
> long. Maybe slightly arbitrary, but ok for 32bit-machines :-)
Sure, no one will ever need more than 32 bits to address those 670KB of
memory, right? :)
Have you checked what the XML Schema datatypes spec says here? I know that C
doesn't really define an int across platforms, but they do, right?
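For what it's worth, XML Schema Part 2 does pin the sizes down: xsd:int is a
signed 32-bit value, xsd:long a signed 64-bit value, and xsd:integer is
unbounded. A small sketch of picking the narrowest of the three for a given
value follows. The helper name is invented for illustration, and it is written
for a Python where int is unbounded, so the int/long split of current Python
does not show up in it.

```python
# Value ranges as defined by XML Schema Part 2: Datatypes.
XSD_INT_RANGE = (-2**31, 2**31 - 1)
XSD_LONG_RANGE = (-2**63, 2**63 - 1)


def narrowest_schema_type(value):
    """Hypothetical helper: narrowest of xsd:int, xsd:long, xsd:integer for value."""
    if XSD_INT_RANGE[0] <= value <= XSD_INT_RANGE[1]:
        return "int"
    if XSD_LONG_RANGE[0] <= value <= XSD_LONG_RANGE[1]:
        return "long"
    return "integer"  # unbounded, so it always fits
```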
> This does not
> have big implications in practice, it's more or less for consistency. One
> thing remains: xsiannotate()-ing an IntElement >=2**31 will still xsi:type
> that as "int", which is not really valid regarding schema types. This could
> be addressed by using a more elaborate type_check for PyType("int",...) but
> I'm unsure about performance drawback and if it's worth the effort.
Whatever. Just wait until someone complains. :)
From the point of view of objectify's internal use of type annotations, I
can't see a major difference here, so whatever we change in the future should
not impact current programs (famous last words...)
Note that you can always override this by hand by replacing the 'int()'
function with something that additionally checks the resulting value. Requires
a bit of shuffling in the PyType registry, but since most people will not
care anyway...
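Such a replacement could look roughly like the sketch below. The name
checked_int is made up, the 32-bit bounds are those of xsd:int, and how the
function gets wired into the PyType registry is deliberately left out here.

```python
def checked_int(text):
    """Hypothetical stand-in for int() that also rejects values outside xsd:int."""
    value = int(text)  # raises ValueError for non-numeric text, like plain int()
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("%r does not fit into a 32-bit xsd:int" % (text,))
    return value
```

A type check built this way fails for anything >= 2**31, so such a value would
no longer be treated as an "int" and could fall through to a wider type.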
> b) Add all (non-list) XML Schema datatypes that restrict "string" to
> PyType('str', ...) As StringElement is the default these end up in
> StringElement anyway today. Adding them can result in faster lookup as no
> type-guessing will be invoked, and just for completeness...
Sure, and it definitely doesn't hurt as it's still only a lookup in a rather
small dictionary.
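To make the "lookup in a small dictionary" concrete, here is a sketch of the
idea, not the actual objectify tables. The type names are the non-list
XML Schema datatypes derived from "string"; the dictionary and function names
are invented for the example, and returning None stands in for "fall back to
type guessing".

```python
# Non-list XML Schema datatypes that are "string" or restrict it.
STRING_SCHEMA_TYPES = (
    "string", "normalizedString", "token", "language",
    "Name", "NCName", "ID", "IDREF", "ENTITY", "NMTOKEN",
)

# Illustrative mapping only: every string-like schema type maps to 'str'.
SCHEMA_TYPE_TO_PYTYPE = {name: "str" for name in STRING_SCHEMA_TYPES}


def pytype_for_schema_type(xsi_type):
    """Hypothetical lookup: pytype name for an xsi:type value, or None if unknown."""
    local_name = xsi_type.split(":", 1)[-1]  # drop a prefix such as 'xsd:'
    return SCHEMA_TYPE_TO_PYTYPE.get(local_name)
```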
Thanks for the effort,
Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: secondtry.patch
Type: text/x-patch
Size: 30083 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20070410/59f342b0/attachment-0001.bin