[lxml-dev] 2.1beta questions: objectify.XML, objectify.parse base_url arg, deprecate enableRecursiveStr, etree.tounicode()
jholg at gmx.de
Tue Jul 1 16:21:29 CEST 2008
Hi Stefan,
> looks like you started cleaning up. :)
>
>
>
Quite right. I was starting to feel guilty about not having really looked
at 2.1 for quite a while.
It works smoothly for me, as far as I can tell.
> Holger Joukl wrote:
> > I guess the module functions XML() and parse() should also support the
> > base_url arg?
>
> Yes.
>
Implemented on trunk, revision 56201.
I stole the unit tests from test_etree and noticed that I also had to
special-case 'base' in objectify's __setattr__ magic.
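For illustration, a minimal sketch of what the base_url support looks like from the caller's side. It assumes a current lxml release on Python 3 rather than the 2.1 beta discussed here, and the example.org URLs are made up. It also shows why 'base' needs special handling: assigning to root.base should set the element's base URL instead of creating a child element named "base".

```python
import io

from lxml import objectify

# Hypothetical URLs, used only for illustration.
DOC_URL = "http://example.org/doc.xml"
OTHER_URL = "http://example.org/other.xml"

# objectify.XML() with a base URL for the parsed document
root = objectify.XML("<root><child>1</child></root>", base_url=DOC_URL)
doc_url = root.getroottree().docinfo.URL
base_before = root.base

# objectify.parse() accepts the same keyword argument
tree = objectify.parse(io.BytesIO(b"<root/>"), base_url=DOC_URL)
parsed_url = tree.docinfo.URL

# Assigning to .base sets xml:base; it does not add a <base> child,
# which is what objectify's attribute magic would otherwise do.
root.base = OTHER_URL
xml_base_attr = root.get("{http://www.w3.org/XML/1998/namespace}base")
child_tags = [child.tag for child in root.iterchildren()]
```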
>
>
> > Also, I suppose enableRecursiveStr() could be removed?
>
> I never really liked it, but why would you want to remove it?
>
I put it the wrong way: there's already enable_recursive_str(), which
should be used instead. I for one actually *need* it, so I do like it :)
But some of the other old CamelCase method/function names went away, so I
figured this one can go, too.
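As a rough sketch of the lowercase spelling in use, assuming a current lxml on Python 3: enable_recursive_str() switches a module-wide flag, so the example turns it off again afterwards. The sample document is invented for the example.

```python
from lxml import objectify

root = objectify.XML("<root><a>1</a><b>text</b></root>")

# Default behaviour: str() returns only the element's own text content.
plain = str(root)

# With the flag on, str() returns an indented dump of the whole subtree,
# including the Python type chosen for each element.
objectify.enable_recursive_str(True)
try:
    recursive = str(root)
finally:
    # The setting is global to the module, so restore the default.
    objectify.enable_recursive_str(False)

print(recursive)
```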
>
> > Btw I realized that etree.tounicode() is bound to be deprecated in
> > favor of tostring(..., encoding=unicode).
>
> Yes. Having a second function for a more limited functional scope is just
> superfluous.
>
> BTW, does that affect objectify in any way or is it just curiosity (or
> users interest) on your side?
>
>
No, just curiosity. I currently use tounicode() for what I outlined
(falling back to Python's encoding capabilities) but can just as well
switch to the new conventions.
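A minimal sketch of that switch, assuming Python 3 and a current lxml, where the Py2 spelling encoding=unicode becomes encoding=str (the string "unicode" is accepted as well). The sample text and the choice of ISO-8859-15 are only for illustration.

```python
from lxml import etree

root = etree.XML("<doc>Gr\u00fc\u00dfe \u20ac</doc>")

# Ask tostring() for a text string instead of encoded bytes.
text = etree.tostring(root, encoding=str)

# Fall back to Python's own codecs for the byte encoding.
# No XML declaration is written here, so the consumer has to
# know the encoding from somewhere else.
data = text.encode("iso-8859-15")
```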
> > I suppose this is owed to ElementTree API compat which doesn't have
> > tounicode() - or is this a py3k issue?
>
> Actually, the "encoding=unicode" bit has a Py3k issue. In Py3, you have
> to say "encoding=str" instead...
>
>
How do you specify the actual encoding, e.g. 'ISO-8859-15', here?
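For comparison, a sketch of how this behaves in current lxml releases on Python 3 (an assumption about later versions, not a statement about the 2.1 beta): passing an encoding name as a string makes tostring() return bytes in that encoding, while encoding=str returns text.

```python
from lxml import etree

root = etree.XML("<doc>caf\u00e9</doc>")

# A named byte encoding: the result is bytes, and lxml prepends an
# XML declaration that states the encoding.
data = etree.tostring(root, encoding="ISO-8859-15")

# The str type instead of an encoding name: the result is text.
text = etree.tostring(root, encoding=str)

# The declared encoding lets the bytes be parsed back unchanged.
roundtrip = etree.fromstring(data).text
```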
>
> > IMHO unicode is not an encoding and from my experience it confuses
> > people starting out with unicode to think of unicode as an encoding.
>
> If you start with unicode, I think this is your smallest problem.
>
> You are right that it's not an encoding and I admit that this might look
> a little hackish if you think about it. However, a unicode string is a
> well-defined way of representing the data, and it replaces the byte
> encoding that you'd normally get from the tostring() function. So it fits
> into the existing API quite well.
>
lxml is just a fine design, so even the smallest deviations into the realm
of hackishness provoke storms of protest ;). Just joking, and maybe I'm
being anal about it, but it still feels a little uncomfortable to pass
something that isn't an encoding to a parameter named 'encoding', if only
from an educational perspective.
Not that I can't live with it, especially since I can't think of a good
alternative...
Yet another parameter to tostring() feels awkward, and renaming the
parameter conflicts with ElementTree compatibility.
Holger