[lxml-dev] 2.1beta questions: objectify.XML, objectify.parse base_url arg, deprecate enableRecursiveStr, etree.tounicode()

jholg at gmx.de jholg at gmx.de
Tue Jul 1 16:21:29 CEST 2008


Hi Stefan,

  
> looks like you started cleaning up. :)

Quite right. I had started to feel guilty about not really looking at
2.1 for quite a while.

As far as I can tell, it works smoothly for me.

 
> Holger Joukl wrote:
> > I guess the module functions XML() and parse() should also support the
> > base_url arg?
> 
> Yes.
> 

Implemented on trunk, revision 56201.

I stole the unit tests from test_etree and noticed that I also had to
special-case 'base' in objectify's __setattr__ magic.
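
For illustration, a minimal sketch of how the base_url support looks from the caller's side. It assumes Python 3 and a current lxml release rather than the 2.1 beta discussed here, and the URLs are made-up placeholders:

```python
from io import BytesIO

from lxml import objectify

# Placeholder URL, purely illustrative; nothing is fetched from it.
BASE_URL = "http://no/such/url"

# objectify.XML() accepts base_url, mirroring etree.XML().
root = objectify.XML(b"<root><child>1</child></root>", base_url=BASE_URL)
root_base = root.base
docinfo_url = root.getroottree().docinfo.URL

# objectify.parse() takes the same keyword for file-like input.
tree = objectify.parse(BytesIO(b"<root/>"), base_url=BASE_URL)
parsed_url = tree.docinfo.URL

# Assigning to .base sets the xml:base attribute instead of creating a
# <base> child element, which is what the __setattr__ special case is for.
root.base = "https://example.invalid/other"
xml_base_attr = root.get("{http://www.w3.org/XML/1998/namespace}base")
child_tags = [child.tag for child in root.iterchildren()]
```

Without the special case, objectify's attribute magic would treat `root.base = ...` like any other child assignment.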

 
> 
> 
> > Also, I suppose enableRecursiveStr() could be removed?
> 
> I never really liked it, but why would you want to remove it?
> 

I put it the wrong way: there's already enable_recursive_str(), which
should be used instead. I for one actually *need* it, so I do like it :)

But some of the other old CamelCase method/function names went away, so I
figured this one could go too.
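
To make clear what the switch does, here is a small sketch, assuming Python 3 and a current lxml release (the sample document is invented for the example):

```python
from lxml import objectify

root = objectify.fromstring("<root><a>1</a><b>text</b></root>")

# Default behaviour: str() only returns the element's own text content.
plain = str(root.a)

# With recursive str enabled, str() returns an indented dump of the subtree,
# including the objectify type of each element.
objectify.enable_recursive_str(True)
try:
    recursive = str(root)
finally:
    # The switch is module-global, so turn it off again afterwards.
    objectify.enable_recursive_str(False)

print(recursive)
```

Since the setting is global to the objectify module, resetting it in a `finally` block keeps it from leaking into unrelated code.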



> 
> > Btw I realized that etree.tounicode() is bound to be deprecated in
> > favor of tostring(..., encoding=unicode).
> 
> Yes. Having a second function for a more limited functional scope is just
> superfluous.
> 
> BTW, does that affect objectify in any way or is it just curiosity (or
> users interest) on your side?
> 
> 

No, just curiosity. I currently use tounicode() for what I outlined
(falling back to Python's encoding capabilities) but can just as well
switch to the new conventions.
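
A rough sketch of that switch, assuming Python 3 and a current lxml release, where the marker passed to tostring() is `str` rather than Python 2's `unicode` (the sample element is invented):

```python
from lxml import etree

# U+20AC is the euro sign, which ISO-8859-15 can represent.
root = etree.fromstring("<price>\u20ac 10</price>")

# New convention: ask tostring() for text instead of calling tounicode().
text = etree.tostring(root, encoding=str)

# Fallback to Python's own encoding capabilities: encode the text result
# with a Python codec instead of letting the serializer pick the bytes.
latin9 = text.encode("ISO-8859-15")
```

Note that the text result carries no XML declaration, so the bytes produced this way do not announce their encoding.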

 
> > I suppose this is owed to ElementTree API compat which doesn't have
> > tounicode() - or is this a py3k issue?
> 
> Actually, the "encoding=unicode" bit has a Py3k issue. In Py3, you have
> to say "encoding=str" instead...
> 

How do you specify which actual encoding, e.g. 'ISO-8859-15', here?

 
> 
> > IMHO unicode is not an encoding and from my experience it confuses
> > people starting out with unicode to think of unicode as an encoding.
> 
> If you start with unicode, I think this is your smallest problem.
> 
> You are right that it's not an encoding and I admit that this might look
> a little hackish if you think about it. However, a unicode string is a
> well-defined way of representing the data, and it replaces the byte
> encoding that you'd normally get from the tostring() function. So it fits
> into the existing API quite well.
> 

lxml is just a fine design, so even the smallest deviations into the realm
of hackishness provoke storms of protest ;). Just joking, and maybe I'm
being overly picky about it, but it still feels a little uncomfortable to
pass something that isn't an encoding to a parameter that is named
'encoding', if only from an educational perspective.

Not that I can't live with it, especially since I can't think of a good
alternative...

Yet another parameter to tostring() feels awkward, and renaming the
parameter conflicts with ElementTree compatibility.

 Holger 

