From breuerss at uni-koeln.de  Tue Feb  1 22:06:49 2011
From: breuerss at uni-koeln.de (Sebastian Breuers)
Date: Tue, 01 Feb 2011 22:06:49 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema
Message-ID: <4D4875E9.1040901@uni-koeln.de>

Dear lxml developers,

I encounter the following issue. As a member of the MoSGrid consortium, a
project that is aimed to facilitate molecular simulations in the D-Grid
environment, I want to use the CML (Chemical Markup Language) to describe
molecular simulation jobs.

I wrote a small validator that uses the lxml.etree.XMLSchema object to read
the XSD describing the CML3 (located at
http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
schema with the lxml.etree.XMLSchemaParseError:

local complex type: The content model is not determinist., line 5962

As I wrote to the developer of the CML he told me that his schema is read
properly in JAVA and C# with the saxon library. I've got an idea why the
XMLSchema object is throwing that exception but now I am not quite sure if
it is an issue with the standard (CML) or with the XMLSchema.

Your help would be most appreciated.

Kind regards,

Sebastian

-- 
_____________________________________________________________________________

Sebastian Breuers               Tel: +49-221-470-4108
                                EMail: breuerss at uni-koeln.de

Universität zu Köln             University of Cologne
Department für Chemie           Department of Chemistry
Organische Chemie               Organic Chemistry

Greinstraße 6                   Greinstraße 6
Raum 325                        Room 325
D-50939 Köln                    D-50939 Cologne, Federal Rep. of Germany
_____________________________________________________________________________
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: validator
Url: http://codespeak.net/pipermail/lxml-dev/attachments/20110201/f17d1482/attachment.diff

From stefan_ml at behnel.de  Wed Feb  2 07:19:06 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 02 Feb 2011 07:19:06 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema
In-Reply-To: <4D4875E9.1040901@uni-koeln.de>
References: <4D4875E9.1040901@uni-koeln.de>
Message-ID: <4D48F75A.208@behnel.de>

Sebastian Breuers, 01.02.2011 22:06:
> I encounter the following issue. As a member of the MoSGrid consortium, a
> project that is aimed to facilitate molecular simulations in the D-Grid
> environment, I want to use the CML (Chemical Markup Language) to describe
> molecular simulation jobs.
>
> I wrote a small validator that uses the lxml.etree.XMLSchema object to read
> the XSD describing the CML3 (located at
> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
> schema with the lxml.etree.XMLSchemaParseError:
>
> local complex type: The content model is not determinist., line 5962
>
> As I wrote to the developer of the CML he told me that his schema is read
> properly in JAVA and C# with the saxon library. I've got an idea why the
> XMLSchema object is throwing that exception but now I am not quite sure if
> it is an issue with the standard (CML) or with the XMLSchema.

It's usually an issue with the standard of XML-Schema. ;) The problem is
that the W3C specification is extremely complicated - it's even more
complex than actually writing a schema, and that's telling, in case you've
never done that. So the simple fact that there is one tool that can
successfully parse a W3C schema document doesn't mean that every other
validation tool can work with it. Specifically, it is a known fact that
libxml2 (which lxml gets its schema support from) has deficiencies with
some less widely used schema constructs.

I suggest this:

1) test the schema with the xmllint command line tool to reproduce the
problem with plain libxml2.

2) contact the CML developer again and ask him to debug the schema against
libxml2/xmllint. Maybe he can find a simple way to make it work. Don't
forget to mention that libxml2 is a very widely used tool for XML
processing that's absolutely worth supporting.

3) look out for a CML schema in RelaxNG or Schematron, which have much more
accessible specifications and are much easier to implement correctly. These
languages also make it a lot easier to write and maintain a schema, and you
can generate a W3C XML Schema from RelaxNG using "trang".

Stefan

From breuerss at uni-koeln.de  Wed Feb  2 07:31:24 2011
From: breuerss at uni-koeln.de (Sebastian Breuers)
Date: Wed, 02 Feb 2011 07:31:24 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema
In-Reply-To: <4D48F75A.208@behnel.de>
References: <4D4875E9.1040901@uni-koeln.de> <4D48F75A.208@behnel.de>
Message-ID: <4D48FA3C.10907@uni-koeln.de>

On 02.02.2011 07:19, Stefan Behnel wrote:
> Sebastian Breuers, 01.02.2011 22:06:
>> I encounter the following issue. As a member of the MoSGrid
>> consortium, a
>> project that is aimed to facilitate molecular simulations in the D-Grid
>> environment, I want to use the CML (Chemical Markup Language) to
>> describe
>> molecular simulation jobs.
>>
>> I wrote a small validator that uses the lxml.etree.XMLSchema object
>> to read
>> the XSD describing the CML3 (located at
>> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
>> schema with the lxml.etree.XMLSchemaParseError:
>>
>> local complex type: The content model is not determinist., line 5962
>>
>> As I wrote to the developer of the CML he told me that his schema is
>> read
>> properly in JAVA and C# with the saxon library. I've got an idea why the
>> XMLSchema object is throwing that exception but now I am not quite
>> sure if
>> it is an issue with the standard (CML) or with the XMLSchema.
>
> It's usually an issue with the standard of XML-Schema. ;) The problem
> is that the W3C specification is extremely complicated - it's even
> more complex than actually writing a schema, and that's telling, in
> case you've never done that. So the simple fact that there is one tool
> that can successfully parse a W3C schema document doesn't mean that
> every other validation tool can work with it. Specifically, it is a
> known fact that libxml2 (which lxml gets its schema support from) has
> deficiencies with some less widely used schema constructs.
>
> I suggest this:
>
> 1) test the schema with the xmllint command line tool to reproduce the
> problem with plain libxml2.
>
> 2) contact the CML developer again and ask him to debug the schema
> against libxml2/xmllint. Maybe he can find a simple way to make it
> work. Don't forget to mention that libxml2 is a very widely used tool
> for XML processing that's absolutely worth supporting.
>
> 3) look out for a CML schema in RelaxNG or Schematron, which have much
> more accessible specifications and are much easier to implement
> correctly. These languages also make it a lot easier to write and
> maintain a schema, and you can generate a W3C XML Schema from RelaxNG
> using "trang".
>
> Stefan

Thanks for your quick and valuable answer. I will follow your advice and
contact the developer again, as xmllint conveniently reported every error
the schema produces.

Sebastian

From jholg at gmx.de  Wed Feb  2 14:58:00 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Wed, 02 Feb 2011 14:58:00 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on reading CML schema
In-Reply-To: <4D48FA3C.10907@uni-koeln.de>
References: <4D4875E9.1040901@uni-koeln.de> <4D48F75A.208@behnel.de> <4D48FA3C.10907@uni-koeln.de>
Message-ID: <20110202135800.290540@gmx.net>

Hi,

> >> the XSD describing the CML3 (located at
> >> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
> >> schema with the lxml.etree.XMLSchemaParseError:
> >>
> >> local complex type: The content model is not determinist., line 5962
> >>
> >> As I wrote to the developer of the CML he told me that his schema is
> >> read
> >> properly in JAVA and C# with the saxon library. I've got an idea why the
> >> XMLSchema object is throwing that exception but now I am not quite
> >> sure if
> >> it is an issue with the standard (CML) or with the XMLSchema.
> >
> > It's usually an issue with the standard of XML-Schema. ;) The problem
> > is that the W3C specification is extremely complicated - it's even
> > more complex than actually writing a schema, and that's telling, in
> > case you've never done that. So the simple fact that there is one tool
> > that can successfully parse a W3C schema document doesn't mean that
> > every other validation tool can work with it. Specifically, it is a
> > known fact that libxml2 (which lxml gets its schema support from) has
> > deficiencies with some less widely used schema constructs.

Out of curiosity: libxml2 complains about such a construct

(Xerces 2.9.1 and Saxon EE 9.2.0.6 do not choke)

Interestingly, libxml2 only complains if the schema uses a
targetNamespace, but not if it doesn't.

So I think it boils down to how ##other and ##local are treated.

It looks like libxml2 interprets them like this

targetNS="/my/target/NS":
  ##other --> ns is {not "/my/target/NS"}
  ##local --> ns is {no namespace}

no targetNS:
  ##other --> ns is {not no namespace}
  ##local --> ns is {no namespace}

and concludes that ##local and ##foo are indistinguishable for a
namespace-less element in the first case, i.e. the "content model is not
deterministic".

However, in that case I'd expect a schema:

would successfully validate such a document:

hello!

But it does not:

test_targetNS.xml:3: element foo: Schemas validity error : Element 'foo': This element is not expected. Expected is ( ##other{http://www.example.org/target/NS}* ).
test_targetNS.xml fails to validate

So I'd say libxml2 gets it wrong in one of the cases, and the question is
if ##other means {not , not namespace-less} or {not } (but possibly
namespace-less).

A bit confusing, that is.

Holger

From paulhtremblay at gmail.com  Fri Feb  4 07:06:39 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Fri, 04 Feb 2011 01:06:39 -0500
Subject: [lxml-dev] message still not working with xslt?
Message-ID: <4D4B976F.1020806@gmail.com>

Hi developers

I believe that when transforming with XSLT, lxml still does not report
messages, unless that message terminates?

match root

match root

lxml code

xslt_doc = etree.parse(xslt_file)
transform = etree.XSLT(xslt_doc)
indoc = etree.parse(xml_file)
outdoc = transform(indoc, **param_dict)
sys.stdout.write(str(outdoc))

Thanks

Paul

From stefan_ml at behnel.de  Fri Feb  4 07:20:50 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 04 Feb 2011 07:20:50 +0100
Subject: [lxml-dev] message still not working with xslt?
In-Reply-To: <4D4B976F.1020806@gmail.com>
References: <4D4B976F.1020806@gmail.com>
Message-ID: <4D4B9AC2.50803@behnel.de>

Paul Tremblay, 04.02.2011 07:06:
> I believe that when transforming with XSLT, lxml still does not report
> messages, unless that message terminates?

I'm sure it does, you just have to know where to look. ;)

> match root
>
> match root
>
> lxml code
>
> xslt_doc = etree.parse(xslt_file)
> transform = etree.XSLT(xslt_doc)
> indoc = etree.parse(xml_file)
> outdoc = transform(indoc, **param_dict)
> sys.stdout.write(str(outdoc))

Hmm, interesting, that seems to be missing from the XSLT docs completely.

You should get at the messages through the error log of the XSLT object
(most lxml.etree objects have one).

http://codespeak.net/lxml/parsing.html#error-log

Stefan

From paulhtremblay at gmail.com  Fri Feb  4 08:12:49 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Fri, 04 Feb 2011 02:12:49 -0500
Subject: [lxml-dev] message still not working with xslt?
In-Reply-To: <4D4B9AC2.50803@behnel.de>
References: <4D4B976F.1020806@gmail.com> <4D4B9AC2.50803@behnel.de>
Message-ID: <4D4BA6F1.60900@gmail.com>

On 2/4/11 1:20 AM, Stefan Behnel wrote:
> Paul Tremblay, 04.02.2011 07:06:
>> I believe that when transforming with XSLT, lxml still does not report
>> messages, unless that message terminates?
>
> I'm sure it does, you just have to know where to look. ;)
>
>> match root
>>
>> match root
>>
>> lxml code
>>
>> xslt_doc = etree.parse(xslt_file)
>> transform = etree.XSLT(xslt_doc)
>> indoc = etree.parse(xml_file)
>> outdoc = transform(indoc, **param_dict)
>> sys.stdout.write(str(outdoc))
>
> Hmm, interesting, that seems to be missing from the XSLT docs completely.
>
> You should get at the messages through the error log of the XSLT
> object (most lxml.etree objects have one).
>
> http://codespeak.net/lxml/parsing.html#error-log
>

Yes, that works. Thanks.

print len(transform.error_log)
error_obj = transform.error_log[0]
print error_obj.message
print error_obj.line
print error_obj.column

According to your link, the error_log has at least 3 methods as I've
illustrated above. Are there any more I'm missing?

Paul

From stefan_ml at behnel.de  Fri Feb  4 09:05:27 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 04 Feb 2011 09:05:27 +0100
Subject: [lxml-dev] message still not working with xslt?
In-Reply-To: <4D4BA6F1.60900@gmail.com>
References: <4D4B976F.1020806@gmail.com> <4D4B9AC2.50803@behnel.de> <4D4BA6F1.60900@gmail.com>
Message-ID: <4D4BB347.2050804@behnel.de>

Paul Tremblay, 04.02.2011 08:12:
> On 2/4/11 1:20 AM, Stefan Behnel wrote:
>> You should get at the messages through the error log of the XSLT
>> object (most lxml.etree objects have one).
>>
>> http://codespeak.net/lxml/parsing.html#error-log
>
> Yes, that works. Thanks.
>
> print len(transform.error_log)
> error_obj = transform.error_log[0]
> print error_obj.message
> print error_obj.line
> print error_obj.column
>
> According to your link, the error_log has at least 3 methods as I've
> illustrated above. Are there any more I'm missing?

http://codespeak.net/lxml/dev/api/lxml.etree._LogEntry-class.html
http://codespeak.net/lxml/dev/api/lxml.etree._ErrorLog-class.html

Sorry for the lack of API documentation. Cython doesn't currently support
attaching a docstring to an automatically generated property (this has been
an often requested feature but no-one has taken care of it so far). But
I'll add at least a doc comment to the _LogEntry class.

As always, documentation fixes are welcome.

http://codespeak.net/svn/lxml/trunk/doc/

Stefan

From paulhtremblay at gmail.com  Sat Feb  5 04:10:51 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Fri, 04 Feb 2011 22:10:51 -0500
Subject: [lxml-dev] message still not working with xslt?
In-Reply-To: <4D4BB347.2050804@behnel.de>
References: <4D4B976F.1020806@gmail.com> <4D4B9AC2.50803@behnel.de> <4D4BA6F1.60900@gmail.com> <4D4BB347.2050804@behnel.de>
Message-ID: <4D4CBFBB.8010202@gmail.com>

On 2/4/11 3:05 AM, Stefan Behnel wrote:
>
> http://codespeak.net/lxml/dev/api/lxml.etree._LogEntry-class.html
> http://codespeak.net/lxml/dev/api/lxml.etree._ErrorLog-class.html
>
> Sorry for the lack of API documentation. Cython doesn't currently
> support attaching a docstring to an automatically generated property
> (this has been an often requested feature but no-one has taken care of
> it so far). But I'll add at least a doc comment to the _LogEntry class.
>
> As always, documentation fixes are welcome.
>
> http://codespeak.net/svn/lxml/trunk/doc/
>
>

Here's a documentation fix. I'm not sure about some of the properties,
such as level, etc.

Error log
---------

Parsers have an ``error_log`` property that lists the errors of the
last parser run. Each ``error_log`` is a list, and each item in the list is
an object that has the following properties:

* ``column``: an integer that identifies the column where the error occurred.

* ``domain``: an integer

* ``domain_name``: a unicode string

* ``filename``: a unicode string

* ``level``: an integer

* ``level_name``: a unicode string

* ``line``: an integer that identifies the line where the error occurred.

* ``message``: a unicode string that lists the message.

* ``type``: an integer

* ``type_name``: a unicode string

.. sourcecode:: pycon

  >>> parser = etree.XMLParser()
  >>> print(len(parser.error_log))
  0

  >>> tree = etree.XML("<root></b>", parser)
  Traceback (most recent call last):
    ...
  lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: root line 1 and b, line 1, column 11

  >>> print(len(parser.error_log))
  1

  >>> error = parser.error_log[0]
  >>> print(error.message)
  Opening and ending tag mismatch: root line 1 and b
  >>> print(error.line)
  1
  >>> print(error.column)
  11

The following code shows how to output messages from xsl:message when
processing XSLT.

.. sourcecode:: pycon

  >>> f = StringIO('''
  ...
  ...
  ...
  ... found root
  ...
  ...
  ...
  ...
  ...
  ...
  ... found para
  ...
  ...
  ...
  ...
  ... ''')
  >>> xslt_doc = etree.parse(f)
  >>> transform = etree.XSLT(xslt_doc)
  >>> f = StringIO('Text')
  >>> doc = etree.parse(f)
  >>> result_tree = transform(doc)
  >>> for error in transform.error_log:
  ...     print 'message from line %s, col %s:' % (error.line, error.column)
  ...     print error.message
  ...     print
  ...     print 'domain_name: %s' % error.domain_name
  ...     print 'filename: %s' % error.filename
  ...     print 'level: %s' % error.level
  ...     print 'level_name: %s' % error.level_name
  ...     print 'type: %s' % error.type
  ...     print 'type_name: %s' % error.type_name
  ...     print '================================================='
  message from line 0, col 0:
  found root

  domain_name: XSLT
  filename:
  level: 2
  level_name: ERROR
  type: 0
  type_name: ERR_OK
  =================================================
  message from line 0, col 0:
  found para

  domain_name: XSLT
  filename:
  level: 2
  level_name: ERROR
  type: 0
  type_name: ERR_OK
  =================================================

From stefan_ml at behnel.de  Sat Feb  5 12:12:57 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 05 Feb 2011 12:12:57 +0100
Subject: [lxml-dev] message still not working with xslt?
In-Reply-To: <4D4CBFBB.8010202@gmail.com>
References: <4D4B976F.1020806@gmail.com> <4D4B9AC2.50803@behnel.de> <4D4BA6F1.60900@gmail.com> <4D4BB347.2050804@behnel.de> <4D4CBFBB.8010202@gmail.com>
Message-ID: <4D4D30B9.4020000@behnel.de>

Paul Tremblay, 05.02.2011 04:10:
> Here's a documentation fix. I'm not sure about some of the properties,
> such as level, etc.

Thanks. I fixed it up somewhat and moved the second part over to the XSLT
documentation.

http://codespeak.net/svn/lxml/trunk/doc/

Stefan

From paulhtremblay at gmail.com  Sat Feb  5 18:18:01 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Sat, 05 Feb 2011 12:18:01 -0500
Subject: [lxml-dev] paramters doc fix?
Message-ID: <4D4D8649.9080100@gmail.com>

I am submitting the following as a possible fix for passing parameters
with hyphens.

Stylesheet parameters
---------------------

It is common for parameter names in an XSL stylesheet to contain hyphens,
such as ``title-block``. The above methods won't work in this case. For
example, if you add the parameter ``the-param`` to the above stylesheet,
you get this result:

.. sourcecode:: pycon

  >>> plain_string_value_hyphen = etree.XSLT.strparam(""" It's "Monty Python--with a hyphen" """)
  >>> result = transform(doc_root, a=plain_string_value, the-param = plain_string_value_hyphen)
  File "", line 1
  SyntaxError: keyword can't be an expression

In order to get around this problem, use a dictionary to pass the parameter:

.. sourcecode:: pycon

  >>> param_d = {'a': plain_string_value, 'the-param': plain_string_value_hyphen}
  >>> result = transform(doc_root, **param_d)
  >>> print str(result)

  It's "Monty Python" It's "Monty Python--with a hyphen"

From stefan_ml at behnel.de  Sat Feb  5 18:42:42 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 05 Feb 2011 18:42:42 +0100
Subject: [lxml-dev] paramters doc fix?
In-Reply-To: <4D4D8649.9080100@gmail.com>
References: <4D4D8649.9080100@gmail.com>
Message-ID: <4D4D8C12.1000209@behnel.de>

Paul Tremblay, 05.02.2011 18:18:
> I am submitting the following as a possible fix for passing parameters
> with hyphens.
>
> Stylesheet parameters
> ---------------------
>
> It is common for parameter names in an XSL stylesheet to contain hyphens,
> such as ``title-block``.
> The above methods won't work in this
> case. For example, if you add the parameter ``the-param`` to the above
> stylesheet, you get this result:
>
> .. sourcecode:: pycon
>
> >>> plain_string_value_hyphen = etree.XSLT.strparam(""" It's "Monty
> Python--with a hyphen" """)
> >>> result = transform(doc_root, a=plain_string_value, the-param =
> plain_string_value_hyphen)
> File "", line 1
> SyntaxError: keyword can't be an expression
>
> In order to get around this problem, use a dictionary to pass the parameter:
>
> .. sourcecode:: pycon
>
> >>> param_d = {'a': plain_string_value, 'the-param':
> plain_string_value_hyphen}
> >>> result = transform(doc_root, **param_d)
> >>> print str(result)
>
> It's "Monty Python" It's "Monty Python--with a hyphen"
>

Thanks. You're right that this is worth mentioning. However, I think the
following is enough.

Stefan

--- a/doc/xpathxslt.txt	Sat Feb 05 12:07:17 2011 +0100
+++ b/doc/xpathxslt.txt	Sat Feb 05 18:39:01 2011 +0100
@@ -590,6 +590,25 @@
   >>> str(result)
   '\n It\'s "Monty Python" \n'
 
+If you need to pass parameters that are not legal Python identifiers,
+pass them inside of a dictionary:
+
+.. sourcecode:: pycon
+
+  >>> transform = etree.XSLT(etree.XML('''\
+  ...
+  ...
+  ...
+  ...
+  ...
+  ... '''))
+
+  >>> result = transform(doc_root, **{'non-python-identifier': '5'})
+  >>> str(result)
+  '\n5\n'
+
+
 Errors and messages
 -------------------

From henke at mac.se  Sun Feb  6 02:16:41 2011
From: henke at mac.se (Henrik)
Date: Sun, 6 Feb 2011 02:16:41 +0100
Subject: [lxml-dev] Handling CSS conditional
Message-ID: <11351786-BA31-4B6F-9C4E-01D3713E6473@mac.se>

Hello list,

I'm trying to iterate over link tags in the header.

I'm using both cssselect and iter().

The problem is that lxml can't find inside css conditionals ie.

Is there any way to solve this? Either with cssselect or iter?

Cheers,
Henrik

From stefan_ml at behnel.de  Sun Feb  6 07:13:17 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 06 Feb 2011 07:13:17 +0100
Subject: [lxml-dev] Handling CSS conditional
In-Reply-To: <11351786-BA31-4B6F-9C4E-01D3713E6473@mac.se>
References: <11351786-BA31-4B6F-9C4E-01D3713E6473@mac.se>
Message-ID: <4D4E3BFD.4080800@behnel.de>

Henrik, 06.02.2011 02:16:
> I'm trying to iterate over link tags in the header.
>
> I'm using both cssselect and iter().
>
> The problem is that lxml can't find inside css conditionals ie.
>

Well, this is not a "conditional", it's a comment.

> Is there any way to solve this? Either with cssselect or iter?

With iter(), you can check if the '.tag' of the Element is
lxml.etree.Comment. Then, use a regexp to parse the attributes from the
comment's text.

Stefan

From mlissner at michaeljaylissner.com  Sun Feb  6 07:34:53 2011
From: mlissner at michaeljaylissner.com (Michael Lissner)
Date: Sat, 05 Feb 2011 22:34:53 -0800
Subject: [lxml-dev] Failing Code
Message-ID: <4D4E410D.3080805@michaeljaylissner.com>

Hi, I'm using lxml to parse the contents of some court pages (basically
bringing court docs to the people), and a certain page in particular is
failing without throwing any errors. I'm curious if this is something I'm
doing, if I've discovered a corner-case in lxml's abilities, or if it's
something else altogether.

The code I'm running is (in part) this:

>>> from lxml import etree
>>> import StringIO
>>> # Read the URL using urllib2
>>> quickHTML = readURL('http://www.ca1.uscourts.gov/cgi-bin/getopn.pl?OPINION=95-1346.01A', 1)
>>> # Use the HTML Parser
>>> parser = etree.HTMLParser()
>>> # Make the HTML into a tree
>>> quickTree = etree.parse(StringIO.StringIO(quickHTML), parser)
>>> # Pull out any pre elements (there's only one)
>>> documentPlainText = quickTree.find('//pre')
>>> # Print the pre elements to the console
>>> print etree.tostring(documentPlainText)

>>> # Woah - that should have been much bigger! Print the whole HTML to the console:
>>> print etree.tostring(quickTree)

USCA1 Opinion

>>> # ****What the heck?****

If you look at the URL from line 4, above, you'll see that the pre element has a
TON of pretty ugly content inside it, but when I run it as above, I don't get
any of it. I've run this script to download and parse hundreds of nearly
identical pages, and it is working great, but for this page, it fails.

Anybody have any theories why? I'd love to get this scraper back up and running
this weekend, so I can download as much of the court's material as possible
before the lawyers start using the site again Monday.

Thanks for the help and the great library,

Mike

PS - if you want to see the /real/ code, it's here:
https://bitbucket.org/mlissner/legal-current-awareness/src/ffbfcb79c659/alert/back_scrape.py#cl-176
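
For readers following along, here is a minimal, self-contained sketch of the parse-and-find step described above. It does not diagnose the failing page: the ``sample_html`` bytes are a hypothetical stand-in for the downloaded document, and the sketch only assumes that a relative path (``.//pre``) and the parser's ``error_log`` are reasonable first things to check when a tree comes back smaller than expected.

```python
from io import BytesIO

from lxml import etree

# Hypothetical stand-in for the downloaded page; the real page is not
# reproduced here.
sample_html = (
    b"<html><head><title>USCA1 Opinion</title></head>"
    b"<body><pre>line one\nline two</pre></body></html>"
)

parser = etree.HTMLParser()
tree = etree.parse(BytesIO(sample_html), parser)

# A relative path (".//pre") searches everything below the root element.
pre = tree.find(".//pre")
pre_text = pre.text if pre is not None else None
print(pre_text)

# The parser records what libxml2 complained about while recovering from
# broken markup; the log may well be empty for clean input.
for entry in parser.error_log:
    print(entry.line, entry.column, entry.message)
```

If the ``pre`` element is empty for a real page, printing the error log entries is one way to see where the HTML parser stopped trusting the input.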

From stefan_ml at behnel.de  Sun Feb  6 21:02:52 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 06 Feb 2011 21:02:52 +0100
Subject: [lxml-dev] lxml 2.3 final released
Message-ID: <4D4EFE6C.8090102@behnel.de>

Hi everyone,

I'm happy to announce the long awaited release of lxml 2.3 final. It is the 
first officially stable release of the 2.3 release series, which officially 
supports Python 3.1.2 and 3.2 (previous support in 2.2.x should be 
considered accidental).

http://codespeak.net/lxml/

http://pypi.python.org/pypi/lxml/2.3/

Binary builds are expected to become available in the near future.

This release was built using Cython 0.14.1. It is recommended (although not 
required) to use at least libxml2 2.7.8 with lxml, which fixes a number of 
important bugs compared to the previous 2.7.x releases.

Updating from the 2.2 series is recommended. It should be relatively easy 
and will be rewarded by a higher resistance to potential crashes. I'm 
posting the complete 2.3 changelog below for your convenience. Please note 
that the 2.2 series will only receive critical bug fixes in the future that 
have not been superseded by changes in the 2.3 series.

If you are interested in commercial support or customisations for the lxml 
package, please contact me directly.

Have fun,

Stefan


2.3 (2011-02-06)
================

Features added
--------------

* When looking for children, ``lxml.objectify`` takes '{}tag' as
   meaning an empty namespace, as opposed to the parent namespace.

Bugs fixed
----------

* When finished reading from a file-like object, the parser
   immediately calls its ``.close()`` method.

* When finished parsing, ``iterparse()`` immediately closes the input
   file.

* Work-around for libxml2 bug that can leave the HTML parser in a
   non-functional state after parsing a severely broken document (fixed
   in libxml2 2.7.8).

* ``marque`` tag in HTML cleanup code is correctly named ``marquee``.

Other changes
--------------

* Some public functions in the Cython-level C-API have more explicit
   return types.


2.3beta1 (2010-09-06)
=====================

Features added
--------------

Bugs fixed
----------

* Crash in newer libxml2 versions when moving elements between
   documents that had attributes on replaced XInclude nodes.

* ``XMLID()`` function was missing the optional ``parser`` and
   ``base_url`` parameters.

* Searching for wildcard tags in ``iterparse()`` was broken in Py3.

* ``lxml.html.open_in_browser()`` didn't work in Python 3 due to the
   use of os.tempnam.  It now takes an optional 'encoding' parameter.
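
Two of the fixes above can be sketched briefly, assuming a small made-up document (the ``urn:example:a`` namespace and the ``id`` values are illustrative only): ``XMLID()`` called with an explicit ``parser`` argument, and ``iterparse()`` filtering on a wildcard tag.

```python
from io import BytesIO

from lxml import etree

# Illustrative document; the namespace URI and ids are made up.
xml = (
    b'<root xmlns:a="urn:example:a">'
    b'<a:x id="first"/><a:y id="second"/><z/>'
    b'</root>'
)

# XMLID() returns the root element and a dict mapping id values to elements.
# The optional parser argument is passed by keyword here.
parser = etree.XMLParser(remove_blank_text=True)
root, id_map = etree.XMLID(xml, parser=parser)
id_tags = dict((key, el.tag) for key, el in id_map.items())
print(sorted(id_tags.items()))

# iterparse() with a wildcard tag: only elements in the given namespace
# are reported (default event is "end").
wildcard_tags = [
    el.tag
    for _event, el in etree.iterparse(BytesIO(xml), tag="{urn:example:a}*")
]
print(wildcard_tags)
```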

Other changes
--------------


2.3alpha2 (2010-07-24)
======================

Features added
--------------

Bugs fixed
----------

* Crash in XSLT when generating text-only result documents with a
   stylesheet created in a different thread.

Other changes
--------------

* ``repr()`` of Element objects shows the hex ID with leading 0x
   (following ElementTree 1.3).


2.3alpha1 (2010-06-19)
======================

Features added
--------------

* Keyword argument ``namespaces`` in ``lxml.cssselect.CSSSelector()``
   to pass a prefix-to-namespace mapping for the selector.

* New function ``lxml.etree.register_namespace(prefix, uri)`` that
   globally registers a namespace prefix for a namespace that newly
   created Elements in that namespace will use automatically.  Follows
   ElementTree 1.3.

* Support 'unicode' string name as encoding parameter in
   ``tostring()``, following ElementTree 1.3.

* Support 'c14n' serialisation method in ``ElementTree.write()`` and
   ``tostring()``, following ElementTree 1.3.

* The ElementPath expression syntax (``el.find*()``) was extended to
   match the upcoming ElementTree 1.3 that will ship in the standard
   library of Python 3.2/2.7.  This includes extended support for
   predicates as well as namespace prefixes (as known from XPath).

* During regular XPath evaluation, various EXSLT functions are
   available within their namespace when using libxslt 1.1.26 or later.

* Support passing a readily configured logger instance into
   ``PyErrorLog``, instead of a logger name.

* On serialisation, the new ``doctype`` parameter can be used to
   override the DOCTYPE (internal subset) of the document.

* New parameter ``output_parent`` to ``XSLTExtension.apply_templates()``
   to append the resulting content directly to an output element.

* ``XSLTExtension.process_children()`` to process the content of the
   XSLT extension element itself.

* ISO-Schematron support based on the de-facto Schematron reference
   'skeleton implementation'.

* XSLT objects now take XPath objects as ``__call__`` stylesheet
   parameters.

* Enable path caching in ElementPath (``el.find*()``) to avoid parsing
   overhead.

* Setting the value of a namespaced attribute always uses a prefixed
   namespace instead of the default namespace even if both declare the
   same namespace URI.  This avoids serialisation problems when an
   attribute from a default namespace is set on an element from a
   different namespace.

* XSLT extension elements: support for XSLT context nodes other than
   elements: document root, comments, processing instructions.

* Support for strings (in addition to Elements) in node-sets returned
   by extension functions.

* Forms that lack an ``action`` attribute default to the base URL of
   the document on submit.

* XPath attribute result strings have an ``attrname`` property.

* Namespace URIs get validated against RFC 3986 at the API level
   (required by the XML namespace specification).

* Target parsers show their target object in the ``.target`` property
   (compatible with ElementTree).
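
A few of the features listed above can be tried together in a short sketch. The namespace URI, the ``demo`` prefix and the element names are hypothetical and chosen only for illustration; the calls shown are ``register_namespace()``, ``tostring()`` with ``encoding="unicode"`` and with ``doctype``, and the ``attrname`` property of XPath attribute results.

```python
from lxml import etree

# Hypothetical namespace and prefix, used only for this illustration.
NS = "http://www.example.org/demo"
etree.register_namespace("demo", NS)

root = etree.Element("{%s}root" % NS)
etree.SubElement(root, "{%s}child" % NS, name="value")

# encoding="unicode" yields a text string instead of a byte string.
text = etree.tostring(root, encoding="unicode")
print(text)

# The doctype parameter puts the given DOCTYPE in front of the output.
with_doctype = etree.tostring(root, encoding="unicode",
                              doctype="<!DOCTYPE root>")
print(with_doctype)

# XPath attribute results are strings that remember their attribute name.
attr_values = root.xpath("//@name")
print(attr_values[0], attr_values[0].attrname)
```

Note that ``register_namespace()`` is global for the process, so a library would normally pick a prefix that is unlikely to clash with its callers.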

Bugs fixed
----------

* API is hardened against invalid proxy instances to prevent crashes
   due to incorrectly instantiated Element instances.

* Prevent crash when instantiating ``CommentBase`` and friends.

* Export ElementTree compatible XML parser class as
   ``XMLTreeBuilder``, as it is called in ET 1.2.

* ObjectifiedDataElements in lxml.objectify were not hashable.  They
   now use the hash value of the underlying Python value (string,
   number, etc.) to which they compare equal.

* Parsing broken fragments in lxml.html could fail if the fragment
   contained an orphaned closing '' tag.

* Using XSLT extension elements around the root of the output document
   crashed.

* ``lxml.cssselect`` did not distinguish between ``x[attr="val"]`` and
   ``x [attr="val"]`` (with a space).  The latter now matches the
   attribute independent of the element.

* Rewriting multiple links inside of HTML text content could end up
   replacing unrelated content as replacements could impact the
   reported position of subsequent matches.  Modifications are now
   simplified by letting the ``iterlinks()`` generator in ``lxml.html``
   return links in reversed order if they appear inside the same text
   node.  Thus, replacements and link-internal modifications no longer
   change the position of links reported afterwards.

* The ``.value`` attribute of ``textarea`` elements in lxml.html did
   not represent the complete raw value (including child tags etc.). It
   now serialises the complete content on read and replaces the
   complete content by a string on write.

* Target parser didn't call ``.close()`` on the target object if
   parsing failed.  Now it is guaranteed that ``.close()`` will be
   called after parsing, regardless of the outcome.
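
The ``iterlinks()`` entry above relies on a general property of offset-based
text edits: if replacements are applied from the last match to the first, the
offsets recorded for earlier matches stay valid. A minimal sketch in plain
Python, using ``re`` rather than lxml (the sample text, the URLs and the
``rewrite`` helper are made up for illustration and are not part of lxml):

```python
import re

text = "see http://a.example/x and http://b.example/y for details"

# Record the (start, end) offsets of every link-like match up front.
spans = [m.span() for m in re.finditer(r"http://\S+", text)]

def rewrite(text, spans, replacement):
    # Walk the matches from last to first: editing the tail of the
    # string never shifts the offsets of the matches before it.
    for start, end in reversed(spans):
        text = text[:start] + replacement + text[end:]
    return text

result = rewrite(text, spans, "https://mirror.example/archived")
print(result)
```

Applying the same replacements front to back would shift every later span by
the length difference of each earlier replacement.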

Other changes
-------------

* Official support for Python 3.1.2 and later.

* Static MS Windows builds can now download their dependencies
   themselves.

* ``Element.attrib`` no longer uses a cyclic reference back to its
   Element object.  It therefore no longer requires the garbage
   collector to clean up.

* Static builds include libiconv, in addition to libxml2 and libxslt.
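
Why the ``Element.attrib`` change matters can be sketched with two toy
classes (``Element`` and ``Attrib`` here are stand-ins written for this
example, not lxml's real classes). The sketch assumes CPython, where objects
without reference cycles are freed as soon as their last reference goes away,
while objects in a cycle have to wait for the cyclic garbage collector:

```python
import gc
import weakref

class Attrib:
    def __init__(self, element=None):
        # A back-reference to the owning element creates a cycle.
        self.element = element

class Element:
    def __init__(self, cyclic):
        self.attrib = Attrib(self if cyclic else None)

def freed_without_gc(cyclic):
    """Return True if the element is freed by reference counting alone."""
    gc.disable()  # keep the cyclic collector out of the picture
    try:
        element = Element(cyclic)
        probe = weakref.ref(element)
        del element
        return probe() is None
    finally:
        gc.enable()

print(freed_without_gc(cyclic=True))
print(freed_without_gc(cyclic=False))
```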

From stefan_ml at behnel.de  Sun Feb  6 21:13:48 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 06 Feb 2011 21:13:48 +0100
Subject: [lxml-dev] Failing Code
In-Reply-To: <4D4E410D.3080805@michaeljaylissner.com>
References: <4D4E410D.3080805@michaeljaylissner.com>
Message-ID: <4D4F00FC.4030000@behnel.de>

Michael Lissner, 06.02.2011 07:34:
> Hi, I'm using lxml to parse the contents of some court pages (basically bringing
> court docs to the people), and a certain page in particular is failing without
> throwing any errors. I'm curious if this is something I'm doing, if I've
> discovered a corner-case in lxml's abilities, or if it's something else altogether.
>
> The code I'm running is (in part) this:
>
> >>> from lxml import etree
> >>> import StringIO
>
> >>> # Read the URL using urllib2
> >>> quickHTML = readURL('http://www.ca1.uscourts.gov/cgi-bin/getopn.pl?OPINION=95-1346.01A', 1)
>
> >>> # Use the HTML Parser
> >>> parser = etree.HTMLParser()
>
> >>> # Make the HTML into a tree
> >>> quickTree = etree.parse(StringIO.StringIO(quickHTML), parser)
>
> >>> # Pull out any pre elements (there's only one)
> >>> documentPlainText = quickTree.find('//pre')
>
> >>> # Print the pre elements to the console
> >>> print tostring(documentPlainText)
>
> [output markup stripped by the list archive]
>
> >>> # Woah - that should have been much bigger! Print the whole HTML to the console:
> >>> print tostring(quickTree)
>
> [markup stripped by the list archive; only the page title survives]
> USCA1 Opinion
>
> >>> # ****What the heck?****
>
> If you look at the URL from line 4, above, you'll see that the pre element has a
> TON of pretty ugly content inside it, but when I run it as above, I don't get
> any of it. I've run this script to download and parse hundreds of nearly
> identical pages, and it is working great, but for this page, it fails.
>
> Anybody have any theories why? I'd love to get this scraper back up and running
> this weekend, so I can download as much of the court's material before the
> lawyers start using the site again Monday.
>
> Thanks for the help and the great library,
>
> Mike
>
> PS - if you want to see the /real/ code, it's here:
> https://bitbucket.org/mlissner/legal-current-awareness/src/ffbfcb79c659/alert/back_scrape.py#cl-176

I can reproduce this. It seems to be a character encoding problem. 
libxml2's xmllint gives me the same result. When I set the input encoding 
to "ISO8859-1" explicitly for the parser using

     parser = etree.HTMLParser(encoding='ISO8859-1')

then I get the complete tree. So I guess the parser stops short at the
undecodable characters in the page.

Stefan

From mlissner at michaeljaylissner.com  Mon Feb  7 03:22:57 2011
From: mlissner at michaeljaylissner.com (Michael Lissner)
Date: Sun, 06 Feb 2011 18:22:57 -0800
Subject: [lxml-dev] Failing Code
In-Reply-To: <4D4F00FC.4030000@behnel.de>
References: <4D4E410D.3080805@michaeljaylissner.com>
	<4D4F00FC.4030000@behnel.de>
Message-ID: <4D4F5781.9020903@michaeljaylissner.com>

Ah. I was afraid it was something like that, and eventually just wrapped all the
text coming from the site in unicode(text, errors='ignore'). I must say, I
hate character encodings - I can't wait for the UTF overlords to take over once
and for all.
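
For readers weighing the two approaches in this thread, here is a small
sketch of what ignoring decode errors costs compared with decoding in the
page's real encoding. It is written for Python 3 (the thread itself uses
Python 2), and the sample text is invented; the only assumption taken from
the thread is that the page is ISO-8859-1:

```python
# Bytes as a server might send them: an ISO-8859-1 encoded string.
raw = "Núñez v. Peña".encode("iso-8859-1")

# Treating the bytes as UTF-8 and ignoring errors silently drops
# every byte that is not valid UTF-8.
lossy = raw.decode("utf-8", errors="ignore")

# Decoding with the encoding the page really uses keeps everything.
exact = raw.decode("iso-8859-1")

print(lossy)
print(len(exact) - len(lossy), "characters lost")
```

Nothing is raised in the lossy case, so the damage is easy to miss.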

In looking into my problem, I discovered the chardet library, which can detect
character encodings automatically. I'm probably going to integrate it into my
code, but I wonder if lxml has any interest in having this as part of its system.

It looks like BeautifulSoup uses it already, FWIW.

More info here:
http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file

Mike



From e98cuenc at gmail.com  Mon Feb  7 23:45:14 2011
From: e98cuenc at gmail.com (Joaquin Cuenca Abela)
Date: Mon, 7 Feb 2011 23:45:14 +0100
Subject: [lxml-dev] Failing Code
In-Reply-To: <4D4F5781.9020903@michaeljaylissner.com>
References: <4D4E410D.3080805@michaeljaylissner.com>
	<4D4F00FC.4030000@behnel.de>
	<4D4F5781.9020903@michaeljaylissner.com>
Message-ID: 

FWIW, I found chardet useless for my purposes, because it's biased against
iso-8859-1. Quite often Spanish pages in iso-8859-1 are classified as
iso-8859-2. In my case it was much more accurate to assume iso-8859-1 (or
windows-1252), as I'm dealing with pages from Western Europe.
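
One way to express that kind of fixed assumption in code is sketched below
for Python 3. The ``decode_page`` helper is hypothetical, and trying strict
UTF-8 first is an added assumption of this sketch, not something stated in
the message above:

```python
def decode_page(raw, fallback="windows-1252"):
    """Decode bytes as UTF-8 if they are valid UTF-8, else use the fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" covers the few byte values that
        # windows-1252 leaves undefined (0x81, for example).
        return raw.decode(fallback, errors="replace")

utf8_page = "año".encode("utf-8")
latin_page = "año".encode("iso-8859-1")
print(decode_page(utf8_page) == decode_page(latin_page))
```

Strict UTF-8 decoding rarely succeeds by accident on text in a legacy 8-bit
encoding, which is why it is a reasonable first test before falling back.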

From ovnicraft at gmail.com  Tue Feb  8 00:17:30 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 7 Feb 2011 18:17:30 -0500
Subject: [lxml-dev] XSLT transformation problem
Message-ID: 

Hello i am working in a simple 'xslt transformation', in my country
a government site[1] give us the the xsd (for schema) and XML files and they
says us
Make a XSLT tranformation i open the file and found is not and xsl sheet (i
attach it).

I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
included)

How i can use the XML file attached if with XSLT transformation ?

Regards,

[1]
https://declaraciones.sri.gov.ec/rec-declaraciones-internet/general/especificacionesTec.jsp
-- 
Cristian Salamea
@ovnicraft
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CAL0402.xml
Type: text/xml
Size: 25850 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20110207/fdd6dc8d/attachment-0001.bin 

From stefan_ml at behnel.de  Tue Feb  8 07:01:22 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 Feb 2011 07:01:22 +0100
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To: 
References: 
Message-ID: <4D50DC32.3000503@behnel.de>

Ovnicraft, 08.02.2011 00:17:
> Hello i am working in a simple 'xslt transformation', in my country
> a government site[1] give us the the xsd (for schema) and XML files and they
> says us
> Make a XSLT tranformation i open the file and found is not and xsl sheet (i
> attach it).
>
> I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
> included)
>
> How i can use the XML file attached if with XSLT transformation ?

I'm not sure what you're asking (I'm having difficulties parsing your 
English). Did they tell you to use the XML file as a stylesheet, or did 
they ask you to write a stylesheet for them?

Note that the file you are trying to parse as XSLT file has a different 
name than the file you attached, and that the code that your link shows is 
not runnable (it uses "XSL" instead of "XSLT", for example).

Stefan

From ovnicraft at gmail.com  Tue Feb  8 17:45:17 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Tue, 8 Feb 2011 11:45:17 -0500
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To: <4D50DC32.3000503@behnel.de>
References: 
	<4D50DC32.3000503@behnel.de>
Message-ID: 

On Tue, Feb 8, 2011 at 1:01 AM, Stefan Behnel  wrote:

> Ovnicraft, 08.02.2011 00:17:
> > Hello i am working in a simple 'xslt transformation', in my country
> > a government site[1] give us the the xsd (for schema) and XML files and
> they
> > says us
> > Make a XSLT tranformation i open the file and found is not and xsl sheet
> (i
> > attach it).
> >
> > I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
> > included)
> >
> > How i can use the XML file attached if with XSLT transformation ?
>
> I'm not sure what you're asking (I'm having difficulties parsing your
> English). Did they tell you to use the XML file as a stylesheet, or did
> they ask you to write a stylesheet for them?
>
> Note that the file you are trying to parse as XSLT file has a different
> name than the file you attached, and that the code that your link shows is
> not runnable (it uses "XSL" instead of "XSLT", for example).
>

Yes it was fixed, i use now XSLT, but the file attached they give me telling
me use it for XSLT transformation and it does not a xsl template.

In another hand the attached file has special structure build for them and
has xpath expression: http://paste.pocoo.org/show/334424/
the file generated by me is this http://paste.pocoo.org/show/334427/ for
this file i need make the transformation with the attached.

I need clues for use my attached file, maybe use it to create my xsl
template or parse it and use it with xpath method.

What do you think?

Regards,

PS: hope be more clear now. :)





-- 
Cristian Salamea
@ovnicraft

From stefan_ml at behnel.de  Wed Feb  9 09:08:46 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 Feb 2011 09:08:46 +0100
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To: 
References: 
	<4D50DC32.3000503@behnel.de>
	
Message-ID: <4D524B8E.7020304@behnel.de>

Ovnicraft, 08.02.2011 17:45:
> On Tue, Feb 8, 2011 at 1:01 AM, Stefan Behnel wrote:
>
>> Ovnicraft, 08.02.2011 00:17:
>>> Hello i am working in a simple 'xslt transformation', in my country
>>> a government site[1] give us the the xsd (for schema) and XML files and
>> they
>>> says us
>>> Make a XSLT tranformation i open the file and found is not and xsl sheet
>> (i
>>> attach it).
>>>
>>> I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
>>> included)
>>>
>>> How i can use the XML file attached if with XSLT transformation ?
>>
>> I'm not sure what you're asking (I'm having difficulties parsing your
>> English). Did they tell you to use the XML file as a stylesheet, or did
>> they ask you to write a stylesheet for them?
>>
>> Note that the file you are trying to parse as XSLT file has a different
>> name than the file you attached, and that the code that your link shows is
>> not runnable (it uses "XSL" instead of "XSLT", for example).
>>
>
> Yes it was fixed, i use now XSLT, but the file attached they give me telling
> me use it for XSLT transformation and it does not a xsl template.
>
> In another hand the attached file has special structure build for them and
> has xpath expression: http://paste.pocoo.org/show/334424/
> the file generated by me is this http://paste.pocoo.org/show/334427/ for
> this file i need make the transformation with the attached.
>
> I need clues for use my attached file, maybe use it to create my xsl
> template or parse it and use it with xpath method.

AFAICT, the web site only says

"""
These formulas must be applied to the XML file by an XSLT transformation
"""

If that translation is correct, it doesn't say where the stylesheet for the 
XSLT is supposed to come from, only that you should apply some kind of XSLT 
to the XML documents. Maybe the stylesheet to do that is a separate 
download somewhere? Why don't you ask the site owners where to get the 
stylesheet from?

That being said, it shouldn't be too hard to write an XSLT script (or 
Python code) that extracts the XPath expressions from the attributes in the 
XML document you posted, and runs them against a suitable document.
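
The Python variant of that idea could look roughly like the sketch below.
Everything concrete in it is invented for illustration: the ``field``
elements, the ``path`` attribute and both sample documents are placeholders,
since the real file's structure is only visible in the paste. It uses the
standard library's ElementTree, which supports just a subset of XPath; real
formulas would more likely need lxml's ``xpath()`` method.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules file: each <field> carries a path expression
# in an attribute.
rules = ET.fromstring(
    '<rules>'
    '<field name="total" path="./summary/total"/>'
    '<field name="year" path="./period/year"/>'
    '</rules>'
)

# Hypothetical document that the expressions are evaluated against.
doc = ET.fromstring(
    '<declaration>'
    '<period><year>2011</year></period>'
    '<summary><total>150.00</total></summary>'
    '</declaration>'
)

results = {}
for field in rules.iter("field"):
    expression = field.get("path")
    # findtext() returns the text of the first match, or None.
    results[field.get("name")] = doc.findtext(expression)

print(results)
```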

Stefan

From luciano at ramalho.org  Thu Feb 10 11:12:40 2011
From: luciano at ramalho.org (Luciano Ramalho)
Date: Thu, 10 Feb 2011 08:12:40 -0200
Subject: [lxml-dev] Codespeak shutting down: migration plans?
Message-ID: 

Dear colleagues,

As a long-time user of lxml I was shocked to read a message today from
Holger Krekel about the end of Codespeak (see full text at the end of
this).

First we need to thank Holger for all these years of free service.

Then we need to quickly plan and execute a migration to some other repository.

I am willing to help, but I am no core developer of lxml, and in fact
this is my first message ever to this mailing list, so I think we need
someone better known to this community to lead the migration effort.

lxml is a key piece of the Python ecosystem, and for the benefit of its
present and future users and the wider Python community we need to
make sure it continues to be available, and to make sure as many
references (pypi!) as possible point to the new canonical repository.

Finally, a big thank you to Martijn, Philikon, Stefan and all the
others who have built this great piece of software.

Cheers,

-- 
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano

#########################################
hi codespeak.net users, (sorry if you get mail twice, i wanted to make sure ...)

after 8 years of operation codespeak.net services are bound to terminate,
starting

   END OF FEBRUARY 2011

Background: one of the original codespeak purposes was to offer subversion (then
in version 0.17) for the PyPy and other projects but today this is not too
interesting given the plethora of VCS hosting solutions.  Also, there aren't too
many admins besides me, the hosting is costing money, PyPy's repository has
moved to Bitbucket and i am re-shuffling my priorities preparing for my soon to
emerge father-hood.  After February 2011 i probably won't be able to help
much with any transition issues or questions. The host will keep on running for
a while but i give no guarantees.

Some remarks regarding termination wrt the FEB 2011 deadline:

* the subversion repo will turn read-only (and will eventually be switched off).

* Shell accounts will be restricted to those people who need it *and* mail
 me about it.  Some time later they will be gone as well.

* Mailing lists will be terminated as well unless i get a mail asking
 me to postpone termination for a specific time. You can go to your respective
 mailman admin page and extract a list of members.  If you mail me i can also
 provide a list of members.

* Any remaining web docs/pages will probably continue to exist for a while
 but i also prefer them to be moved away by end Feb 2011.

Note that the codespeak svn repository contains a lot of projects. For migration
you have two options: do a flat import just of your project checkout directory
into a new version system.  This is super-simple, obviously.  If you want to
preserve history for your project please mail me and i either provide you a full
dump or a filtered dump only containing your project.

So long and I hope you all had a good time and enjoyed the services and also
have a good transition now.

see you in other places,
holger krekel
_______________________________________________
codespeak-ann at codespeak.net
http://codespeak.net/mailman/listinfo/codespeak-ann

From stefan_ml at behnel.de  Thu Feb 10 13:09:25 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 10 Feb 2011 13:09:25 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: 
References: 
Message-ID: <4D53D575.4040200@behnel.de>

Luciano Ramalho, 10.02.2011 11:12:
> As a long-time user of lxml I was shocked to read a message today from
> Holger Krekel about the end of Codespeak (see full text at the end of
> this).

So was I. And I'm not aware of any April-1st-like day anywhere on this 
planet right now.

Actually, even if it's going to be some work to switch mailing list and web 
site, I think the most painful thing will be getting the current links to 
"http://codespeak.net/lxml" diverted to the new place (many of the links 
are in blog entries that will never get updated) and convincing the web 
search engines that this new place is just as good as the old one. Then 
again, searching for "python xml" gives me good-ol'-dead-and-gone PyXML as 
top hit in Google, so that needs updating anyway.


> First we need to thank Holger for all these years of free service.

Yes, I'm also grateful about the support in the past years. Thanks, Holger.


> Then we need to quickly plan and execute a migration to some other repository.
>
> I am willing to help, but I am no core developer of lxml, and in fact
> this is my first message ever to this mailing list, so I think we need
> someone better known to this community to lead the migration effort.

I've just created a repo at github.

https://github.com/lxml/lxml

There's nothing there yet, but given that I already switched to hg-svn a 
while ago, I have the complete trunk history available that I could just push.

I'm not sure what to do with the maintenance branches, though. It would be 
nice to preserve them as well, at least the 2.2 line. I could convert them 
and just hang them in as separate repositories. Opinions?

I've also been looking for a better issue tracker than launchpad for ages. 
github could provide that as well (I mean, seriously, anything is better 
than launchpad), but that's not urgent and the main problem is getting at 
the current set of issues to move them over...

One project I could use help with is the web site. I'd like to migrate it 
to Sphinx. Shouldn't be hard, but it's not a quick action either. This 
includes the PDF version, for example. That's not urgent, but if we switch 
web sites anyway, it would be a good time to get it in shape. Any help with 
that would be appreciated.


> lxml as a key piece of the Python ecosystem and for the benefit of its
> present and future users and the wider Python community we need to
> make sure it continues to be available, and to make sure as many
> references (pypi!) as possible point to the new canonical repository.

Again, any help is welcome.


> Finally, a big thank you to Martijn, Philikon, Stefan and all the
> others who have built this great piece of software.

You're welcome.

Stefan

From sgt04b at yahoo.gr  Thu Feb 10 13:05:33 2011
From: sgt04b at yahoo.gr (Vas Zor)
Date: Thu, 10 Feb 2011 12:05:33 +0000 (GMT)
Subject: [lxml-dev] Building Python2.6 Windows eggs
Message-ID: <715844.87334.qm@web27701.mail.ukl.yahoo.com>

I reached this thread after googling the phrase "undefined reference to _ftol2
_chkstk mingw32". I found a quick and dirty solution that worked for me and want
to share it. The solution is to put the following lines into a C file (e.g.
ftol2.c) and include this file in the compilation.

--------------  ftol2.c -----------------

int _chkstk(int size) { return _alloca(size); }

long _ftol2(double f) { return (long) f; }

-----------------------------------------


Vangelis





From jholg at gmx.de  Thu Feb 10 13:57:56 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 10 Feb 2011 13:57:56 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53D575.4040200@behnel.de>
References: 
	<4D53D575.4040200@behnel.de>
Message-ID: <20110210125756.230230@gmx.net>

Hi,
 
> > First we need to thank Holger for all these years of free service.
> 
> Yes, I'm also grateful about the support in the past years. Thanks,
> Holger.
+1

> > Then we need to quickly plan and execute a migration to some other
> repository.
> >
> > I am willing to help, but I am no core developer of lxml, and in fact
> > this is my first message ever to this mailing list, so I think we need
> > someone better known to this community to lead the migration effort.
> 
> I've just created a repo at github.
> 
> https://github.com/lxml/lxml

And just as I was about to ask if the lxml repo should switch to mercurial... ;-)

> There's nothing there yet, but given that I already switched to hg-svn a 
> while ago, I have the complete trunk history available that I could just
> push.
> 
> I'm not sure what to do with the maintenance branches, though. It would be
> nice to preserve them as well, at least the 2.2 line. I could convert them
> and just hang them in as separate repositories. Opinions?

Are there any "established" best practices for dealing with maintenance branches
in distributed VCS? From skimming the Mercurial docs I got the impression that
named branches living in the main repo might fit the bill.

What about the release tags? I suppose they are already preserved in the repo
history, as they are just a symbolic name for a revision/changeset (?)

My daily routine is still svn-centric and I haven't used hg but for the simplest
"track some changes" use case - never used git so far.

Holger

From stefan_ml at behnel.de  Thu Feb 10 14:36:15 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 10 Feb 2011 14:36:15 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <20110210125756.230230@gmx.net>
References: 
	<4D53D575.4040200@behnel.de> <20110210125756.230230@gmx.net>
Message-ID: <4D53E9CF.308@behnel.de>

jholg at gmx.de, 10.02.2011 13:57:
>>> Then we need to quickly plan and execute a migration to some other
>>> repository.
>>>
>>> I am willing to help, but I am no core developer of lxml, and in
>>> fact this is my first message ever to this mailing list, so I think
>>> we need someone better known to this community to lead the migration
>>> effort.
>>
>> I've just created a repo at github.
>>
>> https://github.com/lxml/lxml
>
> And just as I was about to ask if the lxml repo should switch to
> mercurial... ;-)

Well, I used it for a while now. Maybe I should have given a note about it
on the ML.


>> There's nothing there yet, but given that I already switched to hg-svn
>> a while ago, I have the complete trunk history available that I could
>> just push.
>>
>> I'm not sure what to do with the maintenance branches, though. It
>> would be nice to preserve them as well, at least the 2.2 line. I could
>> convert them and just hang them in as separate repositories.
>> Opinions?
>
> Are there any "established" best practices for dealing with maintenance
> branches in distributed vcs? From skimming mercurial docs I got the
> impression that named branches living in the main repo might fit the
> bill.

Well, there are basically two ways: branches (in-repo) and separate repos. 
I think separate repos generally make more sense for maintenance branches 
where you explicitly want things to diverge (and a trunk user really won't 
care about a 1.3 branch). In-repo branches are better for short-lived 
experiments, collaboration, etc. In both cases, it's easy enough to 
cherry-pick patches from one branch to the other.


> What about the release tags? I suppose they are already preserved in the
> repo history, as they are just a symbolic name for a
> revision/changeset (?)

Tags are a problem, yes. SVN doesn't readily provide tag information, it 
sees tags and branches as simple directories. And most tags in lxml 
originate from the maintenance branches, so that's even trickier to 
re-engineer. I don't currently have that information in my hg repo at all.


> My daily routine is still svn-centric and I haven't used hg but for the
> simplest "track some changes" use case

Here's a good read then:

http://hginit.com/


> - never used git so far.

Well, I keep being disgusted by git, it just feels so wrong each time I 
can't help using it. From a German POV, the name is really well chosen 
(basically means 'yuk!').

Stefan

From jholg at gmx.de  Thu Feb 10 16:07:37 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 10 Feb 2011 16:07:37 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53E9CF.308@behnel.de>
References: 
	<4D53D575.4040200@behnel.de> <20110210125756.230230@gmx.net>
	<4D53E9CF.308@behnel.de>
Message-ID: <20110210150737.63020@gmx.net>



> > Are there any "established" best practices for dealing with maintenance
> > branches in distributed vcs? From skimming mercurial docs I got the
> > impression that named branches living in the main repo might fit the
> > bill.
> 
> Well, there are basically two ways: branches (in-repo) and separate repos.
> I think separate repos generally make more sense for maintenance branches 
> where you explicitly want things to diverge (and a trunk user really won't
> care about a 1.3 branch). In-repo branches are better for short-lived 
> experiments, collaboration, etc. In both cases, it's easy enough to 
> cherry-pick patches from one branch to the other.

Not sure if this is out of date feature-wise but PEP 374 talks about "git
cherry-pick doesn't work across repositories; you need to have the branches in
the same repository." (http://python.org/dev/peps/pep-0374/)

> > What about the release tags? I suppose they are already preserved in the
> > repo history, as they are just a symbolic name for a
> > revision/changeset (?)
> 
> Tags are a problem, yes. SVN doesn't readily provide tag information, it 
> sees tags and branches as simple directories. And most tags in lxml 
> originate from the maintenance branches, so that's even trickier to 
> re-engineer. I don't currently have that information in my hg repo at all.
> 
> Here's a good read then:
> 
> http://hginit.com/

Thanks.

This might be of help for the strategy for handling of branches/tags: http://www.python.org/dev/peps/pep-0385/#transition-plan

Though I don't see anything on keeping correct tags, at first glance.

> Well, I keep being disgusted by git, it just feels so wrong each time I 
> can't help using it. From a German POV, the name is really well chosen 
> (basically means 'yuk!').

Read "it just feels so wrong each time that I can't help but using it" or
"it just feels so wrong each time when I can't help using it"? 8-)

Holger


From paulhtremblay at gmail.com  Fri Feb 11 06:04:48 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Fri, 11 Feb 2011 00:04:48 -0500
Subject: [lxml-dev] possible to use import with a string?
Message-ID: <4D54C370.5070503@gmail.com>

First, thanks Holger for making such a nice library for libxslt.

Can someone help me out with resolving URIs?

The following code works, except for the 

import sys, os, StringIO
from StringIO import StringIO
from lxml import etree
xslt_root = etree.XML('''\





''')
transform = etree.XSLT(xslt_root)
f = StringIO('Text')
doc = etree.parse(f)
result = transform(doc)

I've read http://codespeak.net/lxml/resolvers.html, but still don't 
quite understand
how to solve my problem.
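
A minimal sketch of how a custom resolver can serve an xsl:import from a string, assuming the goal is to import a stylesheet that only exists in memory. The stylesheet contents, the "mem:imported.xsl" URL and the StringImportResolver class name are made up for illustration, since the original stylesheet did not survive in the archive. The resolver has to be registered on the parser that parses the importing stylesheet:

```python
from io import BytesIO
from lxml import etree

# Illustrative stylesheet that lives only in memory (not the original one).
IMPORTED = b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="/a"/></out>
  </xsl:template>
</xsl:stylesheet>'''


class StringImportResolver(etree.Resolver):
    """Hypothetical resolver: answers one made-up URL from a string."""

    def resolve(self, url, pubid, context):
        if url == 'mem:imported.xsl':
            return self.resolve_string(IMPORTED, context)
        return None  # let other resolvers / the default loader handle it


# The resolver must be attached to the parser used for the stylesheet.
parser = etree.XMLParser()
parser.resolvers.add(StringImportResolver())

xslt_root = etree.XML(b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="mem:imported.xsl"/>
</xsl:stylesheet>''', parser)

transform = etree.XSLT(xslt_root)
result = transform(etree.parse(BytesIO(b'<a>Text</a>')))
```

Here str(result) should serialise the element produced by the imported template.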

Thanks

Paul


From stefan_ml at behnel.de  Fri Feb 11 07:26:48 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Feb 2011 07:26:48 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53D575.4040200@behnel.de>
References: 
	<4D53D575.4040200@behnel.de>
Message-ID: <4D54D6A8.4010206@behnel.de>

Stefan Behnel, 10.02.2011 13:09:
> I've just created a repo at github.
>
> https://github.com/lxml/lxml

... and now recreated it as an organisation. I think that makes more sense.

Stefan

From stefan_ml at behnel.de  Fri Feb 11 14:31:33 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Feb 2011 14:31:33 +0100
Subject: [lxml-dev] Source repository has moved to github
Message-ID: <4D553A35.20301@behnel.de>

Hi all,

as previously noted on the list, codespeak.net is closing down after 
several years as a friendly, free and very well working home for lxml.

I have therefore started with the migration process for lxml's 
infrastructure. First, the SVN will no longer be used and write access has 
been disabled. The new home for the source repository is

https://github.com/lxml

The new main branch is at

https://github.com/lxml/lxml.git

or

git+ssh://git@github.com/lxml/lxml.git

for developers.

You can use either hg or git to access it. I personally use hg together 
with the git bridge "hg-git", which I can recommend.

http://hg-git.github.com/

Here's a good introduction to hg, in case you have never used it:

http://hginit.com/


Have fun!

Stefan

From svetlyak.40wt at gmail.com  Fri Feb 11 17:17:01 2011
From: svetlyak.40wt at gmail.com (Alexander Artemenko)
Date: Fri, 11 Feb 2011 19:17:01 +0300
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: <4D553A35.20301@behnel.de>
References: <4D553A35.20301@behnel.de>
Message-ID: 

Hi

On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel  wrote:
> Hi all,
>
> as previously noted on the list, codespeak.net is closing down after
> several years as friendly, free and very well working home for lxml.
>
> I have therefore started with the migration process for lxml's
> infrastructure. First, the SVN will no longer be used and write access has
> been disabled. The new home for the source repository is
>
> https://github.com/lxml

There is a way to keep tags and branches when moving from SVN to git.

Read about git-svn's --branches and --tags options. I am currently trying
to clone the repository to git with all tags and branches. I started it at
12:00; it is 19:15 now and it is still running. If you wish, I can send you
the finished git repository as an archive once the process has completed.

-- 
Alexander Artemenko (a.k.a. Svetlyak 40wt)
Blog: http://dev.svetlyak.ru
Photos: http://svetlyak.ru
Jabber: svetlyak.40wt at gmail.com

From arfrever.fta at gmail.com  Fri Feb 11 21:35:27 2011
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Fri, 11 Feb 2011 21:35:27 +0100
Subject: [lxml-dev] 'import lxml.html.soupparser' fails with Python 3
Message-ID: <201102112136.02364.Arfrever.FTA@gmail.com>

$ python3.1 -c 'import lxml.html.soupparser'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib64/python3.1/site-packages/lxml/html/soupparser.py", line 108, in 
    from htmlentitydefs import name2codepoint
ImportError: No module named htmlentitydefs

I'm attaching the patch.
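
The patch itself is not preserved in this archive. As a sketch only, which may or may not match the attached patch, a common way to cope with the Python 3 rename of htmlentitydefs to html.entities is a guarded import:

```python
try:
    # Python 3 location of the entity tables
    from html.entities import name2codepoint
except ImportError:
    # Python 2 fallback
    from htmlentitydefs import name2codepoint

# name2codepoint maps entity names to Unicode code points
amp = name2codepoint['amp']
```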

-- 
Arfrever Frehtes Taifersar Arahesis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml.html.soupparser.patch
Type: text/x-patch
Size: 399 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20110211/82ad59e1/attachment.bin 

From Tim.Arnold at sas.com  Fri Feb 11 21:19:42 2011
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Fri, 11 Feb 2011 20:19:42 +0000
Subject: [lxml-dev] modifying a tree in place
Message-ID: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>

hi, I'm not sure if I can modify elements in a tree as I iterate over the tree.
The situation is parsing XHTML and attempting to remove empty italic or bold tags. Otherwise, they're written out as <i/> or <b/>, which the browser (well, Chrome anyway) treats as opening the tag, so all text after it becomes italicized.

So I'm iterating over the tree and adding each offending element to a list. When I'm done with that, I iterate over the list and replace the elements.  My question is whether I need to do the second step or can I replace the elements as I iterate over the tree. Here's my code:

from lxml import etree
parser = etree.HTMLParser()
fname = 'mytest0.htm'
tree = etree.parse(fname, parser)

droptags = list()
for elem in tree.xpath('//i|//b'):
    if not elem.text and not len(elem):
        droptags.append(elem)

for elem in droptags:
    parent = elem.getparent()
    newelem = etree.Element('span')
    newelem.text = elem.tail
    parent.replace(elem,newelem)


From jholg at gmx.de  Mon Feb 14 08:51:13 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 14 Feb 2011 08:51:13 +0100
Subject: [lxml-dev] modifying a tree in place
In-Reply-To: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>
References: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>
Message-ID: <20110214075113.259470@gmx.net>

Hi,

> So I'm iterating over the tree and adding each offending element to a
> list. When I'm done with that, I iterate over the list and replace the
> elements.  My question is whether I need to do the second step or can I replace the
> elements as I iterate over the tree. Here's my code:

http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk

(Of course, you should profile your alternatives to see what works (performs) better for your use case)
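
For the XPath variant in the question, a rough sketch of a single-pass version, assuming the empty elements should simply be dropped rather than replaced by a span. xpath() returns an ordinary list, so the tree can be changed while looping over it. The sample markup is made up, and elements that only become empty after a child was removed are not handled:

```python
from lxml import etree

html = ("<html><body><p>one <i></i>two <b>keep</b>"
        " three<b></b> four</p></body></html>")
root = etree.fromstring(html, etree.HTMLParser())

for elem in root.xpath('//i|//b'):
    if elem.text or len(elem):
        continue  # not empty, leave it alone
    parent = elem.getparent()
    previous = elem.getprevious()
    tail = elem.tail or ''
    # remove() takes the tail text along with the element,
    # so move the tail to the preceding sibling or the parent first.
    if previous is not None:
        previous.tail = (previous.tail or '') + tail
    else:
        parent.text = (parent.text or '') + tail
    parent.remove(elem)

result = etree.tostring(root, method='html').decode()
```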

Holger

From stefan_ml at behnel.de  Mon Feb 14 18:11:52 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 18:11:52 +0100
Subject: [lxml-dev] Where to move the mailing list?
Message-ID: <4D596258.9090107@behnel.de>

Hi,

now that the source repo is on github and the web site is about to get 
moved as well - where should this mailing list move? Any proposals? Any 
volunteers for hosting this list?

Stefan

From sergio at sergiomb.no-ip.org  Mon Feb 14 18:33:57 2011
From: sergio at sergiomb.no-ip.org (Sergio Monteiro Basto)
Date: Mon, 14 Feb 2011 17:33:57 +0000
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <1297704837.21317.4.camel@segulix>

On Mon, 2011-02-14 at 18:11 +0100, Stefan Behnel wrote: 
> Hi,
> 
> now that the source repo is on github and the web site is about to get 
> moved as well - where should this mailing list move? Any proposals? Any 
> volunteers for hosting this list?

Have you considered sourceforge.net?


-- 
Sérgio M. B.

From ovnicraft at gmail.com  Mon Feb 14 19:15:39 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 14 Feb 2011 13:15:39 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <1297704837.21317.4.camel@segulix>
References: <4D596258.9090107@behnel.de>
	<1297704837.21317.4.camel@segulix>
Message-ID: 

I really prefer google groups.

Cristian Salamea
On Feb 14, 2011 12:40 PM, "Sergio Monteiro Basto" 
wrote:
> On Mon, 2011-02-14 at 18:11 +0100, Stefan Behnel wrote:
>> Hi,
>>
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>
> Have you considered sourceforge.net?
>
>
> --
> Sérgio M. B.

From fdrake at acm.org  Mon Feb 14 19:11:56 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 14 Feb 2011 13:11:56 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: 

On Mon, Feb 14, 2011 at 12:11 PM, Stefan Behnel  wrote:
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?

http://librelist.com/ seems to be getting some traction in the larger
Python community, and generally appeals to people who just want a list
and don't want to feed content directly to Google.


  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

From stefan_ml at behnel.de  Mon Feb 14 19:51:23 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 19:51:23 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: 
References: <4D596258.9090107@behnel.de>
	
Message-ID: <4D5979AB.7000506@behnel.de>

Fred Drake, 14.02.2011 19:11:
> On Mon, Feb 14, 2011 at 12:11 PM, Stefan Behnel wrote:
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>
> http://librelist.com/ seems to be getting some traction in the larger
> Python community, and generally appeals to people who just want a list
> and don't want to feed content directly to Google.

Been there. It may be ok as long as it works, but the hoster (Zed Shaw) 
actively gave me the impression of not wanting to let anyone actually use 
this service (or maybe use, but certainly not do anything with it that may 
require him to put down his telly's remote control).

http://librelist.com/browser/meta/2011/2/11/hyphens-in-list-names/

Stefan

From stefan_ml at behnel.de  Mon Feb 14 19:57:59 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 19:57:59 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: 
References: <4D596258.9090107@behnel.de>	<1297704837.21317.4.camel@segulix>
	
Message-ID: <4D597B37.1060006@behnel.de>

Ovnicraft, 14.02.2011 19:15:
> I really prefer google groups.

Google Groups has been proposed by a couple of top-posters already. This 
seems to be a general problem with this provider, but certainly not the only 
one.

Sorry, but Google's straight out.

Stefan

From fdrake at acm.org  Mon Feb 14 19:57:14 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 14 Feb 2011 13:57:14 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D5979AB.7000506@behnel.de>
References: <4D596258.9090107@behnel.de>
	
	<4D5979AB.7000506@behnel.de>
Message-ID: 

On Mon, Feb 14, 2011 at 1:51 PM, Stefan Behnel  wrote:
> Been there. It may be ok as long as it works, but the hoster (Zed Shaw)
> actively gave me the impression of not wanting to let anyone actually use
> this service (or maybe use, but certainly not do anything with it that may
> require him to put down his telly's remote control).

Ouch!

I've no particular interest in seeing this go either way, but...
non-support for hyphens is weird.  Zed's response tells me other
projects' going with Google Groups is a good thing, for reasons that
have nothing to do with Google.


  -Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

From Tim.Arnold at sas.com  Mon Feb 14 20:00:48 2011
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Mon, 14 Feb 2011 19:00:48 +0000
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: 
References: <4D596258.9090107@behnel.de>
	
	<4D5979AB.7000506@behnel.de>
	
Message-ID: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814EC1D@MERCMBX03R.na.SAS.com>

> -----Original Message-----
> From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-
> bounces at codespeak.net] On Behalf Of Fred Drake
> Sent: Monday, February 14, 2011 1:57 PM
> To: Stefan Behnel
> Cc: ML-Lxml-dev
> Subject: Re: [lxml-dev] Where to move the mailing list?
> 
> On Mon, Feb 14, 2011 at 1:51 PM, Stefan Behnel 
> wrote:
> > Been there. It may be ok as long as it works, but the hoster (Zed
> > Shaw) actively gave me the impression of not wanting to let anyone
> > actually use this service (or maybe use, but certainly not do anything
> > with it that may require him to put down his telly's remote control).
> 
> Ouch!
> 
> I've no particular interest in seeing this go either way, but...
> non-support for hyphens is weird.  Zed's response tells me other projects'
> going with Google Groups is a good thing, for reasons that have nothing to
> do with Google.
> 
> 
>   -Fred
> 
> --
> Fred L. Drake, Jr.

+1. After reading that thread I think I'll leave librelist alone.
--Tim Arnold



From p.oberndoerfer at urheberrecht.org  Mon Feb 14 20:55:34 2011
From: p.oberndoerfer at urheberrecht.org ("Pascal Oberndörfer")
Date: Mon, 14 Feb 2011 20:55:34 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>


> Hi,
>
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?
>
> Stefan

Might applying for a python.org-list be worth a try?



Pascal


From st.jonathan at gmail.com  Mon Feb 14 21:01:50 2011
From: st.jonathan at gmail.com (Jonathan Stoppani)
Date: Mon, 14 Feb 2011 21:01:50 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <22B2762A-5562-4080-8BAC-691B918FDE02@gmail.com>


On Feb 14, 2011, at 6:11 PM, Stefan Behnel wrote:

> Hi,
> 
> now that the source repo is on github and the web site is about to get 
> moved as well - where should this mailing list move? Any proposals? Any 
> volunteers for hosting this list?
> 
> Stefan

I can offer a mailman based mailing list if needed.


Jonathan


From stefan_ml at behnel.de  Mon Feb 14 21:11:11 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 21:11:11 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
References: <4D596258.9090107@behnel.de>
	<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
Message-ID: <4D598C5F.7070908@behnel.de>

"Pascal Oberndörfer", 14.02.2011 20:55:
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>>
>> Stefan
>
> Might applying for a python.org-list be worth a try?
>
> 

I'm actually considering that. Cython's mailing list also moved there (for 
the same reason as lxml's). The lists at python.org aren't really fast, 
likely because of the relatively high traffic they carry. But they 
certainly are a valid address for a major Python based project.

Stefan

From ovnicraft at gmail.com  Mon Feb 14 22:24:12 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 14 Feb 2011 16:24:12 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D598C5F.7070908@behnel.de>
References: <4D596258.9090107@behnel.de>
	<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
	<4D598C5F.7070908@behnel.de>
Message-ID: 

On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel  wrote:

> "Pascal Oberndörfer", 14.02.2011 20:55:
> >> now that the source repo is on github and the web site is about to get
> >> moved as well - where should this mailing list move? Any proposals? Any
> >> volunteers for hosting this list?
> >>
> >> Stefan
> >
> > Might applying for a python.org-list be worth a try?
> >
> > 
>
> I'm actually considering that. Cython's mailing list also moved there (for
> the same reason as lxml's). The lists at python.org aren't really fast,
> likely because of the relatively high traffic they carry. But they
> certainly are a valid address for a major Python based project.
>

But remember that python.org is all about Python itself. Anyway, at this
point we have two choices:

Google services
Python services

Can all subscribers give their opinion here?

Regards,



> Stefan



-- 
Cristian Salamea
@ovnicraft

From stefan_ml at behnel.de  Mon Feb 14 22:29:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 22:29:35 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: 
References: <4D596258.9090107@behnel.de>
	<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
	<4D598C5F.7070908@behnel.de>
	
Message-ID: <4D599EBF.4060908@behnel.de>

Ovnicraft, 14.02.2011 22:24:
> On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel wrote:
>
>> "Pascal Oberndörfer", 14.02.2011 20:55:
>>>> now that the source repo is on github and the web site is about to get
>>>> moved as well - where should this mailing list move? Any proposals? Any
>>>> volunteers for hosting this list?
>>>>
>>>> Stefan
>>>
>>> Might applying for a python.org-list be worth a try?
>>>
>>> 
>>
>> I'm actually considering that. Cython's mailing list also moved there (for
>> the same reason as lxml's). The lists at python.org aren't really fast,
>> likely because of the relatively high traffic they carry. But they
>> certainly are a valid address for a major Python based project.
>
> But remember that python.org is all about Python itself. Anyway, at this
> point we have two choices:
>
> Google services
> Python services
>
> Can all subscribers give their opinion here?

Argh, please, no. We have a lot of subscribers, if everyone writes an 
e-mail to express their opinion, we'd be flooded.

Stefan

From stefan_ml at behnel.de  Tue Feb 15 05:31:53 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 05:31:53 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: 
References: <4D596258.9090107@behnel.de>
	<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
	<4D598C5F.7070908@behnel.de>
	
	<4D599EBF.4060908@behnel.de>
	
Message-ID: <4D5A01B9.20804@behnel.de>

Michael Lissner, 14.02.2011 23:21:
>> Ovnicraft, 14.02.2011 22:24:
>>> On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel wrote:
>>>
>>>> "Pascal Oberndörfer", 14.02.2011 20:55:
>>>>>> now that the source repo is on github and the web site is about to get
>>>>>> moved as well - where should this mailing list move? Any proposals? Any
>>>>>> volunteers for hosting this list?
>>>>>>
>>>>>> Stefan
>>>>>
>>>>> Might applying for a python.org-list be worth a try?
>>>>>
>>>>> 
>>>>
>>>> I'm actually considering that. Cython's mailing list also moved there (for
>>>> the same reason as lxml's). The lists at python.org aren't really fast,
>>>> likely because of the relatively high traffic they carry. But they
>>>> certainly are a valid address for a major Python based project.
>>>
>>> But remember that python.org is all about Python itself. Anyway, at this
>>> point we have two choices:
>>>
>>> Google services
>>> Python services
>
> OK, I know you don't want all subscribers responding, but I'll give my
> own thoughts. I'd say go with a Google Group or self-host.

Hmm, now that you mention it - I may not be able to self-host, but I can 
give us a good mailing list address.


>  Yes, Google
> is a big company, but they do have a good group system (the best).

I do not argue against Google being a big company. I also do not argue 
against them offering a good service in some areas. But I do argue against 
them offering a good service for mailing lists.


> You've got threads, search, easy subscription, a UI people are
> familiar with, etc. I don't think I saw the argument against Google
> Groups?

They have problems with spam, they push users into top-posting, they keep 
having their own ideas about who they want to let subscribe. I don't like 
that at all. If not using their web interface, what's the point in using 
their service?


> I don't know much about mailman, but from my own usage, it seems like
> it's about a decade old, and lacks search, which is pretty lame. As a
> user of lxml, I'd really appreciate a list where I could search for
> common problems.

The lxml mailing list has been archived in various places. Most of those 
are easily searchable, including with Google.

Stefan

From stefan_ml at behnel.de  Tue Feb 15 09:45:49 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 09:45:49 +0100
Subject: [lxml-dev] http://codespeak.net/lxml has moved to lxml.de
Message-ID: <4D5A3D3D.7040003@behnel.de>

Hi everyone,

I moved the web site to a new home. It's now at

http://lxml.de/

which is clearly a lot shorter than the old address. ;)

There is a 301 redirect set up from the old site, so that you should get 
the new pages through the old addresses. I took care to fix up the inner 
links (without breaking XML namespaces etc.). If you find any problems, 
please report them to me, I'll see that I can fix them ASAP.

Have fun,

Stefan

From p.oberndoerfer at urheberrecht.org  Tue Feb 15 10:48:09 2011
From: p.oberndoerfer at urheberrecht.org (Pascal)
Date: Tue, 15 Feb 2011 09:48:09 +0000 (UTC)
Subject: [lxml-dev] http://codespeak.net/lxml has moved to lxml.de
References: <4D5A3D3D.7040003@behnel.de>
Message-ID: 

Stefan Behnel <stefan_ml at behnel.de> writes:

> There is a 301 redirect set up from the old site, so that you should get 
> the new pages through the old addresses.

This (i.e. specifically the redirect) is really great news! Congrats!



From stefan_ml at behnel.de  Tue Feb 15 12:56:38 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 12:56:38 +0100
Subject: [lxml-dev] new mailing list
Message-ID: <4D5A69F6.9040600@behnel.de>

Hi,

I got the mailing list set up. It's hosted by a volunteer (thanks, 
Jonathan!), but directed through a local mail address "lxml" at "lxml.de".

You should have received a subscription message for the new list, as I mass 
subscribed all current subscribers to the new list. Please update your mail 
filters accordingly and arrange for digest delivery if you prefer that. 
Sorry for any inconvenience.

Stefan

From Marc.Graff at VerizonWireless.com  Tue Feb 15 16:17:00 2011
From: Marc.Graff at VerizonWireless.com (Graff, Marc)
Date: Tue, 15 Feb 2011 10:17:00 -0500
Subject: [lxml-dev] new mailing list
In-Reply-To: 
References: 
Message-ID: <20110215152717.4846D282BEA@codespeak.net>

Thanks for the solid project and support.

-----Original Message-----
From: lxml-dev-bounces at codespeak.net
[mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Stefan Behnel
Sent: Tuesday, February 15, 2011 6:57 AM
To: lxml mailing list; ML-Lxml-dev
Subject: [lxml-dev] new mailing list

Hi,

I got the mailing list set up. It's hosted by a volunteer (thanks, 
Jonathan!), but directed through a local mail address "lxml" at
"lxml.de".

You should have received a subscription message for the new list, as I
mass 
subscribed all current subscribers to the new list. Please update your
mail 
filters accordingly and arrange for digest delivery if you prefer that. 
Sorry for any inconvenience.

Stefan

From ovnicraft at gmail.com  Tue Feb 15 17:03:16 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Tue, 15 Feb 2011 11:03:16 -0500
Subject: [lxml-dev] new mailing list
In-Reply-To: <4D5A69F6.9040600@behnel.de>
References: <4D5A69F6.9040600@behnel.de>
Message-ID: 

On Tue, Feb 15, 2011 at 6:56 AM, Stefan Behnel  wrote:

> Hi,
>
> I got the mailing list set up. It's hosted by a volunteer (thanks,
> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
>
> You should have received a subscription message for the new list, as I mass
> subscribed all current subscribers to the new list. Please update your mail
> filters accordingly and arrange for digest delivery if you prefer that.
> Sorry for any inconvenience.
>
> Stefan


Thanks for this great project.

-- 
Cristian Salamea
@ovnicraft

From lists at cheimes.de  Tue Feb 15 17:49:46 2011
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 15 Feb 2011 17:49:46 +0100
Subject: [lxml-dev] new mailing list
In-Reply-To: <4D5A69F6.9040600@behnel.de>
References: <4D5A69F6.9040600@behnel.de>
Message-ID: 

Am 15.02.2011 12:56, schrieb Stefan Behnel:
> I got the mailing list set up. It's hosted by a volunteer (thanks, 
> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
> 
> You should have received a subscription message for the new list, as I mass 
> subscribed all current subscribers to the new list. Please update your mail 
> filters accordingly and arrange for digest delivery if you prefer that. 
> Sorry for any inconvenience.

Thanks Stefan!

The link at http://lxml.de/index.html#mailing-list still points to the
wrong URL.

Have you notified gmane about the new addresses for the Cython and LXML
mailing lists?

Christian


From stefan_ml at behnel.de  Tue Feb 15 17:58:55 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 17:58:55 +0100
Subject: [lxml-dev] new mailing list
In-Reply-To: 
References: <4D5A69F6.9040600@behnel.de> 
Message-ID: <4D5AB0CF.8040501@behnel.de>

Christian Heimes, 15.02.2011 17:49:
> Am 15.02.2011 12:56, schrieb Stefan Behnel:
>> I got the mailing list set up. It's hosted by a volunteer (thanks,
>> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
>>
>> You should have received a subscription message for the new list, as I mass
>> subscribed all current subscribers to the new list. Please update your mail
>> filters accordingly and arrange for digest delivery if you prefer that.
>> Sorry for any inconvenience.
>
> Thanks Stefan!
>
> The link at http://lxml.de/index.html#mailing-list still points to the
> wrong URL.

Right, I've fixed that in the sources but not redeployed the web site yet.

You can actually go through http://lxml.de/mailinglist/ now to get to the 
subscription page.


> Have you notified gmane about the new addresses for the Cython and LXML
> mailing lists?

Yes, two times about the Cython list, once about the lxml list - no 
response so far. I'll keep trying.

Stefan

From noah at mahalo.com  Tue Feb 15 20:19:20 2011
From: noah at mahalo.com (Noah Silas)
Date: Tue, 15 Feb 2011 11:19:20 -0800
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: 
References: <4D553A35.20301@behnel.de>
	
Message-ID: 

Now that there is a proper presence for the project on github, I'll be
closing my existing svn mirror. Anybody that has been using the
https://github.com/noah256/lxml/ github repo should migrate to the official
https://github.com/lxml/lxml/ repo. If you have any problems changing over,
feel free to email me directly for assistance.
~Noah


On Fri, Feb 11, 2011 at 8:17 AM, Alexander Artemenko <
svetlyak.40wt at gmail.com> wrote:

> Hi
>
> On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel 
> wrote:
> > Hi all,
> >
> > as previously noted on the list, codespeak.net is closing down after
> > several years as friendly, free and very well working home for lxml.
> >
> > I have therefore started with the migration process for lxml's
> > infrastructure. First, the SVN will no longer be used and write access
> has
> > been disabled. The new home for the source repository is
> >
> > https://github.com/lxml
>
> There is a way to keep tags and branches when moving from SVN to git.
>
> Read about git-svn's --branches and --tags options. I am currently trying
> to clone the repository to git with all tags and branches. I started it at
> 12:00; it is 19:15 now and it is still running. If you wish, I can send you
> the finished git repository as an archive once the process has completed.
>
> --
> Alexander Artemenko (a.k.a. Svetlyak 40wt)
> Blog: http://dev.svetlyak.ru
> Photos: http://svetlyak.ru
> Jabber: svetlyak.40wt at gmail.com

From stefan_ml at behnel.de  Tue Feb 15 20:27:51 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 20:27:51 +0100
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: 
References: <4D553A35.20301@behnel.de>
	
Message-ID: <4D5AD3B7.6090202@behnel.de>

Alexander Artemenko, 11.02.2011 17:17:
> On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel wrote:
>> as previously noted on the list, codespeak.net is closing down after
>> several years as friendly, free and very well working home for lxml.
>>
>> I have therefore started with the migration process for lxml's
>> infrastructure. First, the SVN will no longer be used and write access has
>> been disabled. The new home for the source repository is
>>
>> https://github.com/lxml
>
> There is a way to keep tags and branches when moving from SVN to Git.
>
> Read about git-svn's --branches and --tags options. I'm currently trying
> to clone the repository to git with all tags and branches. I started it
> at 12:00; it is 19:15 now and it is still running. If you wish, I can
> send you the finished git repository as an archive once the process has
> completed.

Ah, sorry for coming back to this so late. Did the conversion run
successfully? I actually ran mine on a stripped dump of the SVN repo. That's
much faster.

So far, I'm quite happy with the separation of the maintenance branches 
from the master branch. But I wouldn't mind replacing the maintenance 
branch repo with your complete conversion. Although, all that's currently 
missing is the tags. I wouldn't even mind recreating those manually...

Stefan

From svetlyak.40wt at gmail.com  Wed Feb 16 07:10:10 2011
From: svetlyak.40wt at gmail.com (Alexander Artemenko)
Date: Wed, 16 Feb 2011 09:10:10 +0300
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: <4D5AD3B7.6090202@behnel.de>
References: <4D553A35.20301@behnel.de>
	
	<4D5AD3B7.6090202@behnel.de>
Message-ID: 

Hi Stefan,

On Tue, Feb 15, 2011 at 10:27 PM, Stefan Behnel wrote:

> Ah, sorry for coming back to this so late. Did the conversion run
> successfully? I actually ran mine on a stripped dump of the SVN repo. That's
> much faster.

Yes, the conversion completed successfully; you can download it here:
http://pypi.svetlyak.ru/lxml-git.tar.bz2

git-svn copied all SVN branches and tags as git remote branches; you
can turn them into real tags and branches. I have already created
a tag for version 2.3. You could even write a script that runs
'git branch -r' to see which tags are available and then, for each one, does:

git checkout tags/lxml-2.2.4
git tag 2.2.4
git checkout master

Or, if you need a real branch, then you could do:

git branch --no-track threading threading
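
Such a script could look roughly like the following. This is only a
sketch: it assumes the remote branches for tags are all named
"tags/lxml-<version>" as in the example above, the helper name
tag_commands is made up, and it passes the remote ref to 'git tag'
directly instead of checking it out first. It only prints the commands
(a dry run), and the sample listing stands in for the real output of
'git branch -r'.

```python
# Assumed naming of the remote branches that git-svn created for SVN tags.
TAG_PREFIX = "tags/lxml-"


def tag_commands(branch_listing, prefix=TAG_PREFIX):
    """Build one 'git tag' command per remote branch that looks like a tag."""
    commands = []
    for line in branch_listing.splitlines():
        name = line.strip()
        # Skip symbolic entries ("a -> b") and everything that is not a tag.
        if " -> " in name or not name.startswith(prefix):
            continue
        version = name[len(prefix):]
        if not version:
            continue
        # Tag the commit the remote branch points to, without a checkout.
        commands.append(["git", "tag", version, "refs/remotes/" + name])
    return commands


# Hypothetical sample of what 'git branch -r' might print in the converted repo.
sample_listing = """\
  tags/lxml-2.2.4
  tags/lxml-2.3
  threading
  trunk
"""

for command in tag_commands(sample_listing):
    print(" ".join(command))
```

To run it for real, feed it the output of 'git branch -r' and execute
the resulting commands (for example with subprocess.run) after checking
the printed list.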

-- 
Alexander Artemenko (a.k.a. Svetlyak 40wt)
Blog: http://dev.svetlyak.ru
Photos: http://svetlyak.ru
Jabber: svetlyak.40wt at gmail.com

From john at nmt.edu  Wed Feb 16 22:05:06 2011
From: john at nmt.edu (John W. Shipman)
Date: Wed, 16 Feb 2011 14:05:06 -0700 (MST)
Subject: [lxml-dev] [lxml] Small sample
In-Reply-To: <699662.39531.qm@web84205.mail.re3.yahoo.com>
References: <699662.39531.qm@web84205.mail.re3.yahoo.com>
Message-ID: 

+--
| Would someone point me to a small lxml parsing sample for
| opening and traversing an XML file?
+--

If I might recommend my own modest effort at documenting lxml:

     http://www.nmt.edu/tcc/help/pubs/pylxml/

Also see this page for a number of literate programs that use
lxml:

     http://www.nmt.edu/~shipman/soft/litprog/

In particular, from this page, you might look at these projects:

     Bb8import:  Reads XML, generates XML.
     docbookindex:  Generates XSL-FO directly from Python.
     hwscan3:  Reads XML, generates XHTML.
     Bird taxonomy system: Reads and writes XML.
     birdnotes.py:  Reads XML.
     catweb:  Reads XML, generates XHTML.

Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber

From paulhtremblay at gmail.com  Thu Feb 17 04:49:19 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Wed, 16 Feb 2011 22:49:19 -0500
Subject: [lxml-dev] Thanks
Message-ID: <4D5C9ABF.4080200@gmail.com>

I would also like to thank all the developers and others who have
supported lxml.

Paul

From stefan_ml at behnel.de  Thu Feb 17 07:16:33 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 Feb 2011 07:16:33 +0100
Subject: [lxml-dev] 'import lxml.html.soupparser' fails with Python 3
In-Reply-To: <201102112136.02364.Arfrever.FTA@gmail.com>
References: <201102112136.02364.Arfrever.FTA@gmail.com>
Message-ID: <4D5CBD41.7020709@behnel.de>

Arfrever Frehtes Taifersar Arahesis, 11.02.2011 21:35:
> $ python3.1 -c 'import lxml.html.soupparser'
> Traceback (most recent call last):
>    File "<string>", line 1, in <module>
>    File "/usr/lib64/python3.1/site-packages/lxml/html/soupparser.py", line 108, in <module>
>      from htmlentitydefs import name2codepoint
> ImportError: No module named htmlentitydefs
>
> I'm attaching the patch.

Thanks!

https://github.com/lxml/lxml/commit/3022257b05a3ba86d72666c0b3f929be50e8e331
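
For reference, the usual way to make such an import work on both
Python 2 and Python 3 is a guarded import. The following is a sketch
of that general pattern, not necessarily what the commit above does:

```python
try:
    # Python 3: the module is available as html.entities.
    from html.entities import name2codepoint
except ImportError:
    # Python 2 name of the same module.
    from htmlentitydefs import name2codepoint

# name2codepoint maps HTML entity names to Unicode code points.
print(name2codepoint["amp"])
```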

Stefan

From breuerss at uni-koeln.de  Mon Feb 28 11:11:29 2011
From: breuerss at uni-koeln.de (Sebastian Breuers)
Date: Mon, 28 Feb 2011 11:11:29 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on
 reading CML schema
In-Reply-To: <4D48F75A.208@behnel.de>
References: <4D4875E9.1040901@uni-koeln.de> <4D48F75A.208@behnel.de>
Message-ID: <4D6B74D1.6060803@uni-koeln.de>

Hey,

just to finally add a comment to that issue.

I discussed this with the CML developers, and after some testing,
following your suggestions, we came to the conclusion that it is a bug
in libxml2. The bug has already been reported:

https://bugzilla.gnome.org/show_bug.cgi?id=573483

That means that, unfortunately, lxml is not usable for us until that
issue is fixed.

Kind regards and thanks for your efforts.

Sebastian

Am 02.02.2011 07:19, schrieb Stefan Behnel:
> Sebastian Breuers, 01.02.2011 22:06:
>> I encounter the following issue. As a member of the MoSGrid consortium, a
>> project that is aimed to facilitate molecular simulations in the D-Grid
>> environment, I want to use the CML (Chemical Markup Language) to describe
>> molecular simulation jobs.
>>
>> I wrote a small validator that uses the lxml.etree.XMLSchema object to
>> read the XSD describing the CML3 (located at
>> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
>> schema with the lxml.etree.XMLSchemaParseError:
>>
>> local complex type: The content model is not determinist., line 5962
>>
>> As I wrote to the developer of the CML he told me that his schema is read
>> properly in JAVA and C# with the saxon library. I've got an idea why the
>> XMLSchema object is throwing that exception but now I am not quite
>> sure if it is an issue with the standard (CML) or with the XMLSchema.
>
> It's usually an issue with the standard of XML-Schema. ;) The problem is
> that the W3C specification is extremely complicated - it's even more
> complex than actually writing a schema, and that's telling, in case
> you've never done that. So the simple fact that there is one tool that
> can successfully parse a W3C schema document doesn't mean that every
> other validation tool can work with it. Specifically, it is a known fact
> that libxml2 (which lxml gets its schema support from) has deficiencies
> with some less widely used schema constructs.
>
> I suggest this:
>
> 1) test the schema with the xmllint command line tool to reproduce the
> problem with plain libxml2.
>
> 2) contact the CML developer again and ask him to debug the schema
> against libxml2/xmllint. Maybe he can find a simple way to make it work.
> Don't forget to mention that libxml2 is a very widely used tool for XML
> processing that's absolutely worth supporting.
>
> 3) look out for a CML schema in RelaxNG or Schematron, which have much
> more accessible specifications and are much easier to implement
> correctly. These languages also make it a lot easier to write and
> maintain a schema, and you can generate a W3C XML Schema from RelaxNG
> using "trang".
>
> Stefan

-- 
_____________________________________________________________________________

Sebastian Breuers               Tel: +49-221-470-4108
EMail: breuerss at uni-koeln.de

Universität zu Köln             University of Cologne
Department für Chemie           Department of Chemistry
Organische Chemie               Organic Chemistry

Greinstraße 6                   Greinstraße 6
Raum 325                        Room 325
D-50939 Köln                    D-50939 Cologne, Federal Rep. of Germany
_____________________________________________________________________________