wrote:
> Ah. I was afraid it was something like that, and eventually just wrapped all the
> text coming from the site within unicode(text, errors='ignore'). I must say, I
> hate character encodings - I can't wait for the utf overlords to take over once
> and for all.
>
> In looking into my problem, I discovered the chardet library, which can detect
> character encodings automatically. I'm probably going to integrate it into my
> code, but I wonder if lxml has any interest in having this as part of its system.
>
> It looks like BeautifulSoup uses it already, FWIW.
>
> More info here:
>
> http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file
>
> Mike
>
>
> Stefan Behnel wrote on 02/06/2011 12:13 PM:
>> Michael Lissner, 06.02.2011 07:34:
>>> Hi, I'm using lxml to parse the contents of some court pages (basically bringing
>>> court docs to the people), and a certain page in particular is failing without
>>> throwing any errors. I'm curious if this is something I'm doing, if I've
>>> discovered a corner-case in lxml's abilities, or if it's something else
>>> altogether.
>>>
>>> The code I'm running is (in part) this:
>>>
>>> >>> from lxml import etree
>>> >>> import StringIO
>>>
>>> >>> # Read the URL using urllib2
>>> >>> quickHTML = readURL('http://www.ca1.uscourts.gov/cgi-bin/getopn.pl?OPINION=95-1346.01A', 1)
>>>
>>> >>> # Use the HTML Parser
>>> >>> parser = etree.HTMLParser()
>>>
>>> >>> # Make the HTML into a tree
>>> >>> quickTree = etree.parse(StringIO.StringIO(quickHTML), parser)
>>>
>>> >>> # Pull out any pre elements (there's only one)
>>> >>> documentPlainText = quickTree.find('//pre')
>>>
>>> >>> # Print the pre elements to the console
>>> >>> print etree.tostring(documentPlainText)
>>>
>>>
>>>
>>> >>> # Woah - that should have been much bigger! Print the whole HTML to the console:
>>> >>> print etree.tostring(quickTree)
>>>
>>> >> "http://www.w3.org/TR/REC-html40/loose.dtd">
>>>
>>> USCA1 Opinion
>>>
>>>
>>>
>>>
>> src="/images/buttons/pacer/help.jpg" border="0">
>>>
>>>
>> src="/images/buttons/pacer/wp_format.jpg" border="0">
>>>
>>>
>>>
>>> >>> # ****What the heck?****
>>>
>>> If you look at the URL from line 4, above, you'll see that the pre element has a
>>> TON of pretty ugly content inside it, but when I run it as above, I don't get
>>> any of it. I've run this script to download and parse hundreds of nearly
>>> identical pages, and it is working great, but for this page, it fails.
>>>
>>> Anybody have any theories why? I'd love to get this scraper back up and running
>>> this weekend, so I can download as much of the court's material before the
>>> lawyers start using the site again Monday.
>>>
>>> Thanks for the help and the great library,
>>>
>>> Mike
>>>
>>> PS - if you want to see the /real/ code, it's here:
>>>
>>> https://bitbucket.org/mlissner/legal-current-awareness/src/ffbfcb79c659/alert/back_scrape.py#cl-176
>>>
>>
>> I can reproduce this. It seems to be a character encoding problem. libxml2's
>> xmllint gives me the same result. When I set the input encoding to "ISO8859-1"
>> explicitly for the parser using
>>
>> parser = etree.HTMLParser(encoding='ISO8859-1')
>>
>> then I get the complete tree. So I guess the parser stops short on the
>> undecodable characters in the page.
>>
>> Stefan
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev
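A minimal sketch of the fix Stefan describes in the quoted reply: hand the encoding to the HTML parser explicitly, so that libxml2 does not have to guess it. The sample bytes, the helper names parse_html_bytes and first_pre_text, and the default of ISO-8859-1 are assumptions for illustration only; the encoding of the real page should be checked (or detected, for instance with chardet) rather than taken from this example. The sketch is written for Python 3, unlike the Python 2 session quoted above.

```python
from io import BytesIO

from lxml import etree


def parse_html_bytes(raw_bytes, encoding="ISO-8859-1"):
    """Parse raw HTML bytes with an explicitly declared input encoding."""
    parser = etree.HTMLParser(encoding=encoding)
    return etree.parse(BytesIO(raw_bytes), parser)


def first_pre_text(tree):
    """Return the text of the first <pre> element, or None if there is none."""
    pre = tree.find(".//pre")
    if pre is None:
        return None
    return "".join(pre.itertext())


# Hypothetical stand-in for the court page: one Latin-1 byte (0xE9) inside <pre>.
sample = b"<html><body><pre>caf\xe9 opinion text</pre></body></html>"
tree = parse_html_bytes(sample)
pre_text = first_pre_text(tree)
```

Note the relative path ".//pre" passed to find(); it avoids the absolute '//pre' form used in the original session.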
From ovnicraft at gmail.com Tue Feb 8 00:17:30 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 7 Feb 2011 18:17:30 -0500
Subject: [lxml-dev] XSLT transformation problem
Message-ID:
Hello i am working in a simple 'xslt transformation', in my country
a government site[1] give us the the xsd (for schema) and XML files and they
says us
Make a XSLT tranformation i open the file and found is not and xsl sheet (i
attach it).
I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
included)
How i can use the XML file attached if with XSLT transformation ?
Regards,
[1]
https://declaraciones.sri.gov.ec/rec-declaraciones-internet/general/especificacionesTec.jsp
--
Cristian Salamea
@ovnicraft
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CAL0402.xml
Type: text/xml
Size: 25850 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20110207/fdd6dc8d/attachment-0001.bin
From stefan_ml at behnel.de Tue Feb 8 07:01:22 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 08 Feb 2011 07:01:22 +0100
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To:
References:
Message-ID: <4D50DC32.3000503@behnel.de>
Ovnicraft, 08.02.2011 00:17:
> Hello i am working in a simple 'xslt transformation', in my country
> a government site[1] give us the the xsd (for schema) and XML files and they
> says us
> Make a XSLT tranformation i open the file and found is not and xsl sheet (i
> attach it).
>
> I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
> included)
>
> How i can use the XML file attached if with XSLT transformation ?
I'm not sure what you're asking (I'm having difficulties parsing your
English). Did they tell you to use the XML file as a stylesheet, or did
they ask you to write a stylesheet for them?
Note that the file you are trying to parse as XSLT file has a different
name than the file you attached, and that the code that your link shows is
not runnable (it uses "XSL" instead of "XSLT", for example).
Stefan
From ovnicraft at gmail.com Tue Feb 8 17:45:17 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Tue, 8 Feb 2011 11:45:17 -0500
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To: <4D50DC32.3000503@behnel.de>
References:
<4D50DC32.3000503@behnel.de>
Message-ID:
On Tue, Feb 8, 2011 at 1:01 AM, Stefan Behnel wrote:
> Ovnicraft, 08.02.2011 00:17:
> > Hello i am working in a simple 'xslt transformation', in my country
> > a government site[1] give us the the xsd (for schema) and XML files and
> they
> > says us
> > Make a XSLT tranformation i open the file and found is not and xsl sheet
> (i
> > attach it).
> >
> > I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
> > included)
> >
> > How i can use the XML file attached if with XSLT transformation ?
>
> I'm not sure what you're asking (I'm having difficulties parsing your
> English). Did they tell you to use the XML file as a stylesheet, or did
> they ask you to write a stylesheet for them?
>
> Note that the file you are trying to parse as XSLT file has a different
> name than the file you attached, and that the code that your link shows is
> not runnable (it uses "XSL" instead of "XSLT", for example).
>
Yes, that was fixed; I use XSLT now. But they gave me the attached file and told
me to use it for an XSLT transformation, and it is not an XSL template.
On the other hand, the attached file has a special structure that they built, and
it contains XPath expressions: http://paste.pocoo.org/show/334424/
The file I generated is this one: http://paste.pocoo.org/show/334427/. This is
the file I need to transform using the attached one.
I need some clues on how to use the attached file: maybe use it to create my XSL
template, or parse it and use it with the xpath method.
What do you think?
Regards,
PS: I hope this is clearer now. :)
>
> Stefan
--
Cristian Salamea
@ovnicraft
From stefan_ml at behnel.de Wed Feb 9 09:08:46 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 09 Feb 2011 09:08:46 +0100
Subject: [lxml-dev] XSLT transformation problem
In-Reply-To:
References:
<4D50DC32.3000503@behnel.de>
Message-ID: <4D524B8E.7020304@behnel.de>
Ovnicraft, 08.02.2011 17:45:
> On Tue, Feb 8, 2011 at 1:01 AM, Stefan Behnel wrote:
>
>> Ovnicraft, 08.02.2011 00:17:
>>> Hello i am working in a simple 'xslt transformation', in my country
>>> a government site[1] give us the the xsd (for schema) and XML files and
>> they
>>> says us
>>> Make a XSLT tranformation i open the file and found is not and xsl sheet
>> (i
>>> attach it).
>>>
>>> I understand i cant do this: http://paste.pocoo.org/show/334060/ (error
>>> included)
>>>
>>> How i can use the XML file attached if with XSLT transformation ?
>>
>> I'm not sure what you're asking (I'm having difficulties parsing your
>> English). Did they tell you to use the XML file as a stylesheet, or did
>> they ask you to write a stylesheet for them?
>>
>> Note that the file you are trying to parse as XSLT file has a different
>> name than the file you attached, and that the code that your link shows is
>> not runnable (it uses "XSL" instead of "XSLT", for example).
>>
>
> Yes it was fixed, i use now XSLT, but the file attached they give me telling
> me use it for XSLT transformation and it does not a xsl template.
>
> In another hand the attached file has special structure build for them and
> has xpath expression: http://paste.pocoo.org/show/334424/
> the file generated by me is this http://paste.pocoo.org/show/334427/ for
> this file i need make the transformation with the attached.
>
> I need clues for use my attached file, maybe use it to create my xsl
> template or parse it and use it with xpath method.
AFAICT, the web site only says
"""
These formulas must be applied to the XML file by an XSLT transformation
"""
If that translation is correct, it doesn't say where the stylesheet for the
XSLT is supposed to come from, only that you should apply some kind of XSLT
to the XML documents. Maybe the stylesheet to do that is a separate
download somewhere? Why don't you ask the site owners where to get the
stylesheet from?
That being said, it shouldn't be too hard to write an XSLT script (or
Python code) that extracts the XPath expressions from the attributes in the
XML document you posted, and runs them against a suitable document.
Stefan
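As a rough illustration of the Python route Stefan mentions, the sketch below collects XPath expressions from attributes of one document and evaluates them against another. The attached CAL0402.xml is not preserved in this archive, so the element name "field", the attributes "id" and "formula", the sample documents and the function evaluate_formulas are all hypothetical; only the lxml calls (fromstring, iter, get, xpath) are real API.

```python
from lxml import etree

# Hypothetical rules document: each field carries an XPath expression.
RULES_XML = b"""<rules>
  <field id="total" formula="sum(//item/@amount)"/>
  <field id="count" formula="count(//item)"/>
</rules>"""

# Hypothetical data document that the expressions are run against.
DATA_XML = b"""<declaration>
  <item amount="10.5"/>
  <item amount="4.5"/>
</declaration>"""


def evaluate_formulas(rules_root, data_root, attr="formula"):
    """Evaluate every XPath expression found in `attr` against data_root."""
    results = {}
    for elem in rules_root.iter("*"):  # elements only, no comments or PIs
        expr = elem.get(attr)
        if expr:
            results[elem.get("id")] = data_root.xpath(expr)
    return results


results = evaluate_formulas(etree.fromstring(RULES_XML),
                            etree.fromstring(DATA_XML))
```

XPath number results come back as Python floats, so both entries here are floats rather than integers.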
From luciano at ramalho.org Thu Feb 10 11:12:40 2011
From: luciano at ramalho.org (Luciano Ramalho)
Date: Thu, 10 Feb 2011 08:12:40 -0200
Subject: [lxml-dev] Codespeak shutting down: migration plans?
Message-ID:
Dear colleagues,
As a long-time user of lxml I was shocked to read a message today from
Holger Krekel about the end of Codespeak (see full text at the end of
this).
First we need to thank Holger for all these years of free service.
Then we need to quickly plan and execute a migration to some other repository.
I am willing to help, but I am no core developer of lxml, and in fact
this is my first message ever to this mailing list, so I think we need
someone better known to this community to lead the migration effort.
lxml is a key piece of the Python ecosystem, and for the benefit of its
present and future users and the wider Python community we need to
make sure it continues to be available, and to make sure as many
references (pypi!) as possible point to the new canonical repository.
Finally, a big thank you to Martijn, Philikon, Stefan and all the
others who have built this great piece of software.
Cheers,
--
Luciano Ramalho
programador repentista || stand-up programmer
Twitter: @luciano
#########################################
hi codespeak.net users, (sorry if you get mail twice, i wanted to make sure ...)
after 8 years of operation codespeak.net services are bound to
terminate, starting
END OF FEBRUARY 2011
Background: one of the original codespeak purposes was to offer subversion (then
in version 0.17) for the PyPy and other projects but today this is not too
interesting given the plethora of VCS hosting solutions. Also, there aren't too
many admins besides me, the hosting is costing money, PyPy's repository has
moved to Bitbucket and i am re-shuffling my priorities preparing for my soon to
emerge father-hood. After February 2011 i probably won't be able to help
much with any transition issues or questions. The host will keep on running for
a while but i give no guarantees.
Some remarks regarding termination with respect to the FEB 2011 deadline:
* the subversion repo will turn read-only (and will eventually be switched off).
* Shell accounts will be restricted to those people who need it *and* mail
me about it. Some time later they will be gone as well.
* Mailing lists will be terminated as well unless i get a mail asking
me to postpone termination for a specific time. You can go to your respective
mailman admin page and extract a list of members. If you mail me i can also
provide a list of members.
* Any remaining web docs/pages will probably continue to exist for a while
but i also prefer them to be moved away by end Feb 2011.
Note that the codespeak svn repository contains a lot of projects. For migration
you have two options: do a flat import just of your project checkout directory
into a new version system. This is super-simple, obviously. If you want to preserve
history for your project please mail me and i either provide you a full dump or
a filtered dump only containing your project.
So long and I hope you all had a good time and enjoyed the services and also
have a good transition now.
see you in other places,
holger krekel
_______________________________________________
codespeak-ann at codespeak.net
http://codespeak.net/mailman/listinfo/codespeak-ann
From stefan_ml at behnel.de Thu Feb 10 13:09:25 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 10 Feb 2011 13:09:25 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To:
References:
Message-ID: <4D53D575.4040200@behnel.de>
Luciano Ramalho, 10.02.2011 11:12:
> As a long-time user of lxml I was shocked to read a message today from
> Holger Krekel about the end of Codespeak (see full text at the end of
> this).
So was I. And I'm not aware of any April-1st-like day anywhere on this
planet right now.
Actually, even if it's going to be some work to switch mailing list and web
site, I think the most painful thing will be getting the current links to
"http://codespeak.net/lxml" diverted to the new place (many of the links
are in blog entries that will never get updated) and convincing the web
search engines that this new place is just as good as the old one. Then
again, searching for "python xml" gives me good-ol'-dead-and-gone PyXML as
top hit in Google, so that needs updating anyway.
> First we need to thank Holger for all these years of free service.
Yes, I'm also grateful about the support in the past years. Thanks, Holger.
> Then we need to quickly plan and execute a migration to some other repository.
>
> I am willing to help, but I am no core developer of lxml, and in fact
> this is my first message ever to this mailing list, so I think we need
> someone better known to this community to lead the migration effort.
I've just created a repo at github.
https://github.com/lxml/lxml
There's nothing there yet, but given that I already switched to hg-svn a
while ago, I have the complete trunk history available that I could just push.
I'm not sure what to do with the maintenance branches, though. It would be
nice to preserve them as well, at least the 2.2 line. I could convert them
and just hang them in as separate repositories. Opinions?
I've also been looking for a better issue tracker than launchpad for ages.
github could provide that as well (I mean, seriously, anything is better
than launchpad), but that's not urgent and the main problem is getting at
the current set of issues to move them over...
One project I could use help with is the web site. I'd like to migrate it
to Sphinx. Shouldn't be hard, but it's not a quick action either. This
includes the PDF version, for example. That's not urgent, but if we switch
web sites anyway, it would be a good time to get it in shape. Any help with
that would be appreciated.
> lxml as a key piece of the Python ecosystem and for the benefit of its
> present and future users and the wider Python community we need to
> make sure it continues to be available, and to make sure as many
> references (pypi!) as possible point to the new canonical repository.
Again, any help is welcome.
> Finally, a big thank you to Martijn, Philikon, Stefan and all the
> others who have built this great piece of software.
You're welcome.
Stefan
From sgt04b at yahoo.gr Thu Feb 10 13:05:33 2011
From: sgt04b at yahoo.gr (Vas Zor)
Date: Thu, 10 Feb 2011 12:05:33 +0000 (GMT)
Subject: [lxml-dev] Building Python2.6 Windows eggs
Message-ID: <715844.87334.qm@web27701.mail.ukl.yahoo.com>
I reached this thread after googling the phrase "undefined reference to _ftol2 _chkstk mingw32". I found a quick and dirty solution that worked for me and want to share it. The solution is to include the following lines into a c file (e.g. ftol2.c) and include this file in the compilation.
-------------- ftol2.c -----------------
int _chkstk(int size) { return _alloca(size); }
long _ftol2(double f) { return (long) f; }
-----------------------------------------
Vangelis
From jholg at gmx.de Thu Feb 10 13:57:56 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 10 Feb 2011 13:57:56 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53D575.4040200@behnel.de>
References:
<4D53D575.4040200@behnel.de>
Message-ID: <20110210125756.230230@gmx.net>
Hi,
> > First we need to thank Holger for all these years of free service.
>
> Yes, I'm also grateful about the support in the past years. Thanks,
> Holger.
+1
> > Then we need to quickly plan and execute a migration to some other
> repository.
> >
> > I am willing to help, but I am no core developer of lxml, and in fact
> > this is my first message ever to this mailing list, so I think we need
> > someone better known to this community to lead the migration effort.
>
> I've just created a repo at github.
>
> https://github.com/lxml/lxml
And just as I was about to ask if the lxml repo should switch to mercurial... ;-)
> There's nothing there yet, but given that I already switched to hg-svn a
> while ago, I have the complete trunk history available that I could just
> push.
>
> I'm not sure what to do with the maintenance branches, though. It would be
> nice to preserve them as well, at least the 2.2 line. I could convert them
> and just hang them in as separate repositories. Opinions?
Are there any "established" best practices for dealing with maintenance branches in distributed vcs? From skimming mercurial docs I got the impression that named branches living in the main repo might fit the bill.
What about the release tags? I suppose they are already preserved in the repo history, as they are just some symbolic name for a revision/changeset (?)
My daily routine is still svn-centric and I haven't used hg but for the simplest "track some changes" use case - never used git so far.
Holger
From stefan_ml at behnel.de Thu Feb 10 14:36:15 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 10 Feb 2011 14:36:15 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <20110210125756.230230@gmx.net>
References:
<4D53D575.4040200@behnel.de> <20110210125756.230230@gmx.net>
Message-ID: <4D53E9CF.308@behnel.de>
jholg at gmx.de, 10.02.2011 13:57:
>>> Then we need to quickly plan and execute a migration to some other
>>> repository.
>>>
>>> I am willing to help, but I am no core developer of lxml, and in
>>> fact this is my first message ever to this mailing list, so I think
>>> we need someone better known to this community to lead the migration
>>> effort.
>>
>> I've just created a repo at github.
>>
>> https://github.com/lxml/lxml
>
> And just as I was about to ask if the lxml repo should switch to
> mercurial... ;-)
Well, I used it for a while now. Maybe I should have given a note about it
on the ML.
>> There's nothing there yet, but given that I already switched to hg-svn
>> a while ago, I have the complete trunk history available that I could
>> just push.
>>
>> I'm not sure what to do with the maintenance branches, though. It
>> would be nice to preserve them as well, at least the 2.2 line. I could
>> convert them and just hang them in as separate repositories.
>> Opinions?
>
> Are there any "established" best practices for dealing with maintenance
> branches in distributed vcs? From skimming mercurial docs I got the
> impression that named branches living in the main repo might fit the
> bill.
Well, there are basically two ways: branches (in-repo) and separate repos.
I think separate repos generally make more sense for maintenance branches
where you explicitly want things to diverge (and a trunk user really won't
care about a 1.3 branch). In-repo branches are better for short-lived
experiments, collaboration, etc. In both cases, it's easy enough to
cherry-pick patches from one branch to the other.
> What about the release tags? I suppose they are already preserved in the
> repo history, as they are just some symolic name for a
> revision/changeset (?)
Tags are a problem, yes. SVN doesn't readily provide tag information, it
sees tags and branches as simple directories. And most tags in lxml
originate from the maintenance branches, so that's even trickier to
re-engineer. I don't currently have that information in my hg repo at all.
> My daily routine is still svn-centric and I haven't used hg but for the
> simplest "track some changes" use case
Here's a good read then:
http://hginit.com/
> - never used git so far.
Well, I keep being disgusted by git, it just feels so wrong each time I
can't help using it. From a German POV, the name is really well chosen
(basically means 'yuk!').
Stefan
From jholg at gmx.de Thu Feb 10 16:07:37 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Thu, 10 Feb 2011 16:07:37 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53E9CF.308@behnel.de>
References:
<4D53D575.4040200@behnel.de> <20110210125756.230230@gmx.net>
<4D53E9CF.308@behnel.de>
Message-ID: <20110210150737.63020@gmx.net>
> > Are there any "established" best practices for dealing with maintenance
> > branches in distributed vcs? From skimming mercurial docs I got the
> > impression that named branches living in the main repo might fit the
> > bill.
>
> Well, there are basically two ways: branches (in-repo) and separate repos.
> I think separate repos generally make more sense for maintenance branches
> where you explicitly want things to diverge (and a trunk user really won't
> care about a 1.3 branch). In-repo branches are better for short-lived
> experiments, collaboration, etc. In both cases, it's easy enough to
> cherry-pick patches from one branch to the other.
Not sure if this is out of date feature-wise but PEP 374 talks about "git cherry-pick doesn't work across repositories; you need to have the branches in the same repository." (http://python.org/dev/peps/pep-0374/)
> > What about the release tags? I suppose they are already preserved in the
> > repo history, as they are just some symolic name for a
> > revision/changeset (?)
>
> Tags are a problem, yes. SVN doesn't readily provide tag information, it
> sees tags and branches as simple directories. And most tags in lxml
> originate from the maintenance branches, so that's even trickier to
> re-engineer. I don't currently have that information in my hg repo at all.
>
> Here's a good read then:
>
> http://hginit.com/
Thanks.
This might be of help for the strategy for handling of branches/tags: http://www.python.org/dev/peps/pep-0385/#transition-plan
Though I don't see anything on keeping correct tags, on first glance.
> Well, I keep being disgusted by git, it just feels so wrong each time I
> can't help using it. From a German POV, the name is really well chosen
> (basically means 'yuk!').
Read "it just feels so wrong each time that I can't help but using it" or
"it just feels so wrong each time when I can't help using it"? 8-)
Holger
From paulhtremblay at gmail.com Fri Feb 11 06:04:48 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Fri, 11 Feb 2011 00:04:48 -0500
Subject: [lxml-dev] possible to use import with a string?
Message-ID: <4D54C370.5070503@gmail.com>
First, thanks Holger for making such a nice library for libxslt.
Can someone help me out with resolving URIs?
The following code works, except for the
import sys, os, StringIO
from StringIO import StringIO
from lxml import etree
xslt_root = etree.XML('''\
''')
transform = etree.XSLT(xslt_root)
f = StringIO('Text')
doc = etree.parse(f)
result = transform(doc)
I've read http://codespeak.net/lxml/resolvers.html, but still don't quite
understand how to solve my problem.
Thanks
Paul
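One possible approach, sketched under assumptions: register a custom resolver on the parser that reads the main stylesheet, and let it answer xsl:import requests from in-memory strings. The stylesheet in the message above was lost in the archive, so the two stylesheets below, the name common.xsl and the class StringImportResolver are hypothetical. etree.Resolver, resolve(), resolve_string() and parser.resolvers.add() are the lxml API described on the resolvers page. The sketch is written for Python 3.

```python
from io import BytesIO

from lxml import etree

# Hypothetical imported stylesheet, kept only as a string in memory.
IMPORTED = {
    "common.xsl": b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="a">
    <b><xsl:value-of select="."/></b>
  </xsl:template>
</xsl:stylesheet>""",
}

# Hypothetical main stylesheet that imports the one above by name.
MAIN_XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="common.xsl"/>
</xsl:stylesheet>"""


class StringImportResolver(etree.Resolver):
    """Serve known stylesheet names from a dict instead of the file system."""

    def __init__(self, sources):
        super().__init__()
        self.sources = sources

    def resolve(self, url, pubid, context):
        name = (url or "").rsplit("/", 1)[-1]
        if name in self.sources:
            return self.resolve_string(self.sources[name], context)
        return None  # fall back to the default resolution


# The resolver must be attached to the parser that parses the stylesheet.
parser = etree.XMLParser()
parser.resolvers.add(StringImportResolver(IMPORTED))
transform = etree.XSLT(etree.parse(BytesIO(MAIN_XSL), parser))

result = transform(etree.parse(BytesIO(b"<a>Text</a>")))
```

Matching on the last path segment is a simplification: it keeps the lookup working whether or not the stylesheet has a base URL, at the cost of ignoring directories.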
From stefan_ml at behnel.de Fri Feb 11 07:26:48 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Feb 2011 07:26:48 +0100
Subject: [lxml-dev] Codespeak shutting down: migration plans?
In-Reply-To: <4D53D575.4040200@behnel.de>
References:
<4D53D575.4040200@behnel.de>
Message-ID: <4D54D6A8.4010206@behnel.de>
Stefan Behnel, 10.02.2011 13:09:
> I've just created a repo at github.
>
> https://github.com/lxml/lxml
... and now recreated it as an organisation. I think that makes more sense.
Stefan
From stefan_ml at behnel.de Fri Feb 11 14:31:33 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 11 Feb 2011 14:31:33 +0100
Subject: [lxml-dev] Source repository has moved to github
Message-ID: <4D553A35.20301@behnel.de>
Hi all,
as previously noted on the list, codespeak.net is closing down after
several years as friendly, free and very well working home for lxml.
I have therefore started with the migration process for lxml's
infrastructure. First, the SVN will no longer be used and write access has
been disabled. The new home for the source repository is
https://github.com/lxml
The new main branch is at
https://github.com/lxml/lxml.git
or
git+ssh://git@github.com/lxml/lxml.git
for developers.
You can use either hg or git to access it. I personally use hg together
with the git bridge "hg-git", which I can recommend.
http://hg-git.github.com/
Here's a good introduction to hg, in case you have never used it:
http://hginit.com/
Have fun!
Stefan
From svetlyak.40wt at gmail.com Fri Feb 11 17:17:01 2011
From: svetlyak.40wt at gmail.com (Alexander Artemenko)
Date: Fri, 11 Feb 2011 19:17:01 +0300
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: <4D553A35.20301@behnel.de>
References: <4D553A35.20301@behnel.de>
Message-ID:
Hi
On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel wrote:
> Hi all,
>
> as previously noted on the list, codespeak.net is closing down after
> several years as friendly, free and very well working home for lxml.
>
> I have therefore started with the migration process for lxml's
> infrastructure. First, the SVN will no longer be used and write access has
> been disabled. The new home for the source repository is
>
> https://github.com/lxml
There is a way to keep tags and branches when moving from SVN to Git.
Read about git-svn's --branches and --tags options. I'm now trying to
clone the repository to git with all tags and branches. I started it at
12:00; it is 19:15 now and it is still running. If you wish, I could
send you the finished git repository as an archive when the process
completes.
--
Alexander Artemenko (a.k.a. Svetlyak 40wt)
Blog: http://dev.svetlyak.ru
Photos: http://svetlyak.ru
Jabber: svetlyak.40wt at gmail.com
From arfrever.fta at gmail.com Fri Feb 11 21:35:27 2011
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Fri, 11 Feb 2011 21:35:27 +0100
Subject: [lxml-dev] 'import lxml.html.soupparser' fails with Python 3
Message-ID: <201102112136.02364.Arfrever.FTA@gmail.com>
$ python3.1 -c 'import lxml.html.soupparser'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.1/site-packages/lxml/html/soupparser.py", line 108, in <module>
    from htmlentitydefs import name2codepoint
ImportError: No module named htmlentitydefs
I'm attaching the patch.
--
Arfrever Frehtes Taifersar Arahesis
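The attached patch is not preserved in this archive. A common way to handle this kind of rename, shown here as an assumption rather than as the content of the patch, is to try the Python 3 module name first and fall back to the Python 2 one:

```python
# htmlentitydefs was renamed to html.entities in Python 3.
try:
    from html.entities import name2codepoint  # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2

# The mapping goes from entity name to Unicode code point.
amp_codepoint = name2codepoint["amp"]
eacute_codepoint = name2codepoint["eacute"]
```

The same pattern works for other modules that only changed their name between the two major versions.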
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lxml.html.soupparser.patch
Type: text/x-patch
Size: 399 bytes
Desc: not available
Url : http://codespeak.net/pipermail/lxml-dev/attachments/20110211/82ad59e1/attachment.bin
From Tim.Arnold at sas.com Fri Feb 11 21:19:42 2011
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Fri, 11 Feb 2011 20:19:42 +0000
Subject: [lxml-dev] modifying a tree in place
Message-ID: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>
hi, I'm not sure if I can modify elements in a tree as I iterate over the tree.
The situation is parsing XHTML and attempting to remove empty italic or bold tags. Otherwise, they're written out as <i/> or <b/>, which the browser (well, Chrome anyway) treats as opening the tag, so all text after it becomes italicized.
So I'm iterating over the tree and adding each offending element to a list. When I'm done with that, I iterate over the list and replace the elements. My question is whether I need to do the second step or can I replace the elements as I iterate over the tree. Here's my code:
from lxml import etree
parser = etree.HTMLParser()
fname = 'mytest0.htm'
tree = etree.parse(fname, parser)
droptags = list()
for elem in tree.xpath('//i|//b'):
    if not elem.text and not len(elem):
        droptags.append(elem)
for elem in droptags:
    parent = elem.getparent()
    newelem = etree.Element('span')
    newelem.text = elem.tail
    parent.replace(elem,newelem)
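A possible single-pass variant, offered as a sketch rather than a definitive answer: xpath() returns an ordinary Python list, so the tree can be changed while looping over that list. Instead of substituting a span, this version moves the tail text onto the previous sibling (or the parent) and removes the empty element. The function name remove_empty_inline and the sample markup are made up for illustration, and the sample is parsed as plain XML to keep it short.

```python
from lxml import etree


def remove_empty_inline(root, tags=("i", "b")):
    """Remove empty <i>/<b> elements, keeping the text that follows them."""
    expr = "|".join("//%s" % tag for tag in tags)
    removed = 0
    for elem in root.xpath(expr):  # a list snapshot, safe to mutate the tree
        if elem.text or len(elem):
            continue  # has text or children, so it is not empty
        parent = elem.getparent()
        if parent is None:
            continue  # never remove the root element
        tail = elem.tail or ""
        previous = elem.getprevious()
        if previous is not None:
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(elem)  # remove() drops the element together with its tail
        removed += 1
    return removed


root = etree.fromstring("<p>one <i></i>two <b>bold</b> three<b></b> four</p>")
count = remove_empty_inline(root)
cleaned = etree.tostring(root, encoding="unicode")
```

An element that only becomes empty because its sole child was removed is not caught by this single pass; running the function again would handle that case.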
From jholg at gmx.de Mon Feb 14 08:51:13 2011
From: jholg at gmx.de (jholg at gmx.de)
Date: Mon, 14 Feb 2011 08:51:13 +0100
Subject: [lxml-dev] modifying a tree in place
In-Reply-To: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>
References: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814DB74@MERCMBX03R.na.SAS.com>
Message-ID: <20110214075113.259470@gmx.net>
Hi,
> So I'm iterating over the tree and adding each offending element to a
> list. When I'm done with that, I iterate over the list and replace the
> elements. My question is whether I need to do the second step or can I replace the
> elements as I iterate over the tree. Here's my code:
http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
(Of course, you should profile your alternatives to see what works (performs) better for your use case)
Holger
From stefan_ml at behnel.de Mon Feb 14 18:11:52 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 18:11:52 +0100
Subject: [lxml-dev] Where to move the mailing list?
Message-ID: <4D596258.9090107@behnel.de>
Hi,
now that the source repo is on github and the web site is about to get
moved as well - where should this mailing list move? Any proposals? Any
volunteers for hosting this list?
Stefan
From sergio at sergiomb.no-ip.org Mon Feb 14 18:33:57 2011
From: sergio at sergiomb.no-ip.org (Sergio Monteiro Basto)
Date: Mon, 14 Feb 2011 17:33:57 +0000
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <1297704837.21317.4.camel@segulix>
On Mon, 2011-02-14 at 18:11 +0100, Stefan Behnel wrote:
> Hi,
>
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?
Have you considered sourceforge.net?
--
Sérgio M. B.
From ovnicraft at gmail.com Mon Feb 14 19:15:39 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 14 Feb 2011 13:15:39 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <1297704837.21317.4.camel@segulix>
References: <4D596258.9090107@behnel.de>
<1297704837.21317.4.camel@segulix>
Message-ID:
I really prefer google groups.
Cristian Salamea
On Feb 14, 2011 12:40 PM, "Sergio Monteiro Basto"
wrote:
> On Mon, 2011-02-14 at 18:11 +0100, Stefan Behnel wrote:
>> Hi,
>>
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>
> have you consider sourceforge.net ?
>
>
> --
> Sérgio M. B.
From fdrake at acm.org Mon Feb 14 19:11:56 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 14 Feb 2011 13:11:56 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID:
On Mon, Feb 14, 2011 at 12:11 PM, Stefan Behnel wrote:
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?
http://librelist.com/ seems to be getting some traction in the larger
Python community, and generally appeals to people who just want a list
and don't want to feed content directly to Google.
  -Fred
--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein
From stefan_ml at behnel.de Mon Feb 14 19:51:23 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 19:51:23 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To:
References: <4D596258.9090107@behnel.de>
Message-ID: <4D5979AB.7000506@behnel.de>
Fred Drake, 14.02.2011 19:11:
> On Mon, Feb 14, 2011 at 12:11 PM, Stefan Behnel wrote:
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>
> http://librelist.com/ seems to be getting some traction in the larger
> Python community, and generally appeals to people who just want a list
> and don't want to feed content directly to Google.
Been there. It may be ok as long as it works, but the hoster (Zed Shaw)
actively gave me the impression of not wanting to let anyone actually use
this service (or maybe use, but certainly not do anything with it that may
require him to put down his telly's remote control).
http://librelist.com/browser/meta/2011/2/11/hyphens-in-list-names/
Stefan
From stefan_ml at behnel.de Mon Feb 14 19:57:59 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 19:57:59 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To:
References: <4D596258.9090107@behnel.de> <1297704837.21317.4.camel@segulix>
Message-ID: <4D597B37.1060006@behnel.de>
Ovnicraft, 14.02.2011 19:15:
> I really prefer google groups.
Google Groups has been proposed by a couple of top-posters already. This
seems to be a general problem with this provider, but certainly not the only
one.
Sorry, but Google's straight out.
Stefan
From fdrake at acm.org Mon Feb 14 19:57:14 2011
From: fdrake at acm.org (Fred Drake)
Date: Mon, 14 Feb 2011 13:57:14 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D5979AB.7000506@behnel.de>
References: <4D596258.9090107@behnel.de>
<4D5979AB.7000506@behnel.de>
Message-ID:
On Mon, Feb 14, 2011 at 1:51 PM, Stefan Behnel wrote:
> Been there. It may be ok as long as it works, but the hoster (Zed Shaw)
> actively gave me the impression of not wanting to let anyone actually use
> this service (or maybe use, but certainly not do anything with it that may
> require him to put down his telly's remote control).
Ouch!
I've no particular interest in seeing this go either way, but...
non-support for hyphens is weird. Zed's response tells me other
projects' going with Google Groups is a good thing, for reasons that
have nothing to do with Google.
  -Fred
--
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein
From Tim.Arnold at sas.com Mon Feb 14 20:00:48 2011
From: Tim.Arnold at sas.com (Tim Arnold)
Date: Mon, 14 Feb 2011 19:00:48 +0000
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To:
References: <4D596258.9090107@behnel.de>
<4D5979AB.7000506@behnel.de>
Message-ID: <3AA0EA4F99BA8C4F89E32C90DF945E0E1814EC1D@MERCMBX03R.na.SAS.com>
> -----Original Message-----
> From: lxml-dev-bounces at codespeak.net [mailto:lxml-dev-
> bounces at codespeak.net] On Behalf Of Fred Drake
> Sent: Monday, February 14, 2011 1:57 PM
> To: Stefan Behnel
> Cc: ML-Lxml-dev
> Subject: Re: [lxml-dev] Where to move the mailing list?
>
> On Mon, Feb 14, 2011 at 1:51 PM, Stefan Behnel
> wrote:
> > Been there. It may be ok as long as it works, but the hoster (Zed
> > Shaw) actively gave me the impression of not wanting to let anyone
> > actually use this service (or maybe use, but certainly not do anything
> > with it that may require him to put down his telly's remote control).
>
> Ouch!
>
> I've no particular interest in seeing this go either way, but...
> non-support for hyphens is weird. Zed's response tells me other projects'
> going with Google Groups is a good thing, for reasons that have nothing to
> do with Google.
>
>
>   -Fred
>
> --
> Fred L. Drake, Jr.
+1. After reading that thread I think I'll leave librelist alone.
--Tim Arnold
From p.oberndoerfer at urheberrecht.org Mon Feb 14 20:55:34 2011
From: p.oberndoerfer at urheberrecht.org ("Pascal Oberndörfer")
Date: Mon, 14 Feb 2011 20:55:34 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
> Hi,
>
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?
>
> Stefan
Might applying for a python.org-list be worth a try?
Pascal
From st.jonathan at gmail.com Mon Feb 14 21:01:50 2011
From: st.jonathan at gmail.com (Jonathan Stoppani)
Date: Mon, 14 Feb 2011 21:01:50 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D596258.9090107@behnel.de>
References: <4D596258.9090107@behnel.de>
Message-ID: <22B2762A-5562-4080-8BAC-691B918FDE02@gmail.com>
On Feb 14, 2011, at 6:11 PM, Stefan Behnel wrote:
> Hi,
>
> now that the source repo is on github and the web site is about to get
> moved as well - where should this mailing list move? Any proposals? Any
> volunteers for hosting this list?
>
> Stefan
I can offer a mailman based mailing list if needed.
Jonathan
From stefan_ml at behnel.de Mon Feb 14 21:11:11 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 21:11:11 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
References: <4D596258.9090107@behnel.de>
<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
Message-ID: <4D598C5F.7070908@behnel.de>
"Pascal Oberndörfer", 14.02.2011 20:55:
>> now that the source repo is on github and the web site is about to get
>> moved as well - where should this mailing list move? Any proposals? Any
>> volunteers for hosting this list?
>>
>> Stefan
>
> Might applying for a python.org-list be worth a try?
>
>
I'm actually considering that. Cython's mailing list also moved there (for
the same reason as lxml's). The lists at python.org aren't really fast,
likely because of the relatively high traffic they carry. But they
certainly are a valid address for a major Python based project.
Stefan
From ovnicraft at gmail.com Mon Feb 14 22:24:12 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Mon, 14 Feb 2011 16:24:12 -0500
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To: <4D598C5F.7070908@behnel.de>
References: <4D596258.9090107@behnel.de>
<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
<4D598C5F.7070908@behnel.de>
Message-ID:
On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel wrote:
> "Pascal Oberndörfer", 14.02.2011 20:55:
> >> now that the source repo is on github and the web site is about to get
> >> moved as well - where should this mailing list move? Any proposals? Any
> >> volunteers for hosting this list?
> >>
> >> Stefan
> >
> > Might applying for a python.org-list be worth a try?
> >
> >
>
> I'm actually considering that. Cython's mailing list also moved there (for
> the same reason as lxml's). The lists at python.org aren't really fast,
> likely because of the relatively high traffic they carry. But they
> certainly are a valid address for a major Python based project.
>
But remember that it's all about Python itself. Anyway, at this point we
have two choices:
Google services
Python services
Can all subscribers give their opinion here?
Regards,
> Stefan
> _______________________________________________
> lxml-dev mailing list
> lxml-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/lxml-dev
>
--
Cristian Salamea
@ovnicraft
From stefan_ml at behnel.de Mon Feb 14 22:29:35 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 14 Feb 2011 22:29:35 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To:
References: <4D596258.9090107@behnel.de>
<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
<4D598C5F.7070908@behnel.de>
Message-ID: <4D599EBF.4060908@behnel.de>
Ovnicraft, 14.02.2011 22:24:
> On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel wrote:
>
>> "Pascal Oberndörfer", 14.02.2011 20:55:
>>>> now that the source repo is on github and the web site is about to get
>>>> moved as well - where should this mailing list move? Any proposals? Any
>>>> volunteers for hosting this list?
>>>>
>>>> Stefan
>>>
>>> Might applying for a python.org-list be worth a try?
>>>
>>>
>>
>> I'm actually considering that. Cython's mailing list also moved there (for
>> the same reason as lxml's). The lists at python.org aren't really fast,
>> likely because of the relatively high traffic they carry. But they
>> certainly are a valid address for a major Python based project.
>
> But remember something its all about python itself, anyway at this point we
> get two choices:
>
> Google services
> Python services
>
> Can all subscribers gives your opinion here ?
Argh, please, no. We have a lot of subscribers, if everyone writes an
e-mail to express their opinion, we'd be flooded.
Stefan
From stefan_ml at behnel.de Tue Feb 15 05:31:53 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 05:31:53 +0100
Subject: [lxml-dev] Where to move the mailing list?
In-Reply-To:
References: <4D596258.9090107@behnel.de>
<06c0167896d4e520577497f90d2fc476.squirrel@mail.urheberrecht.org>
<4D598C5F.7070908@behnel.de>
<4D599EBF.4060908@behnel.de>
Message-ID: <4D5A01B9.20804@behnel.de>
Michael Lissner, 14.02.2011 23:21:
>> Ovnicraft, 14.02.2011 22:24:
>>> On Mon, Feb 14, 2011 at 3:11 PM, Stefan Behnel wrote:
>>>
>>>> "Pascal Oberndörfer", 14.02.2011 20:55:
>>>>>> now that the source repo is on github and the web site is about to get
>>>>>> moved as well - where should this mailing list move? Any proposals? Any
>>>>>> volunteers for hosting this list?
>>>>>>
>>>>>> Stefan
>>>>>
>>>>> Might applying for a python.org-list be worth a try?
>>>>>
>>>>>
>>>>
>>>> I'm actually considering that. Cython's mailing list also moved there (for
>>>> the same reason as lxml's). The lists at python.org aren't really fast,
>>>> likely because of the relatively high traffic they carry. But they
>>>> certainly are a valid address for a major Python based project.
>>>
>>> But remember something its all about python itself, anyway at this point we
>>> get two choices:
>>>
>>> Google services
>>> Python services
>
> OK, I know you don't want all subscribers responding, but I'll give my
> own thoughts. I'd say go with a Google Group or self-host.
Hmm, now that you mention it - I may not be able to self-host, but I can
give us a good mailing list address.
> Yes, Google
> is a big company, but they do have a good group system (the best).
I do not argue against Google being a big company. I also do not argue
against them offering a good service in some areas. But I do argue against
them offering a good service for mailing lists.
> You've got threads, search, easy subscription, a UI people are
> familiar with, etc. I don't think I saw the argument against Google
> Groups?
They have problems with spam, they push users into top-posting, they keep
having their own ideas about who they want to let subscribe. I don't like
that at all. If not using their web interface, what's the point in using
their service?
> I don't know much about mailman, but from my own usage, it seems like
> it's about a decade old, and lacks search, which is pretty lame. As a
> user of lxml, I'd really appreciate a list where I could search for
> common problems.
The lxml mailing list has been archived in various places. Most of those
are easily searchable, including with Google.
Stefan
From stefan_ml at behnel.de Tue Feb 15 09:45:49 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 09:45:49 +0100
Subject: [lxml-dev] http://codespeak.net/lxml has moved to lxml.de
Message-ID: <4D5A3D3D.7040003@behnel.de>
Hi everyone,
I moved the web site to a new home. It's now at
http://lxml.de/
which is clearly a lot shorter than the old address. ;)
There is a 301 redirect set up from the old site, so that you should get
the new pages through the old addresses. I took care to fix up the inner
links (without breaking XML namespaces etc.). If you find any problems,
please report them to me, I'll see that I can fix them ASAP.
Have fun,
Stefan
From p.oberndoerfer at urheberrecht.org Tue Feb 15 10:48:09 2011
From: p.oberndoerfer at urheberrecht.org (Pascal)
Date: Tue, 15 Feb 2011 09:48:09 +0000 (UTC)
Subject: [lxml-dev] http://codespeak.net/lxml has moved to lxml.de
References: <4D5A3D3D.7040003@behnel.de>
Message-ID:
Stefan Behnel <stefan_ml at behnel.de> writes:
> There is a 301 redirect set up from the old site, so that you should get
> the new pages through the old addresses.
This (i.e. specifically the redirect) is really great news! Congrats!
From stefan_ml at behnel.de Tue Feb 15 12:56:38 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 12:56:38 +0100
Subject: [lxml-dev] new mailing list
Message-ID: <4D5A69F6.9040600@behnel.de>
Hi,
I got the mailing list set up. It's hosted by a volunteer (thanks,
Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
You should have received a subscription message for the new list, as I mass
subscribed all current subscribers to the new list. Please update your mail
filters accordingly and arrange for digest delivery if you prefer that.
Sorry for any inconvenience.
Stefan
From Marc.Graff at VerizonWireless.com Tue Feb 15 16:17:00 2011
From: Marc.Graff at VerizonWireless.com (Graff, Marc)
Date: Tue, 15 Feb 2011 10:17:00 -0500
Subject: [lxml-dev] new mailing list
In-Reply-To:
References:
Message-ID: <20110215152717.4846D282BEA@codespeak.net>
Thanks for the solid project and support.
-----Original Message-----
From: lxml-dev-bounces at codespeak.net
[mailto:lxml-dev-bounces at codespeak.net] On Behalf Of Stefan Behnel
Sent: Tuesday, February 15, 2011 6:57 AM
To: lxml mailing list; ML-Lxml-dev
Subject: [lxml-dev] new mailing list
Hi,
I got the mailing list set up. It's hosted by a volunteer (thanks,
Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
You should have received a subscription message for the new list, as I mass
subscribed all current subscribers to the new list. Please update your mail
filters accordingly and arrange for digest delivery if you prefer that.
Sorry for any inconvenience.
Stefan
From ovnicraft at gmail.com Tue Feb 15 17:03:16 2011
From: ovnicraft at gmail.com (Ovnicraft)
Date: Tue, 15 Feb 2011 11:03:16 -0500
Subject: [lxml-dev] new mailing list
In-Reply-To: <4D5A69F6.9040600@behnel.de>
References: <4D5A69F6.9040600@behnel.de>
Message-ID:
On Tue, Feb 15, 2011 at 6:56 AM, Stefan Behnel wrote:
> Hi,
>
> I got the mailing list set up. It's hosted by a volunteer (thanks,
> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
>
> You should have received a subscription message for the new list, as I mass
> subscribed all current subscribers to the new list. Please update your mail
> filters accordingly and arrange for digest delivery if you prefer that.
> Sorry for any inconvenience.
>
> Stefan
Thanks for this great project.
--
Cristian Salamea
@ovnicraft
From lists at cheimes.de Tue Feb 15 17:49:46 2011
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 15 Feb 2011 17:49:46 +0100
Subject: [lxml-dev] new mailing list
In-Reply-To: <4D5A69F6.9040600@behnel.de>
References: <4D5A69F6.9040600@behnel.de>
Message-ID:
On 15.02.2011 12:56, Stefan Behnel wrote:
> I got the mailing list set up. It's hosted by a volunteer (thanks,
> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
>
> You should have received a subscription message for the new list, as I mass
> subscribed all current subscribers to the new list. Please update your mail
> filters accordingly and arrange for digest delivery if you prefer that.
> Sorry for any inconvenience.
Thanks Stefan!
The link at http://lxml.de/index.html#mailing-list still points to the
wrong URL.
Have you notified gmane about the new addresses for the Cython and LXML
mailing lists?
Christian
From stefan_ml at behnel.de Tue Feb 15 17:58:55 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 17:58:55 +0100
Subject: [lxml-dev] new mailing list
In-Reply-To:
References: <4D5A69F6.9040600@behnel.de>
Message-ID: <4D5AB0CF.8040501@behnel.de>
Christian Heimes, 15.02.2011 17:49:
> On 15.02.2011 12:56, Stefan Behnel wrote:
>> I got the mailing list set up. It's hosted by a volunteer (thanks,
>> Jonathan!), but directed through a local mail address "lxml" at "lxml.de".
>>
>> You should have received a subscription message for the new list, as I mass
>> subscribed all current subscribers to the new list. Please update your mail
>> filters accordingly and arrange for digest delivery if you prefer that.
>> Sorry for any inconvenience.
>
> Thanks Stefan!
>
> The links at http://lxml.de/index.html#mailing-list still points to the
> wrong URL.
Right, I've fixed that in the sources but not redeployed the web site yet.
You can actually go through http://lxml.de/mailinglist/ now to get to the
subscription page.
> Have you notified gmane about the new addresses for the Cython and LXML
> mailing lists?
Yes, two times about the Cython list, once about the lxml list - no
response so far. I'll keep trying.
Stefan
From noah at mahalo.com Tue Feb 15 20:19:20 2011
From: noah at mahalo.com (Noah Silas)
Date: Tue, 15 Feb 2011 11:19:20 -0800
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To:
References: <4D553A35.20301@behnel.de>
Message-ID:
Now that there is a proper presence for the project on github, I'll be
closing my existing svn mirror. Anybody that has been using the
https://github.com/noah256/lxml/ github repo should migrate to the official
https://github.com/lxml/lxml/ repo. If you have any problems changing over,
feel free to email me directly for assistance.
~Noah
On Fri, Feb 11, 2011 at 8:17 AM, Alexander Artemenko <
svetlyak.40wt at gmail.com> wrote:
> Hi
>
> On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel
> wrote:
> > Hi all,
> >
> > as previously noted on the list, codespeak.net is closing down after
> > several years as friendly, free and very well working home for lxml.
> >
> > I have therefore started with the migration process for lxml's
> > infrastructure. First, the SVN will no longer be used and write access
> has
> > been disabled. The new home for the source repository is
> >
> > https://github.com/lxml
>
> There is the way to keep tags and branches moving from SVN to GIT.
>
> Read about git-svn's --branches and --tags options. Now I'm trying to
> clone repository to git with all tags and branches. I run it at 12:00.
> It is 19:15 now
> but it is still running. If you wish, I could send you ready git
> repository archived,
> when process will be completed.
>
> --
> Alexander Artemenko (a.k.a. Svetlyak 40wt)
> Blog: http://dev.svetlyak.ru
> Photos: http://svetlyak.ru
> Jabber: svetlyak.40wt at gmail.com
From stefan_ml at behnel.de Tue Feb 15 20:27:51 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 15 Feb 2011 20:27:51 +0100
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To:
References: <4D553A35.20301@behnel.de>
Message-ID: <4D5AD3B7.6090202@behnel.de>
Alexander Artemenko, 11.02.2011 17:17:
> On Fri, Feb 11, 2011 at 4:31 PM, Stefan Behnel wrote:
>> as previously noted on the list, codespeak.net is closing down after
>> several years as friendly, free and very well working home for lxml.
>>
>> I have therefore started with the migration process for lxml's
>> infrastructure. First, the SVN will no longer be used and write access has
>> been disabled. The new home for the source repository is
>>
>> https://github.com/lxml
>
> There is the way to keep tags and branches moving from SVN to GIT.
>
> Read about git-svn's --branches and --tags options. Now I'm trying to
> clone repository to git with all tags and branches. I run it at 12:00.
> It is 19:15 now
> but it is still running. If you wish, I could send you ready git
> repository archived,
> when process will be completed.
Ah, sorry for coming back to this so late. Did the conversion run
successfully? I actually ran mine on a stripped dump of the SVN repo. That's
much faster.
So far, I'm quite happy with the separation of the maintenance branches
from the master branch. But I wouldn't mind replacing the maintenance
branch repo with your complete conversion. Although, all that's currently
missing is the tags. I wouldn't even mind recreating those manually...
Stefan
From svetlyak.40wt at gmail.com Wed Feb 16 07:10:10 2011
From: svetlyak.40wt at gmail.com (Alexander Artemenko)
Date: Wed, 16 Feb 2011 09:10:10 +0300
Subject: [lxml-dev] Source repository has moved to github
In-Reply-To: <4D5AD3B7.6090202@behnel.de>
References: <4D553A35.20301@behnel.de>
<4D5AD3B7.6090202@behnel.de>
Message-ID:
Hi Stefan,
On Tue, Feb 15, 2011 at 10:27 PM, Stefan Behnel wrote:
> Ah, sorry for coming back to this so late. Did the conversion run
> successful? I actually ran mine on a stripped dump of the SVN repo. That's
> much faster.
Yes, the conversion completed successfully. You can download it here:
http://pypi.svetlyak.ru/lxml-git.tar.bz2
git-svn copied all SVN branches and tags as git remote branches; you
can transform them into real tags and branches. I already created
a tag for version 2.3. You could even write a script that runs
'git branch -r' to see which tags are available, and then does:
    git checkout tags/lxml-2.2.4
    git tag 2.2.4
    git checkout master
Or, if you need a real branch, you could do:
    git branch --no-track threading threading
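The script hinted at above could look roughly like the following sketch. It only assumes what the message states: that git-svn exposes the former SVN tags as remote branches named like tags/lxml-2.2.4. The helper names tag_commands and create_tags are invented here, and "git tag <name> <ref>" is used in place of the checkout/tag/checkout sequence.

```python
import subprocess


def tag_commands(branch_listing):
    """Map `git branch -r` output to `git tag` invocations (argument lists)."""
    commands = []
    for line in branch_listing.splitlines():
        ref = line.strip()
        if not ref.startswith('tags/'):
            continue  # a real branch such as trunk, not a converted SVN tag
        name = ref[len('tags/'):]
        # Assumption: SVN tags were named "lxml-<version>", as in the example.
        if name.startswith('lxml-'):
            name = name[len('lxml-'):]
        commands.append(['git', 'tag', name, ref])
    return commands


def create_tags():
    """Run inside the converted repository to actually create the tags."""
    listing = subprocess.check_output(['git', 'branch', '-r']).decode()
    for command in tag_commands(listing):
        subprocess.check_call(command)
```

Keeping the name mapping in a pure function makes it easy to check the generated commands before running create_tags() against a real repository.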
--
Alexander Artemenko (a.k.a. Svetlyak 40wt)
Blog: http://dev.svetlyak.ru
Photos: http://svetlyak.ru
Jabber: svetlyak.40wt at gmail.com
From john at nmt.edu Wed Feb 16 22:05:06 2011
From: john at nmt.edu (John W. Shipman)
Date: Wed, 16 Feb 2011 14:05:06 -0700 (MST)
Subject: [lxml-dev] [lxml] Small sample
In-Reply-To: <699662.39531.qm@web84205.mail.re3.yahoo.com>
References: <699662.39531.qm@web84205.mail.re3.yahoo.com>
Message-ID:
+--
| Would someone point me to a small lxml parsing sample for
| opening and traversing an XML file?
+--
If I might recommend my own modest effort at documenting lxml:
http://www.nmt.edu/tcc/help/pubs/pylxml/
Also see this page for a number of literate programs that use
lxml:
http://www.nmt.edu/~shipman/soft/litprog/
In particular, from this page, you might look at these projects:
Bb8import: Reads XML, generates XML.
docbookindex: Generates XSL-FO directly from Python.
hwscan3: Reads XML, generates XHTML.
Bird taxonomy system: Reads and writes XML.
birdnotes.py: Reads XML.
catweb: Reads XML, generates XHTML.
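As a very small starting point, a sketch along these lines parses a document and walks it. The sample XML and the function name summarize are made up for illustration; etree.parse() also accepts a file name instead of the in-memory file used here.

```python
import io

from lxml import etree

# Hypothetical sample document, used only so the example is self-contained.
SAMPLE = b"""<birds>
  <bird name="robin"><note>seen at dawn</note></bird>
  <bird name="wren"/>
</birds>"""


def summarize(source):
    """Parse an XML file (path or file-like object) and describe each element."""
    tree = etree.parse(source)
    root = tree.getroot()
    rows = []
    # iter('*') visits the root and all descendant elements in document order.
    for elem in root.iter('*'):
        text = (elem.text or '').strip()
        rows.append((elem.tag, dict(elem.attrib), text))
    return rows


for tag, attrs, text in summarize(io.BytesIO(SAMPLE)):
    print(tag, attrs, text)
```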
Best regards,
John Shipman (john at nmt.edu), Applications Specialist, NM Tech Computer Center,
Speare 119, Socorro, NM 87801, (575) 835-5735, http://www.nmt.edu/~john
``Let's go outside and commiserate with nature.'' --Dave Farber
From paulhtremblay at gmail.com Thu Feb 17 04:49:19 2011
From: paulhtremblay at gmail.com (Paul Tremblay)
Date: Wed, 16 Feb 2011 22:49:19 -0500
Subject: [lxml-dev] Thanks
Message-ID: <4D5C9ABF.4080200@gmail.com>
I would also like to thank all the developers and others who have
supported lxml.
Paul
From stefan_ml at behnel.de Thu Feb 17 07:16:33 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 17 Feb 2011 07:16:33 +0100
Subject: [lxml-dev] 'import lxml.html.soupparser' fails with Python 3
In-Reply-To: <201102112136.02364.Arfrever.FTA@gmail.com>
References: <201102112136.02364.Arfrever.FTA@gmail.com>
Message-ID: <4D5CBD41.7020709@behnel.de>
Arfrever Frehtes Taifersar Arahesis, 11.02.2011 21:35:
> $ python3.1 -c 'import lxml.html.soupparser'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/lib64/python3.1/site-packages/lxml/html/soupparser.py", line 108, in <module>
>     from htmlentitydefs import name2codepoint
> ImportError: No module named htmlentitydefs
>
> I'm attaching the patch.
Thanks!
https://github.com/lxml/lxml/commit/3022257b05a3ba86d72666c0b3f929be50e8e331
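For readers hitting the same ImportError: the module htmlentitydefs was renamed to html.entities in Python 3. The attached patch is not reproduced here; a common way to support both versions, shown only as a sketch, is a guarded import:

```python
try:
    # Python 3 location of the entity tables
    from html.entities import name2codepoint
except ImportError:
    # Python 2 fallback
    from htmlentitydefs import name2codepoint

# name2codepoint maps entity names to Unicode code points
print(name2codepoint['amp'])
```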
Stefan
From breuerss at uni-koeln.de Mon Feb 28 11:11:29 2011
From: breuerss at uni-koeln.de (Sebastian Breuers)
Date: Mon, 28 Feb 2011 11:11:29 +0100
Subject: [lxml-dev] etree.XMLSchema throws etree.XMLSchemaParseError on
reading CML schema
In-Reply-To: <4D48F75A.208@behnel.de>
References: <4D4875E9.1040901@uni-koeln.de> <4D48F75A.208@behnel.de>
Message-ID: <4D6B74D1.6060803@uni-koeln.de>
Hey,
just to finally add a comment to that issue.
I discussed this with the CML developers, and after some testing,
also following your suggestions, we came to the conclusion that it is
a bug in libxml2. The bug has already been reported:
https://bugzilla.gnome.org/show_bug.cgi?id=573483
That means that, unfortunately, lxml is not usable for us until that
issue is fixed.
Kind regards and thanks for your efforts.
Sebastian
On 02.02.2011 07:19, Stefan Behnel wrote:
> Sebastian Breuers, 01.02.2011 22:06:
>> I encounter the following issue. As a member of the MoSGrid consortium, a
>> project that is aimed to facilitate molecular simulations in the D-Grid
>> environment, I want to use the CML (Chemical Markup Language) to describe
>> molecular simulation jobs.
>>
>> I wrote a small validator that uses the lxml.etree.XMLSchema object to
>> read
>> the XSD describing the CML3 (located at
>> http://www.xml-cml.org/schema/schema3/schema.xsd). It stops reading the
>> schema with the lxml.etree.XMLSchemaParseError:
>>
>> local complex type: The content model is not determinist., line 5962
>>
>> As I wrote to the developer of the CML he told me that his schema is read
>> properly in JAVA and C# with the saxon library. I've got an idea why the
>> XMLSchema object is throwing that exception but now I am not quite
>> sure if
>> it is an issue with the standard (CML) or with the XMLSchema.
>
> It's usually an issue with the standard of XML-Schema. ;) The problem is
> that the W3C specification is extremely complicated - it's even more
> complex than actually writing a schema, and that's telling, in case
> you've never done that. So the simple fact that there is one tool that
> can successfully parse a W3C schema document doesn't mean that every
> other validation tool can work with it. Specifically, it is a known fact
> that libxml2 (which lxml gets its schema support from) has deficiencies
> with some less widely used schema constructs.
>
> I suggest this:
>
> 1) test the schema with the xmllint command line tool to reproduce the
> problem with plain libxml2.
>
> 2) contact the CML developer again and ask him to debug the schema
> against libxml2/xmllint. Maybe he can find a simple way to make it work.
> Don't forget to mention that libxml2 is a very widely used tool for XML
> processing that's absolutely worth supporting.
>
> 3) look out for a CML schema in RelaxNG or Schematron, which have much
> more accessible specifications and are much easier to implement
> correctly. These languages also make it a lot easier to write and
> maintain a schema, and you can generate a W3C XML Schema from RelaxNG
> using "trang".
>
> Stefan
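When trying the suggestions above from Python, it can help to keep the schema-parsing step separate from validation, so that a libxml2 limitation shows up as a clear message. This is only a sketch: the function name load_schema and the tiny inline schema are illustrative, and the CML schema itself is not used.

```python
from lxml import etree

# Hypothetical stand-in schema; the real CML schema is much larger.
TINY_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="molecule" type="xs:string"/>
</xs:schema>"""


def load_schema(xsd_bytes):
    """Return (schema, None) on success or (None, message) if it is rejected."""
    try:
        return etree.XMLSchema(etree.fromstring(xsd_bytes)), None
    except etree.XMLSchemaParseError as exc:
        # Same exception type as in the report about the CML schema.
        return None, str(exc)


schema, problem = load_schema(TINY_XSD)
if schema is not None:
    print(schema.validate(etree.fromstring(b"<molecule>H2O</molecule>")))
else:
    print("schema rejected:", problem)
```

The same split works with etree.RelaxNG if a RelaxNG version of a schema is available.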
--
_____________________________________________________________________________
Sebastian Breuers Tel: +49-221-470-4108
EMail: breuerss at uni-koeln.de
Universität zu Köln        University of Cologne
Department für Chemie      Department of Chemistry
Organische Chemie          Organic Chemistry
Greinstraße 6              Greinstraße 6
Raum 325                   Room 325
D-50939 Köln               D-50939 Cologne, Federal Rep. of Germany
_____________________________________________________________________________