[lxml-dev] About the position of html parsing by HTML Target parser
Nicholas Dudfield
ndudfield at gmail.com
Mon Jul 20 10:53:57 CEST 2009
> [bringing this back to the list]
Sorry about that :)
> You may also consider using something like ahocorasick
ahocorasick doesn't seem to work with native unicode, which makes it
about as useful (for my particular purpose anyway) as the parser
context UTF-8 stream positions :(
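
For what it's worth, a byte position in a UTF-8 stream can be mapped back to a unicode character index after the fact. This is only a rough sketch in Python 3 syntax, assuming the whole document is available as a unicode string and the byte offset refers to its UTF-8 encoding. The helper name utf8_offset_to_char_index is made up for illustration and is not part of lxml or ahocorasick:

```python
def utf8_offset_to_char_index(text, byte_offset):
    """Map an offset into text.encode('utf-8') to an index into text.

    Hypothetical helper for illustration only; not an lxml API.
    """
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError("byte offset out of range")
    # Decoding the byte prefix counts the characters before the offset.
    # It raises UnicodeDecodeError if the offset falls inside a
    # multi-byte character.
    return len(encoded[:byte_offset].decode("utf-8"))


# "é" and "ö" each take two bytes in UTF-8, so byte and character
# positions drift apart after the first non-ASCII character.
index = utf8_offset_to_char_index("héllo wörld", 10)
```

This re-encodes the text on every call, so it would be slow for many lookups on a large document; a precomputed table of offsets would probably be the better choice there.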
Is there any fundamental reason why an XML parser couldn't work with
native unicode, i.e. an abstract character stream? I'm completely
clueless when it comes to parsers.