[python] Help with Python's re module
Bystroushaak
bystrousak at kitakitsune.org
Sat Jan 12 22:17:35 CET 2013
Hello.
I need help with Python's re module. I have been playing with it for several
hours and I am at my wits' end.
I have this script:
-------------------------------------------------------------------------------
import re
data = """<tr><td class="newscap"><b style="font-size:13px">Downtime for Christmas</b>
<br><small>by <script language="javascript">document.write('<a class=\"cap\" href=\"mailto:'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'\">'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'</a>')</script><noscript>rattle</noscript> on 12/30/12 10:48</small></td></tr>
<tr><td class="aware" colspan="2">
So, it appears the site was down for christmas. I could try to find
out why, but I don't care enough. Went to <a
href="https://events.ccc.de/congress/2012/wiki/Main_Page">29c3</a>,
didn't get much done, ate a lot of fast food. I'm old, fat, and boring
now. However, I found out about <a
href="http://www.hyperelliptic.org/tanja/newelliptic/newelliptic.html">Edwards
curves</a>, that shit is rad.
</td></tr>"""
print re.sub(r'.*(<script.*>)(.*)(</script>).*',
r"\n\n---\1\n---\2\n---\3", data)
-------------------------------------------------------------------------------
When run, it prints:
-------------------------------------------------------------------------------
<tr><td class="newscap"><b style="font-size:13px">Downtime for Christmas</b>
---<script language="javascript">document.write('<a class="cap" href="mailto:'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'">'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'</a>
---')
---</script>
<tr><td class="aware" colspan="2">
So, it appears the site was down for christmas. I could try to find
out why, but I don't care enough. Went to <a
href="https://events.ccc.de/congress/2012/wiki/Main_Page">29c3</a>,
didn't get much done, ate a lot of fast food. I'm old, fat, and boring
now. However, I found out about <a
href="http://www.hyperelliptic.org/tanja/newelliptic/newelliptic.html">Edwards
curves</a>, that shit is rad.
</td></tr>
-------------------------------------------------------------------------------
My goal is to have the opening <script> tag, i.e. <script
language="javascript">, in group \1, and the body of the tag in \2. As it
stands, both end up merged into \1, and \2 contains only the trailing ').
"Live" demo: http://ideone.com/TfbmB1
Please give me a nudge in the right direction.
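For reference, here is a minimal sketch of one likely cause, assuming the problem is greedy matching. The greedy `.*` inside `(<script.*>)` runs to the last `>` that still lets the rest of the pattern match, so it swallows most of the body. The `data` string below is a shortened, hypothetical stand-in for the HTML above, and the alternative pattern (`[^>]*` for the tag, non-greedy `.*?` for the body, plus `re.DOTALL`) is only one possible approach, not necessarily the best one for real-world HTML.

```python
import re

# Shortened, hypothetical stand-in for the HTML in the script above.
data = ('<br><small>by <script language="javascript">'
        "document.write('<a class=\"cap\" href=\"#\">x</a>')"
        '</script><noscript>rattle</noscript></small>')

# Greedy version: "<script.*>" extends to the last ">" that still
# leaves a "</script>" to match, so group 1 swallows most of the body.
greedy = re.search(r'(<script.*>)(.*)(</script>)', data)

# Alternative: "[^>]*" cannot run past the end of the opening tag, and
# the non-greedy ".*?" stops at the first "</script>". re.DOTALL lets
# "." match newlines, so the body may span several lines.
lazy = re.search(r'(<script[^>]*>)(.*?)(</script>)', data, re.DOTALL)

print(greedy.group(2))  # only the tail of the body
print(lazy.group(1))    # the opening tag
print(lazy.group(2))    # the whole body
```

With `re.search` (or `re.finditer`) the leading and trailing `.*` from the original pattern are not needed, since the search is not anchored to the start of a line.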