Posted on 2018-12-26 10:02:26
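With XPath this problem doesn't have to be such a brain-teaser; using regular expressions just makes a simple thing complicated. A Python implementation is attached below; it depends on the lxml library.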

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """
    File: replace_tilde_with_title.py
    Author: zzhirong
    Email: [email protected]
    Description: Replace "~" in span text with the d:title attribute of the enclosing d:entry
    """
    from lxml import etree

    s = """<?xml version="1.0" encoding="UTF-8"?>
    <d:dictionary xmlns="http://www.w3.org/1999/xml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
    <d:entry id="_38ja" d:title="xxx">
    <d:index d:value="steal" d:title="steal"/><span class="hw">steal</span><br/>
    <span class="ex">~ a visit <span class="tag1">(an interview)</span> </span><span class="ex_c">测试<span class="tag1">(测试)</span></span>
    <span class="ex">~ a kiss </span><span class="ex_c">测试</span>
    <span class="ex">~ rides on the train </span><span class="ex_c">测试</span>
    </d:entry>
    </d:dictionary>
    """

    # On Python 3, lxml rejects a str that carries an encoding declaration,
    # so the document is parsed from bytes.
    xml = etree.XML(s.encode("utf-8"))
    D_NS = xml.nsmap["d"]      # namespace bound to the d: prefix
    XML_NS = xml.nsmap[None]   # default namespace, used by the span elements

    for entry in xml.xpath("//d:entry", namespaces={"d": D_NS}):
        title = entry.get("{%s}title" % D_NS, "")
        # Only the direct child spans of each entry are visited.
        for span in entry.iterfind("./{%s}span" % XML_NS):
            # span.text is None when a span starts with a child element.
            if span.text:
                span.text = span.text.replace("~", title)
    print(etree.tostring(xml, encoding="unicode"))
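The loop above only rewrites each span's leading text (`span.text`), so a `~` that sits after a child element, or inside a nested element, is left alone. Below is a minimal sketch of one way to cover those cases as well, by walking every element under the span and replacing in both `.text` and `.tail`. The function name `replace_tilde`, the `placeholder` parameter, and the sample entry are made up for illustration and are not from the original script. The import fallback to the standard library's `xml.etree.ElementTree` is also an assumption; it works here only because the sketch sticks to calls both libraries share.

```python
try:
    from lxml import etree  # the library the post relies on
except ImportError:
    # Assumption: the standard library is an acceptable stand-in, since only
    # the API subset shared by both libraries is used below.
    import xml.etree.ElementTree as etree

# Namespace URIs copied from the sample document in the post.
D_NS = "http://www.apple.com/DTDs/DictionaryService-1.0.rng"
XML_NS = "http://www.w3.org/1999/xml"


def replace_tilde(root, placeholder="~"):
    """Replace the placeholder in all text under each entry's direct child spans."""
    for entry in root.iter("{%s}entry" % D_NS):
        title = entry.get("{%s}title" % D_NS, "")
        for span in entry.findall("{%s}span" % XML_NS):
            for node in span.iter():  # the span itself, then all descendants
                if node.text:
                    node.text = node.text.replace(placeholder, title)
                # A tail is the text that follows an element's closing tag.
                # The tail of the top-level span lies outside it, so skip it.
                if node is not span and node.tail:
                    node.tail = node.tail.replace(placeholder, title)
    return root


# Hypothetical entry in which "~" appears after and inside child elements.
sample = (
    '<d:dictionary xmlns="%s" xmlns:d="%s">'
    '<d:entry id="a" d:title="steal">'
    '<span class="ex"><b>to</b> ~ a kiss, or ~ <i>~ away</i></span>'
    '</d:entry></d:dictionary>' % (XML_NS, D_NS)
)
root = replace_tilde(etree.fromstring(sample))
print(etree.tostring(root, encoding="unicode"))
```

Whether nested text should be rewritten at all depends on the dictionary data. If only the leading text of each example matters, the simpler loop in the post is enough.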