Posted on 2018-11-16 21:27:49
'''
Based on xmllarge.py
'''
# from pyquery import PyQuery as pq
from pathlib import Path


def xml_iter(file, tag):
    '''
    Process huge xml files
    <tag> </tag> need to be in separate lines
    # TODO: in the middle of lines

    :file: file path
    :tag: element to retrieve
    '''
    # Opening tag without attributes, e.g. b'<tu>'
    tagb1 = '<' + tag + '>'
    tagb1 = tagb1.encode()

    # Opening tag followed by attributes, e.g. b'<tu '
    tagb2 = '<' + tag + ' '
    tagb2 = tagb2.encode()

    # Closing tag, e.g. b'</tu>'
    tagb3 = '</' + tag + '>'
    tagb3 = tagb3.encode()

    with open(file, 'rb') as inputfile:
        append = False
        inputbuffer = b''  # defined up front so a stray closing tag cannot raise NameError
        for line in inputfile:
            #~ if b'<tu>' in line or b'<tu ' in line:
            if tagb1 in line:
                # Start a new element at the opening tag
                inputbuffer = line[line.index(tagb1):]
                append = True
            elif tagb2 in line:
                inputbuffer = line[line.index(tagb2):]
                append = True
            #~ elif b'</tu>' in line:
            elif tagb3 in line and append:
                # Close the element and hand it to the caller
                inputbuffer += line[:line.index(tagb3) + len(tagb3)]
                append = False
                yield inputbuffer
                #~ docitem = process_buffer(inputbuffer, id_num)
                #~ print(id_num)
                #~ id_num += 1
                inputbuffer = b''
            elif append:
                # Line inside the element
                inputbuffer += line
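The TODO in the docstring (tags in the middle of lines, or an opening and closing tag on the same line) could be handled by scanning a byte buffer instead of whole lines. The following is only a rough sketch, not the code from xmllarge.py: the name `xml_iter_chunks` and the `chunk_size` parameter are made up for illustration, and it assumes elements with the requested tag are never nested inside each other.

```python
def xml_iter_chunks(file, tag, chunk_size=64 * 1024):
    """Yield each <tag>...</tag> element as bytes, wherever the tags fall.

    Hypothetical variant of xml_iter: reads fixed-size chunks instead of
    lines. Assumes elements with this tag are not nested in each other.
    """
    open_plain = ('<' + tag + '>').encode()
    open_attrs = ('<' + tag + ' ').encode()
    close = ('</' + tag + '>').encode()
    # Longest prefix of an opening tag that may be cut off at a chunk boundary.
    keep = len(open_plain) - 1

    buffer = b''
    with open(file, 'rb') as inputfile:
        while True:
            chunk = inputfile.read(chunk_size)
            buffer += chunk
            while True:
                starts = [i for i in (buffer.find(open_plain),
                                      buffer.find(open_attrs)) if i != -1]
                if not starts:
                    # No opening tag yet: keep only a short tail in case
                    # a tag is split across two chunks.
                    buffer = buffer[-keep:]
                    break
                start = min(starts)
                end = buffer.find(close, start)
                if end == -1:
                    # Element is incomplete: drop the text before it
                    # and wait for more data.
                    buffer = buffer[start:]
                    break
                end += len(close)
                yield buffer[start:end]
                buffer = buffer[end:]
            if not chunk:  # end of file
                break
```

Memory use here is bounded by the chunk size plus the largest single element, so it should stay small in the same way as the line-based version.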
So many people are looking for this? I'll package it up and post a small tool in a while.

Usage of the Python 3 function above:
resu = b''  # xml_iter yields bytes, so the accumulator must be bytes too
for elm in xml_iter(filename, 'tu'):
    resu += elm
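The memory footprint is tiny no matter how large the file is, as long as each element is processed as it is yielded rather than accumulated.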
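To keep the footprint small in practice, each yielded element can be handled right away instead of being concatenated into one big value. Here is a minimal sketch of that idea using the standard library's `xml.etree.ElementTree`. The helper name `iter_segments` and the `<seg>` element name are assumptions (they fit TMX-style files with `<tu>` units), and it assumes each fragment parses on its own, i.e. it is UTF-8 and uses no undeclared namespace prefixes or custom entities.

```python
import xml.etree.ElementTree as ET


def iter_segments(chunks):
    """Yield the text of every <seg> found in each chunk, one chunk at a time.

    `chunks` is any iterable of bytes, e.g. xml_iter(filename, 'tu').
    The <seg> element name is an assumption (TMX-style files).
    """
    for chunk in chunks:
        element = ET.fromstring(chunk)  # parses only this small fragment
        for seg in element.iter('seg'):
            # itertext() also collects text inside inline child elements
            yield ''.join(seg.itertext())


# Hypothetical usage with the function from the post:
# for text in iter_segments(xml_iter(filename, 'tu')):
#     print(text)
```

Because both functions are generators, only one element is held in memory at any moment.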