Posted on 2021-10-6 09:52:48
(3) Where should I add this?

    current_path = os.path.dirname(__file__)
    current_path + "/OALD4_azure.txt"
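A minimal sketch of one place those two lines could go, assuming the goal is to open OALD4_azure.txt relative to the script's own folder rather than the current working directory. `data_path` is a hypothetical helper name, not something from genMDX_ox4.py:

```python
import os


def data_path(filename, base_dir=None):
    """Join filename onto base_dir.

    base_dir defaults to the directory containing this script, so the
    file is found no matter which directory the script is launched from.
    """
    if base_dir is None:
        # __file__ is only looked up when no base_dir is supplied.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, filename)
```

In the script this would sit near the top, after the imports, and the open call would then read `io.open(data_path('OALD4_azure.txt'), 'r', encoding='utf-8')` in place of the bare file name.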
My version is 3.9.7, and so far I have not run into the No such file or directory error. Separately, some of the Chinese text in genMDX_ox4.py shows up garbled:

    # -*- coding: utf-8 -*-
    # encoding=utf8

    from __future__ import unicode_literals, print_function, absolute_import, division

    import re
    import copy
    import chardet

    import os
    import io
    import sys
    # reload(sys)
    # sys.setdefaultencoding('utf-8')

    import collections
    from collections import defaultdict

    from writemdict import MDictWriter, encrypt_key
    from ripemd128 import ripemd128

    head = 0
    new_mean = []
    f = io.open('OALD4_azure.txt', 'r', encoding='utf-8')
    #f = io.open('oxford2_original.txt', 'r', encoding='utf-8')
    d = defaultdict(list) #����һ�����ֵ䣬Ҳ��ʹ��{}������
    for line in f: #ÿ�δ�f�ж���һ��
        line = line.rstrip('\n') #ȥ����β�Ļ��з�
        if line == '</>':
            if head == 2:
                new_mean[0:] = ["".join(new_mean[0:])]
                d[word].append(new_mean[0])
            head = 1
            new_mean = []
        elif head == 1:
            word = line
            head = 2
        elif head == 2:
            new_mean.append(line)
            head = 2
    f.close()

    ff = io.open('about_OX4.txt', 'r', encoding='utf-8') #�ʵ�about��Ϣ��txt�ļ��뱣��Ϊutf-8
    about = []
    for line in ff: #ÿ�δ�f�ж���һ��
        about.append(line)
    about[0:] = ["".join(about[0:])]

    #outfile = open("example_output/��ţ��Beta_V2.2.1.mdx", "wb")
    #writer = MDictWriter(d, "��ţ��Beta_V2.2.1", about[0])
    outfile = open("output_ox4/OALD4_Ex.mdx", "wb")
    writer = MDictWriter(d, "ţ��߽�˫��(���İ�)", about[0])
    writer.write(outfile)
    outfile.close()

Could I take a look at your copy of the file? What does the Chinese in the garbled parts actually say?
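For what it's worth, garbling of this shape (runs of � mixed with stray letters such as ÿ and δ) is what typically appears when GBK-encoded bytes are decoded as UTF-8. The sketch below is only an illustration under that assumption. The sample string is a guess chosen for the demo, and `decode_with_fallback` is a hypothetical helper, not part of genMDX_ox4.py:

```python
# -*- coding: utf-8 -*-

# Sample comment text (an assumption for the demo), encoded the way a
# GBK-saved source file would store it on disk.
original = "每次从f中读入一行"
raw = original.encode("gbk")

# Decoding GBK bytes as UTF-8 turns invalid sequences into U+FFFD.
garbled = raw.decode("utf-8", errors="replace")

# Decoding the same bytes with the right codec recovers the text.
recovered = raw.decode("gbk")


def decode_with_fallback(data, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first clean decode."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the encodings matched: %r" % (encodings,))
```

Note that this only works on the original bytes. Once a file has been saved with the � characters in it, the information is gone, which is why an untouched copy of the script is needed to read the comments.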