|
This post was last edited by 5dhtml on 2018-11-13 11:01
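I have recently been analyzing and organizing the data of several English-English dictionaries, and it raised a question. Even an elementary dictionary does not restrict its headwords to elementary vocabulary. One dictionary, for example, has only 20,000 entries, yet many of them rank beyond 20,000 in word frequency (according to combined ANC/BNC/COCA data). So, apart from giants like the OED, how do ordinary dictionaries decide which words to include when they are compiled? A related question I have long had: for exams such as CET-4 and CET-6, on what basis is the scope of the vocabulary syllabus chosen?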
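For anyone who wants to reproduce this kind of check, here is a minimal sketch of how it could be done. It assumes you already have the dictionary's headwords as a list and a word list sorted from most to least frequent; the function name `beyond_rank` and the toy data are made up for illustration and are not taken from any particular dictionary or corpus.

```python
def beyond_rank(headwords, ranked_words, cutoff=20000):
    """Return the headwords whose frequency rank is past `cutoff`.

    `ranked_words` is assumed to be sorted from most to least frequent.
    Headwords missing from the list are treated as being past the cutoff.
    """
    rank = {}
    for position, word in enumerate(ranked_words, start=1):
        # Keep the first (best) rank if a word appears more than once.
        rank.setdefault(word.lower(), position)
    return [w for w in headwords if rank.get(w.lower(), cutoff + 1) > cutoff]


# Toy data only: a five-word "frequency list" and three "headwords".
ranked = ["the", "be", "of", "and", "work"]
heads = ["work", "the", "zymurgy"]
print(beyond_rank(heads, ranked, cutoff=3))
```

With a real frequency list, dividing the length of the result by the number of headwords gives the share of entries that fall outside the top 20,000.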
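While I'm at it: does anyone know of a ready-made list of English inflections (such as the four verb forms), together with noun plurals, derived forms, and other word-form variations?
For example: work, works, worked, working, ...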
Found one: a list of inflected forms for 84,487 English lemmas, ordered by BNC word frequency. It can be saved directly as a txt file:
https://raw.githubusercontent.com/skywind3000/ECDICT/master/lemma.en.txt
; En Lemma Database (version 1.0.2)
; Compiled by Lin Wei (https://github.com/skywind3000), Mar 28, 2017
; by referencing the 100M+ words in the British National Corpus (BNC),
; NodeBox Linguistics and Yasumasa Someya's lemma list.
; This lemma list is provided "as is" and is free to use for any research
; and/or educational purposes.
; The list currently contains 186,523 words (tokens) in 84,487 lemma groups.
; If you have any questions or comments about this lemma list, feel free
; to contact me ([email protected]), at any time.
;
be/4109826 -> is,was,are,were,'s,been,being,'re,'m,am,m
have/1315648 -> had,has,'ve,having,'s,'d,of,d,ve
it/1213224 -> its,they
he/1196022 -> his,him,they
i/1133697 -> my,me,we,is
they/841960 -> their,them,'em
you/804279 -> your,ya,ye
not/767330 -> n't
she/653505 -> her
do/535646 -> did,does,done,doing,du,d'
we/503360 -> our,us
will/334612 -> 'll,wo,ll
say/317317 -> said,says,saying
would/278414 -> 'd
can/263138 -> ca,cans,can,could
go/227247 -> going,went,gone,goes,goin'
get/212569 -> got,getting,gets,gotten
make/209818 -> made,making,makes
up/206976 -> ups,upping,upped
see/184969 -> seen,saw,seeing,sees
other/181277 -> others
time/181080 -> times,timed,timing
know/177717 -> knew,known,knows,knowing
take/172773 -> took,taken,taking,takes
year/161649 -> years
well/156075 -> better,wells,welling,welled
like/154975 -> liked,likes,liking
then/154443 -> thens
think/145268 -> thought,thinking,thinks
come/144107 -> came,coming,comes
now/138986 -> nows
use/137498 -> used,using,uses
over/130163 -> overs
good/128437 -> best,better,goods
work/126290 -> working,worked,works,wrought
give/125727 -> given,gave,giving,gives
new/124872 -> newer,newest
people/123156 -> peoples,peopling,peopled
look/119946 -> looked,looking,looks
one/116568 -> ones
way/110362 -> ways
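
Judging from the excerpt above, each data line has the shape `lemma/frequency -> form1,form2,...`, and lines starting with `;` are comments. Below is a rough sketch of how such lines could be parsed and turned into a form-to-lemma lookup. The function names `parse_lemma_line` and `build_index` are my own, and the assumption that every data line in the full file follows this exact shape is untested beyond the lines quoted here.

```python
def parse_lemma_line(line):
    """Parse one line into (lemma, frequency, forms); return None for
    comments, blank lines, and lines without a '->' separator."""
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    head, sep, tail = line.partition("->")
    if not sep:
        return None
    lemma, _, freq = head.strip().partition("/")
    forms = [f.strip() for f in tail.split(",") if f.strip()]
    # Fall back to 0 if the frequency field is missing or not numeric.
    return lemma, int(freq) if freq.isdigit() else 0, forms


def build_index(lines):
    """Map each inflected form to the list of lemmas it can belong to."""
    index = {}
    for line in lines:
        parsed = parse_lemma_line(line)
        if parsed is None:
            continue
        lemma, _freq, forms = parsed
        for form in forms:
            index.setdefault(form, []).append(lemma)
    return index


sample = [
    "; comment lines are skipped",
    "work/126290 -> working,worked,works,wrought",
    "good/128437 -> best,better,goods",
    "well/156075 -> better,wells,welling,welled",
]
index = build_index(sample)
print(index["worked"])  # ['work']
print(index["better"])  # ['good', 'well']
```

A form can map to more than one lemma (`better` appears under both `good` and `well` in the excerpt), which is why the index stores a list per form rather than a single lemma. To run it over the saved txt file, pass an open file object to `build_index` instead of `sample`.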
|
|