Original poster
Posted on 2021-7-10 13:04:29
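As I explained earlier, KSDRIP is not open source. The DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and any special characters, and it cannot parse the pronunciation audio library either.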
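To illustrate why a UTF-16LE to GB2312 conversion is lossy, here is a small Python sketch using only the built-in codecs (this is not KSDRIP's code, which is closed source). Any character outside the GB2312 repertoire, such as the IPA symbols used in phonetic transcriptions, cannot be encoded and is replaced. The helper name `gb2312_roundtrip` and the sample entry are made up for illustration.

```python
from __future__ import annotations


def gb2312_roundtrip(text: str) -> tuple[str, list[str]]:
    """Encode text as GB2312 and decode it back.

    Returns the round-tripped text and the characters GB2312 could not encode.
    """
    lost = []
    for ch in text:
        try:
            ch.encode("gb2312")
        except UnicodeEncodeError:
            lost.append(ch)
    # errors="replace" substitutes "?" for every character GB2312 cannot represent
    back = text.encode("gb2312", errors="replace").decode("gb2312")
    return back, lost


# Hypothetical dictionary entry with an IPA pronunciation, stored as UTF-16LE
entry_utf16 = "ship /ʃɪp/ 船".encode("utf-16-le")
back, lost = gb2312_roundtrip(entry_utf16.decode("utf-16-le"))
print(back)  # ship /??p/ 船
print(lost)  # ['ʃ', 'ɪ']
```

The Chinese text survives, but the phonetic symbols are irreversibly turned into `?`.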
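What I have here is a low-level parser. It only demonstrates technical feasibility and was done just for fun; if it's not your thing, feel free to ignore it.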
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be fully supported on any platform.
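So far I have worked out the dictionary file formats of most Chinese dictionary software, including Youdao, Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord, MDict and others. The only one left is Haidi's offline audio/image library, which I haven't finished parsing: there is very little documentation and the encryption is fairly complex. I'll study it properly when I have time.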
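Generating a DIC file and parsing one are two separate projects. As things stand, several fields in the 120-byte file header have unknown meanings, and since I personally don't need to generate DIC files, I can't find the time for it.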
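One common way to narrow down unknown header fields is to compare the headers of several DIC files and see which 4-byte fields change between them. A minimal Python sketch, assuming every field is a 4-byte little-endian integer as in the header below; the helper name `varying_fields` is hypothetical and the usage data is synthetic, not taken from real DIC files.

```python
from __future__ import annotations

import struct

HEADER_LEN = 120  # size of the DIC file header as stated in the post


def varying_fields(headers: list[bytes]) -> list[int]:
    """Return byte offsets of 4-byte little-endian fields whose value
    differs between at least two of the given headers."""
    varying = []
    for off in range(0, HEADER_LEN, 4):
        values = {struct.unpack_from("<I", h, off)[0] for h in headers}
        if len(values) > 1:
            varying.append(off)
    return varying


# Two synthetic headers that differ only in the field at offset 52
a = bytearray(HEADER_LEN)
b = bytearray(HEADER_LEN)
struct.pack_into("<I", b, 52, 7709)
print(varying_fields([bytes(a), bytes(b)]))  # [52]
```

Fields that never change across many dictionaries are likely format constants; fields that track dictionary size or language are easier to guess once they are isolated.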
Here is the file header; have a look for yourself:
```vb
Option Explicit

'Kingsoft PowerWord DIC dictionary parser
'Kingsoft PowerWord DIC file format (offsets are decimal):
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'The 120-byte (0x78) header is followed by UTF-16LE XML starting with "<dic"
'Each zlib block decompresses to 16384 bytes (0x4000, see lBlockSize)
Type TCibaDIC
    lSign As Long           '0x4944534B, i.e. "KSDI"
    lFileSize As Long       '0x00055770, file size
    lFileSize1 As Long      '0x00008B95, unknown
    lFileSize2 As Long      '0x0000A352, unknown
    lFileCRC32 As Long      '0x49225868, CRC32? (unconfirmed)
    lNum1 As Long           '0x08
    lNum2 As Long           '0x01
    lFileSizeOrig As Long   '0x0025A878, original file size after decryption
    lBlockSize As Long      '0x00004000 (16384)
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804 (zh-CN)
    lTarget_lcid As Long    '0x00000409 (en-US)
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1E1D (7709)
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01F4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07F8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3FF8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xF0E8
    lOffIndexTable As Long  '0x00013950
    lLenIndexTable As Long  '0x0001CBD8
    lOffWordsTable As Long  '0x00030528
    lLenWordsTable As Long  '0x0022A350
    lReserved As Long       '0x00, unknown (placeholder name); brings the Type to 120 bytes
End Type
```
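For readers not using VB6, here is a minimal Python sketch that reads the same 120-byte header with the standard `struct` module, using the sample bytes from the hex dump. Field names follow the `TCibaDIC` Type; `lReserved` is a placeholder name for the last four bytes before the XML at 0x78. Fields are read as unsigned 32-bit integers, whereas VB6 `Long` is signed.

```python
import struct

FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "lReserved",  # placeholder name for the final 4 bytes
]
HEADER_FMT = "<30I"  # 30 little-endian uint32 values, no padding = 120 bytes
assert struct.calcsize(HEADER_FMT) == 120


def parse_header(data: bytes) -> dict:
    """Unpack the first 120 bytes into a dict keyed by field name."""
    if len(data) < 120:
        raise ValueError("need at least 120 bytes")
    hdr = dict(zip(FIELDS, struct.unpack_from(HEADER_FMT, data, 0)))
    if hdr["lSign"] != 0x4944534B:  # b"KSDI" read as a little-endian int
        raise ValueError("not a KSDI file")
    return hdr


# First 128 bytes of the sample file, copied from the hex dump
SAMPLE = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 "
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 "
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 "
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 "
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 "
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 "
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 "
    "50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00"
)

hdr = parse_header(SAMPLE)
# The XML section is UTF-16LE; in the sample it begins right after the header
xml_head = SAMPLE[hdr["lOffXML"]:hdr["lOffXML"] + 8].decode("utf-16-le")
print(hdr["lNumWords"], hex(hdr["lOffIdxTable"]), xml_head)  # 7709 0x870 <dic
```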
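In the sample header, the section offsets and lengths appear to chain together with no gaps: XML starts at 0x78, each following section starts where the previous one ends, and the words table ends exactly at lFileSizeOrig (0x25A878). Whether this holds for every DIC file is an assumption; this Python sketch checks it for the sample values (the helper name `check_contiguous` is hypothetical).

```python
# (name, offset, length) triples read from the sample header
SECTIONS = [
    ("XML", 0x78, 0x07F8),
    ("IdxTable", 0x0870, 0x3FF8),
    ("IdxTable1", 0x4868, 0xF0E8),
    ("IndexTable", 0x13950, 0x1CBD8),
    ("WordsTable", 0x30528, 0x22A350),
]
FILE_SIZE_ORIG = 0x25A878  # lFileSizeOrig in the sample


def check_contiguous(sections, start, total):
    """Verify each section begins where the previous one ends and the
    last one ends exactly at `total`."""
    pos = start
    for name, off, length in sections:
        if off != pos:
            return f"gap or overlap before {name}: expected {pos:#x}, got {off:#x}"
        pos = off + length
    if pos != total:
        return f"sections end at {pos:#x}, header says {total:#x}"
    return "contiguous"


print(check_contiguous(SECTIONS, 0x78, FILE_SIZE_ORIG))  # contiguous
```

If this relation holds in general, it is a cheap sanity check before decoding any section.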
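The post only says that each zlib block decompresses to 16384 bytes; it does not say how the compressed blocks are laid out in the file. The Python sketch below assumes, hypothetically, that the blocks are stored back to back as complete zlib streams, and splits them using `zlib.decompressobj()` and its `unused_data` attribute. If the real format uses a block-offset table instead, each block can still be inflated with `zlib.decompress`, but the splitting logic would differ. The function name is hypothetical and the data is synthetic.

```python
from __future__ import annotations

import zlib

BLOCK_SIZE = 0x4000  # lBlockSize: 16384 bytes per decompressed block


def inflate_concatenated(buf: bytes) -> list[bytes]:
    """Inflate consecutive zlib streams stored back to back in buf.

    Assumption: no length prefixes between streams (unconfirmed for DIC).
    """
    blocks = []
    rest = buf
    while rest:
        d = zlib.decompressobj()
        blocks.append(d.decompress(rest))
        if not d.eof:
            raise ValueError("truncated zlib stream")
        rest = d.unused_data  # bytes after the end of this stream
    return blocks


# Synthetic data: three 16 KiB blocks, each compressed separately
raw = bytes(range(256)) * 192  # 49152 bytes = 3 * 16384
packed = b"".join(zlib.compress(raw[i:i + BLOCK_SIZE])
                  for i in range(0, len(raw), BLOCK_SIZE))
blocks = inflate_concatenated(packed)
print(len(blocks), [len(b) for b in blocks])  # 3 [16384, 16384, 16384]
```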