Original poster | Posted on 2021-7-10 13:04:29
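As I already made clear above, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and special characters. It also cannot parse the voice library.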
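To illustrate why a UTF-16LE to GB2312 conversion drops special characters, here is a minimal Python sketch. The sample string and the function name `to_gb2312_and_back` are made up for illustration and are not taken from KSDRIP or any DA3 file. It only shows that characters outside the GB2312 repertoire, such as IPA phonetic symbols, cannot survive the round trip.

```python
def to_gb2312_and_back(text: str) -> str:
    """Round-trip text through UTF-16LE and then GB2312 (lossy)."""
    utf16 = text.encode("utf-16-le")          # how the source data is stored
    decoded = utf16.decode("utf-16-le")       # lossless so far
    # GB2312 has no code points for IPA symbols; errors="replace"
    # substitutes a placeholder for every character it cannot encode.
    gb = decoded.encode("gb2312", errors="replace")
    return gb.decode("gb2312")


if __name__ == "__main__":
    sample = "show [ʃəʊ] 显示"   # hypothetical entry with a phonetic transcription
    print(to_gb2312_and_back(sample))
```

The Chinese characters and ASCII text come back intact, while the phonetic symbols are replaced, which matches the kind of loss described above.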
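Mine is a low-level parser. It only demonstrates that this is technically feasible, and I did it for fun; if you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most Chinese dictionary software, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord (金山词霸), MDICT and so on. The only thing left unparsed is Haidi's offline voice and image library: there is too little documentation and the encryption is fairly complex, so I'll dig into it properly when I have time.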
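Generating a DIC and parsing one are two separate projects. As things stand, a few fields in the 120-byte file header still have unknown meanings. I don't personally need DIC generation, so I can't spare the time for it.
Here is the file header, take a look for yourself: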
- Option Explicit
-
- 'Kingsoft PowerWord DIC dictionary parser
- 'Kingsoft PowerWord Dic file format:
- 'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
- '00000000  4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
- '00000016  68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
- '00000032  00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
- '00000048  04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
- '00000064  F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
- '00000080  F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
- '00000096  E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
- '00000112  50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
- 'Each zlib block is 16384 bytes after decompression
- Type TCibaDIC
-     lSign As Long           '0x4944534b, i.e. "KSDI"
-     lFileSize As Long       'file size
-     lFileSize1 As Long
-     lFileSize2 As Long
-     lFileCRC32 As Long      'crc32?
-     lNum1 As Long           '8
-     lNum2 As Long           '1
-     lFileSizeOrig As Long   'original file size after decryption
-     lBlockSize As Long      '0x00004000
-     lNum4 As Long           '0x00020001
-     lSource_lcid As Long    '0x00000804
-     lTarget_lcid As Long    '0x00000409
-     lNum5 As Long           '0x00000804
-     lNumWords As Long       '0x1e1d
-     lNum6 As Long           '0x20
-     lNum7 As Long           '0x11
-     lNum8 As Long           '0x01f4
-     lNum9 As Long           '0x00
-     lOffStart As Long       '0x78
-     lOffXML As Long         '0x78
-     lLenXML As Long         '0x07f8
-     'The sample values below are read from the dump above
-     lOffIdxTable As Long    '0x0870
-     lLenIdxTable As Long    '0x3ff8
-     lOffIdxTable1 As Long   '0x4868
-     lLenIdxTable1 As Long   '0xf0e8
-     lOffIndexTable As Long  '0x013950
-     lLenIndexTable As Long  '0x01cbd8
-     lOffWordsTable As Long  '0x030528
-     lLenWordsTable As Long  '0x22a350
- End Type
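For anyone who wants to try this outside VB6, here is a rough Python sketch that reads the same header. The field names are copied from the `Type` above; `SAMPLE`, `parse_header`, `sections` and `sections_are_contiguous` are names invented for this example. Two things are assumptions rather than facts: the fields are read as unsigned 32-bit values (a VB6 `Long` is signed), and the 29 named fields only cover 116 bytes, so the last 4 bytes of the 120-byte header (zero in this sample) stay unnamed. In this one sample each offset/length pair starts where the previous one ends and the last one ends at `lFileSizeOrig`; treat that as an observation on one file, not a documented rule.

```python
import struct

# Header bytes copied from the hex dump in the post (offsets 0..127).
SAMPLE = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00"
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00"
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00"
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00"
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00"
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00"
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00"
    "50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00"
)

# Same order as the fields of TCibaDIC.
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
]

# 29 little-endian 32-bit integers = 116 bytes (read as unsigned here).
HEADER = struct.Struct("<%dI" % len(FIELDS))


def parse_header(data: bytes) -> dict:
    """Decode the named header fields from the start of a DIC file."""
    if data[:4] != b"KSDI":
        raise ValueError("missing KSDI signature")
    return dict(zip(FIELDS, HEADER.unpack_from(data, 0)))


def sections(h: dict) -> list:
    """Return the five (offset, length) pairs in header order."""
    names = ["XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable"]
    return [(h["lOff" + n], h["lLen" + n]) for n in names]


def sections_are_contiguous(h: dict) -> bool:
    """Check that each section starts where the previous one ends
    and that the last one ends at lFileSizeOrig."""
    pos = h["lOffXML"]
    for off, length in sections(h):
        if off != pos:
            return False
        pos = off + length
    return pos == h["lFileSizeOrig"]


if __name__ == "__main__":
    hdr = parse_header(SAMPLE)
    for name in FIELDS:
        print(f"{name:<16} 0x{hdr[name]:08x}")
    print("contiguous:", sections_are_contiguous(hdr))
```

To use it on a real file, pass the first 120 bytes (or the whole file contents) to `parse_header` instead of `SAMPLE`.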