Original poster | Posted on 2021-7-10 13:04:29
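As I made clear earlier, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and the special characters. It also cannot parse the speech libraries.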
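To illustrate why a UTF-16LE to GB2312 conversion is lossy, here is a minimal Python sketch. The function name `to_gb2312_lossy` and the sample string are hypothetical and chosen only for illustration; this is not how KSDRIP works internally, only a demonstration that characters outside the GB2312 repertoire (phonetic symbols, for example) cannot survive the conversion.

```python
def to_gb2312_lossy(utf16le_bytes: bytes) -> str:
    """Decode UTF-16LE, force the text through GB2312, and decode it back.

    Characters that GB2312 cannot represent are replaced with '?'.
    """
    text = utf16le_bytes.decode("utf-16-le")
    gb_bytes = text.encode("gb2312", errors="replace")
    return gb_bytes.decode("gb2312")


# Hypothetical entry text: a headword with IPA phonetics and a Chinese gloss.
original = "ship /ʃɪp/ 船"
converted = to_gb2312_lossy(original.encode("utf-16-le"))

print(original)
print(converted)  # the two IPA symbols are gone, the Chinese text survives
```

The Chinese gloss round-trips because it is part of GB2312, while the phonetic symbols are replaced, which is the kind of loss described above.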
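Mine is a low-level parser. It only demonstrates technical feasibility and was done just for fun; if you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most Chinese dictionary programs, including Youdao, Haidi (海笛), Eudic (欧路), Lingoes, Kingsoft PowerWord, MDict and others. Only Haidi's offline speech and image library is still unparsed: there is too little documentation and the encryption is fairly complex. I'll study it properly when I have time.

Generating a DIC file and parsing one are two separate projects. As things stand, several fields in the 120-byte file header have an unknown meaning, and since I have no need for this myself I can't spare the time.
Here is the file header, take a look for yourself: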
Option Explicit

'Kingsoft PowerWord (金山词霸) DIC dictionary parser
'Kingsoft PowerWord DIC file format (row offsets are decimal):
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block is 16384 bytes after decompression
'Values in the comments below are those of the sample dump above
Type TCibaDIC
    lSign As Long           '0x4944534B, i.e. "KSDI"
    lFileSize As Long       'file size (0x00055770)
    lFileSize1 As Long      '0x00008B95
    lFileSize2 As Long      '0x0000A352
    lFileCRC32 As Long      'CRC32? (0x49225868)
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original file size after decryption (0x0025A878)
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804
    lTarget_lcid As Long    '0x00000409
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1E1D
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01F4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07F8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3FF8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xF0E8
    lOffIndexTable As Long  '0x00013950
    lLenIndexTable As Long  '0x0001CBD8
    lOffWordsTable As Long  '0x00030528
    lLenWordsTable As Long  '0x0022A350
    'Note: the fields above cover 116 bytes; the 4 bytes at offset 116
    '(00 00 00 00 in the sample) are not mapped, and the XML starts at 0x78
End Type
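For readers who are not using VB, the same header layout can be sketched in Python with the standard `struct` module. This is a minimal sketch, not the author's parser: the names `parse_header`, `CibaHeader`, `sections_are_contiguous` and `SAMPLE` are hypothetical, the fields are read as unsigned 32-bit little-endian integers (VB's `Long` is signed, so this is an assumption), and `SAMPLE` is simply the 128 bytes of the hex dump in the listing. In this one sample the offset/length pairs happen to follow each other without gaps and end at `lFileSizeOrig`; whether that holds for every DIC file is not confirmed.

```python
import struct
from collections import namedtuple

# Field names copied from the TCibaDIC Type; each is a 32-bit little-endian value.
FIELDS = (
    "lSign lFileSize lFileSize1 lFileSize2 lFileCRC32 lNum1 lNum2 "
    "lFileSizeOrig lBlockSize lNum4 lSource_lcid lTarget_lcid lNum5 "
    "lNumWords lNum6 lNum7 lNum8 lNum9 lOffStart lOffXML lLenXML "
    "lOffIdxTable lLenIdxTable lOffIdxTable1 lLenIdxTable1 "
    "lOffIndexTable lLenIndexTable lOffWordsTable lLenWordsTable"
).split()
CibaHeader = namedtuple("CibaHeader", FIELDS)
HEADER_STRUCT = struct.Struct("<%dI" % len(FIELDS))  # 29 fields = 116 bytes


def parse_header(data: bytes) -> CibaHeader:
    """Unpack the fixed header fields from the start of a DIC file."""
    if data[:4] != b"KSDI":
        raise ValueError("missing KSDI signature")
    return CibaHeader(*HEADER_STRUCT.unpack_from(data, 0))


def sections_are_contiguous(hdr: CibaHeader) -> bool:
    """Check that each section starts where the previous one ends."""
    sections = [
        (hdr.lOffXML, hdr.lLenXML),
        (hdr.lOffIdxTable, hdr.lLenIdxTable),
        (hdr.lOffIdxTable1, hdr.lLenIdxTable1),
        (hdr.lOffIndexTable, hdr.lLenIndexTable),
        (hdr.lOffWordsTable, hdr.lLenWordsTable),
    ]
    expected = hdr.lOffStart
    for offset, length in sections:
        if offset != expected:
            return False
        expected = offset + length
    return expected == hdr.lFileSizeOrig


# The first 128 bytes of the sample file, taken from the hex dump.
SAMPLE = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 "
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 "
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 "
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 "
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 "
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 "
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 "
    "50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00 "
)

hdr = parse_header(SAMPLE)
for name, value in hdr._asdict().items():
    print(f"{name:16s} 0x{value:08X}")
print("sections contiguous:", sections_are_contiguous(hdr))
# The bytes at lOffXML are UTF-16LE text in this sample.
print(SAMPLE[hdr.lOffXML:hdr.lOffXML + 8].decode("utf-16-le"))
```

Reading a real file would only require replacing `SAMPLE` with the first 116 bytes of that file; the meaning of the `lNum*` fields remains unknown, as noted in the post.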