Original poster | Posted on 2021-7-10 13:04:29
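As I said clearly earlier, KSDRIP is not open source, and the DA3 it generates is automatically converted from UTF-16LE to GB2312, which loses the index and the special characters. It also cannot parse the audio libraries.
Mine is a low-level parser. It only demonstrates that the thing is technically feasible, and I did it for fun. If you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be supported on any platform.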
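
To make the UTF-16LE to GB2312 point concrete, here is a small Python sketch (not taken from KSDRIP or from my parser). The sample string is made up for illustration. It shows only the character loss, not the index loss: a phonetic symbol such as `ʃ` has no GB2312 code point, so it cannot survive the conversion.

```python
# Hypothetical sample entry: two Chinese characters plus an IPA symbol.
original = "词霸 [ʃ]"

# Dictionary text as stored in UTF-16LE.
utf16_bytes = original.encode("utf-16-le")

# Convert to GB2312 the lossy way: anything GB2312 cannot represent
# is substituted with "?" instead of raising an error.
gb_bytes = utf16_bytes.decode("utf-16-le").encode("gb2312", errors="replace")

# Reading the GB2312 text back shows what was lost.
round_tripped = gb_bytes.decode("gb2312")

if __name__ == "__main__":
    print("original     :", original)
    print("after GB2312 :", round_tripped)
    print("lossless     :", round_tripped == original)
```

Keeping the text in UTF-16LE (or converting it to UTF-8) avoids this kind of loss.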
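So far I have worked out the dictionary file formats of most of the dictionary programs used in China, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord (金山词霸), MDICT and so on. The only one left unfinished is Haidi's offline audio and image library: there is too little documentation and the encryption is fairly complex. I'll look into it properly when I have time.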
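Generating a DIC and parsing one are two separate projects. As things stand, the 120-byte file header has a few fields whose meaning I don't know, and since I personally have no need for generation, I can't spare the time for it.
Here is the file header, take a look for yourself: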
Option Explicit

'Kingsoft PowerWord (金山词霸) DIC dictionary parser
'Kingsoft PowerWord Dic file format:
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block decompresses to 16384 bytes.
'Values in the comments are taken from the sample dump above.
Type TCibaDIC
    lSign As Long           '0x4944534b, i.e. "KSDI"
    lFileSize As Long       'file size (0x00055770)
    lFileSize1 As Long      '0x00008b95
    lFileSize2 As Long      '0x0000a352
    lFileCRC32 As Long      'crc32? (0x49225868)
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original file size after decompression (0x0025a878)
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804
    lTarget_lcid As Long    '0x00000409
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1e1d
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01f4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07f8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3ff8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xf0e8
    lOffIndexTable As Long  '0x00013950
    lLenIndexTable As Long  '0x0001cbd8
    lOffWordsTable As Long  '0x00030528
    lLenWordsTable As Long  '0x0022a350
    'The 29 Longs above cover 116 bytes; bytes 116-119 of the 120-byte
    'header are 00 00 00 00 in the dump.
End Type
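
Since the point is that the format can be read on any platform, here is a rough Python sketch of the same header layout. It is not my VB project. The field names follow the `TCibaDIC` type; `lReserved` is a name made up for the last four bytes of the 120-byte header, and `parse_header` and `sections` are hypothetical helpers. The sample bytes are copied from the hex dump above. VB's `Long` is signed, but the values are read as unsigned here for convenience.

```python
import struct

# First 120 bytes of the hex dump (the "<dic" that follows is already XML).
SAMPLE_HEADER = bytes.fromhex(
    "4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00 "
    "68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00 "
    "00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00 "
    "04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00 "
    "F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00 "
    "F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00 "
    "E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00 "
    "50 A3 22 00 00 00 00 00"
)

FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "lReserved",  # hypothetical name for bytes 116-119
]


def parse_header(data):
    """Unpack the 120-byte header into a dict keyed by the VB field names."""
    if len(data) < 120:
        raise ValueError("need at least 120 bytes")
    # 30 little-endian 32-bit integers.
    header = dict(zip(FIELDS, struct.unpack("<30I", data[:120])))
    if header["lSign"] != 0x4944534B:  # "KSDI"
        raise ValueError("missing KSDI signature")
    return header


def sections(header):
    """Return (name, offset, length) for the five offset/length pairs."""
    names = ["XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable"]
    return [(n, header["lOff" + n], header["lLen" + n]) for n in names]


if __name__ == "__main__":
    hdr = parse_header(SAMPLE_HEADER)
    for name, off, length in sections(hdr):
        print(f"{name:<11} offset=0x{off:06x} length=0x{length:06x}")
```

In this sample each section starts where the previous one ends, and the last one ends exactly at `lFileSizeOrig` (0x0025a878). That is larger than `lFileSize` (0x00055770), so the offsets presumably refer to the decompressed data, though I have only checked this against the one dump shown here. The two LCIDs are ordinary Windows locale IDs: 0x0804 is zh-CN and 0x0409 is en-US.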
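
The comment in the listing says every zlib block decompresses to 16384 bytes, which matches `lBlockSize` (0x4000). Below is a minimal Python sketch of reassembling data from such blocks. It makes two assumptions: the post does not describe how the block boundaries are stored in the file, so the sketch takes an already-split list of compressed blocks, and it allows the final block to be shorter than the block size. `compress_blocks` exists only to produce test input and is not a DIC writer.

```python
import zlib

BLOCK_SIZE = 0x4000  # 16384, the lBlockSize value from the header


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size chunks and zlib-compress each one."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks, block_size=BLOCK_SIZE):
    """Decompress each block and concatenate the results."""
    out = bytearray()
    for index, block in enumerate(blocks):
        chunk = zlib.decompress(block)
        if len(chunk) > block_size:
            raise ValueError(f"block {index} is larger than {block_size} bytes")
        out += chunk
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(256)) * 200  # 51200 bytes of made-up test data
    blocks = compress_blocks(payload)
    restored = decompress_blocks(blocks)
    print(len(blocks), "blocks, round trip ok:", restored == payload)
```

With fixed-size decompressed blocks, an offset into the original data maps to block number `offset // BLOCK_SIZE` and position `offset % BLOCK_SIZE` inside that block, so a reader only needs to decompress the blocks it touches.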