Original poster | Posted 2021-7-10 13:04:29
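As I said clearly earlier, KSDRIP is not open source, and the DA3 it generates is automatically converted from UTF-16LE to GB2312, which loses the index and the special characters. It also cannot parse the audio library.
Mine is a low-level parser. It only demonstrates technical feasibility and was written for fun; if you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most dictionaries used in China, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord (金山词霸), MDict and others. Only Haidi's offline audio and image library is still unparsed: there is very little documentation and the encryption is fairly complex, so I will look into it properly when I have time.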
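The loss of special characters mentioned above is what one would expect when UTF-16LE text is re-encoded as GB2312, which has no code points for many phonetic symbols. A minimal sketch in Python (used here for convenience instead of the VB6 of the original code); the sample string is made up for illustration and does not come from a real DA3 file:

```python
# Illustrative only: the sample text is hypothetical, not taken from a DA3 file.
sample = "ship [ʃɪp] 船"

utf16 = sample.encode("utf-16-le")   # UTF-16LE, as in the source dictionary
text = utf16.decode("utf-16-le")     # decoding it again is lossless

# GB2312 has no code points for IPA letters such as ʃ and ɪ,
# so a strict conversion fails outright ...
try:
    text.encode("gb2312")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... and a forgiving conversion silently replaces them with "?".
lossy = text.encode("gb2312", errors="replace")
restored = lossy.decode("gb2312")

print(strict_ok)
print(restored)
```

The Chinese character survives the round trip, while the phonetic symbols cannot be recovered once they have been replaced.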
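Generating a DIC file and parsing one are two separate projects. As things stand, the 120-byte file header still contains several fields whose meaning I don't know, and since I personally have no need for generation, I can't spare the time for it.
Here is the file header, have a look for yourselves: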
Option Explicit

'Kingsoft PowerWord DIC dictionary parser
'Kingsoft PowerWord Dic file format:
'Offset    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000 4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016 68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032 00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048 04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064 F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080 F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096 E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112 50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block decompresses to 16384 bytes
'The values in the comments below are the ones in the dump above
Type TCibaDIC
    lSign As Long           '0x4944534b, i.e. "KSDI"
    lFileSize As Long       'file size (0x00055770)
    lFileSize1 As Long      '0x00008b95
    lFileSize2 As Long      '0x0000a352
    lFileCRC32 As Long      'crc32? (0x49225868)
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original size, presumably after decompression (0x0025a878)
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804 (zh-CN)
    lTarget_lcid As Long    '0x00000409 (en-US)
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1e1d
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01f4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07f8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3ff8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xf0e8
    lOffIndexTable As Long  '0x013950
    lLenIndexTable As Long  '0x01cbd8
    lOffWordsTable As Long  '0x030528
    lLenWordsTable As Long  '0x22a350
    'The 29 Longs above cover 116 bytes; in the dump, offset 116 holds one
    'more zero dword before the XML ("<dic") starts at 0x78
End Type
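As a cross-check of the layout above, the 120 header bytes from the dump can be unpacked outside VB6 as well. The following is only a sketch in Python, not the author's parser: the field names are copied from `TCibaDIC`, `reserved` is a hypothetical name for the final dword that the Type does not declare, and the values are read as unsigned 32-bit integers although a VB6 `Long` is signed.

```python
import struct

# The 120 header bytes exactly as they appear in the hex dump of the post.
HEADER_HEX = """
4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00
68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00
00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00
04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00
F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00
F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00
E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00
50 A3 22 00 00 00 00 00
"""

# Names follow TCibaDIC; "reserved" is a hypothetical label for the 30th dword.
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "reserved",
]

def parse_header(raw: bytes) -> dict:
    """Unpack 30 little-endian unsigned 32-bit values into a dict."""
    if len(raw) < 120:
        raise ValueError("the header needs 120 bytes")
    header = dict(zip(FIELDS, struct.unpack("<30I", raw[:120])))
    if header["lSign"] != 0x4944534B:  # the bytes "KSDI"
        raise ValueError("not a KSDI file")
    return header

header = parse_header(bytes.fromhex("".join(HEADER_HEX.split())))

# Walk the five offset/length pairs and record where each section ends.
section_ends = {}
for name in ["XML", "IdxTable", "IdxTable1", "IndexTable", "WordsTable"]:
    off, length = header["lOff" + name], header["lLen" + name]
    section_ends[name] = off + length
    print(f"{name:<11} off=0x{off:06X} len=0x{length:06X} end=0x{off + length:06X}")
```

With the bytes from the dump, each section starts where the previous one ends, and the last one ends at `lFileSizeOrig` (0x25A878), which is larger than `lFileSize`. This suggests, but does not prove, that the offsets refer to the decompressed data rather than to positions in the file.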
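The code comment about zlib blocks of 16384 bytes hints at how a reader could fetch a byte range without inflating the whole file. The sketch below assumes that offsets are positions in the decompressed stream and that each block is compressed independently. How a real DIC file locates its compressed blocks is not covered in the post, so the list of blocks is simply taken as given, and the payload is synthetic rather than real dictionary data.

```python
import zlib

BLOCK_SIZE = 16384  # matches lBlockSize (0x4000) in the header

def read_range(blocks, offset, length, block_size=BLOCK_SIZE):
    """Return `length` bytes starting at `offset` in the decompressed stream.

    `blocks` is a list of independently zlib-compressed chunks (an assumption;
    locating them inside a real DIC file is outside this sketch).
    Only the blocks that overlap the requested range are decompressed.
    """
    out = bytearray()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    for i in range(first, last + 1):
        out += zlib.decompress(blocks[i])
    start = offset - first * block_size
    return bytes(out[start:start + length])

# Synthetic stand-in for a dictionary payload: 51200 bytes,
# i.e. three full blocks plus a shorter final one.
payload = bytes(range(256)) * 200
blocks = [zlib.compress(payload[i:i + BLOCK_SIZE])
          for i in range(0, len(payload), BLOCK_SIZE)]

# This range straddles the boundary between block 0 and block 1.
chunk = read_range(blocks, 16000, 1000)
print(len(blocks), len(chunk))
```

Fixed-size blocks make the lookup a plain integer division, which is presumably why the format stores a block size in the header.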