Original poster | Posted on 2021-7-10 13:04:29
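As I said clearly earlier, KSDRIP is not open source, and the DA3 files it generates are automatically converted from UTF-16LE to GB2312, which loses the index and the special characters. It also cannot parse the voice libraries.
What I have here is low-level parsing. It only demonstrates technical feasibility and was done just for fun; if you don't like it, feel free to ignore it.
With this source code, Kingsoft PowerWord (金山词霸) DIC and ADIC files can be supported on any platform.
So far I have worked out the dictionary file formats of most Chinese dictionary products, including Youdao (有道), Haidi (海笛), Eudic (欧路), Lingoes (灵格斯), Kingsoft PowerWord and MDICT. Only Haidi's offline voice and picture library is not fully parsed yet: there is too little material and the encryption is fairly complex. I will look into it again when I have time.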
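To illustrate the UTF-16LE to GB2312 loss mentioned above, here is a minimal Python sketch. It is not how KSDRIP works internally (that tool is closed source); the function name `to_gb2312_lossy` and the sample entry are made up for illustration, and replacing unmappable characters with `?` is an assumption about how the loss shows up.

```python
def to_gb2312_lossy(utf16le_bytes: bytes) -> str:
    """Decode UTF-16LE text, force it through GB2312, and decode it back.

    Characters that GB2312 cannot represent are replaced with '?'.
    This replacement policy is an assumption, chosen only to make the loss visible.
    """
    text = utf16le_bytes.decode("utf-16-le")
    return text.encode("gb2312", errors="replace").decode("gb2312")


# Hypothetical dictionary entry: ASCII, IPA phonetic symbols, and a Chinese gloss.
original = "shoe [ʃuː] 鞋"
converted = to_gb2312_lossy(original.encode("utf-16-le"))

print(original)
print(converted)  # the Chinese gloss survives, the IPA symbols do not
```

The Chinese text and the ASCII letters round-trip unchanged, while the phonetic symbols are gone, which is why a parser that keeps the original UTF-16LE data is preferable.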
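Generating a DIC file and parsing one are two separate projects. As things stand, the 120-byte file header still has several fields whose meaning I don't know. I personally have no need to generate DIC files, so I can't spare the time for it.
Here is the file header, take a look for yourselves: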
Option Explicit

'Kingsoft PowerWord (金山词霸) DIC dictionary parsing
'Kingsoft PowerWord Dic file format:
'Offset     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
'00000000  4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00  KSDIpW......R...
'00000016  68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00  hX"I........x.%.
'00000032  00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00  .@..............
'00000048  04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00  ........ .......
'00000064  F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00  ........x...x...
'00000080  F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00  ....p....?..hH..
'00000096  E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00  ....P9......(...
'00000112  50 A3 22 00 00 00 00 00 3C 00 64 00 69 00 63 00  P.".....<.d.i.c.
'Each zlib block is 16384 bytes after decompression
Type TCibaDIC
    lSign As Long           '0x4944534B, i.e. "KSDI"
    lFileSize As Long       'file size (0x00055770 in the dump)
    lFileSize1 As Long      '0x00008B95
    lFileSize2 As Long      '0x0000A352
    lFileCRC32 As Long      'CRC32? (0x49225868 in the dump)
    lNum1 As Long           '8
    lNum2 As Long           '1
    lFileSizeOrig As Long   'original (unpacked) file size, 0x0025A878 = lOffWordsTable + lLenWordsTable
    lBlockSize As Long      '0x00004000
    lNum4 As Long           '0x00020001
    lSource_lcid As Long    '0x00000804
    lTarget_lcid As Long    '0x00000409
    lNum5 As Long           '0x00000804
    lNumWords As Long       '0x1E1D
    lNum6 As Long           '0x20
    lNum7 As Long           '0x11
    lNum8 As Long           '0x01F4
    lNum9 As Long           '0x00
    lOffStart As Long       '0x78
    lOffXML As Long         '0x78
    lLenXML As Long         '0x07F8
    lOffIdxTable As Long    '0x0870
    lLenIdxTable As Long    '0x3FF8
    lOffIdxTable1 As Long   '0x4868
    lLenIdxTable1 As Long   '0xF0E8
    lOffIndexTable As Long  '0x00013950
    lLenIndexTable As Long  '0x0001CBD8
    lOffWordsTable As Long  '0x00030528
    lLenWordsTable As Long  '0x0022A350
    'The 29 Longs above cover 116 bytes. The dump shows one more Long (0x00) at offset 116, before the XML starts at 0x78.
End Type
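Since the point of the post is that this format can be read on any platform, here is a rough Python sketch of the same 120-byte header. It reuses the field names from `TCibaDIC` and the bytes from the hex dump. Everything else is an assumption for illustration: the helper names `parse_header` and `sections_are_contiguous`, the name `lTail` for the 30th dword that the VB type does not cover, and reading the values as unsigned (a VB `Long` is signed).

```python
import struct

# The first 120 bytes of the hex dump in the post (the header only, without the XML).
HEADER_HEX = """
4B 53 44 49 70 57 05 00 95 8B 00 00 52 A3 00 00
68 58 22 49 08 00 00 00 01 00 00 00 78 A8 25 00
00 40 00 00 01 00 02 00 04 08 00 00 09 04 00 00
04 08 00 00 1D 1E 00 00 20 00 00 00 11 00 00 00
F4 01 00 00 00 00 00 00 78 00 00 00 78 00 00 00
F8 07 00 00 70 08 00 00 F8 3F 00 00 68 48 00 00
E8 F0 00 00 50 39 01 00 D8 CB 01 00 28 05 03 00
50 A3 22 00 00 00 00 00
"""

# Field names copied from the VB type; "lTail" is a hypothetical name
# for the dword at offset 116 that the type does not declare.
FIELDS = [
    "lSign", "lFileSize", "lFileSize1", "lFileSize2", "lFileCRC32",
    "lNum1", "lNum2", "lFileSizeOrig", "lBlockSize", "lNum4",
    "lSource_lcid", "lTarget_lcid", "lNum5", "lNumWords", "lNum6",
    "lNum7", "lNum8", "lNum9", "lOffStart", "lOffXML", "lLenXML",
    "lOffIdxTable", "lLenIdxTable", "lOffIdxTable1", "lLenIdxTable1",
    "lOffIndexTable", "lLenIndexTable", "lOffWordsTable", "lLenWordsTable",
    "lTail",
]

# (offset field, length field) pairs in the order they appear in the header.
SECTIONS = [
    ("lOffXML", "lLenXML"),
    ("lOffIdxTable", "lLenIdxTable"),
    ("lOffIdxTable1", "lLenIdxTable1"),
    ("lOffIndexTable", "lLenIndexTable"),
    ("lOffWordsTable", "lLenWordsTable"),
]


def parse_header(raw: bytes) -> dict:
    """Unpack 30 little-endian 32-bit values and check the "KSDI" signature."""
    if len(raw) < 120:
        raise ValueError("header needs at least 120 bytes")
    header = dict(zip(FIELDS, struct.unpack("<30I", raw[:120])))
    if header["lSign"] != 0x4944534B:
        raise ValueError("not a KSDI file")
    return header


def sections_are_contiguous(header: dict) -> bool:
    """Check that each section starts where the previous one ends and that
    the last one ends at lFileSizeOrig. This holds for the sample dump;
    whether it holds for every DIC file is an assumption."""
    expected = header["lOffStart"]
    for off_name, len_name in SECTIONS:
        if header[off_name] != expected:
            return False
        expected = header[off_name] + header[len_name]
    return expected == header["lFileSizeOrig"]


RAW = bytes.fromhex("".join(HEADER_HEX.split()))
header = parse_header(RAW)
for name in FIELDS:
    print(f"{name:16s} 0x{header[name]:08X}")
print("sections contiguous:", sections_are_contiguous(header))
```

For the sample header the five offset/length pairs chain together without gaps, starting at 0x78 and ending at `lFileSizeOrig`, which suggests those offsets refer to the unpacked data rather than to the file on disk.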
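The header comment says that every zlib block is 16384 bytes after decompression, which matches `lBlockSize` (0x4000). A small Python sketch of that check follows. The post does not describe where the compressed blocks sit in the file or how they are indexed, so the block used here is synthetic stand-in data, and the helper name `inflate_block` is hypothetical.

```python
import zlib

BLOCK_SIZE = 0x4000  # lBlockSize from the header: 16384 bytes


def inflate_block(compressed: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Decompress one zlib block and verify the size stated in the post.

    Requiring an exact match follows the comment in the listing; a real
    file may end with a shorter final block, which is not covered here.
    """
    data = zlib.decompress(compressed)
    if len(data) != block_size:
        raise ValueError(f"expected {block_size} bytes, got {len(data)}")
    return data


# Synthetic stand-in for a block read from a DIC file (not real dictionary data).
sample_block = zlib.compress(bytes(BLOCK_SIZE))
plain = inflate_block(sample_block)
print(len(plain))
```

Locating the real blocks would still require decoding the index tables listed in the header, which this sketch does not attempt.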