Neil Developer

Unicode (UTF-8) support for file names in zip archives

2013-11-01
Neil

See: http://www.pkware.com/documents/casestudies/APPNOTE.TXT

        Bit 11: Language encoding flag (EFS).  If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)
D.1 The ZIP format has historically supported only the original IBM PC character 
encoding set, commonly referred to as IBM Code Page 437.  This limits storing 
file name characters to only those within the original MS-DOS range of values 
and does not properly support file names in other character encodings, or 
languages. To address this limitation, this specification will support the 
following change.

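Bit 11 of the general purpose bit flag in the zip file header indicates whether the file name is stored as Unicode.

If the bit is 0, the file name uses the default IBM 437 encoding.

If the bit is 1, the file name is Unicode, and it must be encoded as UTF-8.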
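The rule above can be sketched in a few lines of Python. This is only a minimal illustration, assuming you already have the raw file name bytes and the flag value from a header; `UTF8_FLAG` and `decode_zip_name` are hypothetical names chosen for this example, not part of any library.

```python
# Bit 11 of the general purpose bit flag (counting from bit 0).
UTF8_FLAG = 1 << 11  # 0x800


def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Decode a raw zip file name according to bit 11 of the flag field.

    Hypothetical helper for illustration only.
    """
    if flag_bits & UTF8_FLAG:
        # Bit 11 set: the name must be UTF-8.
        return raw.decode("utf-8")
    # Bit 11 clear: fall back to the historical IBM Code Page 437.
    return raw.decode("cp437")
```

For example, `decode_zip_name("中文.txt".encode("utf-8"), 0x800)` gives back `"中文.txt"`, while the same bytes decoded with the flag cleared would come out as CP437 mojibake.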

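In short, the zip format originally supported only the IBM 437 encoding (essentially English text). This flag was added later to indicate that the file name is encoded in UTF-8.

UTF-8 is also the only Unicode encoding that zip supports for file names.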
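To see the flag in a real archive, here is a small sketch using Python's standard `zipfile` module. It relies on CPython's behavior of storing names that are not pure ASCII as UTF-8 with bit 11 set; the file names and the `uses_utf8_name` helper are made up for this example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def uses_utf8_name(info: zipfile.ZipInfo) -> bool:
    """Return True if the entry's header has bit 11 (UTF-8 name) set."""
    return bool(info.flag_bits & UTF8_FLAG)


# Build a small archive in memory with one ASCII and one non-ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("readme.txt", b"ascii name")
    zf.writestr("\u4e2d\u6587.txt", b"non-ascii name")  # "中文.txt"

# Read it back and record which entries carry the UTF-8 flag.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: uses_utf8_name(info) for info in zf.infolist()}

for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "UTF-8 flag not set")
```

Other tools may set the flag even for ASCII-only names, so a set flag says how to decode the name, not that the name contains non-ASCII characters.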
