====== File Name Encoding Problems in Compressed Archives ======
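* Problem: file names are encoded differently by different compression tools (Zip, Gz, Bz2), so names can come out garbled, especially when an archive moves across platforms.
* Solution 1: use tar to create the archive.
* Solution 2 (Linux): unzip -O CP936 non_english_name.zip (CP936 is the code page for GBK, the Chinese encoding that GB18030 extends; the -O option is only present in unzip builds that carry the encoding patch)
* Solution 3 (Java): jar xvf non_english_name.zip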
* ref: http://www.111cn.net/sys/linux/72590.htm
* Problem: when zip or WinRAR extracts an archive whose file names use a non-English encoding, you sometimes have to change the system locale and reboot before the names are extracted correctly.
* Solution 1 (Windows): use the prebuilt zip and unzip tools from the DotNetZip library, which accept a code page option: Unzip.exe -cp 936 chinese_name_content.zip
* download DotNetZip; the executables are under its tool folder: https://dotnetzip.codeplex.com/
* ref: http://www.chengxuyuans.com/Ruby/41584.html
* windows code page: https://en.wikipedia.org/wiki/Windows_code_page
* 936 and 1386 for GBK
* 932 and 943 for Shift JIS
* Solution 2 (cross-platform): use Python (works for some archives, fails with encoding errors for others): python xZip.py non_english.zip decode_language
(decode_language is a Python codec name, such as gbk for Chinese; see https://docs.python.org/2/library/codecs.html for the full list)
* Python code for xZip.py (written for Python 2):
# Full list of codecs: https://docs.python.org/2/library/codecs.html
# Notes:
# - This script targets Python 2 (it relies on str.decode and unicode()).
# - Command-line arguments arrive in the console's default locale encoding.
# - Member names inside the zip are decoded to unicode with the codec
#   given on the command line.
# - Printing those unicode paths in a Windows command window may fail
#   when the console's default codec cannot represent the characters.
import zipfile
import os.path
import os
import sys


class ZFile(object):
    def __init__(self, filename, mode='r', basedir=''):
        self.filename = filename
        self.mode = mode
        if self.mode in ('w', 'a'):
            self.zfile = zipfile.ZipFile(filename, self.mode, compression=zipfile.ZIP_DEFLATED)
        else:
            self.zfile = zipfile.ZipFile(filename, self.mode)
        self.basedir = basedir
        if not self.basedir:
            self.basedir = os.path.dirname(filename)

    def addfile(self, path, arcname=None):
        path = path.replace('//', '/')
        if not arcname:
            if path.startswith(self.basedir):
                arcname = path[len(self.basedir):]
            else:
                arcname = ''
        self.zfile.write(path, arcname)

    def addfiles(self, paths):
        for path in paths:
            if isinstance(path, tuple):
                self.addfile(*path)
            else:
                self.addfile(path)

    def close(self):
        self.zfile.close()

    def extract_to(self, path, decode):
        for p in self.zfile.namelist():
            self.extract(p, path, decode)

    def extract(self, filename, path, decode):
        # Skip directory entries; parent folders are created per file below.
        if not filename.endswith('/'):
            # Names stored with the UTF-8 flag are already unicode; all
            # others are byte strings that need decoding with the given
            # codec (gbk, gb18030, gb2312, utf-8, ...).
            if isinstance(filename, unicode):
                name = filename
            else:
                name = filename.decode(decode)
            f = os.path.join(path, name)
            dirname = os.path.dirname(f)
            if not os.path.exists(dirname):
                os.makedirs(dirname)
            # Close the output file explicitly instead of leaking the handle.
            with open(f, 'wb') as out:
                out.write(self.zfile.read(filename))


def create(zfile, files):
    z = ZFile(zfile, 'w')
    z.addfiles(files)
    z.close()


def extract(zfile, path, decode):
    z = ZFile(zfile)
    z.extract_to(path, decode)
    z.close()


if __name__ == "__main__":
    # Usage: python xZip.py non_english.zip decode_language
    extract(unicode(sys.argv[1]), u'.', sys.argv[2])
* Alternative solution: extract normally, accepting the wrongly decoded names, then fix the names afterwards with Python's encode and decode.
* Side notes:
* in the Windows command prompt, chcp changes the active console code page (which affects how file names are displayed) [[https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396|code page list]]
* additional reading:
* http://www.docin.com/p-739332424.html
* http://www.cnblogs.com/qq78292959/archive/2013/03/27/2985310.html
* https://allencch.wordpress.com/2010/12/06/how-to-extract-zip-file-which-contains-filenames-with-shift_jis-encoding-in-ubuntu/
* https://www.mkssoftware.com/docs/man1/unzip.1.asp
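For Python 3, the same idea as xZip.py can be sketched as follows. This is only a rough sketch, not a tested replacement for the script above: it assumes the usual Python 3 zipfile behaviour of decoding names as CP437 when the archive does not set the UTF-8 flag, so encoding a name back to CP437 recovers the raw bytes, which can then be decoded with the real codec (gbk, shift_jis, ...). The names fix_name and extract_with_encoding are made up for this example. The same fix_name helper could also be used for the alternative solution of renaming files that were already extracted with wrong names.

```python
import os
import zipfile


def fix_name(name, encoding):
    """Re-decode a member name that zipfile decoded as CP437."""
    try:
        return name.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name does not fit the guessed codec; keep it unchanged.
        return name


def extract_with_encoding(zip_path, dest_dir, encoding):
    """Extract zip_path into dest_dir, decoding legacy names with encoding.

    Returns the list of corrected file names that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Bit 11 (0x800) of the flags marks names stored as UTF-8;
            # those are already decoded correctly.
            if info.flag_bits & 0x800:
                name = info.filename
            else:
                name = fix_name(info.filename, encoding)
            target = os.path.join(dest_dir, name)
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(name)
    return written


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python xzip3.py non_english.zip gbk
    if len(sys.argv) == 3:
        extract_with_encoding(sys.argv[1], ".", sys.argv[2])
```

Note that this sketch does not sanitize member names (for example names containing ..), so it should only be run on archives from a trusted source.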
====== Common Problems with Compressed Files and Solutions ======
* Problem: WinRAR recently updated its version, and an older WinRAR cannot open some archives created by the newer one.
* Solution: extract the archive with the latest 7-Zip instead: http://www.7-zip.org/
====== WinRAR ======
* extract from the command line: unrar x Pack.rar