GB2UTF8
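It is a real pity to see so many people still using GB-to-UTF8 conversion programs written four or five years ago. Even without the iconv function, you can do a bit better than that.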
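The uploaded file contains the complete GB2312 character set. Each line is laid out as follows:
Bytes 1-2: the GB2312 code
Byte 3: a space
Byte 4: the length, in bytes, of the corresponding UTF-8 code
The rest: the UTF-8 code followed by a line break (\r\n)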
Attachment: gb2utf8.txt
http://www.phpx.com/happy/showthread.php?s=&threadid=90509&perpage=15&pagenumber=1
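The first example reads a file named gb2utf8.txt.new with fgetcsv(), so it does not parse the raw attachment itself. As a minimal sketch of reading the raw layout described above, the following loads gb2utf8.txt into the same GB => UTF-8 array. The helper names parse_gb2utf8_line() and load_gb2utf8_table() are hypothetical. Because it is not certain whether byte 4 is an ASCII digit or a raw byte value, the sketch ignores that byte and takes everything after it up to the line break.

```php
<?php
// Sketch: load the raw gb2utf8.txt table into an array of GB2312 => UTF-8.
// Assumed line layout (from the description): bytes 1-2 = GB2312 code,
// byte 3 = space, byte 4 = UTF-8 length, then the UTF-8 bytes and "\r\n".
// The format of byte 4 is an assumption, so it is skipped, not parsed.

function parse_gb2utf8_line(string $line): ?array
{
    $line = rtrim($line, "\r\n");          // UTF-8 bytes never contain CR/LF
    if (strlen($line) < 5 || $line[2] !== ' ') {
        return null;                       // skip blank or malformed lines
    }
    $gb   = substr($line, 0, 2);           // two-byte GB2312 code
    $utf8 = substr($line, 4);              // UTF-8 bytes after the length byte
    return [$gb, $utf8];
}

function load_gb2utf8_table(string $filename): array
{
    $fp = fopen($filename, 'rb');          // binary mode: the table holds raw bytes
    if ($fp === false) {
        throw new RuntimeException("cannot open $filename");
    }
    $charset = [];
    while (($line = fgets($fp)) !== false) {
        $pair = parse_gb2utf8_line($line);
        if ($pair !== null) {
            $charset[$pair[0]] = $pair[1];
        }
    }
    fclose($fp);
    return $charset;
}

// Usage (hypothetical path): $charset = load_gb2utf8_table('gb2utf8.txt');
```

The resulting array has the same shape as $charset in the examples that follow, so either loader can feed them.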
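<?php
// Using the lookup table (1)
// fgetcsv() uses ',' as its default delimiter, so each line of
// gb2utf8.txt.new is expected to hold one "gb,utf8" pair
$filename = "gb2utf8.txt.new";
$fp = fopen($filename, "rb");
$charset = array();
while (($row = fgetcsv($fp, 10)) !== false) {
    if (count($row) == 2) {            // skip blank lines
        list($gb, $utf8) = $row;
        $charset[$gb] = $utf8;
    }
}
fclose($fp);
// The lookup table is now loaded into $charset for later use
function gb2utf8($text) {
    global $charset;
    // Split the text into elements: each Chinese character is one element,
    // and each run of consecutive non-Chinese (ASCII) characters is one element
    preg_match_all('/(?:[\x80-\xff].)|[\x01-\x7f]+/', $text, $tmp);
    $tmp = $tmp[0];
    // Pick out the Chinese characters that are in the table
    $ar = array_intersect($tmp, array_keys($charset));
    // Replace each Chinese character code with its UTF-8 code
    foreach ($ar as $k => $v)
        $tmp[$k] = $charset[$v];
    // Return the converted string
    return join('', $tmp);
}
// The literal below is GB2312 only if this file is saved in GB2312
echo gb2utf8("haha,这是对照表的测试");
?>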
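The core of gb2utf8() is tokenizing: each two-byte GB2312 character becomes one element, each ASCII run another, and only the elements found in the table are replaced. A minimal sketch of the same idea, with the table passed in as a parameter instead of a global and an isset() lookup per token instead of intersecting the token list with the full key list on every call. gb2utf8_with_table() and the two-entry table are illustrative assumptions; the bytes are the GB2312 and UTF-8 codes of 啊 (U+554A) and 阿 (U+963F).

```php
<?php
// Sketch: tokenize-and-replace, as in gb2utf8(), with the table as a parameter.
// Tokens missing from the table are passed through unchanged.
function gb2utf8_with_table(string $text, array $charset): string
{
    // A lead byte 0x80-0xFF plus the next byte is one GB2312 character;
    // a run of bytes 0x01-0x7F is one ASCII token
    preg_match_all('/[\x80-\xff].|[\x01-\x7f]+/', $text, $m);
    $out = '';
    foreach ($m[0] as $token) {
        $out .= isset($charset[$token]) ? $charset[$token] : $token;
    }
    return $out;
}

// Two-entry table for illustration: GB2312 code => UTF-8 code
$table = [
    "\xB0\xA1" => "\xE5\x95\x8A",   // 啊
    "\xB0\xA2" => "\xE9\x98\xBF",   // 阿
];
echo bin2hex(gb2utf8_with_table("ab\xB0\xA1 \xB0\xA2", $table)), "\n";
// prints 6162e5958a20e998bf
```

A per-token hash lookup keeps the work proportional to the length of the text rather than to the size of the table.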
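<?php
// Using the lookup table (2)
// Create a table gb2utf8 with two fields, gb and utf8, and import the lookup table into it (code omitted)
function gbk2utf8($text) {
    /* Extract the Chinese characters in the text */
    preg_match_all('/[\x80-\xff]./', $text, $ar);
    // GB2312 bytes (0xA1-0xFE) contain no quote or backslash, so they are
    // interpolated directly; GBK trail bytes can be 0x5C, so escape such input
    $expr = join("','", array_unique($ar[0]));
    /* Look up only the characters that occur in the text */
    $link = mysqli_connect();          // connection settings from php.ini defaults
    mysqli_select_db($link, 'test');
    $rs = mysqli_query($link, "select gb, utf8 from gb2utf8 where gb in ('$expr')");
    $gb = array();
    while ($row = mysqli_fetch_row($rs))
        $gb[$row[0]] = $row[1];
    /* Split the text into its components */
    preg_match_all('/(?:[\x80-\xff].)|[\x01-\x7f]+/', $text, $ar);
    $ar = $ar[0];
    /* Replace the Chinese character codes */
    foreach ($ar as $k => $v)
        if (isset($gb[$v]))
            $ar[$k] = $gb[$v];
    return join('', $ar);
}
?>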
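The point of the database variant is to fetch only the distinct characters that actually occur in the text, in one query, instead of loading the whole table. A minimal sketch of that batching step without MySQL: the hypothetical $lookup callable stands in for the `where gb in (...)` query, taking a list of GB codes and returning a GB => UTF-8 array. The counter shows that a repeated character is requested only once.

```php
<?php
// Sketch of the batching idea: collect distinct double-byte characters,
// resolve them with a single $lookup call, then rebuild the text.
// $lookup is a hypothetical stand-in for the SQL "where gb in (...)" query.
function convert_with_batch_lookup(string $text, callable $lookup): string
{
    // 1. Distinct double-byte characters used in the text
    preg_match_all('/[\x80-\xff]./', $text, $m);
    $chars = array_values(array_unique($m[0]));

    // 2. One lookup for all of them (skipped when there are none)
    $map = $chars ? $lookup($chars) : [];

    // 3. Split into tokens and replace the ones the lookup resolved
    preg_match_all('/[\x80-\xff].|[\x01-\x7f]+/', $text, $m);
    $out = '';
    foreach ($m[0] as $token) {
        $out .= isset($map[$token]) ? $map[$token] : $token;
    }
    return $out;
}

// Array-backed lookup that counts how many codes it is asked for
$table = ["\xB0\xA1" => "\xE5\x95\x8A", "\xB0\xA2" => "\xE9\x98\xBF"];
$asked = 0;
$lookup = function (array $codes) use ($table, &$asked) {
    $asked += count($codes);
    return array_intersect_key($table, array_flip($codes));
};
echo bin2hex(convert_with_batch_lookup("\xB0\xA1\xB0\xA1\xB0\xA2", $lookup)), "\n";
// $asked is now 2: the repeated character was looked up once
```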
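Someone asked about the reverse conversion (utf82gb). Assuming the lookup table has already been read into the array $charset, it can be done as shown below.
Original post: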
http://www.phpx.com/happy/showthread.php?s=&threadid=90509&perpage=15&pagenumber=2
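<?php
function utf82gb($text, $charset) {
    // Split the text into whole UTF-8 sequences (4-, 3- and 2-byte) and ASCII runs
    $p = '/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/';
    preg_match_all($p, $text, $r);
    // Invert the table: UTF-8 code => GB2312 code
    $utf8 = array_flip($charset);
    foreach ($r[0] as $k => $v)
        if (isset($utf8[$v]))
            $r[0][$k] = $utf8[$v];
    return join('', $r[0]);
}
// gb2utf8() and $charset come from the first example; the literal must be GB2312
$s = gb2utf8('这是对照表的测试');
echo utf82gb($s, $charset);
?>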
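utf82gb() relies on UTF-8 being self-delimiting: the lead byte says how many continuation bytes (0x80-0xBF) follow, so the pattern can cut the text into whole characters and ASCII runs. In the pattern, 0xC2-0xDF starts a 2-byte sequence, 0xE0-0xEF a 3-byte one, and 0xF0-0xF7 is treated as the start of a 4-byte one. The sketch below is a self-contained round trip with a two-entry table; utf8_tokens() and utf8_to_gb_with_table() are illustrative names, not from the original post.

```php
<?php
// Sketch: the utf82gb() tokenizer plus a reverse lookup, on a tiny table.
function utf8_tokens(string $text): array
{
    $p = '/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/';
    preg_match_all($p, $text, $r);
    return $r[0];
}

function utf8_to_gb_with_table(string $text, array $charset): string
{
    $utf8 = array_flip($charset);          // UTF-8 code => GB2312 code
    $out = '';
    foreach (utf8_tokens($text) as $token) {
        $out .= isset($utf8[$token]) ? $utf8[$token] : $token;
    }
    return $out;
}

$charset = ["\xB0\xA1" => "\xE5\x95\x8A", "\xB0\xA2" => "\xE9\x98\xBF"];  // 啊, 阿
$gb   = "ok \xB0\xA1\xB0\xA2";             // GB2312 text
$utf8 = "ok \xE5\x95\x8A\xE9\x98\xBF";     // the same text in UTF-8
echo implode(' ', array_map('bin2hex', utf8_tokens($utf8))), "\n";
// prints 6f6b20 e5958a e998bf
var_dump(utf8_to_gb_with_table($utf8, $charset) === $gb);   // bool(true)
```

Flipping the table works because the GB2312 to UTF-8 mapping is one-to-one, so no two GB codes collide on the same UTF-8 key.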