

GB2UTF8

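It is really sad to see so many people still using GB-to-UTF-8 conversion programs written four or five years ago. Even without the iconv function, we can do somewhat better.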
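For comparison, where the iconv extension is available the whole conversion is a single call. This is a minimal sketch, assuming the input really is GB2312; the sample bytes are written as escapes so the snippet does not depend on the encoding of the source file.

```php
<?php
// GB2312 bytes for the two characters 这是 (D5E2 CAC7), after an ASCII prefix
$gb = "haha,\xd5\xe2\xca\xc7";

// iconv(from_encoding, to_encoding, string) returns false on failure
$utf8 = iconv("GB2312", "UTF-8", $gb);

if ($utf8 === false) {
    echo "conversion failed\n";
} else {
    // Print the result as hex so the bytes are visible on any terminal
    echo bin2hex($utf8), "\n";
}
```

The lookup-table approach in the rest of this post is for environments where iconv is not installed.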

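The uploaded file covers the complete GB2312 character set. Each line is laid out as follows:
Bytes 1 to 2: the GB2312 code
Byte 3: a space
Byte 4: the length of the corresponding UTF-8 code
Remaining bytes: the UTF-8 code followed by a line break (\r\n)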
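The line layout above could be parsed roughly as follows. This is only a sketch: the function name parse_table_line is hypothetical, and it assumes that byte 4 is an ASCII digit giving the UTF-8 length in bytes, which the description does not state explicitly.

```php
<?php
/**
 * Parse one line of the lookup table.
 * Returns array(gb, utf8), or null if the line does not match the layout.
 * Assumption: byte 4 is an ASCII digit holding the UTF-8 length in bytes.
 */
function parse_table_line(string $line): ?array
{
    // Need at least: 2 bytes GB code, 1 space, 1 length digit
    if (strlen($line) < 5 || $line[2] !== ' ' || !ctype_digit($line[3])) {
        return null;
    }
    $gb   = substr($line, 0, 2);   // bytes 1 to 2
    $len  = (int) $line[3];        // byte 4
    $utf8 = substr($line, 4, $len); // the UTF-8 code itself
    if ($len === 0 || strlen($utf8) !== $len) {
        return null;
    }
    return array($gb, $utf8);
}

// A hand-built sample line for the character 这 (GB2312 D5E2, UTF-8 E8BF99)
$pair = parse_table_line("\xd5\xe2 3\xe8\xbf\x99\r\n");
```

Reading the length digit first means the trailing \r\n never has to be trimmed.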

Attachment: gb2utf8.txt

http://www.phpx.com/happy/showthread.php?s=&threadid=90509&perpage=15&pagenumber=1

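<?php
// Using the lookup table (1)

$filename = "gb2utf8.txt.new";
$fp = fopen($filename, "r");
$charset = array();

// fgetcsv() returns false at end of file, so test its result rather than feof()
while (($row = fgetcsv($fp, 10)) !== false) {
    if (count($row) < 2) continue; // skip blank or malformed lines
    list($gb, $utf8) = $row;
    $charset[$gb] = $utf8;
}
fclose($fp);
// The lookup table is now loaded into $charset for later use

function gb2utf8($text) {
    global $charset;
    // Split the text into elements: each Chinese character is one element,
    // and each run of consecutive non-Chinese bytes is one element
    preg_match_all('/(?:[\x80-\xff].)|[\x01-\x7f]+/', $text, $tmp);
    $tmp = $tmp[0];
    // Pick out the Chinese characters (array_intersect keeps the keys of $tmp)
    $ar = array_intersect($tmp, array_keys($charset));
    // Replace each Chinese character with its UTF-8 code
    foreach ($ar as $k => $v)
        $tmp[$k] = $charset[$v];
    // Return the converted string
    return join('', $tmp);
}

// This file must itself be saved in GB2312 for the literal below to work
echo gb2utf8("haha,这是对照表的测试");
?>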
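The splitting step in gb2utf8 can be tried in isolation. The sketch below uses a two-entry stand-in for the real table (the array $table and the sample bytes are assumptions made for illustration) and shows how the regular expression separates double-byte characters from ASCII runs before the lookup.

```php
<?php
// Stand-in for the real table: GB2312 bytes => UTF-8 bytes for 这 and 是
$table = array(
    "\xd5\xe2" => "\xe8\xbf\x99",
    "\xca\xc7" => "\xe6\x98\xaf",
);

// "haha," followed by the GB2312 bytes for 这是
$text = "haha,\xd5\xe2\xca\xc7";

// A high byte plus the byte after it is one element; an ASCII run is one element
preg_match_all('/(?:[\x80-\xff].)|[\x01-\x7f]+/', $text, $m);
$tokens = $m[0];

// Swap every element that has an entry in the table
foreach ($tokens as $k => $v) {
    if (isset($table[$v])) {
        $tokens[$k] = $table[$v];
    }
}
$converted = implode('', $tokens);
```

Here isset() on the table replaces the array_intersect call; both select the same elements, but isset() avoids building the full key list on every call.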

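<?php
// Using the lookup table (2)
// Create a table gb2utf8 with two columns, gb and utf8, and import the
// lookup table into it (code omitted)
// Note: the mysql_* functions used here were removed in PHP 7;
// on newer versions use mysqli or PDO instead

function gbk2utf8($text) {
    /* Extract the Chinese characters from the text */
    preg_match_all('/[\x80-\xff]./', $text, $ar);

    /* Look up only the characters that the text actually uses */
    mysql_connect();
    mysql_select_db('test');
    // Escape each character, since the second byte could be a quote
    $expr = join("','", array_map('mysql_real_escape_string', array_unique($ar[0])));
    $rs = mysql_query("select * from gb2utf8 where gb in ('$expr')");
    $gb = array();
    while (list($key, $value) = mysql_fetch_row($rs))
        $gb[$key] = $value;

    /* Split the text into its elements */
    preg_match_all('/(?:[\x80-\xff].)|[\x01-\x7f]+/', $text, $ar);
    $ar = $ar[0];

    /* Replace each Chinese character with its UTF-8 code */
    foreach ($ar as $k => $v)
        if (array_key_exists($v, $gb))
            $ar[$k] = $gb[$v];

    // Join without a separator so no extra spaces are inserted
    return join('', $ar);
}
?>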
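On current PHP versions the same lookup is usually written with bound parameters instead of a hand-built IN list. The helper below is a sketch only: build_lookup_query is a hypothetical name, and it merely prepares the SQL text and the parameter list for the gb2utf8 table described above, leaving the connection to the caller.

```php
<?php
/**
 * Build a parameterised query for the characters found in $text.
 * Returns array(sql, params), or null when the text has no double-byte characters.
 */
function build_lookup_query(string $text): ?array
{
    // Same extraction pattern as in gbk2utf8
    preg_match_all('/[\x80-\xff]./', $text, $m);
    $chars = array_values(array_unique($m[0]));
    if (count($chars) === 0) {
        return null; // "IN ()" would be invalid SQL
    }
    // One "?" placeholder per distinct character
    $placeholders = implode(',', array_fill(0, count($chars), '?'));
    $sql = "SELECT gb, utf8 FROM gb2utf8 WHERE gb IN ($placeholders)";
    return array($sql, $chars);
}

// 这是这 in GB2312: three characters, two of them distinct
$query = build_lookup_query("a\xd5\xe2\xca\xc7\xd5\xe2");

// With a PDO connection in $pdo (not created here) this would run as:
//   $stmt = $pdo->prepare($query[0]);
//   $stmt->execute($query[1]);
```

Binding the values removes the need to escape the raw bytes by hand.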

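Someone asked about the reverse conversion, utf82gb.
Assuming the lookup table has already been read into the array $charset, the function below handles it.

Original post: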

http://www.phpx.com/happy/showthread.php?s=&threadid=90509&perpage=15&pagenumber=2

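function utf82gb($text, &$charset) {
    // One element per UTF-8 character (4-, 3- or 2-byte sequence) or per ASCII run
    $p = '/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/';
    preg_match_all($p, $text, $r);
    // Invert the table so that it maps UTF-8 codes back to GB codes
    $utf8 = array_flip($charset);
    foreach ($r[0] as $k => $v)
        if (isset($utf8[$v]))
            $r[0][$k] = $utf8[$v];
    return join('', $r[0]);
}

$s = gb2utf8('这是对照表的测试');
echo utf82gb($s, $charset);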
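The reverse direction can be checked without the full table. The following sketch uses a two-entry stand-in table and a hypothetical function name, utf8_to_gb_demo, and applies the same pattern and array_flip step to a short UTF-8 string.

```php
<?php
// Stand-in for the real table: GB2312 bytes => UTF-8 bytes for 这 and 是
$table = array(
    "\xd5\xe2" => "\xe8\xbf\x99",
    "\xca\xc7" => "\xe6\x98\xaf",
);

function utf8_to_gb_demo(string $text, array $table): string
{
    // Split into UTF-8 characters and ASCII runs
    $p = '/[\xf0-\xf7][\x80-\xbf]{3}|[\xe0-\xef][\x80-\xbf]{2}|[\xc2-\xdf][\x80-\xbf]|[\x01-\x7f]+/';
    preg_match_all($p, $text, $r);
    // Invert the table: UTF-8 bytes => GB2312 bytes
    $reverse = array_flip($table);
    foreach ($r[0] as $k => $v) {
        if (isset($reverse[$v])) {
            $r[0][$k] = $reverse[$v];
        }
    }
    return implode('', $r[0]);
}

// "ok " followed by 这是 in UTF-8
$gb = utf8_to_gb_demo("ok \xe8\xbf\x99\xe6\x98\xaf", $table);
```

Characters with no entry in the table are passed through unchanged, as in the original function.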

Posted in PHP, Technology.

