How to truncate article content that contains HTML in PHP

Source: 程序員人生 | Published: 2013-10-26 05:12:14 | Views: 4277

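Article truncation is mostly needed on list pages: when I have not written a description for a post, the excerpt has to be cut from the article body itself. Cutting with PHP's built-in string functions, however, can leave a div unclosed and break the page layout. How can this be solved?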

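Once a blogger has written a post, the blog backend usually shows the post title and a truncated part of the post on search or list pages, as an entry point for further reading.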

Function: mb_substr( $str, $start, $length, $encoding )

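$str: the string to truncate

$start: the position where the cut starts

$length: the length (note that, unlike mb_strimwidth, a value of 1 here means one Chinese character)

$encoding: the encoding; I set it to utf-8

Example: truncate an article title to at most 15 characters. The code is as follows: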

<?php echo mb_substr('www.phpfensi.com原創', 0, 15, "utf-8"); ?>
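The difference between counting characters and counting display width can be seen in a small sketch. The sample string is an assumption chosen for illustration, and the sketch assumes the mbstring extension is loaded.

```php
<?php
$title = '中文abc';

// mb_substr counts characters: 3 means three characters, whatever their width.
$byChars = mb_substr($title, 0, 3, 'UTF-8');

// mb_strimwidth counts display width: a CJK character occupies 2 columns,
// so a width of 3 only leaves room for one of them.
$byWidth = mb_strimwidth($title, 0, 3, '', 'UTF-8');

echo $byChars, "\n"; // 中文a
echo $byWidth, "\n"; // 中
```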

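This works fine for plain text, but my content has HTML tags in it, and that is where the trouble starts. How do you truncate an article that is not just an ordinary string but text full of formatting tags and styles? If it is handled carelessly, tags that were opened are never closed, which breaks the whole document flow.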
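A minimal sketch of the problem, using an assumed sample fragment: cutting by position ignores the markup, so the result contains opening tags without their closing counterparts.

```php
<?php
$html = '<div class="post"><b>Hello world</b></div>';

// Cut by position without looking at the markup. The input is ASCII, so
// substr and mb_substr behave the same here.
$cut = substr($html, 0, 26);

// Count opening and closing tags in the fragment.
$opened = preg_match_all('/<[a-z][^>]*>/i', $cut);
$closed = preg_match_all('/<\/[a-z][^>]*>/i', $cut);

echo $cut, "\n";                            // <div class="post"><b>Hello
echo "opened: $opened, closed: $closed\n"; // opened: 2, closed: 0
```

Everything the browser renders after this fragment ends up inside the unclosed div and b elements.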

For plain text alone, the following function is more or less sufficient:

<?php
/**
 * Substring extraction with support for Chinese and other encodings
 *
 * @param string $str     the string to cut
 * @param int    $start   start position
 * @param int    $length  number of characters to keep
 * @param string $charset character encoding
 * @param string $suffix  suffix appended to the truncated string
 * @return string
 */
function substr_ext($str, $start, $length, $charset = "utf-8", $suffix = "")
{
    if (function_exists("mb_substr")) {
        return mb_substr($str, $start, $length, $charset) . $suffix;
    } elseif (function_exists('iconv_substr')) {
        return iconv_substr($str, $start, $length, $charset) . $suffix;
    }
    // Fallback without mbstring or iconv: match one character at a time
    // with a byte-level pattern for the given charset.
    $re['utf-8'] = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3}/';
    $re['gb2312'] = '/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/';
    $re['gbk'] = '/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/';
    $re['big5'] = '/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/';
    preg_match_all($re[strtolower($charset)], $str, $match);
    $slice = join("", array_slice($match[0], $start, $length));
    return $slice . $suffix;
}
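The regular-expression fallback at the end of that function can be tried on its own. The sketch below assumes UTF-8 input and a made-up sample string; the pattern has one alternative per UTF-8 sequence length, so every match is exactly one character.

```php
<?php
// One alternative per UTF-8 sequence length (1 to 4 bytes).
$utf8Char = '/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3}/';

$text = 'PHP截取text';

// Each element of $match[0] is a single character of $text.
preg_match_all($utf8Char, $text, $match);
$chars = $match[0];

// Slicing the array is then equivalent to cutting by character count.
$slice = implode('', array_slice($chars, 0, 5));

echo count($chars), "\n"; // 9
echo $slice, "\n";        // PHP截取
```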

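However, if what needs truncating is a piece of formatted text from a web page, the function above is not enough: it has no ability to handle formatting tags.

What is needed is a new function, an upgraded version of the one above that handles tags correctly. Here is the first thing I found:

The strip_tags() function strips HTML, XML, and PHP tags.

Example 1: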

<?php
echo strip_tags("Hello <b>world!</b>");
?>

Output: Hello world!

That makes things easier: we only need to build on the function above, like this:

<?php
$a = strip_tags("Hello <b>world!</b>");
echo substr_ext($a, 0, 10);
// But the HTML is gone now, so this is not a good solution either.
?>

Searching Google next, I found that cns had written a string truncation function that supports HTML:

/**
 * Get the position of the Nth occurrence of a character in a string
 * @param string $text the string
 * @param string $key the character
 * @param int $int N
 * @return int byte offset of the Nth occurrence, or strlen($text) if there are fewer than N
 */
function strpos_int($text, $key, $int)
{
    // Search iteratively with local state only, so repeated calls stay independent.
    $pos = false;
    $offset = 0;
    for ($i = 0; $i < $int; $i++)
    {
        $pos = strpos($text, $key, $offset);
        if ($pos === false)
        {
            break;
        }
        $offset = $pos + strlen($key);
    }
    return $pos === false ? strlen($text) : $pos;
}
/**
 * Truncate HTML
 * @param string $string HTML string
 * @param int $length number of characters to keep
 * @param string $dot marker appended after the cut
 * @param string $append extra text appended after $dot
 * @return string
 */
function cuthtml($string, $length, $dot = ' ...', $append = "")
{
    $str = strip_tags($string); // strip the tags first
    $new_str = iconv_substr($str, 0, $length, 'utf-8');
    if ($new_str === false || $new_str === '')
    {
        return ''; // no visible text to keep
    }
    $last = iconv_substr($new_str, -1, 1, 'utf-8');
    $sc = substr_count($new_str, $last);
    $position = strpos_int($string, $last, $sc); // byte offset of the cut in the original HTML
    if (function_exists('tidy_repair_string')) // if tidy is enabled on the server, let it complete the HTML
    {
        $options = array("show-body-only" => true);
        // $position is a byte offset, so cut with substr rather than mb_substr
        return tidy_repair_string(substr($string, 0, $position) . $dot . $append, $options, 'UTF8');
    } else // tidy is not enabled
    {
        if (strlen($string) <= $position)
        {
            return $string;
        }
        // Wrap special characters in chr(1) markers so a cut never splits them
        $pre = chr(1);
        $end = chr(1);
        $string = str_replace(array('&', '"', '<', '>'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);
        $strcut = '';
        $n = $tn = $noc = 0;
        // Walk the string one UTF-8 sequence at a time:
        // $n is the byte offset, $tn the byte length of the last character, $noc the running width
        while ($n < strlen($string))
        {
            $t = ord($string[$n]);
            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126))
            {
                $tn = 1;
                $n++;
                $noc++;
            } elseif (194 <= $t && $t <= 223)
            {
                $tn = 2;
                $n += 2;
                $noc += 2;
            } elseif (224 <= $t && $t <= 239)
            {
                $tn = 3;
                $n += 3;
                $noc += 2;
            } elseif (240 <= $t && $t <= 247)
            {
                $tn = 4;
                $n += 4;
                $noc += 2;
            } elseif (248 <= $t && $t <= 251)
            {
                $tn = 5;
                $n += 5;
                $noc += 2;
            } elseif ($t == 252 || $t == 253)
            {
                $tn = 6;
                $n += 6;
                $noc += 2;
            } else
            {
                $n++;
            }
            if ($noc >= $position)
            {
                break;
            }
        }
        if ($noc > $position)
        {
            $n -= $tn;
        }
        $strcut = substr($string, 0, $n);
        // Restore the special characters and drop a trailing half-marked one
        $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&', '"', '<', '>'), $strcut);
        $pos = strrpos($strcut, chr(1));
        if ($pos !== false)
        {
            $strcut = substr($strcut, 0, $pos);
        }
        return $strcut . $dot . $append;
    }
}
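For comparison, here is a minimal sketch of a different, stack-based approach that needs neither tidy nor iconv. It is not the function by cns shown above; the name truncate_html_sketch and the list of void elements are assumptions made for this illustration. It assumes valid UTF-8 input, counts an HTML entity such as &amp;amp; as several characters, and does not treat script or style contents specially.

```php
<?php
// Hypothetical helper (not from the original article): keep at most $limit
// visible characters and close every tag that is still open at the cut.
function truncate_html_sketch(string $html, int $limit, string $suffix = ' ...'): string
{
    // Assumed list of void elements, which never take a closing tag.
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    // Split into tags and text runs, keeping the tags in the token list.
    $tokens = preg_split('/(<[^>]+>)/u', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $open = [];         // stack of currently open tag names
    $out = '';
    $count = 0;         // visible characters emitted so far
    $truncated = false;

    foreach ($tokens as $token) {
        if (preg_match('/^<[^>]+>$/', $token)) {
            if (preg_match('/^<\/\s*([a-z0-9]+)/i', $token, $m)) {
                // Closing tag: pop it if it matches the innermost open tag.
                if ($open && end($open) === strtolower($m[1])) {
                    array_pop($open);
                }
            } elseif (preg_match('/^<([a-z0-9]+)/i', $token, $m)) {
                // Opening tag: remember it unless it is void or self-closing.
                $name = strtolower($m[1]);
                if (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $out .= $token; // tags never count toward the limit
            continue;
        }
        // Text run: split into UTF-8 characters without needing mbstring.
        $chars = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);
        $remaining = $limit - $count;
        if (count($chars) > $remaining) {
            $out .= implode('', array_slice($chars, 0, $remaining));
            $truncated = true;
            break;
        }
        $out .= $token;
        $count += count($chars);
    }

    if ($truncated) {
        $out .= $suffix;
    }
    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncate_html_sketch('<div><b>Hello world</b></div>', 5), "\n";
// <div><b>Hello ...</b></div>
```

Because the open tags are tracked on a stack, the excerpt is always balanced, which is what keeps a list page from breaking.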
