I'm doing some HTML DOM manipulations:
    function parse_html($html) {
        $dom = new DOMDocument(); // was missing in the snippet as posted
        $dom->loadHTML($html);
        libxml_clear_errors();
        // Parse DOM
        return $dom->saveHTML();
    }

The problem is that my HTML contains some PHP code, and some of it gets converted to HTML entities. For example, if $html contains this:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <?php
    // lang=es
    $pwd = $parameter['pwd'];
    $url = $parameter['url'];
    ?>
    <p>
    You are now registered. Go to -> <a href="<?php echo $url ?>">control panel</a> to change the settings.
    </p>

It is transformed into this:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head>
    <body>
    <?php
    // lang=es
    $pwd = $parameter['pwd'];
    $url = $parameter['url'];
    ?><p>
    You are now registered. Go to -> <a href="<?php%20echo%20%24url%20?>">control panel</a> to change the settings.
    </p>
    </body>
    </html>

The <?php echo $url ?> inside the href attribute gets escaped (here it is percent-encoded), but I cannot use a function like *html_entity_decode* to undo that, because it would also decode entities that must remain entities.
How can I parse a DOM that contains PHP code?
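Best answer
When and how is the $html variable being built? That is the place and time at which you want the PHP inside it to be parsed. If you try to output it afterwards, it will come out as a plain string and will not be parsed.
To be clearer: build the $html variable with the PHP already included at that point. Or perhaps you are building a template, in which case you would go about it differently.
If you are trying to fill in the PHP content after the $html variable has been processed, you could use str_replace() or a similar function to achieve that effect.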
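The str_replace() idea can be sketched roughly as follows. This is only an illustration under assumptions: the token name TPL_URL_TOKEN and the helper fill_placeholders() are hypothetical and do not come from the question. The markup carries a plain alphanumeric token where the PHP snippet belongs, and the snippet is substituted back once the DOM work is finished.

```php
<?php
// Hypothetical helper: swap placeholder tokens for PHP snippets after DOM processing.
// Tokens are plain letters and underscores so that serialization has nothing to escape.
function fill_placeholders(string $html, array $snippets): string
{
    return str_replace(array_keys($snippets), array_values($snippets), $html);
}

// Stand-in for markup that has already been through the DOM round trip.
$processed = '<p>Go to <a href="TPL_URL_TOKEN">control panel</a></p>';

$result = fill_placeholders($processed, [
    'TPL_URL_TOKEN' => '<?php echo $url ?>',
]);

echo $result, "\n";
```

This only works if the template is authored with tokens in the first place, so it may not suit existing files that already contain inline PHP.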
The solution I've found is to create a couple of functions to encode/decode the PHP strings.
    function encode_php($html) {
        return preg_replace_callback('#<\?php.*\?>#imsU', '_encode_php', $html);
    }

    function _encode_php($matches) {
        return 'PHP_ENCRYPTED_CODE_BEGIN'.base64_encode($matches[0]).'PHP_ENCRYPTED_CODE_END';
    }

    function decode_php($html) {
        return preg_replace_callback('#PHP_ENCRYPTED_CODE_BEGIN(.*)PHP_ENCRYPTED_CODE_END#imsU', '_decode_php', $html);
    }

    function _decode_php($matches) {
        return base64_decode($matches[1]);
    }

It's important to choose a prefix and a suffix that you are sure don't appear in your files. This solution has been tested with 2500 HTML files and it works.
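One way these helpers might be wired around the DOM round trip is sketched below. The wrapper name parse_html_keeping_php() is hypothetical, the helpers are restated with closures only so the snippet runs on its own, and the call to libxml_use_internal_errors() is an assumption about how parser warnings are handled. The PHP blocks are hidden before loadHTML() and restored after saveHTML().

```php
<?php
// Helpers from the answer, restated with closures so this sketch is self-contained.
function encode_php($html)
{
    return preg_replace_callback('#<\?php.*\?>#imsU', function ($m) {
        return 'PHP_ENCRYPTED_CODE_BEGIN' . base64_encode($m[0]) . 'PHP_ENCRYPTED_CODE_END';
    }, $html);
}

function decode_php($html)
{
    return preg_replace_callback('#PHP_ENCRYPTED_CODE_BEGIN(.*)PHP_ENCRYPTED_CODE_END#imsU', function ($m) {
        return base64_decode($m[1]);
    }, $html);
}

// Hypothetical wrapper: hide the PHP blocks, run the DOM round trip, restore them.
function parse_html_keeping_php($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // assumption: collect parser warnings instead of printing them
    $dom->loadHTML(encode_php($html));
    libxml_clear_errors();
    // ... DOM manipulation would go here ...
    return decode_php($dom->saveHTML());
}

$source = '<p>Go to <a href="<?php echo $url ?>">control panel</a></p>';

// Encoding followed by decoding should give back the original string.
$roundTrip = decode_php(encode_php($source));

// The DOM step needs the dom extension, so it only runs where that is loaded.
$output = extension_loaded('dom') ? parse_html_keeping_php($source) : null;
```

Because the regular expression requires a closing tag, a file that ends in an unclosed PHP block would not be encoded by this pattern and would need separate handling.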