Question
I have a very basic XML parser based on the tutorial provided here, for the purpose of reading RSS feeds in Python.
    # Python 2 code, as posted; imports added so it runs on its own
    import urllib
    from xml.dom import minidom, Node

    def GetRSS(RSSurl):
        url_info = urllib.urlopen(RSSurl)
        if (url_info):
            xmldoc = minidom.parse(url_info)
            if (xmldoc):
                # Only the direct children of the root element are inspected here
                for item_node in xmldoc.documentElement.childNodes:
                    if (item_node.nodeName == "item"):
                        PrintNodeItems(item_node, ["title","link"])
        else:
            print "error"

    def PrintNodeItems(XmlNode, items):
        # Print the text of each child whose tag name is listed in items
        for item_node in XmlNode.childNodes:
            if item_node.nodeName in items:
                PrintNodesText(item_node)

    def PrintNodesText(XmlNode):
        text = ""
        for text_node in XmlNode.childNodes:
            if (text_node.nodeType == Node.TEXT_NODE):
                text = text_node.nodeValue
        if (len(text)>0):
            print text
        print ""
I have tested the GetRSS function on the address provided in the tutorial (http://rss.slashdot/Slashdot/slashdot), and it works just fine, providing me with the correct feedback. However, my intention when learning how to write this module was to use it for reading the RSS feed at RedLetterMedia (http://redlettermedia/feed/). When I attempt to use the GetRSS function in the Python Shell on that address, I get a blank line as feedback instead of the correct results. I also tested it on CNN's "World" RSS feed, and received no results for that as well. I have used urllib.urlopen on all addresses and they all appear to use the same format for their nodes and child nodes (<item><title><description><link></item>).
I figure, as was the case for my previous question, there is probably something very obvious I am missing. Does anybody know what that is?
For the record, my error message has not come up at all, but maybe that's because I integrated it into the code incorrectly; I would not put it past me.
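One possible reason the error message never appears: minidom reports malformed input by raising an exception rather than returning a false value, and urlopen typically raises on connection failures as well, so an `if`/`else` on the returned objects would rarely reach the `else` branch. The sketch below assumes Python 3 and uses a hypothetical helper, `parse_feed`, fed from inline strings instead of a live URL.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_feed(xml_text):
    """Return a parsed document, or None if xml_text is not well-formed XML."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError as exc:
        # minidom signals malformed input by raising, not by returning None
        print("error:", exc)
        return None


good = parse_feed("<rss><channel><title>Demo</title></channel></rss>")
bad = parse_feed("<rss><channel>")  # unclosed tags, so parsing fails
```

Here `good` is a document object and `bad` is None, with the error text printed for the second call.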
Update: I rewrote the code from scratch using several answered questions on Stack Overflow. Works like a charm!
    def GetRSS(RSSurl):
        url_info = urllib.urlopen(RSSurl)
        if (url_info):
            xmldoc = minidom.parse(url_info)
            if (xmldoc):
                channel = xmldoc.getElementsByTagName('channel')
                for node in channel:
                    item = xmldoc.getElementsByTagName('item')
                    for node in item:
                        alist = xmldoc.getElementsByTagName('link')
                        for a in alist:
                            linktext = a.firstChild.data
                            print linktext

    def main():
        GetRSS('http://redlettermedia/feed/')
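A note on the rewrite: every `getElementsByTagName` call there is made on `xmldoc`, so each one searches the whole document. If the loops are nested as they appear, the full list of links (including the channel's own link) would be printed once per item. A minimal sketch of scoping the search to each item follows, assuming Python 3; `SAMPLE_FEED`, `item_links`, and the example.com addresses are placeholders, not part of the original post.

```python
from xml.dom import minidom

# Hypothetical RSS 2.0 sample used instead of a live feed
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <link>http://example.com/</link>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def item_links(xmldoc):
    """Collect the text of the first <link> inside each <item>."""
    links = []
    for item in xmldoc.getElementsByTagName("item"):
        # Search within this item only, not the whole document
        link_nodes = item.getElementsByTagName("link")
        if link_nodes and link_nodes[0].firstChild is not None:
            links.append(link_nodes[0].firstChild.data)
    return links


links = item_links(minidom.parseString(SAMPLE_FEED))
for link in links:
    print(link)
```

Calling `getElementsByTagName` on the item node rather than on the document keeps the channel-level link out of the result and yields one link per item.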
Recommended answer
The error is here:
    for item_node in xmldoc.documentElement.childNodes:
        if (item_node.nodeName == "item"):
There is no item element directly under the root, just a channel. I found this out by printing every nodeName value in the loop.
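The diagnosis above can be reproduced with a short sketch, assuming Python 3 and a hypothetical inline RSS 2.0 sample (`SAMPLE_FEED`) in place of the real feed. It first lists the element children of the root, then shows one possible fix: `getElementsByTagName`, which searches all descendants instead of only the root's direct children. Feeds laid out in the RSS 1.0 (RDF) style place item directly under the root, which may be why the tutorial's address worked with the original loop.

```python
from xml.dom import minidom, Node

# Hypothetical RSS 2.0 sample; the items sit inside <channel>
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

xmldoc = minidom.parseString(SAMPLE_FEED)
root = xmldoc.documentElement

# Diagnostic: names of the root's element children (whitespace text nodes skipped)
root_children = [
    node.nodeName
    for node in root.childNodes
    if node.nodeType == Node.ELEMENT_NODE
]
print(root.nodeName, root_children)

# One possible fix: look for <item> anywhere below the root
titles = []
for item in xmldoc.getElementsByTagName("item"):
    for child in item.childNodes:
        if child.nodeName == "title" and child.firstChild is not None:
            titles.append(child.firstChild.data)
print(titles)
```

For this sample the root's only element child is channel, so a loop over `documentElement.childNodes` never sees an item, while the descendant search finds both.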