When I type
=IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","//h2")
in my Google Sheet, I get: #N/A Imported content is empty.
However, when I type:
=IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","*")
I get some content, so I can presume that access to the page is not blocked.
The page also contains several h2 tags, without any doubt.
So what is the problem?
Recommended answer
- You want to know the reason for the following behavior:
- =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
- =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","*") returns the content.
- When you use the first script, first run myFunction() and retrieve the endpoint from the log. As a test case, put the endpoint in cell "A1" and put =IMPORTXML(A1,"//h2") in cell "A2". The values can then be retrieved.
- When you run the second script, the values of the h2 tags are put directly into the active Spreadsheet.
- Class UrlFetchApp
- Class XmlService
If my understanding is correct, how about this answer?
When I looked at the HTML of www.ilgiornale.it/autore/franco-battaglia.html, I noticed a problem in it. It is the following line.
window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")
In this line, the script tag is not closed with a literal </script>; the closing tag is written as \x3C/script>. It seems that when IMPORTXML retrieves this line, it treats the script tag as not closed. I could confirm that when \x3C is converted to <, =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","//h2") correctly returns the values of the h2 tags.
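As a small illustration of why the browser and a parser reading the raw source can disagree here, the following sketch can be run in plain JavaScript (outside Google Sheets). It only shows that \x3C is an escape for < inside a JavaScript string; how IMPORTXML actually parses the page is not documented here and remains my guess.

```javascript
// Inside a JavaScript string literal, "\x3C" is just another way to write "<",
// so the browser ends up writing a normal closing tag.
const evaluated = "\x3C/script>";
console.log(evaluated === "</script>"); // true

// String.raw keeps the backslash, which is how the line looks in the page source.
const rawSource = String.raw`document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")`;

// Read as plain text, the line contains no literal "</script>" at all.
console.log(rawSource.includes("</script>"));     // false
console.log(rawSource.includes("\\x3C/script>")); // true
```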
Because of this, =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","//h2") seems to return #N/A Imported content is empty.
As for why =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","*") returns content: when I used this formula, I couldn't find the values of the script tag in the result. From this, I suspected that the script tag had an issue, which is how I found the problem above. I could confirm that when \x3C is converted to <, =IMPORTXML("www.ilgiornale.it/autore/franco-battaglia.html","*") returns values that include those of the script tag.
To avoid the issue above, \x3C needs to be changed to <. So how about the following workarounds? They use Google Apps Script. Please think of them as just two of several possible workarounds.
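The replacement itself can be tried in plain JavaScript before touching Apps Script. This is only a sketch: the sample string and the shortened jquery.min.js path are hypothetical stand-ins for the fetched page, not the real HTML.

```javascript
// Hypothetical one-line sample standing in for the fetched HTML text.
// String.raw keeps the backslash, as it appears in the page source.
const fetched = String.raw`<script>window.jQuery || document.write("<script src='jquery.min.js'>\x3C/script>")</script><h2>Title</h2>`;

// /\\x3C/g matches a literal backslash followed by "x3C", everywhere in the text.
const fixed = fetched.replace(/\\x3C/g, "<");

console.log(fetched.includes("\\x3C"));      // true
console.log(fixed.includes("\\x3C"));        // false
console.log(fixed.includes("'></script>"));  // true
```

The double backslash in the pattern matters: /\x3C/g would match the < character itself, not the escaped text.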
In this pattern, first download the HTML data from the URL and fix the problem point. Then create a file from the modified HTML data and share it. Finally, retrieve the URL of the file and use that URL to retrieve the values.
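function myFunction() {
  // UrlFetchApp needs a full URL. The "http://" scheme is an assumption, because the URL appears without one in this post.
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Replace every literal \x3C in the page source with "<".
  var data = UrlFetchApp.fetch(url).getContentText().replace(/\\x3C/g, "<");
  // Save the fixed HTML to Drive and make it readable by anyone with the link.
  var file = DriveApp.createFile("htmlData.html", data, MimeType.HTML);
  file.setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
  // Direct-download endpoint to pass to IMPORTXML.
  var endpoint = "https://drive.google.com/uc?id=" + file.getId() + "&export=download";
  Logger.log(endpoint);
}

In this pattern, the values of the h2 tags are retrieved directly by parsing the HTML data and are put into the active Spreadsheet.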
function myFunction() {
  // UrlFetchApp needs a full URL. The "http://" scheme is an assumption, because the URL appears without one in this post.
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Collect every <h2>...</h2> block from the page source.
  var data = UrlFetchApp.fetch(url).getContentText().match(/<h2[\s\S]+?<\/h2>/g);
  // Wrap the blocks in one root element so that XmlService can parse them.
  var xml = XmlService.parse("<temp>" + data.join("") + "</temp>");
  // One row per h2, because setValues() expects a 2D array.
  var h2Values = xml.getRootElement().getChildren("h2").map(function(e) {return [e.getValue()]});
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange(sheet.getLastRow() + 1, 1, h2Values.length, 1).setValues(h2Values);
  Logger.log(h2Values);
}

If I misunderstood your question and this was not the direction you wanted, I apologize.
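The h2 matching step of the second script can also be checked in plain JavaScript. This is a rough sketch on a hypothetical sample string. It strips tags with a second regex instead of XmlService, so it is a stand-in for that step and not the same parsing.

```javascript
// Hypothetical sample standing in for the fetched page.
const html = "<div><h2 class=\"t\">First</h2><p>x</p><h2>\n Second </h2></div>";

// Same pattern as in the second script: lazily match from "<h2" to the nearest "</h2>".
// match() returns null when nothing is found, so fall back to an empty array.
const blocks = html.match(/<h2[\s\S]+?<\/h2>/g) || [];

// Strip the tags and trim whitespace to get one-cell rows (a 2D array).
const rows = blocks.map(b => [b.replace(/<[^>]+>/g, "").trim()]);

console.log(JSON.stringify(rows)); // [["First"],["Second"]]
```

The `|| []` fallback is worth copying into the Apps Script version as well, since data.join("") throws when the page has no h2 at all.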