How does BeautifulSoup get the content inside a span?

I'm trying to parse the fixture contents from a website. I managed to parse the Match column, but I'm having difficulty parsing the date and time column.

My program

    import re
    import pytz
    import requests
    import datetime
    from bs4 import BeautifulSoup
    from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError
    from espncricinfo.match import Match

    bigbash_article_link = "http://www.espncricinfo.com/ci/content/series/1128817.html?template=fixtures"
    r = requests.get(bigbash_article_link)
    bigbash_article_html = r.text
    soup = BeautifulSoup(bigbash_article_html, "html.parser")

    bigbash1_items = soup.find_all("span", {"class": "fixture_date"})
    bigbash_items = soup.find_all("span", {"class": "play_team"})
    bigbash_article_dict = {}
    date_dict = {}

    for div in bigbash_items:
        a = div.find('a')['href']
        bigbash_article_dict[div.find('a').string] = a
    print(bigbash_article_dict)

    for div in bigbash1_items:
        a = div.find('span').string
        date_dict[div.find('span').string] = a
    print(date_dict)

When I run this, print(bigbash_article_dict) produces output, but the code for print(date_dict) gives me an error. How can I parse the date and time content?

Best answer

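Judging by your code, you want the content inside the span tag. Each element of bigbash1_items is already that span, so you should use div.contents to get its contents rather than calling div.find('span'), which searches for a span nested inside it.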

A better title for your question would be: how does BeautifulSoup get the content inside a span?

For example:

    for div in bigbash1_items:
        print("div=", div)
        print("div.contents[0].strip()=", div.contents[0].strip(), "\r\n------------\r\n")

This prints, for each date:

    div= <span class="fixture_date"> Thu Feb 22 </span>
    div.contents[0].strip()= Thu Feb 22
    ------------
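
As a self-contained illustration of the same idea, here is a minimal sketch that runs without fetching the page. The sample_html string is made up for this example and only imitates the span shown above, so the real page may be structured differently. It also uses get_text(strip=True), which is an alternative to div.contents[0].strip() rather than what the answer above used.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the span shown above.
# The real page may be structured differently.
sample_html = """
<div>
  <span class="fixture_date"> Thu Feb 22 </span>
  <span class="play_team"><a href="/match/1.html">Team A v Team B</a></span>
  <span class="fixture_date"> Fri Feb 23 </span>
  <span class="play_team"><a href="/match/2.html">Team C v Team D</a></span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
date_spans = soup.find_all("span", {"class": "fixture_date"})

# Each item is already the <span>. In this sample it contains no nested
# <span>, so find("span") returns None and .string on it would fail.
nested = date_spans[0].find("span")
print("nested span:", nested)

# get_text(strip=True) returns the tag's text with surrounding
# whitespace removed.
dates = [span.get_text(strip=True) for span in date_spans]
print(dates)
```

On this sample, dates ends up as ['Thu Feb 22', 'Fri Feb 23']. The difference between the two approaches is that div.contents[0].strip() assumes the first child of the span is a plain string, while get_text() also works when the text is wrapped in further tags.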
