The example looks like this:
Extract the title and Cost columns from the page.
The code is as follows:
    from urllib.request import urlopen
    from bs4 import BeautifulSoup


    if __name__ == '__main__':
        # Download the page and parse it with Python's built-in HTML parser.
        html = urlopen("http://www.pythonscraping.com/pages/page3.html")
        bsObj = BeautifulSoup(html.read(), 'html.parser')

        # Every product row in the table has class="gift".
        rows = bsObj.find_all('tr', class_='gift')

        for row in rows:
            # Each row is already a Tag, so it can be searched directly
            # without re-parsing it as a new document.
            cells = row.find_all('td')
            # Column 0 holds the title and column 2 holds the cost.
            print(cells[0].get_text().rstrip().lstrip() + '\t' + cells[2].get_text().rstrip().lstrip())
A screenshot of the program's output is shown below:
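A note on the code: Python strings have no trim() method like Java's (or trimmed() in Qt). To remove leading and trailing whitespace from a string, the code chains
rstrip() and lstrip()
where r stands for right and l stands for left. Calling strip() on its own removes whitespace from both ends in a single step.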
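The difference between the three methods can be checked with a short sketch. The sample string below is a made-up value for illustration, not text taken from the scraped page.

```python
# Made-up sample text with whitespace on both sides (illustration only).
raw = "  \n Vegetable Basket \t "

left_only = raw.lstrip()            # removes leading whitespace only
right_only = raw.rstrip()           # removes trailing whitespace only
both = raw.strip()                  # removes whitespace from both ends
chained = raw.rstrip().lstrip()     # the form used in the scraper above

# repr() makes the remaining whitespace visible.
print(repr(left_only))
print(repr(right_only))
print(repr(both))
print(both == chained)
```

Since strip() and the chained call give the same result, strip() is usually the shorter choice when both ends need cleaning.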
find_all returns a ResultSet, which can be traversed with a for loop, as follows: