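Northbound capital has long been called "smart money" and is known for left-side trading, that is, building positions before a trend turns. Many institutions and large investors now watch northbound flows and adjust their own trades accordingly, which is more or less an open secret. The Hong Kong Stock Exchange publishes northbound shareholdings after each day's close, so we can scrape that data ourselves and copy northbound capital's homework.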
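We scrape the Shanghai-Hong Kong Stock Connect and Shenzhen-Hong Kong Stock Connect data separately, then merge the two dataframes and save the result as a CSV file.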
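As a minimal sketch of the merge step on its own, assume each market has already been parsed into a dataframe with the same columns. The rows below are made-up placeholders, and the `market` column and the `northbound_demo.csv` file name are additions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows standing in for the scraped Shanghai and Shenzhen tables
# (codes, names and numbers are invented for illustration only).
columns = ['code', 'stock', 'shareholding', 'shareholding%']
df_sh = pd.DataFrame([['90001', 'Example SH A', 1000, 1.25]], columns=columns)
df_sz = pd.DataFrame([['70001', 'Example SZ A', 2000, 0.50],
                      ['70002', 'Example SZ B', 3000, 2.75]], columns=columns)

# Tag each frame with its source market so rows stay traceable after merging.
df_sh['market'] = 'sh'
df_sz['market'] = 'sz'

# Stack the two frames vertically; ignore_index renumbers the rows from 0.
merged = pd.concat([df_sh, df_sz], ignore_index=True)

# Write the combined table without the index column
# (the temp-directory path is an assumption; use any path you like).
out_path = os.path.join(tempfile.gettempdir(), 'northbound_demo.csv')
merged.to_csv(out_path, encoding='utf-8', index=False)
```

Tagging the source market before concatenating is optional, but it makes it easy to split the combined file back into its two halves later.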
Enough talk; here is the code.
Code
import ssl
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Shareholding pages for Shanghai Connect (sh) and Shenzhen Connect (sz)
urls = ['https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh&t=sh',
        'https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh&t=sz']

# Output CSV path (placeholder name; change as needed)
fname = 'northbound.csv'

dates = []
df_list = []


def column(soup, td_class):
    """Return the text of every table cell whose <td> carries td_class."""
    return [td.find('div', class_='mobile-list-body').string
            for td in soup.find_all('td', class_=td_class)]


for url in urls:
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'lxml')

    # Date the page reports its holdings for
    date = soup.find('input', class_='input-searchDate')['value']
    dates.append(date)

    codes = column(soup, 'col-stock-code')
    names = column(soup, 'col-stock-name')
    # "1,234,567" -> 1234567
    shareholding = [int(s.replace(',', '')) for s in column(soup, 'col-shareholding')]
    # "1.25%" -> 1.25
    percent = [float(p.strip('%')) for p in column(soup, 'col-shareholding-percent')]

    df = pd.DataFrame(list(zip(codes, names, shareholding, percent)),
                      columns=['code', 'stock', 'shareholding', 'shareholding%'])
    df_list.append(df)

# Merge only when both pages report holdings for the same date
if dates[0] == dates[1]:
    # combine dataframe sz and dataframe sh
    output = pd.concat(df_list)
    output.to_csv(fname, encoding='utf-8', index=False)
else:
    print('failed to get northbound data from web')
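The parsing step can be tried offline before hitting the live site. The sketch below runs the same class-based lookups against a hand-written HTML fragment. The fragment only imitates the markup the script expects and its values are invented, so the real page may differ. The helper names `cell_texts` and `parse_holdings` are hypothetical, and the standard-library `html.parser` is swapped in for `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hand-written fragment imitating the table markup the script expects;
# the real page may differ, and the values here are invented.
SAMPLE_HTML = """
<table><tbody>
<tr>
  <td class="col-stock-code"><div class="mobile-list-body">90001</div></td>
  <td class="col-stock-name"><div class="mobile-list-body">Example A</div></td>
  <td class="col-shareholding"><div class="mobile-list-body">1,234,567</div></td>
  <td class="col-shareholding-percent"><div class="mobile-list-body">1.25%</div></td>
</tr>
</tbody></table>
"""


def cell_texts(soup, td_class):
    """Return the stripped text of every cell whose <td> has the given class."""
    cells = soup.find_all('td', class_=td_class)
    return [td.find('div', class_='mobile-list-body').get_text(strip=True)
            for td in cells]


def parse_holdings(html):
    """Parse holdings rows into a list of (code, name, shares, percent) tuples."""
    soup = BeautifulSoup(html, 'html.parser')  # stdlib parser, no lxml needed
    codes = cell_texts(soup, 'col-stock-code')
    names = cell_texts(soup, 'col-stock-name')
    # Drop thousands separators before converting to int
    shares = [int(s.replace(',', '')) for s in cell_texts(soup, 'col-shareholding')]
    # Drop the trailing percent sign before converting to float
    percents = [float(p.rstrip('%')) for p in cell_texts(soup, 'col-shareholding-percent')]
    return list(zip(codes, names, shares, percents))


rows = parse_holdings(SAMPLE_HTML)
print(rows)
```

`get_text(strip=True)` is used here instead of `.string` because `.string` returns `None` when a tag has more than one child, which would make the later `replace` and `strip` calls fail.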
Author: Yuki