I'm trying to get from Binance the historical prices of the futures contract that expires on 31 December 2021.
I have figured this out for perpetuals but am struggling with a futures contract that has a delivery date. The code for the perps is below:
df = pd.DataFrame(client.futures_historical_klines(
    symbol='BTCUSDT',
    interval='1d',
    start_str='2021-06-01',
    end_str='2021-06-30'
))
I assumed that replacing the symbol with BTCUSD_211231 or BTCUSDT_211231 would have done the trick, but unfortunately I get the below error message:
BinanceAPIException: APIError(code=-1121): Invalid symbol.
Any help is much appreciated!
Thanks
According to the Binance documentation, you can set contractType to select the desired contract.
The following options for contractType are available:
PERPETUAL
CURRENT_MONTH
NEXT_MONTH
CURRENT_QUARTER
NEXT_QUARTER
The following code works for me:
import binance
import pandas as pd
client = binance.Client()
r = client.futures_continous_klines(
    pair='BTCUSDT',
    contractType='CURRENT_QUARTER',
    interval='1d',
)
df = pd.DataFrame(r)
print(df)
Output:
0 1 2 3 4 5 6 7 8 9 10 11
0 1612310400000 36054.1 39000.0 31111.0 38664.1 688.738 1612396799999 21632849.1822 33908 336.944 10440572.5057 0
1 1612396800000 38664.1 43820.2 37445.4 38328.3 757.761 1612483199999 29584739.5781 16925 387.058 15156395.7362 0
2 1612483200000 38304.6 39955.4 37858.3 39848.1 383.410 1612569599999 14995639.3696 5563 183.214 7170636.7752 0
3 1612569600000 39876.5 42727.3 39437.6 41245.1 453.336 1612655999999 18775322.9609 8898 225.566 9347858.6798 0
4 1612656000000 41240.3 41642.5 38639.2 40592.0 428.693 1612742399999 17269756.5859 7553 202.850 8175165.3435 0
.. ... ... ... ... ... ... ... ... ... ... ... ..
242 1633219200000 48654.6 50479.1 48100.0 49325.1 2222.058 1633305599999 109305621.1491 27633 1106.026 54396950.3347 0
243 1633305600000 49325.1 50738.3 47961.5 50461.2 1808.321 1633391999999 89174286.5122 28367 925.258 45643475.0561 0
244 1633392000000 50491.5 53151.3 50245.3 52764.5 1860.870 1633478399999 95741544.2087 29105 921.897 47452449.1528 0
245 1633478400000 52769.4 57442.1 51606.6 56710.1 2431.580 1633564799999 132849081.0296 38013 1225.920 67014360.6873 0
246 1633564800000 56723.3 56739.9 54769.0 55645.2 1188.181 1633651199999 66322580.2665 21176 570.967 31878146.7653 0
[247 rows x 12 columns]
The meaning of each column in the above dataframe is documented in the Binance documentation (see right side, under "Response").
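If you want readable column names, you can set them yourself. A minimal sketch, assuming the usual kline field order (open time, OHLCV, close time, quote volume, trade count, taker buy volumes, ignore):

df.columns = [
    'open_time', 'open', 'high', 'low', 'close', 'volume',
    'close_time', 'quote_volume', 'trades',
    'taker_buy_volume', 'taker_buy_quote_volume', 'ignore',
]
df['open_time'] = pd.to_datetime(df['open_time'], unit='ms')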
I am having trouble pulling the prices in the Bid and Ask columns of this website: https://banggia.vps.com.vn/chung-khoan/derivative-VN30. At the moment I can only pull the name of the class, which is "price-table-content". How can I improve this code so that I can pull the prices in the Bid and Ask columns? Any help pulling these prices is greatly appreciated :)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.headless = True
path = 'C:/Users/quank/PycharmProjects/pythonProject2/chromedriver.exe'
driver = webdriver.Chrome(executable_path=path, options=options)

url = 'https://banggia.vps.com.vn/chung-khoan/derivative-VN30'
driver.get(url=url)

element = driver.find_elements_by_css_selector(
    '#root > div > div.content.undefined > div.derivative > table.price-table > tbody')
for i in element:
    print(i.get_attribute('outerHTML'))
Here is the result of running these codes
C:\Users\quank\PycharmProjects\Botthudulieu\venv\Scripts\python.exe
C:/Users/quank/PycharmProjects/pythonProject2/Botthudulieu.py
<tbody class="price-table-content"></tbody>
When you check the network activity you'll see that the data is retrieved from an API. So query the API directly rather than trying to scrape the site.
import requests
data = requests.get('https://bgapidatafeed.vps.com.vn/getpsalldatalsnapshot/VN30F2109,VN30F2110,VN30F2112,VN30F2203').json()
Or with pandas:
import pandas as pd
df = pd.read_json('https://bgapidatafeed.vps.com.vn/getpsalldatalsnapshot/VN30F2109,VN30F2110,VN30F2112,VN30F2203')
Resulting dataframe:
[4 rows x 31 columns; one row per requested contract: VN30F2109, VN30F2110, VN30F2112, VN30F2203. Columns: id, sym, mc, c, f, r, lastPrice, lastVolume, lot, avePrice, highPrice, lowPrice, fBVol, fBValue, fSVolume, fSValue, g1, g2, g3, g4, g5, g6, g7, mkStatus, listing_status, matureDate, closePrice, ptVol, oi, oichange, lv. The g1-g7 fields appear to hold the quote levels, e.g. 1420.00 / 37 / i (price, volume, flag).]
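If the goal is the Bid and Ask columns from the original page, one option is to split the quote-level fields. A minimal sketch, assuming g1-g3 hold the three best bid levels and g4-g6 the three best ask levels, each encoded as price|volume|flag (this mapping is inferred from the values above, not from official documentation):

import pandas as pd

url = 'https://bgapidatafeed.vps.com.vn/getpsalldatalsnapshot/VN30F2109,VN30F2110,VN30F2112,VN30F2203'
df = pd.read_json(url)

# Assumed: g1-g3 = best bid levels, g4-g6 = best ask levels, each "price|volume|flag"
quotes = df[['sym', 'g1', 'g2', 'g3', 'g4', 'g5', 'g6']].set_index('sym')
prices = quotes.apply(lambda col: pd.to_numeric(col.str.split('|').str[0], errors='coerce'))
prices.columns = ['bid1', 'bid2', 'bid3', 'ask1', 'ask2', 'ask3']
print(prices)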
I'm trying to get the following output but am stuck on the Total column.
Here is my code:
def generate_invoice_summary_info():
    file_path = 'output.xlsx'
    df = pd.read_excel(file_path, sheet_name='Invoice Details', usecols="E:F,I,L:M")
    df['Price'] = df['Price'].astype(float)
    # df['Total'] = df.groupby(["Invoice Cost Centre", "Invoice Category"]).agg({'Price': 'sum'}).reset_index()
    df = pd.pivot_table(df, index=["Invoice Cost Centre", "Invoice Category"],
                        columns=['Price', 'Reporting Frequency', 'Data Feed'],
                        aggfunc=len, fill_value=0, margins=True)
    print(df.head())
    df.to_excel('a.xlsx', sheet_name='Invoice Summary')
The above code produces the following output (90% right).
I am stuck on how to get the Total column.
The Total column should be calculated for each row as count * price:
Total = count * price
How can I do that in the pivot table?
I used the margins attribute, but it only gives the row sum of the counts.
Edit
print(df):
Price 10.4 ... 85.0 All
Reporting Frequency M ... M
Data Feed BWH EMAIL ... StarBOS
Invoice Cost Centre Invoice Category ...
D3TM Reseller Non Equity 21 10 ... 0 125
EQUITYEMP Baileys 0 7 ... 0 10
Energy NSW 16 0 ... 0 32
Far North Queensland 3 0 ... 0 6
South East 6 0 ... 0 16
Cooper & Dysart 0 0 ... 0 3
Petro Fuel & Lubricants 8 0 ... 0 20
South East QLD Fuels 0 0 ... 0 19
R1M Retail QLD 60 0 ... 0 867
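Given that pivot (with Price as the first column level), a minimal sketch of one way to get Total: multiply each count by the Price level of its column and sum across the row. Column names are taken from the code above; margins is left off so the 'All' column doesn't interfere.

pivot = pd.pivot_table(df, index=["Invoice Cost Centre", "Invoice Category"],
                       columns=['Price', 'Reporting Frequency', 'Data Feed'],
                       aggfunc=len, fill_value=0)
prices = pivot.columns.get_level_values('Price').astype(float)  # the price attached to each column
pivot['Total'] = pivot.mul(prices, axis=1).sum(axis=1)          # count * price, summed per row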
I have two dataframes with NHL hockey stats. One contains every game played by every team for the last ten years, and the other is where I want to fill it up with calculated values. Simply put, I want to take a metric from a team's first five games, sum it, and put that into the other df. I've trimmed my dfs below to exclude other stats and will only look at one stat.
df_all contains all of the games:
>>> df_all
season gameId playerTeam opposingTeam gameDate xGoalsFor xGoalsAgainst
1 2008 2008020001 NYR T.B 20081004 2.287 2.689
6 2008 2008020003 NYR T.B 20081005 1.793 0.916
11 2008 2008020010 NYR CHI 20081010 1.938 2.762
16 2008 2008020019 NYR PHI 20081011 3.030 3.020
21 2008 2008020034 NYR N.J 20081013 1.562 3.454
... ... ... ... ... ... ... ...
142576 2015 2015030185 L.A S.J 20160422 2.927 2.042
142581 2017 2017030171 L.A VGK 20180411 1.275 2.279
142586 2017 2017030172 L.A VGK 20180413 1.907 4.642
142591 2017 2017030173 L.A VGK 20180415 2.452 3.159
142596 2017 2017030174 L.A VGK 20180417 2.427 1.818
df_sum_all will contain the calculated stats; for now it has a bunch of empty columns:
>>> df_sum_all
season team xg5 xg10 xg15 xg20
0 2008 NYR 0 0 0 0
1 2009 NYR 0 0 0 0
2 2010 NYR 0 0 0 0
3 2011 NYR 0 0 0 0
4 2012 NYR 0 0 0 0
.. ... ... ... ... ... ...
327 2014 L.A 0 0 0 0
328 2015 L.A 0 0 0 0
329 2016 L.A 0 0 0 0
330 2017 L.A 0 0 0 0
331 2018 L.A 0 0 0 0
Here's my function for calculating the ratio of xGoalsFor and xGoalsAgainst.
def calcRatio(statfor, statagainst, games, season, team, statsdf):
    tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)]
                    .nsmallest(games, 'gameDate').eval(statfor).sum())
    tempAgainst = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)]
                        .nsmallest(games, 'gameDate').eval(statagainst).sum())
    tempRatio = tempFor / tempAgainst
    return tempRatio
I believe it's logical enough. I input the stat I want to make a ratio from, how many games to sum, the season and team to match on, and then where to get the stats from. I've tested these functions separately and know that I can filter just fine, and sum the stats, and so forth. Here's an example of a standalone implementation of the tempFor calculation:
>>> statsdf = df_all
>>> team = 'TOR'
>>> season = 2015
>>> games = 3
>>> tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statfor).sum())
>>> print(tempFor)
8.618
See? It returns a value. However I can't do the same across the whole dataframe. What am I missing? I thought the way this works is essentially for every row, it sets the 'xg5' column to the output of the calcRatio function, which uses that row's 'season' and 'team' to filter on df_all.
>>> df_sum_all['xg5'] = calcRatio('xGoalsFor','xGoalsAgainst',5,df_sum_all['season'], df_sum_all['team'], df_all)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in calcRatio
File "/home/sebastian/.local/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 1142, in wrapper
raise ValueError("Can only compare identically-labeled " "Series objects")
ValueError: Can only compare identically-labeled Series objects
Cheers, thanks for any help!
Update: I used iterrows() and it worked fine, so I must just not understand vectorization very well. It's the same function, though - why does it work in one fashion, but not another?
>>> emptyseries = []
>>> for index, row in df_sum_all.iterrows():
... emptyseries.append(calcRatio('xGoalsFor','xGoalsAgainst',5,row['season'],row['team'], df_all))
...
>>> df_sum_all['xg5'] = emptyseries
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
>>> df_sum_all
season team xg5 xg10 xg15 xg20
0 2008 NYR 0.826260 0 0 0
1 2009 NYR 1.288390 0 0 0
2 2010 NYR 0.915942 0 0 0
3 2011 NYR 0.730498 0 0 0
4 2012 NYR 0.980744 0 0 0
.. ... ... ... ... ... ...
327 2014 L.A 0.823998 0 0 0
328 2015 L.A 1.147412 0 0 0
329 2016 L.A 1.054947 0 0 0
330 2017 L.A 1.369005 0 0 0
331 2018 L.A 0.721411 0 0 0
[332 rows x 6 columns]
"ValueError: Can only compare identically-labeled Series objects"
tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statfor).sum())
tempAgainst = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statagainst).sum())
The inputs to these variables are:
team: df_sum_all['team']
season: df_sum_all['season']
statsdf: df_all
So in the expression (statsdf.playerTeam == team), pandas compares a Series from df_all against a Series from df_sum_all. The two Series have different indexes, i.e. they are not identically labeled, so you get the error above.
The function is written to receive a single team and a single season; that is why calling it once per row with iterrows() works, while passing whole columns in one call does not.
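A slightly tidier version of the same row-by-row call (still not vectorized, but it avoids building the list by hand) is to use apply:

df_sum_all['xg5'] = df_sum_all.apply(
    lambda row: calcRatio('xGoalsFor', 'xGoalsAgainst', 5, row['season'], row['team'], df_all),
    axis=1,
)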
I tried the code below for extracting the product name, year, and values from a table, but I am running into issues.
my code:
import requests
import pandas as pd
import pymysql

try:
    df = []
    dates1 = []
    try:
        url = 'http://cpmaindia.com/fiscal_custom_duty.php'
        html = requests.get(url).content
        tab_list = pd.read_html(html)
        tab = tab_list[0]
        tab.apply(lambda x: x.tolist(), axis=1)
        tab = tab.values.tolist()
        print(tab)
    except Exception as e:
        raise e
except Exception as e:
    raise e
I tried this but am not getting the desired output; I only want to parse the table.
Thanks
tab_list[0] produces the following:
print (tab)
0
0 <!-- function MM_swapImgRestore() { //v3.0 va...
1 Custom Duty Import Duty on Petrochemicals (%)...
2 <!-- body { \tmargin-left: 0px; \tmargin-top: ...
Did you mean to grab tab_list[8]?
Also, if you're using pandas to read in the table from html, there is no need to use requests:
import pandas as pd
url = 'http://cpmaindia.com/fiscal_custom_duty.php'
tab_list = pd.read_html(url)
table = tab_list[8]
table.columns = table.iloc[0,:]
table = table.iloc[1:,2:-1]
Output:
print (table)
0 Import Duty on Petrochemicals (%) ... Import Duty on Petrochemicals (%)
1 Product / Year - ... 16/17
2 Naphtha ... 5
3 Ethylene ... 2.5
4 Propylene ... 2.5
5 Butadiene ... 2.5
6 Benzene ... 2.5
7 Toluene ... 2.5
8 Mixed Xylene ... 2.5
9 Para Xylene ... 0
10 Ortho Xylene ... 0
11 LDPE ... 7.5
12 LLDPE ... 7.5
13 HDPE ... 7.5
14 PP ... 7.5
15 PVC ... 7.5
16 PS ... 7.5
17 EDC ... 2
18 VCM ... 2
19 Styrene ... 2
20 SBR ... 10
21 PBR ... 10
22 MEG ... 5
23 DMT ... 5
24 PTA ... 5
25 ACN ... 5
[25 rows x 7 columns]
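A possible follow-up, if the goal is product / year / value triples: promote the "Product / Year" row to the header and melt. This is only a sketch; the exact year labels are truncated in the printout above, so they are assumed here.

table.columns = table.iloc[0]          # row holding "Product / Year", ..., "16/17"
table = table.iloc[1:].rename(columns={'Product / Year': 'Product'})
long_df = table.melt(id_vars='Product', var_name='Year', value_name='Import Duty (%)')
print(long_df.head())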
So I am wondering how to scrape multiple websites/URLs and save the data to a CSV file. Right now I can only save the first page. I have tried many different ways but it doesn't seem to work. How can I save 5 pages in a CSV file and not only one?
import requests
import csv
from bs4 import BeautifulSoup
import pandas as pd
import re
from datetime import timedelta
import datetime
import time
urls = ['https://store.steampowered.com/search/?specials=1&page=1', 'https://store.steampowered.com/search/?specials=1&page=2', 'https://store.steampowered.com/search/?specials=1&page=3', 'https://store.steampowered.com/search/?specials=1&page=4','https://store.steampowered.com/search/?specials=1&page=5']
for url in urls:
    my_url = requests.get(url)
    html = my_url.content
    soup = BeautifulSoup(html, 'html.parser')
    data = []
    ts = time.time()
    st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    for container in soup.find_all('div', attrs={'class': 'responsive_search_name_combined'}):
        title = container.find('span', attrs={'class': 'title'}).text
        if container.find('span', attrs={'class': 'win'}):
            win = '1'
        else:
            win = '0'
        if container.find('span', attrs={'class': 'mac'}):
            mac = '1'
        else:
            mac = '0'
        if container.find('span', attrs={'class': 'linux'}):
            linux = '1'
        else:
            linux = '0'
        data.append({
            'Title': title.encode('utf-8'),
            'Time': st,
            'Win': win,
            'Mac': mac,
            'Linux': linux})

with open('data.csv', 'w', encoding='UTF-8', newline='') as f:
    fields = ['Title', 'Win', 'Mac', 'Linux', 'Time']
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(data)
testing = pd.read_csv('data.csv')
heading = testing.head(100)
discription = testing.describe()
print(heading)
The issue is that you re-initialize data on every url and only write it out after the very last iteration, so you always end up with just the data from the last url. You need the data to keep accumulating instead of being overwritten on each iteration:
import requests
import csv
from bs4 import BeautifulSoup
import pandas as pd
import re
from datetime import timedelta
import datetime
import time
urls = ['https://store.steampowered.com/search/?specials=1&page=1', 'https://store.steampowered.com/search/?specials=1&page=2', 'https://store.steampowered.com/search/?specials=1&page=3', 'https://store.steampowered.com/search/?specials=1&page=4','https://store.steampowered.com/search/?specials=1&page=5']
results_df = pd.DataFrame() #<-- initialize a results dataframe to dump/store the data you collect after each iteration
for url in urls:
    my_url = requests.get(url)
    html = my_url.content
    soup = BeautifulSoup(html, 'html.parser')
    data = []  # <-- your data list is "reset" after each iteration of your urls
    ts = time.time()
    st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    for container in soup.find_all('div', attrs={'class': 'responsive_search_name_combined'}):
        title = container.find('span', attrs={'class': 'title'}).text
        if container.find('span', attrs={'class': 'win'}):
            win = '1'
        else:
            win = '0'
        if container.find('span', attrs={'class': 'mac'}):
            mac = '1'
        else:
            mac = '0'
        if container.find('span', attrs={'class': 'linux'}):
            linux = '1'
        else:
            linux = '0'
        data.append({
            'Title': title,
            'Time': st,
            'Win': win,
            'Mac': mac,
            'Linux': linux})
    temp_df = pd.DataFrame(data)  # <-- temporarily store the data in a dataframe
    results_df = results_df.append(temp_df).reset_index(drop=True)  # <-- dump that data into the results dataframe

results_df.to_csv('data.csv', index=False)  # <-- write the results dataframe to csv
testing = pd.read_csv('data.csv')
heading = testing.head(100)
discription = testing.describe()
print(heading)
Output:
print (results_df)
Linux Mac ... Title Win
0 0 0 ... Tom Clancy's Rainbow Six® Siege 1
1 0 0 ... Tom Clancy's Rainbow Six® Siege 1
2 1 1 ... Total War: WARHAMMER II 1
3 0 0 ... Tom Clancy's Rainbow Six® Siege 1
4 1 1 ... Total War: WARHAMMER II 1
5 0 1 ... Frostpunk 1
6 0 0 ... Tom Clancy's Rainbow Six® Siege 1
7 1 1 ... Total War: WARHAMMER II 1
8 0 1 ... Frostpunk 1
9 1 1 ... Two Point Hospital 1
10 0 0 ... Tom Clancy's Rainbow Six® Siege 1
11 1 1 ... Total War: WARHAMMER II 1
12 0 1 ... Frostpunk 1
13 1 1 ... Two Point Hospital 1
14 0 0 ... Black Desert Online 1
15 0 0 ... Tom Clancy's Rainbow Six® Siege 1
16 1 1 ... Total War: WARHAMMER II 1
17 0 1 ... Frostpunk 1
18 1 1 ... Two Point Hospital 1
19 0 0 ... Black Desert Online 1
20 1 1 ... Kerbal Space Program 1
21 0 0 ... Tom Clancy's Rainbow Six® Siege 1
22 1 1 ... Total War: WARHAMMER II 1
23 0 1 ... Frostpunk 1
24 1 1 ... Two Point Hospital 1
25 0 0 ... Black Desert Online 1
26 1 1 ... Kerbal Space Program 1
27 1 1 ... BioShock Infinite 1
28 0 0 ... Tom Clancy's Rainbow Six® Siege 1
29 1 1 ... Total War: WARHAMMER II 1
... .. ... ... ..
1595 0 0 ... VEGAS Pro 14 Edit Steam Edition 1
1596 0 0 ... ABZU 1
1597 0 0 ... Sacred 2 Gold 1
1598 0 0 ... Sakura Bundle 1
1599 1 1 ... Distance 1
1600 0 0 ... LEGO® Batman™: The Videogame 1
1601 0 0 ... Sonic Forces 1
1602 0 0 ... The Stronghold Collection 1
1603 0 0 ... Miscreated 1
1604 0 0 ... Batman™: Arkham VR 1
1605 1 1 ... Shadowrun Returns 1
1606 0 0 ... Upgrade to VEGAS Pro 16 Edit 1
1607 0 0 ... Girl Hunter VS Zombie Bundle 1
1608 0 1 ... Football Manager 2019 Touch 1
1609 0 1 ... Total War: NAPOLEON - Definitive Edition 1
1610 1 1 ... SteamWorld Dig 2 1
1611 0 0 ... Condemned: Criminal Origins 1
1612 0 0 ... Company of Heroes 1
1613 0 0 ... LEGO® Batman™ 2: DC Super Heroes 1
1614 1 1 ... Euro Truck Simulator 2 Map Booster 1
1615 0 0 ... Sonic Adventure DX 1
1616 0 0 ... Worms Armageddon 1
1617 1 1 ... Unforeseen Incidents 1
1618 0 0 ... Warhammer 40,000: Space Marine Collection 1
1619 0 0 ... VEGAS Pro 14 Edit Steam Edition 1
1620 0 0 ... ABZU 1
1621 0 0 ... Sacred 2 Gold 1
1622 0 0 ... Sakura Bundle 1
1623 1 1 ... Distance 1
1624 0 0 ... Worms Revolution 1
[1625 rows x 5 columns]
So I was apparently very blind to my code; that can happen when you stare at it all day. All I actually had to do was to move the "data = []" above the for loop so it wouldn't reset every time.
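For reference, a minimal sketch of that fix: initialize data once before the loop and write the file once after it, so the rows from every page accumulate.

data = []  # initialized once, before the loop
for url in urls:
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    st = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
    for container in soup.find_all('div', attrs={'class': 'responsive_search_name_combined'}):
        data.append({
            'Title': container.find('span', attrs={'class': 'title'}).text,
            'Time': st,
            'Win': '1' if container.find('span', attrs={'class': 'win'}) else '0',
            'Mac': '1' if container.find('span', attrs={'class': 'mac'}) else '0',
            'Linux': '1' if container.find('span', attrs={'class': 'linux'}) else '0',
        })

with open('data.csv', 'w', encoding='UTF-8', newline='') as f:  # written once, after the loop
    writer = csv.DictWriter(f, fieldnames=['Title', 'Win', 'Mac', 'Linux', 'Time'])
    writer.writeheader()
    writer.writerows(data)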