I am new to Python. Can I please get some help from the experts here?
I wish to construct a DataFrame from the https://api.cryptowat.ch/markets/summaries JSON response, based on the following filter criteria:
- Kraken-listed currency pairs (please note: there are kraken-futures pairs, and I don't want those)
- Currency pairs quoted in USD only, i.e. aaveusd, adausd, ...
The ideal DataFrame I am looking for is shown below (somehow Excel loads this JSON perfectly; see the screenshot):
(screenshot: Dataframe_Excel_Screenshot)
resp = requests.get("https://api.cryptowat.ch/markets/summaries")
kraken_assets = resp.json()
df = pd.json_normalize(kraken_assets)
print(df)
Output:
result.binance-us:aaveusd.price.last result.binance-us:aaveusd.price.high ...
0 264.48 267.32 ...
[1 rows x 62688 columns]
When I just paste the link in a browser, the JSON response uses double quotes ("), but when I fetch it via Python code all double quotes are changed to single quotes ('). Any idea why? I tried json_normalize, but the response becomes [1 rows x 62688 columns], and I am not sure how to even work with one row of 62k columns, or how to extract the exact info in the DataFrame format I need (please see the Excel screenshot).
Any help is much appreciated. thank you!
The result JSON is a dict, so:
- load this into a dataframe
- decode columns into products & measures
- filter to required data
import requests
import pandas as pd
import numpy as np
# load results into a data frame
df = pd.json_normalize(requests.get("https://api.cryptowat.ch/markets/summaries").json()["result"])
# columns are encoded as product and measure. decode columns and transpose into rows that include product and measure
cols = np.array([c.split(".", 1) for c in df.columns]).T
df.columns = pd.MultiIndex.from_arrays(cols, names=["product","measure"])
df = df.T
# finally filter down to required data and structure measures as columns
df.loc[df.index.get_level_values("product").str[:7]=="kraken:"].unstack("measure").droplevel(0,1)
Sample output:

product          price.last  price.high   price.low  price.change.percentage  price.change.absolute       volume  volumeQuote
kraken:aaveaud       347.41      347.41      338.14                0.0274147                   9.27      1.77707      613.281
kraken:aavebtc     0.008154    0.008289    0.007874                0.0219326               0.000175      403.506       3.2797
kraken:aaveeth       0.1327      0.1346      0.1327              -0.00673653                -0.0009      287.113      38.3549
kraken:aaveeur       219.87      226.46      209.07                0.0331751                   7.06      1202.65       259205
kraken:aavegbp       191.55      191.55      179.43                 0.030559                   5.68      6.74476      1238.35
kraken:aaveusd       259.53      267.48      246.64                0.0339841                   8.53      3623.66       929624
kraken:adaaud       1.61792     1.64602       1.563                0.0211692                0.03354      5183.61      8366.21
kraken:adabtc     3.757e-05   3.776e-05   3.673e-05                0.0110334                4.1e-07       252403      9.41614
kraken:adaeth     0.0006108     0.00063   0.0006069               -0.0175326              -1.09e-05       590839      367.706
kraken:adaeur       1.01188     1.03087    0.977345                0.0209986               0.020811  1.99104e+06  1.98693e+06
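The sample output above still contains non-USD quotes, and the question asked for USD pairs only. One more filter on the product index finishes the job; a minimal sketch on hand-built sample data (the values are copied from the output above, with only two measure columns for brevity):

```python
import pandas as pd

# Hand-built sample shaped like the answer's final frame:
# a "product" index with one column per measure.
df = pd.DataFrame(
    {"price.last": [259.53, 0.008154, 1.01188],
     "volume": [3623.66, 403.506, 1991040.0]},
    index=pd.Index(["kraken:aaveusd", "kraken:aavebtc", "kraken:adaeur"],
                   name="product"),
)

# Keep only kraken pairs quoted in USD
usd = df[df.index.str.startswith("kraken:") & df.index.str.endswith("usd")]
print(usd)
```

The same two `str` conditions can be applied to the real frame produced by the answer's code.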
Hello. Try the code below. I have understood the structure of the dataset and modified it to get the desired output.
resp = requests.get("https://api.cryptowat.ch/markets/summaries")
a = resp.json()
# creating a DataFrame from key="result"
da = pd.DataFrame(a['result'])
# using transpose to get the required columns and index
da = da.transpose()
# the "price" column contains a dict which needs to become separate columns on the data frame
db = da['price'].to_dict()
da.drop('price', axis=1, inplace=True)
# initialising a separate DataFrame for price
z = pd.DataFrame({})
for i in db.keys():
    i = pd.DataFrame(db[i], index=[i])
    z = pd.concat([z, i], axis=0)
da = pd.concat([z, da], axis=1)
da.to_excel('nex.xlsx')
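The concat loop can also be collapsed into one step: a Series of dicts expands into columns with `apply(pd.Series)`. A sketch on sample data shaped like `a['result']` (the products and values here are illustrative):

```python
import pandas as pd

# Sample shaped like a["result"]: product -> {"price": {...}, other measures}
result = {
    "kraken:aaveusd": {"price": {"last": 259.53, "high": 267.48}, "volume": 3623.66},
    "kraken:aavebtc": {"price": {"last": 0.008154, "high": 0.008289}, "volume": 403.506},
}

da = pd.DataFrame(result).transpose()
price = da.pop("price").apply(pd.Series)  # expand the nested dicts into columns
da = pd.concat([price, da], axis=1)
print(da)
```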
I have a pandas dataframe with a text/tuple column as shown in the attached screenshot.
Below is also an example of the data in the column:
Column title - POLYGON_WKT_TEXT
POLYGON ( (-105.01884585094353 39.62333777125623,
-105.01851820478282 39.62333686626711,
-105.0185192106112 39.62315273546345,
-105.01888004910847 39.6231533822067,
-105.01888071966073 39.62322879067289,
-105.01884585094353 39.62322827417681,
-105.01884585094353 39.62333777125623) )
POLYGON ((-106.83036867299995 39.19331872400005,
-106.83027684299998 39.19329631000005,
-106.83034537399999 39.19313263400005,
-106.83060769199994 39.19318738000004,
-106.83056232299998 39.19329573700003,
-106.83052058199996 39.19328554900005,
-106.83048588899999 39.19336841100005,
-106.83036066599999 39.19333784600008,
-106.83036867299995 39.19331872400005))
...
...
I would like to have this field in the format below:
column name - POLYGON_WKT_TXT
[(-105.01884585094353 39.62333777125623), (-105.01851820478282 39.62333686626711), ...(-106.83036867299995 39.19331872400005)]
I have so far tried to split on the comma (",") into multiple columns, but the number of values per row varies, which ends up making my solution inefficient.
Thanks in advance for your elegant way to solve this task.
If I understand your question, this is very simple:
- create a dataframe from the WKT text provided in the question
- from this, create a list of tuples; this is as simple as using shapely.wkt.loads (https://shapely.readthedocs.io/en/stable/manual.html#shapely.wkt.loads) and exterior.coords
I have provided the output as an image as it's not formatting well as markdown.
import shapely.wkt
import pandas as pd
df = pd.DataFrame({"polygon_wkt_txt":["""POLYGON ( (-105.01884585094353 39.62333777125623,
-105.01851820478282 39.62333686626711,
-105.0185192106112 39.62315273546345,
-105.01888004910847 39.6231533822067,
-105.01888071966073 39.62322879067289,
-105.01884585094353 39.62322827417681,
-105.01884585094353 39.62333777125623) )""",
"""POLYGON ((-106.83036867299995 39.19331872400005,
-106.83027684299998 39.19329631000005,
-106.83034537399999 39.19313263400005,
-106.83060769199994 39.19318738000004,
-106.83056232299998 39.19329573700003,
-106.83052058199996 39.19328554900005,
-106.83048588899999 39.19336841100005,
-106.83036066599999 39.19333784600008,
-106.83036867299995 39.19331872400005))"""]})
df["tuple_list"] = df["polygon_wkt_txt"].apply(lambda txt: list(shapely.wkt.loads(txt).exterior.coords))
df
I am working in Colab with the following line of code:
json line:
'company_size': [51, 200]
python code:
import pandas as pd
data = response.json()
result = pd.json_normalize(data, 'company_size')
result
output:

     0
0   51
1  200
What I want is the info inside the box brackets on the json line to be displayed into two different columns named "Size Min" and "Size Max" respectively.
Desired Output:

   Size Min  Size Max
0        51       200
I am very new to coding, and I couldn't find a proper solution for this. Can anybody help me?
You can transpose your column 0 and then rename the columns accordingly. Note that `from pandas.io.json import json_normalize` is deprecated; use `pd.json_normalize` instead:
import pandas as pd

data = response.json()
result = pd.json_normalize(data, 'company_size').T
result.columns = ['Size Min', 'Size Max']
result
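Alternatively, since `company_size` is just a two-element list, the row can be built directly without the transpose. A sketch with a stand-in `data` dict in place of `response.json()`:

```python
import pandas as pd

# Stand-in for data = response.json()
data = {"company_size": [51, 200]}

# Wrap the list in another list so it becomes one row with two columns
result = pd.DataFrame([data["company_size"]], columns=["Size Min", "Size Max"])
print(result)
```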
I'm trying to get descriptive statistics for a column of data (the tb column, which is a list of numbers) for every individual (i.e., each ID). Normally I'd use a for i in range(len(list)) statement, but since the ID is not a number I'm unsure how to do that. Any tips would be helpful! The code included below gets me descriptive statistics for the entire tb column, instead of for the tb data of each individual in the ID list.
df = pd.DataFrame(pd.read_csv("SurgeryTpref.csv")) #importing data
df.columns = ['date', 'time', 'tb', 'ID','before_after'] #column headers
df.to_numpy()
import pandas as pd
# read the data in with
df = pd.read_clipboard(sep=',')
# data
,date,time,tb,ID,before_after
0,6/29/20,4:15:33 PM,37.1,SR10,after
1,6/29/20,4:17:33 PM,38.1,SR10,after
2,6/29/20,4:19:33 PM,37.8,SR10,after
3,6/29/20,4:21:33 PM,37.5,SR10,after
4,6/29/20,4:23:33 PM,38.1,SR10,after
5,6/29/20,4:25:33 PM,38.5,SR10,after
6,6/29/20,4:27:33 PM,38.6,SR10,after
7,6/29/20,4:29:33 PM,37.6,SR10,after
8,6/29/20,4:31:33 PM,35.5,SR10,after
9,6/29/20,4:33:33 PM,34.7,SR10,after
summary = []
for individual in (ID):
    vals = df['tb'].describe()
    summary.append(vals)
print(summary)
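The loop above describes the whole tb column on every pass, and `ID` is never defined. Grouping on the ID column gives per-individual statistics in one call; a minimal sketch on sample data shaped like the CSV above (values illustrative):

```python
import pandas as pd

# Small sample shaped like SurgeryTpref.csv (values illustrative)
df = pd.DataFrame({
    "date": ["6/29/20"] * 4,
    "time": ["4:15:33 PM", "4:17:33 PM", "4:19:33 PM", "4:21:33 PM"],
    "tb": [37.1, 38.1, 36.0, 35.5],
    "ID": ["SR10", "SR10", "SR11", "SR11"],
    "before_after": ["after"] * 4,
})

# describe() per individual: one row of statistics for each ID
summary = df.groupby("ID")["tb"].describe()
print(summary)
```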
The URL returns data in JSON format. I would like to arrange the data by C: it is a long list and I want to sort from the highest C value to the smallest. However, I am unable to read the JSON data into a Pandas DataFrame.
url ='http://www.bng.com/Jso/JsonstData?qryl'
pd.read_csv(url, index_col=[0,1])
>>>Empty DataFrameColumns: [items:[{ID:0, N:'2ndChance W200123', SIP:'', NC:'CDWW', R:'', I:'', M:'', LT:0.009, C:0.001, VL:0.100, BV:2000.000, B:'0.008', S:'0.009', SV:6186.400, O:0.009, H:0.009, L:0.009, V:0.900, SC:'5', PV:0.008, P:12.500, BL:'100', P_:'X', V_:''}, {ID:1, N:'3Cnergy', SIP:''.1, NC:'502', R:''.1, I:''.1, M:'t', LT:0, C:0, VL:0.000, BV:31.000, B:'0.021', S:'0.032', SV:22.000, O:0, H:0, L:0, V:0.000, SC:'2', PV:0.021, P:0, BL:'100'.1, P_:'X'.1, V_:''}.1, {ID:2, N:'3Cnergy W200528', SIP:''.2, NC:'1E0W', R:''.2, I:''.2, M:'t'.1, LT:0.1, C:0.1, VL:0.000.1, BV:0, B:'', S:'0.004', SV:50.000, O:0.1, H:0.1, L:0.1, V:0.000.1, SC:'5'.1, PV:0.002, P:0.1, BL:'100'.2, P_:'X'.2, V_:''}.2, {ID:3, N:'800 Super', SIP:''.3, NC:'5TG', R:''.3, I:''.3, M:'t'.2, LT:1.100, C:0.000, VL:35.200, BV:8.100, B:'1.100', S:'1.110', SV:8.700, O:1.110, H:1.110, L:1.100, V:38902.000, SC:'A', PV:1.100, P:0.000, BL:'100'.3, P_:'X'.3, V_:''}.3, {ID:4, N:'8Telecom^', SIP:''.4, NC:'AZG', ...]
As you can see, that is not a correct Pandas DataFrame, and I just can't do anything to read the data inside.
Please advise.
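read_csv cannot parse a JSON payload. Judging from the pasted output, the response appears to be a dict with an "items" list whose records carry the C key; that structure is an assumption based on the dump above, and the sample records below are illustrative. If it holds, the list goes straight into a DataFrame and can be sorted:

```python
import pandas as pd
# import requests  # for the real endpoint

# Stand-in for requests.get(url).json(); the "items" shape is assumed
# from the output pasted in the question.
payload = {"items": [
    {"ID": 0, "N": "2ndChance W200123", "C": 0.001},
    {"ID": 3, "N": "800 Super", "C": 0.000},
    {"ID": 4, "N": "8Telecom^", "C": 0.005},
]}

# One row per record, sorted from the highest C value to the smallest
df = pd.DataFrame(payload["items"]).sort_values("C", ascending=False)
print(df)
```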
I am receiving an object array after applying re.findall for links and hashtags on Tweets data. My data looks like:
b=['https://t.co/1u0dkzq2dV', 'https://t.co/3XIZ0SN05Q']
['https://t.co/CJZWjaBfJU']
['https://t.co/4GMhoXhBQO', 'https://t.co/0V']
['https://t.co/Erutsftlnq']
['https://t.co/86VvLJEzvG', 'https://t.co/zCYv5WcFDS']
Now I want to split it into columns. I am using the following:
df = pd.DataFrame(b.str.split(',', 1).tolist(), columns=['flips', 'row'])
But it is not working, I guess because of the weird datatype. I tried a few other solutions as well; nothing worked. This is what I am expecting, two separate columns:
https://t.co/1u0dkzq2dV https://t.co/3XIZ0SN05Q
https://t.co/CJZWjaBfJU
https://t.co/4GMhoXhBQO https://t.co/0V
https://t.co/Erutsftlnq
https://t.co/86VvLJEzvG
It's not clear from your question what exactly is part of your data (does it include the square brackets and single quotes?). In any case, the pandas read_csv function is very versatile and can handle ragged data:
from io import StringIO

import pandas as pd

raw_data = """
['https://t.co/1u0dkzq2dV', 'https://t.co/3XIZ0SN05Q']
['https://t.co/CJZWjaBfJU']
['https://t.co/4GMhoXhBQO', 'https://t.co/0V']
['https://t.co/Erutsftlnq']
['https://t.co/86VvLJEzvG', 'https://t.co/zCYv5WcFDS']
"""
# You'll probably replace the StringIO part with the filename of your data.
df = pd.read_csv(StringIO(raw_data), header=None, names=('flips', 'row'))
# Get rid of the square brackets, single quotes, and surrounding spaces
for col in ('flips', 'row'):
    df[col] = df[col].str.strip(" []'")
df
Output:
flips row
0 https://t.co/1u0dkzq2dV https://t.co/3XIZ0SN05Q
1 https://t.co/CJZWjaBfJU NaN
2 https://t.co/4GMhoXhBQO https://t.co/0V
3 https://t.co/Erutsftlnq NaN
4 https://t.co/86VvLJEzvG https://t.co/zCYv5WcFDS
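If b is already a list of Python lists (as per-tweet re.findall results would be) rather than raw text, the DataFrame constructor handles the ragged rows directly, padding short rows with NaN:

```python
import pandas as pd

# re.findall output: one list of links per tweet, of varying length
b = [
    ["https://t.co/1u0dkzq2dV", "https://t.co/3XIZ0SN05Q"],
    ["https://t.co/CJZWjaBfJU"],
    ["https://t.co/4GMhoXhBQO", "https://t.co/0V"],
]

df = pd.DataFrame(b, columns=["flips", "row"])
print(df)
```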