How to convert array to dataframe - python

My output from previous code.
from pandas import DataFrame
cursor.execute("SELECT * FROM DOCTOR")
row = cursor.fetchall()
row
how to convert array into data frame. expected output is shown in the below picture.
how to convert array to dataframe.

import numpy as np
df = DataFrame(data=np.array(row),
index=np.arange(len(row)),
columns=['Doctor_id','Doctor_Name','Doctor_Spl'])

You appear to be using a plain cursor from a SQL client
Pandas has its own read_sql function that wraps a SQL client library and returns a Dataframe
You can also do most SQL queries directly from Dataframe objects, if you were to start with read_sql_table - https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#compare-with-sql

converting the fetched row to numpy array and using that to convert to data frame
import numpy as np
import pandas as pd
cursor.execute("SELECT * FROM DOCTOR")
row = cursor.fetchall()
numpy_array_row = np.array(row)
data_frame = pd.DataFrame(numpy_array_row, columns=['Doctor_id', 'Doctor_Name', 'Doctor_Spl'])
# print your data_frame and voila :)

This can be done using the pandas.DataFrame() method.
In your case it would look like this:
DataFrame(row, columns=('Doctor_id', 'Doctor_Name', 'Doctor_Spl'))

One possible solution:
rows = [
("D001", "TIM1", "orthopedic"),
("D002", "Ram", "General medicine"),
("D003", "Sarala", "gynaecology"),
("D004", "TIM", "Ayurvedic"),
("D005", "viny", "Hemeopathy"),
]
df = pd.DataFrame(rows, columns=["Doctor_id", "Doctor_Name", "Doctor_Spl"])
df.index += 1 # <-- if you need index starting from `1`
print(df)
Prints:
Doctor_id Doctor_Name Doctor_Spl
1 D001 TIM1 orthopedic
2 D002 Ram General medicine
3 D003 Sarala gynaecology
4 D004 TIM Ayurvedic
5 D005 viny Hemeopathy

Related

Python loop to split a dataframe by a particular row/index for INSERT into SQL server

I have a dataframe with a couple thousand rows, I want to create a loop to split the entire dataframe by 90 rows each sub-dataframe and INSERT each subset into SQL server.
my dummy way to split it by a fixed number 90 rows which is not efficient
df1 = df.loc[0:89,:]
df1.to_sql("tableName", schema='dbo', con=engine, method='multi')
df2 = df.loc[90:179,:]
df2.to_sql("tableName", schema='dbo', con=engine, method='multi')
......
sample data
df = pd.DataFrame(np.random.randint(0,100,size=(2000, 4)), columns = ['Name', 'Age','food','tree']) #size control how many rows
because of my sql server has the limitation, I can only insert 90 rows for each Bulk Insert.
np.array_split(arr, indices)
Split an array into multiple sub-arrays using the given indices.
for chunk in np.array_split(df, range(90, len(df), 90)):
INSERT_sql()
Here's a pretty verbose approach. In this case, taking your sample dataframe, it is sliced in increments of 90 rows. The first block will be 0-89, then 90-179, 180-269, etc.
import pandas as pd
import numpy as np
import math
df = pd.DataFrame(np.random.randint(0,100,size=(2000, 4)), columns = ['Name', 'Age','food','tree']) #size control how many rows
def slice_df(dataframe, row_count):
num_rows = len(dataframe)
num_blocks = math.ceil(num_rows / row_count)
for i in range(num_blocks):
df = dataframe[(i * row_count) : ((i * row_count)+row_count-1)]
# Do your insert command here
slice_df(df, 90)
I believe this should work:
for i in range(0,len(df),90):
df.iloc[i:i+90].to_sql("tableName", schema='dbo', con=engine, method='multi')

Python Pandas Dataframe from API JSON Response >>

I am new to Python, Can i please seek some help from experts here?
I wish to construct a dataframe from https://api.cryptowat.ch/markets/summaries JSON response.
based on following filter criteria
Kraken listed currency pairs (Please take note, there are kraken-futures i dont want those)
Currency paired with USD only, i.e aaveusd, adausd....
Ideal Dataframe i am looking for is (somehow excel loads this json perfectly screenshot below)
Dataframe_Excel_Screenshot
resp = requests.get(https://api.cryptowat.ch/markets/summaries) kraken_assets = resp.json() df = pd.json_normalize(kraken_assets) print(df)
Output:
result.binance-us:aaveusd.price.last result.binance-us:aaveusd.price.high ...
0 264.48 267.32 ...
[1 rows x 62688 columns]
When i just paste the link in browser JSON response is with double quotes ("), but when i get it via python code. All double quotes (") are changed to single quotes (') any idea why?. Though I tried to solve it with json_normalize but then response is changed to [1 rows x 62688 columns]. i am not sure how do i even go about working with 1 row with 62k columns. i dont know how to extract exact info in the dataframe format i need (please see excel screenshot).
Any help is much appreciated. thank you!
the result JSON is a dict
load this into a dataframe
decode columns into products & measures
filter to required data
import requests
import pandas as pd
import numpy as np
# load results into a data frame
df = pd.json_normalize(requests.get("https://api.cryptowat.ch/markets/summaries").json()["result"])
# columns are encoded as product and measure. decode columns and transpose into rows that include product and measure
cols = np.array([c.split(".", 1) for c in df.columns]).T
df.columns = pd.MultiIndex.from_arrays(cols, names=["product","measure"])
df = df.T
# finally filter down to required data and structure measures as columns
df.loc[df.index.get_level_values("product").str[:7]=="kraken:"].unstack("measure").droplevel(0,1)
sample output
product
price.last
price.high
price.low
price.change.percentage
price.change.absolute
volume
volumeQuote
kraken:aaveaud
347.41
347.41
338.14
0.0274147
9.27
1.77707
613.281
kraken:aavebtc
0.008154
0.008289
0.007874
0.0219326
0.000175
403.506
3.2797
kraken:aaveeth
0.1327
0.1346
0.1327
-0.00673653
-0.0009
287.113
38.3549
kraken:aaveeur
219.87
226.46
209.07
0.0331751
7.06
1202.65
259205
kraken:aavegbp
191.55
191.55
179.43
0.030559
5.68
6.74476
1238.35
kraken:aaveusd
259.53
267.48
246.64
0.0339841
8.53
3623.66
929624
kraken:adaaud
1.61792
1.64602
1.563
0.0211692
0.03354
5183.61
8366.21
kraken:adabtc
3.757e-05
3.776e-05
3.673e-05
0.0110334
4.1e-07
252403
9.41614
kraken:adaeth
0.0006108
0.00063
0.0006069
-0.0175326
-1.09e-05
590839
367.706
kraken:adaeur
1.01188
1.03087
0.977345
0.0209986
0.020811
1.99104e+06
1.98693e+06
Hello Try the below code. I have understood the structure of the Dataset and modified to get the desired output.
`
resp = requests.get("https://api.cryptowat.ch/markets/summaries")
a=resp.json()
a['result']
#creating Dataframe froom key=result
da=pd.DataFrame(a['result'])
#using Transpose to get required Columns and Index
da=da.transpose()
#price columns contains a dict which need to be seperate Columns on the data frame
db=da['price'].to_dict()
da.drop('price', axis=1, inplace=True)
#intialising seperate Data frame for price
z=pd.DataFrame({})
for i in db.keys():
i=pd.DataFrame(db[i], index=[i])
z=pd.concat([z,i], axis=0 )
da=pd.concat([z, da], axis=1)
da.to_excel('nex.xlsx')`

How to form a matrix of distances between sites in Python?

I have all the data (sites and distances already).
Now I have to form a string matrix to use as an input for another python script.
I have sites and distances as (returned from a query, delimited as here):
A|B|5
A|C|3
A|D|9
B|C|7
B|D|2
C|D|6
How to create this kind of matrix?
A|B|C|D
A|0|5|3|9
B|5|0|7|2
C|3|7|0|6
D|9|2|6|0
This has to be returned as a string from python and I'll have more than 1000 sites, so it should be optimized for such size.
Thanks
I have no doubt it could be done in a cleaner way (because Python).
I will do some more research later on but I do want you to have something to start with, so here it is.
import pandas as pd
data = [
('A','B',5)
,('A','C',3)
,('A','D',9)
,('B','C',7)
,('B','D',2)
,('C','D',6)
]
data.extend([(y,x,val) for x,y,val in data])
df = pd.DataFrame(data, columns=['x','y','val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)
Here is a demo for 1000x1000 (take about 2 seconds)
import pandas as pd, itertools as it
data = [(x,y,val) for val,(x,y) in enumerate(it.combinations(range(1000),2))]
data.extend([(y,x,val) for x,y,val in data])
df = pd.DataFrame(data, columns=['x','y','val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)

How to pass an array in python pandas to plot two axes?

I am trying to create an XY chart using Python and the Pygal library. The source data is contained in a CSV file with three columns; ID, Portfolio and Value. Unfortunately I can only plot one axis and I suspect it's an issue with the array. Can anyone point me in the right direction? Do I need to use numpy? Thank you!
import pygal
import pandas as pd
data = pd.read_csv("profit.csv")
data.columns = ["ID", "Portfolio", "Value"]
xy_chart = pygal.XY()
xy_chart.add('Portfolio', data['Portfolio','Value'] << I suspect this is wrong
)
xy_chart.render_in_browser()
With
import pygal
import pandas as pd
data = pd.read_csv("profit.csv")
data.columns = ["ID", "Portfolio", "Value"]
xy_chart = pygal.XY()
xy_chart.add('Portfolio', data['Portfolio']
)
xy_chart.render_in_browser()
I get:
A graph with a series of horizontal data points/values; i.e. it has the X values but no Y values.
With:
import pygal
import pandas as pd
data = pd.read_csv("profit.csv")
data.columns = ["ID", "Portfolio", "Value"]
xy_chart = pygal.XY()
xy_chart.add('Portfolio', data['Portfolio','Value']
)
xy_chart.render_in_browser()
I get:
KeyError: ('Portfolio', 'Value')
Sample data:
ID Portfolio Value
1 1 -2560.042036
2 2 1208.106958
3 3 5702.386949
4 4 -8827.63913
5 5 -3881.665733
6 6 5951.602484
Maybe a little late here, but I just did something similar. Your second example requires multiple columns to be handed in as a array and then the DataFrame you get back needs to be converted into a list of tuples.
import pygal
import pandas as pd
data = pd.read_csv("profit.csv")
data.columns = ["ID", "Portfolio", "Value"]
points = data[['Portfolio','Value']].to_records(index=False).tolist()
xy_chart = pygal.XY()
xy_chart.add('Portfolio', points)
xy_chart.render_in_browser()
There may be a more elegant use of the pandas or pygal API to get the columns into a list of tuples.

how to convert link list to matrix in python

my input data looks like(input.txt):
AGAP2 TCGA-BL-A0C8-01A-11R-A10U-07 66.7328
AGAP2 TCGA-BL-A13I-01A-11R-A13Y-07 186.8366
AGAP3 TCGA-BL-A13J-01A-11R-A10U-07 183.3767
AGAP3 TCGA-BL-A3JM-01A-12R-A21D-07 33.2927
AGAP3 TCGA-BT-A0S7-01A-11R-A10U-07 57.9040
AGAP3 TCGA-BT-A0YX-01A-11R-A10U-07 99.8540
AGAP4 TCGA-BT-A20J-01A-11R-A14Y-07 88.8278
AGAP4 TCGA-BT-A20N-01A-11R-A14Y-07 129.7021
i want the output.txt looks like :
TCGA-BL-A0C8-01A-11R-A10U-07 TCGA-BL-A13I-01A-11R-A13Y-07 ...
AGAP2 66.7328 186.8366
AGAP3 0 0
Using pandas: read csv, create pivot and write csv.
import pandas as pd
df = pd.read_table("input.txt", names="xy", sep=r'\s+')
# reset index first - we need named column
new = df.reset_index().pivot(index="index", columns='x', values='y')
new.fillna(0, inplace=True)
new.to_csv("output.csv", sep='\t') # tab separated
Reshaping and Pivot Tables
EDIT: filling empty values

Categories