I am trying to import a dataframe from an xlsx file to Python and then convert this dataframe to a dictionary. This is how my Excel file looks like:
A B
1 a b
2 c d
where A and B are names of columns and 1 and 2 are names of rows.
I want to convert the data frame to a dictionary in python, using pandas. My code is pretty simple:
import pandas as pd
my_dict = pd.read_excel(‘.\inflation.xlsx’, sheet_name = ‘Sheet2’, index_col=0).to_dict()
print(my_dict)
What I want to get is:
{‘a’:’b’, ‘c’:’d’}
But what I get is:
{‘b’:{‘c’:’d’}}
What might be the issue?
This does what is requested:
import pandas as pd
d = pd.read_excel(‘.\inflation.xlsx’, sheet_name = ‘Sheet2’,index_col=0,header=None).transpose().to_dict('records')[0]
print(d)
Output:
{'a': 'b', 'c': 'd'}
The to_dict() function takes an orient parameter which specifies how the data will be manipulated. There are other options if you have more rows.
This should work
import pandas as pd
my_dict = pd.read_excel(‘.\inflation.xlsx’, sheet_name = ‘Sheet2’,header = 0 index_col=None).to_dict('records')
print(my_dict)
Related
im using pandas with excel and i would like to get the letter of the header in excel searching for column name.
here´s an example
i would like to do something LIKE this: df.columns.get_loc("SR Status") and i would like to return: "D"
i have already done this:
import pandas
df = pd.read_excel("file.xls")
df.columns.get_loc("SR Status")
and let´s assume data will NOT always be in the same place.
sometimes it might be at header "A" but other time could be on other place
thanks in advance
You can use get_column_letter:
import pandas as pd
from openpyxl.utils import get_column_letter
df = pd.read_excel('data.xlsx', usecols='D:F')
offset = 4 # D
col = get_column_letter(df.columns.get_loc('SR Status') + offset)
print(col) # Output: D
I am working with the bioservices package in python and I want to take the output of this function and put it into a dataframe using pandas
from bioservices import UniProt
u = UniProt(verbose=False)
d = u.search("yourlist:M20211203F248CABF64506F29A91F8037F07B67D133A278O", frmt="tab", limit=5,
columns="id, entry name")
print(d)
this is the result I am getting, almost like a neat little table
The problem however is I cannot work with the data in this form and I want to put it into a dataframe using pandas
trying this code below does not work and it only returns the error "ValueError: DataFrame constructor not properly called"
import pandas as pd
df = pd.DataFrame(columns= ['Entry','Entry name'],
data=d)
print(df)
Use pd.read_csv, after encapsulating your output in a StringIO (to present a file-like interface):
import io
import pandas as pd
data = 'Entry\tEntry name\na\t1\nb\t2'
io_data = io.StringIO(data)
df = pd.read_csv(io_data, sep='\t')
print(df)
The output is a dataframe:
Entry Entry name
0 a 1
1 b 2
Sample data:
from bioservices import UniProt
import io
u = UniProt(verbose=False)
d = u.search("yourlist:M20211203F248CABF64506F29A91F8037F07B67D133A278O", frmt="tab", limit=5,
columns="id, entry name")
#print(d)
df = pd.read_csv(io.StringIO(d), sep='\t')
print(df)
Entry Entry name
0 Q8TAS1 UHMK1_HUMAN
1 P35916 VGFR3_HUMAN
2 Q96SB4 SRPK1_HUMAN
3 Q6P3W7 SCYL2_HUMAN
4 Q9UKI8 TLK1_HUMAN
I am trying to read an excel file this way :
newFile = pd.ExcelFile(PATH\FileName.xlsx)
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)
which throws an error that says two arguments expected, I don't know what the second argument is and also what I am trying to achieve here is to convert an Excel file to a DataFrame, Am I doing it the right way? or is there any other way to do this using pandas?
Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.
>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
Tid dummy1 dummy2 dummy3 dummy4 dummy5 \
0 2006-09-01 00:00:00 0 5.894611 0.605211 3.842871 8.265307
1 2006-09-01 01:00:00 0 5.712107 0.605211 3.416617 8.301360
2 2006-09-01 02:00:00 0 5.105300 0.605211 3.090865 8.335395
3 2006-09-01 03:00:00 0 4.098209 0.605211 3.198452 8.170187
4 2006-09-01 04:00:00 0 3.338196 0.605211 2.970015 7.765058
dummy6 dummy7 dummy8 dummy9
0 0.623354 0 2.579108 2.681728
1 0.554211 0 7.210000 3.028614
2 0.567841 0 6.940000 3.644147
3 0.581470 0 6.630000 4.016155
4 0.595100 0 6.350000 3.974442
What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:
>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
This is much simple and easy way.
import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)
Check out documentation full details.
FutureWarning: The sheetname keyword is deprecated for newer Pandas versions, use sheet_name instead.
Thought i should add here, that if you want to access rows or columns to loop through them, you do this:
import pandas as pd
# open the file
xlsx = pd.ExcelFile("PATH\FileName.xlsx")
# get the first sheet as an object
sheet1 = xlsx.parse(0)
# get the first column as a list you can loop through
# where the is 0 in the code below change to the row or column number you want
column = sheet1.icol(0).real
# get the first row as a list you can loop through
row = sheet1.irow(0).real
Edit:
The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:,i] to get the i-th col and sheet1.iloc[i,:] to get the i-th row.
I think this should satisfy your need:
import pandas as pd
# Read the excel sheet to pandas dataframe
df = pd.read_excel("PATH\FileName.xlsx", sheet_name=0) #corrected argument name
Here is an updated method with syntax that is more common in python code. It also prevents you from opening the same file multiple times.
import pandas as pd
sheet1, sheet2 = None, None
with pd.ExcelFile("PATH\FileName.xlsx") as reader:
sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
sheet2 = pd.read_excel(reader, sheet_name='Sheet2')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
You just need to feed the path to your file to pd.read_excel
import pandas as pd
file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)
Checkout the documentation to explore parameters like skiprows to ignore rows when loading the excel
import pandas as pd
data = pd.read_excel (r'**YourPath**.xlsx')
print (data)
Loading an excel file without explicitly naming a sheet but instead giving the number of the sheet order (often one will simply load the first sheet) goes like:
import pandas as pd
myexcel = pd.ExcelFile("C:/filename.xlsx")
myexcel = myexcel.parse(myexcel.sheet_names[0])
Since .sheet_names returns a list of sheet names, it is easy to load one or more sheets by simply calling the list element(s).
All of these works for me
In [1]: import pandas as pd
In [2]: df = pd.read_excel('FileName.xlsx') # If there is only one sheet in the excel file
In [3]: df = pd.read_excel('FileName.xlsx', sheet_name=0)
In [4]: In [20]: df = pd.read_excel('FileName.xlsx', sheet_name='Sheet 1')
#load pandas library
import pandas as pd
#set path where the file is
path = "./myfile.xlsx"
#load the file into dataframe df
df = pd.read_excel(path)
#check the first 5 rows
df.head(5)
I am trying to convert a pandas DataFrame to JSON file. Following image shows my data:
Screenshot of the dataset from Ms. excel
I am using the following code:
import pandas as pd
os.chdir("G:\\My Drive\\LEC dashboard\\EnergyPlus simulation files\\DEC\\Ahmedabad\\Adaptive set point\\CSV")
df = pd.read_csv('Adap_40-_0_0.1_1.5_0.6.csv')
df2 = df.filter(like = '[C](Hourly)',axis =1)
df3 = df.filter(like = '[C](Hourly:ON)',axis =1)
df4 = df.filter(like = '[%](Hourly)',axis =1)
df5 = df.filter(like = '[%](Hourly:ON)',axis =1)
df6 = pd.concat([df2,df3,df4,df5],axis=1)
df6.to_json("123.json",orient='columns')
I the output, I am getting a dictionary in of values. However, I need a list as value.
The output I am getting: The JSON output I am getting by using above code
The out put that is desired: The output that is desired.
I have tried different orientations of json but nothing works.
There might be other ways of doing this but one way is this.
import json
test = pd.DataFrame({'a':[1,2,3,4,5,6]})
with open('test.json', 'w') as f:
json.dump(test.to_dict(orient='list'), f)
Result file will look like this '{"a": [1, 2, 3, 4, 5, 6]}'
There is a built-in function of pandas called to_json:
df.to_json(r'Path_to_file\file_name.json')
Take a look at the documentation if you need more specifics: https://pandas.pydata.org/pandas-docs/version/0.24/reference/api/pandas.DataFrame.to_json.html
I have the foll. csv with 1st row as header:
A B
test 23
try 34
I want to read in this as a dictionary, so doing this:
dt = pandas.read_csv('file.csv').to_dict()
However, this reads in the header row as key. I want the values in column 'A' to be the keys. How do I do that i.e. get answer like this:
{'test':'23', 'try':'34'}
dt = pandas.read_csv('file.csv', index_col=1, skiprows=1).T.to_dict()
Duplicating data:
import pandas as pd
from io import StringIO
data="""
A B
test 23
try 34
"""
df = pd.read_csv(StringIO(data), delimiter='\s+')
Converting to dictioanry:
print(dict(df.values))
Will give:
{'try': 34, 'test': 23}