AttributeError: 'ExcelFile' object has no attribute 'dropna' - python

I was trying to remove the empty columns in an Excel file with the pandas dropna() method, but I ended up with the error message above. Please find my code below:
import pandas as pd
df = pd.ExcelFile("1.xlsx")
print(df.sheet_names)
#df.dropna(how='all', axis=1)
newdf = df.dropna()

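pd.ExcelFile() returns an ExcelFile object (a handle to the whole workbook), not a DataFrame, so it has no dropna() method. Read the sheet into a DataFrame first, e.g. with pd.read_excel() or ExcelFile.parse(). Please provide more code and context, but this might help: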
import pandas as pd
excel_file_name = 'insert excel file path'
excel_sheet_name = 'insert sheet name'
# create dataframe from desired excel file
df = pd.read_excel(
    excel_file_name,
    engine='openpyxl',
    sheet_name=excel_sheet_name
)
# drop the columns in which every value is NaN (the empty columns) and write that into df
# without the inplace option it would have to be:
# df = df.dropna(axis=1, how='all')
df.dropna(axis=1, how='all', inplace=True)
# write that dataframe back to the excel file
with pd.ExcelWriter(
    excel_file_name,  # file to write to
    engine='openpyxl',  # which engine to use
    mode='a',  # append mode (needed for if_sheet_exists to work, pandas >= 1.3)
    if_sheet_exists='replace'  # if that sheet exists, replace it
) as writer:
    # index=False keeps pandas from adding its row index as an extra column
    df.to_excel(writer, sheet_name=excel_sheet_name, index=False)
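When the goal is to remove only the empty columns, the how parameter of dropna() matters: the default how='any' drops every column that has at least one NaN, while how='all' drops only the columns that are entirely NaN. A minimal sketch on an in-memory DataFrame (the column names and values are made up, standing in for what pd.read_excel() might return), so no Excel file is needed:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "empty" has no values at all, "partial" has one gap.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Default how='any': drops every column containing at least one NaN
any_dropped = df.dropna(axis=1)
# how='all': drops only the columns in which every value is NaN
all_dropped = df.dropna(axis=1, how="all")

print(list(any_dropped.columns))  # ['name']
print(list(all_dropped.columns))  # ['name', 'partial']
```

With the default, the partially filled column is lost as well, which is usually not what "remove empty columns" means.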

Related

How to create a new sheet and write a pandas dataframe to it?

I already have a prepared Excel spreadsheet. What would be the best way to create a new sheet in it using Python code? How can I then access it to write the pandas dataframe?
Calling df.to_excel(filename) with a target filename will overwrite an existing file. Instead, open the file with an ExcelWriter in append mode (mode='a', which the openpyxl engine supports) and pass the writer as the first argument to df.to_excel().
Try this:
import pandas as pd
df = pd.read_csv('test.csv')  # open/create the dataFrame to add
# mode='a' appends a sheet to the existing workbook instead of overwriting it
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='new_sheet3')
You can also try this:
import pandas as pd
import numpy as np
x = np.random.randn(100, 2)
df = pd.DataFrame(x)  # creating a random dataframe (or use yours)
# mode='a' requires that flat.xlsx already exists
writer = pd.ExcelWriter('flat.xlsx', engine='openpyxl', mode='a')
df.to_excel(writer, sheet_name='y')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

pandas drop rows based on cell content and no headers

I'm reading a csv file with pandas that has no headers.
df = pd.read_csv('file.csv', header=0)
The csv file contains a single column with several users:
admin
user
system
sysadmin
adm
administrator
I need to read the file into a df or a list while excluding a value (for example sysadmin), and save the result to the csv file:
admin
user
system
adm
administrator
Select the first column, filter it with boolean indexing and write the result to the file:
df = pd.read_csv('file.csv', header=0)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False)
#if the csv has a header, it is converted to the column name, so you can also use:
#df[df['colname'].ne('sysadmin')].to_csv('file.csv', index=False)
If the csv has no header, you need these parameters:
df = pd.read_csv('file.csv', header=None)
df[df.iloc[:, 0].ne('sysadmin')].to_csv('file.csv', index=False, header=False)
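The header=None version can be checked without touching the real file by using an in-memory buffer. This sketch assumes the file holds one user per line with no header row, exactly as in the question; io.StringIO stands in for file.csv:

```python
import io
import pandas as pd

# In-memory stand-in for file.csv (one user per line, no header row)
csv_text = "admin\nuser\nsystem\nsysadmin\nadm\nadministrator\n"

# header=None keeps "admin" as data instead of turning it into the column name
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Boolean mask: True for every row whose first column is not 'sysadmin'
mask = df.iloc[:, 0].ne("sysadmin")
filtered = df[mask]

# Write back without the index and without the generated column name "0"
out = io.StringIO()
filtered.to_csv(out, index=False, header=False)
print(out.getvalue())
```

Without header=None, "admin" would be consumed as the column name; without header=False, an extra "0" line would be written at the top of the output.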
You can give this a try:
df = pd.read_csv('file.csv', header=None)
df = df.transpose()
df.columns = ['admin', 'user', 'system', 'sysadmin', 'adm', 'administrator']
df.head()
I think this will work it out.
If not, then try this:
df = pd.read_csv('file.csv')
df.columns=['admin','user','system','sysadmin','adm','administrator']
df.head()

Python pandas xlsx/csv

I want to convert an xlsx file to csv. The conversion works, but afterwards Python adds ".0" to the values...
Sample xlsx:
Name, Age
Mark, 20
CSV after conversion:
Name, Age
Mark, 20.0 <- added ".0"
What could the problem be?
#importing pandas as pd
import pandas as pd
# Read and store content
# of an excel file
read_file = pd.read_excel ("EXPORT.xlsx")
# Write the dataframe object
# into csv file
read_file.to_csv ("data.csv",
index = True,
header=True,
encoding='utf-8-sig')
# read csv file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_csv("data.csv"))
# show the dataframe
df
I've tried to reproduce this behavior, but in my case pd.read_excel() automatically assigned the int64 dtype to the Age column of the Excel sheet shown in the question.
However, this case can easily be solved with the df.astype() function, which transforms data types, e.g. in your case from float to integer.
#importing pandas as pd
import pandas as pd
# Read and store content
# of an excel file
read_file = pd.read_excel ("EXPORT.xlsx")
# transform data type of column "Age" to int64
read_file = read_file.astype({'Age': 'int64'})
# Write the dataframe object
# into csv file
read_file.to_csv ("data.csv",
index = True,
header=True,
encoding='utf-8-sig')
# read csv file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_csv("data.csv"))
# show the dataframe
print(df)
I added the float_format option and it seems to work (note that '%d' formats every float value as an integer, so any real decimal places are truncated):
read_file.to_csv("basf.csv",
                 index=False,
                 header=True,
                 encoding='utf-8-sig',
                 decimal=',',
                 float_format='%d')
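One likely explanation for the ".0" (an assumption, since the original sheet isn't shown) is that the Age column contains at least one empty cell. pandas represents a missing value as NaN, which is a float, so the whole column becomes float64 and every number is written as 20.0. In that case astype({'Age': 'int64'}) raises an error because NaN cannot be stored in int64, whereas the nullable 'Int64' dtype (capital I) can hold missing values. A minimal sketch with made-up data:

```python
import io
import numpy as np
import pandas as pd

# Hypothetical sheet contents: Anna's Age cell is empty, as read_excel
# would return it (NaN). NaN is a float, so the column becomes float64.
df = pd.DataFrame({"Name": ["Mark", "Anna"], "Age": [20, np.nan]})
print(df["Age"].dtype)  # float64

buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())  # Mark's age is written as 20.0

# Nullable integer dtype: whole numbers stay integers, the gap stays empty
fixed = df.astype({"Age": "Int64"})
buf2 = io.StringIO()
fixed.to_csv(buf2, index=False)
print(buf2.getvalue())
```

Unlike float_format='%d', this only changes the Age column, so other float columns keep their decimals.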

How to open an xls file, delete columns and save it in Python

I need to know how to open an xls file that already exists, delete some columns and then save the file. This is what I have, but I get an error when I try to delete the columns. How do I use the DataFrame functions to delete columns and then save?
# Read in excel file
Workbook = xlrd.open_workbook("C:/Python/Python37/Files/firstCopy.xls", on_demand=True)
worksheet = Workbook.sheet_by_name("Sheet1")
# Delete a column
df.DataFrame.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
Workbook.save('output.xls')
Without seeing your dataset and error it is hard to tell what is going on. See How to Ask and how to create a Minimal, Complete, and Verifiable example.
Here's what I would suggest:
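import pandas as pd
df = pd.read_excel('firstCopy.xls')  # reading .xls files requires the xlrd package
# drop() returns a new DataFrame, so assign the result back to df
df = df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1)
with pd.ExcelWriter('output.xlsx') as writer:  # the file is saved when the block exits
    df.to_excel(writer, sheet_name='Sheet1', index=False)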
import pandas as pd
df = pd.read_excel('firstCopy.xls')
df.drop(['StartDate', 'EndDate', 'EmployeeID'], axis=1, inplace=True)
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer, sheet_name='Sheet1', index=False)
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0
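The key difference between those two answers is that DataFrame.drop() does not modify the frame unless inplace=True is passed; otherwise it returns a new DataFrame that must be assigned. A small in-memory sketch (the data below is hypothetical, standing in for the sheet read with pd.read_excel()):

```python
import pandas as pd

# Hypothetical stand-in for the sheet read with pd.read_excel()
df = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "StartDate": ["2020-01-01", "2021-06-01"],
    "EndDate": ["2020-12-31", "2021-12-31"],
    "EmployeeID": [1, 2],
})
to_remove = ["StartDate", "EndDate", "EmployeeID"]

# drop() without inplace returns a new DataFrame and leaves df unchanged
result = df.drop(to_remove, axis=1)
print(list(df.columns))      # all four columns are still there
print(list(result.columns))  # ['Name']

# Equivalent spelling with the columns= keyword instead of axis=1
result2 = df.drop(columns=to_remove)
```

Calling df.drop(...) on its own line and then writing df therefore saves the original, unchanged columns.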

pandas Combine Excel Spreadsheets

I have an Excel workbook with many tabs.
Each tab has the same set of headers as all others.
I want to combine all of the data from each tab into one data frame (without repeating the headers for each tab).
So far, I've tried:
import pandas as pd
xl = pd.ExcelFile('file.xlsx')
df = xl.parse()
Can I use something for the parse argument that means "all spreadsheets"?
Or is this the wrong approach?
Thanks in advance!
Update: I tried:
a=xl.sheet_names
b = pd.DataFrame()
for i in a:
    b.append(xl.parse(i))
b
But it's not "working".
This is one way to do it -- load all sheets into a dictionary of dataframes and then concatenate all the values in the dictionary into one dataframe.
import pandas as pd
# Set sheet_name to None in order to load all sheets into a dict of dataframes
# and ignore the index to avoid overlapping values later (see comment by bunji)
df = pd.read_excel('tmp.xlsx', sheet_name=None, index_col=None)
# Then concatenate all dataframes, renumbering the rows 0..n-1
cdf = pd.concat(df.values(), ignore_index=True)
print(cdf)
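import pandas as pd
f = 'file.xlsx'
# sheet_name=None returns a dict {sheet name: DataFrame}
df = pd.read_excel(f, sheet_name=None)
# ignore_index belongs to concat (read_excel has no such parameter)
df2 = pd.concat(df.values(), ignore_index=True, sort=True)
df2.to_excel('merged.xlsx',
             engine='xlsxwriter',
             sheet_name='Merged',
             header=True,
             index=False)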
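This also explains why the loop in the question seemed to do nothing: DataFrame.append() returned a new DataFrame instead of modifying b (and it was removed in pandas 2.0), so collecting the frames and calling pd.concat() once is the usual replacement. The sketch below uses a hypothetical dict of DataFrames in place of pd.read_excel(f, sheet_name=None) and shows what ignore_index changes:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(f, sheet_name=None), which
# returns a dict mapping each sheet name to a DataFrame.
sheets = {
    "Jan": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "Feb": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

# Passing the dict itself uses the sheet names as an outer index level
keyed = pd.concat(sheets)
# ignore_index=True discards the per-sheet indexes and renumbers 0..n-1
combined = pd.concat(sheets.values(), ignore_index=True)

print(keyed.index.tolist())     # [('Jan', 0), ('Jan', 1), ('Feb', 0)]
print(combined.index.tolist())  # [0, 1, 2]
```

Keeping the dict keys can be handy if you later need to know which sheet a row came from.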
