Python - Pandas - Write Dataframe to CSV - python

I'm trying to write the data from an Excel workbook with 4 sheets (3 columns and 50 rows each) to a CSV file using pandas. I'm getting the following error: AttributeError: 'dict' object has no attribute 'to_csv'. I believe my syntax is correct, but could anyone point out where it is wrong when trying to write a DataFrame to a CSV?
import pandas as pd
import numpy as np
df = pd.read_excel("filelocation.xlsx",
                   sheet_name=['pnl1 Data ', 'pnl2 Data', 'pnl3 Data', 'pnl4 Data'],
                   skiprows=8, usecols="B:D", keep_default_na=False, na_values=['NULL'])
df.to_csv('filelocation.csv', lineterminator=',', index=False, header=False)  # error occurs on this line

Your intuition is right; there's nothing wrong with the syntax in your code.
You are receiving the AttributeError because you are reading data from multiple sheets of your workbook, so read_excel returns a dictionary of DataFrames (instead of one DataFrame), and you then call to_csv on that dictionary. to_csv is a method of DataFrame, not of dict.
As your code is written, the keys of the dictionary are the worksheet names and the values are the corresponding DataFrames. This is explained in the docs for the read_excel() method (see the sheet_name parameter).
To write a csv file containing the aggregate data from all the worksheets, you could loop through the worksheets and append each DataFrame to your file (this works if your sheets have the same structure and dimensions):
import pandas as pd

sheets = ['pnl1 Data ', 'pnl2 Data', 'pnl3 Data', 'pnl4 Data']
for sheet in sheets:
    df = pd.read_excel("filelocation.xlsx",
                       sheet_name=sheet,
                       skiprows=8,
                       usecols="B:D",
                       keep_default_na=False,
                       na_values=['NULL'])
    # append each sheet's rows to the same CSV file
    with open('filelocation.csv', 'a') as f:
        # this parameter was named line_terminator before pandas 1.5
        df.to_csv(f, lineterminator=',', index=False, header=False)
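Alternatively, you can keep the single read_excel call from the question and combine the dictionary it returns before writing. A minimal sketch, assuming all sheets share the same columns; the small dict below is hypothetical stand-in data for what read_excel(..., sheet_name=[...]) returns:

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by
# pd.read_excel("filelocation.xlsx", sheet_name=[...], ...):
# keys are sheet names, values are DataFrames with the same columns
sheets = {
    "pnl1 Data ": pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}),
    "pnl2 Data": pd.DataFrame({"a": [7], "b": [8], "c": [9]}),
}

# Stack the per-sheet frames vertically, renumbering the rows 0..n-1
combined = pd.concat(list(sheets.values()), ignore_index=True)

# One DataFrame, so to_csv works; write it in a single call
combined.to_csv("filelocation.csv", index=False, header=False)
print(combined.shape)
```

With real data, replace the stand-in with the read_excel call from the question. Passing the dict itself, pd.concat(sheets), would instead keep the sheet names as an outer index level.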

Related

python writing a df to a specific cell of excel

I have come across a lot of answers and just wanted to check if this is the best one:
Write pandas dataframe values to excel to specific cell in a specific sheet.
The question is: assuming I have a DataFrame "df", I want to write it to an existing Excel file called "Name1.xlsx",
in a worksheet called "exampleNames", starting at cell D25.
What's the easiest/most efficient way to do that?
Update:
I tried this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import openpyxl
path = "C:\\Users\\ABC\\PycharmProjects\\ABC\\Name1.xlsx"
df = pd.DataFrame(np.random.randint(1,10,(3,2)),columns=['a','b'])
df.to_excel(path,sheet_name="exampleNames",startcol=5,startrow=5,header=None,index=False)
df.to_excel(path,sheet_name="NN",startcol=5,startrow=25,header=None,index=False)
This gave me the error:
ModuleNotFoundError: No module named 'openpyxl'
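The ModuleNotFoundError just means openpyxl is not installed; install it with pip install openpyxl (appending to an existing file with mode='a' requires it).
This is the approach suggested in the pandas docs:
df.to_excel(excel_writer, sheet_name='Sheet1', startcol=col, startrow=row, header=False)
where excel_writer can be a path-like, file-like, or ExcelWriter object, and startrow/startcol are zero-based offsets.
For example:
df.to_excel('sample.xlsx', sheet_name="exampleNames", startcol=5, startrow=5, header=False)
To write several DataFrames into one workbook, you have to use an ExcelWriter object. mode='a' appends to an existing file, and if_sheet_exists='overlay' (pandas 1.4+) writes into an existing sheet instead of replacing it:
with pd.ExcelWriter('output.xlsx', engine="openpyxl", mode='a', if_sheet_exists='overlay') as writer:
    df1.to_excel(writer, sheet_name='exampleNames', startcol=5, startrow=5, header=False, index=False)
    df2.to_excel(writer, sheet_name='NN', startcol=5, startrow=25, header=False, index=False)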
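To land the data exactly on D25 as the question asks, remember that startrow and startcol are zero-based, so D25 corresponds to startrow=24, startcol=3. A minimal sketch under these assumptions: the workbook already exists, openpyxl is installed, and pandas is 1.4 or newer (for if_sheet_exists='overlay'). cell_to_offsets and write_at_cell are hypothetical helper names, not pandas API:

```python
import re

import pandas as pd


def cell_to_offsets(cell):
    """Convert an A1-style reference like 'D25' into zero-based (startrow, startcol)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", cell)
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {cell!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base 26 with A=1 ... Z=26
    return int(digits) - 1, col - 1


def write_at_cell(df, path, sheet, cell):
    """Write df into an existing workbook so its first value lands on `cell`."""
    startrow, startcol = cell_to_offsets(cell)
    # mode="a" needs an existing file and the openpyxl engine;
    # "overlay" writes into the existing sheet instead of replacing it
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet, startrow=startrow,
                    startcol=startcol, header=False, index=False)


print(cell_to_offsets("D25"))
```

For the question's case this would be called as write_at_cell(df, "Name1.xlsx", "exampleNames", "D25"). Because header and index are both off, the first data value is the one that ends up in D25.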

Inserting Data into an Excel file using Pandas - Python

I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data, and finally creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
import os
import pandas as pd

path_data = r"..."  # the path to the folder containing the Excel file
recap = pd.read_excel(os.path.join(path_data, 'My_Excel.xlsx'))  # where I access the Excel file
recap['New information Column'] = some_value  # placeholder for the value gathered beforehand
Is this a correct way of doing this? If not, can someone suggest a better way (that works, ehehe)?
Thank you a lot!
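You can import the Excel file into Python using pandas:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
By default only the first sheet is read. If you have many sheets, select the one you need with sheet_name:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column, you could do the following (a single value is repeated in every row):
df['name of the new column'] = 'things to add'
Then, when you're ready, you can export it as xlsx. pandas needs an Excel engine such as openpyxl installed for this, but you don't have to import it yourself. index=False stops pandas from writing its row index as an extra column; note that writing to an existing path replaces the whole file, including any other sheets:
# to excel
df.to_excel(r'Path\filename.xlsx', index=False)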
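Assigning a single value, as in the question, fills every row with that same value. To store one result per dataset, build a list or Series aligned with the rows. A minimal sketch, assuming the dataset names sit in a column called "Dataset"; analyse and add_info_column are hypothetical stand-ins for your own analysis:

```python
import pandas as pd


def analyse(dataset_name):
    """Hypothetical stand-in for the real analysis of one dataset."""
    return float(len(dataset_name))


def add_info_column(recap, name_col, new_col):
    """Return a copy of `recap` with one analysis result per row in `new_col`."""
    out = recap.copy()
    # Series.map calls analyse() once per dataset name and keeps row alignment
    out[new_col] = out[name_col].map(analyse)
    return out


recap = pd.DataFrame({"Dataset": ["alpha", "beta_02", "gamma"]})
recap = add_info_column(recap, "Dataset", "New information Column")
print(recap)
# Writing back (needs openpyxl or another Excel engine installed):
# recap.to_excel("My_Excel.xlsx", index=False)
```

Keeping the analysis in a plain function makes it easy to test on its own before any Excel file is read or written.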

Split multiple times?

So I'm currently transferring a txt file into a csv. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data.
Below is my messy CSV file
And here is my current code:
import csv

Sat_File = '/Users'
output = '/Users2'

with open(Sat_File, 'r') as sat, open(output, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in sat:
        if "2004" in line:
            line = line.split(' ')
            writer.writerow(line)
Basically, I'm just trying to eliminate those gaps between columns in the CSV picture I've provided. Thank you!
You can use the pandas library to clear out the empty columns:
import pandas as pd
df = pd.read_csv('path_to_csv_file').dropna(axis=1, how='all')
df.to_csv('path_to_clean_csv_file', index=False)
Basically we:
Import the pandas library.
Read the csv file into a variable called df (stands for data frame).
Then we use the dropna function, which discards empty columns or rows: axis=1 means drop columns (0 means rows), and how='all' means drop only the columns in which all of the values are empty.
We save the clean data frame df to a new, clean csv file.
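The gaps most likely come from line.split(' ') in the question: splitting on a single space turns every extra space into an empty string, and each empty string becomes an empty CSV column. A minimal sketch of an alternative that avoids creating them in the first place, assuming the fields are separated only by whitespace; clean_rows and convert are hypothetical helper names:

```python
import csv


def clean_rows(lines, keyword="2004"):
    """Yield the whitespace-separated fields of each line containing `keyword`."""
    for line in lines:
        if keyword in line:
            # str.split() with no argument splits on runs of whitespace and
            # drops empty strings (and the trailing newline), so repeated
            # spaces never become empty columns
            yield line.split()


def convert(in_path, out_path, keyword="2004"):
    """Copy matching lines from a text file into a CSV file."""
    with open(in_path, "r") as src, open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(clean_rows(src, keyword))


sample = ["2004 01   12.5    3\n", "2003 02 1 1\n", "2004   02  9.0 4\n"]
print(list(clean_rows(sample)))
```

convert(Sat_File, output) would then write the cleaned rows directly. If the text file is consistently whitespace-delimited, pd.read_csv(path, sep=r'\s+') is another option.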

saving a dataframe to csv file (python)

I am trying to restructure the way my precipitation data is organized in an Excel file. To do this, I've written the following code:
import pandas as pd

df = pd.read_excel('El Jem_Souassi.xlsx', sheet_name=None, header=None)
data = df["El Jem"]
T = []
for column in range(1, 56):
    liste = data[column].tolist()
    for row in range(1, len(liste)):
        liste[row] = str(liste[row])
        if liste[row] != 'nan':
            T.append(liste[row])
result = pd.DataFrame(T)
result
This code works fine, and in Jupyter I can see that the result is good.
However, I am facing a problem when attempting to save this dataframe to a csv file.
result.to_csv("output.csv")
The resulting file contains the vertical index column, and I can't seem to access a specific cell.
(Hopefully someone can help me with this problem.)
Many thanks!
It's all in the docs.
You are interested in skipping the index column, so do:
result.to_csv("output.csv", index=False)
If you also want to skip the header add:
result.to_csv("output.csv", index=False, header=False)
I don't know what your input data looks like (it is a good idea to include it in your question). But note that, as things stand, you can obtain the same result just by doing:
import pandas as pd
df = pd.DataFrame([0]*16)
df.to_csv('results.csv', index=False, header=False)
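For reference, the nested loop in the question can probably be replaced by a vectorized version. A minimal sketch, assuming the sheet was read with header=None so rows and columns carry integer labels; flatten_columns is a hypothetical helper and the small data frame is made-up stand-in data:

```python
import numpy as np
import pandas as pd


def flatten_columns(data, first_col, last_col):
    """Stack columns first_col..last_col (inclusive), skipping row 0 and missing cells.

    Walks the cells column by column, top to bottom, like the loop in the question.
    """
    block = data.loc[1:, first_col:last_col]   # label-based slice, both ends inclusive
    values = block.melt()["value"].dropna()    # melt stacks the columns one after another
    return pd.DataFrame(values.astype(str).tolist())


# Hypothetical stand-in for df["El Jem"] read with header=None
data = pd.DataFrame({
    0: ["day", 1, 2],
    1: ["Jan", 3.5, np.nan],
    2: ["Feb", 0.0, 1.2],
})

result = flatten_columns(data, 1, 2)   # the question would use (1, 55)
result.to_csv("output.csv", index=False, header=False)
print(result[0].tolist())
```

It is only roughly equivalent: the loop also skips a cell whose text is literally 'nan', while dropna removes only real missing values (NaN or None).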

Save list of DataFrames to multisheet Excel spreadsheet

How can I export a list of DataFrames into one Excel spreadsheet?
The docs for to_excel state:
Notes
If passing an existing ExcelWriter object, then the sheet will be added
to the existing workbook. This can be used to save different
DataFrames to one workbook
writer = ExcelWriter('output.xlsx')
df1.to_excel(writer, 'sheet1')
df2.to_excel(writer, 'sheet2')
writer.save()
Following this, I thought I could write a function which saves a list of DataFrames to one spreadsheet as follows:
from openpyxl.writer.excel import ExcelWriter

def save_xls(list_dfs, xls_path):
    writer = ExcelWriter(xls_path)
    for n, df in enumerate(list_dfs):
        df.to_excel(writer, 'sheet%s' % n)
    writer.save()
However (with a list of two small DataFrames, each of which can save to_excel individually), an exception is raised (Edit: traceback removed):
AttributeError: 'str' object has no attribute 'worksheets'
Presumably I am not calling ExcelWriter correctly. How should I call it in order to do this?
You should be using pandas' own ExcelWriter class:
from pandas import ExcelWriter
# in very old pandas versions: from pandas.io.parsers import ExcelWriter
Then the save_xls function works as expected (the with block saves and closes the file when it exits):
def save_xls(list_dfs, xls_path):
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer, sheet_name=f'sheet{n}')
In case anyone needs an example using a dictionary of DataFrames:
from pandas import ExcelWriter

def save_xls(dict_df, path):
    """
    Save a dictionary of DataFrames to an Excel file,
    with each DataFrame on a separate sheet
    """
    # the with block saves the file; writer.save() was removed in pandas 2.0
    with ExcelWriter(path) as writer:
        for key, df in dict_df.items():
            df.to_excel(writer, sheet_name=key)
Example:
save_xls(dict_df=my_dict, path='~/my_path.xlsx')
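When the dictionary keys become sheet names, keep in mind that Excel limits worksheet names to 31 characters, forbids the characters \ / ? * [ ] :, and compares names case-insensitively. A minimal sketch of a key-cleaning step under those rules; safe_sheet_names is a hypothetical helper, and other Excel restrictions (such as a leading or trailing apostrophe) are not handled:

```python
import re

import pandas as pd

# Characters Excel does not allow in worksheet names
_FORBIDDEN = re.compile(r"[\\/?*\[\]:]")
_MAX_LEN = 31  # Excel's worksheet-name length limit


def safe_sheet_names(keys):
    """Map arbitrary keys to unique, Excel-valid sheet names (hypothetical helper)."""
    names, seen = [], set()
    for key in keys:
        base = _FORBIDDEN.sub("_", str(key))[:_MAX_LEN] or "sheet"
        name, n = base, 1
        # Excel treats 'Data' and 'data' as the same sheet, so dedupe on lower()
        while name.lower() in seen:
            suffix = f"_{n}"
            name = base[: _MAX_LEN - len(suffix)] + suffix
            n += 1
        seen.add(name.lower())
        names.append(name)
    return names


def save_xls(dict_df, path):
    """Write each DataFrame in dict_df to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, df in zip(safe_sheet_names(dict_df), dict_df.values()):
            df.to_excel(writer, sheet_name=name)


print(safe_sheet_names(["Q1/2024", "q1_2024", "x" * 40]))
```

save_xls(my_dict, 'output.xlsx') then writes each DataFrame to its own sheet under the cleaned name.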
Sometimes writing the Excel file fails if the DataFrame contains characters that the openpyxl engine rejects (for example, certain control characters in text). To get around this, we can use the 'xlsxwriter' package, as in the case below.
With this code:
from pandas import ExcelWriter
writer = ExcelWriter('notes.xlsx')
for key in dict_df:
    dict_df[key].to_excel(writer, sheet_name=key, index=False)
writer.close()
I got an "IllegalCharacterError".
The code that worked:
%pip install xlsxwriter
from pandas import ExcelWriter
# pass the engine to ExcelWriter; to_excel ignores engine= when it is given a writer object
with ExcelWriter('notes.xlsx', engine='xlsxwriter') as writer:
    for key in dict_df:
        dict_df[key].to_excel(writer, sheet_name=key, index=False)
