Inserting data into xlsx - python

I'm running a Python script that produces a group of values, say a unique date and time.
now = datetime.now()  # assuming "from datetime import datetime"
date = now.strftime("%d/%m/%Y")
time = now.strftime("%H:%M:%S")
I would like to write the data to an xlsx file, but the problem is that the data is overwritten every time the script runs.
How should I write the date so that it is appended to the xlsx file instead of overwriting its first line?
This is the code I am using, and I'm not sure how to change it:
worksheet.write(1, 0, date)
worksheet.write(1, 1, time)
The result I would like to get at the end should be something like following:
Date Time
20/03/2022 00:24:36
20/03/2022 00:55:36
21/03/2022 15:24:36
22/03/2022 11:24:36
23/03/2022 22:24:36

You can open the Excel file in append mode and then keep inserting data. Note that mode="a" appends new sheets by default; to write into an existing sheet you also need the if_sheet_exists argument (pandas 1.4+). Refer to the snippet below:
with pd.ExcelWriter("existing_file_name.xlsx", engine="openpyxl", mode="a") as writer:
    df.to_excel(writer, sheet_name="name")
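One caveat: the snippet above adds the data as a new sheet on each run rather than as new rows in the same sheet. If the goal is the running log shown in the question, openpyxl alone can append rows in place. A minimal sketch, assuming a hypothetical log.xlsx in the working directory:

```python
from datetime import datetime
from openpyxl import Workbook, load_workbook

LOG_FILE = "log.xlsx"  # hypothetical filename for this sketch

try:
    wb = load_workbook(LOG_FILE)
except FileNotFoundError:
    # First run: create the workbook with a header row
    wb = Workbook()
    wb.active.append(["Date", "Time"])

now = datetime.now()
ws = wb.active
# append() always writes to the next empty row, so earlier lines are kept
ws.append([now.strftime("%d/%m/%Y"), now.strftime("%H:%M:%S")])
wb.save(LOG_FILE)
```

Running the script repeatedly then grows the sheet one row per run, producing the Date/Time table from the question.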

The simplest way is to convert your data into a pandas DataFrame; then you can easily write the DataFrame to an xlsx file with a single command:
df.to_excel("filename.xlsx")

Related

Python Save SQL query to excel file

I'm using psycopg2 to query a database and return a table of data. I need to then write that data to a .xlsx file.
I had it writing to a .csv really nicely using:
from csv import writer

with open("file_name.csv", "w", newline="") as file:
    csv_writer = writer(file)
    csv_writer.writerow(headers)
    csv_writer.writerows(data)
This works fine; the only issue is that I now need to open the .csv and save it as a new .xlsx, so it's a step I want to cut out.
I'm trying to use pandas:
df = pandas.DataFrame(data, columns=headers)
df.to_excel("file_name.xlsx")
But all numbers are being stored as text, so I now need to go back in and refresh the cells for Excel to realise each value is an integer or float.
I also tried openpyxl; this works better but still stores the date column as text, so I still need to refresh the cells for Excel to recognise it as a date.
I thought it might have been an issue with how psycopg2 pulls the data, but it's not an issue for .csv, so why is it a problem for .xlsx? This is probably just my lack of understanding of the difference between the two file types. Does anyone have a solution for saving as a .xlsx while retaining all the correct formatting?
When creating the DataFrame you can cast the columns to the proper dtypes, for example with astype() (note that the DataFrame constructor itself does not accept a converters argument; that keyword belongs to read_csv/read_excel):
converters = {
    'name': str,
    'ages': int,
    'score': float
}
df = pandas.DataFrame(data, columns=headers).astype(converters)
df.to_excel("file_name.xlsx")
Where the keys of the dict are the column names in the DataFrame.
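The question also mentions the date column ending up as text. One approach (a sketch with made-up sample data, not the asker's actual query result) is to convert such columns with pd.to_datetime before writing, so Excel receives real datetime cells:

```python
import pandas as pd

# Hypothetical stand-in for the psycopg2 query result: dates arrive as strings
headers = ["name", "joined"]
data = [("alice", "2022-03-20"), ("bob", "2022-03-21")]

df = pd.DataFrame(data, columns=headers)
df["joined"] = pd.to_datetime(df["joined"])  # text -> datetime64 dtype

df.to_excel("file_name.xlsx", index=False)  # Excel now stores real dates
```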

How to handle timezone when loading a CSV file with pandas dataframe

I am currently facing a problem when loading data via the read_csv() function of pandas.
Here is an extract of a record from the CSV file :
2021-11-28T03:13:01+00:00,49.59,49.93,49.56,49.88
When I load the file with pandas, my index column, which is a timestamp, systematically loses one hour on all records.
The previous example looks like this after pandas reads it:
2021-11-28T02:13:01+00:00,49.59,49.93,49.56,49.88
Here is the Python snippet:
df_mre_unpacked = pd.read_csv('mre_unpacked.csv', sep=',',encoding='utf-8',index_col='timestamp',decimal=".")
df_mre_unpacked = df_mre_unpacked[['ASG1.CPU10_XV_ACHSE1_ZR','ASG1.CPU11_XV_ACHSE2_ZR','ASG2.CPU10_XV_ACHSE1_ZR','ASG2.CPU11_XV_ACHSE2_ZR']]
The original CSV and the result of the DataFrame's head() are shown in screenshots (omitted here). As you can see, the first record starts at 03:13:01 in the CSV file, but in the pandas DataFrame it begins at 02:13:01, a timestamp that does not exist in my CSV file.
Has anyone had this problem before ?
UPDATE :
I have a better view of the problem now. Here is what I discovered.
When I do the data extraction via a portal, the displayed time differs from the same record read from the CSV file with Notepad++ (screenshots omitted).
The problem comes from the extraction process, which automatically removes one hour.
How do I add 1 hour to my timestamp column which is my index?
We just have to add this snippet:
df_mre_unpacked.index = pd.to_datetime(df_mre_unpacked.index) + pd.Timedelta("1 hour")
This will add 1 hour to the timestamp index.
It's not the best solution, but it works for now. An optimal solution would take the time difference into account automatically, the way Excel does.
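Rather than hard-coding a one-hour shift, read_csv can parse the offset-aware timestamps as-is and tz_convert can move them to the local zone, which also survives daylight-saving changes. A sketch using the record from the question and an assumed Europe/Berlin local zone (the real zone may differ):

```python
import io
import pandas as pd

# In-memory stand-in for mre_unpacked.csv, using the record from the question
csv_text = (
    "timestamp,open,high,low,close\n"
    "2021-11-28T03:13:01+00:00,49.59,49.93,49.56,49.88\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="timestamp",
                 parse_dates=["timestamp"])

# The index is tz-aware UTC; convert instead of adding a fixed offset,
# so summer/winter time is handled automatically.
df.index = df.index.tz_convert("Europe/Berlin")  # assumed local zone
```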

Pandas - Change the default date format when creating a dataframe from CSVs

I have a script that loops through a folder of CSVs, reads them, removes any empty rows (they all have 'empty' rows that Pandas reads as NaN) and appends them to a master dataframe. It then writes the dataframe to a new CSV. This is all working as expected:
if pl.Path(file).suffix == '.csv':
    fullPath = os.path.join(sourceLoc, file)
    print(file)
    initDF = pd.read_csv(fullPath)
    cleanDF = initDF.dropna(subset=['Name'])
    masterDF = masterDF.append(cleanDF)
masterDF.to_csv(destLoc, index=False)
My only issue is that the input dates are displayed like this: 25/05/21, but the output dates end up formatted like this: 05/25/21. As I'm in the UK and using a UK version of Excel to analyse the output, this confuses all my functions.
The only solutions I've found so far are to reformat the date columns individually or style them, which to my understanding only affects how they look in Jupyter and not in the actual data. As there are multiple date columns in the source data files I'd rather not have to reformat them all individually.
Is there any way of defining the date format when first creating the dataframe, or reformatting every date column once the dataframe is filled?
In the end this issue was caused by two different problems.
The first was Excel intermittently exporting my dates in US format despite the original format (and my Windows Region settings) being UK format. I've now added a short VBA loop in my export code to ensure those columns are formatted correctly every time the data is exported.
The second was the CSV date being imported with incorrect dtypes. I suspect this was again the fault of Excel (2010 is problematic) but I'm unsure. I'm now correcting this with an astype() method.
The end result is my dates are now imported into Pandas in the correct format and outputted to a new CSV in the correct format too.
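For reference, pandas alone can also handle this round trip: dayfirst=True tells read_csv to interpret 25/05/21 as 25 May, and date_format controls how datetime columns are rendered on the way out. A minimal sketch with hypothetical column names (the actual source files will differ):

```python
import io
import pandas as pd

# Stand-in for one of the source CSVs, with a UK-format date column
csv_text = "Name,Start Date\nAlice,25/05/21\nBob,01/06/21\n"

# dayfirst=True makes pandas read 25/05/21 as 25 May, not month 25
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start Date"],
                 dayfirst=True)

# date_format fixes how datetime columns are written back out
df.to_csv("master.csv", index=False, date_format="%d/%m/%y")
```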

How to get rid of timestamps in front of dates, as pandas adds timestamps to date columns after saving to Excel

I am a student and I am learning pandas.
I have created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I opened the file in pandas and saved it back to Excel under a different name (file name = Output).
When I open the Output file in MS Excel, the columns (DOB and YOP) have a timestamp attached to the dates.
Please let me know how to print only the date. (I want the Output file and its contents to look exactly like the original file.)
Hope to get some help/support.
Thank you
Probably your DOB and Year of passing columns are of datetime format before they are saved to Excel. As a result, they got converted back to the datetime representation when saved to Excel.
If you want to retain its contents to look exactly like the original file in dd-mm-YYYY format, you can try converting these 2 columns to string format before saving to Excel. You can do it by:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')
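If the file has several datetime columns, the same conversion can be applied to all of them in one loop. A sketch with a small made-up stand-in for the student table:

```python
import pandas as pd

# Hypothetical stand-in for Student_Record.xlsx
df = pd.DataFrame({
    "Name": ["Asha", "Ravi"],
    "DOB": pd.to_datetime(["2001-05-14", "2000-11-02"]),
    "Year of passing": pd.to_datetime(["2022-06-30", "2021-06-30"]),
})

# Convert every datetime column to a dd-mm-YYYY string in one pass,
# so Excel shows plain dates with no 00:00:00 attached
for col in df.select_dtypes(include="datetime").columns:
    df[col] = df[col].dt.strftime("%d-%m-%Y")

df.to_excel("Output.xlsx", index=False)
```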

Python script to parse a big workbook

I have an oversized Excel file and I need to automate a task I do every day: add rows to the bottom with the day's date, save a new workbook, crop the old ones, and save the result as a new file with the day's date.
An example: today only has rows with date 04-10-2016, and the filename would be [sheetname]04102016H12, or [sheetname]04102016H16 if it is past 12 pm.
I've tried xlrd, doing this in VBA, and so on, but I can't get along with VBA and it is slow. So I'd rather use Python here: lightweight, does the job, and so on.
Anyway, so far I have done the following:
import xlsxwriter, datetime, xlrd
import pandas as pd
# Parsing main excel sheet to save the correct
with xlrd.open_workbook(r'D:/path/to/file/file.xlsx', on_demand=True) as xls:
    for sheet in xls.parse(xls.sheet_names([0])):
        dfs = pd.read_excel(xls, sheet, header=1)
        now = datetime.date.today()
        df[df['Data'] != now]
        if datetime.time() < datetime.time(11, 0, 0, 0):
            df.to_excel(r'W:\path\I\need' + str(sheet) + now + 'H12.xlsx', index=False)
        else:
            df.to_excel(r'W:\path\I\need' + str(sheet) + now + 'H16.xlsx', index=False)
Unfortunately, this does not split the main file into as many files as there are worksheets in the workbook. It outputs TypeError: 'list' object is not callable for the expression xls.parse(xls.sheet_names([0])).
Based on the comments below I am updating my answer. The error means that sheet_names is already a list (a property, not a method), so index it instead of calling it:
xls.sheet_names[0]
However, if you want to loop through the sheets, then you may want all the sheet names instead of just the first one.
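Putting the pieces together, a working version of the loop might look like the sketch below. It uses pandas' ExcelFile (where sheet_names is a property), hypothetical paths, and assumes the 'Data' column holds the row dates as in the question:

```python
import datetime
from pathlib import Path

import pandas as pd


def split_workbook(src, out_dir):
    """Write each sheet of *src* to its own dated file, as in the question."""
    xls = pd.ExcelFile(src)
    today = datetime.date.today()
    # H12 before noon, H16 after, per the question's naming scheme
    suffix = "H12" if datetime.datetime.now().hour < 12 else "H16"
    stamp = today.strftime("%d%m%Y")
    for sheet in xls.sheet_names:               # a property, not a method
        df = pd.read_excel(xls, sheet_name=sheet)
        df = df[df["Data"] != today]            # keep the filtered frame
        df.to_excel(Path(out_dir) / f"{sheet}{stamp}{suffix}.xlsx",
                    index=False)
```

Note that the original snippet built filenames with string concatenation against a date object, which raises a TypeError; formatting the date with strftime first avoids that, and assigning the filtered frame back to df makes the crop actually take effect.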
