I am currently facing a problem when loading data via the read_csv() function of pandas.
Here is an extract of a record from the CSV file:
2021-11-28T03:13:01+00:00,49.59,49.93,49.56,49.88
When I load the file with pandas, my timestamp index column systematically loses one hour on every record.
The previous example looks like this after loading it with pandas:
2021-11-28T02:13:01+00:00,49.59,49.93,49.56,49.88
Here is the Python code snippet:
import pandas as pd
df_mre_unpacked = pd.read_csv('mre_unpacked.csv', sep=',', encoding='utf-8', index_col='timestamp', decimal='.')
df_mre_unpacked = df_mre_unpacked[['ASG1.CPU10_XV_ACHSE1_ZR','ASG1.CPU11_XV_ACHSE2_ZR','ASG2.CPU10_XV_ACHSE1_ZR','ASG2.CPU11_XV_ACHSE2_ZR']]
Here is the original CSV:
Result of the pandas DataFrame head() function:
As you can see, the first record in the CSV file starts at 03:13:01, but the pandas DataFrame begins at 02:13:01, a timestamp that does not exist in my CSV file.
Has anyone had this problem before?
UPDATE:
I have a better view of the problem now. Here is what I discovered.
When I extract the data via a portal, the time displayed is as follows:
Here is the same record read from the CSV file with Notepad++:
The problem comes from the extraction process, which automatically removes one hour.
How do I add 1 hour to my timestamp column which is my index?
We simply have to add this code snippet:
df_de18_EventData.index = pd.to_datetime(df_de18_EventData.index) + pd.Timedelta("1 hour")
This adds 1 hour to the timestamp index column.
It's not the best solution, but it works for now. An optimal solution would take the time difference into account automatically, as Excel does.
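The automatic approach mentioned above can be approximated by treating the stored timestamps as UTC (they carry a +00:00 offset) and converting them to a named time zone, which also follows daylight-saving changes that a fixed one-hour Timedelta would get wrong. A minimal sketch, assuming the portal shows Central European time ("Europe/Berlin" is an assumed zone name; substitute your own) and using an inline copy of one record with placeholder column names a to d:

```python
import io

import pandas as pd

# One record as stored in the exported CSV; the +00:00 offset marks it as UTC.
# Column names a-d are placeholders for the real signal columns.
csv_text = "timestamp,a,b,c,d\n2021-11-28T02:13:01+00:00,49.59,49.93,49.56,49.88\n"

df = pd.read_csv(io.StringIO(csv_text), index_col="timestamp")

# Parse the offset-aware strings into a UTC DatetimeIndex
df.index = pd.to_datetime(df.index, utc=True)

# Convert to the (assumed) local zone; the offset (+01:00 in winter,
# +02:00 in summer) is chosen per timestamp automatically
df.index = df.index.tz_convert("Europe/Berlin")
```

If you need naive local timestamps afterwards, `df.index.tz_localize(None)` removes the zone while keeping the local wall-clock times.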
I'm running a Python script whose result is a group of values. Let's say the result is a single date and time:
from datetime import datetime
now = datetime.now()
date = now.strftime("%d/%m/%Y")
time = now.strftime("%H:%M:%S")
I would like to write the data to an xlsx file, but the data is overwritten every time the script runs.
How should I write the date so that it is added to the xlsx file instead of overwriting its first line?
This is the code I am using, and I'm not sure how to change it:
worksheet.write(1, 0, date)
worksheet.write(1, 1, time)
The result I would like to end up with is something like the following:
Date Time
20/03/2022 00:24:36
20/03/2022 00:55:36
21/03/2022 15:24:36
22/03/2022 11:24:36
23/03/2022 22:24:36
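You can open the Excel file in append mode and keep inserting data. See the snippet below:
import pandas as pd
with pd.ExcelWriter("existing_file_name.xlsx", engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    # "overlay" (pandas >= 1.4) writes into the existing sheet; startrow begins below its last used row
    df.to_excel(writer, sheet_name="name", startrow=writer.sheets["name"].max_row, header=False, index=False)
The simplest way is to convert your data into a pandas DataFrame; then you can easily write the DataFrame to an xlsx file with a simple command like:
df.to_excel("filename.xlsx")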
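The worksheet.write(row, col, value) calls in the question look like XlsxWriter, which can only create new files and cannot reopen an existing one, so a fixed row index is rewritten on every run. A minimal alternative sketch using openpyxl, which can load an existing workbook and append below the last used row. The function name append_timestamp is hypothetical, and the row layout follows the Date/Time table in the question:

```python
import os
from datetime import datetime

from openpyxl import Workbook, load_workbook


def append_timestamp(path, now=None):
    """Append one (date, time) row to the workbook at path, creating it on the first run."""
    if now is None:
        now = datetime.now()
    if os.path.exists(path):
        # Reopen the existing file so earlier rows are kept
        wb = load_workbook(path)
        ws = wb.active
    else:
        # First run: new workbook with a header row
        wb = Workbook()
        ws = wb.active
        ws.append(["Date", "Time"])
    # append() writes to the first row below the existing data
    ws.append([now.strftime("%d/%m/%Y"), now.strftime("%H:%M:%S")])
    wb.save(path)
```

Calling `append_timestamp("log.xlsx")` at the end of each run adds one row; the header is written only when the file does not exist yet.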
I have a script that loops through a folder of CSVs, reads them, removes any empty rows (they all have 'empty' rows that Pandas reads as NaN) and appends them to a master dataframe. It then writes the dataframe to a new CSV. This is all working as expected:
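import os
import pathlib as pl
import pandas as pd
frames = []
for file in os.listdir(sourceLoc):
    if pl.Path(file).suffix == '.csv':
        fullPath = os.path.join(sourceLoc, file)
        print(file)
        initDF = pd.read_csv(fullPath)
        cleanDF = initDF.dropna(subset=['Name'])
        frames.append(cleanDF)  # DataFrame.append was removed in pandas 2.0
masterDF = pd.concat(frames)
masterDF.to_csv(destLoc, index=False)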
My only issue is that the input dates are displayed like 25/05/21, but the output dates end up formatted like 05/25/21. As I'm in the UK and use a UK version of Excel to analyse the output, this confuses all my functions.
The only solutions I've found so far are to reformat the date columns individually or style them, which to my understanding only affects how they look in Jupyter and not in the actual data. As there are multiple date columns in the source data files I'd rather not have to reformat them all individually.
Is there any way of defining the date format when first creating the dataframe, or reformatting every date column once the dataframe is filled?
In the end this issue was caused by two different problems.
The first was Excel intermittently exporting my dates in US format despite the original format (and my Windows Region settings) being UK format. I've now added a short VBA loop in my export code to ensure those columns are formatted correctly every time the data is exported.
The second was the CSV dates being imported with incorrect dtypes. I suspect this was again Excel's fault (2010 is problematic), but I'm unsure. I'm now correcting this with the astype() method.
The end result is my dates are now imported into Pandas in the correct format and outputted to a new CSV in the correct format too.
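For the original question of fixing the format inside pandas, one approach is to parse the date columns explicitly with a day-first format and then let to_csv format every datetime column on output through its date_format parameter, so no column has to be restyled individually. A sketch, assuming dd/mm/yy input; the column names Start and End and the inline sample data are hypothetical:

```python
import io

import pandas as pd

# Hypothetical input in UK day/month/year order, including an 'empty' row
csv_text = "Name,Start,End\nAlice,25/05/21,01/06/21\n,,\nBob,03/12/20,15/01/21\n"

DATE_COLS = ["Start", "End"]  # hypothetical date column names


def clean_and_export(source):
    df = pd.read_csv(source)
    for col in DATE_COLS:
        # An explicit format stops pandas from guessing month-first; blanks become NaT
        df[col] = pd.to_datetime(df[col], format="%d/%m/%y")
    # Drop the 'empty' rows, then format every datetime column the same way on output
    return df.dropna(subset=["Name"]).to_csv(index=False, date_format="%d/%m/%y")


out = clean_and_export(io.StringIO(csv_text))
```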
I have a date column in a CSV file, as shown below:
23/6/2011 7:00
21/4/1998 05:00
17/02/1990
11/01/1985 30:30:01
26/02/1976
45:42:7
The problem is that when I double-click the cells in the CSV, the actual date value is displayed correctly, e.g. 15/02/2010 10:30:00.
My CSV looks like this:
I cannot do this manually because, as you can imagine, I have 20-30 CSV files with a lot of rows like this.
When I read the column into a pandas DataFrame and apply to_datetime as below, I get a ParserError:
df['Date'] = pd.to_datetime(df['Date'])
ParserError: hour must be in 0..23: 55:45.0
How can I make pandas read the actual value rather than the displayed value?
I tried changing the format of the CSV file in Excel, but that doesn't help.
Basically, I want pandas to read the value shown when I double-click a cell, not the displayed value.
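A CSV file is plain text, so pandas can only read the characters stored in it. If the file was last saved from Excel, it probably contains the text as Excel displayed it (values such as 55:45.0 look like a minutes:seconds display format), and the full value Excel shows on double-click cannot be recovered from the CSV alone; opening the file in a plain text editor shows what is actually stored. As a first step, a sketch like the following, using the sample values from the question, lists the entries that do not parse as day-first dates. Each value is parsed separately so that one bad entry does not stop the rest:

```python
import pandas as pd

# Sample values copied from the question
raw = pd.Series([
    "23/6/2011 7:00", "21/4/1998 05:00", "17/02/1990",
    "11/01/1985 30:30:01", "26/02/1976", "45:42:7",
])

# Parse value by value; errors="coerce" turns unparseable entries into NaT
parsed = raw.apply(lambda v: pd.to_datetime(v, dayfirst=True, errors="coerce"))

# Entries that need fixing at the source (or a separate parsing rule)
unparsed = raw[parsed.isna()]
```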
I am a student and I am learning pandas.
I created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file from it using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I opened the file in pandas and saved it back to Excel under a different name (Output.xlsx).
However, when I open Output.xlsx in MS Excel, the DOB and YOP columns have a timestamp attached to the dates.
How can I write only the date? (I want the Output file and its contents to look exactly like the original file.)
Hope to get some help/support.
Thank you
Your DOB and Year of passing columns are probably of datetime dtype before they are saved, so they are written to Excel as full datetime values, including the time.
If you want the contents to look exactly like the original file in dd-mm-YYYY format, you can try converting these two columns to strings before saving to Excel:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')
I am trying to read an Excel file with pandas, but one column, called Error, has more than one line within each cell. Example below:
Row Error
1 Bank error
Try again
2 Limit error
Cancell
When I read this file into Python, I only get the first line of each Error cell. My DataFrame looks like this:
Row Error
0 Bank error
1 Limit error
My code is below:
import pandas as pd
df = pd.read_excel('/content/drive/My Drive/error.xlsx')
How can I fix this and read the whole cell into Python? Thank you.
I also added an image of the first two rows in Excel.
For specific I/O problems, you should give us a small sample of the data (the original Excel file that shows the problem); otherwise we can't tell what you are describing.
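Independently of the sample file, a quick check is whether pandas really drops the text: read_excel returns a cell with internal line breaks as a single string containing \n characters. The sketch below builds a small hypothetical workbook with multi-line Error cells using openpyxl, reads it back with pd.read_excel, and splits each cell into its lines. If your own file still yields only the first line, the extra lines may be stored in separate rows or cells rather than inside one cell:

```python
import pandas as pd
from openpyxl import Workbook


def write_sample(path):
    """Create a small workbook whose Error cells contain line breaks (hypothetical sample)."""
    wb = Workbook()
    ws = wb.active
    ws.append(["Row", "Error"])
    ws.append([1, "Bank error\nTry again"])
    ws.append([2, "Limit error\nCancel"])
    wb.save(path)


def read_error_lines(path):
    """Read the sheet and split each Error cell into a list of its lines."""
    df = pd.read_excel(path)
    df["Error lines"] = df["Error"].str.split("\n")
    return df
```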