How to make pandas read the actual values instead of the display values from a CSV? - python

I have a date column in a CSV file, as shown below:
23/6/2011 7:00
21/4/1998 05:00
17/02/1990
11/01/1985 30:30:01
26/02/1976
45:42:7
The problem is that when I double-click one of these cells in Excel, the actual date value is displayed correctly, e.g. 15/02/2010 10:30:00.
My CSV looks like this: [screenshot]
I cannot fix this manually, because I have 20-30 CSV files with many rows like this.
When I read the column into a pandas DataFrame and apply pd.to_datetime like this,
df['Date'] = pd.to_datetime(df['Date'])
I get the following error:
ParserError: hour must be in 0..23: 55:45.0
How can I make pandas read the actual value rather than the value displayed in the CSV?
I tried changing the format in the Excel CSV file, but that doesn't help.
Basically, I want pandas to read the value shown when I double-click a cell, not the display value.
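A CSV file is plain text, so pandas can only read the characters that were actually saved in it. As far as I know, when Excel saves a sheet as CSV it writes each cell as it is displayed, so a cell formatted as mm:ss.0 is stored as something like 55:45.0 and the full timestamp is no longer in the file; no read_csv option can bring it back. The values would need to be re-exported with a full date format, or read from the original workbook with pd.read_excel if one exists. Since checking 20-30 files by hand is impractical, here is a minimal sketch for at least finding the affected rows. It assumes pandas 2.0 or later (for format="mixed") and day-first dates; the sample values are the ones from the question.

```python
import io

import pandas as pd

# Sample values from the question, as they would appear in the CSV text
csv_text = """Date
23/6/2011 7:00
21/4/1998 05:00
17/02/1990
11/01/1985 30:30:01
26/02/1976
45:42:7
"""

df = pd.read_csv(io.StringIO(csv_text))

# format="mixed" (pandas >= 2.0) lets each row have its own layout;
# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(df["Date"], format="mixed", dayfirst=True, errors="coerce")

# Rows that failed to parse hold only Excel's display text, not the full timestamp
bad_rows = df.loc[parsed.isna(), "Date"]
print(bad_rows.tolist())
```

The flagged rows are the ones whose source data has to be re-exported; the remaining rows are already usable as real datetimes in parsed.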

Related

How to handle timezone when loading a CSV file with pandas dataframe

I am currently facing a problem when loading data via the read_csv() function of pandas.
Here is an extract of a record from the CSV file:
2021-11-28T03:13:01+00:00,49.59,49.93,49.56,49.88
When I load it with pandas read_csv(), my timestamp index column systematically loses one hour on every record.
After loading with pandas, the previous example looks like this:
2021-11-28T02:13:01+00:00,49.59,49.93,49.56,49.88
Here is my Python code:
df_mre_unpacked = pd.read_csv('mre_unpacked.csv', sep=',',encoding='utf-8',index_col='timestamp',decimal=".")
df_mre_unpacked = df_mre_unpacked[['ASG1.CPU10_XV_ACHSE1_ZR','ASG1.CPU11_XV_ACHSE2_ZR','ASG2.CPU10_XV_ACHSE1_ZR','ASG2.CPU11_XV_ACHSE2_ZR']]
Here is the original CSV: [screenshot]
Result of the DataFrame's head() call: [screenshot]
As you can see, the first record in the CSV file is at 03:13:01, but the pandas DataFrame begins at 02:13:01, a timestamp that does not exist in my CSV file.
Has anyone had this problem before ?
UPDATE:
I have a better view of the problem now. Here is what I discovered.
When I extract the data via a portal, the time displayed is as follows: [screenshot]
Here is the same record read from the CSV file with Notepad++: [screenshot]
So the problem comes from the extraction process, which automatically subtracts one hour.
How do I add 1 hour to my timestamp column which is my index?
The fix is to add this snippet:
df_de18_EventData.index = pd.to_datetime(df_de18_EventData.index) + pd.Timedelta("1 hour")
This adds one hour to the timestamp index.
It's not the best solution, but it works for now. A better solution would account for the time difference automatically, as Excel does.
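For the more automatic option, one approach is to treat the extracted timestamps as UTC (they carry +00:00) and convert them to the local time zone instead of adding a fixed hour. This sketch assumes the portal shows Europe/Berlin time (substitute your own zone) and uses a hypothetical two-row sample with placeholder column names. tz_convert follows daylight-saving changes, which a fixed pd.Timedelta("1 hour") does not.

```python
import io

import pandas as pd

# Hypothetical sample: the first row is the record from the question,
# the second is an invented summer timestamp to show the DST difference
csv_text = """timestamp,col1,col2,col3,col4
2021-11-28T02:13:01+00:00,49.59,49.93,49.56,49.88
2021-07-01T10:00:00+00:00,1.0,2.0,3.0,4.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="timestamp")

# Parse the ISO-8601 strings as timezone-aware UTC timestamps
df.index = pd.to_datetime(df.index, utc=True)

# Convert to local time; "Europe/Berlin" is an assumption
df.index = df.index.tz_convert("Europe/Berlin")

print(df.index)
```

The November record comes out one hour ahead (CET) and the July record two hours ahead (CEST), without any manual offset.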

Pandas - Change the default date format when creating a dataframe from CSVs

I have a script that loops through a folder of CSVs, reads them, removes any empty rows (they all have 'empty' rows that Pandas reads as NaN) and appends them to a master dataframe. It then writes the dataframe to a new CSV. This is all working as expected:
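import os
import pathlib as pl
import pandas as pd

frames = []
for file in os.listdir(sourceLoc):  # loop reconstructed; sourceLoc/destLoc are defined elsewhere
    if pl.Path(file).suffix == '.csv':
        fullPath = os.path.join(sourceLoc, file)
        print(file)
        initDF = pd.read_csv(fullPath)
        cleanDF = initDF.dropna(subset=['Name'])  # drop the 'empty' rows read as NaN
        frames.append(cleanDF)
masterDF = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0
masterDF.to_csv(destLoc, index=False)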
My only issue is that the input dates are displayed as 25/05/21, but the output dates end up formatted as 05/25/21. As I'm in the UK and use a UK version of Excel to analyse the output, this confuses all my functions.
The only solutions I've found so far are to reformat the date columns individually or style them, which to my understanding only affects how they look in Jupyter and not in the actual data. As there are multiple date columns in the source data files I'd rather not have to reformat them all individually.
Is there any way of defining the date format when first creating the dataframe, or reformatting every date column once the dataframe is filled?
In the end, this issue was caused by two different problems.
The first was Excel intermittently exporting my dates in US format despite the original format (and my Windows Region settings) being UK format. I've now added a short VBA loop in my export code to ensure those columns are formatted correctly every time the data is exported.
The second was the CSV dates being imported with incorrect dtypes. I suspect this was again the fault of Excel (2010 is problematic), but I'm unsure. I'm now correcting this with astype().
The end result is my dates are now imported into Pandas in the correct format and outputted to a new CSV in the correct format too.
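On the original question of setting the format once for all date columns: pandas can handle both ends in bulk. dayfirst=True in read_csv parses every column listed in parse_dates as day-first, and date_format in to_csv formats every datetime column on output. A minimal sketch with hypothetical column names (Start, End); Excel may still reinterpret the output CSV according to its own regional settings.

```python
import io

import pandas as pd

# Hypothetical input in UK day-first format
csv_text = """Name,Start,End
Alice,25/05/21,01/06/21
Bob,03/04/21,12/04/21
"""

# dayfirst=True makes 03/04/21 mean 3 April, not 4 March
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start", "End"], dayfirst=True)

# date_format applies to every datetime column when writing the CSV
out = df.to_csv(index=False, date_format="%d/%m/%y")
print(out)
```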

How to get rid of the time part of dates, as pandas adds timestamps to date columns when saving to Excel

I am a student learning pandas.
I created an Excel file named Student_Record.xlsx (using Microsoft Excel).
I wanted to create a new file from it using pandas:
import pandas as pd
df = pd.read_excel(r"C:\Users\sudarshan\Desktop\Student_Record.xlsx")
df.head()
df.to_excel(r"C:\Users\sudarshan\Desktop\Output.xlsx",index=False)
I read the file with pandas and saved it back to Excel under a different name (Output.xlsx).
But when I open Output.xlsx in MS Excel, the DOB and YOP columns have a timestamp attached to the dates.
How can I write only the date? (I want the Output file and its contents to look exactly like the original file.)
Hope to get some help/support.
Thank you
Your DOB and Year of passing columns are probably of datetime dtype by the time they are saved, so they are written to Excel as full date-time values, including the time part.
If you want the contents to look exactly like the original file in dd-mm-YYYY format, you can convert these two columns to strings before saving to Excel:
df['DOB'] = df['DOB'].dt.strftime('%d-%m-%Y')
df['Year of passing'] = df['Year of passing'].dt.strftime('%d-%m-%Y')

Python export string columns to excel

I have a column in my DataFrame with codes like 2106080119283699 and 2104985492880938. To avoid them showing up as 5.47506e+09 when I export the DataFrame to Excel, I converted that column to strings:
File['mtcn']=File['mtcn'].values.astype(str)
When I open the Excel file the column looks fine, but if I click into a cell the value changes to: [screenshot]
I know I can change the column's data type in Excel to Text so this never happens, but I want to know whether there is a way to convert the value in Python so that, after exporting to Excel, the code keeps its full form (2106080119283699) even when I click into the cell.
Regards
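One likely cause is that Excel stores numbers as doubles with 15 significant digits, so a 16-digit code like 2106080119283699 cannot survive once Excel converts it to a number. A sketch, assuming the openpyxl engine, that writes the codes as text cells and also gives them Excel's Text number format ("@"), so they should stay text if a cell is edited; the DataFrame here is a hypothetical stand-in for File:

```python
import pandas as pd

# Hypothetical stand-in for File with 16-digit codes
File = pd.DataFrame({"mtcn": [2106080119283699, 2104985492880938]})
File["mtcn"] = File["mtcn"].astype(str)  # write as text, not as numbers

with pd.ExcelWriter("codes.xlsx", engine="openpyxl") as writer:
    File.to_excel(writer, index=False, sheet_name="Sheet1")
    ws = writer.sheets["Sheet1"]
    # Mark the data cells of column A as Text ("@") so Excel keeps them as text
    for (cell,) in ws.iter_rows(min_row=2, min_col=1, max_col=1):
        cell.number_format = "@"
```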

How to make a date string be read as a date object when saving excel files in python?

I am using pandas to save my DataFrames as CSV files. I have a column 'date_arrived' that looks like ['01/03/2010','01/11/2010','01/01/2010','01/05/2010','01/08/2010'], where the dates are in dd/mm/yyyy format. When I save it with df.to_csv() and open the file in Excel, the dates appear as text, not as date values. Is there any way, from Python, to change how Excel reads these dates?
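A CSV file carries no type information, so Excel decides how to interpret each field when it opens the file, based on its own regional settings; Python cannot fully control that. Writing an .xlsx file instead lets pandas store real date cells. A sketch, assuming the openpyxl engine and the date_arrived values from the question; the output name arrivals.xlsx is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"date_arrived": ['01/03/2010', '01/11/2010', '01/01/2010',
                                    '01/05/2010', '01/08/2010']})

# Parse the dd/mm/yyyy strings into datetime64 values
df["date_arrived"] = pd.to_datetime(df["date_arrived"], format="%d/%m/%Y")

# Datetime values are written as Excel date cells with the given display format
with pd.ExcelWriter("arrivals.xlsx", engine="openpyxl",
                    datetime_format="DD/MM/YYYY") as writer:
    df.to_excel(writer, index=False)
```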
