Pandas automatically converts string to date - python

I need to work with a csv file in which one column contains values like these: 1/2, 2/1, 3/1, etc.
When I load the CSV into a pandas DataFrame, the values automatically appear as: 01-Feb, 02-Jan, 03-Jan, etc.
How can I load this CSV into a DataFrame so that the values of this column are kept as strings?
I have tried this
df = pd.read_csv("/Users/Name/Desktop/QM/data.csv", encoding='latin-1',dtype=str)
But the dates remain.

It sounds like a date format has been applied to that column somewhere along the way.
In any case, you can convert the values back to strings following Pandas Series.dt.strftime.
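A minimal sketch of that suggestion, assuming the CSV already contains the Excel-mangled values (the column name "ratio" is made up for illustration):

```python
import io
import pandas as pd

# Sample CSV where the original 1/2, 2/1, 3/1 values were already
# rewritten as day-month dates before pandas ever saw them
csv_data = "ratio\n01-Feb\n02-Jan\n03-Jan\n"
df = pd.read_csv(io.StringIO(csv_data))

# Parse the mangled values as day-month dates...
parsed = pd.to_datetime(df["ratio"], format="%d-%b")
# ...then rebuild the original "day/month" strings
df["ratio"] = parsed.dt.day.astype(str) + "/" + parsed.dt.month.astype(str)
print(df["ratio"].tolist())  # ['1/2', '2/1', '3/1']
```

Note this only works if you know which format (day-month vs. month-day) Excel used when it converted the values.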

Related

Python Pandas Dataframe Remove Float Trailing Zeros

I have a Pandas dataframe that I'm outputting to csv. I would like to keep the data types (i.e. not convert everything to string). I need to format the date properly and there are other non-float columns.
How do I remove trailing zeros from the floats while not changing datatypes? This is what I've tried:
pd.DataFrame(myDataFrame).to_csv("MyOutput.csv", index=False, date_format='%m/%d/%Y', float_format="%.8f")
For example, this:
09/26/2022,43.27334000,2,111.37000000
09/24/2022,16.25930000,5,73.53000000
Should be this:
09/26/2022,43.27334,2,111.37
09/24/2022,16.2593,5,73.53
Any help would be greatly appreciated!
You can simply call to_csv without the float_format argument. Also, if the myDataFrame variable is already a DataFrame object, you don't need the pd.DataFrame wrapper; you can just do the following.
myDataFrame.to_csv("MyOutput.csv", index=False, date_format='%m/%d/%Y')
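A quick check of that behavior, using made-up column names that mirror the sample rows in the question: omitting float_format lets pandas write each float with its shortest round-trip representation, so no trailing zeros are padded on.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2022-09-26", "2022-09-24"]),
    "price": [43.27334, 16.2593],
    "qty": [2, 5],
    "total": [111.37, 73.53],
})

buf = io.StringIO()
# No float_format: floats keep their natural repr, dates are formatted
df.to_csv(buf, index=False, date_format="%m/%d/%Y")
print(buf.getvalue())
```

The output rows contain 43.27334 and 111.37 rather than 43.27334000 and 111.37000000.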

How to keep data frame data types when exporting to Excel file?

I have a pandas data frame with int64, object, and datetime64[ns] data types. How do I preserve those data types when exporting with pandas DataFrame.to_excel?
I want exported Excel file columns looks like this:
int64 Number format in Excel
object Text format in Excel
datetime64[ns] Date format in Excel
Right now all of my Excel column format shows as General
You can convert a Pandas dataframe to an Excel file with column formats using Pandas and XlsxWriter.
Here is an example of what you can do: Pandas Excel output with column formatting
For more possibilities, have a look at this page, especially the Format methods and Format properties section: The Format Class
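A sketch of the XlsxWriter approach from those links; the file name, sheet name, and format strings here are illustrative choices, not anything prescribed by the question.

```python
import pandas as pd

df = pd.DataFrame({
    "count": [1, 2, 3],
    "label": ["a", "b", "c"],
    "when": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
})

# datetime_format controls how datetime64 columns are displayed in Excel
with pd.ExcelWriter("formatted.xlsx", engine="xlsxwriter",
                    datetime_format="yyyy-mm-dd") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Sheet1"]
    # Integer number format for column A, Excel's Text format ("@") for B
    int_fmt = workbook.add_format({"num_format": "0"})
    txt_fmt = workbook.add_format({"num_format": "@"})
    worksheet.set_column("A:A", 10, int_fmt)
    worksheet.set_column("B:B", 10, txt_fmt)
```

This requires the xlsxwriter package to be installed; the column formats are applied per worksheet column, not inferred from the DataFrame dtypes.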
I have a pandas data frame with int64, object, and datetime64[ns] data types. How do I preserve those data types when exporting with pandas DataFrame.to_excel?
The short answer is that you can't.
Excel doesn't have as many data types as Python, and far fewer than Pandas. For example, the only numeric type it has is an IEEE 754 64-bit double. Therefore you won't be able to store an int64 without losing information (unless the integer values are <= ~15 digits). Dates and times are also stored in the same double format, and only with millisecond resolution, so you won't be able to store datetime64[ns].
You could store them in string format but you won't be able to use them for calculations and Excel will complain about "Numbers stored as strings".
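The ~15-digit limit mentioned above follows from the double's 53-bit significand, which a couple of lines of plain Python can demonstrate:

```python
# Integers up to 2**53 are exactly representable as a 64-bit double;
# the very next integer is not, so storing it as a double loses it
exact = 2 ** 53                        # 9007199254740992
print(float(exact) == exact)           # True
print(float(exact + 1) == exact + 1)   # False: rounds back to 2**53
```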

Pandas to CSV column datatype [duplicate]

This question already has answers here:
datetime dtypes in pandas read_csv
(6 answers)
Closed 2 years ago.
I'm using Pandas and SQLAlchemy to import data from SQL. One of the SQL columns is datetime. I then convert the SQL data into a Pandas DataFrame; the datetime column is “datetime64”, which is fine. I am able to use Matplotlib to plot any of my other columns against datetime.
I then convert my pandas DataFrame to a CSV using:
df.to_csv('filename')
This is to save me having to keep running a rather large SQL query each time I log on. If I then try to read the CSV back into Python and work from that, the datetime column is now of datatype “object” rather than “datetime64”. This means Matplotlib won't let me plot other columns against datetime because the datetime column is the wrong datatype.
How do I ensure that it stays as the correct datatype during the df to csv process?
EDIT:
The comments/solutions to my original post did work in getting the column to the correct dtype. However, I now have a different problem: when I plot against the "datetime" column, the plot no longer looks the way it does when I'm working directly with the SQL data.
I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns]).
CSV is a plain text format and does not specify the data type of any column. If you are using pandas to read the csv back into python, pd.read_csv() provides a few ways to specify that a column represents a date.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Try pd.read_csv('file.csv', parse_dates=[colnum]), where colnum is the integer index of your date column.
read_csv() provides additional options for parsing dates. Alternatively, you could look at the dtype argument.
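A small round-trip check of that suggestion, with made-up column names: without parse_dates the column comes back as "object"; with it, the datetime dtype is restored.

```python
import io
import pandas as pd

csv_text = "when,value\n2021-01-01,10\n2021-01-02,20\n"

# Read naively: the date column comes back as plain strings ("object")
plain = pd.read_csv(io.StringIO(csv_text))
print(plain["when"].dtype)   # object

# Read with parse_dates: the column is a datetime64 dtype again
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])
print(parsed["when"].dtype)  # datetime64[ns]
```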
Unfortunately, you cannot store data types in CSV format.
One thing you can do, if you are only reading the file back in Python, is to use pickle instead.
You can do that like:
import pickle
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(df, pickle_file)
and you can load it using:
with open('filename.pkl', 'rb') as pkl_file:
    df = pickle.load(pkl_file)
(pandas also provides df.to_pickle('filename.pkl') and pd.read_pickle('filename.pkl') as shortcuts.)

dtype does not provide enough data type precision. Alternatives?

I am trying to check the formats of columns in a number of excel files (.xlsx) to see if they match.
To do so, I am using pandas' dtype attribute.
The problem is that it returns the same data type (datetime64[ns]) for two different date formats within 'Date'.
What alternatives to this attribute would give more precision?
# Import pandas
import pandas as pd
# Read MyFile and store it in DataFrame df1
df1 = pd.read_excel(MyFile, sheet_name=0, header=0, index_col=False, keep_default_na=False)
# Print the data type of the column MyColumnName
print(df1[str(MyColumnName)].dtype)
I would like to have more accuracy on the data type information to be able to flag differences between spreadsheets.
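One possible alternative, assuming the goal is to distinguish display formats rather than dtypes: openpyxl exposes each cell's number format string, which pandas collapses to a single datetime64[ns] dtype. A sketch (the file and format strings are illustrative):

```python
import datetime
from openpyxl import Workbook, load_workbook

# Build a workbook with the same date shown in two different formats
wb = Workbook()
ws = wb.active
ws["A1"] = datetime.date(2021, 1, 2)
ws["A1"].number_format = "DD/MM/YYYY"
ws["B1"] = datetime.date(2021, 1, 2)
ws["B1"].number_format = "YYYY-MM-DD"
wb.save("formats.xlsx")

# Reading the cell-level number_format distinguishes the two columns
ws2 = load_workbook("formats.xlsx").active
print(ws2["A1"].number_format)  # DD/MM/YYYY
print(ws2["B1"].number_format)  # YYYY-MM-DD
```

This requires the openpyxl package; you would compare the number_format strings across workbooks to flag the differences.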

Column value is read as date instead of string - Pandas

I have an Excel file in which one row of the column Model has the value "9-3", which is a string value. I double-checked the Excel file to make sure the column's datatype is Plain string instead of Date. But still, when I use read_excel and convert it into a DataFrame, the value is shown as 2017-09-03 00:00:00 instead of the string "9-3".
Here is how I read the excel file:
table = pd.read_excel('ManualProfitAdjustmentUpdates.xlsx' , header=0, converters={'Model': str})
Any idea on why pandas is not treating value as string even when I set the converters as str?
The Plain string setting in the Excel file affects only how the data is shown in Excel.
The str setting in the converter affects only how it treats the data that it gets.
To force the excel file to return the data as string, the cell's first character should be an apostrophe.
Change "9-3" to "'9-3".
The problem may be with Excel. Make sure the entire column is stored as text and not just the single value you are talking about. If Excel had the column saved as a date at any point, it will store a year in that cell no matter what is shown or what the datatype is changed to.
Pandas is going to read the entire column as one data type, so if you have dates above 9-3 they will be converted. Changing dates to strings without years can be tricky. It may be better to save the Excel sheet as a CSV once it is in the proper format you like and then use pandas' pd.read_csv(). I made a test Excel workbook "book1.xlsx":
9-3 1 Hello
12-1 2 World
1-8 3 Test
Then ran
import pandas as pd
df = pd.read_excel('book1.xlsx',header=0)
print(df)
and got back my data frame correctly. Thus, I am led to believe it is Excel. Sorry it isn't the best answer, but I don't believe it is a pandas error.
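A sketch of the CSV route suggested above, with sample rows mirroring the test workbook (the Qty and Note column names are made up): once the data is in CSV, dtype=str keeps "9-3" as literal text.

```python
import io
import pandas as pd

# Stand-in for the CSV exported from the Excel sheet
csv_text = "Model,Qty,Note\n9-3,1,Hello\n12-1,2,World\n1-8,3,Test\n"

# dtype=str works here because CSV has no cell types for pandas to infer
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df["Model"].tolist())  # ['9-3', '12-1', '1-8']
```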
