Pandas to CSV column datatype [duplicate] - python

I'm using Pandas and SQLAlchemy to import data from SQL. One of the SQL columns is datetime. I then convert the SQL data into a Pandas DataFrame; the datetime column is "datetime64", which is fine. I am able to use Matplotlib to plot any of my other columns against datetime.
I then convert my pandas DataFrame to a CSV using:
df.to_csv('filename')
This is to save me having to keep running a rather large SQL query each time I log on. If I then try to read the CSV back into Python and work from that, the datetime column is now of datatype "object" rather than "datetime64". This means Matplotlib won't let me plot other columns against datetime because the datetime column is the wrong datatype.
How do I ensure that it stays as the correct datatype during the df to csv process?
EDIT:
The comments/solutions to my original post did work in getting the column to the correct dtype. However, I now have a different problem. When I plot against the "datetime" column it looks like this:
When it should look like this (this is how it looks when I'm working directly with the SQL data).
I assume the datetime column is still not quite in the correct dtype (even though it states it is datetime64[ns]).

CSV is a plain-text format and does not store the data type of any column. If you are using pandas to read the CSV back into Python, pd.read_csv() provides a few ways to specify that a column represents a date.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Try pd.read_csv('file.csv', parse_dates=[colnum]), where colnum is the integer index of your date column.
read_csv() provides additional options for parsing dates (parse_dates also accepts column names); alternatively, you can convert the column after reading with pd.to_datetime().
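A minimal sketch of both approaches, assuming the date column is named "datetime" (the question doesn't give the actual column name):
import pandas as pd
# parse the column while reading the CSV back in
df = pd.read_csv('filename', parse_dates=['datetime'])
# or convert it after reading
df['datetime'] = pd.to_datetime(df['datetime'])
print(df.dtypes)  # the column should now report datetime64[ns]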

Unfortunately, you cannot store data types in the CSV format.
One thing you can do, if you are only reading the file back in Python, is to use pickle.
You can do that like this:
import pickle
# serialize the DataFrame (dtypes included) to disk
with open('filename.pkl', 'wb') as pickle_file:
    pickle.dump(df, pickle_file)
and you can load it back using
with open('filename.pkl', 'rb') as pkl_file:
    df = pickle.load(pkl_file)
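Note that pandas also has built-in pickle helpers that round-trip a DataFrame, dtypes included, without the manual open() calls; the same idea, just shorter:
df.to_pickle('filename.pkl')           # preserves dtypes such as datetime64[ns]
df = pd.read_pickle('filename.pkl')    # loads the DataFrame back unchanged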

Related

Reading Data Frame in Atoti?

While reading a DataFrame in Atoti using the following code, the error shown below occurs.
#Code
global_data=session.read_pandas(df,keys=["Row ID"],table_name="Global_Superstore")
#error
ArrowInvalid: Could not convert '2531' with type str: tried to convert to int64
How can I solve this?
I was trying to read a DataFrame using Atoti functions.
There are values with different types in that particular column. If you aren't going to preprocess the data and you're fine with that column being read as a string, then you should specify the exact data types of each of your columns (or of that particular column), either when you load the DataFrame with pandas, or when you read the data into a table with the function you're currently using:
import atoti as tt

global_superstore = session.read_pandas(
    df,
    keys=["Row ID"],
    table_name="Global_Superstore",
    types={
        "<invalid_column>": tt.type.STRING,
    },
)
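Alternatively, the offending column could be cast on the pandas side before loading; a sketch, where "<invalid_column>" again stands in for the real column name:
df["<invalid_column>"] = df["<invalid_column>"].astype(str)
global_superstore = session.read_pandas(df, keys=["Row ID"], table_name="Global_Superstore")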

Pandas automatically converts string to date

I need to work with a CSV file in which one column contains values like these: 1/2, 2/1, 3/1, etc.
When I load the CSV into a pandas DataFrame, the values automatically become: 01-Feb, 02-Jan, 03-Jan, etc.
How can I load this CSV into a DataFrame so that the values of this column are kept as strings?
I have tried this
df = pd.read_csv("/Users/Name/Desktop/QM/data.csv", encoding='latin-1',dtype=str)
But the dates remain.
It sounds like that column is being formatted somewhere along the way.
In any case, you can convert it back to strings using Series.dt.strftime.
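A small sketch of that suggestion, assuming the affected column is named "value" and has actually been parsed as datetime64 (neither the name nor the dtype is stated in the question):
df['value'] = df['value'].dt.strftime('%m/%d')  # back to zero-padded strings such as '02/01'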

dtype does not provide enough data type precision. Alternatives?

I am trying to check the formats of columns in a number of Excel files (.xlsx) to see if they match.
To do so, I am using the dtype attribute of pandas.
The problem is that it returns the same data type (datetime64[ns]) for two different date formats within 'Date'.
What alternatives to dtype would give more precision?
# Import pandas
import pandas as pd
# Read MyFile and store it in DataFrame df1
df1 = pd.read_excel(MyFile, sheet_name=0, header=0, index_col=False, keep_default_na=False)
# Print the data type of the column MyColumnName
print(df1[str(MyColumnName)].dtype)
I would like to have more accuracy on the data type information to be able to flag differences between spreadsheets.
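To illustrate the behaviour described above: dtype reflects only how the values are stored in memory, not how Excel displays them, so two date columns shown with different formats both report datetime64[ns]. A sketch with made-up data:
import pandas as pd
df = pd.DataFrame({
    'date_a': pd.to_datetime(['2021-01-02', '2021-03-04']),
    'date_b': pd.to_datetime(['02/01/2021', '04/03/2021'], dayfirst=True),
})
print(df['date_a'].dtype, df['date_b'].dtype)  # both: datetime64[ns]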

Column value is read as date instead of string - Pandas

I have an Excel file in which one row of the column Model has the value "9-3", which is a string value. I double-checked the Excel file to confirm the column datatype is Plain string instead of Date. But still, when I use read_excel and convert it into a DataFrame, the value is shown as 2017-09-03 00:00:00 instead of the string "9-3".
Here is how I read the excel file:
table = pd.read_excel('ManualProfitAdjustmentUpdates.xlsx' , header=0, converters={'Model': str})
Any idea why pandas is not treating the value as a string even when I set the converter to str?
The Plain string setting in the Excel file affects only how the data is shown in Excel.
The str setting in the converter affects only how pandas treats the data it receives.
To force the Excel file to return the data as a string, the cell's first character should be an apostrophe.
Change "9-3" to "'9-3".
The problem may be with Excel. Make sure the entire column is stored as text and not just the single value you are talking about. If Excel had the column saved as a date at any point, it will store a year in that cell no matter what is shown or what the datatype is changed to. Pandas is going to read the entire column as one data type, so if you have dates above 9-3 they will be converted. Changing dates to strings without years can be tricky.
It may be better to save the Excel sheet as a CSV once it is in the proper format you like and then use pandas pd.read_csv(). I made a test Excel workbook "book1.xlsx":
9-3 1 Hello
12-1 2 World
1-8 3 Test
Then ran
import pandas as pd
df = pd.read_excel('book1.xlsx',header=0)
print(df)
and got back my data frame correctly. Thus, I am led to believe it is Excel. Sorry it isn't the best answer, but I don't believe it is a pandas error.
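If you do go the CSV route suggested above, reading everything back as strings keeps "9-3" intact; a sketch, assuming the sheet was saved as book1.csv:
import pandas as pd
df = pd.read_csv('book1.csv', header=0, dtype=str)  # every column stays a plain string
print(df.dtypes)  # all columns report object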

Pandas DatetimeIndex string format conversion from American to European

OK, I have read some data from a CSV file using:
df=pd.read_csv(path,index_col='Date',parse_dates=True,dayfirst=True)
The data are in the European date convention format dd/mm/yyyy, which is why I am using dayfirst=True.
However, what I want to do is change the string format of my DataFrame index df from the American (yyyy/mm/dd) to the European format (dd/mm/yyyy), just to be visually consistent with how I look at the dates.
I couldn't find any relevant argument in the pd.read_csv method.
In the output I want a DataFrame whose index is simply a datetime index, visually consistent with the European date format.
Could anyone propose a solution? It should be straightforward, since I guess there should be a pandas method to handle that, but I am currently stuck.
Try something like the following once it's loaded from the CSV. I don't believe it's possible to perform the conversion as part of the reading process.
import pandas as pd
df = pd.DataFrame({'date': pd.date_range(start='11/24/2016', periods=4)})
df['date_eu'] = df['date'].dt.strftime('%d/%m/%Y')
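Since the original question concerns the index, the same method is available on a DatetimeIndex, with the caveat that the result becomes an index of plain strings rather than datetimes (assuming df has a DatetimeIndex as loaded in the question):
df.index = df.index.strftime('%d/%m/%Y')  # e.g. '24/11/2016'; the index values are now strings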
