Is there any way to convert a string timestamp column in a pyarrow table to a datetime type before writing it to a parquet file?
Depending on the timestamp format, you can make use of the pyarrow.compute.strptime function. It is not yet well documented, but you can use something like this:
import pyarrow.compute as pc
pc.strptime(table.column("Timestamp"), format='%Y-%m-%d %H:%M:%S', unit='s')
provided your data is stored in table and "Timestamp" is the name of the column with timestamp strings.
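For the full round trip, a minimal sketch might look like this (the table contents and the output path out.parquet are made up for illustration):
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# a toy table whose "Timestamp" column holds strings
table = pa.table({
    "Timestamp": ["2021-01-01 12:00:00", "2021-01-02 08:30:00"],
    "value": [1, 2],
})

# parse the strings and swap the parsed column into the table
parsed = pc.strptime(table.column("Timestamp"), format='%Y-%m-%d %H:%M:%S', unit='s')
idx = table.schema.get_field_index("Timestamp")
table = table.set_column(idx, "Timestamp", parsed)

# the parquet file now stores a real timestamp type instead of strings
pq.write_table(table, "out.parquet")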
Related
Spark version : 2.1
I'm trying to convert a string datetime column to a UTC timestamp with the format yyyy-MM-dd'T'HH:mm:ss.
I first change the format of the string column to yyyy-MM-dd'T'HH:mm:ss, then convert it to a timestamp type. Later I would convert the timestamp to UTC using the to_utc_timestamp function.
df.select(
    f.to_timestamp(
        f.date_format(f.col("time"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss"
    )
).show(5, False)
date_format works fine and gives me the correct format. But when I apply to_timestamp on top of that result, the format changes to yyyy-MM-dd HH:mm:ss, when it should instead be yyyy-MM-dd'T'HH:mm:ss. Why does this happen?
Could someone tell me how I could retain the format given by date_format? What should I do?
The function to_timestamp converts a string to a timestamp. A timestamp value carries no display format of its own; Spark renders it as yyyy-MM-dd HH:mm:ss. The second argument defines the format of the datetime in the string you are trying to parse, not the format of the output.
You can see a couple of examples in the official documentation.
The code should look like this; note the single 'd' in the pattern, which is tricky in many cases (a single 'd' also matches unpadded day-of-month digits):
data = data.withColumn('date', to_timestamp(col('date'), 'yyyy/MM/d'))
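To address the original question: parse the string first, convert to UTC, and only format back to a string at the end if you need the 'T' separator for display. A sketch, where the source time zone "Europe/Paris" is a placeholder you would replace with your own:
from pyspark.sql import functions as f

df2 = (
    df
    # parse the string into a real TimestampType column
    .withColumn("ts", f.to_timestamp(f.col("time"), "yyyy-MM-dd'T'HH:mm:ss"))
    # shift to UTC; "Europe/Paris" stands in for the source zone
    .withColumn("ts_utc", f.to_utc_timestamp(f.col("ts"), "Europe/Paris"))
    # format back to a string only for display, restoring the 'T'
    .withColumn("time_iso", f.date_format(f.col("ts_utc"), "yyyy-MM-dd'T'HH:mm:ss"))
)
df2.show(5, False)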
I am aware that this question has been posted several times before, but I still have a few doubts. I have a datetime.date (e.g. mydate = date(2014, 5, 1)) which I converted to a string and saved in the DB as a column (dtype: object) in a table. Now I want to change the storage of dates from text to timestamp in the DB. I tried this:
For example, my table is tab1, which I read as a dataframe df in Python.
# datetime to timestamp
df['X'] = pd.to_datetime(mydate)
When I check the dtype in the Python editor with df.info(), the dtype of X is datetime64[ns], but when I save this to MySQL and read it back as a dataframe in Python, the dtype changes to object. I have the datatype as DATETIME in MySQL, but I need it as the TIMESTAMP datatype. Is there any way to do this? Also, I need only the date from Timestamp('2014-05-01 00:00:00') and want to exclude the time.
The problem is that when you read the serialized value from MySQL, the Python MySQL connector does not convert it. You have to convert it to a datetime value after reading the data from the cursor, by calling pd.to_datetime again on the retrieved column:
df['X'] = pd.to_datetime(df['col'])
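If you only need the date part and not the time, as asked above, the .dt.date accessor of the parsed column yields plain datetime.date objects:
df['X'] = pd.to_datetime(df['col']).dt.date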
As suggested, I changed the column type directly by using the dtype argument of the to_sql() function while inserting into the database. Now I can have the datatypes TIMESTAMP, DATETIME, and also DATE in MySQL.
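A minimal sketch of that approach, assuming a SQLAlchemy engine (the connection string and table name here are placeholders):
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import TIMESTAMP

# placeholder connection string; adjust user, password, host and database
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# force the MySQL-side column type via the dtype mapping
df.to_sql("tab1", engine, if_exists="replace", index=False,
          dtype={"X": TIMESTAMP})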
This works for me:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['timestamp'] = df['Date'].apply(lambda x: x.timestamp())
The default output format of to_csv() is:
12/14/2012 12:00:00 AM
I cannot figure out how to output only the date part with specific format:
20121214
or date and time in two separate columns in the csv file:
20121214, 084530
The documentation is too brief to give me any clue as to how to do these. Can anyone help?
Since version 0.13.0 (January 3, 2014), pandas has supported the date_format parameter of the to_csv method:
df.to_csv(filename, date_format='%Y%m%d')
You could use strftime to save these as separate columns:
df['date'] = df['datetime'].apply(lambda x: x.strftime('%Y%m%d'))
df['time'] = df['datetime'].apply(lambda x: x.strftime('%H%M%S'))
and then be specific about which columns to export to csv:
df[['date', 'time', ... ]].to_csv('df.csv')
To export as a timestamp, do this:
df.to_csv(filename, date_format='%s')
The %s format is not documented in Python/pandas, but it works in this case. I found %s in Ruby's date formats; it comes from the underlying C strftime implementation.
Note that the millisecond timestamp format %Q does not work with pandas (you'll get a literal %Q in the field instead of the date). I ran my tests with Python 3.6 and pandas 0.24.1.
I have a series of tables that I am using to create a map in ArcGIS Desktop. In the attribute table there is a date column in the format "19950129000000", and I would like to convert it to something more meaningful, such as "29/1/1995". The column says it is a string, but the metadata says it is in a date format.
I have done something similar before but I am having trouble getting it to work.
I've tried:
def dtConversion(date):
    from datetime import datetime
    od = datetime.strptime(date, "YYYYMMddhhmmss")
    nd = datetime.strftime(od, "%d/%m/%Y")
    return nd
with the Field Calculator expression:
dtConversion(!CMPLDT!)
strptime needs %-style directives rather than pattern letters like "YYYYMMdd":
datetime.strptime('19950129000000', "%Y%m%d%H%M%S")
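Applied to the function above, a corrected sketch would be:
from datetime import datetime

def dtConversion(date):
    # use %-directives, not pattern letters like "YYYYMMdd"
    od = datetime.strptime(date, "%Y%m%d%H%M%S")
    return datetime.strftime(od, "%d/%m/%Y")

dtConversion("19950129000000")  # returns '29/01/1995'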
Suppose I have a CSV with a timestamp whose format is not defined. It can be in any format with any separator, such as
mm/dd/yyyy hh:mm, dd/mm/yyyy hh:mm:ss, mm-dd-yyyy hh:mm, or dd-mm-yyyy hh:mm:ss.
I am trying to parse dates of any format.
Here:
dateparse = lambda dates: datetime.strptime(dates, '%m/%d/%Y %H:%M')
This only defines parsing for one fixed format, %m/%d/%Y %H:%M.
If anyone can give any valuable suggestion then it will be helpful.
pandas.read_csv has an infer_datetime_format parameter:
infer_datetime_format : boolean, default False
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by ~5-10x.
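A minimal sketch of that, assuming the CSV has a column named "timestamp" (the file and column names are placeholders):
import pandas as pd

df = pd.read_csv("data.csv",
                 parse_dates=["timestamp"],
                 infer_datetime_format=True)

Alternatively, pd.to_datetime with its default dateutil-based parsing can infer many common formats element by element after the file has been read.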