I have a txt file with data and values like this one:
PP C timestamp HR RMSSD SCL
PP1 1 20120918T131600000 NaN NaN 80.239727
PP1 1 20120918T131700000 61 0.061420 77.365127
and I am importing it like that:
df = pd.read_csv('data.txt', sep='\t', header=0)
which gives me a nice looking dataframe:
Running
df.columns
shows this result: Index(['PP', 'C', 'timestamp', 'HR', 'RMSSD', 'SCL'], dtype='object').
Now when I am trying to convert the timestamp column into a datetime column:
df["datetime"] = pd.to_datetime(df["timestamp"], format='%Y%m%dT%H%M%S%f')
I get this:
ValueError: time data 'timestamp' does not match format '%Y%m%dT%H%M%S%f' (match)
Any ideas would be appreciated.
First, the error message you're quoting is from the header row. It's trying to parse the literal string 'timestamp' as a timestamp, which is failing. If you're getting an error on an actual data row, show us that message.
All three of your posted data rows parse fine with your format in my testing:
>>> [pandas.to_datetime(s, format='%Y%m%dT%H%M%S%f')
for s in ['20120918T131600000', '20120918T131700000',
'20120918T131800000']]
[Timestamp('2012-09-18 13:16:00'), Timestamp('2012-09-18 13:17:00'), Timestamp('2012-09-18 13:18:00')]
I have no idea where you got format='%Y%m%dT%H%M%S%f'[:-3], which just removes the trailing S%f from the format string, leaving it invalid. If you want to remove the last three digits of the data so that you can use %H%M%S instead of %H%M%S%f, you need to put the [:-3] on the timestamp data value, not on the format.
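To make both points concrete, here is a minimal sketch, assuming (as the error message suggests) that the header text leaked into the data as a row; the sample values are copied from the question:

```python
import pandas as pd

# sample data with the header text leaked in as a data row
df = pd.DataFrame({"timestamp": ["timestamp",
                                 "20120918T131600000",
                                 "20120918T131700000"]})

# drop any row that is just the repeated header, then parse normally
df = df[df["timestamp"] != "timestamp"].copy()
df["datetime"] = pd.to_datetime(df["timestamp"], format="%Y%m%dT%H%M%S%f")

# alternatively, slice the last three digits off the *data*, not the format
df["datetime2"] = pd.to_datetime(df["timestamp"].str[:-3], format="%Y%m%dT%H%M%S")
```

Both columns end up identical; the [:-3] belongs on the string values, never on the format.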
Related
I am importing a csv file into python with the pandas package.
SP = pd.read_csv('S&P500 (5year).csv')
When I go to use the pct_change() method, it is unable to process the values because they have been read in as type 'str'.
I have tried using the .astype(float) method and it returns an error: could not convert string to float: '1805.51'
The 'Adj Close**' values are type str and I need them as type float:
Date Open High Low Close* Adj Close** Volume
0 11/1/2013 1,758.70 1,813.55 1,746.20 1,805.81 1,805.81 63,628,190,000.00
1 12/1/2013 1,806.55 1,849.44 1,767.99 1,848.36 1,848.36 64,958,820,000.00
2 1/1/2014 1,845.86 1,850.84 1,770.45 1,782.59 1,782.59 75,871,910,000.00
3 2/1/2014 1,782.68 1,867.92 1,737.92 1,859.45 1,859.45 69,725,590,000.00
4 3/1/2014 1,857.68 1,883.97 1,834.44 1,872.34 1,872.34 71,885,030,000.00
Try adding the dtype and thousands arguments to the read_csv function. Replace column_name in the example with the column you need to convert to float. Since the numeric values themselves contain commas as thousands separators (and CSV is comma-delimited), you need the thousands parameter so pandas can parse them.
Example:
SP = pd.read_csv('S&P500 (5year).csv', thousands=',', dtype={'column_name': float})
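A self-contained sketch of the idea (the column name 'Adj Close**' and the sample values come from the post; the in-memory CSV stands in for the real file):

```python
import io
import pandas as pd

# two rows shaped like the posted data; quoted fields contain thousands commas
csv_text = 'Date,Adj Close**\n11/1/2013,"1,805.81"\n12/1/2013,"1,848.36"\n'
SP = pd.read_csv(io.StringIO(csv_text), thousands=',')

# the comma-grouped strings now parse as floats, so pct_change works
returns = SP['Adj Close**'].pct_change()
```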
I am new to Python; can I please seek some help from the experts here?
I wish to construct a dataframe from the https://api.cryptowat.ch/markets/summaries JSON response,
based on the following filter criteria:
Kraken-listed currency pairs (please note, there are kraken-futures, which I don't want)
Currency pairs quoted in USD only, i.e. aaveusd, adausd, ...
The ideal dataframe I am looking for is shown in the screenshot below (somehow Excel loads this JSON perfectly):
Dataframe_Excel_Screenshot
resp = requests.get("https://api.cryptowat.ch/markets/summaries")
kraken_assets = resp.json()
df = pd.json_normalize(kraken_assets)
print(df)
Output:
result.binance-us:aaveusd.price.last result.binance-us:aaveusd.price.high ...
0 264.48 267.32 ...
[1 rows x 62688 columns]
When I just paste the link in a browser, the JSON response comes back with double quotes ("), but when I get it via Python code, all double quotes (") are changed to single quotes (') -- any idea why? I tried to work with it via json_normalize, but the response becomes [1 rows x 62688 columns], and I am not sure how to even go about working with 1 row and 62k columns, or how to extract the exact info in the dataframe format I need (please see the Excel screenshot).
Any help is much appreciated. Thank you!
the result JSON is a dict
load this into a dataframe
decode columns into products & measures
filter to required data
import requests
import pandas as pd
import numpy as np
# load results into a data frame
df = pd.json_normalize(requests.get("https://api.cryptowat.ch/markets/summaries").json()["result"])
# columns are encoded as product and measure. decode columns and transpose into rows that include product and measure
cols = np.array([c.split(".", 1) for c in df.columns]).T
df.columns = pd.MultiIndex.from_arrays(cols, names=["product","measure"])
df = df.T
# finally filter down to required data and structure measures as columns
df.loc[df.index.get_level_values("product").str[:7]=="kraken:"].unstack("measure").droplevel(0,1)
sample output
product           price.last   price.high   price.low    price.change.percentage  price.change.absolute  volume       volumeQuote
kraken:aaveaud    347.41       347.41       338.14       0.0274147                9.27                   1.77707      613.281
kraken:aavebtc    0.008154     0.008289     0.007874     0.0219326                0.000175               403.506      3.2797
kraken:aaveeth    0.1327       0.1346       0.1327       -0.00673653              -0.0009                287.113      38.3549
kraken:aaveeur    219.87       226.46       209.07       0.0331751                7.06                   1202.65      259205
kraken:aavegbp    191.55       191.55       179.43       0.030559                 5.68                   6.74476      1238.35
kraken:aaveusd    259.53       267.48       246.64       0.0339841                8.53                   3623.66      929624
kraken:adaaud     1.61792      1.64602      1.563        0.0211692                0.03354                5183.61      8366.21
kraken:adabtc     3.757e-05    3.776e-05    3.673e-05    0.0110334                4.1e-07                252403       9.41614
kraken:adaeth     0.0006108    0.00063      0.0006069    -0.0175326               -1.09e-05              590839       367.706
kraken:adaeur     1.01188      1.03087      0.977345     0.0209986                0.020811               1.99104e+06  1.98693e+06
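Since the question also asks for USD-quoted pairs only, the product filter can be tightened. A sketch on a toy frame shaped like the json_normalize output (the three column names are illustrative, not the full API response):

```python
import numpy as np
import pandas as pd

# toy frame shaped like pd.json_normalize(...)["result"] output
df = pd.DataFrame({
    "kraken:aaveusd.price.last": [259.53],
    "kraken:aaveeur.price.last": [219.87],
    "binance-us:aaveusd.price.last": [264.48],
})

# decode columns into (product, measure) and transpose into rows
cols = np.array([c.split(".", 1) for c in df.columns]).T
df.columns = pd.MultiIndex.from_arrays(cols, names=["product", "measure"])
df = df.T

# keep kraken pairs quoted in USD only
products = df.index.get_level_values("product")
usd = df.loc[products.str.startswith("kraken:") & products.str.endswith("usd")]
result = usd.unstack("measure").droplevel(0, 1)
```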
Hello, try the code below. I have studied the structure of the dataset and modified it to get the desired output.
import requests
import pandas as pd

resp = requests.get("https://api.cryptowat.ch/markets/summaries")
a = resp.json()
# create a DataFrame from key='result'
da = pd.DataFrame(a['result'])
# use transpose to get the required columns and index
da = da.transpose()
# the 'price' column contains a dict which needs to become separate columns in the data frame
db = da['price'].to_dict()
da.drop('price', axis=1, inplace=True)
# initialise a separate data frame for price
z = pd.DataFrame({})
for i in db.keys():
    i = pd.DataFrame(db[i], index=[i])
    z = pd.concat([z, i], axis=0)
da = pd.concat([z, da], axis=1)
da.to_excel('nex.xlsx')
I have a dataframe:
id timestamp
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:08:59"
1 "2025-08-02 19:09:59"
I need to turn timestamp into integer number to iterate over conditions. So it look like this:
id timestamp
1 20250802190859
1 20250802190859
1 20250802190959
You can convert the strings using pandas' string methods:
df = pd.DataFrame({'id':[1,1,1],'timestamp':["2025-08-02 19:08:59",
"2025-08-02 19:08:59",
"2025-08-02 19:09:59"]})
pd.set_option('display.float_format', lambda x: '%.3f' % x)
df['timestamp'] = df['timestamp'].str.replace(r'[-\s:]', '').astype('float64')
>>> df
id timestamp
0 1 20250802190859.000
1 1 20250802190859.000
2 1 20250802190959.000
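If an integer dtype is wanted rather than the float shown above, an equivalent vectorized route (a sketch on the same toy data) parses the strings, re-emits digits only, and casts to int64:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1],
                   'timestamp': ["2025-08-02 19:08:59",
                                 "2025-08-02 19:08:59",
                                 "2025-08-02 19:09:59"]})
# parse, reformat as digits only, and cast to int64 (no float formatting involved)
df['timestamp'] = (pd.to_datetime(df['timestamp'])
                     .dt.strftime('%Y%m%d%H%M%S')
                     .astype('int64'))
```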
Have you tried opening the file, skipping the first line (or better: validating that it contains the header fields as expected) and for each line, splitting it at the first space/tab/whitespace. The second part, e.g. "2025-08-02 19:08:59", can be parsed using datetime.fromisoformat(). You can then turn the datetime object back to a string using datetime.strftime(format) with e.g. format = '%Y%m%d%H%M%S'. Note that there is no "milliseconds" format in strftime though. You could use %f for microseconds.
Note: if datetime.fromisoformat() fails to parse the dates, try datetime.strptime(date_string, format) with a different format, e.g. format = '%Y-%m-%d %H:%M:%S'.
You can use the solutions provided in this post: How to turn timestamp into float number? and loop through the dataframe.
Let's say you have already imported pandas and have a dataframe df, see the additional code below:
import re
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    df1[0][x] = re.sub(r'\D', '', df[0][x])
This way you will not modify the original dataframe df and will get desired output in a new dataframe df1.
Full code that I tried (including creation of the first dataframe); this might help clear up any confusion:
import pandas as pd
import re
l = ["2025-08-02 19:08:59", "2025-08-02 19:08:59", "2025-08-02 19:09:59"]
df = pd.DataFrame(l)
df1 = df.copy()
for x in range(len(df[0])):
    df1[0][x] = re.sub(r'\D', '', df[0][x])
I have a Dataframe with the following date field:
463 14-05-2019
535 03-05-2019
570 11-05-2019
577 09-05-2019
628 08-08-2019
630 25-05-2019
Name: Date, dtype: object
I have to format it as DDMMAAAA. This is what I'm doing inside a loop (for idx, row in df.iterrows():):
I'm removing the \- char using regex:
df.at[idx, 'Date'] = re.sub(r'\-', '', df.at[idx, 'Date'])
then using apply to enforce an 8-digit string with leading zeros
df['Date'] = df['Date'].apply(lambda x: '{0:0>8}'.format(x))
But even though the df['Date'] field has 8 digits with the leading 0 in the df, when exporting it to csv the leading zeros are removed in the exported file, like below.
df.to_csv(path_or_buf=report, header=True, index=False, sep=';')
field as in csv:
Dt_DDMMAAAA
30102019
12052019
7052019
26042019
3052019
22042019
25042019
2062019
I know I must be missing the point somewhere along the way here, but I just can't figure out what the issue is (or whether it's even an issue rather than a misused method).
IMO the simplest method is to use the date_format argument when writing to CSV. This means you will need to convert the "Date" column to datetime beforehand using pd.to_datetime (note dayfirst=True, since the dates are DD-MM-YYYY).
(df.assign(Date=pd.to_datetime(df['Date'], dayfirst=True, errors='coerce'))
   .to_csv(path_or_buf=report, date_format='%d%m%Y', index=False))
This prints:
Date
14052019
03052019
11052019
09052019
08082019
25052019
More information on arguments to to_csv can be found in Writing a pandas DataFrame to CSV file.
What I would do is use strftime + to_excel. If you open the CSV in a text editor, it will show the leading zeros: CSV keeps no display formatting, but Excel drops the zeros when it renders the file. In that case, write to Excel directly (note %d%m%Y for the requested DDMMAAAA layout):
pd.to_datetime(df.Date, dayfirst=True).dt.strftime('%d%m%Y').to_excel('your.xls')
Out[722]:
463    14052019
535    03052019
570    11052019
577    09052019
628    08082019
630    25052019
Name: Date, dtype: object
Firstly, your method is producing a file which contains leading zeros just as you expect. I reconstructed this minimal working example from your description and it works just fine:
import pandas
import re
df = pandas.DataFrame([["14-05-2019"],
["03-05-2019"],
["11-05-2019"],
["09-05-2019"],
["08-08-2019"],
["25-05-2019"]], columns=['Date'])
for idx in df.index:
df.at[idx, 'Date'] = re.sub(r'\-', '', df.at[idx, 'Date'])
df['Date'] = df['Date'].apply(lambda x: '{0:0>8}'.format(x))
df.to_csv(path_or_buf="report.csv", header=True, index=False, sep=';')
At this point report.csv contains this (with leading zeros just as you wanted).
Date
14052019
03052019
11052019
09052019
08082019
25052019
Now, as to why you thought it wasn't working: if you are reading the file back in pandas, you can stop it from guessing the type of the column by specifying a dtype in read_csv:
df_readback = pandas.read_csv('report.csv', dtype={'Date': str})
Date
0 14052019
1 03052019
2 11052019
3 09052019
4 08082019
5 25052019
It might also be that you are reading this in Excel (I'm guessing this from the fact that you are using ; separators). Unfortunately there is no way to ensure that Excel reads this field correctly on double-click, but if this is your final target, you can see how to mangle your file for Excel to read correctly in this answer.
I have a column in an Excel file that is filled with dates, with the mm/dd/yyyy format.
I import the column into a list in Python using this code:
first_excel_file = pd.read_excel('test.xlsx')
item_end_date = first_excel_file['Item End Date'].values.tolist()
But I get this:
[1478476800000000000, 1476921600000000000, 1488240000000000000, 1488240000000000000, 1488240000000000000, 1488326400000000000, 1489622400000000000, 1489622400000000000, 1489968000000000000, 1494288000000000000, 1454198400000000000, 1454198400000000000, 1490918400000000000, 1490918400000000000, 1490918400000000000, 1491955200000000000, 1491955200000000000, 1446249600000000000, 1509408000000000000, 1509408000000000000, 1509408000000000000, 1364688000000000000, 1391126400000000000, 1398816000000000000, 1422662400000000000, 1418428800000000000, 1419292800000000000, 1422662400000000000, 1422662400000000000, 1422662400000000000, 1423612800000000000, 1426291200000000000, 1438300800000000000]
How can I import these dates and keep their original formatting instead of getting these numeric values?
Are these timestamps?
If so, you can convert them into dates.
This may help:
from datetime import datetime
item_end_date = [datetime.fromtimestamp(adt//1000000000).strftime("%m/%d/%Y")
for adt in item_end_date]
You will get:
['11/06/2016', '10/19/2016', '02/27/2017', '02/27/2017', '02/27/2017',
'02/28/2017', '03/15/2017', '03/15/2017', '03/19/2017', '05/08/2017',
'01/30/2016', '01/30/2016', '03/30/2017', '03/30/2017', '03/30/2017',
'04/11/2017', '04/11/2017', '10/30/2015', '10/30/2017', '10/30/2017',
'10/30/2017', '03/30/2013', '01/30/2014', '04/29/2014', '01/30/2015',
'12/12/2014', '12/22/2014', '01/30/2015', '01/30/2015', '01/30/2015',
'02/10/2015', '03/13/2015', '07/30/2015']
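If the column already comes in as datetimes from read_excel (which is what the nanosecond integers suggest), a pandas-only route is to format while the values are still a datetime Series, before converting to a list. A sketch on a stand-in frame (the real one comes from pd.read_excel):

```python
import pandas as pd

# stand-in for pd.read_excel('test.xlsx'); the real column is parsed as datetime64
first_excel_file = pd.DataFrame(
    {'Item End Date': pd.to_datetime(['2016-11-06', '2016-10-19'])})

# format via the .dt accessor, then take the list of formatted strings
item_end_date = first_excel_file['Item End Date'].dt.strftime('%m/%d/%Y').tolist()
```

This avoids the .values round-trip through raw nanosecond integers entirely.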