I'm trying to scrape a table from this website:
https://www.epexspot.com/en/market-data/capacitymarket/capacity-table/
table = pd.read_html("https://www.epexspot.com/en/market-data/capacitymarket")[0]
Here's the output it gives:
I want to change the column names to the format %Y-%m-%d. The columns for the above table should be
2018-09-13, 2018-10-18, 2018-12-13, 2019-03-21, 2019-05-15, 2019-06-27, 2019-09-12
Any suggestion?
Iterate over table.columns and parse each header with the datetime module. Make sure to call replace(year=2019); otherwise the default year (1900) is used.
from datetime import datetime
table.columns = [
    datetime.strptime(column, '%a, %d/%m').replace(year=2019).strftime('%Y-%m-%d')
    if '/' in column else column
    for column in table.columns
]
You can use map if you don't like the long list comprehension:
def rename_col(col):
    if '/' in col:
        return datetime.strptime(col, '%a, %d/%m').replace(year=2019).strftime('%Y-%m-%d')
    return col

# list() materializes the map iterator before it is assigned as the column index
table.columns = list(map(rename_col, table.columns))
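Note that the expected columns in the question span two years (2018 and 2019), so a fixed replace(year=2019) cannot produce all of them. Here is a minimal sketch, assuming you know the year for each header from somewhere else. The helper name rename_col_with_year and the sample headers are hypothetical, and %a assumes English day names (the default C locale). Appending the year to the string before parsing also means strptime never falls back to 1900.

```python
from datetime import datetime

def rename_col_with_year(col, year):
    """Convert a header like 'Thu, 13/09' to 'YYYY-MM-DD'; leave other headers unchanged."""
    if '/' not in col:
        return col
    # Append the year so strptime never falls back to its default year (1900).
    parsed = datetime.strptime(f"{col}/{year}", '%a, %d/%m/%Y')
    return parsed.strftime('%Y-%m-%d')

# Hypothetical headers, each paired with the year it belongs to.
headers = ['Area', 'Thu, 13/09', 'Thu, 21/03']
years = [None, 2018, 2019]
renamed = [rename_col_with_year(h, y) for h, y in zip(headers, years)]
print(renamed)
```

You would then assign the result with table.columns = renamed.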
I have an excel here as shown in this picture:
I am using pandas to read my excel file and it is working fine, this code below can print all the data in my excel:
import pandas as pd
df = pd.read_csv('alpha.csv')
print(df)
I want to get the values from cell C2 to cell H9, but only for rows where the month is October and the day is a Monday. I want to store these values in the Python list below:
mynumbers= []
but I am not sure how to do it. Can you please help me?
You should consider slicing your dataframe and then using .values to store them. If you want them as a list, you can then call tolist():
First transform the Date column to a datetime:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
Then, slice and return the values for the Column Number 2
mynumbers = df[(df['Date'].dt.month == 10) & \
(df['Date'].dt.weekday == 0)]['Column 2'].values.tolist()
Assigning the following values to mynumbers:
[11,8]
A first step would be to convert your Date column to datetime objects
import datetime
myDate = "10-11-22"
myDate = datetime.datetime.strptime(myDate, '%d-%m-%y')
Then, using myDate.month and myDate.weekday() (where Monday is 0), you can select the Mondays in October.
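To make the filtering concrete, here is a small self-contained sketch. The dates and the 'Column 2' values are made up to stand in for the spreadsheet in the question, and the day-first format '%d/%m/%Y' is an assumption about how the Date column is written.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet in the question.
df = pd.DataFrame({
    'Date': ['03/10/2022', '04/10/2022', '10/10/2022', '07/11/2022'],
    'Column 2': [11, 5, 8, 3],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

is_october = df['Date'].dt.month == 10
is_monday = df['Date'].dt.weekday == 0  # Monday is 0, Sunday is 6

# Keep only rows that are both in October and on a Monday.
mynumbers = df.loc[is_october & is_monday, 'Column 2'].tolist()
print(mynumbers)
```

Naming the two boolean masks separately keeps the condition readable and makes it easy to check each one on its own.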
I have a dataframe table like this:
df = pd.DataFrame({"txn_id": ['A', 'B', 'C'], "txn_date": ['2019-04-01', '2020-06-01', '2021-05-01']})
I was trying to find the rows where the transaction month is 2021-05,
so what I did was:
import datetime
df['txn_month'] = df['txn_date'].dt.to_period('M')
df[df['txn_month'] == '2021-05']
However, the result is empty, even though I can see that the column "txn_month" contains "2021-05".
Could you please help? Thanks!
Use dt.to_period:
# convert to datetime if needed
df["txn_date"] = pd.to_datetime(df["txn_date"])
>>> df[df["txn_date"].dt.to_period("M").eq("2021-05")]
  txn_id   txn_date
2      C 2021-05-01
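If you would rather not rely on pandas turning the string "2021-05" into a period for you, you can compare against an explicit pd.Period. This is only a sketch using the sample data from the question; the variable names target and matches are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "txn_id": ["A", "B", "C"],
    "txn_date": ["2019-04-01", "2020-06-01", "2021-05-01"],
})
df["txn_date"] = pd.to_datetime(df["txn_date"])

# Build the month to look for as a real Period rather than a string.
target = pd.Period("2021-05", freq="M")
matches = df[df["txn_date"].dt.to_period("M") == target]
print(matches)
```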
I am trying to extract a date from a SQL Server table. My query returns the data like this:
Hours = pd.read_sql_query("select * from tblAllHours",con)
Now I convert my "Start" Column in the Hours dataframe like this:
Hours['Start'] = pd.to_datetime(Hours['Start'], format='%Y-%m-%d')
then I select the row I want in the column like this:
StartDate1 = Hours.loc[Hours.Month == Sym1, 'Start'].values
Now, if I print my variable with print(StartDate1), I get this result:
[datetime.date(2020, 10, 1)]
What I need is actually 2020-10-01
How can I get this result?
You currently have a column of datetimes. The format you're requesting is a string format.
Use pandas.Series.dt.strftime to convert the datetime to a string.
In pd.to_datetime(Hours['Start'], format='%Y-%m-%d'), format tells the parser what format your input dates are in so they can be converted to datetimes. It does not set the format in which the datetimes are displayed.
Review pandas.to_datetime
If you want only the values, not the Series, use .values at the end of the following command, as you did in the question.
start_date_str = Hours.Start.dt.strftime('%Y-%m-%d')
Try:
print(Hours['Start'].dt.strftime('%Y-%m-%d').values)
The result is the dates as YYYY-MM-DD strings:
['2020-07-03', '2020-07-02']
This is a bit similar to the question "How to change the datetime format in pandas".
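Putting the pieces together, here is a rough sketch. The Hours frame below is a hypothetical stand-in for the result of pd.read_sql_query, and the 'Oct' value for Sym1 is assumed since the question does not show it.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_sql_query.
Hours = pd.DataFrame({
    'Month': ['Sep', 'Oct'],
    'Start': [datetime.date(2020, 9, 1), datetime.date(2020, 10, 1)],
})
Sym1 = 'Oct'  # assumed lookup value, not shown in the question

# Convert the date objects to datetimes, then format the selected rows as strings.
Hours['Start'] = pd.to_datetime(Hours['Start'])
start_dates = Hours.loc[Hours['Month'] == Sym1, 'Start'].dt.strftime('%Y-%m-%d').tolist()
print(start_dates[0])
```

Using tolist() gives plain Python strings, so indexing the first element prints the bare date without brackets.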
I am using Python to do some data cleaning. I've used the datetime module to split a date-time value and tried to create another column with just the time.
My script runs, but every row ends up with the value from the last row of the data frame.
Here is the code:
import datetime
i = 0
for index, row in df.iterrows():
    date = datetime.datetime.strptime(df.iloc[i, 0], "%Y-%m-%dT%H:%M:%SZ")
    df['minutes'] = date.minute
    i = i + 1
This is the dataframe :
Output
df['minutes'] = date.minute reassigns the entire 'minutes' column with the scalar value date.minute from the last iteration.
You don't need a loop, as in 99% of cases when using pandas.
You can use vectorized assignment, just replace 'source_column_name' with the name of the column with the source data.
df['minutes'] = pd.to_datetime(df['source_column_name'], format='%Y-%m-%dT%H:%M:%SZ').dt.minute
It is also most likely that you won't need to specify format as pd.to_datetime is fairly smart.
Quick example:
df = pd.DataFrame({'a': ['2020.1.13', '2019.1.13']})
df['year'] = pd.to_datetime(df['a']).dt.year
print(df)
outputs
           a  year
0  2020.1.13  2020
1  2019.1.13  2019
Seems like you're trying to get the time column from the datetime which is in string format. That's what I understood from your post.
Could you give this a shot?
from datetime import datetime
import pandas as pd
def get_time(date_cell):
    dt = datetime.strptime(date_cell, "%Y-%m-%dT%H:%M:%SZ")
    return datetime.strftime(dt, "%H:%M:%SZ")

df['time'] = df['date_time'].apply(get_time)
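If you prefer to avoid apply, the same columns can be built with the .dt accessor. This is a sketch under the assumption that the source column is called date_time (as in the snippet above) and holds strings like the two made-up samples below.

```python
import pandas as pd

# Hypothetical sample rows in the format used in the question.
df = pd.DataFrame({'date_time': ['2020-01-13T10:05:00Z', '2020-01-13T11:42:30Z']})

# Parse the whole column once, then derive the other columns from it.
parsed = pd.to_datetime(df['date_time'], format='%Y-%m-%dT%H:%M:%SZ')
df['minutes'] = parsed.dt.minute
df['time'] = parsed.dt.strftime('%H:%M:%S')
print(df)
```

Each row gets its own value here because the assignment is a whole Series, not a single scalar.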
I am trying to calculate the difference between time values stored as strings, but I cannot parse the microseconds part. Why do I get this error, and how can I fix my code?
I have already tried the datetime.strptime method to convert the strings to times, and then the pandas DataFrame.diff method to calculate the difference between consecutive items in the list and create a column in Excel for it.
```
from datetime import datetime
import pandas as pd
for itemz in time_list:
    df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
    ls_cnv.append(df.diff())
df = pd.DataFrame(time_list)
ls_cnv = [df.diff()]
print (ls_cnv)
```
I expect the output to be
ls_cnv = [NaN, 00:00:00, 00:00:00]
time_list = ['10:54:05.912783', '10:54:05.912783', '10:54:05.912783']
but instead I get this error: (time data '10:54:05.906224' does not match format '%H %M %S %f')
You get the error because the format string you pass to strptime does not match your data.
df = pd.DataFrame(datetime.strptime(itemz, '%H:%M:%S.%f'))
The format above is the correct one for the strings in your time_list. However, you also create the DataFrame in the wrong way. A DataFrame is a table of data. The loop below creates a new DataFrame on every iteration, replacing the previous one, and each one is built from itemz, which is a single element of your list. So the first iteration creates a DataFrame holding only '10:54:05.912783', and diff() has no other value to compare it with.
for itemz in time_list:
    df = pd.DataFrame(datetime.strptime(itemz, '%H %M %S %f'))
    ls_cnv.append(df.diff())
Maybe what you wanted to do is the following:
from datetime import datetime
import pandas as pd
ls_cnv = []
time_list = ['10:54:03.912743', '10:54:05.912783', '10:44:05.912783']
df = pd.to_datetime(time_list)
data = pd.DataFrame({'index': range(len(time_list))}, index=df)
a = pd.Series(data.index).diff()
ls_cnv.append(a)
print (ls_cnv)
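A somewhat shorter variant of the same idea, as a sketch: parse the whole list in one call with an explicit format and call diff() on the resulting Series. The names times, deltas and seconds are mine, and the time_list values are the ones from the snippet above.

```python
import pandas as pd

time_list = ['10:54:03.912743', '10:54:05.912783', '10:44:05.912783']

# Parse all strings at once; the format uses colons and a decimal point, matching the data.
times = pd.to_datetime(pd.Series(time_list), format='%H:%M:%S.%f')

# diff() subtracts each value from the next; the first entry has no predecessor, so it is NaT.
deltas = times.diff()
seconds = deltas.dt.total_seconds()
print(deltas)
```

The seconds Series is handy if you want plain numbers (for example to write to Excel) instead of Timedelta objects.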
It is simply because your format string must include the colons and the decimal point, like this:
"%H:%M:%S.%f"