Using Python, I am trying to sort the values in a table based on dates and their associated events. date1 and event1 are a pair because event1 occurred on date1. The same goes for the other dates and events.
date1 is in a column of its own, and event1 is also in its own separate column.
For example, the table looks like this:
date1           event1   date2           event2   date3           event3
March 6, 2021   eventC   Jan. 1, 2020    eventX   May 11, 2020    eventB
Dec. 6, 2021    eventBB  Feb. 11, 2001   eventYY  June 13, 1990   eventSS
March 16, 2019  eventCD  Jan. 1, 1998    eventRE  May 23, 1989    eventDF
Nov. 1, 2008    eventCC  April 14, 2001  eventWQ  March 17, 1999  eventCV
I would like the result to look as follows: within each row, the dates are sorted from oldest to newest, and each event moves together with its date (e.g. eventC, which occurred on March 6, 2021 and comes first in row 0, becomes the third pair in row 0).
date1           event1   date2           event2   date3           event3
Jan. 1, 2020    eventX   May 11, 2020    eventB   March 6, 2021   eventC
June 13, 1990   eventSS  Feb. 11, 2001   eventYY  Dec. 6, 2021    eventBB
May 23, 1989    eventDF  Jan. 1, 1998    eventRE  March 16, 2019  eventCD
March 17, 1999  eventCV  April 14, 2001  eventWQ  Nov. 1, 2008    eventCC
I would like to keep the output in the table format as above! (unless there is a good reason not to...) =)
This is actually a very small version of a much larger dataset. Any help would be appreciated.
Thanks to all in advance!!!
Cast the 'Date' values to datetime and use the sort_values() function. Example:
import pandas as pd
df = pd.DataFrame([{"Date":"02-01-1945","Name":"event2"},{"Date":"20-11-1934", "Name":"event1"}])
print(df)
#Output:
#
#         Date    Name
#0  02-01-1945  event2
#1  20-11-1934  event1
# The strings are day-month-year, so state the format explicitly;
# otherwise the two values may be parsed inconsistently or raise an error.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
df = df.sort_values(by="Date")
print(df)
#Output:
#        Date    Name
#1 1934-11-20  event1
#0 1945-01-02  event2
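The example above sorts whole rows by a single date column. For the table in the question, where each row holds several (date, event) pairs, one possible approach is to sort the pairs inside each row instead. The following is only a minimal sketch under some assumptions: the columns alternate date, event, date, event; the helper name sort_pairs is made up for illustration; only the first two rows of the question's table are used; and dateutil is used to parse the mixed spellings such as "Jan. 1, 2020".

```python
import pandas as pd
from dateutil import parser

# First two rows of the table from the question.
df = pd.DataFrame({
    "date1": ["March 6, 2021", "Dec. 6, 2021"],
    "event1": ["eventC", "eventBB"],
    "date2": ["Jan. 1, 2020", "Feb. 11, 2001"],
    "event2": ["eventX", "eventYY"],
    "date3": ["May 11, 2020", "June 13, 1990"],
    "event3": ["eventB", "eventSS"],
})


def sort_pairs(row):
    """Hypothetical helper: reorder the (date, event) pairs of one row by date."""
    values = row.tolist()
    # Assumes columns alternate: date, event, date, event, ...
    pairs = list(zip(values[0::2], values[1::2]))
    pairs.sort(key=lambda pair: parser.parse(pair[0]))
    # Flatten the sorted pairs back into a single row.
    return [item for pair in pairs for item in pair]


result = pd.DataFrame(
    [sort_pairs(row) for _, row in df.iterrows()],
    columns=df.columns,
    index=df.index,
)
print(result)
```

The original date strings are kept as they are, so the output stays in the same table format. On a much larger table a row-by-row Python loop like this may be slow, so treat it as a starting point.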
I would like to print out "Last Thurs" if the date is the last Thursday of the month and "First Thurs" if the date is the first Thursday of the current month.
For example:
Date: May 26, 2022
Output: Last Thurs
Date: May 5, 2022
Output: First Thurs
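One way this could be checked, as a rough sketch using only the standard library: a Thursday is the first of its month if its day number is 7 or less, and the last if adding a week lands in the next month. The function name thursday_label and the choice to return None for every other date are assumptions made for illustration.

```python
from datetime import date, timedelta


def thursday_label(d):
    """Hypothetical helper: label the first and last Thursday of a month."""
    if d.weekday() != 3:  # Monday is 0, so Thursday is 3
        return None
    if d.day <= 7:
        return "First Thurs"
    if (d + timedelta(days=7)).month != d.month:
        return "Last Thurs"
    return None


# May 5 and May 26 are the first and last Thursdays of May 2022.
print(thursday_label(date(2022, 5, 5)))
print(thursday_label(date(2022, 5, 26)))
```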
I have a pandas date column in the following format
Date
0 March 13, 2020, March 13, 2020
1 3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021
2 NaN
3 May 20, 2022, May 21, 2022
I tried to convert the dates to a single format and store the result in a new column.
import pandas as pd
import dateutil.parser
# initialise data of lists.
data = {'Date':['March 13, 2020, March 13, 2020', '3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021', 'NaN','May 20, 2022, May 21, 2022']}
# Create DataFrame
df = pd.DataFrame(data)
df["FormattedDate"] = df.Date.apply(lambda x: dateutil.parser.parse(x.strftime("%Y-%m-%d") ))
But I am getting an error:
AttributeError: 'str' object has no attribute 'strftime'
Desired Output
   Date                                                DateFormatted
0  March 13, 2020, March 13, 2020                      2020-03-13, 2020-03-13
1  3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021   2021-03-09, 2021-03-09, 2021-03-09, 2021-09-03
2  NaN                                                 NaN
3  May 20, 2022, May 21, 2022                          2022-05-20, 2022-05-21
I was the author of the previous solution. Because , is used both as the separator and inside the date strings, one possible solution is to extract the dates with Series.str.extractall, convert them to datetimes, and finally aggregate them with join:
# Raw strings keep the regex backslashes from being read as Python string escapes
format_list = [r"[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{2,4}",
r"[0-9]{1,2}(?:\.)(?:\s)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\s)?[0-9]{2,4}",
r"(?:(?:(?:j|J)an)|(?:(?:f|F)eb)|(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)|(?:(?:n|N)ov)|(?:(?:d|D)ec))\w*(?:\s)?(?:\n)?[0-9]{1,2}(?:\s)?(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}",
r"[0-9]{1,2}(?:\.)?(?:\s)?(?:\n)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}"]
import numpy as np
import pandas as pd
import dateutil.parser
# initialise data of lists.
data = {'Name':['Today is 09 September 2021', np.nan, '25 December 2021 is christmas', '01/01/2022 is newyear and will be holiday on 02.01.2022 also']}
# Create DataFrame
df = pd.DataFrame(data)
# Parse one extracted date string and format it as YYYY-MM-DD
f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
df['DateFormatted'] = df['Name'].str.extractall(f'({"|".join(format_list)})')[0].apply(f).groupby(level=0).agg(','.join)
print (df)
   Name                                                DateFormatted
0  Today is 09 September 2021                          2021-09-09
1  NaN                                                 NaN
2  25 December 2021 is christmas                       2021-12-25
3  01/01/2022 is newyear and will be holiday on 0...   2022-01-01,2022-02-01
Another alternative is to build lists of matches with Series.str.findall, remove the missing values, and process each list in a generator expression with join:
import dateutil.parser
f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
df['Date'] = df['Name'].str.findall("|".join(format_list)).dropna().apply(lambda y: ','.join(f(x) for x in y))
print (df)
   Name                                                Date
0  Today is 09 September 2021                          2021-09-09
1  NaN                                                 NaN
2  25 December 2021 is christmas                       2021-12-25
3  01/01/2022 is newyear and will be holiday on 0...   2022-01-01,2022-02-01
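As for the AttributeError in the question: strftime is a method of datetime objects, not of strings, so the string has to be parsed first and formatted second. A minimal sketch of that order of calls, using a single date string taken from the question (the variable names are only illustrative):

```python
import dateutil.parser

raw = "March 13, 2020"

# Parse the string into a datetime first ...
parsed = dateutil.parser.parse(raw)
# ... and only then format the datetime back into a string.
formatted = parsed.strftime("%Y-%m-%d")
print(formatted)
```

This is the same order used by the lambda f in the answer above.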
I have the following data that I am trying to plot.
month year total_sales
May 2020 7
June 2020 2
July 2020 1
August 2020 2
September 2020 22
October 2020 11
November 2020 6
December 2020 3
January 2019 3
feburary 2019 11
March 2019 65
April 2019 22
May 2019 33
June 2019 88
July 2019 44
August 2019 12
September 2019 32
October 2019 54
November 2019 76
December 2019 23
January 2018 12
feburary 2018 32
March 2018 234
April 2018 2432
May 2018 432
Here is the code I am using to do it:
import matplotlib.pyplot as plt

def plot_timeline_data(df):
    fig, ax = plt.subplots()
    ax.set_xticklabels(df['month'].unique(), rotation=90)
    # One dashed line per year
    for name, group in df.groupby('year'):
        ax.plot(group['month'], group['total_sales'], label=name, linestyle='--', marker='o')
    ax.legend()
    plt.tight_layout()
    plt.show()
I want the x labels to run from January to December, but my graph starts with May to December and then resumes from January to April, as shown in the figure (the exact values in the graph are different because I changed the values). How can I put the labels in the correct order?
You can use the following method. The idea is to sort the month column as shown in this and this post
# Capitalize the month names
df["month"] = df["month"].str.capitalize()
# Correct the spelling of February
df['month'] = df['month'].str.replace('Feburary','February')
# Convert to datetime object for sorting
df['month_in'] = pd.DatetimeIndex(pd.to_datetime(df['month'], format='%B')).month
# Sort using the index
df = df.set_index('month_in').sort_index()
plot_timeline_data(df)
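A variation on the same idea, sketched here under a few assumptions: the month order is taken from calendar.month_name (which gives English names only in an English or default locale), the sample rows are a small subset of the question's data, and the rows are sorted by year and then by month using an ordered categorical rather than a helper column.

```python
import calendar
import pandas as pd

# A few rows from the question, including the misspelled "feburary".
df = pd.DataFrame({
    "month": ["May", "June", "January", "feburary", "March"],
    "year": [2020, 2020, 2019, 2019, 2019],
    "total_sales": [7, 2, 3, 11, 65],
})

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# Normalise capitalisation and spelling before converting.
df["month"] = df["month"].str.capitalize().replace({"Feburary": "February"})
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)

# Sorting an ordered categorical follows the category order, not the alphabet.
df = df.sort_values(["year", "month"])
print(df)
```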
DataFrame.plot makes the job a bit easier for you: it plots each column as a separate line and keeps the order you prescribe:
import matplotlib.pyplot as plt
# Convert the dataframe to series of years
df = df.set_index(["month","year"])["total_sales"].unstack()
# Sort the index (which is month)
df = df.loc[[
"January","feburary","March","April","May","June",
"July", "August", "September","October", "November", "December"
]]
# Plot!
df.plot(marker="o", linestyle="--", rot=90)
# Show all ticks
plt.xticks(range(12), df.index)
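I think you need to change the row order of the pandas dataframe so that it starts with January. Note that slicing the 'month' column and adding the two pieces would align them on their index labels rather than concatenate them, so rotate the rows with pd.concat instead.
Try adding:
df = pd.concat([df.iloc[8:], df.iloc[:8]])
before the for loop that plots the years.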
I have the following dataframe :
month price
0 April 102.478015
1 August 94.868053
2 December 97.278205
3 February 100.114510
4 January 99.419109
5 July 93.402928
6 June 96.114224
7 March 101.297762
8 May 102.905340
9 November 97.952169
10 October 95.606478
11 September 94.226803
I would like to have the months in calendar order (January in the first row through December in the 12th row). How could I do this?
If necessary, you can copy this dataframe and then execute
pd.read_clipboard(sep=r'\s\s+')
to load the dataframe into your Jupyter notebook.
Convert the values to an ordered categorical, which makes it possible to use DataFrame.sort_values:
cats = ['January','February','March','April','May','June',
'July','August','September','October','November','December']
df['month'] = pd.CategoricalIndex(df['month'], ordered=True, categories=cats)
#alternative
#df['month'] = pd.Categorical(df['month'], ordered=True, categories=cats)
df = df.sort_values('month')
print (df)
month price
4 January 99.419109
3 February 100.114510
7 March 101.297762
0 April 102.478015
8 May 102.905340
6 June 96.114224
5 July 93.402928
1 August 94.868053
11 September 94.226803
10 October 95.606478
9 November 97.952169
2 December 97.278205
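If you would rather leave the column as plain strings, another option is the key argument of sort_values (available in pandas 1.1 and later). This is only a sketch, using a few rows of the frame above and assuming every value is a correctly spelled full English month name:

```python
import pandas as pd

# A few rows from the question's dataframe.
df = pd.DataFrame({
    "month": ["April", "August", "December", "February", "January"],
    "price": [102.478015, 94.868053, 97.278205, 100.114510, 99.419109],
})

# The key function maps each month name to its month number for sorting only;
# the column itself keeps its original string values.
sorted_df = df.sort_values(
    "month",
    key=lambda s: pd.to_datetime(s, format="%B").dt.month,
)
print(sorted_df)
```

The categorical approach is probably the better fit when the ordering is needed repeatedly, for example in groupby or plotting, since the order is then stored with the column.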
I have a dataframe df1:
Month
1
3
March
April
2
4
5
I have another dataframe df2:
Month Name
1 January
2 February
3 March
4 April
5 May
If I want to replace the integer values of df1 with the corresponding name from df2, what kind of lookup function can I use?
I want to end up with this as my df1:
Month
January
March
March
April
February
April
May
Use replace:
df1.replace(dict(zip(df2.Month.astype(str),df2.Name)))
Out[76]:
Month
0 January
1 March
2 March
3 April
4 February
5 April
6 May
You can use pd.Series.map and then fillna. Just be careful to map either strings to strings or, as here, numeric to numeric:
month_name = df2.set_index('Month')['Name']
df1['Month'] = pd.to_numeric(df1['Month'], errors='coerce').map(month_name)\
.fillna(df1['Month'])
print(df1)
Month
0 January
1 March
2 March
3 April
4 February
5 April
6 May
You can also use pd.Series.replace, but this is often inefficient.
One alternative is to use map with a function:
def repl(x, lookup=dict(zip(df2.Month.astype(str), df2.Name))):
    # Return the month name if x is a known number, otherwise keep x unchanged
    return lookup.get(x, x)
df1['Month'] = df1['Month'].map(repl)
print(df1)
Output
      Month
0   January
1     March
2     March
3     April
4  February
5     April
6       May
Use map with a Series; you just need to make sure your dtypes match:
mapper = df2.set_index(df2['Month'].astype(str))['Name']
df1['Month'].map(mapper).fillna(df1['Month'])
Output:
0 January
1 March
2 March
3 April
4 February
5 April
6 May
Name: Month, dtype: object
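To illustrate why the dtypes matter, here is a small sketch. It assumes df1['Month'] holds strings (because the column mixes numbers and names) while df2['Month'] holds integers; the variable names unmatched and matched are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Month": ["1", "3", "March", "April", "2", "4", "5"]})
df2 = pd.DataFrame({
    "Month": [1, 2, 3, 4, 5],
    "Name": ["January", "February", "March", "April", "May"],
})

# Integer keys never match the string values, so every lookup gives NaN.
unmatched = df1["Month"].map(df2.set_index("Month")["Name"])

# Casting the keys to str makes the lookup work; fillna keeps the names
# that were already spelled out.
mapper = df2.set_index(df2["Month"].astype(str))["Name"]
matched = df1["Month"].map(mapper).fillna(df1["Month"])

print(unmatched.isna().all())
print(matched.tolist())
```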