How can Python determine the date of a provincial holiday

In Canada, some holidays are provincial, such as the Civic Holiday in Ontario. Given a Canadian province, how can I check whether a date is a holiday in that province? I tried the holidays module, but it seems to return holidays for Canada as a whole rather than for each province.

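You can set subdiv to get the holidays for a specific province. The resulting object behaves like a dict keyed by date, so membership tests and .get() work on it.
from datetime import date
import holidays
ca_on_holidays = holidays.country_holidays('CA', subdiv='ON')
# subdiv can be one of AB, BC, MB, NB, NL, NS, NT, NU, ON, PE, QC, SK, YT
# ON was the default when this was written; pass subdiv explicitly rather than relying on it
print(date(2023, 7, 1) in ca_on_holidays)  # True if that date is a holiday in Ontario
print(ca_on_holidays.get(date(2023, 7, 1)))  # the holiday's name, or None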

Related

How to extract holiday names from Pandas USFederalHolidayCalendar?

I would like to extract US holiday names (e.g. "Memorial Day") using USFederalHolidayCalendar from the pandas library. The code below only prints the dates of the US holidays in the given range. I don't necessarily need pandas for this if there is an easier way.
import datetime
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
y_str = datetime.datetime.now().strftime("%Y")
holidays = cal.holidays(start=y_str + '-01-01', end=y_str + '-12-31')
for h in holidays:
    print(h)
I know that cal.rules returns a list like the one below. Should I extract the names from it? It doesn't look like a list of strings, so what type are its elements?
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x0000012EF1B3B280>), Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>), Holiday: Presidents Day (month=2, da

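cal.rules gives a list of pandas.tseries.holiday.Holiday objects. (If you want each name paired with its date in a range instead, cal.holidays(start, end, return_name=True) returns a Series of names indexed by date.) Each Holiday object has a .name attribute (see source), so you can do the following: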
from pandas.tseries.holiday import USFederalHolidayCalendar
import datetime
cal = USFederalHolidayCalendar()
holidays = cal.rules
print([holiday.name for holiday in holidays])

Why does my markov chain produce identical sentences from corpus?

I am using the markovify Markov chain generator in Python. When I run its example code, it produces a lot of duplicate sentences and I don't know why.
The code is as follows:
import markovify
# Get raw text as string.
with open("testtekst.txt") as f:
    text = f.read()
# Build the model.
text_model = markovify.Text(text)
# Print twenty randomly generated sentences
for i in range(20):
    print(text_model.make_sentence())
This gives me output of:
Time included him on their list of the world's highest-paid athlete by ESPN from 2016 to 2019.
He assumed full captaincy of the world's most marketable and famous athletes, Ronaldo was named the best Portuguese player of all time by the Portuguese Football Federation.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
He is the first footballer and the FIFA Club World Cup at age 23, he won his first season.
One of the tournament.
He also led them to victory in the world in 2014.
In 2015, Ronaldo was ranked the world's most famous athlete by ESPN from 2016 to 2019.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
In 2015, Ronaldo was ranked the world's most famous athlete by ESPN from 2016 to 2019.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
He is the first footballer and the FIFA Club World Cup at age 23, he won his first international goal at Euro 2004, where he helped Portugal reach the final.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
One of the world's highest-paid athlete by ESPN from 2016 to 2019.
Time included him on their list of the national team in July 2008.
He is the first footballer and the FIFA Club World Cup at age 23, he won his first season.
Time included him on their list of the national team in July 2008.
He also led them to victory in the world in 2014.
The following year, he led Portugal to their first major tournament title at Euro 2004, where he helped Portugal reach the final.
One of the world's most marketable and famous athletes, Ronaldo was ranked the world's most famous athlete by Forbes in 2016 and 2017 and the FIFA Club World Cup at age 23, he won his first international goal at Euro 2016, and received the Silver Boot as top scorer of Euro 2020.
The testtekst.txt is in ANSI encoding and has the following corpus:
Born and raised in Madeira, Ronaldo began his senior club career
playing for Sporting CP, before signing with Manchester United in
2003, aged 18, winning the FA Cup in his first season. He would also
go onto win three consecutive Premier League titles, the Champions
League and the FIFA Club World Cup at age 23, he won his first Ballon
d'Or. Ronaldo was the subject of the then-most expensive association
football transfer when he signed for Real Madrid in 2009 in a transfer
worth €94 million (£80 million), where he won 15 trophies, including
two La Liga titles, two Copa del Rey and four Champions Leagues, and
became the club's all-time top goalscorer. He also finished runner-up
for the Ballon d'Or three times, behind Lionel Messi (his perceived
career rival), and won back-to-back Ballons d'Or in 2013 and 2014, and
again in 2016 and 2017. In 2018, he signed for Juventus in a transfer
worth an initial €100 million (£88 million), the most expensive
transfer for an Italian club and the most expensive transfer for a
player over 30 years old. He won two Serie A titles, two Supercoppe
Italiana and a Coppa Italia, before returning to Manchester United in
2021. Ronaldo made his senior international debut for Portugal in 2003 at the age of 18 and has since earned over 180 caps, making him
Portugal's most-capped player. With more than 100 goals at
international level, he is also the nation's all-time top goalscorer.
He has played in and scored at 11 major tournaments, he scored his
first international goal at Euro 2004, where he helped Portugal reach
the final. He assumed full captaincy of the national team in July
2008. In 2015, Ronaldo was named the best Portuguese player of all time by the Portuguese Football Federation. The following year, he led
Portugal to their first major tournament title at Euro 2016, and
received the Silver Boot as the second-highest goalscorer of the
tournament. He also led them to victory in the inaugural UEFA Nations
League in 2019, and later received the Golden Boot as top scorer of
Euro 2020. One of the world's most marketable and famous athletes,
Ronaldo was ranked the world's highest-paid athlete by Forbes in 2016
and 2017 and the world's most famous athlete by ESPN from 2016 to
2019. Time included him on their list of the 100 most influential people in the world in 2014. He is the first footballer and the third
sportsman to earn US $1 billion in his career.
As you can see in the output, several identical sentences are printed and I have no idea why. The default state size should be 2.

The answer is that my state size was too big: after setting it to 1 (markovify.Text(text, state_size=1)), it produced unique sentences. I also did not know that markovify always starts a generated sentence with the opening words of a sentence in the corpus.

That's right, markovify always starts generating new sentences with the first words of sentences in the corpus, and your state size was too big. You answered it yourself. Still, you got a good result; I have carefully read the text on Ronaldo. Well done.
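
To see why a larger state size gives repeats on a small corpus, here is a minimal standard-library sketch. It is not markovify's implementation: build_chain, possible_sentences, the START/END markers, and the three-sentence CORPUS are all made up for illustration. It enumerates every sentence a word-level chain can emit, so you can compare state sizes directly.

```python
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers

# Tiny illustrative corpus (not the Ronaldo text from the question).
CORPUS = [
    "he won the cup in 2003",
    "he won the league in 2008",
    "he led the team in 2016",
]


def build_chain(sentences, state_size):
    """Map each tuple of `state_size` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return chain


def possible_sentences(chain, state_size, max_words=20):
    """Enumerate every sentence the chain can emit, up to max_words words."""
    results = set()
    stack = [((START,) * state_size, [])]
    while stack:
        state, words = stack.pop()
        for nxt in set(chain[state]):
            if nxt == END:
                results.add(" ".join(words))
            elif len(words) < max_words:
                # Slide the state window forward by one word.
                stack.append((state[1:] + (nxt,), words + [nxt]))
    return results


for size in (1, 2):
    chain = build_chain(CORPUS, size)
    print(size, len(possible_sentences(chain, size)))
```

With state size 2, almost every pair of words in a corpus this small has only one continuation, so the chain can only replay a handful of paths. With state size 1 there are more places where the path can branch, which is why lowering state_size gave more varied output.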

Python public holidays that fall on a weekday

I've managed to find two (fairly comprehensive) holiday packages which are:
Python | Holidays library
workalendar (7.1.0)
End Goal:
I would like to get an int, using either of these two packages, for the number of public holidays that fall on a weekday. The country is Australia and the state is Western Australia (WA); both packages can accommodate the states and territories of Australia.
MWE:
import numpy as np
start_date = "2019-01-01"
end_date = "2020-01-01"
total_working_days = np.busday_count(start_date, end_date)
# total_public_holidays_that_fall_on_working_days =
# use the holidays package to get the number of public holidays
# that fall on working days (Mon-Fri) between *start_date* and
# *end_date*.
actual_working_days = total_working_days - total_public_holidays_that_fall_on_working_days
Question:
I want to get an int for the actual number of working days (working days - public holidays that fall on weekdays), how can I do this with either of the above libraries?

From the documentation of workalendar, adapted to your requirements:
from datetime import datetime
from workalendar.oceania.australia import WesternAustralia
start_date = "2019-01-01"
end_date = "2020-01-01"
start_datetime = datetime.strptime(start_date, '%Y-%m-%d')
end_datetime = datetime.strptime(end_date, '%Y-%m-%d')
cal = WesternAustralia()
print(cal.get_working_days_delta(start_datetime, end_datetime))
Output:
252
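
If you also want the intermediate number from the question (public holidays that fall on Monday to Friday), here is a standard-library sketch. It assumes you already have the holiday dates as datetime.date objects, for example taken from one of the two packages. The function names are hypothetical, and the two dates in sample are only a partial stand-in for the real WA list. The end date is excluded, matching np.busday_count.

```python
from datetime import date, timedelta


def weekday_holidays(holiday_dates, start, end):
    """Holidays in [start, end) that fall on Monday to Friday."""
    return sorted(
        d for d in set(holiday_dates)
        if start <= d < end and d.weekday() < 5
    )


def working_days(start, end, holiday_dates):
    """Mon-Fri days in [start, end), minus holidays that fall on those days."""
    weekdays = sum(
        1
        for offset in range((end - start).days)
        if (start + timedelta(days=offset)).weekday() < 5
    )
    return weekdays - len(weekday_holidays(holiday_dates, start, end))


# Partial sample only: New Year's Day (a Tuesday) and 26 January (a Saturday).
sample = [date(2019, 1, 1), date(2019, 1, 26)]
start, end = date(2019, 1, 1), date(2020, 1, 1)
print(weekday_holidays(sample, start, end))
print(working_days(start, end, sample))
```

Only the first sample date is subtracted, because 26 January 2019 fell on a weekend.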

Identify the first sentence from a paragraph using python

I want to get the first sentence from a paragraph using Python.
The paragraph is below:
ECONOMYNEXT -Sri Lanka rupee closed steady at 176.40/50 rupees to the US dollar on Friday and gilt yields edged higher on profit taking in the secondary market even as the Central Bank cut policy rates to revive credit demand, while stocks ended 0.26 percent lower, market participants said.
The rupee ended at 176.40/50 rupees against the greenback in the spot market on Thursday.
The code I wrote below only extracts the text up to the first decimal point (the "." in 176.40). Thanks for any help.
import requests
#from pprint import pprint
from IPython.display import HTML
import json
txt = ''' ECONOMYNEXT -Sri Lanka rupee closed steady at 176.40/50 rupees to the US dollar on Friday and gilt yields edged higher on profit taking in the secondary market even as the Central Bank cut policy rates to revive credit demand, while stocks ended 0.26 percent lower, market participants said.
The rupee ended at 176.40/50 rupees against the greenback in the spot market on Thursday. '''
if len(txt) > 100:
    txt = txt.partition('.')[0] + '.'
print(txt)

Try splitting on '. ' (a period followed by a space) and on '.\n' instead of on a bare '.'.

You can try this:
txt = (
    " ECONOMYNEXT -Sri Lanka rupee closed steady at 176.40/50 rupees to the US "
    "dollar on Friday and gilt yields edged higher on profit taking in the "
    "secondary market even as the Central Bank cut policy rates to revive credit "
    "demand, while stocks ended 0.26 percent lower, market participants said. The "
    "rupee ended at 176.40/50 rupees against the greenback in the spot market on "
    "Thursday. "
)
# '. ' (period plus space) skips the periods inside numbers such as 176.40
sentence_index = txt.find('. ')
print(txt[0: sentence_index])
This produces the following output:
ECONOMYNEXT -Sri Lanka rupee closed steady at 176.40/50 rupees to the US dollar on Friday and gilt yields edged higher on profit taking in the secondary market even as the Central Bank cut policy rates to revive credit demand, while stocks ended 0.26 percent lower, market participants said
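
Both suggestions above can be combined with a regular expression that treats ".", "!" or "?" as a sentence end only when whitespace or the end of the text follows. This is a rough sketch, not a real sentence tokenizer: first_sentence is a hypothetical helper, the sample text is a shortened version of the paragraph in the question, and abbreviations such as "U.S. dollar" would still cut the sentence too early.

```python
import re

# A terminator counts only if whitespace or end-of-text follows it,
# so the periods inside 176.40 and 0.26 are ignored.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def first_sentence(text):
    """Return the text up to and including the first sentence terminator."""
    text = text.strip()
    match = _SENTENCE_END.search(text)
    return text if match is None else text[:match.end()]


txt = (
    "ECONOMYNEXT -Sri Lanka rupee closed steady at 176.40/50 rupees to the US dollar, "
    "while stocks ended 0.26 percent lower, market participants said.\n"
    "The rupee ended at 176.40/50 rupees against the greenback on Thursday. "
)
print(first_sentence(txt))
```

Unlike the find('. ') version, this keeps the closing period and also handles a sentence that ends at a newline.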

Add to list if string not present in cell value, break and start new list if present?

I am trying to iterate over a column in Excel and check if a string is present. If the string is present, I want to reset the list to [] and repeat the process. Spent too many hours on this and I can't seem to figure out what I am doing wrong.
Example data:
Open Ended Schemes(Balanced)
Aditya Birla Sun Life Mutual Fund
120518 Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Dividend
120517 Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Growth
Open Ended Schemes(Debt Scheme - Banking and PSU Fund)
Axis Mutual Fund
128953 Axis Banking & PSU Debt Fund - Bonus Option
117447 Axis Banking & PSU Debt Fund - Daily Dividend Option
Code:
from openpyxl import load_workbook
import os
wb = load_workbook('m.xlsx')
ws = wb.active
keys = ['1', '2']
m_dict = {}
scheme_codes = []
for g in groups[0:2]:
    for row in ws.iter_rows('A{}:A{}'.format(ws.min_row +1, ws.max_row)):
        # scheme_codes = []
        for cell in row:
            if cell.value != None:
                if 'Schemes' in cell.value:
                    print('Found Schemes' + str(cell.value))
                    scheme_codes=[]
                    break
                else:
                    scheme = cell.value
                    scheme_codes.append(scheme)
    m_dict[g] = scheme_codes
I only get 1 item per scheme, or else the loop just runs all the way through the rows, whichever way I try it. The file has 18000 rows.
Expected output:
{1: [all items before the first occurrence of 'Schemes' in column A], 2: [all items before the second occurrence of 'Schemes' in column A]}
Right now when I run the code, I get a len(scheme_codes) = 8069 which is wrong as far as I can see. The first list should be near 80 items.

This is not exactly what you're asking for; it actually provides some additional information.
It gives you a dict of dicts holding a set of tuples of scheme code and scheme name, like:
{scheme: {sub_scheme: {(code, name), (code, name), ...}}}
If you really only need the top-level scheme and its codes, you should be able to simplify it:
just remove one level of defaultdict and use scheme_codes[scheme].add(cell.value) instead.
from openpyxl import load_workbook
import os
from collections import defaultdict
wb = load_workbook("mfcodes.xlsx")
ws = wb.active
scheme_codes = defaultdict(lambda: defaultdict(set))
scheme = 'N/A'
sub_scheme = 'N/A'
for row in ws[f'A{ws.min_row}:B{ws.max_row}']:
    cell = row[0]
    if not cell.value:
        continue
    if 'Schemes' in cell.value:
        scheme = cell.value
    else:
        if not cell.value.isdigit():
            sub_scheme = cell.value
        else:
            scheme_codes[scheme][sub_scheme].add((cell.value, row[1].value))
print(repr(next(iter(scheme_codes.items()))))
Output:
{'Open Ended Schemes(Balanced)' :
{'Aditya Birla Sun Life Mutual Fund': {('120518', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Dividend"),
('120517', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Direct Plan-Growth"),
('103154', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Regular Plan-Dividend"),
('103155', "Aditya Birla Sun Life Equity Hybrid'95 Fund - Regular Plan-Growth"),
('131671', 'Aditya Birla Sun Life Balanced Advantage Fund - Direct Plan - Dividend Option'),
('131670', 'Aditya Birla Sun Life Balanced Advantage Fund - Direct Plan - Growth Option'),
('131665', 'Aditya Birla Sun Life Balanced Advantage Fund - Regular Plan - Dividend Option'),
('131666', 'Aditya Birla Sun Life Balanced Advantage Fund - Regular Plan - Growth Option')},
'Baroda Pioneer Mutual Fund': {('125112', 'Baroda Pioneer Balance Fund - Plan A - Bonus Option'),
('101913', 'BARODA PIONEER BALANCE FUND - Plan A - Dividend Option'),
('101912', 'BARODA PIONEER BALANCE FUND - Plan A - Growth Option'),
('119325', 'BARODA PIONEER BALANCE FUND - Plan B (Direct) - Dividend Option'),
('119326', 'BARODA PIONEER BALANCE FUND - Plan B (Direct) - Growth Option')},
# et cetera ...
}
}
By the way: The first scheme has 67 codes...
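
If you only want what the question asked for (one plain list per 'Schemes' heading), the grouping can be sketched without openpyxl on a plain list of cell values. group_by_marker is a hypothetical helper, and column_a is a made-up stand-in for column A, assuming the codes sit in column A as in the answer above. With a real workbook you could feed it something like (row[0] for row in ws.iter_rows(min_col=1, max_col=1, values_only=True)).

```python
def group_by_marker(values, marker="Schemes"):
    """Start a new list whenever a value contains `marker`; collect later values under it."""
    groups = {}
    current = None
    for value in values:
        if value is None:
            continue  # skip empty cells
        text = str(value)
        if marker in text:
            # A heading row: switch to (or create) the list for this heading.
            current = groups.setdefault(text, [])
        elif current is not None:
            current.append(text)
    return groups


# Stand-in for column A, modelled on the example data in the question.
column_a = [
    "Open Ended Schemes(Balanced)",
    "Aditya Birla Sun Life Mutual Fund",
    "120518",
    "120517",
    None,
    "Open Ended Schemes(Debt Scheme - Banking and PSU Fund)",
    "Axis Mutual Fund",
    "128953",
    "117447",
]
groups = group_by_marker(column_a)
for heading, items in groups.items():
    print(heading, len(items))
```

The point is that the list is switched when a heading row is seen, instead of being reset and then overwritten at the end of the loop, which is why the original code ended up with one long list.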
