Python - Basic Leap year function problem (Novice) - python

My problem is that I have a list of years from 2020 to 2030, and my program says every year is a leap year.
My variables and my function are:
yearList = [2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030]
monthList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
daysOfMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def create_calendar(yearList: list, monthList: list, daysOfMonth: list, numbOfShootingStars: int):
    calendar = {}
    for year in yearList:
        # Create year
        calendar[year] = {}
        for month in monthList:
            # If the list doesn't have 12 months, it won't loop through all months
            if month == 12 and month + 1 == 1: break;
            else:
                # Create monthly data
                calendar[year][month] = {}
                # In a leap year, February has 29 days instead of 28
                if month == 2 and year % 4 == 0:
                    daysOfMonth[month - 1] = 29
                # Create the days and daily data
                for day in range(1, daysOfMonth[monthList.index(month)] + 1):
                    calendar[year][month][day] = numbOfShootingStars
    return calendar
Thank you for your help!
One more question: is it possible to use a list like this with this method?
monthList = [
{1, 'January'},
{2, 'February'},
{3, 'March'},
{4, 'April'},
{5, 'May'},
{6, 'June'},
{7, 'July'},
{8, 'August'},
{9, 'September'},
{10, 'October'},
{11, 'November'},
{12, 'December'}]
If so, how should I modify my code? I couldn't get it to work :(

Okay, I think I solved it. The problem was that I changed the February value in the daysOfMonth list to 29, and then it stayed like that for every later year. With this if/else I change it back to its original value when the year is not a leap year.
# In a leap year, February has 29 days instead of 28
if month == 2 and year % 4 == 0:
    daysOfMonth[month - 1] = 29
else:
    daysOfMonth[month - 1] = 28
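Another way to avoid the problem is to never write into the shared list at all and to work out February's length per year instead. The sketch below also stores the months as (number, name, days) tuples, which is one way to carry the month names along; note that `{1, 'January'}` in the question is a set, not a pair, so it has no order to index into. The names `MONTHS`, `create_calendar` and `shooting_stars` are illustrative, and `calendar.isleap` from the standard library is swapped in for the `year % 4` test.

```python
from calendar import isleap

# (month number, month name, days in a non-leap year); illustrative layout
MONTHS = [
    (1, "January", 31), (2, "February", 28), (3, "March", 31),
    (4, "April", 30), (5, "May", 31), (6, "June", 30),
    (7, "July", 31), (8, "August", 31), (9, "September", 30),
    (10, "October", 31), (11, "November", 30), (12, "December", 31),
]

def create_calendar(years, months, shooting_stars):
    result = {}
    for year in years:
        result[year] = {}
        for number, name, days in months:
            # Adjust the local variable only; the MONTHS table is never modified.
            # `name` is unused here but is available for display purposes.
            if number == 2 and isleap(year):
                days = 29
            result[year][number] = {day: shooting_stars for day in range(1, days + 1)}
    return result

cal = create_calendar(range(2020, 2031), MONTHS, 0)
print(len(cal[2020][2]), len(cal[2021][2]), len(cal[2024][2]))
```

Because nothing is mutated, the order in which years are processed no longer matters.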

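If you take a year to be 365.2425 days, this is 365 + 1/4 - 1/100 + 1/400. This explains why, in the long run, you stay close to the correct value by adding a whole day every fourth year, except on century years, unless the century year is divisible by 400.
Written as a test, year Y is a leap year if Y % 4 == 0 and (Y % 100 != 0 or Y % 400 == 0).
The same long-run average can be reproduced with:
Floor(365.2425 Y) - Floor(365.2425 (Y-1)) - 365
Note that this expression yields 1 in the right proportion of years (97 out of every 400), but not in exactly the years the Gregorian calendar designates: for Y = 4 it gives 0 and for Y = 5 it gives 1. Use the divisibility rule when you need the actual calendar.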
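To make the relationship between the averaging argument and the calendar rule concrete, here is a small sketch that compares the floor expression above with the usual Gregorian divisibility test. The function names are illustrative, and integer arithmetic (3652425 / 10000) stands in for 365.2425 to sidestep floating-point rounding; that is an implementation choice, not part of the answer.

```python
def is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def floor_formula(year: int) -> int:
    """Floor(365.2425 Y) - Floor(365.2425 (Y-1)) - 365, in exact integer arithmetic."""
    return (3652425 * year) // 10000 - (3652425 * (year - 1)) // 10000 - 365

# Both give 97 "long" years per 400-year cycle...
rule_count = sum(is_leap(y) for y in range(1, 401))
formula_count = sum(floor_formula(y) for y in range(1, 401))
print(rule_count, formula_count)

# ...but they do not always pick the same years.
for y in (4, 5, 2020, 2021):
    print(y, is_leap(y), floor_formula(y))
```

So the floor expression is useful for seeing why the average works out, while the divisibility test is the one to use for real dates.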


Get n * k unique sets of 2 from list of length n in Python

I have the following Python brainteaser: We arrange a 30-day programme with 48 participants. Every day in the programme, participants are paired in twos. Participants cannot have the same partners twice and all participants have to be partnered up every day. P.S. I hope my math is right in the title.
I've managed an implementation but it feels very clunky. Is there an efficient way to do this? Perhaps using the cartesian product somehow? All feedback and tips are much appreciated.
# list of people: 48
# list of days: 30
# each day, the people need to be split into pairs of two.
# the same pair cannot occur twice
import random
from collections import Counter
class person ():
    def __init__ (self, id):
        self.id = id
class schedule ():
    def __init__ (self, days):
        self.people_list = []
        self.days = days
        self.placed_people = []
        self.sets = []
    def create_people_list(self, rangex):
        for id in range(rangex):
            new_person = person(id)
            self.people_list.append(new_person)
        print(f"{len(self.people_list)} people and {self.days} days will be considered.")
    def assign_pairs(self):
        for day in range(self.days): # for each of the 30 days..
            print("-" * 80)
            print(f"DAY {day + 1}")
            self.placed_people = [] # we set a new list to contain ids of placed people
            while Counter([pers.id for pers in self.people_list]) != Counter(self.placed_people):
                pool = list( set([pers.id for pers in self.people_list]) - set(self.placed_people))
                # print(pool)
                person_id = random.choice(pool) # pick random person
                person2_id = random.choice(pool) # pick random person
                if person_id == person2_id: continue
                if not set([person_id, person2_id]) in self.sets or len(pool) == 2:
                    if len(pool) == 2: person_id, person2_id = pool[0], pool[1]
                    self.sets.append(set([person_id, person2_id]) )
                    self.placed_people.append(person_id)
                    self.placed_people.append(person2_id)
                    print(f"{person_id} {person2_id}, ", end="")
schdl = schedule(30) # initiate schedule with 30 days
schdl.create_people_list(48)
schdl.assign_pairs()
Outputs:
48 people and 30 days will be considered.
--------------------------------------------------------------------------------
DAY 1
37 40, 34 4, 1 46, 13 39, 12 35, 18 33, 25 24, 23 31, 17 42, 32 19, 36 0, 11 9, 7 45, 10 21, 44 43, 29 41, 38 16, 15 22, 2 20, 26 47, 30 28, 3 8, 6 27, 5 14,
--------------------------------------------------------------------------------
DAY 2
42 28, 25 15, 6 17, 2 14, 7 40, 11 4, 22 37, 33 20, 0 16, 3 39, 19 47, 46 24, 12 27, 26 1, 34 10, 45 8, 23 13, 32 41, 9 29, 44 31, 30 5, 38 18, 43 21, 35 36,
--------------------------------------------------------------------------------
DAY 3
8 28, 33 12, 40 26, 5 35, 13 31, 29 43, 44 21, 11 30, 1 7, 34 2, 47 45, 46 17, 4 23, 32 15, 14 22, 36 42, 16 41, 37 19, 38 3, 20 6, 10 0, 24 9, 27 25, 18 39,
--------------------------------------------------------------------------------
[...]
--------------------------------------------------------------------------------
DAY 29
4 18, 38 28, 24 22, 23 33, 9 41, 40 20, 26 39, 2 42, 15 10, 12 21, 11 45, 46 7, 35 27, 29 36, 3 31, 19 6, 47 32, 25 43, 13 44, 1 37, 14 0, 16 17, 30 34, 8 5,
--------------------------------------------------------------------------------
DAY 30
17 31, 25 7, 6 10, 35 9, 41 4, 16 40, 47 43, 39 36, 19 44, 23 11, 13 29, 21 46, 32 34, 12 5, 26 14, 15 0, 28 24, 2 37, 8 22, 27 38, 45 18, 3 20, 1 33, 42 30,
Thanks for your time! Also, a follow up question: How can I calculate whether it is possible to solve the task, i.e. to arrange all the participants in unique pairs every day?
Round-robin tournaments in real life
Round-robin tournaments are extremely easy to organize. In fact, the algorithm is so simple that you can organize a round-robin tournament between humans without any paper or computer, just by giving the humans simple instructions.
You have an even number N = 48 humans to pair up. Imagine you have a long table with N // 2 seats on one side, facing N // 2 seats on the other side. Ask all the humans to sit at that table.
This is your first pairing.
Call one of the seats "seat number 1".
To move to the next pairing: the human at seat number 1 doesn't move. Every other human moves clockwise one seat around the table.
Current pairing
1 2 3 4
8 7 6 5
Next pairing
1 8 2 3
7 6 5 4
Round-robin tournaments in Python
# a table is a simple list of humans
def next_table(table):
    return [table[0]] + [table[-1]] + table[1:-1]
    # [0 1 2 3 4 5 6 7] -> [0 7 1 2 3 4 5 6]
# a pairing is a list of pairs of humans
def pairing_from_table(table):
    return list(zip(table[:len(table)//2], table[-1:len(table)//2-1:-1]))
    # [0 1 2 3 4 5 6 7] -> [(0,7), (1,6), (2,5), (3,4)]
# a human is an int
def get_programme(programme_length, number_participants):
    table = list(range(number_participants))
    pairing_list = []
    for day in range(programme_length):
        pairing_list.append(pairing_from_table(table))
        table = next_table(table)
    return pairing_list
print(get_programme(3, 8))
# [[(0, 7), (1, 6), (2, 5), (3, 4)],
#  [(0, 6), (7, 5), (1, 4), (2, 3)],
#  [(0, 5), (6, 4), (7, 3), (1, 2)]]
print(get_programme(30, 48))
If you want the humans to be custom objects instead of ints, you can replace the second argument number_participants by the list table directly; then the user can supply a list of whatever they want:
def get_programme(programme_length, table):
    pairing_list = []
    for day in range(programme_length):
        pairing_list.append(pairing_from_table(table))
        table = next_table(table)
    return pairing_list
print(get_programme(3, ['Alice', 'Boubakar', 'Chen', 'Damian']))
# [[('Alice', 'Damian'), ('Boubakar', 'Chen')],
#  [('Alice', 'Chen'), ('Damian', 'Boubakar')],
#  [('Alice', 'Boubakar'), ('Chen', 'Damian')]]
Follow-up question: when does there exist a solution?
If there are N humans, each human can be paired with N-1 different humans. If N is even, then the round-robin circle-method will make sure that the first N-1 rounds are correct. After that, the algorithm is periodic: the Nth round will be identical to the first round.
Thus there is a solution if and only if programme_length < number_participants and the number of participants is even; and the round-robin algorithm will find a solution in that case.
If the number of participants is odd, then every day of the programme, there must be at least one human who is not paired. The round-robin tournament can still be applied in this case: add one extra "dummy" human (usually called bye-player). The dummy human behaves exactly like a normal human for the purposes of the algorithm. Every round, one different real human will be paired with the dummy human, meaning they are not paired with a real human this round. With this method, all you need is programme_length <= number_participants.
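The existence conditions above can be checked mechanically. The following sketch wraps the same circle method in one hypothetical function, `round_robin` (the name, the `None` bye marker and the `ValueError` are assumptions of this example), and then verifies that no pair repeats over 30 days with 48 participants.

```python
def round_robin(participants, rounds):
    """Return `rounds` pairings in which no pair occurs twice (circle method)."""
    table = list(participants)
    if len(table) % 2:
        table.append(None)          # dummy "bye" player for an odd head count
    if rounds > len(table) - 1:
        raise ValueError("not enough distinct partners for that many rounds")
    half = len(table) // 2
    programme = []
    for _ in range(rounds):
        # Seat i on one side faces seat i on the other side of the table
        programme.append(list(zip(table[:half], reversed(table[half:]))))
        # The first seat stays put, everyone else rotates by one seat
        table = [table[0], table[-1]] + table[1:-1]
    return programme

programme = round_robin(range(48), 30)
seen = set()
for day in programme:
    for a, b in day:
        pair = frozenset((a, b))
        assert pair not in seen, f"pair {a}-{b} repeated"
        seen.add(pair)
print(len(seen))  # 30 days * 24 pairs per day

odd_programme = round_robin(range(5), 5)
byes = [a if b is None else b for day in odd_programme for a, b in day if None in (a, b)]
print(sorted(byes))
```

With 5 participants each person sits out exactly once over 5 rounds, which matches the programme_length <= number_participants condition for odd counts.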

problem with if else loop for a month and days problem

Why do I get the wrong output for the code below? I've put the output underneath.
(Some might suggest using the datetime module; I'm going with this method due to some complications with the main program.)
months = [1,2,3,4,5,6,7,8,9,10,11,12]
for month in months:
    if month == {1,3,5,7,9,11}:
        days= 31
        print(days)
    elif month == {4,6,8,10,12}:
        days = 30
        print(days)
    else :
        days = 28
        print(days)
I get this output
28
28
28
28
28
28
28
28
28
28
28
28
Question approach
You are checking whether an integer is equal to a set. You want to check whether the integer is in the set. By the way, the sets you use are wrong (fixed here), and February may have 29 days (not fixed in this solution).
for month in range(1, 13):
    if month in {1, 3, 5, 7, 8, 10, 12}:
        days = 31
    elif month in {4, 6, 9, 11}:
        days = 30
    else:
        days = 28
    print(f"{month:2}: {days}")
1: 31
2: 28
3: 31
4: 30
5: 31
6: 30
7: 31
8: 31
9: 30
10: 31
11: 30
12: 31
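Since the solution above leaves February fixed at 28 days, here is a minimal library-free sketch that also takes the year into account. The helper names `is_leap` and `days_in_month` are illustrative and not part of the answer.

```python
def is_leap(year):
    # Gregorian rule: divisible by 4, except centuries not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_month(year, month):
    if month in {1, 3, 5, 7, 8, 10, 12}:
        return 31
    if month in {4, 6, 9, 11}:
        return 30
    if month == 2:
        return 29 if is_leap(year) else 28
    raise ValueError(f"month must be in 1..12, got {month}")

for month in range(1, 13):
    print(f"{month:2}: {days_in_month(2020, month)}")
```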
Calendar approach
Another solution is to use the calendar module, which also handles the 29-day February in leap years.
import calendar
for month in range(1, 13):
    days = calendar.monthrange(2020, month)[1]
    print(f"{month:2}: {days}")
1: 31
2: 29
3: 31
4: 30
5: 31
6: 30
7: 31
8: 31
9: 30
10: 31
11: 30
12: 31

Calculate the next and the 3rd business date from a given date

I am trying to implement a function to calculate the next and 3rd business day from a given date (ideally taking into account some given holidays)
def day_of_week(year, month, day):
    t = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    year -= month < 3
    return (year + int(year/4) - int(year/100) + int(year/400) + t[month-1] + day) % 7
The input is in the format YYYYMMDD, with the 21st of March 2018 written as 20180321, and the output date should be in the same format.
I'm trying to do something like this, but I realised it is not good practice:
def leap_year(year):
    if(year%4==0 and year%100!=0):
        return True
    elif (year%4==0 and year%100==0 and year%400==0):
        return True
    else:
        return False
def business_day(year, month, day):
    if (month==12):
        if(day_of_week(year, month, day)<5 and day_of_week(year, month, day)>0 and day==31):
            return str(year+1)+str(0)+str(1)+str(0)+str(1)
        elif (day_of_week(year, month, day)<5 and day_of_week(year, month, day)>0 and day!=31):
            newDay="0"
            if(day<10):
                newDay = newDay + str(day+1)
            else:
                newDay = str(day+1)
            return str(year) + str(month) + newDay
        elif (day_of_week(year, month, day)>=5 and day==31):
            if(day_of_week(year, month, day)==5):
                return str(year+1)+"01"+"03"
            if (day_of_week(year, month, day) == 6):
                return str(year + 1) + "01" + "02"
            if (day_of_week(year, month, day) == 0):
                return str(year + 1) + "01" + "01"
        elif (day_of_week(year, month, day)>=5 and day==30):
            if((day_of_week(year, month, day)==5)):
                return str(year + 1) + "01" + "02"
            if ((day_of_week(year, month, day) == 6)):
                return str(year + 1) + "01" + "01"
            if ((day_of_week(year, month, day) == 0)):
                return str(year + 1) + str(month) + str(day+1)
I can't use any libraries in the solution. Thanks for the help.
No libraries! I had some fun learning Python. Did you? :-)
def day_of_week(year, month, day):
    t = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    year -= month < 3
    dw = (year + year // 4 - year // 100 + year // 400 + t[month-1] + day) % 7
    return [6, 0, 1, 2, 3, 4, 5][dw] # To be consistent with 'datetime' library
def leap_year(year):
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return True if leap else False
def valid_day(year, month, day):
    month_list = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if year < 1 or year > 9999 or month < 1 or month > 12:
        return False
    m = month_list[month-1] if month != 2 or not leap_year(year) else 29
    return True if 1 <= day <= m else False
class Career(Exception):
    def __str__(self): return 'So I became a waiter...'
MAX_DATE_AND_DAYS_INT = 365 * 100
class Date:
    # raise ValueError
    def __init__(self, year, month, day):
        if not valid_day(year, month, day):
            raise Career()
        self.y, self.m, self.d = year, month, day
    @classmethod
    def fromstring(cls, s):
        s1, s2, s3 = int(s[0:4]), int(s[4:6]), int(s[6:8])
        return cls(s1, s2, s3)
    def __repr__(self) -> str:
        return '%04d%02d%02d' % (self.y, self.m, self.d)
    def weekday_date(self) -> str:
        names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
        return names[self.weekday()] + ' ' + str(self)
    def next_day(self):
        if valid_day(self.y, self.m, self.d + 1):
            return Date(self.y, self.m, self.d + 1)
        elif valid_day(self.y, self.m + 1, 1):
            return Date(self.y, self.m + 1, 1)
        elif valid_day(self.y + 1, 1, 1):
            return Date(self.y + 1, 1, 1)
        else:
            raise Career
    def weekday(self):
        return day_of_week(self.y, self.m, self.d)
    def __add__(self, other):
        "Add an int number of days to a Date."
        if isinstance(other, int):
            if other < 1 or other > MAX_DATE_AND_DAYS_INT:
                raise OverflowError("int > MAX_DATE_AND_DAYS_INT")
            new_date = Date(self.y, self.m, self.d)
            while other >= 1:
                new_date = new_date.next_day()
                other -= 1
            return new_date
        return NotImplemented
    def next_working_day(self):
        day = self.next_day()
        while True:
            while day.weekday() >= 5:
                day = day.next_day()
            holidays_list = year_holidays(day.y)
            for str_day in holidays_list:
                s2 = str(day)
                if str_day == s2:
                    day = day.next_day()
                    break # for
            if day.weekday() < 5:
                break # while True
        return day
def year_holidays(year):
    holidays = [
        ["New Year's Day", 1, 1], # Fixed: January 1
        ["Birthday of Martin Luther King, Jr.", 1, 0, 0, 3], # Floating
        ["Washington's Birthday", 2, 0, 0, 3], # Third Monday in February
        ["Memorial Day", 5, 0, 0, 5], # Last Monday
        ["Independence Day", 7, 4],
        ["Labor Day", 9, 0, 0, 1],
        ["Columbus Day", 10, 0, 0, 2],
        ["Veterans Day", 11, 11],
        ["Thanksgiving Day", 11, 0, 3, 4],
        ["Christmas Day", 12, 25]
    ]
    year_list = []
    for h in holidays:
        if h[2] > 0:
            day = Date(year, h[1], h[2]) # Fixed day
        else:
            day = Date(year, h[1], 1) # Floating day
            while h[3] != day.weekday(): # Advance to match the weekday
                day = day.next_day()
            count = 1
            while count != h[4]: # Match the repetition of this day
                next_week = day + 7
                if next_week.m == day.m:
                    day = next_week
                count += 1
        year_list.append(str(day))
    return year_list # return the holidays as list of strings
if __name__ == '__main__':
    dates = [
        ['20190308', '20190311', '20190313'],
        ['20190309', '20190311', '20190313'],
        ['20190310', '20190311', '20190313'],
        ['20190311', '20190312', '20190314'],
        ['20190329', '20190401', '20190403'],
        ['20181231', '20190102', '20190104'],
        ['20190118', '20190122', '20190124'],
        ['20190216', '20190219', '20190221'],
        ['20190526', '20190528', '20190530'],
        ['20190703', '20190705', '20190709'],
        ['20190828', '20190829', '20190903'],
        ['20191010', '20191011', '20191016'],
        ['20191108', '20191112', '20191114'],
        ['20191125', '20191126', '20191129'],
        ['20191224', '20191226', '20191230'],
        ['20191227', '20191230', '20200102']]
    print('\n Today Next and 3rd business day')
    for days in dates:
        today = Date.fromstring(days[0])
        next_day = today.next_working_day()
        third_day = next_day.next_working_day().next_working_day()
        if str(next_day) != days[1] or str(third_day) != days[2]:
            print('*** ERROR *** ', end='')
        else:
            print(' ', end='')
        print(today.weekday_date(), next_day.weekday_date(), third_day.weekday_date())
Output:
Today Next and 3rd business day
Fri 20190308 Mon 20190311 Wed 20190313
Sat 20190309 Mon 20190311 Wed 20190313
Sun 20190310 Mon 20190311 Wed 20190313
Mon 20190311 Tue 20190312 Thu 20190314
Fri 20190329 Mon 20190401 Wed 20190403
Mon 20181231 Wed 20190102 Fri 20190104
Fri 20190118 Tue 20190122 Thu 20190124
Sat 20190216 Tue 20190219 Thu 20190221
Sun 20190526 Tue 20190528 Thu 20190530
Wed 20190703 Fri 20190705 Tue 20190709
Wed 20190828 Thu 20190829 Tue 20190903
Thu 20191010 Fri 20191011 Wed 20191016
Fri 20191108 Tue 20191112 Thu 20191114
Mon 20191125 Tue 20191126 Fri 20191129
Tue 20191224 Thu 20191226 Mon 20191230
Fri 20191227 Mon 20191230 Thu 20200102
import datetime
example = '20180321'
# you can parse the date string directly into a datetime object
next_business_day = datetime.datetime.strptime(example, '%Y%m%d')
# choose the increment based on the day of the week (or any other condition)
increment = 1
print('weekday is', next_business_day.weekday())
# if Friday
if next_business_day.weekday() == 4:
    increment = 3
# if Saturday
elif next_business_day.weekday() == 5:
    increment = 2
next_business_day += datetime.timedelta(days=increment)
# and convert back to whatever format you like
print('{:%Y%m%d}'.format(next_business_day))
Have a look at the datetime module; you can do all sorts of things with it.
https://docs.python.org/3/library/datetime.html
I use a few functions from the 'datetime' library, and you can have fun writing them yourself: date(y, m, d), timedelta(days=7), day.weekday(), '{:%Y%m%d}'.format(day), strptime(input, '%Y%m%d'), strftime(datetime, '%a %x'). A good idea is to create a class for dates and get rid of all the format conversions, so that only date(y, m, d), timedelta(days=7) and day.weekday() are left as an exercise.
import datetime
from datetime import date, timedelta
def day2string(day):
    return '{:%Y%m%d}'.format(day)
def year_holidays(year):
    holidays = [
        ["New Year's Day", 1, 1], # Fixed: January 1
        ["Birthday of Martin Luther King, Jr.", 1, 0, 0, 3], # Floating
        ["Washington's Birthday", 2, 0, 0, 3], # Third Monday in February
        ["Memorial Day", 5, 0, 0, 5], # Last Monday
        ["Independence Day", 7, 4],
        ["Labor Day", 9, 0, 0, 1],
        ["Columbus Day", 10, 0, 0, 2],
        ["Veterans Day", 11, 11],
        ["Thanksgiving Day", 11, 0, 3, 4],
        ["Christmas Day", 12, 25]
    ]
    year_list = []
    for h in holidays:
        if h[2] > 0:
            day = date(year, h[1], h[2]) # Fixed day
        else:
            day = date(year, h[1], 1) # Floating day
            while h[3] != day.weekday(): # Advance to match the weekday
                day += timedelta(days=1)
            count = 1
            while count != h[4]: # Match the repetition of this day
                next_week = day + timedelta(days=7)
                if next_week.month == day.month:
                    day = next_week
                count += 1
        year_list.append(day2string(day))
    return year_list # return the holidays as list of strings
def str2datetime(string):
    return datetime.datetime.strptime(string, '%Y%m%d')
def next_working_day(string):
    day = str2datetime(string)
    day += timedelta(days=1)
    while True:
        while day.weekday() >= 5:
            day += timedelta(days=1)
        holidays_list = year_holidays(day.year)
        for str_day in holidays_list:
            s2 = day2string(day)
            if str_day == s2:
                day += timedelta(days=1)
                break # for
        if day.weekday() < 5:
            break # while True
    return day2string(day)
if __name__ == '__main__':
    dates = [
        ['20190308', '20190311', '20190313'],
        ['20190309', '20190311', '20190313'],
        ['20190310', '20190311', '20190313'],
        ['20190311', '20190312', '20190314'],
        ['20190329', '20190401', '20190403'],
        ['20181231', '20190102', '20190104'],
        ['20190118', '20190122', '20190124'],
        ['20190216', '20190219', '20190221'],
        ['20190526', '20190528', '20190530'],
        ['20190703', '20190705', '20190709'],
        ['20190828', '20190829', '20190903'],
        ['20191010', '20191011', '20191016'],
        ['20191108', '20191112', '20191114'],
        ['20191125', '20191126', '20191129'],
        ['20191224', '20191226', '20191230'],
        ['20191227', '20191230', '20200102']]
    print('\n Today Next and 3rd business day')
    for days in dates:
        next_day = next_working_day(days[0])
        third_day = next_working_day(next_working_day(next_day))
        if next_day != days[1] or third_day != days[2]:
            print('*** ERROR *** ', end='')
        else:
            print(' ', end='')
        def f(x): return datetime.datetime.strftime(str2datetime(x), '%a %x')
        print(f(days[0]), f(next_day), f(third_day))
It should produce the following output:
Today Next and 3rd business day
Fri 03/08/19 Mon 03/11/19 Wed 03/13/19
Sat 03/09/19 Mon 03/11/19 Wed 03/13/19
Sun 03/10/19 Mon 03/11/19 Wed 03/13/19
Mon 03/11/19 Tue 03/12/19 Thu 03/14/19
Fri 03/29/19 Mon 04/01/19 Wed 04/03/19
Mon 12/31/18 Wed 01/02/19 Fri 01/04/19
Fri 01/18/19 Tue 01/22/19 Thu 01/24/19
Sat 02/16/19 Tue 02/19/19 Thu 02/21/19
Sun 05/26/19 Tue 05/28/19 Thu 05/30/19
Wed 07/03/19 Fri 07/05/19 Tue 07/09/19
Wed 08/28/19 Thu 08/29/19 Tue 09/03/19
Thu 10/10/19 Fri 10/11/19 Wed 10/16/19
Fri 11/08/19 Tue 11/12/19 Thu 11/14/19
Mon 11/25/19 Tue 11/26/19 Fri 11/29/19
Tue 12/24/19 Thu 12/26/19 Mon 12/30/19
Fri 12/27/19 Mon 12/30/19 Thu 01/02/20
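The holiday table in these answers encodes floating holidays as [name, month, 0, weekday, n], with n = 5 standing in for "last". As a minimal sketch of that lookup in isolation, the hypothetical helper below (`nth_weekday` is not part of either answer) finds the n-th given weekday of a month with the datetime module, falling back to the last one when the month has fewer than n.

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Return the n-th `weekday` (Monday = 0) of the month, or the last one
    if the month contains fewer than n of them."""
    first = date(year, month, 1)
    # Jump forward from the 1st to the first occurrence of the wanted weekday
    day = first + timedelta(days=(weekday - first.weekday()) % 7)
    for _ in range(n - 1):
        next_week = day + timedelta(days=7)
        if next_week.month != month:
            break               # ran out of month: keep the last occurrence
        day = next_week
    return day

print(nth_weekday(2019, 1, 0, 3))    # third Monday in January 2019
print(nth_weekday(2019, 5, 0, 5))    # "fifth", i.e. last, Monday in May 2019
print(nth_weekday(2019, 11, 3, 4))   # fourth Thursday in November 2019
```

Computing the offset with a modulo avoids the day-by-day loop, but the result should match what the table-driven code in the answers produces.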

How to split day, hour, minute and second data in a huge Pandas data frame?

I'm new to Python and I'm working on a project for a Data Science class I'm taking. I have a big csv file (around 190 million lines, approx. 7GB of data) and I need, first, to do some data preparation.
Full disclaimer: data here is from this Kaggle competition.
A picture from Jupyter Notebook with headers follows. Although it reads full_data.head(), I'm using a 100,000-lines sample just to test code.
The most important column is click_time. The format is: dd hh:mm:ss. I want to split this into 4 different columns: day, hour, minute and second. I've reached a solution that works fine with this small file, but it takes too long to run on 10% of the real data, let alone on 100% of it (I haven't even been able to try that, since just reading the full csv is a big problem right now).
Here it is:
# First I need to split the values
click = full_data['click_time']
del full_data['click_time']
click = click.str.replace(' ', ':')
click = click.str.split(':')
# Then I transform everything into integers. The last piece of code
# returns an array of lists, one for each line, and each list has 4
# elements. I couldn't figure out another way of making this conversion
click = click.apply(lambda x: list(map(int, x)))
# Now I transform everything into unidimensional arrays
day = np.zeros(len(click), dtype = 'uint8')
hour = np.zeros(len(click), dtype = 'uint8')
minute = np.zeros(len(click), dtype = 'uint8')
second = np.zeros(len(click), dtype = 'uint8')
for i in range(0, len(click)):
    day[i] = click[i][0]
    hour[i] = click[i][1]
    minute[i] = click[i][2]
    second[i] = click[i][3]
del click
# Transforming everything to a Pandas series
day = pd.Series(day, index = full_data.index, dtype = 'uint8')
hour = pd.Series(hour, index = full_data.index, dtype = 'uint8')
minute = pd.Series(minute, index = full_data.index, dtype = 'uint8')
second = pd.Series(second, index = full_data.index, dtype = 'uint8')
# Adding to data frame
full_data['day'] = day
del day
full_data['hour'] = hour
del hour
full_data['minute'] = minute
del minute
full_data['second'] = second
del second
The result is OK, it's what I want, but there has to be a faster way of doing this.
Any ideas on how to improve this implementation? If one is interested in the dataset, this is from the test_sample.csv: https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
Thanks a lot in advance!!
EDIT 1: Following @COLDSPEED's request, I provide the results of full_data.head().to_dict():
{'app': {0: 12, 1: 25, 2: 12, 3: 13, 4: 12},
'channel': {0: 497, 1: 259, 2: 212, 3: 477, 4: 178},
'click_time': {0: '07 09:30:38',
1: '07 13:40:27',
2: '07 18:05:24',
3: '07 04:58:08',
4: '09 09:00:09'},
'device': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'ip': {0: 87540, 1: 105560, 2: 101424, 3: 94584, 4: 68413},
'is_attributed': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'os': {0: 13, 1: 17, 2: 19, 3: 13, 4: 1}}
Convert to timedelta and extract components:
v = df.click_time.str.split()
df['days'] = v.str[0].astype(int)
df[['hours', 'minutes', 'seconds']] = (
pd.to_timedelta(v.str[-1]).dt.components.iloc[:, 1:4]
)
df
app channel click_time device ip is_attributed os days hours \
0 12 497 07 09:30:38 1 87540 0 13 7 9
1 25 259 07 13:40:27 1 105560 0 17 7 13
2 12 212 07 18:05:24 1 101424 0 19 7 18
3 13 477 07 04:58:08 1 94584 0 13 7 4
4 12 178 09 09:00:09 1 68413 0 1 9 9
minutes seconds
0 30 38
1 40 27
2 5 24
3 58 8
4 0 9
One solution is to first split by whitespace, then convert to datetime objects, then extract components directly.
import pandas as pd
df = pd.DataFrame({'click_time': ['07 09:30:38', '07 13:40:27', '07 18:05:24',
                                  '07 04:58:08', '09 09:00:09', '09 01:22:13',
                                  '09 01:17:58', '07 10:01:53', '08 09:35:17',
                                  '08 12:35:26']})
df[['day', 'time']] = df['click_time'].str.split().apply(pd.Series)
df['datetime'] = pd.to_datetime(df['time'])
df['day'] = df['day'].astype(int)
df['hour'] = df['datetime'].dt.hour
df['minute'] = df['datetime'].dt.minute
df['second'] = df['datetime'].dt.second
# drop the helper columns (keyword form; the positional axis argument is no longer accepted by recent pandas)
df = df.drop(columns=['time', 'datetime'])
Result
click_time day hour minute second
0 07 09:30:38 7 9 30 38
1 07 13:40:27 7 13 40 27
2 07 18:05:24 7 18 5 24
3 07 04:58:08 7 4 58 8
4 09 09:00:09 9 9 0 9
5 09 01:22:13 9 1 22 13
6 09 01:17:58 9 1 17 58
7 07 10:01:53 7 10 1 53
8 08 09:35:17 8 9 35 17
9 08 12:35:26 8 12 35 26

Print month using the month and day

I need to print month using the month and day. But I cannot seem to move the numbers after '1' to the next line using Python.
# This program shows an example with "November" as month and "Sunday" as day.
month = input("Enter the month('January', ...,'December'): ")
day = input("Enter the start day ('Monday', ..., 'Sunday'): ")
n = 1
if month == "January" or month == "March" or month == "May" or month == "July" or month == "August" or month == "October" or month == "December":
    x = 31
elif month == "February":
    x = 28
else:
    x = 30
print(month)
print("Mo Tu We Th Fr Sa Su")
if (day == "Sunday"):
    print(" ", end='')
for i in range (1, 7):
    for j in range (1, 8):
        while n != x+1:
            print('%2s' % n, end=' ')
            n = n + 1
            break
    print()
Output looks like this:
November
Mo Tu We Th Fr Sa Su
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
Some changes.
Instead of having a nested loop, just have a single loop that prints all the dates. Then, inside that loop, make the decision about whether to end the line (if the date you just printed corresponded to a Sunday).
Also, the # of days in month look-up is a bit cleaner, and you now handle more "days" than just Sunday:
day = "Monday"
month = "March"
# Get the number of days in the month
if month in ["January", "March", "May", "July", "August", "October", "December"]:
    x = 31
elif month in ["February"]:
    x = 28
else:
    x = 30
# Get the number of "blank spaces" we need to skip for the first week, and when to break
DAY_OFF = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
off = DAY_OFF.index(day)
print(month)
print("Mo Tu We Th Fr Sa Su")
# Print empty "cells" (two spaces plus separator) when the first day starts after Monday
for i in range(off):
    print("  ", end=' ')
# Print days of the month
for i in range(x):
    print("%2d" % (i+1), end=' ')
    # If we just printed the last day of the week, print a newline
    if (i + off) % 7 == 6: print()
March/Monday
March
Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
March/Sunday
March
Mo Tu We Th Fr Sa Su
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
February/Sunday
February
Mo Tu We Th Fr Sa Su
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28
The first problem I see in your code is: why are you using a while loop with a break immediately after it starts?
It seems that you only need an if statement, not a while.
Second, you're using the same logic for every line of your calendar, which means every line starts on Monday and ends on Sunday.
You should change the starting point of your inner for loop for the first line, depending on the day the month starts on.
A simple dictionary can hold the number associated with each day of the week, and for the first week you use it as the starting point of the for loop instead of 1.
Also, your code will work only for Monday and Sunday as the first day of the month.
To make it work for any first day, you should change the way you print spaces so that it depends on the first day.
The code with the changes:
month = 'November'
day = 'Sunday'
x = 30
n = 1
days = { 'Mo': 1, 'Tu': 2, 'We': 3, 'Th': 4, 'Fr': 5, 'Sa': 6, 'Su': 7 }
print("   "*(days[day[:2]]-1), end='') # print 3 spaces for each weekday before the first day of the month
start = days[day[:2]] # Set the start of the inner loop to the first day of the month
for i in range (1, 7):
    for j in range (start, 8):
        start = 1 # every later week starts on Monday again
        if n < x+1:
            print('%2s' % n, end=' ')
            n = n + 1
    print()
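Both answers come down to the same idea: pad the first week with empty cells, then break the line after every seventh cell. As a sketch of that idea in a form that is easy to test, the hypothetical function `month_rows` below (the name and the 0 = Monday convention are assumptions of this example) builds the rows as strings instead of printing inside the loop.

```python
def month_rows(days_in_month, first_weekday):
    """Build calendar rows; first_weekday is 0 for Monday ... 6 for Sunday."""
    # One two-character cell per slot: blanks before day 1, then the day numbers
    cells = ["  "] * first_weekday + [f"{d:2d}" for d in range(1, days_in_month + 1)]
    # Cut the flat list of cells into weeks of seven
    return [" ".join(cells[i:i + 7]) for i in range(0, len(cells), 7)]

print("November")
print("Mo Tu We Th Fr Sa Su")
for row in month_rows(30, 6):   # 30 days, starting on a Sunday
    print(row)
```

Keeping the layout logic separate from the printing makes it straightforward to check other start days without reading console output.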
