problem with if else loop for a month and days problem - python

Why do I get the wrong output for the code below? The output is shown after the code.
(Some might suggest using the datetime module; I'm going with this method because of some complications with the main program.)
months = [1,2,3,4,5,6,7,8,9,10,11,12]
for month in months:
    if month == {1,3,5,7,9,11}:
        days= 31
        print(days)
    elif month == {4,6,8,10,12}:
        days = 30
        print(days)
    else :
        days = 28
        print(days)
I get this output
28
28
28
28
28
28
28
28
28
28
28
28

Question approach
You are checking whether an integer is equal to a set, which is never true, so every month falls through to the else branch. You want to check whether the integer is in the set. By the way, the sets you use are wrong (fixed here), and February may have 29 days (not fixed in this solution).
for month in range(1, 13):
    if month in {1, 3, 5, 7, 8, 10, 12}:
        days = 31
    elif month in {4, 6, 9, 11}:
        days = 30
    else :
        days = 28
    print(f"{month:2}: {days}")
1: 31
2: 28
3: 31
4: 30
5: 31
6: 30
7: 31
8: 31
9: 30
10: 31
11: 30
12: 31
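To see the difference between the two comparisons in isolation, here is a minimal sketch (the variable names are only illustrative):

```python
month = 3
thirty_one_day_months = {1, 3, 5, 7, 8, 10, 12}

# Equality compares the int with the whole set, so it is False
print(month == thirty_one_day_months)

# Membership asks whether the int is one of the set's elements
print(month in thirty_one_day_months)
```

The first line prints False for any month, which is why the original loop always reached the else branch.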
Calendar approach
Another solution is to use the calendar module, which also handles the 29 days of February in leap years.
import calendar
for month in range(1, 13):
    days = calendar.monthrange(2020, month)[1]
    print(f"{month:2}: {days}")
1: 31
2: 29
3: 31
4: 30
5: 31
6: 30
7: 31
8: 31
9: 30
10: 31
11: 30
12: 31
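The `[1]` index above is needed because `calendar.monthrange` returns a pair rather than a single number. A short sketch of unpacking it (the years are just examples):

```python
import calendar

# monthrange returns (weekday of the 1st, number of days in the month).
# Weekdays are numbered 0 (Monday) through 6 (Sunday).
first_weekday, n_days = calendar.monthrange(2020, 2)
print(first_weekday, n_days)

# February in a non-leap year
feb_2021 = calendar.monthrange(2021, 2)[1]
print(feb_2021)
```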

Related

Python - Basic Leap year function problem (Novice)

So basically my problem is: I have a list of years from 2020 to 2030, and my program treats every year as a leap year.
My variables are:
yearList = [2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030]
monthList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
daysOfMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def create_calendar(yearList: list, monthList: list, daysOfMonth: list, numbOfShootingStars: int):
    calendar = {}
    for year in yearList:
        # Create year
        calendar[year] = {}
        for month in monthList:
            # If the list don't have 12 months, it won't loop through all months
            if month == 12 and month + 1 == 1: break;
            else:
                # Create monthly data
                calendar[year][month] = {}
                # If February is a leap year, it will have 29 days instead of 28
                if month == 2 and year % 4 == 0:
                    daysOfMonth[month - 1] = 29
                # Create the days and daily data
                for day in range(1, daysOfMonth[monthList.index(month)] + 1):
                    calendar[year][month][day] = numbOfShootingStars
    return calendar
Thank you for your help!
One question: is it possible to use a list like this with this method?
monthList = [
{1, 'January'},
{2, 'February'},
{3, 'March'},
{4, 'April'},
{5, 'May'},
{6, 'June'},
{7, 'July'},
{8, 'August'},
{9, 'September'},
{10, 'October'},
{11, 'November'},
{12, 'December'}]
If so, how should I modify my code? I couldn't get it to work :(
Okay, I think I solved it. The problem was that I changed the February value in the daysOfMonth list to 29, and then it stayed like that. With this if-else I change it back to its original state when it is not a leap year.
# If February is a leap year, it will have 29 days instead of 28
if month == 2 and year % 4 == 0:
    daysOfMonth[month - 1] = 29
else:
    daysOfMonth[month - 1] = 28
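If you take a year to be 365.2425 days, this is 365 + 1/4 - 1/100 + 1/400. This explains why, in the long run, you stay at the correct value by adding a whole day every fourth year, except on century years, but still on every fourth century (starting from 0).
A possible implementation, to tell if the year Y is leap, is the expression below. Note that it gives the correct long-run number of leap days, but the years it flags are offset from the calendar's (it evaluates to 0 for 2020 and 1 for 2021):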
Floor(365.2425 Y) - Floor(365.2425 (Y-1)) - 365
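As a minimal sketch, the divisibility rule described above (every fourth year, except centuries, unless divisible by 400) can be written directly, instead of using the floor expression. The names `is_leap`, `DAYS_IN_MONTH` and `days_in_month` are hypothetical helpers, not part of the original code; the lookup avoids mutating a shared list, which was the cause of the "every year is a leap year" symptom.

```python
def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# A tuple cannot be modified by accident, unlike the daysOfMonth list
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

def days_in_month(year, month):
    """Return the month length without changing the shared lookup table."""
    if month == 2 and is_leap(year):
        return 29
    return DAYS_IN_MONTH[month - 1]

leap_years = [y for y in range(2020, 2031) if is_leap(y)]
print(leap_years)
```

For the years 2020 to 2030 the simpler `year % 4 == 0` test gives the same answer; the extra conditions only matter for century years such as 1900 or 2100.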

Python elif to nested if

I have made this program using elif statements and am just curious on how it would be done using nested ifs instead.
The user has to input a number and the program will tell them which month it is and how many days are in it.
month_num = int(input("Enter the number of a month (Jan = 1) : "))
if month_num == 1:
    print(month_num, "is January, and has 31 days.")
elif month_num == 2:
    print(month_num, "is February, and has 29 days.")
elif month_num == 3:
    print(month_num, "is March, and has 31 days.")
elif month_num == 4:
    print(month_num, "is April, and has 30 days.")
elif month_num == 5:
    print(month_num, "is May, and has 31 days.")
elif month_num == 6:
    print(month_num, "is June, and has 30 days.")
elif month_num == 7:
    print(month_num, "is July, and has 31 days.")
elif month_num == 8:
    print(month_num, "is August, and has 31 days.")
elif month_num == 9:
    print(month_num, "is September, and has 30 days.")
elif month_num == 10:
    print(month_num, "is October, and has 31 days.")
elif month_num == 11:
    print(month_num, "is November, and has 30 days.")
elif month_num == 12:
    print(month_num, "is December, and has 31 days.")
else:
    print(month_num, "Is not a valid number")
Neither is a good solution. You'd be better off creating a dictionary.
This assumes f-strings are available (Python >= 3.6); if not, they can easily be converted to use .format:
month_num = int(input("Enter the number of a month (Jan = 1) : "))
d = {1: ('January', 31),
     2: ('February', 29),
     ...
     }
try:
    month_name, num_of_days = d[month_num]
    print(f'{month_num} is {month_name}, and has {num_of_days} days')
except KeyError:
    print(month_num, "Is not a valid number")
Also note that February does not always have 29 days.
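For illustration, a self-contained variant of the dictionary idea might look like the sketch below. The names `MONTHS` and `describe_month` are hypothetical, and February is listed with 28 days on the assumption of a non-leap year.

```python
# Month number -> (name, days); February assumes a non-leap year
MONTHS = {
    1: ("January", 31), 2: ("February", 28), 3: ("March", 31),
    4: ("April", 30), 5: ("May", 31), 6: ("June", 30),
    7: ("July", 31), 8: ("August", 31), 9: ("September", 30),
    10: ("October", 31), 11: ("November", 30), 12: ("December", 31),
}

def describe_month(month_num):
    """Return a sentence describing the month, or an error message."""
    try:
        name, days = MONTHS[month_num]
    except KeyError:
        return f"{month_num} is not a valid number"
    return f"{month_num} is {name}, and has {days} days."

print(describe_month(4))
print(describe_month(13))
```

Returning the string instead of printing inside the function keeps the lookup easy to test separately from the input handling.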
You can use calendar, i.e.:
import calendar as cal
from datetime import date
m = int(input("Enter the number of a month (Jan = 1) : "))
if m in range(1,13):
    print(f"{m} is {cal.month_name[m]} and has {cal.monthrange(date.today().year, m)[1]} days.")
1 is January and has 31 days.
2 is February and has 29 days.
3 is March and has 31 days.
4 is April and has 30 days.
5 is May and has 31 days.
6 is June and has 30 days.
7 is July and has 31 days.
8 is August and has 31 days.
9 is September and has 30 days.
10 is October and has 31 days.
11 is November and has 30 days.
12 is December and has 31 days.

Implementation of Plotly on pandas dataframe from pyspark transformation

I'd like to produce plotly plots using pandas dataframes. I am struggling on this topic.
Now, I have this:
AGE_GROUP shop_id count_of_member
0 10 1 40
1 10 12 57615
2 20 1 186
4 30 1 175
5 30 12 322458
6 40 1 171
7 40 12 313758
8 50 1 158
10 60 1 168
Some shops might not have a record. As an example, plotly needs x=[1,2,3], y=[4,5,6]. If my input is x=[1,2,3] and y=[4,5], then x and y are not the same size and an exception will be raised. I need to add a record with a count of 0 for each missing shop_id. So, I need this:
AGE_GROUP shop_id count_of_member
0 10 1 40
1 10 12 57615
2 20 1 186
3 20 12 0
4 30 1 175
5 30 12 322458
6 40 1 171
7 40 12 313758
8 50 1 158
9 50 12 0
10 60 1 168
11 60 12 0
For each age_group, I need 2 rows, one per shop_id, since the unique shop_id values are 1 and 12.
If there are 10 age groups, 20 rows will be shown.
For example:
AGE_GROUP shop_id count_of_member
1 10 12 57615
2 20 1 186
3 30 1 175
4 40 1 171
5 40 12 313758
6 50 1 158
7 60 1 168
There are 2 unique shop_id values (1 and 12) and 6 different age groups (10, 20, 30, 40, 50, 60).
In age_group 10, only shop_id 12 exists; there is no shop_id 1.
So, I need a new record showing that the count_of_member for age_group 10 and shop_id 1 is 0.
The final dataframe I get should be:
AGE_GROUP shop_id count_of_member
1 10 12 57615
**1 10 1 0**
2 20 1 186
**2 20 12 0**
3 30 1 175
**3 30 12 0**
4 40 1 171
5 40 12 313758
6 50 1 158
**6 50 12 0**
7 60 12 0
7 60 1 168
** marks the newly added rows
How can I implement this transformation?
How can i implement this transformation?
First of all, you don't have to.
When used correctly, plotly offers a wide array of approaches that let you visualize your dataset as it is, when your data looks like yours in the third sample:
AGE_GROUP shop_id count_of_member
1 10 12 57615
2 20 1 186
3 30 1 175
4 40 1 171
5 40 12 313758
6 50 1 158
7 60 1 168
There's no need to apply pandas to get to the structure of the fourth sample. You're not too clear on what you'd like to do with this sample, but I suspect you'd like to show the accumulated count_of_member per age group, split by shop_id, like this?
You may wonder why the blue bars for shop_id1 aren't showing. That's just because the sizes of the numbers are so hugely different. If you replace the minuscule count_of_member values for shop_id=1 with something more comparable to those of shop_id=12, you'll get this instead:
Below is a complete code snippet where the altered dataset has been commented out. The dataset used is still the same as in your third data sample.
Complete code:
# imports
import plotly.graph_objects as go
import pandas as pd
data = {'AGE_GROUP': {0: 10, 1: 10, 2: 20, 4: 30, 5: 30, 6: 40, 7: 40, 8: 50, 10: 60},
'shop_id': {0: 1, 1: 12, 2: 1, 4: 1, 5: 12, 6: 1, 7: 12, 8: 1, 10: 1},
'count_of_member': {0: 40,
1: 57615,
2: 186,
4: 175,
5: 322458,
6: 171,
7: 313758,
8: 158,
10: 168}}
## Optional dataset
# data = {'AGE_GROUP': {0: 10, 1: 10, 2: 20, 4: 30, 5: 30, 6: 40, 7: 40, 8: 50, 10: 60},
# 'shop_id': {0: 1, 1: 12, 2: 1, 4: 1, 5: 12, 6: 1, 7: 12, 8: 1, 10: 1},
# 'count_of_member': {0: 40,
# 1: 57615,
# 2: 186000,
# 4: 175000,
# 5: 322458,
# 6: 171000,
# 7: 313758,
# 8: 158000,
# 10: 168000}}
# Create DataFrame
df = pd.DataFrame(data)
# Manage shop_id
shops = df['shop_id'].unique()
# set up plotly figure
fig = go.Figure()
# add one trace per shop_id and show member counts per age group
for shop in shops:
    # subset dataframe by shop_id
    df_ply=df[df['shop_id']==shop]
    # add trace
    fig.add_trace(go.Bar(x=df_ply['AGE_GROUP'], y=df_ply['count_of_member'], name='shop_id'+str(shop)))
fig.show()
EDIT:
If you for some reason still need to structure your data as in your fourth sample, I suggest that you raise another question and specifically tag it with [pandas] and [python] only, and exclusively focus on the data transformation part of the question.
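For reference, one possible way to sketch the fill-in the question asks about is to reindex against every (AGE_GROUP, shop_id) combination. This is only an illustration built on the third sample from the question; the names `full_index` and `filled` are made up here.

```python
import pandas as pd

# Third sample from the question: some (AGE_GROUP, shop_id) pairs are missing
df = pd.DataFrame({
    "AGE_GROUP": [10, 20, 30, 40, 40, 50, 60],
    "shop_id": [12, 1, 1, 1, 12, 1, 1],
    "count_of_member": [57615, 186, 175, 171, 313758, 158, 168],
})

# Every combination of the observed age groups and shop ids
full_index = pd.MultiIndex.from_product(
    [sorted(df["AGE_GROUP"].unique()), sorted(df["shop_id"].unique())],
    names=["AGE_GROUP", "shop_id"],
)

# Reindex onto the full grid; combinations with no record get a count of 0
filled = (
    df.set_index(["AGE_GROUP", "shop_id"])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(filled)
```

This relies on each (AGE_GROUP, shop_id) pair appearing at most once in the input; duplicated pairs would need to be aggregated first.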

How to split day, hour, minute and second data in a huge Pandas data frame?

I'm new to Python and I'm working on a project for a Data Science class I'm taking. I have a big csv file (around 190 million lines, approx. 7GB of data) and I need, first, to do some data preparation.
Full disclaimer: data here is from this Kaggle competition.
A picture from Jupyter Notebook with headers follows. Although it reads full_data.head(), I'm using a 100,000-lines sample just to test code.
The most important column is click_time. The format is: dd hh:mm:ss. I want to split this into 4 different columns: day, hour, minute and second. I've reached a solution that works fine with this little file, but it takes too long to run on 10% of the real data, let alone on 100% of it (I haven't even been able to try that, since just reading the full csv is a big problem right now).
Here it is:
# First I need to split the values
click = full_data['click_time']
del full_data['click_time']
click = click.str.replace(' ', ':')
click = click.str.split(':')
# Then I transform everything into integers. The last piece of code
# returns an array of lists, one for each line, and each list has 4
# elements. I couldn't figure out another way of making this conversion
click = click.apply(lambda x: list(map(int, x)))
# Now I transform everything into unidimensional arrays
day = np.zeros(len(click), dtype = 'uint8')
hour = np.zeros(len(click), dtype = 'uint8')
minute = np.zeros(len(click), dtype = 'uint8')
second = np.zeros(len(click), dtype = 'uint8')
for i in range(0, len(click)):
    day[i] = click[i][0]
    hour[i] = click[i][1]
    minute[i] = click[i][2]
    second[i] = click[i][3]
del click
# Transforming everything to a Pandas series
day = pd.Series(day, index = full_data.index, dtype = 'uint8')
hour = pd.Series(hour, index = full_data.index, dtype = 'uint8')
minute = pd.Series(minute, index = full_data.index, dtype = 'uint8')
second = pd.Series(second, index = full_data.index, dtype = 'uint8')
# Adding to data frame
full_data['day'] = day
del day
full_data['hour'] = hour
del hour
full_data['minute'] = minute
del minute
full_data['second'] = second
del second
The result is OK, it's what I want, but there has to be a faster way of doing this.
Any ideas on how to improve this implementation? If one is interested in the dataset, this is from the test_sample.csv: https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data
Thanks a lot in advance!!
EDIT 1: Following @COLDSPEED's request, here are the results of full_data.head().to_dict():
{'app': {0: 12, 1: 25, 2: 12, 3: 13, 4: 12},
'channel': {0: 497, 1: 259, 2: 212, 3: 477, 4: 178},
'click_time': {0: '07 09:30:38',
1: '07 13:40:27',
2: '07 18:05:24',
3: '07 04:58:08',
4: '09 09:00:09'},
'device': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
'ip': {0: 87540, 1: 105560, 2: 101424, 3: 94584, 4: 68413},
'is_attributed': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'os': {0: 13, 1: 17, 2: 19, 3: 13, 4: 1}}
Convert to timedelta and extract components:
v = df.click_time.str.split()
df['days'] = v.str[0].astype(int)
df[['hours', 'minutes', 'seconds']] = (
    pd.to_timedelta(v.str[-1]).dt.components.iloc[:, 1:4]
)
df
app channel click_time device ip is_attributed os days hours \
0 12 497 07 09:30:38 1 87540 0 13 7 9
1 25 259 07 13:40:27 1 105560 0 17 7 13
2 12 212 07 18:05:24 1 101424 0 19 7 18
3 13 477 07 04:58:08 1 94584 0 13 7 4
4 12 178 09 09:00:09 1 68413 0 1 9 9
minutes seconds
0 30 38
1 40 27
2 5 24
3 58 8
4 0 9
One solution is to first split by whitespace, then convert to datetime objects, then extract components directly.
import pandas as pd
df = pd.DataFrame({'click_time': ['07 09:30:38', '07 13:40:27', '07 18:05:24',
'07 04:58:08', '09 09:00:09', '09 01:22:13',
'09 01:17:58', '07 10:01:53', '08 09:35:17',
'08 12:35:26']})
df[['day', 'time']] = df['click_time'].str.split().apply(pd.Series)
df['datetime'] = pd.to_datetime(df['time'])
df['day'] = df['day'].astype(int)
df['hour'] = df['datetime'].dt.hour
df['minute'] = df['datetime'].dt.minute
df['second'] = df['datetime'].dt.second
df = df.drop(['time', 'datetime'], axis=1)
Result
click_time day hour minute second
0 07 09:30:38 7 9 30 38
1 07 13:40:27 7 13 40 27
2 07 18:05:24 7 18 5 24
3 07 04:58:08 7 4 58 8
4 09 09:00:09 9 9 0 9
5 09 01:22:13 9 1 22 13
6 09 01:17:58 9 1 17 58
7 07 10:01:53 7 10 1 53
8 08 09:35:17 8 9 35 17
9 08 12:35:26 8 12 35 26
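Another vectorized option, sketched here under the assumption that every value strictly follows the dd hh:mm:ss pattern, is to pull the four fields out with a single regular expression that uses named groups. The name `parts` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'click_time': ['07 09:30:38', '07 13:40:27',
                                  '09 09:00:09', '08 12:35:26']})

# Named groups become the column names of the extracted frame
pattern = r'(?P<day>\d+) (?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+)'
parts = df['click_time'].str.extract(pattern)

# The captured fields are strings, so convert them before joining
df = df.join(parts.astype(int))
print(df)
```

Rows that do not match the pattern would come back as missing values, and the integer conversion would then fail, so malformed input has to be handled before this step.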

Print month using the month and day

I need to print month using the month and day. But I cannot seem to move the numbers after '1' to the next line using Python.
# This program shows example of "November" as month and "Sunday" as day.
month = input("Enter the month('January', ...,'December'): ")
day = input("Enter the start day ('Monday', ..., 'Sunday'): ")
n = 1
if month == "January" or month == "March" or month == "May" or month == "July" or month == "August" or month == "October" or month == "December":
    x = 31
elif month == "February":
    x = 28
else:
    x = 30
print(month)
print("Mo Tu We Th Fr Sa Su")
if (day == "Sunday"):
    print(" ", end='')
for i in range (1, 7):
    for j in range (1, 8):
        while n != x+1:
            print('%2s' % n, end=' ')
            n = n + 1
            break
    print()
Output looks like this:
November
Mo Tu We Th Fr Sa Su
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
Some changes.
Instead of having a nested loop, just have a single loop that prints all the dates. Then, inside that loop, decide whether to end the line (if the date you just printed corresponds to a Sunday).
Also, the days-in-month lookup is a bit cleaner, and you now handle more start days than just Sunday:
day = "Monday"
month = "March"
# Get the number of days in the months
if month in ["January", "March", "May", "July", "August", "October", "December"]:
    x = 31
elif month in ["February"]:
    x = 28
else:
    x = 30
# Get the number of "blank spaces" we need to skip for the first week, and when to break
DAY_OFF = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
off = DAY_OFF.index(day)
print(month)
print("Mo Tu We Th Fr Sa Su")
# Print empty "cells" (two spaces plus the separator) when the first day starts after Monday
for i in range(off):
    print("  ", end=' ')
# Print days of the month
for i in range(x):
    print("%2d" % (i+1), end=' ')
    # If we just printed the last day of the week, print a newline
    if (i + off) % 7 == 6: print()
March/Monday
March
Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
March/Sunday
March
Mo Tu We Th Fr Sa Su
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
February/Sunday
February
Mo Tu We Th Fr Sa Su
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28
The first problem I see in your code is: why are you using a while with a break right after it starts?
It seems that you only need an if statement, not a while.
Second, you're using the same logic for every line of your calendar, which means they all start on Monday and end on Sunday.
You should change the starting point of your inner for loop for the first line, depending on the day the month starts on.
A simple dictionary can hold the number associated with each day of the week, and for the first week you use it as the starting point of the for loop instead of 1.
Also, your code will work only for Monday and Sunday as the first day of the month.
To make it work for any first day, you should change the way you print the spaces, depending on the first day.
The code with the changes:
month = 'November'
day = 'Sunday'
x = 30
n = 1
days = { 'Mo': 1, 'Tu': 2, 'We': 3, 'Th': 4, 'Fr': 5, 'Sa': 6, 'Su': 7 }
print("   "*(days[day[:2]]-1), end='') # print 3 spaces for each day that isn't the first day of the month
start = days[day[:2]] # Set the start of the inner loop to the first day of the month
for i in range (1, 7):
    for j in range (start, 8):
        start = 1
        if n < x+1:
            print('%2s' % n, end=' ')
            n = n + 1
    print()
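Both answers boil down to padding the first week with blank cells and then breaking every seven cells. As an alternative sketch of the same idea, the rows can be built as strings first and printed afterwards; `DAY_NAMES` and `month_rows` are hypothetical names, not from the answers above.

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def month_rows(first_day, n_days):
    """Return the calendar rows as strings, one per week (Monday first)."""
    offset = DAY_NAMES.index(first_day)
    # Blank two-character cells before the 1st, then the right-aligned day numbers
    cells = ["  "] * offset + [f"{d:2d}" for d in range(1, n_days + 1)]
    # Slice the flat list of cells into weeks of seven
    return [" ".join(cells[i:i + 7]) for i in range(0, len(cells), 7)]

print("November")
print("Mo Tu We Th Fr Sa Su")
for row in month_rows("Sunday", 30):
    print(row)
```

Keeping the layout in a function that returns strings makes it straightforward to check the rows without capturing printed output.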
