Deleting elements in a list by their value as index - python

I have a list which is days in a time series that doesn't start at zero:
days = [2,3,4,5,...]
I have another list which is minutes of cloud on all days starting at day 0:
weather = [0,120,150,60,120,30,300,...]
I want to iterate through days, and remove it if the corresponding index in weather is greater than some value.
I've tried
downtime = 100
days_new = [x for i, x in enumerate(days) if weather[i] < downtime]
Which should then result in:
days_new = [3,5,...]
as the indexes removed (2,4) have a value greater than 100 in the list weather.
But it's removing them based on their index in days, not on their value used as an index into weather. In other words, this only works if days starts at 0, not at any integer greater than 0. How can I fix this?

Your requirement assumes weather is subscriptable by all the items in days, in which case you wouldn't need enumerate. Index weather directly with the indices in days:
days_new = [x for x in days if weather[x] < downtime]
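Here is a quick runnable check of this approach. It uses only the values shown in the question; dropping the "..." tails is an assumption. The second comprehension is an optional variant that is not part of the answer above. It skips days that have no entry in weather instead of raising IndexError, and day 10 is a hypothetical extra day with no weather data.

```python
# Sample values from the question; the "..." tails are dropped (assumption)
days = [2, 3, 4, 5]
weather = [0, 120, 150, 60, 120, 30, 300]
downtime = 100

# Each day is itself an index into weather, so no enumerate is needed
days_new = [day for day in days if weather[day] < downtime]
print(days_new)  # [3, 5]

# Optional variant: skip days without a weather entry instead of raising IndexError
# (day 10 is a hypothetical extra day with no weather data)
days_safe = [day for day in days + [10] if day < len(weather) and weather[day] < downtime]
print(days_safe)  # [3, 5]
```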

If weather has an entry for every day, so that each day in days is a valid index into weather:
downtime = 100
new_days = [day for day in days if weather[day] <= downtime]

You can try this
[x for x in days if weather[x] <= 100]
Output, for your input:
[3, 5]

You can try this way:
days = [2,3,4,5,...]
weather = [0,120,150,60,120,30,300,...]
downtime = 100
days_new = []
for x in days:
    if weather[x] < downtime:
        days_new.append(x)
# output :
days_new = [3, 5,...]

Related

Python: Selecting longest consecutive series of dates in list

I have a series of lists (np.arrays, actually), of which the elements are dates.
id
0a0fe3ed-d788-4427-8820-8b7b696a6033 [2019-01-30, 2019-01-31, 2019-02-01, 2019-02-0...
0a48d1e8-ead2-404a-a5a2-6b05371200b1 [2019-01-30, 2019-01-31, 2019-02-01, 2019-02-0...
0a9edba1-14e3-466a-8d0c-f8a8170cefc8 [2019-01-29, 2019-01-30, 2019-01-31, 2019-02-0...
Name: startDate, dtype: object
For each element in the series (i.e. for each list of dates), I want to retain the longest sublist in which all dates are consecutive. I'm struggling to approach this in a pythonic (simple/efficient) way. The only approach that I can think of is to use multiple loops: loop over the series values (the lists), and loop over each element in the list. I would then store the first date and the number of consecutive days, and use temporary values to overwrite the results if a longer sequence of consecutive days is encountered. This seems highly inefficient though. Is there a better way of doing this?
Since you mention you are using numpy arrays of dates, it makes sense to stick to numpy types instead of converting to Python's built-in date type. I'm assuming here that your arrays have dtype 'datetime64[D]'. In that case you could do something like
import numpy as np

date_list = np.array(['2005-02-01', '2005-02-02', '2005-02-03',
                      '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08', '2005-02-09',
                      '2005-02-11', '2005-02-12',
                      '2005-02-14', '2005-02-15', '2005-02-16', '2005-02-17',
                      '2005-02-19', '2005-02-20',
                      '2005-02-22', '2005-02-23', '2005-02-24',
                      '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
                     dtype='datetime64[D]')

i0max, i1max = 0, 0  # bounds of the longest run found so far
i0 = 0               # start of the current run
for i1, date in enumerate(date_list):
    # inside a run, date_list[i1] - date_list[i0] is exactly i1 - i0 days
    if date - date_list[i0] != np.timedelta64(i1 - i0, 'D'):
        if i1 - i0 > i1max - i0max:
            i0max, i1max = i0, i1
        i0 = i1
# the last run ends at the end of the array without a break, so check it too
if len(date_list) - i0 > i1max - i0max:
    i0max, i1max = i0, len(date_list)
print(date_list[i0max:i1max])
# output (7 dates): ['2005-02-22' '2005-02-23' '2005-02-24' '2005-02-25' '2005-02-26' '2005-02-27' '2005-02-28']
Here, i0 and i1 indicate the start and stop indices of the current sub-array of consecutive dates, and i0max and i1max the start and stop indices of the longest sub-array found so far. The solution uses the fact that the difference between the i-th and zeroth entry in a list of consecutive dates is exactly i days.
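As an alternative to the explicit loop, the same idea can be sketched without a Python-level loop using np.diff. Mark every place where the gap to the previous date is not exactly one day, turn those marks into run boundaries, and pick the longest run. The function name longest_consecutive_run is hypothetical, and it assumes a sorted datetime64[D] array like the one above. For the Series in the question, something like series.apply(longest_consecutive_run) should apply it per row.

```python
import numpy as np

def longest_consecutive_run(dates):
    """Longest run of consecutive days in a sorted datetime64[D] array (sketch)."""
    if dates.size == 0:
        return dates
    # True wherever a date is NOT exactly one day after its predecessor
    breaks = np.diff(dates) != np.timedelta64(1, "D")
    # Run boundaries: 0, the position after every break, and the array length
    bounds = np.concatenate(([0], np.flatnonzero(breaks) + 1, [dates.size]))
    lengths = np.diff(bounds)
    k = lengths.argmax()  # on ties, the first longest run wins
    return dates[bounds[k]:bounds[k + 1]]

sample = np.array(['2005-02-01', '2005-02-02', '2005-02-04',
                   '2005-02-05', '2005-02-06', '2005-02-08'], dtype='datetime64[D]')
print(longest_consecutive_run(sample))  # ['2005-02-04' '2005-02-05' '2005-02-06']
```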
You can convert the dates to ordinals (via toordinal()), which increase by exactly 1 between consecutive dates, i.e. next_date = previous_date + 1.
Then find the longest consecutive sub-array.
This takes O(n) time (a single loop), which is optimal, since every date has to be examined at least once.
CODE
from datetime import datetime

def get_consecutive(date_list):
    # convert to ordinals
    v = [datetime.strptime(d, "%Y-%m-%d").toordinal() for d in date_list]
    consecutive = []
    run = []
    dates = []
    # get consecutive ordinal sequence
    for i in range(1, len(v) + 1):
        run.append(v[i-1])
        dates.append(date_list[i-1])
        if i == len(v) or v[i-1] + 1 != v[i]:
            if len(consecutive) < len(run):
                consecutive = dates
            dates = []
            run = []
    return consecutive
EXAMPLE:
date_list = ['2019-01-29', '2019-01-30', '2019-01-31', '2019-02-05']
get_consecutive(date_list)
# ordinals will be -> v = [737088, 737089, 737090, 737095]
OUTPUT:
['2019-01-29', '2019-01-30', '2019-01-31']
Now use df.column.apply(get_consecutive) to get the longest list of consecutive dates for each row, or call the function on each list if you are using some other data structure. Note that get_consecutive expects date strings in YYYY-MM-DD format.
I'm going to reduce this problem to finding consecutive days in a single list. There are a few tricks that make it more Pythonic as you ask. The following script should run as-is. I've documented how it works inline:
from datetime import timedelta, date

# example input
days = [
    date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 4),
    date(2020, 1, 5), date(2020, 1, 6), date(2020, 1, 8),
]

# store the longest interval and the current consecutive interval
# as we iterate through a list
longest_interval_index = current_interval_index = 0
longest_interval_length = current_interval_length = 1

# using zip here to reduce the number of indexing operations
# this will turn the days list into [(2020-01-01, 2020-01-02), (2020-01-02, 2020-01-04), ...]
# use enumerate to get the index of the current day
for i, (previous_day, current_day) in enumerate(zip(days, days[1:]), start=1):
    if current_day - previous_day == timedelta(days=+1):
        # we've found a consecutive day! increase the interval length
        current_interval_length += 1
    else:
        # nope, not a consecutive day! start from this day and start
        # counting from 1
        current_interval_index = i
        current_interval_length = 1
    if current_interval_length > longest_interval_length:
        # we broke the record! record it as the longest interval
        longest_interval_index = current_interval_index
        longest_interval_length = current_interval_length

print("Longest interval index:", longest_interval_index)
print("Longest interval: ", days[longest_interval_index:longest_interval_index + longest_interval_length])
It should be easy enough to turn this into a reusable function.
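Taking up that suggestion, here is a sketch of the same loop as a reusable function. The name longest_run_of_days and the sample rows are hypothetical, and it assumes each row is a sorted list of datetime.date objects. With a pandas Series of such lists, series.apply(longest_run_of_days) should apply it per row.

```python
from datetime import date, timedelta

def longest_run_of_days(days):
    """Return the longest run of consecutive days from a sorted list of dates."""
    if len(days) == 0:
        return []
    best_start, best_len = 0, 1
    run_start, run_len = 0, 1
    for i, (prev, cur) in enumerate(zip(days, days[1:]), start=1):
        if cur - prev == timedelta(days=1):
            run_len += 1                 # extend the current run
        else:
            run_start, run_len = i, 1    # a gap: start a new run at this day
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return days[best_start:best_start + best_len]

# Hypothetical rows standing in for the per-id date lists in the question
rows = [
    [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 4), date(2020, 1, 5), date(2020, 1, 6)],
    [date(2019, 1, 29), date(2019, 1, 30), date(2019, 2, 1)],
]
results = [longest_run_of_days(r) for r in rows]
print(results)
```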

Is there a way in Python to start a variable at 0 and then increment by 1 while in a for loop?

Below is a section of my code. I am attempting to start with day at 0, then 1, 2, 3, 4, so there are 5 total days, just starting at 0. Is there an easy way to do this? At the moment I only get days 1, 2, 3, 4, 5, since I have day = day + 1, which never gives a day of 0. Sorry if this is a silly question, I am still relatively new to learning Python.
density = np.zeros((6, 91, 181))
day = 0
for i,e in df.iterrows():
    lat = int((e['Latitude']+90)/2)
    long = int(e['Longitude']/2)
    if lat == 0.0 and long == 0.0:
        day = day + 1
        print(day)
    density[day,lat,long] = e['rho']
range() produces an iterable object that starts at 0 and stops just before whatever value you specify. It is not an iterator itself, so wrap it in iter() before pulling values from it with next():
density = np.zeros((6, 91, 181))
days = iter(range(5))  # an iterator over 0, 1, 2, 3, 4 (range itself is not an iterator)
for i, e in df.iterrows():
    lat = int((e['Latitude']+90)/2)
    long = int(e['Longitude']/2)
    if lat == 0.0 and long == 0.0:
        day = next(days)  # take the next value from that iterator
        print(day)
    density[day, lat, long] = e['rho']
If you instead want an infinite sequence of numbers ascending from zero, you can write your own infinite number generator:
def inf_ints():
    i = 0
    while True:
        yield i
        i += 1
...
days = inf_ints()
...
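The standard library already provides this generator: itertools.count() yields 0, 1, 2, ... indefinitely, just like inf_ints(). The sketch below also shows why a bare range() cannot be passed to next(): it is iterable but not an iterator, so it needs iter() first.

```python
from itertools import count

days = count()  # 0, 1, 2, ... without end, like inf_ints()
first_three = [next(days) for _ in range(3)]
print(first_three)  # [0, 1, 2]

# range() is iterable but not an iterator: next(range(5)) raises TypeError.
# iter() turns it into an iterator that next() can consume.
five_days = iter(range(5))
print(next(five_days), next(five_days))  # 0 1
```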
Not really sure what you want to achieve, but wouldn't initialising day to -1 instead of 0 solve your problem?
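Here is a small simulation of that suggestion. A hypothetical list of booleans stands in for the rows of df.iterrows() where lat == 0 and long == 0. Starting at -1 means the first trigger produces day 0. One caveat: any rows before the first trigger would still have day == -1. With NumPy indexing, density[-1, lat, long] silently writes into the last day slice rather than raising an error.

```python
# Hypothetical stand-in for df.iterrows(): True marks a row where lat == 0 and long == 0
starts_new_day = [True, False, True, True, False, True, True]

day = -1                   # one below the first day we want
row_days = []
for new_day in starts_new_day:
    if new_day:
        day += 1           # the first trigger moves day from -1 to 0
    row_days.append(day)   # in the real loop: density[day, lat, long] = e['rho']

print(row_days)  # [0, 0, 1, 2, 2, 3, 4]
```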

Iteration through a list

I'm very new to Python hence this question.
I have a list that represents dates i.e. Mondays in March and beginning of April
[2, 9, 16, 23, 30, 6]
The list, 'color_sack' is created from a scrape of our local council website.
I'm using
next_rubbish_day = next(x for x in color_sack if x > todays_date.day)
todays_date.day returns just the number representing the day i.e. 30
This has worked well all month until today, the 30th, when it now raises an error:
next_rubbish_day = next(x for x in color_sack if x > todays_date.day)
StopIteration
Is there a better way to step through the list so that next_rubbish_day returns the 6 after the 30 in the list above?
I can see why it's not working but can't work out a better way.
When April starts, the list will be updated with the new dates for Mondays in April through to the beginning of May.
Consider the case where the current month is March, the corresponding list of dates is [2, 9, 16, 23, 30, 6], and today's date is the 30th. Basically, what we are doing is:
Checking whether any date in color_sack is greater than today's date; if so, we yield that date. In our case, no date in the list is greater than 30.
If the first condition fails, we find the index of the maximum date in color_sack. In our case the maximum date is 30 and its index is 4. We then check whether the current index is greater than the index of the maximum date; if it is, we yield that date (here, 6 at index 5).
This works for any dates in the current month, e.g. March. As soon as the new month starts, "the list will be updated with the new dates for Mondays in April through to the beginning of May".
So this algorithm will keep working.
Try this:
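color_sack = [2, 9, 16, 23, 30, 6]

def next_rubbish_day(color_sack, todays_date):
    # index of the largest day, i.e. the last collection in the current month
    last_idx = color_sack.index(max(color_sack))
    for idx, day in enumerate(color_sack):
        # a later day this month, or any entry after the month's last collection
        if day > todays_date or idx > last_idx:
            yield day

print(next(next_rubbish_day(color_sack, 6)))
print(next(next_rubbish_day(color_sack, 10)))
print(next(next_rubbish_day(color_sack, 21)))
print(next(next_rubbish_day(color_sack, 30)))
print(next(next_rubbish_day(color_sack, 31)))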
OUTPUT:
9
16
23
6
6
next takes an optional default that is returned when the iterator is exhausted. If color_sack consistently has the first-of-next-month day in the last position, use it as the default:
next_rubbish_day = next(
    (x for x in color_sack[:-1] if x > todays_date.day),
    color_sack[-1],
)
Note that this scheme will not tell you whether you rolled over. It will only tell you the next date is 6th, not 6th of April versus 6th of March.
To avoid the magic indices, consider splitting your list explicitly and giving proper names to each part.
*this_month, fallback_day = color_sack
next_rubbish_day = next(
    (day for day in this_month if day > todays_date.day),
    fallback_day,
)
If you need to be month-aware, handle the StopIteration explicitly:
try:
    day = next(x for x in color_sack[:-1] if x > todays_date.day)
except StopIteration:
    day = color_sack[-1]
    month = 'next'
else:
    month = 'this'
print(f'Next date is {day} of {month} month')
Thank you for the help. I've used MisterMiyagi's snippet, as that seems to work for now.
Here is the full code:
import datetime
import requests
import calendar
from bs4 import BeautifulSoup
from datetime import date

def ord(n):  # returns st, nd, rd and th (note: shadows the built-in ord())
    return str(n) + (
        "th" if 4 <= n % 100 <= 20 else {
            1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    )

# Scrapes rubbish collection dates
URL = "https://apps.castlepoint.gov.uk/cpapps/index.cfm?roadID=2767&fa=wastecalendar.displayDetails"
raw_html = requests.get(URL)
data = BeautifulSoup(raw_html.text, "html.parser")
pink = data.find_all('td', class_='pink', limit=3)
black = data.find_all('td', class_='normal', limit=3)
month = data.find('div', class_='calMonthCurrent')
# converts .text and strip [] to get month name
month = str((month.text).strip('[]'))
todays_date = datetime.date.today()
print()

# creates sack lists
pink_sack = []
for div in pink:
    n = div.text
    pink_sack.append(n)
pink_sack = list(map(int, pink_sack))
print(f"Pink list {pink_sack}")
black_sack = []
for div in black:
    n = div.text
    black_sack.append(n)
black_sack = list(map(int, black_sack))
print(f"Black list {black_sack}")

# creates pink/black list by interleaving the two lists
color_sack = [None]*(len(pink_sack)+len(black_sack))
color_sack[::2] = pink_sack
color_sack[1::2] = black_sack
print(f"Combined list {color_sack}")
print()
print()

# checks today for rubbish
if todays_date.day in color_sack:
    print(f"Today {(ord(todays_date.day))}", end=" ")
    if todays_date.day in pink_sack:
        print("is pink")
    elif todays_date.day in black_sack:
        print("is black")

# Looks for the next rubbish day
next_rubbish_day = next(
    (x for x in color_sack[:-1] if x > todays_date.day),
    color_sack[-1],
)
# gets day number
# (caveat: if the fallback day is in next month, this still uses the current month)
day = calendar.weekday(
    (todays_date.year), (todays_date.month), (next_rubbish_day))
# print(next_rubbish_day)
print(f"Next rubbish day is {(calendar.day_name[day])} the {(ord(next_rubbish_day))}" +
      (" and is Pink" if next_rubbish_day in pink_sack else " and is Black"))
print()
There are probably many more efficient ways of doing this, so I'm open to suggestions; I'm always learning.
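One suggestion follows from the caveat in the answer above that the snippet cannot tell whether it rolled over into the next month: return a full datetime.date instead of a bare day number. The weekday then comes from the date itself rather than from calendar.weekday() called with the current month. This is a sketch. next_collection_date is a hypothetical name, and it keeps the same assumption as MisterMiyagi's snippet: the last entry of color_sack is the first collection day of the next month.

```python
import calendar
from datetime import date

def next_collection_date(color_sack, today):
    """Next collection as a full date, rolling into next month if needed.

    Assumes the last entry of color_sack is the first collection day
    of the following month.
    """
    *this_month, fallback_day = color_sack
    for day in this_month:
        if day > today.day:
            return today.replace(day=day)
    # No later day this month: use the fallback in the next month (next year after December)
    if today.month == 12:
        return date(today.year + 1, 1, fallback_day)
    return date(today.year, today.month + 1, fallback_day)

nxt = next_collection_date([2, 9, 16, 23, 30, 6], date(2020, 3, 30))
print(nxt, calendar.day_name[nxt.weekday()])  # 2020-04-06 Monday
```

In the full script, nxt.day can still be checked against pink_sack and black_sack as before.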

Find value and index in pandas Series where the value increased 5 times

I want to go through a pandas Series and stop at the first value that is followed by 5 consecutive increases. With a simple example it works so far:
import pandas as pd

list2 = pd.Series([2,3,3,4,5,1,4,6,7,8,9,10,2,3,2,3,2,3,4])
def cut(x):
    y = iter(x)
    for i in y:
        if x[i] < x[i+1] < x[i+2] < x[i+3] < x[i+4] < x[i+5]:
            return x[i]
            break
out = cut(list2)
index = list2[list2 == out].index[0]
So I get the correct output of 1 and index of 5.
But if I use a second Series with shape (23999,) instead of (19,), I get the error:
pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 3489660928
You can do something like this:
# compare list2 with the previous values
s = list2.gt(list2.shift())
# looking at last 5 values
s = s.rolling(5).sum()
# select those equal 5
list2[s.eq(5)]
Output:
10 9
11 10
dtype: int64
The first index where this happens (i.e. where the fifth consecutive increase ends) is
s.eq(5).idxmax()
# output 10
Also, you can chain them together:
(list2.gt(list2.shift())
.rolling(5).sum()
.eq(5).idxmax()
)
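Two details matter when you need the start of the run, as in the question. First, the rolling window marks the position where the fifth increase ends, so the value that starts the run sits 5 positions earlier. Second, if no run exists, s.eq(5).idxmax() still returns the first label instead of failing, so check .any() first. (The KeyError in the question comes from for i in y iterating over the Series values rather than positions, so x[i] looks up a label equal to a value.) Here is a sketch that handles both; first_run_start is a hypothetical name.

```python
import pandas as pd

def first_run_start(s, n=5):
    """(label, value) of the first element followed by n strict increases, or None."""
    increased = s.gt(s.shift()).astype(int)   # 1 where a value exceeds its predecessor
    run_end = increased.rolling(n).sum().eq(n)  # True where n increases in a row end
    if not run_end.any():                     # idxmax would silently return the first label
        return None
    end_pos = run_end.to_numpy().argmax()     # position of the first run end
    start_pos = end_pos - n                   # the run starts n positions earlier
    return s.index[start_pos], s.iloc[start_pos]

list2 = pd.Series([2, 3, 3, 4, 5, 1, 4, 6, 7, 8, 9, 10, 2, 3, 2, 3, 2, 3, 4])
label, value = first_run_start(list2)
print(label, value)  # 5 1
```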

python tuple overwriting previous data

I am trying to create a function that will start the loop and add a day to the current day count. It will ask 3 questions, then combine that data into Total_Output. I then want 'n' to represent the end of the tuple, and in the next step add Total_Output to the end of the tuple. But when I run the function, it seems to create a new tuple each time.
Example:
Good Morninghi
This is Day: 1
How much weight did you use?40
How many reps did you do?20
How many sets did you do?6
Day: 1
[4800.0]
This is Day: 2
How much weight did you use?50
How many reps did you do?20
How many sets did you do?6
Day: 2
[6000.0, 6000.0]
This is Day: 3
How much weight did you use?40
How many reps did you do?20
How many sets did you do?6
Day: 3
[4800.0, 4800.0, 4800.0]
failed
Here is the function:
def Start_Work(x):
    Num_Days = 0
    Total_Output = 0
    Wght = 0
    Reps = 0
    Sets = 0
    Day = []
    while x == 1 and Num_Days < 6: ##will be doing in cycles of 6 days
        Num_Days += 1 ##increase day count with each loop
        print "This is Day:",Num_Days
        Wght = float(raw_input("How much weight did you use?"))
        Reps = float(raw_input("How many reps did you do?"))
        Sets = float(raw_input("How many sets did you do?"))
        Total_Output = Wght * Reps * Sets
        n = Day[:-1] ##go to end of tuple
        Day = [Total_Output for n in range(Num_Days)] ##add data (Total_Output to end of tuple
        print "Day:",Num_Days
        print Day
    else:
        print "failed"

Input = raw_input("Good Morning")
if Input.lower() == str('hi') or str('start') or str('good morning'):
    Start_Work(1)
else:
    print "Good Bye"
n = Day[:-1] ##go to end of tuple
Day = [Total_Output for n in range(Num_Days)] ##add data (Total_Output to end of tuple
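does not do what you think it does. You assign n but never use it: the n in the list comprehension is just the loop variable of for n in range(Num_Days), unrelated to the value taken from Day[:-1]. Day[:-1] is also not "the end" of the list; it is a copy of Day without its last element.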
You then set Day to be [Total_Output] * Num_Days, so you make a new list of Num_Days occurrences of Total_Output.
You want:
Day.append(Total_Output)
to replace both of those lines.
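For reference, here is a Python 3 sketch of the whole function with that fix applied (the question uses Python 2's print statement and raw_input). It also fixes the greeting check: Input.lower() == str('hi') or str('start') or str('good morning') is always true, because a non-empty string such as 'start' is truthy on its own, so a membership test is what was intended. The names start_work and wants_to_start are hypothetical, and the read parameter exists only so the function can be driven without a keyboard.

```python
def wants_to_start(greeting):
    # `a == 'hi' or 'start'` is always truthy; test membership instead
    return greeting.strip().lower() in ("hi", "start", "good morning")

def start_work(read=input, num_days=6):
    """Ask for each day's workout and return the list of daily totals."""
    day_totals = []
    for day in range(1, num_days + 1):
        print("This is Day:", day)
        weight = float(read("How much weight did you use? "))
        reps = float(read("How many reps did you do? "))
        sets = float(read("How many sets did you do? "))
        day_totals.append(weight * reps * sets)  # extend the list instead of rebuilding it
        print("Day:", day)
        print(day_totals)
    return day_totals
```

Interactive use would then be `if wants_to_start(input("Good Morning ")): start_work()`, with `else: print("Good Bye")`.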
