Minimum window of days required to travel all cities - python

This is an interesting question that I came across in a coding challenge:
There are k cities and n days. A travel agent is going to show you one city on each day. You're supposed to find the minimum number of consecutive days in which you can visit all cities. You're allowed to visit a city more than once, but ideally you wouldn't want to, since you want to minimize the number of days.
Input: You're given an array where the indices are days and the values are cities.
A=[7,4,7,3,4,1,7] So A[0]=7 means the travel agent will show you city 7 on day 0, city 4 on day 1, etc.
So here, if you start out on day 0, you'll have visited all cities by day 5, but you can also start on day 2 and finish on day 5.
Output: 4, because it took you 4 days to visit all the cities at least once.
My solution: I have an O(N^2) solution that tries out every possible window of days. But the test said that the ideal time and space complexity should be O(N). How do I do this?
import sys
def findmin(A):
    hashtable1 = {}
    locationcount = 0
    # get the number of unique locations
    for x in A:
        if x not in hashtable1:
            hashtable1[x] = 1
            locationcount += 1
    index1 = 0
    daycount = sys.maxsize
    hashtable2 = {}
    # brute force: try every starting day index1
    while index1 < len(A):
        index2 = index1
        count1 = 0
        while index2 < len(A):
            if A[index2] not in hashtable2:
                count1 += 1
                hashtable2[A[index2]] = 1
            if count1 == locationcount:
                # days index1..index2 cover every city
                daycount = min(index2 - index1 + 1, daycount)
                break
            index2 += 1
        hashtable2.clear()
        index1 += 1
    return daycount

This problem can be solved with a two-pointer approach.
Some data structure should hold the count of each element in the current window. Your hash table is suitable.
Set the left and right pointers to the start of the list.
Move the right pointer, incrementing table entries for the elements it passes, like this:
hashtable2[A[rightindex]] = hashtable2.get(A[rightindex], 0) + 1
When all (locationcount) table entries become non-zero, stop moving the right pointer. You now have a left-right interval covering all cities. Remember the interval length.
Now move the left pointer, decrementing table entries and remembering the shorter interval length for as long as every entry stays non-zero. When some table entry becomes zero, stop moving the left pointer.
Move the right pointer again. Repeat until the end of the list.
Note that each index runs over the list only once, so the complexity is linear (if a table entry update is O(1), as a hash map provides on average).
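The steps above can be sketched in Python roughly as follows. This is only a minimal sketch: the function name min_window_all_cities and the covered counter are my own choices for illustration, not something from the question.

```python
from collections import defaultdict

def min_window_all_cities(A):
    """Length of the shortest run of consecutive days that shows every city in A."""
    needed = len(set(A))            # number of distinct cities
    counts = defaultdict(int)       # city -> occurrences inside A[left..right]
    covered = 0                     # distinct cities currently inside the window
    best = len(A)
    left = 0
    for right, city in enumerate(A):
        counts[city] += 1
        if counts[city] == 1:
            covered += 1
        # Shrink from the left while the window still covers every city
        while covered == needed and left <= right:
            best = min(best, right - left + 1)
            counts[A[left]] -= 1
            if counts[A[left]] == 0:
                covered -= 1
            left += 1
    return best

print(min_window_all_cities([7, 4, 7, 3, 4, 1, 7]))  # 4
```

Keeping a separate covered counter avoids scanning the whole table to check whether every entry is non-zero, so each step stays O(1) on average.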

I had this problem in an interview and failed because I thought of a moving window too late. I tried it again a few days later, and here is my C# solution, which I think is O(n) (the array is traversed at most twice).
The remaining difficulty after that insight was working out how to update the end pointer. There's probably a better solution; mine always reports the latest possible starting and ending days, even if the vacation could be started earlier.
public int solution(int[] A) {
if (A.Length is 0 or 1) {
return A.Length;
}
var startingIndex = 0;
var endingIndex = 0;
var locationVisitedCounter = new int[A.Length];
locationVisitedCounter[A[0] - 1] = 1;
for (var i=1; i<A.Length; i++)
{
var locationIndex = A[i] - 1;
locationVisitedCounter[locationIndex]++;
if (A[i] == A[i - 1])
{
continue;
}
endingIndex=i;
while (locationVisitedCounter[A[startingIndex] - 1] > 1)
{
locationVisitedCounter[A[startingIndex] - 1]--;
startingIndex++;
}
}
return endingIndex - startingIndex + 1;
}

I solved it using a two-pointer approach: pointer i moves the window forward, and pointer j shrinks it from the left towards the optimal solution.
Time Complexity: O(2*N), i.e. O(N)
def solution(A):
    n = len(A)
    hashSet = dict()             # city -> occurrences inside the window A[j..i]
    max_count = len(set(A))      # number of distinct cities
    j = 0
    result = float("inf")
    for i in range(n):
        hashSet[A[i]] = hashSet.get(A[i], 0) + 1
        # shrink from the left while the window still covers every city
        while len(hashSet) == max_count:
            result = min(result, i - j + 1)
            hashSet[A[j]] -= 1
            if hashSet[A[j]] == 0:
                del hashSet[A[j]]
            j += 1
    return result

Python solution
def vacation(A):
    # Get all unique vacation locations
    v_set = set(A)
    a_l = len(A)
    day_count = 0
    # Maximum days to cover all locations will be the length of the array
    max_day_count = a_l
    for i in range(a_l):
        count = 0
        v_set_copy = v_set.copy()
        # Starting point to find next number of days
        # that covers all unique locations
        for j in range(i, a_l):
            # Remove from set, if the location exists,
            # meaning we have visited the location
            if A[j] in v_set_copy:
                v_set_copy.remove(A[j])
            count = count + 1
            # If we have visited all locations,
            # determine the current minimum days needed to visit all and break
            if len(v_set_copy) == 0:
                day_count = min(count, max_day_count)
                max_day_count = day_count
                break
    return day_count

Starting from L = 0, move right until all distinct locations are visited; call that position R.
Maintain a map of element to frequency for the window from L to R inclusive.
Until R == n - 1 or R - L + 1 == distinct element count:
Increase L until the window becomes invalid, i.e. some element's frequency in the map drops to 0, recording the shortest valid window seen so far.
Increase R by 1 and update the map.

For reference, this question is closely related to LeetCode question 76, Minimum Window Substring, which NeetCode covers in a tutorial. My solution in Python follows the same tutorial.
def solution(A):
    if not A: return
    locations = dict()
    for location in A:
        locations[location] = 0
    res,resLen = [-1,-1],float("infinity")
    # left_pointer, right_pointer
    lp,rp = 0,0
    for rp in range(len(A)):
        locations[A[rp]] = locations.get(A[rp],0) + 1
        while (0 not in locations.values()):
            if(rp - lp + 1) < resLen:
                res = [lp,rp]
                resLen = (rp-lp + 1)
            locations[A[lp]] -= 1
            lp += 1
    lp,rp = res
    return len(A[lp:rp+1]) if resLen != float("infinity") else 0
A = [7,4,7,3,4,1,7]
# A= [2,1,1,3,2,1,1,3]
# A = [7,3,2,3,1,2,1,7,7,1]
print(solution(A=A))

Although others have posted their answers, I think my solution is a little simpler and neater. I hope it helps.
from collections import defaultdict
from typing import List
# it is a sliding window problem
def min_days_to_visit_all_cities(arr: List[int]):
    no_of_places = len(set(arr))
    l, r = 0, 0
    place_to_count = defaultdict(int)
    res = len(arr)
    while r < len(arr):
        while r < len(arr) and len(place_to_count) < no_of_places:
            place_to_count[arr[r]] += 1
            r += 1
        while len(place_to_count) >= no_of_places:
            res = min(res, r - l)
            place_to_count[arr[l]] -= 1
            if place_to_count[arr[l]] == 0:
                del place_to_count[arr[l]]
            l += 1
    return res

Related

leetcode two sum problem algorithm efficiency

Problem: Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
You can return the answer in any order.
Input: nums = [2,7,11,15],
target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Here's my code:
def twoSum(nums, target):
    cnt = 0
    i = 0
    while cnt < len(nums):
        temp = 0
        if i == cnt:
            i += 1
        else:
            temp = nums[cnt] + nums[i]
            if temp == target and i < cnt:
                return [i,cnt]
            i += 1
        if i == len(nums)-1:
            i = 0
            cnt += 1
The code works for 55/57 test cases, but it fails on really big inputs. I don't understand why, because I have used only one loop, so the time complexity should be O(N), which is efficient enough to run in the given time. What am I missing? And what can I do to make the algorithm more efficient?
You can make a dictionary of the last position of the complement value of each number. Then use it to find the position of the value for which the complement exists in the list (at a greater index in case you have a value that is half the target):
nums = [2,7,11,15]
target = 9
pos = {target-n:i for i,n in enumerate(nums)}
sol = next([i,pos[n]] for i,n in enumerate(nums) if i<pos.get(n,i))
print(sol)
[0, 1]
This works in O(n) time and space
If we're not concerned about space complexity:
def search(values, target):
    hashmap = {}  # value -> index where it was seen
    for i in range(len(values)):
        current = values[i]
        if target - current in hashmap:
            # return the two indices, earlier one first
            return [hashmap[target - current], i]
        hashmap[current] = i
    return None
Your code isn't really O(n), it's actually O(n^2) in disguise.
You go through i O(n) times for each cnt (and then reset i back to 0), and go through cnt O(n) times.
For a more efficient algorithm, sites like this one (https://www.educative.io/edpresso/how-to-implement-the-two-sum-problem-in-python) have it down pretty well.
This solution is O(n log n) because of the sort, which is still better than O(n^2). p1 and p2 act as two pointers into the sorted order:
def twoSum(nums, target):
    # sort the indices by value so the original positions are not lost
    # (nums.index would return the same index twice for equal values)
    order = sorted(range(len(nums)), key=lambda k: nums[k])
    p1 = 0
    p2 = len(order) - 1
    while p1 < p2:
        s = nums[order[p1]] + nums[order[p2]]
        if s == target:
            return order[p1], order[p2]
        if s < target:
            p1 += 1
        else:
            p2 -= 1
    return None

How to count the number of unique numbers in sorted array using Binary Search?

I am trying to count the number of unique numbers in a sorted array using binary search. I need to get the edge of the change from one number to the next to count. I was thinking of doing this without using recursion. Is there an iterative approach?
def unique(x):
start = 0
end = len(x)-1
count =0
# This is the current number we are looking for
item = x[start]
while start <= end:
middle = (start + end)//2
if item == x[middle]:
start = middle+1
elif item < x[middle]:
end = middle -1
#when item item greater, change to next number
count+=1
# if the number
return count
unique([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5])
Thank you.
Edit: Even if the runtime benefit over O(n) is negligible, what is my binary search missing? It's confusing when I'm not looking for an actual item. How can I fix this?
Working code exploiting binary search (returns 3 for the given example).
As discussed in the comments, the complexity is about O(k*log(n)), where k is the number of unique items, so this approach works well when k is small compared with n and might become worse than a linear scan when k ~ n.
def countuniquebs(A):
    n = len(A)
    count = 0
    l = 0
    while l < n:
        t = A[l]  # first element of the next run of equal values
        count += 1
        # binary search in A[l+1..n-1] for the first index whose value exceeds t
        l += 1
        r = n
        while l < r:
            m = (r + l) // 2
            if A[m] > t:
                r = m
            else:
                l = m + 1
    return count
print(countuniquebs([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))
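The same repeated binary search can also be written with the standard library's bisect module instead of a hand-rolled loop. A minimal sketch, assuming the list is sorted in ascending order (the name count_unique_sorted is just for illustration):

```python
from bisect import bisect_right

def count_unique_sorted(A):
    """Count distinct values in an ascending list by jumping over each run of equal values."""
    count = 0
    i = 0
    while i < len(A):
        count += 1
        # first index past the run of values equal to A[i], searching from i onwards
        i = bisect_right(A, A[i], i)
    return count

print(count_unique_sorted([1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,5,5,5,5,5,5,5,5,5,5]))  # 3
```

Each jump costs O(log n), so the total is still about O(k*log(n)) for k unique items.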
I wouldn't quite call it "using a binary search", but this binary divide-and-conquer algorithm works in O(k*log(n)/log(k)) time, which is better than a repeated binary search, and never worse than a linear scan:
def countUniques(A, start, end):
    len = end-start
    if len < 1:
        return 0
    if A[start] == A[end-1]:
        return 1
    if len < 3:
        return 2
    mid = start + len//2
    return countUniques(A, start, mid+1) + countUniques(A, mid, end) - 1
A = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,3,4,5,5,5,5,5,5,5,5,5,5]
print(countUniques(A,0,len(A)))

DP solution to find the maximum length of a contiguous subarray with equal number of 0 and 1

The question is from here https://leetcode.com/problems/contiguous-array/
I came up with a DP solution for this question.
However, it fails one test case.
Any thoughts?
DP[i][j] == 1 means the subarray from index i to index j is valid.
Divide the problem into smaller ones:
DP[i][j]==1
- if DP[i+2][j]==1 and DP[i][i+1]==1
- else if DP[i][j-2]==1 and DP[j-1][j]==1
- else if num[i],num[j] == set([0,1]) and DP[i+1][j-1]==1
```
def findMaxLength(nums):
    current_max_len = 0
    if not nums:
        return current_max_len
    dp = []
    for _ in range(len(nums)):
        dp.append([None] * len(nums))
    for thisLen in range(2, len(nums)+1, 2):
        for i in range(len(nums)):
            last_index = i + thisLen - 1
            if i + thisLen > len(nums):
                continue
            if thisLen == 2:
                if set(nums[i:i+2]) == set([0, 1]):
                    dp[i][last_index] = 1
            elif dp[i][last_index-2] and dp[last_index-1][last_index]:
                dp[i][last_index] = 1
            elif dp[i][i + 1] and dp[i + 2][last_index]:
                dp[i][last_index] = 1
            elif dp[i + 1][last_index-1] and set([nums[i], nums[last_index]]) == set([0, 1]):
                dp[i][last_index] = 1
            else:
                dp[i][last_index] = 0
            if dp[i][last_index] == 1:
                current_max_len = max(current_max_len, thisLen)
    return current_max_len
```
Here is a counterexample: [1, 1, 0, 0, 0, 0, 1, 1]. The problem with your solution is that it requires a valid list to be built from a smaller valid list of size n-2 (plus a valid pair at either end, or a 0 and a 1 wrapped around it), whereas this counterexample only splits into two valid lists of length 4. -- SPOILER ALERT -- You can solve the problem with another DP technique: for every i, j you can find the number of ones and zeroes between them in constant time. To do that, just store the number of ones from the start of the list up to every index i.
Here is the Python code:
def func(nums):
    track, has = 0, {0: -1}
    length = len(nums)
    ress_max = 0
    for i in range(0, length):
        track += (1 if nums[i] == 1 else -1)
        if track not in has:
            has[track] = i
        elif ress_max < i - has[track]:
            ress_max = i - has[track]
    return ress_max
l = list(map(int, input().strip().split()))
print(func(l))
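The constant-time counting mentioned in the spoiler (store the number of ones from the start of the list up to every index) can be sketched like this. The helper names ones_prefix and is_balanced are my own, purely for illustration.

```python
def ones_prefix(nums):
    """prefix[i] = number of ones in nums[:i]."""
    prefix = [0]
    for x in nums:
        prefix.append(prefix[-1] + x)
    return prefix

def is_balanced(prefix, i, j):
    """True if nums[i..j] (inclusive) holds as many ones as zeroes."""
    ones = prefix[j + 1] - prefix[i]
    return 2 * ones == j - i + 1

nums = [1, 1, 0, 0, 0, 0, 1, 1]
prefix = ones_prefix(nums)
print(is_balanced(prefix, 0, 3))  # True: [1, 1, 0, 0]
print(is_balanced(prefix, 0, 4))  # False: [1, 1, 0, 0, 0]
```

Each check is O(1), but testing every pair (i, j) this way is still O(n^2) overall; the running-balance dictionary in func is what brings the whole thing down to O(n).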
The length of the binary array may be as large as 50000, so an O(n * n) algorithm may exceed the time limit. I suggest solving it in O(n) time and space. The idea is:
If we take any valid contiguous subarray and sum its numbers, treating 0 as -1, the total is always zero.
If we keep track of the prefix sums, the sum over the range L to R is zero exactly when the prefix sum up to L - 1 equals the prefix sum up to R.
Since we are looking for the maximum length, we store only the first index at which each prefix sum occurs; that entry then stays in the hash map unchanged.
Every time we compute a cumulative sum, we check whether it has occurred before. If it has, we compute the length and try to maximize it; otherwise this is its first occurrence, and it goes into the hash map with the current index as its value.
Note: to handle subarrays that start at index 0, we must treat sum 0 as already present in the map, paired with index -1.
The sample code in C++ is as follows:
int findMaxLength(vector<int>& nums) {
unordered_map<int,int>lastIndex;
lastIndex[0] = -1;
int cumulativeSum = 0;
int maxLen = 0;
for (int i = 0; i < nums.size(); ++i) {
cumulativeSum += (nums[i] == 0 ? -1 : 1);
if (lastIndex.find(cumulativeSum) != lastIndex.end()) {
maxLen = max(maxLen, i - lastIndex[cumulativeSum]);
} else {
lastIndex[cumulativeSum] = i;
}
}
return maxLen;
}

Speeding up Python code that has to go through entire list

I have a problem where I need to (pretty sure at least) go through the entire list to solve. The question is to figure out the largest number of consecutive numbers in a list that add up to another (greater) element in that list. If there aren't any then we just take the largest value in the list as the candidate summation and 1 as the largest consecutive number of elements.
My general code works, but not too well for large lists (>500,000 elements). I am just looking for tips as to how I could approach the problem differently. My current approach:
L = [1,2,3,4,5,6,7,8,9,10]
candidate_sum = L[-1]
largest_count = 1
N = len(L)
i = 0
while i < N - 1:
    s = L[i]
    j = 0
    while s <= (N - L[i + j + 1]):
        j += 1
        s += L[i+j]
        if s in L and (j+1) > largest_count:
            largest_count = j+1
            candidate_sum = s
    i+=1
In this case, the answer would be [1,2,3,4] as they add up to 10 and the length is 4 (obviously this example L is a very simple example).
I then made it faster by changing the initial while loop condition to:
while i < (N-1)/largest_count
Not a great assumption, but basic thinking that the distribution of numbers is somewhat uniform, so two numbers on the second half of the list are on average bigger than the final number in the list, and therefore are disqualified.
I'm just looking for:
possible bottlenecks
suggestions as to different approaches to try
Strictly ascending: no duplication of elements or subsequences, single possible solution
Arbitrary-spaced: no arithmetical shortcuts, has to operate brute-force
Efficient C implementation using pointer arithmetic, quasi polymorphic over numeric types:
#define TYPE int
int max_subsum(TYPE arr [], int size) {
int max_length = 1;
TYPE arr_fst = * arr;
TYPE* num_ptr = arr;
while (size --) {
TYPE num = * num_ptr++;
TYPE* lower = arr;
TYPE* upper = arr;
TYPE sum = arr_fst;
int length = 1;
for (;;) {
if (sum > num) {
sum -= * lower++;
-- length;
}
else if (sum < num) {
sum += * ++upper;
++ length;
}
else {
if (length > max_length) {
max_length = length;
}
break;
}
}
}
return max_length;
}
The main loop over the nums is parallelizable. Relatively straight-forward translation into Python 3 using the dynamic-array list type for arr and the for each loop:
def max_subsum(arr):
    max_len = 1
    arr_fst = arr[0]
    for n in arr:
        lower = 0
        upper = 0
        sum = arr_fst
        while True:
            if sum > n:
                sum -= arr[lower]
                lower += 1
            elif sum < n:
                upper += 1
                sum += arr[upper]
            else:
                sum_len = upper - lower + 1
                if sum_len > max_len:
                    max_len = sum_len
                break
    return max_len
This max_subsum is a partial function; Python lists can be empty. The algorithm is appropriate for C-like compiled imperative languages offering fast indexing and statically typed arithmetic. Both are comparatively expensive in Python. A (totally defined) algorithm rather similar to yours, using the set data type for more performant universal quantification, and avoiding Python's dynamically typed arithmetic, can be more efficiently interpreted:
def max_subsum(arr):
    size = len(arr)
    max_len = 0
    arr_set = set(arr)
    for i in range(size):
        total = 0
        for j in range(i, size):
            total += arr[j]
            # arr is ascending and assumed positive, so once the running sum
            # passes the largest element no longer run starting at i can match
            if total > arr[-1]:
                break
            if total in arr_set and j - i + 1 > max_len:
                max_len = j - i + 1
    return max_len
I'm going to ignore the possibility of a changing target value, and let you figure that out, but to answer your question "is there a faster way to do it?" Yes: by using cumulative sums and some math to eliminate one of your loops.
import numpy as np
L = np.random.randint(0,100,100)
L.sort()
cum_sum = np.cumsum(L)
start = 0
end = 1  # the window is L[start:end]; begin with one element
target = 200
while True:
    # sum of L[start:end] from the cumulative sums
    total = cum_sum[end-1] - (cum_sum[start-1] if start else 0)
    if total == target:
        break
    elif total < target:
        end += 1
    elif total > target:
        start += 1
    if end > len(L):
        raise ValueError('no consecutive run sums to target')

Project Euler 14 code efficiency

l = [[i, i, 1] for i in range(1,1000000)]
def collatz(li):
    for el in li:
        if el[1] == 1:
            li.remove(el)
        elif el[1] % 2 == 0:
            el[1] = el[1] / 2
            el[2] += 1
        elif el[1] % 2 == 1:
            el[1] = 3*el[1] + 1
            el[2] += 1
    return li
while len(collatz(l)) >= 2:
    l = collatz(l)
print l
Hi, this is a (partial) solution to Euler problem 14, written in Python.
Longest Collatz sequence
Problem 14
The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
I wrote partial because it does not really output the solution since I can't really run it in the whole 1 - 1000000 range. It's way too slow - taking more than 20 minutes the last time I killed the process. I have barely just started with python and programming in general (about 2 weeks) and I am looking to understand what's the obvious mistake I am making in terms of efficiency. I googled some solutions and even the average ones are orders of magnitude faster than mine. So what am I missing here? Any pointers to literature to avoid making the same mistakes in the future?
A small improvement upon Sara's answer:
import time
start = time.time()
def collatz(n):
    k = n
    length = 1
    nList = []
    nList.append(n)
    while n != 1:
        if n not in dic:
            n = collatzRule(n)
            nList.append(n)
            length += 1
        else:
            # we don't need the values but we do need the real length for the for-loop
            nList.extend([None for _ in range(dic[n] - 1)])
            length = (length - 1) + dic[n]
            break
    for seq in nList:
        if seq not in dic:
            dic[seq] = len(nList) - nList.index(seq)
    return length
def collatzRule(n):
    if n % 2 == 0:
        return n // 2
    else:
        return 3 * n + 1
longestLen = 0
longestNum = 0
dic = {}
for n in range(2, 1000001):
    prsntLen = collatz(n)
    if prsntLen > longestLen:
        longestLen = prsntLen
        longestNum = n
        # print(f'{n}: {prsntLen}')
print(f'The starting num is: {longestNum} with the longest chain having: {longestLen} terms.')
print(f'time taken: {time.time() - start}')
Sara's answer is great, but it can be more efficient.
If the value we return from the function is len(seq), why not just count the iterations instead of constructing a list first?
I have changed the code slightly, and the performance improvement is significant:
def collatz(x):
    count = 1
    temp = x
    while temp > 1:
        if temp % 2 == 0:
            temp = int(temp/2)
            if temp in has2:  # calculate temp and check if in cache
                count += has2[temp]
                break
            else:
                count += 1
        else:
            temp = 3*temp + 1
            if temp in has2:
                count += has2[temp]
                break
            else:
                count += 1
    has2[x] = count
    return count
837799 has 525 elements. calculation time =1.97099995613 seconds.
Compared to the original version
837799 has 525 elements. calculation time =11.3389999866 seconds.
Caching an int count per number rather than building the whole list is ~80% faster.
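For completeness, here is a self-contained sketch of the same idea (cache only the chain length of each number) together with a driver. The function name longest_collatz and the path list are my own choices, not taken from Sara's code, and I have not timed it against the versions above.

```python
def longest_collatz(limit):
    """Return (start, length) of the longest Collatz chain for starting numbers below limit."""
    cache = {1: 1}                  # number -> chain length, counting the number itself
    best_start, best_len = 1, 1
    for start in range(2, limit):
        n = start
        path = []                   # numbers on this chain whose length is not yet known
        while n not in cache:
            path.append(n)
            n = n // 2 if n % 2 == 0 else 3 * n + 1
        length = cache[n]
        # walk back so every number on the path gets its length cached
        for m in reversed(path):
            length += 1
            cache[m] = length
        if cache[start] > best_len:
            best_start, best_len = start, cache[start]
    return best_start, best_len

print(longest_collatz(14))  # (9, 20)
```

Calling longest_collatz(1000000) should reproduce the 837799 / 525 result quoted above; note that this sketch also caches the intermediate values above one million, so the dictionary grows larger than in the versions that only cache the starting numbers.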
The problem is that you use a brute-force algorithm, which is inefficient. This is my solution to problem 14 from Project Euler. It takes a few seconds to run. The key is to save previous results in a dictionary so you don't have to compute them again:
#problem 14 project euler
import time
start=time.time()
has2={}
def collatz(x):
    seq=[]
    seq.append(x)
    temp=x
    while(temp>1):
        if temp%2==0:
            temp=int(temp/2)
            if temp in has2:
                seq+=has2[temp]
                break
            else:
                seq.append(temp)
        else:
            temp=3*temp+1
            if temp in has2:
                seq+=has2[temp]
                break
            else:
                seq.append(temp)
    has2[x]=seq
    return len(seq)
num=0
greatest=0
for i in range(1000000):
    c=collatz(i)
    if num<c:
        num=c
        greatest=i
print('{0} has {1} elements. calculation time ={2} seconds.'.format(greatest,num,time.time()-start))
As @Sara says, you could use a dictionary to save previous results and look them up to make the program run faster. But I don't quite understand your results; taking more than 20 minutes sounds like you have some other problem.
Using brute force, my code runs in about 16 seconds.
#!/bin/python3
########################
# Collatz Conjecture #
# Written by jeb 2015 #
########################
import time
current = 0
high = 0
# While number is not one, either divide it by 2
# or multiply with 3 and add one
# Returns number of iterations
def NonRecursiveCollatz(i):
    counter = 1
    while i != 1:
        counter = counter + 1
        if i%2 == 0:
            i = i // 2  # integer division keeps i an int in Python 3
        else:
            i = 3*i + 1
    return counter
time_start = time.time()
# Test all numbers between 1 and 1.000.000
# If number returned is higher than last one, store it and remember
# what number we used as input to the function
for i in range(1,1000000):
    current = NonRecursiveCollatz(i)
    if current > high:
        high = current
        number = i
elapsed_time = time.time() - time_start
print("Highest chain")
print(high)
print("From number ")
print(number)
print("Time taken ")
print(elapsed_time)
With the output:
Highest chain
525
From number
837799
Time taken
16.730340004
//Longest Collatz Sequence
public class Problem14 {
static long getLength(long numb) {
long length = 0;
for(long i=numb; i>=1;) {
length++;
if(i==1)
break;
if(i%2==0)
i = i/2;
else
i = (3*i)+1;
}
return length;
}
static void solution(long numb) {
long number = numb;
long maxLength = getLength(number);
for(long i=numb; i>=1; i--) {
if(getLength(i)>=maxLength) {
maxLength = getLength(i);
number = i;
}
}
System.out.println("Length of "+number+" is : "+maxLength);
}
public static void main(String args[]) {
long begin = System.currentTimeMillis();
solution(1000000);
long end = System.currentTimeMillis();
System.out.println("Time : "+(end-begin));
}
}
output :
Length of 837799 is : 525
Time : 502
