np.random.randint: ValueError: low >= high - python

Getting a high/low error on this. It works with another 1D dataframe and dtype, but not on this one. I'm trying to create a list of means taken randomly from a 1D list. Thanks!
new = []
for x in df:
    sb_ = np.random.randint(x, size=100).mean()
    new.append(sb_)

See the numpy.random.randint documentation. If only one argument is given, as in your case, it is used as the exclusive upper end of the range and the lower end is taken as 0, so randint(x) draws from [0, x). If any x in df is 0 or negative (call it x'), this code tries to draw a random integer from [0, x'), which is an empty range (no possible values). Hence your low >= high error.
Based on your code and description it's a bit unclear exactly what you're trying to accomplish but with a bit more detail I can probably help you work out the correct code.
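To make the failure concrete, here is a minimal sketch. The first part reproduces the error with a non-positive argument. The second part is only a guess at the intent (means of random samples drawn from the data itself); sample_means and its parameters are hypothetical names, not something from the question.

```python
import numpy as np

# With a single argument, randint(x) draws from [0, x), so x must be at least 1.
print(np.random.randint(5, size=3))  # three integers, each between 0 and 4

for bad in (0, -3):
    try:
        np.random.randint(bad, size=3)
    except ValueError as err:
        print(bad, "->", err)  # empty range, so NumPy raises ValueError


def sample_means(values, n_means=50, sample_size=100, seed=None):
    """Hypothetical helper: means of random samples drawn (with replacement) from values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return [rng.choice(values, size=sample_size).mean() for _ in range(n_means)]


means = sample_means([-2.0, 0.0, 3.5], n_means=10, seed=0)
print(len(means))
```

Sampling from the values with choice works no matter what sign the data has, because the values are never used as range bounds.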

Related

Using variable names for 2d matrix elements for readability

While solving Leetcode problems I've been trying to make my answers as easily intelligible as possible, so I can quickly glance at them later and make sense of them. Toward that end I assigned variable names to indices of interest in a 2D list. When I see "matrix[i][j+1]" and variations thereof repeatedly, I sometimes lose track of what I'm dealing with.
So, for this problem: https://leetcode.com/problems/maximal-square/
I wrote this code:
class Solution:
    def maximalSquare(self, matrix: List[List[str]]) -> int:
        maximum = 0
        for y in range(len(matrix)):
            for x in range(len(matrix[0])):
                #convert to integer from string
                matrix[y][x] = int(matrix[y][x])
                #use variable for readability
                current = matrix[y][x]
                #build largest square counts by checking neighbors above and to left
                #so, skip anything in first row or first column
                if y!=0 and x!=0 and current==1:
                    #assign variables for readability. We're checking adjacent squares
                    left = matrix[y][x-1]
                    up = matrix[y-1][x]
                    upleft = matrix[y-1][x-1]
                    #have to use matrix directly to set new value
                    matrix[y][x] = current = 1 + min(left, up, upleft)
                #reevaluate maximum
                if current > maximum:
                    maximum = current
        #return maximum squared, since we're looking for largest area of square, not largest side
        return maximum**2
I don't think I've seen people do this before and I'm wondering if it's a bad idea, since I'm sort of maintaining two versions of a value.
Apologies if this is a "coding style" question and therefore just a matter of opinion, but I thought there might be a clear answer that I just haven't found yet.
It is very hard to give a straightforward answer, because it might vary from person to person. Let me start from your queries:
When I see "matrix[i][j+1]" and variations thereof repeatedly, I sometimes lose track of what I'm dealing with.
It depends. Anyone with moderate programming experience should not be confused by a 2-D matrix indexed as matrix[x-pos][y-pos]. That said, if you don't feel comfortable with it, you can use the approach you have shared here. Just try to get familiar with the common indexing form at the same time.
I don't think I've seen people do this before and I'm wondering if it's a bad idea, since I'm sort of maintaining two versions of a value.
It is not a bad idea at all. It is okay as long as you are doing it for your own comfort. If you plan to share your code with others, though, giving extra names to something that is already obvious may not be a good idea, because it can make the code harder for them to follow. You should not worry about maintaining two versions of a value, as long as the extra memory is constant.
Apologies if this is a "coding style" question and therefore just a matter of opinion, but I thought there might be a clear answer that I just haven't found yet.
You are absolutely fine asking this question. As you mentioned, it is really just a matter of opinion. You can follow a standard guideline such as the Google Python Style Guide; following some standard is always recommended for coding style matters like this. Keep in mind that good code is largely self-documenting, and unnecessary comments can make it tedious to read.
Here I have shared my version of your code. Feel free to comment if you have any questions.
# Time: O(m*n)
# Space: O(1)
class Solution:
    def maximalSquare(self, matrix: List[List[str]]) -> int:
        """Given an m x n binary matrix filled with 0's and 1's,
        find the largest square containing only 1's and return its area.
        Args:
            matrix: An (m x n) string matrix.
        Returns:
            Area of the largest square containing only 1's.
        """
        maximum = 0
        for x in range(len(matrix)):
            for y in range(len(matrix[0])):
                # convert current matrix cell value from string to integer
                matrix[x][y] = int(matrix[x][y])
                # build largest square side by checking neighbors from up-row and left-column
                # so, skip the cells from the first-row and first-column
                if x != 0 and y != 0 and matrix[x][y] != 0:
                    # update current matrix cell w.r.t. the left, up and up-left cell values respectively
                    matrix[x][y] = 1 + min(matrix[x][y-1], matrix[x-1][y], matrix[x-1][y-1])
                # re-evaluate maximum square side
                if matrix[x][y] > maximum:
                    maximum = matrix[x][y]
        # returning the area of the largest square
        return maximum**2
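For comparison, here is a sketch of a third variant that keeps the named neighbors from the question but writes the side lengths into a separate table, so the input matrix is never modified and no value is tracked in two places. The function name maximal_square and the table name side are my own choices, and the separate table costs O(m*n) extra space instead of O(1).

```python
from typing import List


def maximal_square(matrix: List[List[str]]) -> int:
    """Area of the largest all-'1' square; leaves the input matrix untouched."""
    if not matrix or not matrix[0]:
        return 0
    rows, cols = len(matrix), len(matrix[0])
    # side[r][c] = side length of the largest square whose bottom-right corner is (r, c)
    side = [[0] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != "1":
                continue
            if r == 0 or c == 0:
                # first row or column: a lone 1 is a square of side 1
                side[r][c] = 1
            else:
                # named neighbors, as in the question
                left = side[r][c - 1]
                up = side[r - 1][c]
                up_left = side[r - 1][c - 1]
                side[r][c] = 1 + min(left, up, up_left)
            best = max(best, side[r][c])
    return best ** 2


example = [
    ["1", "0", "1", "0", "0"],
    ["1", "0", "1", "1", "1"],
    ["1", "1", "1", "1", "1"],
    ["1", "0", "0", "1", "0"],
]
print(maximal_square(example))
```

The named locals live only inside the else branch, which keeps them close to the one expression that uses them.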

Efficient way of doing comparisons and arithmetic in a list with irregular dimensions, Python

I'm trying to find an efficient way to transform an array in the following way:
Each element will get transformed into either None, a real number, or a tuple/list/array of size 2 (containing real numbers).
The transformation function I'm using is simple and just does number comparisons. So my first thought is to use np.where to make the comparisons fast. Now, if the transformation gives None or a real number, I have no problems.
But when the transformation gives a tuple/list/array, np.where gives me errors. This is of course because numpy arrays demand regular dimensions. So now I'm forced to work with lists...
So my idea now is, instead of transforming the element into a tuple/list/array of size 2, to transform it into a complex number. But then I have an array of complex numbers containing mostly numbers with 0 imaginary part (since most transformations will give None or real numbers). I can't afford this, memory-wise. (Or can I?)
Once I have the transformed list/array/whatever, I will be doing sign operations and arithmetic between its elements, and comparisons again; that's why I would like to keep it as a numpy array.
Am I forced to work with lists in this scenario or would you do something else?
EDIT:
I'm asked to give some examples of my transformation:
Input: an array containing elements with values None or real numbers in [0,360)
Transformation (simplified):
None goes to None
element in [0,45) goes to 2 real numbers (left,right), say 2 random real numbers between 0 and element.
element in [45,360) goes to 1 real number
What I do is, for example:
arrayTransformed = np.where((array>=0) & (array<45), transform(array), array)
#this gives problems ofc
arrayTransformed = np.where((array>=45) & (array<=360), transform(array), arrayTransformed )
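One possible way to stay in numpy, sketched under assumptions: store every result in a regular (n, 2) float array and mark unused slots with NaN, so None becomes (NaN, NaN) and a single real number becomes (value, NaN). The function name transform_padded is hypothetical, and the halving used for [45,360) is only a placeholder, since the question does not say which real number that case should produce. Note that an (n, 2) float64 array takes 16 bytes per element, the same as complex128, so this does not save memory over the complex-number idea; it only keeps the shape regular.

```python
import numpy as np


def transform_padded(angles, seed=None):
    """Hypothetical sketch: return an (n, 2) float array, NaN where a slot is unused."""
    rng = np.random.default_rng(seed)
    # None becomes NaN so the input is a regular float array
    a = np.array([np.nan if v is None else v for v in angles], dtype=float)
    out = np.full((a.size, 2), np.nan)

    # NaN compares False in both masks, so None rows stay (NaN, NaN)
    small = (a >= 0) & (a < 45)
    large = (a >= 45) & (a < 360)

    # [0, 45): two random reals between 0 and the element (left, right)
    out[small] = rng.random((int(small.sum()), 2)) * a[small][:, None]
    # [45, 360): one real number; halving is a placeholder rule, not from the question
    out[large, 0] = a[large] / 2.0
    return out


result = transform_padded([None, 10.0, 90.0], seed=0)
print(result.shape)
```

Later sign operations and comparisons can then run on whole columns, with np.isnan(result[:, 1]) telling the one-value rows from the two-value rows.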

Random Choice is out of list range in Python?

I am trying to create a program that will guess the number you entered. I was trying to get the computer to find the number in as few guesses as possible, so I am doing something like a binary search. I keep getting an "Index out of range" error when I run the code and it gets to total = total[the_low_one:computer2_guess]. I am confused about why it is out of range. Should I be adding or subtracting one to the_low_one each time it hits a new low so it stays in range? I would appreciate any help as I am lost. Thanks a bunch in advance!
Code:
def second_computer():
    global computer2_score
    computer2_score=0
    computer2_guess= -1
    the_low_one=1
    the_high_one=1000000
    total= range(1,1000000)
    while computer2_guess != pick_number:
        computer2_guess=random.choice(total)
        if computer2_guess>pick_number:
            total=total[the_low_one:computer2_guess]
            the_high_one=computer2_guess-1
        else:
            total=total[computer2_guess:the_high_one]
            the_low_one=computer2_guess+1
        computer2_score+=1
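As total shrinks, the numerical values in the list no longer line up with their indices, so slicing by value eventually leaves an empty list and random.choice fails with the index error. You could make it work by looking up the positions of the updated bounds instead (with + 1 so that the_high_one itself stays in the list; this also needs total to start as range(1,1000001) so that 1000000 can be found):
total=total[total.index(the_low_one):total.index(the_high_one)+1]
A simpler approach would be to do away with total altogether and set
computer2_guess=random.randint(the_low_one,the_high_one)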
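Put together, the randint approach could look roughly like the sketch below. The function name count_guesses and its parameters are my own; it returns the number of guesses instead of updating a global score, and it assumes pick_number lies inside [low, high].

```python
import random


def count_guesses(pick_number, low=1, high=1000000, rng=random):
    """Guess pick_number by drawing uniformly from the remaining [low, high] range.

    Assumes low <= pick_number <= high; returns how many guesses were needed.
    """
    guesses = 0
    while True:
        guess = rng.randint(low, high)  # both bounds are inclusive
        guesses += 1
        if guess == pick_number:
            return guesses
        if guess > pick_number:
            high = guess - 1  # target is below the guess
        else:
            low = guess + 1   # target is above the guess


print(count_guesses(424242))
```

Replacing the random draw with the midpoint (low + high) // 2 turns this into a plain binary search, which needs at most about 20 guesses for a range of one million.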

fill missing values in python array

Using: Python 2.7.1 on Windows
Hello, I fear this question has a very simple answer, but I just can't seem to find an appropriate and efficient solution (I have limited Python experience). I am writing an application that downloads historic weather data from a third-party API (Wunderground). The thing is, sometimes there's no value for a given hour (e.g., we have 20 degrees at 5 AM, no value for 6 AM, and 21 degrees at 7 AM). I need exactly one temperature value for every hour, so I figured I could fit the data I do have and evaluate the fit at the points I'm missing (using SciPy's polyfit). That's all fine; however, I am having problems getting my program to detect whether the list has missing hours and, if so, to insert the missing hour and calculate a temperature value. I hope that makes sense.
My attempt at handling the hours and temperatures list is the following:
from scipy import polyfit
# Evaluate simple quadratic function
def tempcal (array,x):
    return array[0]*x**2 + array[1]*x + array[2]
# Sample data, note it has missing hours.
# My final hrs list should look like range(25), with matching temperatures at every point
hrs = [1,2,3,6,9,11,13,14,15,18,19,20]
temps = [14.0,14.5,14.5,15.4,17.8,21.3,23.5,24.5,25.5,23.4,21.3,19.8]
# Fit coefficients
coefs = polyfit(hrs,temps,2)
# Cycle control
i = 0
done = False
while not done:
    # It has missing hour, insert it and calculate a temperature
    if hrs[i] != i:
        hrs.insert(i,i)
        temps.insert(i,tempcal(coefs,i))
    # We are done, leave now
    if i == 24:
        done = True
    i += 1
I can see why this isn't working, the program will eventually try to access indexes out of range for the hrs list. I am also aware that modifying list's length inside a loop has to be done carefully. Surely enough I am either not being careful enough or just overlooking a simpler solution altogether.
In my googling attempts to help myself I came across pandas (the library) but I feel like I can solve this problem without it, (and I would rather do so).
Any input is greatly appreciated. Thanks a lot.
When i equals 21, hrs[i] asks for the twenty-second value in the list, but by then the list holds only 21 values, so the lookup goes out of range.
In future I recommend using a debugger with breakpoints (PyCharm, for example) to step through the loop, or wrapping the lookup in a try-except construction.
Not sure I would recommend this way of interpolating values. I would have used the closest points surrounding the missing values instead of the whole dataset. But using numpy, your proposed way is fairly straightforward.
import numpy as np
hrs = np.array(hrs)
temps = np.array(temps)
# index k of newTemps holds hour k + 1, so the 25 slots cover hours 1 to 25
newTemps = np.empty(25)
newTemps.fill(-300) # just fill it with some invalid data, temperatures don't go this low so it should be safe.
# fill in original values
newTemps[hrs - 1] = temps
# get indices of missing values
missing = np.nonzero(newTemps == -300)[0]
# calculate and insert missing values
newTemps[missing] = tempcal(coefs, missing + 1)
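The "closest surrounding points" idea mentioned above can be sketched with np.interp, which does straight-line interpolation between the neighboring known values. This is a swapped-in technique, not the quadratic fit from the question; the names all_hours and filled are mine, and the hours follow the question's range(25) convention.

```python
import numpy as np

# Sample data from the question (hours must be in increasing order for np.interp)
hrs = np.array([1, 2, 3, 6, 9, 11, 13, 14, 15, 18, 19, 20])
temps = np.array([14.0, 14.5, 14.5, 15.4, 17.8, 21.3, 23.5, 24.5, 25.5, 23.4, 21.3, 19.8])

all_hours = np.arange(25)  # hours 0 to 24

# Linear interpolation between the known points on either side of each missing hour.
# Hours outside the known range (0 and 21 to 24 here) take the nearest end value.
filled = np.interp(all_hours, hrs, temps)

for hour, temp in zip(all_hours, filled):
    print(hour, round(float(temp), 2))
```

Known hours keep their measured values exactly; only the gaps are filled. Holding the end values flat for hours 0 and 21 to 24 is np.interp's default behavior and may or may not be what you want for temperatures.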

Algorithm to calculate point at which to round values in an array up or down in order to least affect the mean

Consider an array of random values between 0 and 1, such as:
[0.1,0.2,0.8,0.9]
Is there a way to calculate the point at which the values should be rounded down or up to an integer, so that the mean of the rounded array matches the mean of the un-rounded array as closely as possible? (In the above case it would be at the mean, but that is purely a coincidence.)
Or is it just trial and error?
I'm coding in Python.
Thanks for any help.
Add them up, then round the sum. That's how many 1s you want. Round so you get that many 1s.
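def rounding_point(l):
    # values at or above the returned point round up to 1, the rest round down to 0
    # if the input is sorted, you don't need the following line
    l = sorted(l)
    ones_needed = int(round(sum(l)))
    # this may require adjustment if there are duplicates in the input
    # l[-0] would be l[0], so the "no 1s needed" case is handled separately
    return 1.0 if ones_needed == 0 else l[-ones_needed]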
If sorting the list turns out to be too expensive, you can use a selection algorithm like quickselect. Python doesn't come with a quickselect function built in, though, so don't bother unless your inputs are big enough that the asymptotic advantage of quickselect outweighs the constant factor advantage of the highly-optimized C sorting algorithm.
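If NumPy is available, np.partition gives selection without a full sort (its default kind is introselect), so it can stand in for a hand-written quickselect. The sketch below is one way to use it; the names rounding_point_partition and apply_rounding are hypothetical, and it assumes the convention that values at or above the returned point round up to 1.

```python
import numpy as np


def rounding_point_partition(values):
    """Threshold such that rounding up every value >= threshold gives round(sum) ones."""
    arr = np.asarray(values, dtype=float)
    ones_needed = int(round(float(arr.sum())))
    if ones_needed <= 0:
        return 1.0  # nothing should round up
    ones_needed = min(ones_needed, arr.size)
    k = arr.size - ones_needed  # sorted position of the smallest value that rounds up
    # np.partition places the k-th smallest element at index k without fully sorting
    return float(np.partition(arr, k)[k])


def apply_rounding(values, point):
    """Round each value up to 1 if it is at or above point, otherwise down to 0."""
    return [1 if v >= point else 0 for v in values]


data = [0.1, 0.2, 0.8, 0.9]
point = rounding_point_partition(data)
print(point, apply_rounding(data, point))
```

As with the sorted version, duplicates equal to the threshold all round the same way, so the count of 1s can overshoot in that case.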
