I am trying to check if a numpy array contains a specific value:
>>> x = np.linspace(-5,5,101)
>>> x
array([-5. , -4.9, -4.8, -4.7, -4.6, -4.5, -4.4, -4.3, -4.2, -4.1, -4. ,
-3.9, -3.8, -3.7, -3.6, -3.5, -3.4, -3.3, -3.2, -3.1, -3. , -2.9,
-2.8, -2.7, -2.6, -2.5, -2.4, -2.3, -2.2, -2.1, -2. , -1.9, -1.8,
-1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1. , -0.9, -0.8, -0.7,
-0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0. , 0.1, 0.2, 0.3, 0.4,
0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5,
1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6,
2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7,
3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
4.9, 5. ])
>>> -5. in x
True
>>> a = 0.2
>>> a
0.2
>>> a in x
False
I assigned the constant 0.2 to the variable a. It seems that the precision of a is not compatible with that of the elements in the numpy array generated by np.linspace().
I've searched the docs, but didn't find anything about this.
This is not a question of the precision of np.linspace, but rather of the type of the elements in the generated array.
np.linspace generates elements which, conceptually, divide the input range into equal steps. However, these elements are then stored as floating point numbers with limited precision, which makes the generation process itself appear imprecise.
By passing the dtype argument to np.linspace, you can specify the precision of the floating point type used to store its result, which can increase the apparent precision of the generation process.
Nevertheless, you should not use the equality operator to compare floating point numbers. Instead, use np.isclose in conjunction with np.ndarray.any, or some equivalent:
>>> floats_64 = np.linspace(-5, 5, 101, dtype='float64')
>>> floats_128 = np.linspace(-5, 5, 101, dtype='float128')
>>> print(0.2 in floats_64)
False
>>> print(floats_64[52])
0.20000000000000018
>>> print(np.isclose(0.2, floats_64).any()) # check if any element in floats_64 is close to 0.2
True
>>> print(0.2 in floats_128)
False
>>> print(floats_128[52])
0.20000000000000017764
>>> print(np.isclose(0.2, floats_128).any()) # check if any element in floats_128 is close to 0.2
True
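If you need this check in several places, it may be worth wrapping it in a small helper. A minimal sketch (the contains_close name and the atol default are my own choices, not from the answer above):
import numpy as np

def contains_close(arr, value, atol=1e-8):
    # True if any element of arr is within tolerance of value
    return np.isclose(arr, value, atol=atol).any()

x = np.linspace(-5, 5, 101)
print(contains_close(x, 0.2))   # True
print(contains_close(x, 0.25))  # False (0.25 is not a grid point)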
I am trying to change one row in my heatmap to a different color. Here is the dataset:
m = np.array([[ 0.7, 1.4, 0.2, 1.5, 1.7, 1.2, 1.5, 2.5],
[ 1.1, 2.5, 0.4, 1.7, 2. , 2.4, 2. , 3.2],
[ 0.9, 4.4, 0.7, 2.3, 1.6, 2.3, 2.6, 3.3],
[ 0.8, 2.1, 0.2, 1.8, 2.3, 1.9, 2. , 2.9],
[ 0.9, 1.3, 0.8, 2.2, 1.8, 2.2, 1.7, 2.8],
[ 0.7, 0.9, 0.4, 1.8, 1.4, 2.1, 1.7, 2.9],
[ 1.2, 0.9, 0.4, 2.1, 1.3, 1.2, 1.9, 2.4],
[ 6.3, 13.5, 3.1, 13.4, 12.1, 13.3, 13.4, 20. ]])
data = pd.DataFrame(data = m)
Right now I am using a seaborn heatmap, and I can only create something like this:
cmap = sns.diverging_palette(240, 10, as_cmap = True)
sns.heatmap(data, annot = True, cmap = "Reds")
plt.show()
I hope to change the color scheme of the last row, here is what I want to achieve (I did this in Excel):
Is it possible to achieve this in Python with a seaborn heatmap? Thank you!
You can split in two, mask the unwanted parts, and plot separately:
import seaborn as sns
import matplotlib.pyplot as plt

# Reds: blank out the last row and plot the rest
data1 = data.copy()
data1.loc[7] = float('nan')
ax = sns.heatmap(data1, annot=True, cmap="Reds")

# Greens: blank out all but the last row and plot on the same axes
data2 = data.copy()
data2.loc[:6] = float('nan')
sns.heatmap(data2, annot=True, cmap="Greens")
plt.show()
output:
NB: you need to adapt the loc[…] parameters to your actual index names.
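Note that each sns.heatmap call adds its own colorbar, so the code above ends up with two of them. If that bothers you, you can disable them with the standard cbar parameter; a small sketch (suppressing both colorbars is my own choice):
ax = sns.heatmap(data1, annot=True, cmap="Reds", cbar=False)
sns.heatmap(data2, annot=True, cmap="Greens", cbar=False)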
Let's say we have CityName, Min-Temperature, Max-Temperature and Humidity of different cities.
We need an output dataframe grouped on CityName, and we want to generate the 0.25, 0.5 and 0.75 quantiles. The new column names would be OldColumnName + 'Q1'/'Q2'/'Q3'.
Example INPUT
df = pd.DataFrame({'cityName': pd.Categorical(['a','a','a','a','b','b','b','b','a','a','a','a','b','b','b','b']),
                   'MinTemp': [1.1, 2.1, 3.1, 1.1, 2, 2.1, 2.2, 2.4, 2.5, 1.11, 1.31, 2.1, 1, 2, 2.3, 2.1],
                   'MaxTemp': [2.1, 4.2, 5.1, 2.13, 4, 3.1, 5.2, 3.4, 3.5, 2.11, 2.31, 3.1, 2, 4.3, 4.3, 3.1],
                   'Humidity': [0.29, 0.19, 0.45, 0.1, 0.1, 0.1, 0.2, 0.5, 0.11, 0.31, 0.1, 0.1, 0.2, 0.3, 0.3, 0.1]})
OUTPUT
First Approach
First you have to group your data on the desired column, which is 'cityName'. Then, because you want to apply several different aggregations to each column, you can use the agg function. You cannot pass parameters to the functions given to agg, so you define them as follows:
def quantile_25(x):
    return x.quantile(0.25)

def quantile_50(x):
    return x.quantile(0.5)

def quantile_75(x):
    return x.quantile(0.75)
quantile_df = df.groupby('cityName').agg([quantile_25, quantile_50, quantile_75])
quantile_df
Second Approach
You can use the describe method and select the statistics you need. Using pd.IndexSlice, you can choose which sub-columns to keep.
idx = pd.IndexSlice
df.groupby('cityName').describe().loc[:, idx[:, ['25%', '50%', '75%']]]
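Both approaches leave you with a MultiIndex on the columns. If you also want the flat OldColumnName + 'Q1'/'Q2'/'Q3' names the question asks for, you can collapse it; a minimal sketch, assuming the quantile_df from the first approach (the suffix mapping is my own):
# map each aggregation function name to the requested suffix
suffix = {'quantile_25': 'Q1', 'quantile_50': 'Q2', 'quantile_75': 'Q3'}
quantile_df.columns = [col + suffix[stat] for col, stat in quantile_df.columns]
# columns are now e.g. 'MinTempQ1', 'MinTempQ2', 'MinTempQ3', ...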
I'm trying to create an array which has 5 columns imported from a data file. The first 4 of them are floats and the last one is a string.
The data file looks like this:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
I tried these:
data = np.genfromtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")
data = np.loadtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")
but both only import the first column.
Why? How can I fix this?
Thanks for your time! :)
You must specify the string type correctly: "U20", for example, for at most 20 characters:
data = np.loadtxt('data.txt', dtype = "float,"*4 + "U20", delimiter = ",")
This seems to work:
array([( 5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
( 4.9, 3. , 1.4, 0.2, 'Iris-setosa'),
( 4.7, 3.2, 1.3, 0.2, 'Iris-setosa'),
( 4.6, 3.1, 1.5, 0.2, 'Iris-setosa'),
( 5. , 3.6, 1.4, 0.2, 'Iris-setosa'),
( 5.4, 3.9, 1.7, 0.4, 'Iris-setosa'),
( 4.6, 3.4, 1.4, 0.3, 'Iris-setosa'),
( 5. , 3.4, 1.5, 0.2, 'Iris-setosa')],
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U20')])
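For np.genfromtxt the same dtype fix applies; a quick sketch, assuming the same data.txt (in recent NumPy versions, genfromtxt can also infer the per-column types itself with dtype=None):
data = np.genfromtxt('data.txt', dtype="float,"*4 + "U20", delimiter=",")
# or let genfromtxt infer the column types (strings decoded as str):
data = np.genfromtxt('data.txt', dtype=None, delimiter=",", encoding=None)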
Another method, using pandas, gives you an object array, but this slows down further computations:
In [336]: pd.read_csv('data.txt',header=None).values
Out[336]:
array([[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
[4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
[4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
[4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
[5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
[5.4, 3.9, 1.7, 0.4, 'Iris-setosa'],
[4.6, 3.4, 1.4, 0.3, 'Iris-setosa'],
[5.0, 3.4, 1.5, 0.2, 'Iris-setosa']], dtype=object)
If I have an array:
StartArray=np.array([1, 2, 3, 1.4, 1.2, 0.6, 1.8, 1.5, 1.9, 2.2, 3, 4 ,2.3])
I would like to loop through this array starting with StartArray[0] and only keep values that are within +/- .5 of the last kept value to yield:
EndArray=[1, 1.4, 1.2, 1.5, 1.9, 2.2, 2.3]
This is what I have tried so far, and the results don't make sense:
StartArray=np.array([1, 2, 3, 1.4, 1.2, 0.6, 1.8, 1.5, 1.9, 2.2, 3, 4 ,2.3])
EndArray=np.empty_like(StartArray)
EndArray[0]=StartArray[0]
for i in range(len(StartArray)-1):
    if EndArray[i]+.5 > StartArray[i+1] > EndArray[i]-.5:
        EndArray[i+1] = StartArray[i+1]
Out:
array([ 1. , 0.22559146, 0.13015365, 5.24910493, 0.63804761,
0.6 , 1.73143364, 1.5 , 1.9 , 2.2 ,
6.82525036, 0.61641556, 6.82325036])
A list is the right structure for this job:
StartArray=np.array([1, 2, 3, 1.4, 1.2, 0.6, 1.8, 1.5, 1.9, 2.2, 3, 4 ,2.3])
ref=StartArray[0]
End=[]
for x in StartArray:
    if abs(x - ref) < .5:
        End.append(x)
        ref = x
print(np.array(End))
[ 1. 1.4 1.2 1.5 1.9 2.2 2.3]
There are multiple problems with your approach. First, you're initializing EndArray to be the same size as StartArray, but that's not what you want your desired output to be. Instead, initialize EndArray as an empty list and append values as you loop through StartArray. Secondly, you want the output values to be within 0.5 of the last kept value, so you need to keep track of this.
Adapting your code:
StartArray=np.array([1, 2, 3, 1.4, 1.2, 0.6, 1.8, 1.5, 1.9, 2.2, 3, 4 ,2.3])
EndArray=[]
last_kept = StartArray[0]
EndArray.append(last_kept)
for i in range(len(StartArray)-1):
    if np.abs(StartArray[i+1] - last_kept) < 0.5:
        last_kept = StartArray[i+1]
        EndArray.append(last_kept)
# convert back to numpy array
EndArray = np.array(EndArray)
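Running this on the sample StartArray reproduces the expected output:
print(EndArray)  # [1.  1.4 1.2 1.5 1.9 2.2 2.3]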
I have a numpy array([1.0, 2.0, 3.0]), which is actually a mesh in 1 dimension in my problem. What I want to do is to refine the mesh to get this: array([0.8, 0.9, 1, 1.1, 1.2, 1.8, 1.9, 2, 2.1, 2.2, 2.8, 2.9, 3, 3.1, 3.2]).
The actual array is very large and this procedure costs a lot of time. How can I do this quickly (maybe vectorized) in Python?
Here's a vectorized approach using broadcasting: a[:,None] turns a into a column vector, adding the row of five offsets produces a (len(a), 5) array, and ravel flattens it back out -
(a[:,None] + np.arange(-0.2,0.3,0.1)).ravel() # a is input array
Sample run -
In [15]: a = np.array([1.0, 2.0, 3.0]) # Input array
In [16]: (a[:,None] + np.arange(-0.2,0.3,0.1)).ravel()
Out[16]:
array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
2.9, 3. , 3.1, 3.2])
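Equivalently, the same broadcast can be spelled with np.add.outer, which some find more explicit; a small sketch under the same assumptions (the offsets name is mine):
offsets = np.arange(-0.2, 0.3, 0.1)      # [-0.2, -0.1, 0.0, 0.1, 0.2]
out = np.add.outer(a, offsets).ravel()   # same result as (a[:,None] + offsets).ravel()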
Here are a few options (Python 3):
Option 1:
np.array([j for i in arr for j in np.arange(i - 0.2, i + 0.25, 0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 2:
np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 3:
np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Timing on a larger array:
arr = np.arange(100000)
arr
# array([ 0, 1, 2, ..., 99997, 99998, 99999])
%timeit np.array([j for i in arr for j in np.arange(i-0.2, i+0.25, 0.1)])
# 1 loop, best of 3: 615 ms per loop
%timeit np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# 1 loop, best of 3: 250 ms per loop
%timeit np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# 100 loops, best of 3: 1.93 ms per loop