I found code on this forum that I am interested in, but it is not working for my DataFrame.
INPUT:
x , y , value , value2
1.0 , 1.0 , 12.33 , 1.23367543
2.0 , 2.0 , 11.5 , 1.1523123
4.0 , 2.0 , 22.11 , 2.2112312
5.0 , 5.0 , 78.13 , 7.8131239
6.0 , 6.0 , 33.68 , 3.3681231
I need to delete rows that are within a distance of 1 of each other, keeping only the one with the highest "value".
RESULT to get:
1.0 , 1.0 , 12.33 , 1.23367543
4.0 , 2.0 , 22.11 , 2.2112312
5.0 , 5.0 , 78.13 , 7.8131239
CODE:
def dist_value_comp(row):
    x_dist = abs(df['y'] - row['y']) <= 1
    y_dist = abs(df['x'] - row['x']) <= 1
    xy_dist = x_dist & y_dist
    max_value = df.loc[xy_dist, 'value2'].max()
    return row['value2'] == max_value

df['keep_row'] = df.apply(dist_value_comp, axis=1)
df.loc[df['keep_row'], ['x', 'y', 'value', 'value2']]
PROBLEM:
When I add a fourth column value2, whose values have more digits after the decimal point, the code shows me only the row with the highest value2, but the result should be the same as for value.
UPDATE:
It works when I use an old PyCharm with Python 2.7; on a newer version it does not. Any idea why?
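One thing worth checking is which column the maximum is taken over: the code groups neighbouring rows but then ranks them by value2, while the desired result ranks them by value. Below is a minimal sketch of that fix against the sample data; this is a guess at the intent, not a confirmed diagnosis of the Python 2.7 vs 3 difference.

```python
import pandas as pd

# The sample frame from the question.
df = pd.DataFrame({
    'x': [1.0, 2.0, 4.0, 5.0, 6.0],
    'y': [1.0, 2.0, 2.0, 5.0, 6.0],
    'value': [12.33, 11.5, 22.11, 78.13, 33.68],
    'value2': [1.23367543, 1.1523123, 2.2112312, 7.8131239, 3.3681231],
})

def dist_value_comp(row):
    # Neighbours within 1 unit in both x and y (the row itself included).
    near = (df['x'] - row['x']).abs().le(1) & (df['y'] - row['y']).abs().le(1)
    # Compare on 'value', not 'value2', so the kept rows match the desired result.
    return row['value'] == df.loc[near, 'value'].max()

result = df[df.apply(dist_value_comp, axis=1)]
print(result)
```

With the sample data this keeps the rows at (1.0, 1.0), (4.0, 2.0), and (5.0, 5.0), regardless of how many decimal places value2 carries.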
I have many NaN values in my output data, and I padded those values with zeros. Please don't suggest deleting the NaNs or imputing them with some other number; I want the model to skip those NaN positions.
example:
x = np.arange(0.5, 30)
x.shape = [10, 3]
x = [[ 0.5 1.5 2.5]
[ 3.5 4.5 5.5]
[ 6.5 7.5 8.5]
[ 9.5 10.5 11.5]
[12.5 13.5 14.5]
[15.5 16.5 17.5]
[18.5 19.5 20.5]
[21.5 22.5 23.5]
[24.5 25.5 26.5]
[27.5 28.5 29.5]]
y = np.arange(2, 10, 0.8)
y.shape = [10, 1]
y[4, 0] = 0.0
y[6, 0] = 0.0
y[7, 0] = 0.0
y = [[2. ]
[2.8]
[3.6]
[4.4]
[0. ]
[6. ]
[0. ]
[0. ]
[8.4]
[9.2]]
I expect keras deep learning model to predict zeros for 5th, 7th and 8th row as similar to the padded value in 'y'.
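No answer is posted here, but the usual way to make a model skip padded positions is a masked loss: zero out the error wherever the target equals the pad value. Below is a minimal NumPy sketch of that idea; the function name and the pad-value convention are my assumptions, and in Keras the same logic would be wrapped as a custom loss using backend tensor ops rather than NumPy.

```python
import numpy as np

def masked_mse(y_true, y_pred, pad_value=0.0):
    # 1.0 where the target is real, 0.0 where it is padding
    mask = (y_true != pad_value).astype(float)
    # squared error only at unpadded positions
    sq_err = mask * (y_true - y_pred) ** 2
    # average over the unpadded positions only
    return sq_err.sum() / max(mask.sum(), 1.0)

y_true = np.array([2.0, 2.8, 0.0, 4.4])   # third entry is padding
y_pred = np.array([2.0, 2.8, 9.9, 4.4])   # wildly wrong at the padded slot
print(masked_mse(y_true, y_pred))          # 0.0 -- padding contributes nothing
```

Note that with this convention the model is not trained to *predict* zeros at padded rows; those rows simply stop contributing to the gradient.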
I have a list and I want to find the shortest sublist with a sum greater than 50.
For example my list is
[8.4 , 10.3 , 12.9 , 8.2 , 13.7 , 11.2 , 11.3 ,10.4 , 4.2 , 3.3 , 4.0 , 2.1]
and I want to find the shortest sublist such that its sum is more than 50.
Output Should be like [12.9 , 13.7 , 11.2 , 11.3, 10.4]
This is a bad solution in terms of efficiency (it does not search the whole space to find the optimum), but the result is correct:
from collections import defaultdict

lis = [8.4, 10.3, 12.9, 8.2, 13.7, 11.2, 11.3, 10.4, 4.2, 3.3, 4.0, 2.1]

dic = defaultdict(list)
for i in range(len(lis)):
    dic[lis[i]] += [i]   # map each value to its position(s) in the original list

tmp_lis = lis.copy()
tmp_lis.sort(reverse=True)

res = []
for i in tmp_lis:        # greedily take the largest values until the sum exceeds 50
    if sum(res) > 50:
        break
    else:
        res.append(i)

res1 = [(i, dic[i]) for i in res]
res1.sort(key=lambda x: x[1])   # restore original order by index
solution = [i[0] for i in res1]

Output:
[12.9, 13.7, 11.2, 11.3, 10.4]
O(n) solution for list of positive numbers
Provided your list cannot contain negative numbers, then there is a linear solution using two-pointers traversal.
Track the sum between both pointers. Increment the right pointer whenever the sum is below 50 and increment the left one otherwise.
This provides a sequence of pointers within which you will find the ones with minimal distance. It suffices to use min to get the smallest interval out of those.
Due to the behaviour of min, this will return the left-most sublist with minimal length if more than one solution exists.
Code
def intervals_generator(lst, bound):
    i, j = 0, 0
    sum_ = 0
    while True:
        try:
            if sum_ <= bound:
                sum_ += lst[j]
                j += 1
            else:
                yield i, j
                sum_ -= lst[i]
                i += 1
        except IndexError:
            break

def smallest_sub_list(lst, bound):
    i, j = min(intervals_generator(lst, bound), key=lambda x: x[1] - x[0])
    return lst[i:j]
Examples
lst = [8.4 , 10.3 , 12.9 , 8.2 , 13.7 , 11.2 , 11.3 ,10.4 , 4.2 , 3.3 , 4.0 , 2.1]
print(smallest_sub_list(lst, 50)) # [8.4, 10.3, 12.9, 8.2, 13.7]
lst = [0, 10, 45, 55]
print(smallest_sub_list(lst, 50)) # [55]
Solution for general list of numbers
If the list can contain negative numbers then the above will not work and I believe there exists no solution more efficient than to iterate over all possible sublists.
Sort it in descending order and sum the first elements until you hit +50.0.
myList = [8.4, 10.3, 12.9, 8.2, 13.7, 11.2, 11.3, 10.4, 4.2, 3.3, 4.0, 2.1]
mySublist = []
for i in sorted(myList, reverse=True):
    mySublist.append(i)
    if sum(mySublist) > 50.0:
        break
print(mySublist)  # [13.7, 12.9, 11.3, 11.2, 10.4]
This assumes that what you want is the sublist smallest in size, not the one smallest in sum value.
If you are searching for any shortest sublist, this can be a solution (maybe to be optimized):
lst = [8.4, 10.3, 12.9, 8.2, 13.7, 11.2, 11.3, 10.4, 4.2, 3.3, 4.0, 2.1]

def find_sub(lst, limit=50):
    for l in range(1, len(lst)+1):
        for i in range(len(lst)-l+1):
            sub = lst[i:i+l]
            if sum(sub) > limit:
                return sub
>>> print(find_sub(lst))
Output:
[8.4, 10.3, 12.9, 8.2, 13.7]
I have a text file with this format:
1 10.0e+08 1.0e+04 1.0
2 9.0e+07 9.0e+03 0.9
2 8.0e+07 8.0e+03 0.8
3 7.0e+07 7.0e+03 0.7
I would like to preserve the first variable of every line and then normalize the data on all lines by the data on the first line. The end result would look something like:
1 1.0 1.0 1.0
2 0.9 0.9 0.9
2 0.8 0.8 0.8
3 0.7 0.7 0.7
so essentially, we are doing the following:
1 10.0e+08/10.0e+08 1.0e+04/1.0e+04 1.0/1.0
2 9.0e+07/10.0e+08 9.0e+03/1.0e+04 0.9/1.0
2 8.0e+07/10.0e+08 8.0e+03/1.0e+04 0.8/1.0
3 7.0e+07/10.0e+08 7.0e+03/1.0e+04 0.7/1.0
I'm still researching and reading on how to do this; I'll upload my attempt shortly. Also, can anyone point me to a place where I can learn more about manipulating data files?
Read your file into a numpy array and use numpy broadcast feature:
import numpy as np
data = np.loadtxt('foo.txt')
data = data / data[0]
#array([[ 1. , 1. , 1. , 1. ],
# [ 2. , 0.09, 0.9 , 0.9 ],
# [ 2. , 0.08, 0.8 , 0.8 ],
# [ 3. , 0.07, 0.7 , 0.7 ]])
np.savetxt('new.txt', data)
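One caveat: data / data[0] also divides the first column by its first entry, which only preserves the labels here because that entry happens to be 1. A small sketch that normalizes only the data columns and leaves the first (label) column untouched; the inline array stands in for the result of np.loadtxt('foo.txt'):

```python
import numpy as np

# the rows from the question; np.loadtxt('foo.txt') would produce the same array
data = np.array([
    [1.0, 10.0e+08, 1.0e+04, 1.0],
    [2.0,  9.0e+07, 9.0e+03, 0.9],
    [2.0,  8.0e+07, 8.0e+03, 0.8],
    [3.0,  7.0e+07, 7.0e+03, 0.7],
])

# divide every column except the first by the corresponding first-row value
data[:, 1:] = data[:, 1:] / data[0, 1:]
print(data[:, 0])   # [1. 2. 2. 3.] -- labels untouched
```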
I need to calculate statistics for each node of a 2D grid. I figured the easy way to do this was to take the cross join (AKA cartesian product) of two ranges. I implemented this using numpy as this function:
import numpy as np
from itertools import product

def node_grid(x_range, y_range, x_increment, y_increment):
    x_min = float(x_range[0])
    x_max = float(x_range[1])
    x_num = int(round((x_max - x_min)/x_increment)) + 1  # np.linspace needs an integer count
    y_min = float(y_range[0])
    y_max = float(y_range[1])
    y_num = int(round((y_max - y_min)/y_increment)) + 1
    x = np.linspace(x_min, x_max, x_num)
    y = np.linspace(y_min, y_max, y_num)
    ng = list(product(x, y))
    ng = np.array(ng)
    return ng, x, y
However when I convert this to a pandas dataframe it drops values. For example:
In [2]: ng = node_grid(x_range=(-60, 120), y_range=(0, 40), x_increment=0.1, y_increment=0.1)
In [3]: ng[0][(ng[0][:,0] > -31) & (ng[0][:,0] < -30) & (ng[0][:,1]==10)]
Out[3]: array([[-30.9, 10. ],
[-30.8, 10. ],
[-30.7, 10. ],
[-30.6, 10. ],
[-30.5, 10. ],
[-30.4, 10. ],
[-30.3, 10. ],
[-30.2, 10. ],
[-30.1, 10. ]])
In [4]: node_df = pd.DataFrame(ng[0])
node_df.columns = ['xx','depth']
print(node_df[(node_df.depth==10) & node_df.xx.between(-30,-31)])
Out[4]: Empty DataFrame
Columns: [xx, depth]
Index: []
The dataframe isn't empty:
In [5]: print(node_df.head())
Out[5]: xx depth
0 -60.0 0.0
1 -60.0 0.1
2 -60.0 0.2
3 -60.0 0.3
4 -60.0 0.4
Values from the numpy array are being dropped when they are put into the pandas DataFrame. Why?
The "between" function requires the first argument (the lower bound) to be less than the second (the upper bound); with the bounds reversed it matches nothing.
In: print(node_df[(node_df.depth==10) & node_df.xx.between(-31,-30)])
xx depth
116390 -31.0 10.0
116791 -30.9 10.0
117192 -30.8 10.0
117593 -30.7 10.0
117994 -30.6 10.0
118395 -30.5 10.0
118796 -30.4 10.0
119197 -30.3 10.0
119598 -30.2 10.0
119999 -30.1 10.0
120400 -30.0 10.0
For clarity the product() function used comes from the itertools package, i.e., from itertools import product
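A small check of the behaviour both answers point at: Series.between(lower, upper) is inclusive on both ends and matches nothing when the bounds are reversed.

```python
import pandas as pd

s = pd.Series([-31.0, -30.5, -30.0])

print(s.between(-31, -30).tolist())  # inclusive bounds: all three values match
print(s.between(-30, -31).tolist())  # reversed bounds: nothing matches
```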
I can't fully reproduce your code, but I find that the problem is the order of the bounds: you have to swap the lower and upper boundaries in the between query. The following works for me:
print(node_df[(node_df.depth==10) & node_df.xx.between(-31,-30)])
when using:
ng = np.array([[-30.9, 10. ],
[-30.8, 10. ],
[-30.7, 10. ],
[-30.6, 10. ],
[-30.5, 10. ],
[-30.4, 10. ],
[-30.3, 10. ],
[-30.2, 10. ],
[-30.1, 10. ]])
node_df = pd.DataFrame(ng)
I have some code in MATLAB that I'm trying to convert into python. I know very little about python, so this is turning out to be a bit of a challenge.
Here's the MATLAB code:
xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
xm = xm_row(ones(Ny-1, 1), :);
ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
ym = ym_col(:,ones(Nx-1,1));
And here is my very rough attempt at trying to do the same thing in python:
for x in range(L-1):
    for y in range(L-1):
        xm_row = x[((x-1)/2.0+0.5):((x-1)/2.0-.5)]
        xm = xm_row[(ones(y-1,1)), :]
        ym_column = transposey[(-(y-1)/2.0+0.5):((y-1)/2.0-.5)]
        ym = ym_column[:, ones(x-1,1)]
In my python code, L is the size of the array I am looping across.
When I try to run it in Python, I get the error:
'int' object has no attribute '__getitem__'
at the line:
xm_row = x[((x-1)/2.0+0.5):((x-1)/2.0-.5)]
Any help is appreciated!
In MATLAB, you can implement that in a simpler way with meshgrid, like so -
Nx = 5;
Ny = 7;
xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
[xm_out,ym_out] = meshgrid(xm_row,ym_col)
Let's compare this meshgrid version with the original code for verification -
>> Nx = 5;
>> Ny = 7;
>> xm_row = -(Nx-1)/2.0+0.5:(Nx-1)/2.0-0.5;
>> ym_col = (-(Ny-1)/2.0+0.5:(Ny-1)/2.0-0.5)';
>> xm = xm_row(ones(Ny-1, 1), :)
xm =
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
>> ym = ym_col(:,ones(Nx-1,1))
ym =
-2.5 -2.5 -2.5 -2.5
-1.5 -1.5 -1.5 -1.5
-0.5 -0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
1.5 1.5 1.5 1.5
2.5 2.5 2.5 2.5
>> [xm_out,ym_out] = meshgrid(xm_row,ym_col)
xm_out =
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
-1.5 -0.5 0.5 1.5
ym_out =
-2.5 -2.5 -2.5 -2.5
-1.5 -1.5 -1.5 -1.5
-0.5 -0.5 -0.5 -0.5
0.5 0.5 0.5 0.5
1.5 1.5 1.5 1.5
2.5 2.5 2.5 2.5
Now, the transition from MATLAB to Python is made easier by NumPy, which hosts counterparts to many MATLAB functions for use in a Python environment. For our case, NumPy has its own version of meshgrid, which makes this a straightforward port, as listed below -
import numpy as np  # Import NumPy module

Nx = 5
Ny = 7

# Use np.arange, the NumPy/Python counterpart of MATLAB's colon operator
xm_row = np.arange(-(Nx-1)/2.0+0.5, (Nx-1)/2.0-0.5+1)
ym_col = np.arange(-(Ny-1)/2.0+0.5, (Ny-1)/2.0-0.5+1)

# Use meshgrid just like in MATLAB
xm, ym = np.meshgrid(xm_row, ym_col)
Output -
In [28]: xm
Out[28]:
array([[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5],
[-1.5, -0.5, 0.5, 1.5]])
In [29]: ym
Out[29]:
array([[-2.5, -2.5, -2.5, -2.5],
[-1.5, -1.5, -1.5, -1.5],
[-0.5, -0.5, -0.5, -0.5],
[ 0.5, 0.5, 0.5, 0.5],
[ 1.5, 1.5, 1.5, 1.5],
[ 2.5, 2.5, 2.5, 2.5]])
Also, please notice that +1 is added to the second argument of np.arange in both cases, because np.arange excludes the stop value when creating the range of elements. As an example, to create a range of elements from 3 to 10, we would need np.arange(3, 10+1), as shown below -
In [32]: np.arange(3,10+1)
Out[32]: array([ 3, 4, 5, 6, 7, 8, 9, 10])