I have a txt file with x, y, z coordinates as follows:
x y z another value
129.000000 -51.000000 3.192000 166 166 166
133.000000 -21.000000 6.982500 171 169 170
134.000000 -51.000000 8.379000 172 170 171
135.000000 -45.000000 8.379000 167 165 166
136.000000 -81.000000 8.578500 160 158 159
137.000000 -51.000000 9.376500 159 157 158
138.000000 -51.000000 9.576000 169 168 167
How can I read the value of z for a given x and y, e.g. x=20, y=33?
I tried data = numpy.genfromtxt(yourFileName), but it did not work for me.
import pandas as pd
from io import StringIO
x = '''x y z v1 v2 v3
129.000000 -51.000000 3.192000 166 166 166
133.000000 -21.000000 6.982500 171 169 170
134.000000 -51.000000 8.379000 172 170 171
135.000000 -45.000000 8.379000 167 165 166
136.000000 -81.000000 8.578500 160 158 159
137.000000 -51.000000 9.376500 159 157 158
138.000000 -51.000000 9.576000 169 168 167'''
out = StringIO(x)
df = pd.read_csv(out, sep=r"\s+")  # raw string so that \s is not treated as a string escape
print(df.query("x == 138 and y == -51").z.values)
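Since the question started from numpy.genfromtxt, here is a minimal NumPy-only sketch of the same lookup. It assumes the header line can simply be skipped and that columns 0, 1 and 2 hold x, y and z; the helper name z_at and the shortened sample are made up for illustration.

```python
import numpy as np
from io import StringIO

# Shortened copy of the sample data; the first line is a header
text = """x y z v1 v2 v3
129.000000 -51.000000 3.192000 166 166 166
133.000000 -21.000000 6.982500 171 169 170
138.000000 -51.000000 9.576000 169 168 167"""

# skip_header=1 drops the column names so every remaining field parses as a float
data = np.genfromtxt(StringIO(text), skip_header=1)

def z_at(data, x, y):
    """Return all z values whose row matches the given x and y (hypothetical helper)."""
    mask = np.isclose(data[:, 0], x) & np.isclose(data[:, 1], y)
    return data[mask, 2]

print(z_at(data, 138, -51))
print(z_at(data, 20, 33))  # empty array: no such row in the sample
```

An empty result means no row has that (x, y) pair, which is the case for x=20, y=33 in the sample shown in the question.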
I need to search through each value of a column, compare it with all entries of another column and, if certain conditions are met, print. I'm using the Python code below and it works, but the drawback is that both columns have tens of thousands of entries, so it's very slow. Is there a more efficient way to do this?
for i in df1.index:
    for j in df2.index:  # iterate over row labels, not over the 'pdb' values
        if df1['pdb'][i] == df2['pdb'][j]:
            if df1['res1'][i] >= df2['start'][j] and df1['res2'][i] <= df2['end'][j]:
                print(df1['pdb'][i], df2['PFAM_ACC'][j])
Example:
df1 =
pdb res1 res2
4xhfA 76 83
4xhfA 126 133
2mx1A 179 186
3s8lA 111 118
4ucmA 115 122
1pigA 119 126
4mavA 263 270
4mavA 289 296
3sbrA 101 108
3sbrA 148 155
3sbrA 158 165
3sbrA 222 229
3sbrA 394 401
5zeaA 83 90
5zeaC 562 569
5zeaD 32 39
5zeaD 89 96
5zeaG 277 284
df2 =
pdb start end PFAM_ACC
4xhfA 140 236 PF04205
1pigA 61 332 PF00128
1pigA 409 493 PF02806
3sbrA 171 241 PF18793
3sbrA 424 494 PF18764
3sbrA 558 635 PF00116
5zeaA 13 75 PF02874
5zeaC 13 75 PF02874
5zeaD 15 81 PF02874
5zeaG 13 75 PF02874
and I want to get as output:
1pigA PF00128
3sbrA PF18793
5zeaD PF02874
I hope it's more clear now.
Please let me know if you have any suggestions
Try:
x = df1.merge(df2, on="pdb")
out = x.loc[
    (x["res1"] >= x["start"]) & (x["res2"] <= x["end"]), ["pdb", "PFAM_ACC"]
]
print(out)
Prints:
      pdb PFAM_ACC
2   1pigA  PF00128
13  3sbrA  PF18793
21  5zeaD  PF02874
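For reference, a self-contained sketch of the merge-then-filter approach on a reduced copy of the example data. The subset of rows is chosen here only to keep the example short; the column names are those from the question.

```python
import pandas as pd

# Reduced copies of df1 and df2 from the question
df1 = pd.DataFrame({
    "pdb": ["4xhfA", "1pigA", "3sbrA", "5zeaD"],
    "res1": [76, 119, 222, 32],
    "res2": [83, 126, 229, 39],
})
df2 = pd.DataFrame({
    "pdb": ["4xhfA", "1pigA", "1pigA", "3sbrA", "5zeaD"],
    "start": [140, 61, 409, 171, 15],
    "end": [236, 332, 493, 241, 81],
    "PFAM_ACC": ["PF04205", "PF00128", "PF02806", "PF18793", "PF02874"],
})

# The inner merge pairs every df1 row with every df2 row sharing the same pdb
merged = df1.merge(df2, on="pdb")

# Keep only pairs where [res1, res2] lies inside [start, end]
inside = (merged["res1"] >= merged["start"]) & (merged["res2"] <= merged["end"])
out = merged.loc[inside, ["pdb", "PFAM_ACC"]]
print(out.to_string(index=False))
```

Note that the merge materialises every same-pdb pair before filtering, so memory use grows with the number of matching pairs per pdb.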
So I have multiple data frames and all need the same kind of formula applied to certain sets within this data frame. I got the locations of the sets inside the df, but I don't know how to access those sets.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # might need it later to check the output
df = pd.read_csv('Dalfsen.csv')
l = []
x = []
y = []
# the formula (trendline)
def rechtzetten(x, y):
    a = (len(x)*sum(x*y) - sum(x)*sum(y)) / (len(x)*sum(x**2) - sum(x)**2)
    b = (sum(y) - a*sum(x)) / len(x)
    y1 = x*a + b
    print(y1)
METING = df.ID.str.contains("<METING>")  # locating the sets
indicatie = np.where(METING == False)[0]  # and saving them somewhere
if n in df[n] != indicatie & n+1 != indicatie:  # attempt to add parts of the set in l
    append.l
elif n in df[n] != indicatie & n+1 == indicatie:  # attempt defining the end of the set and using the formula for the set
    append.l
    rechtzetten(l.x, l.y)
else:  # emptying the storage for the new set
    l = []
indicatie has the following numbers:
0 12 13 26 27 40 41 53 54 66 67 80 81 94 95 108 109 121
122 137 138 149 150 162 163 177 178 190 191 204 205 217 218 229 230 242
243 255 256 268 269 291 292 312 313 340 341 373 374 401 402 410 411 420
421 430 431 449 450 468 469 487 488 504 505 521 522 538 539 558 559 575
576 590 591 604 605 619 620 633 634 647
Because my df looks like this:
ID,NUM,x,y,nap,abs,end
<PROFIEL>not used data
<METING>data</METING>
<METING>data</METING>
...
<METING>data</METING>
<METING>data</METING>
</PROFIEL>,,,,,,
<PROFIEL>not used data
...
</PROFIEL>,,,,,,
tl;dr I'm trying to apply a formula to each profile as shown above. I want to edit the data between two consecutive numbers in the list indicatie.
For example:
the function rechtzetten(x, y) applied to df.x[1:11] and df.y[1:11] (because [0] and [12] are in the list indicatie), and then the same for [14:25], and so on.
What I am trying to avoid is typing the following hundreds of times manually:
x_#=df.x[1:11]
y_#=df.y[1:11]
rechtzetten(x_#,y_#)
I can't understand your question clearly, but if you want to replace a specific column of your pandas DataFrame with a NumPy array, you can simply assign it:
df['Column'] = numpy_array
Can you be more specific?
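One possible reading of the question is: apply the trendline to every run of <METING> rows that sits between two non-<METING> rows. Below is a minimal sketch under that assumption. The small DataFrame is a made-up stand-in for Dalfsen.csv, the "trend" column is a hypothetical name, and rechtzetten returns its result here instead of printing it.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for Dalfsen.csv: two profiles with three measurements each
df = pd.DataFrame({
    "ID": ["<PROFIEL>", "<METING>", "<METING>", "<METING>", "</PROFIEL>",
           "<PROFIEL>", "<METING>", "<METING>", "<METING>", "</PROFIEL>"],
    "x": [np.nan, 0.0, 1.0, 2.0, np.nan, np.nan, 0.0, 1.0, 2.0, np.nan],
    "y": [np.nan, 1.0, 3.0, 5.0, np.nan, np.nan, 2.0, 2.0, 2.0, np.nan],
})

def rechtzetten(x, y):
    """Least-squares trendline through (x, y), same formula as in the question."""
    n = len(x)
    a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    b = (np.sum(y) - a * np.sum(x)) / n
    return x * a + b

is_meting = df["ID"].str.contains("<METING>", regex=False)

# Every non-METING row bumps the counter, so the METING rows between two
# such rows share one block number
block = (~is_meting).cumsum()

df["trend"] = np.nan
for _, part in df[is_meting].groupby(block[is_meting]):
    df.loc[part.index, "trend"] = rechtzetten(part["x"].to_numpy(),
                                              part["y"].to_numpy())
print(df)
```

The cumulative sum plays the role of the indicatie list: it labels each set without the start and end positions having to be typed out by hand.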
I have a txt file like this:
127 181
151 188
120 201
148 207
148 212
145 215
86 219
108 219
67 239
I want to renumber the second column sequentially, starting from 180, so that a repeated number is counted only once (both occurrences get the same new value).
My expected results are as follows:
127 180
151 181
120 182
148 183
148 184
145 185
86 186
108 186
67 187
Can someone give me some advice? Thanks.
If you are open to using pandas:
df = pd.read_csv('textfile.txt', header=None, sep=' ')
startvalue = 180
df[1] = np.arange(startvalue, startvalue+len(df)) - df[1].duplicated().cumsum()
df.to_csv('textfile_out.txt', sep=' ', index=False, header=False)
Full example (with imports and textfile-creation):
import pandas as pd
import numpy as np
with open('textfile.txt', 'w') as f:
    f.write('''\
127 181
151 188
120 201
148 207
148 212
145 215
86 219
108 219
67 239''')
df = pd.read_csv('textfile.txt', header=None, sep=' ')
startvalue = 180
df[1] = np.arange(startvalue, startvalue+len(df)) - df[1].duplicated().cumsum()
df.to_csv('textfile_out.txt', sep=' ', index=False, header=False)
Output:
127 180
151 181
120 182
148 183
148 184
145 185
86 186
108 186
67 187
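The duplicated().cumsum() trick relies on equal values sitting next to each other, as they do in the sample. If repeats can be scattered, pd.factorize is one alternative that numbers values by first appearance. A small sketch on made-up data where 181 repeats non-adjacently:

```python
import numpy as np
import pandas as pd

# Made-up column in which 181 appears twice, but not in adjacent rows
values = pd.Series([181, 188, 219, 181, 239])
start_value = 180

# factorize assigns 0, 1, 2, ... in order of first appearance,
# so a repeated value always gets the code of its first occurrence
codes, uniques = pd.factorize(values)
renumbered = codes + start_value

# For comparison: the arange/cumsum approach on the same data
via_cumsum = np.arange(start_value, start_value + len(values)) - values.duplicated().cumsum()

print(renumbered.tolist())
print(via_cumsum.tolist())
```

On the original sample, where the duplicate rows are adjacent, both approaches give the same column.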
Without using any library, I suggest this approach: create a dictionary that maps each old value to its new value, then iterate over the column values.
# column is assumed to be a list holding the second-column values
n = 180
new_dict = {}
for index, value in enumerate(column):
    if value in new_dict:
        # value seen before: reuse the number it was given
        column[index] = new_dict[value]
    else:
        # new value: assign the next number
        new_dict[value] = n
        column[index] = n
        n += 1
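To show the dictionary idea end to end, here is a library-free sketch that works on whole lines. The function name renumber is hypothetical, and the lines are given as a list here; in practice they would come from reading the txt file.

```python
def renumber(lines, start=180):
    """Replace the second field of each line by a running number (hypothetical helper)."""
    mapping = {}  # old value -> new value
    result = []
    for line in lines:
        first, second = line.split()
        if second not in mapping:
            # first time this value appears: give it the next free number
            mapping[second] = start + len(mapping)
        result.append(f"{first} {mapping[second]}")
    return result

lines = ["127 181", "151 188", "86 219", "108 219", "67 239"]
for row in renumber(lines):
    print(row)
```

Using len(mapping) as the counter avoids keeping a separate variable n in sync with the dictionary.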
I have a 32x32 matrix and I want to break it into 8x8 matrices (4 along each axis, so 16 in total).
Here's how I try to make a smaller matrix for top-left part of the big one (pix is a 32x32 matrix).
A = [[0]*mat_size]*mat_size
for i in range(mat_size):
    for j in range(mat_size):
        A[i][j] = pix[i, j]
So, pix has the following values for top-left part:
198 197 194 194 197 192 189 196
199 199 198 198 199 195 195 145
200 200 201 200 200 204 131 18
201 201 199 201 203 192 57 56
201 200 198 200 207 171 41 141
200 200 198 199 208 160 38 146
198 198 198 198 206 157 39 129
198 197 197 199 209 157 38 77
But when I print(A) after the loop, every row of A equals the last row of pix, so I get 8 rows of 198 197 197 199 209 157 38 77.
I know I can use A = pix[:8, :8], but I prefer to use a loop for some purpose. I wonder why the loop solution doesn't give me the correct result.
A = np.zeros((4, 4, 8, 8))
for i in range(4):
    for j in range(4):
        A[i, j] = pix[i*8:(i+1)*8, j*8:(j+1)*8]
If I understand your question correctly, this solution should work. It iterates over the pix matrix in steps of 8 and selects one 8x8 block each time, so A[i, j] is the block in block-row i and block-column j. Is this what you need?
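If a loop is not strictly required, the same (4, 4, 8, 8) layout can also be obtained with a reshape. This is a sketch on a made-up 32x32 array standing in for pix, not a replacement for the loop version.

```python
import numpy as np

# Made-up 32x32 array standing in for pix
pix = np.arange(32 * 32).reshape(32, 32)

# Split each axis into (block index, offset inside block), then bring the
# two block indices to the front: the result has shape (4, 4, 8, 8)
blocks = pix.reshape(4, 8, 4, 8).swapaxes(1, 2)

# blocks[i, j] is the same 8x8 block as the slice used in the loop version
print(np.array_equal(blocks[1, 2], pix[8:16, 16:24]))
```

Unlike the loop, this does not copy the data up front; blocks is a view on pix until something forces a copy.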
Consider using numpy in order to avoid multiple references pointing to the same list (the last list in the matrix):
import numpy as np
mat_size = 8
A = np.empty((mat_size, mat_size))
pix = np.array(pix)
for i in range(mat_size):
    for j in range(mat_size):
        A[i][j] = pix[i][j]
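The shared-reference problem can be reproduced in plain Python without any image data. The short sketch below uses made-up variable names and contrasts the list multiplication from the question with a list comprehension that creates a separate row each time.

```python
mat_size = 3

# [[0]*n]*n builds ONE inner list and repeats a reference to it n times
aliased = [[0] * mat_size] * mat_size
aliased[0][0] = 1
print(aliased)  # the write shows up in every row

# A comprehension evaluates [0]*n once per row, giving independent lists
independent = [[0] * mat_size for _ in range(mat_size)]
independent[0][0] = 1
print(independent)  # only the first row changes
```

This is why, in the loop from the question, each pass over i overwrites the single shared row, and what remains at the end is the last row that was written.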
Is there a way to save a custom matplotlib colourmap (matplotlib.cm) as a file (e.g. a Color Palette Table file (.cpt), as used in MATLAB) so that it can be shared and then used later in other programs (e.g. Panoply, MATLAB...)?
Example
Below a new LinearSegmentedColormap is made by modifying an existing colormap (by truncation, as shown in another question linked here).
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
# Get an existing colorbar
cb = 'CMRmap'
cmap = plt.get_cmap( cb )
# Variables to modify (truncate) the colormap with
minval = 0.15
maxval = 0.95
npoints = 100
# Now modify (truncate) the colorbar
cmap = matplotlib.colors.LinearSegmentedColormap.from_list(
    'trunc({n},{a:.2f},{b:.2f})'.format(n=cmap.name, a=minval, b=maxval),
    cmap(np.linspace(minval, maxval, npoints)))
# Now the data can be extracted as a dictionary
cdict = cmap._segmentdata
# e.g. variables ('blue', 'alpha', 'green', 'red')
print( cdict.keys() )
# Now, is it possible to save to this as a .cpt?
More detail
I am aware of ways of loading external colormaps in matplotlib (e.g. shown here and here).
From NASA GISS's Panoply documentation:
Color Palette Table (CPT) indicates a color palette format used by the
Generic Mapping Tools program. The format defines a number of solid
color and/or gradient bands between the colorbar extrema rather than a
finite number of distinct colors.
The following is a function that takes a colormap, some limits (vmin and vmax) and the number of colors as input and creates a cpt file from it.
import matplotlib.pyplot as plt
import numpy as np
def export_cmap_to_cpt(cmap, vmin=0, vmax=1, N=255, filename="test.cpt", **kwargs):
    # create string for upper, lower colors
    b = np.array(kwargs.get("B", cmap(0.)))
    f = np.array(kwargs.get("F", cmap(1.)))
    na = np.array(kwargs.get("N", (0,0,0))).astype(float)
    ext = (np.c_[b[:3],f[:3],na[:3]].T*255).astype(int)
    extstr = "B {:3d} {:3d} {:3d}\nF {:3d} {:3d} {:3d}\nN {:3d} {:3d} {:3d}"
    ex = extstr.format(*list(ext.flatten()))
    # create colormap
    cols = (cmap(np.linspace(0.,1.,N))[:,:3]*255).astype(int)
    vals = np.linspace(vmin,vmax,N)
    arr = np.c_[vals[:-1],cols[:-1],vals[1:],cols[1:]]
    # save to file
    fmt = "%e %3d %3d %3d %e %3d %3d %3d"
    np.savetxt(filename, arr, fmt=fmt,
               header="# COLOR_MODEL = RGB",
               footer=ex, comments="")
# test case: create cpt file from RdYlBu colormap
cmap = plt.get_cmap("RdYlBu",255)
# you may create your colormap differently, as in the question
export_cmap_to_cpt(cmap, vmin=0,vmax=1,N=20)
The resulting file looks like
# COLOR_MODEL = RGB
0.000000e+00 165 0 38 5.263158e-02 190 24 38
5.263158e-02 190 24 38 1.052632e-01 215 49 39
1.052632e-01 215 49 39 1.578947e-01 231 83 55
1.578947e-01 231 83 55 2.105263e-01 244 114 69
2.105263e-01 244 114 69 2.631579e-01 249 150 86
2.631579e-01 249 150 86 3.157895e-01 253 181 104
3.157895e-01 253 181 104 3.684211e-01 253 207 128
3.684211e-01 253 207 128 4.210526e-01 254 230 153
4.210526e-01 254 230 153 4.736842e-01 254 246 178
4.736842e-01 254 246 178 5.263158e-01 246 251 206
5.263158e-01 246 251 206 5.789474e-01 230 245 235
5.789474e-01 230 245 235 6.315789e-01 206 234 242
6.315789e-01 206 234 242 6.842105e-01 178 220 235
6.842105e-01 178 220 235 7.368421e-01 151 201 224
7.368421e-01 151 201 224 7.894737e-01 120 176 211
7.894737e-01 120 176 211 8.421053e-01 96 149 196
8.421053e-01 96 149 196 8.947368e-01 70 118 180
8.947368e-01 70 118 180 9.473684e-01 59 86 164
9.473684e-01 59 86 164 1.000000e+00 49 54 149
B 165 0 38
F 49 54 149
N 0 0 0
and would be in the required format.
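To check such a file from within matplotlib, it can be read back into a colormap. The following is only a rough sketch: it assumes the simple layout shown above (RGB model, 8 numeric fields per band, contiguous bands) and is not a general CPT parser. The function name cpt_text_to_cmap and the three-point sample are made up for illustration.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

def cpt_text_to_cmap(text, name="from_cpt"):
    """Build a colormap from RGB CPT text with contiguous bands (assumed layout)."""
    positions, colors = [], []
    for line in text.splitlines():
        parts = line.split()
        # Band lines have 8 fields; the header and the B/F/N lines are skipped
        if line.startswith("#") or len(parts) != 8:
            continue
        v0, r0, g0, b0, v1, r1, g1, b1 = map(float, parts)
        if not positions:
            # only the first band contributes its lower edge
            positions.append(v0)
            colors.append((r0 / 255, g0 / 255, b0 / 255))
        positions.append(v1)
        colors.append((r1 / 255, g1 / 255, b1 / 255))
    pos = np.asarray(positions)
    pos = (pos - pos[0]) / (pos[-1] - pos[0])  # from_list expects positions in [0, 1]
    return LinearSegmentedColormap.from_list(name, list(zip(pos.tolist(), colors)))

# Made-up two-band sample in the same layout as the file above
sample = """# COLOR_MODEL = RGB
0.000000e+00 165 0 38 5.000000e-01 254 246 178
5.000000e-01 254 246 178 1.000000e+00 49 54 149
B 165 0 38
F 49 54 149
N 0 0 0"""

cmap_back = cpt_text_to_cmap(sample)
print(cmap_back(0.0), cmap_back(1.0))
```

For a file on disk, the text could be obtained with open(filename).read() before calling the function.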