I wrote a solution to this challenge. It handles the example case correctly, but not the actual dataset.
Challenge:
A DNA string is a reverse palindrome if it is equal to its reverse complement. For instance, GCATGC is a reverse palindrome because its reverse complement is GCATGC. For example:
5'...GCATGC...3'
3'...CGTACG...5'
Given:
A DNA string of length at most 1 kbp in FASTA format.
Return:
The position and length of every reverse palindrome in the string
having length between 4 and 12. You may return these pairs in any
order.
Sample Dataset
>Rosalind_24
TCAATGCATGCGGGTCTATATGCAT
Sample Output
4 6
5 4
6 6
7 4
17 4
18 4
20 6
21 4
It works for the sample. However, it failed on the actual dataset.
Actual Dataset:
>Rosalind_7901
ATATAGTCGGCTGTCCAGGCAATCGCGAGATGGGGAACGACATCTTGGTACTTTACGGAT
GCCAAGACTTAATATCTGGCCCGGATATGACCGCGAGCACCCCCTACTCGTCTGTCGGTT
TCGGCCGGCATGACCTGTCCTCTTGATAATAGATATAAGTTGCCAACCGCACTATTTCAA
GATCAGATGCCCCAAGGCACAAGGCACAGAAGAATCAGGTACTGAGCAAACAGCGCCCAT
TTGTCAGCGCAACTCCGAGCGACAGGCACAAGTGGTAGTAACATCTGTAGTCTACGAGCG
CGGGACCGATGTAAAAAGCAACGAGAGACGGGGCCGTCGATAGAAAAGCAATGGAGTCCA
TATGGGCACGCTGAGCGTGCCTGTACTAATTTCTATGGGCTACTGGCACTAGGGGCTTAA
GCCCTCGGTTACCGCGCTTTATGAATATAGTTTTCGTGCCAGGAGTGTCTTGTTTCGAGG
AAGCGTGAGCTACACTTAGCACGTCCGGGCTTATTGGAAATTTGTTCAGTCTGTATGCTC
CGCAATATCATGTCGGCGCTCATTCAATGTTGCGTGTAATTTAGACCTCTACTACAGCTG
GGGTTGGAGCGGTCGGTAGTAAGACGTATGATTACGGTTTACATCCCGCCGGCGGACACG
GAACGTGATTTTCAGCATTGTCCCATCGTAGGGATTGGGGCCCTAGTAGGTGTGGGTAGC
ACGTTACATGAAGCTATCCAATGGCGTATATACTCCATCCCATCGGACTAGAAGATTTGA
GGGACCCAGTCATAACTGGTGCAAAATTACGTTACAAAAGCCGAGGATACAGTATA
Actual Output:
1 4
2 4
23 6
24 4
48 4
70 4
73 4
79 4
82 4
86 4
93 4
124 6
125 4
126 6
127 4
131 4
155 4
156 4
184 4
222 4
236 4
251 4
337 4
342 4
389 4
394 4
415 4
423 4
440 4
441 4
452 4
453 4
482 4
496 4
509 4
513 4
526 6
527 4
554 4
558 4
565 4
587 4
604 6
605 4
634 4
656 10
657 8
658 6
659 4
674 4
709 6
710 4
714 4
733 4
739 4
744 4
758 8
759 4
759 6
760 4
761 4
780 4
813 4
818 4
822 4
846 4
Code:
from string import maketrans
table=maketrans('ATCG','TAGC')
protein=open('rosalind_revp.txt','r').read()[14::].strip()
for i in range(len(protein)):
    for ii in range(2,7):
        if protein[i:i+ii]==protein[i+2*ii-1:i+ii-1:-1].translate(table):
            print str(i+1),str(2*ii)
When testing the sample, the 4th line is instead:
protein=open('rosalind_revp.txt','r').read()[12::].strip()
I even checked a number of the position-length pairs by hand, and they all matched perfectly. I still don't know why the result wasn't accepted.
Could anyone let me know where I was wrong?
My GitHub repository, linked below, has a solution. I hope this works:
def reverse(l):
    # Despite the name, this builds the complement of l; rev() does the reversing.
    t=""
    for i in range(len(l)):
        if(l[i]=='A'):
            t=t+'T'
        elif(l[i]=='T'):
            t=t+'A'
        elif(l[i]=='C'):
            t=t+'G'
        elif(l[i]=='G'):
            t=t+'C'
    return t
def rev(d):
    return d[::-1]
k=input()  # FASTA header line
p=input()  # sequence, expected on a single line
for i in range(len(p)):
    for j in range(4,13):  # lengths 4 through 12 inclusive
        if (p[i:i+j]==rev(reverse(p[i:i+j])) and i+j<=len(p)):
            print(i+1, end=" ")
            print(j)
https://github.com/jssssv007/stackexcahnge
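One possible cause of the rejected result, assuming the downloaded dataset wraps the sequence over several lines as FASTA files commonly do: read()[14::].strip() only trims the ends of the text, so newline characters inside the sequence would remain, shifting positions and breaking matches that span a line break. Here is a minimal Python 3 sketch that joins the sequence lines before searching. The helper names read_fasta_sequence and reverse_palindromes are made up for illustration, and the sketch assumes a single-record file.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def read_fasta_sequence(text):
    """Return the sequence of a single-record FASTA string with line breaks removed."""
    lines = text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))

def reverse_palindromes(seq, min_len=4, max_len=12):
    """Return 1-based (position, length) pairs of reverse palindromes in seq."""
    hits = []
    for i in range(len(seq)):
        for length in range(min_len, max_len + 1):
            if i + length > len(seq):
                break  # longer windows would run past the end of the sequence
            chunk = seq[i:i + length]
            # A reverse palindrome equals the reverse of its own complement.
            if chunk == chunk.translate(COMPLEMENT)[::-1]:
                hits.append((i + 1, length))
    return hits

sample = ">Rosalind_24\nTCAATGCATGCGGGTCTATATGCAT\n"
for position, length in reverse_palindromes(read_fasta_sequence(sample)):
    print(position, length)
```

For the real file you would pass open('rosalind_revp.txt').read() to read_fasta_sequence instead of the sample string. Odd lengths are tested too, but they can never match, because the middle base would have to be its own complement.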
I am trying to generate a different random day for each row within each year group of a dataframe, so I need to sample without replacement (replace=False) inside each group.
I can't just add one column of random numbers sampled without replacement, because my list of years will have more than 365 rows, and once you hit 365 no more samples can be drawn without replacement.
I have explored agg, aggregate, apply and transform. The closest I have got is with this:
import numpy as np
import pandas as pd
years = pd.DataFrame({"year": [1,1,2,2,2,3,3,4,4,4,4]})
years["day"] = 0
grouped = years.groupby("year")["day"]
grouped.transform(lambda x: np.random.choice(366, replace=False))
Which gives this:
0 8
1 8
2 319
3 319
4 319
5 149
6 149
7 130
8 130
9 130
10 130
Name: day, dtype: int64
But I want this:
0 8
1 16
2 119
3 321
4 333
5 4
6 99
7 30
8 129
9 224
10 355
Name: day, dtype: int64
You can use your code with a minor modification. You have to specify the number of samples.
random_days = lambda x: np.random.choice(range(1, 366), len(x), replace=False)
years['day'] = years.groupby('year').transform(random_days)
Output:
>>> years
year day
0 1 18
1 1 300
2 2 154
3 2 355
4 2 311
5 3 18
6 3 14
7 4 160
8 4 304
9 4 67
10 4 6
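With NumPy (this draws without replacement across the whole frame and then shuffles within each group, so it only works when the frame has at most 366 rows):
years["day"] = np.random.choice(366, years.shape[0], replace=False)
years["day"] = years.groupby("year").transform(lambda x: np.random.permutation(x))
Output: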
print(years)
year day
0 1 233
1 1 147
2 2 1
3 2 340
4 2 267
5 3 204
6 3 256
7 4 354
8 4 94
9 4 196
10 4 164
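To check that the days really are distinct within each year, here is a small self-contained sketch of the per-group approach. It assumes NumPy's Generator API (np.random.default_rng) with a fixed seed for repeatability; the helper name sample_days is made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so that repeated runs agree
years = pd.DataFrame({"year": [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4]})

def sample_days(group):
    # Draw one distinct day (1..365) for every row of this year group.
    return rng.choice(np.arange(1, 366), size=len(group), replace=False)

years["day"] = years.groupby("year")["year"].transform(sample_days)

# No (year, day) pair should occur twice.
print(years.duplicated(["year", "day"]).any())
```

Because the draw happens once per group, the total number of rows can exceed 365 as long as no single year has more than 365 rows.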
I am trying to apply a function on a column of a dataframe.
After getting multiple results as dataframes, I want to concat them all in one.
Why does the first option work and the second not?
import numpy as np
import pandas as pd
def testdf(n):
    test = pd.DataFrame(np.random.randint(0,n*100,size=(n*3, 3)), columns=list('ABC'))
    test['index'] = n
    return test
test = pd.DataFrame({'id': [1,2,3,4]})
testapply = test['id'].apply(func = testdf)
# option 1
pd.concat([testapply[0],testapply[1],testapply[2],testapply[3]])
# option 2
pd.concat([testapply])
pd.concat expects a sequence of pandas objects, but your option 2 passes a list containing a single pd.Series (which itself holds several dataframes), so nothing is concatenated and you just get that Series back as is. To fix your second approach, use unpacking:
print(pd.concat([*testapply]))
A B C index
0 91 15 91 1
1 93 85 91 1
2 26 87 74 1
0 195 103 134 2
1 14 26 159 2
2 96 143 9 2
3 18 153 35 2
4 148 146 130 2
5 99 149 103 2
0 276 150 115 3
1 232 126 91 3
2 37 242 234 3
3 144 73 81 3
4 96 153 145 3
5 144 94 207 3
6 104 197 49 3
7 0 93 179 3
8 16 29 27 3
0 390 74 379 4
1 78 37 148 4
2 350 381 260 4
3 279 112 260 4
4 115 387 173 4
5 70 213 378 4
6 43 37 149 4
7 240 399 117 4
8 123 0 47 4
9 255 172 1 4
10 311 329 9 4
11 346 234 374 4
I have the following unique values in a dataframe column.
['1473' '1093' '1346' '1324' 'NA' '1129' '58' '847' '54' '831' '816']
I want to drop rows which have 'NA' in this column.
testData = testData[testData.BsmtUnfSF != "NA"]
and got error
TypeError: invalid type comparison
Then I tried
testData = testData[testData.BsmtUnfSF != np.NAN]
This doesn't raise an error, but it doesn't drop the rows either.
How can I solve this issue?
Here is how you can do it. Just replace column with the name of the column you want.
import pandas as pd
import numpy as np
df = pd.DataFrame({"column": [1,2,3,np.nan,6]})
df = df[np.isfinite(df['column'])]
You could use dropna
testData = testData.dropna(subset=['BsmtUnfSF'])
assuming your dataFrame:
>>> df
col1
0 1473
1 1093
2 1346
3 1324
4 NaN
5 1129
6 58
7 847
8 54
9 831
10 816
You have multiple solutions:
>>> df[pd.notnull(df['col1'])]
col1
0 1473
1 1093
2 1346
3 1324
5 1129
6 58
7 847
8 54
9 831
10 816
>>> df[df.col1.notnull()]
# df[df['col1'].notnull()]
col1
0 1473
1 1093
2 1346
3 1324
5 1129
6 58
7 847
8 54
9 831
10 816
>>> df.dropna(subset=['col1'])
col1
0 1473
1 1093
2 1346
3 1324
5 1129
6 58
7 847
8 54
9 831
10 816
>>> df.dropna()
col1
0 1473
1 1093
2 1346
3 1324
5 1129
6 58
7 847
8 54
9 831
10 816
>>> df[~df.col1.isnull()]
col1
0 1473
1 1093
2 1346
3 1324
5 1129
6 58
7 847
8 54
9 831
10 816
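The solutions above all rely on the missing entries being real NaN values. If the column actually holds the literal string 'NA', as the quoted unique values suggest (this is an assumption about the asker's data), one option is to convert the column to numbers first so that 'NA' becomes NaN. A minimal sketch with made-up sample values:

```python
import pandas as pd

# Made-up sample; the real column is assumed to hold strings, including a literal "NA".
testData = pd.DataFrame({"BsmtUnfSF": ["1473", "1093", "NA", "58"]})

# Anything that cannot be parsed as a number ("NA" here) becomes NaN.
testData["BsmtUnfSF"] = pd.to_numeric(testData["BsmtUnfSF"], errors="coerce")

# Now dropna can remove those rows.
testData = testData.dropna(subset=["BsmtUnfSF"])
print(testData)
```

Note that errors="coerce" turns every unparseable value into NaN, not only 'NA', so it is worth checking the unique values beforehand.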
I have some data that looks like the dataframe df shown below.
I am trying to calculate first the mean angle for each group using the function mean_angle. The calculated mean angle is then used to do another calculation per group using the function fun.
import pandas as pd
import numpy as np
# generate sample data
a = np.array([1,2,3,4]).repeat(4)
x1 = 90 + np.random.randint(-15, 15, size=a.size//2 - 2 )
x2 = 270 + np.random.randint(-50, 50, size=a.size//2 + 2 )
b = np.concatenate((x1, x2))
np.random.shuffle(b)
df = pd.DataFrame({'a':a, 'b':b})
The returned dataframe is printed below.
a b
0 1 295
1 1 78
2 1 280
3 1 94
4 2 308
5 2 227
6 2 96
7 2 299
8 3 248
9 3 288
10 3 81
11 3 78
12 4 103
13 4 265
14 4 309
15 4 229
My functions are mean_angle and fun:
def mean_angle(deg):
    deg = np.deg2rad(deg)
    deg = deg[~np.isnan(deg)]
    S = np.sum(np.sin(deg))
    C = np.sum(np.cos(deg))
    mu = np.arctan2(S,C)
    mu = np.rad2deg(mu)
    if mu <0:
        mu = 360 + mu
    return mu
def fun(x, mu):
    return np.where(abs(mu - x) < 45, x, np.where(x+180<360, x+180, x-180))
What I have tried:
mu = df.groupby(['a'])['b'].apply(mean_angle)
df2 = df.groupby(['a'])['b'].apply(fun, args = (mu,)) #this function should be element wise
I know it is totally wrong but I could not come up with a better way.
The desired output is something like this, where mu is the mean_angle per group:
a b c
0 1 295 np.where(abs(mu - 295) < 45, 295, np.where(295 +180<360, 295 +180, 295 -180))
1 1 78 np.where(abs(mu - 78) < 45, 78, np.where(78 +180<360, 78 +180, 78 -180))
2 1 280 np.where(abs(mu - 280) < 45, 280, np.where(280 +180<360, 280 +180, 280 -180))
3 1 94 ...
4 2 308 ...
5 2 227 .
6 2 96 .
7 2 299 .
8 3 248 .
9 3 288 .
10 3 81 .
11 3 78 .
12 4 103 .
13 4 265 .
14 4 309 .
15 4 229 .
Any help is appreciated
You don't need your second function, just pass the necessary columns to np.where(). So creating your dataframe in the same manner and not modifying your mean_angle function, we have the following sample dataframe:
a b
0 1 228
1 1 291
2 1 84
3 1 226
4 2 266
5 2 311
6 2 82
7 2 274
8 3 79
9 3 250
10 3 222
11 3 88
12 4 80
13 4 291
14 4 100
15 4 293
Then create your c column (containing your mu values) using groupby() and transform(), and finally apply your np.where() logic:
df['c'] = df.groupby(['a'])['b'].transform(mean_angle)
df['c'] = np.where(abs(df['c'] - df['b']) < 45, df['b'], np.where(df['b']+180<360, df['b']+180, df['b']-180))
Yields:
a b c
0 1 228 228
1 1 291 111
2 1 84 264
3 1 226 226
4 2 266 266
5 2 311 311
6 2 82 262
7 2 274 274
8 3 79 259
9 3 250 70
10 3 222 42
11 3 88 268
12 4 80 260
13 4 291 111
14 4 100 280
15 4 293 113
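For a version that can be checked by hand, here is a self-contained sketch of the same transform-then-np.where idea on a tiny made-up dataframe. The per-group mean is kept in a separate column named mu (a name chosen here for illustration) instead of being overwritten, and mean_angle is condensed but intended to behave like the question's function.

```python
import numpy as np
import pandas as pd

def mean_angle(deg):
    # Circular mean of angles given in degrees, returned in the range [0, 360).
    rad = np.deg2rad(np.asarray(deg, dtype=float))
    rad = rad[~np.isnan(rad)]
    mu = np.rad2deg(np.arctan2(np.sin(rad).sum(), np.cos(rad).sum()))
    return mu + 360 if mu < 0 else mu

# Made-up angles: group 1 clusters around 80 degrees, group 2 around 280 degrees.
df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2],
                   "b": [80, 100, 280, 270, 280, 90]})

# transform broadcasts each group's mean angle back onto the rows of that group.
df["mu"] = df.groupby("a")["b"].transform(mean_angle)

# Keep angles within 45 degrees of the group mean, otherwise rotate by 180 degrees.
df["c"] = np.where((df["mu"] - df["b"]).abs() < 45,
                   df["b"],
                   np.where(df["b"] + 180 < 360, df["b"] + 180, df["b"] - 180))
print(df)
```

In group 1 the mean angle is 80, so 280 is rotated to 100. In group 2 the mean angle is 280, so 90 is rotated to 270.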
I have a DataFrame whose header I need to split into multiple header rows for the same DataFrame.
My DataFrame looks as follows:
gene ALL_ID_1 AML_ID_1 AML_ID_2 AML_ID_3 AML_ID_4 AML_ID_5 Stroma_ID_1 Stroma_ID_2 Stroma_ID_3 Stroma_ID_4 Stroma_ID_5 Stroma_CR_Pat_4 Stroma_CR_Pat_5 Stroma_CR_Pat_6 Stroma_CR_Pat_7 Stroma_CR_Pat_8
ENSG 8 1 11 5 10 0 628 542 767 578 462 680 513 968 415 623
ENSG 0 0 1 0 0 0 0 28 1 3 0 1 4 0 0 0
ENSG 661 1418 2580 6817 14727 5968 9 3 5 9 2 9 3 3 5 1
ENSG 20 315 212 8 790 471 1283 2042 1175 2839 1110 857 1880 1526 2262 2624
ENSG 11 26 24 9 11 2 649 532 953 463 468 878 587 245 722 484
And I want the above header to be split as follows:
network ID ID REL
node B_ALL AML Stroma
hemi 1 1 2 3 4 5 1 2 3 4 5 6 7 8 9 10
ENSG 8 1 11 5 10 0 628 542 767 578 462 680 513 968 415 623
ENSG 0 0 1 0 0 0 0 28 1 3 0 1 4 0 0 0
ENSG 661 1418 2580 6817 14727 5968 9 3 5 9 2 9 3 3 5 1
ENSG 20 315 212 8 790 471 1283 2042 1175 2839 1110 857 1880 1526 2262 2624
ENSG 11 26 24 9 11 2 649 532 953 463 468 878 587 245 722 484
Any help is greatly appreciated.
This is probably not the best minimal example, since very few people have the subject knowledge to understand what network, node and hemi are in your context.
You just need to create your MultiIndex and replace your column index with the one you created:
There are 3 rules in your example:
1, whenever 'Stroma' is found, the column belongs to REL, otherwise belongs to ID.
2, node is the first field of the initial column names
3, hemi is the last field of the initial column names
Then, just code away:
In [110]:
df.columns = pd.MultiIndex.from_tuples(zip(np.where(df.columns.str.find('Stroma')!=-1, 'REL', 'ID'),
                                           df.columns.map(lambda x: x.split('_')[0]),
                                           df.columns.map(lambda x: x.split('_')[-1])),
                                       names=['network', 'node', 'hemi'])
print(df)
network ID REL \
node ALL AML Stroma
hemi 1 1 2 3 4 5 1 2 3 4 5
gene
ENSG 8 1 11 5 10 0 628 542 767 578 462
ENSG 0 0 1 0 0 0 0 28 1 3 0
ENSG 661 1418 2580 6817 14727 5968 9 3 5 9 2
ENSG 20 315 212 8 790 471 1283 2042 1175 2839 1110
ENSG 11 26 24 9 11 2 649 532 953 463 468
network
node
hemi 4 5 6 7 8
gene
ENSG 680 513 968 415 623
ENSG 1 4 0 0 0
ENSG 9 3 3 5 1
ENSG 857 1880 1526 2262 2624
ENSG 878 587 245 722 484
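The same three rules can also be written as a plain list of tuples, which may be easier to read and to adapt. This is a sketch on a made-up subset of the columns, assuming gene is the index as in the output above.

```python
import pandas as pd

# Made-up subset of the question's columns, with gene as the index.
cols = ["ALL_ID_1", "AML_ID_1", "AML_ID_2", "Stroma_ID_1", "Stroma_CR_Pat_4"]
df = pd.DataFrame([[8, 1, 11, 628, 680], [0, 0, 1, 0, 1]],
                  index=pd.Index(["ENSG", "ENSG"], name="gene"),
                  columns=cols)

# Rule 1: 'Stroma' columns go to REL, the rest to ID.
# Rule 2: node is the first field of the name; rule 3: hemi is the last field.
tuples = [
    ("REL" if "Stroma" in name else "ID", name.split("_")[0], name.split("_")[-1])
    for name in df.columns
]
df.columns = pd.MultiIndex.from_tuples(tuples, names=["network", "node", "hemi"])
print(df)
```

A single column can then be selected with its full key, for example df[("REL", "Stroma", "4")].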