Indexing in Polybius cipher produces an error in Python

I'm making a Polybius cipher, so I made a conversion table using the keyword tomato:
import numpy as np
import pandas as pd

alp = "abcdefghijklmnopqrstuvwxyz0123456789"
s = str(input("keyword: "))
for i in s:
    alp = alp.replace(i,"")
s2 = "".join(dict.fromkeys(s))  # keyword letters, duplicates removed, order kept
jaa = s2+alp
x = list(jaa)
array = np.array(x)
re = np.reshape(array,(6,6))
dt = pd.DataFrame(re)
dt.columns = [1,2,3,4,5,6]
dt.index = [1,2,3,4,5,6]
dt
1 2 3 4 5 6
1 t o m a b c
2 d e f g h i
3 j k l n p q
4 r s u v w x
5 y z 0 1 2 3
6 4 5 6 7 8 9
I want to decode poly with this code:
poly = '25 34 14 12 35 22 43 21 25 34 24 33 51 23 12 25 13 34 22'
a = poly.split(" ")
for i in range (len(a)):
    hur = a[i]
    w = dt._get_value(hur[0],hur[1])
    print(w)
But I get KeyError: '5'. When I call it with (2,5) directly, the output is correct, but it fails when I use the indexing. Which part is missing?

It's because hur[0] and hur[1] are strings, not integers, while the index and columns of dt are the integers 1 to 6.
You need to do:
for hur in a:
    w = dt._get_value(int(hur[0]),int(hur[1]))
    print(w, end="") # end="" will print it as one text instead of over multiple lines
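Note that your poly has a double space, which makes split(" ") produce an empty string and breaks the loop; split() with no argument splits on any run of whitespace and avoids this.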
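A minimal sketch combining both fixes: it rebuilds the question's "tomato" table without input(), looks cells up with DataFrame.at (pandas' public accessor for a single value by row and column label) instead of the private _get_value, and splits with split(). The helper names polybius_table and decode are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

def polybius_table(keyword: str) -> pd.DataFrame:
    """Build the 6x6 table as in the question: keyword letters first, then the rest."""
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    key = "".join(dict.fromkeys(keyword))  # deduplicate, keep first-seen order
    rest = "".join(ch for ch in alphabet if ch not in key)
    grid = np.array(list(key + rest)).reshape(6, 6)
    # integer labels 1..6 for rows and columns, matching the question's table
    return pd.DataFrame(grid, index=range(1, 7), columns=range(1, 7))

def decode(poly: str, table: pd.DataFrame) -> str:
    # split() with no argument ignores repeated spaces, so "25  34" is safe;
    # int() converts each digit so it matches the integer labels
    return "".join(table.at[int(pair[0]), int(pair[1])] for pair in poly.split())

dt = polybius_table("tomato")
print(decode("25 34 14 12 35 22 43 21 25 34 24 33 51 23 12 25 13 34 22", dt))
```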

Related

Pandas: Run the same code for every combination of several parameters, each with multiple values (output is a DataFrame)

Let's say we have the code below. Currently, the two parameters are initialized from user input, and the output is a DataFrame.
What we want:
Use a function to create a DataFrame with all combinations of X and Y (say X and Y have 4 input values each), then
join the output DataFrame df for each combination to get the desired output DataFrame.
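import pandas as pd

X = float(input("Enter the value of X: "))
Y = float(input("Enter the value of Y: "))
A = X*Y
B = X*(Y**2)  # ** is exponentiation; ^ is bitwise XOR in Python
df = pd.DataFrame({"X": [X], "Y": [Y], "A": [A], "B": [B]})  # scalars must be wrapped in lists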
Desired output
X Y A B
1 2 2 4
1 4 4 16
1 6 6 36
1 8 8 64
2 2 4 8
2 4 8 32
2 6 12 72
2 8 16 128
3 2 6 12
3 4 12 48
3 6 18 108
3 8 24 192
4 2 8 16
4 4 16 64
4 6 24 144
4 8 32 256
Is this what you were looking for?
import pandas as pd

def so_help():
    x = input('Please enter all X values separated by a comma(,)')
    y = input('Please enter all Y values separated by a comma(,)')
    #In case anyone gets comma happy
    x = x.strip(',')
    y = y.strip(',')
    x_list = x.split(',')
    y_list = y.split(',')
    df_x = pd.DataFrame({'X' : x_list})
    df_y = pd.DataFrame({'Y' : y_list})
    df_cross = pd.merge(df_x, df_y, how = 'cross')  # how='cross' needs pandas >= 1.2
    df_cross['X'] = df_cross['X'].astype(int)
    df_cross['Y'] = df_cross['Y'].astype(int)
    df_cross['A'] = df_cross['X'].mul(df_cross['Y'])
    df_cross['B'] = df_cross['X'].mul(df_cross['Y'].pow(2))
    return df_cross

so_help()
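The answer above reads the values interactively. If the values are already in lists, a sketch of the same cross product using itertools.product instead of merge(how='cross'); the function name combinations_frame is hypothetical. itertools.product varies its last argument fastest, which matches the row order of the desired output.

```python
import itertools
import pandas as pd

def combinations_frame(x_values, y_values) -> pd.DataFrame:
    # every (x, y) pair; x changes slowest, y fastest
    rows = list(itertools.product(x_values, y_values))
    df = pd.DataFrame(rows, columns=["X", "Y"])
    df["A"] = df["X"] * df["Y"]
    df["B"] = df["X"] * df["Y"] ** 2
    return df

print(combinations_frame([1, 2, 3, 4], [2, 4, 6, 8]))
```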

How to print the row and column of a value you're looking for in a DataFrame

So I made this DataFrame:
import numpy as np
import pandas as pd

alp = "abcdefghijklmnopqrstuvwxyz0123456789"
s = "carl"
for i in s:
    alp = alp.replace(i,"")
jaa = s+alp
x = list(jaa)
array = np.array(x)
re = np.reshape(array,(6,6))
dt = pd.DataFrame(re)
dt.columns = [1,2,3,4,5,6]
dt.index = [1,2,3,4,5,6]
dt
1 2 3 4 5 6
1 c a r l b d
2 e f g h i j
3 k m n o p q
4 s t u v w x
5 y z 0 1 2 3
6 4 5 6 7 8 9
I want to search for a value and print its row (index) and column.
For example, for 'h' the output I want is 2,4.
Is there any way to get that output?
row, col = np.where(dt == "h")
print(dt.index[row[0]], dt.columns[col[0]])
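np.where scans the whole frame on every lookup. If you need the positions of many characters (for example, to encode a whole message), a sketch that builds a reverse lookup once. It assumes every cell value is unique, which holds for a Polybius square; positions and encoded are illustrative names.

```python
import numpy as np
import pandas as pd

alp = "abcdefghijklmnopqrstuvwxyz0123456789"
s = "carl"
grid = np.array(list(s + "".join(ch for ch in alp if ch not in s))).reshape(6, 6)
dt = pd.DataFrame(grid, index=range(1, 7), columns=range(1, 7))

# character -> (row label, column label), built once for the whole table
positions = {dt.at[r, c]: (r, c) for r in dt.index for c in dt.columns}

row, col = positions["h"]
print(row, col)

# encode a whole word with the same lookup
encoded = " ".join(f"{r}{c}" for r, c in (positions[ch] for ch in "hello"))
print(encoded)
```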

How to combine some rows into a single row

Sorry, I should have deleted the old question and created a new one.
I have a dataframe with two columns. The df looks as follows:
Word Tag
0 Asam O
1 instruksi O
2 - O
3 instruksi X
4 bahasa Y
5 Instruksi P
6 - O
7 instruksi O
8 sebuah Q
9 satuan K
10 - L
11 satuan O
12 meja W
13 Tiap Q
14 - O
15 tiap O
16 karakter P
17 - O
18 ke O
19 - O
20 karakter O
I'd like to merge the rows around each dash (-) into a single row, so the output should be the following:
Word Tag
0 Asam O
1 instruksi-instruksi O
2 bahasa Y
3 Instruksi-instruksi P
4 sebuah Q
5 satuan-satuan K
6 meja W
7 Tiap-tiap Q
8 karakter-ke-karakter P
Any ideas? Thanks in advance. I tried the answer from Jacob K and it works, but then I found that my dataset sometimes has more than one - row inside a single compound. I have added the expected output for that case (index 8 above).
Solution from Jacob K:
# Import packages
import pandas as pd
import numpy as np
# Get 'Word' and 'Tag' columns as numpy arrays (for easy indexing)
words = df.Word.to_numpy()
tags = df.Tag.to_numpy()
# Create empty lists for new columns in output dataframe
newWords = []
newTags = []
# Use while (rather than for loop) since index i can change dynamically
i = 0 # To not cause any issues with i-1 index
while (i < words.shape[0] - 1):
    if (words[i] == "-"):
        # Concatenate the strings above and below the "-"
        newWords.append(words[i-1] + "-" + words[i+1])
        newTags.append(tags[i-1])
        i += 2 # Don't repeat any concatenated values
    else:
        if (words[i+1] != "-"):
            # If there is no "-" next, append the regular word and tag values
            newWords.append(words[i])
            newTags.append(tags[i])
        i += 1 # Increment normally
# Create output dataframe output_df
d2 = {'Word': newWords, 'Tag': newTags}
output_df = pd.DataFrame(data=d2)
My approach with GroupBy.agg:
#df['Word'] = df['Word'].str.replace(' ', '') #if necessary
blocks = df['Word'].shift().ne('-').mul(df['Word'].ne('-')).cumsum()
new_df = df.groupby(blocks, as_index=False).agg({'Word' : ''.join, 'Tag' : 'first'})
print(new_df)
Output
Word Tag
0 Asam O
1 instruksi-instruksi O
2 bahasa Y
3 Instruksi-instruksi P
4 sebuah Q
5 satuan-satuan K
6 meja W
7 Tiap-tiap Q
8 karakter-ke-karakter P
Blocks (Detail)
print(blocks)
0 1
1 2
2 2
3 2
4 3
5 4
6 4
7 4
8 5
9 6
10 6
11 6
12 7
13 8
14 8
15 8
16 9
17 9
18 9
19 9
20 9
Name: Word, dtype: int64
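A self-contained sketch of the same grouping rule, with the sample data typed in from the question: a row starts a new group only when neither it nor the row above it is "-", so a chain such as karakter - ke - karakter stays in one group. Two choices here are made for this sketch, not taken from the answer: reset_index(drop=True) replaces as_index=False to get a 0-based index, and the group labels are renamed so they cannot clash with the Word column.

```python
import pandas as pd

# Sample data typed in from the question
df = pd.DataFrame({
    "Word": ["Asam", "instruksi", "-", "instruksi", "bahasa", "Instruksi", "-",
             "instruksi", "sebuah", "satuan", "-", "satuan", "meja", "Tiap", "-",
             "tiap", "karakter", "-", "ke", "-", "karakter"],
    "Tag": ["O", "O", "O", "X", "Y", "P", "O", "O", "Q", "K", "L",
            "O", "W", "Q", "O", "O", "P", "O", "O", "O", "O"],
})

is_dash = df["Word"].eq("-")
# a row opens a new group only if neither it nor the previous row is "-";
# the first row has no predecessor, so shift() fills that slot with False
starts_group = ~is_dash & ~is_dash.shift(fill_value=False)
blocks = starts_group.cumsum().rename("block")

new_df = (
    df.groupby(blocks)
      .agg({"Word": "".join, "Tag": "first"})
      .reset_index(drop=True)
)
print(new_df)
```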
This is a loop version:
import pandas as pd
# import data
DF = pd.read_csv("table.csv")
# collect output rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []
# iterate through rows
for i in range(len(DF)-1):
    # index of the previous row (special case: the first row has no predecessor)
    prev = i-1
    if (prev < 0):
        prev = 0
    # copy the row if neither it, the previous row, nor the next row is '-'
    if (DF.loc[i+1, 'Word'] != '-'):
        if (DF.loc[i, 'Word'] != '-' and DF.loc[prev, 'Word'] != '-'):
            rows.append({'Tag': DF.loc[i, 'Tag'], 'Word': DF.loc[i, 'Word']})
    # join the three rows if the middle one is '-'
    else:
        rows.append({'Tag': DF.loc[i, 'Tag'], 'Word': DF.loc[i, 'Word']+DF.loc[i+1, 'Word']+DF.loc[i+2, 'Word']})
newDF = pd.DataFrame(rows)

Remove part of a string with coordinates in python

Hello, I have a tuple of tuples such as:
indexes_to_delete=((6,9),(20,22),(2,4))
and a sequence that I can open using Biopython:
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Using indexes_to_delete, I would like to remove positions 6 to 9, 20 to 22, and 2 to 4 (numbered from 1, both ends included).
Numbering the letters from 1, the sequence is:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
so after removing those positions, new_sequence should be:
A E J K L M N O P Q R S W X Y Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26
indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
s = ''.join(ch for i, ch in enumerate(Sequence1, 1) if not any(a <= i <= b for a, b in indexes_to_delete))
print(s)
Prints:
AEJKLMNOPQRSWXYZ
Here is another approach using several modules (intspan is a third-party package from PyPI).
from string import ascii_uppercase
from intspan import intspan
from operator import itemgetter
indexes_to_delete=((6,9),(20,22),(2,4))
# add dummy 'a' so count begins with 1 for uppercase letters
array = ['a'] + list(ascii_uppercase)
indexes_to_keep = intspan.from_ranges(indexes_to_delete).complement(low = 1, high=26)
slice_of = itemgetter(*indexes_to_keep)
print(' '.join(slice_of(array)))
print(' '.join(map(str,indexes_to_keep)))
Prints:
A E J K L M N O P Q R S W X Y Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26
def delete_indexes(sequence, indexes_to_delete):
    # first convert the sequence to a dictionary keyed by 1-based position
    seq_dict = {i+1: sequence[i] for i in range(len(sequence))}
    # collect all the keys that need to be removed
    keys_to_delete = []
    for index_range in indexes_to_delete:
        start, end = index_range
        keys_to_delete += range(start, end+1)
    if not keys_to_delete:
        return seq_dict
    # remove the keys from the original dictionary
    # (pop with a default so overlapping ranges don't raise KeyError)
    for key in keys_to_delete:
        seq_dict.pop(key, None)
    return seq_dict
You can use this function to get the new sequence.
new_sequence = delete_indexes(Sequence1, indexes_to_delete)
Of course, new_sequence is still a Python dictionary. You can convert it to a list, a str, or whatever you need. For example, to convert it into a str like the original Sequence1:
print(''.join(list(new_sequence.values())))
Prints:
AEJKLMNOPQRSWXYZ
You can get their coordinates using new_sequence.keys().
A bit more readable version:
indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
newSequence1 = ""
for idx, char in enumerate(Sequence1):
    for startIndex, endIndex in indexes_to_delete:
        if startIndex <= idx+1 <= endIndex:
            break
    else:
        newSequence1 += char
print(newSequence1)
Prints: AEJKLMNOPQRSWXYZ
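The answers above test every character against every range. For long sequences, a sketch that sorts the ranges and copies only the slices between them. remove_ranges is a hypothetical name; it assumes 1-based, inclusive coordinates as in the question and a plain str (convert a Biopython Seq with str() first).

```python
def remove_ranges(seq, ranges):
    """Return seq without the 1-based, inclusive (start, end) ranges."""
    pieces = []
    pos = 0  # 0-based index of the first character not yet copied or skipped
    for start, end in sorted(ranges):
        # keep the stretch before this range (empty if ranges overlap)
        pieces.append(seq[pos:start - 1])
        # a 1-based inclusive end equals the 0-based exclusive slice end
        pos = max(pos, end)
    pieces.append(seq[pos:])
    return "".join(pieces)

indexes_to_delete = ((6, 9), (20, 22), (2, 4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(remove_ranges(Sequence1, indexes_to_delete))
```

Sorting the ranges once and taking max() of the positions means overlapping or unordered ranges are handled without a per-character check.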

Extracting specific elements from a list of strings and creating a new list?

I am a beginner in Python.
This is my issue: I have a list as below:
lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC', 'UCCCUGAGACCUCAAGUGUGA',
'ACACCUGGGCUCUCCGGGUACC', 'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA',
'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC', 'AGGCAGUGUGGUUAGCUGGUUG',
'ACGGCUACCUUCACUGCCACCC']
Now I need to take the first letter of each of the 10 elements in lst, join them, and put the result in a new list; then do the same with the second letter, the third letter, and so on, until the last letter of every element has been used. The output has to look like this:
new_lst = ['UCUACUCUAA', 'GUCCAGAAGC', 'AACAUGUUGG', 'GUCCAACCCG', 'GGUCCAAAAC',
'UCGUUUACGU', 'AAAGUGAAUA', 'GAGGCUGGGC', 'UUAGCACCUC', 'AUCCUAGCGU', ..., 'C']
I tried this code:
new_lst = []
new_lst.append(''.join([x[i] for x in lst]))
The above code prints only the first 10 elements in the new_list because the index is from 0 to 9 (I misunderstood what index means).
Then I did the following
final= []
for j in range(1,len(lst),1):
    new_lst = []
    for x in lst:
        c = len(x)
        for i in range(1,c,1):
            while (i<len(x)):
                new_lst.append(x[i])
            else:
                new_lst.append("")
    final.append([new_lst])
print final
When I execute this code, it raises a MemoryError. I checked the lengths because the elements in lst are not all the same length, and when I used a different version of the code it raised IndexError: string index out of range.
I first wanted to dissect the code, so I just used the following code:
lst2 = []
for x in lst:
    c = len (x)
    print c
    for i in range(0,c,1):
        print i,
        print x[i],
I got the following output:
22
0 U 1 G 2 A 3 G 4 G 5 U 6 A 7 G 8 U 9 A 10 G 11 G 12 U 13 U 14 G 15 U 16 A 17 U 18 A 19 G 20 U 21 U 22
0 C 1 U 2 A 3 U 4 G 5 C 6 A 7 A 8 U 9 U 10 U 11 U 12 C 13 U 14 A 15 C 16 C 17 U 18 U 19 A 20 C 21 C 21
0 U 1 C 2 C 3 C 4 U 5 G 6 A 7 G 8 A 9 C 10 C 11 U 12 C 13 A 14 A 15 G 16 U 17 G 18 U 19 G 20 A 22
0 A 1 C 2 A 3 C 4 C 5 U 6 G 7 G 8 G 9 C 10 U 11 C 12 U 13 C 14 C 15 G 16 G 17 G 18 U 19 A 20 C 21 C 22
0 C 1 A 2 U 3 A 4 C 5 U 6 U 7 C 8 C 9 U 10 U 11 A 12 C 13 A 14 U 15 G 16 C 17 C 18 C 19 A 20 U 21 A 21
0 U 1 G 2 G 3 A 4 A 5 U 6 G 7 U 8 A 9 A 10 A 11 G 12 A 13 A 14 G 15 U 16 A 17 U 18 G 19 U 20 A 22
0 C 1 A 2 U 3 C 4 A 5 A 6 A 7 G 8 C 9 G 10 G 11 U 12 G 13 G 14 U 15 U 16 G 17 A 18 U 19 G 20 U 21 G 23
0 U 1 A 2 U 3 C 4 A 5 C 6 A 7 G 8 C 9 C 10 A 11 G 12 C 13 U 14 U 15 U 16 G 17 A 18 U 19 G 20 U 21 G 22 C 22
0 A 1 G 2 G 3 C 4 A 5 G 6 U 7 G 8 U 9 G 10 G 11 U 12 U 13 A 14 G 15 C 16 U 17 G 18 G 19 U 20 U 21 G 22
0 A 1 C 2 G 3 G 4 C 5 U 6 A 7 C 8 C 9 U 10 U 11 C 12 A 13 C 14 U 15 G 16 C 17 C 18 A 19 C 20 C 21 C
As you can see above, the loop walks through every character of the first element before moving on: after extracting the first character of the first element in lst, it goes to the second character of the first element. Instead, I want it to go next to the first character of the second element in lst. Also, since the elements in the list have unequal lengths, is there a way to avoid IndexError: string index out of range?
I guess I am missing something obvious; sorry for being naive. If you could suggest different methods to accomplish this, that would be awesome. I read online about using arrays from numpy, but is there a way to do this without numpy?
You can use itertools.zip_longest:
import itertools
[''.join(chars) for chars in itertools.zip_longest(*lst,fillvalue = '')]
output:
['UCUACUCUAA', 'GUCCAGAAGC', 'AACAUGUUGG', 'GUCCAACCCG', 'GGUCCAAAAC', 'UCGUUUACGU', 'AAAGUGAAUA', 'GAGGCUGGGC', 'UUAGCACCUC', 'AUCCUAGCGU', 'GUCUUAGAGU', 'GUUCAGUGUC', 'UCCUCAGCUA', 'UUACAAGUAC', 'GAACUGUUGU', 'UCGGGUUUCG', 'ACUGCAGGUC', 'UUGGCUAAGC', 'AUUUCGUUGA', 'GAGAAUGGUC', 'UCACUAUUUC', 'UCCAGGGC', 'C']
The built-in zip() as well as the itertools function zip_longest() in Python 3 (or, in Python 2, the itertools functions izip() and izip_longest()) are the tools of choice when you want to process two or more iterables (such as lists, strings, or generators) in parallel. To see the difference between zip() and zip_longest(), consider the following:
import itertools

for chars in zip('ABCD','EFG','HI'):
    print(chars)
print('')
for chars in itertools.zip_longest('ABCD','EFG','HI',fillvalue = ''):
    print(chars)
Output:
('A', 'E', 'H')
('B', 'F', 'I')
('A', 'E', 'H')
('B', 'F', 'I')
('C', 'G', '')
('D', '', '')
The first tuple produced is the tuple of the first elements, the second tuple is the tuple of the second elements, and so on. zip (or izip) stops as soon as the shortest iterable is exhausted. In this case it can't return a tuple of the third characters, since the 3rd input to zip lacks a third character. zip_longest() (or izip_longest()) instead uses fillvalue to take the place of missing items in the shorter iterables once they are exhausted. Here I used the empty string, since it disappears when the tuples are joined with ''.
In the above code I hardwired in 3 strings to zip_longest(). For your problem, you would have to explicitly enter 10 inputs, which would be tedious in the extreme, or use the unpacking operator *. If I have a list:
strings = ['ABCD','EFG', 'HI']
Then
for chars in itertools.zip_longest(*strings, fillvalue = ''):
is equivalent to
for chars in itertools.zip_longest('ABCD','EFG','HI',fillvalue = ''):
Alternatively, iterate through the indices of the longest string, skipping strings that are too short:
lst = ['UGAGGUAGUAGGUUGUAUAGUU', 'CUAUGCAAUUUUCUACCUUACC',
'UCCCUGAGACCUCAAGUGUGA', 'ACACCUGGGCUCUCCGGGUACC',
'CAUACUUCCUUACAUGCCCAUA', 'UGGAAUGUAAAGAAGUAUGUA',
'CAUCAAAGCGGUGGUUGAUGUG', 'UAUCACAGCCAGCUUUGAUGUGC',
'AGGCAGUGUGGUUAGCUGGUUG', 'ACGGCUACCUUCACUGCCACCC']
max_len = max(len(x) for x in lst) # length of the longest string
new_lst = [ ''.join(x[i] for x in lst if i < len(x)) for i in range(max_len)]
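Both answers produce the same columns. A sketch wrapping them as functions (hypothetical names columns_by_index and columns_by_zip) so they can be compared on a small input; note that max() on an empty sequence raises ValueError, so the index version passes default=0.

```python
from itertools import zip_longest

def columns_by_index(strings):
    # default=0 keeps max() from raising ValueError on an empty list
    max_len = max((len(s) for s in strings), default=0)
    # for each position i, join the i-th character of every string long enough to have one
    return ["".join(s[i] for s in strings if i < len(s)) for i in range(max_len)]

def columns_by_zip(strings):
    # zip_longest pads shorter strings with "", which vanishes in the join
    return ["".join(chars) for chars in zip_longest(*strings, fillvalue="")]

sample = ["ABCD", "EFG", "HI"]
print(columns_by_index(sample))
print(columns_by_zip(sample))
```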
