Make a list / table with two for loops in Python

I made a program, and its output looks like this:
A Alanina
B Ácido aspártico ou Asparagina
C Cisteína
D Ácido aspártico
E Ácido glutâmico
F Fenilalanina
G Glicina
H Histidina
I Isoleucina
J Leucina (L) ou Isoleucina
K Lisina
L Leucina
M Metionina
N Asparagina
O Pirrolisina
P Prolina
Q Glutamina
R Arginina
S Serina
T Treonina
U Selenocisteína
V Valina
W Triptofano
X qualquer
Y Tirosina
33
0
4
26
32
14
38
14
26
0
25
36
15
16
0
19
15
16
14
20
0
32
0
11
But I want these numbers next to the letter and name columns, in a list that looks like this:
-A ------ Alanina -------- number of times A appears
-B ------ Aspartic acid or asparagine -------- number of times B appears
The data comes from an e.coli.fasta.txt file:
>sp|A1AA21|PEPT_ECOK1 Peptidase T OS=Escherichia coli O1:K1 / APEC OX=405955 GN=pepT PE=3 SV=1
MDKLLERFLNYVSLDTQSKAGVRQVPSTEGQWKLLHLLKEQLEEMGLINVTLSEKGTLMA
TLPANVPGDIPAIGFISHVDTSPDCSGKNVNPQIVENYRGGDIALGIGDEVLSPVMFPVL
HQLLGQTLITTDGKTLLGADDKAGIAEIMTALAVLQQKNIPHGDIRVAFTPDEEVGKGAK
HFDVDAFDARWAYTVDGGGVGELEFENFNAASVNIKIVGNNVHPGTAKGVMVNALSLAAR
IHAEVPADESPEMTEGYEGFYHLASMKGTVERADMHYIIRDFDRKQFEARKRKMMEIAKK
VGKGLHPDCYIELVIEDSYYNMREKVVEHPHILDIAQQAMRDCDIEPELKPIRGGTDGAQ
LSFMGLPCPNLFTGGYNYHGKHEFVTLEGMEKAVQVIVRIAELTAQRK
And this is the program code:
f = open('e.coli.fasta.txt','r')
sequencia = f.readlines()
amino = []  # to put the file into a list containing only the text of interest
for linha in sequencia:
    if linha.find('>') != 0:
        amino.append(linha)
tfasta= "".join(amino)
aminoacidos = {}
aminoacidos = {'A':'Alanina','B':'Ácido aspártico ou Asparagina','C':'Cisteína', 'D':'Ácido aspártico','E':'Ácido glutâmico','F':'Fenilalanina','G':'Glicina','H':'Histidina','I':'Isoleucina','J':'Leucina (L) ou Isoleucina','K':'Lisina','L':'Leucina','M':'Metionina','N':'Asparagina','O':'Pirrolisina','P':'Prolina','Q':'Glutamina','R':'Arginina','S':'Serina','T':'Treonina','U':'Selenocisteína','V':'Valina','W':'Triptofano','X':'qualquer','Y':'Tirosina'}
def ocorrencias(string):
    result = {}
    chaves = 'ABCDEFGHIJKLMNOPQRSTUVXY'
    for i in chaves:
        result[i] = tfasta.count(i)
    return result
ocor = (ocorrencias(tfasta))
with open ('PeptidadeT-aminoacidos','w') as p:
    for i in range(65,90):
        a = ('%s' % (chr(i)))
        p.write('{:4s}\t{:5s}\n'.format(a,(aminoacidos[a])))
    for e in ocor.values():
        p.write('{}\n'.format(e))

The variable ocor is a Python data type called a dictionary [1]. In your code it is composed of {key: value} = {amino-acid letter: number of occurrences}. You can use the amino-acid letter to get its number of occurrences, like this: ocor['A'] returns 33.
# read the FASTA file; the with block closes it afterwards
with open('e.coli.fasta.txt','r') as f:
    sequencia = f.readlines()
amino = []  # to put the file into a list containing only the text of interest
for linha in sequencia:
    if linha.find('>') != 0:  # skip the '>' header line
        amino.append(linha)
tfasta = "".join(amino)
aminoacidos = {'A':'Alanina','B':'Ácido aspártico ou Asparagina','C':'Cisteína', 'D':'Ácido aspártico','E':'Ácido glutâmico','F':'Fenilalanina','G':'Glicina','H':'Histidina','I':'Isoleucina','J':'Leucina (L) ou Isoleucina','K':'Lisina','L':'Leucina','M':'Metionina','N':'Asparagina','O':'Pirrolisina','P':'Prolina','Q':'Glutamina','R':'Arginina','S':'Serina','T':'Treonina','U':'Selenocisteína','V':'Valina','W':'Triptofano','X':'qualquer','Y':'Tirosina'}
def ocorrencias(string):
    result = {}
    chaves = 'ABCDEFGHIJKLMNOPQRSTUVWXY'
    for i in chaves:
        result[i] = string.count(i)  # number of times letter i appears in the sequence
    return result
ocor = ocorrencias(tfasta)
with open('PeptidadeT-aminoacidos','w') as p:
    for i in range(65,90):  # chr(65)..chr(89) is 'A'..'Y'
        a = chr(i)
        # one line per amino acid: letter, name, and number of occurrences
        p.write('-{:4s}------{:5s}------{}\n'.format(a, aminoacidos[a], ocor[a]))
Just a side note: the letter W was missing from chaves, so I added it to prevent a KeyError. If that is not wanted, you can wrap p.write in a try/except clause instead.
chaves = 'ABCDEFGHIJKLMNOPQRSTUVWXY'
[1] https://realpython.com/python-dicts/
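If you would rather not list the letters by hand, here is a minimal sketch using collections.Counter. It counts every residue in one pass, and because a Counter returns 0 for missing keys, it avoids the KeyError for a letter like W without needing a try/except. The function names count_residues and format_table and the shortened example record are hypothetical. The line layout mirrors the one asked for in the question.

```python
from collections import Counter


def count_residues(fasta_text):
    """Count each residue letter, skipping FASTA header lines that start with '>'."""
    sequence = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    )
    return Counter(sequence)


def format_table(counts, names):
    """Return one '-LETTER ------ name ------ count' line per amino acid, in letter order."""
    rows = []
    for letter in sorted(names):
        # a Counter returns 0 for absent keys, so a letter that never occurs raises no KeyError
        rows.append(f"-{letter} ------ {names[letter]} ------ {counts[letter]}")
    return "\n".join(rows)


# Usage with a shortened, hypothetical record and a subset of the aminoacidos dictionary
example = ">sp|header\nMDKLLE\nRFLNYV\n"
names = {"D": "Ácido aspártico", "K": "Lisina", "L": "Leucina", "W": "Triptofano"}
print(format_table(count_residues(example), names))
```

With the real file, you could pass the file contents (`open('e.coli.fasta.txt').read()`) and the full aminoacidos dictionary, then write the returned text to the output file.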

Related

How to sort the websites by their popularity?

I'm using the script below, and I can't find a way to sort the websites by their popularity. I'm a beginner.
import random
# create the Hypertext dictionary
Hypertext = {}
# create a dictionary for the number of visits
Walk_Number = {}
# a variable for the total number of visits
Total_Walk = 0
# list of websites
Websites = ["A","B","C","D","E","F"]
# the hyperlinks:
# the dictionary's keys (the site names)
# map to lists (the hyperlinks on that site)
Hypertext["A"] = ["B","C","E"]
Hypertext["B"] = ["F"]
Hypertext["C"] = ["A","E"]
Hypertext["D"] = ["B","C"]
Hypertext["E"] = ["A","B","C","D","F"]
Hypertext["F"] = ["E"]
print(Hypertext)
# initialize every site's visit count to 0.0
Walk_Number["A"] = 0.0
Walk_Number["B"] = 0.0
Walk_Number["C"] = 0.0
Walk_Number["D"] = 0.0
Walk_Number["E"] = 0.0
Walk_Number["F"] = 0.0
i = 0
while i < 1000:
    x = random.choice(Websites)
    while random.random() < 0.85:
        Walk_Number[x] = Walk_Number[x] + 1
        Total_Walk = Total_Walk + 1
        x = random.choice(Hypertext[x])
    i = i + 1
print (Walk_Number)
print(Total_Walk)
I tried using the sort() function, but I can't find a way to fit it into the script.
I think by popularity you mean the number of visits saved in your Walk_Number dictionary. If you want to re-sort your dictionary by its values in descending order, you can do it like this:
def sort_dict_by_value(d, reverse=False):
    # sorted() returns the (key, value) pairs ordered by value; dict() keeps that order (Python 3.7+)
    return dict(sorted(d.items(), key=lambda x: x[1], reverse=reverse))
print(sort_dict_by_value(Walk_Number, True))
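As a small extension, assuming the same Walk_Number layout, the keys can also be ranked directly, and dividing each count by the total number of visits gives each site's share of the walk. The counts below are made-up stand-ins for the random walk's output, and rank_sites and popularity_share are hypothetical helper names.

```python
# Made-up visit counts standing in for the Walk_Number dictionary produced by the random walk
walk_number = {"A": 120.0, "B": 310.0, "C": 95.0, "D": 40.0, "E": 400.0, "F": 210.0}


def rank_sites(visits):
    """Return the site names ordered from most to least visited."""
    # sorted() iterates over the keys; key=visits.get compares them by their visit counts
    return sorted(visits, key=visits.get, reverse=True)


def popularity_share(visits):
    """Return (site, fraction of all visits) pairs, most popular first."""
    total = sum(visits.values())  # in the original script this equals Total_Walk
    return [(site, visits[site] / total) for site in rank_sites(visits)]


for rank, (site, share) in enumerate(popularity_share(walk_number), start=1):
    print(f"{rank}. {site}: {share:.1%}")
```

Returning a plain list of names keeps the ranking separate from the counts, so you can still look up `Walk_Number[site]` for any site in the ranked order.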

Input must be a unicode string, not 0: how to pass the string instead of the DataFrame column

I'm trying to manipulate a DataFrame, and I wrote a function that calculates the distance between two cities.
from geopy.distance import geodesic
from opencage.geocoder import OpenCageGeocode

def find_distance(A,B):
    key = '0377f0e6b42a47fe9d30a4e9a2b3bb63'  # get api key from: https://opencagedata.com
    geocoder = OpenCageGeocode(key)
    result_A = geocoder.geocode(A)
    lat_A = result_A[0]['geometry']['lat']
    lng_A = result_A[0]['geometry']['lng']
    result_B = geocoder.geocode(B)
    lat_B = result_B[0]['geometry']['lat']
    lng_B = result_B[0]['geometry']['lng']
    return int(geodesic((lat_A,lng_A), (lat_B,lng_B)).kilometers)
This is my DataFrame:
2 32 Mulhouse 1874.0 2 797 16.8 16,3 € 10.012786
13 13 Saint-Étienne 1994.0 3 005 14.3 13,5 € 8.009882
39 39 Roubaix 2845.0 2 591 17.4 15,0 € 6.830968
27 27 Perpignan 2507.0 3 119 15.1 13,3 € 6.727255
40 40 Tourcoing 3089.0 2 901 17.5 15,3 € 6.327547
25 25 Limoges 2630.0 2 807 14.2 12,5 € 6.030424
20 20 Le Mans 2778.0 3 202 14.4 12,3 € 5.789559
Here is my code:
import pandas as pd

def clean_text(row):
    # return the list of decoded cell in the Series instead
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]
def main():
    inFile = "prix_m2_france.xlsx"  # the Excel file to open
    inSheetName = "Sheet1"  # the sheet name
    cols = ['Ville', 'Prix_moyen', 'Loyer_moyen']  # the columns
    df =(pd.read_excel(inFile, sheet_name = inSheetName))
    df[cols] = df[cols].replace({'€': '', ",": ".", " ": "", "\u202f":""}, regex=True)
    # df['Prix_moyen'] = df.apply(clean_text)
    # df['Loyer_moyen'] = df.apply(clean_text)
    df['Prix_moyen'] = df['Prix_moyen'].astype(float)
    df['Loyer_moyen'] = df['Loyer_moyen'].astype(float)
    # df["Prix_moyen"] += 1
    df["revenu"] = (df['Loyer_moyen'] * 12) / (df["Prix_moyen"] * 1.0744) * 100
    # df['Ville'].replace({'Le-Havre': 'Le Havre', 'Le-Mans': 'Le Mans'})
    df["Ville"] = df['Ville'].replace(['Le-Havre', 'Le-Mans'], ['Le Havre', 'Le Mans'])
    df["distance"] = find_distance("Paris", df["Ville"])
    df2 = df.sort_values(by = 'revenu', ascending = False)
    print(df2.head(90))
main()
df["distance"] = find_distance("Paris", df["Ville"]) fails with this error:
opencage.geocoder.InvalidInputError: Input must be a unicode string, not 0 Paris
1 Marseille
2 Lyon
3 T
I imagine it as a loop that computes the distance between Paris and each city, but I guess it passes the whole column as the first value.
Thanks for your help
(Edit: I only pasted part of my DataFrame.)
You can try something like this, which calls find_distance once per city instead of passing the whole column:
df["distance"] = [find_distance("Paris", city) for city in df["Ville"]]
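Each call to find_distance geocodes "Paris" again, so the comprehension sends one redundant request per row (and repeats the request for any city that appears more than once). A minimal sketch, assuming you are free to restructure the function, caches coordinates with functools.lru_cache so each distinct city name is geocoded only once. To keep it runnable offline, coords() reads from a hypothetical FAKE_COORDS table of approximate coordinates; in real use it would call geocoder.geocode(city) as find_distance does.

```python
from functools import lru_cache

# Hypothetical offline coordinates (approximate); in real use coords() would call
# geocoder.geocode(city) and read result[0]['geometry']['lat'] / ['lng'] as find_distance does.
FAKE_COORDS = {
    "Paris": (48.8566, 2.3522),
    "Lyon": (45.7640, 4.8357),
    "Mulhouse": (47.7508, 7.3359),
}
lookups = []  # records every uncached lookup, to show the cache at work


@lru_cache(maxsize=None)
def coords(city):
    """Geocode a city name once; later calls with the same name are served from the cache."""
    lookups.append(city)
    return FAKE_COORDS[city]


def distances_from(origin, cities, distance_fn):
    """Return one distance per city (e.g. per value of df["Ville"]), in the same order."""
    origin_xy = coords(origin)
    return [distance_fn(origin_xy, coords(city)) for city in cities]


# Toy distance (latitude difference) so the sketch runs without geopy;
# with geopy it could be: lambda a, b: int(geodesic(a, b).kilometers)
print(distances_from("Paris", ["Lyon", "Mulhouse", "Lyon"], lambda a, b: round(abs(a[0] - b[0]), 1)))
```

With the DataFrame, the result list can be assigned straight to a column, e.g. `df["distance"] = distances_from("Paris", df["Ville"], lambda a, b: int(geodesic(a, b).kilometers))`, since it has one entry per row.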

How to number from an SQL database in Python?

How can I get the numbers 1 to 10 next to the rows from the Chinook database, in a nicely formatted way? I can't get the 1-to-10 loop to line up with the other three fields from the database. The output I want:
1 Chico Buarque Minha Historia 27
2 Lenny Kravitz Greatest Hits 26
3 Eric Clapton Unplugged 25
4 Titãs Acústico 22
5 Kiss Greatest Kiss 20
6 Caetano Veloso Prenda Minha 19
7 Creedence Clearwater Revival Chronicle, Vol. 2 19
8 The Who My Generation - The Very Best Of The Who 19
9 Green Day International Superhits 18
10 Creedence Clearwater Revival Chronicle, Vol. 1 18
My code:
import sqlite3
try:
    conn = sqlite3.connect(r'C:\Users\Just\Downloads\chinook.db')
except Exception as e:
    print(e)
cur = conn.cursor()
cur.execute('''SELECT artists.Name, albums.Title, count (albums.AlbumId) AS AlbumAmountListened
FROM albums
INNER JOIN tracks ON albums.AlbumId = tracks.AlbumId
INNER JOIN invoice_items ON tracks.TrackId = invoice_items.TrackId
INNER JOIN artists ON albums.ArtistId = artists.ArtistId
GROUP BY albums.AlbumId
ORDER BY AlbumAmountListened DESC
LIMIT 10''')
top_10_albums = cur.fetchall()
def rank():
    for item in top_10_albums:
        name = item[0]
        artist = item[1]
        album_played = item[2]
        def num():
            for i in range(1,11):
                print (i)
            return i
        print (num(),'\t', name, '\t', artist, '\t', album_played, '\t')
print (rank())
My 1-10 numbering comes out like this:
1
2
3
4
5
6
7
8
9
10
10 Chico Buarque Minha Historia 27
1
2
3
4
5
6
7
8
9
10
10 Lenny Kravitz Greatest Hits 26
And so on. How do I correctly combine my range object?
You can use enumerate() to provide the numbers for you as you iterate over the rows:
top_10_albums = cur.fetchall()
for i, item in enumerate(top_10_albums, start=1):
    name = item[0]
    artist = item[1]
    album_played = item[2]
    print(f'{i}\t{name}\t{artist}\t{album_played}')
You don't even have to unpack the item into variables; just reference the fields directly in the f-string:
for i, item in enumerate(top_10_albums, start=1):
    print(f'{i}\t{item[0]}\t{item[1]}\t{item[2]}')
But this is perhaps nicer:
for i, (name, artist, album_played) in enumerate(top_10_albums, start=1):
    print(f'{i}\t{name}\t{artist}\t{album_played}')
This uses tuple unpacking to bind the fields of each row to descriptively named variables, which makes the code self-documenting.
You just need to increment an index (i) inside the for loop, for example:
top_10_albums = cur.fetchall()
i = 0
for item in top_10_albums:
    name = item[0]
    artist = item[1]
    album_played = item[2]
    i += 1
    print(i, '\t', name, '\t', artist, '\t', album_played, '\t')
In your case, the inner loop prints 10 numbers for every step of the outer loop, and num() then returns the last one (10).
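The numbering can also come from the query itself. A minimal sketch, assuming your SQLite build is 3.25 or newer (the first release with window functions), uses ROW_NUMBER() OVER (ORDER BY ...). The in-memory plays table is a hypothetical stand-in for the Chinook joins; its rows are taken from the desired output above. Applying the same idea to the Chinook query (adding something like `ROW_NUMBER() OVER (ORDER BY count(albums.AlbumId) DESC)` to the SELECT list) is an untested assumption.

```python
import sqlite3

# In-memory table standing in for the Chinook joins; rows taken from the desired output
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (artist TEXT, album TEXT, listened INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("Kiss", "Greatest Kiss", 20),
        ("Chico Buarque", "Minha Historia", 27),
        ("Eric Clapton", "Unplugged", 25),
    ],
)

# ROW_NUMBER() is a window function (SQLite 3.25+): it numbers rows in the OVER (ORDER BY ...) order
rows = conn.execute(
    """SELECT ROW_NUMBER() OVER (ORDER BY listened DESC) AS position,
              artist, album, listened
       FROM plays
       ORDER BY listened DESC"""
).fetchall()
conn.close()

for position, artist, album, listened in rows:
    print(f"{position}\t{artist}\t{album}\t{listened}")
```

Doing the numbering in SQL keeps it correct even if you later page through results with OFFSET, whereas enumerate() always restarts at its start value.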
Numbered version:
def rowView(strnum,row,flen_align=[(30,"l"),(30,"r"),(5,"r")]):
    i = 0
    line=""
    for k,v in row.items():
        flen , align = flen_align[i]
        strv = str(v)
        spaces = "_" * abs(flen - len(strv))
        if align == "l":
            line += strv+spaces
        if align == "r":
            line += spaces+strv
        i+=1
    return strnum+line
dlist=[
    { "name":"Chico Buarque", "title":"Minha Historia","AAL":27},
    { "name":"Lenny Kravit", "title":"Greatest Hits","AAL":26},
    { "name":"Eric Clapton", "title":"Unplugged","AAL":25},
    { "name":"Titã", "title":"Acústico","AAL":22},
    { "name":"Kis", "title":"Greatest Kiss","AAL":20},
    { "name":"Caetano Velos", "title":"Prenda Minha","AAL":19},
    { "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.2","AAL":19},
    { "name":"TheWho My Generation", "title":"The Very Best Of The Who","AAL":19},
    { "name":"Green Da", "title":"International Superhits","AAL":18},
    { "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.1","AAL":18}
]
for num, row in enumerate(dlist,start=1):
    strnum=str(num)
    strnum += "_" * (5-len(strnum))
    print(rowView(strnum,row))
Or, using the record id directly:
def rowView(row,flen_align=[(5,"l"),(30,"l"),(30,"r"),(5,"r")]):
    i,line = 0,""
    for k,v in row.items():
        flen , align = flen_align[i]
        strv = str(v)
        spaces = "_" * abs(flen - len(strv))
        if align == "l":
            line += strv+spaces
        if align == "r":
            line += spaces+strv
        i+=1
    return line
dlist=[
    {"id":1, "name":"Chico Buarque", "title":"Minha Historia","AAL":27},
    {"id":2, "name":"Lenny Kravit", "title":"Greatest Hits","AAL":26},
    {"id":3, "name":"Eric Clapton", "title":"Unplugged","AAL":25},
    {"id":4, "name":"Titã", "title":"Acústico","AAL":22},
    {"id":5, "name":"Kis", "title":"Greatest Kiss","AAL":20},
    {"id":6, "name":"Caetano Velos", "title":"Prenda Minha","AAL":19},
    {"id":7, "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.2","AAL":19},
    {"id":8, "name":"TheWho My Generation", "title":"The Very Best Of The Who","AAL":19},
    {"id":9, "name":"Green Da", "title":"International Superhits","AAL":18},
    {"id":10, "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.1","AAL":18}
]
for row in dlist:
    print(rowView(row))
Same output for both versions:
1____Chico Buarque_________________________________Minha Historia___27
2____Lenny Kravit___________________________________Greatest Hits___26
3____Eric Clapton_______________________________________Unplugged___25
4____Titã________________________________________________Acústico___22
5____Kis____________________________________________Greatest Kiss___20
6____Caetano Velos___________________________________Prenda Minha___19
7____Creedence Clearwater Reviva__________________Chronicle,Vol.2___19
8____TheWho My Generation________________The Very Best Of The Who___19
9____Green Da_____________________________International Superhits___18
10___Creedence Clearwater Reviva__________________Chronicle,Vol.1___18
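The same padding can be sketched with Python's format-spec mini-language instead of building the underscores by hand: a fill character, then `<` or `>` for alignment, then the width. row_view is a hypothetical name. One difference from rowView: a value longer than its width gets no padding at all, whereas rowView's abs() still adds some.

```python
def row_view(num, name, title, plays, widths=(5, 30, 30, 5), fill="_"):
    """Pad fields like rowView: number and name left-aligned, title and count right-aligned."""
    w_num, w_name, w_title, w_plays = widths
    # each spec expands to e.g. "_<5": fill character, alignment ('<' left, '>' right), minimum width
    return (
        f"{num:{fill}<{w_num}}{name:{fill}<{w_name}}"
        f"{title:{fill}>{w_title}}{plays:{fill}>{w_plays}}"
    )


rows = [("Chico Buarque", "Minha Historia", 27), ("Eric Clapton", "Unplugged", 25)]
for num, (name, title, plays) in enumerate(rows, start=1):
    print(row_view(num, name, title, plays))
```

For a plain-text report you would normally pass `fill=" "`; the underscore fill is kept here only so the result is comparable with the output above.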

How to sum/aggregate by group without using pandas or imports

I'm basically not allowed to use any imports or libraries such as pandas, or functions like groupby.
I have to group the data and sum the corresponding values. The data is in a CSV file.
For example,
**S** C **T**
A T 100
A. B 102
A. T. 200
A B. 100
C T 203
C. T. 200
C B 200
C T 200
C. B 200
my expected result should be
S C T
A T 300
A B. 202
C T 403
C B. 200
C T. 200
C B. 200
Assuming you have a CSV file (i.e., columns separated by commas):
with open('myfile.csv', 'r') as file:
    header = file.readline().rstrip()
    data = {}
    for row in file:
        state, candidate, value = row.split(',')
        # group by the (state, candidate) pair and add up the votes
        k, value = (state, candidate), int(value)
        data[k] = data.get(k, 0) + value
result_csv = '\n'.join([header] + [f"{','.join(k)},{v}" for k,v in data.items()])
print(result_csv)
Output:
state,candidate,total votes
Alaska,Trump,300
Alaska,Biden,202
colorado,Trump,403
colorado,Biden,200
California,Trump,200
California,Biden,200
The original content of myfile.csv is (use str.replace to clean up your data if necessary):
state,candidate,total votes
Alaska,Trump,100
Alaska,Biden,102
Alaska,Trump,200
Alaska,Biden,100
colorado,Trump,203
colorado,Trump,200
colorado,Biden,200
California,Trump,200
California,Biden,200
mylist = []
with open("data", "r") as msg:
    for line in msg:
        # strip surrounding whitespace and remove the stray periods in the data
        mylist.append(line.strip().replace(".",""))
headers = mylist[0].replace("*","").split()
del mylist[0]
# the last header is two words ("total votes"), so join them back together
headers[2] = headers[2]+" "+headers[3]
mydict = {}
for line in mylist:
    state = line.split()[0]
    mydict[state] = {}
for line in mylist:
    state = line.split()[0]
    candidate = line.split()[1]
    mydict[state][candidate] = 0
for line in mylist:
    state = line.split()[0]
    candidate = line.split()[1]
    votes = line.split()[2]
    mydict[state][candidate] = mydict[state][candidate] + int(votes)
print ("%-15s %-15s %-15s \n\n" % (headers[0],headers[1],headers[2]))
for state in mydict.keys():
    for candidate in mydict[state].keys():
        print ("%-15s %-15s %-15s" % (state,candidate,str(mydict[state][candidate])))
Output:
state candidate total votes
Alaska Trump 300
Alaska Biden 202
colorado Trump 403
colorado Biden 200
California Trump 200
California Biden 200
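Both answers hard-code the column positions. A slightly more general sketch, still using no imports, reads the header and sums any value column grouped by any combination of key columns. aggregate is a hypothetical helper; the sample lines are the myfile.csv content shown above.

```python
def aggregate(lines, key_cols, value_col, sep=","):
    """Sum `value_col` for each distinct combination of `key_cols`, using only built-ins.

    `lines` is any iterable of text lines (e.g. an open file); the first line is the header.
    """
    rows = iter(lines)
    header = next(rows).strip().split(sep)
    key_idx = [header.index(col) for col in key_cols]
    value_idx = header.index(value_col)
    totals = {}  # dicts keep insertion order (Python 3.7+), so groups appear in first-seen order
    for line in rows:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at the end of the file
        fields = line.split(sep)
        key = tuple(fields[i] for i in key_idx)
        totals[key] = totals.get(key, 0) + int(fields[value_idx])
    return totals


sample = """state,candidate,total votes
Alaska,Trump,100
Alaska,Biden,102
Alaska,Trump,200
Alaska,Biden,100
colorado,Trump,203
colorado,Trump,200
colorado,Biden,200
California,Trump,200
California,Biden,200""".splitlines()

for (state, candidate), votes in aggregate(sample, ["state", "candidate"], "total votes").items():
    print(f"{state},{candidate},{votes}")
print(aggregate(sample, ["candidate"], "total votes"))
```

With the real file, pass the open file object instead of the list: `with open('myfile.csv') as f: totals = aggregate(f, ["state", "candidate"], "total votes")`.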

How do I generate all possible words of a regular expression with the following rules and syntax?

How do I generate all possible words of a regular expression with the following rules and syntax:
the user inputs the alphabet;
the user inputs the expression;
any character that's not ()*+ or space can be part of the alphabet;
the + character chooses between the character or sequence on its left or the one on its right;
the * character allows one or more repetitions of the character or sequence on its left;
two alphabetic characters in sequence will be concatenated;
parentheses may change precedence.
I read the alphabet and the expression as strings from the user and convert them to Python lists. Then I run some simple validation tests on both, based on the rules above.
After that, my algorithm already generates the words of expressions WITHOUT parentheses correctly. Here's my problem: I haven't yet found a way to handle parentheses so that the words are generated correctly. As I see it, I have two options:
1) find a way to calculate the possible words directly from the original expression, or
2) somehow eliminate all the parentheses and have my current algorithm to solve it.
I wonder if I can get the first option done using some kind of recursive function, although I still don't know how. Any thoughts?
Here's the code so far:
alphabetInput = input(
    "Enter the alphabet characters (NOTE: the symbols '(', ')', '+' and '*' are reserved characters):\n")
alphabetInput = alphabetInput.strip()  # remove leading and trailing whitespace from the string
alphabetInput = list(alphabetInput)  # turn the string into a list
# remove duplicates from the alphabet
alphabet = []
for c in alphabetInput:
    if c not in alphabet:
        alphabet.append(c)
for c in alphabet:
    if c in "()*+":
        print("Invalid alphabet (NOTE: the symbols '(', ')', '+' and '*' are reserved characters).")
        exit()
expression = input("Enter the regular expression:\n")
expression = expression.strip()  # remove leading and trailing whitespace from the string
expression = list(expression)  # convert the string into a list
counter = 0  # tracks opened and closed parentheses in the following loop
for c in (expression):
    if c == "(":
        counter = counter + 1  # +1 means a parenthesis was opened
    elif c == ")":
        counter = counter - 1  # -1 means a parenthesis was closed
    else:
        pass
    if counter < 0:
        print("Invalid expression: a parenthesis was closed before being opened.")
        exit()
# if the counter is greater than zero, more parentheses were opened than closed
# if it were less than zero, the 'if' inside the loop above would have caught it
# if it equals zero, everything is fine
if counter > 0:
    print("Invalid expression: there are unclosed parentheses.")
    exit()
# check the expression's validity by looking for invalid character sequences
if (expression[0] == "+" or expression[len(expression) - 1] == "+"):  # check for "+" at the start or end of the expression
    print("Invalid expression: there are logical additions (+) missing an operand.")
    exit()
else:
    for i in range(0, len(expression)):
        if (expression[i] == "+"):
            if (expression[i + 1] == "+" or expression[i + 1] == "*" or expression[i + 1] == ""):
                print("Invalid expression: there are logical additions (+) missing an operand.")
                exit()
#### END OF SECTION 1 ####
#### SECTION 2: CHECK THAT THE SYMBOLS IN THE EXPRESSION BELONG TO THE GIVEN ALPHABET ####
# NOTE: 'expression' is already a list; copying it means changes to one do not affect the other
testExpression = list(expression)
# the following loops mark and remove whitespace and special characters from the expression,
# then check whether the remaining symbols belong to the given alphabet
for i in range(len(testExpression)):
    if testExpression[i] in " ()+*":
        testExpression[i] = "marked"
while "marked" in testExpression:
    testExpression.remove("marked")
# check whether the symbols used belong to the given alphabet
for c in testExpression:
    counter = 0
    for cc in alphabet:
        if c == cc:
            counter += 1
    if counter == 0:
        # a counter of zero means the expression symbol has no match in the alphabet
        print("The expression contains symbols that do not belong to the alphabet.")
        exit()
#### END OF SECTION 2 ####
#### SECTION 3: WORD GENERATION ####
possibilities = [[]]  # list that will hold one sublist per possibility
possibilitiesCounter = 0  # possibility counter; each time it is incremented, a new sublist is created
# build the generated words in the 'possibilities' list
for i in range(0, len(expression)):
    if expression[i] in alphabet:
        possibilities[possibilitiesCounter].append(expression[i])
    if expression[i] == "+":
        possibilities.append([])
        possibilitiesCounter += 1
# print all generated words on the screen
yetAnotherCounter = 0
words = []  # final list of words
for l in possibilities:
    words.append("".join(str(x) for x in l))
    print("Word ", yetAnotherCounter + 1, ":", words[yetAnotherCounter])
    yetAnotherCounter += 1
So, I came to a solution by using a library called exrex: https://github.com/asciimoo/exrex
It has a generate function that does exactly what I wanted.
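For the first option in the question (a recursive function), here is a minimal recursive-descent sketch that follows the stated rules. It relies on two assumptions the question leaves open. First, + binds more loosely than concatenation, as alternation does in ordinary regular expressions. Second, because * allows unbounded repetition, each * is expanded only up to a hypothetical max_repeat limit so the set of words stays finite. generate_words is a hypothetical name and is unrelated to exrex's generate.

```python
def generate_words(expression, alphabet, max_repeat=2):
    """Return the set of words described by `expression`.

    Grammar (lowest to highest precedence):
        union  := concat ('+' concat)*
        concat := star*
        star   := atom '*'*
        atom   := symbol | '(' union ')'
    Each '*' expands to 1..max_repeat repetitions (an assumption, since '*' is unbounded).
    """
    tokens = [c for c in expression if c != " "]  # spaces are not symbols
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_union():
        nonlocal pos
        words = parse_concat()
        while peek() == "+":
            pos += 1
            words = words | parse_concat()  # '+' chooses the left or the right side
        return words

    def parse_concat():
        words = {""}
        while peek() is not None and peek() not in "+)":
            right = parse_star()
            words = {a + b for a in words for b in right}  # every left word followed by every right word
        return words

    def parse_star():
        nonlocal pos
        words = parse_atom()
        while peek() == "*":
            pos += 1
            base, current, result = words, words, set(words)
            for _ in range(max_repeat - 1):
                current = {a + b for a in current for b in base}  # one more repetition
                result |= current
            words = result
        return words

    def parse_atom():
        nonlocal pos
        c = peek()
        if c == "(":
            pos += 1
            words = parse_union()  # recursion handles nested parentheses
            if peek() != ")":
                raise ValueError("unclosed parenthesis")
            pos += 1
            return words
        if c is None or c not in alphabet:
            raise ValueError(f"unexpected symbol {c!r} at position {pos}")
        pos += 1
        return {c}

    words = parse_union()
    if pos != len(tokens):
        raise ValueError(f"unexpected symbol {tokens[pos]!r} at position {pos}")
    return words


print(sorted(generate_words("a(b+c)*", "abc")))
```

Because each parenthesized group is parsed by a recursive call to parse_union, parentheses never need to be eliminated beforehand, which is the second option the question considered.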
