How to number rows from an SQL database in Python? - python

How can I print the numbers 1 to 10 next to rows from the Chinook database in a readable format? I can't get the counter from 1 to 10 to appear next to the other three fields of each row. The output I want:
1 Chico Buarque Minha Historia 27
2 Lenny Kravitz Greatest Hits 26
3 Eric Clapton Unplugged 25
4 Titãs Acústico 22
5 Kiss Greatest Kiss 20
6 Caetano Veloso Prenda Minha 19
7 Creedence Clearwater Revival Chronicle, Vol. 2 19
8 The Who My Generation - The Very Best Of The Who 19
9 Green Day International Superhits 18
10 Creedence Clearwater Revival Chronicle, Vol. 1 18
My code:
import sqlite3
try:
    conn = sqlite3.connect(r'C:\Users\Just\Downloads\chinook.db')
except Exception as e:
    print(e)
cur = conn.cursor()
cur.execute('''SELECT artists.Name, albums.Title, count (albums.AlbumId) AS AlbumAmountListened
FROM albums
INNER JOIN tracks ON albums.AlbumId = tracks.AlbumId
INNER JOIN invoice_items ON tracks.TrackId = invoice_items.TrackId
INNER JOIN artists ON albums.ArtistId = artists.ArtistId
GROUP BY albums.AlbumId
ORDER BY AlbumAmountListened DESC
LIMIT 10''')
top_10_albums = cur.fetchall()
def rank():
    for item in top_10_albums:
        name = item[0]
        artist = item[1]
        album_played = item[2]
        def num():
            for i in range(1,11):
                print (i)
            return i
        print (num(),'\t', name, '\t', artist, '\t', album_played, '\t')
print (rank())
Instead, the numbers 1 to 10 are printed in full before every row, like this:
1
2
3
4
5
6
7
8
9
10
10 Chico Buarque Minha Historia 27
1
2
3
4
5
6
7
8
9
10
10 Lenny Kravitz Greatest Hits 26
And so on. How do I combine the range with the rows correctly?

You can use enumerate() to provide the numbers for you as you iterate over the rows:
top_10_albums = cur.fetchall()
for i, item in enumerate(top_10_albums, start=1):
    name = item[0]
    artist = item[1]
    album_played = item[2]
    print(f'{i}\t{name}\t{artist}\t{album_played}')
You don't even have to unpack the item into variables; just reference the fields directly in the f-string:
for i, item in enumerate(top_10_albums, start=1):
    print(f'{i}\t{item[0]}\t{item[1]}\t{item[2]}')
But this is perhaps nicer:
for i, (name, artist, album_played) in enumerate(top_10_albums, start=1):
    print(f'{i}\t{name}\t{artist}\t{album_played}')
This uses tuple unpacking to bind the fields of each row to descriptively named variables, which makes the code self-documenting.
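Since the question also asks for a readable layout, here is a minimal sketch that combines enumerate() with f-string width specifiers instead of tabs. The rows are copied from the expected output in the question rather than fetched from the database, and the helper name format_ranked and the column widths are arbitrary choices for illustration.

```python
# Sample rows in the same three-field shape that cur.fetchall() returns.
rows = [
    ("Chico Buarque", "Minha Historia", 27),
    ("Lenny Kravitz", "Greatest Hits", 26),
    ("Eric Clapton", "Unplugged", 25),
]

def format_ranked(rows):
    """Return one aligned, numbered line per row (hypothetical helper)."""
    lines = []
    for i, (artist, album, count) in enumerate(rows, start=1):
        # >2 right-aligns the rank, <20 left-aligns the text, >3 right-aligns the count
        lines.append(f"{i:>2}  {artist:<20}{album:<20}{count:>3}")
    return lines

for line in format_ranked(rows):
    print(line)
```

Longer album titles from the real query would need wider columns than the 20 characters assumed here.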

You just need to keep a counter (i) inside the for loop, for example:
top_10_albums = cur.fetchall()
i = 0
for item in top_10_albums:
    name = item[0]
    artist = item[1]
    album_played = item[2]
    i += 1
    print (i,'\t', name, '\t', artist, '\t', album_played, '\t')
In your code, the inner loop prints all 10 numbers on every step of the outer loop, and num() then returns only the last value, 10.

Numbered Version
def rowView(strnum,row,flen_align=[(30,"l"),(30,"r"),(5,"r")]):
    i = 0
    line=""
    for k,v in row.items():
        flen , align = flen_align[i]
        strv = str(v)
        spaces = "_" * abs(flen - len(strv))
        if align == "l":
            line += strv+spaces
        if align == "r":
            line += spaces+strv
        i+=1
    return strnum+line
dlist=[
{ "name":"Chico Buarque", "title":"Minha Historia","AAL":27},
{ "name":"Lenny Kravit", "title":"Greatest Hits","AAL":26},
{ "name":"Eric Clapton", "title":"Unplugged","AAL":25},
{ "name":"Titã", "title":"Acústico","AAL":22},
{ "name":"Kis", "title":"Greatest Kiss","AAL":20},
{ "name":"Caetano Velos", "title":"Prenda Minha","AAL":19},
{ "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.2","AAL":19},
{ "name":"TheWho My Generation", "title":"The Very Best Of The Who","AAL":19},
{ "name":"Green Da", "title":"International Superhits","AAL":18},
{ "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.1","AAL":18}
]
for num, row in enumerate(dlist,start=1):
    strnum=str(num)
    strnum += "_" * (5-len(strnum))
    print(rowView(strnum,row))
Or using record id directly
def rowView(row,flen_align=[(5,"l"),(30,"l"),(30,"r"),(5,"r")]):
    i,line = 0,""
    for k,v in row.items():
        flen , align = flen_align[i]
        strv = str(v)
        spaces = "_" * abs(flen - len(strv))
        if align == "l":
            line += strv+spaces
        if align == "r":
            line += spaces+strv
        i+=1
    return line
dlist=[
{"id":1, "name":"Chico Buarque", "title":"Minha Historia","AAL":27},
{"id":2, "name":"Lenny Kravit", "title":"Greatest Hits","AAL":26},
{"id":3, "name":"Eric Clapton", "title":"Unplugged","AAL":25},
{"id":4, "name":"Titã", "title":"Acústico","AAL":22},
{"id":5, "name":"Kis", "title":"Greatest Kiss","AAL":20},
{"id":6, "name":"Caetano Velos", "title":"Prenda Minha","AAL":19},
{"id":7, "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.2","AAL":19},
{"id":8, "name":"TheWho My Generation", "title":"The Very Best Of The Who","AAL":19},
{"id":9, "name":"Green Da", "title":"International Superhits","AAL":18},
{"id":10, "name":"Creedence Clearwater Reviva", "title":"Chronicle,Vol.1","AAL":18}
]
for row in dlist:
    print(rowView(row))
Same output for both versions:
1____Chico Buarque_________________________________Minha Historia___27
2____Lenny Kravit___________________________________Greatest Hits___26
3____Eric Clapton_______________________________________Unplugged___25
4____Titã________________________________________________Acústico___22
5____Kis____________________________________________Greatest Kiss___20
6____Caetano Velos___________________________________Prenda Minha___19
7____Creedence Clearwater Reviva__________________Chronicle,Vol.2___19
8____TheWho My Generation________________The Very Best Of The Who___19
9____Green Da_____________________________International Superhits___18
10___Creedence Clearwater Reviva__________________Chronicle,Vol.1___18

Related

DataFrame: "Input must be a unicode string, not 0": how to pass each string instead of the whole DataFrame column

I am trying to manipulate a dataframe, and I wrote a function to calculate the distance between two cities.
# assumed imports (not shown in the original post)
from geopy.distance import geodesic
from opencage.geocoder import OpenCageGeocode
def find_distance(A,B):
    key = '0377f0e6b42a47fe9d30a4e9a2b3bb63' # get api key from: https://opencagedata.com
    geocoder = OpenCageGeocode(key)
    result_A = geocoder.geocode(A)
    lat_A = result_A[0]['geometry']['lat']
    lng_A = result_A[0]['geometry']['lng']
    result_B = geocoder.geocode(B)
    lat_B = result_B[0]['geometry']['lat']
    lng_B = result_B[0]['geometry']['lng']
    return int(geodesic((lat_A,lng_A), (lat_B,lng_B)).kilometers)
This is my dataframe:
2 32 Mulhouse 1874.0 2 797 16.8 16,3 € 10.012786
13 13 Saint-Étienne 1994.0 3 005 14.3 13,5 € 8.009882
39 39 Roubaix 2845.0 2 591 17.4 15,0 € 6.830968
27 27 Perpignan 2507.0 3 119 15.1 13,3 € 6.727255
40 40 Tourcoing 3089.0 2 901 17.5 15,3 € 6.327547
25 25 Limoges 2630.0 2 807 14.2 12,5 € 6.030424
20 20 Le Mans 2778.0 3 202 14.4 12,3 € 5.789559
Here is my code:
import pandas as pd  # assumed import (not shown in the original post)
def clean_text(row):
    # return the list of decoded cells in the Series instead
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]
def main():
    inFile = "prix_m2_france.xlsx" # open the Excel file
    inSheetName = "Sheet1" # name of the sheet
    cols = ['Ville', 'Prix_moyen', 'Loyer_moyen'] # the columns
    df =(pd.read_excel(inFile, sheet_name = inSheetName))
    df[cols] = df[cols].replace({'€': '', ",": ".", " ": "", "\u202f":""}, regex=True)
    # df['Prix_moyen'] = df.apply(clean_text)
    # df['Loyer_moyen'] = df.apply(clean_text)
    df['Prix_moyen'] = df['Prix_moyen'].astype(float)
    df['Loyer_moyen'] = df['Loyer_moyen'].astype(float)
    # df["Prix_moyen"] += 1
    df["revenu"] = (df['Loyer_moyen'] * 12) / (df["Prix_moyen"] * 1.0744) * 100
    # df['Ville'].replace({'Le-Havre': 'Le Havre', 'Le-Mans': 'Le Mans'})
    df["Ville"] = df['Ville'].replace(['Le-Havre', 'Le-Mans'], ['Le Havre', 'Le Mans'])
    df["distance"] = find_distance("Paris", df["Ville"])
    df2 = df.sort_values(by = 'revenu', ascending = False)
    print(df2.head(90))
main()
The line df["distance"] = find_distance("Paris", df["Ville"]) fails and gives me this error:
opencage.geocoder.InvalidInputError: Input must be a unicode string, not 0 Paris
1 Marseille
2 Lyon
3 T
I imagined it working like a loop that fills in the distance between Paris and each city, but I guess the whole column is passed in as a single value.
Thanks for your help.
(Edit: I only pasted part of my dataframe.)
You can try something like this, which calls find_distance once per city:
df["distance"] = [find_distance("Paris", city) for city in df["Ville"]]
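The error arises because find_distance("Paris", df["Ville"]) hands the whole column to the geocoder in a single call, whereas the list comprehension calls it once per city. Here is a minimal sketch of that difference, using a plain list in place of the DataFrame column and a stand-in function fake_distance (hypothetical, returning placeholder values rather than real distances) so that it runs without an API key:

```python
def fake_distance(a, b):
    """Stand-in for find_distance: accepts only single strings, like the geocoder."""
    if not isinstance(a, str) or not isinstance(b, str):
        raise TypeError(f"Input must be a string, not {type(b).__name__}")
    # Placeholder value, NOT a real distance.
    return abs(len(a) - len(b))

villes = ["Mulhouse", "Roubaix", "Le Mans"]  # plays the role of df["Ville"]

# Passing the whole collection fails, as in the question.
try:
    fake_distance("Paris", villes)
    whole_column_ok = True
except TypeError:
    whole_column_ok = False

# Calling once per element works and yields one value per row.
distances = [fake_distance("Paris", city) for city in villes]
print(whole_column_ok, distances)
```

With pandas, df["Ville"].apply(lambda city: find_distance("Paris", city)) should be an equivalent spelling; either way, the real function makes one geocoding request per row.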

How To Add Specific Keys In Nested Dictionary In Python

I may be formatting this dictionary wrong (this is my first time doing this).
I have a dictionary for every province, each with the correct ID, and I added them all to the value "Canada". I'm trying to add up the populations of ALL the provinces in the nested dictionary.
ontario = dict(capital="Toronto", largest="Toronto", population=14826276)
quebec = dict(capital="Quebec City", largest="Montreal", population=8604495)
nova_Scotia = dict(capital="Halifax", largest='Halifax', population=992055)
new_Brunswick = dict(capital="Fredricton", largest='Moncton', population=789225)
manitoba = dict(capital="Winnipeg", largest="Winnipeg", population=1383765)
canada = {ontario, quebec, nova_Scotia, new_brunswick, manitoba, british_columbia, prince_edward_island, saskatchewan, alberta, newfoundland_and_labrador}
for key, value in canada.items():
    if value and 'population' in value.keys():
        # Adding all values of population to receive total population of canada
        sum += value['population']
print(sum)
thanks again in advance.
You didn't create a dictionary but a set, which has no keys and therefore no .items(). (A set of dictionaries cannot even be built: dictionaries are unhashable, so that line raises a TypeError.)
To create a dictionary you need keys, for example
canada = {1:ontario, 2:quebec, 3:nova_scotia, 4:new_brunswick, 5:manitoba}
or
canada = {"Ontario":ontario, "Quebec":quebec, "Nova Scotia":nova_scotia, "New Brunswick":new_brunswick, "Manitoba":manitoba}
Then you can use canada.items() and sum the populations
(I use the variable total because sum() is a built-in function):
# --- before `for`-loop ---
total = 0
# --- `for`-loop ---
for key, value in canada.items():
    total += value['population']
# --- after `for`-loop ---
print(total)
Or, more briefly:
total = sum(value['population'] for value in canada.values())
You can then add the total to the dictionary:
canada['total'] = total
Full code:
ontario = dict(capital="Toronto", largest="Toronto", population=14826276)
quebec = dict(capital="Quebec City", largest="Montreal", population=8604495)
nova_scotia = dict(capital="Halifax", largest='Halifax', population=992055)
new_brunswick = dict(capital="Fredricton", largest='Moncton', population=789225)
manitoba = dict(capital="Winnipeg", largest="Winnipeg", population=1383765)
canada = {1:ontario, 2:quebec, 3:nova_scotia, 4:new_brunswick, 5:manitoba}#, british_columbia, prince_edward_island, saskatchewan, alberta, newfoundland_and_labrador
total = 0
for key, value in canada.items():
    total += value['population']
print(total)
#total = sum(value['population'] for value in canada.values())
canada['total'] = total
print(canada)
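One thing to watch with canada['total'] = total: the dictionary then holds an int next to the province dictionaries, so a second pass using value['population'] would fail on that entry. Here is a minimal sketch of one way around it, keeping the provinces under their own key. The keys "provinces" and "total" are illustrative choices, not part of the original answer.

```python
ontario = dict(capital="Toronto", largest="Toronto", population=14826276)
quebec = dict(capital="Quebec City", largest="Montreal", population=8604495)

# Provinces live in their own sub-dictionary, so the total never mixes with them.
canada = {"provinces": {"Ontario": ontario, "Quebec": quebec}}
canada["total"] = sum(p["population"] for p in canada["provinces"].values())

# Recomputing is safe: only province dictionaries are iterated.
again = sum(p["population"] for p in canada["provinces"].values())
print(canada["total"], again)
```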
I only added the five listed provinces to the nested dictionary.
I used a for loop to calculate the total population of Canada (the
sum of those five provinces).
Note that the nested dictionary has the same format as a normal dictionary,
"key : value", for example "1 : ontario".
ontario = dict(capital="Toronto", largest="Toronto", population=14826276)
quebec = dict(capital="Quebec City", largest="Montreal", population=8604495)
nova_Scotia = dict(capital="Halifax", largest='Halifax', population=992055)
new_Brunswick = dict(capital="Fredricton", largest='Moncton', population=789225)
manitoba = dict(capital="Winnipeg", largest="Winnipeg", population=1383765)
canada = {1:ontario, 2:quebec, 3:nova_Scotia, 4:new_Brunswick, 5:manitoba}
#canada = {ontario, quebec, nova_Scotia, new_brunswick, manitoba, british_columbia, prince_edward_island, saskatchewan, alberta, newfoundland_and_labrador}
total = 0  # named total so the built-in sum() is not shadowed
for province in canada:
    # Adding all values of population to receive total population of canada
    total += canada[province]["population"]
print(total)
Try this one.
ontario = dict(capital="Toronto", largest="Toronto", population=14826276)
quebec = dict(capital="Quebec City", largest="Montreal", population=8604495)
nova_Scotia = dict(capital="Halifax", largest='Halifax', population=992055)
new_Brunswick = dict(capital="Fredricton", largest='Moncton', population=789225)
manitoba = dict(capital="Winnipeg", largest="Winnipeg", population=1383765)
canada_list = [ontario, quebec, nova_Scotia, new_Brunswick, manitoba]
total = 0
for item in canada_list:
    # Adding all values of population to receive total population of canada
    total += item.get('population', 0)
print("Total: {}".format(total))
Output:
Total: 26595816

how to sum/aggregate by group without using pandas or import

I am not allowed to use any imports or libraries such as pandas (so no groupby).
I have to group the data by category and sum the corresponding values. The data is in a CSV file.
For example:
**S** C **T**
A T 100
A. B 102
A. T. 200
A B. 100
C T 203
C. T. 200
C B 200
C T 200
C. B 200
My expected result is:
S C T
A T 300
A B. 202
C T 403
C B. 200
C T. 200
C B. 200
Considering that you have a csv file (i.e., columns split by comma):
with open('myfile.csv', 'r') as file:
    header = file.readline().rstrip()
    data = {}
    for row in file:
        state, candidate, value = row.split(',')
        k, value = (state, candidate), int(value)
        data[k] = data.get(k, 0) + value
result_csv = '\n'.join([header] + [f"{','.join(k)},{v}" for k,v in data.items()])
print(result_csv)
Output:
state,candidate,total votes
Alaska,Trump,300
Alaska,Biden,202
colorado,Trump,403
colorado,Biden,200
California,Trump,200
California,Biden,200
Original content of myfile.csv is (use str.replace if necessary):
state,candidate,total votes
Alaska,Trump,100
Alaska,Biden,102
Alaska,Trump,200
Alaska,Biden,100
colorado,Trump,203
colorado,Trump,200
colorado,Biden,200
California,Trump,200
California,Biden,200
mylist = []
with open("data", "r") as msg:
    for line in msg:
        mylist.append(line.strip().replace(".",""))
headers = mylist[0].replace("*","").split()
del mylist[0]
headers[2] = headers[2]+" "+headers[3]
mydict = {}
for line in mylist:
    state = line.split()[0]
    mydict[state] = {}
for line in mylist:
    state = line.split()[0]
    candidate = line.split()[1]
    mydict[state][candidate] = 0
for line in mylist:
    state = line.split()[0]
    candidate = line.split()[1]
    votes = line.split()[2]
    mydict[state][candidate] = mydict[state][candidate] + int(votes)
print ("%-15s %-15s %-15s \n\n" % (headers[0],headers[1],headers[2]))
for state in mydict.keys():
    for candidate in mydict[state].keys():
        print ("%-15s %-15s %-15s" % (state,candidate,str(mydict[state][candidate])))
Output:
state candidate total votes
Alaska Trump 300
Alaska Biden 202
colorado Trump 403
colorado Biden 200
California Trump 200
California Biden 200
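Both answers rely on the same idea: use a dictionary keyed by the grouping columns and add each value to the running total for its key. Here is a minimal sketch of that idea on in-memory lines, so it runs without a file. The function name sum_by_group is a hypothetical helper, and the sample lines are taken from the CSV content shown above.

```python
def sum_by_group(lines):
    """Sum the last column per (state, candidate) pair, keeping first-seen order."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        state, candidate, value = line.split(",")
        key = (state, candidate)
        totals[key] = totals.get(key, 0) + int(value)
    return totals

sample = [
    "Alaska,Trump,100",
    "Alaska,Biden,102",
    "Alaska,Trump,200",
    "Alaska,Biden,100",
]
for (state, candidate), total in sum_by_group(sample).items():
    print(f"{state},{candidate},{total}")
```

Using a tuple as the key keeps the structure flat; the nested-dictionary variant in the second answer gives the same totals.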

Make a list / table with 2 FOR in Python

I wrote a program, and its output looks like this:
A Alanina
B Ácido aspártico ou Asparagina
C Cisteína
D Ácido aspártico
E Ácido glutâmico
F Fenilalanina
G Glicina
H Histidina
I Isoleucina
J Leucina (L) ou Isoleucina
K Lisina
L Leucina
M Metionina
N Asparagina
O Pirrolisina
P Prolina
Q Glutamina
R Arginina
S Serina
T Treonina
U Selenocisteína
V Valina
W Triptofano
X qualquer
Y Tirosina
33
0
4
26
32
14
38
14
26
0
25
36
15
16
0
19
15
16
14
20
0
32
0
11
But I want these numbers to appear next to the letter and name columns, so that each line looks like this:
-A ------ Alanina -------- number of times A appears
-B ------ Aspartic acid or asparagine -------- number of times B appears
The program reads its data from an e.coli.fasta.txt file:
>sp|A1AA21|PEPT_ECOK1 Peptidase T OS=Escherichia coli O1:K1 / APEC OX=405955 GN=pepT PE=3 SV=1
MDKLLERFLNYVSLDTQSKAGVRQVPSTEGQWKLLHLLKEQLEEMGLINVTLSEKGTLMA
TLPANVPGDIPAIGFISHVDTSPDCSGKNVNPQIVENYRGGDIALGIGDEVLSPVMFPVL
HQLLGQTLITTDGKTLLGADDKAGIAEIMTALAVLQQKNIPHGDIRVAFTPDEEVGKGAK
HFDVDAFDARWAYTVDGGGVGELEFENFNAASVNIKIVGNNVHPGTAKGVMVNALSLAAR
IHAEVPADESPEMTEGYEGFYHLASMKGTVERADMHYIIRDFDRKQFEARKRKMMEIAKK
VGKGLHPDCYIELVIEDSYYNMREKVVEHPHILDIAQQAMRDCDIEPELKPIRGGTDGAQ
LSFMGLPCPNLFTGGYNYHGKHEFVTLEGMEKAVQVIVRIAELTAQRK
and this is the program code:
f = open('e.coli.fasta.txt','r')
sequencia = f.readlines()
amino = [] # collect only the lines of interest (the sequence text) in a list
for linha in sequencia:
    if linha.find('>') != 0:
        amino.append(linha)
tfasta= "".join(amino)
aminoacidos = {}
aminoacidos = {'A':'Alanina','B':'Ácido aspártico ou Asparagina','C':'Cisteína', 'D':'Ácido aspártico','E':'Ácido glutâmico','F':'Fenilalanina','G':'Glicina','H':'Histidina','I':'Isoleucina','J':'Leucina (L) ou Isoleucina','K':'Lisina','L':'Leucina','M':'Metionina','N':'Asparagina','O':'Pirrolisina','P':'Prolina','Q':'Glutamina','R':'Arginina','S':'Serina','T':'Treonina','U':'Selenocisteína','V':'Valina','W':'Triptofano','X':'qualquer','Y':'Tirosina'}
def ocorrencias(string):
    result = {}
    chaves = 'ABCDEFGHIJKLMNOPQRSTUVXY'
    for i in chaves:
        result[i] = tfasta.count(i)
    return result
ocor = (ocorrencias(tfasta))
with open ('PeptidadeT-aminoacidos','w') as p:
    for i in range(65,90):
        a = ('%s' % (chr(i)))
        p.write('{:4s}\t{:5s}\n'.format(a,(aminoacidos[a])))
    for e in ocor.values():
        p.write('{}\n'.format(e))
The variable ocor is a Python dictionary [1]. In your code it maps each amino acid letter to its number of occurrences, i.e. {key: value} = {letter: count}. You can use the letter to look up the count: ocor['A'] returns 33.
f = open('e.coli.fasta.txt','r')
sequencia = f.readlines()
amino = [] # collect only the lines of interest (the sequence text) in a list
for linha in sequencia:
    if linha.find('>') != 0:
        amino.append(linha)
tfasta= "".join(amino)
aminoacidos = {}
aminoacidos = {'A':'Alanina','B':'Ácido aspártico ou Asparagina','C':'Cisteína', 'D':'Ácido aspártico','E':'Ácido glutâmico','F':'Fenilalanina','G':'Glicina','H':'Histidina','I':'Isoleucina','J':'Leucina (L) ou Isoleucina','K':'Lisina','L':'Leucina','M':'Metionina','N':'Asparagina','O':'Pirrolisina','P':'Prolina','Q':'Glutamina','R':'Arginina','S':'Serina','T':'Treonina','U':'Selenocisteína','V':'Valina','W':'Triptofano','X':'qualquer','Y':'Tirosina'}
def ocorrencias(string):
    result = {}
    chaves = 'ABCDEFGHIJKLMNOPQRSTUVWXY'
    for i in chaves:
        result[i] = tfasta.count(i)
    return result
ocor = ocorrencias(tfasta)
with open ('PeptidadeT-aminoacidos','w') as p:
    for i in range(65,90):
        a = ('%s' % (chr(i)))
        p.write('-{:4s}------{:5s}------{}\n'.format(a, aminoacidos[a], ocor[a]))
A side note: the letter W was missing from chaves, so I added it to prevent a KeyError. If you don't want that, you can wrap p.write in a try/except clause instead.
chaves = 'ABCDEFGHIJKLMNOPQRSTUVWXY'
[1] https://realpython.com/python-dicts/
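As an alternative to try/except, dict.get() with a default avoids the KeyError for any letter that was not counted. Here is a minimal sketch on a short slice of the sequence and a three-entry name table; both are illustrative stand-ins for tfasta and aminoacidos, and linhas is a hypothetical name.

```python
aminoacidos = {"A": "Alanina", "C": "Cisteína", "W": "Triptofano"}
sequencia = "MDKLLERFLNYVSLDTQSKAGVRQ"  # first characters of the FASTA record in the question

chaves = "AC"  # deliberately missing 'W', like the original chaves string
ocor = {letra: sequencia.count(letra) for letra in chaves}

# ocor.get(letra, 0) falls back to 0 instead of raising KeyError for 'W'
linhas = [
    "-{:4s}------{:12s}------{}".format(letra, nome, ocor.get(letra, 0))
    for letra, nome in aminoacidos.items()
]
for linha in linhas:
    print(linha)
```

Note that the default reports 0 for an uncounted letter even if it does occur in the sequence, so adding the letter to chaves remains the actual fix.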

Determining most common name from web scraped birth name data

I have a task to scrape data from this page: https://www.ssa.gov/cgi-bin/popularnames.cgi, which lists the most common birth names. I have to find the most common name shared by girls and boys for a given year (in other words, the exact same name is used for both genders), but I don't know how to do that. The code below solves the previous task, printing the list for a given year, but I have no idea how to modify it to get the most common name that both girls and boys have.
import requests
import lxml.html as lh
url = 'https://www.ssa.gov/cgi-bin/popularnames.cgi'
string = input("Year: ")
r = requests.post(url, data=dict(year=string, top="1000", number="n" ))
doc = lh.fromstring(r.content)
tr_elements = doc.xpath('//table[2]//td[2]//tr')
cols = []
for col in tr_elements[0]:
    name = col.text_content()
    number = col.text_content()
    cols.append((number, []))
count=0
for row in tr_elements[1:]:
    i = 0
    for col in row:
        val = col.text_content()
        cols[i][1].append(val)
        i += 1
        if(count<4):
            print(val, end = ' ')
            count += 1
        else:
            count=0
            print(val)
Here's one approach. The first step is to group the data by name and record how many genders have used the name and their aggregate total. After that, we can filter the structure by names with more than one gender using it. Finally, we sort this multi-gender list by counts and take the 0-th element. This is our most popular multi-gender name for the year.
import requests
import lxml.html as lh
url = "https://www.ssa.gov/cgi-bin/popularnames.cgi"
year = input("Year: ")
response = requests.post(url, data=dict(year=year, top="1000", number="n"))
doc = lh.fromstring(response.content)
tr_elements = doc.xpath("//table[2]//td[2]//tr")
column_names = [col.text_content() for col in tr_elements[0]]
names = {}
most_common_shared_names_by_year = {}
for row in tr_elements[1:-1]:
    row = [cell.text_content() for cell in row]
    for i, gender in ((1, "male"), (3, "female")):
        if row[i] not in names:
            names[row[i]] = {"count": 0, "genders": set()}
        names[row[i]]["count"] += int(row[i+1].replace(",", ""))
        names[row[i]]["genders"].add(gender)
shared_names = [
    (name, data) for name, data in names.items() if len(data["genders"]) > 1
]
most_common_shared_names = sorted(shared_names, key=lambda x: -x[1]["count"])
print("%s => %s" % most_common_shared_names[0])
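To see the group, filter, and sort steps without a network request, here is a minimal sketch on hand-written rows. The rows mimic the (rank, male name, male count, female name, female count) layout that the code above reads, but the names and counts are invented for illustration and are not SSA data; most_common_shared is a hypothetical helper.

```python
def most_common_shared(rows):
    """Return (name, total) for the top name used by both genders, or None."""
    names = {}
    for row in rows:
        for i, gender in ((1, "male"), (3, "female")):
            entry = names.setdefault(row[i], {"count": 0, "genders": set()})
            entry["count"] += int(row[i + 1].replace(",", ""))
            entry["genders"].add(gender)
    shared = [(n, d["count"]) for n, d in names.items() if len(d["genders"]) > 1]
    # max() with a key avoids sorting the whole list just to take one element
    return max(shared, key=lambda item: item[1]) if shared else None

# Invented sample rows, NOT real data.
rows = [
    ["1", "Alex", "1,000", "Sam", "900"],
    ["2", "Sam", "800", "Alex", "100"],
    ["3", "Jo", "50", "Kim", "40"],
]
print(most_common_shared(rows))
```

Returning None when no name is shared avoids the IndexError that indexing an empty sorted list with [0] would raise.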
If you're curious, here are the results since 2000:
2000 => Tyler, 22187
2001 => Tyler, 19842
2002 => Tyler, 18788
2003 => Ryan, 20171
2004 => Madison, 20829
2005 => Ryan, 18661
2006 => Ryan, 17116
2007 => Jayden, 17287
2008 => Jayden, 19040
2009 => Jayden, 19053
2010 => Jayden, 18641
2011 => Jayden, 18064
2012 => Jayden, 16952
2013 => Jayden, 15462
2014 => Logan, 14478
2015 => Logan, 13753
2016 => Logan, 12099
2017 => Logan, 15117
