How to do sub-sorting in python?


Many thanks to the SO community for helping me with the previous problems I had encountered. Love the help here!
I have yet another problem now. I have a flat list of DNA sequences, each with an associated "Construct Number" and "Part Number". As things stand, from my previous code, I have them in a CSV file which I open, read in, and import as a list of dictionary objects. Everything is already sorted by "Construct Number", but I then need to sort by "Part Number" within each construct. (It's like in Excel, where you "sort by" one column and "then by" another.)
Does anybody know how to get this done? Thus far, all I have written is this:
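import csv

# newline='' is what the csv module recommends; the old 'rU' mode was removed in Python 3.11
primers_list = open('primers-list.csv', newline='')
primers_unsorted = csv.DictReader(primers_list)
for row in primers_unsorted:
    print(row)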
A subset of the output thus far is just the following, for visualization of the data I'm working with:
{' Direction': 'fw primer', ' Construct Number': '1', ' Part Number': '2', 'Primer Sequence': 'AAGCGGCCGCTCGAGTCTAAgctcactcaaaggcggtaatcagataaaaaaaatccttag'}
{' Direction': 're primer', ' Construct Number': '1', ' Part Number': '1', 'Primer Sequence': 'attaccgcctttgagtgagcTTAGACTCGAGCGGCCGCTTTTTGACACCAGACCAACTGG'}
{' Direction': 'fw primer', ' Construct Number': '1', ' Part Number': '1', 'Primer Sequence': 'TTTAATTACTAACTTTATCTATGATAGATCCCGTCGTTTTACAACGTCGTGACTGGGAAA'}
{' Direction': 're primer', ' Construct Number': '1', ' Part Number': '2', 'Primer Sequence': 'AAAACGACGGGATCTATCATAGATAAAGTTAGTAATTAAACTTAAAAGTTGTTTAATGTC'}
{' Direction': 'fw primer', ' Construct Number': '2', ' Part Number': '2', 'Primer Sequence': 'gtaaatccaagttgtaataatactagagTAGCATAACCCCTTGGGGCCTCTAAACGGGTC'}
{' Direction': 're primer', ' Construct Number': '2', ' Part Number': '1', 'Primer Sequence': 'GGGGTTATGCTActctagtattattacaacttggatttaccacctttcttcgccttgatc'}
{' Direction': 'fw primer', ' Construct Number': '2', ' Part Number': '1', 'Primer Sequence': 'TACGACTCACTATAGGGAGAtactagagttaaggaggtaaaaaaaatgggtccggtcgtt'}
{' Direction': 're primer', ' Construct Number': '2', ' Part Number': '2', 'Primer Sequence': 'ttacctccttaactctagtaTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'}
{' Direction': 'fw primer', ' Construct Number': '3', ' Part Number': '2', 'Primer Sequence': 'gtaaatccaagttgtaataatactagagTAGCATAACCCCTTGGGGCCTCTAAACGGGTC'}
{' Direction': 're primer', ' Construct Number': '3', ' Part Number': '1', 'Primer Sequence': 'GGGGTTATGCTActctagtattattacaacttggatttaccacctttcttcgccttgatc'}
{' Direction': 'fw primer', ' Construct Number': '3', ' Part Number': '1', 'Primer Sequence': 'TAACTATCACTATAGGGAGAtactagagttaaggaggtaaaaaaaatgggtccggtcgtt'}
{' Direction': 're primer', ' Construct Number': '3', ' Part Number': '2', 'Primer Sequence': 'ttacctccttaactctagtaTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'}

Another way:
import operator
# A DictReader has no .sort() method, so read the rows into a list first
primers = list(primers_unsorted)
primers.sort(key=operator.itemgetter(' Construct Number', ' Part Number'))
for row in primers:
    print(row)
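One thing to watch for with the itemgetter approach: csv.DictReader returns every value as a string, so '10' sorts before '2'. The sketch below uses a few made-up rows (not the real primer data) with the same keys as the output above, and a hypothetical helper numeric_key that converts both fields to int before comparing.

```python
from operator import itemgetter

# Made-up rows shaped like the DictReader output above (keys keep their leading space)
rows = [
    {' Construct Number': '10', ' Part Number': '1'},
    {' Construct Number': '2', ' Part Number': '10'},
    {' Construct Number': '2', ' Part Number': '9'},
]

def numeric_key(row):
    # Hypothetical helper: compare the two fields as integers, not as text
    return int(row[' Construct Number']), int(row[' Part Number'])

as_text = sorted(rows, key=itemgetter(' Construct Number', ' Part Number'))
as_numbers = sorted(rows, key=numeric_key)

print([r[' Construct Number'] for r in as_text])     # string order
print([r[' Construct Number'] for r in as_numbers])  # numeric order
```

With single-digit numbers, as in the sample output, both keys give the same order; the difference only shows up once a construct or part number reaches 10.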

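If you want to do it block by block (here primers_list is the list of row dictionaries, not the file object), you can do something like:
a = 0
while a < len(primers_list):
    b = a
    current_construct = primers_list[a][' Construct Number']
    # advance b to one past the last row of the current construct
    while b < len(primers_list) and primers_list[b][' Construct Number'] == current_construct:
        b = b + 1
    # sort only the rows of this construct, in place
    primers_list[a:b] = sorted(primers_list[a:b], key=lambda e: (e[' Construct Number'], e[' Part Number']))
    a = b
which might be useful if the list is very long.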
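The same block-by-block idea can also be sketched with itertools.groupby, which yields runs of consecutive rows sharing a key. This assumes, as in the question, that the rows are already ordered by construct. The function name sort_within_blocks and the sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

def sort_within_blocks(rows, block_key, sort_key):
    """Sort each run of consecutive rows sharing block_key by sort_key."""
    result = []
    for _, block in groupby(rows, key=itemgetter(block_key)):
        # block is an iterator over one run of rows with the same block_key
        result.extend(sorted(block, key=itemgetter(sort_key)))
    return result

# Made-up rows, already grouped by construct as in the question
rows = [
    {' Construct Number': '1', ' Part Number': '2'},
    {' Construct Number': '1', ' Part Number': '1'},
    {' Construct Number': '2', ' Part Number': '2'},
    {' Construct Number': '2', ' Part Number': '1'},
]

sorted_rows = sort_within_blocks(rows, ' Construct Number', ' Part Number')
for row in sorted_rows:
    print(row)
```

Unlike the slice-assignment loop, this builds a new list and leaves the input untouched.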

My final code is this, and it worked out perfectly:
import csv
import operator

primers_list = open('primers-list.csv', newline='')
primers_unsorted = csv.DictReader(primers_list)
primers_sorted = sorted(primers_unsorted, key=operator.itemgetter('Construct Number', 'Part Number'))
for row in primers_sorted:
    print(row)
The key (pardon the pun) to this was to make use of operator.itemgetter(...), which accepts as many arguments as needed. Its result is passed as the 'key' argument to sorted(...).
Many thanks to Eric for answering my question!

Related

How to add an extra column to all rows in a dictionary from a given list?

Here is my program to make a dictionary from a given list:
import csv
list1 = []
header = []
# raw string, so the backslashes in the Windows path are not treated as escapes
with open(r'D:\C++\Programs\Advanced Programming\grades.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        list1.append(line)
header = list1[0]
res = [dict(zip(header, values)) for values in list1[1:]]
for i in res:
    print(i)
the output is:
{'Last name': 'Alfalfa', ' First name': ' Aloysius', ' Final': '49', ' Grade': ' D-'}
{'Last name': 'Alfred', ' First name': ' University', ' Final': '48', ' Grade': ' D+'}
{'Last name': 'Gerty', ' First name': ' Gramma', ' Final': '44', ' Grade': ' C'}
{'Last name': 'Android', ' First name': ' Electric', ' Final': '47', ' Grade': ' B-'}
{'Last name': 'Bumpkin', ' First name': ' Fred', ' Final': '45', ' Grade': ' A-'}
{'Last name': 'Rubble', ' First name': ' Betty', ' Final': '46', ' Grade': ' C-'}
Now I have to add another column, 'Total marks', to each of these dictionaries. It should contain the student's total marks, which are in the list, i.e. list[2].
How can I add all the marks to the dictionaries at once, so that the result looks like this:
{'Last name': 'Alfalfa', ' First name': ' Aloysius', ' Final': '49', ' Grade': ' D-', 'Total marks': '49'}
{'Last name': 'Alfred', ' First name': ' University', ' Final': '48', ' Grade': ' D+','Total marks': '48'}
I don't understand how to do it. Please tell me how to solve this.

One solution is to treat your data as a pandas DataFrame instead of a list.
As an example:
import pandas as pd
data = [{'Last name': 'Alfalfa', ' First name': ' Aloysius', ' Final': '49', ' Grade': ' D-'},
        {'Last name': 'Alfred', ' First name': ' University', ' Final': '48', ' Grade': ' D+'}]
df = pd.DataFrame(data)
list2 = ['49', '48']  # total marks, in the same order as the rows
df['total marks'] = list2
df

As I understand it, you want to add a new column whose value equals ' Final'. You could do something like this:
for row in res:
    row['Total marks'] = row[' Final']
print(res)

I don't know how you store the marks, so I can't give you a complete working answer, but in short:
res is a list of dictionaries. You can add an extra key and value to a dictionary just by setting it: my_dict['my_key'] = 'my_value'. Use this in your for loop like so:
for i in res:
    i['Total marks'] = total_marks  # where total_marks is the total for that student
Does this help you?
If list2 contains all the total marks in the same order as the students in list1, you can do:
for i in range(len(res)):
    res[i]['Total marks'] = list2[i]

for row in res:  # avoid naming the loop variable dict, which shadows the built-in
    row['Total marks'] = row[' Final']
Also, you might want to make the numerical values integers/floats, not strings.
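Putting the last two suggestions together, here is a minimal sketch that pairs each row with its total using zip and stores the total as an int. The shortened rows and the list2 values are made-up stand-ins for the question's data, and the length check is an added assumption about how mismatched inputs should be handled.

```python
# Shortened stand-ins for the question's rows and a hypothetical list of totals
res = [
    {'Last name': 'Alfalfa', ' Final': '49'},
    {'Last name': 'Alfred', ' Final': '48'},
]
list2 = ['49', '48']

# zip stops silently at the shorter input, so check the lengths first
if len(res) != len(list2):
    raise ValueError('expected one total per student')

for row, total in zip(res, list2):
    row['Total marks'] = int(total)  # store a number rather than a string

for row in res:
    print(row)
```

zip avoids the index bookkeeping of range(len(res)) while still relying on both lists being in the same order.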

Looping through dictionaries within a list and add row to DataFrame

I have a list of items I am trying to unpack from a dictionary which is nested inside another list.
I then want to add the row to a DataFrame.
headers = ['minutesPlayed', 'goals', 'goalsAssist', 'shotsOnTarget',
'shotsOffTarget', 'shotsBlocked', 'hitWoodwork', 'totalContest',
'penaltyMiss', 'penaltyWon', 'bigChanceMissed']
The code I have tried is:
rows = []
for groups in data['groups']:
    row = []
    #Summary
    row.append(groups['minutesPlayed'])
    row.append(groups['goals'])
    row.append(groups['goalAssist'])
    #Attack
    row.append(groups['shotsOnTarget'])
    row.append(groups['shotsOffTarget'])
    row.append(groups['shotsBlocked'])
    row.append(groups['hitWoodwork'])
    row.append(groups['totalContest'])
    row.append(groups['penaltyMiss'])
    row.append(groups['penaltyWon'])
    row.append(groups['bigChanceMissed'])
    rows.append(row)
df = pd.DataFrame(rows, columns=headers)
However I receive the error:
KeyError: 'shotsOnTarget'
It doesn't allow me to iterate over the second element within the groups list.
Any tips?
EDIT: added a print of data['groups']:
print(data['groups'])
[{'minutesPlayed': "89'", 'goals': '0', 'goalAssist': '1', 'statisticsItems': [{'minutesPlayed': 'Minutes played'}, {'goals': 'Goals'}, {'goalAssist': 'Assists'}], 'groupName': 'Summary'}, {'shotsOnTarget': '0', 'shotsOffTarget': '0', 'shotsBlocked': '1', 'hitWoodwork': '0', 'totalContest': '1 (0)', 'goals': '0', 'goalAssist': '1', 'penaltyMiss': '0', 'penaltyWon': '0', 'bigChanceMissed': '0', 'statisticsItems': [{'shotsOnTarget': 'Shots on target'}, {'shotsOffTarget': 'Shots off target'}, {'shotsBlocked': 'Shots blocked'}, {'totalContest': 'Dribble attempts (succ.)'}], 'groupName': 'Attack'}, {'touches': 55, 'accuratePass': '26 (70.3%)', 'keyPass': '1', 'totalCross': '0 (0)', 'totalLongBalls': '2 (0)', 'bigChanceCreated': '0', 'statisticsItems': [{'touches': 'Touches'}, {'accuratePass': 'Passes (acc.)'}, {'keyPass': 'Key passes'}, {'totalCross': 'Crosses (acc.)'}, {'totalLongBalls': 'Long balls (acc.)'}], 'groupName': 'Passing'}, {'possessionLost': '26', 'groundDuels': '9 (0)', 'aerialDuels': '3 (1)', 'wasFouled': '0', 'fouls': '2', 'offsides': '0', 'statisticsItems': [{'groundDuels': 'Ground duels (won)'}, {'aerialDuels': 'Aerial duels (won)'}, {'possessionLost': 'Possession lost'}, {'fouls': 'Fouls'}, {'wasFouled': 'Was fouled'}], 'groupName': 'Other'}, {'totalClearance': '0', 'clearanceOffLine': '0', 'blockedScoringAttempt': '0', 'interceptionWon': '0', 'totalTackle': '0', 'challengeLost': '1', 'lastManTackle': '0', 'errorLeadToShot': '0', 'errorLeadToGoal': '0', 'ownGoals': '0', 'penaltyConceded': '0', 'statisticsItems': [{'totalClearance': 'Clearances'}, {'blockedScoringAttempt': 'Blocked shots'}, {'interceptionWon': 'Interceptions'}, {'totalTackle': 'Tackles'}, {'challengeLost': 'Dribbled past'}], 'groupName': 'Defence'}]

Your data['groups'] is a list whose elements are dicts. minutesPlayed, goals and goalAssist are in the first element of the list, so they are found on the first pass of the for loop, when groups is that first element. shotsOnTarget and the others are in the second element, so looking them up in the first element raises the KeyError. A nicer way of doing it is the following; you don't need a for loop.
row = []
#Summary
row.append(data['groups'][0]['minutesPlayed'])
row.append(data['groups'][0]['goals'])
row.append(data['groups'][0]['goalAssist'])
#Attack
row.append(data['groups'][1]['shotsOnTarget'])
# same for the others
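If hard-coding the list positions feels brittle, another option is to merge all the group dicts into one and then pick the wanted keys from it. The sketch below uses a trimmed, made-up version of data['groups'] and a shortened list named wanted; it uses the key goalAssist as it appears in the printed data. Note that dict.update lets later groups overwrite earlier ones for repeated keys such as goals, which is assumed to be harmless here because the repeated values agree.

```python
# Trimmed, made-up stand-in for the data['groups'] printed in the question
data = {'groups': [
    {'minutesPlayed': "89'", 'goals': '0', 'goalAssist': '1', 'groupName': 'Summary'},
    {'shotsOnTarget': '0', 'shotsBlocked': '1', 'goals': '0', 'groupName': 'Attack'},
]}

# Merge every group into a single flat dict (later groups win on duplicate keys)
merged = {}
for group in data['groups']:
    merged.update(group)

# Shortened list of wanted statistics; a missing one becomes None instead of raising KeyError
wanted = ['minutesPlayed', 'goals', 'goalAssist', 'shotsOnTarget', 'shotsBlocked', 'hitWoodwork']
row = [merged.get(name) for name in wanted]
print(row)
```

The resulting row can then be appended to rows and passed to pd.DataFrame(rows, columns=...) as in the question.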

Nested dictionary issue

I need to create a program that takes a CSV file and returns a nested dictionary. The keys for the outer dictionary should be the first value in each row, starting from the second one (so as to omit the row with the column names). The value for each key in the outer dictionary should be another dictionary, which I explain below.
The inner dictionary's keys should be the column names, while the values should be the value corresponding to that column in each row.
Example:
For a CSV file like this:
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78
I would like to print out the data in this form:
my_dict = {
'4': {'column1':'4','column2':'12', 'column3':'5', 'column4':'11'},
'29': {'column1':'29', 'column2':'47', 'column3':'23', 'column4':'41'},
'66': {'column1':'66', 'column2':'1', 'column3':'98', 'column4':'78'}
}
The closest I've gotten so far (which isn't even close):
import csv
import collections
def csv_to_dict(file, delimiter, quotechar):
    list_inside_dict = collections.defaultdict(list)
    with open(file, newline = '') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            for (k,v) in row.items():
                list_inside_dict[k].append(v)
    return dict(list_inside_dict)
If I try to run the function with the example CSV file above, delimiter = ",", and quotechar = "'", it returns the following:
{'column1': ['4', '29', '66'], ' column2': ['12', '47', '1'], ' column3': ['5', '23', '98'], ' column4': ['11', '41', '78']}
At this point I got lost. I tried changing:
list_inside_dict = collections.defaultdict(list)
to
list_inside_dict = collections.defaultdict(dict)
and then simply setting the value for each key, since I cannot append to a dictionary, but it all got really messy. So I started from scratch and ended up in the same place.

You can use a dictionary comprehension:
import csv
with open('filename.csv') as f:
    header, *data = csv.reader(f)
final_dict = {a:dict(zip(header, [a, *b])) for a, *b in data}
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}

You can use pandas for that task.
>>> import pandas as pd
>>> df = pd.read_csv('/path/to/file.csv')
>>> df.index = df.iloc[:, 0]
>>> df.to_dict('index')
Not sure why you want to duplicate the value of the first column, but in case you don't the above simplifies to:
>>> pd.read_csv('/path/to/file.csv', index_col=0).to_dict('index')

This is similar to the dictionary comprehension answer, but I believe it could be explained better.
import csv
with open('filename.csv') as f:
    headers, *data = csv.reader(f)
output = {}
for firstInRow, *restOfRow in data:
    output[firstInRow] = dict(zip(headers, [firstInRow, *restOfRow]))
print(output)
This loops through the data rows of the file, taking the first value of each row as the key and the remaining values as a list. The value for that key in the output dictionary is then built by zipping the list of headers with the full list of row values. The output[firstInRow] = ... line is the same as writing output[firstInRow] = {headers[0]: firstInRow, headers[1]: restOfRow[0], ...}.
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}

It is a couple of zips to get what you want.
Instead of a file, we can use a string for the csv. Just replace that part with a file.
Given:
s='''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''
You can do:
import csv
data = []
for row in csv.reader(s.splitlines()):  # replace s.splitlines() with your open file object
    data.append(row)
header = data.pop(0)
col1 = [e[0] for e in data]
di = {}
for c, row in zip(col1, data):
    di[c] = dict(zip(header, row))
Then:
>>> di
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
On Python 3.7+ (and in CPython 3.6, as an implementation detail), dicts maintain insertion order. Earlier versions do not.
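All of the outputs above keep the leading space in keys such as ' column2', because the sample header row has a space after each comma. A minimal sketch of one way to avoid that, assuming the goal is the exact my_dict shape from the question: pass skipinitialspace=True to csv.DictReader so whitespace right after a delimiter is dropped. The in-memory sample string stands in for the real file.

```python
import csv
import io

# In-memory stand-in for the CSV file from the question
sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)

# skipinitialspace=True ignores whitespace immediately after each delimiter,
# so the header ' column2' is read as 'column2'
reader = csv.DictReader(sample, skipinitialspace=True)
first_column = reader.fieldnames[0]

# Outer key: value of the first column; inner dict: the whole row
nested = {row[first_column]: dict(row) for row in reader}
print(nested)
```

With a real file, replace sample with the object returned by open(path, newline='').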

Remove quotes from list of dictionaries in python

How can I remove the single quotes from the following so that it is recognized as a list of one dictionary object, as opposed to a list of one string object? Given that it is a list, I cannot use pd.replace.
['{"PLAYER":"Player Name","SALARY":"0000.00","OPP":"CI","POS":"BR","TEAM":"IT","SCHEDULE_ID":"40623","PLAYERID":"12322","GP":"5","TAR":"64","RZTAR":"6","POW TAR":"32.99%","WEEK 2":"11","WEEK 3":"14","WEEK 4":"9","WEEK 5":"19","ARDS":"545","YPT":"8.52","REC":"40","REC RATE":"62.50%"}']

You can use ast.literal_eval:
import ast
s = ['{"PLAYER":"Player Name","SALARY":"0000.00","OPP":"CI","POS":"BR","TEAM":"IT","SCHEDULE_ID":"40623","PLAYERID":"12322","GP":"5","TAR":"64","RZTAR":"6","POW TAR":"32.99%","WEEK 2":"11","WEEK 3":"14","WEEK 4":"9","WEEK 5":"19","ARDS":"545","YPT":"8.52","REC":"40","REC RATE":"62.50%"}']
final_s = [ast.literal_eval(i) for i in s]
Output:
[{'SALARY': '0000.00', 'REC RATE': '62.50%', 'OPP': 'CI', 'YPT': '8.52', 'TAR': '64', 'GP': '5', 'PLAYERID': '12322', 'WEEK 3': '14', 'POS': 'BR', 'ARDS': '545', 'WEEK 2': '11', 'PLAYER': 'Player Name', 'SCHEDULE_ID': '40623', 'POW TAR': '32.99%', 'WEEK 4': '9', 'TEAM': 'IT', 'RZTAR': '6', 'REC': '40', 'WEEK 5': '19'}]

You can also use eval() for this, but only on input you trust, since eval() executes arbitrary code:
s=['{"PLAYER":"PlayerName","SALARY":"0000.00","OPP":"CI","POS":"BR","TEAM":"IT","SCHEDULE_ID":"40623","PLAYERID":"12322","GP":"5","TAR":"64","RZTAR":"6","POW TAR":"32.99%","WEEK 2":"11","WEEK 3":"14","WEEK 4":"9","WEEK 5":"19","ARDS":"545","YPT":"8.52","REC":"40","REC RATE":"62.50%"}']
s = [eval(item) for item in s]
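Since the string in the question happens to be valid JSON (double-quoted keys and values), json.loads from the standard library is another option that parses it without evaluating any code. A minimal sketch on a shortened copy of the string:

```python
import json

# Shortened copy of the list from the question: one JSON string inside a list
s = ['{"PLAYER":"Player Name","SALARY":"0000.00","GP":"5"}']

# Parse each string into a dict; the surrounding list is preserved
parsed = [json.loads(item) for item in s]
print(parsed)
print(type(parsed[0]))
```

This only works when the text is real JSON; a string using single-quoted Python literals would still need ast.literal_eval.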

List of values to dictionary

I'm trying to turn a list of values into a list of dictionaries based on my set keys. I tried the following, but I'm losing all the other values because of the duplicate key names.
>>> values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']
>>> keys = ['size', 'stock'] * (len(values) // 2)
>>> result = dict(zip(keys, values))
>>> print(result)
{'size': 'XL ', 'stock': '10'}
What I'm trying to achieve is a list of dicts like the one below. How can I achieve this?
[{'stock': '10', 'size': 'XL '}, {'stock': '10', 'size': 'L'}, ......]

You can use a list comprehension like the following:
>>> values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']
>>> [{'size':i, 'stock':j} for i, j in zip(values[0::2], values[1::2])]
[{'stock': '1', 'size': 'XS '}, {'stock': '10', 'size': 'S '}, {'stock': '1', 'size': 'M '}, {'stock': '10', 'size': 'L '}, {'stock': '10', 'size': 'XL '}]
Note that in this case you don't have to multiply the keys.

Usually the point of using a dict is to map unique keys to their values. You were originally trying to store size: ... and stock: ... for each item, but why not map each size directly to its stock? In that case you would simply do:
result = dict(zip(values[::2], values[1::2]))
or without needing slicing:
value_iter = iter(values)
result = dict(zip(value_iter, value_iter))
This grabs two elements from the list at a time.
This way you still know that a given key in the dict is the size and the associated value is the stock for that size.
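The same "two elements at a time" trick also works for the list of dicts the question asked for. A minimal sketch; stripping the trailing spaces from the sizes and converting the stock to int are extra assumptions about what is wanted, not part of the original answers.

```python
values = ['XS ', '1', 'S ', '10', 'M ', '1', 'L ', '10', 'XL ', '10']

# Passing the same iterator to zip twice makes each step consume two items
value_iter = iter(values)
records = [
    {'size': size.strip(), 'stock': int(stock)}  # assumed cleanup: trim spaces, use numbers
    for size, stock in zip(value_iter, value_iter)
]
print(records)
```

If values had an odd number of items, zip would silently drop the last one.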
