I have a list of dictionaries in which keys are "group_names" and values are gene_lists.
I want to update each dictionary with a new list of genes by looping through a species_list.
Here is my pseudocode:
groups=["group1", "group2"]
species_list=["spA", "spB"]
def get_genes(group,sp):
    return gene_list
for sp in species_list:
    for group in groups:
        gene_list[group]=get_genes(group,sp)
        gene_list.update(get_genes(group,sp))
The problem with this code is that genes are overwritten on each iteration instead of being added to the dictionary. My question is where I should put the following line, although I'm not sure that this is the only problem.
gene_list.update(get_genes(group,sp))
The data I have looks like this dataframe:
data={"group1":["geneA1", "geneA2"],
"group2":[ "geneB1","geneB2"]}
pd.DataFrame.from_dict(data).T
The data I want to create should look like this:
data={"group1":["geneA1", "geneA2", "geneX"],
"group2":[ "geneB1","geneB2", "geneX"]}
pd.DataFrame.from_dict(data).T
So in this case, "geneX" refers to the new genes returned by the get_genes function for each species, which should be added to the existing dictionary.
Any help would be much appreciated!!
You need to append to the list in the dictionary entry, not assign it.
Use setdefault() to provide a default empty list if the dictionary key doesn't exist yet.
for sp in species_list:
    for group in groups:
        gene_list.setdefault(group, []).extend(get_genes(group, sp))
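As a self-contained sketch of that loop: the real get_genes() is not shown in the question, so the version below is a hypothetical stand-in that simply makes up one gene name per group and species.

```python
# Hypothetical stand-in for the real get_genes(); it fabricates one
# gene name per (group, species) pair so the loop can be run.
def get_genes(group, sp):
    return [f"gene_{group}_{sp}"]

groups = ["group1", "group2"]
species_list = ["spA", "spB"]

# Existing data: group name -> list of genes
gene_list = {"group1": ["geneA1", "geneA2"], "group2": ["geneB1", "geneB2"]}

for sp in species_list:
    for group in groups:
        # extend() adds to the existing list instead of replacing it
        gene_list.setdefault(group, []).extend(get_genes(group, sp))

print(gene_list["group1"])
# ['geneA1', 'geneA2', 'gene_group1_spA', 'gene_group1_spB']
```

Because setdefault() returns the list already stored under the key, the original genes stay in place and each species only adds to them.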
From what I understand, you want to append a new gene to each key. To do that:
new_gene = "gene_x"
data={"group1":["geneA1", "geneA2"], "group2":[ "geneB1","geneB2"]}
for value in data.values():
    value.append(new_gene)
print(data)
You can also use defaultdict where you can append directly (read the docs for that).
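A minimal sketch of the defaultdict variant, assuming the same group names as in the question ("geneX" is just a placeholder gene name):

```python
from collections import defaultdict

# Missing keys are created on first access with an empty list
data = defaultdict(list)
data["group1"].extend(["geneA1", "geneA2"])

new_gene = "geneX"  # placeholder gene name
for group in ("group1", "group2"):
    # "group2" does not exist yet; defaultdict creates it here
    data[group].append(new_gene)

result = dict(data)  # convert back to a plain dict if needed
print(result)
# {'group1': ['geneA1', 'geneA2', 'geneX'], 'group2': ['geneX']}
```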
I'm working on automating some word and PDF documents that need to be updated on a certain cadence.
The way I'm doing this is using dictionaries that replace variables within word documents.
My code works, but because my area is not tech savvy I'm using an Excel file so that people can replace the values in that file whenever they need to update the documents.
I was also successful in pulling the dictionary keys and values from Excel, but I'm trying to refactor this code, which is repetitive. Here is an excerpt with 2 of the 7 dictionaries I'm creating:
dic = pd.read_excel('test.xlsx',"AD")
AD = dict(zip(dic.Key,dic.Value))
dic = pd.read_excel('test.xlsx',"RSM")
RSM = dict(zip(dic.Key,dic.Value))
I'm trying to refactor this so I can run it all within a single loop and trying something like this:
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = [AD, RSM]
for item in groups:
    dic = pd.read_excel('test.xlsx',item)
    item = dict(zip(dic.Key,dic.Value))
So I'm basically first using the variable as a string to call the excel tab within the read_excel method and then I want to replace that same variable to become the output dictionary.
When I print item within the loop I do get the correct dictionaries but I'm not able to output a variable that stores each dictionary that the loop creates.
Any help would be appreciated.
Thanks!
You're almost there, you can just have a dictionary of dictionaries:
import pandas as pd
groups = ['AD', 'RSM']
dicts = {}
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    dicts[item] = dict(zip(dic.Key, dic.Value))
Now you can just access them like this:
print(dicts['AD']['some key'])
The values of a dictionary can be anything, including other dictionaries. Keys of dictionaries can be many things as well, as long as they're hashable, and strings are a common choice of course - and the names of your groups are just that.
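To illustrate the point about nested values and hashable keys, here is a small sketch that needs no Excel file; the contents of the inner dictionaries are made up for the example.

```python
# Made-up contents standing in for two Excel tabs
dicts = {
    "AD": {"client": "Acme", "year": 2023},
    "RSM": {"client": "Globex", "year": 2024},
}

# The outer key picks the group, the inner key picks the value
ad_client = dicts["AD"]["client"]

# Keys must be hashable: a tuple works, a list does not
lookup = {("AD", "client"): "Acme"}
try:
    lookup[["AD", "client"]] = "Acme"
    list_key_allowed = True
except TypeError:
    # Lists are mutable and therefore unhashable
    list_key_allowed = False

print(ad_client, list_key_allowed)
```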
Also note that I removed the variables named AD and RSM. You don't really achieve anything by having variables that are named after the string value they are assigned. It only serves to be able to leave off the quotes where you use the values, but it creates an additional indirection that serves no purpose.
If you don't even need the list of groups, but just want groups to be the actual dictionaries:
import pandas as pd
groups = {}
for item in ['AD', 'RSM']:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value))
The problem is that you assign the result to the item variable and not to an entry in the list.
A simple fix would be to use a dictionary instead of a list to save the result, e.g.
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = {AD: None, RSM: None}
for item in groups.keys():
    dic = pd.read_excel('test.xlsx',item)
    groups[item] = dict(zip(dic.Key,dic.Value))
My suggestion would be to use an overall dictionary to track your work and also to save the results there. I refactored your code slightly to this:
import pandas as pd
groups = dict.fromkeys(('AD', 'RSM')) # setup main dict containing dicts
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value)) # store individual dict
There's no need for global constants that are used only once, so I removed them. I also added some spaces to help your code conform to PEP 8, the standard style guide for Python code.
Now you can access each dictionary as you like, for example, groups['AD'].
I have a dictionary which looks as shown below. Now I need to get each key together with its corresponding path, so that I can use it later to identify the slot number based on the key. How can I achieve that?
I tried an approach, but it gives me a KeyError.
What you need can easily be implemented as:
>>> {key: value["mpath"] for key, value in multipath.items()}
{'/dev/sdh': '/dev/mapper/mpathk', '/dev/sdi': '/dev/mapper/mpathk',
'/dev/sdg': '/dev/mapper/mpathj', '/dev/sdf': '/dev/mapper/mpathj',
'/dev/sdd': '/dev/mapper/mpathi', '/dev/sde': '/dev/mapper/mpathi',
'/dev/sdb': '/dev/mapper/mpathh', '/dev/sdc': '/dev/mapper/mpathh',
'/dev/sdj': '/dev/mapper/mpathg', '/dev/sdk': '/dev/mapper/mpathg'}
Great one-line answer by @Selcuk using a dictionary comprehension.
A more elaborate version along the same lines would be:
mpath_dict = {}
for sd, mpath in multipath.items():
    mpath_dict[sd] = mpath['mpath']
print(mpath_dict)
Since every value in the multipath dictionary is itself a dictionary, you can retrieve items from it just as you would from any other dictionary.
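Since the question mentions a KeyError, here is a hedged sketch that skips entries without an "mpath" key. The original dictionary is not shown, so the sample data below is hypothetical and only mimics the shape implied by the output above; "/dev/sdx" and the "size" field are invented.

```python
# Hypothetical sample data; the real multipath dictionary is not shown.
multipath = {
    "/dev/sdh": {"mpath": "/dev/mapper/mpathk", "size": "10G"},
    "/dev/sdi": {"mpath": "/dev/mapper/mpathk", "size": "10G"},
    "/dev/sdx": {"size": "10G"},  # no "mpath" key: indexing would raise KeyError
}

# Keep only the entries that actually have an "mpath" key
mpath_dict = {
    sd: info["mpath"] for sd, info in multipath.items() if "mpath" in info
}
print(mpath_dict)
# {'/dev/sdh': '/dev/mapper/mpathk', '/dev/sdi': '/dev/mapper/mpathk'}
```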
I'm using an API call in Python 3.7 which returns JSON data.
result = (someapicall)
The data returned appears to be in the form of two dictionaries nested within a list, i.e.
[{name:foo, firmware:boo}, {name:foo, firmware:bar}]
I would like to retrieve the value of the key "name" from the first dictionary and the value of the key "firmware" from both dictionaries, and store them in a new dictionary in the following format.
{foo:(boo,bar)}
So far I've managed to retrieve the value of the first "name" and the first "firmware" and store them in a dictionary using the following.
dict1={}
for i in result:
    dict1[(i["networkId"])] = (i['firmware'])
I've tried:
d7[(a["networkId"])] = (a['firmware'],(a['firmware']))
but, as expected, this just returns the same firmware twice.
Can anyone help achieve the desired result above?
You can use defaultdict to accumulate values in a list, like this:
from collections import defaultdict
result = [{'name':'foo', 'firmware':'boo'},{'name':'foo', 'firmware':'bar'}]
# create a dict with a default of empty list for non existing keys
dict1=defaultdict(list)
# iterate and add firmwares of same name to list
for i in result:
    dict1[i['name']].append(i['firmware'])
# reformat to regular dict with tuples
final = {k:tuple(v) for k,v in dict1.items()}
print(final)
Output:
{'foo': ('boo', 'bar')}
I have a Python dict, stuffs, whose values are lists:
{'car':['bmw','porsche','benz'], 'fruits':['banana','apple']}
I would like to delete the first value from car (bmw) and the first value from fruits (banana).
How can I access and delete them, please? I have tried .pop(index), but it doesn't work...
You can create a new dictionary where you skip the first element using [1:]
stuffs = {'car':['bmw','porsche','benz'], 'fruits':['banana','apple']}
stuffs_new = {k:v[1:] for k,v in stuffs.items()}
# {'car': ['porsche', 'benz'], 'fruits': ['apple']}
An easy way of doing this is to use a for loop to iterate over each item in your dictionary and pop the first element:
dictionary = {'car':['bmw','porsche','benz'], 'fruits':['banana','apple']}
for key in dictionary:
    dictionary[key].pop(0)
Or, as a list comprehension:
dictionary = {'car':['bmw','porsche','benz'], 'fruits':['banana','apple']}
[dictionary[i].pop(0) for i in dictionary]
These pieces of code reference the dictionary at each of its keys ('car' and 'fruits') and then use pop on the lists stored under those keys.
Edit:
Don't use a list comprehension if you don't intend to store the list. In the case where you are iterating over large values, you could run into memory errors due to storing a whole load of useless values. Such as in this case:
[print(i) for i in range(9823498)]
This will store 9823498 None values, whereas a for loop would not, but would still achieve the same thing.
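For the in-place variant, a minimal sketch that also skips empty lists, since pop(0) raises IndexError on an empty list. The "empty" key is invented for the example.

```python
stuffs = {
    "car": ["bmw", "porsche", "benz"],
    "fruits": ["banana", "apple"],
    "empty": [],  # invented entry to show the guard
}

removed = {}
for key, values in stuffs.items():
    if values:  # skip empty lists; pop(0) on [] raises IndexError
        removed[key] = values.pop(0)

print(stuffs)
print(removed)
```

Only the lists are mutated here, not the dictionary's set of keys, so iterating over stuffs.items() while popping is safe.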
You were almost there.
Use either:
del dict[key]
Or
dict.pop(key, value)
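The second removes the key and returns its value, so the item is still available afterwards; the optional second argument is a default that is returned if the key is missing. Note that both forms remove a whole dictionary entry (the key and its list), not a single element of the list.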
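A short sketch of the difference between the two, using the dictionary from the question (the "boats" key is a made-up example of a missing key):

```python
stuffs = {"car": ["bmw", "porsche", "benz"], "fruits": ["banana", "apple"]}

# pop() removes the key and returns its value
cars = stuffs.pop("car")

# With a default, a missing key does not raise KeyError
missing = stuffs.pop("boats", None)

# del removes the key and returns nothing; a missing key raises KeyError
del stuffs["fruits"]

print(cars, missing, stuffs)
```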
import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great...I also added another feature. Instead of having to type 'account number'..I made a list comprehension that looks up each value by key.
The list I just created with the 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client','tin']
That makes life a lot easier...I know that I want to look up 'ACCOUNT_name', or 'client'. Next I run a function 'getLayout' that finds the position of a desired column in 'fileLayout' and returns the matching header from the given file...So if I want to see where my desired column 'ACCOUNT_name' is in the file, I just run the function, which is called like this...
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number',), ('name',), ('tax id',)], which is exactly what I want...
But what if there is a new header I am not used to? Let's use the same example code as above, but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
I most certainly get a key error, KeyError: 'tax identification number'. I know I can use defaultdict, but when I try to use it with the set value 'tin', I end up overwriting my entire dictionary... What I would ultimately like to end up doing is this...
I would like to create an else within my list comprehension that lets me add dictionary entries via standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? Any ideas? If you have any suggestions, I am all ears. I'm struggling with this issue...
The way I would want to go about this is in the list comprehension..
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
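One possible way to get that behaviour, sketched in Python 3 (where input() replaces Python 2's raw_input): a comprehension cannot cleanly mix lookups with prompting, so the sketch uses a plain loop inside a helper. The name build_layout and its ask parameter are hypothetical; ask defaults to input and is swapped for a canned answer here so the example runs without a keyboard.

```python
def build_layout(given_header, header_dict, ask=input):
    """Map each file header to its template column, asking for unknown ones."""
    layout = []
    for name in given_header:
        if name not in header_dict:
            # Unknown header: ask once and remember the answer in the dict
            header_dict[name] = ask(f"Template column for {name!r}: ")
        layout.append(header_dict[name])
    return layout

header_dict = {"account number": "ACCOUNT_name", "name": "client", "tax id": "tin"}
given_header = ["account number", "name", "tax identification number"]

# A canned answer replaces the keyboard prompt for this demonstration
file_layout = build_layout(given_header, header_dict, ask=lambda prompt: "tin")
print(file_layout)
# ['ACCOUNT_name', 'client', 'tin']
```

After the call, header_dict also contains 'tax identification number': 'tin', so the same header will not trigger a prompt again.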