Creating multiple dictionary variables with loop commands? - python

This is my first time working with Python. I'm trying to create a dictionary for each county (23 in total), with the year as the key and the population and income figures as the values. Hard-coding each county seems to work, but I'm sure there is an easier way to do it using loops or classes. Any suggestions? Thanks!
import xlrd
wb= xlrd.open_workbook('C:\Python27\Forecast_test.xls')
popdata=wb.sheet_by_name(u'Sheet1')
incomedata=wb.sheet_by_name(u'Sheet2')
WyomingCnty =('Albany', 'Big Horn',
'Campbell', 'Carbon', 'Converse',
'Crook', 'Fremont', 'Goshen',
'Hot Springs','Johnson', 'Laramie',
'Lincoln', 'Natrona','Niobrara',
'Park', 'Platte', 'Sheridan', 'Sublette',
'Sweetwater', 'Teton', 'Uinta', 'Washakie', 'Weston','Wyoming')
Years = ('y0','y1','y2','y3','y4','y5','y6','y7','y8','y9','y10',
'y11','y12', 'y13', 'y14', 'y15', 'y16', 'y17', 'y18','y19',
'y20','y21','y22','y23','y24','y25','y26','y27','y28','y29','y30')
AlbanyPop = popdata.col_values(colx=1,start_rowx=1,end_rowx=None)
AlbanyIncome= incomedata.col_values(colx=1,start_rowx=1,end_rowx=None)
AlbanyDict1=dict(zip(Years,AlbanyPop))
AlbanyDict2=dict(zip(Years,AlbanyIncome))
BigHornPop = popdata.col_values(colx=2,start_rowx=1,end_rowx=None)
BigHornIncome= incomedata.col_values(colx=2,start_rowx=1,end_rowx=None)
BigHornDict1=dict(zip(Years,BigHornPop))
BigHornDict2=dict(zip(Years,BigHornIncome))

popdict = {}
incdict = {}
for ix, county in enumerate(WyomingCnty):
    # County data starts at column 1, matching the col_values calls in the question
    popdict[county] = dict(zip(Years, popdata.col_values(colx=ix + 1, start_rowx=1, end_rowx=None)))
    incdict[county] = dict(zip(Years, incomedata.col_values(colx=ix + 1, start_rowx=1, end_rowx=None)))

I would just use another dictionary. As in:
import xlrd
wb = xlrd.open_workbook(r'C:\Python27\Forecast_test.xls')  # raw string keeps the backslashes literal
popdata=wb.sheet_by_name(u'Sheet1') #Import population data
incomedata=wb.sheet_by_name(u'Sheet2') #Import income data
WyomingCnty =('Albany', 'Big Horn',
'Campbell', 'Carbon', 'Converse',
'Crook', 'Fremont', 'Goshen',
'Hot Springs','Johnson', 'Laramie',
'Lincoln', 'Natrona','Niobrara',
'Park', 'Platte', 'Sheridan', 'Sublette',
'Sweetwater', 'Teton', 'Uinta', 'Washakie', 'Weston','Wyoming')
Years = ('y0','y1','y2','y3','y4','y5','y6','y7','y8','y9','y10',
'y11','y12', 'y13', 'y14', 'y15', 'y16', 'y17', 'y18','y19',
'y20','y21','y22','y23','y24','y25','y26','y27','y28','y29','y30')
county_dict = {}
# start=1 because the question reads the first county (Albany) from column 1
for col, county in enumerate(WyomingCnty, start=1):
    county_dict[county] = {}
    county_popdata = popdata.col_values(colx=col, start_rowx=1, end_rowx=None)
    county_incdata = incomedata.col_values(colx=col, start_rowx=1, end_rowx=None)
    county_dict[county]['population'] = county_popdata
    county_dict[county]['income'] = county_incdata
    county_dict[county]['pop_by_year'] = dict(zip(Years, county_popdata))
    county_dict[county]['inc_by_year'] = dict(zip(Years, county_incdata))
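A minimal, self-contained sketch of how a nested dictionary of this shape can be queried. The spreadsheet is replaced by two hypothetical in-memory tables (`fake_pop` and `fake_inc` hold made-up numbers, not real county data), so xlrd is not needed to run it:

```python
# Hypothetical stand-ins for the spreadsheet columns (illustrative values only).
counties = ('Albany', 'Big Horn')
years = ('y0', 'y1', 'y2')
fake_pop = {'Albany': [100.0, 110.0, 120.0], 'Big Horn': [50.0, 55.0, 60.0]}
fake_inc = {'Albany': [1.0, 1.5, 2.0], 'Big Horn': [0.5, 0.75, 1.0]}

county_dict = {}
for county in counties:
    county_dict[county] = {
        'pop_by_year': dict(zip(years, fake_pop[county])),
        'inc_by_year': dict(zip(years, fake_inc[county])),
    }

# One county, one year
print(county_dict['Albany']['pop_by_year']['y1'])   # 110.0

# The same year across all counties
pop_y2 = {county: data['pop_by_year']['y2'] for county, data in county_dict.items()}
print(pop_y2)   # {'Albany': 120.0, 'Big Horn': 60.0}
```

Keeping everything in one dictionary keyed by county name means a new county only needs a new entry in the tuple, not a new variable.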

Related

How to use user input with pandas to get the value counts for the matching rows

My DataFrame has the following columns:
Index(['#Organism/Name', 'TaxID', 'BioProject Accession', 'BioProject ID', 'Group', 'SubGroup', 'Size (Mb)', 'GC%', 'Replicons', 'WGS',
'Scaffolds', 'Genes', 'Proteins', 'Release Date', 'Modify Date',
'Status', 'Center', 'BioSample Accession', 'Assembly Accession',
'Reference', 'FTP Path', 'Pubmed ID', 'Strain'],
dtype='object')
I ask the user to enter the name of the species with this script:
print("bacterie species?")
species=input()
I want to find the rows where "#Organism/Name" equals the species entered by the user, then run value_counts on the Status column of those rows, and finally retrieve their 'FTP Path' values.
Here is the code I came up with, which does not work:
if (data.loc[(data["Organism/Name"]==species)
    print(Data['Status'].value_counts())
else:
    print("This species not found")
if (data.loc[(data["Organism/Name"]==species)
    print(Data['Status'].value_counts())
else:
    print(Data.get["FTP Path"]
If I understand your question correctly, this is what you're trying to achieve:
import wget
import pandas as pd
URL = 'https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt'
data = pd.read_csv(wget.download(URL), sep='\t', header=0)
species = input("Enter the bacteria species: ")
# Case-insensitive substring match; regex=False treats the input literally, na=False ignores missing names
mask = data["#Organism/Name"].str.contains(species, case=False, regex=False, na=False)
if mask.any():
    print(data.loc[mask, 'Status'].value_counts())
    FTP_list = data.loc[mask, "FTP Path"].values
else:
    FTP_list = []  # keep the name defined even when nothing matches
    print("This species not found")
To write all the FTP Path URLs into a text file, you can do this:
with open('/path/urls.txt', mode='wt') as file:
    file.write('\n'.join(FTP_list))

How to avoid very long if-elif-elif-else statements in Python function

Is there a smart way to shorten very long if-elif-elif-elif... statements?
Let's say I have a function like this:
def very_long_func():
    something = 'Audi'
    car = ['VW', 'Audi', 'BMW']
    drinks = ['Cola', 'Fanta', 'Pepsi']
    countries = ['France', 'Germany', 'Italy']
    if something in car:
        return {'type':'car brand'}
    elif something in drinks:
        return {'type':'lemonade brand'}
    elif something in countries:
        return {'type':'country'}
    else:
        return {'type':'nothing found'}
>>> very_long_func()
{'type': 'car brand'}
The actual function is much longer than the example. What would be the best way to write this function (not in terms of speed but in terms of readability)?
I was reading this, but I have trouble applying it to my problem.
Lists are not hashable, so you can't use them as dictionary keys. Go the other way round instead: create a mapping of type -> list, and initialize your output with the default type. This lets you keep adding new types to the mapping without changing any other code.
def very_long_func():
    something = 'Audi'
    car = ['VW', 'Audi', 'BMW']
    drinks = ['Cola', 'Fanta', 'Pepsi']
    countries = ['France', 'Germany', 'Italy']
    out = {'type': 'nothing found'}  # If nothing matches
    mapping = {
        'car brand': car,
        'lemonade brand': drinks,
        'country': countries
    }
    for k, v in mapping.items():
        if something in v:
            out['type'] = k  # update if match found
            break
    return out  # returns matched or default value
You can build a flat lookup dictionary like this and then look values up in map_dict:
from functools import reduce
car = ['VW', 'Audi', 'BMW']
drinks = ['Cola', 'Fanta', 'Pepsi']
countries = ['France', 'Germany', 'Italy']
li = [car, drinks, countries]
types = ['car brand', 'lemonade brand', 'country', 'nothing found']
dl = [dict(zip(l, [types[idx]]*len(l))) for idx, l in enumerate(li)]
map_dict = reduce(lambda a, b: dict(a, **b), dl)
Try this:
def create_dct(lst, flag):
    return {k: flag for k in lst}
car = ['VW', 'Audi', 'BMW']
drinks = ['Cola', 'Fanta', 'Pepsi']
countries = ['France', 'Germany', 'Italy']
merge_dcts = {}
merge_dcts.update(create_dct(car, 'car brand'))
merge_dcts.update(create_dct(drinks, 'lemonade brand'))
merge_dcts.update(create_dct(countries, 'country'))
something = 'Audi'
try:
    print("type: ", merge_dcts[something])
except KeyError:  # only a missing key means "nothing found"
    print("type: nothing found")
You can simulate a switch statement with a helper function like this:
def switch(v): yield lambda *c: v in c
Then your code could be written like this (inside your function):
something = 'Audi'
for case in switch(something):
    if case('VW', 'Audi', 'BMW'): name = 'car brand' ; break
    if case('Cola', 'Fanta', 'Pepsi'): name = 'lemonade brand' ; break
    if case('France', 'Germany', 'Italy'): name = 'country' ; break
else: name = 'nothing found'  # for/else: runs only if no break occurred
return {'type':name}
If you don't have specific code to do for each value, then a simple mapping dictionary would probably suffice. For ease of maintenance, you can start with a category-list:type-name mapping and expand it before use:
mapping = { ('VW', 'Audi', 'BMW'):'car brand',
('Cola', 'Fanta', 'Pepsi'):'lemonade brand',
('France', 'Germany', 'Italy'):'country' }
mapping = { categ:name for categs,name in mapping.items() for categ in categs }
Then your code will look like this:
something = 'Audi'
return {'type':mapping.get(something,'nothing found')}
Using a defaultdict would make this even simpler, because it provides the 'nothing found' value automatically, so you could write: return {'type':mapping[something]}
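A minimal sketch of the defaultdict variant mentioned above. The names `categories` and `classify` are hypothetical and chosen only for this illustration:

```python
from collections import defaultdict

categories = {
    ('VW', 'Audi', 'BMW'): 'car brand',
    ('Cola', 'Fanta', 'Pepsi'): 'lemonade brand',
    ('France', 'Germany', 'Italy'): 'country',
}

# The factory supplies the fallback value for any key that is not present.
mapping = defaultdict(lambda: 'nothing found')
for members, name in categories.items():
    for member in members:
        mapping[member] = name

def classify(something):
    # Hypothetical helper standing in for the body of very_long_func
    return {'type': mapping[something]}

print(classify('Audi'))    # {'type': 'car brand'}
print(classify('Tesla'))   # {'type': 'nothing found'}
```

One thing to be aware of: looking up a missing key in a defaultdict also stores that key with the default value, so the mapping grows with every unknown input. If that matters, a plain dict with mapping.get(something, 'nothing found') avoids it.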

Splitting a string on multiple separators in Python

text="Brand.*/Smart Planet.#/Color.*/Yellow.#/Type.*/Sandwich Maker.#/Power Source.*/Electrical."
I have a string like this and I'm having trouble splitting it into two lists. The output should look approximately like this:
name = ['Brand','Color','Type','Power Source']
value = ['Smart Planet','Yellow','Sandwich Maker','Electrical']
Is there a solution for this?
name = []
value = []
text = text.split('.#/')
for i in text:
    i = i.split('.*/')
    name.append(i[0])
    value.append(i[1])
This is one approach using re.split and list slicing.
Ex:
import re
text="Brand.*/Smart Planet.#/Color.*/Yellow.#/Type.*/Sandwich Maker.#/Power Source.*/Electrical."
data = [i for i in re.split(r"[^A-Za-z\s]+", text) if i]
name = data[::2]
value = data[1::2]
print(name)
print(value)
Output:
['Brand', 'Color', 'Type', 'Power Source']
['Smart Planet', 'Yellow', 'Sandwich Maker', 'Electrical']
You can use a regex to split the text into pairs and populate the lists in a loop.
Unpacking each pair into k, v also guards against malformed input: a chunk that does not contain exactly one .*/ separator raises a ValueError.
import re
name, value = [], []
for ele in re.split(r'\.#/', text):
    k, v = ele.split('.*/')
    name.append(k)
    value.append(v)
>>> print(name, value)
['Brand', 'Color', 'Type', 'Power Source'] ['Smart Planet', 'Yellow', 'Sandwich Maker', 'Electrical.']
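As an alternative sketch, a single pattern can capture each name/value pair directly and also drop the trailing dot from the last value. The pattern below is an assumption based only on the sample string in the question (names end with ".*/", values end with "." followed by "#/" or the end of the string):

```python
import re

text = "Brand.*/Smart Planet.#/Color.*/Yellow.#/Type.*/Sandwich Maker.#/Power Source.*/Electrical."

# name, then ".*/", then value, then "." followed by either "#/" or end of string
pair_pattern = re.compile(r'(.+?)\.\*/(.+?)\.(?:#/|$)')
pairs = pair_pattern.findall(text)

name = [k for k, _ in pairs]
value = [v for _, v in pairs]
print(name)    # ['Brand', 'Color', 'Type', 'Power Source']
print(value)   # ['Smart Planet', 'Yellow', 'Sandwich Maker', 'Electrical']
```

Because findall returns complete pairs, the two lists always have the same length, even if the input contains stray text that does not match the pattern.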
text="Brand.*/Smart Planet.#/Color.*/Yellow.#/Type.*/Sandwich Maker.#/Power Source.*/Electrical."
name=[]
value=[]
word=''
for i in range(len(text)):
    temp=i
    if text[i]!='.' and text[i]!='/' and text[i]!='*' and text[i]!='#':
        word=word+''.join(text[i])
    elif temp+1<len(text) and temp+2<=len(text):
        if text[i]=='.' and text[temp+1]=='*' and text[temp+2]=='/':
            name.append(word)
            word=''
        elif text[i]=='.' and text[temp+1]=='#' and text[temp+2]=='/':
            value.append(word)
            word=''
    else:
        value.append(word)
print(name)
print(value)
This will work.

How to distribute comma separated element to form a list in python

How can I split a multi-line string like this
clientInfo="""James,Jose,664 New Avenue,New Orleans,Orleans,LA,8/27/200,123,jjose#gmail.com,;
Shenna,Laureles, 288 Livinghood Heights,Brighton,Livingston,MI,2/19/75,laureles9219#yahoo.com,;
"""
into this kind of list
f_name = ["james","sheena"]
l_name = ["jose","Laureles"]
strt = ["664 New Avenue","288 Livinghood Heights"]
cty = ["New Orleans","Brighton"]
state = ["New Orleans","Livingston"]
If the order of the fields is always the same, you could do something like this:
f_name = []
l_name = []
strt = []
cty = []
state = []
for client in clientData.split(";\n"):
    if not client.strip():
        continue  # skip the empty chunk left after the final ";"
    client_ = client.split(",")
    f_name.append(client_[0].strip())
    l_name.append(client_[1].strip())
    strt.append(client_[2].strip())
    cty.append(client_[3].strip())
    state.append(client_[4].strip())
The check at the top of the loop handles the ; at the end of your string: without it, the final empty chunk would raise an IndexError at client_[1].
You can use split and zip.
def extract(string):
    lines = string.split(";")
    split_lines = tuple(map(lambda line: line.split(","), lines))
    no_space1 = tuple(map(lambda item: item.strip(), split_lines[0]))
    no_space2 = tuple(map(lambda item: item.strip(), split_lines[1]))
    return list(zip(no_space1, no_space2))
This will produce
[('James', 'Shenna'), ('Jose', 'Laureles'), ('664 New Avenue', '288 Livinghood Heights'), ('New Orleans', 'Brighton'), ('Orleans', 'Livingston'), ('LA', 'MI'), ('8/27/200', '2/19/75'), ('123', 'laureles9219#yahoo.com'), ('jjose#gmail.com', '')]
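It has some tuples at the end that you didn't ask for, and because the two records have a different number of fields, the last few tuples pair up unrelated values. Otherwise it's relatively good. The no_space1 and no_space2 lines are a bit repetitive, but cramming them into one line is worse in my opinion.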
You can try:
clientData = """James,Jose,664 New Avenue,New Orleans,Orleans,LA,8/27/200,123,jjose#gmail.com,;
Shenna,Laureles, 288 Livinghood Heights,Brighton,Livingston,MI,2/19/75,laureles9219#yahoo.com,;
"""
data = clientData.split(";\n")
f_name = []
l_name = []
strt = []
cty = []
state = []
for data_line in data:
    data_line = data_line.strip()
    if len(data_line) >= 5:
        line_info = data_line.split(",")
        f_name.append(line_info[0].strip())
        l_name.append(line_info[1].strip())
        strt.append(line_info[2].strip())
        cty.append(line_info[3].strip())
        state.append(line_info[4].strip())
print(f_name)
print(l_name)
print(strt)
print(cty)
print(state)
Output:
['James', 'Shenna']
['Jose', 'Laureles']
['664 New Avenue', '288 Livinghood Heights']
['New Orleans', 'Brighton']
['Orleans', 'Livingston']
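The split-and-zip idea can also be written for any number of records, not just two. This is a sketch under the assumption that every record has at least five fields in the same order; `client_data` and `records` are names chosen for the example:

```python
client_data = """James,Jose,664 New Avenue,New Orleans,Orleans,LA,8/27/200,123,jjose#gmail.com,;
Shenna,Laureles, 288 Livinghood Heights,Brighton,Livingston,MI,2/19/75,laureles9219#yahoo.com,;
"""

# One list of stripped fields per record; the empty chunk after the last ";" is skipped.
records = [
    [field.strip() for field in record.split(",")]
    for record in client_data.split(";")
    if record.strip()
]

# zip(*records) transposes rows into columns; keep only the first five columns.
f_name, l_name, strt, cty, state = (list(col) for col in list(zip(*records))[:5])

print(f_name)   # ['James', 'Shenna']
print(state)    # ['Orleans', 'Livingston']
```

Since zip stops at the shortest record, a record with fewer fields shortens the transposed result rather than raising an error, so it is worth validating the field count if the input is not trusted.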

Creating lists from the dictionary or just simply sort it

I have the following code:
import os
import pprint
file_path = input("Please, enter the path to the file: ")
if os.path.exists(file_path):
    worker_dict = {}
    k = 1
    for line in open(file_path,'r'):
        split_line = line.split()
        worker = 'worker{}'.format(k)
        worker_name = '{}_{}'.format(worker, 'name')
        worker_yob = '{}_{}'.format(worker, 'yob')
        worker_job = '{}_{}'.format(worker, 'job')
        worker_salary = '{}_{}'.format(worker, 'salary')
        worker_dict[worker_name] = ' '.join(split_line[0:2])
        worker_dict[worker_yob] = ' '.join(split_line[2:3])
        worker_dict[worker_job] = ' '.join(split_line[3:4])
        worker_dict[worker_salary] = ' '.join(split_line[4:5])
        k += 1
else:
    print('Error: Invalid file path')
File:
John Snow 1967 CEO 3400$
Adam Brown 1954 engineer 1200$
Output from worker_dict:
{
'worker1_job': 'CEO',
'worker1_name': 'John Snow',
'worker1_salary': '3400$',
'worker1_yob': '1967',
'worker2_job': 'engineer',
'worker2_name': 'Adam Brown',
'worker2_salary': '1200$',
'worker2_yob': '1954',
}
And I want to sort data by worker name and after that by salary. So my idea was to create a separate list with salaries and worker names to sort. But I have problems with filling it, maybe there is a more elegant way to solve my problem?
import os
import pprint
file_path = input("Please, enter the path to the file: ")
if os.path.exists(file_path):
    with open(file_path, 'r') as file:
        content = file.read().splitlines()
    res = []
    for line in content:
        val = line.split()
        name = [" ".join(val[0:2])]  # join first name and last name
        res.append(name + val[2:])   # [name, yob, job, salary]
    res.sort(key=lambda x: int(x[3].rstrip('$')))  # sort by salary, numerically
    print(res)
    res.sort(key=lambda x: x[0])  # sort by name
    print(res)
Output:
[['Adam Brown', '1954', 'engineer', '1200$'], ['John Snow', '1967', 'CEO', '3400$']]
[['Adam Brown', '1954', 'engineer', '1200$'], ['John Snow', '1967', 'CEO', '3400$']]
d = {
'worker1_job': 'CEO',
'worker1_name': 'John Snow',
'worker1_salary': '3400$',
'worker1_yob': '1967',
'worker2_job': 'engineer',
'worker2_name': 'Adam Brown',
'worker2_salary': '1200$',
'worker2_yob': '1954',
}
from itertools import zip_longest
# re-group:
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
# re-order (relies on the dict keys being in the sorted order shown above):
res = []
reorder = [1, 2, 0, 3]  # job, name, salary, yob -> name, salary, job, yob
for group in grouper(d.values(), 4):
    res.append([group[i] for i in reorder])
# sort by name, then salary (salaries are still strings, so they compare lexicographically):
res.sort(key=lambda x: (x[0], x[1]))
Output:
[['Adam Brown', '1200$', 'engineer', '1954'],
['John Snow', '3400$', 'CEO', '1967']]
grouper is defined and explained in the recipes section of the itertools documentation. I've grouped your dictionary into the records belonging to each worker and returned them as a reordered list of lists, which I then sort by name and salary. This solution is modular: it groups, re-orders and sorts in distinct steps.
I recommend storing the workers in a different format, for example .csv. Then you could use csv.DictReader and load the file into a list of dictionaries (this would also allow names, jobs, etc. that consist of more than one word, like "tomb raider").
Note that you have to convert the year of birth and salary to ints or floats to sort them correctly, otherwise they would get sorted lexicographically as in a real world dictionary (book) because they are strings, e.g.:
>>> sorted(['100', '11', '1001'])
['100', '1001', '11']
To sort the list of dicts you can use operator.itemgetter as the key argument of sorted, instead of a lambda function, and just pass the desired key to itemgetter.
The k variable is useless, because it's just the len of the list.
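A short sketch of the two points above: converting the salary to an int before sorting, and passing several keys to itemgetter to sort by name first and salary second. The sample rows are made up for illustration (the second "Adam Brown" is a hypothetical duplicate so that the tie-break is visible):

```python
from operator import itemgetter

workers = [
    {'name': 'John Snow', 'salary': '3400$'},
    {'name': 'Adam Brown', 'salary': '1200$'},
    {'name': 'Adam Brown', 'salary': '900$'},   # hypothetical duplicate name
]

# Convert the salary strings to ints so they compare numerically.
for worker in workers:
    worker['salary'] = int(worker['salary'].rstrip('$'))

# itemgetter with two keys returns a (name, salary) tuple for each row,
# so rows are ordered by name and ties are broken by salary.
by_name_then_salary = sorted(workers, key=itemgetter('name', 'salary'))
for worker in by_name_then_salary:
    print(worker['name'], worker['salary'])
```

With the salaries left as strings, '1200$' would sort before '900$', which is the lexicographic pitfall described above.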
The .csv file:
"name","year of birth","job","salary"
John Snow,1967,CEO,3400$
Adam Brown,1954,engineer,1200$
Lara Croft,1984,tomb raider,5600$
The .py file:
import os
import csv
from operator import itemgetter
from pprint import pprint
file_path = input('Please, enter the path to the file: ')
if os.path.exists(file_path):
    with open(file_path, 'r', newline='') as f:
        worker_list = list(csv.DictReader(f))
    for worker in worker_list:
        worker['salary'] = int(worker['salary'].strip('$'))
        worker['year of birth'] = int(worker['year of birth'])
    pprint(worker_list)
    pprint(sorted(worker_list, key=itemgetter('name')))
    pprint(sorted(worker_list, key=itemgetter('salary')))
    pprint(sorted(worker_list, key=itemgetter('year of birth')))
You still need some error handling in case an int conversion fails, or you can just let the program crash.
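One possible shape for that error handling, as a sketch: a small helper that returns None instead of raising, so bad rows can be collected and reported. `parse_salary` and the sample rows (including "Jane Doe") are hypothetical and not part of the code above:

```python
def parse_salary(raw):
    """Return the salary as an int, or None if the text cannot be parsed."""
    try:
        return int(raw.strip().rstrip('$'))
    except ValueError:
        return None

rows = [
    {'name': 'John Snow', 'salary': '3400$'},
    {'name': 'Jane Doe', 'salary': 'n/a'},   # hypothetical malformed row
]

valid, rejected = [], []
for row in rows:
    salary = parse_salary(row['salary'])
    if salary is None:
        rejected.append(row)
    else:
        valid.append({**row, 'salary': salary})

print(valid)      # rows with an integer salary
print(rejected)   # rows that could not be converted
```

Catching only ValueError keeps other problems, such as a missing 'salary' column, visible as errors instead of hiding them.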
