python loop through a dictionary to see if values exist - python

I am trying to loop through a Python dictionary to see whether the values I am getting from a CSV file already exist in the dictionary. If they do not exist, I want to add them to the dictionary and then append that dictionary to a list.
I am getting the error list indices must be integers, not str.
Example input:
first_name   last_name
john         smith
john         smith
Example output:
first_name: john, last_name: smith
user_list = []
with open(input_path, 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        if row['first_name'] not in user_dictionary['first_name'] and row['last_name'] not in user_dictionary['last_name']:
            user_dictionary = {
                'first_name': row['first_name'],
                'last_name': row['last_name']
            }
            user_list.append(user_dictionary)

Currently, your code is creating a new dictionary on every iteration of the for-loop. If each value of the dictionary is a list, then you can append to that list via the key:
with open(input_path, 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    user_dictionary = {"first_name": ["name1", "name2", ...], "last_name": ["name3", "name4", ...]}
    for row in reader:
        if row['first_name'] not in user_dictionary['first_name'] and row['last_name'] not in user_dictionary['last_name']:
            user_dictionary["first_name"].append(row['first_name'])
            user_dictionary['last_name'].append(row['last_name'])

Generally, you can use a membership test (x in y) on the dict.values() view to check whether the value already exists in your dictionary.
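For instance, a minimal sketch of that membership test (the sample data here is assumed):
user_dictionary = {'first_name': 'john', 'last_name': 'smith'}
print('john' in user_dictionary.values())   # True: 'john' is one of the stored values
print('jane' in user_dictionary.values())   # False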
However, if you are trying to add all unique users from your CSV file to a list of users, that is not a question of testing dictionary values but of list membership testing.
Instead of iterating over the complete list each time for a slow membership check, you can use a set that contains the "id" of every user already added to the list, which gives a fast O(1) (amortized) membership check:
with open(input_path, 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    user_list = []
    user_set = set()
    for row in reader:
        user_id = (row['first_name'], row['last_name'])
        if user_id not in user_set:
            user = {
                'first_name': row['first_name'],
                'last_name': row['last_name'],
                # something else ...
            }
            user_list.append(user)
            user_set.add(user_id)

The error "list indices must be integers, not str" makes the problem clear: On the line that throws the error, you have a list that you think is a dict. You try to use a string as a key for it, and boom!
You don't give enough information to guess which dict it is: It could be user_dictionary, it could be that you're using csv.reader and not csv.DictReader as you say you do. It could even be something else-- there's no telling what else you left out of your code. But it's a list that you're using as if it's a dict.
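A minimal reproduction of the message (the data here is made up, just to show the failure mode):
row = ['john', 'smith']     # a plain list, e.g. what csv.reader yields per row
row['first_name']           # TypeError: list indices must be integers, not str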

Related

Iterating over a list of dictionaries

Newbie here. I want to iterate over this list of dictionaries to get the "SUPPLIER" for every dictionary. I tried
import json

turbine_json_path = '_maps/turbine/turbine_payload.json'
with open(turbine_json_path, "r") as f:
    turbine = json.load(f)
# print(turbine)
for supplier in turbine[0]['GENERAL']:
    print(supplier["SUPPLIER"])
But I get a TypeError: string indices must be integers.
Any help is appreciated.
for d in turbine:
    print(d["GENERAL"]["SUPPLIER"])
There is only one supplier key in your dictionary, so it would be
supplier = turbine[0]['GENERAL']['SUPPLIER']
Otherwise, your for loop is looping over the keys within the 'GENERAL' dictionary, which are strings.
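For illustration, assuming the JSON payload has roughly this shape (the supplier names are made up):
turbine = [
    {"GENERAL": {"SUPPLIER": "Acme", "MODEL": "X1"}},
    {"GENERAL": {"SUPPLIER": "Globex", "MODEL": "Y2"}},
]

for d in turbine:                        # each d is a dict
    print(d["GENERAL"]["SUPPLIER"])      # Acme, Globex

for supplier in turbine[0]['GENERAL']:   # iterates the KEYS: "SUPPLIER", "MODEL"
    print(supplier["SUPPLIER"])          # indexing a string with a string -> TypeError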

how to return all lines in csv.DictReader

I'm reading a CSV file using csv.DictReader. It returns only the last line as a dict, but I want to return all of the lines.
I'm filtering each row with a dictionary comprehension to keep only two key:value pairs, using the fields dict, and then doing a little cleanup. I need to return each line as a dict after the cleaning process.
for row in reader:
    data = {value: row[key] for key, value in fields.items()}
    if data['binomialAuthority'] == 'NULL':
        data['binomialAuthority'] = None
    data['label'] = re.sub(r'\(.*?\)', '', data['label']).strip()
    return data
Desired output (one dict per row):
{'label': 'Argiope', 'binomialAuthority': None}
{'label': 'Tick', 'binomialAuthority': None}
On each iteration of the loop you assign a single value to data. Think of data like a small markerboard that only shows the last thing you wrote on it. At the end of the loop it will refer to the last item assigned.
If you just want to print your structure, move the print statement into the loop.
If you want a data structure containing multiple dicts, then you need to create a list and then append to it in the loop. Note that this will use a lot of memory when loading a large file.
e.g.
my_list = []
for row in reader:
    data = '...'
    my_list.append(data)
return my_list
The best way is to append each dict to a list and then use a for loop to unwind the list, so that you get one dict at a time:
my_list = []
for row in reader:
    data = '...'
    my_list.append(data)
for i in my_list:
    print(i)
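Putting the pieces together, a sketch of the whole function might look like this; the fields mapping and the column names are taken from the question, while the function name and the path parameter are assumptions for illustration:
import csv
import re

def read_rows(path, fields):
    rows = []
    with open(path, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # keep only the columns named in the fields mapping
            data = {value: row[key] for key, value in fields.items()}
            if data['binomialAuthority'] == 'NULL':
                data['binomialAuthority'] = None
            data['label'] = re.sub(r'\(.*?\)', '', data['label']).strip()
            rows.append(data)
    return rows    # a list of dicts, one per CSV row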

python csv TypeError: unhashable type: 'list'

Hi, I'm trying to compare two CSV files and get the difference. However, I get the above-mentioned error. Could someone kindly give a helping hand? Thanks.
import csv

f = open('ted.csv', 'r')
psv_f = csv.reader(f)
attendees1 = []
for row in psv_f:
    attendees1.append(row)
f.close()
f = open('ted2.csv', 'r')
psv_f = csv.reader(f)
attendees2 = []
for row in psv_f:
    attendees2.append(row)
f.close()
attendees11 = set(attendees1)
attendees12 = set(attendees2)
print(attendees12.difference(attendees11))
When you iterate a csv reader you get lists, so when you do
for row in psv_f:
    attendees2.append(row)
row is actually a list instance, so attendees1 / attendees2 are lists of lists.
When you convert one of them to a set(), it needs to make sure no item appears more than once, and set() relies on the hash function of the items. So you are getting the error because when you convert to set() it tries to hash a list, but a list is not hashable.
You will get the same exception if you do something like this:
set([1, 2, [1,2] ])
More on sets: https://docs.python.org/2/library/sets.html
Happened on the line
attendees11 = set(attendees1)
didn't it? You are trying to make a set from a list of lists, but that is impossible because a set may only contain hashable types, and a list is not hashable. You can convert the lists to tuples instead.
attendees1.append(tuple(row))
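A fuller sketch of that fix, using the file names from the question (the helper function name is made up):
import csv

def rows_as_tuples(path):
    # convert each row (a list) into a hashable tuple so it can go into a set
    with open(path, 'r') as f:
        return {tuple(row) for row in csv.reader(f)}

attendees1 = rows_as_tuples('ted.csv')
attendees2 = rows_as_tuples('ted2.csv')
print(attendees2.difference(attendees1))   # rows that appear only in ted2.csv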
The cause is that you created a list of lists:
attendees1.append(row)
Likewise:
attendees2.append(row)
Then, when you do:
attendees11 = set(attendees1)
the error is thrown.
What you should do instead is:
attendees1.append(tuple(row))
and likewise:
attendees2.append(tuple(row))

looking up values and adding to data structure

I have a .tsv file of text data, named world_bank_indicators.
I have another .tsv file, which contains additional information that I need to append to a list item in my script. That file is named world_bank_regions.
So far, I have code (thanks to some of the good people on this site) that filters the data I need from world_bank_indicators and writes it as a 2D list to the variable mylist. Additionally, I have code that reads the second file into a dictionary. The code is below:
from math import log
import csv
import re

# filehandles for spreadsheets
fhand = open("world_bank_indicators.txt", "rU")
fhand2 = open("world_bank_regions.txt", "rU")
# csv reader objects for files
reader = csv.reader(fhand, dialect="excel", delimiter="\t")
reader2 = csv.reader(fhand2, dialect="excel", delimiter="\t")
# empty containers for appending data into
# appending into mylist will create a 2d list, or "a list OF lists"
mylist = list()
mylist2 = list()
mydict = dict()
myset = set()
newset = set()
# filter data by iterating over each row in the reader object
# note that this IGNORES headers. These will need to be appended later
for row in reader:
    if row[1] == "7/1/2000" or row[1] == "7/1/2010":
        # plug columns into specific variables, for easier coding
        # replace "," with an empty string for columns that need to be converted to floats
        name = row[0]
        date = row[1]
        pop = row[9].replace(",", '')
        mobile = row[4].replace(",", '')
        health = row[6]
        internet = row[5]
        gdp = row[19].replace(",", '')
        # only append rows that have COMPLETE data
        if name != '' and date != '' and pop != '' and mobile != '' and health != '' and internet != '' and gdp != '':
            # declare calculated variables
            mobcap = float(mobile) / float(pop)
            gdplog = log(float(gdp))
            healthlog = log(float(health))
            # re-declare variables as strings, rounded to the 5th decimal place
            # this could have been done in the step above; re-coded here for easier reading
            mobcap = str(round(mobcap, 5))
            gdplog = str(round(gdplog, 5))
            healthlog = str(round(healthlog, 5))
            # put all columns into the 2d list (list of lists)
            newrow = [name, date, pop, mobile, health, internet, gdp, mobcap, gdplog, healthlog]
            mylist.append(newrow)
            myset.add(name)
for row in reader2:
    mydict[row[2]] = row[0]
What I need to do now is:
1. read the country name from the mylist variable,
2. look up that string in the keys of mydict, and
3. append the corresponding value back to mylist.
I'm totally stumped on how to do this.
Should I make both data structures dictionaries? I still wouldn't know how to execute the above steps.
Thanks for any insights.
It depends what you mean by "append the value of that key back to mylist". Do you mean, append the value we got from mydict to the list that contains the country name we used to look it up? Or do you mean to append that value from mydict to mylist itself?
The latter would be a strange thing to do, since mylist is a list of lists, whereas the value we are talking about (row[0]) is a string. I can't intuit why we would append some strings to a list of lists, even though this is what your description says to do. So I'm assuming the former :)
Let's assume that your mylist is actually called "indicators", and mydict is called "region_info":
for indicator in indicators:
    try:
        indicator.append(region_info[indicator[0]])
    except KeyError:
        print "there is no region info for country name %s" % indicator[0]
Another comment, on readability: I think the elements of mylist would be better as dicts than lists. I would do this:
newrow={"country_name" : name,
"date": date,
"population": pop,
#... etc
because then when you use these things, you can use them by name instead of number, which will be more readable:
for indicator in indicators:
    try:
        indicator["region_info"] = region_info[indicator["country_name"]]
    except KeyError:
        print "there is no region info for country name %s" % indicator["country_name"]

Assign strings to IDs in Python

I am reading a text file with Python; the values in each column may be numeric or strings.
When a value is a string, I need to assign it a unique ID (unique across all the strings in the same column; the same ID must be assigned if the same string appears again in that column).
What would be an efficient way to do it?
Use a defaultdict with a default value factory that generates new ids:
import collections
import itertools

ids = collections.defaultdict(itertools.count().next)
ids['a']  # 0
ids['b']  # 1
ids['a']  # 0
When you look up a key in a defaultdict, if it's not already present, the defaultdict calls a user-provided default value factory to get the value and stores it before returning it.
itertools.count() creates an iterator that counts up from 0, so itertools.count().next is a bound method that produces a new integer whenever you call it.
Combined, these tools produce a dict that returns a new integer whenever you look up something you've never looked up before.
The defaultdict answer updated for Python 3, where .next is now .__next__, and for pylint compliance, since calling "magic" __*__ methods directly is discouraged:
ids = collections.defaultdict(functools.partial(next, itertools.count()))
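As a usage sketch for the per-column requirement in the question (the column count and the lookup strings are assumed):
import collections
import functools
import itertools

# one id-assigning dict per column
column_ids = [collections.defaultdict(functools.partial(next, itertools.count()))
              for _ in range(3)]

column_ids[1]['spam']   # 0, first string seen in column 1
column_ids[1]['eggs']   # 1
column_ids[1]['spam']   # 0, same string, same id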
Create a set, and then add strings to the set. This will ensure that strings are not duplicated; then you can use enumerate to get a unique id of each string. Use this ID when you are writing the file out again.
Here I am assuming the second column is the one you want to scan for text or integers.
import csv

seen = set()
with open('somefile.txt') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        try:
            int(row[1])
        except ValueError:
            seen.add(row[1])  # adds string to set
# print the unique ids for each string
for id, text in enumerate(seen):
    print("{}: {}".format(id, text))
Now you can take the same logic and replicate it across each column of your file. If you know the number of columns in advance, you can keep a list of sets. Suppose the file has three columns:
unique_strings = [set(), set(), set()]
with open('file.txt') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for column, value in enumerate(row):
            try:
                int(value)
            except ValueError:
                # it is not an integer, so it must be a string
                unique_strings[column].add(value)
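If you then want actual ids rather than just the sets, one possible follow-up (building on the unique_strings collected above) is to enumerate each set into a lookup table. Note that set iteration order is arbitrary, so the defaultdict approach above is preferable when the ids must be stable across runs:
id_maps = [{text: i for i, text in enumerate(column_strings)}
           for column_strings in unique_strings]

# id_maps[0]['some string'] now gives that string's unique id within the first column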
