Generating a .CSV with Several Columns - Use a Dictionary? - python

I am writing a script that looks through my inventory, compares it with a master list of all possible inventory items, and tells me what items I am missing. My goal is a .csv file where the first column contains a unique key integer and then the remaining several columns would have data related to that key. For example, a three row snippet of my end-goal .csv file might look like this:
100001,apple,fruit,medium,12,red
100002,carrot,vegetable,medium,10,orange
100005,radish,vegetable,small,10,red
The data for this is being drawn from a couple sources. 1st, a query to an API server gives me a list of keys for items that are in inventory. 2nd, I read in a .csv file into a dict that matches keys with item name for all possible keys. A snippet of the first 5 rows of this .csv file might look like this:
100001,apple
100002,carrot
100003,pear
100004,banana
100005,radish
Note how any key in my list of inventory will be found in this two column .csv file that gives all keys and their corresponding item name and this list minus my inventory on hand yields what I'm looking for (which is the inventory I need to get).
So far I can get a .csv file that contains just the keys and item names for the items that I don't have in inventory. Given a list of inventory on hand like this:
100003,100004
A snippet of my resulting .csv file looks like this:
100001,apple
100002,carrot
100005,radish
This means that I have pear and banana in inventory (so they are not in this .csv file.)
To get this I have a function to get an item name when given an item id that looks like this:
def getNames(id_to_name, ids):
    return [id_to_name[id] for id in ids]
Then I have a function that returns a list of integer keys from my inventory server API call, which I run like this:
invlist = ServerApiCallFunction(AppropriateInfo)
A third function takes this invlist as its input and returns a dict of keys (the item id) and names for the items I don't have. It also writes the information of this dict to a .csv file. I am using the set1 - set2 method to do this. It looks like this:
def InventoryNumbers(inventory):
    with open(csvfile,'w') as c:
        c.write('InvName' + ',InvID' + '\n')
    missinginvnames = []
    with open("KeyAndItemNameTwoColumns.csv","rb") as fp:
        reader = csv.reader(fp, skipinitialspace=True)
        fp.readline() # skip header
        invidsandnames = {int(id): str.upper(name) for id, name in reader}
    invids = set(invidsandnames.keys())
    invnames = set(invidsandnames.values())
    invonhandset = set(inventory)
    missinginvidsset = invids - invonhandset
    missinginvids = list(missinginvidsset)
    missinginvnames = getNames(invidsandnames, missinginvids)
    missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
    print missinginvnameswithids
    with open(csvfile,'a') as c:
        for invname, invid in missinginvnameswithids.iteritems():
            c.write(invname + ',' + str(invid) + '\n')
    return missinginvnameswithids
Which I then call like this:
InventoryNumbers(invlist)
With that explanation, now on to my question here. I want to expand the data in this output .csv file by adding in additional columns. The data for this would be drawn from another .csv file, a snippet of which would look like this:
100001,fruit,medium,12,red
100002,vegetable,medium,10,orange
100003,fruit,medium,14,green
100004,fruit,medium,12,yellow
100005,vegetable,small,10,red
Note how this does not contain the item name (so I have to pull that from a different .csv file that just has the two columns of key and item name) but it does use the same keys. I am looking for a way to bring in this extra information so that my final .csv file will not just tell me the keys (which are item ids) and item names for the items I don't have in stock but it will also have columns for type, size, number, and color.
One option I've looked at is the defaultdict piece from collections, but I'm not sure if this is the best way to go about what I want to do. If I did use this method I'm not sure exactly how I'd call it to achieve my desired result. If some other method would be easier I'm certainly willing to try that, too.
How can I take my dict of keys and corresponding item names for items that I don't have in inventory and add to it this extra information in such a way that I could output it all to a .csv file?
EDIT: As I typed this up it occurred to me that I might make things easier on myself by creating a new single .csv file that would have data in the form key,item name,type,size,number,color (basically just copying the item name column into the .csv that already has the other information for each key). This way I would only need to draw from one .csv file rather than from two. Even if I did this, though, how would I go about making my desired .csv file based on only those keys for items not in inventory?
ANSWER: I posted another question here about how to implement the solution I accepted (because it was giving me an error, since my dict values were strings rather than sets to start with) and I ended up deciding that I wanted a list rather than a set (to preserve the order). I also ended up adding the column with item names to my .csv file that had all the other data so that I only had to draw from one .csv file. That said, here is what this section of code now looks like:
MyDict = {}
infile = open('FileWithAllTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    if int(spl_line[0]) in missinginvids: #note that this is the list I was using as the keys for my dict which I was zipping together with a corresponding list of item names to make my dict before.
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict

It sounds like what you need is a dict mapping ints to sets, i.e.,
MyDict = {100001: set(['apple']), 100002: set(['carrot'])}
You can add to a set with update:
MyDict[100001].update(['fruit'])
which would give you: {100001: set(['apple', 'fruit']), 100002: set(['carrot'])}
Also, if you had a list of attributes of carrot, ['vegetable', 'orange'],
you could say MyDict[100002].update(['vegetable', 'orange'])
and get: {100001: set(['apple', 'fruit']), 100002: set(['carrot', 'vegetable', 'orange'])}
Does this answer your question?
EDIT:
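To read the extra columns in from the CSV:
infile = open('MyFile.csv', 'r')
for line in infile.readlines():
    # strip the trailing newline so it does not end up in the last field
    spl_line = line.strip().split(',')
    # the dict keys are ints, so convert before both the test and the lookup
    if int(spl_line[0]) in MyDict:
        MyDict[int(spl_line[0])].update(spl_line[1:])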
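For comparison, here is a minimal Python 3 sketch of the whole join, keeping each row as an ordered list rather than a set so the columns stay in place. The in-memory strings stand in for the two .csv files from the question, and the names id_to_name, id_to_details, on_hand and missing_rows are made up for this illustration.

```python
import csv
import io

# Hypothetical stand-ins for the two .csv files described in the question.
names_csv = io.StringIO(
    "100001,apple\n100002,carrot\n100003,pear\n100004,banana\n100005,radish\n"
)
details_csv = io.StringIO(
    "100001,fruit,medium,12,red\n"
    "100002,vegetable,medium,10,orange\n"
    "100003,fruit,medium,14,green\n"
    "100004,fruit,medium,12,yellow\n"
    "100005,vegetable,small,10,red\n"
)
on_hand = {100003, 100004}  # ids that the inventory API call would return

# Map each id to its name, and each id to its remaining attribute columns.
id_to_name = {int(row[0]): row[1] for row in csv.reader(names_csv)}
id_to_details = {int(row[0]): row[1:] for row in csv.reader(details_csv)}

# One output row per missing id, in ascending id order.
missing_rows = [
    [item_id, id_to_name[item_id]] + id_to_details.get(item_id, [])
    for item_id in sorted(id_to_name)
    if item_id not in on_hand
]

# Write the rows as CSV (to a string here; a real file would work the same way).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(missing_rows)
print(out.getvalue())
```

With real files, the io.StringIO objects would be replaced by open(...) handles, using newline="" as the csv module documentation recommends.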

This isn't an answer to the question, but here is a possible way of simplifying your current code.
This:
invids = set(invidsandnames.keys())
invnames = set(invidsandnames.values())
invonhandset = set(inventory)
missinginvidsset = invids - invonhandset
missinginvids = list(missinginvidsset)
missinginvnames = getNames(invidsandnames, missinginvids)
missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
Can be replaced with:
invonhandset = set(inventory)
missinginvnameswithids = {v: k for k, v in invidsandnames.iteritems() if k not in invonhandset}
Or:
invonhandset = set(inventory)
for key in invidsandnames.keys():
    # drop the items that are on hand, leaving only the missing ones
    if key in invonhandset:
        del invidsandnames[key]
missinginvnameswithids = invidsandnames

Have you considered making a temporary relational database? Python has SQLite support built in (the sqlite3 module), and for a reasonable number of items I don't think you would have any performance issues.
I would turn each CSV file and the result from the web API into a table (one table per data source). You can then do everything you want with some SQL queries and joins. Once you have the data you want, you can dump it back to CSV.
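A rough Python 3 sketch of that idea, using an in-memory SQLite database. The table and column names (names, details, on_hand, item_type and so on) are invented for this example, and the rows are the sample data from the question; in practice they would be loaded from the CSV files and the API result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, gone once closed

# One table per data source (hypothetical schema).
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE details (id INTEGER PRIMARY KEY, item_type TEXT, "
    "item_size TEXT, quantity INTEGER, color TEXT)"
)
conn.execute("CREATE TABLE on_hand (id INTEGER PRIMARY KEY)")

conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [(100001, "apple"), (100002, "carrot"), (100003, "pear"),
     (100004, "banana"), (100005, "radish")],
)
conn.executemany(
    "INSERT INTO details VALUES (?, ?, ?, ?, ?)",
    [(100001, "fruit", "medium", 12, "red"),
     (100002, "vegetable", "medium", 10, "orange"),
     (100003, "fruit", "medium", 14, "green"),
     (100004, "fruit", "medium", 12, "yellow"),
     (100005, "vegetable", "small", 10, "red")],
)
conn.executemany("INSERT INTO on_hand VALUES (?)", [(100003,), (100004,)])

# Join names to details and keep only the ids that are not on hand.
query = """
    SELECT n.id, n.name, d.item_type, d.item_size, d.quantity, d.color
    FROM names AS n
    JOIN details AS d ON d.id = n.id
    WHERE n.id NOT IN (SELECT id FROM on_hand)
    ORDER BY n.id
"""
missing = conn.execute(query).fetchall()
conn.close()

for row in missing:
    print(row)
```

Each tuple in missing can then be passed to csv.writer to produce the final file.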

Related

Python Refactoring - Changing Variable Type and Value Within a Loop

I'm working on automating some word and PDF documents that need to be updated on a certain cadence.
The way I'm doing this is using dictionaries that replace variables within word documents.
My code works but because my area is not tech savvy I'm using an excel file so people can replace the values in that file whenever they need to update the documents.
I was also successful on pulling the dictionary key and values from excel but I'm trying to refactor this code which is repetitive. Here is an excerpt with 2 of the 7 dictionaries I'm creating:
dic = pd.read_excel('test.xlsx',"AD")
AD = dict(zip(dic.Key,dic.Value))
dic = pd.read_excel('test.xlsx',"RSM")
RSM = dict(zip(dic.Key,dic.Value))
I'm trying to refactor this so I can run it all within a single loop and trying something like this:
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = [AD, RSM]
for item in groups:
    dic = pd.read_excel('test.xlsx',item)
    item = dict(zip(dic.Key,dic.Value))
So I'm basically first using the variable as a string to call the excel tab within the read_excel method and then I want to replace that same variable to become the output dictionary.
When I print item within the loop I do get the correct dictionaries but I'm not able to output a variable that stores each dictionary that the loop creates.
Any help would be appreciated.
Thanks!
You're almost there, you can just have a dictionary of dictionaries:
import pandas as pd
groups = ['AD', 'RSM']
dicts = {}
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    dicts[item] = dict(zip(dic.Key, dic.Value))
Now you can just access them like this:
print(dicts['AD']['some key'])
The values of a dictionary can be anything, including other dictionaries. Keys of dictionaries can be many things as well, as long as they're hashable, and strings are a common choice of course - and the names of your groups are just that.
Also note that I removed the variables named AD and RSM. You don't really achieve anything by having variables that are named after the string value they are assigned. It only serves to be able to leave off the quotes where you use the values, but it creates an additional indirection that serves no purpose.
If you don't even need the list of groups, but just want groups to be the actual dictionaries:
import pandas as pd
groups = {}
for item in ['AD', 'RSM']:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value))
The problem is that you assign the result to the item variable and not to an entry in the list.
A simple fix would be to use a dictionary instead of a list to save the result, e.g.
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = {AD: None, RSM: None}
for item in groups.keys():
    dic = pd.read_excel('test.xlsx',item)
    groups[item] = dict(zip(dic.Key,dic.Value))
My suggestion would be to use an overall dictionary to track your work and also to save the results there. I refactored your code slightly to this:
import pandas as pd
groups = dict.fromkeys(('AD', 'RSM')) # setup main dict containing dicts
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value))  # store individual dict
There's no need for your global constants that are used only once, so I removed those. I also added some spaces to help your Python code conform with PEP-8, the global standard style guide.
Now you can access each dictionary as you like, for example, groups['AD'].
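To see why assigning to the loop variable never changes the list, here is a small sketch that leaves pandas out entirely. load_sheet is a hypothetical stand-in for the pd.read_excel(...) plus dict(zip(...)) step.

```python
def load_sheet(name):
    # Hypothetical stand-in for reading one Excel tab into a dict.
    return {"sheet": name}

groups = ["AD", "RSM"]

# Rebinding the loop variable only changes what the local name points at;
# the list itself is never touched.
for item in groups:
    item = load_sheet(item)

print(groups)  # still the original list of strings

# Storing each result under its name keeps every dictionary reachable.
dicts = {name: load_sheet(name) for name in groups}
print(dicts["RSM"])
```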

Is there a way to "transform" a CSV table into a simple nested if... else block in python?

I'm fairly new to Python and I'm looking to achieve the following:
I have a table with several conditions as in the image below (maximum 5 conditions) along with various attributes. Each condition comes from a specific set of values, for example Condition 1 has 2 possible values, Condition 2 has 4 possible values, Condition 3 has 2 possible values etc..
What I would like to do: From the example table above, I would like to generate a simple python code so that when I execute my function and import a CSV file containing the table above, I should get the following output saved as a *.py file:
def myFunction(Attribute, Condition):
    if Attribute1 & Condition1:
        myValue = val_11
    if Attribute1 & Condition2:
        myValue = val_12
    ...
    ...
    if Attribute5 & Condition4:
        myValue = val_54
NOTE: Each CSV file will contain only one sheet and the titles for the columns do not change.
UPDATE, NOTE#2: Both "Attribute" and "Condition" are string values, so simple string comparisons would suffice.
Is there a simple way to do this? I dove into NLP and realized that it is not possible (at least from what I found in the literature). I'm open to all forms of suggestions/answers.
You can't really use "If"s and "else"s, since, if I understand your question correctly, you want to be able to read the conditions, attributes and values from a CSV file. Using "If"s and "else"s, you would only be able to check a fixed range of conditions and attributes defined in your code. What I would do, is to write a parser (piece of code, which reads the contents of your CSV file and saves it in another, more usable form).
In this case, the parser is the parseCSVFile() function. Instead of the ifs and elses comparing attributes and conditions, you now use the attributes and conditions to access a specific element in a dictionary (similar to an array or list, but you can now use for example string keys instead of the numerical indexes). I used a dictionary containing a dictionary at each position to split the CSV contents into their rows and columns. Since I used dictionaries, you can now use the strings of the Attributes and Conditions to access your values instead of doing lots of comparisons.
#Output Dictionary
ParsedDict = dict()
#This is either ';' or ',' depending on your operating system or you can open a CSV file with notepad for example to check which character is used
CSVSeparator = ';'
def parseCSVFile(filePath):
    global ParsedDict
    f = open(filePath)
    fileLines = f.readlines()
    f.close()
    #Extract the conditions (strip first so the last condition has no newline)
    ConditionsArray = (fileLines[0].strip().split(CSVSeparator))[1:]
    for x in range(len(fileLines)-1):
        #Remove unwanted characters such as newline characters
        line = fileLines[1 + x].strip()
        #Split by the CSV separation character
        LineContents = line.split(CSVSeparator)
        ConditionsDict = dict()
        for y in range(len(ConditionsArray)):
            ConditionsDict.update({ConditionsArray[y]: LineContents[1 + y]})
        ParsedDict.update({LineContents[0]: ConditionsDict})
def myFunction(Attribute, Condition):
    myValue = ParsedDict[Attribute][Condition]
    #Hand the looked-up value back to the caller
    return myValue
The "[1:]" is to ignore the contents in the first column (empty field at the top left and the "Attribute x" fields) when reading either the conditions or the values
Use the parseCSVFile() function to extract the information from the csv file
and the myFunction() to get the value you want
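The same nested-dictionary lookup can also be sketched with the standard csv module, which takes care of splitting and quoting. This is a Python 3 illustration only: the small table, the ';' delimiter and the names parse_table and lookup are assumptions, since the original table was only shown as an image.

```python
import csv
import io

# Hypothetical table: first row holds the conditions, first column the attributes.
table = io.StringIO(
    ";Condition1;Condition2\n"
    "Attribute1;val_11;val_12\n"
    "Attribute2;val_21;val_22\n"
)

def parse_table(handle, delimiter=";"):
    """Return {attribute: {condition: value}} for a table laid out as above."""
    reader = csv.reader(handle, delimiter=delimiter)
    conditions = next(reader)[1:]  # skip the empty top-left cell
    return {row[0]: dict(zip(conditions, row[1:])) for row in reader if row}

lookup = parse_table(table)
print(lookup["Attribute2"]["Condition1"])
```

Two dictionary lookups replace the whole chain of if statements, and adding a row or column to the CSV needs no code change.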

Removing extra formatting from a python list element ( [''] )

I'm re-learning Python after not using it for a few years, so go easy on me.
I am reading in data from a .csv file; the information I am reading in is as follows:
E1435
E46371
E1696
E27454
However, when using print(list[0]) for example, it produces
['E1435']
I am trying to use these pieces of data to interpolate into an API request string, and the " [' '] " in them is breaking the requests - basically, I need the elements in the list to not have the square brackets and quotes when using them as variables.
My interpolation is as follows, in case the way I'm interpolating is the problem:
req = requests.get('Linkgoeshere/%s' % list[i])
Edit;
A sample of the data I'm using is listed above ("E1435, E46371", etc.); each item in the csv is a new row in the same column.
As requested, I have produced a minimal reproduction of the problem.
import csv
#list to store data from csv
geoCode = []
#Read in locations from a designated file
with open('Locations.csv','rt') as f:
    data = csv.reader(f)
    for row in data:
        geoCode.append(row)
i=0
for item in geoCode:
    #print the items in the list
    print(geoCode[i])
    i+=1
It appears that list[i] is itself a nested list, so you need another subscript to get to the element inside it:
print(list[i][0])
NB: Avoid naming variables list as it overrides the built-in list type. Try using a plural word like codes or ids instead.
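A short Python 3 sketch of the same point: csv.reader yields one list per row, so a one-column file gives one-element lists. The in-memory string stands in for Locations.csv, and codes and urls are names chosen for this example.

```python
import csv
import io

# Stand-in for Locations.csv: one code per row, in a single column.
locations = io.StringIO("E1435\nE46371\nE1696\nE27454\n")

# Each row is a list such as ['E1435'], so take its first element.
codes = [row[0] for row in csv.reader(locations) if row]

# The bare strings can now be interpolated into the request path.
urls = ["Linkgoeshere/%s" % code for code in codes]
print(urls[0])
```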

looking up values and adding to data structure

I have a .tsv file of text data, named world_bank_indicators.
I have another .tsv file, named world_bank_regions, which contains additional information that I need to append to a list item in my script.
So far, I have code (thanks to some of the good people on this site) that filters the data that I need from world_bank_indicators and writes it as a 2D list to the variable mylist. Additionally, I have code that reads in the second file as a dictionary. The code is below:
from math import log
import csv
import re
#filehandles for spreadsheets
fhand=open("world_bank_indicators.txt", "rU")
fhand2=open("world_bank_regions.txt", "rU")
#csv reader objects for files
reader=csv.reader(fhand, dialect="excel", delimiter="\t")
reader2=csv.reader(fhand2, dialect="excel", delimiter="\t")
#empty list for appending data into
#appending into this will create a 2d list, or "a list OF lists"
mylist=list()
mylist2=list()
mydict=dict()
myset=set()
newset=set()
#filters data by iterating over each row in the reader object
#note that this IGNORES headers. This will need to be appended later
for row in reader:
    if row[1]=="7/1/2000" or row[1]=="7/1/2010":
        #plug columns into specific variables, for easier coding
        #replaces "," with empty space for columns that need to be converted to floats
        name=row[0]
        date=row[1]
        pop=row[9].replace(",",'')
        mobile=row[4].replace(",",'')
        health=row[6]
        internet=row[5]
        gdp=row[19].replace(",",'')
        #only appends rows that have COMPLETE rows of data
        if name != '' and date != '' and pop != '' and mobile != '' and health != '' and internet != '' and gdp != '':
            #declare calculated variables
            mobcap=(float(mobile)/float(pop))
            gdplog=log(float(gdp))
            healthlog=log(float(health))
            #re-declare variables as strings, rounds decimal points to 5th place
            #this could have been done once in above step, merely re-coded here for easier reading
            mobcap=str(round(mobcap, 5))
            gdplog=str(round(gdplog, 5))
            healthlog=str(round(healthlog,5))
            #put all columns into 2d list (list of lists)
            newrow=[name, date, pop, mobile, health, internet, gdp, mobcap, gdplog, healthlog]
            mylist.append(newrow)
            myset.add(name)
for row in reader2:
    mydict[row[2]]=row[0]
What I need to do now is:
1. read the country name from the mylist variable,
2. look up that string among the keys of mydict, and
3. append the value for that key back to mylist.
I'm totally stumped on how to do this.
Should I make both data structures dictionaries? I still wouldn't know how to execute the above steps.
thanks for any insights.
It depends what you mean by "append the value of that key back to mylist". Do you mean, append the value we got from mydict to the list that contains the country name we used to look it up? Or do you mean to append that value from mydict to mylist itself?
The latter would be a strange thing to do, since mylist is a list of lists, whereas the value we are talking about ("row[0]") is a string. I can't intuit why we would append some strings to a list of lists, even though this is what your description says to do. So I'm assuming the former :)
Let's assume that your mylist is actually called "indicators", and mydict is called "region_info"
for indicator in indicators:
    try:
        indicator.append(region_info[indicator[0]])
    except KeyError:
        # a missing country name raises KeyError; catch only that
        print "there is no region info for country name %s" % indicator[0]
Another comment on readability. I think that the elements of mylist would be better being dicts than lists. I would do this:
newrow={"country_name" : name,
        "date": date,
        "population": pop,
        #... etc
        }
because then when you use these things, you can use them by name instead of number, which will be more readable:
for indicator in indicators:
    try:
        indicator["region_info"] = region_info[indicator["country_name"]]
    except KeyError:
        # a missing country name raises KeyError; catch only that
        print "there is no region info for country name %s" % indicator["country_name"]
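As a variation on the same lookup, here is a Python 3 sketch that uses dict.get, which returns None when the country has no entry instead of raising an exception. The two sample rows and the region value are made-up illustration data, not taken from the World Bank files.

```python
# Hypothetical sample data: rows as dicts, plus a country -> region mapping.
indicators = [
    {"country_name": "Canada", "date": "7/1/2000"},
    {"country_name": "Atlantis", "date": "7/1/2010"},
]
region_info = {"Canada": "The Americas"}

for indicator in indicators:
    # dict.get returns None for an unknown country rather than raising KeyError.
    indicator["region"] = region_info.get(indicator["country_name"])
    if indicator["region"] is None:
        print("there is no region info for country name %s"
              % indicator["country_name"])
```

Whether a missing region should be stored as None or the row skipped depends on what the later steps expect.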

Converting dict values into a set while preserving the dict

I have a dict like this:
{100002: 'APPLE', 100004: 'BANANA', 100005: 'CARROT'}
I am trying to make my dict have ints for the keys (as it does now) but have sets for the values (rather than strings as it is now.) My goal is to be able to read from a .csv file with one column for the key (an int which is the item id number) and then columns for things like size, shape, and color. I want to add this information into my dict so that only the information for keys already in dict are added.
My goal dict might look like this:
{100002: set(['APPLE','MEDIUM','ROUND','RED']), 100004: set(['BANANA','MEDIUM','LONG','YELLOW']), 100005: set(['CARROT','MEDIUM','LONG','ORANGE'])}
Starting with my dict of just key + string for item name, I tried code like this to read the extra information in from a .csv file:
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    if int(spl_line[0]) in MyDict.keys():
        MyDict[int(spl_line[0])].update(spl_line[1:])
Unfortunately this errors out saying AttributeError: 'str' object has no attribute 'update'. My attempts to change my dictionary's values into sets so that I can then .update them have yielded things like this: {100002: set(['A','P','L','E']), 100004: set(['B','A','N']), 100005: set(['C','A','R','O','T'])}
I want to convert the values to a set so that the string that is currently the value will be the first string in the set rather than breaking up the string into letters and making a set of those letters.
I also tried making the values a set when I create the dict by zipping two lists together but it didn't seem to make any difference. Something like this
MyDict = dict(zip(listofkeys, set(listofnames)))
still makes the whole listofnames list into a set but it doesn't achieve my goal of making each value in MyDict into a set with the corresponding string from listofnames as the first string in the set.
How can I make the values in MyDict into a set so that I can add additional strings to that set without turning the string that is currently the value in the dict into a set of individual letters?
EDIT:
I currently make MyDict by using one function to generate a list of item ids (which are the keys) and another function which looks up those item ids to generate a list of corresponding item names (using a two column .csv file as the data source) and then I zip them together.
ANSWER:
Using the suggestions here I came up with this solution. I found that the section that has set()).update can easily be changed to list()).append to yield a list rather than a set (so that the order is preserved). I also found it easier to update my .csv data input files by adding the column containing names to FileWithTheData.csv so that I didn't have to mess with making the dict, converting the values to sets, and then adding in more data. My code for this section now looks like this:
MyDict = {}
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    if int(spl_line[0]) in itemidlist: #note that this is the list I was formerly zipping together with a corresponding list of names to make my dict
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict
Your error is because originally your MyDict variable maps an integer to a string. When you are trying to update it you are treating the value like a set, when it is a string.
You can use a defaultdict for this:
from collections import defaultdict
combined_dict = defaultdict(set)
# first add all the values from MyDict
for key, value in MyDict.iteritems():
    combined_dict[int(key)].add(value)
# then add the values from the file
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    combined_dict[int(spl_line[0])].update(spl_line[1:])
Your issue is with how you are initializing MyDict, try changing it to the following:
MyDict = dict(zip(listofkeys, [set([name]) for name in listofnames]))
Here is a quick example of the difference:
>>> listofkeys = [100002, 100004, 100005]
>>> listofnames = ['APPLE', 'BANANA', 'CARROT']
>>> dict(zip(listofkeys, set(listofnames)))
{100002: 'CARROT', 100004: 'APPLE', 100005: 'BANANA'}
>>> dict(zip(listofkeys, [set([name]) for name in listofnames]))
{100002: set(['APPLE']), 100004: set(['BANANA']), 100005: set(['CARROT'])}
set(listofnames) is just going to turn your list into a set, and the only effect that might have is to reorder the values as seen above. You actually want to take each string value in your list, and convert it to a one-element set, which is what the list comprehension does.
After you make this change, your current code should work fine, although you can just do the contains check directly on the dictionary instead of explicitly checking the keys (key in MyDict is the same as key in MyDict.keys()).
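The difference between set(name) and a one-element set is easy to check in isolation. The sketch below is Python 3, where {name} is the literal form of set([name]); my_dict and as_sets are names used only for this illustration.

```python
name = "APPLE"

letters = set(name)  # iterates over the string: a set of its characters
single = {name}      # one-element set holding the whole string

# Convert an existing id -> name dict into id -> one-element set.
my_dict = {100002: "APPLE", 100004: "BANANA"}
as_sets = {key: {value} for key, value in my_dict.items()}

# The values are sets now, so update() works as intended.
as_sets[100002].update(["MEDIUM", "ROUND", "RED"])
print(as_sets[100002])
```

Sets are unordered, so if the column order matters for the output file, a list per key is the safer choice.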
