import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great...I also added another feature. Instead of having to type 'account number'..I made a list comprehension that looks up each value by key.
The 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client','tin']
That makes life a lot easier...I know that I want to look up 'ACCOUNT_name', 'client', or 'tin'. Next I run a function 'getLayout' that returns the given file's header name for the desired column I am searching for...So if I want to see which column of the file holds my desired column 'ACCOUNT_name', I just run the function, which is called like this...
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number',), ('name',), ('tax id',)], which is exactly what I want...
But what if there is a new header I am not used to? Let's use the same example code above but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
I most certainly get a key error, KeyError: 'tax identification number'. I know I can use defaultdict, but when I try to use it with the default value 'tin', I end up overwriting my entire dictionary... What I would ultimately like to do is this...
I would like to create an else within my list comprehension that lets me add dictionary entries from standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? Any ideas? If you have any suggestions, I am all ears. I'm struggling with this issue...
The way I would want to go about this is in the list comprehension..
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
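A comprehension's condition cannot also add entries to the dictionary, but a small helper function can. The commented-out defaultdict line appears to wipe the dictionary because it rebinds header_dict to a brand-new, empty mapping. Below is a minimal sketch in Python 3 (so input stands in for raw_input); the names resolve_header and build_layout and the prompt parameter are hypothetical, introduced only for illustration.

```python
header_dict = {
    'account number': 'ACCOUNT_name',
    'accountID': 'ACCOUNT_name',
    'name': 'client',
    'first name': 'client',
    'tax id': 'tin',
}


def resolve_header(header, mapping, prompt=input):
    """Return the template name for `header`, asking the user if it is unknown."""
    if header not in mapping:
        # Unknown header: ask once and remember the answer for later files.
        mapping[header] = prompt("Template column for %r: " % header)
    return mapping[header]


def build_layout(given_header, mapping, prompt=input):
    # Skip empty header cells, as the original `if ting` filter does.
    return [resolve_header(h, mapping, prompt) for h in given_header if h]
```

Calling build_layout(['account number', 'name', 'tax identification number'], header_dict) would prompt once for the unknown header. Passing a function as prompt (for example lambda _: 'tin') avoids interactive input when testing.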
I'm working on automating some word and PDF documents that need to be updated on a certain cadence.
The way I'm doing this is using dictionaries that replace variables within word documents.
My code works, but because my area is not tech savvy I'm using an Excel file so people can replace the values in that file whenever they need to update the documents.
I was also successful in pulling the dictionary keys and values from Excel, but I'm trying to refactor this code, which is repetitive. Here is an excerpt with 2 of the 7 dictionaries I'm creating:
dic = pd.read_excel('test.xlsx',"AD")
AD = dict(zip(dic.Key,dic.Value))
dic = pd.read_excel('test.xlsx',"RSM")
RSM = dict(zip(dic.Key,dic.Value))
I'm trying to refactor this so I can run it all within a single loop and trying something like this:
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = [AD, RSM]
for item in groups:
    dic = pd.read_excel('test.xlsx',item)
    item = dict(zip(dic.Key,dic.Value))
So I'm basically first using the variable as a string to call the excel tab within the read_excel method and then I want to replace that same variable to become the output dictionary.
When I print item within the loop I do get the correct dictionaries but I'm not able to output a variable that stores each dictionary that the loop creates.
Any help would be appreciated.
Thanks!
You're almost there, you can just have a dictionary of dictionaries:
import pandas as pd
groups = ['AD', 'RSM']
dicts = {}
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    dicts[item] = dict(zip(dic.Key, dic.Value))
Now you can just access them like this:
print(dicts['AD']['some key'])
The values of a dictionary can be anything, including other dictionaries. Keys of dictionaries can be many things as well, as long as they're hashable, and strings are a common choice of course - and the names of your groups are just that.
Also note that I removed the variables named AD and RSM. You don't really achieve anything by having variables that are named after the string value they are assigned. It only serves to be able to leave off the quotes where you use the values, but it creates an additional indirection that serves no purpose.
If you don't even need the list of groups, but just want groups to be the actual dictionaries:
import pandas as pd
groups = {}
for item in ['AD', 'RSM']:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value))
The problem is that you assign the result to the item variable and not to an entry in the list.
A simple fix would be to use a dictionary instead of a list to save the result, e.g.
import pandas as pd
AD = "AD"
RSM = "RSM"
groups = {AD: None, RSM: None}
for item in groups.keys():
    dic = pd.read_excel('test.xlsx',item)
    groups[item] = dict(zip(dic.Key,dic.Value))
My suggestion would be to use an overall dictionary to track your work and also to save the results there. I refactored your code slightly to this:
import pandas as pd
groups = dict.fromkeys(('AD', 'RSM')) # setup main dict containing dicts
for item in groups:
    dic = pd.read_excel('test.xlsx', item)
    groups[item] = dict(zip(dic.Key, dic.Value)) # store individual dict
There's no need for global constants that are used only once, so I removed those. I also added some spaces to help your code conform to PEP 8, the standard Python style guide.
Now you can access each dictionary as you like, for example, groups['AD'].
I am using the Python code below to extract some values from an Excel spreadsheet and then push them to an HTML page for further processing. I would like to modify the code so that I can add additional values against each task. Any help would be appreciated.
The code currently prints the following:
{'line items': {'AMS Upgrade': '30667', 'BMS works':
'35722'}}
How can I revise the code so that I can add two more values against each task (i.e. AMS Upgrade and BMS works) and get something like the following? (Note that the structure below could be wrong.)
{'line items': {'AMS Upgrade': {'30667','100%', '25799'}},{'BMS works':
{'10667','10%', '3572'}} }
Code:
import json
import xlrd
book = xlrd.open_workbook("Example - supporting doc.xls")
first_sheet = book.sheet_by_index(-1)
nested_dict = {}
nested_dict["line items"] = {}
for i in range(21,175):
    Line_items = first_sheet.row_slice(rowx=i, start_colx=2, end_colx=8)
    if str(Line_items[0].value) and str(Line_items[1].value):
        if not Line_items[5].value ==0 :
            nested_dict["line items"].update({str(Line_items[0].value) : str(Line_items[1].value)})
print nested_dict
print json.dumps(nested_dict)
*** as requested see excel extract below
In Python, each key of a dict can only be associated with a single value. However that single value can be a dict, list, set, etc that holds many values.
You will need to decide the type to use for the value associated with the 'AMS Upgrade' key, if you want it to hold multiple values like '30667','10%', '222'.
Note: what you have written,
{'30667','100%', '25799'}
is a set literal in Python.
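As an illustration of holding several values under one key, here is a minimal sketch that stores a list per task. The rows list is a hypothetical stand-in for the spreadsheet cells, and the meaning of the three values (cost, percent, amount) is an assumption. A list is used rather than a set because json.dumps cannot serialise a set and a set does not keep its values in order.

```python
import json

# Hypothetical rows standing in for the spreadsheet cells:
# (task, cost, percent, amount). The field meanings are assumptions.
rows = [
    ("AMS Upgrade", "30667", "100%", "25799"),
    ("BMS works", "10667", "10%", "3572"),
]

nested_dict = {"line items": {}}
for task, cost, percent, amount in rows:
    # A list keeps the values in order and serialises to a JSON array.
    nested_dict["line items"][task] = [cost, percent, amount]

print(json.dumps(nested_dict))
```

If the values need names on the HTML side, a small dict per task (for example {"cost": ..., "percent": ..., "amount": ...}) would work the same way.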
I'm trying to extract data from a JSON file with Python.
Mainly, I want to pull out the date and time from the "Technicals" section, to put that in one column of a dataframe, as well as pulling the "AKG" number and putting that in the 2nd col of the dataframe. Yes, I've looked at similar questions, but this issue is different. Thanks for your help.
A downNdirty example of the JSON file is below:
{ 'Meta Data': { '1: etc'
'2: etc'},
'Technicals': { '2017-05-04 12:00': { 'AKG': '64.8645'},
'2017-05-04 12:30': { 'AKG': '65.7834'},
'2017-05-04 13:00': { 'AKG': '63.2348'}}}
As you can see, and what's stumping me, is while the date stays the same the time advances. 'AKG' never changes, but the number does. Some of the relevant code I've been using is below. I can extract the date and time, but I can't seem to reach the AKG numbers. Note, I don't need the "AKG", just the number.
I'll mention: I'm creating a DataFrame because this will be easier to work with when creating plots with the data...right? I'm open to an array of lists et al, or anything easier, if that will ultimately help me with the plots.
akg_time = []
akg_akg = []
technicals = akg_data['Technicals'] #akg_data is the entire json file
for item in technicals: #this works
    akg_time.append(item)
for item in technicals: #this not so much
    symbol = item.get('AKG')
    akg_akg.append(symbol)
pp.pprint(akg_akg)
error: 'str' object has no attribute 'get'
You've almost got it. You don't even need the second loop. You can append the akg value in the first one itself:
for key in technicals: # renaming to key because that is a clearer name
    akg_time.append(key)
    akg_akg.append(technicals[key]['AKG'])
Your error is because you believe item (or key) is a dict. It is not. It is just a string, one of the keys of the technicals dictionary, so you'd actually need to use symbols = technicals[key].get('AKG').
Coldspeed's answer is right, but when you need both the keys and the values of a dictionary you can loop over them together like this:
Python 3
for key, value in technicals.items():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
Python 2
for key, value in technicals.iteritems():
    akg_time.append(key)
    akg_akg.append(value["AKG"])
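Since the goal is plotting, it may help to convert the strings while collecting them. The sketch below (Python 3) parses each timestamp with datetime.strptime and each reading with float; akg_data here is a hypothetical stand-in containing only the "Technicals" part of the sample shown in the question.

```python
from datetime import datetime

# Hypothetical stand-in for the loaded JSON, reduced to the sample above.
akg_data = {
    'Technicals': {
        '2017-05-04 12:00': {'AKG': '64.8645'},
        '2017-05-04 12:30': {'AKG': '65.7834'},
        '2017-05-04 13:00': {'AKG': '63.2348'},
    }
}

technicals = akg_data['Technicals']

# Timestamps in this format sort chronologically when sorted as text.
rows = [
    (datetime.strptime(stamp, '%Y-%m-%d %H:%M'), float(entry['AKG']))
    for stamp, entry in sorted(technicals.items(), key=lambda kv: kv[0])
]

akg_time = [stamp for stamp, _ in rows]
akg_akg = [value for _, value in rows]
```

The two lists can then be plotted directly or used as the columns of a DataFrame.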
I have a dict created in a for loop in Python dict = {year:{month:{day:[title]}}} where year, month, day, and title are all variables. I then use data = json.dumps(dict) which works perfectly. But if the day is the same, I'd like it to add another [title] aspect to the array, so it would be
for title in x:
    dict = {year:{month:{day:[title]}}}
    data = json.dumps(dict)
    if day==day:
        # insert another [title] right next to [title]
I've tried using append, update, and insert, but none of them work.
How would I go about doing this?
Note that, as user2357112 mentioned, you are creating a Python dict, not a Python list (a JSON "array"), so "[title] right next to [title]" is a bit confusing. A dict is looked up by key rather than by position (and before Python 3.7 it did not guarantee any particular order).
In addition, you are attempting to add a field after you've dumped the JSON to a string; you should do that before you dump it. Moreover, you rebuild both dict and data on every pass through the loop, so as written only the values from the last iteration survive.
Another important note: don't shadow the built-in dict. Rename your variable to something else.
Also, the comparison day==day is always True...
Here is what I think you are trying to do: you are creating a "calendar" of sorts that is organized into years, then months, then days. Each day has a list of "titles."
# Variables I'm assuming exist:
# `title`, `year`, `month`, `day`, `someOtherDay`, `titles`, `someOtherTitle`
import json
myDict = {}
for title in titles: # Renamed `x` to `titles` for clarity.
    # Make sure myDict has the necessary keys.
    if year not in myDict:
        myDict[year] = {}
    if month not in myDict[year]:
        myDict[year][month] = {}
    # Set the day to be a list with a single `title` (and possibly two).
    myDict[year][month][day] = [title]
    if day == someOtherDay:
        myDict[year][month][day].append(someOtherTitle)
# And FINALLY dump the result to a string.
data = json.dumps(myDict)
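If the aim is simply to collect every title that falls on the same day, dict.setdefault can create the missing levels and the list in one step. This is a minimal sketch, assuming the input arrives as (year, month, day, title) tuples; the entries list and the name calendar are hypothetical.

```python
import json

# Hypothetical input: one (year, month, day, title) tuple per title.
entries = [
    (2017, 5, 4, "first title"),
    (2017, 5, 4, "second title"),
    (2017, 5, 5, "third title"),
]

calendar = {}
for year, month, day, title in entries:
    # setdefault returns the existing value, or inserts and returns the default.
    days = calendar.setdefault(year, {}).setdefault(month, {})
    days.setdefault(day, []).append(title)

# Dump once, after the whole structure has been built.
data = json.dumps(calendar)
```

Note that json.dumps writes the integer keys as strings, so they come back as "2017", "5" and "4" when the JSON is loaded again.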
I am writing a script that looks through my inventory, compares it with a master list of all possible inventory items, and tells me what items I am missing. My goal is a .csv file where the first column contains a unique key integer and then the remaining several columns would have data related to that key. For example, a three row snippet of my end-goal .csv file might look like this:
100001,apple,fruit,medium,12,red
100002,carrot,vegetable,medium,10,orange
100005,radish,vegetable,small,10,red
The data for this is being drawn from a couple sources. 1st, a query to an API server gives me a list of keys for items that are in inventory. 2nd, I read in a .csv file into a dict that matches keys with item name for all possible keys. A snippet of the first 5 rows of this .csv file might look like this:
100001,apple
100002,carrot
100003,pear
100004,banana
100005,radish
Note how any key in my list of inventory will be found in this two column .csv file that gives all keys and their corresponding item name and this list minus my inventory on hand yields what I'm looking for (which is the inventory I need to get).
So far I can get a .csv file that contains just the keys and item names for the items that I don't have in inventory. Given a list of inventory on hand like this:
100003,100004
A snippet of my resulting .csv file looks like this:
100001,apple
100002,carrot
100005,radish
This means that I have pear and banana in inventory (so they are not in this .csv file.)
To get this I have a function to get an item name when given an item id that looks like this:
def getNames(id_to_name, ids):
    return [id_to_name[id] for id in ids]
Then I have a function that calls my inventory server API and returns a list of keys as integers. I run it like this:
invlist = ServerApiCallFunction(AppropriateInfo)
A third function takes this invlist as its input and returns a dict of keys (the item id) and names for the items I don't have. It also writes the information of this dict to a .csv file. I am using the set1 - set2 method to do this. It looks like this:
def InventoryNumbers(inventory):
    with open(csvfile,'w') as c:
        c.write('InvName' + ',InvID' + '\n')
    missinginvnames = []
    with open("KeyAndItemNameTwoColumns.csv","rb") as fp:
        reader = csv.reader(fp, skipinitialspace=True)
        fp.readline() # skip header
        invidsandnames = {int(id): str.upper(name) for id, name in reader}
    invids = set(invidsandnames.keys())
    invnames = set(invidsandnames.values())
    invonhandset = set(inventory)
    missinginvidsset = invids - invonhandset
    missinginvids = list(missinginvidsset)
    missinginvnames = getNames(invidsandnames, missinginvids)
    missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
    print missinginvnameswithids
    with open(csvfile,'a') as c:
        for invname, invid in missinginvnameswithids.iteritems():
            c.write(invname + ',' + str(invid) + '\n')
    return missinginvnameswithids
Which I then call like this:
InventoryNumbers(invlist)
With that explanation, now on to my question here. I want to expand the data in this output .csv file by adding in additional columns. The data for this would be drawn from another .csv file, a snippet of which would look like this:
100001,fruit,medium,12,red
100002,vegetable,medium,10,orange
100003,fruit,medium,14,green
100004,fruit,medium,12,yellow
100005,vegetable,small,10,red
Note how this does not contain the item name (so I have to pull that from a different .csv file that just has the two columns of key and item name) but it does use the same keys. I am looking for a way to bring in this extra information so that my final .csv file will not just tell me the keys (which are item ids) and item names for the items I don't have in stock but it will also have columns for type, size, number, and color.
One option I've looked at is the defaultdict piece from collections, but I'm not sure if this is the best way to go about what I want to do. If I did use this method I'm not sure exactly how I'd call it to achieve my desired result. If some other method would be easier I'm certainly willing to try that, too.
How can I take my dict of keys and corresponding item names for items that I don't have in inventory and add to it this extra information in such a way that I could output it all to a .csv file?
EDIT: As I typed this up it occurred to me that I might make things easier on myself by creating a new single .csv file that would have data in the form key,item name,type,size,number,color (basically just copying the column for item name into the .csv that already has the other information for each key). This way I would only need to draw from one .csv file rather than from two. Even if I did this, though, how would I go about making my desired .csv file based on only those keys for items not in inventory?
ANSWER: I posted another question here about how to implement the solution I accepted (because it was giving me a value error, since my dict values were strings rather than sets to start with) and I ended up deciding that I wanted a list rather than a set (to preserve the order). I also ended up adding the column with item names to my .csv file that had all the other data, so that I only had to draw from one .csv file. That said, here is what this section of code now looks like:
MyDict = {}
infile = open('FileWithAllTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    # note that missinginvids is the list I was using as the keys for my dict,
    # which I was zipping together with a corresponding list of item names before.
    if int(spl_line[0]) in missinginvids:
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict
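The same filtering can also be written with the csv module, which handles the splitting and the trailing newline. The sketch below is in Python 3 and uses in-memory text in place of the real files; all_data and inventory_on_hand are hypothetical stand-ins built from the sample rows in the question.

```python
import csv
import io

# Hypothetical stand-in for FileWithAllTheData.csv
# (key, item name, type, size, number, color).
all_data = io.StringIO(
    "100001,apple,fruit,medium,12,red\n"
    "100002,carrot,vegetable,medium,10,orange\n"
    "100003,pear,fruit,medium,14,green\n"
    "100004,banana,fruit,medium,12,yellow\n"
    "100005,radish,vegetable,small,10,red\n"
)
inventory_on_hand = {100003, 100004}

# Keep only the rows whose key is not in inventory.
missing = {}
for row in csv.reader(all_data):
    key = int(row[0])
    if key not in inventory_on_hand:
        missing[key] = row[1:]

# Write the missing items back out as CSV text, ordered by key.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for key in sorted(missing):
    writer.writerow([key] + missing[key])
result = out.getvalue()
```

With real files, the two io.StringIO objects would be replaced by files opened for reading and writing.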
It sounds like what you need is a dict mapping ints to sets, i.e.,
MyDict = {100001: set(['apple']), 100002: set(['carrot'])}
You can add to a set with update:
MyDict[100001].update(['fruit'])
which would give you: {100001: set(['apple', 'fruit']), 100002: set(['carrot'])}
Also, if you had a list of attributes of carrot, e.g. ['vegetable', 'orange'],
you could say MyDict[100002].update(['vegetable', 'orange'])
and get: {100001: set(['apple', 'fruit']), 100002: set(['carrot', 'vegetable', 'orange'])}
Does this answer your question?
EDIT:
To read the extra columns in from the CSV file...
infile = open('MyFile.csv', 'r')
for line in infile.readlines():
    spl_line = line.strip().split(',')
    key = int(spl_line[0])  # MyDict is keyed by int, not by the raw string
    if key in MyDict:
        MyDict[key].update(spl_line[1:])
This isn't an answer to the question, but here is a possible way of simplifying your current code.
This:
invids = set(invidsandnames.keys())
invnames = set(invidsandnames.values())
invonhandset = set(inventory)
missinginvidsset = invids - invonhandset
missinginvids = list(missinginvidsset)
missinginvnames = getNames(invidsandnames, missinginvids)
missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
Can be replaced with:
invonhandset = set(inventory)
missinginvnameswithids = {name: invid for invid, name in invidsandnames.iteritems() if invid not in invonhandset}
Or, if a dict keyed by id rather than by name is acceptable:
invonhandset = set(inventory)
for key in invidsandnames.keys():
    if key in invonhandset:
        del invidsandnames[key]  # drop the items that are already on hand
missinginvnameswithids = invidsandnames
Have you considered making a temporary relational database? Python has sqlite support built in, and for a reasonable number of items I don't think you would have any performance issues.
I would turn each CSV file and the result from the web API into a table (one table per data source). You can then do everything you want with some SQL queries and joins. Once you have the data you want, you can dump it back to CSV.
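A minimal sketch of that idea with the standard sqlite3 module and an in-memory database follows. The table and column names (names, details, on_hand) are hypothetical, and the rows are the sample data from the question, inserted by hand instead of being read from the CSV files and the API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, gone when closed
conn.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE details (id INTEGER PRIMARY KEY, type TEXT, size TEXT,
                          number INTEGER, color TEXT);
    CREATE TABLE on_hand (id INTEGER PRIMARY KEY);
""")

# One table per data source; hard-coded here for illustration.
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (100001, "apple"), (100002, "carrot"), (100003, "pear"),
    (100004, "banana"), (100005, "radish"),
])
conn.executemany("INSERT INTO details VALUES (?, ?, ?, ?, ?)", [
    (100001, "fruit", "medium", 12, "red"),
    (100002, "vegetable", "medium", 10, "orange"),
    (100003, "fruit", "medium", 14, "green"),
    (100004, "fruit", "medium", 12, "yellow"),
    (100005, "vegetable", "small", 10, "red"),
])
conn.executemany("INSERT INTO on_hand VALUES (?)", [(100003,), (100004,)])

# Join names to details and keep only the ids that are not on hand.
missing_rows = conn.execute("""
    SELECT n.id, n.name, d.type, d.size, d.number, d.color
    FROM names AS n
    JOIN details AS d ON d.id = n.id
    WHERE n.id NOT IN (SELECT id FROM on_hand)
    ORDER BY n.id
""").fetchall()
conn.close()
```

Each tuple in missing_rows can then be passed to csv.writer to produce the final file.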