Read from file to dictionary as floats instead of strings - python

I'm loading and extracting data in Python, which I want stored in a dictionary.
I'm using the csv module to read and write the data, and externally it is stored as two comma-separated columns. This works great, but when the data is initially read it is (obviously) read as strings.
I can convert it to a dictionary with both keys and values as floats using two lines of code, but my question is whether I can load the data directly into the dictionary as floats.
My original code was:
reader = csv.reader(open('testdict.csv','rb'))
dict_read = dict((x,y) for (x,y) in reader)
Which I have changed to:
reader = csv.reader(open('testdict.csv','rb'))
read = [(float(x),float(y)) for (x,y) in reader]
dict_read = dict(read)
which loads the data in the desired way.
So, is it possible to modify the original dict_read = dict((x,y) for (x,y) in reader) to do what the two-line version above does?
SOLUTION:
The solution is to use the map function, which applies a function to every item of an iterable:
dict_read = dict(map(float, x) for x in reader)

Try this:
dict_read = dict(map(float, x) for x in reader)
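A minimal, self-contained sketch of this approach (the in-memory data is a hypothetical stand-in for testdict.csv):

```python
import csv
import io

# Hypothetical two-column data standing in for testdict.csv
data = io.StringIO("1.0,2.5\n3.0,4.5\n")

reader = csv.reader(data)
# map(float, row) turns both columns of a row into floats, and dict()
# accepts any iterable of 2-item iterables
dict_read = dict(map(float, row) for row in reader)
print(dict_read)  # {1.0: 2.5, 3.0: 4.5}
```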

Related

Removing extra formatting from a python list element ( [''] )

Re-learning Python after not using it for a few years, so go easy on me.
The basics: I am reading in data from a .csv file, and the information I am reading in is as follows:
E1435
E46371
E1696
E27454
However, when using print(list[0]) for example, it produces
['E1435']
I am trying to interpolate these pieces of data into an API request string, and the "[' ']" around them is breaking the requests. Basically, I need the elements in the list to not have the square brackets and quotes when I use them as variables.
My interpolation is as follows, in case the way I'm interpolating is the problem:
req = requests.get('Linkgoeshere/%s' % list[i])
Edit:
A sample of the data I'm using is listed above ("E1435", "E46371", etc.); each item in the csv is a new row in the same column.
As per a request, I have produced a minimal reproduction of my problem.
import csv

# List to store data from the csv
geoCode = []

# Read in locations from a designated file
with open('Locations.csv', 'rt') as f:
    data = csv.reader(f)
    for row in data:
        geoCode.append(row)

i = 0
for item in geoCode:
    # Print the items in the list
    print(geoCode[i])
    i += 1
It appears that list[i] is itself a nested list, so you need another subscript to get to the element inside it:
print(list[i][0])
NB: Avoid naming a variable list, as it shadows the built-in list type. Try using a plural word like codes or ids instead.
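A short sketch of why the extra subscript works (the sample data is a hypothetical stand-in for Locations.csv): csv.reader yields each row as a list of column values, so a one-column file produces one-element lists.

```python
import csv
import io

# One-column data standing in for Locations.csv
f = io.StringIO("E1435\nE46371\n")
codes = [row for row in csv.reader(f)]

print(codes[0])     # ['E1435'] - the whole row (a list)
print(codes[0][0])  # 'E1435'  - the first column of that row

# Interpolating the bare string into a URL
url = 'Linkgoeshere/%s' % codes[0][0]
print(url)  # Linkgoeshere/E1435
```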

Printing list of DictReader twice in a row produces different results

I'm using the csv module's csv.DictReader to read in a csv file. I am a newbie to Python and the following behavior has me stumped.
EDIT: See original question afterwards.
csv = csv.DictReader(csvFile)
print(list(csv)) # prints what I would expect, a sequence of OrderedDict's
print(list(csv)) # prints an empty list...
Is list somehow mutating csv?
Original question:
def removeFooColumn(csv):
    for row in csv:
        del csv['Foo']
csv = csv.DictReader(csvFile)
print(list(csv)) # prints what I would expect, a sequence of OrderedDict's
removeFooColumn(csv)
print(list(csv)) # prints an empty list...
What is happening to the sequence in the removeFooColumn function?
csv.DictReader returns an iterator, which can only be consumed once. Here is a fix:
def removeFooColumn(csv):
    for row in csv:
        del row['Foo']
csv = list(csv.DictReader(csvFile))
print(csv) # prints what I would expect, a sequence of OrderedDict's
removeFooColumn(csv)
print(csv) # prints an empty list...
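A self-contained sketch of the exhaustion behaviour (the in-memory file is a stand-in for csvFile):

```python
import csv
import io

csv_file = io.StringIO("Foo,Bar\n1,2\n3,4\n")
reader = csv.DictReader(csv_file)

first = list(reader)   # consumes the iterator
second = list(reader)  # the iterator is already exhausted

print(first)   # two row dicts
print(second)  # []
```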

Result of my generated CSV file contains commas and brackets

I am generating a csv which contains the results I expect; all the numbers are right.
However, presentation-wise it contains parentheses around everything, stray commas, etc.
Is there a way I can remove these?
I tried adding a comma as the delimiter, but that didn't solve it.
Example output:
('Egg',)
results = []
results1 = []
results2 = []
results3 = []
results4 = []
results5 = []
results6 = []
cur.execute(dbQuery)
results.extend(cur.fetchall())
cur.execute(dbQuery1)
results1.extend(cur.fetchall())
cur.execute(dbQuery2)
results2.extend(cur.fetchall())
cur.execute(dbQuery3)
results3.extend(cur.fetchall())
cur.execute(dbQuery4)
results4.extend(cur.fetchall())
cur.execute(dbQuery5)
results5.extend(cur.fetchall())
cur.execute(dbQuery6)
results6.extend(cur.fetchall())
with open("Stats.csv", "wb") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(['Query1', 'Query2', 'Query3', 'Query4', 'Query5', 'Query6', 'Query7'])
    csv_writer.writerows(zip(*[results, results1, results2, results3, results4, results5, results6]))
The zip function returns a list of tuples [(x, y), (t, s), ...].
The writerows method expects a list of lists, so I think you should format the zip result before calling writerows. Something like this should work:
result = zip(results, results1, results2, results3, results4, results5, results6)
csv_writer.writerows([list(row) for row in result])
EDIT:
I think I understood the problem you are having (so ignore my previous answer above).
The fetchall function returns a list of tuples like [(x,), (y,)].
So your resultsX variables have this format. Then you are applying zip to these lists (see here what zip does).
If for example we have
results = [(x,), (y,)]
results1 = [(t,), (z,)]
When you run the zip(results, results1), it will return:
[((x,), (t,)), ((y,), (z,))]
So, that's the format of the list you are passing to writerows, which means the first row will be ((x,), (t,)), where the first element is (x,) and the second is (t,).
I'm not sure what you were expecting to write to the CSV with the zip function, but the result you are getting is because your elements are tuples instead of plain values.
I don't know the queries you are running, but if you are expecting just one field per result, then you probably need to strip the tuple out of each resultsX variable. You can take a look at how to do that in this thread: https://stackoverflow.com/a/12867429/1130381
I hope it helps, but that's the best I can do with the info you provided.
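A minimal sketch of stripping the one-element tuples before zipping (the sample data is hypothetical):

```python
# fetchall() on a single-column query returns rows like [('Egg',), ('Milk',)]
results = [('Egg',), ('Milk',)]
results1 = [(12,), (3,)]

# Unwrap each one-element tuple so the cells are plain values
flat = [row[0] for row in results]
flat1 = [row[0] for row in results1]

rows = list(zip(flat, flat1))
print(rows)  # [('Egg', 12), ('Milk', 3)]
```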

Python: how to sort existing csv files in Python

I have about 200 csv files with the same columns: A,B,C,D,E. I want to sort all of them by column B and then by column A. Can I do this in Python?
I created a sort program for csv files that outputs a new sorted csv file using two keys. To sort on two keys, first sort by the secondary key and then by the primary key.
To sort multiple files, loop over all the input files, building the combined statistics array, and afterwards sort the result.
I only had one input file, so I did not have to do that. Here is what I did for one file; you would change where I have infile to be the result of the input loop.
ifile = open('file.csv', 'rb')
infile = csv.DictReader(ifile)
infields = infile.fieldnames
descending = True  # or False; renamed from dir, which shadows the built-in
try:
    # This assumes that the first row is data
    sortedlist = sorted(infile, key=lambda d: float(d['statistic2']), reverse=descending)
except ValueError:
    # Go back and skip the header
    ifile.seek(0)
    ifile.next()
    sortedlist = sorted(infile, key=lambda d: float(d['statistic2']), reverse=descending)
# Now do the primary key.
sortedlist.sort(key=lambda d: float(d['statistic1']), reverse=descending)
ifile.close()
Now open the output file using csv.DictWriter, write the header and output the data from sortedlist.
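A hedged sketch of that final write step with csv.DictWriter (the field names and the sorted rows are placeholders, not the asker's actual data):

```python
import csv
import io

fieldnames = ['statistic1', 'statistic2']
sortedlist = [
    {'statistic1': '1.0', 'statistic2': '9.0'},
    {'statistic1': '2.0', 'statistic2': '3.0'},
]

# An in-memory file stands in for the real output file
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(sortedlist)
print(out.getvalue())
```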
A csv file is a standard text file (not an Excel file). Python can certainly process these files; there is a library called csv which is designed for just this type of work: http://docs.python.org/2/library/csv.html
Assuming the file sizes are manageable, you should be able to simply load them all into memory and then sort.
What have you tried so far?
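A minimal in-memory sketch of the sort-by-B-then-A idea, assuming the files fit in memory (the sample rows are hypothetical). Since Python's sort is stable, a single key tuple handles both columns at once:

```python
import csv
import io

# Stand-in for one of the input files, columns A..E
f = io.StringIO("b,2,x,x,x\na,2,x,x,x\nc,1,x,x,x\n")
rows = list(csv.reader(f))

# Sort by column B (index 1), then by column A (index 0)
rows.sort(key=lambda row: (row[1], row[0]))
print(rows[0])  # ['c', '1', 'x', 'x', 'x']
```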

Generating a .CSV with Several Columns - Use a Dictionary?

I am writing a script that looks through my inventory, compares it with a master list of all possible inventory items, and tells me what items I am missing. My goal is a .csv file where the first column contains a unique key integer and then the remaining several columns would have data related to that key. For example, a three row snippet of my end-goal .csv file might look like this:
100001,apple,fruit,medium,12,red
100002,carrot,vegetable,medium,10,orange
100005,radish,vegetable,small,10,red
The data for this is being drawn from a couple sources. 1st, a query to an API server gives me a list of keys for items that are in inventory. 2nd, I read in a .csv file into a dict that matches keys with item name for all possible keys. A snippet of the first 5 rows of this .csv file might look like this:
100001,apple
100002,carrot
100003,pear
100004,banana
100005,radish
Note how any key in my list of inventory will be found in this two-column .csv file giving all keys and their corresponding item names; this list minus my inventory on hand yields what I'm looking for (the inventory I need to get).
So far I can get a .csv file that contains just the keys and item names for the items that I don't have in inventory. Given a list of inventory on hand like this:
100003,100004
A snippet of my resulting .csv file looks like this:
100001,apple
100002,carrot
100005,radish
This means that I have pear and banana in inventory (so they are not in this .csv file.)
To get this I have a function to get an item name when given an item id that looks like this:
def getNames(id_to_name, ids):
    return [id_to_name[id] for id in ids]
Then there's a function which returns a list of keys as integers from my inventory server API call; I've run this function like this:
invlist = ServerApiCallFunction(AppropriateInfo)
A third function takes this invlist as its input and returns a dict of keys (the item id) and names for the items I don't have. It also writes the information of this dict to a .csv file. I am using the set1 - set2 method to do this. It looks like this:
def InventoryNumbers(inventory):
    with open(csvfile, 'w') as c:
        c.write('InvName' + ',InvID' + '\n')
    missinginvnames = []
    with open("KeyAndItemNameTwoColumns.csv", "rb") as fp:
        reader = csv.reader(fp, skipinitialspace=True)
        fp.readline()  # skip header
        invidsandnames = {int(id): str.upper(name) for id, name in reader}
    invids = set(invidsandnames.keys())
    invnames = set(invidsandnames.values())
    invonhandset = set(inventory)
    missinginvidsset = invids - invonhandset
    missinginvids = list(missinginvidsset)
    missinginvnames = getNames(invidsandnames, missinginvids)
    missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
    print missinginvnameswithids
    with open(csvfile, 'a') as c:
        for invname, invid in missinginvnameswithids.iteritems():
            c.write(invname + ',' + str(invid) + '\n')
    return missinginvnameswithids
Which I then call like this:
InventoryNumbers(invlist)
With that explanation, now on to my question here. I want to expand the data in this output .csv file by adding in additional columns. The data for this would be drawn from another .csv file, a snippet of which would look like this:
100001,fruit,medium,12,red
100002,vegetable,medium,10,orange
100003,fruit,medium,14,green
100004,fruit,medium,12,yellow
100005,vegetable,small,10,red
Note how this does not contain the item name (so I have to pull that from a different .csv file that just has the two columns of key and item name) but it does use the same keys. I am looking for a way to bring in this extra information so that my final .csv file will not just tell me the keys (which are item ids) and item names for the items I don't have in stock but it will also have columns for type, size, number, and color.
One option I've looked at is the defaultdict piece from collections, but I'm not sure if this is the best way to go about what I want to do. If I did use this method I'm not sure exactly how I'd call it to achieve my desired result. If some other method would be easier I'm certainly willing to try that, too.
How can I take my dict of keys and corresponding item names for items that I don't have in inventory and add to it this extra information in such a way that I could output it all to a .csv file?
EDIT: As I typed this up it occurred to me that I might make things easier on myself by creating a new single .csv file that would have data in the form key,item name,type,size,number,color (basically just copying the item-name column into the .csv that already has the other information for each key). This way I would only need to draw from one .csv file rather than two. Even if I did this, though, how would I go about making my desired .csv file based on only those keys for items not in inventory?
ANSWER: I posted another question here about how to implement the solution I accepted (because it was giving me a ValueError since my dict values were strings rather than sets to start with), and I ended up deciding that I wanted a list rather than a set (to preserve the order). I also ended up adding the column with item names to my .csv file that had all the other data, so that I only had to draw from one .csv file. That said, here is what this section of code now looks like:
MyDict = {}
infile = open('FileWithAllTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    # missinginvids is the list I was using as the keys for my dict, which I
    # was zipping together with a corresponding list of item names before
    if int(spl_line[0]) in missinginvids:
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict
It sounds like what you need is a dict mapping ints to sets, i.e.,
MyDict = {100001: set([apple]), 100002: set([carrot])}
you can add with update:
MyDict[100001].update([fruit])
which would give you: {100001: set([apple, fruit]), 100002: set([carrot])}
Also if you had a list of attributes of carrot... [vegetable,orange]
you could say MyDict[100002].update([vegetable, orange])
and get: {100001: set([apple, fruit]), 100002: set([carrot, vegetable, orange])}
does this answer your question?
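A runnable sketch of this dict-of-sets idea with quoted strings (the pseudo-code above omits the quotes around the values):

```python
MyDict = {100001: {'apple'}, 100002: {'carrot'}}

# update() adds every element of the given iterable to the set
MyDict[100001].update(['fruit'])
MyDict[100002].update(['vegetable', 'orange'])

print(MyDict[100001])  # a set containing 'apple' and 'fruit' (order may vary)
```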
EDIT:
To read in the CSV (note the keys must be converted to int to match the dict, and the newline stripped):
infile = open('MyFile.csv', 'r')
for line in infile.readlines():
    spl_line = line.strip().split(',')
    if int(spl_line[0]) in MyDict:
        MyDict[int(spl_line[0])].update(spl_line[1:])
This isn't an answer to the question, but here is a possible way of simplifying your current code.
This:
invids = set(invidsandnames.keys())
invnames = set(invidsandnames.values())
invonhandset = set(inventory)
missinginvidsset = invids - invonhandset
missinginvids = list(missinginvidsset)
missinginvnames = getNames(invidsandnames, missinginvids)
missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
Can be replaced with:
invonhandset = set(inventory)
missinginvnameswithids = {v: k for k, v in invidsandnames.iteritems() if k not in invonhandset}
Or:
invonhandset = set(inventory)
for key in invidsandnames.keys():
    if key in invonhandset:
        del invidsandnames[key]
missinginvnameswithids = invidsandnames
Have you considered making a temporary RDB (Python has sqlite support baked in)? For reasonable numbers of items I don't think you would have performance issues.
I would turn each CSV file and the result from the web API into tables (one table per data source). You can then do everything you want with some SQL queries + joins. Once you have the data you want, you can dump it back to CSV.
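A minimal sqlite3 sketch of that join idea (the table names, columns, and sample rows are hypothetical stand-ins for the asker's data):

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()

# One table per data source: id -> name, and id -> extra attributes
cur.execute('CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)')
cur.execute('CREATE TABLE attrs (id INTEGER PRIMARY KEY, type TEXT, size TEXT)')
cur.executemany('INSERT INTO names VALUES (?, ?)',
                [(100001, 'apple'), (100003, 'pear')])
cur.executemany('INSERT INTO attrs VALUES (?, ?, ?)',
                [(100001, 'fruit', 'medium'), (100003, 'fruit', 'medium')])

# Items not in on-hand inventory, joined with their attributes
on_hand = (100003,)
placeholders = ','.join('?' * len(on_hand))
cur.execute('SELECT n.id, n.name, a.type, a.size '
            'FROM names n JOIN attrs a ON n.id = a.id '
            'WHERE n.id NOT IN (%s)' % placeholders, on_hand)
missing = cur.fetchall()
print(missing)  # [(100001, 'apple', 'fruit', 'medium')]
```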
