I have a csv file passed into a function as a string:
csv_input = """
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
I want to print the sum of how many people were surveyed in each city by date, so something like:
Date. City. Pop-Surveyed
2022-01-01. London. 134
2022-01-01. Edinburgh. 214
2022-01-01. Madrid. 555
2022-01-02. London. 628
2022-01-02. Edinburgh. 65
2022-01-02. Madrid. 143
As I can't import pandas on my machine (I can't install it without internet access), I thought I could use a defaultdict to store the total for each city by date:
from collections import defaultdict
survery_data = csv_input.split()[1:]
survery_data = [survey.split(',') for survey in survery_data]
survey_sum = defaultdict(dict)
for survey in survery_data:
    date = survey[0]
    city = survey[1].split("_")[0]
    quantity = survey[-1]
    survey_sum[date][city] += quantity
print(survey_sum)
But doing this returns a KeyError:
KeyError: 'london'
I was hoping to get a defaultdict like:
{'2022-01-01': {'london': 134, 'edinburgh': 214, 'madrid': 555},
 '2022-01-02': {'london': 628, 'edinburgh': 65, 'madrid': 143}}
Is there a way to create a defaultdict with this structure, so that I can then iterate over it and print each row as above?
Try:
csv_input = """\
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
# Split the text into rows of stripped fields; the first row is the header
header, *rows = (
    tuple(map(str.strip, line.split(",")))
    for line in map(str.strip, csv_input.splitlines())
)
# Sum the sizes per (date, city) pair
tmp = {}
for date, city, size in rows:
    key = (date, city.split("_")[0])
    tmp[key] = tmp.get(key, 0) + int(size)
# Regroup the totals by date
out = {}
for (date, city), size in tmp.items():
    out.setdefault(date, []).append({city: size})
print(out)
Prints:
{
"2022-01-01": [{"london": 134}, {"madrid": 555}, {"edinburgh": 214}],
"2022-01-02": [{"edingburgh": 65}, {"london": 628}, {"madric": 143}],
}
Changing
survey_sum = defaultdict(dict)
to
survey_sum = defaultdict(lambda: defaultdict(int))
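gives the following (once quantity is also converted with int(quantity), since adding a string to the default 0 would raise a TypeError):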
defaultdict(<function survey_sum.<locals>.<lambda> at 0x100edd8b0>, {'2022-01-01': defaultdict(<class 'int'>, {'london': 134, 'madrid': 555, 'edinburgh': 214}), '2022-01-02': defaultdict(<class 'int'>, {'edingburgh': 65, 'london': 628, 'madrid': 143})})
This can then be iterated over to print each row.
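Putting the pieces together, a minimal sketch might look like the following. It assumes the same CSV layout as in the question (the data is shortened here), and the function name sum_surveys is made up for illustration. The inner defaultdict(int) starts every city at 0, and int(size) is needed because the values read from the text are strings.

```python
from collections import defaultdict


def sum_surveys(csv_text):
    """Return {date: {city: total}} for CSV text with a header row."""
    # Outer default: a new inner defaultdict; inner default: int() == 0
    totals = defaultdict(lambda: defaultdict(int))
    lines = csv_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        date, location, size = line.split(",")
        city = location.split("_")[0]
        totals[date][city] += int(size)  # size is a string, so convert it
    return totals


# Shortened sample in the same layout as the question's csv_input
csv_input = """
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,352
"""

survey_sum = sum_surveys(csv_input)
for date in sorted(survey_sum):
    for city, total in survey_sum[date].items():
        print(date, city.capitalize(), total)
```

Sorting the dates before printing keeps the table in date order even when the input rows are interleaved.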
I have lists that are formatted like so:
order_ids = ['Order ID', '026-2529662-9119536', '026-4092572-3574764', '026-4267878-0816332', '026-5334006-4073138', '026-5750353-4848328', '026-5945233-4883500', '026-5966822-8160331', '026-8799392-8255522', '202-5076008-9615516', '202-5211901-8584318', '202-5788153-3773918', '202-6208325-9677946', '203-1024454-3409960', '203-1064201-9833131', '203-4104559-7038752', '203-5013053-9959554', '203-5768187-0573905', '203-8639245-4145958', '203-9473169-4807564', '204-1577436-4733125', '204-7025768-1965915', '204-9196762-0226720', '205-6427246-2264368', '205-9028779-8764322', '206-0703454-9777135', '206-0954144-1685131', '206-3381432-7615531', '206-3822931-6939555', '206-4658913-5563533', '206-5213573-9997926', '206-5882801-0583557', '206-7158700-9326744', '206-7668862-3913143', '206-8019246-1474732', '206-8541775-0545153']
one = [['Order ID', 'Amount'], ['026-2529662-9119536', '10.42'], ['026-4092572-3574764', '10.42'], ['026-4267878-0816332', '1.75'], ['026-5334006-4073138', '17.990000000000002'], ['026-5750353-4848328', '16.25'], ['026-5945233-4883500', '1.83'], ['026-5966822-8160331', '11.92'], ['026-8799392-8255522', '8.5'], ['202-5076008-9615516', '1.83'], ['202-5211901-8584318', '1.83'], ['202-5788153-3773918', '8.08'], ['202-6208325-9677946', '11.33'], ['203-1024454-3409960', '8.08'], ['203-1064201-9833131', '1.5'], ['203-4104559-7038752', '8.5'], ['203-5013053-9959554', '9.67'], ['203-5113131-7525963', '-8.5'], ['203-5768187-0573905', '3.66'], ['203-8639245-4145958', '5.08'], ['203-9473169-4807564', '3.66'], ['204-1577436-4733125', '1.83'], ['204-7025768-1965915', '1.83'], ['204-9196762-0226720', '11.33'], ['205-8348990-1889964', '-11.33'], ['205-9028779-8764322', '6.91'], ['206-0703454-9777135', '23.84'], ['206-0954144-1685131', '22.66'], ['206-3381432-7615531', '8.08'], ['206-3822931-6939555', '11.92'], ['206-4658913-5563533', '9.67'], ['206-5213573-9997926', '3.66'], ['206-5882801-0583557', '13.92'], ['206-7158700-9326744', '27.5'], ['206-7668862-3913143', '6.58'], ['206-8541775-0545153', '1.83']]
What I want to do is loop through every item in order_ids and, if the order ID is present in one, get its "value" (the amount).
So far what I have tried is:
with open('test.csv', mode='w', newline='') as outfile:
    writer = csv.writer(outfile)
    i = 0
    while i < len(order_ids):
        for order in order_ids:
            try:
                if order == one[i][0]:
                    value_a = one[i][1]
                    print(order, value_a)
                    writer.writerow([order, value_a])
                    i += 1
                else:
                    i += 1
                    pass
            except IndexError:
                i += 1
This works to some extent: there are 36 items in "order_ids" and 36 lists in "one", but only 18 rows are written to my outfile.
One example of an order_id that isn't written is "206-7668862-3913143", even though it clearly has a value of "6.58" in "one".
What is stopping the rest of my rows from being written?
You can do this simply with a dictionary. The dict() constructor accepts a nested list of pairs and creates a dictionary mapping each order_id to its amount. Then we can just loop over the order_ids list and write every order_id that appears in the dictionary to test.csv.
Code:
import csv
d = dict(one)
with open('test.csv', mode='w', newline='') as outfile:
    writer = csv.writer(outfile)
    for order_id in order_ids:
        if order_id in d:
            writer.writerow([order_id, d[order_id]])
test.csv:
Order ID,Amount
026-2529662-9119536,10.42
026-4092572-3574764,10.42
026-4267878-0816332,1.75
026-5334006-4073138,17.990000000000002
026-5750353-4848328,16.25
026-5945233-4883500,1.83
026-5966822-8160331,11.92
026-8799392-8255522,8.5
202-5076008-9615516,1.83
202-5211901-8584318,1.83
202-5788153-3773918,8.08
202-6208325-9677946,11.33
203-1024454-3409960,8.08
203-1064201-9833131,1.5
203-4104559-7038752,8.5
203-5013053-9959554,9.67
203-5768187-0573905,3.66
203-8639245-4145958,5.08
203-9473169-4807564,3.66
204-1577436-4733125,1.83
204-7025768-1965915,1.83
204-9196762-0226720,11.33
205-9028779-8764322,6.91
206-0703454-9777135,23.84
206-0954144-1685131,22.66
206-3381432-7615531,8.08
206-3822931-6939555,11.92
206-4658913-5563533,9.67
206-5213573-9997926,3.66
206-5882801-0583557,13.92
206-7158700-9326744,27.5
206-7668862-3913143,6.58
206-8541775-0545153,1.83
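As for why the original loop misses rows: it appears to compare order_ids[i] only with one[i], so an ID is found only when it sits at the same position in both lists. In the data above, one contains '203-5113131-7525963', which is not in order_ids, so the entries after it no longer line up. The sketch below uses shortened, made-up lists (the IDs and amounts are illustrative, not the real data) to contrast that positional comparison with the dictionary lookup.

```python
# Illustrative data only: "C" is missing from order_ids, which shifts
# every later entry out of alignment with the `one` list.
order_ids = ["Order ID", "A", "B", "D", "E"]
one = [["Order ID", "Amount"], ["A", "1.0"], ["B", "2.0"],
       ["C", "-2.0"], ["D", "3.0"], ["E", "4.0"]]

# Positional comparison: only matches while both lists stay aligned
positional = [
    (order, row[1])
    for order, row in zip(order_ids, one)
    if order == row[0]
]

# Dictionary lookup: finds an ID wherever it sits in `one`
amounts = dict(one)
by_lookup = [(order, amounts[order]) for order in order_ids if order in amounts]

print(len(positional), len(by_lookup))
```

The positional version stops matching after "B", while the lookup version still finds "D" and "E".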
00,0,6098
00,1,6098
00,2,6098
00,3,6098
00,4,6094
00,5,6094
01,0,8749
01,1,8749
01,2,8749
01,3,88609
01,4,88609
01,5,88609
01,6,88611
01,7,88611
01,8,88611
02,0,9006
02,1,9006
02,2,4355
02,3,9013
02,4,9013
02,5,9013
02,6,4341
02,7,4341
02,8,4341
02,9,4341
03,0,6285
03,1,6285
03,2,6285
03,3,6285
03,4,6278
03,5,6278
03,6,6278
03,7,6278
03,8,8960
The above is an excerpt from my CSV file.
What I want to do is this: for rows that share the same value in column 0, build an array from column 2 and print it. For example, for 00 the array would be
a = [6098,6098,6098,6098,6094,6094]
and for 01 it would be
a = [8749,8749,8749,88609,88609,88609,88611,88611,88611]
I don't know how to loop over this file.
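This solution assumes that rows with the same first-column value are adjacent in the file (for example, because the file is sorted by that column), since itertools.groupby only groups consecutive items.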
def main():
    import csv
    from itertools import groupby
    with open("csv.csv") as file:
        reader = csv.reader(file)
        rows = [[row[0]] + [int(item) for item in row[1:]] for row in reader]
    groups = {}
    for key, group in groupby(rows, lambda row: row[0]):
        groups[key] = [row[2] for row in group]
    print(groups["00"])
    print(groups["01"])
    print(groups["02"])
    print(groups["03"])
    return 0
if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
[6098, 6098, 6098, 6098, 6094, 6094]
[8749, 8749, 8749, 88609, 88609, 88609, 88611, 88611, 88611]
[9006, 9006, 4355, 9013, 9013, 9013, 4341, 4341, 4341, 4341]
[6285, 6285, 6285, 6285, 6278, 6278, 6278, 6278, 8960]
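If the first column is not guaranteed to arrive grouped together, one option is to sort the rows by that column before calling groupby. Here is a minimal sketch using a small in-memory sample instead of csv.csv (the sample rows and the helper name group_third_column are made up for illustration):

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_third_column(csv_text):
    """Map each first-column value to the list of its third-column ints."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # list.sort() is stable, so rows keep their file order within each key
    rows.sort(key=itemgetter(0))
    return {
        key: [int(row[2]) for row in group]
        for key, group in groupby(rows, key=itemgetter(0))
    }


# Deliberately interleaved keys to show why the sort is needed
sample = "01,0,8749\n00,0,6098\n01,1,88609\n00,1,6094\n"
groups = group_third_column(sample)
print(groups["00"])
print(groups["01"])
```

Without the sort, groupby would yield the key "01" twice here and the second group would overwrite the first.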
The idea is to use a dictionary in which 00, 01, etc. are the keys and each value is a list. You then iterate through the CSV data and append each row's value to the list for its key.
import csv
result = {}
with open("your csv file", "r") as csvfile:
    data = csv.reader(csvfile)
    for row in data:
        # dict.has_key() was removed in Python 3; use the `in` operator
        if row[0] in result:
            result[row[0]].append(row[2])
        else:
            result[row[0]] = [row[2]]
print(result)
Here is a version using defaultdict:
from collections import defaultdict
txt = '''00,0,6098
00,1,6098
00,2,6098
00,3,6098
00,4,6094
00,5,6094
01,0,8749
01,1,8749
01,2,8749
01,3,88609
01,4,88609
01,5,88609
01,6,88611
01,7,88611
01,8,88611
02,0,9006
02,1,9006
02,2,4355
02,3,9013
02,4,9013
02,5,9013
02,6,4341
02,7,4341
02,8,4341
02,9,4341
03,0,6285
03,1,6285
03,2,6285
03,3,6285
03,4,6278
03,5,6278
03,6,6278
03,7,6278
03,8,8960'''
data_holder = defaultdict(list)
lines = txt.split('\n')
for line in lines:
    fields = line.split(',')
    data_holder[fields[0]].append(fields[2])
for k,v in data_holder.items():
    print('{} -> {}'.format(k,v))
Output (key order may vary on older Python versions; Python 3.7+ keeps insertion order):
02 -> ['9006', '9006', '4355', '9013', '9013', '9013', '4341', '4341', '4341', '4341']
03 -> ['6285', '6285', '6285', '6285', '6278', '6278', '6278', '6278', '8960']
00 -> ['6098', '6098', '6098', '6098', '6094', '6094']
01 -> ['8749', '8749', '8749', '88609', '88609', '88609', '88611', '88611', '88611']
I have the following type of document, where each person might have a couple of names and an associated description of features:
New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
New person
.....
What I would like is to read this file into a Python dictionary in which each new person is given an ID.
For example, the person with ID 1 would have the names ['ann','anna','ana']
and the feature ['A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.']
Any suggestions?
Assuming that your input file is lo.txt, you can build a list of dictionaries (one per person) this way:
final_data = []
names = []
with open('lo.txt') as file:
    for line in file:
        line = line.strip()
        if line.startswith("name:"):
            # Collect names until the feature line closes the record
            names.append(line.split(":", 1)[1].strip())
        elif line.startswith("feature:"):
            feature = line.split(":", 1)[1].strip()
            final_data.append({
                'names': names,
                'feature': feature
            })
            names = []
print(final_data)
Something like this might work
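result = {}
with open("document.txt") as f:
    contents = f.read()
# Every record begins with a "New person" line, so split on that marker
# and drop empty chunks (such as the text before the first marker).
blocks = [block for block in contents.split('New person') if block.strip()]
for i, block in enumerate(blocks):
    names = []
    features = []
    for line in block.strip().split('\n'):
        key, _, value = line.partition(':')
        if key.strip() == 'name':
            names.append(value.strip())
        elif key.strip() == 'feature':
            features.append(value.strip())
    result[i] = {'names': names, 'features': features}
print(result)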
This should give you something like:
{0: {'names': ['ana', 'anna', 'ann'], 'features': ['...']}}
and so on for each further person.
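Here is code that may work for a file holding a single person's record (name lines followed by one feature line):
with open("documents.txt") as fh:
    # Keep non-empty lines, without surrounding whitespace
    f = [line.strip() for line in fh if line.strip()]
# The last line is the feature; the lines before it are the names
final_condition = f.pop()
names = [i.split(":", 1)[1].strip() for i in f if i.startswith("name:")]
the_dict = {}
the_dict["names"] = names
the_dict["features"] = final_condition
print(the_dict)
All it does is split each name line at ":" and keep the part after it (the name) in the list names; the last line of the file is stored as the feature.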
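To get the structure asked for in the question, with each person keyed by an ID starting at 1, a sketch along these lines might do. It assumes every record starts with a line reading exactly "New person" and that the other lines look like key: value. The function name parse_people and the shortened sample text are illustrative only.

```python
def parse_people(text):
    """Return {person_id: {'names': [...], 'features': [...]}}."""
    people = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "New person":
            # Start a new record with the next free ID (1, 2, 3, ...)
            current = {"names": [], "features": []}
            people[len(people) + 1] = current
            continue
        key, sep, value = line.partition(":")
        if not sep or current is None:
            continue  # skip blank or unrecognised lines
        if key.strip() == "name":
            current["names"].append(value.strip())
        elif key.strip() == "feature":
            current["features"].append(value.strip())
    return people


# Shortened sample in the same layout as the question's document
sample = """New person
name: ana
name: anna
feature: A 65-year old woman.
New person
name: tom
feature: A 32-year old male that is known to be deaf.
"""

people = parse_people(sample)
print(people[1]["names"])
print(people[2]["features"])
```

Using partition(":") rather than split(":") keeps any colons inside the feature text intact.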
This is what I have so far:
EX1 = open('ex1.txt')
EX1READ = EX1.read()
EX1READ.splitlines(0)
['jk43:23 Marfield Lane:Plainview:NY:10023',
'axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027',
'jab44:23 Rivington Street, Apt. 3R:New York:NY:10002',
'ap172:19 Boxer Rd.:New York:NY:10005',
'jb23:115 Karas Dr.:Jersey City:NJ:07127',
'jb29:119 Xylon Dr.:Jersey City:NJ:07127',
'ak9:234 Main Street:Philadelphia:PA:08990']
I'd like to be able to just grab the userId from this list and print it alphabetized. Any hints would be great.
userIds = []
with open('ex1.txt') as EX1:
    X1READ = EX1.readlines()
for line in X1READ:
    # The first space-separated token looks like "jk43:23"
    useridname = line.split(" ")[0].split(":")[0]
    userid = line.split(" ")[0].split(":")[1]
    userIds.append([useridname, userid])
I'm sure there are more Pythonic ways to do this, but this method builds a list of lists, where each child list is formatted like this:
["jk43", "23"]
So to get the first user ID and its number, you'd do this:
firstUserId = userIds[0][0] + ": " + userIds[0][1]
Which would give
"jk43: 23"
To sort the list of IDs, you'd do something like this:
userIds = sorted(userIds, key=lambda pair: pair[0])
Assuming the part before the first ":" is the user ID, you could do it in a more Pythonic way like this:
with open("ex1.txt") as f:
    lines = f.readlines()
userIDs = [l.split(":", 1)[0] for l in lines]
print("\n".join(sorted(userIDs)))
This does it:
IDs=[]
# Open in text mode so that each line is a str and can be split on ':'
with open('ex1.txt') as f:
    for line in f:
        IDs.append(line.split(':')[0])
print(sorted(IDs))
Prints:
['ak9', 'ap172', 'axe99', 'jab44', 'jb23', 'jb29', 'jk43']
If your user IDs look like jk43:23, use IDs.append(line.split(' ')[0]) instead, which prints:
['ak9:234', 'ap172:19', 'axe99:315', 'jab44:23', 'jb23:115', 'jb29:119', 'jk43:23']
If your user ids are the number only, use IDs.append(int(line.split(' ')[0].split(':')[1])) which prints:
[19, 23, 23, 115, 119, 234, 315]
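For a version that runs without ex1.txt, here is a small sketch that works on lines already held in a list (three of the sample lines from the question), assuming the user ID is everything before the first colon. The helper name sorted_user_ids is made up for illustration.

```python
def sorted_user_ids(lines):
    """Return the text before the first ':' of each line, alphabetized."""
    # partition() splits once, so colons later in the line are ignored
    return sorted(line.partition(":")[0] for line in lines if line.strip())


lines = [
    "jk43:23 Marfield Lane:Plainview:NY:10023",
    "axe99:315 W. 115th Street, Apt. 11B:New York:NY:10027",
    "ak9:234 Main Street:Philadelphia:PA:08990",
]
print(sorted_user_ids(lines))
```

The same function could be fed the result of f.readlines(), since the trailing newline sits after the first colon and is discarded with the rest of the line.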