This question already has answers here:
A better way to create a dictionary out of two lists with duplicated values in one
I have a question about making a dictionary from a text file.
I have a text file like this:
0,1
0,2
0,3
1,2
1,3
1,4
2,3
2,4
2,5
What I am trying to do is build a dictionary like {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 5]}.
lists = {}
test = open('Test.txt', mode='r', encoding='utf-8').read().split('\n')
for i in test:
    if len(lists) == 0:
        lists[i.split(',')[0]] = [i.split(',')[1]]
Here, whenever I run the loop, the value gets changed instead of collected into the list.
I am trying to figure out how to do it,
but it seems a little bit tricky to me.
Can anyone give me some advice or direction?
I really appreciate it.
Thank you!
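result = {}
with open('Test.txt') as f:
    for line in f:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing empty line
            continue
        key, value = line.split(',')
        if key not in result:
            result[key] = [value]  # first value seen for this key
        else:
            result[key].append(value)
print(result)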
Output:
{'0': ['1', '2', '3'], '1': ['2', '3', '4'], '2': ['3', '4', '5']}
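The answer above keeps keys and values as strings, while the question asks for integers. A minimal sketch that converts both with int and uses dict.setdefault in place of the if/else; io.StringIO stands in for Test.txt only so the example runs without the file (an assumption for illustration):

```python
import io

# Stand-in for open('Test.txt'); assumed to hold the same lines as the question.
sample = io.StringIO("0,1\n0,2\n0,3\n1,2\n1,3\n1,4\n2,3\n2,4\n2,5\n")

result = {}
for line in sample:
    line = line.strip()
    if not line:  # skip blank lines, e.g. a trailing empty line
        continue
    key, value = map(int, line.split(','))
    # setdefault inserts an empty list the first time a key is seen
    result.setdefault(key, []).append(value)

print(result)  # {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 5]}
```

setdefault does the "create the list if missing" step in one expression; the defaultdict approach below does the same thing implicitly.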
You can also try collections.defaultdict, which is more convenient.
Here is an approach with a defaultdict. map is called with a reference to the int function to convert the strings to integers.
from collections import defaultdict
result = defaultdict(list)
with open('Test.txt', mode='r', encoding='utf-8') as infile:
    for line in infile:
        key, value = map(int, line.split(','))
        result[key].append(value)
print(result)
The result is
defaultdict(<class 'list'>, {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 5]})
Besides the bonus of the default value, it behaves like a normal dictionary.
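One difference worth knowing: reading a missing key from a defaultdict silently inserts the default (here an empty list), whereas a plain dict raises KeyError. A small sketch, with a few hard-coded pairs in place of the file, showing this and converting back with dict() once grouping is done:

```python
from collections import defaultdict

result = defaultdict(list)
for key, value in [(0, 1), (0, 2), (1, 2)]:
    result[key].append(value)

# Looking up a missing key creates it with the default value (an empty list).
missing = result[99]
print(missing)       # []
print(99 in result)  # True

# Convert to a plain dict when done; it prints like a normal dict,
# and missing keys raise KeyError again.
plain = dict(result)
print(plain)  # {0: [1, 2], 1: [2], 99: []}
```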
Related
I am new to Python and I am trying to optimize a snippet of my program, replacing two nested for loops with a list comprehension in Python 3. While doing this, I am unable to calculate the sum of more than one column. For example, if I have columns 1, 2, and 3 of types float, int, and string in a dictionary, I can calculate the sum only for column 1 and not for column 2.
The double for loop looks something like this:
import itertools
final_dict = []
for k, g in itertools.groupby(temp_dict, key=lambda x: x['transaction_category_id']):
    txn_amount = 0
    distinct_user_count = 0
    for v in g:
        txn_amount += float(v['transaction_amount'])
        distinct_user_count += v['user_id_count']
    final_dict.append({'transaction_category_id': k,
                       'aggregated_transaction_amount': txn_amount,
                       'distinct_user_count': distinct_user_count})
The optimized version should ideally look something like this:
final_result = [[k, sum(float(v['transaction_amount']) for v in g), sum(s['user_id_count'] for s in g)]
                for k, g in itertools.groupby(temp_dict, key=lambda x: x['transaction_category_id'])]
But this code does not add up the values of the user_id_count column; that sum comes back as 0.
The sample data looks something like this:
user_id,transaction_amount,transaction_category_id
b2d30a62-36bd-41c6-8221-987d5c4cd707,63.05,3
b2d30a62-36bd-41c6-8221-987d5c4cd707,13.97,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,97.15,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,23.54,5
and the ideal output would look like:
['4', 111.12, 2],
['3', 63.05, 1],
['5', 23.54, 1],
but it prints out:
['4', 111.12, 0],
['3', 63.05, 0],
['5', 23.54, 0],
You seem to have a very simple comma-delimited CSV file. To get output similar to that shown in the question you could do this:
from collections import defaultdict
FILENAME = '/Volumes/G-Drive/sample.csv'
def gendict():
    return {'amount': 0.0, 'count': 0}
summary = defaultdict(gendict)
with open(FILENAME) as csv:
    next(csv, None)  # skip column headers
    for line in map(str.rstrip, csv):
        *_, _amount, _id = line.split(',')
        summary[_id]['amount'] += float(_amount)
        summary[_id]['count'] += 1
slist = [[k, *v.values()] for k, v in summary.items()]
print(slist)
Output:
[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]
Alternatively, you can use csv.DictReader with collections.defaultdict to simulate pandas' groupby(...).agg(...):
from csv import DictReader
from collections import defaultdict
out = defaultdict(lambda: [0, 0])
with open("tmp/Ashish Shetty.csv", "r") as csvfile:
    reader = DictReader(csvfile)
    for row in reader:
        tra_id = str(row["transaction_category_id"])
        out[tra_id][0] += float(row["transaction_amount"])
        out[tra_id][1] += 1
out = [[k] + v for k, v in out.items()]
print(out)
Output:
[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]
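Neither answer spells out why the original comprehension returns 0 for the second column. The group object g from itertools.groupby is a one-shot iterator: the first sum(...) consumes it, so the second sum(...) iterates over nothing and returns 0. A minimal sketch that keeps the comprehension but materializes each group with list(g) first. temp_dict here is a hypothetical list of dicts modeled on the sample data, with each row assumed to carry user_id_count = 1. Because groupby only groups consecutive items, the input is sorted by the key first.

```python
import itertools
from operator import itemgetter

# Hypothetical rows modeled on the question's sample data.
temp_dict = [
    {'transaction_category_id': '3', 'transaction_amount': '63.05', 'user_id_count': 1},
    {'transaction_category_id': '4', 'transaction_amount': '13.97', 'user_id_count': 1},
    {'transaction_category_id': '4', 'transaction_amount': '97.15', 'user_id_count': 1},
    {'transaction_category_id': '5', 'transaction_amount': '23.54', 'user_id_count': 1},
]

key = itemgetter('transaction_category_id')
final_result = [
    # round() only tidies floating-point noise in the printed amount
    [k,
     round(sum(float(v['transaction_amount']) for v in rows), 2),
     sum(v['user_id_count'] for v in rows)]
    # "for rows in [list(g)]" turns the one-shot group iterator into a list,
    # so both sums iterate over the same rows.
    for k, g in itertools.groupby(sorted(temp_dict, key=key), key=key)
    for rows in [list(g)]
]
print(final_result)  # [['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]
```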
I'm currently trying to wrap my head around list comprehensions, and I'm getting some practice by taking examples and turning comprehensions into loops and vice versa. It's probably a really easy mistake, or a forest-for-the-trees situation. Take the following expression from an example project:
rows = []
data = ['a', 'b']
res = ['1', '2']
rows.append({data[counter]: res[counter] for counter, _ in enumerate(data)})
print(rows)
# [{'a': '1', 'b': '2'}]
How do I do this as a for loop? The following puts each pair in its own curly brackets instead of putting both in one dict:
for counter, _ in enumerate(data):
    rows.append({data[counter]: res[counter]})
print(rows)
# [{'a': '1'}, {'b': '2'}]
Am I missing something? Or do I have to merge the items by hand when using a for loop?
The problem in your code is that you create a dictionary for each item in data and append it to rows in each iteration.
To get the desired behaviour, update the same dict in each iteration, and append it to rows only once you have finished building it, after the loop.
Try this:
rows = []
data = ['a', 'b']
res = ['1', '2']
payload = {}
for counter, val in enumerate(data):
    payload[val] = res[counter]
rows.append(payload)
Another compact way to write it might be:
rows.append(dict(zip(data,res)))
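To check that the loop and the compact forms really are equivalent, a short sketch comparing the original comprehension, a single-dict loop, and dict(zip(...)):

```python
data = ['a', 'b']
res = ['1', '2']

# Original dict comprehension: one dict built from all pairs.
from_comprehension = {data[counter]: res[counter] for counter, _ in enumerate(data)}

# Equivalent loop: create the dict once, fill it, use it after the loop.
from_loop = {}
for key, value in zip(data, res):
    from_loop[key] = value

# zip pairs the lists element by element; dict() builds the mapping.
from_zip = dict(zip(data, res))

rows = [from_loop]
print(rows)  # [{'a': '1', 'b': '2'}]
print(from_comprehension == from_loop == from_zip)  # True
```

Iterating with zip also avoids indexing res by position, which is what enumerate plus res[counter] does by hand.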
On every iteration of the for loop you are creating a new dictionary and appending it to the list. If you want to store one whole dictionary in the list, create it before the loop, fill it inside the loop, and append it afterwards. This outputs what you expected:
rows = []
data = ['a', 'b']
res = ['1', '2']
myDict = {}
for counter, _ in enumerate(data):
    myDict[data[counter]] = res[counter]
rows.append(myDict)
print(rows)
Output:
[{'b': '2', 'a': '1'}]
This is the txt file content I have:
salesUnits:500
priceUnit:11
fixedCosts:2500
variableCostUnit:2
I need to create a dictionary in Python by reading the file, with the names (salesUnits etc.) as keys and the numbers as values. The code I have so far only prints the variable cost per unit:
with open("myInputFile.txt") as f:
    content = f.readlines()
myDict = {}
for line in content:
    myDict = line.rstrip('\n').split(":")
print(myDict)
How can I fix the code so that all key and value pairs show up? Thank you!
You're overwriting myDict each time you call myDict=line.rstrip('\n').split(":"). The pattern to add to a dictionary is dictionary[key] = value.
myDict = {}
with open("myInputFile.txt") as f:
    for line in f:
        key_value = line.rstrip('\n').split(":")
        if len(key_value) == 2:
            myDict[key_value[0]] = key_value[1]
print(myDict)
outputs
{'fixedCosts': '2500', 'priceUnit': '11', 'variableCostUnit': '2', 'salesUnits': '500'}
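A variant using str.partition, which always returns three parts, so lines without a ':' and blank lines are easy to skip, and a value that itself contains ':' stays intact. io.StringIO replaces myInputFile.txt here only so the sketch runs on its own:

```python
import io

# Stand-in for open("myInputFile.txt") with the question's contents.
f = io.StringIO("salesUnits:500\npriceUnit:11\nfixedCosts:2500\nvariableCostUnit:2\n\n")

my_dict = {}
for line in f:
    line = line.strip()
    if not line:  # ignore empty lines
        continue
    # partition splits on the first ':' only: ('salesUnits', ':', '500')
    key, sep, value = line.partition(':')
    if sep:  # skip lines that have no ':' at all
        my_dict[key] = value

print(my_dict)
# {'salesUnits': '500', 'priceUnit': '11', 'fixedCosts': '2500', 'variableCostUnit': '2'}
```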
Using a simple dict comprehension will handle this:
with open('testinput.txt', 'r') as infile:
    my_dict = {  # avoid naming it dict, which shadows the built-in
        line.strip().split(':')[0]:
            int(line.strip().split(':')[1])
            if line.strip().split(':')[1].isdigit()
            else
            line.strip().split(':')[1]
        for line in infile.readlines()}
print(my_dict)
Output:
{'salesUnits': 500, 'priceUnit': 11, 'fixedCosts': 2500, 'variableCostUnit': 2}
If you wish to bring the numbers in as simple strings, just use:
my_dict = {
    line.strip().split(':')[0]:
        line.strip().split(':')[1]
    for line in infile.readlines()}
Note also that you can add handling for other data types or data formatting using additional variations of:
int(line.strip().split(':')[1])
if line.strip().split(':')[1].isdigit()
else
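One way to generalize that conditional is a small helper that tries int, then float, and otherwise keeps the string. convert is a hypothetical name used only for this sketch, and the input lines are made up to show each case:

```python
def convert(text):
    """Hypothetical helper: return an int, a float, or the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass  # not this type; try the next one
    return text

lines = ['salesUnits:500', 'rate:0.25', 'currency:EUR']
# split(':', 1) keeps everything after the first ':' as the value
my_dict = {key: convert(value)
           for key, value in (line.split(':', 1) for line in lines)}
print(my_dict)  # {'salesUnits': 500, 'rate': 0.25, 'currency': 'EUR'}
```

Unlike isdigit(), trying int() first also accepts values such as '-3'; the order of the casts decides which type wins.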
myDict = {}
with open('dict.txt', 'r') as file:
    for line in file:
        key, value = line.strip().split(':')
        myDict[key] = value
print(myDict)
Output:
{'fixedCosts': '2500', 'priceUnit': '11', 'variableCostUnit': '2', 'salesUnits': '500'}
If I have for instance the file:
;;;
;;;
;;;
A 1 2 3
B 2 3 4
C 3 4 5
And I want to read it into a dictionary of {str: list of str}:
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have the following code:
d = {}
with open('file_name') as f:
    for line in f:
        while ';;;' not in line:
            (key, val) = line.split(' ')
            #missingcodehere
return d
What should I put in after the line.split to assign the keys and values as a str and list of str?
Let's focus first on your code and what you are doing wrong.
Your while ';;;' not in line is pretty much an infinite loop: line never changes inside the while, so once its condition is true it stays true. Change how you insert data into your dictionary: simply use an if statement to check whether ';;;' is in the line.
Then, when you get your key and value from line.strip().split(' ', 1), just assign them to your dictionary as d[key] = val. The maxsplit of 1 matters: a plain split(' ') on 'A 1 2 3' returns four items, which cannot be unpacked into two names. However, you want a list, and val is still a string at this point, so call split on val as well.
Also, you do not need parentheses around key and val; they only add noise to your code.
The end result will give you:
d = {}
with open('new_file.txt') as f:
    for line in f:
        if line.strip() and ';;;' not in line:
            # maxsplit=1: key is the first word, val is the rest of the line
            key, val = line.strip().split(' ', 1)
            d[key] = val.split()
print(d)
Using your sample input, output is:
{'C': ['3', '4', '5'], 'A': ['1', '2', '3'], 'B': ['2', '3', '4']}
Finally, the implementation can be made more Pythonic. We can simplify the code and split more generically on any run of whitespace, rather than on explicit single spaces:
with open('new_file.txt') as fin:
    valid = (line.split(None, 1) for line in fin if ';;;' not in line)
    d = {k: v.split() for k, v in valid}
Notice that the split above is split(None, 1): sep=None splits on any whitespace, and the second argument is maxsplit=1, so only the first split is made and the key is separated from the rest of the line.
The docstring of str.split explains it well:
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
Finally, we simply use a dictionary comprehension to obtain our final result.
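A quick check of the maxsplit behaviour on one line of the sample file, and of why a plain split(' ') cannot be unpacked into two names:

```python
line = 'A 1 2 3\n'

print(line.strip().split(' '))  # ['A', '1', '2', '3']  (four items)
print(line.split(None, 1))      # ['A', '1 2 3\n']      (exactly two items)

key, rest = line.split(None, 1)
print(key, rest.split())        # A ['1', '2', '3']

try:
    k2, v2 = line.strip().split(' ')
except ValueError as exc:
    # Unpacking four items into two names raises ValueError.
    print('ValueError:', exc)
```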
Why not simply:
def make_dict(f_name):
    with open(f_name) as f:
        d = {k: v.split()
             for k, v in [line.strip().split(' ', 1)
                          for line in f
                          if ';;;' not in line]}
    return d
Then
>>> print(make_dict('file_name'))
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have a list with nested tuples, like the one below:
data = [('apple', 19.0, ['gala', '14', 'fuji', '5', 'dawn', '3', 'taylor', '3']),
('pear', 35.0, ['anjou', '29', 'william', '6', 'concorde', '4'])]
I want to flatten it out so that I can write a .csv file in which each item on every list corresponds to a column:
apple 19.0 gala 14 fuji 5 dawn 3 taylor 3
pear 35.0 anjou 29 william 6 concorde 4
I tried using simple flattening:
flattened = [value for pair in data for value in pair]
But the outcome has not been the desired one. Any ideas on how to solve this?
To write out the data to CSV, simply use the csv module and give it one row; constructing the row is not that hard:
import csv
with open(outputfile, 'w', newline='') as ofh:
    writer = csv.writer(ofh)
    for row in data:
        # first two fields as-is, then the nested list spliced in
        row = list(row[:2]) + row[2]
        writer.writerow(row)
This produces:
apple,19.0,gala,14,fuji,5,dawn,3,taylor,3
pear,35.0,anjou,29,william,6,concorde,4
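The same approach written to an in-memory buffer (io.StringIO, used here in place of a real output file) so the produced rows can be inspected directly:

```python
import csv
import io

data = [('apple', 19.0, ['gala', '14', 'fuji', '5', 'dawn', '3', 'taylor', '3']),
        ('pear', 35.0, ['anjou', '29', 'william', '6', 'concorde', '4'])]

buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
for name, total, varieties in data:
    # The first two fields stay as they are; the nested list is spliced in.
    writer.writerow([name, total, *varieties])

print(buffer.getvalue())
```

Unpacking each tuple into name, total, varieties makes the expected row shape explicit, which is the main difference from slicing with row[:2].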
Disclaimer: this is not very efficient Python code.
But it does the job. (You can adjust the column width, currently 10.)
data = [('apple', 19.0, ['gala', '14', 'fuji', '5', 'dawn', '3', 'taylor', '3']),
        ('pear', 35.0, ['anjou', '29', 'william', '6', 'concorde', '4'])]
flattened = list()
for i, each in enumerate(data):
    flattened.append(list())
    for item in each:
        if isinstance(item, list):
            flattened[i].extend(item)
        else:
            flattened[i].append(item)
# Now print the flattened list in the required prettified manner.
for each in flattened:
    print("".join(["{:<10}".format(item) for item in each]))
    # String is formatted such that all items are width 10 & left-aligned
Note: I tried to write the code for a more general case.
PS: Any code suggestions are welcome. I really want to improve this one.
This seems like it calls for recursion:
def flatten(inlist):
    outlist = []
    if isinstance(inlist, (list, tuple)):
        for item in inlist:
            outlist += flatten(item)
    else:
        outlist += [inlist]
    return outlist
This should work however deeply your lists and tuples are nested, up to Python's recursion limit. I tested it with this:
>>> flatten([0,1,2,[3,4,[5,6],[7,8]]])
[0, 1, 2, 3, 4, 5, 6, 7, 8]
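Applied to the whole data list, flatten returns one long list; for the CSV layout in the question it needs to run once per row. A generator-based sketch of the same recursion (using yield from instead of building intermediate lists), applied row by row:

```python
def flatten(items):
    """Recursively yield the leaves of nested lists and tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # descend into the nested sequence
        else:
            yield item  # strings and numbers are leaves

data = [('apple', 19.0, ['gala', '14', 'fuji', '5', 'dawn', '3', 'taylor', '3']),
        ('pear', 35.0, ['anjou', '29', 'william', '6', 'concorde', '4'])]

rows = [list(flatten(row)) for row in data]
print(rows[0])  # ['apple', 19.0, 'gala', '14', 'fuji', '5', 'dawn', '3', 'taylor', '3']
print(list(flatten([0, 1, 2, [3, 4, [5, 6], [7, 8]]])))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Checking only for list and tuple keeps strings intact; a check for general iterables would also split 'apple' into characters.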