Defaultdict appending trick - python

I have a text file where elements are stored in two columns, like the following:
a 1,a 3,a 4,b 1,b 2,b 3,b 4,c 1,c 2.... etc
The file contains two columns, one is the key a,b,c etc, and the other is the elements 1,2,3,4 etc.
I stored these items using defaultdict and appended them.
The items in the default dict are:
defaultdict(<type 'list'>, `{'a': ['0', '1', '2', '3', '4'], 'c': ['1', '2'], 'b': ['1', '2', '3', '4']}`)
I used the following code:
from collections import defaultdict
positions = defaultdict(list)
with open('test.txt') as f:
    for line in f:
        sob = line.split()
        key = sob[0]
        ele = sob[1]
        positions[key].append(ele)
print(positions)

Instead of defaultdict, you can use OrderedDict:
from collections import OrderedDict
positions = OrderedDict()
with open('test.txt') as f:
    for line in f:
        key, ele = line.strip().split()
        positions[key] = positions.get(key, []) + [ele]
print(positions)
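Both approaches above can be tried without a file. The following is a minimal sketch, assuming the lines have already been read into a list; the names `group_pairs` and `lines` are illustrative and do not come from the original posts. It uses `dict.setdefault` on an `OrderedDict`, so each key's list is created once and then appended to in place, rather than a new list being built for every line.

```python
from collections import OrderedDict

def group_pairs(lines):
    """Group 'key value' lines into an ordered mapping of key -> list of values."""
    positions = OrderedDict()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Assumption: blank or malformed lines are simply skipped.
            continue
        key, ele = parts
        # setdefault returns the existing list, or stores and returns a new one.
        positions.setdefault(key, []).append(ele)
    return positions

# Hypothetical sample data standing in for test.txt.
lines = ["a 1", "a 3", "a 4", "b 1", "b 2", "c 1", ""]
positions = group_pairs(lines)
print(dict(positions))
```

Keys come out in the order they first appear in the input, which a plain defaultdict does not guarantee on older Python versions.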

Related

Convert string to dictionary with list of values

What is the best way to convert a string to a dictionary whose values are lists?
For example:
s = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
I need the output as:
d = {"abc":["1","2","3"],"xyz":["5","6"]}
I'm very new to Python.
My code:
d = {k: [v] for k, v in map(lambda item: item.split('='), s.split(","))}
Here is a solution using the dict.setdefault method.
>>> help({}.setdefault)
Help on built-in function setdefault:
setdefault(key, default=None, /) method of builtins.dict instance
    Insert key with a value of default if key is not in the dictionary.
    Return the value for key if key is in the dictionary, else default.
>>> your_str = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
>>>
>>> result = {}
>>>
>>> for pair in your_str.split(","):
...     name, val = pair.split("=")
...     result.setdefault(name, []).append(val)
...
>>> result
{'abc': ['1', '2', '3'], 'xyz': ['5', '6']}
You could also use defaultdict with list as the default factory:
>>> from collections import defaultdict
>>>
>>> your_str = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
>>>
>>> result = defaultdict(list)
>>> for pair in your_str.split(","):
...     name, val = pair.split("=")
...     result[name].append(val)
...
>>> dict(result)
{'abc': ['1', '2', '3'], 'xyz': ['5', '6']}
The reason the code you have tried already isn't giving you the desired result is the fact that you are overwriting the value assigned to each key as you iterate over the list. What you need to do is append to the value already assigned to the key - except if the key doesn't exist, in which case you need to initialise that key.
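To see the overwriting concretely, here is a small sketch that runs a dict comprehension of the same shape as the one in the question next to a loop that appends. The variable names `pairs`, `overwritten` and `grouped` are illustrative only.

```python
s = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
pairs = [item.split("=") for item in s.split(",")]

# A dict comprehension assigns each key again and again,
# so only the last value seen for a key survives.
overwritten = {k: [v] for k, v in pairs}

# Creating the list on first sight of a key and appending afterwards
# keeps every value.
grouped = {}
for k, v in pairs:
    if k not in grouped:
        grouped[k] = []
    grouped[k].append(v)

print(overwritten)  # {'abc': ['3'], 'xyz': ['6']}
print(grouped)      # {'abc': ['1', '2', '3'], 'xyz': ['5', '6']}
```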
This would be one way to go:
s1 = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
list1 = [each.split('=') for each in s1.split(',')]
d = {}
for key, val in list1:
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]
print(d)
#result: {'abc': ['1', '2', '3'], 'xyz': ['5', '6']}
You could simplify this and eliminate the if-else by using defaultdict, like so:
from collections import defaultdict
d = defaultdict(list)
s1 = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
list1 = [each.split('=') for each in s1.split(',')]
for key, val in list1:
    d[key].append(val)
print(dict(d))
#result: {'abc': ['1', '2', '3'], 'xyz': ['5', '6']}
# initialize a dictionary
d = {}
my_str = "abc=1,abc=2,abc=3,xyz=5,xyz=6"
# split the string (my_str) on "," to get pairs such as 'abc=1' and 'xyz=5' in a list
for elt in my_str.split(","):
    # split each pair on '=' to get the key and the value, e.g. ['abc', '1']
    key, val = elt.split('=')
    # complete the dictionary
    if key not in d:
        d[key] = [val]
    else:
        d[key].append(val)

Parse every column of a .csv file into a single list python

I have the following type of CSV file:
a,b,c
1,2,3
4,5,6
7,8,9
I would like to parse every column of this CSV file into a single list, without the header row, so the end result would be
myList = ["1","4","7","2","5","8","3","6","9"]
I have found many solutions for one column, but I need to be flexible enough to read every column of the file. I'm using an older version of Python, so I can't use any solutions based on the pandas library.
You could read the file fully and then zip the rows to transpose them, then chain the result to flatten the list. Standalone example (using a list of strings as input):
import csv,itertools
text="""a,b,c
1,2,3
4,5,6
7,8,9
""".splitlines()
myList = list(itertools.chain.from_iterable(zip(*csv.reader(text[1:]))))
print(myList)
result:
['1', '4', '7', '2', '5', '8', '3', '6', '9']
From a file it would read:
with open("test.csv") as f:
    cr = csv.reader(f, delimiter=",")  # comma is the default, but just in case...
    next(cr)  # skip title
    myList = list(itertools.chain.from_iterable(zip(*cr)))
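The transpose-then-flatten idea can be broken into visible steps. This is a sketch assuming Python 3 and an in-memory string in place of the file; the names `rows`, `columns` and `flat` are illustrative.

```python
import csv
import io
from itertools import chain

# In-memory stand-in for the CSV file, using the sample data from the question.
text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)   # consume the header row: ['a', 'b', 'c']
rows = list(reader)     # remaining rows, each a list of strings

# zip(*rows) pairs up the first items of every row, then the second items,
# and so on, which turns rows into columns.
columns = list(zip(*rows))

# chain.from_iterable concatenates the column tuples into one flat sequence.
flat = list(chain.from_iterable(columns))

print(columns)  # [('1', '4', '7'), ('2', '5', '8'), ('3', '6', '9')]
print(flat)     # ['1', '4', '7', '2', '5', '8', '3', '6', '9']
```

Note that zip stops at the shortest row, so a ragged file would silently lose trailing cells with this approach.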
Simple approach:
d = """a,b,c
1,2,3
4,5,6
7,8,9
"""
cells = []
for line in d.split("\n"):
    if line:
        cells.append(line.strip().split(','))
print(cells)
for n in range(len(cells[0])):
    for r in cells:
        print(r[n])
Same iteration, but as generator:
def mix(t):
    for n in range(len(t[0])):
        for r in t:
            yield r[n]
print(list(mix(cells)))
Using csv and chain to flatten the list
import csv
from itertools import chain
with open('text.csv', 'r') as f:
    rows = list(csv.reader(f))
mylist = map(list, zip(*rows[1:]))  # transpose the rows, skipping the header
list(chain.from_iterable(mylist))  # output ['1', '4', '7', '2', '5', '8', '3', '6', '9']

Get a running total from a list

I'm reading in items:
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
If I print data as it is read, it looks like:
['Adam', '5']
['Peter', '7']
['Adam', '8']
['Lucy', '2']
['Peter', '4']
How can I get a running total for each unique name, such that my new list would look like:
['Adam', '13'],
['Peter', '11'],
['Lucy', '2']
Use a collections.Counter() to sum the values for each name:
import collections
lines = [['Adam', '5'],
         ['Peter', '7'],
         ['Adam', '8'],
         ['Lucy', '2'],
         ['Peter', '4']]
counter = collections.Counter()
for data in lines:
    counter[data[0]] += int(data[1])
print(counter)
You'll get:
Counter({'Adam': 13, 'Peter': 11, 'Lucy': 2})
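The question asks for a list of `[name, total]` pairs with string totals rather than a Counter. A possible conversion is sketched below, assuming Python 3.7 or later, where dictionaries (and therefore Counter) keep first-insertion order; the names `totals` and `result` are illustrative.

```python
from collections import Counter

lines = [["Adam", "5"], ["Peter", "7"], ["Adam", "8"], ["Lucy", "2"], ["Peter", "4"]]

totals = Counter()
for name, value in lines:
    totals[name] += int(value)

# Turn the totals back into the list-of-lists shape used in the question,
# converting each sum back to a string.
result = [[name, str(total)] for name, total in totals.items()]
print(result)  # [['Adam', '13'], ['Peter', '11'], ['Lucy', '2']]
```

If the order by size matters instead, `totals.most_common()` returns the pairs sorted from the largest total down.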
Initialize a defaultdict with type int and use the name as the key:
import sys
from collections import defaultdict
name_list = defaultdict(int)
for line in sys.stdin:
    line = line.strip()
    data = line.split("-")
    name = data[0]
    value = int(data[1])
    name_list[name] += value
for key, value in name_list.items():
    print(key, value)
I recommend creating a dictionary and updating it as you go. I have assumed that data is a list of lists.
finalList = {}
for name, value in data:
    if name in finalList:
        finalList[name] = finalList[name] + int(value)
    else:
        finalList[name] = int(value)
print(finalList)
Pandas handles this kind of situation well:
import pandas as pd
path = 'input.txt'  # hypothetical name for the input file shown below
df_data = pd.read_csv(filepath_or_buffer=path, sep='_', names=['Name', 'value'])
df = df_data.groupby(['Name'])['value'].sum()
print(df)
Output:
'Adam' 13
'Lucy' 2
'Peter' 11
Input file
Adam_5
Peter_7
Adam_8
Lucy_2
Peter_4

Adding dictionary keys and values after line split?

If I have for instance the file:
;;;
;;;
;;;
A 1 2 3
B 2 3 4
C 3 4 5
And I want to read it into a dictionary of {str: list of str} :
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have the following code:
d = {}
with open('file_name') as f:
    for line in f:
        while ';;;' not in line:
            (key, val) = line.split(' ')
            #missingcodehere
return d
What should I put in after the line.split to assign the keys and values as a str and list of str?
To focus on your code and what is going wrong:
The body of your while ';;;' not in line loop never changes line, so once entered it would never terminate. What you want is a conditional statement that checks whether ';;;' is in the line.
Also, line.split(' ') on a line such as 'A 1 2 3' returns four items, which cannot be unpacked into key and val. Split only once, with line.strip().split(' ', 1), to get the key and the rest of the line. Assigning d[key] = val would then store a string, but you want a list, so call split on val as well.
Furthermore, you do not need parentheses around key and val. They only add noise to your code.
The end result will give you:
d = {}
with open('new_file.txt') as f:
    for line in f:
        if ';;;' not in line:
            key, val = line.strip().split(' ', 1)
            d[key] = val.split()
print(d)
Using your sample input, output is:
{'C': ['3', '4', '5'], 'A': ['1', '2', '3'], 'B': ['2', '3', '4']}
Finally, the implementation can be made more Pythonic, and the split more general, by splitting on any run of whitespace rather than counting explicit spaces:
with open('new_file.txt') as fin:
    valid = (line.split(None, 1) for line in fin if ';;;' not in line)
    d = {k: v.split() for k, v in valid}
Note that the split above is split(None, 1): the separator is None and maxsplit is 1.
The docstring of split explains this well:
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
Finally, we simply use a dictionary comprehension to obtain our final result.
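The difference between the split variants discussed above is easiest to see on a line with irregular spacing. This is a small sketch; the sample `line` is hypothetical and deliberately messier than the file in the question.

```python
line = "A  1 2   3\n"  # hypothetical input with irregular spacing

# Splitting on a single space keeps empty strings and the trailing newline.
by_space = line.split(" ")

# Splitting on whitespace (no separator given) drops both.
by_whitespace = line.split()

# maxsplit=1 stops after the first split: the key, then the rest of the line.
key, rest = line.split(None, 1)
values = rest.split()

print(by_space)       # ['A', '', '1', '2', '', '', '3\n']
print(by_whitespace)  # ['A', '1', '2', '3']
print(key, values)    # A ['1', '2', '3']
```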
Why not simply:
def make_dict(f_name):
    with open(f_name) as f:
        d = {k: v.split()
             for k, v in [line.strip().split(' ', 1)
                          for line in f
                          if ';;;' not in line]}
    return d
Then
>>> print(make_dict('file_name'))
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}

Unwanted '\n' character within dictionary items (or within a list of strings)

Any suggestions for getting rid of the unwanted \n at the end of the last value in each dictionary item?
Example:
d= {'1' : ['12', '15:23,24:26\n'], '2' : ['13', '15:6\n'],...}
Wanted result:
d= {'1' : ['12', '15:23,24:26'], '2' : ['13', '15:6'],...}
Or, any suggestions for getting rid of them within a list of strings?
Example:
L= ['1','12','15:23,24:26\n', '2', '13', '15:16\n',...]
Wanted result:
L= ['1','12','15:23,24:26', '2', '13', '15:16',...]
Edit:
The import code:
with open('file.txt', 'r+') as file:
    rows = (line.split('\t') for line in file)
    d_file = {row[0]:row[1:] for row in rows}
Call str.strip() for each item in the list. You can use a combination of a dict and list comprehension for the dictionary:
In [9]: d = {'1' : ['12', '15:23,24:26\n'], '2' : ['13', '15:6\n']}
In [10]: {k: [x.strip() for x in v] for k, v in d.items()}
Out[10]: {'1': ['12', '15:23,24:26'], '2': ['13', '15:6']}
And just a plain list comprehension for the list one:
In [6]: L= ['1','12','15:23,24:26\n', '2', '13', '15:16\n']
In [7]: [x.strip() for x in L]
Out[7]: ['1', '12', '15:23,24:26', '2', '13', '15:16']
The str.strip() method strips leading and trailing characters from a string. Without arguments it removes whitespace, which includes newlines. The list comprehensions simply call str.strip() on each element in the list.
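As a quick illustration of which ends are affected, the sketch below applies the strip family to one sample string. The string `value` is made up for the example; whitespace inside the string is never touched.

```python
value = "  15:23, 24:26\n"  # hypothetical string with whitespace at both ends

both_ends = value.strip()          # leading and trailing whitespace removed
left_only = value.lstrip()         # only leading whitespace removed
right_only = value.rstrip()        # only trailing whitespace removed
newline_only = value.strip("\n")   # only newline characters removed, at either end

print(repr(both_ends))     # '15:23, 24:26'
print(repr(left_only))     # '15:23, 24:26\n'
print(repr(right_only))    # '  15:23, 24:26'
print(repr(newline_only))  # '  15:23, 24:26'
```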
Clean your data when you first import it, rather than trying to clean it after the fact.
Here's your data-reading code:
with open('file.txt', 'r+') as file:
    rows = ( line.split('\t') for line in file )
    d_file = { row[0]:row[1:] for row in rows }
Add a call to rstrip() to remove whitespace at the end of the line:
with open('file.txt', 'r+') as file:
    rows = ( line.rstrip().split('\t') for line in file )
    d_file = { row[0]:row[1:] for row in rows }
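One caveat with a bare rstrip() on tab-separated data: tabs are whitespace too, so trailing empty fields disappear along with the newline. The sketch below compares it with rstrip('\n'), assuming in-memory lines instead of file.txt; the sample rows and the names `with_rstrip` and `newline_only` are hypothetical.

```python
# Hypothetical rows; the second one has an empty last field.
lines = ["1\t12\t15:23,24:26\n", "2\t13\t\n"]

with_rstrip = {}
newline_only = {}
for line in lines:
    # rstrip() removes the newline and any trailing tabs.
    row = line.rstrip().split("\t")
    with_rstrip[row[0]] = row[1:]
    # rstrip("\n") removes only the newline, so empty fields are kept.
    row = line.rstrip("\n").split("\t")
    newline_only[row[0]] = row[1:]

print(with_rstrip)   # {'1': ['12', '15:23,24:26'], '2': ['13']}
print(newline_only)  # {'1': ['12', '15:23,24:26'], '2': ['13', '']}
```

Which behaviour is wanted depends on whether empty trailing columns carry meaning in the file.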
d = {key.strip(): [item.strip() for item in items] for key, items in d.items()}
The strip() function removes newlines, spaces, and tabs from the front and back of a string.
Python strip() function tutorial
