Adding dictionary keys and values after line split?


If I have for instance the file:
;;;
;;;
;;;
A 1 2 3
B 2 3 4
C 3 4 5
And I want to read it into a dictionary of {str: list of str} :
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have the following code:
d = {}
with open('file_name') as f:
    for line in f:
        while ';;;' not in line:
            (key, val) = line.split(' ')
            #missingcodehere
return d
What should I put in after the line.split to assign the keys and values as a str and list of str?

To focus on your code and what you are doing wrong.
You are effectively in an infinite loop with while ';;;' not in line: line never changes inside that loop, so as soon as a line without ';;;' arrives, the condition stays true forever. Change the logic you use to insert data into your dictionary: simply use a conditional statement (if) to check whether ';;;' is in the line.
Then, once you get your key and value from line.strip().split(' ', 1), you simply assign them to your dictionary as d[key] = val. The maxsplit of 1 matters: without it, a line like A 1 2 3 splits into four pieces, which cannot be unpacked into two names. However, you want a list, and val is still a string at this point, so call split on val as well.
Furthermore, you do not need parentheses around key and val. They only add noise to your code.
The end result will give you:
d = {}
with open('new_file.txt') as f:
    for line in f:
        if ';;;' not in line:
            # maxsplit=1: the key, then the rest of the line as one string
            key, val = line.strip().split(' ', 1)
            d[key] = val.split()
print(d)
Using your sample input, output is:
{'C': ['3', '4', '5'], 'A': ['1', '2', '3'], 'B': ['2', '3', '4']}
Finally, the implementation can be made more Pythonic. We can simplify this code and split more generically, on any run of whitespace, rather than on explicit single spaces:
with open('new_file.txt') as fin:
    valid = (line.split(None, 1) for line in fin if ';;;' not in line)
    d = {k: v.split() for k, v in valid}
Notice that the split above is split(None, 1). Here sep=None splits on any run of whitespace, and maxsplit=1 stops after the first split.
The docstring of str.split explains it well:
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
Finally, we simply use a dictionary comprehension to obtain our final result.
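To see why sep=None and maxsplit=1 matter, here is a small comparison on a single hypothetical line with a double space after the key and a trailing newline. The sample file has neither, so this input is an assumption chosen to show the difference:

```python
line = "A  1 2 3\n"  # hypothetical line: double space after the key, trailing newline

# Splitting on a single space keeps empty strings and the newline
print(line.split(' '))        # ['A', '', '1', '2', '3\n']

# sep=None splits on runs of whitespace and drops empty strings;
# maxsplit=1 stops after the first split, so the rest stays together
key, rest = line.split(None, 1)
print(key, repr(rest))        # A '1 2 3\n'

# A plain split() on the remainder then yields clean values
print(rest.split())           # ['1', '2', '3']
```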

Why not simply:
def make_dict(f_name):
    with open(f_name) as f:
        d = {k: v.split()
             for k, v in [line.strip().split(' ', 1)
                          for line in f
                          if ';;;' not in line]}
    return d
Then
>>> print(make_dict('file_name'))
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}

Related

Split array based on value

I have an array:
foo = ['1', '2', '', '1', '2', '3', '', '1', '', '2']
Is there any efficient way to split this array into sub-arrays using '' as a separator?
I want to get:
foo = [['1', '2'], ['1', '2', '3'], ['1'], ['2']]

In one line:
import itertools
[list(g) for k, g in itertools.groupby(foo, lambda x: x == '') if not k]
Edit:
From the official documentation:
groupby
generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function).
The key I generate is either True or False: it is True for the empty-string separators and False for every other element, so it changes each time a separator starts or ends. When the key is False, g is an iterator over a run of consecutive non-empty elements, and I convert it to a list. The True groups (the separators themselves) are skipped by the if not k filter.
Don't know how to explain it better, sorry :/ Hope it helped
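Printing every group that groupby yields, including the separator groups, may make the explanation above easier to follow. This sketch uses the question's foo list unchanged:

```python
import itertools

foo = ['1', '2', '', '1', '2', '3', '', '1', '', '2']

# Show every (key, group) pair groupby produces, separators included
for is_sep, group in itertools.groupby(foo, lambda x: x == ''):
    print(is_sep, list(group))
# False ['1', '2']
# True ['']
# False ['1', '2', '3']
# True ['']
# False ['1']
# True ['']
# False ['2']

# Keeping only the non-separator groups gives the desired result
result = [list(g) for k, g in itertools.groupby(foo, lambda x: x == '') if not k]
print(result)  # [['1', '2'], ['1', '2', '3'], ['1'], ['2']]
```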

Create a list containing a single list.
output = [[]]
Now, iterate over your input list. If the item is not '', append it to the last element of output. If it is, add an empty list to output.
for item in foo:
    if item == '':
        output.append([])
    else:
        output[-1].append(item)
At the end of this, you have your desired output
[['1', '2'], ['1', '2', '3'], ['1'], ['2']]
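The append-based loop and the groupby one-liner agree on the sample input, but they differ when separators are adjacent or appear at the end: the loop keeps an empty sub-list for each extra separator, while groupby drops them. Here is a minimal sketch comparing the two. The function names and the bar input are hypothetical:

```python
import itertools

def split_loop(items, sep=''):
    """The append-based approach, wrapped in a function."""
    output = [[]]
    for item in items:
        if item == sep:
            output.append([])   # a separator starts a new sub-list
        else:
            output[-1].append(item)
    return output

def split_groupby(items, sep=''):
    """The itertools.groupby one-liner, wrapped in a function."""
    return [list(g) for k, g in itertools.groupby(items, lambda x: x == sep) if not k]

foo = ['1', '2', '', '1', '2', '3', '', '1', '', '2']
print(split_loop(foo) == split_groupby(foo))  # True

# Hypothetical input with two separators in a row and a trailing separator
bar = ['1', '', '', '2', '']
print(split_loop(bar))     # [['1'], [], ['2'], []]
print(split_groupby(bar))  # [['1'], ['2']]
```

Which behavior is right depends on whether empty segments carry meaning in your data.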

Defaultdict appending trick

I have a text file where elements are stored in two columns like the following:
a 1,a 3,a 4,b 1,b 2,b 3,b 4,c 1,c 2.... etc
The file contains two columns, one is the key a,b,c etc, and the other is the elements 1,2,3,4 etc.
I stored these items using defaultdict and appended them.
The items in the default dict are:
defaultdict(<type 'list'>, {'a': ['0', '1', '2', '3', '4'], 'c': ['1', '2'], 'b': ['1', '2', '3', '4']})
I used the following code:
from collections import defaultdict

positions = defaultdict(list)
with open('test.txt') as f:
    for line in f:
        sob = line.split()
        key = sob[0]
        ele = sob[1]
        positions[key].append(ele)
print(positions)

Instead of defaultdict you can use an OrderedDict, which remembers the order in which keys were first inserted:
from collections import OrderedDict

positions = OrderedDict()
with open('test.txt') as f:
    for line in f:
        key, ele = line.strip().split()
        positions[key] = positions.get(key, []) + [ele]
print(positions)
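One small variation: positions.get(key, []) + [ele] builds a brand-new list on every line. dict.setdefault appends in place instead. This sketch swaps in setdefault, which is not what the answer above uses, and reads from io.StringIO with made-up contents in place of test.txt. On Python 3.7+ a plain dict also preserves insertion order, so OrderedDict is mainly needed on older versions.

```python
import io
from collections import OrderedDict

# Stand-in for test.txt (hypothetical contents, one "key element" pair per line)
sample = io.StringIO("b 1\na 1\nb 2\na 3\nc 1\n")

positions = OrderedDict()
for line in sample:
    key, ele = line.split()
    # setdefault inserts an empty list the first time a key is seen,
    # then the element is appended to that same list in place
    positions.setdefault(key, []).append(ele)

print(list(positions.items()))  # [('b', ['1', '2']), ('a', ['1', '3']), ('c', ['1'])]
```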

Unwanted '\n' character within dictionary items (or within a list of strings)

Any suggestions for getting rid of an unwanted \n at the end of the last value of each dictionary item?
Example:
d= {'1' : ['12', '15:23,24:26\n'], '2' : ['13', '15:6\n'],...}
Wanted result:
d= {'1' : ['12', '15:23,24:26'], '2' : ['13', '15:6'],...}
Or, any suggestions for getting rid of them in a list of strings?
Example:
L= ['1','12','15:23,24:26\n', '2', '13', '15:16\n',...]
Wanted result:
L= ['1','12','15:23,24:26', '2', '13', '15:16',...]
Edit:
The import code:
with open('file.txt', 'r+') as file:
    rows = (line.split('\t') for line in file)
    d_file = {row[0]: row[1:] for row in rows}

Call str.strip() for each item in the list. You can use a combination of a dict and list comprehension for the dictionary:
In [9]: d = {'1' : ['12', '15:23,24:26\n'], '2' : ['13', '15:6\n']}
In [10]: {k: [x.strip() for x in v] for k, v in d.items()}
Out[10]: {'1': ['12', '15:23,24:26'], '2': ['13', '15:6']}
And just a plain list comprehension for the list one:
In [6]: L= ['1','12','15:23,24:26\n', '2', '13', '15:16\n']
In [7]: [x.strip() for x in L]
Out[7]: ['1', '12', '15:23,24:26', '2', '13', '15:16']
The str.strip() function strips leading and trailing characters from your strings. Without arguments it removes whitespace, which includes newlines. The list comprehensions simply call str.strip() on each element in the list.
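Because strip() without arguments removes every kind of leading and trailing whitespace, it can also eat spaces or tabs you wanted to keep. Here is a short comparison of strip(), rstrip(), and rstrip('\n') on a hypothetical value:

```python
value = ' 15:23,24:26\t\n'  # hypothetical value with extra whitespace on both ends

# strip(): all whitespace removed from both ends
print(repr(value.strip()))       # '15:23,24:26'
# rstrip(): all whitespace removed from the right end only
print(repr(value.rstrip()))      # ' 15:23,24:26'
# rstrip('\n'): only trailing newline characters removed
print(repr(value.rstrip('\n')))  # ' 15:23,24:26\t'
```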

Clean your data when you first import it, rather than trying to clean it after the fact.
Here's your data-reading code:
with open('file.txt', 'r+') as file:
    rows = (line.split('\t') for line in file)
    d_file = {row[0]: row[1:] for row in rows}
Add a call to rstrip() to remove whitespace at the end of the line:
with open('file.txt', 'r+') as file:
    rows = (line.rstrip().split('\t') for line in file)
    d_file = {row[0]: row[1:] for row in rows}
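Here is a self-contained version of cleaning at import time, using io.StringIO with made-up tab-separated rows in place of file.txt. It uses rstrip('\n') rather than rstrip(). That choice is an assumption: it only matters if a row can end in an empty tab-separated field, which plain rstrip() would also remove along with the newline.

```python
import io

# Stand-in for file.txt: tab-separated rows (hypothetical contents)
fake_file = io.StringIO("1\t12\t15:23,24:26\n2\t13\t15:6\n")

# Remove the line ending before splitting, so the last field is clean
rows = (line.rstrip('\n').split('\t') for line in fake_file)
d_file = {row[0]: row[1:] for row in rows}
print(d_file)  # {'1': ['12', '15:23,24:26'], '2': ['13', '15:6']}
```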

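L = {key.strip(): [item.strip() for item in items] for key, items in d.items()}
The strip() method removes leading and trailing whitespace (newlines, spaces, and tabs) from a string. Each dictionary value here is a list, so strip() has to be applied to every item in it. Calling it on the list itself would raise an AttributeError.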

How to get integer from list and construct hash table in Python?

New to Python. I have a tuple variable containing some information, and I convert it into a list. When I print each data element using my for loop, I get:
for data in myTuple:
    print(list(data))
['1', " This is the system 1 (It has been tested)."]
['2', ' Tulip Database.']
['3', ' Primary database.']
['4', " Fourth database."]
['5', " Munic database."]
['6', ' Test database.']
['7', ' Final database.']
The problem is how to get the number (which is in single quotes/double quotes) and store it in a dictionary as below:
{'1': 'This is the system 1 (It has been tested).', '2': 'Tulip Database.', ...}
Thank you.

As pointed out by JBernardo, you could use the builtin dict().
You can also use a dictionary comprehension!
myTuple = [['1', " This is the system 1 (It has been tested)."],
           ['2', ' Tulip Database.']]
print({key: value for key, value in myTuple})
Output
{'1': ' This is the system 1 (It has been tested).', '2': ' Tulip Database.'}

Use dict():
my_dict = dict(myTuple)
Demo:
>>> x = ([1, 'spam'], [2, 'foobar'])
>>> dict(x)
{1: 'spam', 2: 'foobar'}
When passed an iterable, dict() does something like this (from help(dict)):
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
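The equivalence quoted from help(dict) can be checked directly. The question's desired output also has no leading space in the values, and neither answer addresses that. This sketch uses the first two pairs from the question and adds a strip() step:

```python
# The first two pairs from the question
my_tuple = (['1', " This is the system 1 (It has been tested)."],
            ['2', ' Tulip Database.'])

# dict() treats each two-item pair as (key, value)
via_dict = dict(my_tuple)

# The explicit loop quoted from help(dict)
via_loop = {}
for k, v in my_tuple:
    via_loop[k] = v

print(via_dict == via_loop)  # True

# The desired output has no leading space, so strip each value
cleaned = {k: v.strip() for k, v in my_tuple}
print(cleaned['2'])  # Tulip Database.
```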

Python: Very Basic, Can't figure out why it is not splitting into the larger number listed but rather into individual integers

Really quick question here, some other people helped me on another problem but I can't get any of their code to work because I don't understand something very fundamental here.
8000.5 16745 0.1257
8001.0 16745 0.1242
8001.5 16745 0.1565
8002.0 16745 0.1595
8002.5 16745 0.1093
8003.0 16745 0.1644
I have a data file as such, and when I type
f1 = open(sys.argv[1], 'rt')
for line in f1:
    fields = line.split()
    print list(fields[0])
I get the output
['1', '6', '8', '2', '5', '.', '5']
['1', '6', '8', '2', '6', '.', '0']
['1', '6', '8', '2', '6', '.', '5']
['1', '6', '8', '2', '7', '.', '0']
['1', '6', '8', '2', '7', '.', '5']
['1', '6', '8', '2', '8', '.', '0']
['1', '6', '8', '2', '8', '.', '5']
['1', '6', '8', '2', '9', '.', '0']
Whereas, from trying things like print list(fields), I would have expected to get something like
[16825.5, 162826.0 ....]
What obvious thing am I missing here?
thanks!

Remove the list; .split() already returns a list.
You are turning the first element of the fields into a list:
>>> fields = ['8000.5', '16745', '0.1257']
>>> fields[0]
'8000.5'
>>> list(fields[0])
['8', '0', '0', '0', '.', '5']
If you want to have the first column as a list, you can build a list as you go:
myfirstcolumn = []
for line in f1:
    fields = line.split()
    myfirstcolumn.append(fields[0])
This can be simplified into a list comprehension:
myfirstcolumn = [line.split()[0] for line in f1]

The last command is the problem.
print list(fields[0]) takes the zeroth item from your split list and converts it into a list.
Since you already have a list of strings ['8000.5', '16745', '0.1257'], the zeroth item is a string, and list() turns a string into a list of its individual characters.

Your first problem is that you apply list to a string:
list("123") == ["1", "2", "3"]
Secondly, you print once per line in the file, but it seems you want to collect the first item of each line and print them all at once.
Third, you don't need the 't' in the mode you pass to open: text mode is already the default.
I think what you want is:
with open(sys.argv[1], 'r') as f:
    print([line.split()[0] for line in f])

The problem was that you were converting the first field, which you had correctly extracted, into a list.
Here's a solution to print the first column:
with open(sys.argv[1]) as f1:
    first_col = []
    for line in f1:
        fields = line.split()
        first_col.append(fields[0])
    print(first_col)
gives:
['8000.5', '8001.0', '8001.5', '8002.0', '8002.5', '8003.0']
Rather than doing f1 = open(sys.argv[1], 'rt') consider using with which will close the file when you are done or in case of an exception. Also, I left off rt since open() defaults to read and text mode.
Finally, this could also be written using a list comprehension:
with open(sys.argv[1]) as f1:
    first_col = [line.split()[0] for line in f1]
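The question expected numbers rather than strings, and split() always returns strings, so each field needs an explicit float() conversion. Here is a sketch using io.StringIO with the first three rows of the question's data in place of the file named by sys.argv[1]:

```python
import io

# Stand-in for the data file from the question (first three rows)
f1 = io.StringIO("8000.5 16745 0.1257\n8001.0 16745 0.1242\n8001.5 16745 0.1565\n")

# split() yields strings; float() turns the first field into a number
first_col = [float(line.split()[0]) for line in f1]
print(first_col)  # [8000.5, 8001.0, 8001.5]
```

A blank line in a real file would make line.split()[0] raise IndexError, so you may need a guard such as if line.strip() in the comprehension.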

Others have already done a great job answering this question. The behavior you're seeing happens because you're using list on a string. list takes any object that you can iterate over and turns it into a list, one element at a time. This isn't really surprising, except that the object doesn't even need an __iter__ method. Strings in Python 2 have none, and iteration falls back to __getitem__. There are a number of posts on SO about __iter__, so I won't focus on that part.
In any event, try the following code and see what it prints out:
>>> def enlighten_me(obj):
...     print(list(obj))
...     print(hasattr(obj, '__iter__'))
...
>>> enlighten_me("Hello World")
>>> enlighten_me( (1,2,3,4) )
>>> enlighten_me( {'red':'wagon',1:5} )
Of course, you can try the example with sets, lists, generators ... Anything you can iterate over.
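The __getitem__ fallback can be demonstrated on Python 3 too, where str does define __iter__. A class that defines only __getitem__ is still accepted by list(), which calls it with 0, 1, 2, ... until IndexError is raised. The Countdown class below is hypothetical:

```python
class Countdown:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __getitem__(self, index):
        # list() calls this with 0, 1, 2, ... and stops at IndexError
        if index >= 3:
            raise IndexError(index)
        return 3 - index

obj = Countdown()
print(hasattr(obj, '__iter__'))  # False
print(list(obj))                 # [3, 2, 1]
```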
Levon posted a nice answer about how to create a column while reading your file. I will demonstrate the same thing using the built-in zip function.
rows = []
for row in myfile:
    rows.append(row.split())
# now rows is stored as [ [col1,col2,...] , [col1,col2,...], ... ]
At this point we could get the first column by (Levon's answer):
column1 = []
for row in rows:
    column1.append(row[0])
or more succinctly:
column1=[row[0] for row in rows] #<-- This is called a list comprehension
But what if you want all the columns? (and what if you don't know how many columns there are?). This is a job for zip.
zip takes iterables as input and matches them up. In other words:
zip(iter1,iter2)
will take iter1[0] and match it with iter2[0], and match iter1[1] with iter2[1] and so on -- kind of like a zipper if you think about it. But, zip can take more than just 2 arguments ...
zip(iter1,iter2,iter3) # yields (iter1[0],iter2[0],iter3[0]), (iter1[1],iter2[1],iter3[1]), ... (a list of tuples in Python 2, an iterator in Python 3)
Now, the last piece of the puzzle that we need is argument unpacking with the star operator.
If I have a function:
def foo(a, b, c):
    print(a)
    print(b)
    print(c)
I can call that function like this:
A=[1,2,3]
foo(A[0],A[1],A[2])
Or, I can call it like this:
foo(*A)
Hopefully this makes sense -- the star takes each element in the list and "unpacks" it before passing it to foo.
So, putting the pieces together (remember back to the list of rows), we can unpack the list of rows and pass it to zip which will match corresponding indices in each row (i.e. columns).
columns = list(zip(*rows))  # list() is needed in Python 3, where zip returns an iterator
Now to get the first column, we just do:
columns[0] #first column
For lists of lists, I like to think of zip(*list_of_lists) as a sort of poor man's transpose.
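Putting the pieces together in Python 3 looks roughly like this. io.StringIO with two made-up rows stands in for myfile. Each column comes back as a tuple, and zip stops at the shortest row, so ragged rows silently lose trailing values.

```python
import io

# Stand-in for myfile (hypothetical two-row, three-column data)
myfile = io.StringIO("8000.5 16745 0.1257\n8001.0 16745 0.1242\n")

rows = [row.split() for row in myfile]

# zip(*rows) pairs up the i-th item of every row; list() makes it indexable
columns = list(zip(*rows))
print(columns[0])  # ('8000.5', '8001.0')
print(columns[2])  # ('0.1257', '0.1242')
```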
Hopefully this has been helpful.
