Creating a dictionary - python

My goal is to create a dictionary in Python. I have a .csv file which contains two columns, the first being 'word' and the other being 'meaning'. I am trying to read the csv file into a dictionary and get the 'meaning' when a 'word' is given.
Can you please help me by telling me how to get the value for a 'word'? This is what I tried.
My code is:
>>> with open('wordlist.csv', mode = 'r') as infile:
...     reader = csv.reader(infile)
...     with open('wordlist.csv', mode = 'w') as outfile:
...         writer = csv.writer(outfile)
...         mydict = {rows[0]:rows[1] for rows in reader}
...         print(mydict)
...
The result turns out to be,
{}
The next one I tried was:
>>> reader = csv.reader(open('wordlist.csv', 'r'))
>>> d = {}
>>> for row in reader:
...     k, v = row
...     d[k] = v
...
But when I wanted to use this, the result was:
>>> d['Try']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Try'
The next code I tried was,
>>> reader = csv.DictReader(open('wordlist.csv'))
>>> result = {}
>>> for row in reader:
...     key = row.pop('word')
...     if key in result:
...         pass
...     result[key] = row
...     print result
...
It didn't give me any answer at all.
>>> for row in reader:
...     for column, value in row.iteritems():
...         result.setdefault(column, []).append(value)
...     print result
...
Neither did this give me a result.

I would use pandas. You could then use zip to create the dictionary.
import pandas as pd
df = pd.read_csv('wordlist.csv')
words = list(df.word)
meaning = dict( zip( df.word, df.meaning ) )
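If your file doesn't have a header row, note that pandas treats the first row as the header by default, so each column is still given some name. Print out df.columns to see the names, which can then be referenced; alternatively, pass header=None and names=['word', 'meaning'] to read_csv to name the columns yourself.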
Alternative:
import pandas as pd
df = pd.read_csv('wordlist.csv')
dictionary = {}
# Pair each word with its meaning, one row at a time
for w, m in zip(df.word, df.meaning):
    dictionary[w] = m
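For reference, the same lookup can be built with only the standard library. This is a minimal sketch, assuming the file has a header row naming the columns word and meaning as the question describes; the helper name load_meanings and the sample rows are made up for illustration. Note that it never reopens the file in 'w' mode: doing so truncates the file, which is most likely why the first attempt in the question printed {}.

```python
import csv
import io


def load_meanings(csv_file):
    """Map each 'word' to its 'meaning' from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    # strip() guards against stray spaces around keys and values
    return {row["word"].strip(): row["meaning"].strip() for row in reader}


# Hypothetical sample standing in for wordlist.csv
sample = io.StringIO("word,meaning\nTry,to attempt\nCat,a small feline\n")
meanings = load_meanings(sample)
print(meanings["Try"])  # to attempt
```

With a real file you would pass the result of open('wordlist.csv', newline='') instead of the StringIO object.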

If "final_word.csv" looks like this:
word1, synonym1, meaning1, POS_tag1
word2, synonym2, meaning2, POS_tag2
This will read it in as a dictionary:
with open("final_word.csv", 'r') as f:
    rows = f.readlines()
dictionary = {}
for row in rows:
    row = row.strip()
    word, synonym, meaning, POS_tag = row.split(", ")
    dictionary[word] = [synonym, meaning, POS_tag]
print(dictionary['word1'])
#out>> ['synonym1', 'meaning1', 'POS_tag1']
print(dictionary['word2'][0])
#out>> synonym2
The strip() call gets rid of the newline "\n" at the end of each csv row.
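One caveat with splitting on ", " by hand is that it breaks as soon as a field itself contains a comma. A possible alternative, sketched here under the assumption that such fields are wrapped in double quotes, is to let csv.reader do the splitting with skipinitialspace=True so the space after each comma is ignored. The sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical rows; the first meaning is quoted and contains ", "
sample = io.StringIO(
    'word1, synonym1, "meaning1, continued", POS_tag1\n'
    'word2, synonym2, meaning2, POS_tag2\n'
)

dictionary = {}
for row in csv.reader(sample, skipinitialspace=True):
    if not row:  # skip blank lines
        continue
    word, *rest = row
    dictionary[word] = rest

print(dictionary["word1"])
```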

Related

Print out dictionary from file

E;Z;X;Y
I tried
dl = defaultdict(list)
for line in file:
    line = line.strip().split(';')
    for x in line:
        dl[line[0]].append(line[1:4])
dl = dict(dl)
print (votep)
It prints out too many results. I have an init that reads the file.
What can I edit to make it work?
The csv module could be really handy here: just use a semicolon as your delimiter, and a simple dict comprehension will suffice:
import csv
with open('filename.txt') as file:
    reader = csv.reader(file, delimiter=';')
    votep = {k: vals for k, *vals in reader}
print(votep)
Without using csv you can just use str.split:
with open('filename.txt') as file:
    # strip() drops the trailing newline before splitting
    votep = {k: vals for k, *vals in (s.strip().split(';') for s in file)}
print(votep)
Further simplified, without the comprehension, this would look as follows:
votep = {}
with open('filename.txt') as file:
    for line in file:
        key, *vals = line.strip().split(';')
        votep[key] = vals
And FYI, key, *vals = line.split(';') is just multiple variable assignment coupled with iterable unpacking. The star just means put whatever's left in the iterable into vals after assigning the first value to key.
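To see the starred assignment on its own, here is a small sketch using one line in the same semicolon-separated format; the variable names are only illustrative.

```python
line = "A;X;Y;Z\n"

# The first field goes to key, everything else is collected into a list
key, *vals = line.strip().split(";")
print(key)   # A
print(vals)  # ['X', 'Y', 'Z']

# With a single field, the starred name receives an empty list
only_key, *nothing = "E".split(";")
print(nothing)  # []
```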
If you read the file into a list object, a simple function can iterate over it and convert it to the dictionary you expect:
a = [
    'A;X;Y;Z',
    'B;Y;Z;X',
    'C;Y;Z;X',
    'D;Z;X;Y',
    'E;Z;X;Y',
]
def vp(a):
    dl = {}
    for i in a:
        split_keys = i.split(';')
        dl[split_keys[0]] = split_keys[1:]
    print(dl)

Making python dictionary from a text file with multiple keys

I have a text file named file.txt with some numbers like the following :
1 79 8.106E-08 2.052E-08 3.837E-08
1 80 -4.766E-09 9.003E-08 4.812E-07
1 90 4.914E-08 1.563E-07 5.193E-07
2 2 9.254E-07 5.166E-06 9.723E-06
2 3 1.366E-06 -5.184E-06 7.580E-06
2 4 2.966E-06 5.979E-07 9.702E-08
2 5 5.254E-07 0.166E-02 9.723E-06
3 23 1.366E-06 -5.184E-03 7.580E-06
3 24 3.244E-03 5.239E-04 9.002E-08
I want to build a python dictionary, where the first number in each row is the key, the second number is always ignored, and the last three numbers are put as values. But in a dictionary, a key can not be repeated, so when I write my code (attached at the end of the question), what I get is
'1' : [ '90' '4.914E-08' '1.563E-07' '5.193E-07' ]
'2' : [ '5' '5.254E-07' '0.166E-02' '9.723E-06' ]
'3' : [ '24' '3.244E-03' '5.239E-04' '9.002E-08' ]
All the other numbers are removed, and only the last row is kept as the values. What I need is to have all the numbers against a key, say 1, to be appended in the dictionary. For example, what I need is :
'1' : ['8.106E-08' '2.052E-08' '3.837E-08' '-4.766E-09' '9.003E-08' '4.812E-07' '4.914E-08' '1.563E-07' '5.193E-07']
Is it possible to do it elegantly in python? The code I have right now is the following :
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        diction[pa[0]] = pa[1:]
with open('file.txt') as f:
    diction = {pa[0]: pa[1:] for pa in map(str.split, f)}
You can use a defaultdict.
from collections import defaultdict
data = defaultdict(list)
with open("file.txt", "r") as f:
    for line in f:
        line = line.split()
        data[line[0]].extend(line[2:])
Try this:
from collections import defaultdict
diction = defaultdict(list)
with open("file.txt") as f:
    for line in f:
        key, _, *values = line.strip().split()
        diction[key].extend(values)
print(diction)
This is a solution for Python 3, because the statement a, *b = tuple1 is invalid in Python 2. Look at the solution by @cha0site if you are using Python 2.
Make the value of each key in diction be a list and extend that list with each iteration. With your code as it is written now when you say diction[pa[0]] = pa[1:] you're overwriting the value in diction[pa[0]] each time the key appears, which describes the behavior you're seeing.
diction = {}
with open("file.txt") as f:
    for line in f:
        pa = line.split()
        try:
            # pa[1] is the second number, which the question says to ignore
            diction[pa[0]].extend(pa[2:])
        except KeyError:
            diction[pa[0]] = pa[2:]
In this code each value of diction will be a list. In each iteration if the key exists that list will be extended with new values from pa giving you a list of all the values for each key.
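The same idea can also be written without try/except by using dict.setdefault, which returns the existing list for a key or stores a new empty one first. The sketch below uses three rows copied from the question; converting the values to float is an extra step assumed here for illustration, not something the question asked for.

```python
import io

# Three rows taken from the question's file.txt
sample = io.StringIO(
    "1 79 8.106E-08 2.052E-08 3.837E-08\n"
    "1 80 -4.766E-09 9.003E-08 4.812E-07\n"
    "2 2 9.254E-07 5.166E-06 9.723E-06\n"
)

grouped = {}
for line in sample:
    parts = line.split()
    if not parts:  # skip blank lines
        continue
    key, _ignored, *values = parts  # the second number is dropped
    grouped.setdefault(key, []).extend(float(v) for v in values)

print(grouped["1"])
```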
To do this in a very simple for loop:
def read_dict():  # wrapper added so that the return statement is valid
    with open('file.txt') as f:
        return_dict = {}
        for item_list in map(str.split, f):
            if item_list[0] not in return_dict:
                return_dict[item_list[0]] = []
            # item_list[1] is the ignored second number
            return_dict[item_list[0]].extend(item_list[2:])
        return return_dict
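Or, if you wanted to use defaultdict in a one liner-ish:
from collections import defaultdict
def read_dict():  # wrapper added so that the return statement is valid
    with open('file.txt') as f:
        return_dict = defaultdict(list)
        # The comprehension is used only for its side effect on return_dict
        [return_dict[item_list[0]].extend(item_list[2:]) for item_list in map(str.split, f)]
        return return_dict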

Sum values in column B for lines with matching column A values

Extract from large csv looks like this:
Description,Foo,Excl,GST,Incl
A,foo,$154.52,$15.44,$169.96
A,foo,$45.44,$4.54,$49.98
A,foo,$45.44,$4.54,$49.98
A,foo,$154.52,$15.44,$169.96
A,foo,$0.00,$0.00,$0.00
A,foo,$50.16,$5.02,$55.18
B,foo,$175.33,$15.65,$190.98
C,foo,$204.52,$15.44,$219.96
D,foo,$154.52,$15.44,$169.96
D,foo,$154.52,$15.44,$169.96
D,foo,$45.44,$4.54,$49.98
D,foo,$154.52,$15.44,$169.96
D,foo,$145.44,$14.54,$159.98
I need to strip the dollar sign and for all lines containing matching Description values (A or B or whatever it may be), sum the Excl column values separately, the GST column values separately and Incl column values separately for that Description value.
End result should be a dictionary object containing the Description column as key and the sum totals of the Excl, GST and Incl columns matching the Description, example:
{
"A": [450.08,44.98,495.06],
"B": [175.33,15.65,190.98],
"C": [204.52,15.44,219.96],
"D": [654.44,65.40,719.84]
}
I'm completely stumped on how to perform the sum operation. My code only goes as far as opening the csv and reading in values on each line. Any enlightenment is appreciated.
import csv
def getField(rowdata, index):
    try:
        val = rowdata[index]
    except IndexError:
        val = '-1'
    return val
with open(csv, 'r') as f:
    reader = csv.reader(f)
    order_list = list(reader)
# Remove the header row in csv
order_list.pop(0)
for row in order_list:
    Desc = getField(row, 0)
    Excl = getField(row, 2)
    GST = getField(row, 3)
    Incl = getField(row, 4)
This might help:
import csv
path = "Path to CSV_File.csv"
def removeSym(s):
    return float(s.replace("$", ""))
with open(path, 'r') as f:
    reader = csv.reader(f)
    order_list = list(reader)
d = {}
for i in order_list[1:]:  # Skip the header row
    if i[0] not in d:
        # First time this description is seen: store its three amounts
        d[i[0]] = list(map(removeSym, i[2:]))
    else:
        # Add the new amounts to the running totals, column by column
        d[i[0]] = [float(round(sum(k), 2)) for k in zip(d[i[0]], map(removeSym, i[2:]))]
print(d)
Output:
{'A': [450.08, 44.98, 495.06], 'B': [175.33, 15.65, 190.98], 'C': [204.52, 15.44, 219.96], 'D': [654.44, 65.4, 719.84]}
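Since these are currency amounts, another option is to sum them as decimal.Decimal values, which avoids the need to round floats after every addition. This is only a sketch on three rows copied from the question, assuming every amount has the form $123.45; the name totals is illustrative.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Header plus three rows taken from the question's extract
sample = io.StringIO(
    "Description,Foo,Excl,GST,Incl\n"
    "A,foo,$154.52,$15.44,$169.96\n"
    "A,foo,$45.44,$4.54,$49.98\n"
    "B,foo,$175.33,$15.65,$190.98\n"
)

# Each new description starts with three zero totals: Excl, GST, Incl
totals = defaultdict(lambda: [Decimal("0")] * 3)

reader = csv.reader(sample)
next(reader)  # skip the header row
for desc, _foo, *amounts in reader:
    values = [Decimal(a.lstrip("$")) for a in amounts]
    totals[desc] = [t + v for t, v in zip(totals[desc], values)]

print(dict(totals))
```

If plain floats are needed at the end, each total can be converted with float() once all rows have been read.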

Python dictionary created from CSV file should merge the value (integer) whenever the key repeats

I have a file named report_data.csv that contains the following:
user,score
a,10
b,15
c,10
a,10
a,5
b,10
I am creating a dictionary from this file using this code:
with open('report_data.csv') as f:
f.readline() # Skip over the column titles
mydict = dict(csv.reader(f, delimiter=','))
After running this code mydict is:
mydict = {'a':5,'b':10,'c':10}
But I want it to be:
mydict = {'a':25,'b':25,'c':10}
In other words, whenever a key that already exists in mydict is encountered while reading a line of the file, the new value in mydict associated with that key should be the sum of the old value and the integer that appears on that line of the file. How can I do this?
The most straightforward way is to use defaultdict or Counter from the useful collections module.
from collections import Counter
summary = Counter()
with open('report_data.csv') as f:
    f.readline()
    for line in f:
        lbl, n = line.split(",")
        n = int(n)
        summary[lbl] = summary[lbl] + n
One of the most useful features of the Counter class is the most_common() method, which is absent from plain dictionaries and from defaultdict.
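As a quick illustration of most_common(), the sketch below feeds the rows from the question into a Counter directly from a list instead of a file; the rows list is just a stand-in for report_data.csv.

```python
from collections import Counter

# The (user, score) pairs from report_data.csv, written inline
rows = [("a", 10), ("b", 15), ("c", 10), ("a", 10), ("a", 5), ("b", 10)]

summary = Counter()
for user, score in rows:
    summary[user] += score  # missing keys start at 0

print(dict(summary))           # {'a': 25, 'b': 25, 'c': 10}
print(summary.most_common(2))  # the two users with the highest totals
```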
This should work for you:
import csv
with open('report_data.csv') as f:
    f.readline()
    mydict = {}
    for line in csv.reader(f, delimiter=','):
        mydict[line[0]] = mydict.get(line[0], 0) + int(line[1])
Try this:
import csv
mydict = {}
with open('report_data.csv') as f:
    f.readline()
    x = csv.reader(f, delimiter=',')
    for x1 in x:
        if mydict.get(x1[0]):
            mydict[x1[0]] += int(x1[1])
        else:
            mydict[x1[0]] = int(x1[1])
print(mydict)
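For completeness, here is what the defaultdict variant mentioned earlier in this thread might look like. It is a sketch that reads the question's data from an in-memory string and uses csv.DictReader so the header row names the columns instead of being skipped by hand.

```python
import csv
import io
from collections import defaultdict

# Contents of report_data.csv as shown in the question
sample = io.StringIO("user,score\na,10\nb,15\nc,10\na,10\na,5\nb,10\n")

totals = defaultdict(int)  # missing users start at 0
for row in csv.DictReader(sample):
    totals[row["user"]] += int(row["score"])

print(dict(totals))  # {'a': 25, 'b': 25, 'c': 10}
```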

Python: Data Getting

I'm trying to find out how to get certain data from a file in the easiest way possible. I have searched all over the internet but can't find anything. I want to be able to do this:
File.txt:
data1 = 1
data2 = 2
But I want to get only data1, like so:
p = open('file.txt')
f = p.get(data1)
print(f)
Any ideas? Thanks in advance.
You can do:
with open("file.txt", "r") as f:
    for line in f:
        key, val = line.split('=')
        key = key.strip()
        val = val.strip()
        if key == 'data1':  # works even if data1 is not the first line
            print(val)  # do something with the key and value
Using map:
from operator import methodcaller
with open("file.txt", "r") as f:
    for line in f:
        # methodcaller("strip") removes all surrounding whitespace, including the newline
        key, val = map(methodcaller("strip"), line.split('='))
        if key == "data1":
            print(val)  # do something with the key and value
If you know you only want data1, which is on the first line, you can do:
with open("file.txt", "r") as f:
    key, val = f.readline().split('=')
    if key.strip() == 'data1':  # only the first line is checked
        print(val.strip())  # do something with the key and value
or, stripping both parts in one step:
with open('file.txt', 'r') as f:
    key, val = tuple(x.strip() for x in f.readline().split('='))
The generator expression is used to remove the whitespace from each string.
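If more than one value will be needed, it may be simpler to read the whole file into a dictionary once and then use .get(), which is close to what the question's p.get(data1) was reaching for. This is a sketch assuming every useful line has the form key = value; the function name parse_settings is invented for illustration.

```python
import io


def parse_settings(lines):
    """Return a dict built from 'key = value' lines, skipping other lines."""
    settings = {}
    for line in lines:
        if "=" not in line:
            continue
        # partition splits on the first '=' only, so values may contain '='
        key, _, val = line.partition("=")
        settings[key.strip()] = val.strip()
    return settings


# In-memory stand-in for file.txt from the question
sample = io.StringIO("data1 = 1\ndata2 = 2\n")
settings = parse_settings(sample)
print(settings.get("data1"))  # 1
```

The values are kept as strings; wrap them in int() or float() if numbers are needed.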
