I'm still quite new to Python and I was wondering how would I convert something that is already in key:value form in a text file into a Python dictionary?
Eg.
2:red
3:orange
5:yellow
6:green
(each key:value on a separate line)
I've looked at other posts but none of them seem to work and I know I'm doing something wrong. So far, I have:
def create_colours_dictionary(filename):
    colours_dict = {}
    file = open(filename, 'r')
    contents = file.read()
    for key in contents:
        #???
    return colours_dict
The straightforward way to do this is to use a traditional for loop and the str.split method.
Rather than reading from a file, I'll embed the input data into the script as a multi-line string, and use str.splitlines to convert it to a list of strings, so we can loop over it, just like looping over the lines of a file.
# Use a list of strings to simulate the file
contents = '''\
2:red
3:orange
5:yellow
6:green
'''.splitlines()
colours_dict = {}
for s in contents:
    k, v = s.split(':')
    colours_dict[k] = v
print(colours_dict)
Output
{'2': 'red', '3': 'orange', '5': 'yellow', '6': 'green'}
Be aware that this code will only work correctly if there are no spaces surrounding the colon. If there could be spaces (or spaces at the start or end of the line), then you can use the str.strip method to remove them.
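For example, here is a minimal sketch of the stripped version; the stray spaces in contents below are made up purely for illustration:
# Simulated lines with stray spaces around the colon and at the ends
contents = ['2 : red ', ' 3: orange', '5 :yellow', '6:green']

colours_dict = {}
for s in contents:
    k, v = s.split(':')
    # strip() removes the leading/trailing whitespace from both pieces
    colours_dict[k.strip()] = v.strip()

print(colours_dict)  # {'2': 'red', '3': 'orange', '5': 'yellow', '6': 'green'}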
There are a couple of ways to make this more compact.
We could use a list comprehension nested inside a dictionary comprehension:
colours_dict = {k: v for k, v in [s.split(':') for s in contents]}
But it's even more compact to use the dict constructor on a generator expression:
colours_dict = dict(s.split(':') for s in contents)
If you aren't familiar with comprehensions, please see
List Comprehensions and Dictionaries in the official tutorial.
Iterate over your file and build a dictionary.
def create_colours_dictionary(filename):
    colours_dict = {}
    with open(filename) as file:
        for line in file:
            k, v = line.rstrip().split(':')
            colours_dict[k] = v
    return colours_dict
dct = create_colours_dictionary('file.txt')
Or, if you're looking for something compact, you can use a dict comprehension over a generator expression that splits each line on the colon.
colours_dict = {k: v for k, v in (
    line.rstrip().split(':') for line in open(filename)
)}
This approach will need some modification if the colon is surrounded by spaces—perhaps regex?
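For instance, one possible sketch uses re.split so that the colon and any surrounding spaces are consumed as the delimiter; the exact pattern is an assumption, not something from the question (filename is reused from above):
import re

with open(filename) as file:
    # r'\s*:\s*' treats the colon plus any surrounding whitespace as the separator
    colours_dict = dict(re.split(r'\s*:\s*', line.strip(), maxsplit=1) for line in file)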
Assuming the text file has the stated 'key:value' format and the name of the file is contained in the variable fname, you could write a function that reads the file and returns a dict, or just use a simple with statement.
A function is probably the better choice if this operation is performed in several places in your code. If it is only done once, a 2-liner will do fine.
# Example with fname being the path to the text file
def dict_from(fname):
    return dict(line.strip().split(':') for line in open(fname))
fname = '...'
# ...
d1 = dict_from(fname)
# Alternative solution
with open(fname) as fd:
    d2 = dict(line.strip().split(':') for line in fd)
Both suggested solutions use the built-in dict constructor and a generator expression to parse each line. strip removes whitespace at both the start and the end of the line, and split creates a (key, value) pair from each line.
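To see what the dict constructor does with those pairs, here is a tiny illustration with hard-coded tuples (the values are just the first two entries from the example file):
pairs = [('2', 'red'), ('3', 'orange')]
print(dict(pairs))  # {'2': 'red', '3': 'orange'}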
Related
I am trying to convert a file, where every word is on a separate line, into a dictionary where the keys are the word sizes and the values are the lists of words.
The first part of my code has removed the newline characters from the text file, and now I am trying to organize the dictionary based on the values a word has.
with open(dictionary_file, 'r') as file:
    wordlist = file.readlines()
    print([k.rstrip('\n') for k in wordlist])

    dictionary = {}
    for line in file:
        (key, val) = line.split()
        dictionary[int(key)] = val
    print(dictionary)
However, I keep getting the error that there aren't enough values to unpack, even though I'm sure I have already removed the newline characters from the original text file. Another issue is that it only prints out the words without the newlines; they aren't organized by value. Any help would be appreciated, thanks! :)
(key, val) = line.split()
^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
I'm not sure why you're trying to use line.split(). All you need is the length of the word, so you can use the len() function. Also, you can use collections.defaultdict to make this code shorter. Like this:
import collections

words = collections.defaultdict(list)
with open('test.txt') as file:
    for line in file:
        word = line.strip()
        words[len(word)].append(word)
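A quick way to inspect the result is to convert the defaultdict back to a plain dict; the exact contents depend, of course, on what is in test.txt:
print(dict(words))  # e.g. {3: ['cat', 'dog'], 5: ['hello']} for a hypothetical test.txt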
Try this:
with open(dictionary_file, 'r') as file:
    dictionary = {}
    for line in file:
        word = line.strip()
        # collect words of the same length in a list under that length
        dictionary.setdefault(len(word), []).append(word)
    print(dictionary)
I have a large text file (https://int-emb-word2vec-de-wiki.s3.eu-central-1.amazonaws.com/vectors.txt) and I put its contents into a dictionary:
import csv
import numpy as np

word2vec = "./vectors.txt"
with open(word2vec, 'r') as f:
    file = csv.reader(f, delimiter=' ')
    model = {k: np.array(list(map(float, v))) for k, *v in file}
So I got this dictionary: {Word: embedding vector}.
Now I want to convert my keys from b'Word' to Word (so that I get, for example, UNK instead of b'UNK').
Does anyone know how I can remove the b'...' for every instance?
Or would it be easier to first remove all the b'...' in the text file before I put the file into a dictionary?
why not just str.decode() it?
the line would be
model = {k.decode(): np.array(list(map(float, v))) for k, *v in file}
It's not possible to change a dict's keys in place. You will need to add a new entry under the modified key and then remove the old one, or create a new dict with a dict comprehension or the like.
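For example, a tiny sketch of re-keying a single entry this way (the key and value here are illustrative only):
d = {"b'UNK'": 1}
d["UNK"] = d.pop("b'UNK'")  # insert under the new key, then drop the old one
print(d)  # {'UNK': 1}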
Now I want to convert my keys from b'Word' to Word (so that I get, for example, UNK instead of b'UNK').
The keys you get are strings like "b'Word'" and "b'UNK'", not b'Word' and b'UNK'. Try executing print(b"Word", type(b"Word"), "b'Word'", type("b'Word'")); it might make things clearer.
This should work:
import ast
import csv
import numpy as np
with open("../out/out_file.txt") as file_in:
    reader = csv.reader(file_in, delimiter=" ")
    words = {ast.literal_eval(word).decode(): np.array(vect, dtype=np.float64) for word, *vect in reader}
This solution also appears to be much faster.
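Usage is then a plain dictionary lookup, assuming 'UNK' (taken from the question) actually occurs in the file:
print(words["UNK"][:5])  # first five components of the embedding vector for 'UNK'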
So I have a .txt file and I want to use methods within this class, Map, to load its contents into aDictionary.
class Map:
    def __init__(self, dataText):
        self.dataText = dataText
        self.aDictionary = {}

dataFile = open('data.txt', 'r')
c1 = Map(dataFile)
My data.txt file looks something like this:
hello, world
how, are
you, today
and I want aDictionary to print this output:
{how: are, you: today}
I'm not very good at manipulating files, as I continue to get type errors and whatnot. Is there an easy way of performing this task using methods within the class?
First you need to read the content of the file. Once you have the content of the file, you could create the dictionary like this (assuming content contains the content of data.txt):
content = """hello, world
how, are
you, today"""
d = {}
for line in content.splitlines():
    if line:
        key, value = map(str.strip, line.split(','))
        d[key] = value
print(d)
Output
{'you': 'today', 'how': 'are', 'hello': 'world'}
The idea is to iterate over the lines using a for loop, then check whether the line is not empty (if line); if it is not empty, split on the comma (line.split(',')) and remove the surrounding whitespace from each of the values in the list using map and str.strip.
Or using a dictionary comprehension:
content = """hello, world
how, are
you, today"""
it = (map(str.strip, line.split(',')) for line in content.splitlines() if line)
d = {key: value for key, value in it}
print(d)
To read the content of the file you can do the following:
content = self.dataText.read()
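Putting that together, here is a minimal sketch of how the parsing could live on the Map class from the question; the method name build_dictionary is just an illustration, not part of the original code:
class Map:
    def __init__(self, dataText):
        self.dataText = dataText
        self.aDictionary = {}

    def build_dictionary(self):
        # Read the file object passed to the constructor and fill aDictionary
        for line in self.dataText.read().splitlines():
            if line:
                key, value = map(str.strip, line.split(','))
                self.aDictionary[key] = value
        return self.aDictionary

with open('data.txt') as dataFile:
    c1 = Map(dataFile)
    print(c1.build_dictionary())  # {'hello': 'world', 'how': 'are', 'you': 'today'}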
Further
Reading entire file in Python
How to read a file line-by-line into a list?
I have a dictionary that searches for an ID name and reads tokens after it. But I want to know if there is a way to read and print out the whole line that contains that ID name as well.
Here is what I have so far:
import csv
import re
from collections import defaultdict

lookup = defaultdict(list)
wholelookup = defaultdict(list)
mydata = open('summaryfile.txt')
for line in csv.reader(mydata, delimiter='\t'):
    code = re.match(r'[a-z](\d+)[a-z]', line[-1], re.I)
    if code:
        lookup[line[-2]].append(code.group(1))
        wholelookup[line[-2]].append(code.group(0))
Your code calls csv.reader() which will return a parsed version of the whole line. In my test, this returns a list of values. If this list of values will do for the "whole line" then you can save that.
You have a line where you append to something called wholelookup. I think you want to just save line there instead of code.group(0). code.group(0) returns the full text matched by the regular expression, which is just the matching part of line[-1], not the whole line.
So maybe put this line in your code:
wholelookup[line[-2]].append(line)
Or maybe you need to join together the values from line to make a single string:
s = ' '.join(line)
wholelookup[line[-2]].append(s)
If you want the whole line, not the parsed version, then do something like this:
lookup = defaultdict(list)
wholelookup = defaultdict(list)
pat = re.compile(r'[a-z](\d+)[a-z]', re.I)

with open('summaryfile.txt') as mydata:
    for s_line in mydata:
        values = s_line.split('\t')
        code = re.match(pat, values[-1])
        if code:
            lookup[values[-2]].append(code.group(1))
            wholelookup[values[-2]].append(s_line)
This example pre-compiles the pattern for the slight speed advantage.
If you have enough memory, the easiest way is to simply save the lines in another defaultdict:
wholeline = defaultdict(list)
...
idname = line[-2]
wholeline[idname].append(line)
I have a problem. I want to make a dictionary that translates English words to Estonian. I started, but I don't know how to continue. Please help.
The dictionary is a text document where a tab separates the English and Estonian words.
file = open("dictionary.txt", "r")
eng = []
est = []
while True:
    lines = file.readline()
    if lines == "":
        break
    pair = lines.split("\t")
    eng.append(pair[0])
    est.append(pair[1])
for i in range......
For a dictionary, you should use the dictionary type which maps keys to values and is much more efficient for lookups. I also made some other changes to your code, keep them if you wish:
engToEstDict = {}

# The with statement automatically closes the file afterwards. Furthermore, one shouldn't
# overwrite builtin names like "file", "dict" and so on (even though it's possible).
with open("dictionary.txt", "r") as f:
    for line in f:
        if not line.strip():
            continue
        # Map the English word to its Estonian translation in the dictionary-typed variable
        pair = line.split("\t")
        engToEstDict[pair[0]] = pair[1].strip()  # strip the trailing newline

# Then, looking up the Estonian translation of an English word is simple
print(engToEstDict["hello"])  # should probably print "tere", says Google Translator
Mind that the reverse lookup (Estonian to English) is not so easy. If you need that, too, you might be better off creating a second dictionary variable with the reversed key-value mapping (estToEngDict[pair[1]] = pair[0]) because lookup will be a lot faster than your list-based approach.
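For example, a minimal sketch of building the reversed mapping once engToEstDict has been filled:
estToEngDict = {est: eng for eng, est in engToEstDict.items()}
print(estToEngDict["tere"])  # should print "hello", assuming that pair is in the file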
It would be better to use the appropriately named dict instead of two lists:
d = {}
# ...
d[pair[0]] = pair[1]
Then to use it:
translated = d["hello"]
You should also note that when you call readline(), the resulting string includes the trailing newline, so you should strip it before storing the string in the dictionary.
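For example, with a made-up line to show the effect:
line = "hello\ttere\n"               # what readline() typically returns
pair = line.rstrip("\n").split("\t")
d = {}
d[pair[0]] = pair[1]
print(d)                             # {'hello': 'tere'}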