Transform a txt file to a dictionary in Python

Assume the following text file (lemma_es.txt) is present:
comer coma
comer comais
comer comamos
comer coman
The first column is the lemma of the word in the second column, and the second column is the inflected word.
I am trying to make a dictionary in which the keys are the words in the second column and the values are the words in the first column.
The output I need:
{'coma': 'comer', 'comais': 'comer', 'comamos': 'comer', 'coman': 'comer' ... }
Edit: The txt starts with:
1 primer
1 primera
1 primeras
1 primero
Some words need to appear more than once, but only among the dictionary's values (the first column of the txt file).
Thank you all!

I think you could try this:
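data_dict = {}
with open("lemma_es.txt", 'r') as myfile:
    for line in myfile:
        # first column is the lemma, second column is the inflected word
        lemma, word = line.strip().split()
        # the inflected word is the key, the lemma is the value
        data_dict[word] = lemma
print(' text file to dictionary =\n ', data_dict)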

word_dict = {}
with open("lemma_es.txt", "r") as filehandle:
    for line in filehandle:
        parts = line.split()
        word_dict[parts[-1]] = parts[0]
Open the txt file and iterate over its lines. Split each line, then use the last item of the resulting list as the key and the first item as the value.

IIUC, you could use:
with open('lemma_es.txt') as f:
    d = dict(reversed(l.strip().split()) for l in f)
output:
{'coma': 'comer', 'comais': 'comer', 'comamos': 'comer', 'coman': 'comer'}
NB: the second-column words must be unique; if a word appears twice, the later line overwrites the earlier one.
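If the second-column words are not guaranteed to be unique, one option is to record collisions while building the dictionary. This is only a sketch: the function name build_lemma_dict and the sample lines list are made up for illustration, and in practice you would pass the open file object instead of a list.

```python
def build_lemma_dict(lines):
    """Map each inflected word (column 2) to its lemma (column 1)."""
    lemma_of = {}
    conflicts = []  # (word, earlier lemma, later lemma) for words seen with different lemmas
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        lemma, word = parts
        if word in lemma_of and lemma_of[word] != lemma:
            conflicts.append((word, lemma_of[word], lemma))
        lemma_of[word] = lemma  # the last occurrence wins, as with dict()
    return lemma_of, conflicts


# Illustrative data based on the lines shown in the question.
lines = ["1 primer\n", "1 primera\n", "comer coma\n", "comer comais\n", "\n"]
lemma_of, conflicts = build_lemma_dict(lines)
print(lemma_of)
print(conflicts)
```

An empty conflicts list means every inflected word maps to exactly one lemma.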

Related

converting user details stored in a text file into a dictionary

I have tried converting the text file into a dictionary using the following code below:
d = {}
with open('staff.txt', 'r') as file:
    for line in file:
        (key, val) = line.split()
        d[str(key)] = val
print(d)
The contents in the file staff.txt:
username1 jaynwauche
password1 juniornwauche123
e_mail1 juniornwauche#gmail.com
Fullname1 Junior Nwauche
Error: too many values to unpack
What am I doing wrong?
According to your file, the last line has three words. Splitting it on whitespace gives three items, but you only have two variables to unpack them into.
You need to limit the number of splits. With no arguments, split() splits on every run of whitespace, so that line yields a list with three elements. Try line.strip().split(' ', 1) like this:
d = {}
with open('staff.txt', 'r') as file:
    for line in file:
        (key, val) = line.strip().split(' ', 1)
        d[str(key)] = val
print(d)
This splits each line only at the first space, so the key is the first word and the value is the rest of the line.
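To see where the "too many values to unpack" error comes from, here is a small sketch using the last line of staff.txt as a plain string (no file needed). It compares an unlimited split with a split limited to one cut.

```python
line = "Fullname1 Junior Nwauche\n"

# No limit: three fields, so unpacking into (key, val) would raise ValueError.
fields = line.split()
print(fields)

# At most one split: the first word is the key, the rest of the line is the value.
key, val = line.strip().split(None, 1)
print(key, "->", val)
```

Passing None as the separator keeps the default "any whitespace" behaviour while still allowing the limit of 1.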

How to fill dictionary values from another file?

I have two files (fields are separated by a space):
file1.txt
OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome
file2.txt
UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007
UniRef90_2 OTU0002 OTU0003 OTU0005
UniRef90_3 OTU0004 OTU0006 OTU0007
I would like to replace, in the second file, each OTUXXXX with its value from the first file, while keeping the UniRef90_X at the beginning of each line. The first line of the second file should look like this:
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
For the moment, I have created a dictionary for the second file, with the UniRef90_X as keys and the OTUXXXX as values.
f1=open("file1.txt", "r")
f2=open("file2.txt", "r")
dict={}
for i in f2:
    i=i.split(" ")
    dict[i[0]]=i[1:]
    for j in f1:
        j=j.split(" ")
        if j[0] in dict.values():
            dico[i[0]]=j[1:]
But I don't know how to replace the OTUXXXX with the corresponding values from the first file. Any idea?
I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up ids you captured from file1.
The way you have your loops set up, you will read the first record from file2 and enter it into a hash. The key will never match anything from file1. Then you read from file1 and do something there. The next time you read from file2, all of file1 will be exhausted from the first iteration of file2.
Here is an approach that reads file 1 into a dictionary, and when it finds matches in file 2, prints them out.
file1 = {}  # declare a dictionary
fin = open('file1.txt', 'r')
for line in fin:
    # strip the ending newline
    line = line.rstrip()
    # only split once
    # first part into _id and second part into data
    _id, data = line.split(' ', 1)
    # data here is a single string possibly containing spaces
    # because only split once (above)
    file1[_id] = data
fin.close()
fin = open('file2.txt', 'r')
for line in fin:
    uniref, *ids = line.split()  # here ids is a list (because prepended by *)
    print(uniref, end='')
    for _id in ids:
        if _id in file1:
            # the empty first argument makes print emit a single leading space
            print('', file1[_id], '(#' + _id + ')', end='')
    print()
fin.close()
The printout is:
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002) Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3 Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
First of all, DO NOT NAME YOUR VARIABLES EXACTLY LIKE BUILT-IN CLASSES (here, dict). EVER. Use something like d2 instead.
Then, replace the [1] with [1:]
Then, after importing the first file in a dictionary just like you did with the second one - let's name it d1 - you can combine the values like this:
d3 = dict()
for e in d2:
    L = list()
    for otu in d2[e]:
        L.append(d1[otu])
    d3[e] = L  # format your list here
Finally, turn it back into a string and write it in a file.
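The combining and formatting steps could look roughly like the sketch below. It uses short in-memory stand-ins for the two files (file1_lines and file2_lines are illustrative names holding a subset of the question's data), and it assumes d1 maps each OTU id to a single taxonomy string.

```python
file1_lines = [
    "OTU0001 Archaea\n",
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n",
]
file2_lines = ["UniRef90_1 OTU0001 OTU0004\n"]

# d1: OTU id -> taxonomy string (split once, since the taxonomy may contain spaces)
d1 = dict(line.rstrip("\n").split(" ", 1) for line in file1_lines)

# d2: UniRef id -> list of OTU ids
d2 = {}
for line in file2_lines:
    uniref, *otus = line.split()
    d2[uniref] = otus

# d3: UniRef id -> formatted output line
d3 = {}
for uniref, otus in d2.items():
    parts = ["{} (#{})".format(d1[otu], otu) for otu in otus if otu in d1]
    d3[uniref] = " ".join([uniref] + parts)

for text in d3.values():
    print(text)
```

OTU ids that are missing from d1 are silently skipped here; whether that is the right behaviour depends on your data.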

assign numeric values from a text to characters in a string using python 2

If I have a text file that contains all the letters of the English alphabet, each with a corresponding value, like the following:
A 0.00733659550399
B 0.00454138879023
C 0.00279849519224
D 0.00312734304092
.
.
.
I want to assign these numeric values to each line I'm reading from another txt file.
L = open(os.path.join(dir, file), "r").read()
line = L.rstrip()
tokens = line.split()
for word in tokens:
    for char in word:
        find
Create a dictionary from the first file like this:
with open('values.txt') as f:
    values = {k: v for k, v in (line.split() for line in f)}
Then iterate over each character of the data file and replace it with the corresponding value:
with open('A.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        for c in line.rstrip():
            print(values.get(c.upper(), '0'), file=outfile)
This code (assumes Python 3 or import of print function in Python 2) will write to output.txt the numeric values corresponding to the input characters, one per line. If there is no value for a character, 0 is output (that can be changed to whatever you want). Note that the incoming characters are converted to upper case because your sample looks like it might comprise upper case letters only. If there are separate values for lower case letters, then you can remove the call to upper().
If you would prefer the values to remain on the same line then you can alter the print() function call:
with open('A.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        print(*(values.get(c.upper(), '0') for c in line.rstrip()), file=outfile)
Now the values will be space separated.
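If the numbers are needed for arithmetic rather than just for printing, it may help to convert them to float while building the dictionary. A sketch of that variation, using io.StringIO objects as stand-ins for the two files (their contents here are made up for illustration):

```python
import io

# Stand-ins for values.txt and A.txt (illustrative contents).
values_file = io.StringIO("A 0.5\nB 0.25\n")
data_file = io.StringIO("ab\nBA?\n")

# Convert each value to float while building the lookup table.
values = {}
for line in values_file:
    letter, number = line.split()
    values[letter] = float(number)

# One list of numbers per input line; characters without a value map to 0.0.
rows = [[values.get(c.upper(), 0.0) for c in line.rstrip()] for line in data_file]
print(rows)
```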
Is this what you're looking for?
input.txt
AAB BBC ABC
keyvalue.txt
A 123
B 456
C 789
script.py
def your_func(input_file):
    char_value = {}
    with open('keyvalue.txt', 'r') as f:
        for row in f:
            key, value = row.split()
            char_value[key] = value
    res = []
    with open(input_file) as f:
        for row in f:
            for word in row.split():
                for c in word:
                    # Append only if the key exists
                    if c in char_value:
                        res.append(char_value[c])
    return '*'.join(res)
print(your_func("input.txt"))
# >>> 123*123*456*456*456*789*123*456*789

Need to copy the contents of a text file to a dictionary

I have a text file such that each line consists of one word followed by a comma-separated list of that word's synonyms. So for example, one line would look like this:
word, synonym1, synonym2, synonym3
so the first word in each line is the key and the rest are its values
Solution
with open('file_name.txt') as fobj:
    synonyms = {}
    for line in fobj:
        key, *values = [entry.strip() for entry in line.split(',')]
        synonyms[key] = values
produces this dictionary synonyms:
{'word1': ['synonym11', 'synonym12', 'synonym13'],
'word2': ['synonym21', 'synonym22', 'synonym23']}
for this file content:
word1, synonym11, synonym12, synonym13
word2, synonym21, synonym22, synonym23
Explanation
Open the file using with open('file_name.txt') as fobj: This opens the file with the promise to close it after dedenting.
Make a new empty dictionary: synonyms = {}.
Go through all lines for line in fobj:.
Split each line at the comma and remove extra white space from each word: [entry.strip() for entry in line.split(',')].
Use the new *-way to unpack an iterable in Python 3 to split key and values key, *values =.
Add the values to the result synonyms[key] = values.
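The split-and-unpack steps above can be tried on a single line without any file. A short sketch (the sample strings are illustrative):

```python
line = "word1, synonym11, synonym12\n"

# Split at commas, strip whitespace, then take the first item as the key.
key, *values = [entry.strip() for entry in line.split(',')]
print(key, values)

# A word without synonyms yields an empty list rather than an error.
lonely_key, *no_values = [entry.strip() for entry in "word3\n".split(',')]
print(lonely_key, no_values)
```

Note that random.choice raises IndexError on an empty list, so words without synonyms would need a check before picking one.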
Addition:
Print word and a random synonym:
import random
for word, syns in synonyms.items():
    print(word, random.choice(syns))
prints:
word1 synonym12
word2 synonym22

How to create a dictionary that contains key‐value pairs from a text file

I have a text file (one.txt) that contains an arbitrary number of key‐value pairs (where the key and value are separated by a colon – e.g., x:17). Here are some (minus the numbers):
mattis:turpis
Aliquam:adipiscing
nonummy:ligula
Duis:ultricies
nonummy:pretium
urna:dolor
odio:mauris
lectus:per
quam:ridiculus
tellus:nonummy
consequat:metus
I need to open the file and create a dictionary that contains all of the key‐value pairs.
So far I have opened the file with
file = []
with open('one.txt', 'r') as _:
    for line in _:
        line = line.strip()
        if line:
            file.append(line)
I opened it this way to get rid of newline characters and the last blank line in the text file. This gives me a list of the key-value pairs in Python.
I am not sure how to create a dictionary with the list key-value pairs.
Everything I have tried gives me an error. Some say something along the lines of
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Use str.split():
with open('one.txt') as f:
    # skip blank lines, which would otherwise raise the ValueError above
    d = dict(l.strip().split(':', 1) for l in f if l.strip())
split() lets you specify the separator : to split each line into the key and the value. You can then use them to populate a dictionary, here called mydict:
mydict = {}
with open('one.txt', 'r') as _:
    for line in _:
        line = line.strip()
        if line:
            key, value = line.split(':')
            mydict[key] = value
print(mydict)
output:
{'mattis': 'turpis', 'lectus': 'per', 'tellus': 'nonummy', 'quam': 'ridiculus', 'Duis': 'ultricies', 'consequat': 'metus', 'nonummy': 'pretium', 'odio': 'mauris', 'urna': 'dolor', 'Aliquam': 'adipiscing'}
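If some values might themselves contain a colon, or some lines might have no colon at all, str.partition can be a more forgiving alternative to split. A sketch, where the function name parse_pairs and the url line in the sample are made up for illustration:

```python
def parse_pairs(lines):
    """Build a dict from 'key:value' lines, splitting at the first colon only."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, such as a trailing empty line
        key, sep, value = line.partition(':')
        if not sep:
            continue  # no colon on this line, so there is no pair to store
        pairs[key] = value  # a repeated key keeps its last value
    return pairs


sample = [
    "mattis:turpis\n",
    "nonummy:ligula\n",
    "nonummy:pretium\n",
    "url:http://example.com\n",  # hypothetical value containing a colon
    "\n",
]
print(parse_pairs(sample))
```

As in the output above, a key that appears twice (nonummy) ends up with the value from its last line.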
