converting user details stored in a text file into a dictionary - python

I have tried converting the text file into a dictionary using the code below:
d = {}
with open('staff.txt', 'r') as file:
    for line in file:
        (key, val) = line.split()
        d[str(key)] = val
print(d)
The contents of the file staff.txt:
username1 jaynwauche
password1 juniornwauche123
e_mail1 juniornwauche#gmail.com
Fullname1 Junior Nwauche
Error: too many values to unpack
What am I doing wrong?

According to your file, the last line contains three words; split() breaks it into three values, but you only have two variables to unpack them into.
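One way to handle that, sketched below, is to keep the first word as the key and re-join everything after it as the value (this assumes the first word of every non-empty line is the key):
d = {}
with open('staff.txt', 'r') as file:
    for line in file:
        if not line.strip():        # skip blank lines, if any
            continue
        key, *rest = line.split()   # the first word becomes the key
        d[key] = " ".join(rest)     # everything after it becomes the value
print(d)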

By default line.split() splits on any run of whitespace, so the last line produces three values for only two variables. You can limit the split to the first space with line.split(' ', 1), so that everything after the first space becomes the value. Try it like this:
d = {}
with open('staff.txt', 'r') as file:
    for line in file:
        (key, val) = line.split(' ', 1)
        d[str(key)] = val.strip()
print(d)
This splits each line only at the first space, so any further spaces stay inside the value.

Related

How to convert a file with words on different newlines into a dictionary based on values each word has?

I am trying to convert a file, where every word is on a separate line, into a dictionary where the keys are the word lengths and the values are lists of the words with that length.
The first part of my code has removed the newline characters from the text file, and now I am trying to organize the dictionary based on the values a word has.
with open(dictionary_file, 'r') as file:
    wordlist = file.readlines()
    print([k.rstrip('\n') for k in wordlist])
    dictionary = {}
    for line in file:
        (key, val) = line.split()
        dictionary[int(key)] = val
    print(dictionary)
However, I keep getting an error that there aren't enough values to unpack, even though I'm sure I have already removed the newline characters from the original text file. At other times it only prints the words without the newlines, but they aren't grouped by word length. Any help would be appreciated, thanks! :)
(key, val) = line.split()
^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
I'm not sure why you're trying to use line.split(). All you need is the length of the word, so you can use the len() function. Also, you can use collections.defaultdict to make this code shorter. Like this:
import collections

words = collections.defaultdict(list)
with open('test.txt') as file:
    for line in file:
        word = line.strip()
        words[len(word)].append(word)
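For example, with a hypothetical test.txt containing cat, horse and dog on separate lines, words ends up as defaultdict(<class 'list'>, {3: ['cat', 'dog'], 5: ['horse']}).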
Try this:
with open(dictionary_file, 'r') as file:
    dictionary = {}
    for line in file:
        word = line.strip()
        dictionary.setdefault(len(word), []).append(word)
    print(dictionary)

How do I split a txt file based on a condition of a certain element in a certain order list (Python)

So I need to split a txt file into a dictionary.
The txt file could look like this:
Keyone -2
key-two 1
Key'Three -3
Key four-here 5
I think I would need to go through the list in reverse to check whether the second-to-last element is either a " " or a "-", but since there could also be a "-" between the words in the key, I'm a bit confused about how to approach this.
I need the dict to look like {str(key): int(value)}.
My attempt so far looks like this:
for line in file:
    a = line.split()
    value = a[-1]
    key = line[0:-2]
    key = key.replace("-", "")
Try the following code:
# Define input
txt = "Keyone -2\nkey-two 1\nKey'Three -3\nKey four-here 5"
print(txt)

# Split the text by newlines
lines = txt.split('\n')
print(lines)

# Iterate over all lines
d = {}
for line in lines:
    parts = line.split(' ')
    # The key is everything before the last space
    key = " ".join(parts[:-1])
    # The value is the element after the last space
    value = parts[-1]
    # Assuming it can only be an integer
    value = int(value)
    d[key] = value
print(d)
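With the sample input above, this should print {'Keyone': -2, 'key-two': 1, "Key'Three": -3, 'Key four-here': 5}.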
with open ("text.txt") as f:
for i in f:
a=i.split()
value=a[-1]
key=i[0:-2]
#print(type(key))
key=key.replace("-","")
d[key[0:-1]]=value
print(d)
The following is an answer using regex:
import re

data_to_parse = """
Keyone -2
key-two 1
Key'Three -3
Key four-here 5
"""

data_to_parse = data_to_parse.splitlines()
pattern = r" -?\d"

new = {}
for line in data_to_parse:
    if re.findall(pattern, line):
        x = re.findall(pattern, line)
        #print(line[line.find(x[0]) - 1:])
        new[line[:line.find(x[0])].strip()] = line[line.find(x[0]):].strip()
print(new)
The output is:
{'Keyone': '-2', 'key-two': '1', "Key'Three": '-3', 'Key four-here': '5'}
EDITED:
If the values need to be integers, change the line as follows:
new[line[:line.find(x[0])].strip()] = int(line[line.find(x[0]):].strip())
The output then becomes:
{'Keyone': -2, 'key-two': 1, "Key'Three": -3, 'Key four-here': 5}

Use a file to search another file and print lines matching a pattern to first file

Python noob here. I've been smashing my head trying to do this, tried several Unix tools and I'm convinced that python is the way to go.
I have two files, File1 has headers and numbers like this:
>id1
77
>id2
2
>id3
2
>id4
22
...
Note that each id is unique, but the number assigned to it may repeat. I have several files like this, all with the same number of headers (~500).
File2 has all the numbers that appear in File1, each followed by a sequence:
1
ATCGTCATA
2
ATCGTCGTA
...
22
CCCGTCGTA
...
77
ATCGTCATA
...
Note that each sequence id is unique, as are the sequences that follow them. I have the same number of files as for File1, but the number of sequences within each File2 may vary (~150).
My desired output is File1 with the sequences from File2; it is important that File1 maintains its original order:
>id1
ATCGTCATA
>id2
ATCGTCGTA
>id3
ATCGTCGTA
>id4
CCCGTCGTA
My approach is to extract the numbers from File1 and use them as patterns to match in File2. First I am trying to make this work with only a pair of files. Here is what I have achieved:
#!/usr/bin/env python
import re
datafile = 'protein2683.fasta.txt.named'
schemaseqs = 'protein2683.fasta'
with open(datafile, 'r') as f:
    datafile_lines = set([line.strip() for line in f]) #maybe I could use regex to get only lines with number as pattern?
print(datafile_lines)
outputlist = []
with open(schemaseqs, 'r') as f:
    for line in f:
        seqs = line.split(',')[0]
        if seqs[1:-1] in datafile_lines:
            outputlist.append(line)
print(outputlist)
This outputs a mix of patterns from File1 and the sequences from File2. Any help is appreciated.
PS: I am open to modifications in the file structure; I tried substituting the \n in File2 with "," to no avail.
import re

datafile = 'protein2683.fasta.txt.named'
schemaseqs = 'protein2683.fasta'

datafile_lines = []
d = {}
prev = None
with open(datafile, 'r') as f:
    i = 0
    for line in f:
        if i % 2 == 0:
            d[line.strip()] = 0
            prev = line.strip()
        else:
            d[prev] = line.strip()
        i += 1

new_d = {}
with open(schemaseqs, 'r') as f:
    i = 0
    prev = None
    for line in f:
        if i % 2 == 0:
            new_d[line.strip()] = 0
            prev = line.strip()
        else:
            new_d[prev] = line.strip()
        i += 1

for key, value in d.items():
    if value in new_d:
        d[key] = new_d[value]
print(d)

with open(datafile, 'w') as filee:
    for k, v in d.items():
        filee.writelines(k)
        filee.writelines('\n')
        filee.writelines(v)
        filee.writelines('\n')
Creating the two dictionaries is easy, and then you just map between their values.
Since the files are so neatly organized, I wouldn't use a set to store the lines. Sets don't enforce order, and the order of these lines conveys a lot of information. I also wouldn't use Regex; it's probably overkill for the task of parsing individual lines, but not powerful enough to keep track of which ID corresponds to each gene sequence.
Instead, I would read the files in the opposite order. First, read the file with the gene sequences and build a mapping of IDs to genes. Then read in the first file and replace each id with the corresponding value in that mapping.
If the IDs are a continuous sequence (1, 2, 3... n, n+1), then a list is probably the easiest way to store them. If the file is already in order, you don't even have to pay attention to the ID numbers; you can just skip every other row and append each gene sequence to an array in order. If they aren't continuous, you can use a dictionary with the IDs as keys. I'll use the dictionary approach for this example:
id_to_gene_map = {}
with open(file2, 'r') as id_to_gene_file:
    for line_number, line in enumerate(id_to_gene_file, start=1):
        if line_number % 2 == 1:  # Update ID on odd numbered lines, including line 1
            current_id = line
        else:
            id_to_gene_map[current_id] = line  # Map previous line's ID to this line's value

with open(file1, 'r') as input_file, open('output.txt', 'w') as output_file:
    for line in input_file:
        if not line.startswith(">"):  # Keep ">id1" lines unchanged
            line = id_to_gene_map[line]  # Otherwise, replace with the corresponding gene
        output_file.write(line)
In this case, the IDs and values both have trailing newlines. You can strip them out, but since you'll want to add them back in for writing the output file, it's probably easiest to leave them alone.
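For completeness, here is a minimal sketch of the list-based variant described above, assuming File2's IDs really are the continuous sequence 1, 2, 3, ... in order (the file names here are placeholders):
genes = []
with open('file2.txt') as id_to_gene_file:
    for line_number, line in enumerate(id_to_gene_file, start=1):
        if line_number % 2 == 0:  # even-numbered lines hold the sequences
            genes.append(line.strip())

with open('file1.txt') as input_file, open('output.txt', 'w') as output_file:
    for line in input_file:
        if line.startswith(">"):
            output_file.write(line)  # keep ">id1" headers unchanged
        else:
            output_file.write(genes[int(line) - 1] + "\n")  # ID k is stored at index k - 1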

How to extract a subset from a text file and store it in a separate file?

I am currently trying to extract information from a text file using Python. I want to extract a subset from the file and store it in a separate file from everywhere it occurs in the text file. To give you an idea of what my file looks like, here is a sample:
C","datatype":"double","value":25.71,"measurement":"Temperature","timestamp":1573039331258250},
{"unit":"%RH","datatype":"double","value":66.09,"measurement":"Humidity","timestamp":1573039331258250}]
Here, I want to extract "value" and the corresponding number beside it. I have tried various techniques but have been unsuccessful. I tried to iterate through the file and stop where I have "value", but that did not work.
Here is a sample of the code:
with open("DOTemp.txt") as openfile:
for line in openfile:
for part in line.split():
if "value" in part:
print(part)
A simple solution to return the value marked by the "value" key:
with open("DOTemp.txt") as openfile:
for line in openfile:
line = line.replace('"', '')
for part in line.split(','):
if "value" in part:
print(part.split(':')[1])
Note that by default str.split() splits on whitespace. In the last line, if we printed element zero of the list it would just be "value". If you wish to use this as an int or float, simply cast it as such and return it.
First split using , (comma) as the delimiter, then split the resulting strings using : as the delimiter.
If required, trim the leading and trailing double quotes before comparing against "value".
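A minimal sketch of what that could look like, reusing the DOTemp.txt file from the question:
with open("DOTemp.txt") as openfile:
    for line in openfile:
        for part in line.split(','):                # split on commas first
            key, _, value = part.partition(':')     # then split each piece at the colon
            if key.strip().strip('"') == 'value':   # trim the surrounding double quotes
                print(float(value))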
The code below will work for you:
file1 = open("untitled.txt","r")
data = file1.readlines()
#Convert to a single string
val = ""
for d in data:
val = val + d
#split string at comma
comma_splitted = val.split(',')
#find the required float
for element in comma_splitted:
if 'value' in element:
out = element.split('"value":')[1]
print(float(out))
I assume your input file is a JSON string (a list of dictionaries), looking at the file sample. If that's the case, perhaps you can try this:
import json

# Assuming the whole file is one JSON list and each record is a dictionary
with open("DOTemp.txt") as openfile:
    records = json.loads(openfile.read())

out_lines = [str(record.get('value')) for record in records]

with open('DOTemp_out.txt', 'w') as outfile:
    outfile.write("\n".join(out_lines))

How to create a dictionary that contains key‐value pairs from a text file

I have a text file (one.txt) that contains an arbitrary number of key‐value pairs (where the key and value are separated by a colon – e.g., x:17). Here are some (minus the numbers):
mattis:turpis
Aliquam:adipiscing
nonummy:ligula
Duis:ultricies
nonummy:pretium
urna:dolor
odio:mauris
lectus:per
quam:ridiculus
tellus:nonummy
consequat:metus
I need to open the file and create a dictionary that contains all of the key‐value pairs.
So far I have opened the file with
file = []
with open('one.txt', 'r') as _:
    for line in _:
        line = line.strip()
        if line:
            file.append(line)
I opened it this way to get rid of the newline characters and the last blank line in the text file. This gives me a list of the key-value pair strings within Python.
I am not sure how to create a dictionary from that list of key-value pairs.
Everything I have tried gives me an error. Some say something along the lines of
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Use str.split():
with open('one.txt') as f:
    d = dict(l.strip().split(':') for l in f if l.strip())
split() lets you specify the separator : so that each line is broken into a key string and a value string. You can then use them to populate a dictionary, for example mydict:
mydict = {}
with open('one.txt', 'r') as _:
    for line in _:
        line = line.strip()
        if line:
            key, value = line.split(':')
            mydict[key] = value
print(mydict)
output:
{'mattis': 'turpis', 'lectus': 'per', 'tellus': 'nonummy', 'quam': 'ridiculus', 'Duis': 'ultricies', 'consequat': 'metus', 'nonummy': 'pretium', 'odio': 'mauris', 'urna': 'dolor', 'Aliquam': 'adipiscing'}
