Reading a file and storing contents into a dictionary - Python

I'm trying to store contents of a file into a dictionary and I want to return a value when I call its key. Each line of the file has two items (acronyms and corresponding phrases) that are separated by commas, and there are 585 lines. I want to store the acronyms on the left of the comma to the key, and the phrases on the right of the comma to the value. Here's what I have:
def read_file(filename):
    infile = open(filename, 'r')
    for line in infile:
        line = line.strip()  # remove newline character at end of each line
        phrase = line.split(',')
        newDict = {'phrase[0]':'phrase[1]'}
    infile.close()
And here's what I get when I try to look up the values:
>>> read_file('acronyms.csv')
>>> acronyms=read_file('acronyms.csv')
>>> acronyms['ABT']
Traceback (most recent call last):
File "<pyshell#65>", line 1, in <module>
acronyms['ABT']
TypeError: 'NoneType' object is not subscriptable
>>>
If I add return newDict to the end of the body of the function, it obviously just returns {'phrase[0]':'phrase[1]'} when I call read_file('acronyms.csv'). I've also tried {phrase[0]:phrase[1]} (no single quotation marks) but that returns the same error. Thanks for any help.

def read_acronym_meanings(path: str):
    with open(path) as f:
        acronyms = dict(l.strip().split(',') for l in f)
    return acronyms
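A quick usage sketch (the file name and the 'ABT' key come from the question). One caveat: dict() needs each split to yield exactly two fields, so if a phrase can itself contain commas, split at most once:
acronyms = read_acronym_meanings('acronyms.csv')
print(acronyms['ABT'])   # the phrase stored for 'ABT'

# Hypothetical variant that tolerates commas inside the phrase (maxsplit=1):
def read_acronym_meanings(path: str):
    with open(path) as f:
        return dict(line.strip().split(',', 1) for line in f)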

First off, you are creating a new dictionary at every iteration of the loop. Instead, create one dictionary before the loop and add an entry each time you go over a line. Second, 'phrase[0]' includes the quotation marks, which makes it a literal string instead of a reference to the phrase variable that you just created.
Also, try using the with keyword so that you don't have to explicitly close the file later.
def read(filename):
    newDict = {}
    with open(filename, 'r') as infile:
        for line in infile:
            line = line.strip()  # remove newline character at end of each line
            phrase = line.split(',')
            newDict[phrase[0]] = phrase[1]
    return newDict

def read_file(filename):
    infile = open(filename, 'r')
    newDict = {}
    for line in infile:
        line = line.strip()  # remove newline character at end of each line
        phrase = line.split(',', 1)  # split a maximum of one time
        newDict.update({phrase[0]: phrase[1]})
    infile.close()
    return newDict
Your original creates a new dictionary every iteration of the loop.

Related

How to read a value from a file separated by tabs in Python?

I have a text file with this format:
ConfigFile 1.1
;
; Version: 4.0.32.1
; Date="2021/04/08" Time="11:54:46" UTC="8"
;
Name
    John Legend
Type
    Student
Number
    s1054520
I would like to get the value of Name, Type, or Number. How do I get it?
I tried this method, but it does not solve my problem:
import re
f = open("Data.txt", "r")
file = f.read()
Name = re.findall("Name", file)
print(Name)
My expected output is John Legend.
Can anyone help me, please? I'd really appreciate it. Thank you.
First of all, re.findall searches for all occurrences that match a given pattern, so in your case you are finding every "Name" in the file, because that is the pattern you are looking for.
On the other hand, the program has no way of knowing that "John Legend" is the name; it only knows that it is the line after the word "Name".
In your case I would suggest the following approach:
1. Find the line number of "Name"
2. Read the next line
3. Get the name without the whitespace
If there is more than one Name, this will work as well.
The final code looks like this:
def search_string_in_file(file_name, string_to_search):
    """Search for the given string in file and return lines containing that string,
    along with line numbers"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            line_number += 1
            if string_to_search in line:
                # If yes, then add the line number & line as a tuple in the list
                list_of_results.append((line_number, line.rstrip()))
    # Return list of tuples containing line numbers and lines where string is found
    return list_of_results

file = open('Data.txt')
content = file.readlines()
matched_lines = search_string_in_file('Data.txt', 'Name')
print('Total Matched lines : ', len(matched_lines))
for i in matched_lines:
    print(content[i[0]].strip())
Here I'm going through each line and when I encounter Name I will add the next line (you can directly print too) to the result list:
import re

def print_hi(name):
    result = []
    regexp = re.compile(r'Name*')
    gotname = False
    with open('test.txt') as f:
        for line in f:
            if gotname:
                result.append(line.strip())
                gotname = False
            match = regexp.match(line)
            if match:
                gotname = True
    print(result)

if __name__ == '__main__':
    print_hi('test')
Assuming those label lines are in the sequence found in the file, you can simply scan for them:
labelList = ["Name", "Type", "Number"]
captures = dict()
with open("Data.txt", "rt") as f:
    for label in labelList:
        while not f.readline().startswith(label):
            pass
        captures[label] = f.readline().strip()
for label in labelList:
    print(f"{label} : {captures[label]}")
I wouldn't use a regex, but rather make a parser for the file type. The rules might be:
1. The first line can be ignored.
2. Any line that starts with ; can be ignored.
3. Every line with no leading whitespace is a key.
4. Every line with leading whitespace is a value belonging to the last key.
I'd start with a generator that can return to you any unignored line:
def read_data_lines(filename):
    with open(filename, "r") as f:
        # skip the first line
        f.readline()
        # read until no more lines
        while line := f.readline():
            # skip lines that start with ;
            if not line.startswith(";"):
                yield line
Then fill up a dict by following rules 3 and 4:
def parse_data_file(filename):
    data = {}
    key = None
    for line in read_data_lines(filename):
        # No starting whitespace makes this a key
        if not line.startswith(" "):
            key = line.strip()
        # Starting whitespace makes this a value for the last key
        else:
            data[key] = line.strip()
    return data
Now at this point you can parse the file and print whatever key you want:
data = parse_data_file("Data.txt")
print(data["Name"])

Using dict keys to find a matching line in a file, and once found, appending the corresponding values in the file

I have a dict with a few {key, value} pairs. I also have a file with some content.
The file is something like this:
some random text
...
...
text-which-matches-a-key
some important lines
...
...
some other important text until end of file
What I want is to search/iterate through the file until a line matches a key of the dict, then append the corresponding value before/after some important lines.
What I've tried to do is this:
with open('file', 'a+') as f:
    for key in a:
        if f.readlines() == key:
            f.writelines(a[key])
f.close()
where a is a dict, with many key,value pairs.
I'd be happy if the results are something like:
some random text
...
...
text-which-matches-a-key
some important lines
value corresponding to the key
...
...
some other important text until end of file
or:
some random text
...
...
text-which-matches-a-key
value corresponding to the key
some important lines
...
...
some other important text until end of file
Any help is appreciated.
P.S: Using Python 2.7, on PyCharm, on Windows 10.
The script below cannot insert multiple dictionary values; only the value of the last dictionary key that appears before 'some important lines' in the file is inserted.
dictionary = {'text-which-matches-a-key': 'value corresponding to the key'}

# Open file and fill a list in which each element is a line.
f = open('file', 'r')
lines = f.readlines()
f.close()

# Empty the file.
f = open('file', 'w')
f.close()

# Insert dictionary value at the right place.
key_occurs_in_text = False
for index, line in enumerate(lines):
    '''Remove the newline character '\n'
    to the right of the strings in the file.
    The lines don't match dictionary keys if
    the dictionary keys don't have newlines
    appended.'''
    line = line.rstrip()
    # Check if any line is a key in the dictionary.
    if line in dictionary.keys():
        key_occurs_in_text = True
        key_occurring_in_text = line
    ''' 'some important lines' is reached and a key
    of the dictionary has appeared as a line in the
    file. Save the list index which corresponds to
    the line after or before 'some important lines' in
    the variable insert_index. '''
    if 'some important lines' == line and key_occurs_in_text:
        insert_index = index + 1
        # insert_index = index - 1

'''A line in the file
is a key in the dictionary.
Insert the value of the key at the index we saved
in insert_index.
Prepend and append newline characters to match
the file format.'''
if key_occurs_in_text:
    lines.insert(insert_index, '\n' + dictionary[key_occurring_in_text] + '\n')

# Write the changed file content to the empty file.
f = open('file', 'w')
for line in lines:
    f.write(line)
f.close()
Your second version, i.e. inserting directly after the key line, is quite simple. If you don't mind loading the whole file into memory, it's just:
with open('file', 'r') as f:
    txt = f.readlines()

with open('file', 'w') as f:
    for line in txt:
        f.write(line)
        if line.strip() in block:
            f.write(block[line.strip()])
with block being your dictionary.
However, if you do not want to load the whole file at once, you have to write to a different file than your source file, because inserting into a file (as opposed to overwriting portions of it) is not possible:
with open('source_file', 'r') as fs, open('target_file', 'w') as ft:
    for line in fs:
        ft.write(line)
        if line.strip() in block:
            ft.write(block[line.strip()])
Of course it would be possible to e.g. rename the source file first and then write everything to the original filename.
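A minimal sketch of that rename-first approach (hypothetical backup name; block is the dictionary from above, and os.rename is used rather than os.replace since the question targets Python 2.7):
import os

os.rename('file', 'file.bak')   # move the original out of the way first
with open('file.bak', 'r') as fs, open('file', 'w') as ft:
    for line in fs:
        ft.write(line)
        if line.strip() in block:
            ft.write(block[line.strip()])
# os.remove('file.bak')         # optionally delete the backup afterwards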
Regarding the first version, i.e. leaving several important lines after the key and inserting the value after those lines, well, that would require a proper definition of how to decide which or how many lines are important.
However, if it's just about a fixed number of lines N after the key line:
with open('file', 'r') as f:
    txt = f.readlines()

with open('file', 'w') as f:
    N = 3
    i = -1
    for line in txt:
        f.write(line)
        if i == N:
            f.write(block[key])
            i = -1
        if line.strip() in block:
            key = line.strip()
            i = 0
        if i >= 0:
            i += 1
... or without loading all at once into memory:
with open('source_file', 'r') as fs, open('target_file', 'w') as ft:
    N = 3
    i = -1
    for line in fs:
        ft.write(line)
        if i == N:
            ft.write(block[key])
            i = -1
        if line.strip() in block:
            key = line.strip()
            i = 0
        if i >= 0:
            i += 1
There's a difference between readline() and readlines(): the former reads one line, the latter reads all lines and returns a list of strings (a short sketch follows these notes).
See: https://docs.python.org/2.7/tutorial/inputoutput.html
It'd be easier to just read the entire file, apply your changes to it, and write it back to a file once you're done, rather than trying to edit the file in place.
See: Editing specific line in text file in python
You don't have to manually close the file when you're using the with-statement. The file will automatically close when you leave the with-block.
a+ opens the file for reading and appending, so writes always go to the end of the file; r+ opens it for reading and writing at the current cursor position (keep in mind that writes overwrite existing bytes rather than inserting new lines).
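To illustrate the readline()/readlines() point above, a tiny sketch (hypothetical file name):
with open('file') as f:
    first = f.readline()    # reads exactly one line, returned as a string
    rest = f.readlines()    # reads all remaining lines into a list of strings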
Try this:
import fileinput
import sys

file = fileinput.FileInput(your_text_file, inplace=True)
for line in file:
    for key, value in yourdictionary.iteritems():
        # str.replace returns a new string, so reassign it
        line = line.replace(key, key + '\n' + value)
    sys.stdout.write(line)  # with inplace=True, stdout is redirected into the file
file.close()
After trying a few things mentioned here, and tinkering around with files and dictionaries, I finally came up with this snippet that works for me:
with open("input") as f:
data_file = f.read()
f.close()
data = data_file.splitlines()
f = open('output', 'w')
for line in data:
if line.strip() in b.keys():
line = line.strip() + '\n' + b[line.strip()].rstrip() + '\n'
f.writelines(line)
else:
f.writelines(line + '\n')
f.close()
where data is the content of the original file, and b is my dictionary of keys and values.
I don't know if answering my own question is allowed or not, but it got me the right output, hence posting it anyway.

Parsing a file from first char in each line

I'm trying to group a file by the first character in each line of the file.
For example, the file:
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
I need to group everything starting with the first s/ up to the next s/. I don't think split() will work since it would remove the delimiter.
Desired end result:
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
s/1/1/2/3/4/5///6
p/22/LLL/GP/1/3//
x//-/-/-/1/5/-/-/
I'd prefer to do this without the re module if possible (is it?)
Edit: Attempts:
The following gets me the values in groups using list comprehension:
with open('/file/path', 'r') as f:
    content = f.read()

groups = ['s/' + group for group in content.split('s/')[1:]]
Since the s/ is the first character in the sequence, I use the [1:] to avoid having an element of just s/ in groups[0].
Is there a better way? Or is this the best?
Assuming the first line of the file starts with 's/' you could try something like this:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/'):
            groups.append('')
        groups[-1] += line
To deal with files that don't start with 's/' and have the first element be all lines until the first 's/', we can make a small change and add in an empty string on the first line:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/') or not groups:
            groups.append('')
        groups[-1] += line
Alternatively, if we want to skip lines until the first 's/', we can do the following:
groups = []
with open('test.txt', 'r') as f:
    for line in f:
        if line.startswith('s/'):
            groups.append('')
        if groups:
            groups[-1] += line
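A quick way to check the result of any of these variants (a sketch, assuming test.txt holds the six sample lines from the question):
for i, group in enumerate(groups, start=1):
    print('group', i)
    print(group, end='')
# With the sample input this prints two groups of three lines each,
# and every group starts with an 's/' line.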

Reading a text file then storing first and third word as key and value respectively in a dictionary

Here's the code:
def predator_eat(file):
    predator_prey = {}
    file.read()
    for line in file:
        list = file.readline(line)
        list = list.split(" ")
        predator_prey[list[0]] = list[2]
    print(predator_prey)

predator_eat(file)
The text file has three words per line, with \n at the end of each line. I previously opened the file and stored the resulting file object in the variable file using file = open(filename.txt, r). The print statement shows an empty dictionary, so keys and values aren't being added. Please help.
Your first call to .read() consumes the entire file contents, leaving nothing for the loop to iterate over. Remove it. And that .readline() call does nothing useful. Remove that too.
This should work:
def predator_eat(file):
    predator_prey = {}
    for line in file:
        words = line.split(" ")
        predator_prey[words[0]] = words[2]
    print(predator_prey)
Cleaning up your code a little:
def predator_eat(f):
    predator_prey = {}
    for line in f:
        rec = line.strip().split(" ")
        predator_prey[rec[0]] = rec[2]
    return predator_prey

with open(path) as f:
    print(predator_eat(f))
You are basically shadowing Python built-ins. Change file to something like fileToRead, and list to something else like totalList.
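A tiny illustration (hypothetical snippet) of why shadowing the built-in list bites later:
list = "owl mouse".split(" ")   # 'list' now names this particular list...
nums = list((1, 2, 3))          # ...so this raises TypeError: 'list' object is not callable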

Error when reading a file with blank lines at the end

I'm trying to read a file line by line and do some stuff. The problem is that if I add a bunch of blank lines to the end of the file, I'm getting an exception (list index out of range).
import socket

def check_ip(address):
    try:
        socket.inet_aton(address)
        return True
    except:
        return False

def myfunction():
    with open(filename, 'r') as f:
        for line in f.readlines():
            if not line: continue
            tokens = line.strip().split()
            if not check_ip(tokens[0]): continue
            # do some stuff
Your empty line test doesn't take into account whitespace.
Use:
if not line.strip(): continue
otherwise you end up with an empty tokens list for those lines.
You don't have to call str.strip() when also using str.split() with no arguments; that call already strips leading and trailing whitespace:
tokens = line.split()
Note that you don't need (nor want) to use f.readlines(); you can iterate over the file object directly:
for line in f:
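Putting those points together, a sketch of the corrected loop (filename is assumed to be defined elsewhere, and the loop body is still a placeholder):
import socket

def check_ip(address):
    try:
        socket.inet_aton(address)
        return True
    except socket.error:
        return False

def myfunction():
    with open(filename, 'r') as f:   # 'filename' assumed defined elsewhere
        for line in f:               # iterate the file object directly
            tokens = line.split()    # split() with no args also strips whitespace
            if not tokens:           # skips blank and whitespace-only lines
                continue
            if not check_ip(tokens[0]):
                continue
            # do some stuff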
