Reading (somewhat) unstructured data from a text file to create Python Dictionary

I have the following data in a text file named 'user_table.txt':
Jane - valentine4Me
Billy
Billy - slick987
Billy - monica1600Dress
Jason - jason4evER
Brian - briguy987321CT
Laura - 100LauraSmith
Charlotte - beutifulGIRL!
Christoper - chrisjohn
I'm trying to read this data into a Python dictionary using the following code:
users = {}
with open("user_table.txt", 'r') as file:
    for line in file:
        line = line.strip()
        # if there is no password
        if '-' in line == False:
            continue
        # otherwise read into a dictionary
        else:
            key, value = line.split('-')
            users[key] = value
print(users)
I get the following error:
ValueError: not enough values to unpack (expected 2, got 1)
This most likely happens because the first instance of Billy doesn't have a '-' to split on.
If that's the case, what's the best way to work around this?
Thanks!

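Your condition is wrong. Python chains comparison operators, so '-' in line == False is evaluated as ('-' in line) and (line == False), which is never true for a string; the lines without a '-' are therefore never skipped. It must be:
for line in file:
    line = line.strip()
    # if there is no password
    # if '-' not in line: <- another option
    if ('-' in line) == False:
        continue
    # otherwise read into a dictionary
    else:
        key, value = line.split('-')
        users[key] = value
or
for line in file:
    line = line.strip()
    # if there is a password
    if '-' in line:
        key, value = line.split('-')
        users[key] = value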
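
A self-contained sketch of the second variant, under a few assumptions that are not in the question: the separator is exactly ' - ' (with the surrounding spaces, so the keys do not keep a trailing space), the sample lines are inlined instead of being read from user_table.txt, and parse_users is a hypothetical helper name. It uses str.partition, which always returns three parts, so the unpacking cannot fail on a line without a separator.

```python
def parse_users(lines):
    """Build {name: password} from 'name - password' lines, skipping the rest."""
    users = {}
    for line in lines:
        # partition() always returns (head, separator, tail);
        # the separator is '' when ' - ' does not occur in the line.
        name, sep, password = line.strip().partition(" - ")
        if sep:
            users[name] = password
    return users


# Sample lines taken from the question; a real run would pass an open file.
sample = [
    "Jane - valentine4Me\n",
    "Billy\n",
    "Billy - slick987\n",
    "Billy - monica1600Dress\n",
]
users = parse_users(sample)
print(users)
```

Note that a dictionary keeps one value per key, so the later Billy entry replaces the earlier one.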

Related

How do I delete a certain line (given a line #) in a file through python?

I want to delete a line of text from a .txt file given an integer corresponding to the txt file's line number. For example, given the integer 2, delete line 2 of the text file.
I'm sort of lost on what to put into my program.
f = open('text1.txt','r+')
g = open('text2.txt',"w")
line_num = 0
search_phrase = "Test"
for line in f.readlines():
    line_num += 1
    if line.find(search_phrase) >= 0:
        print("text found a line" , line_num)
        decision = input("enter letter corresponding to a decision: (d = delete lines, s = save to new txt) \n")
        if decision == 'd':
            pass  # delete the current line (not written yet)
        if decision == 's':
            pass  # save the current line to a new file (not written yet)
Any help is appreciated! Thanks :)

This way:
with open('text1.txt','r') as f, open('text2.txt',"w") as g:
    to_delete=[2,4]
    for line_number, line in enumerate(f.readlines(), 1):
        if line_number not in to_delete:
            g.write(line)
        else:
            print(f'line {line_number}, "{line.rstrip()}" deleted')

Here it goes. Read all the lines into a list, remove the one you don't want, then write the list back:
with open('data/test.txt', 'r') as f:
    text = f.readlines() # all lines are read into a list and you can access it as a list
deleted_line = text.pop(1) # line 2 (index 1) is deleted and stored in the variable
print(text)
print(deleted_line)
with open('data/test.txt', 'w') as f:
    f.writelines(text) # reopening in write mode replaces the old contents with the new data
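
The read, pop, and rewrite steps can be wrapped in one function that takes the 1-based line number from the question. This is only a sketch: delete_line and the demo file name are hypothetical, and the whole file is held in memory, which is fine for small text files.

```python
import os
import tempfile


def delete_line(path, line_number):
    """Remove the 1-based line `line_number` from the file and return it."""
    if line_number < 1:
        raise ValueError("line_number is 1-based")
    with open(path, "r") as f:
        lines = f.readlines()
    removed = lines.pop(line_number - 1)  # IndexError if the file is too short
    with open(path, "w") as f:            # "w" truncates before writing
        f.writelines(lines)
    return removed


# Demo on a throwaway file so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as f:
        f.write("first\nsecond\nthird\n")
    removed = delete_line(demo_path, 2)
    with open(demo_path) as f:
        remaining = f.read()

print(repr(removed), repr(remaining))
```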

New column and column values get added to the next line

I want to add a new column and new values to it. I'm just using normal file handling to do it (just adding a delimiter). I actually did try using csv but the csv file would have one letter per cell after running the code.
#import csv
#import sys
#csv.field_size_limit(sys.maxsize)
inp = open("city2", "r")
inp2 = open("op", "r")
oup = open("op_mod.csv", "a+")
#alldata = []
count = 0
for line in inp2:
    check = 0
    if count == 0:
        count = count + 1
        colline = line + "\t" + "cities"
        oup.write(colline)
        continue
    for city in inp:
        if city in line:
            print(city, line)
            linemod = line + "\t" + city #adding new value to an existing row
            #alldata.append(linemod)
            oup.write(linemod) #writing the new value
            check = 1
            break
    if check == 0:
        check = 1
        #oup.write(line)
        #alldata.append(line)
    inp.close()
    inp = open("city2", "r")
#writer.writerows(alldata)
inp.close()
inp2.close()
oup.close()
Expected result:
existing fields/values ... new field/value
Actual result:
existing fields/values ... new line
new field/value ...next line

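Each line read from the file still ends with its line ending (a newline, possibly preceded by a carriage return), so anything you append to line lands on the next line of the output. Remove it with line.rstrip() before appending the new value, and write the newline yourself at the end of the row, similar to this answer:
Deleting carriage returns caused by line reading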
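
A minimal sketch of that fix, with made-up sample data held in io.StringIO objects in place of the question's "op" and "city2" files. The helper name add_city_column is hypothetical, and writing an empty cell for rows without a matching city is an assumption (the question's code skips such rows).

```python
import io


def add_city_column(rows, cities):
    """Yield each row with a tab-separated city column appended."""
    # Strip the line ending first so the new column stays on the same line.
    header, *body = [row.rstrip("\r\n") for row in rows]
    yield header + "\tcities\n"
    for row in body:
        # First city mentioned in the row, or an empty cell if none matches.
        city = next((c for c in cities if c in row), "")
        yield row + "\t" + city + "\n"


# Made-up stand-ins for the two input files.
rows = io.StringIO("id\ttext\n1\tflight to Paris\n2\tno match here\n")
cities = [c.strip() for c in io.StringIO("Paris\nOslo\n")]

result = list(add_city_column(rows, cities))
print(result)
```

The city names are stripped as well; otherwise each one carries its own newline and "city in line" rarely matches.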

Text file to dictionary then to print information in python

Hey everyone, I have an issue with a text file and putting it into a dictionary.
My code first gathers data from a website and writes it to a text file. From there I reopen the file and transfer the data from the text into a dictionary. In the while loop, I am getting this error:
key,value = line.split()
ValueError: too many values to unpack (expected 2)
I'm not sure why, or whether I'm using the wrong method to load the text file data into the "countryName" dictionary in the program.
def main():
    import requests
    webFile = "https://www.cia.gov/library/publications/the-world-factbook/rankorder/rawdata_2004.txt"
    data = requests.get(webFile) # connects to the file and gets a response object
    with open("capital.txt",'wb') as f:
        f.write(data.content) # write the data out to a file; wb is used since the content from the response object is returned as a binary object
    f.close()
    infile = open('capital.txt', 'r')
    line = infile.readline()
    countryName = {}
    while line != "":
        key,value = line.split()
        countryName[key] = value
        line = infile.readline()
    infile.close()
    userInput = input("Enter a country name: ")
    for i in countryName:
        while(userInput != 'stop'):
            print("The per capita income in",countryName[key], "is",countryName[value])
            userInput = input("Enter a country name: ")
main()
    while line != "":
        key,value = line.split()
        countryName[key] = value
        line = infile.readline()
    infile.close()
This is where my issue pops up.
I am trying to have the text file information be put into a dictionary.
Once done, I want to iterate through the dictionary and have the user enter a country name. In response, the program finds the country name and returns the name of the country and its per capita income.
So if "United States" is entered, the output would be "The per capita income in the United States is $54000". That's just an example to show what I'm doing.
The key being the country name and the value being the income.
countryName = {}
with open('capital.txt','r') as infile:
    for line in infile:
        num,key,value = line.split()
        countryName[key] = value
        num,key,value = infile.readline().split()
        #print(line)
print(countryName)

The issue is that each line splits into three values: the line number, the country, and the per-capita income:
fh.readline().split()
['2', 'Qatar', '$124,500']
To fix that, you can capture the extra value in a throwaway variable:
n, key, val = fh.readline().split()
However, there's a different problem: what if your country name has spaces in it?
['190', 'Micronesia,', 'Federated', 'States', 'of', '$3,400']
You can use starred assignment (the *name syntax) to capture any number of items in a variable:
myline = ['190', 'Micronesia,', 'Federated', 'States', 'of', '$3,400']
num, *key, value = myline
# key is now ['Micronesia,', 'Federated', 'States', 'of']
You can then use join to create a single string:
key = ' '.join(key)
# 'Micronesia, Federated States of'
Furthermore, it's important to keep your conventions consistent in your program. You use the with context manager to open and close an earlier file; that's good practice, so use it for the other file as well. Also, you can iterate over the file handle directly, like so:
with open('capital.txt', 'r') as infile:
    for line in infile: # no need to check the line contents or call readline
        num, *key, value = line.split()
        key = ' '.join(key)
        countryName[key] = value
# No need to close your file at the end either
Last, your print statement will raise a KeyError:
print("The per capita income in",countryName[key], "is",countryName[value])
You've already stored value at countryName[key], so the lookup is only against key, rather than value:
# key is already a string representing the country name
# countryName[key] is the value associated with that key
print("The per capita income in", key , "is", countryName[key])
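
Putting those pieces together, here is a small sketch that runs on the two sample rows shown above instead of the downloaded file. The names parse_incomes and describe are hypothetical, and skipping lines with fewer than three fields is an assumption about how blank or malformed lines should be handled.

```python
def parse_incomes(lines):
    """Map country name -> income string from 'rank name... income' lines."""
    incomes = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # blank or malformed line
        # The starred name collects every word between the rank and the income.
        _rank, *name_parts, income = parts
        incomes[" ".join(name_parts)] = income
    return incomes


sample = [
    "2 Qatar $124,500\n",
    "190 Micronesia, Federated States of $3,400\n",
    "\n",
]
incomes = parse_incomes(sample)


def describe(country):
    """Look the country up by key; the income is the stored value."""
    if country in incomes:
        return f"The per capita income in {country} is {incomes[country]}"
    return f"No data for {country}"


print(describe("Qatar"))
print(describe("Atlantis"))
```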

Python- How do I update an index of a for loop that iterates over lines in a file?

Using a for loop, I'm iterating over the lines in a file. Given this line:
line = ['641', '"Tornadus', ' (Incarnate Form)"', '"Flying"', '""', '5', '"TRUE"']
I need to reformat index [6] from '"TRUE"' to the boolean True.
Full expected output: d={'Tornadus, (Incarnate Form)': (641, 'Flying', None, 5, True)}
I used:
if "T" in line[6]: # format legendary if TRUE
line[6] = True
But I get this error:
Traceback (most recent call last):
  File "tester5p.py", line 305, in test_read_info_file_05
    self.assertEqual(read_info_file(DATAFILE),info_db5())
  File "/Users/kgreenwo/Desktop/student.py", line 52, in read_info_file
    line[5] = False
TypeError: 'str' object does not support item assignment
How can I assign it WITHIN the for loop?
To see my full code:
def read_info_file(filename):
    f = open(filename, 'r') # open file in read mode
    d = {} # initialize as empty
    count = 0 # helps to skip first line
    key = ""
    for line in f: # get each line from file
        if count != 0: # skip first line
            # 1___________________________________________open file,read, skip 1st line
            id_num = int(line[0]) # make id an integer
            # 2________________________________________________
            if ',' in line[1]: # two parts to fullname, changes indexes
                part1 = line[1].strip('"') # get format first part of name
                part2 = line[2].strip() # get format second part of name
                # 3______________
                fullname = part1 + part2
                key = fullname
                # 4______________
                type1 = line[3].strip('"')
                # 5--------------
                if line[4] == "": # check if there is not a second type
                    type2 = None # correct format
                else: # is a second type
                    type2 = line[4].strip('"') # format second type
                # 6______________
                generation = line[5] # format generation
                # 7_____________
                if "T" in line[6]: # format legendary if TRUE
                    line[6] = True
                    legendary = line[6]
                else: # format legendary if FALSE
                    line[6] = False
                    legendary = line[6]
            # 8______________________________________________one part to name
            else: # one part to name
                fullname = line[1].strip('"')
                # 9______________
                type1 = line[2].strip('"')
                # 10_____________
                if line[3] == "": # if no second type
                    type2 = None
                else:
                    type2 = line[3].strip('"') # there is a second type
                # 11_____________
                generation = line[4] # format generation
                # 12_____________
                if "T" in line[5]: # format legendary if TRUE
                    line[5] = True
                    legendary = line[5]
                else: # format legendary if FALSE
                    line[5] = False
                    legendary = line[5]
            value = (id_num, type1, type2, generation, legendary)
            d.update([(key, value)])
        count += 1
    return d
Reproducible example:
input: (don't forget to skip first line!)
info_file1 = '''"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
'''
Output:
d={'Bulbasaur':(1,'Grass','Poison',1,False)}

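It is quite unclear from your example, but my guess is that you never split the line. Iterating over a file yields strings, and a string is immutable, so line[6] = True raises the TypeError you see. Split each line into a list first:
for line in f:
    line = line.split(',')
Now you can work with the indexes and see whether you have more errors.
And if you then use:
if "T" in line[6]: # format legendary if TRUE
    line[6] = True
it will work, because list items can be reassigned.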
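
The difference can be seen in isolation with a short sketch. The three-field row below is made up for illustration and is much shorter than a real row from the question's file.

```python
line = '641,"Flying","TRUE"\n'  # made-up row, as read from a file

try:
    line[2] = True  # a str does not support item assignment
except TypeError as exc:
    error = str(exc)

fields = line.rstrip("\n").split(",")
fields[2] = "T" in fields[2]  # a list is mutable, so this assignment works

print(error)
print(fields)
```

A plain split(',') also cuts through commas inside quoted names, which is why the indexes shift in the question's code; the csv module handles quoted fields correctly.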

Your input file looks like a comma-separated values file. If it is, what you want is pretty easy.
Let's suppose your input file is literally this:
Input_file-43644346.txt
info_file1 = '''"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
641,"Tornadus', ' (Incarnate Form)","Flying",,5,"TRUE"
You could do something like this:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346.txt"
with open(input_file_name, newline='') as input_file:
    next(input_file) # skip first line
    record_extractor = csv.reader(input_file)
    d = {}
    for row in record_extractor:
        key = row[1].strip()
        row_truth = row[5] == "TRUE" # simplifying the boolean retrieving
        # Using conditional expressions
        row_second_type = row[3].strip() if row[3] else None
        # csv returns every field as a string, so convert the numeric ones
        output_row = (int(row[0]), row[2], row_second_type, int(row[4]), row_truth)
        d[key] = output_row
print("d=", d)
Here are some key points of this solution:
This example uses Python 3 syntax.
Using with makes sure that the input file is closed promptly.
Since a file object is also an iterator, you can skip the first line by using next().
csv.reader() gives you a list of strings for each row. It processes quoted strings the way you would expect.
The expression row[5] == "TRUE" evaluates to a boolean. You don't need to use an if statement.
An empty string is falsy; any non-empty string is truthy.
Conditional expressions can be used to change an empty string to None, like you wanted.
dict.update() is useful if you already have a dictionary or a list of tuples whose values you want to merge into a dictionary, but here you are better off using d[key] = value.
But my guess is that your file is more like this:
Input_file-43644346b.txt
"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
641,"Tornadus', ' (Incarnate Form)","Flying",,5,"TRUE"
You can then use csv.DictReader to read your data:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346b.txt"
with open(input_file_name, newline='') as input_file:
    record_extractor = csv.DictReader(input_file)
    d = {}
    for row in record_extractor:
        key = row["Name"].strip()
        row_truth = row["Legendary"] == "TRUE"
        row_second_type = row["Type 2"].strip() if row["Type 2"] else None
        # csv returns every field as a string, so convert the numeric ones
        output_row = (int(row["ID"]), row["Type 1"],
                      row_second_type, int(row["Generation"]), row_truth)
        d[key] = output_row
print("d=", d)
That lets you use the column names to identify the different parts of each row.
You can simplify your code even more by using a dictionary comprehension:
#!/usr/bin/env python3
import csv
input_file_name = "Input_file-43644346.txt"
with open(input_file_name, newline='') as input_file:
    next(input_file) # skip first line
    record_extractor = csv.reader(input_file)
    d = { row[1]: (int(row[0]),
                   row[2],
                   row[3].strip() if row[3] else None,
                   int(row[4]),
                   row[5] == "TRUE")
          for row in record_extractor }
print("d=", d)
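
To try the csv approach without creating a file, here is a sketch that feeds csv.DictReader from an in-memory io.StringIO. The sample rows are assumptions based on the question: the Tornadus name is written the way the expected output spells it, and the numeric columns are converted with int() to match that output.

```python
import csv
import io

# In-memory stand-in for the input file (header row included).
sample = io.StringIO(
    '"ID","Name","Type 1","Type 2","Generation","Legendary"\n'
    '1,"Bulbasaur","Grass","Poison",1,"FALSE"\n'
    '641,"Tornadus, (Incarnate Form)","Flying",,5,"TRUE"\n'
)

d = {
    row["Name"]: (
        int(row["ID"]),
        row["Type 1"],
        row["Type 2"] or None,       # empty string becomes None
        int(row["Generation"]),
        row["Legendary"] == "TRUE",  # comparison already yields a bool
    )
    for row in csv.DictReader(sample)
}
print(d)
```

The quoted comma inside the Tornadus name stays in one field, so the column positions never shift.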

Instead of reassigning it, I just did this and it worked:
if "T" in line[6]: # format legendary if TRUE
    legendary = True
else: # format legendary if FALSE
    legendary = False

Parsing Input File in Python

I have a plain text file with some data in it, that I'm trying to open and read using a Python (ver 3.2) program, and trying to load that data into a data structure within the program.
Here's what my text file looks like (file is called "data.txt")
NAME: Joe Smith
CLASS: Fighter
STR: 14
DEX: 7
Here's what my program looks like:
player_name = None
player_class = None
player_STR = None
player_DEX = None
f = open("data.txt")
data = f.readlines()
for d in data:
    # parse input, assign values to variables
    print(d)
f.close()
My question is, how do I assign the values to the variables (something like setting player_STR = 14 within the program)?

player = {}
f = open("data.txt")
data = f.readlines()
for line in data:
    # parse input, assign values to variables
    key, value = line.split(":", 1) # split only on the first colon
    player[key.strip()] = value.strip()
f.close()
Now the name of your player will be player['NAME'], and the same goes for all the other properties in your file.

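import re
# (.*) stops at the end of the line; [\w\s]+ would run on into the next line
pattern = re.compile(r'(\w+): (.*)')
with open("data.txt") as f:
    v = dict(pattern.findall(f.read()))
# the keys keep the upper-case spelling used in the file
player_name = v.get("NAME")
player_class = v.get("CLASS")
# ...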

The most direct way to do it is to assign the variables one at a time:
f = open("data.txt")
for line in f: # loop over the file directly
    line = line.rstrip() # remove the trailing newline
    if line.startswith('NAME: '):
        player_name = line[6:]
    elif line.startswith('CLASS: '):
        player_class = line[7:]
    elif line.startswith('STR: '):
        player_strength = int(line[5:])
    elif line.startswith('DEX: '):
        player_dexterity = int(line[5:])
    else:
        raise ValueError('Unknown attribute: %r' % line)
f.close()
That said, most Python programmers would store the values in a dictionary rather than in separate variables. Each line can be stripped (removing the line ending) and split with: characteristic, value = line.rstrip().split(':'). If the value should be a number instead of a string, convert it with float() or int().
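
A sketch of the dictionary approach, run on the four lines from the question instead of data.txt. The name parse_player is hypothetical, and converting every all-digit value to int is an assumption; a real program might prefer an explicit list of numeric fields.

```python
def parse_player(lines):
    """Build a dict from 'KEY: value' lines, converting digit-only values to int."""
    player = {}
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split only on the first colon
        key, value = key.strip(), value.strip()
        player[key] = int(value) if value.isdigit() else value
    return player


sample = ["NAME: Joe Smith\n", "CLASS: Fighter\n", "STR: 14\n", "DEX: 7\n"]
player = parse_player(sample)
print(player)
```

With this layout, player["STR"] plays the role of the player_STR variable from the question.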
