Read Text file and Create a Python Dictionary [closed] - python

I have a text file as mentioned below:
KEY,NAME,RANK,BOOKNAME,SCORE,AUTHER
123,ABCD,500,FREEDOM1,15200,PXYZ
133,EFGH,400,FREEDOM2,15300.5,XTYZ
nan,SYGH,700,FREEDOM3,15400,RYYZ
143,LKMN,800,FREEDOM4,15500.5,XYCZ
I want to read this text file and create a nested dictionary that will be used in my subsequent program.
dict = {
    123: {'NAME': 'ABCD', 'RANK': 500, 'BOOKNAME': 'FREEDOM1', 'SCORE': 15200, 'AUTHER': 'PXYZ'},
    133: {'NAME': 'EFGH', 'RANK': 400, 'BOOKNAME': 'FREEDOM2', 'SCORE': 15300.5, 'AUTHER': 'XTYZ'},
    143: {'NAME': 'LKMN', 'RANK': 800, 'BOOKNAME': 'FREEDOM4', 'SCORE': 15500.5, 'AUTHER': 'XYCZ'}
}
Note: the code should remove the rows whose KEY value is 'nan'.

You can use csv.DictReader to read your data file as a sequence of dictionaries. Then you can rearrange and transform the data to build the nested dictionary that meets your requirements. Here is an example using a dictionary comprehension.
import csv

with open('text.csv') as f:
    reader = csv.DictReader(f)
    result = {
        int(d['KEY']): {k: int(v) if v.isdigit() else v for k, v in d.items() if k != 'KEY'}
        for d in reader if d['KEY'].isdigit()
    }

print(result)
EDIT: If all you need are the string values, as posted in Tanmay's solution, then this does the same with a lot less code.
import csv
from pprint import pprint

with open('text.csv') as f:
    results = {d.pop('KEY'): dict(d) for d in csv.DictReader(f)}

pprint(results)
EDIT 2: casting values
import csv
import re
from pprint import pprint

def cast_dict(d: dict):
    def cast_value(value: str):
        if value.isdigit():
            return int(value)
        elif re.match(r'^\d+\.\d+$', value):
            return float(value)
        return value
    return {k: cast_value(v) for k, v in d.items()}

with open('text.csv') as f:
    results = {int(d.pop('KEY')): cast_dict(d) for d in csv.DictReader(f) if d['KEY'].isdigit()}

pprint(results)

You could use the csv module like this. If you need to check whether the KEY value is a number, define a corresponding function:
import csv

def is_float(s):
    try:
        float(s)
    except ValueError:
        return False
    return True

with open('input.csv') as f:
    reader = csv.DictReader(f)
    rows = [dict(a) for a in reader if is_float(a['KEY'])]

print(rows)
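One caveat: float('nan') parses successfully, so is_float alone would not drop the rows whose KEY is the literal string 'nan', as the question asks. A minimal sketch of an adjusted check (is_valid_key is a hypothetical name, assuming the same input.csv):
import csv

def is_valid_key(s):
    # hypothetical helper, not from the original answer:
    # reject the literal 'nan' marker first, then require a value float() accepts
    if s.strip().lower() == 'nan':
        return False
    try:
        float(s)
    except ValueError:
        return False
    return True

with open('input.csv') as f:
    reader = csv.DictReader(f)
    rows = [dict(a) for a in reader if is_valid_key(a['KEY'])]

print(rows)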

The things you will need to do to achieve your goal:
First, open the file (assuming it is a .txt file containing comma-separated values):
filename = "csv_data.txt"
file = open(filename, "r")  # opening in read mode
line_list = []
for line in file:
    print(line)  # line_list.append(line.strip().split(','))
Then you would want to split each string (line) using ',' as the delimiter; for that, call line.split(','), which gives you a list.
line_list[0]
Here you will find the list of all the strings from line 1 of your text file.
Okay, I have decided to add the full code, but please don't just copy-paste it; try to understand it, and check the Python docs to see what each built-in function does.
from collections import defaultdict

filename = "csv_data.txt"
file = open(filename, "r")  # opening in read mode
line_list = []
output_dict = defaultdict(dict)  # read about defaultdict vs dict

for line in file:
    # print(line, end='')
    line_list.append(line.strip().split(','))

key_names = line_list[0]  # remember: the first line in our file contains the names of the keys

# read about slicing
for line in line_list[1:]:
    # print(line)
    this_key = line[0]
    if this_key == 'nan':
        continue  # we don't want to add this row to our dict
    else:
        this_key = int(this_key)
        output_dict[this_key] = defaultdict(dict)
        # read about enumerate
        for i, word in enumerate(line[1:], start=1):
            this_key_dict = output_dict[this_key]
            if key_names[i] == 'SCORE' or key_names[i] == 'RANK':
                try:
                    word = int(word)
                except ValueError:
                    word = float(word)
            this_key_dict[key_names[i]] = word

def nice_print(dict_d):
    for i, v in dict_d.items():
        print(i, v)

nice_print(output_dict)
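As a quick interpreter check of how a numeric string can be cast to int or float: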
>>> word = '7.8'
>>> float(word) if '.' in word else int(word)
7.8
>>> word = '7'
>>> float(word) if '.' in word else int(word)
7
>>>

Related

Making dictionary in dictionary to separate data by the same values in one column and then from second column

I am new to Python and I have been stuck on one problem for a few days now. I made a script that:
- takes data from a CSV file
- sorts it by matching values in the first column of the data file
- inserts the sorted data at a specified line in a separate template text file
- saves the file in as many copies as there are distinct values in the first column of the data file
The picture below shows how it works:
But there are two more things I need to do. When, in the separate files shown above, some values from the second column of the data file repeat, the file should insert the value from the third column instead of repeating the same value from the second column. The picture below shows how it should look:
I also need to add somewhere the value of the first column from the data file, separated on "_".
This is the data file:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
and this is the code I made:
import shutil

with open("data.csv") as f:
    contents = f.read()
contents = contents.splitlines()

values_per_baseline = dict()
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)

for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", "of_%s.txt" % file)
    filename = "of_%s.txt" % file
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
        contents.insert(x, ' o = ' + values[0] + '\n ' + 'a = ' + values[1] + '\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
            f.close()
I have been trying to make something like a dictionary of dictionaries of lists, but I can't implement it in the correct way to make it work. Any help or suggestion will be much appreciated.
You could try the following:
import csv
from collections import defaultdict

values_per_baseline = defaultdict(lambda: defaultdict(list))
with open("data.csv", "r") as file:
    for key1, key2, value in csv.reader(file):
        values_per_baseline[key1][key2].append(value)

x = 3
for filekey, content in values_per_baseline.items():
    with open("of.txt", "r") as fin,\
         open(f"of_{filekey}.txt", "w") as fout:
        fout.writelines(next(fin) for _ in range(x))
        for key, values in content.items():
            fout.write(
                f' o = {key}\n'
                + ' a = '
                + ' <COMMA> '.join(values)
                + '\n'
            )
        fout.writelines(fin)
The input-reading part uses the csv module from the standard library (for convenience) and a defaultdict. The file is read into a nested dictionary.
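For the sample data.csv above, values_per_baseline should end up roughly like this (a sketch; the actual object is a nested defaultdict rather than plain dicts):
{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']},
 '111_1': {'3005': ['QWE'], '5345': ['JTR']},
 '112_0': {'3103': ['JPP'], '3343': ['PDK']},
 '113_0': {'2137': ['TRE', 'OMG']}}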
Content of datafile.csv:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
A possible solution is the following:
def nested_list_to_dict(lst):
    result = {}
    subgroup = {}
    if all(len(l) == 3 for l in lst):
        for first, second, third in lst:
            result.setdefault(first, []).append((second, third))
        for k, v in result.items():
            for item1, item2 in v:
                subgroup.setdefault(item1, []).append(item2.strip())
            result[k] = subgroup
            subgroup = {}
    else:
        print("Input data must have 3 items like '111_0,3005,QWE'")
    return result

with open("datafile.csv", "r", encoding="utf-8") as f:
    content = f.read().splitlines()

data = nested_list_to_dict([line.split(',') for line in content])
print(data)
# ... rest of your code ...
# ... rest of your code ....
Prints
{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']},
'111_1': {'3005': ['QWE'], '5345': ['JTR']},
'112_0': {'3103': ['JPP'], '3343': ['PDK']},
'113_0': {'2137': ['TRE', 'OMG']}}

Filter unique lines from a text file in Python [closed]

I want to print the unique lines present within the text file.
For example: if the content of my text file is:
12345
12345
12474
54675
35949
35949
74564
I want my Python program to print:
12474
54675
74564
I'm using Python 2.7.
try this:
from collections import OrderedDict

seen = OrderedDict()
for line in open('file.txt'):
    line = line.strip()
    seen[line] = seen.get(line, 0) + 1

print("\n".join([k for k, v in seen.items() if v == 1]))
prints
12474
54675
74564
Update: thanks to the comments below, this is even nicer:
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

with open('file.txt') as f:
    seen = OrderedCounter([line.strip() for line in f])

print("\n".join([k for k, v in seen.items() if v == 1]))
You may use OrderedDict and Counter to remove the duplicates while maintaining order:
from collections import OrderedDict, Counter

class OrderedCounter(Counter, OrderedDict):
    pass

with open('/tmp/hello.txt') as f:
    ordered_counter = OrderedCounter(f.readlines())

new_list = [k.strip() for k, v in ordered_counter.items() if v == 1]
# ['12474', '54675', '74564']
Use count() to check the number of occurrences of each element in the list, and remove each occurrence using index() in a for loop:
with open("file.txt","r")as f:
data=f.readlines()
for x in data:
if data.count(x)>1: #if item is a duplicate
for i in range(data.count(x)):
data.pop(data.index(x)) #find indexes of duplicates, and remove them
with open("file.txt","w")as f:
f.write("".join(data)) #write data back to file as string
file.txt:
12474
54675
74564
Not the most efficient since it uses count but simple:
with open("input.txt") as f:
orig = list(f)
filtered = [x for x in orig if orig.count(x)==1]
print("".join(filtered))
- convert the file to a list of lines
- create a list comprehension: keep only the lines occurring once
- print the list (joining with an empty string, since the linefeeds are still in the lines)
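A more efficient variant (a sketch, reusing collections.Counter as in the answers above) counts every line once instead of calling count() for each element:
from collections import Counter

with open("input.txt") as f:
    orig = list(f)

# count each line once, then keep only lines that occur exactly once
counts = Counter(orig)
filtered = [x for x in orig if counts[x] == 1]
print("".join(filtered))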

Getting Value error when reading file into a dictionary using python

I'm trying to read a file into a dictionary so that the key is the word and the value is the number of occurrences of the word. I have something that should work, but when I run it, it gives me a
ValueError: I/O operation on closed file.
This is what I have right now:
try:
    f = open('fileText.txt', 'r+')
except:
    f = open('fileText.txt', 'a')

def read_dictionary(fileName):
    dict_word = {}  #### creates empty dictionary
    file = f.read()
    file = file.replace('\n', ' ').rstrip()
    words = file.split(' ')
    f.close()
    for x in words:
        if x not in result:
            dict_word[x] = 1
        else:
            dict_word[x] += 1
    print(dict_word)

print read_dictionary(f)
It is because the file was opened in write mode; write mode is not readable.
Try this:
with open('fileText.txt', 'r') as f:
    file = f.read()
Use a context manager to avoid manually keeping track of which files are open. Additionally, you had some mistakes involving using the wrong variable name. I've used a defaultdict below to simplify the code, but it isn't really necessary.
from collections import defaultdict

def read_dict(filename):
    with open(filename) as f:
        d = defaultdict(int)
        words = f.read().split()  # splits on both spaces and newlines by default
        for word in words:
            d[word] += 1
        return d
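A quick usage sketch (assuming a fileText.txt exists in the working directory):
word_counts = read_dict('fileText.txt')
print(dict(word_counts))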

How to append from file into list in Python?

I have a sample file called 'scores.txt' which holds the following values:
10,0,6,3,7,4
I want to be able to somehow take each value from the line, and append it to a list so that it becomes sampleList = [10,0,6,3,7,4].
I have tried doing this using the following code below,
score_list = []
opener = open('scores.txt','r')
for i in opener:
    score_list.append(i)
print(score_list)
which partially works, but for some reason, it doesn't do it properly. It just sticks all the values into one index instead of separate indexes. How can I make it so all the values get put into their own separate index?
You have CSV data (comma separated). The easiest approach is to use the csv module:
import csv

all_values = []
with open('scores.txt', newline='') as infile:
    reader = csv.reader(infile)
    for row in reader:
        all_values.extend(row)
Otherwise, split the values. Each line you read is a string with the ',' character between the digits:
all_values = []
with open('scores.txt', newline='') as infile:
    for line in infile:
        all_values.extend(line.strip().split(','))
Either way, all_values ends up as a list of strings. If all your values consist only of digits, you could convert them to integers:
all_values.extend(map(int, row))
or
all_values.extend(map(int, line.strip().split(',')))
This is an efficient way to do it without using any external package:
with open('tmp.txt', 'r') as f:
    score_list = f.readline().rstrip().split(",")

# Convert to a list of ints
score_list = [int(v) for v in score_list]
print(score_list)
Just split each line on the comma and add the returned values to your score_list, like below:
opener = open('scores.txt', 'r')
score_list = []
for line in opener:
    score_list.extend(map(int, line.rstrip().split(',')))
print(score_list)

Import dictionary from txt file

I have a function CalcPearson that needs 2 dictionaries as input. The dictionaries are in txt files in the following format:
(22, 0.4271125909116274)
(14, 0.4212051728881959)
(3, 0.4144765342960289)
(26, 0.41114433561925906)
(39, 0.41043882384484764)
.....
How can I import the data from the files as dictionaries? Do I need to modify them, or is there a simple function for this?
I tried with this code:
inf = open('Route_cc.txt','r')
inf2 = open('Route_h2.txt','r')
d1 = eval(inf.read())
d2 = eval(inf2.read())
print(calcPearson(d1,d2))
inf.close()
But I got an invalid syntax error at the second row of the first file the code opened, so I think I need a particular syntax in the file.
If you're certain that you are looking for a dictionary, you can use something like this:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
for line in range(len(content)):
    content[line] = content[line].strip('(').strip(')')
    content[line] = content[line].split(', ')
inf_dict = dict(content)
Or more condensed:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
inf_dict = dict(i.strip('(').strip(')').split(', ') for i in content)
Another option:
import re
inf = open('Route_cc.txt', 'r')
content = inf.read()
inf_dict = dict(i.split(', ') for i in re.findall("[^()\n-]+", content))
Note: Your original use of eval is unsafe and a poor practice.
Since you've mentioned that your dictionaries are in txt files, you'll have to tokenize your input by splitting into key/value pairs.
Read the file, line by line.
Remove the leading and trailing braces.
Split the stripped line using a comma as a delimiter.
Add each line to your dictionary.
I've written this code, and tested it for the sample input you have given. Have a look.
import collections

def addToDictionary(dict, key, value):
    if key in dict:
        print("Key already exists")
    else:
        dict[key] = value

def displayDictionary(dict):
    dict = collections.OrderedDict(sorted(dict.items()))
    for k, v in dict.items():
        print(k, v)

filename = "dictionary1.txt"
f = open(filename, 'r')
dict1 = {}
for line in f:
    line = line.lstrip('(')
    line = line.rstrip(')\n')
    tokenizedLine = line.split(', ')
    addToDictionary(dict1, tokenizedLine[0], tokenizedLine[1])
displayDictionary(dict1)
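Note that the keys stay strings here, so sorted() orders them lexicographically (for the sample input: '14', '22', '26', '3', '39'); cast tokenizedLine[0] with int() if you need numeric ordering.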
Don't use eval; it is dangerous (see the dangers of eval). Instead, use ast.literal_eval.
You can't create a dictionary directly from input in the form you have given. You have to go through the lines one by one, convert each into a tuple, and add it to the dictionary.
This process is shown below.
Code:
import ast

inf = open('Route_cc.txt', 'r')
d1 = {}
for line in inf:
    zipi = ast.literal_eval(line)
    d1[zipi[0]] = zipi[1]

inf2 = open('Route_h2.txt', 'r')
d2 = {}
for line1 in inf2:
    zipi1 = ast.literal_eval(line1)
    d2[zipi1[0]] = zipi1[1]

print(calcPearson(d1, d2))
inf.close()
inf2.close()
