I have a function calcPearson that needs two dictionaries as input. The dictionaries are stored in txt files in the following format:
(22, 0.4271125909116274)
(14, 0.4212051728881959)
(3, 0.4144765342960289)
(26, 0.41114433561925906)
(39, 0.41043882384484764)
.....
How can I import the data from the files as dictionaries? Do I need to modify the files, or is there a simple function for this?
I tried with this code:
inf = open('Route_cc.txt','r')
inf2 = open('Route_h2.txt','r')
d1 = eval(inf.read())
d2 = eval(inf2.read())
print(calcPearson(d1,d2))
inf.close()
But I got an invalid syntax error at the second line of the first file the code opened, so I think the file needs a particular syntax.
If you're certain that you are looking for a dictionary, you can use something like this:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
inf.close()
# range() needs an integer, so iterate over the indices of the list
for line in range(len(content)):
    content[line] = content[line].strip('(').strip(')')
    content[line] = content[line].split(', ')
inf_dict = dict(content)
Or more condensed:
inf = open('Route_cc.txt', 'r')
content = inf.read().splitlines()
inf_dict = dict(i.strip('(').strip(')').split(', ') for i in content)
Another option:
import re
inf = open('Route_cc.txt', 'r')
content = inf.read()
inf_dict = dict(i.split(', ') for i in re.findall("[^()\n-]+", content))
Note: Your original use of eval is unsafe and a poor practice.
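The snippets above leave both keys and values as strings, which may not be what a correlation function expects. Here is a minimal sketch that converts them to int and float while parsing, assuming every non-blank line looks like (22, 0.4271); the name parse_pairs is hypothetical and not part of the original answer.

```python
def parse_pairs(lines):
    """Build an {int: float} dict from lines shaped like '(22, 0.4271)'."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop the surrounding parentheses, then split key from value
        key, value = line.strip('()').split(',')
        result[int(key)] = float(value)
    return result


sample = ["(22, 0.4271125909116274)\n", "(14, 0.4212051728881959)\n"]
print(parse_pairs(sample))
```

Because an open file is iterable line by line, parse_pairs can also be called directly on a file object.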
Since you've mentioned that your dictionaries are in txt files, you'll have to tokenize your input by splitting into key/value pairs.
Read the file, line by line.
Remove the leading and trailing parentheses.
Split the stripped line using a comma as a delimiter.
Add each line to your dictionary.
I've written this code, and tested it for the sample input you have given. Have a look.
import collections

def addToDictionary(d, key, value):
    # Keep the first value seen for a key and report duplicates
    if key in d:
        print("Key already exists")
    else:
        d[key] = value

def displayDictionary(d):
    # Print the entries sorted by key
    d = collections.OrderedDict(sorted(d.items()))
    for k, v in d.items():
        print(k, v)

filename = "dictionary1.txt"
dict1 = {}
with open(filename, 'r') as f:
    for line in f:
        line = line.lstrip('(')
        line = line.rstrip(')\n')
        tokenizedLine = line.split(', ')
        addToDictionary(dict1, tokenizedLine[0], tokenizedLine[1])
displayDictionary(dict1)
Don't use eval; it is dangerous (see the dangers of eval). Instead, use ast.literal_eval.
You can't create a dictionary directly from input in the form you have given. You have to go through the lines one by one, convert each into a tuple, and add it to a dictionary.
This process is shown below.
Code:
import ast

inf = open('Route_cc.txt','r')
d1 = {}
for line in inf:
    zipi = ast.literal_eval(line)
    d1[zipi[0]] = zipi[1]

inf2 = open('Route_h2.txt','r')
d2 = {}
for line1 in inf2:
    zipi1 = ast.literal_eval(line1)
    d2[zipi1[0]] = zipi1[1]

print(calcPearson(d1, d2))
inf.close()
inf2.close()
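Since the same loop is needed for both files, it could be wrapped in a small helper that also closes the file automatically. This is only a sketch: load_pairs is a hypothetical name, and the temporary file stands in for Route_cc.txt so the example runs on its own.

```python
import ast
import os
import tempfile


def load_pairs(path):
    """Read one '(key, value)' tuple per line into a dict."""
    result = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                key, value = ast.literal_eval(line)
                result[key] = value
    return result


# Demo: write two sample lines to a temporary file and read them back
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("(22, 0.4271125909116274)\n(14, 0.4212051728881959)\n")
demo = load_pairs(tmp.name)
os.remove(tmp.name)
print(demo)
```

With the real files, the call would look like calcPearson(load_pairs('Route_cc.txt'), load_pairs('Route_h2.txt')).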
I am new to Python and have been stuck on one problem for a few days now. I made a script that:
- takes data from a CSV file
- sorts it by identical values in the first column of the data file
- inserts the sorted data at a specified line in a separate template text file
- saves the file in as many copies as there are distinct values in the first column of the data file
The picture below shows how it works:
But there are two more things I need to do. When one of the separate files shown above contains repeated values from the second column of the data file, the file should insert the value from the third column instead of repeating the same second-column value. The picture below shows how it should look:
I also need to add, somewhere, the value of the first column from the data file split on "_".
Here is the data file:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
and here is the code I made:
import shutil

with open("data.csv") as f:
    contents = f.read()
    contents = contents.splitlines()

values_per_baseline = dict()

for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)

for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", (f"of_%s.txt" % file))
    filename = f"of_%s.txt" % file
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
            contents.insert(x, ' o = ' + values[0] + '\n ' + 'a = ' + values[1] +'\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
            f.close()
I have been trying to build something like a dictionary of dictionaries of lists, but I can't implement it correctly to make it work. Any help or suggestions will be much appreciated.
You could try the following:
import csv
from collections import defaultdict

values_per_baseline = defaultdict(lambda: defaultdict(list))
with open("data.csv", "r") as file:
    for key1, key2, value in csv.reader(file):
        values_per_baseline[key1][key2].append(value)

x = 3
for filekey, content in values_per_baseline.items():
    with open("of.txt", "r") as fin,\
         open(f"of_{filekey}.txt", "w") as fout:
        fout.writelines(next(fin) for _ in range(x))
        for key, values in content.items():
            fout.write(
                f' o = {key}\n'
                + ' a = '
                + ' <COMMA> '.join(values)
                + '\n'
            )
        fout.writelines(fin)
The input-reading part is using the csv module from the standard library (for convenience) and a defaultdict. The file is read into a nested dictionary.
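To see what that nested dictionary looks like, here is a small sketch that runs the same reading loop on the first four rows of the sample data. It reads from an in-memory io.StringIO instead of data.csv (an assumption made only so the example is self-contained), and the names sample, nested, and plain are illustrative.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for the first rows of the data file
sample = io.StringIO(
    "111_0,3005,QWE\n"
    "111_0,3006,SDE\n"
    "111_0,3006,LFR\n"
    "111_1,3005,QWE\n"
)

nested = defaultdict(lambda: defaultdict(list))
for key1, key2, value in csv.reader(sample):
    nested[key1][key2].append(value)

# Convert to plain dicts so the printed result is easier to read
plain = {k: dict(v) for k, v in nested.items()}
print(plain)
# {'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']}, '111_1': {'3005': ['QWE']}}
```

Repeated second-column values such as 3006 end up under a single key, with all of their third-column values collected in one list.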
Content of datafile.csv:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
A possible solution is the following:
def nested_list_to_dict(lst):
    result = {}
    subgroup = {}
    if all(len(l) == 3 for l in lst):
        for first, second, third in lst:
            result.setdefault(first, []).append((second, third))
        for k, v in result.items():
            for item1, item2 in v:
                subgroup.setdefault(item1, []).append(item2.strip())
            result[k] = subgroup
            subgroup = {}
    else:
        print("Input data must have 3 items like '111_0,3005,QWE'")
    return result

with open("datafile.csv", "r", encoding="utf-8") as f:
    content = f.read().splitlines()
data = nested_list_to_dict([line.split(',') for line in content])
print(data)
# ... rest of your code ....
Prints
{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']},
'111_1': {'3005': ['QWE'], '5345': ['JTR']},
'112_0': {'3103': ['JPP'], '3343': ['PDK']},
'113_0': {'2137': ['TRE', 'OMG']}}
I have a text file as mentioned below:
KEY,NAME,RANK,BOOKNAME,SCORE,AUTHER
123,ABCD,500,FREEDOM1,15200,PXYZ
133,EFGH,400,FREEDOM2,15300.5,XTYZ
nan,SYGH,700,FREEDOM3,15400,RYYZ
143,LKMN,800,FREEDOM4,15500.5,XYCZ
I want to read this text file and create a nested dictionary that will be used later in my program.
dict = {
123:{'NAME':'ABCD','RANK':500,'BOOKNAME':'FREEDOM1', 'SCORE':15200, 'AUTHER':'PXYZ'},
133:{'NAME':'EFGH','RANK':400,'BOOKNAME':'FREEDOM2', 'SCORE':15300.5, 'AUTHER':'XTYZ'},
143:{'NAME':'LKMN','RANK':800,'BOOKNAME':'FREEDOM4', 'SCORE':15500.5, 'AUTHER':'XYCZ'}
}
Note: The code should remove rows whose KEY value is 'nan'.
You can use csv.DictReader to read each row of your data file as a dictionary keyed by the header names. Then you can rearrange and transform the data into a nested dictionary that meets your requirements. Here is an example using a dictionary comprehension.
import csv

with open('text.csv') as f:
    reader = csv.DictReader(f)
    result = {
        int(d['KEY']): {k: int(v) if v.isdigit() else v for k, v in d.items() if k != 'KEY'}
        for d in reader if d['KEY'].isdigit()}
print(result)
EDIT: If all you need is the string values as posted in Tanmay's solution then this does the same with a lot less code.
import csv
from pprint import pprint

with open('text.csv') as f:
    results = {d.pop('KEY'): dict(d) for d in csv.DictReader(f)}
pprint(results)
EDIT 2: casting values
import csv
from pprint import pprint
import re

def cast_dict(d: dict):
    def cast_value(value: str):
        if value.isdigit():
            return int(value)
        elif re.match(r'^\d+\.\d+$', value):
            return float(value)
        return value
    return {k: cast_value(v) for k, v in d.items()}

with open('text.csv') as f:
    # Skip rows whose KEY is not an integer (such as 'nan')
    results = {int(d.pop('KEY')): cast_dict(d) for d in csv.DictReader(f) if d['KEY'].isdigit()}
pprint(results)
You could use the csv module like this. If you need to check whether the KEY value is a number, write a helper function for it:
import csv
import math

def is_float(s):
    # float('nan') parses successfully, so reject NaN explicitly
    try:
        return not math.isnan(float(s))
    except ValueError:
        return False

with open('input.csv') as f:
    reader = csv.DictReader(f)
    rows = list(dict(a) for a in iter(reader) if is_float(a['KEY']))
print(rows)
The things you will need to do to achieve your goal are as follows.
First, you need to know how to open the file (assuming it is a .txt file containing comma-separated values):
filename = "csv_data.txt"
file = open(filename, "r")  # opening in read mode
line_list = []
for line in file:
    line_list.append(line.strip().split(','))
Each line is a string, so splitting it with ',' as the delimiter via line.split(',') gives you a list of strings.
line_list[0]
Here you will find the list of all the strings in line 1 of your text file.
Okay, I have decided to add the code, but please don't just copy and paste it. Try to understand it: search online or read the Python docs to see what each built-in function does.
from collections import defaultdict

filename = "csv_data.txt"
file = open(filename, "r")  # opening in read mode
line_list = []
output_dict = defaultdict(dict)  # read about defaultdict vs dict
for line in file:
    # print(line, end='')
    line_list.append(line.strip().split(','))
key_names = line_list[0]  # remember, the first line in our file contains the names of the keys
# read about slicing
for line in line_list[1:]:
    # print(line)
    this_key = line[0]
    if this_key == 'nan':
        continue  # we don't want to add this to our dict
    else:
        this_key = int(this_key)
    output_dict[this_key] = defaultdict(dict)
    # read about enumerate
    for i, word in enumerate(line[1:], start=1):
        this_key_dict = output_dict[this_key]
        if key_names[i] == 'SCORE' or key_names[i] == 'RANK':
            try:
                word = int(word)
            except ValueError:
                word = float(word)
        this_key_dict[key_names[i]] = word

def nice_print(dict_d):
    for i, v in dict_d.items():
        print(i, v)

nice_print(output_dict)
>>> word = '7.8'
>>> float(word) if '.' in word else int(word)
7.8
>>> word = '7'
>>> float(word) if '.' in word else int(word)
7
>>>
I'm trying to read a file into a dictionary so that the key is the word and the value is the number of occurrences of the word. I have something that should work, but when I run it, it gives me a
ValueError: I/O operation on closed file.
This is what I have right now:
try:
    f = open('fileText.txt', 'r+')
except:
    f = open('fileText.txt', 'a')

def read_dictionary(fileName):
    dict_word = {}  #### creates empty dictionary
    file = f.read()
    file = file.replace('\n', ' ').rstrip()
    words = file.split(' ')
    f.close()
    for x in words:
        if x not in result:
            dict_word[x] = 1
        else:
            dict_word[x] += 1
    print(dict_word)

print read_dictionary(f)
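The fallback branch opens the file in append mode ('a'), which is not readable. Separately, the function closes the global f, so any read attempted after that call raises this ValueError.
Try opening the file for reading inside a with block instead:
with open('fileText.txt', 'r') as f:
    file = f.read()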
Use a context manager to avoid manually keeping track of which files are open. Additionally, you had some mistakes involving using the wrong variable name. I've used a defaultdict below to simplify the code, but it isn't really necessary.
from collections import defaultdict

def read_dict(filename):
    with open(filename) as f:
        d = defaultdict(int)
        words = f.read().split()  # splits on both spaces and newlines by default
        for word in words:
            d[word] += 1
    return d
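As an alternative to the defaultdict version, the standard library's collections.Counter does the tallying in one step. The sketch below swaps in Counter and counts words in a string rather than a file, purely to keep the example self-contained; count_words is a hypothetical name.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each whitespace-separated word to its count."""
    return Counter(text.split())


counts = count_words("the cat and the hat\nthe end")
print(counts["the"])      # 3
print(counts["missing"])  # 0, since a Counter returns 0 for unseen keys
```

To count words in a file, pass the result of f.read() from inside a with open(...) block.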
I need to save a dictionary and then be able to read the dictionary after it's been saved.
This is what I have, and it should work (I think), but I keep getting the following error from the read_dict function:
return dict(line.split() for line in x)
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Any advice?
def save_dict(dict1):
    with open('save.txt', 'w') as fh:
        for key in dict1.keys():
            fh.write(key + '' + dictionary1[key] + '\n')

def readDB():
    with open('save.txt', 'r') as fh:
        return dict(new.split() for new in fh)
Unless you actually need a line-by-line list in the file, use something like json or pickle to save the dict. These formats deal with things like spaces in the key name, non-string values, non-ascii characters and such.
import json

dict1 = {'test':123}
with open('save.txt', 'w') as fh:
    json.dump(dict1, fh)
with open('save.txt', 'r') as fh:
    dict2 = json.load(fh)
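One caveat worth knowing with this approach: JSON object keys are always strings, so non-string keys do not survive a round trip unchanged. A minimal sketch of that behaviour, using json.dumps and json.loads on an in-memory string instead of a file (the names original and restored are illustrative):

```python
import json

original = {1: 'a', 'b': [2, 3]}

# Serialize to a JSON string and parse it back
restored = json.loads(json.dumps(original))

print(restored)              # {'1': 'a', 'b': [2, 3]}
print(restored == original)  # False: the integer key 1 came back as the string '1'
```

If the keys need to keep their original types, convert them back after loading, or use pickle instead.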
Use space instead of empty string, otherwise str.split will return a single item list which is going to raise an error when passed to dict().
fh.write(key + ' ' + dictionary1[key] + '\n')
Or better use string formatting:
for key, val in dict1.items():
    fh.write('{} {}\n'.format(key, val))
Demo:
>>> s = 'k' + '' + 'v' #WRONG
>>> s
'kv'
>>> s.split()
['kv']
>>> s = 'k' + ' ' + 'v' #RIGHT
>>> s
'k v'
>>> s.split()
['k', 'v']
You probably need to use the pickle module.
Check out this example:
## Importing
from pickle import dump
## Make the dictionary
my_dict = {'a':1 , 'b':2 , 'c':3}
## Dump the dictionary to a file opened in binary mode
with open('the path you want to save','wb') as da_file:
    dump(my_dict , da_file)
Save that file as "something0.py".
## Importing
from pickle import load
## Get the data back from the file.
## The object returned by load has the same type
## as the object that was dumped to the file.
with open('the file path which you will get the items from' , 'rb') as da_file:
    my_dict = load(da_file)
## Print out the results
from pprint import pprint
pprint(my_dict)
Save that file as "something1.py".
Now run the two scripts, using the same file path in both with statements: first something0.py, then something1.py. The second script will print the same dictionary that the first one wrote to the file.
As mentioned, you should use pickle. Here is a more simplified way:
import pickle
DumpingDict = {"Foo":"Foo"}
with open("foo.txt", "wb") as FileTowriteto:
    pickle.dump(DumpingDict, FileTowriteto)
Then, when you want to read it, you can do this:
with open("foo.txt", "rb") as OldDict:
    OldDictRecover = pickle.load(OldDict)
This should work. pickle.load returns the original dictionary, so no further conversion is needed.
I have an input file in the below format:
<ftnt>
<p><su>1</su> aaaaaaaaaaa </p>
</ftnt>
...........
...........
...........
... the <su>1</su> is available in the .........
I need to convert this to the below format by replacing the value and deleting the whole data in ftnt tags:
"""...
...
... the aaaaaaaaaaa is available in the ..........."""
Here is the code I have written. First I saved the keys and values in a dictionary, then I tried to replace each marker with the value for its key, using a regex group.
import re
dict = {}
in_file = open("in.txt", "r")
outfile = open("out.txt", "w")
File1 = in_file.read()
infile1 = File1.replace("\n", " ")
for mo in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>',infile1):
    dict[mo.group(1)] = mo.group(2)
subval = re.sub(r'<p><su>(\d+)</su>(.*?)</p>','',infile1)
subval = re.sub('<su>(\d+)</su>',dict[\\1], subval)
outfile.write(subval)
I tried to use the dictionary in re.sub, but I am getting a KeyError. I don't know why this happens. Could you please tell me how to use a dictionary here? I'd appreciate any help.
Try using a lambda for the second argument to re.sub, rather than a string with backreferences (the pattern is written as a raw string so that \d is not treated as a string escape):
subval = re.sub(r'<su>(\d+)</su>', lambda m: dict[m.group(1)], subval)
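Here is a small end-to-end sketch of the function-replacement idea, run on an inline copy of the sample input. The name inline_footnotes is hypothetical, the dictionary is called notes to avoid shadowing the built-in dict, and markers with no matching footnote are left unchanged (an assumption, since the question does not say what should happen to them).

```python
import re


def inline_footnotes(text):
    """Replace each <su>N</su> marker with the text of footnote N."""
    # Collect footnote bodies keyed by their number
    notes = {
        m.group(1): m.group(2).strip()
        for m in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>', text, flags=re.DOTALL)
    }
    # Remove the <ftnt> blocks, including the whitespace that follows them
    text = re.sub(r'<ftnt>.*?</ftnt>\s*', '', text, flags=re.DOTALL)
    # re.sub calls the function once per match and inserts its return value
    return re.sub(
        r'<su>(\d+)</su>',
        lambda m: notes.get(m.group(1), m.group(0)),
        text,
    )


sample = (
    "<ftnt>\n"
    "<p><su>1</su> aaaaaaaaaaa </p>\n"
    "</ftnt>\n"
    "... the <su>1</su> is available in the ..."
)
print(inline_footnotes(sample))
# ... the aaaaaaaaaaa is available in the ...
```

Using notes.get with the whole match as the default avoids a KeyError when a marker refers to a footnote that was never defined.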
First off, don't name a dictionary dict, or you'll shadow the built-in dict type. Second, \\1 doesn't work outside of a string, hence the syntax error. I think the best bet is to take advantage of str.format:
import re
# store the substitutions
subs = {}
# read the data
in_file = open("in.txt", "r")
contents = in_file.read().replace("\n", " ")
in_file.close()
# save some regexes for later
ftnt_tag = re.compile(r'<ftnt>.*</ftnt>')
var_tag = re.compile(r'<p><su>(\d+)</su>(.*?)</p>')
# pull the ftnt tag out
ftnt = ftnt_tag.findall(contents)[0]
contents = ftnt_tag.sub('', contents)
# pull the su
for match in var_tag.finditer(ftnt):
    # added s so they aren't numbers, useful for format
    subs["s" + match.group(1)] = match.group(2)
# replace <su>1</su> with {s1}
contents = re.sub(r"<su>(\d+)</su>", r"{s\1}", contents)
# now that the <su> are the keys, we can just use str.format
out_file = open("out.txt", "w")
out_file.write( contents.format(**subs) )
out_file.close()