I am trying to write lists from a file and define them to separate values in a dictionary. The text file would look something like this:
[12, 13, 14]
[87, 45, 32]
...
and then the dictionary would look something like this:
{"score_set0": [12, 13, 14], "score_set1": [87, 45, 32]...}
This is the code I have so far, but it just returns an empty dictionary:
def readScoresFile(fileAddr):
    dic = {}
    i = 0
    with open(fileAddr, "r") as f:
        x = len(f.readlines())
        for line in f:
            dic["score_set{}".format(x[i])] = line
            i += 1
    return dic
I am only programming at GCSE level (UK OCR syllabus if that helps) in year 10. Thanks for any help anyone can give
Also I am trying to do this without pickle module
x = len(f.readlines()) consumed your whole file, so your subsequent loop over f is iterating an exhausted file handle, sees no remaining lines, and exits immediately.
There's zero need to pre-check the length here (and the only use you make of x is trying to index it, which makes no sense; you avoided a TypeError solely because the loop never ran), so just omit that and use enumerate to get the numbers as you go:
def readScoresFile(fileAddr):
    dic = {}
    with open(fileAddr, "r") as f:
        for i, line in enumerate(f):  # Let enumerate manage the numbering for you
            dic["score_set{}".format(i)] = line  # If you're on 3.6+, dic[f'score_set{i}'] = line is nicer
    return dic
Note that this does not actually convert the input lines to lists of int (neither did your original code). If you want to do that, you can change:
dic[f'score_set{i}'] = line
to:
dic[f'score_set{i}'] = ast.literal_eval(line) # Add import ast to top of file
to interpret the line as a Python literal, or:
dic[f'score_set{i}'] = json.loads(line) # Add import json to top of file
to interpret each line as JSON (faster, but supports fewer Python types, and some legal Python literals are not legal JSON).
As a rule, you basically never want to use .readlines(); simply iterating over the file handle will get you the lines live and avoid a memory requirement proportionate to the size of the file. (Frankly, I'd have preferred if they'd gotten rid of it in Py3, since list(f) gets the same result if you really need it, and it doesn't create a visible method that encourages you to do "The Wrong Thing" so often).
By operating line-by-line, you eventually store all the data, but that's better than doubling the overhead by storing both the parsed data in the dict and all the string data it came from in the list.
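To see the exhausted-handle behaviour in isolation, here is a small sketch. It uses io.StringIO as a stand-in for a real file, and the sample text is made up to match the format in the question; both are assumptions for illustration only.

```python
import ast
import io

# io.StringIO stands in for an open text file (illustrative assumption).
fake_file = io.StringIO("[12, 13, 14]\n[87, 45, 32]\n")

# readlines() reads to the end, so the handle is now exhausted.
line_count = len(fake_file.readlines())
leftover = list(fake_file)  # nothing left to iterate over

# Rewind, then iterate once, parsing each line as a Python literal.
fake_file.seek(0)
scores = {
    "score_set{}".format(i): ast.literal_eval(line.strip())
    for i, line in enumerate(fake_file)
}

print(line_count)  # 2
print(leftover)    # []
print(scores)
```

The second read only works here because of the explicit seek(0); in the function above it is simpler to iterate over the file once and never call readlines() at all.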
If you're trying to turn the lines into actual Python lists, I suggest using the json module. (Another option would be ast.literal_eval, since the syntax happens to be the same in this case.)
import json

def read_scores_file(file_path):
    with open(file_path) as f:
        return {
            f"score_set{i}": json.loads(line)
            for i, line in enumerate(f)
        }
Related
I have two CSV files, one of which is likely to contain a few more records than the other. I am writing a function to iterate over each and determine which records are in dump but not libr.
My code is as follows:
def update_lib(x, y):
    dump = open(x, newline='')
    libr = open(y, newline='')
    dump_reader = csv.reader(dump)
    for dump_row in dump_reader:
        libr_reader = csv.reader(libr)
        for libr_row in libr_reader:
            if dump_row[0] == libr_row[0]:
                break
I am expecting this to take the first row in dump (dump_row) and iterate over each row in library (libr_row) to see if the first elements match. If they do then I want to move to the next row in dump and if not I will do something else eventually.
My issue is that libr_reader appears to "remember" where it is and I can't get it to go back to the first row in libr, even when the break has been reached and I would therefore expect libr_reader to be re-initiated. I have even tried del libr_row and del libr_reader but this doesn't appear to make a difference. I suspect I am misunderstanding iterators, any help gratefully received.
As it's pasted in your question, you'll be creating a libr_reader object every time you iterate over a row in dump_reader.
dump_reader = csv.reader(dump)
for dump_row in dump_reader:
    libr_reader = csv.reader(libr)
dump_reader here is created once. Assuming there are 10 rows from dump_reader, you will be creating 10 libr_reader instances, all from the same file handle.
Per our discussion in the comments, you're aware of that, but what you're unaware of is that the reader object is working on the same file handle and thus, is still at the same cursor.
Consider this example:
>>> import io
>>> my_file = io.StringIO("""Line 1
... Another Line
... Finally, a third line.""")
This is creating a simulated file object. Now I'll create a "LineReader" class.
>>> class LineReader:
...     def __init__(self, file):
...         self.file = file
...     def show_me_a_line(self):
...         print(self.file.readline())
...
If I use three line readers on the same file, the file still remembers its place:
>>> line_reader = LineReader(my_file)
>>> line_reader.show_me_a_line()
Line 1
>>> second_line_reader = LineReader(my_file)
>>> second_line_reader.show_me_a_line()
Another Line
>>> third_line_reader = LineReader(my_file)
>>> third_line_reader.show_me_a_line()
Finally, a third line.
To the my_file object, there's no material difference between what I just did, and doing this directly. First, I'll "reset" the file to the beginning by calling seek(0):
>>> my_file.seek(0)
0
>>> my_file.readline()
'Line 1\n'
>>> my_file.readline()
'Another Line\n'
>>> my_file.readline()
'Finally, a third line.'
There you have it.
So TL;DR: Files have cursors and remember where they are. Think of the file handle as a thing that remembers where the file is, yes, but also remembers where in the file your program is.
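Applied to the CSV question, one way around the shared cursor is to read libr only once and keep its first-column values in a set. This is a sketch under assumptions: the function name rows_missing_from_libr and the sample data are hypothetical, and io.StringIO stands in for the two opened files.

```python
import csv
import io

def rows_missing_from_libr(dump_file, libr_file):
    """Return rows of dump_file whose first column never appears in libr_file."""
    # Read libr once and remember the first column of every non-empty row.
    libr_keys = {row[0] for row in csv.reader(libr_file) if row}
    # A single pass over dump is then enough; no rewinding is needed.
    return [row for row in csv.reader(dump_file) if row and row[0] not in libr_keys]

# Made-up sample data standing in for the two CSV files.
dump = io.StringIO("a,1\nb,2\nc,3\n")
libr = io.StringIO("a,1\nc,3\n")
missing = rows_missing_from_libr(dump, libr)
print(missing)  # [['b', '2']]
```

If the nested loop is kept instead, calling libr.seek(0) before creating each new csv.reader would move the cursor back to the start, at the cost of re-reading libr for every row in dump.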
I want to create a dictionary with a list of values for multiple keys with a single for loop in Python3. For me, the time execution and memory footprint are of utmost importance since the file which my Python3 script is reading is rather long.
I have already tried the following simple script:
p_avg = []
p_y = []
m_avg = []
m_y = []
res_dict = {}
with open('/home/user/test', 'r') as f:
    for line in f:
        p_avg.append(float(line.split(" ")[5].split(":")[1]))
        p_y.append(float(line.split(" ")[6].split(":")[1]))
        m_avg.append(float(line.split(" ")[1].split(":")[1]))
        m_y.append(float(line.split(" ")[2].split(":")[1]))
res_dict['p_avg'] = p_avg
res_dict['p_y'] = p_y
res_dict['m_avg'] = m_avg
res_dict['m_y'] = m_y
print(res_dict)
The format of my /home/user/test file is:
n:1 m_avg:7588.39 m_y:11289.73 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64
n:2 m_avg:7587.60 m_y:11288.54 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64
n:3 m_avg:7598.56 m_y:11304.50 m_u:148.01 m_v:225.33 p_avg:9.32 p_y:7.60 p_u:26.43 p_v:24.60
.
.
.
The Python script shown above works but first it is too long and repetitive, second, I am not sure how efficient it is. I was eventually thinking to create the same with list-comprehensions. Something like that:
(res_dict['p_avg'], res_dict['p_y']) = [(float(line.split(" ")[5].split(":")[1]), float(line.split(" ")[6].split(":")[1])) for line in f]
But for all four dictionary keys. Do you think that using list comprehension could reduce the used memory footprint of the script and the speed of execution? What should be the right syntax for the list-comprehension?
[EDIT] I have changed the dict -> res_dict as it was mentioned that it is not a good practice, I have also fixed a typo, where the p_y wasn't pointing to the right value and added a print statement to print the resulting dictionary as mentioned by the other users.
You can make use of defaultdict. There is no need to split the line each time, and to make it more readable you can use a lambda to extract the fields for each item.
from collections import defaultdict

res = defaultdict(list)
with open('/home/user/test', 'r') as f:
    for line in f:
        items = line.split()
        extract = lambda x: x.split(':')[1]
        res['p_avg'].append(extract(items[5]))
        res['p_y'].append(extract(items[6]))
        res['m_avg'].append(extract(items[1]))
        res['m_y'].append(extract(items[2]))
You can initialize your dict to contain the string/list pairs, and then append directly as you iterate through every line. Also, you don't want to keep calling split() on line on each iteration. Rather, just call once and save to a local variable and index from this variable.
# Initialize dict to contain string key and list value pairs
dictionary = {'p_avg': [],
              'p_y': [],
              'm_avg': [],
              'm_y': []
              }
with open('/home/user/test', 'r') as f:
    for line in f:
        items = line.split()  # store line.split() so you don't split multiple times per line
        dictionary['p_avg'].append(float(items[5].split(':')[1]))
        dictionary['p_y'].append(float(items[6].split(':')[1]))  # I think you meant index 6 here
        dictionary['m_avg'].append(float(items[1].split(':')[1]))
        dictionary['m_y'].append(float(items[2].split(':')[1]))
You can just pre-define dict attributes:
d = {
    'p_avg': [],
    'p_y': [],
    'm_avg': [],
    'm_y': []
}
and then append directly to them:
with open('/home/user/test', 'r') as f:
    for line in f:
        splitted_line = line.split(" ")
        d['p_avg'].append(float(splitted_line[5].split(":")[1]))
        d['p_y'].append(float(splitted_line[6].split(":")[1]))  # index 6 holds p_y
        d['m_avg'].append(float(splitted_line[1].split(":")[1]))
        d['m_y'].append(float(splitted_line[2].split(":")[1]))  # m_y gets its own list
P.S. Never use the names of built-ins such as dict or list as variable names. Doing so shadows the built-in and can cause many confusing errors!
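The answers above all index fields by position. As an alternative sketch, each line can be split once into a key-to-value mapping so the fields are looked up by name. The names WANTED and collect_fields are hypothetical, and the sample lines are copied from the question. This is not expected to change memory use much, since the lists of floats still have to be stored.

```python
from collections import defaultdict

WANTED = ("p_avg", "p_y", "m_avg", "m_y")  # keys taken from the question

def collect_fields(lines, wanted=WANTED):
    """Collect the float value of each wanted key from 'key:value' lines."""
    result = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split the line once into {'n': '1', 'm_avg': '7588.39', ...}.
        fields = dict(item.split(":", 1) for item in line.split())
        for key in wanted:
            result[key].append(float(fields[key]))
    return dict(result)

sample = [
    "n:1 m_avg:7588.39 m_y:11289.73 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64",
    "n:2 m_avg:7587.60 m_y:11288.54 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64",
]
res = collect_fields(sample)
print(res["m_avg"])  # [7588.39, 7587.6]
```

Because an open file is an iterable of lines, the same function could be called as collect_fields(f) inside a with open(...) block.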
dict ={"Rahul":"male",
"sahana":"female"
"pavan":"male" }
in a text file we have
rahul|sharma
sahana|jacob
Pavan|bhat
In a Python program, we have to open the text file, read every line, match the first name against the dict above, and write a new text file with the gender appended.
OUTPUT SHOULD BE LIKE
rahul|sharma|male
sahana|jacob|female
Pavan|bhat|male
It would seem to me that this is roughly what you want. Note that your formatting for input and output was slightly off, but I'm pretty sure I've got it.
genders = {"rahul": "male",
           "sahana": "female",
           "pavan": "male"}
with open("input.txt") as in_file:
    for line in in_file:
        a, b = line.strip().split("|")
        gen = genders[a]
        print("{}|{}|{}".format(a, b, gen))
where input.txt contains
rahul|sharma
sahana|jacob
pavan|bhat
will correctly (I think) produce the output
rahul|sharma|male
sahana|jacob|female
pavan|bhat|male
I have changed all of your data to be lowercase, as with your casing, it would have been ambiguous as to how to look up in the dictionary, and how to end up providing output (only one key was capital-cased, so I couldn't use any kind of reasonable string function to accommodate the keys as they were). I've also had to add a comma to your dictionary.
I've also renamed your dictionary - it's no longer dict, because dict is a Python builtin. It seems a bit strange to me that you will have available in your code a dictionary that can anticipate your input file, but this is what I got from the question.
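Since the question asks for a new text file rather than printed output, here is a hedged variant that writes the result and lowercases the name only for the lookup, so the original casing survives in the output. The function name add_gender and the "unknown" fallback are assumptions, and io.StringIO stands in for the real input and output files.

```python
import io

genders = {"rahul": "male", "sahana": "female", "pavan": "male"}

def add_gender(in_file, out_file, lookup):
    """Copy 'first|last' lines to out_file with '|gender' appended."""
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        first, _last = line.split("|")
        # Lowercase only for the lookup so the original casing is preserved.
        # "unknown" is an assumed fallback for names missing from the dict.
        gender = lookup.get(first.lower(), "unknown")
        out_file.write("{}|{}\n".format(line, gender))

source = io.StringIO("rahul|sharma\nsahana|jacob\nPavan|bhat\n")
target = io.StringIO()
add_gender(source, target, genders)
print(target.getvalue())
```

With real files, the two StringIO objects would be replaced by open("input.txt") and open("output.txt", "w"), where output.txt is a hypothetical file name.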
To get the value for the key in a dict, the syntax is simply:
b = "Rahul"
dict = {"Rahul":"male", "Mahima":"female"}
dict[b]
This is similar to "Python creating dynamic global variable from list", but I'm still confused.
I get lots of flo data in a semi-proprietary format. I've already used Python to strip the data down to my needs and save it into a JSON file called badactor.json. The records are saved in the following format:
[saddr as an integer, daddr as an integer, port, date as Julian, time as decimal number]
An arbitrary example [1053464536, 1232644361, 2222, 2014260, 15009]
I want to go through my weekly/monthly flo logs and save everything by Julian date. To start I want to go through the logs and create a list that is named according to the Julian date it happened, i.e, 2014260 and then save it to the same name 2014260.json. I have the following, but it is giving me an error:
#!/usr/bin/python
import sys
import json
import time
from datetime import datetime
import calendar

#these are variables I've had to use throughout, kinda a boiler plate for now
x=0
templist2 = []
templist3 = []
templist4 = []
templist5 = []
bad = {}

#this is my list of "bad actors", list is in the following format
#[saddr as a integer, daddr as a integer, port, date as Julian, time as decimal number]
#or an arbitrary example [1053464536, 1232644361, 2222, 2014260, 15009]
badactor = 'badactor.json'
with open(badactor, 'r') as f1:
    badact = json.load(f1)
    f1.close()

for i in badact:
    print i[3] #troubleshooting to verify my value is being read in
    tmp = str(i[3])
    print tmp#again just troubleshooting
    tl=[i[0],i[4],i[1],i[2]]
    bad[tmp]=bad[tmp]+tl
    print bad[tmp]
Trying to create the variable is giving me the following error:
Traceback (most recent call last):
File "savetofiles.py", line 39, in <module>
bad[tmp]=bad[tmp]+tl
KeyError: '2014260'
By the time your code is executed, there is no key "2014260" in the "bad" dict.
Your problem is here:
bad[tmp]=bad[tmp]+tl
You're saying "add tl to something that doesn't exist."
Instead, you seem to want to do:
bad[tmp]=tl
I suggest you initialize bad to be an empty collections.defaultdict instead of just regular built-in dict. i.e.
import collections
...
bad = collections.defaultdict(list)
That way, initial empty list values will be created for you automatically the first time a date key is encountered and the error you're getting from the bad[tmp]=bad[tmp]+tl statement will go away since it will effectively become bad[tmp]=list()+tl — where the list() call just creates and returns an empty list — the first time a particular date is encountered.
It's also not clear whether you really need the tmp = str(i[3]) conversion because values of any non-mutable type are valid dictionary (or defaultdict) keys, not just strings — assuming i[3] isn't a string already. Regardless, subsequent code would be more readable if you named the result something else, like julian_date = i[3] (or julian_date = str(i[3]) if the conversion really is required).
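Putting the defaultdict suggestion together, a minimal sketch might look like the following. The function names group_by_date and save_groups, the sample records, and the one-file-per-date layout are assumptions based on the question, not a definitive implementation.

```python
import collections
import json
import os

def group_by_date(records):
    """Group [saddr, daddr, port, date, time] records by their date field."""
    bad = collections.defaultdict(list)
    for saddr, daddr, port, julian_date, time_of_day in records:
        # Same field order as tl in the question; extend mirrors 'bad[tmp] + tl'.
        bad[julian_date].extend([saddr, time_of_day, daddr, port])
    return dict(bad)

def save_groups(groups, directory="."):
    """Write each group to <directory>/<date>.json (assumed layout)."""
    for julian_date, values in groups.items():
        path = os.path.join(directory, "{}.json".format(julian_date))
        with open(path, "w") as out:
            json.dump(values, out)

# Made-up records in the format described in the question.
records = [
    [1053464536, 1232644361, 2222, 2014260, 15009],
    [1053464537, 1232644362, 80, 2014260, 15010],
    [1053464538, 1232644363, 443, 2014261, 20],
]
groups = group_by_date(records)
print(sorted(groups))  # [2014260, 2014261]
```

Using append instead of extend would keep each record as its own sub-list, which may be easier to read back later; which one is wanted depends on how the saved files are consumed.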
I have this class that consists of 3 functions. Each function is in charge of one part of the whole process.
.load() loads up two files, re-formats their content and writes them to two new files.
.compare() takes two files and prints out their differences in a specific format.
.final() takes the result of .compare() and creates a file for every set of values.
Please ignore the Frankenstein nature of the logic as it is not my main concern at the moment. I know it can be written a thousand times better and that's fine by me for now as I am still new to Python and programming in general. I do have some theoretical experience but very limited technical practice and that is something I am working on.
Here is the code:
from collections import defaultdict
from operator import itemgetter
from itertools import groupby
from collections import deque
import os

class avs_auto:

    def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
        with open(fileIn1+'.txt') as fin1, open(fileIn2+'.txt') as fin2:
            frame_rects = defaultdict(list)
            for row in (map(str, line.split()) for line in fin1):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
            for row in (map(str, line.split()) for line in fin2):
                id, frame, rect = row[0], row[2], [row[3],row[4],row[5],row[6]]
                frame_rects[frame].append(id)
                frame_rects[frame].append(rect)
        with open(fileOut1+'.txt', 'w') as fout1, open(fileOut2+'.txt', 'w') as fout2:
            for frame, rects in sorted(frame_rects.iteritems()):
                fout1.write('{{{}:{}}}\n'.format(frame, rects))
                fout2.write('{{{}:{}}}\n'.format(frame, rects))

    def compare(self, f1, f2):
        with open(f1+'.txt', 'r') as fin1:
            with open(f2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
                diff_lines = [l.strip() for l in lines1 if l not in lines2]
                diffs = defaultdict(list)
                with open(f1+'x'+f2+'Result.txt', 'w') as fout:
                    for line in diff_lines:
                        d = eval(line)
                        for k in d:
                            list_ids = d[k]
                            for i in range(0, len(d[k]), 2):
                                diffs[d[k][i]].append(k)
                    for id_ in diffs:
                        diffs[id_].sort()
                        for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                            group = map(itemgetter(1), g)
                            fout.write('{0} {1} {2}\n'.format(id_, group[0], group[-1]))

    def final(self):
        with open('hw1load3xhw1load2Result.txt', 'r') as fin:
            lines = (line.split() for line in fin)
            for k, g in groupby(lines, itemgetter(0)):
                fst = next(g)
                lst = next(iter(deque(g, 1)), fst)
                with open('final/{}.avs'.format(k), 'w') as fout:
                    fout.write('video0=ImageSource("MovieName\original\%06d.jpeg", {}, {}, 15)\n'.format(fst[1], lst[2]))
Now to my question: how do I make each of the functions pass its output files as values to the next function and call it?
So, for example:
Running .load() should output two files and call the .compare() function, passing it those two files.
Then, when .compare() is done, it should pass .final() the output file and call it.
So .final() will open whatever file is passed to it from .compare(), and not the hard-coded file name as it is defined above.
I hope this all makes sense. Let me know if you need clarification. Any criticism is welcome concerning the code itself. Thanks in advance.
There are a couple of ways to do this, but I would write a master function that calls the other three in sequence. Something like:
def load_and_compare(self, input_file1, input_file2, output_file1, output_file2, result_file):
    self.load(input_file1, input_file2, output_file1, output_file2)
    self.compare(output_file1, output_file2)
    self.final(result_file)
Looking over your code, I think you have a problem in load. You only declare a single dictionary, then load the contents of both files into it and write those same contents out to two files. Because each file has the same content, compare won't do anything meaningful.
Also, do you really want to write out the file contents and then re-read it into memory? I would keep the frame definitions in memory for use in compare after loading rather than reading them back in.
I don't really see a reason for this to be a class at all rather than just a trio of functions, but maybe if you have to read multiple files with mildly varying formats you could get some benefit of using class attributes to define the format while inheriting the general logic.
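As a rough sketch of the "keep it in memory, plain functions" idea, each step below returns its result and the next step takes it as an argument. The three bodies are deliberately simplified stand-ins (two-column input, a plain dict comparison) and are not the asker's real parsing or output format; run_pipeline is a hypothetical name.

```python
def load(lines):
    """Stand-in for .load(): parse 'id frame' lines into {frame: [ids]}."""
    frames = {}
    for line in lines:
        id_, frame = line.split()[:2]
        frames.setdefault(frame, []).append(id_)
    return frames

def compare(frames1, frames2):
    """Stand-in for .compare(): frames whose ids differ between the inputs."""
    return {f: ids for f, ids in frames1.items() if frames2.get(f) != ids}

def final(diffs):
    """Stand-in for .final(): one output line per differing frame."""
    return ["{} {}".format(frame, ",".join(ids)) for frame, ids in sorted(diffs.items())]

def run_pipeline(lines1, lines2):
    # Each step receives the previous step's return value directly.
    return final(compare(load(lines1), load(lines2)))

report = run_pipeline(["a 1", "b 2"], ["a 1", "c 2"])
print(report)  # ['2 b']
```

The same shape works if the steps must write files: each function would return the path it wrote, and the next function would accept that path as a parameter instead of a hard-coded name.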
Do you mean call with the name of the two files? Well you defined a class, so you can just do:
def load(self, fileIn1, fileIn2, fileOut1, fileOut2):
    ...  # do stuff here
    # when done
    self.compare(fileOut1, fileOut2)
And so on.
I might be totally off here, but why don't you do it exactly as you're saying?
Just call self.compare() out of your load() method.
You can also add return statements to load() and return a tuple with the files.
Then add a 4th method to your class, which then collects the returned files and pipes them to the compare() method.
Best Regards!
One of the more powerful aspects of Python is that you can return something called a tuple. To answer this in a more generic Python sense consider this code:
>>> def load(file1, file2):
...     return file1+'.txt',file2+'.txt'
...
>>> def convert(file1, file2):
...     return 'converted_'+file1,'converted_'+file2
...
>>> convert(*load("Java", "C#"))
('converted_Java.txt', 'converted_C#.txt')
Each function takes two named arguments, but the returned tuple of the first can be "unpacked" into the input arguments of the second by adding a * in front of it.