Python 3.4 pickle.load() doesn't work - python

EDITED
My code:
#!/usr/bin/python3
import os.path
import pickle

def read_hts(hts):
    if(os.path.isfile("hts.pickle")):
        p = open("hts.pickle", "rb")
        hts = pickle.load(p)
        p.close()
    else:
        f = open("hts.dat", "r")
        for line in f:
            spl = line.split()
            val = [spl[2], spl[3], spl[4]]
            hts[spl[0]+"."+spl[1]] = val
        f.close()
        p = open("hts.pickle", "wb")
        pickle.dump(hts, p)
        p.close()

def main():
    hts = {}
    read_hts(hts)
    print(list(hts))

main()
The hts.dat file:
1 A1 1 1 6523.00
1 A2 1 2 10823.08
1 A3 1 3 8661.76
1 A4 1 4 9851.96
1 A5 1 5 6701.12
1 A6 1 6 12934.13
1 A7 1 7 11882.38
1 A8 1 8 9787.36
1 A9 1 9 10292.06
1 A10 1 10 9040.32
1 A11 1 11 12742.89
1 A12 1 12 11607.01
1 A13 1 13 13638.06
1 A14 1 14 11038.11
1 A15 1 15 11839.42
1 A16 1 16 13206.73
If there is no hts.pickle the output is:
['1.A3', '1.A7', '1.A14', '1.A11', '1.A8', '1.A16', '1.A15', '1.A12', '1.A9', '1.A4', '1.A1', '1.A6', '1.A10', '1.A2', '1.A5', '1.A13']
But if there is a hts.pickle the output is only:
[]
I don't understand why it doesn't restore the dictionary. EDIT: I think pickle is not the problem. It has to be a problem with the variables.

The issue is not with pickle. When your pickle file already exists, pickle.load() creates a completely new dictionary object, and you assign it to the local name hts inside read_hts(). Rebinding that local name does not change the hts variable in your main() function, which still refers to the original empty dictionary.
In the else branch you modify the dictionary that was passed in, in place, so the change is visible in main().
Instead of relying on this, you should always return hts from your function and assign the result back to hts in main().
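A minimal sketch of the "return the dictionary" approach described above. The two path parameters and the temporary directory are assumptions added here so that the example runs on its own; the parsing logic follows the question's code.

```python
import os
import pickle
import tempfile

def read_hts(pickle_path, dat_path):
    """Return the dictionary, loading the pickle cache when it exists."""
    if os.path.isfile(pickle_path):
        with open(pickle_path, "rb") as p:
            return pickle.load(p)
    hts = {}
    with open(dat_path, "r") as f:
        for line in f:
            spl = line.split()
            hts[spl[0] + "." + spl[1]] = [spl[2], spl[3], spl[4]]
    # Cache the parsed data for the next run.
    with open(pickle_path, "wb") as p:
        pickle.dump(hts, p)
    return hts

# Demo in a temporary directory (hypothetical paths, two sample rows).
workdir = tempfile.mkdtemp()
dat_path = os.path.join(workdir, "hts.dat")
pickle_path = os.path.join(workdir, "hts.pickle")
with open(dat_path, "w") as f:
    f.write("1 A1 1 1 6523.00\n1 A2 1 2 10823.08\n")

first = read_hts(pickle_path, dat_path)   # parses hts.dat and writes the pickle
second = read_hts(pickle_path, dat_path)  # loads the pickle
print(sorted(second))
```

Because the caller always uses the returned value, it no longer matters whether the function built a new dictionary or loaded one from disk.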

Related

Extract data from alternate rows with python

I want to extract the number corresponding to O2H from the following file format (The delimiter used here is space):
# Timestep No_Moles No_Specs SH2 S2H4 S4H6 S2H2 H2 S2H3 OSH2 Mo1250O3736S57H111 OSH S3H6 OH2 S3H4 O2S SH OS2H3
144500 3802 15 3639 113 1 10 18 2 7 1 3 2 1 2 1 1 1
# Timestep No_Moles No_Specs SH2 S2H4 S2H2 H2 S2H3 OSH2 Mo1250O3733S61H115 OS2H2 OSH S3H6 OS O2S2H2 OH2 S3H4 SH
149000 3801 15 3634 114 11 18 2 7 1 1 2 2 1 1 4 2 1
# Timestep No_Moles No_Specs SH2 OS2H3 S3H Mo1250O3375S605H1526 OS S2H4 O3S3H3 OSH2 OSH S2H2 H2 OH2 OS2H2 S2H O2S3H3 SH O4S4H4 OH O2S2H O6S5H3 O6S5H5 O3S4H4 O2S3H2 O3S4H3 OS3H3 O3S2H2 O4S3H4 O3S3H O6S4H5 OS4H3 O3S2H O5S4H4 OS2H O2SH2 S2H3 O4S3H3 O3S3H4 O O5S3H4 O5S3H3 OS3H4 O2S4H4 O4S4H3 O2SH O2S2H2 O5S4H5 O3S3H2 S3H6
589000 3269 48 2900 11 1 1 47 11 1 81 74 26 25 21 17 1 3 5 2 3 3 1 1 2 2 1 2 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# Timestep No_Moles No_Specs SH2 Mo1250O3034S578H1742 OH2 OSH2 O3S3H5 OS2H2 OS OSH O2S3H2 OH O3S2H2 O6S6H4 SH O2S2H2 S2H2 OS2H H2 OS2H3 O5S4H2 O7S6H5 S3H2 O2SH2 OSH3 O7S6H4 O2S2H3 O6S5H3 O2SH O4S4H O3S2H3 S2 O2S2H S5H3 O7S4H4 O3S3H OS3H OS4H O5S3H3 S3H O17S12H9 O3S3H2 O7S5H4 O4SH3 O3S2H O7S8H4 O3S3H3 O11S9H6 OS3H2 S4H2 O10S8H6 O4S3H2 O5S5H4 O6S8H4 OS2 OS3H6 S3H3
959500 3254 55 2597 1 83 119 1 46 59 172 4 3 4 1 27 7 38 6 23 3 1 2 3 5 3 1 2 1 2 1 1 6 3 1 1 2 1 1 1 1 1 3 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1
That is, all the alternate rows contain the corresponding data of its previous row.
And I want the output to look like this
1
4
21
83
How it should work:
1 (14th number on 2nd row which corresponds to 14th word of 1st row i.e. O2H)
4 (16th number on 4th row which corresponds to 16th word of 3rd row i.e. O2H)
21 (15th number on 6th row which corresponds to 15th word of 5th row i.e. O2H)
83 (6th number on 8th row which corresponds to 6th word of 7th row i.e. O2H)
I was trying to extract it using regex but could not do it. Can anyone please help me to extract the data?
You can easily parse this into a dataframe and select the desired column to fetch the values.
Assuming your data looks like the sample you've provided, you can try the following:
import pandas as pd
with open("data.txt") as f:
    lines = [line.strip() for line in f.readlines()]
header = max(lines, key=len).replace("#", "").split()
df = pd.DataFrame([line.split() for line in lines[1::2]], columns=header)
print(df["OH2"])
df.to_csv("parsed_data.csv", index=False)
Output:
0 1
1 11
2 1
3 83
Name: OH2, dtype: object
Dumping this to a .csv would yield:
I think you want OH2 and not O2H, and that it's a typo. Assuming this:
(1) Iterate over every single line.
(2) Take into account only the header lines, i.e. every other line (check line_counter % 2 to skip the rest).
(3) Split the header line on spaces and, using a counter variable, find the index of OH2 in it. Assume it is 14 in the first header line.
(4) Access the following line (index + 1), split it on spaces, and access the element at the index you found in step (3).
Since you haven't posted any code, I assumed your problem was more about finding a way to achieve this than about coding, so I wrote you the algorithm.
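The steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name extract_species and the shortened sample lines are made up here, and header lines are assumed to start with a standalone "#".

```python
def extract_species(lines, species="OH2"):
    """Return the value of `species` from each header/data line pair."""
    counts = []
    column = None
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "#":
            # Header line: drop the leading "#" so positions match the data line.
            names = words[1:]
            column = names.index(species) if species in names else None
        elif column is not None:
            # Data line: read the value at the position found in its header.
            counts.append(words[column])
            column = None
    return counts

# Shortened, made-up sample in the same layout as the question's file.
sample = [
    "# Timestep No_Moles No_Specs SH2 OH2 SH",
    "144500 3802 3 3639 1 2",
    "# Timestep No_Moles No_Specs OH2 SH2",
    "149000 3801 2 4 3634",
]
print(extract_species(sample))  # ['1', '4']
```

Looking up the column separately for every header matters here because the species appear in a different order in each header line.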
Thank you, everyone, for the help, I figured out the solution
i=0
j=1
with open ('input.txt','r') as fin:
    with open ('output.txt','w') as fout:
        for lines in fin: #Iterating over each line
            lists = lines.split() #Splits each line into a list of words
            try:
                if i%2 == 0: #Odd lines (headers)
                    index_of_OH2 = lists.index('OH2')
                    #print(index_of_OH2)
                i=i+1
                if j%2 == 0: #Even lines (data)
                    number_of_OH2 = lists[index_of_OH2-1]
                    print(number_of_OH2 + '\n')
                    fout.write(number_of_OH2 + '\n')
                j=j+1
            except:
                pass
Output:
1
4
21
83
The try: / except: pass block was added so that if OH2 is not found in a line, the loop moves on without an error.

Python: Redirect various output sections to separate variables

I'm trying to create a mechanism to redirect print output to a number of variables. The following code simulates what I'm looking for:
import sys
import io
class MultiOut(object):
    def __init__(self, stream_out):
        self.terminal = sys.stdout
        self.output = stream_out
    def write(self, message):
        self.terminal.write(message)
        self.output.write(message)
    def flush(self):
        self.terminal.flush()
        self.output.flush()

vals = {'a1': io.StringIO(), 'a2': io.StringIO(), 'a3': io.StringIO()}
for i,val in enumerate(vals):
    sys.stdout = MultiOut(vals[val])
    [print(x**i, end=' ') for x in range(11)]
    print("\n")
with open('temp.txt', 'w') as f:
    for x in vals:
        f.write(f"{x} :-\n")
        f.write(vals[x].getvalue())
        f.write(f"{'='*50}\n")
File output (temp.txt):
a1 :-
1 1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 10
0 1 4 9 16 25 36 49 64 81 100
==================================================
a2 :-
0 1 2 3 4 5 6 7 8 9 10
0 1 4 9 16 25 36 49 64 81 100
==================================================
a3 :-
0 1 4 9 16 25 36 49 64 81 100
==================================================
What I'm trying to do here is to redirect various 'sections' of the output to different variables of vals (a1, a2, a3) and also dump all the output to the terminal. Strangely, each successive variable contains data starting from that point until the end. Is there a way to avoid this issue and save each section in a different variable?
The issue is that you're replacing sys.stdout with your object:
sys.stdout = MultiOut(vals[val])
and in your object's __init__, you're storing the current sys.stdout as an attribute:
self.terminal = sys.stdout
On the second iteration, when executing this statement
self.terminal = sys.stdout
sys.stdout is the replacement from the first iteration, i.e. the first MultiOut object, so everything written later is also forwarded to the first buffer.
I hope this makes sense.
Rather than modifying stdout, I'd use the logging module to achieve what you want.
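If you prefer to keep the tee approach, one possible fix for the chaining described above is to capture the real stdout once and pass it to every wrapper. This is a sketch, not the answerer's code: the extra terminal parameter and the real_stdout name are assumptions introduced here.

```python
import io
import sys

class MultiOut:
    """Write every message to both a terminal stream and a capture buffer."""
    def __init__(self, terminal, stream_out):
        self.terminal = terminal
        self.output = stream_out

    def write(self, message):
        self.terminal.write(message)
        self.output.write(message)

    def flush(self):
        self.terminal.flush()
        self.output.flush()

vals = {'a1': io.StringIO(), 'a2': io.StringIO(), 'a3': io.StringIO()}
real_stdout = sys.stdout  # captured once, before any replacement
try:
    for i, name in enumerate(vals):
        # Each wrapper forwards to the real stdout, never to a previous wrapper.
        sys.stdout = MultiOut(real_stdout, vals[name])
        print(*(x**i for x in range(11)))
finally:
    sys.stdout = real_stdout  # always restore the original stream

captured = {name: buf.getvalue() for name, buf in vals.items()}
```

Each buffer now holds only its own section, and the terminal still receives everything.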

I want to make a1=0, a2=0.. aN=0 [duplicate]

I want to make a1=0, a2=0, ... aN=0.
I thought of using a "for" loop.
For example, with N=10:
for i in range(0, 10):
    print('a%d'%i)
but this only prints the names; it doesn't create variables set to zero.
So I tried 'a%d'%i=0, but it didn't work.
How can I do that?
For printing, use .format() (or f-strings on Python 3.6+):
for i in range(0, 10):
    print('a{} = {}'.format(i,i)) # the 1st i is put into the 1st {}, the 2nd i into the 2nd {}
If you want to calculate with those as names, store them in a dictionary and use the values to calculate with them:
d = {}
for i in range(0, 10):
    d["a{}".format(i)] = i # the nth argument is put in place of the nth {}
print("sum a4 to a7: {} + {} + {} + {} = {}".format( # use the values stored in dict to
    d["a4"], d["a5"], d["a6"], d["a7"], # calculate and print the single
    d["a4"]+d["a5"]+d["a6"]+d["a7"])) # values where needed
Output:
# for loop
a0 = 0
a1 = 1
a2 = 2
a3 = 3
a4 = 4
a5 = 5
a6 = 6
a7 = 7
a8 = 8
a9 = 9
# calculation
sum a4 to a7: 4 + 5 + 6 + 7 = 22
You can use a dictionary for that.
var_name = 'a'
new_values = {} # the dictionary that holds the generated names
for i in range(0, 10):
    key = var_name + str(i) # an
    new_values[key] = 0 # assign 0 to the new name
To access them individually:
new_values['a1']
>>> 0
Or you can access them all together like this:
for k,v in new_values.items():
    print(k,'=',v)
outputs:
a0 = 0
a1 = 0
a2 = 0
a3 = 0
a4 = 0
a5 = 0
a6 = 0
a7 = 0
a8 = 0
a9 = 0
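The same dictionary can be built in one expression. The sketch below assumes the names should run from a1 to aN as in the question; the names N, zeros and values are chosen here for illustration. A plain list is shown as an alternative when the names themselves are not needed.

```python
N = 10

# Dictionary keyed by the generated names a1 .. aN, each initialised to 0.
zeros = {f"a{i}": 0 for i in range(1, N + 1)}

# If only the position matters, a list of N zeros is simpler.
values = [0] * N

zeros["a3"] += 5  # update a single "variable" by name
print(zeros["a3"], zeros["a10"], values[2])
```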
Simple solution, using const value x=0, and counter i:
x = 0
for i in range(0,10):
    print(f"a{i} = {x}")
output:
a0 = 0
a1 = 0
a2 = 0
a3 = 0
a4 = 0
a5 = 0
a6 = 0
a7 = 0
a8 = 0
a9 = 0

To find the difference b/w two numbers in a column of file?

Consider an input file with 6 columns (0-5):
1 0 937 306 97 3
2 164472 75 17 81 3
3 197154 35268 306 97 3
4 310448 29493 64 38 1
5 310541 29063 64 38 1
6 310684 33707 64 38 1
7 319091 47451 16 41 1
8 319101 49724 16 41 1
9 324746 61578 1 5 1
10 324939 54611 1 5 1
For the second column, i.e. column 1 (0, 164472, 197154, ...), I need to find the difference between consecutive numbers, so that column 1 becomes (0, 164472-0, 197154-164472, ...), i.e. (0, 164472, 32682, ...).
The output file must change only the column 1 values; all other values must remain the same as in the input file:
1 0 937 306 97 3
2 164472 75 17 81 3
3 32682 35268 306 97 3
4 113294 29493 64 38 1
5 93 29063 64 38 1
6 143 33707 64 38 1
7 8407 47451 16 41 1
8 10 49724 16 41 1
9 5645 61578 1 5 1
10 193 54611 1 5 1
If anyone could suggest Python code to do this, it would be helpful.
I tried appending all the columns to lists, finding the differences in the second column, and writing everything back to another file. But the input file I have posted is just a sample; the entire input file contains 50,000 lines, so my attempt failed.
The code I tried is as follows:
import sys
import numpy
old_stdout = sys.stdout
log_file = open("newc","a")
sys.stdout = log_file
a1 = []; a2 = []; a2f = []; v = []; a3 = []; a4 = []; a5 = []; a6 = []
with open("newfileinput",'r') as f:
    for line in f:
        job = map(int,line.split())
        a1.append(job[0])
        a3.append(job[2])
        a4.append(job[3])
        a5.append(job[4])
        a6.append(job[5])
        a2.append(job[1])
v = [a2[i+1]-a2[i] for i in range(len(a2)-1)]
print a1
print v
print a3
print a4
print a5
print a6
sys.stdout = old_stdout
log_file.close()
The output file of this code, "newc", contained 6 lists, which I then wrote to a file one by one. That was time consuming and not very efficient.
So if anyone could suggest a simpler method, it would be helpful.
Try this. Let me know if there are any problems or if you want me to explain any of the code:
this_no, prev_no = 0, 0
with open("newfileinput.txt",'r') as f, open("newc.txt","a") as log_file:
    for line in f:
        row = line.split()
        this_no = int(row[1])
        # change only the second column; str.replace on the whole line could also hit other columns
        row[1] = str(this_no - prev_no)
        log_file.write(" ".join(row) + "\n")
        prev_no = this_no
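The same idea can be written as a small generator that works on any iterable of lines, which keeps memory use constant for a 50,000-line file. This is a sketch; the name column_differences and the column parameter are assumptions made for illustration, and the sample rows are the first four from the question.

```python
def column_differences(lines, column=1):
    """Yield each line with `column` replaced by its difference from the previous row."""
    previous = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        current = int(fields[column])
        fields[column] = str(current - previous)
        previous = current
        yield " ".join(fields)

sample = [
    "1 0 937 306 97 3",
    "2 164472 75 17 81 3",
    "3 197154 35268 306 97 3",
    "4 310448 29493 64 38 1",
]
result = list(column_differences(sample))
for row in result:
    print(row)
```

To process a real file, pass the open file object instead of the sample list and write each yielded line to the output file.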
don't downvote me, just for fun.
import re
from time import sleep
p = re.compile(r'\s+')
data = '''1 0 937 306 97 3
2 164472 75 17 81 3
3 197154 35268 306 97 3
4 310448 29493 64 38 1
5 310541 29063 64 38 1
6 310684 33707 64 38 1
7 319091 47451 16 41 1
8 319101 49724 16 41 1
9 324746 61578 1 5 1
10 324939 54611 1 5 1\n''' * 5000
data = data.split('\n')[0:-1]
data = [p.split(one) for one in data]
data = [map(int, one) for one in data]

def list_diff(a, b):
    temp = a[:]
    temp[1] = a[1] - b[1]
    return temp

result = [
    data[0],
]
for i, _ in enumerate(data):
    if i < len(data) - 1:
        result.append(list_diff(data[i+1], data[i]))
for i, one in enumerate(result):
    one[0] = i+1
    print one
    sleep(0.1)

Comparing 2 files line by line

I have 2 files of the following form:
file1:
work1
7 8 9 10 11
1 2 3 4 5
6 7 8 9 10
file2:
work2
2 3 4 5 5
2 4 7 8 9
work1
7 8 9 10 11
1 2 4 4 5
6 7 8 9 10
work3
1 7 8 9 10
Now I want to compare the two files, and wherever a header (say work1) is the same in both, I want to compare the sections that follow it and print the line at which a difference is found. E.g.
work1 (file1)
7 8 9 10 11
1 2 3 4 5
6 7 8 9 10
work1 (file2)
7 8 9 10 11
1 2 4 4 5
6 7 8 9 10
Now I want to print the line where the difference occurs, i.e. "1 2 4 4 5".
To do so I have written the following code:
with open("file1",) as r, open("file2") as w:
    for line in r:
        if "work1" in line:
            for line1 in w:
                if "work1" in line1:
                    print "work1"
However, from here on I am confused about how to read both files in parallel. Can someone please help me with this? After matching the "work1" headers, I don't know how to read the two files side by side.
You would probably want to try out the itertools module in Python.
It contains a function called izip that can do what you need, along with a function called islice. You can iterate through the second file until you hit the header you were looking for, and then use islice to start reading from that line.
Here's a bit of the code.
from itertools import *
w = open('file2')
for (i,line) in enumerate(w):
    if "work1" in line:
        iter2 = islice(open('file2'), i, None, 1) # Starts at the correct line
f = open('file1')
for (line1,line2) in izip(f,iter2):
    print line1, line2 # Place your comparisons of the two lines here.
You're guaranteed now that on the first run through the loop you'll get "work1" on both lines. After that you can compare. Since f is shorter than w, the iterator will exhaust itself and stop once you hit the end of f.
Hopefully I explained that well.
EDIT: Added import statement.
EDIT: We need to reopen file2. This is because iterating through an iterable in Python consumes it. So, we need to pass a brand new one to islice so it works!
with open('f1.csv') as f1, open('f2.csv') as f2 :
    i=0
    break_needed = False
    while True :
        r1, r2 = f1.readline(), f2.readline()
        if len(r1) == 0 :
            print "eof found for f1"
            break_needed = True
        if len(r2) == 0 :
            print "eof found for f2"
            break_needed = True
        if break_needed :
            break
        i += 1
        if r1 != r2 :
            print " line %i"%i
            print "file 1 : " + r1
            print "file 2 : " + r2
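An alternative to reading the files in lockstep is to group each file's lines by header first and then compare only the sections whose headers match. The following Python 3 sketch illustrates that idea; it is not taken from either answer, the helper names are made up, and it assumes every header line starts with "work".

```python
from itertools import zip_longest

def read_sections(lines, is_header=lambda line: line.startswith("work")):
    """Group lines under the most recent header line."""
    sections = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if is_header(line):
            current = sections.setdefault(line, [])
        elif current is not None:
            current.append(line)
    return sections

def differing_lines(lines1, lines2):
    """Return (header, line_in_file1, line_in_file2) for every mismatch."""
    first, second = read_sections(lines1), read_sections(lines2)
    diffs = []
    for header, rows1 in first.items():
        if header not in second:
            continue  # only compare sections present in both files
        # zip_longest also reports lines missing from the shorter section (as None).
        for row1, row2 in zip_longest(rows1, second[header]):
            if row1 != row2:
                diffs.append((header, row1, row2))
    return diffs

file1 = ["work1", "7 8 9 10 11", "1 2 3 4 5", "6 7 8 9 10"]
file2 = ["work2", "2 3 4 5 5", "2 4 7 8 9",
         "work1", "7 8 9 10 11", "1 2 4 4 5", "6 7 8 9 10",
         "work3", "1 7 8 9 10"]
diffs = differing_lines(file1, file2)
print(diffs)  # [('work1', '1 2 3 4 5', '1 2 4 4 5')]
```

With real files, pass the open file objects in place of the two lists.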
