I am trying to write a matrix (<type 'numpy.ndarray'>) in a file, with this format:
index_of_row # v0, v1, v2
which will be read by my partner's Scala code (if that matters).
After reading this, I ended up with this code:
print dense_reduced
# this will give an error:
#TypeError: expected a single-segment buffer object
#f = open('dense.txt','w')
#f.write(dense_reduced[0])
#f.close()
numpy.savetxt('dense.txt', dense_reduced, delimiter=", ", fmt="%s")
which outputs:
[[-0.17033304 0.13854157 0.22427917]
..
[-0.15361054 0.38628932 0.05236084]]
and the dense.txt is:
-0.170333043895, 0.138541569519, 0.224279174382
...
However, there are several reasons I need the dense.txt to look like this (index of row of the matrix # values separated by comma):
0 # -0.17033304, 0.13854157, 0.22427917
...
How to proceed?
With savetxt() options:
u = dense_reduced
w = np.hstack((np.arange(u.shape[0]).reshape(-1,1),u))
np.savetxt('dense.txt', w, fmt=["%i #"]+ ["%.10s, "]*(u.shape[1]-1)+["%.10s"])
which produces:
0 # 0.57105063, 0.70274226, 0.87870916
1 # 0.28735507, 0.94860021, 0.63763897
2 # 0.26302099, 0.26609319, 0.75001683
3 # 0.93315750, 0.19700358, 0.13632004
You can also simplify with w=pd.DataFrame(u).reset_index() if you have pandas.
There are several options that you can provide in numpy.savetxt (such as comments, delimiter, etc) but I don't believe you can do it in this way. A multidimensional np array can be used as an iterable of smaller arrays, so we can easily run:
my_array = np.array(range(20)).reshape((4,5))
with open("output.txt", "w") as f:
    for i, a in enumerate(my_array):
        # the trailing "\n" puts each row on its own line
        f.write("{} # {}\n".format(i, ', '.join(map(str, a))))
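If you want control over the number of decimals as well, the same loop can be wrapped in a small helper. This is only a sketch: write_indexed_rows and its precision parameter are names I made up, and it assumes every value can be converted with float(). It needs nothing beyond the standard library, and it accepts a list of lists as well as a 2-D numpy array, since both can be iterated row by row.

```python
import os
import tempfile


def write_indexed_rows(rows, path, precision=8):
    """Write each row as 'index # v0, v1, v2' (hypothetical helper)."""
    with open(path, "w") as f:
        for i, row in enumerate(rows):
            # format every value with a fixed number of decimals
            values = ", ".join("{:.{p}f}".format(float(v), p=precision) for v in row)
            f.write("{} # {}\n".format(i, values))


# a temporary directory is used here only to keep the example self-contained
path = os.path.join(tempfile.mkdtemp(), "dense.txt")
write_indexed_rows([[-0.17033304, 0.13854157, 0.22427917],
                    [-0.15361054, 0.38628932, 0.05236084]], path)

with open(path) as f:
    print(f.read())
```

On the reading side, each line can then be split once on " # " to get the index, and on ", " to get the values.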
I'm a tech newbie learning how EC cryptography works, and I've stumbled into a problem with my Python code.
I'm testing basic elliptic curve operations such as adding points, multiplying by G, etc. Let's say I have:
Ax = 0xbc46aa75e5948daa08123b36f2080d234aac274bf62fca8f9eb0aadf829c744a
Ay = 0xe5f28c3a044b1cac54a9b4bf719f02dfae93a0bae73832301e786104f43255a5
A = (Ax,Ay)
f = open('B-coordinates.txt', 'r')
data = f.read()
f.close()
print (data)
B = 'data'
where B-coordinates.txt contains lines like (0xe7e6bd3424a1e92abb45846c82d570f0596850661d1c952f9fe3564567d9b9e8,0x59c9e0bba945e45f40c0aa58379a3cb6a5a2283993e90c58654af4920e37f5)
Then I perform a basic point addition A+B with add(A,B).
Because of B = 'data', I obviously get this error:
TypeError: unsupported operand type(s) for -: 'int' and 'str'
And if I use int(data) instead, I get:
Error invalid literal for int() with base 10: because of the letters in the input (i.e. in the point coordinates).
So my question is: can someone knowledgeable in Python and elliptic curve calculations tell me how to load the coordinates of a point so as to avoid these int problems when reading lines from a file into a .py script? I've been trying to figure out how to do it right for many hours now, so I'll appreciate any hints.
You can load B from B-coordinates.txt by evaluating its content as Python code:
B = eval(data)
Note that the code above allows arbitrary code execution if you don't trust the content of B-coordinates.txt. In that case, parse the hexadecimal tuple manually:
B = tuple([int(z, 16) for z in data.strip()[1:-1].split(',')])
Then to sum A and B in a pairwise manner using native Python 3 and keep a tuple, you can proceed as follows by summing unpacked coordinates (zip) for both tuples:
print(tuple([a + b for (a, b) in zip(A, B)]))
UPDATE:
Assume B-coordinates.txt looks like the following as described by OP author comment:
(0x1257e93a78a5b7d8fe0cf28ff1d8822350c778ac8a30e57d2acfc4d5fb8c192,0x1124ec11c77d356e042dad154e1116eda7cc69244f295166b54e3d341904a1a7)
(0x754e3239f325570cdbbf4a87deee8a66b7f2b33479d468fbc1a50743bf56cc18,0x673fb86e5bda30fb3cd0ed304ea49a023ee33d0197a695d0c5d98093c536683)
...
You can load the Bs from this file by doing:
f = open('B-coordinates.txt', 'r')
lines = f.read().splitlines()
f.close()
Bs = [eval(line) for line in lines]
As described above, to avoid arbitrary code execution use the following instead:
Bs = [tuple([int(z, 16) for z in line[1:-1].split(',')]) for line in lines]
That way you can use for instance the first B pair, by using Bs[0], defined by the first line of B-coordinates.txt that is:
(0x1257e93a78a5b7d8fe0cf28ff1d8822350c778ac8a30e57d2acfc4d5fb8c192,0x1124ec11c77d356e042dad154e1116eda7cc69244f295166b54e3d341904a1a7)
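A third option, between eval and manual slicing, is ast.literal_eval from the standard library: it only accepts Python literals (numbers, tuples, strings and so on), so a line containing a function call is rejected instead of executed. A minimal sketch, assuming one parenthesised pair per line as above; parse_points is a name I made up for illustration:

```python
import ast


def parse_points(lines):
    """Parse lines like '(0x1f,0x2a)' into tuples of ints, skipping blank lines."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # literal_eval understands hex integer literals and tuples,
        # but raises ValueError on anything that is not a plain literal
        points.append(ast.literal_eval(line))
    return points


Bs = parse_points(["(0x10,0xff)", "", "(0x1,0x2)\n"])
print(Bs)  # [(16, 255), (1, 2)]
```

With a real file you would pass the open file object (or f.read().splitlines()) instead of the hard-coded list.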
You probably don't want to set B equal to 'data' (a string literal) but to data (the variable):
replace B = 'data' with B = data in the last line.
Your data seems to be a tuple of hex strings.
Use int(hex_string, 16) to convert them (since hex is base 16, not 10).
EDIT based on comment:
Assuming your file contains one coordinate pair per line:
with open("B-coordinates.txt", "r") as file:
    raw = file.read()
# skip empty lines, e.g. the one produced by a trailing newline
data = [tuple(int(hex_str, 16) for hex_str in item[1:-1].split(",")) for item in raw.splitlines() if item]
You can then get the first Bx, By like this:
Bx, By = data[0]
I'm having a bit of trouble with some data stored in a text file that I need for regression analysis in Python.
The data are stored in a format that looks like this:
2104,3,399900 1600,3,329900 2400,3,369000 ....
I need to do some analysis like finding mean by this:
(2104+1600+...)/number of data
I think the appropriate step is to store the data in arrays, but I have no idea how to do that. I can think of two ways. The first one is to set up 3 arrays like this:
a=[2104 1600 2400 ...] b=[3 3 3 ...] c=[399900 329900 369000 ...]
The second way is to store in
a=[2104 3 399900], b=[1600 3 329900] and so on.
Which one is better?
Also, how do I write code that stores the data in an array? I am thinking of something like this:
with open("file.txt", "r") as ins:
array = []
elt.strip(',."\'?!*:') for line in ins:
array.append(line)
Is that correct?
You could use:
with open('data.txt') as data:
    substrings = data.read().split()
values = [map(int, substring.split(',')) for substring in substrings]
average = sum([a for a, b, c in values]) / float(len(values))
print average
With this data.txt:
2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000
It outputs:
2035.16666667
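The snippet above is Python 2 (print statement, map returning a list). For Python 3, roughly the same idea could look like the sketch below. It works on a string instead of a file to stay self-contained, and the names text, rows and first_column_mean are just placeholders of mine.

```python
from statistics import mean

text = "2104,3,399900 1600,3,329900 2400,3,369000"

# split() with no argument splits on any whitespace, so triplets separated
# by spaces or by newlines are handled the same way
rows = [tuple(int(v) for v in triplet.split(",")) for triplet in text.split()]

first_column_mean = mean(row[0] for row in rows)
print(rows)
print(first_column_mean)
```

For a file, replace text with the result of f.read() inside a with open(...) block.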
Using pandas and numpy you can get the data into an array as follows:
In [37]: data = "2104,3,399900 1600,3,329900 2400,3,369000"
In [38]: d = pd.read_csv(StringIO.StringIO(data), sep=',| ', header=None, index_col=None, engine="python")
In [39]: d.values.reshape(3, d.shape[1]/3)
Out[39]:
array([[ 2104, 3, 399900],
[ 1600, 3, 329900],
[ 2400, 3, 369000]])
Instead of having multiple arrays a, b, c... you could store your data as an array of arrays (a 2 dimensional array). For example:
[[2104,3,399900],
[1600,3,329900],
[2400,3,369000]...]
This way you don't have to deal with dynamically naming your arrays. How you store your data, i.e. 3 * array of length n or n * array of length 3 is up to you. I would prefer the second way. To read the data into your array you should then use the split() function, which will split your input into an array. So in your case:
with open("file.txt", "r") as ins:
    tmp = ins.read().split(" ")
    array = [i.split(",") for i in tmp]
>>> array
[['2104', '3', '399900'], ['1600', '3', '329900'], ['2400', '3', '369000']]
Edit:
To find the mean e.g. for the first element in each list you could do the following:
arraymean = sum([int(i[0]) for i in array]) / float(len(array))  # float() avoids integer division in Python 2
Where the 0 in i[0] specifies the first element in each list. Note that this code uses list comprehension, which you can learn more about in this post if you want to.
Also this code stores the values in the array as strings, hence the cast to int in the part to get the mean. If you want to store the data as int directly just edit the part in the file reading section:
array = [[int(j) for j in i.split(",")] for i in tmp]
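As for which of the two layouts from the question to pick: you don't really have to commit, because zip can switch between them. A small sketch, where the column names sizes, rooms and prices are only my guess at what the three numbers mean:

```python
rows = [[2104, 3, 399900], [1600, 3, 329900], [2400, 3, 369000]]

# zip(*rows) groups the i-th element of every row, i.e. it transposes the data
sizes, rooms, prices = (list(col) for col in zip(*rows))
print(sizes)   # [2104, 1600, 2400]
print(prices)  # [399900, 329900, 369000]

# zipping the columns gives the row layout back
assert [list(r) for r in zip(sizes, rooms, prices)] == rows
```

The row layout is convenient while reading the file, the column layout when you want a mean or a sum per variable.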
This is a quick solution without error checking (using a list comprehension technique, PEP202). But if your file has a consistent format you can do the following:
import numpy as np
a = np.array([np.array(i.split(",")).astype("float") for i in open("example.txt").read().split(" ")])
Should you print it:
print(a)
print("Mean of column 0: ", np.mean(a[:, 0]))
You'll obtain the following:
[[ 2.10400000e+03 3.00000000e+00 3.99900000e+05]
[ 1.60000000e+03 3.00000000e+00 3.29900000e+05]
[ 2.40000000e+03 3.00000000e+00 3.69000000e+05]]
Mean of column 0: 2034.66666667
Notice how, in the code snippet, I specified the "," as separator inside each triplet, and the space " " as separator between triplets. This is the exact contents of the file I used as an example:
2104,3,399900 1600,3,329900 2400,3,369000
I am just starting out with Python. I have some fortran and some Matlab skills, but I am by no means a coder. I need to post-process some output files.
I can't figure out how to read each value into the respective variable. The data looks something like this:
h5097600N1 2348.13 2348.35 -0.2219 20.0 -4.438
h5443200N1 2348.12 2348.36 -0.2326 20.0 -4.651
h8467200N2 2348.11 2348.39 -0.2813 20.0 -5.627
...
In my limited Matlab notation, I would like to assign the following variables of the form tN1(i,j) something like this:
tN1(1,1)=5097600; tN1(1,2)=5443200; tN2(1,3)=8467200; #time between 'h' and 'N#'
hmN1(1,1)=2348.13; hmN1(1,2)=2348.12; hmN2(1,3)=2348.11; #value in 2nd column
hsN1(1,1)=2348.35; hsN1(1,2)=2348.36; hsN2(1,3)=2348.39; #value in 3rd column
I will have about 30 sets, or tN1(1:30,1:j); hmN1(1:30,1:j);hsN1(1:30,1:j)
I know it may not seem like it, but I have been trying to figure this out for 2 days now. I am trying to learn this on my own and it seems I am missing something fundamental in my understanding of python.
I wrote a simple script which does what you ask. It creates three dictionaries, t, hm and hs. These will have the N values as keys.
import csv
import re
path = 'vector_data.txt'
# Using the <with func as obj> syntax handles the closing of the file for you.
with open(path) as in_file:
    # Use the csv package to read csv files
    csv_reader = csv.reader(in_file, delimiter=' ')
    # Create empty dictionaries to store the values
    t = dict()
    hm = dict()
    hs = dict()
    # Iterate over all rows
    for row in csv_reader:
        # Get the <n> and <t_i> values by using regular expressions, only
        # save the integer part (hence [1:] and [1:-1])
        n = int(re.findall('N[0-9]+', row[0])[0][1:])
        t_i = int(re.findall('h.+N', row[0])[0][1:-1])
        # Cast the other values to float
        hm_i = float(row[1])
        hs_i = float(row[2])
        # Try to append the values to an existing list in the dictionaries.
        # If that fails, new lists are added to the dictionaries.
        try:
            t[n].append(t_i)
            hm[n].append(hm_i)
            hs[n].append(hs_i)
        except KeyError:
            t[n] = [t_i]
            hm[n] = [hm_i]
            hs[n] = [hs_i]
Output:
>> t
{1: [5097600, 5443200], 2: [8467200]}
>> hm
{1: [2348.13, 2348.12], 2: [2348.11]}
>> hs
{1: [2348.35, 2348.36], 2: [2348.39]}
(remember that Python uses zero-indexing)
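A variation on the same idea, in case it helps: collections.defaultdict creates the empty list on first use, so the try/except disappears, and a single regular expression with two groups pulls out both the time and the set number. This is a sketch only; group_by_set and LINE_RE are names of mine, and it assumes the first field always looks like h<digits>N<digits>.

```python
import re
from collections import defaultdict

# e.g. 'h5097600N1' -> time 5097600, set number 1
LINE_RE = re.compile(r"h(\d+)N(\d+)")


def group_by_set(lines):
    """Group time, 2nd-column and 3rd-column values by the N number."""
    t, hm, hs = defaultdict(list), defaultdict(list), defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) < 3:
            continue
        match = LINE_RE.fullmatch(fields[0])
        if match is None:
            continue  # not a data line
        n = int(match.group(2))
        t[n].append(int(match.group(1)))
        hm[n].append(float(fields[1]))
        hs[n].append(float(fields[2]))
    return dict(t), dict(hm), dict(hs)


sample = [
    "h5097600N1 2348.13 2348.35 -0.2219 20.0 -4.438",
    "h5443200N1 2348.12 2348.36 -0.2326 20.0 -4.651",
    "h8467200N2 2348.11 2348.39 -0.2813 20.0 -5.627",
]
t, hm, hs = group_by_set(sample)
print(t)   # {1: [5097600, 5443200], 2: [8467200]}
print(hs)  # {1: [2348.35, 2348.36], 2: [2348.39]}
```

Because line.split() is used instead of the csv module, columns separated by several spaces or tabs are handled as well.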
Thanks for all your comments. Suggested readings led to other things which helped. Here is what I came up with:
if len(line) >= 45:
    if line[0:45] == " FIT OF SIMULATED EQUIVALENTS TO OBSERVATIONS": #! indicates data to follow, after 4 lines of junk text
        for i in range (0,4):
            junk = file.readline()
        for i in range (0,int(nobs)):
            line = file.readline()
            sline = line.split()
            obsname.append(sline[0])
            hm.append(sline[1])
            hs.append(sline[2])
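For what it's worth, the "find the marker, skip a few lines, read nobs lines" pattern can also be written with itertools.islice, which avoids the manual readline() counting. A rough sketch under the same assumptions as above (4 junk lines after the marker, then nobs data lines); read_observations is a made-up name, and the values are kept as strings just like in the snippet above:

```python
from itertools import islice

MARKER = " FIT OF SIMULATED EQUIVALENTS TO OBSERVATIONS"


def read_observations(lines, nobs, skip=4):
    """Return (name, hm, hs) string triples for the nobs lines after the marker."""
    lines = iter(lines)
    for line in lines:
        if line.startswith(MARKER):
            break
    else:
        return []  # marker never found
    # islice drops the first `skip` lines after the marker, then yields nobs lines
    block = islice(lines, skip, skip + nobs)
    return [tuple(row.split()[:3]) for row in block]


sample = [
    "some header\n",
    MARKER + "\n",
    "junk 1\n", "junk 2\n", "junk 3\n", "junk 4\n",
    "obs1 1.0 2.0 extra\n",
    "obs2 3.0 4.0 extra\n",
    "not part of the block\n",
]
print(read_observations(sample, 2))
# [('obs1', '1.0', '2.0'), ('obs2', '3.0', '4.0')]
```

An open file object can be passed directly as lines, since iterating over a file yields its lines.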
I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = s.split()
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = string.split(s)
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    sorted(s, key=lambda x: x[o])
    s = input.readline()
input.close()
output.close()
thanks!
First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.strip()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
    coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
    # here, we're doing 3 things in one line:
    # * iterate over all coordinate pairs and convert the pairs to the string
    #   form
    # * join the string forms with a newline character
    # * write the result of the join+iterate expression to the file
    out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
    out_file.write('\n\n')
    out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.
Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
    values = []
    for line in infile:
        values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
    for value in values:
        outfile.write("{:.1f},{:.1f}\n".format(*value))
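If you are on Python 3 instead, map() no longer returns a list, so the snippet needs a small change. Here is a sketch that does the sorting, the one-decimal formatting and the min/max in one place. It takes the lines as a list so it can be tried without a file; sort_and_format is a hypothetical name, and it assumes every non-empty line holds exactly two numbers and that there is at least one such line.

```python
def sort_and_format(lines):
    """Return (formatted rows, (x_min, x_max, y_min, y_max)) from 'x y' lines."""
    # tuples sort by their first element (x) first, then by y
    pairs = sorted(tuple(map(float, line.split())) for line in lines if line.strip())
    rows = ["{:.1f},{:.1f}".format(x, y) for x, y in pairs]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return rows, (min(xs), max(xs), min(ys), max(ys))


rows, extremes = sort_and_format([
    "300000.00 62000.00",
    "100000.00 60000.00",
    "200000.00 63000.00",
])
print("\n".join(rows))
print(extremes)
```

To use it with the files from the question, pass the open input file as lines and write "\n".join(rows) to the output file.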
I need to write a series of matrices out to a plain text file from Python. All my matrices are in float format, so the simple
file.write() and file.writelines()
do not work. Is there a conversion method I can employ that doesn't have me looping through all the lists (matrix = list of lists in my case) converting the individual values?
I guess I should clarify, that it needn't look like a matrix, just the associated values in an easy to parse list, as I will be reading in later. All on one line may actually make this easier!
m = [[1.1, 2.1, 3.1], [4.1, 5.1, 6.1], [7.1, 8.1, 9.1]]
file.write(str(m))
If you want more control over the format of each value:
def format(value):
    return "%.3f" % value
formatted = [[format(v) for v in r] for r in m]
file.write(str(formatted))
the following works for me:
with open(fname, 'w') as f:
    f.writelines(','.join(str(j) for j in i) + '\n' for i in matrix)
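Since the question mentions reading the values back in later, here is a sketch of the full round trip for this comma-separated format. write_matrix and read_matrix are hypothetical helper names, a temporary directory stands in for the real location, and Python 3 is assumed, where repr() of a float is guaranteed to read back as the same float.

```python
import os
import tempfile


def write_matrix(matrix, path):
    """Write one comma-separated row per line."""
    with open(path, "w") as f:
        for row in matrix:
            f.write(",".join(repr(v) for v in row) + "\n")


def read_matrix(path):
    """Read the file written by write_matrix back into a list of lists of floats."""
    with open(path) as f:
        # float() ignores the trailing newline on the last field of each line
        return [[float(v) for v in line.split(",")] for line in f if line.strip()]


m = [[1.1, 2.1, 3.1], [4.1, 5.1, 6.1], [7.1, 8.1, 9.1]]
path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
write_matrix(m, path)
assert read_matrix(path) == m
```

For a series of matrices you would need some separator between them, for example an empty line, which this sketch does not handle.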
Why not use pickle?
import cPickle as pickle
pckl_file = file("test.pckl", "w")
pickle.dump([1,2,3], pckl_file)
pckl_file.close()
for row in matrix:
    file.write(" ".join(map(str,row))+"\n")
This works for me... and writes the output in matrix format
import pickle
# write object to file
a = ['hello', 'world']
pickle.dump(a, open('delme.txt', 'wb'))
# read object from file
b = pickle.load(open('delme.txt', 'rb'))
print b # ['hello', 'world']
At this point you can look at the file 'delme.txt' with vi
vi delme.txt
1 (lp0
2 S'hello'
3 p1
4 aS'world'
5 p2
6 a.
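The pickle snippets above are Python 2. In Python 3 the same round trip could be sketched as below, using with so the files are closed properly. The matrix and the file name are placeholders, and a temporary directory is used only to keep the example self-contained. Keep in mind that the default protocol in Python 3 is binary, so the file will not be readable in vi like the listing above, and that you should only unpickle files you trust.

```python
import os
import pickle
import tempfile

matrix = [[1.1, 2.1], [3.1, 4.1]]
path = os.path.join(tempfile.mkdtemp(), "matrix.pkl")

# pickle reads and writes bytes, so the file must be opened in binary mode
with open(path, "wb") as f:
    pickle.dump(matrix, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == matrix)  # True
```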