How do I import this file of data in Python?


So I just started with Python and I have no idea how to import a text file so that its contents are read as numbers.
This is my code so far:
f = open("Data.txt", "r")
attributes = f.readlines()
w1 = 2 * np.random.random((1,19)) -1
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
outputs = sigmoid(np.dot(attributes, w1))
So the problem here is that I get the error message:
Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'.
I know that the problem is that the code does not read the text file as an array of numbers which is why I get the error.
This is one line in the text file:
1,1,22,22,22,19,18,14,49.895756,17.775994,5.27092,0.771761,0.018632,0.006864,0.003923,0.003923,0.486903,0.100025,1,0

I think what you want to do is something like the following:
import numpy as np
with open("Data.txt", "r") as f:
    # Read the file, replace newline by "", split the string on commas
    # (this assumes the file holds a single line of values)
    attributes = f.read().replace("\n", "").split(",")
# convert every element from string to float
attributes = np.array([float(element) for element in attributes])
# 20 random values in [-1,1)
w1 = 2 * np.random.random(20) - 1
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
outputs = sigmoid(np.dot(attributes, w1))
# Printing results
print(attributes)
print(w1)
print(outputs)

You're almost there:
import numpy as np
with open("Data.txt", "r") as f:
    attributes = f.readlines()
attributes = [row.strip().split(",") for row in attributes]
attributes = [[float(x) for x in row] for row in attributes]
w1 = 2 * np.random.random((1,20)) -1
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
outputs = sigmoid(np.dot(np.array(attributes), w1.T))
You needed a list comprehension to split the values on each row (in case there is more than one row). You were also missing one element in w1 to match the dimension of attributes. Finally, you have to use the transpose of w1 to compute the product with np.dot, because np.array(attributes) has shape (1,20), the same as w1.
Also: always use the "with" statement to open files. It automatically closes the file when the block ends; otherwise, you might run into trouble.
Edit
If your real dataset contains more data, you should consider using pandas, which will be much more efficient:
import pandas as pd
df = pd.read_csv("Data.txt", sep=",", header=None)
numpy_array = df.values
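
For a plain NumPy route, the parsing and the matrix product discussed above can be sketched as follows. This is only a minimal sketch, assuming every row holds 20 comma-separated numbers. The in-memory sample_file stands in for Data.txt (its second row is just a copy of the row from the question), and the seed is arbitrary.

```python
import io
import numpy as np

# Stand-in for Data.txt (assumption): the row from the question, repeated twice
row = ("1,1,22,22,22,19,18,14,49.895756,17.775994,5.27092,0.771761,"
       "0.018632,0.006864,0.003923,0.003923,0.486903,0.100025,1,0")
sample_file = io.StringIO(row + "\n" + row + "\n")

# loadtxt parses the text straight into a float array; ndmin=2 keeps a
# 2-D shape even when the file has a single row
attributes = np.loadtxt(sample_file, delimiter=",", ndmin=2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One weight per column, drawn from [-1, 1); the seed is arbitrary
rng = np.random.default_rng(0)
w1 = 2 * rng.random((attributes.shape[1], 1)) - 1

outputs = sigmoid(attributes @ w1)   # shape (n_rows, 1)
print(attributes.shape, w1.shape, outputs.shape)
```

Sizing w1 from attributes.shape[1] avoids the 19-versus-20 mismatch, and building it as a column removes the need for the transpose.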

Related

Python too many values to unpack

I want to find the distance between two xyz coordinates stored in a file, using Python.
I was given this, for example:
0.215822, -1.395942, -1.976109
0.648518, -0.493053, -2.101929
In Python, I wrote:
f = open('First_Ten_phenolMeNH3+.txt', 'r')
lines = f.readlines()
a = lines[0] #(0.215822, -1.395942, -1.976109)
(x1,y1,z1) = a
b = lines[1] #(0.648518, -0.493053, -2.101929)
(x2,y2,z2) = b
distance = math.sqrt((x1-x2)**2 + (y1-y2)**2 + (z1-z2)**2)
print(distance)
f.close()
I want to use tuple unpacking to calculate the distance. However, I keep getting "too many values to unpack (expected 3)". I think it is because of the txt file. Is there a better way to do it? The problem is that I have a file of 5000 coordinates to go through, so it would be inefficient to plug in the coordinates one by one.
Thank you

You could use this:
from scipy.spatial import distance
from ast import literal_eval
a = literal_eval(lines[0]) #(0.215822, -1.395942, -1.976109)
b = literal_eval(lines[1]) #(0.648518, -0.493053, -2.101929)
dst = distance.euclidean(a, b) #1.009091198622305

You are trying to unpack a string into a tuple. Try using split(',') first:
a = lines[0].split(',') #['0.215822', ' -1.395942', ' -1.976109\n']
(x1,y1,z1) = a
Also, this will put strings into x1, y1, z1; you need to convert them to float:
x1 = float(x1)
y1 = float(y1)
z1 = float(z1)

lines[0] is a string, as it comes from a text file. You need to convert it into a tuple of floats:
x1, y1, z1 = (float(value) for value in lines[0].strip().split(','))
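
For the 5000-coordinate case, the per-line parsing above can be applied to the whole file at once. The following is a rough sketch, assuming each line holds three comma-separated numbers and that the distance wanted is the one between each point and the next. The in-memory sample_file stands in for the real file.

```python
import io
import numpy as np

# Stand-in for the coordinate file (assumption): the two lines from the question
sample_file = io.StringIO(
    "0.215822, -1.395942, -1.976109\n"
    "0.648518, -0.493053, -2.101929\n"
)

# One row per non-empty line, three floats (x, y, z) per row;
# float() ignores the spaces and the newline around each value
coords = np.array([[float(value) for value in line.split(",")]
                   for line in sample_file if line.strip()])

# Differences between each point and the next one, then the Euclidean norm per row
steps = np.diff(coords, axis=0)
distances = np.linalg.norm(steps, axis=1)

print(distances)
```

With N points this yields N-1 distances; other pairings (for example every point against the first) would need a different subtraction.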

Write data to file

I am trying to write a matrix (<type 'numpy.ndarray'>) in a file, with this format:
index_of_row # v0, v1, v2
which will be read by my partner's Scala code (if that matters).
After reading this, I ended up with this code:
print dense_reduced
# this will give an error:
#TypeError: expected a single-segment buffer object
#f = open('dense.txt','w')
#f.write(dense_reduced[0])
#f.close()
numpy.savetxt('dense.txt', dense_reduced, delimiter=", ", fmt="%s")
which outputs:
[[-0.17033304 0.13854157 0.22427917]
..
[-0.15361054 0.38628932 0.05236084]]
and the dense.txt is:
-0.170333043895, 0.138541569519, 0.224279174382
...
However, there are several reasons I need the dense.txt to look like this (index of row of the matrix # values separated by comma):
0 # -0.17033304, 0.13854157, 0.22427917
...
How to proceed?

With savetxt() options:
u = dense_reduced
w = np.hstack((np.arange(u.shape[0]).reshape(-1,1),u))
np.savetxt('dense.txt', w, fmt=["%i #"]+ ["%.10s,"]*(u.shape[1]-1)+["%.10s"])
which gives:
0 # 0.57105063, 0.70274226, 0.87870916
1 # 0.28735507, 0.94860021, 0.63763897
2 # 0.26302099, 0.26609319, 0.75001683
3 # 0.93315750, 0.19700358, 0.13632004
You can also simplify with w=pd.DataFrame(u).reset_index() if you have pandas.

There are several options that you can provide to numpy.savetxt (such as comments, delimiter, etc.), but I don't believe you can do it in this way. A multidimensional NumPy array can be iterated over as a sequence of smaller arrays, so we can easily run:
my_array = np.array(range(20)).reshape((4,5))
with open("output.txt", "w") as f:
    for i, a in enumerate(my_array):
        f.write("{} # {}\n".format(i, ', '.join(map(str, a))))
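
The row-by-row approach can be sketched with a fixed number of decimals as well. This is only a sketch: write_indexed_rows is a hypothetical helper name, the eight-decimal precision is an assumption based on the desired output in the question, and the StringIO buffer stands in for the real file.

```python
import io
import numpy as np

def write_indexed_rows(matrix, handle, precision=8):
    """Write each row as '<row index> # v0, v1, v2' (hypothetical helper)."""
    for index, row in enumerate(matrix):
        values = ", ".join("{:.{p}f}".format(v, p=precision) for v in row)
        handle.write("{} # {}\n".format(index, values))

# Sample values taken from the printed matrix in the question
dense_reduced = np.array([[-0.17033304, 0.13854157, 0.22427917],
                          [-0.15361054, 0.38628932, 0.05236084]])

buffer = io.StringIO()   # stands in for open('dense.txt', 'w')
write_indexed_rows(dense_reduced, buffer)
print(buffer.getvalue(), end="")
```

Passing a handle instead of a file name keeps the helper usable with a real file opened in a with block.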

How to write .csv file in Python?

I am running the following: output.to_csv("hi.csv") where output is a pandas dataframe.
My variables all have values but when I run this in iPython, no file is created. What should I do?

Better to give the complete path for your output CSV file. Maybe you are checking in the wrong folder.
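
One way to check which folder a relative name ends up in is sketched below. It only uses the standard library; the to_csv call is left as a comment because it assumes the output DataFrame from the question.

```python
import os
from pathlib import Path

# A relative name such as "hi.csv" is resolved against the current working directory
print("Working directory:", os.getcwd())

target = Path("hi.csv").resolve()   # absolute path the relative name points to
print("The file would be written to:", target)

# With pandas available, passing the absolute path removes the ambiguity
# (output is the DataFrame from the question):
# output.to_csv(target)
```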

You have to make sure that your 'to_csv' method of 'output' object has a write-file function implemented.
And there is a library for CSV manipulation in Python, so you don't need to handle all the work yourself:
https://docs.python.org/2/library/csv.html

I'm not sure if this will be useful to you, but I write to CSV files frequently in Python. Here is an example that generates random vector (X, Y, Z) values and writes them to a CSV file using the csv module. (The paths are OS X paths, but you should get the idea even on a different OS.)
Working example: writing a CSV file from Python
import os, csv, random
# Generates random vectors and writes them to a CSV file
WriteFile = True # Write CSV file if true - useful for testing
CSVFileName = "DataOutput.csv"
CSVfile = open(os.path.join('/Users/Si/Desktop/', CSVFileName), 'w')
def genlist():
    # Generates a list of random vectors
    global v, ListLength
    ListLength = 25 #Amount of vectors to be produced
    Max = 100 #Maximum range value
    x = [] #Empty x vector list
    y = [] #Empty y vector list
    z = [] #Empty z vector list
    v = [] #Empty xyz vector list
    for i in xrange (ListLength):
        rnd = random.randrange(0,(Max)) #Generate random number
        x.append(rnd) #Add it to x list
    for i in xrange (ListLength):
        rnd = random.randrange(0,(Max)) #Generate random number
        y.append(rnd) #Add it to y list
    for i in xrange (ListLength):
        rnd = random.randrange(0,(Max)) #Generate random number
        z.append(rnd) #Add it to z list
    for i in xrange (ListLength):
        merge = x[i], y[i],z[i] # Merge x[i], y[i], z[i]
        v.append(merge) #Add merged tuple into v list
def writeCSV():
    # Write Vectors to CSV file
    wr = csv.writer(CSVfile, quoting = csv.QUOTE_MINIMAL, dialect='excel')
    wr.writerow(('Point Number', 'X Vector', 'Y Vector', 'Z Vector'))
    for i in xrange (ListLength):
        wr.writerow((i+1, v[i][0], v[i][1], v[i][2]))
    print "Data written to", CSVfile
genlist()
if WriteFile is True:
    writeCSV()
Hopefully there is something useful in here for you!
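
The example above targets Python 2 (xrange, print statement). A rough Python 3 sketch of the same idea follows. The function names random_vectors and write_vectors are made up for illustration, and the temporary directory is an assumption; any writable folder works.

```python
import csv
import os
import random
import tempfile

def random_vectors(count, max_value=100):
    """Return `count` random (x, y, z) integer tuples in [0, max_value)."""
    return [tuple(random.randrange(max_value) for _ in range(3))
            for _ in range(count)]

def write_vectors(path, vectors):
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(("Point Number", "X Vector", "Y Vector", "Z Vector"))
        for number, (x, y, z) in enumerate(vectors, start=1):
            writer.writerow((number, x, y, z))

# The temporary directory is an assumption; use any writable folder
csv_path = os.path.join(tempfile.gettempdir(), "DataOutput.csv")
vectors = random_vectors(25)
write_vectors(csv_path, vectors)
print("Data written to", csv_path)
```

Opening the file inside a with block means it is flushed and closed as soon as the rows are written, so the data is on disk before the script moves on.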

How do I get a text output from a string created from an array to remain unshortened?

Python/Numpy Problem. Final year Physics undergrad... I have a small piece of code that creates an array (essentially an n×n matrix) from a formula. I reshape the array to a single column of values, create a string from that, format it to remove extraneous brackets etc, then output the result to a text file saved in the user's Documents directory, which is then used by another piece of software. The trouble is above a certain value for "n" the output gives me only the first and last three values, with "...," in between. I think that Python is automatically abridging the final result to save time and resources, but I need all those values in the final text file, regardless of how long it takes to process, and I can't for the life of me find how to stop it doing it. Relevant code copied beneath...
import numpy as np; import os.path ; import os
'''
Create a single column matrix in text format from Gaussian Eqn.
'''
save_path = os.path.join(os.path.expandvars("%userprofile%"),"Documents")
name_of_file = 'outputfile' #<---- change this as required.
completeName = os.path.join(save_path, name_of_file+".txt")
matsize = 32
def gaussf(x,y): #defining gaussian but can be any f(x,y)
    pisig = 1/(np.sqrt(2*np.pi) * matsize) #first term
    sumxy = (-(x**2 + y**2)) #sum of squares term
    expden = (2 * (matsize/1.0)**2) # 2 sigma squared
    expn = pisig * np.exp(sumxy/expden) # and put it all together
    return expn
matrix = [[ gaussf(x,y) ]\
          for x in range(-matsize/2, matsize/2)\
          for y in range(-matsize/2, matsize/2)]
zmatrix = np.reshape(matrix, (matsize*matsize, 1)) # reshape to a single column
string2 = (str(zmatrix).replace('[','').replace(']','').replace(' ', ''))
zbfile = open(completeName, "w")
zbfile.write(string2)
zbfile.close()
print completeName
num_lines = sum(1 for line in open(completeName))
print num_lines
Any help would be greatly appreciated!

Generally you should iterate over the array/list if you just want to write the contents.
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
with open(completeName, "w") as zbfile: # with closes your files automatically
    for row in zmatrix:
        zbfile.writelines(map(str, row))
        zbfile.write("\n")
Output:
0.00970926751178
0.00985735189176
0.00999792646484
0.0101306077521
0.0102550302672
0.0103708481917
0.010477736974
0.010575394844
0.0106635442315
.........................
But using numpy we simply need to use tofile:
zmatrix = np.reshape(matrix, (matsize*matsize, 1))
# pass sep or you will get binary output
zmatrix.tofile(completeName,sep="\n")
Output is in the same format as above.
Calling str on the matrix gives you the same formatted output you get when you print it, so what you are writing to the file is the formatted, truncated output.
Since you are using Python 2, xrange would be more efficient than range, which creates a list. Also, putting multiple imports on one line separated by semicolons is not recommended; you can simply write:
import numpy as np, os.path, os
Also, variable and function names should use underscores: z_matrix, zb_file, complete_name, etc.

You shouldn't need to fiddle with the string representations of numpy arrays. One way is to use tofile:
zmatrix.tofile('output.txt', sep='\n')
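
The truncation and one way around it can be reproduced in a few lines. This is a small sketch with placeholder values instead of the Gaussian; it assumes NumPy's default print options, and the StringIO buffer stands in for the real output file.

```python
import io
import numpy as np

# 32 x 32 = 1024 values in one column, the same shape as in the question
zmatrix = np.arange(1024, dtype=float).reshape(-1, 1)

# str() follows NumPy's print options and summarises arrays with more than
# 1000 elements (the default threshold), which is where the "..." comes from
truncated = str(zmatrix)
print("..." in truncated)

# savetxt writes every row regardless of the print options
buffer = io.StringIO()   # stands in for the real output file
np.savetxt(buffer, zmatrix, fmt="%.12g")
line_count = len(buffer.getvalue().splitlines())
print(line_count)
```

np.savetxt is used here instead of tofile only because it also accepts an in-memory buffer; with a real file name either call writes all values.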

Python script for transforming and sorting columns in ascending order, decimal cases

I wrote a script in Python removing tabs/blank spaces between two columns of strings (x,y coordinates) plus separating the columns by a comma and listing the maximum and minimum values of each column (2 values for each the x and y coordinates). E.g.:
100000.00 60000.00
200000.00 63000.00
300000.00 62000.00
400000.00 61000.00
500000.00 64000.00
became:
100000.00,60000.00
200000.00,63000.00
300000.00,62000.00
400000.00,61000.00
500000.00,64000.00
10000000 50000000 60000000 640000000
This is the code I used:
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_new.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = s.split()
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    s = input.readline()
input.close()
output.close()
I need to change the above code to also transform the coordinates from two decimal to one decimal values and each of the two new columns to be sorted in ascending order based on the values of the x coordinate (left column).
I started by writing the following but not only is it not sorting the values, it is placing the y coordinates on the left and the x on the right. In addition I don't know how to transform the decimals since the values are strings and the only function I know is using %f and that needs floats. Any suggestions to improve the code below?
import string
input = open(r'C:\coordinates.txt', 'r')
output = open(r'C:\coordinates_sorted.txt', 'wb')
s = input.readline()
while s <> '':
    s = input.readline()
    liste = string.split(s)
    x = liste[0]
    y = liste[1]
    output.write(str(x) + ',' + str(y))
    output.write('\n')
    sorted(s, key=lambda x: x[o])
    s = input.readline()
input.close()
output.close()
thanks!

First, try to format your code according to PEP8—it'll be easier to read. (I've done the cleanup in your post already).
Second, Tim is right in that you should try to learn how to write your code as (idiomatic) Python not just as if translated directly from its C equivalent.
As a starting point, I'll post your 2nd snippet here, refactored as idiomatic Python:
# there is no need to import the `string` module; `.split()` is a built-in
# method of strings (i.e. objects of type `str`).
# read in the data as a list of pairs of raw (i.e. unparsed) coordinates in
# string form:
with open(r'C:\coordinates.txt') as in_file:
    coords_raw = [line.strip().split() for line in in_file.readlines()]
# convert the raw list into a list of pairs (2-tuples) containing the parsed
# (i.e. float not string) data:
coord_pairs = [(float(x_raw), float(y_raw)) for x_raw, y_raw in coords_raw]
coord_pairs.sort() # you want to sort the entire data set, not just values on
# individual lines as in your original snippet
# build a list of all x and y values we have (this could be done in one line
# using some `zip()` hackery, but I'd like to keep it readable (for you at
# least)):
all_xs = [x for x, y in coord_pairs]
all_ys = [y for x, y in coord_pairs]
# compute min and max:
x_min, x_max = min(all_xs), max(all_xs)
y_min, y_max = min(all_ys), max(all_ys)
# NOTE: the above section performs well for small data sets; for large ones, you
# should combine the 4 lines in a single for loop so as to NOT have to read
# everything to memory and iterate over the data 6 times.
# write everything out
with open(r'C:\coordinates_sorted.txt', 'wb') as out_file:
    # here, we're doing 3 things in one line:
    # * iterate over all coordinate pairs and convert the pairs to the string
    #   form
    # * join the string forms with a newline character
    # * write the result of the join+iterate expression to the file
    out_file.write('\n'.join('%f,%f' % (x, y) for x, y in coord_pairs))
    out_file.write('\n\n')
    out_file.write('%f %f %f %f' % (x_min, x_max, y_min, y_max))
with open(...) as <var_name> gives you guaranteed closing of the file handle as with try-finally; also, it's shorter than open(...) and .close() on separate lines. Also, with can be used for other purposes, but is commonly used for dealing with files. I suggest you look up how to use try-finally as well as with/context managers in Python, in addition to everything else you might have learned here.

Your code looks more like C than like Python; it is quite unidiomatic. I suggest you read the Python tutorial to find some inspiration. For example, iterating using a while loop is usually the wrong approach. The string module is deprecated for the most part, <> should be !=, you don't need to call str() on an object that's already a string...
Then, there are some errors. For example, sorted() returns a sorted version of the iterable you're passing - you need to assign that to something, or the result will be discarded. But you're calling it on a string, anyway, which won't give you the desired result. You also wrote x[o] where you clearly meant x[0].
You should be using something like this (assuming Python 2):
with open(r'C:\coordinates.txt') as infile:
    values = []
    for line in infile:
        values.append(map(float, line.split()))
values.sort()
with open(r'C:\coordinates_sorted.txt', 'w') as outfile:
    for value in values:
        outfile.write("{:.1f},{:.1f}\n".format(*value))
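
For Python 3, the parse, sort and format steps can be sketched as follows. This is only a sketch: the in-memory sample_file stands in for coordinates.txt and holds the question's rows in shuffled order (an assumption made to show the sort), and the min/max summary line follows the layout described in the question.

```python
import io

# Stand-in for coordinates.txt (assumption): the question's rows, shuffled
sample_file = io.StringIO(
    "300000.00 62000.00\n"
    "100000.00 60000.00\n"
    "500000.00 64000.00\n"
    "200000.00 63000.00\n"
    "400000.00 61000.00\n"
)

# Parse each non-empty line into an (x, y) tuple of floats
pairs = [tuple(float(v) for v in line.split())
         for line in sample_file if line.strip()]

pairs.sort(key=lambda pair: pair[0])   # ascending by x

# One decimal place, comma-separated
lines = ["{:.1f},{:.1f}".format(x, y) for x, y in pairs]

# Minimum and maximum of each column
xs, ys = zip(*pairs)
summary = "{:.1f} {:.1f} {:.1f} {:.1f}".format(min(xs), max(xs), min(ys), max(ys))

print("\n".join(lines))
print(summary)
```

Converting to float first is what makes both the numeric sort and the one-decimal formatting possible; sorting the raw strings would compare them character by character.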
