Write multiple arrays with different formats (strings and numbers) in Python

I am very new to Python and I would like to write the following (something like fprintf in MATLAB). I do not know why the string part is not working. Here is the code:
import numpy as np
coord = np.linspace(0, 10, 5)
keyy = "LE"
key = np.repeat(keyy, 5)
out_arr = np.array_str(key)
zip = np.array([coord, out_arr])
zzip = zip.T
print(zzip)
savefile = np.savetxt("nam.dat", zzip, fmt="%f %s")

The problem is with the following line:
out_arr = np.array_str(key)
This is converting the array ['LE' 'LE' 'LE' 'LE' 'LE'] to the string "['LE' 'LE' 'LE' 'LE' 'LE']". Note the quotes. This is no longer an array, it is a single string, and numpy interprets it as a length-1 array. You first need to drop that line:
key = np.repeat(keyy, 5)
zip = np.array([coord, key])
The next problem you will run into is that this converts the coord numbers into strings, so every element ends up being a string. This is because numpy arrays have a single, fixed type (there are exceptions, but they are more complicated), and the only type that can hold both columns here is a string.
The simple way around this is to use an "object" array (roughly the equivalent of a MATLAB cell array), which stores arbitrary Python objects rather than fixed data:
zip = np.array([coord, key], dtype='object')
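Putting it all together, a minimal sketch of the corrected script might look like this (the variable is renamed zipped to avoid shadowing the built-in, as noted below):
import numpy as np

coord = np.linspace(0, 10, 5)
key = np.repeat("LE", 5)

# dtype=object keeps the floats as floats and the labels as strings
zipped = np.array([coord, key], dtype=object).T

# one float column and one string column, matching fmt="%f %s"
np.savetxt("nam.dat", zipped, fmt="%f %s")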
However, the better solution if you can is to use pandas. Pandas is sort of like MATLAB tables, but much more powerful. It is designed for this sort of data, and has very nice functions for writing text files like you want to do here in a cleaner, more explicit way.
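For instance, a minimal pandas sketch (assuming the same coord and key arrays as above) could be:
import pandas as pd

df = pd.DataFrame({"coord": coord, "key": key})
df.to_csv("nam.dat", sep=" ", header=False, index=False, float_format="%f")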
Also, zip is a built-in function, and it is better not to give variables the same name as a built-in. It is allowed, but zip is an important function and you don't want to block access to it.

Related

Python np.fromfile() adding arbitrary random comma when reading from binary file

I have encountered a weird problem and could not solve it for days. I created a byte array containing values from 1 to 250 and wrote it to a binary file from C# using WriteAllBytes.
Later I read it from Python using np.fromfile(filename, dtype=np.ubyte). However, I realized this function was adding arbitrary commas (see the image). Interestingly, they are not visible in the array property, and if I call numpy.array2string, the commas turn into '\n'. One solution is to replace the commas with nothing, but I have very long sequences and using a replace function would take forever on 100 GB of data. I also rechecked the files by reading them with .NET Core, and I'm quite sure the commas are not there.
What could I be missing?
Edit:
I was trying to read all the byte values into an array and cast each member (or the entire array) to a string. I found that the most reliable way to do this is:
list(map(str, ubyte_array))
The above code returns a list of strings whose elements contain no arbitrary commas or blank spaces.
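A minimal sketch of that approach (the file name is a placeholder):
import numpy as np

# read the raw bytes written by C#'s WriteAllBytes
ubyte_array = np.fromfile("data.bin", dtype=np.ubyte)

# note: numpy's own repr separates elements with spaces, not commas;
# commas typically appear when printing a Python list built from the array
str_list = list(map(str, ubyte_array))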

How to convert String (containing a table of numbers without comma delimiter) into Array in Python

I have a CSV file that I load with pd.read_csv. One of the columns has a String datatype, but it actually contains a table of numbers (like a 2D array) without comma delimiters.
I would like to convert it into an array. I tried the eval() function, but it gives an error (as can be seen in the image).
If you have any idea how to solve this issue, please let me know.
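No answer is shown here, but one possible sketch, assuming the cell holds whitespace-separated numbers (the exact cell format is not shown above), is:
import numpy as np

s = "1.0 2.0 3.0 4.0 5.0 6.0"           # hypothetical cell contents
arr = np.array(s.split(), dtype=float)  # 1-D array of floats
table = arr.reshape(2, 3)               # reshape once the table shape is known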

Issues relating to file input

I am stuck with a problem and I would like to get input from you guys.
I am coding a Neo4j application using py2neo. I want to read a file and use it to create the nodes and relationships.
The problem is that the file input, using the code below, gives the lines back as strings.
file = "../create_db"
dbFile = open(file, 'r')
And what I need is, instead of getting it back as a string, to get it raw.
At the moment the problem is that I want:
graph_db.create(node({'Id':'1', 'Description':'Computer'}))
But I get:
graph_db.create("node({'Id':'1', 'Description':'Computer'})")
Is there a way to get the file input raw? Maybe a library that gives it back raw?
Thanks in advance,
Jiar
It seems your input file contains code statements (or partial code statements).
You can execute the statements using the eval builtin function and pass the results of that to the graph_db.create function.
However, you should be aware this allows arbitrary code to be executed (i.e. the input file becomes part of the executing script) and should be treated as part of the code (i.e. don't use an untrusted input file).
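A minimal sketch of that approach, assuming each line of the file is a complete expression like the one above and that node and graph_db are in scope (trusted input only):
with open("../create_db", 'r') as db_file:
    for line in db_file:
        # evaluate the text into an actual object, then create it
        graph_db.create(eval(line.strip()))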
You could also check the ast module. Although I don't know if this will work in your case (emphasis mine):
ast.literal_eval(node_or_string)
Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
So maybe, if you have some control over the file, you could store only the dict part…
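For example, if each line stored only the literal dict part, a sketch using the node helper from the question could be:
import ast

with open("../create_db", 'r') as db_file:
    for line in db_file:
        # literal_eval only accepts Python literals, so this is safe
        properties = ast.literal_eval(line.strip())  # e.g. {'Id': '1', 'Description': 'Computer'}
        graph_db.create(node(properties))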
Using eval can be dangerous. Check also this question and its answers.

Using python's pack with arrays

I'm trying to use the pack function in the struct module to encode data into formats required by a network protocol. I've run into a problem in that I don't see any way to encode arrays of anything other than 8-bit characters.
For example, to encode "TEST", I can use format specifier "4s". But how do I encode an array or list of 32-bit integers or other non-string types?
Here is a concrete example. Suppose I have a function doEncode which takes an array of 32-bit values. The protocol requires a 32-bit length field, followed by the array itself. Here is what I have been able to come up with so far.
from array import *
from struct import *

def doEncode(arr):
    bin = pack('>i' + len(arr)*'I', len(arr), ???)

arr = array('I', [1, 2, 3])
doEncode(arr)
The best I have been able to come up with is generating the format for the pack string dynamically from the length of the array. Is there some way of specifying that I have an array, so I don't need to do this, like there is with a string (which, e.g., would be pack('>i' + str(len(arr)) + 's'))?
Even with the above approach, I'm not sure how I would go about actually passing the elements of the array in a similarly dynamic way; I can't just write arr[0], arr[1], ... because I don't know ahead of time what the length will be.
I suppose I could just pack each individual integer in the array in a loop, and then join all the results together, but this seems like a hack. Is there some better way to do this? The array and struct modules each seem to do their own thing, but in this case what I'm trying to do is a combination of both, which neither wants to do.
data = pack('>i', len(arr)) + arr.tostring()  # tostring() is a deprecated alias; use arr.tobytes() on Python 3
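Two caveats on that one-liner: arr.tostring()/arr.tobytes() emits the machine's native byte order while '>i' forces big-endian, and the * unpacking operator answers the question about passing the elements dynamically. A sketch assuming big-endian is required throughout:
from array import array
from struct import pack
import sys

arr = array('I', [1, 2, 3])

# option 1: build the format dynamically and unpack the array into pack()
data = pack('>i%dI' % len(arr), len(arr), *arr)

# option 2: length header + raw bytes, byte-swapped on little-endian machines
payload = array('I', arr)
if sys.byteorder == 'little':
    payload.byteswap()
data = pack('>i', len(arr)) + payload.tobytes()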

Limiting Numeric Digits in Python

I want to put numerics and strings into the same numpy array. However, I very rarely (it is difficult to replicate, but it happens sometimes) run into an error where the numeric-to-string conversion results in a value that cannot be translated back into a decimal (i.e., I get "9.8267567e" instead of "9.8267567e-5" in the array). This causes problems after writing files. Here is an example of what I am doing (though on a much smaller scale):
import numpy as np
x = np.array(.94749128494582)
y = np.array(x, dtype='|S100')
My understanding is that this should allow 100 string characters, but sometimes I am seeing a cut-off after ~10. Is there another type that I should be assigning, or a way to limit the number of characters in my array (x)?
First of all, x = np.array(.94749128494582) may not be doing what you think because the argument passed into np.array should be some kind of sequence or something with the array interface. Perhaps you meant x = np.array([.94749128494582])?
Now, as for preserving the strings properly, you could solve this by using
y = np.array(x, dtype=object)
However, as Joe has mentioned in his comment, it's not very numpythonic and you may as well be using plain old python lists.
I would recommend examining carefully why you seem to have this requirement to hold strings and numbers in the same array; it smells like you might have inappropriate data structures set up and could benefit from redesigning/refactoring. numpy arrays are designed for fast numerical operations; they are not really suited to string manipulation or to use as some kind of storage/database.
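A minimal sketch contrasting the two dtypes (illustrative value only):
import numpy as np

x = np.array([9.8267567e-05])
print(np.array(x, dtype='|S100'))  # byte strings such as b'9.8267567e-05'
print(np.array(x, dtype=object))   # object dtype keeps the Python float intact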
