Reading from file into Python data structure - python

I have a file which has the following format:
[
["unique_id1", {"cc":15, "dd":30}], ["unique_id2", {"cc": 184,"dd":10}], ...
]
I want to directly read the file and put data in a Python data structure. For now, I'm processing using regular expressions. Is there a command that I'm missing to read it directly?

This file format is probably JSON from what you've shown us.
You can parse it by doing
import json
out = json.load(file_object)
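Either that or it's a Python literal. You could evaluate it with eval(), but eval() executes arbitrary code, so only use it on input you trust:
out = eval(file_object.read())
Or (preferred) use ast.literal_eval(), which only accepts literals: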
import ast
out = ast.literal_eval(file_object.read())

You can use literal_eval
import ast
with open('myfile.txt') as f:
    print(ast.literal_eval(f.read()))

You should use ast.literal_eval(); it turns strings that contain Python literals into Python objects:
>>> import ast
>>> l = ast.literal_eval('[1, 2, 3]')
>>> type(l)
<class 'list'>
>>> l
[1, 2, 3]
So you would read the data from your file and turn it into a list:
with open('file.txt') as infile:
    data = ast.literal_eval(infile.read())
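As a minimal sketch combining the approaches above: assuming the file really holds a list of [id, dict] pairs as in the question, you can try JSON first and fall back to ast.literal_eval(). The load_pairs helper and the sample text are made up for illustration, and io.StringIO stands in for a real file object.

```python
import ast
import io
import json


def load_pairs(file_object):
    """Parse a list of [id, {...}] pairs and return it as a dict keyed by id."""
    text = file_object.read()
    try:
        # Strict JSON: double-quoted strings, no trailing commas.
        pairs = json.loads(text)
    except ValueError:
        # Fall back to Python literal syntax (single quotes, tuples, ...).
        pairs = ast.literal_eval(text)
    return {unique_id: values for unique_id, values in pairs}


# io.StringIO stands in for a real file opened with open(...).
sample = io.StringIO(
    '[["unique_id1", {"cc": 15, "dd": 30}], ["unique_id2", {"cc": 184, "dd": 10}]]'
)
data = load_pairs(sample)
print(data["unique_id2"]["cc"])  # 184
```

Turning the pairs into a dict is a design choice for fast lookup by id; keep the plain list if order or duplicate ids matter.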

Related

python: shlex splitting [duplicate]

Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?
I don't think I want the built in csv module because in all the examples I've seen that takes filepaths, not strings.
You can convert a string to a file object using io.StringIO and then pass that to the csv module:
from io import StringIO
import csv
scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
gęś,zółty,wąż,idzie,wąską,dróżką,
"""
f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))
A simpler version uses split() on newlines:
reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))
Or you can simply split() this string into lines using \n as separator, and then split() each line into values, but this way you must be aware of quoting, so using csv module is preferred.
On Python 2 you have to import StringIO as
from StringIO import StringIO
instead.
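To illustrate the quoting caveat mentioned above, here is a small sketch comparing a plain split(',') with csv.reader on a line that contains a quoted comma. The sample line is invented for the example.

```python
import csv
from io import StringIO

line = 'id,"Smith, John",42\n'

# Naive splitting breaks the quoted field at its embedded comma.
naive = line.strip().split(',')

# csv.reader understands the quoting and keeps the field together.
parsed = next(csv.reader(StringIO(line)))

print(naive)   # ['id', '"Smith', ' John"', '42']
print(parsed)  # ['id', 'Smith, John', '42']
```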
Simple - the csv module works with lists, too:
>>> a=["1,2,3","4,5,6"] # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]
The official documentation for csv.reader() (https://docs.python.org/2/library/csv.html) is helpful here. It says:
file objects and list objects are both suitable
import csv
text = """1,2,3
a,b,c
d,e,f"""
lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))
Per the documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just turn your string into a single element list.
Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.
As others have already pointed out, Python includes a module to read and write CSV files. On Python 2 it works well as long as the input characters stay within ASCII; to process other encodings, more work is needed.
The Python 2 documentation for the csv module includes a UnicodeReader class, an extension of csv.reader that uses the same interface but can handle other encodings and returns unicode strings. Just copy and paste the code from the documentation. After that, you can process a CSV file like this:
with open("some.csv", "rb") as csvFile:
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print(row)
Not a generic CSV parser but usable for simple strings with commas.
>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']
To parse a CSV file:
f = open("file.csv", "r")
lines = f.read().split("\n")  # "\r\n" if needed
for line in lines:
    if line != "":  # add other needed checks to skip titles
        cols = line.split(",")
        print(cols)
f.close()
https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.
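As a sketch of the generator case, the hypothetical non_blank_lines function below yields one line at a time and is passed straight to csv.reader. The function name and sample text are illustrative only.

```python
import csv


def non_blank_lines(text):
    """Yield the lines of text one at a time, skipping blank ones."""
    for line in text.splitlines():
        if line.strip():
            yield line


text = "1,2,3\n\na,b,c\n"
rows = list(csv.reader(non_blank_lines(text)))
print(rows)  # [['1', '2', '3'], ['a', 'b', 'c']]
```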
Use this to load a tab-separated file into a list of rows (myfile is the path to your file):
import csv
csvfile = open(myfile, 'r')
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
csvfile.close()
print(my_list)
# [['1st_line', '0'],
#  ['2nd_line', '0']]
Here's an alternative solution:
>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]
Here's the documentation
For anyone still looking for a reliable way of converting a standard CSV str to a list[str] as well as in reverse, here are two functions I put together from some of the answers in this and other SO threads:
import csv
from io import StringIO
def to_line(row: list[str]) -> str:
    with StringIO() as line:
        csv.writer(line).writerow(row)
        return line.getvalue().strip()
def from_line(line: str) -> list[str]:
    return next(csv.reader([line]))
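Since the question also asks about dictionaries, here is a minimal sketch using csv.DictReader on an in-memory string. It assumes the first line of the text is a header row; the sample data is invented.

```python
import csv
from io import StringIO

text = "name,qty\napple,3\npear,5\n"

# DictReader uses the first line as field names for every later row.
records = list(csv.DictReader(StringIO(text)))
print(records[0]["name"], records[1]["qty"])  # apple 5
```

All values come back as strings, so convert fields such as qty with int() if you need numbers.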

How to write a number as text while writing in csv file in python

import csv
a = ['679L', 'Z60', '033U', '0003']
z = csv.writer(open("test1.csv", "wb"))
z.writerow(a)
Consider the code above.
Output:
679L Z60 33U 3
I need the values written as text, exactly as given:
679L Z60 033U 0003
How can I do that?
The Python csv module does not treat strings as numbers when writing the file:
>>> import csv
>>> from StringIO import StringIO
>>> a = ['679L', 'Z60', '033U', '0003']
>>> out = StringIO()
>>> z = csv.writer(out)
>>> z.writerow(a)
>>> out.getvalue()
'679L,Z60,033U,0003\r\n'
If you are seeing 3 in some other tool when reading you need to fix that tool; Python is not at fault here.
You can instruct the csv.writer() to put quotes around anything that is not a number; this could make it clearer to whatever reads your CSV that the column is not numeric. Set quoting to csv.QUOTE_NONNUMERIC:
>>> out = StringIO()
>>> z = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC)
>>> z.writerow(a)
>>> out.getvalue()
'"679L","Z60","033U","0003"\r\n'
but this won't prevent Excel from treating the column as numeric anyway.
If you are loading this into Excel, don't use the Open feature. Create a new empty worksheet and use the Import feature instead. This will let you designate a column as Text rather than General.
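For Python 3, a rough equivalent of the check above can be sketched with io.StringIO. It writes the row, then reads it back to confirm the leading zeros survive the round trip.

```python
import csv
import io

a = ['679L', 'Z60', '033U', '0003']

# newline='' lets the csv module control line endings itself.
out = io.StringIO(newline='')
csv.writer(out).writerow(a)
print(repr(out.getvalue()))  # '679L,Z60,033U,0003\r\n'

# Reading the text back returns the same strings, zeros included.
out.seek(0)
round_trip = next(csv.reader(out))
print(round_trip == a)  # True
```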

Numpy savetxt to a string

I would like to load the result of numpy.savetxt into a string. Essentially the following code without the intermediate file:
import numpy as np
def savetxts(arr):
    np.savetxt('tmp', arr)
    with open('tmp', 'rb') as f:
        return f.read()
For Python 3.x you can use the io module:
>>> import io
>>> s = io.BytesIO()
>>> np.savetxt(s, (1, 2, 3), '%.4f')
>>> s.getvalue()
b'1.0000\n2.0000\n3.0000\n'
>>> s.getvalue().decode()
'1.0000\n2.0000\n3.0000\n'
Note: I couldn't get io.StringIO() to work. Any ideas?
You can use StringIO (or cStringIO):
This module implements a file-like class, StringIO, that reads and writes a string buffer (also known as memory files).
The description of the module says it all. Just pass an instance of StringIO to np.savetxt instead of a filename:
>>> import StringIO
>>> s = StringIO.StringIO()
>>> np.savetxt(s, (1,2,3))
>>> s.getvalue()
'1.000000000000000000e+00\n2.000000000000000000e+00\n3.000000000000000000e+00\n'
Have a look at array_str or array_repr: http://docs.scipy.org/doc/numpy/reference/routines.io.html
This just extends the previous answers with a decode to UTF-8 in order to produce a string. It is very useful for exporting data to human-readable text files.
import io
import numpy as np
s = io.BytesIO()
np.savetxt(s, np.linspace(0, 10, 30).reshape(-1, 3), delimiter=',', fmt='%.4f')
outStr = s.getvalue().decode('UTF-8')
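Putting the pieces together, the savetxts function from the question could be sketched as follows for Python 3. It assumes NumPy is installed; passing keyword arguments through to np.savetxt is an addition for convenience, not part of the original question.

```python
import io

import numpy as np


def savetxts(arr, **kwargs):
    """Return what np.savetxt would write for arr, as a str."""
    buffer = io.BytesIO()
    np.savetxt(buffer, arr, **kwargs)
    return buffer.getvalue().decode('utf-8')


text = savetxts(np.array([[1.0, 2.0], [3.0, 4.0]]), fmt='%.4f', delimiter=',')
print(repr(text))  # '1.0000,2.0000\n3.0000,4.0000\n'
```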

How to convert a list of numbers to jsonarray in Python

I have a row in following format:
row = [1L,[0.1,0.2],[[1234L,1],[134L,2]]]
Now, what I want is to write the following in the file:
[1,[0.1,0.2],[[1234,1],[134,2]]]
Basically, I want to convert the row above into a JSON array.
Is there a built-in method, library, or function in Python to "dump" an array as a JSON array?
Also note that I don't want the "L" to be serialized in my file.
Use the json module to produce JSON output:
import json
with open(outputfilename, 'w') as outfile:
    json.dump(row, outfile)
This writes the JSON result directly to the file (replacing any previous content if the file already existed).
If you need the JSON result string in Python itself, use json.dumps() (added s, for 'string'):
json_string = json.dumps(row)
The L is just Python syntax for a long integer value; the json library knows how to handle those values, no L will be written.
Demo string output:
>>> import json
>>> row = [1L,[0.1,0.2],[[1234L,1],[134L,2]]]
>>> json.dumps(row)
'[1, [0.1, 0.2], [[1234, 1], [134, 2]]]'
import json
row = [1L,[0.1,0.2],[[1234L,1],[134L,2]]]
row_json = json.dumps(row)
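On Python 3 there is no L suffix at all, since int covers arbitrarily large values. A minimal sketch of the same dump, writing to a temporary path chosen purely for illustration and reading the file back:

```python
import json
import os
import tempfile

row = [1, [0.1, 0.2], [[1234, 1], [134, 2]]]

# A temporary path is used here purely for illustration.
path = os.path.join(tempfile.mkdtemp(), "row.json")

# json.dump writes text, so the file is opened in text mode.
with open(path, "w") as outfile:
    json.dump(row, outfile)

with open(path) as infile:
    written = infile.read()
print(written)  # [1, [0.1, 0.2], [[1234, 1], [134, 2]]]
```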

Storing a random byte string in Python

For my project, I need to be able to store random byte strings in a file and read the byte string again later. For example, I want to store randomByteString from the following code:
>>> from os import urandom
>>> randomByteString=urandom(8)
>>> randomByteString
b'zOZ\x84\xfb\xceM~'
What would be the proper way to do this?
Edit: Forgot to mention that I also want to store 'normal' string alongside the byte strings.
Code like:
>>> fh = open("e:\\test","wb")
>>> fh.write(randomByteString)
8
>>> fh.close()
Open the file in binary mode. It is also cleaner to use a with block, which closes the file for you (thanks to @Blender):
>>> with open("e:\\test","wb") as fh:
...     fh.write(randomByteString)
Update: if you want to store normal strings, you can encode them and then write them like this:
>>> "test".encode()
b'test'
>>> fh.write("test".encode())
Here fh is the same file handle opened previously.
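Writing raw bytes and encoded text into one file leaves you to work out where each value ends when reading it back. One alternative, which is not what the answer above does, is to hex-encode the bytes and store everything as JSON. This is only a sketch: the record layout, field names, and temporary path are assumptions for illustration.

```python
import json
import os
import tempfile

random_bytes = os.urandom(8)

# Hex text is plain ASCII, so it can sit next to normal strings in JSON.
record = {"label": "test", "key": random_bytes.hex()}

path = os.path.join(tempfile.mkdtemp(), "record.json")  # illustrative path
with open(path, "w") as fh:
    json.dump(record, fh)

with open(path) as fh:
    loaded = json.load(fh)

# bytes.fromhex reverses bytes.hex and restores the original byte string.
restored = bytes.fromhex(loaded["key"])
print(restored == random_bytes)  # True
```

The trade-off is size: hex doubles the length of the stored bytes, which is usually acceptable for short keys or tokens.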
Works just fine. You can't expect the output to make much sense though.
>>> import os
>>> with open("foo.txt", "wb") as fh:
...     fh.write(os.urandom(8))
...
>>> with open("foo.txt", "r") as fh:
...     for line in fh.read():
...         print(line)
...
^J^JM-/
^O
R
M-9
J
~G
