Create Python array.array Object from cStringIO Object - python

I want to create an array.array object from a cStringIO object:
import cStringIO, array
s = """
<several lines of text>
"""
f = cStringIO.StringIO(s)
a = array.array('c')
a.fromfile(f, len(s))
But I get the following exception:
Traceback (most recent call last):
File "./myfile.py", line 22, in <module>
a.fromfile(f, len(s))
TypeError: arg1 must be open file
It seems like array.array() is checking the type() of the first argument, which makes it incompatible with cStringIO (and StringIO for that matter). Is there any way to make this work?

Why not use a.fromstring()? Since the StringIO buffer is entirely in memory, there is no benefit to trying to use a file api to read the bits from one memory location to another.
a = array.array('c')
a.fromstring(s)
If you are using StringIO for another reason (as a memory buffer, or as a file earlier on), then you can use StringIO's getvalue() function to get the string value.
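For readers on Python 3, where cStringIO, the 'c' typecode and fromstring() are no longer available, a rough equivalent might look like the sketch below. It assumes the data is bytes and uses the 'B' (unsigned byte) typecode with io.BytesIO; the sample text is made up for illustration.

```python
import array
import io

s = b"several lines\nof text\n"   # illustrative data, not from the question
f = io.BytesIO(s)                  # in-memory buffer, the Python 3 stand-in for cStringIO

a = array.array('B')               # unsigned bytes; the 'c' typecode is Python 2 only
a.frombytes(f.getvalue())          # copy the whole buffer straight into the array

print(len(a) == len(s))
```

Going through getvalue() avoids the file API entirely, which is the same idea as the fromstring() suggestion above.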

Related

I keep getting a type error got str instead of int

import pickle
usernames_passwords = open("username_password.pck", "wb")
customer_login = []
pickle.dump(customer_login, usernames_passwords, "wb")
usernames_passwords.close()
I'm trying to dump a list of usernames and passwords into a pickle file, and I keep getting a type error. Can anyone explain what I'm doing wrong?
Traceback (most recent call last):
File "/Users/andy/PycharmProjects/python/venv/scratch.py", line 4, in <module>
pickle.dump(customer_login, usernames_passwords, "wb")
TypeError: an integer is required (got type str)
Process finished with exit code 1
From the pickle docs:
The optional protocol argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to HIGHEST_PROTOCOL. If not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.
So your third argument is a mode string of the kind you would pass to open(), but pickle.dump() expects an integer protocol number in that position. Either drop the argument or pass an int such as pickle.HIGHEST_PROTOCOL.
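A minimal sketch of the corrected call follows. The temporary directory is only there to keep the example self-contained; in the original script the file name would be used directly.

```python
import os
import pickle
import tempfile

customer_login = []  # list of usernames and passwords to store

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "username_password.pck")

    # The "wb" mode belongs to open(); pickle.dump() takes an optional
    # integer protocol as its third argument (or nothing at all).
    with open(path, "wb") as usernames_passwords:
        pickle.dump(customer_login, usernames_passwords, pickle.HIGHEST_PROTOCOL)

    # Read the list back to confirm the file is valid.
    with open(path, "rb") as usernames_passwords:
        restored = pickle.load(usernames_passwords)

print(restored)
```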

Python loadarff fails for string attributes

I am trying to load an arff file using Python's 'loadarff' function from scipy.io.arff. The file has string attributes and it is giving the following error.
>>> data,meta = arff.loadarff(fpath)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 805, in loadarff
return _loadarff(ofile)
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 838, in _loadarff
raise NotImplementedError("String attributes not supported yet, sorry")
NotImplementedError: String attributes not supported yet, sorry
How to read the arff successfully?
Because SciPy's loadarff converts the contents of the ARFF file into a NumPy array, it does not support string attributes.
As of 2020, you can use the liac-arff package instead.
import arff
data = arff.load(open('your_document.arff', 'r'))
However, make sure your ARFF document does not contain inline comments after meaningful text on the same line.
In other words, there should be no lines like this:
@ATTRIBUTE class {F,A,L,LF,MN,O,PE,SC,SE,US,FT,PO} %Check and make sure that FT and PO should be there
Delete the comment or move it to the next line.
I had this mistake in one document, and it took some time to figure out what was wrong.
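To find such lines before loading, a small standard-library check could be used. This is only a sketch: find_inline_comments is a hypothetical helper, and its handling of quotes and backslash escapes is a simplifying assumption rather than a full ARFF parser.

```python
def find_inline_comments(lines):
    """Return 1-based numbers of lines where '%' follows other text outside quotes."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        quote = None        # quote character we are currently inside, if any
        seen_text = False   # True once non-whitespace text has appeared
        i = 0
        while i < len(line):
            ch = line[i]
            if quote:
                if ch == "\\":
                    i += 1          # skip the escaped character
                elif ch == quote:
                    quote = None    # closing quote
            elif ch in "'\"":
                quote = ch
                seen_text = True
            elif ch == "%":
                if seen_text:
                    flagged.append(number)
                break               # a leading '%' is an ordinary full-line comment
            elif not ch.isspace():
                seen_text = True
            i += 1
    return flagged


sample = [
    "% a full-line comment is fine",
    "@RELATION demo",
    "@ATTRIBUTE class {F,A} % inline note",
    "@ATTRIBUTE label '50% off' string",
]
print(find_inline_comments(sample))
```

With a real file, the lines could come from open('your_document.arff').read().splitlines().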

recover a pickle corrupted file after getting OSError: [Errno 28] No space left on device [duplicate]

My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.
Is it possible to partially or fully recover the data? If so, how?
Here's what I've tried:
>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
obj = pik.load()
EOFError: Ran out of input
>>>
The file is not empty:
>>> os.stat(filename).st_size
31110059
Note: all data in the dictionary was comprised of python built-in types.
The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:
import io
import pickle

# Use the pure-Python version, we can't see the internal state of the C version
pickle.Unpickler = pickle._Unpickler

import dill

if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)
    handle = io.BytesIO(data[:-5])  # cut it off
    unpickler = dill.Unpickler(handle)
    try:
        unpickler.load()
    except EOFError:
        pass
    print(unpickler.stack)
I get the following output:
[3, 4, '5', ('6',)]
The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
I can't comment on the above answer, but to extend Blender's answer:
unpickler.metastack worked for me with dill v0.3.5.1 (though you could do it without dill, afaik). stack did exist, but it was an empty list.
Also, with dill I got an UnpicklingError rather than an EOFError. This could partly be because of how my file got corrupted (it ran out of disk space).
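Putting both answers together without dill, a sketch along these lines collects whatever the pure-Python unpickler had built up before it failed. salvage is a hypothetical helper name, and pickle._Unpickler is a private class whose stack and metastack attributes are implementation details that may change between Python versions. It only handles truncation; other kinds of corruption may raise different exceptions.

```python
import io
import pickle


def salvage(data):
    """Try to load pickle bytes; on truncation, return the leftover stacks.

    Returns (value, []) on success, or (None, frames) on failure, where
    frames is the list of partially built stacks, outermost first.
    """
    # Pure-Python unpickler: its internal state stays visible after an error.
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        return unpickler.load(), []
    except (EOFError, pickle.UnpicklingError):
        # metastack holds the stacks saved at each MARK opcode,
        # stack is the one that was being filled when input ran out.
        return None, list(unpickler.metastack) + [unpickler.stack]


obj = [1, 2, {3: 4, "5": ("6",)}]
data = pickle.dumps(obj)

value, frames = salvage(data[:-5])   # simulate a file cut off mid-write
for frame in frames:
    print(frame)
```

The recovered frames are raw building blocks (list items, dict keys and values in order), so reassembling the original object still takes some manual work.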

save string to a binary file in python

I would like to know a very basic thing about Python programming (I am a very basic programmer right now): how can I save a result (a list, a string, or whatever) to a file in Python?
I've been searching a lot, but I couldn't find a good answer to this.
I was thinking about the ".write ()" method, but (for instance) it does not seem to work with strings, and I don't really know what it is supposed to do.
My situation is that I have binary files which I would like to edit. I found it easy to convert them to strings and modify them, and now I'd like to save them i) back to binary files (JPEG images) and ii) in the folder I want.
How would I do that? I'd appreciate some help.
UPDATE
Here is the script I'm trying to run:
import os, sys
newpath= r'C:/Users/Umberto/Desktop/temporary'
if not os.path.exists (newpath):
    os.makedirs (newpath)
data= open ('C:/Users/Umberto/Desktop/Prove_Script/Varie/_BR_Browse.001_2065642654_1.BINARY', 'rb+')
edit_data= str (data.read () )
out_dir= os.path.join (newpath, 'feed', 'address')
data.close ()
# do my edits in a second time...
edit_data.write (newpath)
edit_data.close ()
The error I get is:
AttributeError: 'str' object has no attribute 'write'
UPDATE_2
I tried to use pickle module to serialize my binary file, modify it and save it at the end, but still not getting it to work... This is what I've been trying so far:
import cPickle as pickle
binary= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\_BR_Browse.001_2065642654_1.BINARY', 'rb')
out= open ('C:\Users\Umberto\Desktop\Prove_Script\Varie\preview.txt', 'wb')
pickle.dump (binary, out, 1)
TypeError Traceback (most recent call last)
<ipython-input-6-981b17a6ad99> in <module>()
----> 1 pprint.pprint (pickle.dump (binary, out, 1))
C:\Python27\ArcGIS10.1\lib\copy_reg.pyc in _reduce_ex(self, proto)
68 else:
69 if base is self.__class__:
---> 70 raise TypeError, "can't pickle %s objects" % base.__name__
71 state = base(self)
72 args = (self.__class__, base, state)
TypeError: can't pickle file objects
Another thing I didn't get is whether I am supposed to create a file to point to (in my case I had to create "out", otherwise I wouldn't have the right arguments for the pickle method) or whether that's not necessary.
Hope I'm getting close to the solution.
P.S.: I tried also with pickle.dumps (), not achieving a nicer result though...
If you're opening a binary file and saving another binary file you could do something like this:
with open('file.jpg', 'rb') as jpgFile:
    contents = jpgFile.read()
# ... some operations on contents here ...
with open('file2.jpg', 'wb') as jpgFile:
    jpgFile.write(contents)
Some comments:
'rb' and 'wb' mean read and write in binary mode respectively. The 'b' is recommended when working with binary files so that the bytes are read and written unchanged.
Python's with statement takes care of closing the file when exiting the block.
If you need to save lists, strings or other objects, and retrieving them later, use pickle as others pointed out.
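Applied to the question (edit the bytes, then save the result into a chosen folder), the pattern above could be wrapped up roughly as follows. This is a sketch for Python 3: edit_binary_file and its parameters are illustrative names, and the transform shown in the usage is a placeholder for the real edits.

```python
import os
import tempfile


def edit_binary_file(src_path, dst_dir, dst_name, transform):
    """Read src_path as bytes, apply transform, write the result into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)      # create the target folder if needed
    with open(src_path, 'rb') as src:
        contents = bytearray(src.read())     # mutable copy of the raw bytes
    contents = transform(contents)
    dst_path = os.path.join(dst_dir, dst_name)
    with open(dst_path, 'wb') as dst:        # write() is called on the file, not the data
        dst.write(contents)
    return dst_path


# Usage with throwaway files; real paths would replace these.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "input.bin")
    with open(source, 'wb') as f:
        f.write(b"\x00\x01\x02")

    out = edit_binary_file(source, os.path.join(tmp, "temporary"), "output.bin",
                           lambda data: data[::-1])   # placeholder edit: reverse the bytes
    with open(out, 'rb') as f:
        result = f.read()

print(result)
```

Keeping the data as bytes (or a bytearray) rather than converting it with str() means it can be written back without any encoding step.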
You can use the standard Python module named "pickle".
You can read about it in the pickle documentation.
Reading and writing any data structure is very easy:
pickle.dump(obj, file_handler) # serialize an object to a file
pickle.load(file_handler) # deserialize an object from a file
Or you can serialize to a string with pickle.dumps(...) and load it back with pickle.loads(...).
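As a small illustration of the in-memory variant, here is a round trip through pickle.dumps() and pickle.loads(). The sample data is made up; note that in Python 3 dumps() returns bytes rather than a str.

```python
import pickle

result = {"names": ["a", "b"], "count": 2}   # illustrative data

blob = pickle.dumps(result)       # serialize to a bytes object
restored = pickle.loads(blob)     # rebuild an equal object from those bytes

print(type(blob).__name__, restored == result)
```

Pickle works on Python objects such as lists and dicts; an open file object itself cannot be pickled, which is what the TypeError in the question reports.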

Creating a Python function that opens a textfile, reads it, tokenizes it, and finally runs from the command line or as a module

I have been trying to learn Python for a while now. By chance, I happened across chapter 6 of the official tutorial through a Google search link.
When I learned, from that page, that functions were the heart of modules, and that modules could be called from the command line, I was all ears. Here's my first attempt at doing both, openbook.py
import nltk, re, pprint
from __future__ import division
def openbook(book):
    file = open(book)
    raw = file.read()
    tokens = nltk.wordpunct_tokenize(raw)
    text = nltk.Text(tokens)
    words = [w.lower() for w in text]
    vocab = sorted(set(words))
    return vocab
if __name__ == "__main__":
    import sys
    openbook(file(sys.argv[1]))
What I want is for this function to be importable as the module openbook, as well as for openbook.py to take a file from the command line and do all of those things to it.
When I run openbook.py from the command line, this happens:
gemeni@a:~/Projects-FinnegansWake$ python openbook.py vicocyclometer
Traceback (most recent call last):
File "openbook.py", line 23, in <module>
openbook(file(sys.argv[1]))
File "openbook.py", line 5, in openbook
file = open(book)
When I try using it as a module, this happens:
>>> import openbook
>>> openbook('vicocyclometer')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
So, what can I do to fix this, and hopefully continue down the long winding path to enlightenment?
Error executing openbook.py
For the first error, you are opening the file twice:
openbook(file(sys.argv[1]))
file = open(book)
Calling both file() and open() is redundant. They both do the same thing. Pick one or the other: preferably open().
open(...)
open(name[, mode[, buffering]]) → file object
Open a file using the file() type, returns a file object. This is the
preferred way to open a file.
Error importing openbook module
For the second error, you need to add the module name:
>>> import openbook
>>> openbook.openbook('vicocyclometer')
Or import the openbook() function into the global namespace:
>>> from openbook import openbook
>>> openbook('vicocyclometer')
Here are some things you need to fix:
nltk.word_tokenize will fail every time:
The function takes sentences as arguments. Make sure that you use nltk.sent_tokenize on the whole text first, so that things work correctly.
Files not being dealt with:
Only open the file once.
You're not closing the file once it's done. I recommend using Python's with statement to extract the text, as it closes things automatically: with open(book) as f: raw = f.read() and then pass raw to nltk.sent_tokenize ...
Import the openbook function from the module, not just the module: from openbook import openbook.
Lastly, you could consider:
Adding things to the set with a generator expression, which will probably reduce the memory load: set(w.lower() for w in text)
Using nltk.FreqDist to generate a vocab & frequency distribution for you.
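Pulling these points together, a version of openbook.py that works both as a script and as a module might look like the sketch below. To keep it runnable without NLTK it substitutes a regular expression for the tokenizer, which is an assumption made for illustration; with NLTK installed, the original nltk.wordpunct_tokenize(raw) call could be used on that line instead. The main function is also an illustrative addition.

```python
import re
import sys


def openbook(book):
    """Return the sorted, lower-cased vocabulary of the text file at path `book`."""
    with open(book) as f:        # opened once, closed automatically
        raw = f.read()
    # Stand-in tokenizer (assumption): runs of word characters or of punctuation.
    tokens = re.findall(r"\w+|[^\w\s]+", raw)
    return sorted(set(t.lower() for t in tokens))


def main(argv):
    if len(argv) != 2:
        print("usage: python openbook.py <textfile>")
        return
    print(openbook(argv[1]))     # pass the path string, not a file object


if __name__ == "__main__":
    main(sys.argv)
```

From the interpreter, the function is then reached with from openbook import openbook followed by openbook('vicocyclometer').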
Try
from openbook import *
instead of
import openbook
OR:
import openbook
and then call it with
openbook.openbook("vicocyclometer")
In your interactive session, you're getting that error because you need to from openbook import openbook. I can't tell what happened with the command line because the line with the error got snipped. It's probably that you tried to open a file object. Try just passing the string into the openbook function directly.
