cPickle.load( ) error - python

I am working with cPickle for the purpose to convert the structure data into datastream format and pass it to the library. The thing i have to do is to read file contents from manually written file name "targetstrings.txt" and convert the contents of file into that format which Netcdf library needs in the following manner,
Note: targetstrings.txt contains latin characters
op=open("targetstrings.txt",'rb')
targetStrings=cPickle.load(op)
The Netcdf library take the contents as strings.
While loading a file it stuck with the following error,
cPickle.UnpicklingError: invalid load key, 'A'.
Please tell me how can I rectify this error, I have googled around but did not find an appropriate solution.
Any suggestions,

pickle is not for reading/writing generic text files, but to serialize/deserialize Python objects to file. If you want to read text data you should use Python's usual IO functions.
with open('targetstrings.txt', 'r') as f:
fileContent = f.read()
If, as it seems, the library just wants to have a list of strings, taking each line as a list element, you just have to do:
with open('targetstrings.txt', 'r') as f:
lines=[l for l in f]
# now in lines you have the lines read from the file

As stated - Pickle is not meant to be used in this way.
If you need to manually edit complex Python objects taht are to be read and passed as Python objects to another function, there are plenty of other formats to use - for example XML, JSON, Python files themselves. Pickle uses a Python specific protocol, that while note being binary (in the version 0 of the protocol), and not changing across Python versions, is not meant for this, and is not even the recomended method to record Python objects for persistence or comunication (although it can be used for those purposes).

Related

How to convert a byte object to a list of tuples in python 3?

I have a list of tuples [(x,y,z),...,] and I want to store this list of tuples in a file. For this I chose a .txt file. I write to the file in the mode "wb" and then I close it. Later, I want to open the file in mode "rb" and convert this byte object back to a list of tuples. How would I go about this without regular expression nonsense? Is there a file type that would allow me to store this data and read it easily that I've overlooked?
The .txt extension is typically not used for binary data, as you seem to intend.
Since your data structure is not known on a byte level, it's not that simple.
If you do know your data (types and length), you could "encode" it as a binary structure with https://docs.python.org/3.4/library/struct.html and write that to a (binary) file.
Otherwise, there are many solutions to the problem of writing (structured) data to and reading data from files (that's why there are soo many file formats):
Standard library:
https://docs.python.org/3/library/fileformats.html
https://docs.python.org/3/library/persistence.html
https://docs.python.org/3/library/xml.html
https://docs.python.org/3/library/json.html
3rd party:
https://pypi.python.org/pypi/PyYAML
and other modules on https://pypi.python.org/
Related Q&A on Stackoverflow:
How to save data with Python?

how can I reliably access a single key-value pair from a JSON file that's too large to load into memory?

I am trying to retrieve the names of the people from my file. The file size is 201GB
import json
with open("D:/dns.json", "r") as fh:
for l in fh:
d = json.loads(l)
print(d["name"])
Whenever I try to run this program on windows, I encounter a Memory error, which says insufficient memory.
Is there a reliable way to parse a single key, value pair without loading the whole file? I have reading the file in chunks in mind, but I don't know how to start.
Here is sample: test.json
Every line is seperated by newline. Hope this helps.
You may want to give ijson a try : https://pypi.python.org/pypi/ijson
Unfortunately there is no guarantee that each line of a JSON file will make any sense to the parser on its own. I'm afraid JSON was never intended for multi-gigabyte data exchange, precisely because each JSON file contains an integral data structure. In the XML world people have written incremental event-driven (SAX-based) parsers. I'm not aware of such a library for JSON.

How to not have set written on my file- python 2

So I basically just want to have a list of all the pixel colour values that overlap written in a text file so I can then access them later.
The only problem is that the text file is having (set([ or whatever written with it.
Heres my code
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('RGB.png')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Also, if I run this program again but with a different picture for comparison, will it append the data or overwrite it? It's kinda hard to tell when just looking at numbers.
If you replace these lines:
im=cv2.imread('RGB.png')
File= open('Weedlist', 'w')
File.write(str(ColourCount))
with:
import sys
im=cv2.imread(sys.argv[1])
open(sys.argv[1]+'Weedlist', 'w').write(str(list(ColourCount)))
you will get a new file for each input file and also you don't have to overwrite the RGB.png every time you want to try something new.
Files opened with mode 'w' will be overwritten. You can use 'a' to append.
You opened the file with the 'w' mode, write mode, which will truncate (empty) the file when you open it. Use 'a' append mode if you want data to be added to the end each time
You are writing the str() conversion of a set object to your file:
ColourCount= om_set & set(im_tuple)
File= open('Weedlist', 'w')
File.write(str(ColourCount))
Don't use str to convert the whole object; format your data to a string you find easy to read back again. You probably want to add a newline too if you want each new entry to be added on a new line. Perhaps you want to sort the data too, since a set lists items in an ordered determined by implementation details.
If comma-separated works for you, use str.join(); your set contains tuples of integer numbers, and it sounds as if you are fine with the repr() output per tuple, so we can re-use that:
with open('Weedlist', 'a') as outputfile:
output = ', '.join([str(tup) for tup in sorted(ColourCount)])
outputfile.write(output + '\n')
I used with there to ensure that the file object is automatically closed again after you are done writing; see Understanding Python's with statement for further information on what this means.
Note that if you plan to read this data again, the above is not going to be all that efficient to parse again. You should pick a machine-readable format. If you need to communicate with an existing program, you'll need to find out what formats that program accepts.
If you are programming that other program as well, pick a format that other programming language supports. JSON is widely supported for example (use the json module and convert your set to a list first; json.dump(sorted(ColourCount), fileobj), then `fileobj.write('\n') to produce newline-separated JSON objects could do).
If that other program is coded in Python, consider using the pickle module, which writes Python objects to a file efficiently in a format the same module can load again:
with open('Weedlist', 'ab') as picklefile:
pickle.dump(ColourCount, picklefile)
and reading is as easy as:
sets = []
with open('Weedlist', 'rb') as picklefile:
while True:
try:
sets.append(pickle.load(output))
except EOFError:
break
See Saving and loading multiple objects in pickle file? as to why I use a while True loop there to load multiple entries.
How would you like the data to be written? Replace the final line by
File.write(str(list(ColourCount)))
Maybe you like that more.
If you run that program, it will overwrite the previous content of the file. If you prefer to apprend the data open the file with:
File= open('Weedlist', 'a')

Python basics - request data from API and write to a file

I am trying to use "requests" package and retrieve info from Github, like the Requests doc page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
for chunk in r.iter_content(chunk_size):
fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path to the file if created? where will it be saved if not?
'wb' - what is this variable? (shouldn't second parameter be 'mode'?)
following two lines probably iterate over data retrieved with request and write to the file
Python docs explanation also not helping much.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open the path is relative to your current directory. So if you said open('file.txt','w') it would create a new file named file.txt in whatever folder your python script is in. You can also specify an absolute path, for example /home/user/file.txt in linux. If a file by the name 'file.txt' already exists, the contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write (rather than read) froma file, and you use 'b' for binary files (rather than text files). It is actually a little odd to use 'b' in this case, as the content you are writing is a text file. Specifying 'w' would work just as well here. Read more on the modes in the docs for open.
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to be in memory. However, we can make your code more readable and easy to understand if you are only accessing small webpages containing text:
import requests
r = requests.get('https://api.github.com/events')
with open('events.txt','w') as fd:
fd.write(r.text)
filename is a string of the path you want to save it at. It accepts either local or absolute path, so you can just have filename = 'example.html'
wb stands for WRITE & BYTES, learn more here
The for loop goes over the entire returned content (in chunks incase it is too large for proper memory handling), and then writes them until there are no more. Useful for large files, but for a single webpage you could just do:
# just W becase we are not writing as bytes anymore, just text.
with open(filename, 'w') as fd:
fd.write(r.content)

How to copy a JSON file in another JSON file, with Python

I want to copy the contents of a JSON file in another JSON file, with Python
Any ideas ?
Thank you :)
Given the lack of research effort, I normally wouldn't answer, but given the poor suggestions in comments, I'll bite and give a better option.
Now, this largely depends on what you mean, do you wish to overwrite the contents of one file with another, or insert? The latter can be done like so:
with open("from.json", "r") as from, open("to.json", "r") as to:
to_insert = json.load(from)
destination = json.load(to)
destination.append(to_insert) #The exact nature of this line varies. See below.
with open("to.json", "w") as to:
json.dump(to, destination)
This uses python's json module, which allows us to do this very easily.
We open the two files for reading, then open the destination file again in writing mode to truncate it and write to it.
The marked line depends on the JSON data structure, here I am appending it to the root list element (which could not exist), but you may want to place it at a particular dict key, or somesuch.
In the case of replacing the contents, it becomes easier:
with open("from.json", "r") as from, open("to.json", "w") as to:
to.write(from.read())
Here we literally just read the data out of one file and write it into the other file.
Of course, you may wish to check the data is JSON, in which case, you can use the JSON methods as in the first solution, which will throw exceptions on invalid data.
Another, arguably better, solution to this could also be shutil's copy methods, which would avoid actually reading or writing the file contents manually.
Using the with statement gives us the benefit of automatically closing our files - even if exceptions occur. It's best to always use them where we can.
Note that in versions of Python before 2.7, multiple context managers are not handled by the with statement, instead you will need to nest them:
with open("from.json", "r") as from:
with open("to.json", "r+") as to:
...

Categories