Why does pickle only read the first line of the file? - python

I made a program that stores some data in a file with pickle.
Here is what the file looks like:
ÄX
(7836)q.ÄX
(13289)q.ÄX
(0928)q.ÄX
(26)q.ÄX
(7893)q.ÄX
(3883)q.ÄX
(1982)q.ÄX
What the data means is not important, but when I try to read it with:
data = pickle.load(open("num.txt", "rb"))
print(data)
this is the output:
(7836)
while the expected result is:
(7836)
(13289)
(0928)
(26)
(7893)
(3883)
(1982)
How do I fix this?

The following is jsbueno's answer, which worked for me.
Pickle serializes a single object at a time and reads back a single object; the pickled data is recorded in sequence in the file.
If you simply call pickle.load once, you read the first object serialized into the file (not the last one, as you wrote).
After unserializing the first object, the file pointer is at the beginning of the next object; if you call pickle.load again, it will read that next object. Repeat until the end of the file:
import pickle

objects = []
with open("myfile", "rb") as openfile:
    while True:
        try:
            objects.append(pickle.load(openfile))
        except EOFError:
            break
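For completeness, here is a minimal round-trip sketch of the same pattern (the filename nums.pkl and the values are made up for illustration):

```python
import pickle

# Dump several objects back-to-back into one file.
with open("nums.pkl", "wb") as f:
    for n in (7836, 13289, 26, 1982):
        pickle.dump(n, f)

# Read them back one pickle.load() call at a time until EOF.
objects = []
with open("nums.pkl", "rb") as f:
    while True:
        try:
            objects.append(pickle.load(f))
        except EOFError:
            break

print(objects)  # [7836, 13289, 26, 1982]
```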

Related

How to remove empty space from front of JSON object?

I am trying to process a large JSON file using the following code:
import pandas as pd
import zstandard as zst

dctx = zst.ZstdDecompressor(max_window_size=2147483648)
with open(filename + ".zst", 'rb') as infile, open(outpath, 'wb') as outfile:
    dctx.copy_stream(infile, outfile)

# Making a list of column headers
df_titles = []
with pd.read_json(filename + ".json", lines=True, chunksize=5000) as reader:
    for chunk in reader:
        chunk_titles = list(chunk.keys())
        df_titles.extend(chunk_titles)
df_titles = list(set(df_titles))
However, when I attempt to run the code, I get an error message: ValueError: Expected object or value. The file is formatted with one JSON object per line, and looking at the JSON file itself, it seems the issue is that one of the JSON objects has a bunch of empty space in front of it.
If I manually delete the 'nul' line, the file processes with no issues. However, for the sake of reproducibility, I would like to be able to address the issue from within my code itself. I'm pretty new to working in Python, and I have tried googling the issue, but solutions seem to focus on removing white space from the beginning of JSON values, rather than the start of a line in this kind of file. Is there any easy way to deal with this issue either when decompressing the initial file, or reading the decompressed file in?
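There is no answer in this thread, but one possible sketch (my own, not from the original post) is to drop whitespace-only lines from the decompressed text before handing it to the parser; the sample data below is made up, and with pandas you could write the cleaned text back to a file or wrap it in io.StringIO for pd.read_json:

```python
import json

# Made-up sample: one JSON object per line, with a whitespace-only
# line that would trigger "ValueError: Expected object or value".
raw = '{"a": 1}\n   \n{"a": 2}\n'

# Keep only lines that contain non-whitespace content.
cleaned = "\n".join(line for line in raw.splitlines() if line.strip())

records = [json.loads(line) for line in cleaned.splitlines()]
print(records)  # [{'a': 1}, {'a': 2}]
```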

how to write csv to "variable" instead of file?

I'm not sure how to word my question exactly, and I have seen some similar questions asked but not exactly what I'm trying to do. If there already is a solution please direct me to it.
Here is what I'm trying to do:
At my work, we have a few pkgs we've built to handle various data types. One I am working with is reading in a csv file into a std_io object (std_io is our all-purpose object class that reads in any type of data file).
I am trying to connect this to another pkg I am writing, so I can make an object in the new pkg and convert it to a std_io object.
The problem is, the std_io object is meant to read an actual file, not take in an object. To get around this, I can write my data to a temp.csv file and then read it into a std_io object.
I am wondering if there is a way to eliminate this step of writing the temp.csv file.
Here is my code:
x #my object
df = x.to_df() #object class method to convert to a pandas dataframe
df.to_csv('temp.csv') #write data to a csv file
std_io_obj = std_read('temp.csv') #read csv file into a std_io object
Is there a way to basically pass what the output of writing the csv file would be directly into std_read? Does this make sense?
The only reason I want to do this is to avoid having to code additional functionality into either of the pkgs to directly accept an object as input.
Hope this was clear, and thanks to anyone who contributes.
For those interested, or who may have this same kind of issue/objective, here's what I did to solve this problem.
I basically just created a temporary named file, linked a .csv filename to this temp file, then passed it into my std_read function which requires a csv filename as an input.
This basically tricks the function into thinking it's taking the name of a real file as an input, and it just opens it as usual and uses csvreader to parse it up.
This is the code:
import tempfile
import os

x #my object I want to convert to a std_io object
text = x.to_df().to_csv() #object class method to convert to a pandas dataframe, then generate the text of a csv file
filename = 'temp.csv'
with tempfile.NamedTemporaryFile(dir=os.path.dirname('.')) as f:
    f.write(text.encode())
    f.flush()  # make sure the data is on disk before std_read opens it
    os.link(f.name, filename)
    stdio_obj = std_read(filename)
    os.unlink(filename)
del f
FYI - the std_read function essentially just opens the file the usual way, and passes it into csvreader:
with open(filename, 'r') as f:
    rdr = csv.reader(f)
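As a portability note, os.link is not available on every platform, so an alternative sketch is to let NamedTemporaryFile keep the file on disk with delete=False and remove it afterwards. std_read is the poster's own function, so a stand-in is defined here for illustration:

```python
import csv
import os
import tempfile

def std_read(filename):
    # Stand-in for the poster's std_read: parse the csv into rows.
    with open(filename, "r", newline="") as f:
        return list(csv.reader(f))

text = "a,b\n1,2\n"  # stand-in for x.to_df().to_csv()

# Keep the temp file on disk so it can be reopened by name.
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv",
                                 delete=False, newline="") as f:
    f.write(text)
    tmp_name = f.name

try:
    rows = std_read(tmp_name)
finally:
    os.unlink(tmp_name)

print(rows)  # [['a', 'b'], ['1', '2']]
```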

I want to load a dictionary from a pickled dat file but the issue is that the first time I try to load it the pickled dictionary does not exist

I am a GCSE student and have been set a problem that requires me to save student name and test scores to a file for later retrieval and manipulation.
I have decided to store the data in a dictionary which I will pickle and retrieve as required. I know how to pickle a dictionary and retrieve a previously pickled dictionary.
The problem I have only occurs the first time the program is run as the dat file has not yet been created.
The code below opens the existing Dat file and writes the latest student name and score to the dictionary held in the file.
f = open("class1.dat", "ab+")
class1 = pickle.load(f)
class1[Name] = Score
pickle.dump(class1, f)
f.close()
The problem is that this works once the first score has been saved to the Dat file, but I get this error message the first time the program is run:
Traceback (most recent call last):
File "C:\Python34\Latest_Version.py", line 61, in <module>
class1 = pickle.load(f)
EOFError: Ran out of input
I realise that this is because the dat file does not yet exist.
What code would check to see if the Dat file existed first?
If you can help please keep it very simple as my knowledge is limited.
The reason this is happening is that there is no data in your pickle file to begin with. You need to check whether there is data to load, so wrap your code in a try/except to see if the load succeeds. If it fails, write initial (empty) data to the pickle file.
Furthermore, pay attention to explicitly setting the read and write modes on the file.
import pickle

class1 = {}
try:
    class1 = pickle.load(open("my_stuff.pkl", "rb"))
    print(class1)
    class1['bob'] = 123
    pickle.dump(class1, open("my_stuff.pkl", "wb"))
except (FileNotFoundError, EOFError):
    pickle.dump(class1, open("my_stuff.pkl", "wb"))
So, what happens here is that you first declare an empty dictionary, then try to load the contents. If that fails, the code falls into the except and dumps the empty data. The next time you run it, the load succeeds, you can write your data, and then dump the pickle again.
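To answer the original "check if the file exists" question directly, an equivalent sketch uses os.path.exists instead of the try/except (the filename here is made up):

```python
import os
import pickle

FILENAME = "class1_demo.dat"  # made-up name for this sketch

# Start from the saved dictionary if the file exists, else empty.
if os.path.exists(FILENAME):
    with open(FILENAME, "rb") as f:
        class1 = pickle.load(f)
else:
    class1 = {}

class1["bob"] = 123

# Overwrite the file with the updated dictionary.
with open(FILENAME, "wb") as f:
    pickle.dump(class1, f)
```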

Python 2: AttributeError: 'file' object has no attribute 'strip'

I have a .txt document called new_data.txt. All data in this document is separated by dots. I want to open the file in Python, split it, and put the pieces into a list.
output = open('new_data.txt', 'a')
output_list = output.strip().split('.')
But I have an error:
AttributeError: 'file' object has no attribute 'strip'
How can I fix this?
Note: My program is on Python 2
First, you want to open the file in read mode (you have it in append mode)
Then you want to read() the file:
output = open('new_data.txt', 'r') # See the r
output_list = output.read().strip().split('.')
This will get the whole content of the file.
Currently you are working with the file object (hence the error).
Update: This question has received many more views since it was first posted. When opening files, the with ... as ... structure should be used, like so:
with open('new_data.txt', 'r') as output:
output_list = output.read().strip().split('.')
The advantage of this is that there is no need to explicitly close the file, and if an error occurs inside the block, Python will automatically close the file for you (instead of leaving it open after the error).

pickle - putting more than 1 object in a file? [duplicate]

This question already has answers here:
Saving and loading multiple objects in pickle file?
(8 answers)
Closed 6 years ago.
I have got a method which dumps a number of pickled objects (tuples, actually) into a file.
I do not want to put them into one list, I really want to dump several times into the same file.
My problem is, how do I load the objects again?
The first and second objects are just one line long, so reading them with readlines works, but all the others are longer.
Naturally, if I try
myob = cpickle.load(g1.readlines()[2])
where g1 is the file, I get an EOF error because my pickled object is longer than one line.
Is there a way to get just my pickled object?
If you pass the filehandle directly into pickle you can get the result you want.
import pickle

# write a file
f = open("example", "wb")
pickle.dump(["hello", "world"], f)
pickle.dump([2, 3], f)
f.close()

# read it back
f = open("example", "rb")
value1 = pickle.load(f)
value2 = pickle.load(f)
f.close()
pickle.dump will append to the end of the file, so you can call it multiple times to write multiple values.
pickle.load will read only enough from the file to get the first value, leaving the filehandle open and pointed at the start of the next object in the file. The second call will then read the second object, and leave the file pointer at the end of the file. A third call will fail with an EOFError as you'd expect.
Although I used plain old pickle in my example, this technique works just the same with cPickle.
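The load-until-EOFError behaviour described above can also be wrapped in a small generator (the helper name iter_pickles is mine, not from the answer):

```python
import pickle

def iter_pickles(filename):
    """Yield each pickled object stored back-to-back in a file."""
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

# Demo: dump two values, then iterate over them.
with open("example.pkl", "wb") as f:
    pickle.dump(["hello", "world"], f)
    pickle.dump([2, 3], f)

values = list(iter_pickles("example.pkl"))
print(values)  # [['hello', 'world'], [2, 3]]
```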
I think the best way is to pack your data into a single object before you store it, and unpack it after loading it. Here's an example using a tuple as the container (you could also use a dict):
import pickle

a = [1, 2]
b = [3, 4]
with open("tmp.pickle", "wb") as f:
    pickle.dump((a, b), f)
with open("tmp.pickle", "rb") as f:
    a, b = pickle.load(f)
Don't try reading them back as lines of the file; just pickle.load() the number of objects you want. See my answer to the question How to save an object in Python for an example of doing that.
