When working in R, one has the ability to save the entire "workspace", variables, output, etc... to an image file using "save.image()". Is there something equivalent in Python?
Thank you.
I am not familiar with R, but pickle lets you save and load variables, objects, types, and so on in a pickle file, so you can keep whatever you need for a later session. I'm unsure whether pickle offers a specific way to save all data associated with the current session or whether you would have to locate and save each item manually. Hope this helps!
import pickle
class Object:  # placeholder class so the example runs
    pass
my_obj = Object()
my_var = (1, "some_data")
filename = "my_dir/my_file.pickle"  # the directory must already exist
with open(filename, 'wb') as f:  # save data
    pickle.dump((my_obj, my_var), f)
with open(filename, 'rb') as f:  # load data next time
    my_saved_obj, my_saved_var = pickle.load(f)
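On the open question of saving everything at once: a rough sketch, not a built-in pickle feature, is to loop over a namespace such as globals() and pickle whatever can be pickled. The helper names save_workspace and load_workspace are made up for this illustration, and modules, names starting with an underscore, and objects pickle cannot handle are simply skipped.

```python
import os
import pickle
import tempfile
import types

def save_workspace(namespace, filename):
    """Pickle every variable in `namespace` that pickle can handle (hypothetical helper)."""
    saved = {}
    for name, value in namespace.items():
        # Skip private names and imported modules
        if name.startswith("_") or isinstance(value, types.ModuleType):
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue  # skip objects pickle cannot serialize
        saved[name] = value
    with open(filename, "wb") as f:
        pickle.dump(saved, f)
    return sorted(saved)

def load_workspace(namespace, filename):
    """Put the saved variables back into `namespace` (hypothetical helper)."""
    with open(filename, "rb") as f:
        namespace.update(pickle.load(f))

# Demonstration on a plain dict standing in for globals()
demo = {"x": 1, "pair": (1, "some_data"), "_hidden": 0, "mod": types, "fn": lambda: None}
path = os.path.join(tempfile.mkdtemp(), "session.pkl")
saved_names = save_workspace(demo, path)

restored = {}
load_workspace(restored, path)
```

In an interactive session the same helpers could be called as save_workspace(globals(), "session.pkl") and later load_workspace(globals(), "session.pkl"). Functions and classes are pickled by reference, so they still have to be defined again before loading.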
I have a file in the vcf.gz format (e.g. file_name.vcf.gz) - and I need to read it somehow in Python.
I understand that I first have to decompress it and then read it. I found this solution, but unfortunately it doesn't work for me. Even the first line (bgzip file_name.vcf or tabix file_name.vcf.gz) gives SyntaxError: invalid syntax.
Could you help me please?
Both cyvcf and pyvcf can read vcf files, but cyvcf is much faster and is more actively maintained.
The best approach is to use programs that do this for you, as mentioned by basesorbytes. However, if you want your own code, you could use this approach:
# Import libraries
import gzip
import io
import pandas as pd
class ReadFile():
    '''
    This class reads a VCF file
    and does some data manipulation.
    The output is the full data found
    in the input of this class;
    the filtering process happens
    in the following step.
    '''
    def __init__(self, file_path):
        '''
        This is the built-in constructor method
        '''
        self.file_path = file_path
    def load_data(self):
        '''
        1) Convert VCF file into data frame
        Read header of the body dynamically and assign dtype
        '''
        # Open the VCF file in text mode and read line by line,
        # skipping the '##' metadata lines
        with gzip.open(self.file_path, 'rt') as f:
            lines = [l for l in f if not l.startswith('##')]
        # Identify the column-name line and save it into a dict
        # with values as dtype
        dynamic_header_as_key = []
        for line in lines:
            if line.startswith("#CHROM"):
                dynamic_header_as_key = line.rstrip('\n').split('\t')
        # Declare dtypes (QUAL is declared as int here; adjust this
        # if your file stores '.' or decimals in that column)
        values = [str, int, str, str, str, int, str, str, str, str]
        columns2dtype = dict(zip(dynamic_header_as_key, values))
        vcf_df = pd.read_csv(
            io.StringIO(''.join(lines)),
            dtype=columns2dtype,
            sep='\t'
        ).rename(columns={'#CHROM': 'CHROM'})
        return vcf_df
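On the original error: bgzip and tabix are command-line programs, so typing those lines into Python would explain the SyntaxError. Decompressing to disk first is also not required, because gzip.open can read the compressed file directly in text mode. The sketch below uses only the standard library; the function name read_vcf_records and the tiny sample file are invented for illustration, and every value is kept as a string.

```python
import csv
import gzip
import os
import tempfile

def read_vcf_records(path):
    """Return (columns, rows) from a gzip-compressed VCF without unpacking it on disk."""
    with gzip.open(path, "rt", newline="") as handle:
        # Drop the '##' metadata lines; the '#CHROM' line becomes the header
        body = (line for line in handle if not line.startswith("##"))
        reader = csv.reader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
        columns = [name.lstrip("#") for name in next(reader)]
        rows = [dict(zip(columns, fields)) for fields in reader]
    return columns, rows

# Tiny made-up file, for illustration only
sample = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\trs1\tA\tG\t50\tPASS\t.\n"
    "chr2\t200\t.\tC\tT\t99\tPASS\t.\n"
)
sample_path = os.path.join(tempfile.mkdtemp(), "sample.vcf.gz")
with gzip.open(sample_path, "wt") as out:
    out.write(sample)

columns, rows = read_vcf_records(sample_path)
```

Each row is a plain dict keyed by column name, so rows[0]["POS"] gives the position of the first record as text.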
I need to analyse a lot of CAN data and want to use python for that. I recently came across the python-can library and saw that it's possible to convert .blf to .asc files.
The post "How do I convert .blf data of CAN to .asc using python" helped a lot.
https://stackoverflow.com/users/13525512/tranbi Can @Tranbi or anyone else help me with some example code?
This is the part I have done till now:
import can
import os
# Raw strings keep the backslashes in the Windows paths literal
fileList = os.listdir(r".\inputFiles")
for i in range(len(fileList)):
    with open(os.path.join(r".\inputFiles", fileList[i]), 'rb') as f_in:
        log_in = can.io.BLFReader(f_in)
        with open(os.path.join(r".\outputFiles", os.path.splitext(fileList[i])[0] + '.asc'), 'w') as f_out:
            log_out = can.io.ASCWriter(f_out)
            for msg in log_in:
                log_out.on_message_received(msg)
            log_out.stop()
I need to either read data from .blf files directly and sequentially, or convert them to .asc, correct the timestamp using the file name, combine the files, convert them to .csv and then analyse them in Python. It would really help if there is a shorter route.
I have simple elevation data from a GeoTiff file that I've read in via Rasterio. Now I want to use Shapely's STRtree to get intersections with other points and lines, and the most efficient way would be to store the elevation-attributed lists of geometries as pickles and load them directly, rather than loading and converting a csv or pickle of a pandas GeoDataFrame (or various similar options).
import pickle
def openPickleFile(filePathName):
    with open(filePathName, 'rb') as fp:
        return pickle.load(fp)
def writePickleFile(theData, filePathName):
    with open(filePathName, 'wb') as fp:
        pickle.dump(theData, fp)
thisData = openPickleFile('thisDataFrame.pkl')
gridGeoms = list(thisData['geometry'])
gridValues = list(thisData['elevation'])
for index, geom in enumerate(gridGeoms):
    gridGeoms[index].idx = gridValues[index]
writePickleFile(gridGeoms, 'thisGeometryList.pkl')
If I print(gridGeoms[i].idx) here I get the elevation of geometry i as desired. But if I load the 'thisGeometryList.pkl' file and do the same thing I get a 'Polygon' object has no attribute 'idx' error. I thought the pickle would store the binary data of gridGeoms that includes the added .idx attribute.
Is there some option for pickling that will save the .idx attribute?
Or is there an alternative format that will save this info and be just as efficient?
(Note: I tried joblib and it also doesn't retain the .idx data)
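One possible explanation, offered as an assumption rather than something confirmed here for Shapely: a class that controls its own pickling through __reduce__ only serializes what that method returns, so attributes added afterwards are dropped. The Shape class below is a hypothetical stand-in, not Shapely's Polygon. A workaround that does not depend on the geometry class is to pickle the geometries together with their values, for example as a list of (geometry, elevation) pairs.

```python
import pickle

class Shape:
    """Hypothetical stand-in for a geometry class that customizes pickling."""
    def __init__(self, coords):
        self.coords = coords

    def __reduce__(self):
        # Only the constructor arguments are serialized; extra attributes are dropped
        return (Shape, (self.coords,))

shapes = [Shape([(0, 0), (1, 0), (1, 1)]), Shape([(2, 2), (3, 2), (3, 3)])]
elevations = [12.5, 40.0]

# Attaching an attribute and pickling the objects loses it on reload
shapes[0].idx = elevations[0]
reloaded = pickle.loads(pickle.dumps(shapes))
attribute_survived = hasattr(reloaded[0], "idx")

# Storing the value next to the geometry keeps both
pairs = list(zip(shapes, elevations))
restored_pairs = pickle.loads(pickle.dumps(pairs))
restored_elevations = [value for _, value in restored_pairs]
```

After loading, the two halves can be split again with zip(*restored_pairs), which gives a sequence of geometries for the tree and a parallel sequence of elevations to look up by position.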
I want to create a .csv file to speed up loading the encoding file of my face recognition program, which uses face_recognition in Python.
When my algorithm detects a new face, it generates an encoding using face_recognition and then:
with open('data.csv', 'a') as file:
    writer = csv.writer(file)
    writer.writerow([ID,new_face_reco])
I do that to write the encoding to the .csv file. (ID is a random name I give to the face and new_face_reco is the encoding of the new face.)
But I want to reopen it when I relaunch the program, so I have this at the beginning:
known_face_encodings_temp = []
known_face_names_temp = []
with open('data.csv', 'rb') as file:
    data = [row for row in csv.reader(file,delimiter=',')]
known_face_names_temp.append(np.array(data[0][0]))
essai = np.array(data[0][1].replace('\n',''))
known_face_encodings_temp.append(essai.tolist())
known_face_encodings=known_face_encodings_temp
known_face_name=known_face_names_temp
I have a lot of issues (which is why there are so many lines in this part) because my encoding changes between writing the .csv and reloading it. Here is what I get:
Initial data:
array([-8.31770748e-02, ... , -3.41368467e-03])
When I try to reload my csv (without me trying to change anything):
'[-1.40143648e-01 ... -8.10057670e-02\n 3.77673171e-02 1.40102580e-02 8.14460665e-02
7.52283633e-02]'
What I get when I try to change things:
'[-1.40143648e-01 ... 7.52283633e-02]'
I need the loaded data to be the same as the initial data. What can I do?
Instead of using CSV files, try using numpy (.npy) files; they're much easier to save and load. I have used them myself in one of my projects that utilizes the face_recognition module and would be happy to help you out.
To save an encoding, you can:
np.save(path_to_save, encoding)
To load an encoding, you can:
encodingVariable = np.load(path_to_load)
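As a minimal sketch of that round trip: the array below is only a stand-in for a real encoding (assumed here to be a 1-D float array of 128 values), and the file name face_0001.npy is made up. Unlike the CSV route, which stores the printed text of the array, .npy keeps the dtype and the exact values.

```python
import os
import tempfile
import numpy as np

# Stand-in for a face_recognition encoding (assumed: 128 float values)
encoding = np.linspace(-0.2, 0.2, 128)

# Save to a temporary folder; np.save writes the binary .npy format
path = os.path.join(tempfile.mkdtemp(), "face_0001.npy")
np.save(path, encoding)

# Load it back as a NumPy array with the same dtype and shape
loaded = np.load(path)
round_trip_ok = np.array_equal(encoding, loaded)
```

One file per face ID is the simplest layout; the ID can then be recovered from the file name when the program starts.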
After looking around for about a week, I have been unable to find an answer that I can get to work. I am making an assignment manager for a project for my first year CS class. Everything else works how I'd like it to (no GUI, just text) except that I cannot save data to use each time you reopen it. Basically, I would like to save my classes dictionary:
classes = {period_1:assignment_1, period_2:assignment_2, period_3:assignment_3, period_4:assignment_4, period_5:assignment_5, period_6:assignment_6, period_7:assignment_7}
after the program closes so that I can retain the data stored in the dictionary. However, I cannot get anything I have found to work. Again, this is a beginner CS class, so I don't need anything fancy, just something basic that will work. I am using a school-licensed form of Canopy for the purposes of the class.
L3viathan's post might be a direct answer to this question, but I would suggest the following for your purpose: using pickle.
import pickle
# To save a dictionary to a pickle file:
with open("assignments.p", "wb") as f:
    pickle.dump(classes, f)
# To load from a pickle file:
with open("assignments.p", "rb") as f:
    classes = pickle.load(f)
By this method, the variable would retain its original structure without having to write and convert to different formats manually.
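For a program that is reopened many times, the load step has to cope with the very first run, when no file exists yet. A small sketch of that pattern follows; the helper names load_classes and save_classes are invented here, and a temporary folder is used only so the example can run anywhere.

```python
import os
import pickle
import tempfile

def load_classes(path):
    """Return the saved dictionary, or an empty one on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)

def save_classes(classes, path):
    """Write the dictionary to disk so the next run can pick it up."""
    with open(path, "wb") as f:
        pickle.dump(classes, f)

path = os.path.join(tempfile.mkdtemp(), "assignments.p")

# First run: nothing saved yet, so start empty, add data, and save on exit
first_run = load_classes(path)
first_run["period_1"] = "assignment_1"
save_classes(first_run, path)

# Second run: the dictionary comes back as it was saved
second_run = load_classes(path)
```

In the assignment manager, load_classes would be called once at startup and save_classes just before the program exits.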
Either use the csv library, or do something simple like:
with open("assignments.csv", "w") as f:
    for key, value in classes.items():
        f.write(key + "," + value + "\n")
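The csv-library variant mentioned first could look like the sketch below, which also reads the file back into a dictionary. The sample values are invented, and one of them deliberately contains a comma, which the csv module quotes automatically while the manual string concatenation would not.

```python
import csv
import os
import tempfile

classes = {"period_1": "assignment_1", "period_2": "essay, draft 2"}
path = os.path.join(tempfile.mkdtemp(), "assignments.csv")

# Write one "key,value" row per dictionary entry
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for key, value in classes.items():
        writer.writerow([key, value])

# Read the rows back and rebuild the dictionary
with open(path, newline="") as f:
    restored = {key: value for key, value in csv.reader(f)}
```

This only works as written when every key and value is a string; other types come back as text and need converting.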
Edit: Since it seems that you can't read or write files in your system, here's an alternative solution (with pickle and base85):
import pickle, base64
def save(something):
    pklobj = pickle.dumps(something)
    print(base64.b85encode(pklobj).decode('utf-8'))
def load():
    pklobj = base64.b85decode(input("> ").encode('utf-8'))
    return pickle.loads(pklobj)
To save something, you call save on your object, and copy the string that is printed to your clipboard, then you can save it in a file, for instance.
>>> save(classes) # in my case: {34: ['foo#', 3]}
fCGJT081iWaRDe;1ONa4W^ZpJaRN&NWpge
To load, you call load() and enter the string:
>>> load()
> fCGJT081iWaRDe;1ONa4W^ZpJaRN&NWpge
{34: ['foo#', 3]}
The pickle approach described by @Ébe Isaac and @L3viathan is the way to go. If you also want to do something else with the data, you might want to consider pandas (which you should only use if you do more than just export the data).
If you have more complicated data structures, you should use the pickle approach; but as there are only basic strings in your dictionary, according to your comment below your question, pandas is straightforward to use:
import pandas as pd
classes = {'period_1':'assignment_1', 'period_2':'assignment_2', 'period_3':'assignment_3', 'period_4':'assignment_4', 'period_5':'assignment_5', 'period_6':'assignment_6', 'period_7':'assignment_7'}
pd.DataFrame.from_dict(classes, orient='index').sort_index().rename(columns={0: 'assignments'}).to_csv('my_csv.csv')
That gives you the following output:
assignments
period_1 assignment_1
period_2 assignment_2
period_3 assignment_3
period_4 assignment_4
period_5 assignment_5
period_6 assignment_6
period_7 assignment_7
In detail:
.from_dict(classes, orient='index') creates the actual dataframe using the dictionary as an input
.sort_index() sorts the index, which is not sorted because a dictionary is used to create the dataframe
.rename(columns={0: 'assignments'}) assigns a more reasonable name to your column (by default 0 is used)
.to_csv('my_csv.csv') finally exports the dataframe to a csv
If you want to read in the file again you can do it as follows:
df2 = pd.read_csv('my_csv.csv', index_col=0)
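To get from the dataframe back to a plain dictionary, one option is to call to_dict() on the single column. The sketch below repeats the export with a shortened dictionary and a temporary file path (both chosen only for this illustration) so that it runs on its own.

```python
import os
import tempfile
import pandas as pd

classes = {'period_1': 'assignment_1', 'period_2': 'assignment_2'}
csv_path = os.path.join(tempfile.mkdtemp(), 'my_csv.csv')

# Export as in the answer: keys become the index, values the 'assignments' column
(pd.DataFrame.from_dict(classes, orient='index')
   .sort_index()
   .rename(columns={0: 'assignments'})
   .to_csv(csv_path))

# Read the file again and turn the column back into a dictionary
df2 = pd.read_csv(csv_path, index_col=0)
restored = df2['assignments'].to_dict()
```

The index supplies the keys and the column supplies the values, so restored has the same shape as the original classes dictionary.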