I want to do the following:
1- Check if a pkl file with a given name exists
2- If not, create a new file with that given name
3- Load the data into that file
if not os.path.isfile(filename):
    with open(filename,"wb") as file:
        pickle.dump(result, file)
else:
    pickle.dump(result, open(filename,"wb") )
However, this raises an error even though I have checked that the file exists at the given path (so it shouldn't even enter the if!):
Traceback (most recent call last):
with open(filename_i,"wb") as file:
IsADirectoryError: [Errno 21] Is a directory: '.'
Thanks!
You can do it like this:
import os
import pickle
if not os.path.isfile("test_pkl.pkl"):
    with open("test_pkl.pkl",'wb') as file:
        pickle.dump("some object", file)
This first checks whether the file exists; if it does not, it creates the file (mode "wb") and then dumps an object into it with pickle.dump.
Maybe this is more clear:
Imports
import os
import pickle
Create pickle and save data
data = { 'Test1': 1, 'Test2': 2, 'Test3': 3 }
filename = "test_pkl.pkl"
if not os.path.isfile(filename):
    # The with block closes the file automatically.
    with open(filename,'wb') as file:
        pickle.dump(data, file)
Opening the pickle file
with open(filename,'rb') as infile:
    new_dict = pickle.load(infile)
Test the data
print(new_dict)
print(new_dict == data)
print(type(new_dict))
Output
{'Test1': 1, 'Test2': 2, 'Test3': 3}
True
<class 'dict'>
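The IsADirectoryError in the question suggests that the path passed to open() was a directory ('.') rather than a file name, a case that neither snippet above guards against. Here is a minimal sketch of a helper that makes that case explicit. The name save_if_missing and the sample file name are illustrative only and do not come from the thread:

```python
import pickle
import tempfile
from pathlib import Path


def save_if_missing(path, obj):
    """Pickle obj to path unless a file already exists there.

    Returns True if a new file was written, False otherwise.
    """
    path = Path(path)
    if path.is_dir():
        # Opening a directory for writing is what fails with
        # IsADirectoryError on POSIX systems.
        raise ValueError(f"{path!s} is a directory, not a file name")
    if path.is_file():
        return False
    # Create missing parent folders before writing.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return True


# Usage in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test_pkl.pkl"
    first = save_if_missing(target, {"Test1": 1})
    second = save_if_missing(target, {"Test1": 2})
    with target.open("rb") as fh:
        stored = pickle.load(fh)
```

Printing the path with repr() just before the open() call is a quick way to confirm whether the variable really holds the expected file name.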
How to define folder name when saving JSON file?
I tried to add myfoldername inside open(), but it did not work.
I also tried putting myfoldername/myfilename in the filename definition.
Error:
TypeError: an integer is required (got type str)
Code:
import json
# Testing file save
dictionary_data = {"a": 1, "b": 2}
filename = "myfilename" + time.strftime("%Y%m%d-%H%M%S") + ".json"
a_file = open("myfoldername",filename, "w")
json.dump(dictionary_data, a_file)
a_file.close()
This should do the trick.
Use pathlib to manage paths.
Create the parent directory with mkdir if it does not exist.
Open the file with a with statement.
import json
import time
from pathlib import Path
# Testing file save
dictionary_data = {"a": 1, "b": 2}
filename = Path("myfoldername") / f"myfilename{time.strftime('%Y%m%d-%H%M%S')}.json"
# create the parent dir if it does not exist
filename.parent.mkdir(parents=True, exist_ok=True)
with open(filename, "w") as a_file:
    json.dump(dictionary_data, a_file)
My goal is to serialize a dictionary object to a specific file location and read it back each time the program is run. The following works in Python 2.7 but throws an error in Python 3.4. What confuses me is that it works the first time an object is saved to disk, but not on subsequent executions.
The problem seems to be that an error is thrown in __setitem__ saying that the 'ConfigDict' object has no attribute '_config_file'. Why is the attribute missing on every run except the first?
import os
import pickle
class ConfigDict(dict):
    def __init__(self, config_name): # Name of a pickle file within configs directory
        self._config_directory = 'C:\\Users\\myfilepath\\configs'
        self._config_file = self._config_directory + '\\' + config_name + '.pickle'
        # If file does not exist, write a blank pickle file.
        if not os.path.isfile(self._config_file):
            with open(self._config_file, 'wb') as fh:
                pickle.dump({}, fh)
        # Read the pickle file from disk.
        with open(self._config_file, 'rb') as fh:
            pkl = pickle.load(fh)
        self.update(pkl)
    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        with open(self._config_file, 'wb') as fh:
            pickle.dump(self, fh)
cc = ConfigDict('DBConfig')
print()
print()
cc['config_val_1'] = '1'
cc['config_val_3'] = '3'
print(cc['config_val_3'])
Here's the full traceback:
Traceback (most recent call last):
File "C:/Users/filepath/test.py", line 25, in <module>
cc = ConfigDict('DBConfig')
File "C:/Users/filepath/test.py", line 16, in __init__
pkl = pickle.load(fh)
File "C:/Users/filepath/test.py", line 21, in __setitem__
with open(self._config_file, 'wb') as fh:
AttributeError: 'ConfigDict' object has no attribute '_config_file'
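A likely explanation, offered as a reading of the traceback rather than something confirmed in the thread: in Python 3, pickle.load() rebuilds a dict subclass by calling its __setitem__ for each stored item before the instance attributes are restored, so _config_file does not exist yet when the overridden method runs. The first run works because the file then holds only a plain {}. Here is a minimal sketch that sidesteps this by always pickling a plain dict. The config_dir parameter is hypothetical and stands in for the hard-coded directory:

```python
import os
import pickle
import tempfile


class ConfigDict(dict):
    def __init__(self, config_name, config_dir):
        # config_dir is a hypothetical parameter replacing the fixed path.
        super().__init__()
        self._config_file = os.path.join(config_dir, config_name + ".pickle")
        if os.path.isfile(self._config_file):
            with open(self._config_file, "rb") as fh:
                # The file holds a plain dict, so loading never calls
                # ConfigDict.__setitem__.
                stored = pickle.load(fh)
            # dict.update fills the mapping without going through the
            # overridden __setitem__, so nothing is rewritten while loading.
            dict.update(self, stored)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        with open(self._config_file, "wb") as fh:
            # Store a plain dict copy instead of the ConfigDict instance.
            pickle.dump(dict(self), fh)


with tempfile.TemporaryDirectory() as tmp:
    cc = ConfigDict("DBConfig", tmp)
    cc["config_val_1"] = "1"
    cc["config_val_3"] = "3"
    # A second instance simulates running the script again.
    reloaded = ConfigDict("DBConfig", tmp)
    reloaded_items = dict(reloaded)
```

Only item assignment triggers a save in this sketch. Methods such as update() or pop() would need the same treatment if they should persist changes too.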
I'm trying to create a script that will read a JSON file and use the variables to select particular folders and files and save them somewhere else.
My JSON is as follows:
{
"source_type": "folder",
"tar_type": "gzip",
"tar_max_age": "10",
"source_include": {"/opt/myapp/config", "/opt/myapp/db, /opt/myapp/randomdata"}
"target_type": "tar.gzip",
"target_path": "/home/user/targetA"
}
So far, I have this Python Code:
import time
import os
import tarfile
import json
source_config = '/opt/myapp/config.JSON'
target_dir = '/home/user/targetA'
def main():
    with open('source_config', "r").decode('utf-8') as f:
        data = json.loads('source_config')
    for f in data["source_include", str]:
        full_dir = os.path.join(source, f)
        tar = tarfile.open(os.path.join(backup_dir, f+ '.tar.gzip'), 'w:gz')
        tar.add(full_dir)
        tar.close()
    for oldfile in os.listdir(backup_dir):
        if str(oldfile.time) < 20:
            print("int(oldfile.time)")
My traceback is:
Traceback (most recent call last):
File "/Users/user/Documents/myapp/test/point/test/Test2.py", line 16, in <module>
with open('/opt/myapp/config.json', "r").decode('utf-8') as f:
AttributeError: 'file' object has no attribute 'decode'
How do I fix this?
You are trying to call .decode() directly on the file object. You'd normally do that on the read lines instead. For JSON, however, you don't need to do this. The json library handles this for you.
Use json.load() (no s) to load directly from the file object:
with open(source_config) as f:
    data = json.load(f)
Next, you need to address the source_include key with:
for entry in data["source_include"]:
    base_filename = os.path.basename(entry)
    tar = tarfile.open(os.path.join(backup_dir, base_filename + '.tar.gzip'), 'w:gz')
    tar.add(entry)  # entry is the full source path read from the JSON
    tar.close()
Your JSON also needs to be fixed, so that your source_include is an array, rather than a dictionary:
{
"source_type": "folder",
"tar_type": "gzip",
"tar_max_age": "10",
"source_include": ["/opt/myapp/config", "/opt/myapp/db", "/opt/myapp/randomdata"],
"target_type": "tar.gzip",
"target_path": "/home/user/targetA"
}
Next, you loop over the filenames returned by os.listdir(), which are strings (relative filenames with no path). Strings do not have a .time attribute; if you want to read file timestamps, you'll have to use os.stat() calls instead:
for filename in os.listdir(backup_dir):
    path = os.path.join(backup_dir, filename)
    stats = os.stat(path)
    if stats.st_mtime < time.time() - 20:
        # file was last modified more than 20 seconds ago
        print(path)
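Putting the pieces above together, one possible shape for the whole script is sketched below. The function names backup_sources and files_older_than are made up for illustration, the archives use a .tar.gz suffix instead of .tar.gzip, and the demo runs against temporary folders that stand in for the /opt/myapp/... paths and the target directory:

```python
import json
import os
import tarfile
import tempfile
import time


def backup_sources(config_path, backup_dir):
    """Create one .tar.gz per entry in source_include; return the archive paths."""
    with open(config_path) as f:
        config = json.load(f)
    os.makedirs(backup_dir, exist_ok=True)
    archives = []
    for entry in config["source_include"]:
        base_filename = os.path.basename(entry.rstrip("/"))
        archive_path = os.path.join(backup_dir, base_filename + ".tar.gz")
        with tarfile.open(archive_path, "w:gz") as tar:
            # arcname keeps only the folder name inside the archive.
            tar.add(entry, arcname=base_filename)
        archives.append(archive_path)
    return archives


def files_older_than(directory, max_age_seconds):
    """Return names in directory last modified before the age limit."""
    cutoff = time.time() - max_age_seconds
    return sorted(
        name
        for name in os.listdir(directory)
        if os.stat(os.path.join(directory, name)).st_mtime < cutoff
    )


# Demo with temporary folders instead of the real source and target paths.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "config")
    os.makedirs(source)
    with open(os.path.join(source, "app.ini"), "w") as f:
        f.write("key=value\n")
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as f:
        json.dump({"source_include": [source]}, f)
    backup_dir = os.path.join(tmp, "targetA")
    created = [os.path.basename(p) for p in backup_sources(config_path, backup_dir)]
    old = files_older_than(backup_dir, 3600)
```

The age limit is passed in seconds here. Whether tar_max_age in the JSON is meant as seconds, days, or something else is not stated in the question, so the conversion is left to the caller.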
Given a report (which is just a dictionary) and a filename, I want to be able to write into the supplied file name all of the contents of the report. I want to make sure I don't overwrite anything in the filename. This is what I have:
def write_report(r, filename):
    input_filename=open(filename, "a")
    new_report= input_filename.append(r)
    filename.close()
    return new_report
But I get this error when I test it:
AttributeError: '_io.TextIOWrapper' object has no attribute 'append'
How do I append something into a file?
Use the json module to write a dictionary to a file:
import json
d = dict.fromkeys('abcde')
# Write
with open('abc.json', 'w') as f:
    json.dump(d, f)
# Read
with open('abc.json') as f:
    print(json.load(f))
Output
{'a': None, 'b': None, 'c': None, 'd': None, 'e': None}
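There are two errors there.
The method to write to a file is write(), not append(). Note that write() expects a string, so the dictionary has to be converted first.
You're calling close() on a string (filename); you should call close() on the file object input_filename.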
Also, you may want to rename input_filename to output_file.
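Combining both answers, a corrected write_report might look like the sketch below. It assumes one JSON document per line is an acceptable on-disk format, which the question does not specify. The read_reports helper and the file name reports.jsonl are illustrative additions:

```python
import json
import os
import tempfile


def write_report(report, filename):
    """Append report as one JSON line; existing content is left untouched."""
    # Mode "a" creates the file if needed and always writes at the end.
    with open(filename, "a") as output_file:
        output_file.write(json.dumps(report) + "\n")


def read_reports(filename):
    """Read back every report written by write_report."""
    with open(filename) as input_file:
        return [json.loads(line) for line in input_file if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reports.jsonl")
    write_report({"a": 1}, path)
    write_report({"b": 2}, path)
    reports = read_reports(path)
```

Opening with "a" rather than "w" is what keeps earlier reports in place, and the with statement closes the file object so no explicit close() is needed.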
I'm new to Python and to the bioinformatics field. I'm using Python 2.6. I'm trying to select all fastq.gz files, open each one with gzip.open (reading just a few lines, because the files are huge and reading them fully wastes time), count the occurrences of 'J', and then pick out the files whose 'J' count is NOT equal to 0.
The following is my code:
#!/usr/bin/python
import os,sys,re,gzip
path = "/home/XXX/nearline"
for file in os.listdir(path):
    if re.match('.*\.recal.fastq.gz', file):
        text = gzip.open(file,'r').readlines()[:10]
        word_list = text.split()
        number = word_list.count('J') + 1
        if number !== 0:
            print file
But I got some errors:
Traceback (most recent call last):
File "fastqfilter.py", line 9, in <module>
text = gzip.open(file,'r').readlines()[:10]
File "/share/lib/python2.6/gzip.py", line 33, in open
return GzipFile(filename, mode, compresslevel)
File "/share/lib/python2.6/gzip.py", line 79, in __init__
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: 'ERR001268_1.recal.fastq.gz'
What does this traceback mean (File ...)?
Is there anything wrong with gzip here?
And why can't it find ERR001268_1.recal.fastq.gz? It's the first fastq file in the list, and it DOES exist there.
I hope someone can give me some clues and point out any other errors in the script.
Thanks a lot.
Edit: Thanks, everyone. I followed Dan's suggestion and tried it on ONE fastq file first. My script looks like this:
#!/usr/bin/python
import os,sys
import gzip
import itertools
file = gzip.open('/home/xug/nearline/ERR001274_1.recal.fastq.gz','r')
list(itertools.islice(file.xreadlines(),10))
word_list = list.split()
number = word_list.count('J') + 1
if number != 0:
    print 'ERR001274_1.recal.fastq.gz'
Then the errors are:
Traceback (most recent call last):
File "try2.py", line 8, in <module>
list(itertools.islice(text.xreadlines(),10))
AttributeError: GzipFiles instance has no attribute 'xreadlines'
Edit again: Thanks, Dan, I solved the problem yesterday. It seems GzipFile objects don't support xreadlines, so I tried something similar to what you suggested later, and it works. See below:
#!/usr/bin/python
import os,sys,re
import gzip
from itertools import islice
path = "/home/XXXX/nearline"
for file in os.listdir(path):
    if re.match('.*\.recal.fastq.gz', file):
        fullpath = os.path.join(path, file)
        myfile = gzip.open(fullpath,'r')
        head = list(islice(myfile,1000))
        word_str = ";".join(str(x) for x in head)
        number = word_str.count('J')
        if number != 0:
            print file
On this line:
text = gzip.open(file,'r').read()
file is a bare filename, not a full path, so join it with the directory first:
fullpath = os.path.join(path, file)
text = gzip.open(fullpath,'r').read()
As for F.readlines()[:10]: it reads the whole file into a list of lines and only then takes the first 10. For an ordinary file object in Python 2 you could instead write:
import itertools
list(itertools.islice(F.xreadlines(),10))
This does not read the whole file into memory; it reads only the first 10 lines into a list.
However, gzip.open returns an object that doesn't have .xreadlines(). Since file objects are iterable over their lines, just
list(itertools.islice(F,10))
would work, as this test shows:
>>> import gzip,itertools
>>> list(itertools.islice(gzip.open("/home/dan/Desktop/rp718.ps.gz"),10))
['%!PS-Adobe-2.0\n', '%%Creator: dvips 5.528 Copyright 1986, 1994 Radical Eye Software\n', '%%Title: WLP-94-int.dvi\n', '%%CreationDate: Mon Jan 16 16:24:41 1995\n', '%%Pages: 6\n', '%%PageOrder: Ascend\n', '%%BoundingBox: 0 0 596 842\n', '%%EndComments\n', '%DVIPSCommandLine: dvips -f WLP-94-int.dvi\n', '%DVIPSParameters: dpi=300, comments removed\n']
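For readers on Python 3, the same idea can be sketched as follows. The function name count_in_head and the sample file are made up for illustration, and text mode ("rt") is assumed so that the lines come back as str rather than bytes:

```python
import gzip
import os
import tempfile
from itertools import islice


def count_in_head(path, needle="J", max_lines=10):
    """Count occurrences of needle in the first max_lines lines of a gzip file."""
    # Iterating over the handle reads line by line, so the whole file is
    # never loaded into memory.
    with gzip.open(path, "rt") as handle:
        head = list(islice(handle, max_lines))
    return sum(line.count(needle) for line in head)


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.recal.fastq.gz")
    with gzip.open(sample, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIJJ\n")
        # Lines beyond the first ten are never inspected.
        handle.write("filler\n" * 10 + "JJJJ\n")
    hits = count_in_head(sample)
```

Counting characters in the joined lines, as the asker's final version does, avoids the problem that 'J' almost never appears as a standalone whitespace-separated word in a quality string.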
Change your code to:
#!/usr/bin/python
import os,sys,re,gzip
path = "/home/XXX/nearline"
for file in os.listdir(path):
    if re.match('.*\.recal.fastq.gz', file):
        # os.listdir() returns bare names, so join them with the directory
        lines = gzip.open(os.path.join(path,file),'r').readlines()[:10]
        text = ''.join(lines)
        number = text.count('J')
        if number != 0:
            print file
It's trying to open ERR001268_1.recal.fastq.gz from the working directory, not from /home/XXX/nearline.