I have a function that filters documents by certain extensions. The filtering works, but there is a problem with writing the JSON and passing it to a .txt file. json.dump without f.write also doesn't work.
Maybe you can help solve this problem, thank you!
def get_file_json(self):
    result = []
    documents = Document.objects.all()
    for document in documents:
        extension = document.source_file.name.split('.')[-1]
        print(extension)
        if extension == 'txt' or extension == 'pdf':
            result.append(document.source_file.name)
    if result:
        with open('user_documents.txt', 'w') as f:
            f.write(json.dump(result, f))
    self.stdout.write(self.style.SUCCESS(f'ОК!'))
Your current code first dumps the JSON into the file, then attempts to write None into it as well, since None is the return value of json.dump.
Either use
json.dump(result, f) # Dump result to `f`
or less preferably
f.write(json.dumps(result)) # Generate string of result, write to `f`
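For illustration, a minimal sketch (filenames assumed) showing that both correct forms produce identical file contents:

```python
import json

result = ["a.txt", "b.pdf"]

# Preferred: stream the JSON straight into the file object.
with open('user_documents.txt', 'w') as f:
    json.dump(result, f)

# Equivalent: build the JSON string first, then write it yourself.
with open('user_documents2.txt', 'w') as f:
    f.write(json.dumps(result))
```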
problem solved, thank you all !
def get_file_json(self, documents, filename):
    result = []
    for document in documents:
        extension_docx = document.source_file.name.split('.')[-1]
        if extension_docx == 'docx':
            result.append({"docx": document.source_file.name, "pdf": self.get_pdf_path(document)})
    if result:
        with open(filename, 'w+') as f:
            f.write(json.dumps(result))
Hi guys, I am working with a huge gz-compressed FASTA file. I have a nice FASTA parser, but I would like to make it more general, so that it can check for compression and parse either a gzipped or an uncompressed file.
I try this code:
import gzip
import itertools

def is_header(line):
    return line[0] == '>'

def parse_multi_fasta_file_compressed_or_not(filename):
    if filename.endswith('.gz'):
        with gzip.open(filename, 'rt') as f:
            fasta_iter = (it[1] for it in itertools.groupby(f, is_header))
    else:
        with open(filename, 'r') as f:
            fasta_iter = (it[1] for it in itertools.groupby(f, is_header))
    for name in fasta_iter:
        name = name.__next__()[1:].strip()
        sequences = ''.join(seq.strip() for seq in fasta_iter.__next__())
        yield name, sequences
ref:
https://drj11.wordpress.com/2010/02/22/python-getting-fasta-with-itertools-groupby/
https://www.biostars.org/p/710/
I tried to modify the indentation. Python doesn't complain about any error; however, it doesn't print or show any results. I am using a toy file with 5 sequences.
Just to remind you, a FASTA file looks something like this:
>header_1
AATATATTCAATATGGAGAGAATAAAAGAACTAAGAGATCTAATGTCACAGTCTCGCACTCGCGAGATAC
TCACCAAAACCACTGTGGACCACATGGCCATAATCAAAAAGTACACATCAGGAAGGCAAGAGAAGAACCC
TGCACTCAGGATGAAGTGGATGATG
>header_2
AACCATTTGAATGGATGTCAATCCGACTTTACTTTTCTTGAAAGTTCCAGCGCAAAATGCCATAAGCACC
ACATTTCCCTATACTGGAGACCCTCC
I would like to use some try:... except:... instead of if.
If any of you have any tips to help me figure that out, I would appreciate it a lot (it's not a course exercise at all!).
Thank you for your time.
Paulo
It looks like you have overly indented your `for` loop. Try the following:
def is_header(line):
    return line[0] == '>'

def parse_multi_fasta_file_compressed_or_not(filename):
    if filename.endswith('.gz'):
        opener = lambda filename: gzip.open(filename, 'rt')
    else:
        opener = lambda filename: open(filename, 'r')
    with opener(filename) as f:
        fasta_iter = (it[1] for it in itertools.groupby(f, is_header))
        for name in fasta_iter:
            name = name.__next__()[1:].strip()
            sequences = ''.join(seq.strip() for seq in fasta_iter.__next__())
            yield name, sequences
I've also rearranged things a little so you can use the with block as you did before. The conditional at the beginning assigns to opener a function which can open the given file depending on whether it is gzipped or not.
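For instance, a self-contained check of the rearranged parser on a two-record toy file (the toy filename and sequences are made up for illustration):

```python
import gzip
import itertools

def is_header(line):
    return line[0] == '>'

def parse_multi_fasta_file_compressed_or_not(filename):
    if filename.endswith('.gz'):
        opener = lambda fn: gzip.open(fn, 'rt')
    else:
        opener = lambda fn: open(fn, 'r')
    with opener(filename) as f:
        # groupby alternates between header groups and sequence groups
        fasta_iter = (it[1] for it in itertools.groupby(f, is_header))
        for name in fasta_iter:
            name = next(name)[1:].strip()
            sequences = ''.join(seq.strip() for seq in next(fasta_iter))
            yield name, sequences

# Write a toy FASTA file, then parse it back.
with open('toy.fasta', 'w') as w:
    w.write('>header_1\nAATAT\nGGCC\n>header_2\nTTTT\n')

records = dict(parse_multi_fasta_file_compressed_or_not('toy.fasta'))
```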
I have more than 30 text files. I need to do some processing on each text file and save them again in text files with different names.
Example-1: precise_case_words.txt ---- processing ---- precise_case_sentences.txt
Example-2: random_case_words.txt ---- processing ---- random_case_sentences.txt
I need to do this for all the text files.
present code:
new_list = []
with open('precise_case_words.txt') as inputfile:
    for line in inputfile:
        new_list.append(line)

final = open('precise_case_sentences.txt', 'w+')
for item in new_list:
    final.write("%s\n" % item)
I'm manually copying and pasting this code every time and changing the names by hand. Please suggest a way to avoid this manual work using Python.
Suppose you have all your *_case_words.txt files in the current directory:
import glob

in_file = glob.glob('*_case_words.txt')
prefix = [i.split('_')[0] for i in in_file]
for i, ifile in enumerate(in_file):
    data = []
    with open(ifile, 'r') as f:
        for line in f:
            data.append(line)
    with open(prefix[i] + '_case_sentences.txt', 'w') as f:
        f.writelines(data)  # write() expects a string, so use writelines() for a list
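An equivalent sketch with pathlib, which avoids the manual prefix bookkeeping (the demo file contents here are made up; the processing step is just a pass-through placeholder):

```python
from pathlib import Path

# Create a couple of demo input files so the glob finds something.
Path('precise_case_words.txt').write_text('hello\nworld\n')
Path('random_case_words.txt').write_text('foo\nbar\n')

for in_path in Path('.').glob('*_case_words.txt'):
    out_path = Path(in_path.name.replace('_words.txt', '_sentences.txt'))
    # Your processing goes here; this sketch just copies the lines through.
    out_path.write_text(in_path.read_text())
```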
This should give you an idea about how to handle it:
def rename(name, suffix):
    """renames a file with one . in it by splitting and inserting suffix before the ."""
    a, b = name.split('.')
    return ''.join([a, suffix, '.', b])  # recombine parts including suffix in it

def processFn(name):
    """Open file 'name', process it, save it under other name"""
    # scramble data by sorting and writing anew to renamed file
    with open(name, "r") as r, open(rename(name, "_mang"), "w") as w:
        for line in r:
            scrambled = ''.join(sorted(line.strip("\n"))) + "\n"
            w.write(scrambled)

# list of filenames, see link below for how to get them with os.listdir()
names = ['fn1.txt', 'fn2.txt', 'fn3.txt']

# create demo data
for name in names:
    with open(name, "w") as w:
        for i in range(12):
            w.write("someword" + str(i) + "\n")

# process files
for name in names:
    processFn(name)
For file listings: see How do I list all files of a directory?
I chose to read and write line by line; you can also read a file in fully, process it, and write it back out in one block if you prefer.
fn1.txt:
someword0
someword1
someword2
someword3
someword4
someword5
someword6
someword7
someword8
someword9
someword10
someword11
into fn1_mang.txt:
0demoorsw
1demoorsw
2demoorsw
3demoorsw
4demoorsw
5demoorsw
6demoorsw
7demoorsw
8demoorsw
9demoorsw
01demoorsw
11demoorsw
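For the file listing mentioned above, a minimal sketch using os.listdir(); the .txt filter is an assumption about your naming scheme:

```python
import os

# Create two demo files so the listing has something to find.
for fn in ('fn1.txt', 'fn2.txt'):
    with open(fn, 'w') as w:
        w.write('demo\n')

# Collect every .txt file in the current directory.
names = sorted(f for f in os.listdir('.') if f.endswith('.txt'))
```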
I'm opening a CSV file and I need to check whether the file is empty or not. I already know about checking with getsize(); I would like a way to do it using DictReader.
This is my code
infocsv = open('nyfile.csv', 'a')
reader = csv.DictReader(infocsv)

with open(parafile, "rb") as paracsv:
    # Read in parameter values as a dictionary
    paradict = csv.DictReader(paracsv)
    has_rows = False
    for line in paradict:
        has_rows = True
    if not has_rows:
        return None
(From the csv docs on DictReader.line_num:) The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.
Here is an alternative solution:
import csv

with open('nyfile.csv') as infocsv:
    reader = [i for i in csv.DictReader(infocsv)]
    if len(reader) > 0:
        print('not empty')
    else:
        print('empty')
I tried it on a few CSV files of my own and it works. Let me know if this helps.
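If loading the whole file into a list is undesirable for large CSVs, a sketch that only peeks at the first record (the demo filename and columns are made up):

```python
import csv

# Demo file with a header row and one data row.
with open('nyfile.csv', 'w', newline='') as f:
    f.write('name,city\nAlice,NYC\n')

with open('nyfile.csv', newline='') as f:
    reader = csv.DictReader(f)
    first_row = next(reader, None)  # None means there are no data rows
    is_empty = first_row is None
```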
I'm trying to create a function that would add entries to a json file. Eventually, I want a file that looks like
[{"name": "name1", "url": "url1"}, {"name": "name2", "url": "url2"}]
etc. This is what I have:
def add(args):
    with open(DATA_FILENAME, mode='r', encoding='utf-8') as feedsjson:
        feeds = json.load(feedsjson)

    with open(DATA_FILENAME, mode='w', encoding='utf-8') as feedsjson:
        entry = {}
        entry['name'] = args.name
        entry['url'] = args.url
        json.dump(entry, feedsjson)
This does create an entry such as {"name": "some name", "url": "some url"}. But if I use this add function again with a different name and url, the first one gets overwritten. What do I need to do to get a second (third, ...) entry appended to the first one?
EDIT: The first answers and comments to this question have pointed out the obvious fact that I am not using feeds in the write block. I don't see how to do that, though. For example, the following apparently will not do:
with open(DATA_FILENAME, mode='a+', encoding='utf-8') as feedsjson:
    feeds = json.load(feedsjson)
    entry = {}
    entry['name'] = args.name
    entry['url'] = args.url
    json.dump(entry, feeds)
JSON might not be the best choice for on-disk formats; the trouble it has with appending data is a good example of why. Specifically, JSON objects have a syntax that means the whole object must be read and parsed in order to understand any part of it.
Fortunately, there are lots of other options. A particularly simple one is CSV; which is supported well by python's standard library. The biggest downside is that it only works well for text; it requires additional action on the part of the programmer to convert the values to numbers or other formats, if needed.
Another option which does not have this limitation is to use a sqlite database, which also has built-in support in python. This would probably be a bigger departure from the code you already have, but it more naturally supports the 'modify a little bit' model you are apparently trying to build.
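A sketch of the sqlite approach; the table and column names are made up for illustration, and an in-memory database stands in for a real file path:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # use a file path for real persistence
conn.execute('CREATE TABLE feeds (name TEXT, url TEXT)')

# Appending is a plain INSERT; no need to rewrite existing rows.
conn.execute('INSERT INTO feeds VALUES (?, ?)', ('name1', 'url1'))
conn.execute('INSERT INTO feeds VALUES (?, ?)', ('name2', 'url2'))
conn.commit()

rows = conn.execute('SELECT name, url FROM feeds').fetchall()
```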
You probably want to use a JSON list instead of a dictionary as the toplevel element.
So, initialize the file with an empty list:
with open(DATA_FILENAME, mode='w', encoding='utf-8') as f:
    json.dump([], f)
Then, you can append new entries to this list:
with open(DATA_FILENAME, mode='r', encoding='utf-8') as feedsjson:
    feeds = json.load(feedsjson)

entry = {'name': args.name, 'url': args.url}
feeds.append(entry)

with open(DATA_FILENAME, mode='w', encoding='utf-8') as feedsjson:
    json.dump(feeds, feedsjson)
Note that this will be slow to execute because you will rewrite the full contents of the file every time you call add. If you are calling it in a loop, consider adding all the feeds to a list in advance, then writing the list out in one go.
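Putting the steps together, a sketch of the full read-append-write cycle; DATA_FILENAME and the entry fields are assumptions carried over from the question:

```python
import json
import os

DATA_FILENAME = 'feeds.json'

def add(name, url):
    # Start from the existing list, or an empty one on first use.
    if os.path.exists(DATA_FILENAME):
        with open(DATA_FILENAME, encoding='utf-8') as f:
            feeds = json.load(f)
    else:
        feeds = []
    feeds.append({'name': name, 'url': url})
    # Rewrite the whole file with the extended list.
    with open(DATA_FILENAME, 'w', encoding='utf-8') as f:
        json.dump(feeds, f)

add('name1', 'url1')
add('name2', 'url2')
```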
Append the entry to the file contents if the file exists; otherwise, append the entry to an empty list and write it to the file:
a = []
if not os.path.isfile(fname):
    a.append(entry)
    with open(fname, mode='w') as f:
        f.write(json.dumps(a, indent=2))
else:
    with open(fname) as feedsjson:
        feeds = json.load(feedsjson)
    feeds.append(entry)
    with open(fname, mode='w') as f:
        f.write(json.dumps(feeds, indent=2))
Using a instead of w should let you update the file instead of creating a new one/overwriting everything in the existing file.
See this answer for a difference in the modes.
One possible solution is to do the concatenation manually; here is some useful code:
import json

def append_to_json(_dict, path):
    with open(path, 'ab+') as f:
        f.seek(0, 2)                                # go to the end of the file
        if f.tell() == 0:                           # check if the file is empty
            f.write(json.dumps([_dict]).encode())   # if empty, write an array
        else:
            f.seek(-1, 2)
            f.truncate()                            # remove the last character, open the array
            f.write(' , '.encode())                 # write the separator
            f.write(json.dumps(_dict).encode())     # dump the dictionary
            f.write(']'.encode())                   # close the array
You should be careful when editing the file outside the script not to add any whitespace at the end.
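For example, calling it twice and reading the result back; the helper is repeated here so the sketch stands alone, and the output filename is arbitrary:

```python
import json

def append_to_json(_dict, path):
    with open(path, 'ab+') as f:
        f.seek(0, 2)               # go to the end of the file
        if f.tell() == 0:          # empty file: start a new array
            f.write(json.dumps([_dict]).encode())
        else:
            f.seek(-1, 2)
            f.truncate()           # drop the closing ']'
            f.write(' , '.encode())
            f.write(json.dumps(_dict).encode())
            f.write(']'.encode())

append_to_json({'name': 'name1'}, 'feeds_manual.json')
append_to_json({'name': 'name2'}, 'feeds_manual.json')

with open('feeds_manual.json') as f:
    data = json.load(f)
```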
This works for me:
with open('file.json', 'a') as outfile:
    outfile.write(json.dumps(data))
    outfile.write(",")
# no explicit close() needed; the with block handles it
I have some code which is similar, but does not rewrite the entire contents each time. This is meant to run periodically and append a JSON entry at the end of an array.
If the file doesn't exist yet, it creates it and dumps the JSON into an array. If the file has already been created, it goes to the end, replaces the ] with a , drops the new JSON object in, and then closes it up again with another ]
# Append JSON object to output file JSON array
fname = "somefile.txt"
if os.path.isfile(fname):
    # File exists: open in binary mode, since text mode does not allow
    # seeking relative to the end of the file in Python 3
    with open(fname, 'rb+') as outfile:
        outfile.seek(-1, os.SEEK_END)
        outfile.truncate()        # drop the closing ']'
        outfile.write(b',')
        outfile.write(json.dumps(data_dict).encode())
        outfile.write(b']')
else:
    # Create file
    with open(fname, 'w') as outfile:
        array = []
        array.append(data_dict)
        json.dump(array, outfile)
You aren't ever writing anything to do with the data you read in. Do you want to be adding the data structure in feeds to the new one you're creating?
Or perhaps you want to open the file in append mode (open(filename, 'a')) and then add your string by writing the string produced by json.dumps instead of using json.dump; but as nneonneo points out, this would produce invalid JSON.
import jsonlines

object1 = {
    "name": "name1",
    "url": "url1"
}
object2 = {
    "name": "name2",
    "url": "url2"
}

# filename.jsonl is the name of the file
with jsonlines.open("filename.jsonl", "a") as writer:   # for writing
    writer.write(object1)
    writer.write(object2)

with jsonlines.open('filename.jsonl') as reader:        # for reading
    for obj in reader:
        print(obj)
Visit https://jsonlines.readthedocs.io/en/latest/ for more info.
You can simply import the data from the source file, read it, and save what you want to append in a variable. Then open the destination file, assign the list data inside to a new variable (presumably this will all be valid JSON), and use the append function on this list variable to append the first variable to it. Voilà, you have appended to the JSON list. Now just overwrite your destination file with the newly appended list (as JSON).
The 'a' mode in your 'open' function will not work here because it will just tack everything on to the end of the file, which will make it non-valid JSON format.
Let's say you have the following dicts:
d1 = {'a': 'apple'}
d2 = {'b': 'banana'}
d3 = {'c': 'carrot'}
You can turn these into a combined JSON string like this:
master_json = str(json.dumps(d1))[:-1]+', '+str(json.dumps(d2))[1:-1]+', '+str(json.dumps(d3))[1:]
Therefore, code to append to a JSON file will look like this:
dict_list = [d1, d2, d3]
for i, d in enumerate(dict_list):
    if i == 0:
        # first dict
        start = str(json.dumps(d))[:-1]
        with open(str_file_name, mode='w') as f:
            f.write(start)
    else:
        with open(str_file_name, mode='a') as f:
            if i != (len(dict_list) - 1):
                # middle dicts
                mid = ',' + str(json.dumps(d))[1:-1]
                f.write(mid)
            else:
                # last dict
                end = ',' + str(json.dumps(d))[1:]
                f.write(end)
I have a problem with changing a dict value and saving the dict to a text file (the format must stay the same); I only want to change the member_phone field.
My text file is the following format:
memberID:member_name:member_email:member_phone
and I split the text file with:
mdict = {}
for line in file:
    x = line.split(':')
    a = x[0]
    b = x[1]
    c = x[2]
    d = x[3]
    e = b + ':' + c + ':' + d
    mdict[a] = e
When I try to change the member_phone stored in d, the value changes but is not tied to the right key,
def change(mdict, b, c, d, e):
    a = input('ID')
    if a in mdict:
        d = str(input('phone'))
        mdict[a] = b + ':' + c + ':' + d
    else:
        print('not')
and how do I save the dict to a text file in the same format?
Python has the pickle module just for this kind of thing.
These functions are all that you need for saving and loading almost any object:
import pickle

with open('saved_dictionary.pkl', 'wb') as f:
    pickle.dump(dictionary, f)

with open('saved_dictionary.pkl', 'rb') as f:
    loaded_dict = pickle.load(f)
For saving collections of Python objects, there is also the shelve module.
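A quick sketch of shelve; the shelf filename and the stored data are arbitrary:

```python
import shelve

# A shelf behaves like a dict whose values persist on disk (pickled).
with shelve.open('my_shelf') as db:
    db['scores'] = {'alice': 10, 'bob': 7}

# Reopen later and read the value back.
with shelve.open('my_shelf') as db:
    restored = dict(db['scores'])
```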
Pickle is probably the best option, but in case anyone wonders how to save and load a dictionary to a file using NumPy:
import numpy as np

# Save
dictionary = {'hello': 'world'}
np.save('my_file.npy', dictionary)

# Load
read_dictionary = np.load('my_file.npy', allow_pickle='TRUE').item()
print(read_dictionary['hello'])  # displays "world"
FYI: NPY file viewer
We can also use the json module in the case when dictionaries or some other data can be easily mapped to JSON format.
import json

# Serialize data into file:
json.dump(data, open("file_name.json", 'w'))

# Read data from file:
data = json.load(open("file_name.json"))
This solution brings many benefits, e.g. it works in both Python 2.x and Python 3.x in unchanged form, and data saved in JSON format can easily be transferred between many different platforms or programs. The data is also human-readable.
Save and load dict to file:
def save_dict_to_file(dic):
    f = open('dict.txt', 'w')
    f.write(str(dic))
    f.close()

def load_dict_from_file():
    f = open('dict.txt', 'r')
    data = f.read()
    f.close()
    return eval(data)
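Since eval will execute arbitrary code found in the file, a safer variant of the same idea uses ast.literal_eval, which only parses Python literals; the filename default and sample data here are assumptions:

```python
import ast

def save_dict_to_file(dic, path='dict.txt'):
    with open(path, 'w') as f:
        f.write(str(dic))

def load_dict_from_file(path='dict.txt'):
    with open(path) as f:
        # literal_eval accepts dicts, lists, strings, numbers, but never calls
        return ast.literal_eval(f.read())

save_dict_to_file({'name': 'Ann', 'phone': '555-0100'})
loaded = load_dict_from_file()
```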
As Pickle has some security concerns and is slow (source), I would go for JSON, as it is fast, built-in, human-readable, and interchangeable:
import json

data = {'another_dict': {'a': 0, 'b': 1}, 'a_list': [0, 1, 2, 3]}

# e.g. file = './data.json'
with open(file, 'w') as f:
    json.dump(data, f)
Reading is similarly easy:
with open(file, 'r') as f:
    data = json.load(f)
This is similar to this answer, but implements the file handling correctly.
If the performance improvement is still not enough, I highly recommend orjson, a fast, correct JSON library for Python built in Rust.
I'm not sure what your first question is, but if you want to save a dictionary to a file you should use the json library. Look up the documentation of the dump and load functions.
I would suggest saving your data in JSON format instead of pickle format, as JSON files are human-readable, which makes debugging easier since your data is small. JSON files are also used by other programs to read and write data. You can read more about it here.
The json module is part of Python's standard library, so there is nothing to install; just import it:
import json
# To save the dictionary into a file:
json.dump(data, open("myfile.json", 'w'))
This creates a json file with the name myfile.
# To read data from file:
data = json.load(open("myfile.json"))
This reads and stores the myfile.json data in a data object.
For a dictionary of strings such as the one you're dealing with, it could be done using only Python's built-in text processing capabilities.
(Note this wouldn't work if the values are something else.)
with open('members.txt') as file:
    mdict = {}
    for line in file:
        a, b, c, d = line.strip().split(':')
        mdict[a] = b + ':' + c + ':' + d

a = input('ID: ')
if a not in mdict:
    print('ID {} not found'.format(a))
else:
    b, c, d = mdict[a].split(':')
    d = input('phone: ')
    mdict[a] = b + ':' + c + ':' + d  # update entry
    with open('members.txt', 'w') as file:  # rewrite file
        for id, values in mdict.items():
            file.write(':'.join([id] + values.split(':')) + '\n')
I like using the pretty print module to store the dict in a very user-friendly readable form:
import pprint

def store_dict(fname, dic):
    with open(fname, "w") as f:
        f.write(pprint.pformat(dic, indent=4, sort_dicts=False))
        # note some of the defaults are: indent=1, sort_dicts=True
Then, when recovering, read in the text file and eval() it to turn the string back into a dict:
def load_file(fname):
    try:
        with open(fname, "r") as f:
            dic = eval(f.read())
    except:
        dic = {}
    return dic
Unless you really want to keep the dictionary, I think the best solution is to use the csv Python module to read the file.
Then, you get rows of data and you can change member_phone or whatever you want ;
finally, you can use the csv module again to save the file in the same format
as you opened it.
Code for reading:
import csv

with open("my_input_file.txt", "r") as f:
    reader = csv.reader(f, delimiter=":")
    lines = list(reader)
Code for writing:
with open("my_output_file.txt", "w") as f:
    writer = csv.writer(f, delimiter=":")
    writer.writerows(lines)
Of course, you need to adapt your change() function:
def change(lines):
    a = input('ID')
    for line in lines:
        if line[0] == a:
            d = str(input("phone"))
            line[3] = d
            break
    else:
        print("not")
I haven't timed it but I bet h5 is faster than pickle; the filesize with compression is almost certainly smaller.
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))
file_name = open("data.json", "w")
json.dump(test_response, file_name)
file_name.close()
or use a context manager, which is better:
with open("data.json", "w") as file_name:
    json.dump(test_response, file_name)