JSON file specific line editing - python

I'd like to ask what the best way is to replace a specific line in multiple JSON files. In every file it's the same line that needs to be replaced.
import json
with open('3.json') as f:
    data = json.load(f)
for item in data['attributes']:
    item['value'] = item['value'].replace("Untitled", item['BgTest'])
with open('3.json', 'w') as d:
    json.dump(data, d)
I tried this code I found but it keeps giving me an error:
"/Users/jakubpitonak/Desktop/NFT/Gnomes Collection/ART-GEN-TUTORIAL 2.0/bin/python" /Users/jakubpitonak/PycharmProjects/pythonProject1/update.py
Traceback (most recent call last):
File "/Users/jakubpitonak/PycharmProjects/pythonProject1/update.py", line 25, in <module>
item['value'] = item['value'].replace("Untitled", item['BgTest'])
KeyError: 'BgTest'
Process finished with exit code 1

So item['BgTest'] does not exist in the items you're iterating through. I think you want to replace the "Untitled" value with the value "BgTest". In that case, replace the for loop with the one below:
for item in data['attributes']:
    if item['value'] == 'Untitled':
        item['value'] = 'BgTest'

import json
with open('3.json') as f:
    data = json.load(f)
for item in data['attributes']:
    item['value'] = "Your value here"
with open('3.json', 'w') as d:
    json.dump(data, d)
BgTest is not a valid key for the example you posted. If you only have that key in certain rows of the list, you cannot use it in the for loop.
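Since the question mentions multiple JSON files, here is a minimal sketch of how the same replacement could be applied to every file in a folder; the *.json glob pattern and the 'BgTest' replacement value are assumptions, not part of the original code:
import glob
import json
# Assumption: all target files live in the current directory and share the structure shown above.
for path in glob.glob('*.json'):
    with open(path) as f:
        data = json.load(f)
    for item in data['attributes']:
        # 'BgTest' is the assumed new value taken from the question.
        if item['value'] == 'Untitled':
            item['value'] = 'BgTest'
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)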

Breaking the hash

I have to break 4 hash codes and find the numbers behind them, but my code is not working.
These are the hash codes (in a csv file):
javad :f478525457dcd5ec6223e52bd3df32d1edb600275e18d6435cdeb3ef2294e8de
milad : 297219e7de424bb52c040e7a2cbbd9024f7af18e283894fe59ca6abc0313c3c4
tahmine : 6621ead3c9ec19dfbd65ca799cc387320c1f22ac0c6b3beaae9de7ef190668c4
niloofar : 26d72e99775e03d2501416c6f402c265e628b7d02eee17a7671563c32e0cd9a3
My code:
import hashlib
import itertools as it
import csv
from typing import Dict
number = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
code = hashlib.sha256()
passwords = list(it.permutations(number, 4))
with open('passwords.csv', newline='') as theFile:
    reader = csv.reader(theFile)
    passdic = dict()
# hpass is hash password
for hpass in passwords:
    encoded_hpass = ''.join(map(str, hpass)).encode('ascii')
    code = hashlib.sha256()
    code.update(encoded_hpass)
    passdic[encoded_hpass] = code.digest()
for row in theFile:
    for key, value in row.items():
        passdic[key].append(value)
and my result is:
'C:\Users\Parsa\AppData\Local\Programs\Python\Python38-32\python.exe' 'c:\Users\Parsa\.vscode\extensions\ms-python.python-2021.12.1559732655\pythonFiles\lib\python\debugpy\launcher' '3262' '--' 'c:\Users\Parsa\Desktop\project\hash breaker.py'
Traceback (most recent call last):
File "c:\Users\Parsa\Desktop\project\hash breaker.py", line 24, in <module>
for row in theFile :
ValueError: I/O operation on closed file.
You're trying to read from a closed file, which is impossible.
I don't know what your code is supposed to do, but here are the parts that don't make sense:
This opens the file to parse it as CSV:
with open('passwords.csv', newline='') as theFile:
    reader = csv.reader(theFile)
Then later on you run:
for row in theFile:
    for key, value in row.items():
But now, you're outside of the with block and the file is closed.
I guess you should use reader in place of theFile. If you really intend to loop over the raw lines of the file, you need to wrap the loop in a with open statement again.
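For reference, here is a minimal sketch of how the pieces could fit together, keeping the CSV read inside the with block; it assumes each row of passwords.csv looks like name : hexdigest (as in the hashes posted above) and that the passwords are 4-digit permutations as in the original code:
import csv
import hashlib
import itertools as it
# Precompute the SHA-256 hex digest of every 4-digit permutation (no repeated digits,
# matching it.permutations in the original code).
digests = {}
for hpass in it.permutations('0123456789', 4):
    candidate = ''.join(hpass)
    digests[hashlib.sha256(candidate.encode('ascii')).hexdigest()] = candidate
# Assumption: each row looks like "name : hexdigest".
with open('passwords.csv', newline='') as theFile:
    for row in csv.reader(theFile, delimiter=':'):
        if len(row) < 2:
            continue
        name, hexdigest = row[0].strip(), row[1].strip()
        print(name, digests.get(hexdigest, 'not found'))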

Rename fasta file according to a dataframe in python

Hello, I have a huge file such as:
>Seq1.1
AAAGGAGAATAGA
>Seq2.2
AGGAGCTTCTCAC
>Seq3.1
CGTACTACGAGA
>Seq5.2
CGAGATATA
>Seq3.1
CGTACTACGAGA
>Seq2
AGGAGAT
and a dataframe such as:
tab
query New_query
Seq1.1 Seq1.1
Seq2.2 Seq2.2
Seq3.1 Seq3.1_0
Seq5.2 Seq5.2_3
Seq3.1 Seq3.1_1
and the idea is to rename the >Seqname according to the tab.
Then, for each Seqname, if tab['query'] != tab['New_query'], rename the Seqname to tab['New_query'].
PS: not all the >Seqnames are present in the tab; if one is missing, I do nothing.
I should then get a new fasta file such as:
>Seq1.1
AAAGGAGAATAGA
>Seq2.2
AGGAGCTTCTCAC
>Seq3.1_0
CGTACTACGAGA
>Seq5.2_3
CGAGATATA
>Seq3.1_1
CGTACTACGAGA
>Seq2
AGGAGAT
I tried this code:
records = SeqIO.parse("My_fasta_file.aa", 'fasta')
for record in records:
    subtab = tab[tab['query'] == record.id]
    subtab = subtab.drop_duplicates(subset="New_query", keep="first")
    if subtab.empty:  # it means that the seq was not in the tab, so I do not rename the sequence
        continue
    else:
        if subtab['query'].iloc[0] != subtab['New_query'].iloc[0]:
            record.id = subtab['New_query']
            record.description = subtab['New_query']
        else:
            continue
It works, but it takes too much time...
You can create a mapper dictionary from the dataframe and then read the fasta file line by line, substituting the lines which start with >:
mapper = tab.set_index('query').to_dict()['New_query']
with open('My_fasta_file.aa', 'r') as f_in, open('output.txt', 'w') as f_out:
    for line in map(str.strip, f_in):
        if line.startswith('>'):
            v = line.split('>')[-1]
            line = '>{}'.format(mapper.get(v, v))
        print(line, file=f_out)
Creates output.txt:
>Seq1.1
AAAGGAGAATAGA
>Seq2.2
AGGAGCTTCTCAC
>Seq3.1_1
CGTACTACGAGA
>Seq5.2_3
CGAGATATA
>Seq3.1_1
CGTACTACGAGA
>Seq2
AGGAGAT
The solution by @Andrej using a dictionary is indeed the way to go. Since you are already using Biopython, below is a way to use it, and I think it might be good because it handles fasta files properly.
Your DataFrame is:
tab = pd.DataFrame({'query': ['Seq1.1', 'Seq2.2', 'Seq3.1', 'Seq5.2', 'Seq3.1'],
                    'New_query': ['Seq1.1', 'Seq2.2', 'Seq3.1_0', 'Seq5.2_3', 'Seq3.1_1']})
Same dictionary as Andrej:
mapper = tab.set_index('query').to_dict()['New_query']
Then, similar to what you have done, we just change the header (by updating id and description, thanks to @Chris_Rands):
records = list(SeqIO.parse("My_fasta_file.aa", "fasta"))
for i in records:
    i.id = mapper.get(i.id, i.id)
    i.description = mapper.get(i.description, i.description)
Now write the file:
with open("new.fasta", "w") as output_handle:
SeqIO.write(records, output_handle, "fasta")
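For completeness, here is the same Biopython approach as a single self-contained sketch, assuming Biopython and pandas are installed and using the file names from the question:
import pandas as pd
from Bio import SeqIO
# Mapping table from the question; in practice tab would come from your own dataframe.
tab = pd.DataFrame({'query': ['Seq1.1', 'Seq2.2', 'Seq3.1', 'Seq5.2', 'Seq3.1'],
                    'New_query': ['Seq1.1', 'Seq2.2', 'Seq3.1_0', 'Seq5.2_3', 'Seq3.1_1']})
mapper = tab.set_index('query').to_dict()['New_query']
# Rename each record via the mapper, keeping unmapped names unchanged.
records = list(SeqIO.parse("My_fasta_file.aa", "fasta"))
for record in records:
    record.id = mapper.get(record.id, record.id)
    record.description = mapper.get(record.description, record.description)
with open("new.fasta", "w") as output_handle:
    SeqIO.write(records, output_handle, "fasta")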

Python CSV: Unable to upload file. IndexError("string index out of range")

I am trying to import a file using Python. I understand why I get this error but can't manage to correct it: the code tries to access a blank line and returns an out-of-range error message. How could I correct this?
file_data = csv_file.read().decode("utf-8")
print("1")
lines = file_data.split("\n")
# loop over the lines and save them in db. If error, store as string and then display
for line in lines:
    if not line:
        continue
    line = line.strip()
    print(line)
    # the following line may not be needed, as line is a string, just access it as an array
    # b = line.split()
    print(line[0])
    print("2")
    fields = line.split(",")
    data_dict = {}
    data_dict["project_name"] = fields[0]
You check if the line is empty with
if not line:
    continue
And after that you strip it:
line = line.strip()
But when you strip it, the line can become empty, which you don't check for.
Fix the order of those lines, so you have:
line = line.strip()
if not line:
    continue
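Put back into the original loop, a minimal sketch of the fix could look like this (csv_file and the database-saving part are assumed from the question):
file_data = csv_file.read().decode("utf-8")
for line in file_data.split("\n"):
    line = line.strip()  # strip first ...
    if not line:         # ... then skip lines that are empty after stripping
        continue
    fields = line.split(",")
    data_dict = {}
    data_dict["project_name"] = fields[0]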

ValueError issue and how to extract non-matching strings from two files?

Hello everyone and thanks in advance.
I'm trying to extract the non-matching column strings from two different csv files and write the results to a new csv file.
So far, I've written this code:
import csv
with open(r'C:\Users\DataAnalyst\Desktop\phonesdata\sms_03.csv', 'r') as sms:
    sms_indices = dict((r[1], i) for i, r in enumerate(csv.reader(sms)))
with open(r'C:\Users\DataAnalyst\Desktop\phonesdata\marketing.csv', 'r') as marketing:
    reader = csv.reader(marketing)
f3 = open('results.csv', 'w')
c3 = csv.writer(f3)
sms_mark = list(marketing)
for sms_03_row in sms:
    row = 1
    found = False
    results_row = sms_03_row  # Moved out from nested loop
    for marketing_row in sms_mark:
        x = marketing[0]
        if sms_03_row[1] != marketing_row[0]:
            results_row.append(x)
            found = True
            break
        row += 1
    if not found:
        results_row.append('Not found')
    c3.writerow(results_row)
sms.close()
marketing.close()
f3.close()
However, I got this:
Traceback (most recent call last):
File "<ipython-input-5-bc26b28cdf70>", line 1, in <module>
sms_mark = list(marketing)
ValueError: I/O operation on closed file.
How can I solve it? With this code, will I get just the non-matching strings?
Thank you!
You have some indentation issues there. Anyway, let the with statement take care of opening and closing the files and you will be fine:
import csv
with open(r'<path_to_sms>', 'r') as sms, \
        open(r'<path_to_marketing>') as marketing, \
        open(r'results.csv', 'w') as f3:
    sms_indices = dict((r[1], i) for i, r in enumerate(csv.reader(sms)))
    reader = csv.reader(marketing)
    c3 = csv.writer(f3)
    sms_mark = list(marketing)
    results_row = []
    # your stuff goes here
    c3.writerow(results_row)
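As a rough sketch of the comparison itself, assuming (as in the question's code) that column 1 of sms_03.csv should be checked against column 0 of marketing.csv, and that only rows without a match should be written out:
import csv
with open(r'<path_to_sms>', newline='') as sms, \
        open(r'<path_to_marketing>', newline='') as marketing, \
        open('results.csv', 'w', newline='') as f3:
    # Collect the values of the first column of marketing.csv.
    marketing_values = {row[0] for row in csv.reader(marketing) if row}
    writer = csv.writer(f3)
    # Write the sms rows whose second column has no match in marketing.csv.
    for row in csv.reader(sms):
        if len(row) > 1 and row[1] not in marketing_values:
            writer.writerow(row + ['Not found'])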

SyntaxError while parsing JSON

I want to make JSON from a text file and build a list from the values of ['ids'].
{"previous_cursor": 0, "previous_cursor_str": "0", "next_cursor": 1351473067174597097, "ids": [250718906, 66612533], "next_cursor_str": "1351473067174597097"} {"previous_cursor": -1351352880715992572, "previous_cursor_str": "-1351352880715992572", "next_cursor": 0, "ids": [113030689, 22020972], "next_cursor_str": "0"}
My code
import json
f = open('22580877', 'r')
data = f.read()
datalist = data.split('\n')
idslist = []
for i in datalist:
    datadic = eval(i)
    print(datadic['ids'])
    idslist.extend(datadic['ids'])
    datadic = {}
for j in idslist:
    print(j)
f.close()
The error message is:
Traceback (most recent call last):
File "test.py", line 11, in <module>
datadic = eval(i)
File "<string>", line 0
^
SyntaxError: unexpected EOF while parsing
I can't find the syntax error in my code. Help me please!
It sounds like you've been handed a file with a JSON string on each line. From your error message I kind of wonder if your file is corrupt or not formatted the way you think it is. However, if I had been given the task you've supplied, I'd do something like this...
import json, traceback
idslist = []
with open('22580877', 'r') as f:
    data = f.read()
    datalist = data.split('\n')
    for idx, json_string in enumerate(datalist):
        try:
            json_obj = json.loads(json_string)
            idslist.extend(json_obj['ids'])
        except Exception:
            print("bad json on line {} with traceback:\n{}".format(idx + 1, traceback.format_exc()))
for id in idslist:
    print(id)
