am getting identical sha256 for each json file in python - python

I am in a huge hashing crisis. Using the chip-0007's default format I generatedfew JSON files. Using these files I have been trying to generate sha256 hash value. And I expect a unique hash value for each file.
However, python code isn't doing so. I thought there might be some issue with JSON file but, it is not. Something is to do with sha256 code.
All the json files ->
JSON File 1
{ "format": "CHIP-0007", "name": "adewale-the-amebo", "description": "Adewale always wants to be in everyone's business.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "adewale-the-amebo Collection", "id": "1" } }
JSON File 2
{ "format": "CHIP-0007", "name": "alli-the-queeny", "description": "Alli is an LGBT Stan.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "alli-the-queeny Collection", "id": "2" } }
JSON File 3
{ "format": "CHIP-0007", "name": "aminat-the-snnobish", "description": "Aminat never really wants to talk to anyone.", "attributes": [ { "trait_type": "Gender", "value": "female" } ], "collection": { "name": "aminat-the-snnobish Collection", "id": "3" } }
Sample CSV File:
Series Number,Filename,Description,Gender
1,adewale-the-amebo,Adewale always wants to be in everyone's business.,male
2,alli-the-queeny,Alli is an LGBT Stan.,male
3,aminat-the-snnobish,Aminat never really wants to talk to anyone.,female
Python CODE
TODO 2 : Generate a JSON file per entry in team's sheet in CHIP-0007's default format
new_jsonFile = f"{row[1]}.json"
json_data = {}
json_data["format"] = "CHIP-0007"
json_data["name"] = row[1]
json_data["description"] = row[2]
attribute_data = {}
attribute_data["trait_type"] = "Gender" # gender
attribute_data["value"] = row[3] # "value/male/female"
json_data["attributes"] = [attribute_data]
collection_data = {}
collection_data["name"] = f"{row[1]} Collection"
collection_data["id"] = row[0] # "ID of the NFT collection"
json_data["collection"] = collection_data
filepath = f"Json_Files/{new_jsonFile}"
with open(filepath, 'w') as f:
json.dump(json_data, f, indent=2)
C += 1
sha256_hash = sha256_gen(filepath)
temp.append(sha256_hash)
NEW.append(temp)
# TODO 3 : Calculate sha256 of the each entry
def sha256_gen(fn):
return hashlib.sha256(open(fn, 'rb').read()).hexdigest()
How can I generate a unique sha256 hash for each JSON?
I tried reading in byte blocks. That is also not working out. After many trials, I am going nowhere. Sharing the unexpected outputs of each JSON file:
[ All hashes are identical ]
Unexpected SHA256 output:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Expected:
Unique Hash value. Different from each other

Because of output buffering, you're calling sha256_gen(filepath) before anything is written to the file, so you're getting the hash of an empty file. You should do that outside the with, so that the JSON file is closed and the buffer is flushed.
with open(filepath, 'w') as f:
json.dump(json_data, f, indent=2)
C += 1
sha256_hash = sha256_gen(filepath)
temp.append(sha256_hash)
NEW.append(temp)

Related

Python Taking API response and adding to JSON Key error

I have a script that takes an ID from a JSON file, adds it to a URL for an API request. The aim is to have a loop run through the 1000-ish ids and preduce one JSON file with all the information contained within.
The current code calls the first request and creates and populates the JSON file, but when run in a loop it throws a key error.
import json
import requests
fname = "NewdataTest.json"
def write_json(data, fname):
fname = "NewdataTest.json"
with open(fname, "w") as f:
json.dump(data, f, indent = 4)
with open (fname) as json_file:
data = json.load(json_file)
temp = data[0]
#print(newData)
y = newData
data.append(y)
# Read test.json to get tmdb IDs
tmdb_ids = []
with open('test.json', 'r') as json_fp:
imdb_info = json.load(json_fp)
tmdb_ids = [movie_info['tmdb_id'] for movies_chunk in imdb_info for movie_index, movie_info in movies_chunk.items()]
# Add IDs to API call URL
for tmdb_id in tmdb_ids:
print("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
# Send API Call
response_API = requests.get("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
# Check API Call Status
print(response_API.status_code)
write_json((response_API.json()), "NewdataTest.json")
The error is in this line "temp = data[0]" I have tried printing the keys for data, nothing. At this point, I have no idea where I am with this as I have hacked it about it barely resembles anything like a cohesive piece of code. My aim was to make a simple function to get the data from the JSON, one to produce the API call URLs, and one to write the results to the new JSON.
Example of API reponse JSON:
{
"adult": false,
"backdrop_path": "/e1cC9muSRtAHVtF5GJtKAfATYIT.jpg",
"belongs_to_collection": null,
"budget": 0,
"genres": [
{
"id": 10749,
"name": "Romance"
},
{
"id": 35,
"name": "Comedy"
}
],
"homepage": "",
"id": 1063242,
"imdb_id": "tt24640474",
"original_language": "fr",
"original_title": "Disconnect: The Wedding Planner",
"overview": "After falling victim to a scam, a desperate man races the clock as he attempts to plan a luxurious destination wedding for an important investor.",
"popularity": 34.201,
"poster_path": "/tGmCxGkVMOqig2TrbXAsE9dOVvX.jpg",
"production_companies": [],
"production_countries": [
{
"iso_3166_1": "KE",
"name": "Kenya"
},
{
"iso_3166_1": "NG",
"name": "Nigeria"
}
],
"release_date": "2023-01-13",
"revenue": 0,
"runtime": 107,
"spoken_languages": [
{
"english_name": "English",
"iso_639_1": "en",
"name": "English"
},
{
"english_name": "Afrikaans",
"iso_639_1": "af",
"name": "Afrikaans"
}
],
"status": "Released",
"tagline": "",
"title": "Disconnect: The Wedding Planner",
"video": false,
"vote_average": 5.8,
"vote_count": 3
}
You can store all results from the API calls into a list and then save this list in Json format into a file. For example:
#...
all_data = []
for tmdb_id in tmdb_ids:
print("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
# Send API Call
response_API = requests.get("https://api.themoviedb.org/3/movie/" + str(tmdb_id) + "?api_key=****")
# Check API Call Status
print(response_API.status_code)
if response_API.status_code == 200:
# store the Json data in a list:
all_data.append(response_API.json())
# write the list to file
with open('output.json', 'w') as f_out:
json.dump(all_data, f_out, indent=4)
This will produce output.json with all responses in Json format.

How to merge non-fixed key json multilines into one json abstractly

If I have a heavy json file that have 30m entries like that
{"id":3,"price":"231","type":"Y","location":"NY"}
{"id":4,"price":"321","type":"N","city":"BR"}
{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}
--snip--
{"id":30373779,"price":"121","type":"N","city":"SR","location":"IU"}
{"id":30373780,"price":"432","type":"Y","location":"TB"}
{"id":30373780,"price":"562","type":"N","city":"CQ"}
how I can only abstract the location and the city and parse it into one json like that in python:
{
"orders":{
3:{
"location":"NY"
},
4:{
"city":"BR"
},
5:{
"city":"XE",
"location":"CP"
},
30373779:{
"city":"SR",
"location":"IU"
},
30373780:{
"location":"TB"
},
30373780:{
"city":"CQ"
}
}
}
P.S: beatufy the syntax is not necessary.
Assuming your input file is actually in jsonlines format, then you can read each line, extract the city and location keys from the dict and then append those to a new dict:
import json
from collections import defaultdict
orders = { 'orders' : defaultdict(dict) }
with open('orders.txt', 'r') as f:
for line in f:
o = json.loads(line)
id = o['id']
if 'location' in o:
orders['orders'][id]['location'] = o['location']
if 'city' in o:
orders['orders'][id]['city'] = o['city']
print(orders)
Output for your sample data (note it has two 30373780 id values, so the values get merged into one dict):
{
"orders": {
"3": {
"location": "NY"
},
"4": {
"city": "BR"
},
"5": {
"location": "CP",
"city": "XE"
},
"30373779": {
"location": "IU",
"city": "SR"
},
"30373780": {
"location": "TB",
"city": "CQ"
}
}
}
As you've said that your file is pretty big and you probably don't want to keep all entries in memory here is the way to consume source file line by line and write output immediately:
import json
with open(r"in.jsonp") as i_f, open(r"out.json", "w") as o_f:
o_f.write('{"orders":{')
for i in i_f:
i_obj = json.loads(i)
o_f.write(f'{i_obj["id"]}:')
o_obj = {}
if location := i_obj.get("location"):
o_obj["location"] = location
if city := i_obj.get("city"):
o_obj["city"] = city
json.dump(o_obj, o_f)
o_f.write(",")
o_f.write('}}')
It will generate semi-valid JSON object in same format you've provided in your question.

Extract data from json, append using for loop and save as CSV

I have extracted id, username, and name for 100 followers for 102 politicians using Tweepy. The data is stored in a JSON file named pol_followers. Now I wish to append id and username and save it as a CSV file using the function below. However, when using the function in the last line append_followers_to_csv(pol_followers, "pol_followers.csv") I get the error seen at the bottom.
# Structure of pol_followers. The full pol_followers is much longer...
print(json.dumps(pol_followers, indent=4, sort_keys=True)) # see json data structure
[
{
"data": [
{
"id": "1464206217807601666",
"name": "terry alex",
"username": "terryal51850644"
},
{
"id": "1479032154394968064",
"name": "Charles Williams",
"username": "Charles99924770"
},
{
"id": "2526015770",
"name": "LISA P",
"username": "LISAP0910"
},
{
"id": "2957692520",
"name": "fayaz ahmad",
"username": "ahmadfayaz202"
}
],
"meta": {
"next_token": "F6HS7IU5SRGHEZZZ",
"result_count": 100
}
},
{
"data": [
{
"id": "2482703136",
"name": "HieuVu",
"username": "sachieuhaihanh"
},
{
"id": "580882148",
"name": "Maxine D. Harmon",
"username": "maxxximd"
},
{
"id": "1478867472841334787",
"name": "RBPsych1",
"username": "RBPsych1"
# Create file
csv_follower_file = open("pol_followers.csv", "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
# Create headers for the data I want to save. I only want to save these columns in my dataset
csv_follower_writer.writerow(
['id', 'username'])
csv_follower_file.close()\
def append_followers_to_csv(pol_followers, csv_follower_file):
# A counter variable
global follower_id, username
counter = 0
# Open OR create the target CSV file
csv_follower_file = open(csv_follower_file, "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
for ids in pol_followers['data']:
# 1. follower ID
follower_id = ids['id']
# 2. follower username
username = ids['username']
# Assemble all data in a list
ress = [follower_id, username]
# Append the result to the CSV file
csv_follower_writer.writerow(ress)
counter += 1
# When done, close the CSV file
csvFile.close()
# Print the number of tweets for this iteration
print("# of Tweets added from this response: ", counter)
append_followers_to_csv(pol_followers, "pol_followers.csv") # Save tweet data in a csv file
File "<input>", line 1, in <module>
File "<input>", line 11, in append_followers_to_csv
TypeError: list indices must be integers or slices, not str
You are just missing additional loop, like so:
for each_dict in pol_followers:
for ids in each_dict['data']:
follower_id = ids['id']
username = ids['username']
You seem to have wrapped your JSON object in a list, so instead of getting the 'data' bit of the JSON, you are getting the 'data'th element of a list when you are iterating in your append_followers_to_csv function, which you can't do in python. Try removing the square brackets around the JSON or making it for ids in pol_followers[0]['data'].

Convert a JSON string to multiple CSV's based on its structure and name it to a certain value

I currently have A JSON file saved containing some data I want to convert to CSV. Here is the data sample below, please note, I have censored the actual value in there for security and privacy reasons.
{
"ID value1": {
"Id": "ID value1",
"TechnischContactpersoon": {
"Naam": "Value",
"Telefoon": "Value",
"Email": "Value"
},
"Disclaimer": [
"Value"
],
"Voorzorgsmaatregelen": [
{
"Attributes": {},
"FileId": "value",
"FileName": "value",
"FilePackageLocation": "value"
},
{
"Attributes": {},
"FileId": "value",
"FileName": "value",
"FilePackageLocation": "value"
},
]
},
"ID value2": {
"Id": "id value2",
"TechnischContactpersoon": {
"Naam": "Value",
"Telefoon": "Value",
"Email": "Value"
},
"Disclaimer": [
"Placeholder"
],
"Voorzorgsmaatregelen": [
{
"Attributes": {},
"FileId": "value",
"FileName": "value",
"FilePackageLocation": "value"
}
]
},
Though I know how to do this (because I already have a function to handle a JSON to CSV convertion) with a simple JSON string without issues. I do not know to this with this kind of JSON file that this kind of a structure layer. Aka a second layer beneath the first. Also you may have noticed that there is an ID value above
Because as may have noticed from structure is actually another layer inside the JSON file. So in total I need to have two kinds of CSV files:
The main CSV file just containing the ID, Disclaimer. This CSV file
is called utility networks and contains all possible ID value's and
the value
A file containing the "Voorzorgsmaatregelen" value's. Because there are multiple values in this section, one CSV file per unique
ID file is needed and needs to be named after the Unique value id.
Deleted this part because it was irrelevant.
Data_folder = "Data"
Unazones_file_name = "UnaZones"
Utilitynetworks_file_name = "utilityNetworks"
folder_path_JSON_BS_JSON = folder_path_creation(Data_folder)
pkml_file_path = os.path.join(folder_path_JSON_BS_JSON,"pmkl.json")
print(pkml_file_path)
json_object = json_open(pkml_file_path)
json_content_unazones = json_object.get("mapRequest").get("UnaZones")
json_content_utility_Networks = json_object.get("utilityNetworks")
Unazones_json_location = json_to_save(json_content_unazones,folder_path_JSON_BS_JSON,Unazones_file_name)
csv_file_location_unazones = os.path.join(folder_path_CSV_file_path(Data_folder),(Unazones_file_name+".csv"))
csv_file_location_Utilitynetwork = os.path.join(folder_path_CSV_file_path(Data_folder),(Unazones_file_name+".csv"))
json_content_utility_Networks = json_object.get("utilityNetworks")
Utility_networks_json_location = json_to_save(json_content_utility_Networks,folder_path_JSON_BS_JSON,Utilitynetworks_file_name)
def json_to_csv_convertion(json_file_path: str, csv_file_location: str):
loaded_json_data = json_open(json_file_path)
# now we will open a file for writing
data_file = open(csv_file_location, 'w', newline='')
# # create the csv writer object
csv_writer = csv.writer(data_file,delimiter = ";")
# Counter variable used for writing
# headers to the CSV file
count = 0
for row in loaded_json_data:
if count == 0:
# Writing headers of CSV file
header = row.keys()
csv_writer.writerow(header)
count += 1
# Writing data of CSV file
csv_writer.writerow(row.values())
data_file.close()
def folder_path_creation(path: str):
if not os.path.exists(path):
os.makedirs(path)
return path
def json_open(complete_folder_path):
with open(complete_folder_path) as f:
json_to_load = json.load(f) # Modified "objectids" to "object_ids" for readability -sg
return json_to_load
def json_to_save(input_json, folder_path: str, file_name: str):
json_save_location = save_file(input_json, folder_path, file_name, "json")
return json_save_location
So how do I this starting from this?
for obj in json_content_utility_Networks:
Go from there?
Keep in mind that is JSON value has already one layer above every object for every object I need to start one layer below it.
So how do I this?

How to read Avro files from S3 in Python?

I have a bunch of Avro files that I would like to read one by one from S3. I have no problem reading the files as bytes but I am wondering how can you iterate over the entires after that. Current code:
conn = boto.s3.connect_to_region("us-east-1")
my_bucket=boto.s3.bucket.Bucket(conn, "my_bucket")
my_key = my_bucket.get_key("folder/file.avro")
raw_bytes = my_key.read()
test_schema = '''
{
"namespace": "com.company",
"type": "record",
"name": "MimeMessage_v2",
"fields": [
{
"name": "record_timestamp",
"type": "long"
},
{
"name": "contents",
"type": "bytes"
}
],
"message_id": 2
}
'''
schema = avro.schema.Parse(test_schema)
#this is the problematic section
dreader = DatumReader(schema, schema)
v = dreader.read(raw_bytes)
I am wondering how to read a variable containing bytes of a Avro file properly.
Here is one of the ways that worked for me in Python 3:
from avro.datafile import DataFileReader
avro_bytes = io.BytesIO(raw_bytes)
reader = DataFileReader(avro_bytes, avro.io.DatumReader())
for line in reader:
print(line)

Categories