Split Pandas DataFrame objects into 2 JSON objects - python

I have a requirement to send JSON data via API Rest call, but the files need to be separate as there is a file size limit.
I currently collect the data from a HANA database, and split it into 2 files:
connection = dbapi.connect(address='<>',port='<>',user='<>',password='<>')
df = pd.read_sql('''QUERY''', connection)
length_df = len(df)
length_half = math.ceil(len(df) / 2)
if length_df > length_half:
payload_1 = df[:length_half]
payload_2 = df[length_half:]
This returns 2 perfectly separated DF's, that I need to process as JSON objects.
After this has been done, I iterate over these DF's separately, and put them into the required format that the client wants it as:
item_data_load_1 = {}
for row in payload_1.itertuples(index=False):
list = {
"itemId": {
"primaryId": row[0].lstrip('0')
},
"description": {
"languageCode": "",
"descriptionType": "",
"value": row[25]
},
"classifications": {
"itemType": row[7]
},
"tradeItemBaseUnitOfMeasure": row[8],
"demandUnitInformation": {
"demandUnitName": row[0].lstrip('0'),
"description": {
"value": row[25]
},
"actionCode": "",
"demandUnitDetails": {
"demandUnitBaseUnitOfMeasure": row[8],
"brandName": row[6],
"unitSize": row[19],
"handlingInstruction": {
"handlingInstructionCode": ""
}
}
}
item_data_load_1.append(list)
item_data_load_2 = {}
for row in payload_2.itertuples(index=False):
list = {
"itemId": {
"primaryId": row[0].lstrip('0')
},
"description": {
"languageCode": "",
"descriptionType": "",
"value": row[25]
},
"classifications": {
"itemType": row[7]
},
"tradeItemBaseUnitOfMeasure": row[8],
"demandUnitInformation": {
"demandUnitName": row[0].lstrip('0'),
"description": {
"value": row[25]
},
"actionCode": "",
"demandUnitDetails": {
"demandUnitBaseUnitOfMeasure": row[8],
"brandName": row[6],
"unitSize": row[19],
"handlingInstruction": {
"handlingInstructionCode": ""
}
}
}
item_data_load_2.append(list)
I do this for both payloads, and then dump it into a JSON object:
json_data_1 = json.dumps(item_data_load_1)
json_data_2 = json.dumps(item_data_load_2)
However, the first file I produce has the correct amount of records in it, but the second file has double that - it's like it's taking the data from the first payload, and then also appending the other half of the second payload (essentially making it one big file).
Row count of 1st dataframe:
44938
Row count of 2nd dataframe:
44938
Row count of full dataframe:
89876
Length of 1st payload:
39139770
Length of 2nd payload:
78279080
This should not occur, though, as it should just produce 2 JSON files with the same record counts.
I cannot share the full script as it has sensitive information in it.
Can someone please explain to me why this is happening?
Thanks in advance.

Related

am getting identical sha256 for each json file in python

I am in a huge hashing crisis. Using the chip-0007's default format I generatedfew JSON files. Using these files I have been trying to generate sha256 hash value. And I expect a unique hash value for each file.
However, python code isn't doing so. I thought there might be some issue with JSON file but, it is not. Something is to do with sha256 code.
All the json files ->
JSON File 1
{ "format": "CHIP-0007", "name": "adewale-the-amebo", "description": "Adewale always wants to be in everyone's business.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "adewale-the-amebo Collection", "id": "1" } }
JSON File 2
{ "format": "CHIP-0007", "name": "alli-the-queeny", "description": "Alli is an LGBT Stan.", "attributes": [ { "trait_type": "Gender", "value": "male" } ], "collection": { "name": "alli-the-queeny Collection", "id": "2" } }
JSON File 3
{ "format": "CHIP-0007", "name": "aminat-the-snnobish", "description": "Aminat never really wants to talk to anyone.", "attributes": [ { "trait_type": "Gender", "value": "female" } ], "collection": { "name": "aminat-the-snnobish Collection", "id": "3" } }
Sample CSV File:
Series Number,Filename,Description,Gender
1,adewale-the-amebo,Adewale always wants to be in everyone's business.,male
2,alli-the-queeny,Alli is an LGBT Stan.,male
3,aminat-the-snnobish,Aminat never really wants to talk to anyone.,female
Python CODE
TODO 2 : Generate a JSON file per entry in team's sheet in CHIP-0007's default format
new_jsonFile = f"{row[1]}.json"
json_data = {}
json_data["format"] = "CHIP-0007"
json_data["name"] = row[1]
json_data["description"] = row[2]
attribute_data = {}
attribute_data["trait_type"] = "Gender" # gender
attribute_data["value"] = row[3] # "value/male/female"
json_data["attributes"] = [attribute_data]
collection_data = {}
collection_data["name"] = f"{row[1]} Collection"
collection_data["id"] = row[0] # "ID of the NFT collection"
json_data["collection"] = collection_data
filepath = f"Json_Files/{new_jsonFile}"
with open(filepath, 'w') as f:
json.dump(json_data, f, indent=2)
C += 1
sha256_hash = sha256_gen(filepath)
temp.append(sha256_hash)
NEW.append(temp)
# TODO 3 : Calculate sha256 of the each entry
def sha256_gen(fn):
return hashlib.sha256(open(fn, 'rb').read()).hexdigest()
How can I generate a unique sha256 hash for each JSON?
I tried reading in byte blocks. That is also not working out. After many trials, I am going nowhere. Sharing the unexpected outputs of each JSON file:
[ All hashes are identical ]
Unexpected SHA256 output:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Expected:
Unique Hash value. Different from each other
Because of output buffering, you're calling sha256_gen(filepath) before anything is written to the file, so you're getting the hash of an empty file. You should do that outside the with, so that the JSON file is closed and the buffer is flushed.
with open(filepath, 'w') as f:
json.dump(json_data, f, indent=2)
C += 1
sha256_hash = sha256_gen(filepath)
temp.append(sha256_hash)
NEW.append(temp)

How to merge non-fixed key json multilines into one json abstractly

If I have a heavy json file that have 30m entries like that
{"id":3,"price":"231","type":"Y","location":"NY"}
{"id":4,"price":"321","type":"N","city":"BR"}
{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}
--snip--
{"id":30373779,"price":"121","type":"N","city":"SR","location":"IU"}
{"id":30373780,"price":"432","type":"Y","location":"TB"}
{"id":30373780,"price":"562","type":"N","city":"CQ"}
how I can only abstract the location and the city and parse it into one json like that in python:
{
"orders":{
3:{
"location":"NY"
},
4:{
"city":"BR"
},
5:{
"city":"XE",
"location":"CP"
},
30373779:{
"city":"SR",
"location":"IU"
},
30373780:{
"location":"TB"
},
30373780:{
"city":"CQ"
}
}
}
P.S: beatufy the syntax is not necessary.
Assuming your input file is actually in jsonlines format, then you can read each line, extract the city and location keys from the dict and then append those to a new dict:
import json
from collections import defaultdict
orders = { 'orders' : defaultdict(dict) }
with open('orders.txt', 'r') as f:
for line in f:
o = json.loads(line)
id = o['id']
if 'location' in o:
orders['orders'][id]['location'] = o['location']
if 'city' in o:
orders['orders'][id]['city'] = o['city']
print(orders)
Output for your sample data (note it has two 30373780 id values, so the values get merged into one dict):
{
"orders": {
"3": {
"location": "NY"
},
"4": {
"city": "BR"
},
"5": {
"location": "CP",
"city": "XE"
},
"30373779": {
"location": "IU",
"city": "SR"
},
"30373780": {
"location": "TB",
"city": "CQ"
}
}
}
As you've said that your file is pretty big and you probably don't want to keep all entries in memory here is the way to consume source file line by line and write output immediately:
import json
with open(r"in.jsonp") as i_f, open(r"out.json", "w") as o_f:
o_f.write('{"orders":{')
for i in i_f:
i_obj = json.loads(i)
o_f.write(f'{i_obj["id"]}:')
o_obj = {}
if location := i_obj.get("location"):
o_obj["location"] = location
if city := i_obj.get("city"):
o_obj["city"] = city
json.dump(o_obj, o_f)
o_f.write(",")
o_f.write('}}')
It will generate semi-valid JSON object in same format you've provided in your question.

Extract data from json, append using for loop and save as CSV

I have extracted id, username, and name for 100 followers for 102 politicians using Tweepy. The data is stored in a JSON file named pol_followers. Now I wish to append id and username and save it as a CSV file using the function below. However, when using the function in the last line append_followers_to_csv(pol_followers, "pol_followers.csv") I get the error seen at the bottom.
# Structure of pol_followers. The full pol_followers is much longer...
print(json.dumps(pol_followers, indent=4, sort_keys=True)) # see json data structure
[
{
"data": [
{
"id": "1464206217807601666",
"name": "terry alex",
"username": "terryal51850644"
},
{
"id": "1479032154394968064",
"name": "Charles Williams",
"username": "Charles99924770"
},
{
"id": "2526015770",
"name": "LISA P",
"username": "LISAP0910"
},
{
"id": "2957692520",
"name": "fayaz ahmad",
"username": "ahmadfayaz202"
}
],
"meta": {
"next_token": "F6HS7IU5SRGHEZZZ",
"result_count": 100
}
},
{
"data": [
{
"id": "2482703136",
"name": "HieuVu",
"username": "sachieuhaihanh"
},
{
"id": "580882148",
"name": "Maxine D. Harmon",
"username": "maxxximd"
},
{
"id": "1478867472841334787",
"name": "RBPsych1",
"username": "RBPsych1"
# Create file
csv_follower_file = open("pol_followers.csv", "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
# Create headers for the data I want to save. I only want to save these columns in my dataset
csv_follower_writer.writerow(
['id', 'username'])
csv_follower_file.close()\
def append_followers_to_csv(pol_followers, csv_follower_file):
# A counter variable
global follower_id, username
counter = 0
# Open OR create the target CSV file
csv_follower_file = open(csv_follower_file, "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
for ids in pol_followers['data']:
# 1. follower ID
follower_id = ids['id']
# 2. follower username
username = ids['username']
# Assemble all data in a list
ress = [follower_id, username]
# Append the result to the CSV file
csv_follower_writer.writerow(ress)
counter += 1
# When done, close the CSV file
csvFile.close()
# Print the number of tweets for this iteration
print("# of Tweets added from this response: ", counter)
append_followers_to_csv(pol_followers, "pol_followers.csv") # Save tweet data in a csv file
File "<input>", line 1, in <module>
File "<input>", line 11, in append_followers_to_csv
TypeError: list indices must be integers or slices, not str
You are just missing additional loop, like so:
for each_dict in pol_followers:
for ids in each_dict['data']:
follower_id = ids['id']
username = ids['username']
You seem to have wrapped your JSON object in a list, so instead of getting the 'data' bit of the JSON, you are getting the 'data'th element of a list when you are iterating in your append_followers_to_csv function, which you can't do in python. Try removing the square brackets around the JSON or making it for ids in pol_followers[0]['data'].

Convert multiple string stored in a variable into a single list in python

I hope everyone is doing well.
I need a little help where I need to get all the strings from a variable and need to store into a single list in python.
For example -
I have json file from where I am getting ids and all the ids are getting stored into a variable called id as below when I run print(id)
17298626-991c-e490-bae6-47079c6e2202
17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b
172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5
172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd
17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0
1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4
172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d
1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566
17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23
17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a
I want these all values to be stored into a single list.
Can some please help me out with this.
Below is the code I am using:
import json
with open('requests.json') as f:
data = json.load(f)
print(type(data))
for i in data:
if 'traceId' in i:
id = i['traceId']
newid = id.split()
#print(type(newid))
print(newid)
And below is my json file looks like:
[
{
"id": "376287298-hjd8-jfjb-khkf-6479280283e9",
"submittedTime": 1591692502558,
"traceId": "17298626-991c-e490-bae6-47079c6e2202",
"userName": "ABC",
"onlyChanged": true,
"description": "Not Required",
"startTime": 1591694487929,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "16b22a09-a840-f4d9-f42a-64fd73fece57",
"name": "XYZ"
},
"applicationProcess": {
"id": "dihihdosfj9279278yrie8ue",
"name": "Deploy",
"version": 12
},
"environment": {
"id": "fkjdshkjdshglkjdshgldshldsh03r937837",
"name": "DEV"
},
"snapshot": {
"id": "djnglkfdglki98478yhgjh48yr844h",
"name": "DEV_snapshot"
},
},
{
"id": "17298495-f060-3e9d-7097-1f86d5160789",
"submittedTime": 1591692844597,
"traceId": "17298496-19bd-2f89-7b5f-881921abc632",
"userName": "UYT,
"onlyChanged": true,
"startTime": 1591692845543,
"result": "NONE",
"state": "EXECUTING",
"paused": false,
"application": {
"id": "osfodsho883793hgjbv98r3098w",
"name": "QA"
},
"applicationProcess": {
"id": "owjfoew028r2uoieroiehojehfoef",
"name": "EDC",
"version": 5
},
"environment": {
"id": "16cf69c5-4194-e557-707d-0663afdbceba",
"name": "DTESTU"
},
}
]
From where I am trying to get the traceId.
you could use simple split method like the follwing:
ids = '''17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632 17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0 172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322 17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029 1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c 17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860 1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb 172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1 17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e 17298825-7449-2fcb-378e-13671cb4688a'''
l = ids.split(" ")
print(l)
This will give the following result, I assumed that the separator needed is simple space you can adjust properly:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632', '17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0', '172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322', '17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029', '1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c', '17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860', '1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb', '172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1', '17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e', '17298825-7449-2fcb-378e-13671cb4688a']
Edit
You get list of lists because each iteration you read only 1 id, so what you need to do is to initiate an empty list and append each id to it in the following way:
l = []
for i in data
if 'traceId' in i:
id = i['traceId']
l.append(id)
you can append the ids variable to the list such as,
#list declaration
l1=[]
#this must be in your loop
l1.append(ids)
I'm assuming you get the id as a str type value. Using id.split() will return a list of all ids in one single Python list, as each id is separated by space here in your example.
id = """17298626-991c-e490-bae6-47079c6e2202 17298496-19bd-2f89-7b5f-881921abc632
17298698-3e17-7a9b-b337-aacfd9483b1b 172986ac-d91d-c4ea-2e50-d53700480dd0
172986d0-18aa-6f51-9c62-6cb087ad31e5 172986f4-80f0-5c21-3aee-12f22a5f4322
17298712-a4ac-7b36-08e9-8512fa8322dd 17298747-8cc6-d9d0-8d05-50adf228c029
1729875c-050f-9a99-4850-bb0e6ad35fb0 1729875f-0d50-dc94-5515-b4891c40d81c
17298761-c26b-3ce5-e77e-db412c38a5b4 172987c8-2b5d-0d94-c365-e8407b0a8860
1729881a-e583-2b54-3a52-d092020d9c1d 1729881c-64a2-67cf-d561-6e5e38ed14cb
172987ec-7a20-7eb6-3ebe-a9fb621bb566 17298813-7ac4-258b-d6f9-aaf43f9147b1
17298813-f1ef-d28a-0817-5f3b86c3cf23 17298828-b62b-9ee6-248b-521b0663226e
17298825-7449-2fcb-378e-13671cb4688a"""
id_list = id.split()
print(id_list)
Output:
['17298626-991c-e490-bae6-47079c6e2202', '17298496-19bd-2f89-7b5f-881921abc632',
'17298698-3e17-7a9b-b337-aacfd9483b1b', '172986ac-d91d-c4ea-2e50-d53700480dd0',
'172986d0-18aa-6f51-9c62-6cb087ad31e5', '172986f4-80f0-5c21-3aee-12f22a5f4322',
'17298712-a4ac-7b36-08e9-8512fa8322dd', '17298747-8cc6-d9d0-8d05-50adf228c029',
'1729875c-050f-9a99-4850-bb0e6ad35fb0', '1729875f-0d50-dc94-5515-b4891c40d81c',
'17298761-c26b-3ce5-e77e-db412c38a5b4', '172987c8-2b5d-0d94-c365-e8407b0a8860',
'1729881a-e583-2b54-3a52-d092020d9c1d', '1729881c-64a2-67cf-d561-6e5e38ed14cb',
'172987ec-7a20-7eb6-3ebe-a9fb621bb566', '17298813-7ac4-258b-d6f9-aaf43f9147b1',
'17298813-f1ef-d28a-0817-5f3b86c3cf23', '17298828-b62b-9ee6-248b-521b0663226e',
'17298825-7449-2fcb-378e-13671cb4688a']
split() splits by default with space as a separator. You can use the sep argument to use any other separator if needed.

Grab element from json dump

I'm using the following python code to connect to a jsonrpc server and nick some song information. However, I can't work out how to get the current title in to a variable to print elsewhere. Here is the code:
TracksInfo = []
for song in playingSongs:
data = { "id":1,
"method":"slim.request",
"params":[ "",
["songinfo",0,100, "track_id:%s" % song, "tags:GPASIediqtymkovrfijnCYXRTIuwxN"]
]
}
params = json.dumps(data, sort_keys=True, indent=4)
conn.request("POST", "/jsonrpc.js", params)
httpResponse = conn.getresponse()
data = httpResponse.read()
responce = json.loads(data)
print json.dumps(responce, sort_keys=True, indent=4)
TrackInfo = responce['result']["songinfo_loop"][0]
TracksInfo.append(TrackInfo)
This brings me back the data in json format and the print json.dump brings back:
pi#raspberrypi ~/pithon $ sudo python tom3.py
{
"id": 1,
"method": "slim.request",
"params": [
"",
[
"songinfo",
"0",
100,
"track_id:-140501481178464",
"tags:GPASIediqtymkovrfijnCYXRTIuwxN"
]
],
"result": {
"songinfo_loop": [
{
"id": "-140501481178464"
},
{
"title": "Witchcraft"
},
{
"artist": "Pendulum"
},
{
"duration": "253"
},
{
"tracknum": "1"
},
{
"type": "Ogg Vorbis (Spotify)"
},
{
"bitrate": "320k VBR"
},
{
"coverart": "0"
},
{
"url": "spotify:track:2A7ZZ1tjaluKYMlT3ItSfN"
},
{
"remote": 1
}
]
}
}
What i'm trying to get is result.songinfoloop.title (but I tried that!)
The songinfo_loop structure is.. peculiar. It is a list of dictionaries each with just one key.
Loop through it until you have one with a title:
TrackInfo = next(d['title'] for d in responce['result']["songinfo_loop"] if 'title' in d)
TracksInfo.append(TrackInfo)
A better option would be to 'collapse' all those dictionaries into one:
songinfo = reduce(lambda d, p: d.update(p) or d,
responce['result']["songinfo_loop"], {})
TracksInfo.append(songinfo['title'])
songinfo_loop is a list not a dict. That means you need to call it by position, or loop through it and find the dict with a key value of "title"
positional:
responce["result"]["songinfo_loop"][1]["title"]
loop:
for info in responce["result"]["songinfo_loop"]:
if "title" in info.keys():
print info["title"]
break
else:
print "no song title found"
Really, it seems like you would want to have the songinfo_loop be a dict, not a list. But if you need to leave it as a list, this is how you would pull the title.
The result is really a standard python dict, so you can use
responce["result"]["songinfoloop"]["title"]
which should work

Categories