How can I remove duplicate elements in a JSON file?

How can I remove duplicate elements in a JSON file? - python

I trying to write a program that will output information to me in the form of a JSON file, but I ran into the fact that I duplicate information with the "exchange_id" parameter, but with other info. I want information with this value to be displayed to me once. How can I do this?
Example of JSON file:
{
"exchange_name": "Huobi Global",
"market_url": "https://www.huobi.com/en-us/exchange/eth_btc",
"price": 48637.97781883441,
"last_update": "2021-12-09T17:11:53.000Z",
"exchange_id": 102
},
{
"exchange_name": "Huobi Global",
"market_url": "https://www.huobi.com/en-us/exchange/xrp_btc",
"price": 48656.10607332287,
"last_update": "2021-12-09T17:10:53.000Z",
"exchange_id": 102
},
My code:
json_data = response['data']
pairs = []
out_object = {}
exchangeIds = {102,311,200,302,521,433,482,406,42,400}
for pair in json_data["marketPairs"]:
if pair['exchangeId'] in set(exchangeIds):
pairs.append({
"exchange_name": pair["exchangeName"],
"market_url": pair["marketUrl"],
"price": pair["price"],
"last_update" : pair["lastUpdated"],
"exchange_id": pair["exchangeId"]
})
out_object["name_of_coin"] = json_data["name"]
out_object["marketPairs"] = pairs
out_object["pairs"] = json_data["numMarketPairs"]
name = json_data["name"]
save_path = '.\swapMarketCap\coins'
file_name = f'{name}'
path_to_coin = os.path.join(save_path, file_name +".json")

In the absence of more detail, here's a pattern that you could use. This does not purport to be rewrite of your code and also makes certain assumptions (in the absence of detail in the original question). However, it should help you:
response = {'data':
{'marketPairs': [{
"exchange_name": "Huobi Global",
"market_url": "https://www.huobi.com/en-us/exchange/eth_btc",
"price": 48637.97781883441,
"last_update": "2021-12-09T17:11:53.000Z",
"exchange_id": 102
},
{
"exchange_name": "Huobi Global",
"market_url": "https://www.huobi.com/en-us/exchange/xrp_btc",
"price": 48656.10607332287,
"last_update": "2021-12-09T17:10:53.000Z",
"exchange_id": 102
}
]}}
exchangeIds = {102, 311, 200, 302, 521, 433, 482, 406, 42, 400}
output = {'data': {'marketPairs': []}}
for d in response['data']['marketPairs']:
if (id := d.get('exchange_id', None)):
if id in exchangeIds:
output['data']['marketPairs'].append(d)
exchangeIds.remove(id)
print(output)
So, the underlying idea is that you have a set of IDs that you're interested in. Once you find one that's in the set, take the reference to the current dictionary and add it to the new dictionary (output). Then, crucially, remove the previously processed ID from the set.

Related

Creating dictionary need more than 1 value to unpack

I have two lists:
managers = ['john', 'karl', 'ricky']
sites = ['site1', 'site2', 'site3']
I also have a json file for each manager:
{
"results": [
{
"id": "employee1",
"user": {
"location": "site1"
}
},
{
"id": "employee2",
"user": {
"location": "site1"
}
},
{
"id": "employee3",
"user": {
"location": "site2"
}
},
{
"id": "employee4",
"user": {
"location": "site3"
}
}
]
}
So now, I want to create a dictionary by to categorize data by site! which is making it challenging for me. The output I need is to show employees for each site.
Here is the code I have:
myDict = dict()
for j in managers:
for i in range(len(sites)):
with open(""+j+"-data.json", 'r') as jasonRead:
json_object = json.loads(jasonRead.read())
for result in json_object['results']:
site = (result['user']['location'])
for employee, manager in map(str.split, str(site), managers[i]):
myDict.setdefault(managers[i], []).append(site)
The error I get:
File "test.py", line 25, in <module>
for employee, manager in map(str.split, str(site), managers[i]):
ValueError: need more than 1 value to unpack

Instead of looping over sites, just test if the site of the employee is in the list.
The keys of the resulting dictionary should be the site names, not the manager names.
mydict = {}
for manager in managers:
with open(f'{manager}-data.json') as jasonRead:
json_object = json.load(jasonRead)
for result in json_object['results']:
site = result['user']['location']
if site in sites:
mydict.setdefault(site, []).append(result['id'])
print(mydict)

How to merge non-fixed key json multilines into one json abstractly

If I have a heavy json file that have 30m entries like that
{"id":3,"price":"231","type":"Y","location":"NY"}
{"id":4,"price":"321","type":"N","city":"BR"}
{"id":5,"price":"354","type":"Y","city":"XE","location":"CP"}
--snip--
{"id":30373779,"price":"121","type":"N","city":"SR","location":"IU"}
{"id":30373780,"price":"432","type":"Y","location":"TB"}
{"id":30373780,"price":"562","type":"N","city":"CQ"}
how I can only abstract the location and the city and parse it into one json like that in python:
{
"orders":{
3:{
"location":"NY"
},
4:{
"city":"BR"
},
5:{
"city":"XE",
"location":"CP"
},
30373779:{
"city":"SR",
"location":"IU"
},
30373780:{
"location":"TB"
},
30373780:{
"city":"CQ"
}
}
}
P.S: beatufy the syntax is not necessary.

Assuming your input file is actually in jsonlines format, then you can read each line, extract the city and location keys from the dict and then append those to a new dict:
import json
from collections import defaultdict
orders = { 'orders' : defaultdict(dict) }
with open('orders.txt', 'r') as f:
for line in f:
o = json.loads(line)
id = o['id']
if 'location' in o:
orders['orders'][id]['location'] = o['location']
if 'city' in o:
orders['orders'][id]['city'] = o['city']
print(orders)
Output for your sample data (note it has two 30373780 id values, so the values get merged into one dict):
{
"orders": {
"3": {
"location": "NY"
},
"4": {
"city": "BR"
},
"5": {
"location": "CP",
"city": "XE"
},
"30373779": {
"location": "IU",
"city": "SR"
},
"30373780": {
"location": "TB",
"city": "CQ"
}
}
}

As you've said that your file is pretty big and you probably don't want to keep all entries in memory here is the way to consume source file line by line and write output immediately:
import json
with open(r"in.jsonp") as i_f, open(r"out.json", "w") as o_f:
o_f.write('{"orders":{')
for i in i_f:
i_obj = json.loads(i)
o_f.write(f'{i_obj["id"]}:')
o_obj = {}
if location := i_obj.get("location"):
o_obj["location"] = location
if city := i_obj.get("city"):
o_obj["city"] = city
json.dump(o_obj, o_f)
o_f.write(",")
o_f.write('}}')
It will generate semi-valid JSON object in same format you've provided in your question.

Extract data from json, append using for loop and save as CSV

I have extracted id, username, and name for 100 followers for 102 politicians using Tweepy. The data is stored in a JSON file named pol_followers. Now I wish to append id and username and save it as a CSV file using the function below. However, when using the function in the last line append_followers_to_csv(pol_followers, "pol_followers.csv") I get the error seen at the bottom.
# Structure of pol_followers. The full pol_followers is much longer...
print(json.dumps(pol_followers, indent=4, sort_keys=True)) # see json data structure
[
{
"data": [
{
"id": "1464206217807601666",
"name": "terry alex",
"username": "terryal51850644"
},
{
"id": "1479032154394968064",
"name": "Charles Williams",
"username": "Charles99924770"
},
{
"id": "2526015770",
"name": "LISA P",
"username": "LISAP0910"
},
{
"id": "2957692520",
"name": "fayaz ahmad",
"username": "ahmadfayaz202"
}
],
"meta": {
"next_token": "F6HS7IU5SRGHEZZZ",
"result_count": 100
}
},
{
"data": [
{
"id": "2482703136",
"name": "HieuVu",
"username": "sachieuhaihanh"
},
{
"id": "580882148",
"name": "Maxine D. Harmon",
"username": "maxxximd"
},
{
"id": "1478867472841334787",
"name": "RBPsych1",
"username": "RBPsych1"
# Create file
csv_follower_file = open("pol_followers.csv", "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
# Create headers for the data I want to save. I only want to save these columns in my dataset
csv_follower_writer.writerow(
['id', 'username'])
csv_follower_file.close()\
def append_followers_to_csv(pol_followers, csv_follower_file):
# A counter variable
global follower_id, username
counter = 0
# Open OR create the target CSV file
csv_follower_file = open(csv_follower_file, "a", newline="", encoding='utf-8')
csv_follower_writer = csv.writer(csv_follower_file)
for ids in pol_followers['data']:
# 1. follower ID
follower_id = ids['id']
# 2. follower username
username = ids['username']
# Assemble all data in a list
ress = [follower_id, username]
# Append the result to the CSV file
csv_follower_writer.writerow(ress)
counter += 1
# When done, close the CSV file
csvFile.close()
# Print the number of tweets for this iteration
print("# of Tweets added from this response: ", counter)
append_followers_to_csv(pol_followers, "pol_followers.csv") # Save tweet data in a csv file
File "<input>", line 1, in <module>
File "<input>", line 11, in append_followers_to_csv
TypeError: list indices must be integers or slices, not str

You are just missing additional loop, like so:
for each_dict in pol_followers:
for ids in each_dict['data']:
follower_id = ids['id']
username = ids['username']

You seem to have wrapped your JSON object in a list, so instead of getting the 'data' bit of the JSON, you are getting the 'data'th element of a list when you are iterating in your append_followers_to_csv function, which you can't do in python. Try removing the square brackets around the JSON or making it for ids in pol_followers[0]['data'].

Reading and parsing JSON with ijson [duplicate]

I have the following data in my JSON file:
{
"first": {
"name": "James",
"age": 30
},
"second": {
"name": "Max",
"age": 30
},
"third": {
"name": "Norah",
"age": 30
},
"fourth": {
"name": "Sam",
"age": 30
}
}
I want to print the top-level key and object as follows:
import json
import ijson
fname = "data.json"
with open(fname) as f:
raw_data = f.read()
data = json.loads(raw_data)
for k in data.keys():
print k, data[k]
OUTPUT:
second {u'age': 30, u'name': u'Max'}
fourth {u'age': 30, u'name': u'Sam'}
third {u'age': 30, u'name': u'Norah'}
first {u'age': 30, u'name': u'James'}
So, far so good. However if I want to this same thing for a huge file, I would have to read it all in-memory. This very slow and requires lots of memory.
I want use an incremental JSON parser ( ijson in this case ) to achieve what I described earlier:
The above code was taken from: No access to top level elements with ijson?
with open(fname) as f:
json_obj = ijson.items(f,'').next() # '' loads everything as only one object.
for (key, value) in json_obj.items():
print key + " -> " + str(value)
This is not suitable either, because it also reads the whole file in memory. This not truly incremental.
How can I do incremental parsing of top-level keys and corresponding objects, of a JSON file in Python?

Since essentially json files are text files, consider stripping the top level as string. Basically, use a read file iterable approach where you concatenate a string with each line and then break out of the loop once the string contains the double braces }} signaling the end of the top level. Of course the double brace condition must strip out spaces and line breaks.
toplevelstring = ''
with open('data.json') as f:
for line in f:
if not '}}' in toplevelstring.replace('\n', '').replace('\s+',''):
toplevelstring = toplevelstring + line
else:
break
data = json.loads(toplevelstring)
Now if your larger json is wrapped in square brackets or other braces, still run above routine but add the below line to slice out first character, [, and last two characters for comma and line break after top level's final brace:
[{
"first": {
"name": "James",
"age": 30
},
"second": {
"name": "Max",
"age": 30
},
"third": {
"name": "Norah",
"age": 30
},
"fourth": {
"name": "Sam",
"age": 30
}
},
{
"data1": {
"id": "AAA",
"type": 55
},
"data2": {
"id": "BBB",
"type": 1601
},
"data3": {
"id": "CCC",
"type": 817
}
}]
...
toplevelstring = toplevelstring[1:-2]
data = json.loads(toplevelstring)

Since version 2.6 ijson comes with a kvitems function that achieves exactly this.

Answer from github issue [file name changed]
import ijson
from ijson.common import ObjectBuilder
def objects(file):
key = '-'
for prefix, event, value in ijson.parse(file):
if prefix == '' and event == 'map_key': # found new object at the root
key = value # mark the key value
builder = ObjectBuilder()
elif prefix.startswith(key): # while at this key, build the object
builder.event(event, value)
if event == 'end_map': # found the end of an object at the current key, yield
yield key, builder.value
for key, value in objects(open('data.json', 'rb')):
print(key, value)

TypeError: string indices must be integers // working with JSON as dict in python

Okay, so I've been banging my head on this for the last 2 days, with no real progress. I am a beginner with python and coding in general, but this is the first issue I haven't been able to solve myself.
So I have this long file with JSON formatting with about 7000 entries from the youtubeapi.
right now I want to have a short script to print certain info ('videoId') for a certain dictionary key (refered to as 'key'):
My script:
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key']['Items']['id']['videoId'])
# print(trailers['key']['videoId'] gives same response
Error:
print(trailers['key']['Items']['id']['videoId'])
TypeError: string indices must be integers
It does work when I want to print all the information for the dictionary key:
This script works
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key'])
Also print(type(trailers)) results in class 'dict', as it's supposed to.
My JSON File is formatted like this and is from the youtube API, youtube#searchListResponse.
{
"kind": "youtube#searchListResponse",
"etag": "",
"nextPageToken": "",
"regionCode": "",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 1
},
"items": [
{
"kind": "youtube#searchResult",
"etag": "",
"id": {
"kind": "youtube#video",
"videoId": ""
},
"snippet": {
"publishedAt": "",
"channelId": "",
"title": "",
"description": "",
"thumbnails": {
"default": {
"url": "",
"width": 120,
"height": 90
},
"medium": {
"url": "",
"width": 320,
"height": 180
},
"high": {
"url": "",
"width": 480,
"height": 360
}
},
"channelTitle": "",
"liveBroadcastContent": "none"
}
}
]
}
What other information is needed to be given for you to understand the problem?

The following code gives me all the videoId's from the provided sample data (which is no id's at all in fact):
import json
with open('sampledata', 'r') as datafile:
data = json.loads(datafile.read())
print([item['id']['videoId'] for item in data['items']])
Perhaps you can try this with more data.
Hope this helps.

I didn't really look into the youtube api but looking at the code and the sample you gave it seems you missed out a [0]. Looking at the structure of json there's a list in key items.
import json
f = open ('json1.json', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['items'][0]['id']['videoId'])

I've not used json before at all. But it's basically imported in the form of dicts with more dicts, lists etc. Where applicable. At least from my understanding.
So when you do type(trailers) you get type dict. Then you do dict with trailers['key']. If you do type of that, it should also be a dict, if things work correctly. Working through the items in each dict should in the end find your error.
Pythons error says you are trying find the index/indices of a string, which only accepts integers, while you are trying to use a dict. So you need to find out why you are getting a string and not dict when using each argument.
Edit to add an example. If your dict contains a string on key 'item', then you get a string in return, not a new dict which you further can get a dict from. item in the json for example, seem to be a list, with dicts in it. Not a dict itself.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How can I remove duplicate elements in a JSON file? - python

Related

Creating dictionary need more than 1 value to unpack

How to merge non-fixed key json multilines into one json abstractly

Extract data from json, append using for loop and save as CSV

Reading and parsing JSON with ijson [duplicate]

TypeError: string indices must be integers // working with JSON as dict in python

Categories

Resources