I am pretty new to Python and started using requests to download data.
So far I am doing fine, but I keep running into issues converting JSON data to CSV.
Applying the code below ...
import requests
import pandas as pd

url = "https://api.coinstats.app/public/v1/coins?skip=0&limit=100"

response = requests.get(url)
coin_data = response.json()

getpost = pd.DataFrame(coin_data)
getpost.to_csv('getpost_1.csv')
... I get data in the plain vanilla JSON format.
My question, and I am starting from scratch: what else has to be done to fix the code so it produces a readable CSV file (without the ids and other clutter)?
0 {'id': 'bitcoin', 'icon': 'https://static.coin...
1 {'id': 'ethereum', 'icon': 'https://static.coi...
2 {'id': 'binance-coin', 'icon': 'https://static...
3 {'id': 'tether', 'icon': 'https://static.coins...
4 {'id': 'solana', 'icon': 'https://static.coins...
.. ...
95 {'id': 'eth_frax3crv-f_0xd632f22692fac7611d2aa...
96 {'id': 'sushi', 'icon': 'https://static.coinst...
97 {'id': 'paxos-standard-token', 'icon': 'https:...
98 {'id': 'compound-governance-token', 'icon': 'h...
99 {'id': 'nem', 'icon': 'https://static.coinstat...
[100 rows x 1 columns]
Bernd
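The single column of dicts appears because the CoinStats response wraps the list of coins in a top-level "coins" key, so the DataFrame is built from the outer dict instead of the list of records. A minimal sketch of the fix, using a small hand-made sample in place of the live response (the exact field names may differ from the real API):

```python
import pandas as pd

# Stand-in mimicking the shape of the CoinStats response:
# the coin records sit in a list under the top-level "coins" key.
coin_data = {
    "coins": [
        {"id": "bitcoin", "name": "Bitcoin", "symbol": "BTC", "price": 42000.0},
        {"id": "ethereum", "name": "Ethereum", "symbol": "ETH", "price": 3200.0},
    ]
}

# Build the frame from the list of records, not the outer dict,
# so each key becomes its own column.
df = pd.json_normalize(coin_data["coins"])

# Keep only the readable columns and skip the index when writing.
df[["name", "symbol", "price"]].to_csv("getpost_1.csv", index=False)
```

With the live response the same access should apply via `coin_data = response.json()` followed by `coin_data["coins"]`, assuming the v1 endpoint keeps that wrapping.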
I have a data frame with 150 rows; two sample rows are shown below. I need to convert the data to JSON data like below.
Input:
artwork_id creator_id department_id art_work creator department
0 86508 29993 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '29993', 'role': 'artist', 'description... {'id': '21', 'name': 'Prints'}
1 86508 68000 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '68000', 'role': 'printer', 'descriptio... {'id': '21', 'name': 'Prints'}
desired output:
Attached as image
I have tried using the code below:
df.groupby(['artwork_id']).agg(lambda x: list(x))
df.to_json(orient = 'records')
Do you get the right format if you do the following:
result = df.to_json(orient="records")
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))
or
grouped_art=df.groupby(['artwork_id']).agg(lambda x: list(x))
result = grouped_art.to_json(orient="records")
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))
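Since the desired output is only attached as an image, the exact target shape is an assumption here, but the groupby-then-list approach can be sketched end to end on a toy frame (the dict-valued columns from the question are left out for brevity):

```python
import json
import pandas as pd

# Toy frame with the same repeated-artwork_id structure as the question
# (column names from the question; values made up for illustration).
df = pd.DataFrame({
    "artwork_id": [86508, 86508],
    "creator_id": [29993, 68000],
    "department_id": [21, 21],
})

# Collapse the rows sharing an artwork_id: each non-grouped column
# becomes a list of that artwork's values.
grouped = df.groupby("artwork_id", as_index=False).agg(list)

records = json.loads(grouped.to_json(orient="records"))
print(json.dumps(records, indent=4))
```

This yields one JSON record per artwork, with the per-row values gathered into lists.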
I have JSON data coming through an API and I need to print some specific values from it. I am using the code below, but it gives me KeyError: 'results'.
import json
import requests

headers = {"Content-Type": "application/json", "Accept": "application/json"}
r = requests.get('API', headers=headers)
data = r.text
parse_json = json.loads(data)

for result in parse_json['results']:
    for cpu in result['cpumfs']:
        print(cpu.get('pct_im_utilization'))
Below is the json data :
{'d': {'__count': '0', 'results': [{'ID': '6085', 'Name': 'device1', 'DisplayName': None, 'DisplayDescription': None, 'cpumfs': {'results': [{'ID': '6117', 'Timestamp': '1649157300', 'DeviceItemID': '6085', 'pct_im_Utilization': '4.0'}, {'ID': '6117', 'Timestamp': '1649157600', 'DeviceItemID': '6085', 'pct_im_Utilization': '1.0'}, {'ID': '6117', 'Timestamp': '1649157900', 'DeviceItemID': '6085', 'pct_im_Utilization': '4.0'}, {'ID': '6117', 'Timestamp': '1649158200', 'DeviceItemID': '6085', 'pct_im_Utilization': '1.0'},
I need to print the Name of the device, the Timestamp, and pct_im_Utilization.
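Two things stand out in the JSON above: the results list sits under the top-level 'd' key (hence the KeyError), and each 'cpumfs' entry nests its own 'results' list. The keys are also case-sensitive: 'pct_im_Utilization' with a capital U. A sketch using a trimmed copy of the sample data:

```python
# Trimmed sample from the question: the device list sits under the
# top-level 'd' key, and 'cpumfs' holds its own 'results' list.
parse_json = {
    'd': {
        '__count': '0',
        'results': [{
            'ID': '6085',
            'Name': 'device1',
            'cpumfs': {'results': [
                {'Timestamp': '1649157300', 'pct_im_Utilization': '4.0'},
                {'Timestamp': '1649157600', 'pct_im_Utilization': '1.0'},
            ]},
        }],
    }
}

rows = []
for device in parse_json['d']['results']:    # note the extra 'd' level
    for cpu in device['cpumfs']['results']:  # 'cpumfs' wraps its own 'results'
        # keys are case-sensitive: 'pct_im_Utilization', not 'pct_im_utilization'
        rows.append((device['Name'], cpu['Timestamp'], cpu['pct_im_Utilization']))
        print(*rows[-1])
```

With the live API, `parse_json = r.json()` and the same two nested loops should apply.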
I am trying to use the PokeAPI to extract all Pokémon names for a personal project to build API comfort. I have been having issues with the params specifically. Can someone please provide support or resources to simplify data grabbing with JSON? Here is the code I have written so far, which returns the entire data set.
import json
from unicodedata import name
import requests
from pprint import PrettyPrinter

pp = PrettyPrinter()

url = "https://pokeapi.co/api/v2/ability/1/"
params = {
    name: "garbodor"
}

def main():
    r = requests.get(url)
    status = r.status_code
    if status != 200:
        quit()
    else:
        get_pokedex(status)

def get_pokedex(x):
    print("status code: ", + x)  # redundant check for status code before the program begins.
    response = requests.get(url, params=params).json()
    pp.pprint(response)

main()
Website link: https://pokeapi.co/docs/v2#pokemon-section working specifically with the pokemon group.
I have no idea what values you want, but response is a dictionary with lists, and you can use keys and indexes (with for-loops) to select elements from response, e.g. response["names"][0]["name"].
Minimal working example
Name or ID has to be added at the end of URL.
import requests
import pprint as pp

name_or_id = "stench"  # name
#name_or_id = 1        # id

url = "https://pokeapi.co/api/v2/ability/{}/".format(name_or_id)

response = requests.get(url)
if response.status_code != 200:
    print(response.text)
else:
    data = response.json()
    #pp.pprint(data)

    print('\n--- data.keys() ---\n')
    print(data.keys())

    print('\n--- data["name"] ---\n')
    print(data['name'])

    print('\n--- data["names"] ---\n')
    pp.pprint(data["names"])

    print('\n--- data["names"][0]["name"] ---\n')
    print(data['names'][0]['name'])

    print('\n--- language : name ---\n')
    names = []
    for item in data["names"]:
        print(item['language']['name'], ":", item["name"])
        names.append(item["name"])

    print('\n--- after for-loop ---\n')
    print(names)
Result:
--- data.keys() ---
dict_keys(['effect_changes', 'effect_entries', 'flavor_text_entries', 'generation', 'id', 'is_main_series', 'name', 'names', 'pokemon'])
--- data["name"] ---
stench
--- data["names"] ---
[{'language': {'name': 'ja-Hrkt',
'url': 'https://pokeapi.co/api/v2/language/1/'},
'name': 'あくしゅう'},
{'language': {'name': 'ko', 'url': 'https://pokeapi.co/api/v2/language/3/'},
'name': '악취'},
{'language': {'name': 'zh-Hant',
'url': 'https://pokeapi.co/api/v2/language/4/'},
'name': '惡臭'},
{'language': {'name': 'fr', 'url': 'https://pokeapi.co/api/v2/language/5/'},
'name': 'Puanteur'},
{'language': {'name': 'de', 'url': 'https://pokeapi.co/api/v2/language/6/'},
'name': 'Duftnote'},
{'language': {'name': 'es', 'url': 'https://pokeapi.co/api/v2/language/7/'},
'name': 'Hedor'},
{'language': {'name': 'it', 'url': 'https://pokeapi.co/api/v2/language/8/'},
'name': 'Tanfo'},
{'language': {'name': 'en', 'url': 'https://pokeapi.co/api/v2/language/9/'},
'name': 'Stench'},
{'language': {'name': 'ja', 'url': 'https://pokeapi.co/api/v2/language/11/'},
'name': 'あくしゅう'},
{'language': {'name': 'zh-Hans',
'url': 'https://pokeapi.co/api/v2/language/12/'},
'name': '恶臭'}]
--- data["names"][0]["name"] ---
あくしゅう
--- language : name ---
ja-Hrkt : あくしゅう
ko : 악취
zh-Hant : 惡臭
fr : Puanteur
de : Duftnote
es : Hedor
it : Tanfo
en : Stench
ja : あくしゅう
zh-Hans : 恶臭
--- after for-loop ---
['あくしゅう', '악취', '惡臭', 'Puanteur', 'Duftnote', 'Hedor', 'Tanfo', 'Stench', 'あくしゅう', '恶臭']
EDIT:
Another example, with a different URL and the parameters limit and offset.
I use a for-loop to run with different offsets (0, 100, 200, etc.).
import requests
import pprint as pp

url = "https://pokeapi.co/api/v2/pokemon/"

params = {'limit': 100}

for offset in range(0, 1000, 100):
    params['offset'] = offset  # add new value to dict alongside `limit`

    response = requests.get(url, params=params)
    if response.status_code != 200:
        print(response.text)
    else:
        data = response.json()
        #pp.pprint(data)
        for item in data['results']:
            print(item['name'])
Result (first 100 items):
bulbasaur
ivysaur
venusaur
charmander
charmeleon
charizard
squirtle
wartortle
blastoise
caterpie
metapod
butterfree
weedle
kakuna
beedrill
pidgey
pidgeotto
pidgeot
rattata
raticate
spearow
fearow
ekans
arbok
pikachu
raichu
sandshrew
sandslash
nidoran-f
nidorina
nidoqueen
nidoran-m
nidorino
nidoking
clefairy
clefable
vulpix
ninetales
jigglypuff
wigglytuff
zubat
golbat
oddish
gloom
vileplume
paras
parasect
venonat
venomoth
diglett
dugtrio
meowth
persian
psyduck
golduck
mankey
primeape
growlithe
arcanine
poliwag
poliwhirl
poliwrath
abra
kadabra
alakazam
machop
machoke
machamp
bellsprout
weepinbell
victreebel
tentacool
tentacruel
geodude
graveler
golem
ponyta
rapidash
slowpoke
slowbro
magnemite
magneton
farfetchd
doduo
dodrio
seel
dewgong
grimer
muk
shellder
cloyster
gastly
haunter
gengar
onix
drowzee
hypno
krabby
kingler
voltorb
I have a CSV with 500+ rows where one column, "_source", is stored as JSON, and I want to extract it into a pandas dataframe with each key as its own column. The roughly 1 MB JSON file holds online social media data (Facebook, Twitter, web-crawled, etc.): approximately 528 rows of posts/tweets/text, each with many dictionaries nested inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete picture. I need to turn all key:value pairs of the nested dictionaries into columns of a dataframe.
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary, but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
Below is what an actual "_source" row looks like stretched out. There are many dictionaries and key:value pairs where each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Before you post, please make sure the actual code works for the data attached. Thanks!
I tried the code below, but it raised a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
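The SyntaxError comes from `source_data.[_source]`, which mixes attribute access with bracket indexing; bracket indexing takes a quoted column name, `source_data['_source']`. A minimal sketch on a hand-made two-row frame (column contents assumed from the question; `pd.json_normalize` is the current spelling of the older `pd.io.json.json_normalize`):

```python
import json
import pandas as pd

# Two-row stand-in for the CSV: the "_source" column holds JSON strings.
source_data = pd.DataFrame({
    "_source": [
        '{"sub_organization_id": "default", "uid": "aba1", "doc": {"status_code": 200}}',
        '{"sub_organization_id": "default", "uid": "ab02", "doc": {"status_code": 404}}',
    ]
})

# Parse each JSON string, then flatten nested dicts into dotted columns.
flat = pd.json_normalize(source_data["_source"].apply(json.loads))
print(flat.columns.tolist())
```

Nested dicts become dotted column names such as `doc.status_code`; deeply nested lists (like `meta.rule_matcher` in the real data) would still need `record_path` handling or a custom flattener like the one in the answer below is not assumed here.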
I had to do something like that a while back. Basically, I used a function that completely flattens the JSON to identify the keys that will become the columns, then iterated through the flattened JSON to reconstruct each row and append it to a "results" dataframe. With the data you provided, it created a 52-column row, and looking through it, it appears to include every key as its own column. Anything nested, for example 'meta': {'rule_matcher': [{'atribs': {'website': ...}}]}, then gets a column name like meta.rule_matcher.atribs.website, where the '.' denotes the nested keys.
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data_source)

import pandas as pd
import re

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())

for item in columns_list:
    try:
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except:
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
I need to get data from a json page and convert it to a list.
import json
import requests
j = requests.get('http://www.example.com/Portals/0/StaticData/data.js')
abc = json.loads(j.content)
However, error occurred
ValueError: No JSON object could be decoded
what I need is
mylist = ['AALI','ABBA','ABDA'.......]
mylist1 = ['Astra Agro Lestari Tbk','Mahaka Media Tbk','Asuransi Bina Dana Arta Tbk',.......]
The page doesn't return valid JSON (it is a JavaScript assignment), so you can clean it up before reading it with the json module:
js = j.content.decode("utf-8").split("=")[-1].strip().strip(';')
json.loads(js)
#[{'code': 'AALI', 'name': 'Astra Agro Lestari Tbk'},
# {'code': 'ABBA', 'name': 'Mahaka Media Tbk'},
# {'code': 'ABDA', 'name': 'Asuransi Bina Dana Arta Tbk'},
# {'code': 'ABMM', 'name': 'ABM Investama Tbk'},
# {'code': 'ACES', 'name': 'Ace Hardware Indonesia Tbk'},
# ...
To unpack the result into two lists (zip yields tuples, so list() converts them):
mylist, mylist1 = [list(t) for t in zip(*((d['code'], d['name']) for d in json.loads(js)))]