I have a data frame with 150 rows; two sample rows are shown below. I need to convert the data to JSON like below.
Input:
artwork_id creator_id department_id art_work creator department
0 86508 29993 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '29993', 'role': 'artist', 'description... {'id': '21', 'name': 'Prints'}
1 86508 68000 21 {'id': '86508', 'accession_number': '2015.584'... {'id': '68000', 'role': 'printer', 'descriptio... {'id': '21', 'name': 'Prints'}
desired output:
Attached as image
I have tried the code below:
df.groupby(['artwork_id']).agg(lambda x: list(x))
df.to_json(orient = 'records')
Do you get the right format if you do the following:
result = df.to_json(orient="records")
parsed = json.loads(result)
json.dumps(parsed, indent=4)
or
grouped_art=df.groupby(['artwork_id']).agg(lambda x: list(x))
result = grouped_art.to_json(orient="records")
parsed = json.loads(result)
json.dumps(parsed, indent=4)
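For reference, here is a minimal, self-contained sketch of the second approach, using a hypothetical two-row frame in place of the real one (the dict-valued columns collapse into per-artwork lists the same way):

```python
import json
import pandas as pd

# Toy frame standing in for the real one: two rows share artwork_id 86508.
df = pd.DataFrame({
    "artwork_id": [86508, 86508],
    "creator": [{"id": "29993", "role": "artist"},
                {"id": "68000", "role": "printer"}],
    "department": [{"id": "21", "name": "Prints"},
                   {"id": "21", "name": "Prints"}],
})

# Collapse duplicate artwork_ids, collecting the other columns into lists.
grouped = df.groupby("artwork_id").agg(list).reset_index()

# orient="records" gives one JSON object per artwork.
parsed = json.loads(grouped.to_json(orient="records"))
print(json.dumps(parsed, indent=4))
```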
There are these lists:
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']
The expected output is a nested dict:
Expected o/p: {'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
An easy way would be using a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([man_res, man1_res, man2_res], index=data, columns=key)
print(df)
df.to_dict(orient='index')
name id sal
man Alexandra RST01 $34,000
man1 Santio RST009 $45,000
man2 Rumbalski RST50 $78,000
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
Or you could manually merge them using dict + zip
d = dict(zip(
data,
(dict(zip(key, res)) for res in (man_res, man1_res, man2_res))
))
d
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
# Save the results in a 2D list
all_man_res = []
all_man_res.append(man_res)
all_man_res.append(man1_res)
all_man_res.append(man2_res)
print(all_man_res)
# Build the nested dict
output = {}
for i in range(len(data)):
    person = data[i]
    details = {}
    for j in range(len(key)):
        value = key[j]
        details[value] = all_man_res[i][j]
    output[person] = details
output
The pandas dataframe answer provided by NoThInG makes the most intuitive sense. If you are looking to use only the built in python tools, you can do
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
output = dict(zip(data,info_list))
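Spelled out as a runnable snippet (note the balanced brackets around `dict(zip(key, man))`):

```python
# Same lists as above, merged with plain dict + zip (no third-party libs).
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']

# One dict per person, keyed by the shared field names...
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
# ...then keyed by the person labels.
output = dict(zip(data, info_list))
print(output)
```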
I am pretty new to Python and started using requests for downloading data.
Up to now I am doing fine, but I always have issues converting JSON data to CSV.
Applying the code below ...
import requests
import pandas as pd
import json
url = "https://api.coinstats.app/public/v1/coins?skip=0&limit=100"
payload = {}
headers = {}
response = requests.request("GET", url, headers=headers, data=payload)
coin_data = json.loads(response.text)
getpost = pd.DataFrame(coin_data)
getpost.to_csv('getpost_1.csv')
... I do get data in the plain-vanilla JSON format.
My question is, and I am starting from scratch: what else has to be done to fix the code so that it produces readable CSV files (without the ids and everything else crammed into one column)?
0 {'id': 'bitcoin', 'icon': 'https://static.coin...
1 {'id': 'ethereum', 'icon': 'https://static.coi...
2 {'id': 'binance-coin', 'icon': 'https://static...
3 {'id': 'tether', 'icon': 'https://static.coins...
4 {'id': 'solana', 'icon': 'https://static.coins...
.. ...
95 {'id': 'eth_frax3crv-f_0xd632f22692fac7611d2aa...
96 {'id': 'sushi', 'icon': 'https://static.coinst...
97 {'id': 'paxos-standard-token', 'icon': 'https:...
98 {'id': 'compound-governance-token', 'icon': 'h...
99 {'id': 'nem', 'icon': 'https://static.coinstat...
[100 rows x 1 columns]
Bernd
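For what it's worth, one likely fix, sketched with a made-up payload: the single-column output above suggests the records sit under a `coins` key in the response, so normalizing that inner list (an assumption about the payload shape) should yield one column per field:

```python
import pandas as pd

# A tiny payload shaped the way the CoinStats response appears to be
# (assumption: records nested under a "coins" key, as the one-column
# DataFrame output above suggests). Field names here are made up.
coin_data = {
    "coins": [
        {"id": "bitcoin", "symbol": "BTC", "price": 43000.5},
        {"id": "ethereum", "symbol": "ETH", "price": 3200.1},
    ]
}

# json_normalize flattens each coin dict into its own set of columns.
df = pd.json_normalize(coin_data["coins"])
df.to_csv("getpost_1.csv", index=False)  # index=False drops the 0..n row ids
print(df)
```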
I have a list as shown below:
[{'id': 'id_123',
'type': 'type_1',
'created_at': '2020-02-12T17:45:00Z'},
{'id': 'id_124',
'type': 'type_2',
'created_at': '2020-02-12T18:15:00Z'},
{'id': 'id_125',
'type': 'type_1',
'created_at': '2020-02-13T19:43:00Z'},
{'id': 'id_126',
'type': 'type_3',
'created_at': '2020-02-13T07:00:00Z'}]
I am trying to find how many times type: 'type_1' occurs, and what the earliest created_at timestamp for type_1 in that list is.
We can achieve this in several steps.
To find the number of times type_1 occurs, we can use the built-in filter together with operator.itemgetter.
from operator import itemgetter
def my_filter(item):
    return item['type'] == 'type_1'
key = itemgetter('created_at')
items = sorted(filter(my_filter, data), key=key)
print(f"Num records is {len(items)}")
print(f"Earliest record is {key(items[0])}")
Num records is 2
Earliest record is 2020-02-12T17:45:00Z
Alternatively, you can use a generator comprehension and then sort the generator.
gen = (item for item in data if item['type'] == 'type_1')
items = sorted(gen, key=key)
# rest of the steps are the same...
You could use list comprehension to get all the sublists you're interested in, then sort by 'created_at'.
l = [{'id': 'id_123',
'type': 'type_1',
'created_at': '2020-02-12T17:45:00Z'},
{'id': 'id_124',
'type': 'type_2',
'created_at': '2020-02-12T18:15:00Z'},
{'id': 'id_125',
'type': 'type_1',
'created_at': '2020-02-13T19:43:00Z'},
{'id': 'id_126',
'type': 'type_3',
'created_at': '2020-02-13T07:00:00Z'}]
ll = [x for x in l if x['type'] == 'type_1']
ll.sort(key=lambda k: k['created_at'])
print(len(ll))
print(ll[0]['created_at'])
Output:
2
2020-02-12T17:45:00Z
This is one approach using filter and min.
Ex:
data = [{'id': 'id_123',
'type': 'type_1',
'created_at': '2020-02-12T17:45:00Z'},
{'id': 'id_124',
'type': 'type_2',
'created_at': '2020-02-12T18:15:00Z'},
{'id': 'id_125',
'type': 'type_1',
'created_at': '2020-02-13T19:43:00Z'},
{'id': 'id_126',
'type': 'type_3',
'created_at': '2020-02-13T07:00:00Z'}]
onlytype_1 = list(filter(lambda x: x['type'] == 'type_1', data))
print(len(onlytype_1))
print(min(onlytype_1, key=lambda x: x['created_at']))
Or:
temp = {}
for i in data:
    temp.setdefault(i['type'], []).append(i)
print(len(temp['type_1']))
print(min(temp['type_1'], key=lambda x: x['created_at']))
Output:
2
{'id': 'id_123', 'type': 'type_1', 'created_at': '2020-02-12T17:45:00Z'}
You can just generate a list of all the type_1 entries using a list comprehension, and then use sort with datetime.strptime to order the values accordingly.
from datetime import datetime
# Generate a list with only the type_1s' created_at values
type1s = [val['created_at'] for val in vals if val['type']=="type_1"]
# Sort them based on the timestamps
type1s.sort(key=lambda date: datetime.strptime(date, "%Y-%m-%dT%H:%M:%SZ"))
# Print the lowest value
print(type1s[0])
#'2020-02-12T17:45:00Z'
You can use the following function to get the desired output:
from datetime import datetime
def sol(l):
    sum_ = 0
    dict_ = {}
    for x in l:
        if x['type'] == 'type_1':
            sum_ += 1
            dict_[x['id']] = datetime.strptime(x['created_at'], "%Y-%m-%dT%H:%M:%SZ")
    date = sorted(dict_.values())[0]
    for key, value in dict_.items():
        if value == date:
            id_ = key
    return sum_, date, id_

sol(l)
This function returns the number of times type == 'type_1', the corresponding minimum date, and its id, respectively.
Hope this helps!
I have an issue with parsing a JSON file. Here is the format I have:
{'metadata': {'timezone': {'location': 'Etc/UTC'},
'serial_number': '123456',
'device_type': 'sensor'},
'timestamp': '2019-08-21T13:57:12.500Z',
'framenumber': '4866274',
'tracked_objects': [{'id': 2491,
'type': 'PERSON',
'position': {'x': -361,
'y': -2933,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1295}},
{'id': 2492,
'type': 'PERSON',
'position': {'x': -733,
'y': -2860,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1928}},
{'id': 2495,
'type': 'PERSON',
'position': {'x': -922,
'y': -3119,
'type': 'FOOT',
'coordinate_system': 'REAL_WORLD_IN_MILLIMETER'},
'person_data': {'height': 1716}}]}
And I am trying to get the following columns into a dataframe:
timezone, serial_number, id, x and y (which are part of position), and height.
This is the code I used so far:
# Import Dependencies
import pandas as pd
import json
from pandas.io.json import json_normalize
# loading json file. In your case you will point the data stream into json variable
infile = open("C:/Users/slavi/Documents/GIT/test2.json")
json_raw = json.load(infile)
# Function to flatten a multidimensional json structure
def flatten_json(nested_json):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
# Use the function to flatten the json
json_flat = flatten_json(json_raw)
# Create a pandas dataframe from the dictionary, since the json itself is a list of dictionaries or a dictionary of dictionaries
df = pd.DataFrame.from_dict(json_flat, orient='index')
# Resetting the index
df.reset_index(level=0, inplace=True)
df.set_index('index', inplace=True)
df
I used the function to flatten the json, however when I run the code I get results like this:
So there should be 3 lines of data, one for each tracked object, and I should retrieve those columns with the 3 rows of data underneath.
Any suggestion on how to adjust my code?
For any kind of JSON parsing into a DataFrame, get acquainted with json_normalize:
import json
from pandas.io.json import json_normalize  # in pandas >= 1.0, use pd.json_normalize instead
with open('...', 'r') as f:
    json_raw = json.load(f)
df = json_normalize(json_raw, record_path='tracked_objects', meta=[
    ['metadata', 'serial_number'],
    'timestamp'
])
Result:
id type position.x position.y position.type position.coordinate_system person_data.height metadata.serial_number timestamp
0 2491 PERSON -361 -2933 FOOT REAL_WORLD_IN_MILLIMETER 1295 123456 2019-08-21T13:57:12.500Z
1 2492 PERSON -733 -2860 FOOT REAL_WORLD_IN_MILLIMETER 1928 123456 2019-08-21T13:57:12.500Z
2 2495 PERSON -922 -3119 FOOT REAL_WORLD_IN_MILLIMETER 1716 123456 2019-08-21T13:57:12.500Z
Rename the columns as you wish.
I am looping through an API to retrieve data for multiple ICO tokens. Now I would like to save the data to a CSV with variables in columns and one row per ICO token. The basic code works, but I have 2 problems:
- entries are written only on every second line, which is quite impractical. How can I specify not to leave rows blank?
- the variable price is itself a nested structure and is thus saved as a single item (with > 1 variables inside). How can I decompose it to write one variable per column?
See my code here:
ICO_Wallet = ('0xe8ff5c9c75deb346acac493c463c8950be03dfba',
              '0x7654915a1b82d6d2d0afc37c52af556ea8983c7e',
              '0x4DF812F6064def1e5e029f1ca858777CC98D2D81')
for index, Wallet in enumerate(ICO_Wallet):
    Name = ICO_name[index]
    Number = ICO_No[index]
    try:
        URL = 'http://api.ethplorer.io/getTokenInfo/' + Wallet + '?apiKey=freekey'
    except:
        print(Wallet)
    json_obj = urlopen(URL)
    data = json.load(json_obj)
    with open('token_data_test.csv', 'a') as f:
        w = csv.writer(f, delimiter=";")
        w.writerow(data.values())
    time.sleep(1)
Sample output:
data Out[59]:
{'address': '0x8a854288a5976036a725879164ca3e91d30c6a1b',
'countOps': 24207,
'decimals': '18',
'ethTransfersCount': 0,
'holdersCount': 10005,
'issuancesCount': 0,
'lastUpdated': 1542599890,
'name': 'GET',
'owner': '0x9a417e4db28778b6d9a4f42a5d7d01252a3af849',
'price': {'availableSupply': '11388258.0',
'currency': 'USD',
'diff': -20.71,
'diff30d': -14.155971452386,
'diff7d': -22.52,
'marketCapUsd': '2814942.0',
'rate': '0.2471792958',
'ts': '1542641433',
'volume24h': '2371.62380719'},
'symbol': 'GET',
'totalSupply': '33368773400000170376363910',
'transfersCount': 24207}
As mentioned, it's an easy fix for the first problem, just modify the csv.writer line like this:
w = csv.writer(f, delimiter=";", lineterminator='\n')
For your second problem, you can flatten your json before passing into csv:
for k, v in data.pop('price').items():
data['price_{}'.format(k)] = v
This changes all items under price into price_itemname as a flattened key. The .pop() method also helps remove the 'price' key at the same time.
Result:
{'address': '0x8a854288a5976036a725879164ca3e91d30c6a1b',
'countOps': 24207,
'decimals': '18',
'ethTransfersCount': 0,
'holdersCount': 10005,
'issuancesCount': 0,
'lastUpdated': 1542599890,
'name': 'GET',
'owner': '0x9a417e4db28778b6d9a4f42a5d7d01252a3af849',
'price_availableSupply': '11388258.0',
'price_currency': 'USD',
'price_diff': -20.71,
'price_diff30d': -14.155971452386,
'price_diff7d': -22.52,
'price_marketCapUsd': '2814942.0',
'price_rate': '0.2471792958',
'price_ts': '1542641433',
'price_volume24h': '2371.62380719',
'symbol': 'GET',
'totalSupply': '33368773400000170376363910',
'transfersCount': 24207}
Now you can just pass that into your csv.writer().
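If you also want a header row (plain writerow(data.values()) writes none), csv.DictWriter is a natural fit; here is a sketch with a trimmed-down, hypothetical record:

```python
import csv

# A trimmed version of the flattened record shown above (keys made up
# of a few of the real field names, for illustration only).
data = {
    "address": "0x8a854288a5976036a725879164ca3e91d30c6a1b",
    "name": "GET",
    "price_rate": "0.2471792958",
    "price_currency": "USD",
}

with open("token_data_test.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=list(data), delimiter=";")
    w.writeheader()   # column names once at the top
    w.writerow(data)  # one row per token; call this inside the loop
```

Note that `newline=""` on open() also cures the blank-line-between-rows problem, the same way `lineterminator='\n'` does.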