Convert Messy String to CSV Columns - Python

I have a large list of ids from the Facebook API that I would like to put into a CSV file. Is there a way to parse each ID into an individual column in a CSV file? I am using Python
Current Format:
{'media': {'data': [{'id': '17898498243076831'}, {'id': '17907011917029111'}, {'id': '17906766215033990'}, {'id': '17894813609104671'}, {'id': '17890124843094721'}]}}
But I would like the format to be this:
id
17898498243076831
17907011917029111
17906766215033990
17894813609104671
17890124843094721

EDITED
The Facebook API seems to be returning a dictionary nested inside a dictionary inside a dictionary, which actually makes it easy to manipulate in any language. So here is what I would suggest.
You can do something like this:
import pandas as pd

# Create a dictionary
dictionary_fb = {'media': {'data': [{'id': '17898498243076831'},
                                    {'id': '17907011917029111'},
                                    {'id': '17906766215033990'},
                                    {'id': '17894813609104671'},
                                    {'id': '17890124843094721'}]}}

# Get the list of id dicts out of dictionary_fb
dict_id = dictionary_fb['media']['data']

# Build a DataFrame and write it out; index=False keeps just the 'id' column
df = pd.DataFrame(dict_id)
df.to_csv("filename", index=False)
If you need to do this more than once, you can wrap the same steps in a small helper and reuse it for each response.
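A minimal sketch of such a helper (the function name fb_ids_to_csv and its parameters are my own illustration, not part of the answer above):

import pandas as pd

def fb_ids_to_csv(fb_response, output_path):
    # Pull the list of {'id': ...} dicts out of the nested response
    ids = fb_response['media']['data']
    # One 'id' column, one row per id, no index column
    pd.DataFrame(ids).to_csv(output_path, index=False)

# Reuse it for each response you fetch:
fb_ids_to_csv(dictionary_fb, "ids.csv")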
Cheers!

Related

How to format query results as CSV?

My goal: automate executing a query and outputting the results into a CSV file.
I have been successful in obtaining the query results using Python (this is my first project ever in Python). I am trying to format these results as a CSV but am completely lost; right now it basically just creates two massive rows with all the data not parsed out. The .txt and .csv results are attached (I obtained these by simply calling the query and running "file name > results.txt" or "file name > results.csv").
txt results: {'data': {'get_result': {'job_id': None, 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', '__typename': 'get_result_response'}}} {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', 'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c', 'error': None, 'runtime': 157, 'generated_at': '2022-04-07T20:14:36.693419+00:00', 'columns': ['project_name', 'leaderboard_date', 'volume_30day', 'transactions_30day', 'floor_price', 'median_price', 'unique_holders', 'rank', 'custom_sort_order'], '__typename': 'query_results'}], 'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA', 'floor_price': 0.375, 'leaderboard_date': '2022-04-07', 'median_price': 343.4, 'project_name': 'Terraforms by Mathcastles', 'rank': 1, 'transactions_30day': 2774, 'unique_holders': 2179, 'volume_30day': 744611.6252}, '__typename': 'get_result_template'}, {'data': {'custom_sort_order': 'AB', 'floor_price': 4.69471, 'leaderboard_date': '2022-04-07', 'median_price': 6.5, 'project_name': 'Meebits', 'rank': 2, 'transactions_30day': 4153, 'unique_holders': 6200, 'volume_30day': 163520.7377371168}, '__typename': 'get_result_template'}, etc. (repeats for 100s of rows)..
Your results text string actually contains two dictionaries separated by a space character.
Here's a formatted version of what's in each of them:
dict1 = {'data': {'get_result': {'job_id': None,
                                 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                 '__typename': 'get_result_response'}}}

dict2 = {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                     'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c',
                                     'error': None,
                                     'runtime': 157,
                                     'generated_at': '2022-04-07T20:14:36.693419+00:00',
                                     'columns': ['project_name',
                                                 'leaderboard_date',
                                                 'volume_30day',
                                                 'transactions_30day',
                                                 'floor_price',
                                                 'median_price',
                                                 'unique_holders',
                                                 'rank',
                                                 'custom_sort_order'],
                                     '__typename': 'query_results'}],
                  'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA',
                                                        'floor_price': 0.375,
                                                        'leaderboard_date': '2022-04-07',
                                                        'median_price': 343.4,
                                                        'project_name': 'Terraforms by Mathcastles',
                                                        'rank': 1,
                                                        'transactions_30day': 2774,
                                                        'unique_holders': 2179,
                                                        'volume_30day': 744611.6252},
                                               '__typename': 'get_result_template'},
                                              {'data': {'custom_sort_order': 'AB',
                                                        'floor_price': 4.69471,
                                                        'leaderboard_date': '2022-04-07',
                                                        'median_price': 6.5,
                                                        'project_name': 'Meebits',
                                                        'rank': 2,
                                                        'transactions_30day': 4153,
                                                        'unique_holders': 6200,
                                                        'volume_30day': 163520.7377371168},
                                               '__typename': 'get_result_template'},
                                             ]}}
(BTW, I formatted them using the pprint module. This is often a good first step when dealing with these kinds of problems, so you know what you're dealing with.)
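If your script only has the raw text (for example the results.txt you redirected output into) rather than live dict objects, one way to turn each printed dictionary back into a Python dict is ast.literal_eval. This is just a sketch and assumes each dictionary was printed by its own print() call and therefore sits on its own line; if both really share one line separated by a space, you would have to split them first:

import ast

parsed = []
with open('results.txt') as f:
    for line in f:
        line = line.strip()
        if line:
            # literal_eval accepts single quotes and None, which json.loads would reject
            parsed.append(ast.literal_eval(line))

dict1, dict2 = parsed  # expecting the two dictionaries shown above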
Ignoring the first one completely, and all but the repetitive data in the second (which I assume is all you really want), you could create a CSV file from the nested dictionary values in the dict2['data']['get_result_by_result_id'] list. Here's how that could be done using the csv.DictWriter class:
import csv
from pprint import pprint  # If needed.

output_filepath = 'query_results.csv'

# Determine CSV fieldnames based on keys of first dictionary.
fieldnames = dict2['data']['get_result_by_result_id'][0]['data'].keys()

with open(output_filepath, 'w', newline='') as outp:
    writer = csv.DictWriter(outp, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()  # Optional.
    for result in dict2['data']['get_result_by_result_id']:
        # pprint(result['data'], sort_dicts=False)
        writer.writerow(result['data'])

print('fini')
Using the test data, here's the contents of the 'query_results.csv' file it created:
custom_sort_order,floor_price,leaderboard_date,median_price,project_name,rank,transactions_30day,unique_holders,volume_30day
AA,0.375,2022-04-07,343.4,Terraforms by Mathcastles,1,2774,2179,744611.6252
AB,4.69471,2022-04-07,6.5,Meebits,2,4153,6200,163520.7377371168
It appears you have the data in a Python dictionary. The Google Sheet says access denied, so I can't see the whole data set, but essentially you want to convert the dictionary data to a CSV file.
At its most bare-bones, you can use code like this to get where you need to; for your example you'll need to drill down to where the rows actually are.
import csv

# newline='' avoids blank rows between records on Windows
new_path = open("mytest.csv", "w", newline="")
file_dictionary = {"oliva": 199, "james": 145, "potter": 187}
z = csv.writer(new_path)
for new_k, new_v in file_dictionary.items():
    z.writerow([new_k, new_v])
new_path.close()
This guide should help you out.
https://pythonguides.com/python-dictionary-to-csv/
If I understand your question right, you should construct a DataFrame from your results and then save the DataFrame in .csv format. The pandas library is useful and easy to use.
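A minimal sketch of that idea for this particular response, assuming the dict2 structure shown above (the output file name is just an example):

import pandas as pd

# Each repeated record sits under 'get_result_by_result_id', with its values in 'data'
rows = [r['data'] for r in dict2['data']['get_result_by_result_id']]

df = pd.DataFrame(rows)
df.to_csv('query_results.csv', index=False)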

How to relate two object in a json file when searching?

I'm using the Twitter API, and it returns me a JSON file.
There's a sample in their dev documentation if you scroll to the bottom; note that the example only includes one tweet, whereas I'm working with hundreds.
In the data object you have geo and inside of geo you have place_id which correlates to another field in the includes object, more specifically the id field nested under places.
My problem then arises when I have hundreds of tweets in a JSON file in data and their respective geolocation data in the other object includes. How can I extract the geolocation data and relate it to the current tweet I have selected?
Currently, I have a for loop to go through all of the tweets and append the information to a CSV file, and nested in that for loop I have this:
for place in json_response['includes']['places']:
    if geo == place['id']:
        full_name = place['full_name']
        country = place['country']
        country_code = place['country_code']
        new_geo = place['geo']
        place_name = place['name']
        place_type = place['place_type']
However, it only returns the geolocation data for 1 tweet per JSON response because I assumed that each tweet got its own includes object. Now I'm stuck and any help would be appreciated.
To eliminate the need for the double for loop as well as the if statement you have in your code snippet, a straightforward approach without any additional library would be to build a dict comprehension of all tweets, keyed by place_id:
tweets = {tweet['geo']['place_id']: tweet for tweet in json_response['data']}
This results in the following dict:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_intentionally_removed]',
                      'id': '1136048014974423040',
                      'geo': {'place_id': '01a9a39529b27f36'}}}
If the response returned more than one tweet, as you've mentioned is your use case, the dict would contain one entry per tweet and its keys would look like:
['01a9a39529b27f36', 'some_other_id', ...]
In the next step we can build a similar dict comprehension keyed by each place's id; this way we can avoid any if-statements:
places = { p['id']: p for p in json_response['includes']['places']}
this produces the following result:
{'01a9a39529b27f36': {'geo': {'type': 'Feature',
                              'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
                              'properties': {}},
                      'country_code': 'US',
                      'name': 'Manhattan',
                      'id': '01a9a39529b27f36',
                      'place_type': 'city',
                      'country': 'United States',
                      'full_name': 'Manhattan, NY'}}
Finally, to combine them based on the common key:
for pid, geodata in places.items():
    tweets[pid].update(geodata)
which yields:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_removed_on_purpose]',
                      'id': '01a9a39529b27f36',
                      'geo': {'type': 'Feature',
                              'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
                              'properties': {}},
                      'country_code': 'US',
                      'name': 'Manhattan',
                      'place_type': 'city',
                      'country': 'United States',
                      'full_name': 'Manhattan, NY'}}
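Since you mentioned appending this information to a CSV file, here is a minimal sketch of writing the merged records out (the column selection and file name are illustrative assumptions, not part of the answer above):

import csv

# Pick whichever merged fields you actually need
fields = ['id', 'text', 'full_name', 'country', 'country_code', 'name', 'place_type']

with open('tweets_with_geo.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for tweet in tweets.values():
        writer.writerow(tweet)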

GraphQL json output to pandas dataframe

I tried a few ways to convert a json output from GraphQL to a pandas dataframe but I was not able to get it right. What's the best way to convert it into a pandas dataframe?
{'data': {'posts': {'edges': [{'node': {'id': '303843',
                                        'name': 'hipCV',
                                        'tagline': 'Create an impressive resume in minutes',
                                        'votesCount': 71}},
                              {'node': {'id': '303751',
                                        'name': 'Find Your First Frontend Job',
                                        'tagline': "Find your dream job, even if you've been rejected many times",
                                        'votesCount': 51}},
                              {'node': {'id': '303665',
                                        'name': 'Epsilon3',
                                        'tagline': 'The OS for spacecraft and complex operations',
                                        'votesCount': 290}}]}}}
Try:
import pandas as pd

# here data is your json data
df = pd.json_normalize(data['data']['posts']['edges'])

# If needed, strip the 'node.' prefix from the column names:
df.columns = df.columns.str.split('.').str[1]
output of df:
node.id node.name node.tagline node.votesCount
0 303843 hipCV Create an impressive resume in minutes 71
1 303751 Find Your First Frontend Job Find your dream job, even if you've been rejec... 51
2 303665 Epsilon3 The OS for spacecraft and complex operations 290
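An equivalent alternative, shown only as a sketch of the same idea rather than something from the answer above, is to pull out the inner node dicts yourself, which avoids the prefixed column names entirely:

import pandas as pd

# Build the frame straight from the 'node' dicts; columns become id, name, tagline, votesCount
df = pd.DataFrame([edge['node'] for edge in data['data']['posts']['edges']])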

pandas create new columns from dictionaries

A portion of one column, 'relatedWorkOrder', in my dataframe looks like this:
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
My desired output is to have columns 'name', 'labor_name', 'labor_code' with their respective values. I can do this using regex extract and replace:
df['name'] = df['relatedWorkOrder'].str.extract(r'{regex}',expand=False).str.replace('something','')
But I have several dictionaries in this column, and doing it this way is tedious; I'm also wondering if it's possible to do this by accessing the keys and values of the dictionary.
Any help with that?
You can join the result from pd.json_normalize:
df.join(pd.json_normalize(df['relatedWorkOrder'], sep='_'))
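As a quick worked illustration (the toy DataFrame below is my own example and assumes the column holds real dicts, not their string representation):

import pandas as pd

df = pd.DataFrame({
    'relatedWorkOrder': [
        {'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}},
        {'number': 2553, 'labor': {'name': 'IA002', 'code': '70M0901004'}},
    ]
})

# json_normalize flattens each dict into number, labor_name, labor_code
out = df.join(pd.json_normalize(df['relatedWorkOrder'], sep='_'))
print(out[['number', 'labor_name', 'labor_code']])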

Retrieve value in JSON from pandas series object

I need help retrieving a value from a JSON response object in Python. Specifically, how do I access the prices-asks-price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
             'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
             'closeoutAsk': '1.16064',
             'closeoutBid': '1.15974',
             'instrument': 'EUR_USD',
             'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
                                            'positiveUnits': '1.00000000'},
             'status': 'non-tradeable',
             'time': '2018-08-31T20:59:57.748335979Z',
             'tradeable': False,
             'type': 'PRICE',
             'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
                                'openOnly': {'long': '4063619', 'short': '4063619'},
                                'reduceFirst': {'long': '4063619', 'short': '4063619'},
                                'reduceOnly': {'long': '0', 'short': '0'}}}],
 'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049 - but having trouble after trying different things.
Thanks
asks[0] returns a list, so you might do something like:
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data that you have contains dicts and lists nested inside one another, so you need to navigate through it accordingly. Lists are accessed through indexes (starting from 0) and dicts are accessed through keys.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
To explain the logic in this case: I assume that your big JSON object is stored in an object called data. First we access the prices key in this object. Then I use index 0 because the next key is present inside a list. Once inside the list, you have a key called asks. Again you have a list here, so you need to access it using index 0. Finally, the liquidity and price keys are there.
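If you would rather stay in pandas, here is a minimal sketch using json_normalize with record_path (an alternative to the indexing above, not something from the answers themselves):

import pandas as pd

# Flatten each entry of the 'asks' list into its own row,
# carrying the instrument and quote time along as metadata columns
asks = pd.json_normalize(response['prices'], record_path='asks',
                         meta=['instrument', 'time'])

print(asks['price'][0])  # '1.16049' (still a string; wrap in float() if you need a number)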
