How to format query results as CSV? - python

My goal: automate executing a query and writing the results to a CSV file.
I have successfully obtained the query results using Python (this is my first Python project ever). I am trying to format these results as a CSV but am completely lost: it basically just creates 2 massive rows with none of the data parsed out. The .txt and .csv results are attached (I obtained these by running the script and redirecting its output, i.e. "file name > results.txt" or "file name > results.csv").
txt results: {'data': {'get_result': {'job_id': None, 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', '__typename': 'get_result_response'}}} {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99', 'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c', 'error': None, 'runtime': 157, 'generated_at': '2022-04-07T20:14:36.693419+00:00', 'columns': ['project_name', 'leaderboard_date', 'volume_30day', 'transactions_30day', 'floor_price', 'median_price', 'unique_holders', 'rank', 'custom_sort_order'], '__typename': 'query_results'}], 'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA', 'floor_price': 0.375, 'leaderboard_date': '2022-04-07', 'median_price': 343.4, 'project_name': 'Terraforms by Mathcastles', 'rank': 1, 'transactions_30day': 2774, 'unique_holders': 2179, 'volume_30day': 744611.6252}, '__typename': 'get_result_template'}, {'data': {'custom_sort_order': 'AB', 'floor_price': 4.69471, 'leaderboard_date': '2022-04-07', 'median_price': 6.5, 'project_name': 'Meebits', 'rank': 2, 'transactions_30day': 4153, 'unique_holders': 6200, 'volume_30day': 163520.7377371168}, '__typename': 'get_result_template'}, etc. (repeats for 100s of rows)..

Your results text string actually contains two dictionaries separated by a space character.
Here's a formatted version of what's in each of them:
dict1 = {'data': {'get_result': {'job_id': None,
                                 'result_id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                 '__typename': 'get_result_response'}}}

dict2 = {'data': {'query_results': [{'id': '72a17fd2-e63c-4732-805a-ad6a7b980a99',
                                     'job_id': '05eb2527-2ca0-4dd1-b6da-96fb5aa2e67c',
                                     'error': None,
                                     'runtime': 157,
                                     'generated_at': '2022-04-07T20:14:36.693419+00:00',
                                     'columns': ['project_name',
                                                 'leaderboard_date',
                                                 'volume_30day',
                                                 'transactions_30day',
                                                 'floor_price',
                                                 'median_price',
                                                 'unique_holders',
                                                 'rank',
                                                 'custom_sort_order'],
                                     '__typename': 'query_results'}],
                  'get_result_by_result_id': [{'data': {'custom_sort_order': 'AA',
                                                        'floor_price': 0.375,
                                                        'leaderboard_date': '2022-04-07',
                                                        'median_price': 343.4,
                                                        'project_name': 'Terraforms by Mathcastles',
                                                        'rank': 1,
                                                        'transactions_30day': 2774,
                                                        'unique_holders': 2179,
                                                        'volume_30day': 744611.6252},
                                               '__typename': 'get_result_template'},
                                              {'data': {'custom_sort_order': 'AB',
                                                        'floor_price': 4.69471,
                                                        'leaderboard_date': '2022-04-07',
                                                        'median_price': 6.5,
                                                        'project_name': 'Meebits',
                                                        'rank': 2,
                                                        'transactions_30day': 4153,
                                                        'unique_holders': 6200,
                                                        'volume_30day': 163520.7377371168},
                                               '__typename': 'get_result_template'},
                                             ]}}
(BTW, I formatted them using the pprint module. That's often a good first step when dealing with these kinds of problems, so you know exactly what you're dealing with.)
Ignoring the first one completely and all but the repetitive data in the second — which is what I assume is all you really want — you could create a CSV file from the nested dictionary values in the dict2['data']['get_result_by_result_id'] list. Here's how that could be done using the csv.DictWriter class:
import csv
from pprint import pprint  # If needed.

output_filepath = 'query_results.csv'

# Determine CSV fieldnames based on keys of first dictionary.
fieldnames = dict2['data']['get_result_by_result_id'][0]['data'].keys()

with open(output_filepath, 'w', newline='') as outp:
    writer = csv.DictWriter(outp, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()  # Optional.
    for result in dict2['data']['get_result_by_result_id']:
        # pprint(result['data'], sort_dicts=False)
        writer.writerow(result['data'])

print('fini')
Using the test data, here's the contents of the 'query_results.csv' file it created:
custom_sort_order,floor_price,leaderboard_date,median_price,project_name,rank,transactions_30day,unique_holders,volume_30day
AA,0.375,2022-04-07,343.4,Terraforms by Mathcastles,1,2774,2179,744611.6252
AB,4.69471,2022-04-07,6.5,Meebits,2,4153,6200,163520.7377371168
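If you only have the printed text from results.txt rather than live Python objects, one way to recover the two dictionaries is ast.literal_eval. This is just a sketch: it assumes the first dictionary ends with '}}} ' exactly as in your sample, and that the captured output consists of valid Python literals (your repr-style output is). The cleaner fix is to keep the objects in Python instead of parsing printed text.
import ast

# Raw text as captured in results.txt: two printed dicts separated by a space.
with open('results.txt') as f:
    raw = f.read().strip()

# Assumption: the first dict ends with '}}}' followed by a space, as in the sample.
boundary = raw.index('}}} ') + 3
dict1 = ast.literal_eval(raw[:boundary])
dict2 = ast.literal_eval(raw[boundary:])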

It appears you have the data in a Python dictionary. The Google Sheet says access denied, so I can't see the whole data.
But essentially you want to convert the dictionary data to a CSV file.
At the bare bones, you can use code like this to get where you need to. For your example you'll need to drill down to where the rows actually are.
import csv

file_dictionary = {"oliva": 199, "james": 145, "potter": 187}

# newline='' prevents blank lines between rows on Windows.
with open("mytest.csv", "w", newline="") as new_path:
    z = csv.writer(new_path)
    for new_k, new_v in file_dictionary.items():
        z.writerow([new_k, new_v])
This guide should help you out.
https://pythonguides.com/python-dictionary-to-csv/

If I understand your question right, you should construct a DataFrame from your results and then save the DataFrame in .csv format. The pandas library is useful and easy to use.
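For example, a minimal sketch along those lines, assuming dict2 is the parsed response shown in the answer above:
import pandas as pd

# Each result row lives under the nested 'data' key.
rows = [result['data'] for result in dict2['data']['get_result_by_result_id']]
df = pd.DataFrame(rows)
df.to_csv('query_results.csv', index=False)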

Related

Updating dictionaries based on huge list of dicts

I have a huge (around 350k elements) list of dictionaries:
lst = [
    {'data': 'xxx', 'id': 1456},
    {'data': 'yyy', 'id': 24234},
    {'data': 'zzz', 'id': 3222},
    {'data': 'foo', 'id': 1789},
]
On the other hand, I receive dictionaries (around 550k) one by one with a missing value (not every dict is missing this value), which I need to update from:
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}
To:
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}
And I need to take each dict and search within the list for the matching 'id' and update its 'data'. Doing it this way takes ages to process:
if example_dict['data'] is None:
    for row in lst:
        if row['id'] == example_dict['id']:
            example_dict['data'] = row['data']
Is there a way to build structured, chunked data divided into e.g. 10k values and tell the incoming dict in which chunk to search for the 'id'? Or any other way to optimize that? Any help is appreciated, take care.
Use a dict instead of searching linearly through the list.
The first important optimization is to remove that linear search through lst by building a dict indexed on id, pointing to the rows.
For example, this will be a lot faster than your code, if you have enough RAM to hold all the rows in memory:
row_dict = {row['id']: row for row in lst}

if example_dict['data'] is None:
    if example_dict['id'] in row_dict:
        example_dict['data'] = row_dict[example_dict['id']]['data']
This improvement will be relevant for you whether you process the rows by chunks of 10k or all at once, since dictionary lookup time is constant instead of linear in the size of lst.
Make your own chunking process
Next you ask "Is there a way to build a structured chunked data divided...". Yes, absolutely. If the data is too big to fit in memory, write a first-pass function that divides the input, based on id, into several temporary files. They could be based on the last two digits of the id if order is irrelevant, or on ranges of ids if you prefer. Do that for both the list of rows and the dictionaries you receive, and then process each pair of list/dict files covering the same ids, one pair at a time, with code like the above.
If you have to preserve the order in which you receive the dictionaries, though, this approach will be more difficult to implement.
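A minimal sketch of that first pass, assuming integer ids and chunking on their last two digits (the .jsonl file naming here is made up for illustration):
import json

NUM_CHUNKS = 100  # one chunk per possible pair of last two digits

def write_chunked(items, prefix):
    # Append each item to the temporary file matching its id's chunk.
    handles = {}
    try:
        for item in items:
            chunk = item['id'] % NUM_CHUNKS
            if chunk not in handles:
                handles[chunk] = open(f'{prefix}_{chunk:02d}.jsonl', 'a')
            handles[chunk].write(json.dumps(item) + '\n')
    finally:
        for handle in handles.values():
            handle.close()

# Run it once for the rows and once for the incoming dicts, then process
# each pair of matching chunk files with the row_dict lookup shown above,
# so only one chunk needs to be in memory at a time.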
Some preprocessing of the lst list might help a lot, e.g. transforming that list of dicts into a dictionary where the id is the key.
To be precise, transform lst into such a structure:
lst = {
    1456: 'xxx',
    24234: 'yyy',
    3222: 'zzz',
    ...
}
(Note the keys are kept as integers so they match example_dict['id'].)
Then when you want to check the data attribute in example_dict, just access the id key in lst directly, as follows:
if example_dict['data'] is None:
    example_dict['data'] = lst.get(example_dict['id'])
It should reduce the time complexity from roughly quadratic (a full scan of lst for every incoming dict) to linear (one constant-time lookup per incoming dict).
Try creating a hash table (in Python, a dict) from lst to speed up the lookup based on 'id':
lst = [
    {'data': 'xxx', 'id': 1456},
    {'data': 'yyy', 'id': 24234},
    {'data': 'zzz', 'id': 3222},
    {'data': 'foo', 'id': 1789},
]
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}

dct = {row['id']: row for row in lst}

if example_dict['data'] is None:
    example_dict['data'] = dct[example_dict['id']]['data']

print(example_dict)
Sample output:
{'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}

Convert a list of dicts to DataFrame and back to exactly same list?

In my Django view, I want to load a list of dictionaries into a Pandas dataframe, manipulate it, dump to dict again and return such manipulated data as a part of this view's JSON response:
def test(request):
    pre_pandas = [{'type': 'indoor', 'speed': 1, 'heart_rate': None},
                  {'type': 'outdoor', 'speed': 2, 'heart_rate': 124.0},
                  {'type': 'commute', 'speed': 3, 'heart_rate': 666.0},
                  {'type': 'indoor', 'speed': 4, 'heart_rate': 46.0}]
    df = pd.DataFrame(pre_pandas)
    # some data manipulation here...
    post_pandas = df.to_dict(orient='records')
    response = {
        'pre_pandas': pre_pandas,
        'post_pandas': post_pandas,
    }
    return JsonResponse(response)
The problem with my approach is that Pandas' to_dict() method replaces Python's None with nan, so that the response has NaN in it:
{"pre_pandas": [{"type": "indoor", "speed": 1, "heart_rate": null}...], "post_pandas": [{"heart_rate": NaN, "speed": 1,...}]
and JavaScript cannot tackle NaN.
Is there a way to dump a dataframe to a dict so that the output is exactly the same as the dict it was built from?
I could possibly adjust the data manually with a list replace() method, but it feels awkward and also I would need to cover for all of the other - if any - conversions Pandas' to_dict() method might do.
I also cannot dump post_pandas to JSON, as I am already doing it in JsonResponse.
The column dtypes in your pre_pandas DataFrame are inferred from the provided dictionaries, so heart_rate becomes a float column and None is converted to NaN. To prevent this you can explicitly specify the dtype object, which forces all the values to keep their original Python types.
Like this:
df = pd.DataFrame(pre_pandas, dtype=object)
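A quick sanity check of that approach, using one row from the question:
import pandas as pd

pre_pandas = [{'type': 'indoor', 'speed': 1, 'heart_rate': None}]
df = pd.DataFrame(pre_pandas, dtype=object)
print(df.to_dict(orient='records'))
# [{'type': 'indoor', 'speed': 1, 'heart_rate': None}] -- None survives the round trip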

Retrieve value in JSON from pandas series object

I need help retrieving a value from a JSON response object in Python. Specifically, how do I access the prices-asks-price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
             'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
             'closeoutAsk': '1.16064',
             'closeoutBid': '1.15974',
             'instrument': 'EUR_USD',
             'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
                                            'positiveUnits': '1.00000000'},
             'status': 'non-tradeable',
             'time': '2018-08-31T20:59:57.748335979Z',
             'tradeable': False,
             'type': 'PRICE',
             'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
                                'openOnly': {'long': '4063619', 'short': '4063619'},
                                'reduceFirst': {'long': '4063619', 'short': '4063619'},
                                'reduceOnly': {'long': '0', 'short': '0'}}}],
 'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049, but I'm having trouble after trying different things.
Thanks
asks[0] returns a list, so you might do something like
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data you have contains JSON objects and lists nested inside one another, so you need to navigate through it accordingly. Lists are accessed through indexes (starting from 0) and JSON objects are accessed through keys.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
Explaining the logic in this case: I assume your big JSON object is stored in an object called data. First we access the prices key in this object. Its value is a list, so we use index 0 to get inside it. Inside that we access the key asks, whose value is again a list, so we index it with 0 as well. Finally, the liquidity and price keys are right there.
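As an alternative sketch, pandas can flatten the nested asks lists directly, assuming response is the parsed JSON object from the question; pd.json_normalize is the current name of the deprecated pd.io.json.json_normalize used above:
import pandas as pd

# One row per element of each nested 'asks' list.
asks = pd.json_normalize(response['prices'], record_path='asks')
price = float(asks.loc[0, 'price'])  # '1.16049' -> 1.16049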

Convert Messy String to CSV Columns

I have a large list of ids from the Facebook API that I would like to put into a CSV file. Is there a way to parse each ID into its own row under a single column in a CSV file? I am using Python.
Current Format:
{'media': {'data': [{'id': '17898498243076831'},
                    {'id': '17907011917029111'},
                    {'id': '17906766215033990'},
                    {'id': '17894813609104671'},
                    {'id': '17890124843094721'}]}}
But I would like the format to be this:
id
17898498243076831
17907011917029111
17906766215033990
17894813609104671
17890124843094721
EDITED
The Facebook API seems to be spitting out a dictionary inside a dictionary inside a dictionary; that actually makes it easy for developers to do manipulations in any language. So here is what I suggest to you.
You can do something like this
import pandas as pd

# Create a dictionary
dictionary_fb = {'media': {'data': [{'id': '17898498243076831'},
                                    {'id': '17907011917029111'},
                                    {'id': '17906766215033990'},
                                    {'id': '17894813609104671'},
                                    {'id': '17890124843094721'}]}}

# Drill down to the list of id dicts
dict_id = dictionary_fb['media']['data']

df = pd.DataFrame(dict_id)
df.to_csv("filename.csv", index=False)
If you need to do this more often, you can wrap the same steps in a loop or a function and get things done.
Cheers!

Make the same table using Python csv module

I'm working on a project comparing different sorting algorithms. I already have a data-generating script which can time everything. I need this data to fit in a table (I'm using OriginPro 8) like this one:
But what should I write in the Python script so that the imported .csv file looks exactly like that table?
Right now I have this structure:
{'bubble_sort': {'BEST': {'COMP': 999000, 'PERM': 0, 'TIME': 1072.061538696289},
                 'RND': {'COMP': 999000,
                         'PERM': 249853,
                         'TIME': 1731.0991287231445},
                 'WORST': {'COMP': 999000,
                           'PERM': 499500,
                           'TIME': 2358.1347465515137}},
 'hoare_sort': {'BEST': {'COMP': 10975, 'PERM': 0, 'TIME': 14.000654220581055},  # and so on
And this code to save it:
def write_csv_in_file(fn, data):
    with open(fn + ".csv", 'w', newline='') as file:
        writer = csv.writer(file)
        for key, value in data.items():
            writer.writerow([key, value])
And after importing I get this table:
And it is far from the variant I need.
What I want is this:
Let's say the data above was collected on a best-case array of length 100. Then the first row of the first table should hold the values from ['bubble_sort']['BEST']['TIME'], ['hoare_sort']['BEST']['TIME'] and so on, as sketched below. Then I'd make the same tables for the worst-case scenario (["WORST"]) and random arrays (["RND"]), and then repeat everything for the number of comparisons (["COMP"]) and permutations done (["PERM"]).
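A minimal sketch of that reshaping, assuming results holds the nested dict shown above; it writes one CSV per (case, metric) pair, with one column per sorting algorithm. You would call it once per array length, or collect the rows first if you time several lengths:
import csv

def write_metric_table(results, case, metric, filename):
    # Column order: one column per algorithm, e.g. bubble_sort, hoare_sort, ...
    algorithms = sorted(results)
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(algorithms)
        # One row per measurement; here a single run per file.
        writer.writerow([results[algo][case][metric] for algo in algorithms])

# e.g. the best-case timing table:
write_metric_table(results, 'BEST', 'TIME', 'best_time.csv')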
