how do i use more than 1 list index? - python

How do I use more than one list index? I want to search through more than just index 0 without writing another line that then searches index 1; that feels like a workaround.
This is my current code. I'm calling an API and parsing the response as JSON:
>result = response.json()
>{'exercises': [{'tag_id': 317, 'user_input': 'run', 'duration_min': 30, 'met': 9.8, 'nf_calories': 842.8, 'photo': {'highres': 'https://d2xdmhkmkbyw75.cloudfront.net/exercise/317_highres.jpg', 'thumb': 'https://d2xdmhkmkbyw75.cloudfront.net/exercise/317_thumb.jpg', 'is_user_uploaded': False}, 'compendium_code': 12050, 'name': 'running', 'description': None, 'benefits': None}, {'tag_id': 814, 'user_input': 'bike', 'duration_min': 1, 'met': 6.8, 'nf_calories': 19.49, 'photo': {'highres': None, 'thumb': None, 'is_user_uploaded': False}, 'compendium_code': 1020, 'name': 'bicycling', 'description': None, 'benefits': None}]}
Then I'm trying to get the 'name', but there are two instances of 'name' and I can only use 0, 1, 2, etc. as the index:
>exercises = result['exercises'][0]['name']
>exercisess = result['exercises'][1]['name']
>print(exercises)
>print(exercisess)
>running
>bicycling
Is there a way I can search the whole thing for keys and get their values without specifically giving 0, 1, 2 as the index to search?
I'm a noob at this, sorry if I formatted this question wrong.

You can use a list comprehension to collect them all in a list, then print that.
data = {'exercises': [{'tag_id': 317, 'user_input': 'run', 'duration_min': 30, 'met': 9.8, 'nf_calories': 842.8, 'photo': {'highres': 'https://d2xdmhkmkbyw75.cloudfront.net/exercise/317_highres.jpg', 'thumb': 'https://d2xdmhkmkbyw75.cloudfront.net/exercise/317_thumb.jpg', 'is_user_uploaded': False}, 'compendium_code': 12050, 'name': 'running', 'description': None, 'benefits': None}, {'tag_id': 814, 'user_input': 'bike', 'duration_min': 1, 'met': 6.8, 'nf_calories': 19.49, 'photo': {'highres': None, 'thumb': None, 'is_user_uploaded': False}, 'compendium_code': 1020, 'name': 'bicycling', 'description': None, 'benefits': None}]}
names = [x['name'] for x in data['exercises']]
print(names)
# output: ['running', 'bicycling']
Depends on what you are after.
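If more than one field is needed per exercise, the same idea extends to a dict comprehension. This is only a minimal sketch, assuming every entry has both a 'name' and an 'nf_calories' key as in the sample above; the trimmed data below is a stand-in for the full response, and calories_by_name is just an illustrative name.

```python
# Trimmed stand-in for the API response shown above.
data = {'exercises': [
    {'name': 'running', 'nf_calories': 842.8},
    {'name': 'bicycling', 'nf_calories': 19.49},
]}

# Map each exercise name to its calorie figure in one pass.
calories_by_name = {x['name']: x['nf_calories'] for x in data['exercises']}
print(calories_by_name)
```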

You use a for loop:
for ex in result['exercises']:
    print(ex['name'])
The name ex will refer to each of the elements in turn.
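If the goal really is to search the whole structure for a key without knowing where it sits, a small recursive generator is one way to do it. The sketch below is an assumption about what the asker wants, not part of the answers above; find_values is a hypothetical helper name and the trimmed result dict stands in for the real response.

```python
def find_values(obj, key):
    """Yield every value stored under `key`, at any depth of dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain more matches.
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


# Trimmed stand-in for the parsed JSON response.
result = {'exercises': [
    {'name': 'running', 'photo': {'thumb': None}},
    {'name': 'bicycling', 'photo': {'thumb': None}},
]}

names = list(find_values(result, 'name'))
print(names)
```

For a response whose shape is known, the plain for loop is simpler and easier to read; the recursive version mostly pays off when the nesting varies.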

Related

Filter nested dictionaries to second time key name appears, python

I would like to filter custom_fields1 so that the only remaining items are the ones who have 'Value' = 'Ja' below the line 'Name' = 'GPP' (and not below the first 'Name' key 'Name': 'Informationen'). Does anyone know how to efficiently filter through the dictionary? I am happy for every tip!
custom_fields1 =
[(44,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Ja',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]}),
(45,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Ja',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]}),
(46,
{'#odata.context': 'http://api.hellohq.io/v1/$metadata#CustomFields',
'value': [{'Name': 'Informationen',
'Value': '',
'Type': 'TextMultiline',
'Id': 18020,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None},
{'Name': 'GPP',
'Value': 'Nein',
'Type': 'DropdownCheckbox',
'Id': 18049,
'CreatedBy': 0,
'UpdatedBy': 0,
'CreatedOn': None,
'UpdatedOn': None}]}))]
Your structure is deeply nested! Let's break it down step by step.
It's a list, so we iterate over it:
for field in custom_fields1:
    <I have an element>
What is the element? It's a tuple, and its second item is the part of interest: dict1 = field[1]
Now I have a dictionary, where the 'value' key is what interests me most: values1 = dict1['value']
Oh, is it a list again? Let's iterate again!
for dict_value in values1:
    <this is the dict I need!>
I have the right dict, so now I just need to check my conditions:
def check(dict_value):
    return dict_value["Name"] == ... and dict_value["Value"] == ...
How do you filter based on that? You can use a list comprehension with a condition:
[dict_value for dict_value in values1 if check(dict_value)]
Then assign the result back to the "value" key of the outer dict.
Another option is to use del to remove the records that do not satisfy the check.
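Put together, the steps might look like the sketch below. It reads the question as "keep only the tuples whose GPP entry has Value 'Ja'", which is an assumption; the data is shortened to the keys the filter needs, and filtered is an illustrative name.

```python
# Shortened stand-in for custom_fields1: only the keys the filter needs.
custom_fields1 = [
    (44, {'value': [{'Name': 'Informationen', 'Value': ''},
                    {'Name': 'GPP', 'Value': 'Ja'}]}),
    (45, {'value': [{'Name': 'Informationen', 'Value': ''},
                    {'Name': 'GPP', 'Value': 'Ja'}]}),
    (46, {'value': [{'Name': 'Informationen', 'Value': ''},
                    {'Name': 'GPP', 'Value': 'Nein'}]}),
]


def check(dict_value):
    # Assumed condition from the question: the GPP entry must be 'Ja'.
    return dict_value['Name'] == 'GPP' and dict_value['Value'] == 'Ja'


# Keep an (id, dict) pair only if at least one inner dict passes the check.
filtered = [
    (field_id, payload)
    for field_id, payload in custom_fields1
    if any(check(d) for d in payload['value'])
]
print([field_id for field_id, _ in filtered])
```

Because check looks at 'Name' and 'Value' together, the empty 'Value' under 'Informationen' never matches, which is the distinction the question asks for.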

How to convert string from variable in table for csv-export?

I am an absolute beginner with Python (the title probably says it already). I usually look for answers by googling, but here I don't even know what term to search for. I have a long string in a variable, which I suppose can easily be converted to a table, but I can't figure out how to do it myself.
This is my example:
#pip install pycoingecko
from pycoingecko import CoinGeckoAPI
cg = CoinGeckoAPI()
output = cg.get_search_trending()
print(output)
This is the output:
{'coins': [{'item': {'id': 'metavpad', 'coin_id': 21397, 'name': 'MetaVPad', 'symbol': 'METAV', 'market_cap_rank': 511, 'thumb': 'https://assets.coingecko.com/coins/images/21397/thumb/metav.png?1639044315', 'small': 'https://assets.coingecko.com/coins/images/21397/small/metav.png?1639044315', 'large': 'https://assets.coingecko.com/coins/images/21397/large/metav.png?1639044315', 'slug': 'metavpad', 'price_btc': 7.777707600278187e-06, 'score': 0}}, {'item': {'id': 'syscoin', 'coin_id': 119, 'name': 'Syscoin', 'symbol': 'SYS', 'market_cap_rank': 189, 'thumb': 'https://assets.coingecko.com/coins/images/119/thumb/Syscoin.png?1560401261', 'small': 'https://assets.coingecko.com/coins/images/119/small/Syscoin.png?1560401261', 'large': 'https://assets.coingecko.com/coins/images/119/large/Syscoin.png?1560401261', 'slug': 'syscoin', 'price_btc': 1.3905286168359925e-05, 'score': 1}}, {'item': {'id': 'rainbowtoken', 'coin_id': 17828, 'name': 'RainbowToken', 'symbol': 'RAINBOWTOKEN', 'market_cap_rank': 907, 'thumb': 'https://assets.coingecko.com/coins/images/17828/thumb/WsLiOeJ.png?1637337787', 'small': 'https://assets.coingecko.com/coins/images/17828/small/WsLiOeJ.png?1637337787', 'large': 'https://assets.coingecko.com/coins/images/17828/large/WsLiOeJ.png?1637337787', 'slug': 'rainbowtoken', 'price_btc': 5.831112758941096e-13, 'score': 2}}, {'item': {'id': 'railgun', 'coin_id': 16840, 'name': 'Railgun', 'symbol': 'RAIL', 'market_cap_rank': 534, 'thumb': 'https://assets.coingecko.com/coins/images/16840/thumb/railgun.jpeg?1625322775', 'small': 'https://assets.coingecko.com/coins/images/16840/small/railgun.jpeg?1625322775', 'large': 'https://assets.coingecko.com/coins/images/16840/large/railgun.jpeg?1625322775', 'slug': 'railgun', 'price_btc': 3.1094468809624446e-05, 'score': 3}}, {'item': {'id': 'wonderland', 'coin_id': 18126, 'name': 'Wonderland', 'symbol': 'TIME', 'market_cap_rank': 113, 'thumb': 'https://assets.coingecko.com/coins/images/18126/thumb/time.PNG?1630621941', 
'small': 'https://assets.coingecko.com/coins/images/18126/small/time.PNG?1630621941', 'large': 'https://assets.coingecko.com/coins/images/18126/large/time.PNG?1630621941', 'slug': 'wonderland', 'price_btc': 0.08713452772286424, 'score': 4}}, {'item': {'id': 'gods-unchained', 'coin_id': 17139, 'name': 'Gods Unchained', 'symbol': 'GODS', 'market_cap_rank': 274, 'thumb': 'https://assets.coingecko.com/coins/images/17139/thumb/10631.png?1635718182', 'small': 'https://assets.coingecko.com/coins/images/17139/small/10631.png?1635718182', 'large': 'https://assets.coingecko.com/coins/images/17139/large/10631.png?1635718182', 'slug': 'gods-unchained', 'price_btc': 0.00014524078849750436, 'score': 5}}, {'item': {'id': 'altura',
'coin_id': 15127, 'name': 'Altura', 'symbol': 'ALU', 'market_cap_rank': 456, 'thumb': 'https://assets.coingecko.com/coins/images/15127/thumb/ALU_logo_200x200.png?1626868890', 'small': 'https://assets.coingecko.com/coins/images/15127/small/ALU_logo_200x200.png?1626868890', 'large': 'https://assets.coingecko.com/coins/images/15127/large/ALU_logo_200x200.png?1626868890', 'slug': 'altura', 'price_btc': 3.302478861283615e-06, 'score': 6}}], 'exchanges': []}
Now how can I convert this to export it to a CSV? Or what keywords do I have to search for?
You can try something like this, to put everything in 'coins' to a dataframe:
import pandas as pd
from pycoingecko import CoinGeckoAPI
cg = CoinGeckoAPI()
output = cg.get_search_trending()
tab = pd.concat([pd.DataFrame(i) for i in output['coins']],axis=1).T
tab.index = range(tab.shape[0])
The data frame looks like this:
coin_id id ... symbol thumb
0 21397 metavpad ... METAV https://assets.coingecko.com/coins/images/2139...
1 119 syscoin ... SYS https://assets.coingecko.com/coins/images/119/...
2 17828 rainbowtoken ... RAINBOWTOKEN https://assets.coingecko.com/coins/images/1782...
3 16840 railgun ... RAIL https://assets.coingecko.com/coins/images/1684...
4 10354 insure ... SURE https://assets.coingecko.com/coins/images/1035...
5 18126 wonderland ... TIME https://assets.coingecko.com/coins/images/1812...
6 17139 gods-unchained ... GODS https://assets.coingecko.com/coins/images/1713...
You can write it to a csv:
tab.to_csv("results.csv")
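If pandas is not available, the standard library csv module can produce the same kind of file. The sketch below assumes each entry under 'coins' wraps its fields in an 'item' dict, as in the output shown in the question; the two rows are trimmed copies of that output, and it writes to an in-memory buffer so it runs without touching the disk or the API.

```python
import csv
import io

# Two entries copied (and trimmed) from the output shown in the question.
output = {'coins': [
    {'item': {'id': 'metavpad', 'name': 'MetaVPad', 'symbol': 'METAV', 'score': 0}},
    {'item': {'id': 'syscoin', 'name': 'Syscoin', 'symbol': 'SYS', 'score': 1}},
], 'exchanges': []}

# Unwrap the 'item' level so each row is a flat dict.
rows = [coin['item'] for coin in output['coins']]

# In-memory buffer for the demo; use open("results.csv", "w", newline="")
# instead to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Useful search terms for this kind of task are "list of dicts to CSV" and "flatten nested JSON".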

How to flatten nested dict formatted '_source' column of csv, into dataframe

I have a CSV with 500+ rows where one column, "_source", is stored as JSON. I want to extract that into a pandas dataframe, with each key as its own column. The data is a 1 MB JSON file of online social media data (from Facebook, Twitter, web crawls, etc.). There are approximately 528 separate rows of posts/tweets/text, each with many dictionaries inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete picture. I need to turn all key-value pairs of the nested dictionaries into columns of a dataframe.
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
Below is what an actual "_source" row looks like when stretched out. There are many dictionaries and key:value pairs, and each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
before you post make sure the actual code works for the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
I had to do something like that a while back. Basically, I used a function that completely flattens the JSON to identify the keys that become columns, then iterated through the flattened result to reconstruct a row and add each row to a "results" dataframe. With the data you provided, it creates a single row of 52 columns and, looking through it, every key appears to end up in its own column. Anything nested inside a list, for example 'meta': {'rule_matcher': [{'atribs': {'website': ...}}]}, gets a column named after the part of its path that follows the first list index (here atribs_website), and any further list index in that path is replaced by a '.' (as in results.rule_type).
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
import re
import pandas as pd

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        # Recurse into dicts and lists, building the key path in `name`.
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            # Leaf value: strip the trailing '_' from the path.
            out[name[:-1]] = x

    flatten(y)
    return out

flat = flatten_json(data_source)

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        # Keys that passed through a list carry an index such as '_0_'.
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:
        # No list index in the key: handle it after the loop.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value
for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
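The code above handles a single dict. For a whole column of rows, one possible approach is to flatten every row and then take the union of all keys as the column set. The sketch below uses only the standard library and makes several assumptions: the cells are valid JSON strings (if they are Python dict reprs with single quotes, as the display suggests, ast.literal_eval would be needed instead of json.loads), the dotted key naming is my own choice, and raw_rows is an invented stand-in for the "_source" column.

```python
import json


def flatten(obj, prefix=''):
    """Return a flat dict whose keys are dotted paths into `obj`."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f'{prefix}{key}.'))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f'{prefix}{index}.'))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix[:-1]] = obj
    return flat


# Hypothetical stand-in for the "_source" column: one JSON string per row.
raw_rows = [
    '{"uid": "a1", "doc": {"status_code": 200, "links": []}}',
    '{"uid": "b2", "doc": {"status_code": 404}, "meta": {"render": [{"load_time": 32}]}}',
]

flat_rows = [flatten(json.loads(text)) for text in raw_rows]

# Union of all keys, so rows with missing fields still line up.
columns = sorted({key for row in flat_rows for key in row})
table = [[row.get(col) for col in columns] for row in flat_rows]
print(columns)
```

A list of flat dicts like flat_rows can also be passed straight to pd.DataFrame, which fills missing keys with NaN. Note that empty lists such as 'links': [] produce no column at all in this sketch.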

Iterate through a list of key and value pairs and get specific key and value using Python

I have a list like this.
data = [{
'category': 'software',
'code': 110,
'actual': '["5.1.4"]',
'opened': '2018-10-16T09:18:12Z',
'component_type': 'update',
'event': 'new update available',
'current_severity': 'info',
'details': '',
'expected': None,
'id': 10088862,
'component_name': 'Purity//FA'
},
{
'category': 'software',
'code': 67,
'actual': None,
'opened': '2018-10-18T01:14:45Z',
'component_type': 'host',
'event': 'misconfiguration',
'current_severity': 'critical',
'details': '',
'expected': None,
'id': 10088898,
'component_name': 'pudc-vm-001'
},
{
'category': 'array',
'code': 42,
'actual': None,
'opened': '2018-11-22T22:27:29Z',
'component_type': 'hardware',
'event': 'failure',
'current_severity': 'warning',
'details': '' ,
'expected': None,
'id': 10089121,
'component_name': 'ct1.eth15'
}]
I want to iterate over this and get only category, component_type, event and current_severity.
I tried a for loop, but it says too many values to unpack, obviously.
for k, v, b, n in data:
    print(k, v, b, n)  # do something
I essentially want a list that is filtered to have only category, component_type, event and current_severity, so that I can use the same for loop to get my four key-value pairs.
Or is there a better way to do it? Please help me out.
Note: The number of stanzas in the list is not fixed; it keeps changing, and there may be more than three.
You have a list of dictionaries; a simple way to iterate over it is:
category = [x['category'] for x in data]
This collects the values of the category key:
['software', 'software', 'array']
Do the same for component_type, event and current_severity and you're good to go.
If you know that every dict in your list has at least the keys you are trying to extract, you can use dict[key]. For safety, however, I prefer dict.get(key, default_value), as in this example:
out = [
    {
        'category': elm.get('category'),
        'component_type': elm.get('component_type'),
        'event': elm.get('event'),
        'current_severity': elm.get('current_severity')
    } for elm in data
]
print(out)
Output:
[{'category': 'software',
'component_type': 'update',
'current_severity': 'info',
'event': 'new update available'},
{'category': 'software',
'component_type': 'host',
'current_severity': 'critical',
'event': 'misconfiguration'},
{'category': 'array',
'component_type': 'hardware',
'current_severity': 'warning',
'event': 'failure'}]
For more information about when to use dict.get() instead of dict[key], see this answer.
With this you get a new list containing only the keys you are interested in:
new_list = [{
    'category': stanza['category'],
    'component_type': stanza['component_type'],
    'event': stanza['event'],
    'current_severity': stanza['current_severity'],
} for stanza in data]
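To avoid repeating each key by hand, the wanted keys can be listed once and the filtering done with a nested comprehension. This is a sketch rather than any of the answerers' code; wanted and filtered are illustrative names, the data is trimmed from the question, and the final loop relies on dicts keeping insertion order (Python 3.7+).

```python
wanted = ('category', 'component_type', 'event', 'current_severity')

# Two trimmed entries from the question's list.
data = [
    {'category': 'software', 'code': 110, 'component_type': 'update',
     'event': 'new update available', 'current_severity': 'info'},
    {'category': 'array', 'code': 42, 'component_type': 'hardware',
     'event': 'failure', 'current_severity': 'warning'},
]

# .get() returns None for a missing key instead of raising KeyError.
filtered = [{key: entry.get(key) for key in wanted} for entry in data]

# Each filtered dict now has exactly four values, so unpacking works.
for category, component_type, event, severity in (d.values() for d in filtered):
    print(category, component_type, event, severity)
```

This also works for any number of stanzas, since nothing depends on the length of the list.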

Convert dictionary lists to multi-dimensional list of dictionaries

I've been trying to convert the following:
data = {'title':['doc1','doc2','doc3'], 'name':['test','check'], 'id':['ddi5i'] }
to:
[{'title':'doc1', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'test', 'id': 'ddi5i'},
{'title':'doc1', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc2', 'name': 'check', 'id': 'ddi5i'},
{'title':'doc3', 'name': 'check', 'id': 'ddi5i'}]
I've tried various options (list comprehensions, pandas and custom code) but nothing seems to work. For example, the following:
panda.DataFrame(data).to_dict('list')
throws an error because, since it tries to map the lists, all of them have to be of the same length. Besides, the output would only be uni-dimensional which is not what I'm looking for.
itertools.product may be what you're looking for here, and it can be applied to the values of your data to get appropriate value groupings for the new dicts. Something like
list(dict(zip(data, ele)) for ele in product(*data.values()))
Demo
>>> from itertools import product
>>> list(dict(zip(data, ele)) for ele in product(*data.values()))
[{'id': 'ddi5i', 'name': 'test', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'test', 'title': 'doc3'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc1'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc2'},
{'id': 'ddi5i', 'name': 'check', 'title': 'doc3'}]
How this works becomes clear once you look at the raw product:
>>> list(product(*data.values()))
[('test', 'doc1', 'ddi5i'),
('test', 'doc2', 'ddi5i'),
('test', 'doc3', 'ddi5i'),
('check', 'doc1', 'ddi5i'),
('check', 'doc2', 'ddi5i'),
('check', 'doc3', 'ddi5i')]
and now it is just a matter of zipping back into a dict with the original keys.
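For comparison, the product-and-zip one-liner can be checked against explicit nested loops. The sketch below assumes Python 3.7+, where dicts keep insertion order, so the result is grouped by title first; the demo output above shows a different grouping, presumably because it was produced on an older interpreter without that guarantee. The names combos and manual are illustrative.

```python
from itertools import product

data = {'title': ['doc1', 'doc2', 'doc3'], 'name': ['test', 'check'], 'id': ['ddi5i']}

# product() varies the last iterable fastest; keys and values stay aligned
# because iterating data (keys) and data.values() follows the same order.
combos = [dict(zip(data, values)) for values in product(*data.values())]

# Equivalent explicit loops, for comparison.
manual = [
    {'title': t, 'name': n, 'id': i}
    for t in data['title']
    for n in data['name']
    for i in data['id']
]
print(len(combos))
```

If the grouping in the question matters (all titles for 'test', then all titles for 'check'), the list can be sorted afterwards or the loops reordered; the set of dicts is the same either way.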
