I am trying to write some code with the Hunter.io API to automate some of my B2B email scraping. It's been a long time since I've written any code, and I could use some input. I have a CSV file of URLs, and I want to call a function on each URL that outputs a dictionary like this:
`{'domain': 'fromthebachrow.com', 'webmail': False, 'pattern': '{f}{last}', 'organization': None, 'emails': [{'value': 'fbach#fromthebachrow.com', 'type': 'personal', 'confidence': 91, 'sources': [{'domain': 'fromthebachrow.com', 'uri': 'http://fromthebachrow.com/contact', 'extracted_on': '2017-07-01'}], 'first_name': None, 'last_name': None, 'position': None, 'linkedin': None, 'twitter': None, 'phone_number': None}]}`
for each URL I call my function on. I want my code to return just the email address for each key labeled 'value'.
'value' is a key in a dictionary that sits inside a list, and that list is itself a value in the dictionary my function outputs. I can access the output dictionary to get the list stored under 'emails', but I don't know how to access the dictionary inside that list. I want my code to return the value stored under 'value' in that dictionary, and to do so for all of my URLs.
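from pyhunter import PyHunter
import csv
with open('urls.csv', newline='') as file:
    reader = csv.reader(file)
    urls = list(reader)
hunter = PyHunter('API Key')
for item in urls:
    # each CSV row is a list of columns; the URL is assumed to be in the first column
    output = hunter.domain_search(item[0])
    output['emails']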
which returns a list that looks like this for each item:
[{
'value': 'fbach#fromthebachrow.com',
'type': 'personal',
'confidence': 91,
'sources': [{
'domain': 'fromthebachrow.com',
'uri': 'http://fromthebachrow.com/contact',
'extracted_on': '2017-07-01'
}],
'first_name': None,
'last_name': None,
'position': None,
'linkedin': None,
'twitter': None,
'phone_number': None
}]
How do I grab the first dictionary in that list and then access the email paired with 'value' so that my output is just an email address for each url I input initially?
To get the first dict (or any item) in a list, index the list, e.g. list[0]; to get the value stored under a key, use ["value"]. Combined, that is list[0]["value"], which for your output is output['emails'][0]['value'].
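A minimal sketch of both cases, using the sample domain_search() result from the question (the email entry is trimmed and the API call is left out). emails_from is a hypothetical helper that collects every 'value' instead of only the first, and returns an empty list for a domain with no emails instead of raising IndexError:

```python
# Sample domain_search() result for one URL, copied from the question (email entry trimmed)
output = {
    'domain': 'fromthebachrow.com',
    'webmail': False,
    'pattern': '{f}{last}',
    'organization': None,
    'emails': [{'value': 'fbach#fromthebachrow.com', 'type': 'personal', 'confidence': 91}],
}


def emails_from(result):
    """Hypothetical helper: every 'value' in result['emails'], or [] if 'emails' is missing or empty."""
    return [entry['value'] for entry in result.get('emails', [])]


# First email only: index the list, then the dict inside it
first_email = output['emails'][0]['value']
print(first_email)  # fbach#fromthebachrow.com

# All emails for the domain
all_emails = emails_from(output)
print(all_emails)  # ['fbach#fromthebachrow.com']
```

Inside the loop over the CSV rows, emails_from(hunter.domain_search(item[0])) would give the list of addresses for each URL.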
I am calling the list operation to retrieve the metadata values of a blob storage account.
My code looks like this:
blob_service_list = storage_client.blob_services.list('rg-exercise1', 'sa36730')
for items in blob_service_list:
    print(items.as_dict())
What's happening in this case is that the output of as_dict() contains only the fields that have a value:
{'id': '/subscriptions/0601ba03-2e68-461a-a239-98cxxxxxx/resourceGroups/rg-exercise1/providers/Microsoft.Storage/storageAccounts/sa36730/blobServices/default', 'name': 'default', 'type': 'Microsoft.Storage/storageAccounts/blobServices', 'sku': {'name': 'Standard_LRS', 'tier': 'Standard'}, 'cors': {'cors_rules': [{'allowed_origins': ['www.xyz.com'], 'allowed_methods': ['GET'], 'max_age_in_seconds': 0, 'exposed_headers': [''], 'allowed_headers': ['']}]}, 'delete_retention_policy': {'enabled': False}}
Whereas if I simply print items, the output is much larger:
{'additional_properties': {}, 'id': '/subscriptions/0601ba03-2e68-461a-a239-98c1xxxxx/resourceGroups/rg-exercise1/providers/Microsoft.Storage/storageAccounts/sa36730/blobServices/default', 'name': 'default', 'type': 'Microsoft.Storage/storageAccounts/blobServices', 'sku': <azure.mgmt.storage.v2021_06_01.models._models_py3.Sku object at 0x7ff2f8f1a520>, 'cors': <azure.mgmt.storage.v2021_06_01.models._models_py3.CorsRules object at 0x7ff2f8f1a640>, 'default_service_version': None, 'delete_retention_policy': <azure.mgmt.storage.v2021_06_01.models._models_py3.DeleteRetentionPolicy object at 0x7ff2f8f1a6d0>, 'is_versioning_enabled': None, 'automatic_snapshot_policy_enabled': None, 'change_feed': None, 'restore_policy': None, 'container_delete_retention_policy': None, 'last_access_time_tracking_policy': None}
Every field whose value is None has been dropped from the output of my example code. How can I extend my code to include the None fields and collect the final output as a list?
I tried this in my environment and got the results below.
If you need to include the None values in the dictionary, you can use the following code:
Code:
from azure.mgmt.storage import StorageManagementClient
from azure.identity import DefaultAzureCredential
storage_client = StorageManagementClient(credential=DefaultAzureCredential(), subscription_id="<your sub id>")
blob_service_list = storage_client.blob_services.list('v-venkat-rg', 'venkat123')
results = []
for items in blob_service_list:
    # as_dict() leaves out attributes whose value is None
    items_dict = items.as_dict()
    # add those None-valued attributes back from the model's __dict__
    for key, value in items.__dict__.items():
        if value is None:
            items_dict[key] = value
    results.append(items_dict)
print(results)
The code above ran successfully, and the output includes the fields whose value is None.
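The merge step itself does not depend on Azure. A minimal sketch with a hypothetical stand-in class (FakeModel is not part of azure-mgmt-storage) whose as_dict() drops None attributes the way the SDK models do, to show what the loop adds back:

```python
class FakeModel:
    """Hypothetical stand-in for an Azure SDK model; not part of azure-mgmt-storage."""

    def __init__(self, name, sku, change_feed=None):
        self.name = name
        self.sku = sku
        self.change_feed = change_feed

    def as_dict(self):
        # mimic the SDK behaviour: attributes set to None are left out
        return {k: v for k, v in self.__dict__.items() if v is not None}


def as_dict_with_none(model):
    """Return model.as_dict() plus every attribute whose value is None."""
    result = model.as_dict()
    for key, value in vars(model).items():
        if value is None:
            result[key] = value
    return result


models = [FakeModel("default", {"name": "Standard_LRS"}), FakeModel("other", None)]
results = [as_dict_with_none(m) for m in models]
print(results)
# [{'name': 'default', 'sku': {'name': 'Standard_LRS'}, 'change_feed': None}, {'name': 'other', 'sku': None, 'change_feed': None}]
```

Only None values are copied from __dict__; non-None attributes keep the converted form produced by as_dict().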
I'm trying to figure out how to add the same object to every array.
I'm requesting data from the server for the "first game". When I get it back, it doesn't include any data referencing the first game. So I need to edit it before I send it to my server to save.
The JSON I get back looks like this:
{
'dateTime': '2022-07-01T01:00:00.000000',
'httpStatus': 'OK',
'message': 'SUCCESS',
'details': None,
'detailsList': [
{
'date': '2021-07-01T00:00:00',
'tcount': 0,
'first_name': 'Sam',
'last_name': 'Smith'
},
{
'user_reg_date': '2022-06-01T00:00:00',
'tcount': 0,
'first_name': 'Bob',
'last_name': 'Jones'
}]
}
I'm trying to figure out how to add an object to each JSON array (I hope I'm saying that the right way) before I send it to MongoDB.
In this example:
'game': 'first'
It would then look like this:
{
'dateTime': '2022-07-01T01:00:00.000000',
'httpStatus': 'OK',
'message': 'SUCCESS',
'details': None,
'detailsList': [
{
'date': '2021-07-01T00:00:00',
'tcount': 0,
'first_name': 'Sam',
'last_name': 'Smith',
'game': 'first'
},
{
'user_reg_date': '2022-06-01T00:00:00',
'tcount': 0,
'first_name': 'Bob',
'last_name': 'Jones',
'game': 'first'
}]
}
If there is a better way to do this, that would work as well.
You're asking something similar to this question, except that you want to loop through a JSON array in Python.
You are not adding an 'object' to a JSON 'array'. You have a JSON object whose members are more commonly called properties. One of those properties is an array, and you want to add a new property to each object in that array.
The code below is the linked answer with some modifications to fit your needs; request_data is the JSON object from your example above. enumerate gives you the index of each entry, although it is not strictly required: item is the same dict object that is stored in the list, so item["game"] = 'first' already updates request_data, and the final assignment is redundant.
for index, item in enumerate(request_data['detailsList']):
    # Edit the array entry
    item["game"] = 'first'
    # Save it
    request_data['detailsList'][index] = item
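Because the dicts inside the list are modified in place, the index bookkeeping can be dropped. A minimal sketch using the data from the question, trimmed to the relevant keys:

```python
request_data = {
    'httpStatus': 'OK',
    'detailsList': [
        {'date': '2021-07-01T00:00:00', 'tcount': 0, 'first_name': 'Sam', 'last_name': 'Smith'},
        {'user_reg_date': '2022-06-01T00:00:00', 'tcount': 0, 'first_name': 'Bob', 'last_name': 'Jones'},
    ],
}

# Each item is the dict stored in the list, so assigning a key edits request_data directly
for item in request_data['detailsList']:
    item['game'] = 'first'

print([entry['game'] for entry in request_data['detailsList']])  # ['first', 'first']
```

If you would rather not mutate the response, {**entry, 'game': 'first'} inside a list comprehension builds new dicts instead.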
I'm having some difficulty looping through the contents of this JSON object.
The JSON looks like this:
[{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
}, ... ]
I want to loop through each dict in the list, check that its collection is not empty, and, if the collection's location equals '/450/', append that dict to a list.
My code is as follows.
content = json.loads(res.text)
for q in content:
    if q['collection']:
        for col in q['collection']:
            if col['location'] == '/450/':
                data.append(q)
print(data)
Having played around with it, I keep getting either ValueError: too many values to unpack (expected 2) or TypeError: string indices must be integers.
Any help with my structure would be much appreciated. Thanks.
Disclaimer:
I had previously written this as a list comprehension and it worked like a charm; however, that doesn't work anymore because I now need to check whether the collection is empty.
How I wrote it previously:
content = [ x for x in content if x['collection']['location'] == '/450/']
That should work for you:
for q in content:
    if q['collection']['location'] == '/450/':
        data.append(q)
print(data)
If you loop with for col in q['collection'], you only iterate over the keys of q['collection'], so col takes the values 'archived', 'authority_level', and so on.
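A short demonstration, using a trimmed copy of the collection dict from the question, of why col['location'] fails:

```python
collection = {'archived': False, 'authority_level': None, 'location': '/450/', 'name': 'eaf'}

# Iterating over a dict yields its keys (strings), not nested dicts
keys = [col for col in collection]
print(keys)  # ['archived', 'authority_level', 'location', 'name']

# So col['location'] indexes a str with a str, which raises TypeError
try:
    keys[0]['location']
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)  # TypeError
```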
From your previous list comprehension, "location" is a key in q["collection"].
When you write
for col in q["collection"]
you are iterating over the keys in q["collection"], and one of these keys is "location". Your loop iterates more than necessary; a single condition is enough:
if q['collection'] and "location" in q["collection"] and q["collection"]["location"] == "/450/":
    data.append(q)
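Because `and` short-circuits, the same check also fits back into a list comprehension like the one from the question. A minimal sketch with hypothetical, trimmed records (the None collection stands in for an empty one):

```python
content = [
    {'id': 1, 'collection': {'location': '/450/', 'name': 'eaf'}},
    {'id': 2, 'collection': None},              # empty collection
    {'id': 3, 'collection': {'location': '/999/'}},
]

# q['collection'] is checked first, so None or {} never reaches the .get('location') call
data = [q for q in content
        if q['collection'] and q['collection'].get('location') == '/450/']
print([q['id'] for q in data])  # [1]
```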
Your code has far more iterations than needed.
The error TypeError: string indices must be integers occurs at the second conditional statement, when you check col['location'] == "/450/".
That's because iterating over the collection dict yields its keys, which are strings such as 'archived', and a string cannot be indexed with 'location'.
Compare your old code with the modified code for a more in-depth understanding.
# Your old JSON data
content = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
} ]
data = []
for q in content:
    if q['collection']:
        for col in q['collection']:
            # col is a key of the collection dict (a string such as 'archived'), so col['location'] raises TypeError
            if col['location'] == '/450/':
                data.append(q)
print(data)
Here is the modified code:
# Your JSON data
json_datas = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
} ]
list_data = []  # list that collects every JSON record whose location is /450/
for data in json_datas:  # go through each JSON record
    if data["collection"]:  # continue only if collection is not None and not an empty dict
        if data['collection']['location'] == "/450/":  # check the location
            list_data.append(data)  # append if it matches
print(list_data)
You don't need to iterate over the collection object: it's a dictionary, so you just need to check its location property.
Also, in case the "collection" or "location" properties are missing, use dict.get(key) rather than dict[key]: the latter raises a KeyError if the key is not found, whereas get() returns None.
content = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
},
{'foo': None}
]
#content = json.loads(res.text)
data = []
for q in content:
    c = q.get('collection')
    if c and c.get('location') == '/450/':
        data.append(q)
print(data)
Output:
[{'archived': False, 'cache_ttl': None, 'collection': { 'location': '/450/', 'name': 'eaf', 'namespace': None }, ...}]
In Python 3, I need to get a JSON response from an API call and parse it into a dictionary that contains only the data I need.
The final dictionary I expect to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
The JSON response looks like this:
{
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'Rule': 'AUTODISABLED',
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
}
I was able to create a dictionary that contains the name and the rule group ID, like this:
response = requests.get(url, headers=headers)
output = response.json()
outputlist = output["RuleGroups"]
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
    for value in groupRuleID:
        ruleDic[key] = value
        groupRuleID.remove(value)
        break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
But when I tried to parse the nested JSON, things just didn't work.
In the end, I managed to write a function that returns this dictionary.
It breaks the JSON into three lists of the needed elements (Name, Id, and Rules from the first level), then builds another list from the nested Rules entries that keeps only each rule's "Id".
Finally, it creates the dictionary by calling zip on the lists created earlier.
from typing import Dict, List, Tuple
def get_filtered_rules(outputlist: List[dict]) -> Dict[str, Tuple[str, List[str]]]:
    # Top-level fields of each rule group
    groupRuleName = [li['Name'] for li in outputlist]
    groupRuleID = [li['Id'] for li in outputlist]
    ruleIDList = [li['Rules'] for li in outputlist]
    # For each group, keep only the "Id" of every nested rule
    ruleIDListClean = []
    for sublist in ruleIDList:
        lstID = [item['Id'] for item in sublist]
        ruleIDListClean.append(lstID)
    # Create a dictionary of NAME -> (group ID, [rule IDs])
    ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
    return ruleDic
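The same dictionary can also be built in one pass with a dict comprehension, which avoids keeping several parallel lists aligned. A sketch using the sample response from the question, trimmed to the keys involved:

```python
output = {
    'RuleGroups': [
        {'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Name': 'Severity Rules',
         'Rules': [{'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e', 'Name': 'Severity Rule'}]},
        {'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c', 'Name': 'auto_collector',
         'Rules': [{'Id': '2d6bdc1d-4064-11eb-8856-06364196e782', 'Name': 'auto_collector'}]},
    ]
}

# Name -> (group Id, [Id of every rule in the group])
rule_dic = {
    group['Name']: (group['Id'], [rule['Id'] for rule in group['Rules']])
    for group in output['RuleGroups']
}
print(rule_dic)
```

If a group can come back without a Rules list, group.get('Rules') or [] would guard against that.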
I'm very new to Python, so please bear with me.
When I try to convert the XML content into a list of dictionaries, I get output, but not what I expected, and I have tried a lot of variations.
XML Content:
<project>
<panelists>
<panelist panelist_login="pradeep">
<login/>
<firstname/>
<lastname/>
<gender/>
<age>0</age>
</panelist>
<panelist panelist_login="kumar">
<login>kumar</login>
<firstname>kumar</firstname>
<lastname>Pradeep</lastname>
<gender/>
<age>24</age>
</panelist>
</panelists>
</project>
Code I have used:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_file.xml')  # parse the XML file
root = tree.getroot()
Panelist_list = []
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = item.attrib
    Panelist_list.append(panelist_login)
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
Output:
[{
'panelist_login': 'pradeep'
}, {
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar'
}, {
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
and I'm expecting the output below:
[{
'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
I have referred to many Stack Overflow questions on ElementTree, but they still didn't help me.
Any help or suggestion is appreciated.
Your code appends the dict panelist_login, which holds the tag attributes, to the list separately from the Panelist dict, in this line: Panelist_list.append(panelist_login). So for every <panelist> tag the code appends 2 dicts: one with the tag attributes and one with the subtags. Inside the loop you have 2 append() calls, which means 2 items in the list on each pass through the loop.
But you actually want a single dict for each <panelist> tag, and you want the tag attribute to appear inside the Panelist dict as if it were a subtag also.
So have a single dict, and update the Panelist dict with the tag attributes instead of keeping the tag attributes in a separate dict.
Panelist_list = []
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = item.attrib
    Panelist.update(panelist_login)  # make panelist_login the first key of the dict
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
I get this output, which I think is what you had in mind:
[
{'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'},
{'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'}
]
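The same result can be written more compactly with dict unpacking. A self-contained sketch that parses the XML from the question with ET.fromstring instead of reading it from a file:

```python
import xml.etree.ElementTree as ET

xml_text = """<project>
  <panelists>
    <panelist panelist_login="pradeep">
      <login/><firstname/><lastname/><gender/><age>0</age>
    </panelist>
    <panelist panelist_login="kumar">
      <login>kumar</login><firstname>kumar</firstname><lastname>Pradeep</lastname><gender/><age>24</age>
    </panelist>
  </panelists>
</project>"""

root = ET.fromstring(xml_text)

# One dict per <panelist>: its attributes first, then one entry per child element
panelist_list = [
    {**item.attrib, **{child.tag: child.text for child in item}}
    for item in root.findall('./panelists/panelist')
]
print(panelist_list)
```

Empty elements such as <gender/> have no text, so their value is None, matching the expected output.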