Not able to get Country of a Tweet - Twython API - python

I am using the following code to collect tweets pertaining to a certain topic, but in all the tweets I have extracted the 'place' attribute is None. Am I doing something wrong? Also, the code is meant to search existing tweets; I do not need a streaming API solution, so I am not looking for this streaming-API approach: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API
api = Twython(consumer_key, consumer_secret, access_key, access_secret)
tweets = []
MAX_ATTEMPTS = 200
COUNT_OF_TWEETS_TO_BE_FETCHED = 10000
in_max_id = sys.argv[1]
next_max_id = ''
for i in range(0, MAX_ATTEMPTS):
    if COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets):
        break  # we got enough tweets... !!
    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    if 0 == i:
        # Query twitter for data.
        results = api.search(q="#something", count='100', lang='en', max_id=in_max_id, include_entities='true', geo=True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results = api.search(q="#something", include_entities='true', max_id=next_max_id, lang='en', geo=True)
    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        temp = ""
        tweet_text = result['text']
        temp += tweet_text.encode('utf-8') + " "
        hashtags = result['entities']['hashtags']
        for i in hashtags:
            temp += i['text'].encode('utf-8') + " "
        print result
        #temp += i["place"]["country"] + "\n"
        #output_file.write(temp)
    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get max_id to be passed in consequent call.
        next_results_url_params = results['search_metadata']['next_results']
        next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    except:
        # No more next pages
        break

The short answer is: no, you are doing nothing wrong. The reason all the place tags are empty is that statistically they are very unlikely to contain data: only about 1% of all tweets carry place data, because users rarely tag their location, and location is off by default.
Download 100 or more tweets and you will probably find place-tag data.

If the place field is a must for every tweet your app will process, you can limit your search to a place to make sure every result will have it.
You can do so by setting the geocode (latitude,longitude,radius[km/mi]) parameter, which limits the search to within an area.
An example of such a request via Twython:
geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
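For completeness, a minimal sketch (not from the original answer, assuming the api object from the question) of reading the country from each result while skipping tweets that still lack place data:
geocode = '25.032341,55.385557,100mi'
results = api.search(q="#something", count='100', lang='en',
                     include_entities='true', geocode=geocode)
for status in results['statuses']:
    place = status.get('place')  # can still be None for some tweets
    if place is not None:
        print(status['text'], '->', place['country'])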

Not all tweets have all fields (tweet_text, place, country, language, etc.).
To avoid a KeyError, modify your code so that a default value is returned when the key you are looking for is not found:
result.get('place', {}).get('country') if result.get('place') is not None else None
The line above means: fetch the key place if it exists, then look up the key country inside it; otherwise return None.
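A tiny illustration (a sketch, with a made-up status dict) of why the guard matters:
result = {'text': 'hello', 'place': None}
# result['place']['country'] would raise a TypeError here;
# the guarded expression returns None instead:
country = result.get('place', {}).get('country') if result.get('place') is not None else None
print(country)  # None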

kmario is right. Most tweets don't have this information, but a small percentage do. Doing a location search will increase this chance, e.g. https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1
"place": {
    "id": "cba60fe77bc80469",
    "url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
    "place_type": "city",
    "name": "Tallinn",
    "full_name": "Tallinn, Harjumaa",
    "country_code": "EE",
    "country": "Eesti",
    "contained_within": [],
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [
            [
                [24.5501404, 59.3518286],
                [24.9262886, 59.3518286],
                [24.9262886, 59.4981855],
                [24.5501404, 59.4981855]
            ]
        ]
    },
    "attributes": {}
},

Related

Reading JSON data in Python using Pagination, max records 100

I am trying to extract data from a REST API using Python and put it into one neat JSON file, and I'm having difficulty. The data is rather lengthy, with a total of nearly 4,000 records, but the maximum number of records the API allows per request is 100.
I've tried using some other examples to get through the code, and so far this is what I'm using (censoring the API URL and auth key for the sake of confidentiality):
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"
headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"

resp = requests.get(url, headers=headers)
resp.content.decode("utf-8")

vendors = []
new_results = True
page = 1
while new_results:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("results", [])
    vendors.extend(centiblock)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
For the life of me, I cannot figure out why it isn't working. The output keeps coming out as just:
[
"records"
]
If I play around with the print statement at the end, I can get it to print centiblock (so named for being a block of 100 records at a time) just fine - it gives me 100 records in unformatted text. However, if I try printing vendors at the end, the output is:
['records']
...which leads me to guess that somehow, the vendors array is not getting filled with the data. I suspect that I need to modify the get request where I define new_results, but I'm not sure how.
For reference, this is a censored look at how the json data begins, when I format and print out one centiblock:
{
    "records": [
        {
            "id": "XXX",
            "createdTime": "2018-10-15T19:23:59.000Z",
            "fields": {
                "Vendor Name": "XXX",
                "Main Phone": "XXX",
                "Street": "XXX",
Can anyone see where I'm going wrong?
Thanks in advance!
When you extend vendors with centiblock, you are giving a dict to the extend function. extend expects an iterable, so that works, but iterating over a Python dict only iterates over its keys - in this case, ['records'].
Note as well that your loop condition becomes False after the first iteration, because centiblock.get("results", []) returns []: "results" is not a key of the API's output, and [] is falsy.
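A tiny demo of those two points (a sketch, not from the original answer):
centiblock = {"records": [{"id": "XXX"}]}
print(list(centiblock))               # ['records'] - iterating a dict yields its keys
print(centiblock.get("results", []))  # [] - "results" is not a key, and [] is falsy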
Hence, to correct those errors, you need to read the correct field from the API response into new_results, and extend vendors with new_results, which is itself a list. Note that on the last iteration new_results will be the empty list, so vendors won't be extended with any null value and will contain exactly what you need.
This should look like:
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"
headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"

vendors = []
new_results = [True]  # non-empty sentinel so the loop body runs at least once
page = 1
while len(new_results) > 0:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("records", [])
    vendors.extend(new_results)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
Note that I replaced while new_results with while len(new_results) > 0 (initializing new_results as a non-empty list so the first iteration runs), which is equivalent in this case but more readable and better practice in general.
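One caveat worth flagging, separate from the fix above: some APIs paginate with a cursor token rather than a page number (Airtable's list-records endpoint, for instance, returns an offset field that you pass back on the next request). A hedged sketch of that style, assuming that response shape and reusing url and headers from above:
vendors = []
offset = None
while True:
    paged_url = url if offset is None else url + f"&offset={offset}"
    block = requests.get(paged_url, headers=headers).json()
    vendors.extend(block.get("records", []))
    offset = block.get("offset")  # absent on the last page
    if offset is None:
        break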

How to speed up dictionary search in python?

Basically, I have a function that returns an API response with a huge number of dictionaries, simplified to their keys. I then have another function, getPlayerData, which sends an API call to the same API to get information about one specific player instead of all of them. The problem is that on its own getPlayerData is fast, but called in a loop like this it is far too slow to be usable.
Is there a way I can speed this up? getPlayerData is not required; I can just make a plain request too.
The dictionary search
residents = []
for resident in getListData("resident"):
    if getPlayerData(resident)["town"] == town:
        residents.append(resident)
print(residents)
getPlayerData()
def getPlayerData(player):
    r = requests.get("http://srv.earthpol.com/api/json/residents.php?name=" + player)
    j = r.json()
    player = player.lower()
    global emptyresult
    emptyresult = False
    if str(j) == "{}":
        emptyresult = True
    else:
        result = {"town": j[player]["town"],
                  "town-rank": j[player]["townRank"],
                  "nation-ranks": j[player]["nationRanks"],
                  "lastOnline:": j[player]["lastOnline"],
                  "registered": j[player]["registered"],
                  "town-title": j[player]["title"],
                  "nation-title": j[player]["surname"],
                  "friends": j[player]["friends"],
                  "uuid": j[player]["uuid"],
                  "avatar": "https://crafatar.com/avatars/" + j[player]["uuid"]}
        return result
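If the API offers no bulk endpoint, a common generic speed-up (a sketch, not from the original post) is to issue the per-player requests concurrently, so the total time is bounded by the slowest call rather than the sum of all of them:
from concurrent.futures import ThreadPoolExecutor

def find_residents(town):
    names = getListData("resident")
    residents = []
    # Fan the getPlayerData calls out over a pool of threads.
    with ThreadPoolExecutor(max_workers=16) as pool:
        for name, data in zip(names, pool.map(getPlayerData, names)):
            if data is not None and data["town"] == town:
                residents.append(name)
    return residents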

How to retrieve customer id from create customer method in Square using Python

I'm creating a customer in Square and getting the result as follows. What I need is to get the id of the customer.
My code:
from square.client import Client

client = Client(
    access_token=settings.SQUARE_ACCESS_TOKEN,
    environment=settings.SQUARE_ENVIRONMENT,
)

api_customers = client.customers
request_body = {'idempotency_key': idempotency_key, 'given_name': name,
                'company_name': company, 'phone_number': phone}
result = api_customers.create_customer(request_body)
And this is the output:
<ApiResponse [{"customer":
    {"id": "F8M9KDHWPMYGK2108RMQVQ6FHC",
     "created_at": "2020-10-22T09:14:50.159Z",
     "updated_at": "2020-10-22T09:14:50Z",
     "given_name": "mkv5",
     "phone_number": "900000066666",
     "company_name": "codesvera",
     "preferences": {"email_unsubscribed": false},
     "creation_source": "THIRD_PARTY"}
}]>
Are you using this library?
https://github.com/square/square-python-sdk/blob/master/square/http/api_response.py
If yes, result is an ApiResponse object, not a plain dict.
So first you should do: result = result.body
Then, to get the ID: result['customer']['id']
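Putting that together, a short sketch using the is_success()/body pattern from the SDK example below:
result = api_customers.create_customer(request_body)
if result.is_success():
    customer_id = result.body['customer']['id']
    print(customer_id)  # e.g. "F8M9KDHWPMYGK2108RMQVQ6FHC"
elif result.is_error():
    print(result.errors)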
P.S.: There are examples in the GitHub docs:
https://github.com/square/square-python-sdk
# Initialize the customer count
total_customers = 0

# Initialize the cursor with an empty string since we are
# calling list_customers for the first time
cursor = ""

# Count the total number of customers using the list_customers method
while True:
    # Call list_customers method to get all customers in this Square account
    result = api_customers.list_customers(cursor)
    if result.is_success():
        # If any customers are returned, the body property
        # is a list with the name customers.
        # If there are no customers, APIResponse returns
        # an empty dictionary.
        if result.body:
            customers = result.body['customers']
            total_customers += len(customers)
            # Get the cursor if it exists in the result else set it to None
            cursor = result.body.get('cursor', None)
            print(f"cursor: {cursor}")
        else:
            print("No customers.")
            break
    # Call the error method to see if the call failed
    elif result.is_error():
        print(f"Errors: {result.errors}")
        break
    # If there is no cursor, we are at the end of the list.
    if cursor == None:
        break

print(f"Total customers: {total_customers}")

While loop to make API calls until a condition is met

I want to make API calls until a condition is met. I figured I might use a while loop.
I have a JSON response from the server that is paginated.
{
    "services": [
        {
            "id": "ABC12",
            "name": "Networks",
            "description": null,
            "status": "active",
            "teams": [
                {
                    "id": "XYZ12",
                    "type": "team_reference",
                    "summary": "Network Systems "
                }
            ],
            "acknowledgement_timeout": null,
            "auto_resolve_timeout": null,
            "alert_grouping": "intelligent",
            "alert_grouping_timeout": null,
            "integrations": [],
            "response_play": null,
            "type": "service",
            "summary": "All Events"
        }
    ],
    "limit": 25,
    "offset": 0,
    "total": null,
    "more": true
}
limit - max I can set is 100.
offset - If specified, shows results from that point.
more - If TRUE, there are more results. If FALSE, that is the end.
For more info on this pagination, see https://v2.developer.pagerduty.com/docs/pagination
I need to match the name "Networks" and get its corresponding id "ABC12". The problem is, I have to paginate and make multiple calls to the API.
I have written this so far.
import requests
import json
import urllib3

# Supress SSL warnings
urllib3.disable_warnings()

# API key
API_KEY = '12345asdfg'

def list_services():
    x = 25
    y = 0
    results = []
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    while current_page['more'] == 'True':
        y = y + 1
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)  # Does not print anything
    print(results)  # Prints only the first call results; the while loop
                    # doesn't seem to work.

if __name__ == '__main__':
    list_services()
The print(results) outside the while loop prints only the first API call's results; the while loop doesn't seem to run at all. Yet the code executes without any errors.
How do I keep the limit x at 25, make successive API calls, and append to results until more is false?
OR
How do I make multiple API calls until I find the match, and stop making calls once it's found?
Or is there a better, cleaner way to do this?
This does not work because you never actually reassign the url variable once y changes. Also, you are checking against 'True', which is a string, not a boolean value. In addition, the offset should increase by the number of results each time, not just by one: if your first call returns results 1-25 and you increase y by one, the second call will yield results 2-26. Instead you should increase it by the limit, so the second call returns the next 25 results. Here is how I would do this:
def list_services():
    x = 25
    y = 0
    results = []
    serv_id = None
    flag = False
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    for serv_set in current_page['services']:
        if serv_set['name'] == 'Networks':
            serv_id = serv_set['id']
            flag = True
    while current_page['more'] == True and not flag:
        for serv_set in current_page['services']:
            if serv_set['name'] == 'Networks':
                serv_id = serv_set['id']
                flag = True  # stop paginating once a match is found
                break
        y += x
        url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)
    print(results, serv_id)
You could further clean this up to avoid some redundancy but this should work. You should also check the status of the API call to ensure that you have a valid response.
Edit:
I edited in the logic for obtaining the id attribute when name == 'Networks'. Once again, you could reduce the redundancy here a lot, but this will get you on the right track. Now serv_id holds the id of the service named Networks; if no match is found by the end of the iterations, serv_id will be None.
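For instance, a tighter version of the same loop (a sketch under the same assumptions), letting requests build the query string and returning as soon as a match is found:
def find_service_id(name='Networks', limit=25):
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': f'Token token={API_KEY}',
    }
    offset = 0
    while True:
        resp = requests.get('https://api.pagerduty.com/services',
                            params={'limit': limit, 'offset': offset},
                            verify=False, headers=headers)
        resp.raise_for_status()  # fail loudly on a bad response
        page = resp.json()
        for service in page['services']:
            if service['name'] == name:
                return service['id']
        if not page['more']:
            return None  # exhausted all pages without a match
        offset += limit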
while current_page['more'] == 'True':
You are checking for the string 'True' instead of the boolean True that appears in your JSON response. This is why your while loop never executes and you never reach your inner print statement.
Also, for API calls that return more than one page of data, you generally need to specify which page you are requesting, which means you need to rebuild your request payload inside the while loop.
For example, if an API has a parameter called "page", in your while loop you would pass page = 1, page = 2, etc. as the payload.

IN Query not working for Amazon DynamoDB

I would like to retrieve items whose attribute value is present in a list of values I provide. Below is the query I have for searching. Unfortunately the response returns an empty list of items. I don't understand why this is the case and would like to know the correct query.
def search(self, src_words, translations):
    entries = []
    query_src_words = [word.decode("utf-8") for word in src_words]
    params = {
        "TableName": self.table,
        "FilterExpression": "src_word IN (:src_words) AND src_language = :src_language AND target_language = :target_language",
        "ExpressionAttributeValues": {
            ":src_words": {"SS": query_src_words},
            ":src_language": {"S": config["source_language"]},
            ":target_language": {"S": config["target_language"]}
        }
    }
    page_iterator = self.paginator.paginate(**params)
    for page in page_iterator:
        for entry in page["Items"]:
            entries.append(entry)
    return entries
Below is the table that I would like to query. For example, if my query_src_words list has [soccer ball, dog], then only the row with entry_id=2 should be returned.
Any insights would be much appreciated.
I think this is because in your query_src_words you have "soccer_ball" (with an underscore), while in the database you have "soccer ball" (without an underscore).
Change "soccer_ball" to "soccer ball" in your query_src_words and it should work fine.
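Separately from the underscore issue, one more thing worth checking (not part of the original answer): DynamoDB's IN operator compares the left-hand attribute against a list of individual operands, so binding all the words to a single string-set placeholder makes the filter compare src_word against the set itself, which never matches a string. A hedged sketch of expanding one placeholder per word:
# Hypothetical rewrite of the params construction from the question.
placeholders = {f":w{i}": {"S": word} for i, word in enumerate(query_src_words)}
params = {
    "TableName": self.table,
    "FilterExpression": (
        f"src_word IN ({', '.join(placeholders)}) "
        "AND src_language = :src_language AND target_language = :target_language"
    ),
    "ExpressionAttributeValues": {
        **placeholders,
        ":src_language": {"S": config["source_language"]},
        ":target_language": {"S": config["target_language"]},
    },
}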
