I'm working on a take-home project and I'm a little stuck on the last part: actually mapping the data from the provided CSV file to the API response data I received.
The project involves writing a Python script that uses an API to find project awards.
Using the CSV file provided, I need to map each project award from the API responses to the award winner within the CSV file.
My Python script has to output a CSV file with the original CSV file data plus the additional data provided via the API.
Here is what I have so far, in two separate files.
I'm interviewing for a junior role, so please be kind.
Thank you for your help!
import requests

# --- file 1: pull projects from the FederalRePORTER API ---
url = 'https://api.federalreporter.nih.gov/v1/Projects/search?query=query%3Dorgstate%3ANY%2CDE%2CMD%2CNJ%2CPA%2CCT%2CRI%2CMA%2CVT%2CNH%2CME%24fy%3A2019%24agency%3ANIH&offset=1'
r = requests.get(url)
print("Status code:", r.status_code)

response_dict = r.json()
projects = response_dict['items']
print("items returned:", len(projects))  # number of projects under the 'items' key

# Inspect the keys available on a single project
first_project = projects[0]
print("\nKeys:", len(first_project))
for key in sorted(first_project.keys()):
    print(key)

print("\nSelected information about each project:")
for project in projects:
    print('Project number:', project['projectNumber'])
    print('Agency:', project['agency'])
    print('Title:', project['title'])
    print('Department:', project['department'])
    print("FY:", project['fy'])
    print('Total Cost Amount:', project['totalCostAmount'])
    print("State:", project['orgState'])
# --- file 2: read the provided CSV ---
import csv

filename = 'legislators.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    lines = f.readlines()  # remaining raw lines after the header

print(header_row)
for index, column_header in enumerate(header_row):
    print(index, column_header)

print(lines)
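For the mapping step itself, here is a minimal sketch of one way the join could work. It assumes the provided CSV has a column, called project_number below purely as a placeholder, whose values match the API's projectNumber field; the file names are placeholders too, so adjust both to the actual take-home data.

import csv
import requests

# Sketch only: 'awards.csv', 'awards_with_api_data.csv' and 'project_number'
# are placeholders for whatever the provided CSV actually contains.
url = ('https://api.federalreporter.nih.gov/v1/Projects/search'
       '?query=query%3Dorgstate%3ANY%2CDE%2CMD%2CNJ%2CPA%2CCT%2CRI%2CMA%2CVT%2CNH%2CME'
       '%24fy%3A2019%24agency%3ANIH&offset=1')
projects = requests.get(url).json()['items']

# Index the API results by project number for quick lookup.
by_number = {p['projectNumber']: p for p in projects}
extra_cols = ['agency', 'title', 'department', 'fy', 'totalCostAmount', 'orgState']

with open('awards.csv', newline='') as infile, \
        open('awards_with_api_data.csv', 'w', newline='') as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + extra_cols)
    writer.writeheader()
    for row in reader:
        match = by_number.get(row['project_number'], {})  # placeholder column name
        for col in extra_cols:
            row[col] = match.get(col, '')
        writer.writerow(row)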
I am using Python's twarc2 and the Twitter API to crawl tweets. So far I have code that crawls tweets according to a query and saves them to a CSV file.
I want to expand on this code: for each tweet crawled, I want to crawl some number (e.g. 2) of additional tweets from the same user (only for users with enough additional tweets available). I would like to save the first group of tweets to one CSV file, and the additional 2 tweets per user to a separate CSV file.
Here is my current code for just crawling tweets:
import json
from twarc.client2 import Twarc2
from twarc.expansions import ensure_flattened
from twarc_csv import CSVConverter
import unicodedata
import csv

# query for tweets
t = Twarc2(bearer_token="<bearer token>")
query = "lang:en"
print(f"Searching for \"{query}\" tweets")
search_results = t.search_all(query=query, max_results=100)

# Get all results page by page. Open the output file once, so every page is
# kept (opening it with "w+" inside the loop overwrote the file each time):
count = 0
with open("tbt_results.jsonl", "w", encoding="utf-8") as f:
    for page in search_results:
        count += 1
        if count == 20:
            break
        # Do something with the page of results:
        f.write(json.dumps(page) + "\n")
        print("Wrote a page of results...")

print("Converting to CSV...")
# This assumes `tbt_results.jsonl` is finished writing.
with open("tbt_results.jsonl", "r", encoding="utf-8") as infile:
    with open("tbt_output.csv", "w", encoding="utf-8") as outfile:
        converter = CSVConverter(infile, outfile)
        converter.process()
print("Finished.")
How should I add to my current code to achieve my goal? Any help is much appreciated.
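One possible direction for the per-user part, sketched here with a hypothetical helper name and hypothetical file names, and not tested: for each tweet in a page of search results, pull a page of that author's timeline with the client's timeline method (check its exact signature against the twarc2 docs), keep up to two other tweets from users that have enough of them, and write those to a second JSONL file that you then convert with CSVConverter the same way as the first one.

# Hedged sketch (untested): gather up to 2 extra tweets per author from a page
# of search results. `extra_tweets_for_page` is a hypothetical helper name.
def extra_tweets_for_page(client, page, per_user=2):
    extras = []
    for tweet in ensure_flattened(page):
        author_id = tweet.get("author", {}).get("id") or tweet.get("author_id")
        # One page of the user's timeline should be enough to find a couple of
        # tweets; verify the `timeline` signature in the twarc2 docs.
        for timeline_page in client.timeline(author_id):
            others = [tw for tw in ensure_flattened(timeline_page)
                      if tw["id"] != tweet["id"]][:per_user]
            if len(others) == per_user:  # only users with enough extra tweets
                extras.extend(others)
            break
    return extras

# Inside the paging loop above, after writing `page` to tbt_results.jsonl:
#     with open("tbt_extra_results.jsonl", "a", encoding="utf-8") as extra_f:
#         for extra in extra_tweets_for_page(t, page):
#             extra_f.write(json.dumps(extra) + "\n")
# Then convert tbt_extra_results.jsonl with CSVConverter the same way as above
# (check that your twarc_csv version accepts one tweet object per line).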
I am looking for some assistance with writing API results to a .CSV file using Python.
I have my source as a CSV file. It contains the below URLs in a column, as separate rows.
https://webapi.nhtsa.gov/api/SafetyRatings/modelyear/2013/make/Acura/model/rdx?format=csv
https://webapi.nhtsa.gov/api/SafetyRatings/modelyear/2017/make/Chevrolet/model/Corvette?format=csv
I can call the Web API and print the results.
When I try to export these results into a csv, not all the records are transferred; right now only the last record ends up in the csv.
My final output should contain the results for all the given inputs.
Below is the Python code that I have used. I appreciate your help on this.
import csv, requests

with open('C:/Desktop/iva.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        urls = row[0]
        print(urls)
        r = requests.get(urls)
        print(r.text)
        with open('C:/Desktop/ivan.csv', 'w') as csvfile:
            csvfile.write(r.text)
You'll have to create a writer object around the output csv file (the one to be created) and use its writerow() method to write rows to it.
import csv, requests

with open('C:/Desktop/iva.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        urls = row[0]
        print(urls)
        r = requests.get(urls)
        print(r.text)
        with open('C:/Desktop/ivan.csv', 'a', newline='') as csvfile:
            writerobj = csv.writer(csvfile)  # the writer wraps the output file, not the response text
            for line in csv.reader(r.text.splitlines()):
                writerobj.writerow(line)
One problem in your code is that every time you open a file using open with mode 'w', any existing content in that file is lost. You could prevent that by using append mode, open(filename, 'a'), instead.
But even better: just open the output file once, outside the for loop.
import csv, requests

with open('iva.csv') as infile, open('ivan.csv', 'w') as outfile:
    reader = csv.reader(infile)
    for row in reader:
        r = requests.get(row[0])  # the url is in row[0]
        outfile.write(r.text)
I am trying to bulk download movie information from The Movie Database. The preferred method mentioned on their website is to loop through movie IDs from 1 up to the most recent movie ID. When I pull individual movies using their ID, I get the entire set of information. However, when I pull it inside a loop, I receive an error 34, "resource cannot be found". For my example, I picked a movie ID that I have already grabbed individually (Skyfall, 37724), which still returns the "resource cannot be found" error.
import requests

dataset = []
for i in range(37724, 37725):
    url = 'https://api.themoviedb.org/3/movie/x?api_key=*****&language=en-US'
    movieurl = url[:35] + str(i) + url[36:]
    payload = "{}"
    response = requests.request("GET", url, data=payload)
    data = response.json()
    dataset.append(data)
    print(movieurl)

dataset
[ANSWERED] 1) Is there a reason why the loop cannot pull the information? Is this a programming issue or something specific to the API?
2) Is the way my code is set up the best way to pull the information and store it in bulk? My ultimate goal is to create a CSV file with the data.
Your request uses url, while your actual url is in the movieurl variable.
To write your data to csv, I would recommend the python csv DictWriter, as your data are dicts (response.json() produces a dict).
BONUS: If you want to format a string, use the string.format method:
url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
This is much more robust.
A working, improved version of your code, writing the results to csv, would be:
import csv
import requests

with open('output.csv', 'w', newline='') as csvfile:
    writer = None
    for i in range(37724, 37725):
        url = 'https://api.themoviedb.org/3/movie/{id}?api_key=*****&language=en-US'.format(id=i)
        response = requests.get(url)
        data = response.json()
        if writer is None:
            # DictWriter needs the field names; take them from the first response
            writer = csv.DictWriter(csvfile, fieldnames=list(data.keys()))
            writer.writeheader()
        writer.writerow(data)
I am trying to write a script (Python 2.7.11, Windows 10) to collect data from an API and append it to a csv file.
The API I want to use returns data in JSON.
It limits the number of displayed records, though, and pages them.
So there is a maximum number of records you can get with a single query, and then you have to run another query, changing the page number.
The API tells you how many pages a dataset is divided into.
Let's assume that the maximum number of records per page is 100 and the number of pages is 2.
My script:
import json
import urllib2
import csv

url = "https://some_api_address?page="
limit = "&limit=100"
myfile = open('C:\Python27\myscripts\somefile.csv', 'ab')

def api_iterate():
    for i in xrange(1, 2, 1):
        parse_url = url,(i),limit
        json_page = urllib2.urlopen(parse_url)
        data = json.load(json_page)
        for item in data['someobject']:
            print item ['some_item1'], ['some_item2'], ['some_item3']
        f = csv.writer(myfile)
        for row in data:
            f.writerow([str(row)])
This does not seem to work: it creates a csv file, but the file is not populated. There is obviously something wrong with either the part of the script that builds the address for the query, the part reading the json, or the part writing the results to csv. Or all of them.
I have tried using other resources and tutorials, but at some point I got stuck and I would appreciate your assistance.
The url you have given provides a link to the next page as one of the objects. You can use this to iterate automatically over all of the pages.
The script below gets each page, extracts two of the entries from the Dataobject array and writes them to an output.csv file:
import json
import urllib2
import csv

def api_iterate(myfile):
    url = "https://api-v3.mojepanstwo.pl/dane/krs_osoby"
    csv_myfile = csv.writer(myfile)
    cols = ['id', 'url']
    csv_myfile.writerow(cols)  # Write a header

    while True:
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()

        for data_object in data['Dataobject']:
            csv_myfile.writerow([data_object[col] for col in cols])

        try:
            url = data['Links']['next']  # Get the next url
        except KeyError as e:
            break

with open(r'e:\python temp\output.csv', 'wb') as myfile:
    api_iterate(myfile)
This will give you an output file looking something like:
id,url
1347854,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1347854
1296239,https://api-v3.mojepanstwo.pl/dane/krs_osoby/1296239
705217,https://api-v3.mojepanstwo.pl/dane/krs_osoby/705217
802970,https://api-v3.mojepanstwo.pl/dane/krs_osoby/802970
I have this (simplified for here) code:
csv_file = open('/'.join([settings.MEDIA_ROOT, 'people.csv']), 'wb')
writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
for person in Person.objects.all():
    writer.writerow(['name', 'blabla', 'blablablanbl'])
    writer.writerow(['adress', 'blabla', 'blablabla'])
    #... many other things
csv_file.close()

csv_url = 'http://' + request.META['HTTP_HOST'] + "/media/%s" % os.path.basename(csv_file.name)
Then I return the URL of the newly written csv file to a remote API endpoint, and that endpoint reads the csv file back into its database, etc.
Now the thing is that sometimes the csv file is written correctly at my end, but the other endpoint gets an incomplete csv file, e.g. the last 3 rows are missing.
What am I missing here? Do I need to make all the "write-close-return" steps atomic?
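One common workaround, if the remote endpoint can fetch the URL before the file has been fully flushed and closed, is to write to a temporary file and only move it into place atomically once everything has reached disk. A minimal sketch in Python 3, with a hypothetical helper name:

import csv
import os
import tempfile

def write_people_csv(final_path, rows):
    # Hypothetical helper: write to a temp file in the same directory, then
    # atomically swap it into place, so the media URL never serves a
    # half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(final_path), suffix='.csv')
    with os.fdopen(fd, 'w', newline='') as tmp:
        writer = csv.writer(tmp, quoting=csv.QUOTE_ALL)
        for row in rows:
            writer.writerow(row)
        tmp.flush()
        os.fsync(tmp.fileno())        # make sure the bytes are on disk before the rename
    os.replace(tmp_path, final_path)  # atomic on the same filesystem

Returning the URL only after os.replace has run means the other endpoint sees either the old complete file or the new complete one, never a partial write.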