I'm trying to pull information from NIST's NVD, and I'm having trouble appending to an existing JSON file as I grab data. I can see that the code overwrites the existing data on each iteration, but I'm not sure how to tell the for loop to append instead of overwrite.
#Block of code:
jsonPathMedium = (filepath)
jsonPathMediumCompressed = (filepath)

base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0?cvssV3Severity=MEDIUM&resultsPerPage=2000&"
headers = {"Accept": "application/json", "Authorization": "Bearer 123456 "}
ids = ['startIndex=0', 'startIndex=2000']
jsonAppend = []

for id in ids:
    responseMedium = requests.get(base_url + str(id), headers=headers)
    jsonAppend.append(responseMedium)
    print('Grabbed data, making next request.')
print('Finishing pulling data.')

# converts data into json
jsonPrintMedium = responseMedium.json()
jsonObjectMedium = json.dumps(jsonPrintMedium, indent=4)

with open(jsonPathMedium, "w") as jsonFileMedium:
    jsonFileMedium.write(str(jsonObjectMedium))
    jsonFileMedium.close
print('Wrote to medium severity JSON file.')

mediumIn = open(jsonPathMedium, 'rb')
mediumOut = gzip.open(jsonPathMediumCompressed, 'wb')
mediumOut.writelines(mediumIn)
mediumOut.close
mediumIn.close
print('Compressed medium severity JSON file.')
Let's think about this in words. If I understand correctly, you want to do something like this:
for each id in a list of ids
    get the JSON from an HTTP request for that specific id
    append this JSON to a list of all of the results
write the list of results to a file
You already have some of the code for this, so I will borrow from it. There are a few details you are not quite getting right, and I'll point those out in the code below. Here's what I suggest the code should be:
base_url = "https://services.nvd.nist.gov/rest/json/cves/2.0?cvssV3Severity=MEDIUM&resultsPerPage=2000&"
headers = {"Accept": "application/json", "Authorization": "Bearer 123456 "}
ids = ['startIndex=0', 'startIndex=2000']
jsonAppend = []
for id in ids:
responseMedium = requests.get(base_url + str(id), headers=headers)
jsonAppend.append(responseMedium.json()) # <----- parse the JSON from the request here
print('Grabbed data, making next request.')
json.dump(jsonPathMedium, jsonAppend) # <---- write all of the list to a single file, no need to make this more complicated
If you want to write the JSON to a compressed file, then I recommend you just do that directly; don't write it to an uncompressed file first.
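A minimal sketch of that direct approach (reusing the jsonAppend list and the jsonPathMediumCompressed path from the snippets above) could look like this:

import gzip
import json

# open the gzip file in text mode and dump the collected list straight into it
with gzip.open(jsonPathMediumCompressed, "wt", encoding="utf-8") as gz:
    json.dump(jsonAppend, gz, indent=4)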
I am trying to download files using Python and then add lines at the end of the downloaded files, but it returns an error:
f.write(data + """<auth-user-pass>
TypeError: can't concat str to bytes
I also tried something like this, but it did not work either: f.write(str(data) + "< auth-user-pass >")
Here is my full code:
import os
import requests
from multiprocessing.pool import ThreadPool

def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")
    return url

servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]

urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")

# Run 5 multiple threads. Each call will take the next element in urls list
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
EDIT: Thanks, it works now when I do this b"""< auth-user-pass >""", but I only want to add the string at the end of the file. When I run the code, it adds the string for every line.
Try this:
import os
import requests
from multiprocessing.pool import ThreadPool

def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data)
        # append the block once, after the whole file has been downloaded
        with open(complete_path, 'ab') as f:
            f.write(b"""<auth-user-pass>
username
password
</auth-user-pass>""")
    return url

servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]

urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")

# Run 5 multiple threads. Each call will take the next element in urls list
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
You are opening the file in binary mode, so encode your string before concatenating it. That is, replace

for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")

with

for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""".encode())
You opened the file for writing in binary mode. Because of that you can't write normal strings, as the comment from #user56700 said. You either need to encode the string or open the file another way (e.g. 'a' for appending). Also keep in mind that opening a file with 'w' or 'wb' deletes its existing data, so if you want to keep what is already in the file you need to switch to an append mode like 'ab'.
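To illustrate the difference between those modes, here is a small, self-contained example (the file name and contents are made up purely for illustration):

# 'wb' truncates: anything previously in example.txt is discarded before writing
with open('example.txt', 'wb') as f:
    f.write(b'original file contents\n')

# 'ab' appends: the new bytes land after the existing contents
with open('example.txt', 'ab') as f:
    f.write(b'<auth-user-pass>\n')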
I'm attempting to write a code in Python that will take the rows from a CSV file and pass them to an API call. If there is a successful return, I'd like to append yes to the match column that I added. If no data is returned, append no instead.
This is the current code to return the matching results of the first row:
headers = {
    'Authorization': {token},
    'Content-Type': 'application/json; charset=utf-8',
}

data = '[{
    "name": "Company 1",
    "email_domain": "email1.com",
    "url": "https://www.url1.com"
}]'

response = requests.post(
    'https://{base_url}/api/match',
    headers=headers,
    data=data
)
This code works for each row if I manually pass the data to the API call, but since there are hundreds of rows, I'd like to iterate through each row, pass it through the API call, and append yes or no to the match column that I created. Writing for loops is not my strong suit, but I believe that's the way to attack this, and I would love any input from someone who has done something similar.
Since you want to iterate over your csv file, you will have to use a for loop. csv.DictReader can convert the csv file into a sequence of dictionaries, which is exactly what you need:

import csv
import json

with open('filename.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data = json.dumps([row])  # serialize the row as a one-element JSON array

Alternatively, you can use pandas.DataFrame.to_json(orient="index").
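For example, a rough sketch of the pandas route (assuming the CSV has name, email_domain and url columns, and that headers and base_url are defined as in the question) might look like this:

import pandas as pd
import requests

df = pd.read_csv('filename.csv')
for _, row in df.iterrows():
    # a one-row frame with orient='records' yields the '[{...}]' payload shape used above
    data = row.to_frame().T[['name', 'email_domain', 'url']].to_json(orient='records')
    response = requests.post(f'https://{base_url}/api/match', headers=headers, data=data)
    print('yes' if response.ok else 'no')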
Assuming you have a src.csv file with the following contents:
company_id,company_name,url,company_domain,match
1,Company 1,https://www.url1.com,email1.com,
2,Company 2,https://www.url2.com,email2.io,
3,Company 3,https://www.url3.com,email3.com,
The following code snippet will read it and create a new tgt.csv file with the match column of each row set to yes or no, based on the result of requests.post() (you need to adjust it for your logic though):
import csv
import json
import requests

token = 'Your API Token Here'
base_url = 'https://some.base.url'
headers = {
    'Authorization': token,
    'Content-Type': 'application/json; charset=utf-8',
}

with open('src.csv') as src, open('tgt.csv', 'w', newline='') as tgt:
    reader = csv.reader(src)
    writer = csv.writer(tgt)
    columns = next(reader)
    writer.writerow(columns)
    for company_id, company_name, url, company_domain, match in reader:
        data = json.dumps([{
            'name': company_name,
            'email_domain': company_domain,
            'url': url
        }])
        response = requests.post(
            f'{base_url}/api/match',
            headers=headers,
            data=data
        )
        if response.ok:
            match = 'yes'
        else:
            match = 'no'
        writer.writerow((company_id, company_name, url, company_domain, match))
This is what I am supposed to do:
List all files in the data/feedback folder
Scan all the files and make a nested dictionary with Title, Name, Date & Feedback (every file contains Title, Name, Date & Feedback, each on a separate line, which is why I'm using the rstrip function)
Post the dictionary to the given URL
Following is my code:
#!/usr/bin/env python3
import os
import os.path
import requests
import json

src = '/data/feedback/'
entries = os.listdir(src)
Title, Name, Date, Feedback = 'Title', 'Name', 'Date', 'Feedback'
inputDict = {}

for i in range(len(entries)):
    fileName = entries[i]
    completeName = os.path.join(src, fileName)
    with open(completeName, 'r') as f:
        line = f.readlines()
        line_tuple = (line[0], line[1], line[2], line[3])
    inputDict[fileName] = {}
    inputDict[fileName][Title] = line_tuple[0].rstrip()
    inputDict[fileName][Name] = line_tuple[1].rstrip()
    inputDict[fileName][Date] = line_tuple[2].rstrip()
    inputDict[fileName][Feedback] = line_tuple[3].rstrip()

x = requests.get("http://website.com/feedback")
print(x.status_code)

r = requests.post("http://Website.com/feedback", data=inputDict)
print(r.status_code)
After I run it, the GET gives a 200 code but the POST gives a 500 code. I just want to know whether my script is causing the error or not.
r = requests.post("http://Website.com/feedback", data=inputDict)

If your REST API endpoint is expecting JSON data, then the line above is not doing that; it is sending the dictionary inputDict as form-encoded, as though you were submitting a form on an HTML page.

You can either use the json parameter of the post function, which sets the Content-Type in the headers to application/json:

r = requests.post("http://Website.com/feedback", json=inputDict)

or set the header manually:

headers = {'Content-type': 'application/json'}
r = requests.post("http://Website.com/feedback", data=json.dumps(inputDict), headers=headers)
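As a quick illustration with made-up sample data (the URL is the placeholder from the question):

import requests

# made-up sample values purely for illustration
inputDict = {
    "feedback01.txt": {
        "Title": "Example title",
        "Name": "Example name",
        "Date": "2021-05-04",
        "Feedback": "Example feedback",
    }
}

# json= serializes the nested dictionary and sets the Content-Type header for you
r = requests.post("http://website.com/feedback", json=inputDict)
print(r.status_code)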
I'm not sure how I should ask this question. I'm looping through a csv file using pandas (at least I think so). As I loop through the rows, I want to pass a value from a specific column to an HTTP request for each row.
Here is my code so far:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f,)
        value = df[['ID']].to_string(index=False)
        print(value)
        response = requests.get(
            REQUEST_URL + value,
            headers={'accept': 'application/json', 'ClientToken': TOKEN}
        )
        json_response = response.json()
        print(json_response)
As you can see, I'm looping through the csv file to get the ID to pass it to my request url.
I'm not sure I understand the issue, but looking at the console log it seems that print(value) is in the loop while the request is not. In other words, in the console log I see all the IDs printed, but only one HTTP request, which is empty (probably because the ID is not correctly passed to it).
I'm running my script with cloud functions.
Actually, forgo the use of the Pandas library and simply iterate with the csv module:
import csv

def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        reader = csv.reader(f)
        next(reader, None)  # SKIP HEADERS
        for row in reader:  # LOOP THROUGH GENERATOR (NOT PANDAS SERIES)
            value = row[0]  # SELECT FIRST COLUMN (ASSUMED ID)
            response = requests.get(
                REQUEST_URL + value,
                headers={'accept': 'application/json', 'ClientToken': TOKEN}
            )
            json_response = response.json()
            print(json_response)
Give this a try instead:
def api_request(request):
    fs = gcsfs.GCSFileSystem(project=PROJECT)
    with fs.open('gs://project.appspot.com/file.csv') as f:
        df = pd.read_csv(f)
    for value in df['ID']:
        response = requests.get(
            REQUEST_URL + value,
            headers={'accept': 'application/json', 'ClientToken': TOKEN}
        )
        json_response = response.json()
        print(json_response)
As mentioned in my comment, you haven't iterated through the data. What you are seeing is just the string representation of it with line breaks (which might be why you mistakenly thought you were looping).
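To see the difference, here is a small, self-contained illustration (the IDs are made up):

import pandas as pd

df = pd.DataFrame({'ID': ['a1', 'b2', 'c3']})

# to_string() flattens the whole column into one newline-separated string,
# so appending it to REQUEST_URL builds a single, malformed request
print(repr(df[['ID']].to_string(index=False)))

# iterating over the column yields one value per row, i.e. one request each
for value in df['ID']:
    print(value)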
My problem is that I don't know how to work with the result of a GIF search. I started from an example; I know how to modify some parameters, but I don't know how to get the actual GIFs out of the result. Code:
import requests
import json

# set the apikey and limit
apikey = "MYKEY"  # test value
lmt = 8

# load the user's anonymous ID from cookies or some other disk storage
# anon_id = <from db/cookies>
# ELSE - first time user, grab and store their anonymous ID
r = requests.get("https://api.tenor.com/v1/anonid?key=%s" % apikey)

if r.status_code == 200:
    anon_id = json.loads(r.content)["anon_id"]
    # store in db/cookies for re-use later
else:
    anon_id = ""

# our test search
search_term = "love"

# get the top 8 GIFs for the search term
r = requests.get(
    "https://api.tenor.com/v1/search?q=%s&key=%s&limit=%s&anon_id=%s" %
    (search_term, apikey, lmt, anon_id))

if r.status_code == 200:
    # load the GIFs using the urls for the smaller GIF sizes
    top_8gifs = json.loads(r.content)
    print(top_8gifs)
else:
    top_8gifs = None
I would like to download the files. I know I can do it with urllib and requests, but the problem is that I don't even know what top_8gifs is.
I hope someone can help me. I'm waiting for your answer, thanks for your attention!!
First of all, you have to use a legitimate key instead of MYKEY. Once you have done that, you'll see that this code prints the output of the GET request you sent. It is JSON, which behaves like a dictionary in Python, so you can work through that dictionary to obtain the urls. The best strategy is simply to print the output and observe the structure of the dictionary carefully, then extract the url from it. For more clarity you can use the pprint module, which shows nicely how a JSON document is structured. Here is a modified version of your code which pretty-prints the JSON, prints the GIF urls and downloads the GIF files. You can improve upon it and play with it if you want.
import requests
import json
import urllib.request, urllib.parse, urllib.error
import pprint

# set the apikey and limit
apikey = "YOURKEY"  # test value
lmt = 8

# load the user's anonymous ID from cookies or some other disk storage
# anon_id = <from db/cookies>
# ELSE - first time user, grab and store their anonymous ID
r = requests.get("https://api.tenor.com/v1/anonid?key=%s" % apikey)

if r.status_code == 200:
    anon_id = json.loads(r.content)["anon_id"]
    # store in db/cookies for re-use later
else:
    anon_id = ""

# our test search
search_term = "love"

# get the top 8 GIFs for the search term
r = requests.get(
    "https://api.tenor.com/v1/search?q=%s&key=%s&limit=%s&anon_id=%s" %
    (search_term, apikey, lmt, anon_id))

if r.status_code == 200:
    # load the GIFs using the urls for the smaller GIF sizes
    pp = pprint.PrettyPrinter(indent=4)
    top_8gifs = json.loads(r.content)
    pp.pprint(top_8gifs)  # pretty prints the json response
    for i in range(len(top_8gifs['results'])):
        url = top_8gifs['results'][i]['media'][0]['gif']['url']  # This is the url from the json
        print(url)
        urllib.request.urlretrieve(url, str(i) + '.gif')  # Downloads the gif file
else:
    top_8gifs = None