I am a biologist with a little programming experience in Python. One of my research methods involves profiling large gene lists using this database: https://david.ncifcrf.gov/
Can anyone advise me on whether it would be possible to do a keyword search of the output and return the gene name associated with the keyword? This is for the "Table" output, which looks like this: https://david.ncifcrf.gov/annotationReport.jsp?annot=59,12,87,88,30,38,46,3,5,55,53,70,79&currentList=0
There are also backend and API options.
All insight and advice is greatly appreciated.
If there is an API that gives you all the data, you can automate almost everything associated with it. APIs are either REST or SOAP, so first you need to figure out which one you are dealing with.
If the API is RESTful:
import urllib2, json

url = "https://mysuperapiurl.com/api-ws/api/port/"
u = 'APIUsername'
p = 'APIPassword'

def encodeUserData(user, password):
    return "Basic " + (user + ":" + password).encode("base64").rstrip()

req = urllib2.Request(url)
req.add_header('Accept', 'application/json')
req.add_header("Content-type", "application/x-www-form-urlencoded")
req.add_header('Authorization', encodeUserData(u, p))
res = urllib2.urlopen(req)
j = json.load(res)        # all the data from the API as a Python object
json_str = json.dumps(j)  # the same data serialized back to a string
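The snippet above is Python 2. If you are on Python 3, the same call is simpler with the requests library; here is a minimal sketch using the same placeholder URL and credentials:

import requests

url = "https://mysuperapiurl.com/api-ws/api/port/"
# requests builds the Basic auth header for you
response = requests.get(url, auth=('APIUsername', 'APIPassword'),
                        headers={'Accept': 'application/json'})
j = response.json()  # all the data from the API as a Python object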
If the API is SOAP, it gets a bit harder. What I recommend is zeep. If that is not possible because your server runs Python 2.6, or because several people are already working with it, then use suds.
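For zeep, a minimal sketch looks like this (the WSDL URL and operation name are placeholders):

from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

# authenticate the underlying HTTP session, then hand it to zeep
session = Session()
session.auth = HTTPBasicAuth('APIUsername', 'APIPassword')
client = Client('https://mysuperapiurl.com/service?wsdl',
                transport=Transport(session=session))
result = client.service.GetServiceById('argument_1', 'argument_2')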
With suds, an API call looks like this:
import logging, time, requests, re, suds_requests
from datetime import timedelta, date, datetime, tzinfo
from requests.auth import HTTPBasicAuth
from suds.client import Client
from suds.wsse import *
from suds import null, TypeNotFound
from cStringIO import StringIO
from bs4 import BeautifulSoup as Soup

log_stream = StringIO()
logging.basicConfig(stream=log_stream, level=logging.INFO)
logging.getLogger('suds.transport').setLevel(logging.DEBUG)
logging.getLogger('suds.client').setLevel(logging.DEBUG)

WSDL_URL = 'http://213.166.38.97:8080/SRIManagementWS/services/SRIManagementSOAP?wsdl'
username = 'username'
password = 'password'

session = requests.session()
session.auth = (username, password)

# build the client over the authenticated session
client = Client(WSDL_URL, transport=suds_requests.RequestsTransport(session))

def addSecurityHeader(client, username, password):
    security = Security()
    userNameToken = UsernameToken(username, password)
    security.tokens.append(userNameToken)
    client.set_options(wsse=security)

addSecurityHeader(client, username, password)

arg1 = "argument_1"
arg2 = "argument_2"

try:
    client.service.GetServiceById(arg1, arg2)
except TypeNotFound as e:
    print e

logresults = log_stream.getvalue()
You will receive XML in return, so I use BeautifulSoup to prettify the results:
soup = Soup(logresults)
print soup.prettify()
OK, so the API connection part is covered. Where do you store your data, and where do you iterate over it to perform a keyword search? In your database. I recommend MySQLdb. Set up your table and think about what information (that you collect from the API) you're going to store in which column.
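For example, a minimal table setup could look like this (table and column names are placeholders; pick whatever fields you keep from the API response):

import MySQLdb

db = MySQLdb.connect(host='localhost', user='root', passwd='password', db='mysuperdb')
cursor = db.cursor()
# one column per field you want to keep from the API response
cursor.execute("""
    CREATE TABLE IF NOT EXISTS yoursupertable (
        id INT AUTO_INCREMENT PRIMARY KEY,
        gene_name VARCHAR(255),
        annotation TEXT
    )
""")
db.commit()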
import MySQLdb, sys

def dbconnect():
    try:
        db = MySQLdb.connect(
            host='localhost',
            user='root',
            passwd='password',
            db='mysuperdb'
        )
    except Exception as e:
        sys.exit("Can't connect to database")
    return db

def getSQL():
    db = dbconnect()
    cursor = db.cursor()
    sql = "select * from yoursupertable"
    cursor.execute(sql)
    results = cursor.fetchall()
    return results

def dataResult():
    results = getSQL()
    for row in results:
        id = row[1]
        print id

dataResult()
So this is where you set your keywords (you could also do it via another SQL query) and compare the results you extract from your database with a list, dict, text file, or hardcoded keywords, and define what to do when they match, etc. :)
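A minimal sketch of that matching step, assuming the placeholder table layout from above (the column indexes and keywords are illustrative):

keywords = ['kinase', 'transporter']  # hardcoded here; could come from a file or another SQL query

for row in getSQL():
    gene_name, annotation = row[1], row[2]  # adjust the indexes to your table layout
    if any(kw.lower() in annotation.lower() for kw in keywords):
        print gene_name  # this row matched one of the keywords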
I am trying to get the score for an application hash and an IP address using the VirusTotal API.
The code works fine for IP address. See the code below:
import json
import urllib.request
import urllib.parse

url = 'https://www.virustotal.com/vtapi/v2/ip-address/report'
parameters = {'ip': '90.156.201.27', 'apikey': 'apikey'}
response = urllib.request.urlopen('%s?%s' % (url, urllib.parse.urlencode(parameters))).read()
response_dict = json.loads(response)
But the same does not work for an application hash. For example, the application hash "f67ce4cdea7425cfcb0f4f4a309b0adc9e9b28e0b63ce51cc346771efa34c1e3" has a score of 29/67. Has anyone worked with this API to get the score?
You can try the same with the requests module from the Python library:
import requests

params = {'apikey': '<your api key>', 'resource': '<your hash>'}
headers = {"Accept-Encoding": "gzip, deflate",
           "User-Agent": "gzip, My Python requests library example client or username"}
response_dict = {}
try:
    response_dict = requests.get('https://www.virustotal.com/vtapi/v2/file/report',
                                 params=params, headers=headers).json()
except Exception as e:
    print(e)
And you can use this to get the data:
sample_info = {}
if response_dict.get("response_code") is not None and response_dict.get("response_code") > 0:
    # Hashes
    sample_info["md5"] = response_dict.get("md5")
    # AV matches
    sample_info["positives"] = response_dict.get("positives")
    sample_info["total"] = response_dict.get("total")
    print(sample_info["md5"] + " Positives: " + str(sample_info["positives"]) + " Total: " + str(sample_info["total"]))
else:
    print("Not Found in VT")
For reference, check virustotalapi, which lets you use multiple API keys simultaneously.
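I have not used virustotalapi myself, but the same idea, cycling through several keys to stay under the per-key rate limit, can be sketched with plain requests (the key list is a placeholder):

import itertools
import requests

api_keys = itertools.cycle(['key1', 'key2'])  # placeholder keys; one is used per request

def vt_report(resource):
    # rotate to the next key on every call
    params = {'apikey': next(api_keys), 'resource': resource}
    return requests.get('https://www.virustotal.com/vtapi/v2/file/report', params=params).json()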
I have only managed to use the Emotion API subscription key for pictures, never for videos. It makes no difference whether I use the API Testing Console or call the Emotion API from Python 2.7. In both cases I get response status 202 Accepted; however, when opening the Operation-Location it says:
{ "error": { "code": "Unauthorized", "message": "Access denied due to
invalid subscription key. Make sure you are subscribed to an API you are
trying to call and provide the right key." } }
On the Emotion API explanatory page it says that Response 202 means that
The service has accepted the request and will start the process later.
In the response, there is a "Operation-Location" header. Client side should further query the operation status from the URL specified in this header.
Then there is Response 401, which is exactly what my Operation-Location contains. I do not understand why I'm getting a response 202 which looks like response 401.
I have tried to call the API with Python using at least three code versions that I found on the Internet, all of which amount to the same thing. I found the code here: Microsoft Emotion API for Python - upload video from memory.
import httplib
import urllib
import base64
import json
import pandas as pd
import numpy as np
import requests
_url = 'https://api.projectoxford.ai/emotion/v1.0/recognizeInVideo'
_key = '**********************'
_maxNumRetries = 10
paramsPost = urllib.urlencode({'outputStyle' : 'perFrame', \
'file':'C:/path/to/file/file.mp4'})
headersPost = dict()
headersPost['Ocp-Apim-Subscription-Key'] = _key
headersPost['content-type'] = 'application/octet-stream'
jsonGet = {}
headersGet = dict()
headersGet['Ocp-Apim-Subscription-Key'] = _key
paramsGet = urllib.urlencode({})
responsePost = requests.request('post', _url + "?" + paramsPost, \
data=open('C:/path/to/file/file.mp4','rb').read(), \
headers = headersPost)
print responsePost.status_code
videoIDLocation = responsePost.headers['Operation-Location']
print videoIDLocation
Note that changing _url = 'https://api.projectoxford.ai/emotion/v1.0/recognizeInVideo' to _url = 'https://westus.api.cognitive.microsoft.com/emotion/v1.0/recognizeInVideo' doesn't help.
However, afterwards I wait and run every half an hour:
getResponse = requests.request('get', videoIDLocation, json = jsonGet,\
data = None, headers = headersGet, params = paramsGet)
print json.loads(getResponse.text)['status']
The outcome has been 'Running' for hours and my video is only about half an hour long.
Here is what my Testing Console looks like: [screenshot: Testing Console for Emotion API, Emotion Recognition in Video]
Here I used another video that is about 5 minutes long and available on the internet. I found the video in a different usage example, https://benheubl.github.io/data%20analysis/fr/, that uses very similar code; again I get response status 202 Accepted, and when opening the Operation-Location the subscription key is reported as invalid.
Here is the code:
import httplib
import urllib
import base64
import json
import pandas as pd
import numpy as np
import requests

# you have to sign up for an API key, which has some allowances.
# Check the API documentation for further details:
_url = 'https://api.projectoxford.ai/emotion/v1.0/recognizeinvideo'
_key = '*********************'  # paste your primary key here
_maxNumRetries = 10

# URL direction: I hosted this on my domain
urlVideo = 'http://datacandy.co.uk/blog2.mp4'

# Computer Vision parameters
paramsPost = {'outputStyle': 'perFrame'}
headersPost = dict()
headersPost['Ocp-Apim-Subscription-Key'] = _key
headersPost['Content-Type'] = 'application/json'
jsonPost = {'url': urlVideo}

responsePost = requests.request('post', _url, json=jsonPost, data=None,
                                headers=headersPost, params=paramsPost)
if responsePost.status_code == 202:  # everything went well!
    videoIDLocation = responsePost.headers['Operation-Location']
    print videoIDLocation
There are further examples on the internet and they all seem to work, but replicating any of them never worked for me. Does anyone have any idea what could be wrong?
The Video Feature of the Emotion API retires October 30th, so maybe you should change your procedure to screenshots anyway.
But to your question: the API returns a URL where your results are accessible. You cannot open this URL in your browser; that gives you the "invalid key" notice. Instead, you need to call this URL from Python again, including your key.
I will post my code for getting the score. I am using Python 3, so some adjustments might be necessary. The only "tricky" point is getting the Operation ID, which is just the ID at the end of the URL (= location in my case) that leads to your request. The rest of the parameters, like the subscription key, are as before.
import http.client

# extract the operation ID from the location string (the Operation-Location URL)
OID = location[67:]
bod = ""
try:
    conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')
    # params and headers are the same as in the original request
    # (headers must contain your Ocp-Apim-Subscription-Key)
    conn.request("GET", "/emotion/v1.0/operations/" + OID + "?%s" % params, bod, headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    print("[Errno {0}] {1}".format(e.errno, e.strerror))
Did you verify your API call is working using curl? Always prototype calls using curl first. If it works in curl but not in Python, use Fiddler to observe the API request and response.
I also found an answer at the following link, where all the steps are explained:
https://gigaom.com/2017/04/10/discover-your-customers-deepest-feelings-using-microsoft-facial-recognition/
I am trying to come up with a Python script to retrieve data from MySQL and post the data in JSON format to a web server. I have two separate Python scripts, one for retrieving the data from MySQL and one for posting the data in JSON format. The main issue I am facing is that I do not know how to integrate them.
Code for retrieving data from MySQL:
import MySQLdb

db = MySQLdb.connect("localhost", "root", "12345", "testdatabase")
curs = db.cursor()
curs.execute("SELECT * from mydata")
reading = curs.fetchall()
print "Data Info: %s" % reading
Code for posting to web server:
import json
import urllib2

data = {
    'ID': 1,
    'Name': 'Bryan',
    'Class': '3A'
}
req = urllib2.Request('http://abcd.com')  # not the actual url
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, json.dumps(data))
I have referenced the codes from the following links:
Retrieve data from MySQL
Retrieve data from MySQL 2nd link
Post to web server
Post to web server 2nd link
Would appreciate any form of assistance.
You can use the connection as a library file.
File connection.py:
def db_connect(query):
    import MySQLdb
    db = MySQLdb.connect("localhost", "root", "12345", "testdatabase")
    curs = db.cursor()
    curs.execute(query)
    reading = curs.fetchall()
    return reading
Main file: webserver.py
import json
import urllib2
import connection

mysql_data = connection.db_connect("SELECT * from mydata")
# data = <Your logic to convert string to json>
req = urllib2.Request('http://abcd.com')  # not the actual url
req.add_header('Content-Type', 'application/json')
response = urllib2.urlopen(req, json.dumps(data))
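For the "<Your logic to convert string to json>" placeholder, one possible sketch (the column names are assumptions; adjust them to your mydata table):

# convert the tuples from fetchall() into a list of dicts that json.dumps can serialize
columns = ['ID', 'Name', 'Class']  # assumed column order of mydata
data = [dict(zip(columns, row)) for row in mysql_data]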
Method 2: you can also try SQLAlchemy, which gives you dict data directly out of a SQL query, and you can use filters instead of raw SQL. I recommend this way as the better one; you can go through this link: https://pythonspot.com/en/orm-with-sqlalchemy/
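For instance, a minimal SQLAlchemy 1.x-style sketch, reusing the connection details from the question (the row-to-dict conversion shown is one possible approach):

from sqlalchemy import create_engine

engine = create_engine("mysql+mysqldb://root:12345@localhost/testdatabase")
result = engine.execute("SELECT * from mydata")
# zip the column names with each row to get dicts that json.dumps can serialize
data = [dict(zip(result.keys(), row)) for row in result]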
I have some functions in JavaScript that do the string manipulation, and my backend then updates that data in the database.
In short, I have a URL for sending data to the backend; this data is passed to a JavaScript function, JS does the manipulation, and then using an AJAX request I send the manipulated string back to the backend, which updates my database.
I am using the Flask framework.
Here is what I have written so far:
# this url is where I send a GET request
@app.route('/api', methods=['GET'])
def text():
    text = request.args.get('text', '')
    lang = request.args.get('lang', '')
    return render_template('test.html', text=text, lang=lang)
Now JS does the manipulation of the strings and sends an AJAX GET request to the following URL:
@app.route('/files/<text>', methods=['GET'])
def fi(text):
    if request.method == 'GET':
        textValue = text.encode("utf-8")
        INSERT_DB = 'INSERT INTO text (text) VALUES (%s)'
        db = connect_mysql()
        cursor = db.cursor()
        cursor.execute(INSERT_DB, [textValue])
        db.commit()
        cursor.close()
        db.close()
    return ''
Now I check whether the data was saved to the database at the following URL:
@app.route('/test', methods=['GET'])
def test():
    if request.method == 'GET':
        db_select_last = "SELECT text FROM text order by id DESC limit 1"
        db = connect_mysql()
        cursor = db.cursor()
        cursor.execute(db_select_last)
        data = cursor.fetchone()
        cursor.close()
        db.close()
        return (data['text'])
The problem I face is that when I manually hit the URL from the browser it updates the data, but when I send a GET request from Python it doesn't. Why is this so?
Here's how I send a GET request to that URL (main_url is http://fyp-searchall.rhcloud.com/):
>>> import requests
>>> url = 'http://fyp-searchall.rhcloud.com/api?text=some body&lang=marathi'
>>> r = requests.get(url)
But the data does not update. Where am I going wrong?
I learned that JS works only when you have a browser, so what should I do now?
In requests you should pass query parameters as a dictionary, using the params keyword:
import requests
url = 'http://fyp-searchall.rhcloud.com/api'
params = {'text': 'some body', 'lang': 'marathi'}
r = requests.get(url, params=params)
See the Quickstart here: http://docs.python-requests.org/en/master/user/quickstart/
I want to pull a list of users in the jira-users group. As I understand it, this can be done with Python using restkit.
Does anyone have any examples or links that show how to do this?
Thanks.
If somebody still needs a solution, you can install the JIRA REST API library: https://pypi.python.org/pypi/jira/.
Just a simple example for your question:
from jira.client import JIRA

jira_server = "http://yourjiraserver.com"
jira_user = "login"
jira_password = "pass"
jira_options = {'server': jira_server}
jira = JIRA(options=jira_options, basic_auth=(jira_user, jira_password))
group = jira.group_members("jira-users")
for user in group:
    print user
Jira has a REST API for external queries; it uses HTTP for requests and responses, and the response content is formatted as JSON. So you can use Python's urllib and json packages to run the request and then parse the results.
This is Atlassian's documentation for the Jira REST API: http://docs.atlassian.com/jira/REST/latest/ and, for example, check the users API: http://docs.atlassian.com/jira/REST/latest/#id120322
Note that you should authenticate before sending your request; you can find the necessary information in the document.
import urllib2, base64
import ssl
import json
import getpass

UserName = raw_input("Enter UserName: ")
pswd = getpass.getpass('Password:')

# Total number of users or licenses used in JIRA. The REST API returns members in pages of 50
ListStartAt = [0, 50, 100, 150, 200, 250, 300]
counter = 0
for i in ListStartAt:
    request = urllib2.Request("https://jiraserver.com/rest/api/2/group/member?groupname=GROUPNAME&startAt=%s" % i)
    base64string = base64.encodestring('%s:%s' % (UserName, pswd)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)
    gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    result = urllib2.urlopen(request, context=gcontext)
    JsonGroupdata = result.read()
    jsonToPython = json.loads(JsonGroupdata)
    try:
        # each page holds at most 50 members; stop when the page runs out
        for j in range(0, 50):
            print jsonToPython["values"][j]["key"]
            counter = counter + 1
    except Exception as e:
        pass
print counter