Instaloader Timing Out - python

I've seen a bunch of issues related to this, but there doesn't seem to be a solid fix. Basically, I'm trying to scrape all of the accounts an Instagram account follows, along with how many followers each of them has.
I have it pushing to a spreadsheet now, so that I can start to sort and analyze.
My only problem is that the script keeps timing out. I've tried using time.sleep (on someone's recommendation), but that only makes the information load more slowly and doesn't fix the issue.
Any suggestions? I could just be doing something entirely wrong - learning as I go.
import gspread
import instaloader

loader = instaloader.Instaloader()

gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1cD8mX8tR2iSQgSmpk6QxBVVHYVV8VxGs2uhtA8iBSpQ')
worksheet = sh.sheet1

loader.login("vcf1948", "VCF1948!")
profile = instaloader.Profile.from_username(loader.context, "michelleobama")

followees = profile.get_followees()
for followee in followees:
    print('{} has {} followers'.format(followee.username, followee.followers))
    AddValue = [followee.username, int(followee.followers)]
    worksheet.append_row(AddValue)
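One thing that may be compounding the timeouts: every append_row call is its own Google Sheets API request, so a long followee list means hundreds of round trips on top of the Instagram queries. A minimal sketch of collecting the rows first and writing them in one batch with gspread's append_rows, keeping the setup above (loader, profile, worksheet) the same (whether this also cures the Instagram-side timeout I can't promise):

rows = []
for followee in profile.get_followees():
    # Collect locally instead of making one Sheets API call per followee
    rows.append([followee.username, int(followee.followers)])

# A single write instead of hundreds of append_row calls
worksheet.append_rows(rows, value_input_option='RAW')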

Related

How to run a Python Script from a Google Sheet

I have a Python script that leverages lists in a Google Sheet and sends bulk SMS text messages using Twilio.
I'm fairly new to this and have struggled to get this far - any Python script I've created in the past I've been able to just run from my local computer in VS Code.
I am trying to share this with a family member - I've read a bit about tkinter and GUIs, but because the rest of this workflow is already in a Google Sheet, it would be perfect if any user could just run the Python script right from the spreadsheet itself.
I found this, but I don't really understand how to create a web service in GAE. I've googled it all over but am struggling to put it into action -- Trigger python code from Google spreadsheets?
Is there a simple way to just tie my Python script into this spreadsheet so that anyone can run it? Or another way to go about this?
ChatGPT suggested an approach, but I feel it is inaccurate (or I just can't get it working).
My code is here, if it helps:
from twilio.rest import Client
import gspread

# Your Account SID and Auth Token from twilio.com/console
account_sid = 'AC868ea4e1a779ff0816b466a13f201b02'
auth_token = '85469ae1eb492ffc814c095b5c6e0889'
client = Client(account_sid, auth_token)

gc = gspread.service_account(filename='creds.json')

# Open the spreadsheet by ID
sh = gc.open_by_key('1KRYITQ_O_-7exPZp8zj1VvAUPPutqtO4SrTgloCx8x4')

# Get the sheet with the recipients
wk = sh.worksheet("Numbers to Send")

# Numbers are in column F and names in column G, starting at row 3;
# batch_get returns a list of ranges, each range being a list of rows
numbers = wk.batch_get(('f3:f',))[0]
names = wk.batch_get(('g3:g',))[0]
# names = [['John'], ['Jane'], ['Jim']]
# numbers = [['+16099725052'], ['+16099725052'], ['+16099725052']]

# Loop through the names and numbers and send a text message to each phone number
for i in range(len(names)):
    message = client.messages.create(
        to=numbers[i][0],
        from_='+18442251378',
        body=f"Hello {names[i][0]}, this is a test message from Twilio.")
    print(f"Message sent to {names[i][0]} at {numbers[i][0]}")

Web scraping with Anaconda and Python 3.6.5

I'm not a programmer, but I'm trying to teach myself Python so that I can pull data off various sites for projects that I'm working on. I'm using "Automate the Boring Stuff" and I'm having trouble getting the examples to work with one of the pages I'm trying to pull data from.
I'm using the Anaconda prompt with Python 3.6.5. Here's what I've done:
Step 1: create the beautiful soup object
import requests, bs4
res = requests.get('https://www.almanac.com/weather/history/zipcode/02111/2017-05-15')
res.raise_for_status()
weatherTest = bs4.BeautifulSoup(res.text)
type(weatherTest)
This works, and returns the result
<class 'bs4.BeautifulSoup'>
I've made the assumption that the "noStarchSoup" that was in the original text (in place of weatherTest here) is a name the author gave to the object that I can rename to something more relevant to me. If that's not accurate, please let me know.
Step 2: pull an element out of the html
Here's where I get stuck. The author had just shown how to pull a page down into a file (which I would prefer not to do; I want to use the bs4 object), but then uses that file as his source for the HTML data. exampleFile was his downloaded file.
import bs4
exampleFile = open('https://www.almanac.com/weather/history/zipcode/02111/2017-05-15')
I've tried using weatherTest in place of exampleFile, I've tried running the whole thing with the original object name (noStarchSoup), I've even tried it with exampleFile, even though I haven't downloaded the file.
What I get is
"OSError: [Errno 22] Invalid argument:
'https://www.almanac.com/weather/history/zipcode/02111/2017-05-15'
The next step is to tell it what element to pull but I'm trying to fix this error first and kind of spinning my wheels here.
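For what it's worth, once you have the soup object there is no need for open() at all; here is a minimal sketch of pulling elements straight out of weatherTest (the selector is deliberately generic, since I haven't inspected the almanac.com markup):

import requests, bs4

res = requests.get('https://www.almanac.com/weather/history/zipcode/02111/2017-05-15')
res.raise_for_status()
weatherTest = bs4.BeautifulSoup(res.text, 'html.parser')

# The <title> element exists on any HTML page
print(weatherTest.title.get_text())

# select() takes a CSS selector and returns a list of matching tags;
# 'table td' is a generic placeholder for whatever element you actually want
for cell in weatherTest.select('table td'):
    print(cell.get_text(strip=True))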
Couldn't resist here!
I found this page during my search but this answer didn't quite help... try this code :)
Step 1: download Anaconda 3.0+
Step 2: (function)
# Import Libraries
import logging

import bs4
import pandas as pd
import requests

logger = logging.getLogger(__name__)

def import_high_short_tickers(market_type):
    if market_type == 'NASDAQ':
        page = requests.get('https://www.highshortinterest.com/nasdaq/')
    elif market_type == 'NYSE':
        page = requests.get('https://www.highshortinterest.com/nyse/')
    else:
        logger.error("Invalid market_type: " + market_type)
        return None

    # Parse the HTML Page
    soup = bs4.BeautifulSoup(page.content, 'html.parser')

    # Grab only table elements
    all_soup = soup.find_all('table')

    # Get what you want from the table elements!
    for element in all_soup:
        listing = str(element)
        if 'https://finance.yahoo.com/' in listing:
            # Stuff the results into a pandas data frame (if you're not using pandas, you should)
            data = pd.read_html(listing)
            return data
Yes, yes, it's very crude, but don't hate!
Cheers!
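For completeness, calling the function above might look like this (pd.read_html returns a list of DataFrames, so the table of interest is the first entry):

tables = import_high_short_tickers('NASDAQ')
if tables is not None:
    # The first DataFrame in the list holds the parsed table
    print(tables[0].head())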

How do I get the XML format of Bugzilla given a bug ID using python and XML-RPC?

This question has been updated
I am writing a Python script using python-bugzilla 1.1.0 from PyPI. I am able to get all the bug IDs, but I want to know if there is a way for me to access each bug's XML page. Here is the code I have so far:
import sys

import bugzilla

bz = bugzilla.Bugzilla(url='https://bugzilla.mycompany.com/xmlrpc.cgi')

try:
    bz.login('name@email.com', 'password')
    print 'Authorization cookie received.'
except bugzilla.BugzillaError:
    print(str(sys.exc_info()[1]))
    sys.exit(1)

# Getting all the bug IDs and displaying them
bugs = bz.query(bz.build_query(assigned_to="your-bugzilla-account"))
for bug in bugs:
    print bug.id
I don't know how to access each bug's XML page and not sure if it is even possible to do so. Can anyone help me with this? Thanks.
bz.getbugs()
will get all bugs; bz.getbugssimple is also worth a look.
#!/usr/bin/env python
import bugzilla

bz = bugzilla.Bugzilla(url='https://bugzilla.company.com/xmlrpc.cgi')
bz.login('username@company.com', 'password')

# queryUrl is the URL of a saved Bugzilla search, copied from the browser
results = bz.query(bz.url_to_query(queryUrl))

bids = []
for b in results:
    bids.append(b.id)
print bids
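On the XML part of the original question: Bugzilla serves an XML view of any bug from show_bug.cgi with ctype=xml, so one option is simply to fetch that URL for every ID you collected. A rough sketch (assuming the standard CGI paths; a private instance may also require your login cookies or an API key):

import requests

BASE = 'https://bugzilla.mycompany.com'

for bug_id in bids:
    # Standard Bugzilla XML view of a single bug
    url = '{0}/show_bug.cgi?ctype=xml&id={1}'.format(BASE, bug_id)
    resp = requests.get(url)
    resp.raise_for_status()
    xml_text = resp.text  # raw XML, ready for xml.etree or lxml
    print(xml_text[:200])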

Trying to do batch update to Google spreadsheet using gdata python libraries

I have been trying to figure this out for a while now and just don't seem to be able to break through, so hopefully someone out there has done this before.
My issue is that I am trying to do a batch update of a google spreadsheet using the gdata python client libraries and authenticating via oauth2. I have found an example of how to do the batch update using the gdata.spreadsheet.service module here: https://code.google.com/p/gdata-python-client/wiki/UsingBatchOperations
However that does not seem to work when authenticating via oauth2 and so I am having to use the gdata.spreadsheets.client module instead as discussed in this post: https://code.google.com/p/gdata-python-client/issues/detail?id=549
Using the gdata.spreadsheets.client module works for authentication and for updating the sheet; however, batch commands do not seem to work. Below is my latest variation of the code, which is about the closest I have got. It seems to run, but the sheet is not updated and the batch_status returned is 'Insert not supported on batch.' (Note: I did try modifying the batch_operation and batch_id parameters of the CellEntries in the commented-out code, but this did not work either.)
Thanks for any help you can provide.
import gdata
import gdata.gauth
import gdata.service
import gdata.spreadsheets
import gdata.spreadsheets.client
import gdata.spreadsheets.data

token = gdata.gauth.OAuth2Token(client_id=Client_id, client_secret=Client_secret, scope=Scope,
                                access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN,
                                user_agent=User_agent)
client = gdata.spreadsheets.client.SpreadsheetsClient()
token.authorize(client)

range = "D6:D13"
cellq = gdata.spreadsheets.client.CellQuery(range=range, return_empty='true')
cells = client.GetCells(file_id, 'od6', q=cellq)

objData = gdata.spreadsheets.data
batch = objData.BuildBatchCellsUpdate(file_id, 'od6')

n = 1
for cell in cells.entry:
    cell.cell.input_value = str(n)
    batch.add_batch_entry(cell, cell.id.text, batch_id_string=cell.title.text, operation_string='update')
    n = n + 1

client.batch(batch, force=True)
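If the gdata batch feed keeps refusing, one fallback worth noting: the same D6:D13 write can be done in a single request with the newer gspread library. A sketch, assuming a service-account creds.json that has access to the sheet (this sidesteps gdata entirely rather than fixing the batch call):

import gspread

gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key(file_id)  # same spreadsheet key as above
ws = sh.sheet1                # first worksheet, i.e. the old 'od6'

# Write the values 1..8 into D6:D13 in one batched update
values = [[n] for n in range(1, 9)]
ws.update(range_name='D6:D13', values=values)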

How do I store data from the Bloomberg API into a Pandas dataframe?

I recently started using Python so I could interact with the Bloomberg API, and I'm having some trouble storing the data into a Pandas dataframe (or a panel). I can get the output in the command prompt just fine, so that's not an issue.
A very similar question was asked here:
Pandas wrapper for Bloomberg api?
The referenced code in the accepted answer for that question is for the old API, however, and it doesn't work for the new open API. Apparently the user who asked the question was able to easily modify that code to work with the new API, but I'm used to having my hand held in R, and this is my first endeavor with Python.
Could some benevolent user show me how to get this data into Pandas? There is an example in the Python API (available here: http://www.openbloomberg.com/open-api/) called SimpleHistoryExample.py that I've been working with and have included below. I believe I'll need to modify mostly the 'while(True)' loop toward the end of the 'main()' function, but everything I've tried so far has had issues.
Thanks in advance, and I hope this can be of help to anyone using Pandas for finance.
# SimpleHistoryExample.py
import blpapi
from optparse import OptionParser


def parseCmdLine():
    parser = OptionParser(description="Retrieve reference data.")
    parser.add_option("-a",
                      "--ip",
                      dest="host",
                      help="server name or IP (default: %default)",
                      metavar="ipAddress",
                      default="localhost")
    parser.add_option("-p",
                      dest="port",
                      type="int",
                      help="server port (default: %default)",
                      metavar="tcpPort",
                      default=8194)

    (options, args) = parser.parse_args()
    return options


def main():
    options = parseCmdLine()

    # Fill SessionOptions
    sessionOptions = blpapi.SessionOptions()
    sessionOptions.setServerHost(options.host)
    sessionOptions.setServerPort(options.port)

    print "Connecting to %s:%s" % (options.host, options.port)

    # Create a Session
    session = blpapi.Session(sessionOptions)

    # Start a Session
    if not session.start():
        print "Failed to start session."
        return

    try:
        # Open service to get historical data from
        if not session.openService("//blp/refdata"):
            print "Failed to open //blp/refdata"
            return

        # Obtain previously opened service
        refDataService = session.getService("//blp/refdata")

        # Create and fill the request for the historical data
        request = refDataService.createRequest("HistoricalDataRequest")
        request.getElement("securities").appendValue("IBM US Equity")
        request.getElement("securities").appendValue("MSFT US Equity")
        request.getElement("fields").appendValue("PX_LAST")
        request.getElement("fields").appendValue("OPEN")
        request.set("periodicityAdjustment", "ACTUAL")
        request.set("periodicitySelection", "DAILY")
        request.set("startDate", "20061227")
        request.set("endDate", "20061231")
        request.set("maxDataPoints", 100)

        print "Sending Request:", request

        # Send the request
        session.sendRequest(request)

        # Process received events
        while(True):
            # We provide timeout to give the chance for Ctrl+C handling:
            ev = session.nextEvent(500)
            for msg in ev:
                print msg
            if ev.eventType() == blpapi.Event.RESPONSE:
                # Response completely received, so we could exit
                break
    finally:
        # Stop the session
        session.stop()


if __name__ == "__main__":
    print "SimpleHistoryExample"
    try:
        main()
    except KeyboardInterrupt:
        print "Ctrl+C pressed. Stopping..."
I use tia (https://github.com/bpsmith/tia/blob/master/examples/datamgr.ipynb)
It already downloads data from Bloomberg as a pandas dataframe.
You can download history for multiple tickers in a single call and even pull some of Bloomberg's reference data (central bank meeting dates, holidays for a certain country, etc.).
And you just install it with pip.
The link above is full of examples, but downloading historical data is as easy as:
import pandas as pd
import tia.bbg.datamgr as dm
mgr = dm.BbgDataManager()
sids = mgr['MSFT US EQUITY', 'IBM US EQUITY', 'CSCO US EQUITY']
df = sids.get_historical('PX_LAST', '1/1/2014', '11/12/2014')
and df is a pandas dataframe.
Hope it helps
You can also use pdblp for this (disclaimer: I'm the author). There is a tutorial showing similar functionality available here: https://matthewgilbert.github.io/pdblp/tutorial.html. The functionality could be achieved using something like:
import pdblp
con = pdblp.BCon()
con.start()
con.bdh(['IBM US Equity', 'MSFT US Equity'], ['PX_LAST', 'OPEN'],
        '20061227', '20061231', elms=[("periodicityAdjustment", "ACTUAL")])
I've just published this, which might help:
http://github.com/alex314159/blpapiwrapper
It's basically not very intuitive to unpack the message, but this is what works for me, where strData is a list of Bloomberg fields, for instance ['PX_LAST','PX_OPEN']:
fieldDataArray = msg.getElement('securityData').getElement('fieldData')
size = fieldDataArray.numValues()
fieldDataList = [fieldDataArray.getValueAsElement(i) for i in range(0, size)]
outDates = [x.getElementAsDatetime('date') for x in fieldDataList]
output = pandas.DataFrame(index=outDates, columns=strData)

for strD in strData:
    outData = [x.getElementAsFloat(strD) for x in fieldDataList]
    output[strD] = outData

output.replace('#N/A History', pandas.np.nan, inplace=True)
output.index = output.index.to_datetime()
return output
I've been using pybbg to do this sort of stuff. You can get it here:
https://github.com/bpsmith/pybbg
Import the package and you can then do (this is in the source code, bbg.py file):
banner('ReferenceDataRequest: single security, single field, frame response')
req = ReferenceDataRequest('msft us equity', 'px_last', response_type='frame')
print req.execute().response
The advantages:
Easy to use; minimal boilerplate, and parses indices and dates for you.
It's blocking. Since you mention R, I assume you are using this in some type of interactive environment, like IPython. So this is what you want, rather than having to mess around with callbacks.
It can also do historical (i.e. price series), intraday and bulk data requests (no tick data yet).
Disadvantages:
Only works on Windows, as far as I know (you must have the BB workstation installed and running).
Following on from the above, it depends on the 32-bit OLE API for Python. It only works with the 32-bit version, so you will need 32-bit Python and 32-bit OLE bindings.
There are some bugs. In my experience, when retrieving data for a number of instruments, it tends to hang IPython. Not sure what causes this.
Based on the last point, I would suggest that if you are getting large amounts of data, you retrieve and store it in an Excel workbook (one instrument per sheet) and then import it. read_excel isn't efficient for doing this; you need to use the ExcelFile object and then iterate over the sheets, as in the sketch below. Otherwise, read_excel will reopen the file each time you read a sheet, which can take ages.
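A minimal sketch of that pattern, assuming a workbook laid out with one instrument per sheet (the file name is illustrative):

import pandas as pd

# Open the workbook once instead of once per sheet
xls = pd.ExcelFile('bloomberg_dump.xlsx')

frames = {}
for sheet_name in xls.sheet_names:
    # Each sheet holds one instrument's history
    frames[sheet_name] = xls.parse(sheet_name, index_col=0, parse_dates=True)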
Tia https://github.com/bpsmith/tia is the best I've found, and I've tried them all... It allows you to do:
import pandas as pd
import datetime
import tia.bbg.datamgr as dm
mgr = dm.BbgDataManager()
sids = mgr['BAC US EQUITY', 'JPM US EQUITY']
df = sids.get_historical(['BEST_PX_BPS_RATIO', 'BEST_ROE'],
                         datetime.date(2013, 1, 1),
                         datetime.date(2013, 2, 1),
                         BEST_FPERIOD_OVERRIDE="1GY",
                         non_trading_day_fill_option="ALL_CALENDAR_DAYS",
                         non_trading_day_fill_method="PREVIOUS_VALUE")
print df
#and you'll probably want to carry on with something like this
df1=df.unstack(level=0).reset_index()
df1.columns = ('ticker','field','date','value')
df1.pivot_table(index=['date','ticker'],values='value',columns='field')
df1.pivot_table(index=['date','field'],values='value',columns='ticker')
The caching is nice too.
Both https://github.com/alex314159/blpapiwrapper and https://github.com/kyuni22/pybbg do the basic job (thanks, guys!) but have trouble with multiple securities/fields, as well as with overrides, which you will inevitably need.
The one thing this https://github.com/kyuni22/pybbg has that tia doesn't have is bds(security, field).
A proper Bloomberg API for Python now exists that does not use COM. It has all the hooks to let you replicate the functionality of the Excel add-in, with the obvious advantage of a proper programming-language endpoint. The request and response objects are fairly poorly documented and quite obtuse. Still, the examples in the API are good, and some playing around with the inspect module and printing of response messages should get you up to speed. Sadly, the standard terminal licence only works on Windows; for *nix you will need a server licence (even more expensive). I have used it quite extensively.
https://www.bloomberg.com/professional/support/api-library/
