What I'm trying to do is:
I'm trying to make a Telegram bot that sends me a message when a new post appears on a specific web page.
I wrote the code and deployed it on Heroku.
The bot is started every 10 minutes by Heroku Scheduler so that it detects any new post within 10 minutes.
Now the problem is:
The code is meant to remember the latest post number and not raise an alert if nothing has been posted between the previous run and the current one.
If the post number saved in the previous run matches the latest post number in the current run, the bot should not alert me and should just carry on with the schedule (checking for new posts every 10 minutes).
This is what I wrote to make this work:
import os
latest_num = os.environ.get("POST_ID")
post_num = posts.find("td", {"class" : "no"}).text.strip()
if latest_num != post_num:
    latest_num = post_num
    os.environ["POST_ID"] = latest_num
I assume that if the latest post number from the previous run is saved as "POST_ID" through environment variables on Heroku, it should appear in the present run and be the value of latest_num when using os.environ to read "POST_ID" from the environment variable.
But the problem is, it seems like os.environ["POST_ID"] doesn't overwrite its value after the current run is done. Every time the Heroku scheduler starts the program, the 'latest_num' value is 0, the same as the default value of "POST_ID" on Heroku's settings.
So, even though there's no new post, the bot keeps sending me a message because 'latest_num' doesn't match 'post_num' all the time.
How can I fix this? Actually, I don't know whether setting environment variables through python code is possible or not. Please tell me if there's something better to make this work.
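A note on why POST_ID keeps coming back as 0: assigning to os.environ only changes the environment of the process that is currently running. It does not write anything back to the configuration that the next process starts from. The sketch below imitates two scheduled runs by starting a fresh interpreter twice; the starting value "0" and the value '1234' are stand-ins for illustration, not anything read from Heroku.

```python
import os
import subprocess
import sys

# Environment each "run" starts from; "0" mimics the configured default.
base_env = dict(os.environ, POST_ID="0")

# One run of the bot: read POST_ID, then overwrite it inside the process.
bot_run = (
    "import os\n"
    "print(os.environ['POST_ID'])\n"
    "os.environ['POST_ID'] = '1234'\n"
)

def run_once():
    """Start a fresh interpreter and return the POST_ID value it saw."""
    out = subprocess.check_output([sys.executable, "-c", bot_run], env=base_env)
    return out.decode().strip()

first = run_once()
second = run_once()
print(first, second)
```

The second run sees the same starting value as the first, because the assignment made in the first run ended with that process. The number therefore has to be stored somewhere outside the process.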
You can save the previous post ID either in a text file or in a Python variable. I prefer a text file, because a variable is forgotten when the program restarts or exits, while a file is not.
Here's the code. The first time, save the number like this:
post_num = posts.find("td", {"class" : "no"}).text.strip()
# Text mode: post_num is a str, so "wb" would fail on Python 3
with open("pre.txt", "w") as f:
    f.write(post_num)
On the second and later scrapes, do this:
post_num = posts.find("td", {"class" : "no"}).text.strip()
with open("pre.txt", "r") as f:
    pre_num = f.read().strip()
if post_num != pre_num:
    # New post: send the alert here, then remember the new number
    with open("pre.txt", "w") as f:
        f.write(post_num)
else:
    # Nothing new: do nothing
    pass
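The two steps above can be folded into one helper that also copes with the very first run, when pre.txt does not exist yet. This is only a sketch: the function names are made up for illustration, and it assumes the file survives between runs. On Heroku the dyno filesystem is ephemeral, so there the same idea would need an external store (a database or an add-on) instead of a local file.

```python
import os

STATE_FILE = "pre.txt"  # file name taken from the answer above

def load_last_id(path=STATE_FILE):
    """Return the stored post number, or None on the very first run."""
    if not os.path.exists(path):
        return None
    with open(path, "r") as f:
        return f.read().strip()

def save_last_id(post_num, path=STATE_FILE):
    """Overwrite the stored post number."""
    with open(path, "w") as f:
        f.write(post_num)

def is_new_post(post_num, path=STATE_FILE):
    """Return True, and remember post_num, only if it differs from the stored one."""
    if load_last_id(path) == post_num:
        return False
    save_last_id(post_num, path)
    return True
```

The bot would then call is_new_post(post_num) once per run and send the Telegram message only when it returns True.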
I noticed I could not find Newport, Oregon in my django_cities_light django application. It is a small city with population slightly above 10k, so I downloaded cities1000.zip which contains cities with a population higher than 1k. I unzipped this file and started searching for Newport's id and indeed it is there:
5742750 Newport Newport N'juport,Newport (Oregon),Njuport,ONP,nyupoteu,nyupoto,nywbwrt,nywpwrt awrgn,Њупорт,Ньюпорт,Нюпорт,نيوبورت,نیوپورت، اورگن,نیوپورٹ، اوریگون,ニューポート,뉴포트 44.63678 -124.05345 P PPLA2 US OR ...
Now, I have in my myapp/settings/development.py the following:
CITIES_LIGHT_TRANSLATION_LANGUAGES = ['en']
CITIES_LIGHT_INCLUDE_CITY_TYPES = ['PPL', 'PPLA', 'PPLA2', 'PPLA3', 'PPLA4', 'PPLC', 'PPLF', 'PPLG', 'PPLL', 'PPLR', 'PPLS', 'STLMT',]
CITIES_LIGHT_APP_NAME = 'jobs'
CITIES_LIGHT_CITY_SOURCES = ['http://download.geonames.org/export/dump/cities1000.zip'] # <-- this added as part of this task
I added CITIES_LIGHT_CITY_SOURCES following this post and the information here.
I then tried to import the data using the following command, which I understand downloads the cities1000 file specified in myapp.settings.development:
python manage.py cities_light --settings=myapp.settings.development --force-all --progress
Newport, Oregon, with id 5742750 is not found in my database. I also cannot tell from the command's output whether my settings file is used and whether the value of CITIES_LIGHT_CITY_SOURCES is overridden properly.
Does anyone know what I'm doing wrong and how to properly add from the source files? Thx!
EDIT: I added DJANGO_SETTINGS_MODULE explicitly as an env var, went into the cities_light directory in my virtual environment and added a print that checks the value of DJANGO_SETTINGS_MODULE. It points to my settings file. I also added a print that shows the value of CITIES_LIGHT_CITY_SOURCES, and it is correct. I also went into cities_light/data and saw that both cities1000.zip and cities1000.txt were there. I deleted them, ran the command again, and they were there again. Still no success in having Newport, Oregon in my database.
EDIT2: It seems that I either was doing something wrong (likely) or there is a bug in cities_light (less likely). If I start with a previously populated database, let's say from cities15000.zip, and then try to populate it as I tried before, it won't work. If I start from a completely empty database and then run the command as I mentioned, it works. One thing I did notice: I made a script to manually insert the cities from cities1000.txt. The insert for Newport worked, so I then went to check the database, and now ALL the cities were there. I am not sure how this happened; maybe they had been there for a while and I just missed it.
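EDIT 3: It is important not to confuse id and geoname_id. The 5742750 from the dump is the GeoNames ID, which is stored in geoname_id; id is the database's own primary key.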
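To check by hand whether a row from the dump should pass the type filter, something like the following can help. It is a sketch under assumptions: it relies on the geonames dump being tab-separated with the ID in column 0 and the feature code in column 7, it assumes the import compares the feature code against CITIES_LIGHT_INCLUDE_CITY_TYPES, and the helper names are hypothetical. The sample row is shortened to the columns visible in the question.

```python
# Feature codes copied from CITIES_LIGHT_INCLUDE_CITY_TYPES in the question.
INCLUDE_CITY_TYPES = ['PPL', 'PPLA', 'PPLA2', 'PPLA3', 'PPLA4', 'PPLC',
                      'PPLF', 'PPLG', 'PPLL', 'PPLR', 'PPLS', 'STLMT']

def parse_row(line):
    """Split one tab-separated geonames row into the fields used here."""
    cols = line.rstrip("\n").split("\t")
    return {
        "geoname_id": int(cols[0]),
        "name": cols[1],
        "feature_class": cols[6],
        "feature_code": cols[7],
        "country_code": cols[8],
    }

def feature_code_is_included(row):
    """True if the row's feature code is in the configured include list."""
    return row["feature_code"] in INCLUDE_CITY_TYPES

# Shortened sample row: alternate names trimmed, trailing columns omitted.
sample = "5742750\tNewport\tNewport\tNewport (Oregon)\t44.63678\t-124.05345\tP\tPPLA2\tUS"
row = parse_row(sample)
print(row["geoname_id"], row["feature_code"], feature_code_is_included(row))
```

When looking the city up in the database afterwards, the filter would be on geoname_id=5742750 rather than on id.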
I'm working inside a system that has Jython 2.5, but I need to be able to call some of Google's APIs, so I wrote a standalone script that I want to call from my Jython environment and have it return small pieces of data to me, such as a job ID or a sheet URL from Google.
I've tried a number of things but I always get an error back from Windows, saying that it cannot find the file specified.
I build the path in two ways.
The first way using a string
stringPath = r"C:\GooglePipes\Scripts\filetobq.py C:\GooglePipes\Keys\DEV-BigQueryKey.json nofile C:\GooglePipes\BQ_Downtime\TESTFILE.CSV dataset1 table1"
And the second way, as a sequence (per the docs, with shell=False you supply a sequence)
seqPath = [r"C:\GooglePipes\Scripts\filetobq.py",r"C:\GooglePipes\Keys\DEV-BigQueryKey.json","nofile",r"C:\GooglePipes\BQ_Downtime\TESTFILE.CSV","dataset1","table1"]
Called with
from subprocess import Popen, PIPE
data, err = Popen(seqPath, shell=True, stderr=PIPE, stdout=PIPE).communicate()
#Read values back in
print data
print err
Replacing seqPath with stringPath to try it either way.
I've been at this all weekend. Every time I run it, I get this from Windows:
The system cannot find the path specified.
from the err print. I've been unable to debug much further than this. I'm not really sure what's happening. When I paste the stringPath variable directly into my computer's command window it executes.
I've also called subprocess.list2cmdline(seqPath) to see what it's outputting. It's giving me a ? in front of the string, but I haven't been able to figure out what that means. I can paste the rest of the string, starting after the question mark into the command window and it executes.
?C:\GooglePipes\Scripts\filetobq.py C:\GooglePipes...
I've tried a number of different combinations of true and false on shell, passing different args into Popen, double slashes, and I have no less than 30 tabs open from stack overflow and other help forums. I just have no idea what to do at this point and any help is appreciated.
Edit
The ? at the start of the string turned out to be a NULL character when I did some additional logging. This seems to be the root of my problem. I can't figure out why it shows up, but it was present in my copy-pastes. I started typing the path manually, and I got it working. When I feed the path in from my Jython program, it is present again.
Ultimately the error was the ?/NULL character.
I went back to the source value where the program was grabbing the path, and it was present there. After I re-keyed it by hand, everything started working.
If you copy and paste what I put in the question, you can see the NULL character in the string if you run it through a string->ASCII converter.
>C:
>NULL 67 58
What a bunch of bullsh***.
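For anyone hitting the same thing, a quick way to spot an invisible character like this before handing the path to Popen is to scan for control characters. The helper names below are made up for illustration, and the sample value simply prepends a NUL to the script path from the question.

```python
def find_control_chars(text):
    """Return (index, code point) pairs for NUL and other control characters."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if ord(ch) < 32 or ord(ch) == 127]

def strip_control_chars(text):
    """Return text with all control characters removed."""
    return "".join(ch for ch in text if ord(ch) >= 32 and ord(ch) != 127)

# Script path from the question with a leading NUL, as in the pasted value.
raw_path = "\x00" + r"C:\GooglePipes\Scripts\filetobq.py"
print(find_control_chars(raw_path))
print(repr(strip_control_chars(raw_path)))
```

Logging the result of find_control_chars on the incoming value shows where the stray character sits; stripping it is a workaround, while fixing the source value, as described above, is the real cure.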
I'm trying to create a hook script for Subversion on Windows. I have a bat file that calls my Python script, but getting the log/comment seems to be beyond me.
I have pysvn installed and can get the transaction like this:
import sys
import pysvn
repos_path = sys.argv[1]
transaction_name = sys.argv[2]
transaction = pysvn.Transaction( repos_path, transaction_name)
I can also list what has changed:
transaction.changed(0)
What I cannot figure out is how to get the log/comment for the transaction. I realize that in pysvn there is a command similar to:
transaction.propget(propname,path)
But I cannot for the life of me get it to return anything. I assume propname should be "svn:log"; for path I have tried the file name, the repo path, and null, but all I get are errors.
At the end of the day I need to validate the comment. There will be matching against external data that will evolve, which is why I want to do it in Python rather than in the bat file; plus, it may move to a Linux server later.
Am I missing something obvious? How do I get the log/comment as a string?
Thanks, Chris.
After a great deal of trial and error, and better searching after a day of frustration, I found that I need to use the revision property, not a straight property. For a given transaction, this returns the user-submitted comment:
transaction.revpropget("svn:log")
There are other useful properties; this returns all revision properties of the transaction:
transaction.revproplist()
for example:
{'svn:log': 'qqqqqqq', 'svn:txn-client-compat-version': '1.9.7', 'svn:txn-user-agent': 'SVN/1.9.7 (x64-microsoft-windows) TortoiseSVN-1.9.7.27907', 'svn:author': 'harry', 'svn:date': '2017-12-14T16:13:52.361605Z'}
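Putting it together, a pre-commit hook along these lines could do the validation. This is a minimal sketch: the rule itself (minimum length plus a ticket reference such as PROJ-123) is purely hypothetical and stands in for the real checks against external data, and the function names are made up. Only pysvn.Transaction and revpropget come from the code above.

```python
import re
import sys

# Hypothetical rule for illustration: at least 10 characters and a ticket
# reference such as "PROJ-123". Replace with the real checks.
TICKET_RE = re.compile(r"\b[A-Z]+-\d+\b")

def validate_log_message(message):
    """Return a list of problems; an empty list means the message is accepted."""
    problems = []
    text = (message or "").strip()
    if len(text) < 10:
        problems.append("Log message must be at least 10 characters long.")
    if not TICKET_RE.search(text):
        problems.append("Log message must reference a ticket, e.g. PROJ-123.")
    return problems

def get_log_message(repos_path, transaction_name):
    """Fetch svn:log for the transaction via revpropget."""
    import pysvn  # imported here so the validator can be tested without pysvn
    transaction = pysvn.Transaction(repos_path, transaction_name)
    return transaction.revpropget("svn:log")

def main(argv):
    """Hook entry point: argv[1] is the repo path, argv[2] the transaction name."""
    problems = validate_log_message(get_log_message(argv[1], argv[2]))
    for problem in problems:
        sys.stderr.write(problem + "\n")  # stderr is reported back to the client
    return 1 if problems else 0  # a non-zero exit code rejects the commit
```

The script called from the bat file would end with sys.exit(main(sys.argv)). Keeping the validation in its own function means the rules can be tested without a repository.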
I'm trying out snakebite. I started the following client:
from snakebite.client import Client
client = Client("my.host.com", 8020, effective_user='datascientist')
First, I tried to list the users directory:
for x in client.ls(['/user/datascientist']):
    print x
This worked nicely and printed a couple of dictionaries, one for each item in the directory. One of the items is a file, foobar.txt, which I'd like to see. To that end, I believe I should use Client.cat:
for cat in client.cat(['/user/datascientist/da-foobar.txt',]):
    print(cat)
    for item in cat:
        print(item)
However, this didn't work. I got the following error message:
ConnectionFailureException: Failure to connect to data node at (10.XXX.YYY.ZZZ:50010)
What am I doing wrong?
BTW: using PyWebHdfsClient from pywebhdfs.webhdfs I managed to see the file by starting a client with the same address but with port 50070. I don't know whether this is relevant or not.
Edit 1: I also tried to use snakebite.client.Client.text and got the same error. I guess this is not surprising.
BTW, the content of the file is: my file is this\ntest file.
I found a/the solution. It seems like the listing operation can be accomplished on the name-node alone. In contrast, the printing of the text file needs to access the data-nodes! By instantiating the client as follows
client = Client("stage-gap-namenode-2.srv.glispa.com", 8020, effective_user='datascientist',
                use_datanode_hostname=True)
the cat operation works as it is not using the internal IP, but the hostname. I summarized a minimal example.
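To confirm that this is the situation you are in, it can help to test whether the data node address from the error message is reachable from the client machine at all. The helper below is a generic TCP check using only the standard library; it is not part of snakebite, and the function name is made up.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True
```

If can_connect fails for the internal IP and port 50010 from the error but succeeds for the data node's hostname on the same port, the client cannot reach the internal address, which is the case use_datanode_hostname=True works around.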
This is my first time using this, so be kind :) Basically, I am making a program that opens many Microsoft Word 2007 documents (well in excess of 1000), reads from a certain table in each document, and writes that info to an Excel file. I have all of this working. The only problem is that when I run my code, it does not close MS Word after opening each doc; I have to do this manually at the end of the program run by opening Word and selecting the Exit Word option from the Home menu. Another problem: if I run this program twice in a row, on the second run everything goes to hell. It prints the same thing repeatedly no matter which doc is selected. I think this may have to do with how MS Word decides which doc is active, e.g. it may still be opening the last active document that was not closed in the previous run. Anyway, here is my code for the opening and closing part; I won't bore you guys with the rest:
import win32com.client
MSWord = win32com.client.Dispatch("Word.Application")
MSWord.Visible = 0
# Open a specific file
#myWordDoc = tkFileDialog.askopenfilename()
MSWord.Documents.Open("C:\\Documents and Settings\\fdosier" + chosen_doc)
#Get the textual content
docText = MSWord.Documents[0].Content
charText = MSWord.Documents[0].Characters
# Get a list of tables
ListTables = MSWord.Documents[0].Tables
# ------ Main Code ------
MSWord.Documents.Close
MSWord.Documents.Quit
del MSWord
Basically, Python is not VBA, so this:
MSWord.Documents.Close
is equivalent to:
getattr(MSWord.Documents, "Close")
i.e. you just get some method object and do nothing with it. You need to call the method with the call operator (the parentheses :) :
MSWord.Documents.Close()
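The same goes for .Quit. Note that Quit is a method of the application object, not of Documents, so the call is MSWord.Quit().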
Before your MSWord.Quit, did you try using:
MSWord.ActiveWindow.Close()
Or even more simply, just doing
MSWord.Quit()
I don't really understand whether you are trying to close a document or the application.
I think you need an MSWord.Quit() at the end (before and/or instead of the del).
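For the many-documents case, one way to make sure every document is closed and Word itself exits, even when one file fails, is to wrap the work in try/finally. This is a sketch rather than the asker's actual code: process_documents and count_tables are hypothetical names, and the Word application object is passed in so the loop does not depend on how it was created.

```python
def process_documents(word_app, paths, handle_doc):
    """Open each path, pass the document to handle_doc, and always clean up."""
    results = []
    try:
        for path in paths:
            # Work with the document returned by Open, not with Documents[0]
            doc = word_app.Documents.Open(path)
            try:
                results.append(handle_doc(doc))
            finally:
                doc.Close(False)  # False: close without saving changes
    finally:
        word_app.Quit()  # ends the Word process even if a document failed
    return results

def count_tables(doc):
    # Hypothetical handler: replace with the real table-reading code.
    return doc.Tables.Count
```

With pywin32 this would be called as process_documents(win32com.client.Dispatch("Word.Application"), list_of_paths, count_tables). Using the object returned by Open also avoids guessing which document Word currently considers active, which may be what goes wrong on the second run.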