I have a list of websites I need to extract certain values from in order to keep a local .txt file up to date. Since the websites need to be checked at different time intervals, I would prefer not to use the Windows Task Scheduler. Instead, I want a single script running continuously in the background that extracts the information from each website at its specified frequency (so the frequency for each website would be an input parameter) and keeps the file updated.
I know how to extract the information from the websites, but I don't know how to schedule the checks in an automated fashion and have the script run continuously in the background. Knowing how to stop it would be useful too. (I have Anaconda Python installed on Windows 7.)
What is an efficient way of coding that?
Thanks.
PS, clarification: the script just needs to run as a background job once started and harvest some text from a number of predefined URLs. So my questions are: a) How do I set it to run as a background job? A while loop? Something else? b) How do I make it return to a URL to harvest the text at pre-specified intervals?
Given that it doesn't need to be a hidden process, and that the Windows Task Scheduler is unsuitable (since each website has a different recurrence), it sounds like you just want a simple Python process that calls your extraction function on an irregular but predetermined schedule.
This sounds a lot like a job for APScheduler (https://pypi.python.org/pypi/APScheduler/). I've used it a lot on Linux and it has worked like a charm for cron-like features. The package docs say it is cross-platform, so it should fit the bill.
Related
I'm relatively new to Python, so I was wondering if anyone can give some hints or tips on something I want to do with Python while it runs as part of a build on a Jenkins Pipeline.
To give a basic breakdown, I want to export/save timestamps from the Jenkins output, which currently timestamps every command/string that happens within it, to either a .txt file or a .csv file while a build is running. These timestamps will be taken when specific commands/strings occur in the Jenkins output. I've given an example below of the timestamp and command being looked for:
"2021-08-17 11:46:38,899 - LOG: Successfully sent the test record"
I'd prefer to send just the timestamp itself, but if the full line needs to be sent then that would work as well, as there is a lot of information generated in the console that isn't of interest for what I want to do.
My ultimate goal would be to do this for multiple different and unique commands/strings that occur in the Jenkins output. Along with this, some of my testing would involve running the same script over and over for a set number of loops, so I'd want the timestamp data to be saved into a single output file (and not overwritten), or into separate output files for each loop.
Any hints or tips would be greatly appreciated, as I've reached a dead end in my searching. The options I've looked into are the logging module, using the wait_for_value function to find the required command/string in the console output and then save/print it to a variable, and seeing whether regex would be suitable for the task.
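For reference, this is roughly the regex approach I've been imagining; the pattern and the marker string are guesses based on the example line above, not a tested solution:

```python
# Sketch: pull the timestamp out of console lines that contain
# one of the strings I care about.
import re

# "2021-08-17 11:46:38,899 - LOG: Successfully sent the test record"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - (.*)$")
MARKERS = ["Successfully sent the test record"]  # strings of interest

def extract_timestamps(console_text, markers=MARKERS):
    """Return the timestamps of lines containing any marker string."""
    stamps = []
    for line in console_text.splitlines():
        m = LINE_RE.match(line)
        if m and any(marker in m.group(2) for marker in markers):
            stamps.append(m.group(1))
    return stamps
```

For the looping case I imagine opening the output file in append mode (`open("stamps.csv", "a")`) so each run adds rows instead of overwriting.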
Thanks in advance for any help on this.
Context
I'm working on a data science project in which I run an analysis on a dataset (let's call it the original dataset) and create a processed dataset (let's call this one result). The latter can be queried by a user, who creates different plots through a Dash application. The system also makes predictions on an attribute of this dataset using ML models. Everything will run on an external VM of my company.
What is my current "code"
Currently I have these Python scripts, which together create the result dataset (except the Dashboard one):
concat.py (simply concatenates some files)
merger.py (merges different files in the project directory)
processer1.py (processes the first file needed for the analysis)
processer2.py (processes a second file needed for the analysis)
Dashboard.py (the Dash application)
ML.py (runs a classic ML task, creates a report and an updated result dataset with some predictions)
What I should obtain
I'm interested in creating this kind of solution, which will run on the VM:
Dashboard.py runs 24/7. It depends on the existence of the "result" dataset; without it, it's useless.
Every time there's a change in the project directory (new files are added every month), the system triggers the execution of concat.py, merger.py, processer1.py and processer2.py. Maybe a Python script and the watchdog package can help create this trigger mechanism? I'm not sure.
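To make the trigger idea concrete, this is the kind of stdlib-only polling loop I have in mind as an alternative to watchdog (the script names are from my list above; whether this is the right approach is part of my question):

```python
# Sketch: poll the project directory and, when files appear or
# change, run the processing scripts in order.
import os
import subprocess
import sys
import time

SCRIPTS = ["concat.py", "merger.py", "processer1.py", "processer2.py", "ML.py"]

def snapshot(directory):
    """Map each entry in the directory to its last-modified time."""
    return {name: os.path.getmtime(os.path.join(directory, name))
            for name in os.listdir(directory)}

def run_pipeline(scripts=SCRIPTS):
    for script in scripts:
        subprocess.run([sys.executable, script], check=True)

def watch(directory, interval=60):
    seen = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        if current != seen:  # new or updated files
            seen = current
            run_pipeline()
            # ...then restart Dashboard.py so it picks up the new result...
```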
Once the execution above is done, ML.py is executed on the "result" dataset, and its output is uploaded to the dashboard.
Dashboard.py is restarted with the new CSV file.
I would like some help understanding which technologies are necessary to get what I want, ideally with an example or a source, so I can fully understand and apply the right approach. I suspect I need a Python script to orchestrate the whole system, maybe the same script that observes the directory, maybe not.
The most important thing is that the dashboard is always operating; this is what creates the need to run things simultaneously. It only needs to be restarted when the "result" CSV dataset is completed and uploaded. I think service continuity is best for the users.
The users will feed the dashboard by putting new files in the observed directory. It's necessary to automate execution with "triggers", since they are not skilled users and will not be allowed to use the VM shell (I suppose). Alternatively, I could consider a scheduled execution instead, say every month.
The company won't grant me another VM or similar if one is needed, so I have to do it all on a single VM.
Premise
This is the first time that I have to put something "in production", and I have no experience at all. Could anyone help me find the best approach? Thanks in advance.
The website "Download the GLEIF Golden Copy and Delta Files" has buttons that download data I want to retrieve automatically with a Python script. Usually when I want to download a file I use mget or similar, but that will not work here (at least I don't think it will).
For some reason I cannot fathom, the producers of the data seem to want to force one to download the files manually. I really need to automate this to reduce the number of steps for my users (and frankly for me), since there are a great many files in addition to these and I want to automate as many of them as possible (all of them, ideally).
So my question is this: is there some kind of Python package for doing this sort of thing? If not a Python package, is there perhaps some other tool that is useful for it? I have to believe this is a common annoyance.
Yup, you can use BeautifulSoup to scrape the URLs and then download the files with requests.
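A minimal sketch, assuming the download links are plain `<a>` tags pointing at .zip files; inspect the actual page markup to adjust the filter:

```python
# Sketch: collect file links from the page, then download each one.
import requests
from bs4 import BeautifulSoup

def file_links(html, keyword=".zip"):
    """Return hrefs of anchors whose URL contains the keyword."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if keyword in a["href"]]

def download_all(page_url):
    html = requests.get(page_url).text
    for link in file_links(html):
        name = link.rsplit("/", 1)[-1]
        with open(name, "wb") as f:
            f.write(requests.get(link).content)
```

If the buttons are driven by JavaScript rather than plain links, you would need something like Selenium instead, or find the underlying API URL in the browser's network tab.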
I want to automate the entire process of creating ngs, bit and mcs files in Xilinx, and have these files automatically be associated with certain folders in the SVN repository. What I need to know is: is there a log file created in the back end of the Xilinx GUI that records all the commands I run, e.g. open project, load file, synthesize, etc.?
The other part that I have not been able to find is a log file that records the entire process of synthesis, map, place and route, and generate programming file, especially any errors the tool encountered during these processes.
If any of you can point me to such files, if they exist, that would be great. I haven't gotten much out of my search, but maybe I didn't look hard enough.
Thanks!
Well, it is definitely a nice project idea, but a good amount of work. There's always a reason why an IDE was built. A simple search yields the "Command Line Tools User Guide" for various versions of Xilinx ISE; the one for 14.3, for example, is 380 pages covering:
Overview and list of features
Input and output files
Command line syntax and options
Report and message information
ISE is a GUI for various command line executables; most of them are located in the subfolder 14.5/ISE_DS/ISE/bin/lin/ (in this case: Linux executables for version 14.5) of your ISE installation root. You can review your current parameters for each action by right-clicking the item in the process tree and selecting "Process properties".
On the Python side, consider using the subprocess module:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
Is this the entry point you were looking for?
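For example, a minimal sketch of driving one of those executables from Python and keeping its console output as a log file; the commented-out xst invocation is only illustrative, so check the user guide for the options your project actually needs:

```python
# Run one command-line step and keep its console output as a log file.
import subprocess

def run_step(cmd, logfile):
    """Run an executable, write its output to logfile, return exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(logfile, "w") as f:
        f.write(result.stdout)
        f.write(result.stderr)
    return result.returncode

# Illustrative only -- the real xst invocation takes a script file:
# run_step(["xst", "-ifn", "design.xst"], "synthesis.log")
```

Chaining such calls (synthesis, then map, then place and route, then bitgen) and checking each return code gives you the automated flow, with one log file per step to grep for errors.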
As phineas said, what you are trying to do is quite an undertaking.
I've been there and done that, and there are countless challenges along the way. For example, if you want to move generated files to specific folders, how do you classify these files to figure out which files are which? I've created a project called X-MimeTypes that attempts to classify the files, but you then need a tool that parses the EDA mime-type database and uses it to determine which files are which.
However there is hope, so to answer the two main questions you've pointed out:
To be able to automatically move generated files to predetermined paths: from what you are saying, it seems you want this to make the versioning process easier? There is already a tool that does this for you, based on "design structures" that you create and that can be shared within a team. The tool is called Scineric Workspace, so check it out. It also has built-in Git and SVN support, which ignores things according to the design structure and in most cases filters out everything generated by vendor tools without you having to worry about it.
You are looking for a log file that shows all commands that were run: as phineas said, you can check out the Command Line Tools User Guide for ISE, but be aware that the commands have changed again in Vivado. The log file of each process also usually states the exact command, with its parameters, that was called; this should be close to the top of the report. A single log file that contains everything does not exist. Again, Scineric Workspace supports invoking flows from the major vendors (ISE, Vivado, Quartus), and it produces one log file for all processes together while still allowing each process to create its own log file. Errors, warnings etc. are also marked properly in this big report. Scineric has a Tcl shell mode as well, so your Python tool can run it in the background and parse the complete log file it creates.
If you have more questions on the above, I will be happy to help.
Hope this helps,
Jaco
I have a Python program that collects data. I have tested it many times before, but today it decided that it will not save the data. Unfortunately, I also decided to run my program using pythonw.exe, so there is no terminal in which to see the errors.
I can see that it still has the data saved to the memory because it is displayed on a plot and I can still manipulate the data using my program.
I want to know if there is a way to access the data my program collected externally or some way to read it.
I know that it is unlikely I will be able to recover my data, but it is worth a shot.
(Also, I am using Python 2.7 with PyQT4 as a GUI interface.)
You should be able to attach to your running process and examine its variables using http://winpdb.org/.