Web Scraping 24/7 without IP-Spoofing (theoretical) - Python

Now before I ask the actual question, I want to mention that IP spoofing is illegal in a number of countries, so please do not attempt to use any code provided here or in the answers below. Moreover, I am not confident that I know exactly what is and isn't legal in the subject I'm about to open, therefore I want to state that every piece of code provided here has not been and will not be actually tested. Everything should just 'theoretically work'!
Having mentioned that: a friend of mine asked me if I could make a bot that monitors a market 24/7 and, should it find an item at the desired price (or lower), automatically buys it. The only problem I think my code would have, in theory, is in this line:
request(url, some_var_name, timeout=1)
# realistically the timeout will probably need to be at least 2 seconds, but anyway!
This runs in a loop that's supposed to run 24/7, sending one request per second (optimally) to the server. After a while, the most likely outcome is that this will be treated as a DoS attack, and the server will block my IP to prevent it, so the requests will no longer return prices. An easy solution would be proxies: buy a list of them somewhere, or even get some for free (possibly infecting your own computer with something, or having all your requests monitored by someone). That's one way, probably illegal, to achieve it. The questions I have are:
Is there a way to achieve the same result without proxies? And is there a way to make the whole thing legal so I can actually test it out on a real market?
Thanks in advance y'all, if at the end I find out that there's nothing to be done here at the very least it was a fun challenge to take on!
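For reference, the polling loop itself can be written so it doesn't look like an attack in the first place: respect a sensible interval, add jitter, and back off when requests fail. This is only a sketch; `fetch_price` is a placeholder for whatever request function is used, not anything from the original post:

```python
import time
import random

def poll_until_deal(fetch_price, target_price, interval=1.0, max_polls=None):
    """Call fetch_price() roughly once per `interval` seconds until it
    returns a price at or below target_price. Backs off exponentially
    when fetch_price raises, so transient errors don't hammer the server."""
    backoff = interval
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            price = fetch_price()
            backoff = interval  # reset the back-off after a successful request
            if price is not None and price <= target_price:
                return price
        except Exception:
            backoff = min(backoff * 2, 60)  # cap the back-off at one minute
        # add jitter so requests don't land on an exact, detectable schedule
        time.sleep(backoff + random.uniform(0, 0.2))
    return None
```

Whether a site tolerates even this depends entirely on its terms of service and rate limits, which is the legal question below, not a coding one.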

Related

Is it possible to write a program that checks whether a pinned comment is present under a YouTube video at regular intervals? (using Python?)

I'm not asking for someone to write specific code for me, but rather just to point me in the general direction.
Sometimes, after writing and pinning a comment under a YouTube video, it disappears!
It may happen hours later, it may happen a few minutes later, all randomly. Then it re-appears hours later; I'm not sure why.
Since I am currently learning programming (python), I might as well try to tackle the problem myself. I want to make a program that regularly checks if the pinned comment is still present, and if not, send me an alert in whatever way so I can go write and pin a new comment for the time being. (and maybe learn a few things during the process)
So far I have learned some basics, like how to get the source code of a webpage, output it to somewhere like a txt file, and search it with regex. However, when I check the source code of a YouTube video, the comment text isn't there. Where do I begin learning about the things needed to make this program work?
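The comments aren't in the page source because YouTube loads them with JavaScript after the page renders, so plain requests + regex won't see them; you'd need either a real browser driver (e.g. Selenium) or the YouTube Data API to fetch them. Whichever fetch method you end up with, the checking/alerting logic can be kept separate and tested on its own. A minimal sketch, where `fetch_comments` and `alert` are placeholders you would wire up yourself, not a working YouTube client:

```python
def pinned_comment_missing(fetch_comments, pinned_text):
    """Return True when the known pinned comment text is absent from
    the list of comment strings returned by fetch_comments()."""
    comments = fetch_comments()
    return all(pinned_text not in c for c in comments)

def check_and_alert(fetch_comments, pinned_text, alert):
    """Run one check; call alert(message) if the pinned comment is gone."""
    if pinned_comment_missing(fetch_comments, pinned_text):
        alert("Pinned comment is gone - go repin it!")
        return True
    return False
```

Running this on a schedule (e.g. with `time.sleep` in a loop, or a cron job) and making `alert` send an email or chat message would complete the program.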

Try Except is just executing all lines simultaneously (in pyolite kernel with `input()`)

I'm "learning" Python in a course on Coursera, but this lesson doesn't have any explanation and doesn't seem to work at all. As I'm so new, I have no idea what is even going wrong in order to fix it or even glean how it should work from the lesson.
a = 1
try:
    b = int(input("Please enter a number to divide a"))
    a = a/b
except ZeroDivisionError:
    print("The number you provided cant divide 1 because it is 0")
except ValueError:
    print("You did not provide a number")
except:
    print("Something went wrong")
else:
    print("success a=",a)
finally:
    print("Processing Complete")
The lesson purports that this should output "success a=",a, but instead it just vomits out:
Something went wrong / Processing Complete / Please enter a number to divide a [input box]
and even when a number (say 2) is entered into the input box, it just terminates with
Something went wrong / Processing Complete / Please enter a number to divide a 2
never actually printing the success message. I've been over the lesson multiple times now and there's nothing to indicate why it's not working. Since Coursera has no way to contact instructors, I have no recourse except to outsource their job to here. (Sorry)
EDIT: To clarify, this is a course lab that launches a Jupyter notebook with the code already populated, and we execute the code to see how it works. It's running Pyolite kernel. It has nothing to do with me not copying it correctly as it is pre-populated in the notebook. I do not know why it is not working, but it's frustrating because I don't know what correct is supposed to look like, or if this IS correct, why it's not working in the notebook that I simply launch and execute.
It's not you. Your course shouldn't be using JupyterLite with the pyolite kernel at this time. That is very cutting-edge, to the point of being described as 'experimental'. (Or, if the course administrators chose it intentionally, they should make users such as yourself very aware of the associated issues. As you are finding, it is very frustrating and discouraging to be handed something flawed when you are trying to learn.) It's well documented that input() doesn't work right there, along with a host of other things; see the linked reports and the comments therein. This will probably be a temporary situation, as things are in very active development in the WASM/pyolite world.
As others have pointed out, you should provide more information from the get-go. You say it is a course that launches a pre-populated notebook; provide the link so that we can use it without copying your code. With the URL in hand, I could probably also tell you whether the people administering your course simply haven't noticed that the system now uses the pyolite kernel by default, or whether they chose it on purpose. I suspect it is the former, because there was a recent update to how the Try Jupyter page works: it was switched to JupyterLite, which uses the pyolite kernel and runs entirely in the client's browser via WebAssembly, in order to reduce the traffic hitting the costly server-backed system previously used there. However, as you found, there are a few glitches. The change is relatively recent, and the course instructors and moderators may simply not be aware of it; I doubt they intentionally sent you to a place using this kernel without providing guidance. There are easy ways to offer other options. One is to send users to a site with a full Jupyter kernel backing the notebook. One such option, illustrated:
Go to this GitHub page and click on launch binder badge you see there. A temporary Jupyter session will spin up.
Open a new notebook and paste in the code block you provided in your post. Run that cell.
You should see things work as expected.
Now, in the future, you can try things where the course sends you, and when you encounter something that seems 'off', follow the same process of spinning up another temporary session to see whether it is one of these glitches.
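As a sanity check, the lesson's logic can also be verified without input() at all, by wrapping it in a function that takes the user's text as a parameter; this sidesteps the pyolite input() problem entirely (the function name and return strings here are mine, not the course's):

```python
def divide_demo(user_text, a=1):
    """Same try/except/else/finally flow as the lesson, but the 'input'
    arrives as a parameter so it runs on any kernel."""
    try:
        b = int(user_text)
        a = a / b
    except ZeroDivisionError:
        return "The number you provided can't divide 1 because it is 0"
    except ValueError:
        return "You did not provide a number"
    except:
        return "Something went wrong"
    else:
        return "success a=" + str(a)
    finally:
        # finally always runs, even on the early returns above
        print("Processing Complete")

print(divide_demo("2"))  # success a=0.5
```

If this behaves correctly while the original notebook does not, that confirms the kernel, not the code, is at fault.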
Unless something has dramatically changed since I took several Coursera courses, there are ways to contact course moderators / administrators / mentors (who may or may not be able to directly contact the actual instructors). Those are who you would usually go to with something like this; there are usually course discussion boards. However, in this case, if what I suspect is happening is happening, the person contacted would have ended up here or at the Jupyter Discourse forum asking the same question, as this is relatively new and has come up with other instructors not realizing what was occurring.

my API's parsing data with beautifulSoup4 and python is too slow

I have made an API that parses the GitHub contribution data of each account, arranges it by month, week, or day, and returns it as JSON.
Responding to just one request takes approximately 2 seconds (1800 ms).
Link to my GitHub repository.
contributions.py in repository is the python code that does the above things.
THE POINT OF THE QUESTION: What makes my API slow?
Is it just too much data to parse (about 365 data points)?
Or the way the API builds the JSON string?
Thank you in advance for answering and helping me.
"Why is my code slow?" is a really hard question to answer. There's basically an unlimited number of possible reasons that could be. I may not be able to answer the question, but I can provide some suggestions to hopefully help you answer it for yourself.
There are dozens of questions to ask... What kind of hardware are you using? What kind of network/internet connection do you have? Is it just slow on the first request, or all requests? Is it just slow on the call to one type of request (daily, weekly, monthly) or all? etc. etc.
You are indicating overall request times of ~1800 ms, but as you pointed out, there is a lot happening during the processing of that request. In my experience, often one of the best ways to find out is to add some timing code to narrow down the scope of the slowness.
For example, one quick and dirty way to do this is to use the python time module. I quickly added some code to the weekly contributions method:
import time
# [...]
@app.route("/contributions/weekly/<uname>")
def contributionsWeekly(uname):
    before = time.time()
    rects = getContributionsElement(uname)
    after = time.time()
    timeToGetContribs = after - before
    # [...]
    print(' timeToGetContribs: ' + str(timeToGetContribs))
    print('timeToIterateRects: ' + str(timeToIterateRects))
    print('  timeToBuildJson: ' + str(timeToBuildJson))
Running this code locally produced the following results:
timeToGetContribs: 0.8678717613220215
timeToIterateRects: 0.011543750762939453
timeToBuildJson: 1.5020370483398438e-05
(Note the e-05 on the end of the last time... very tiny amount of time).
From these results, we know that the time to get the contributions is taking the majority of the full request. Now we can drill down into that method to try to further isolate the most time consuming part. The next set of results shows:
timeToOpenUrl: 0.5734567642211914
timeToInstantiateSoup: 0.3690469264984131
timeToFindRects: 0.0023255348205566406
From this it appears that the majority of the time is spent actually opening the URL and retrieving the HTML (meaning that network latency, internet connection speed, GitHub server response time, etc are the likely suspects). The next heaviest is the time it actually takes to instantiate the BeautifulSoup parser.
Take all of these concrete numbers with a grain of salt. These are on my hardware (12 year old PC) and my local internet connection. On your system, the numbers will likely vary, and may even be significantly different. The point is, the best way to track down slowness is to go through some basic troubleshooting steps to identify where the slowness is occurring. After you've identified the problem area(s), you can likely search for more specific answers, or ask more targeted questions.
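The ad-hoc before/after pattern above can be wrapped in a small reusable context manager, so each suspect block only costs one extra line. This is just a convenience helper I'm sketching (the name `timed` is mine, not from any library):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Time the enclosed block. Record the elapsed seconds in the
    optional results dict, or print them if no dict is given."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        else:
            print(f"{label}: {elapsed:.4f}s")

# hypothetical usage inside the request handler:
# times = {}
# with timed("getContribs", times):
#     rects = getContributionsElement(uname)
```

`time.perf_counter()` is preferable to `time.time()` for measuring durations, since it is monotonic and higher resolution.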

Invoke CTRL+V to an open window

I have something in my clipboard, and I'd like to run a python script that invokes CTRL+V as if it was pressed on the keyboard, and pastes the clipboard's content to the current focused window (say chrome). Any idea how to do that?
You have an X-Y problem.
What you want to accomplish is to programmatically take data from one program (where you hit Ctrl+V) and place it into another arbitrary program (Chrome).
There are two ways to do that:
First
You can set the programs up to have a data exchange mechanism, such as a system pipe or a network connection. This requires some API for data exchange to already be included in the program, or access to the source so you can add one. There are very specific channels for cross-program data exchange, and you won't do well trying to circumvent them. Program A can't just say
get_program_b().get_text_box().add(clip_board);
That would be a violation of process isolation, and an OS like Windows is written expressly to make it impossible. Some programs, however, are designed to take input from other programs:
subprocess.Popen(['mysql', '-e', "INSERT INTO table (a) VALUES ('4')"])
Chrome is not one of those programs. Chrome avoids letting other programs do this, because it would make it a target for programs trying to extract saved passwords or credit card data, or to use a saved password to log in to someone's account and buy things in someone else's name.
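To make the "sanctioned channel" idea concrete, here is what passing data between two processes looks like through stdin, the mechanism the OS actually provides. The child process here is just a stand-in for any cooperative program that accepts piped input (I'm using a tiny Python one-liner as the receiver so the example is self-contained):

```python
import subprocess
import sys

# Send clipboard-like text to another process through its stdin - the
# OS-sanctioned channel - instead of poking at its windows.
child_code = "import sys; print(sys.stdin.read().upper(), end='')"
result = subprocess.run(
    [sys.executable, "-c", child_code],  # stand-in for a cooperative program
    input="hello from program A",
    capture_output=True,
    text=True,
)
print(result.stdout)  # HELLO FROM PROGRAM A
```

This only works because the receiving program was written to read stdin; that is exactly the cooperation Chrome declines to offer.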
Second
You could try to spoof user input and enter the data exactly as a user would, so Chrome won't know the difference. But spoofing a user is hard to do, and intentionally so, because that difficulty prevents malicious scripts from taking control of a computer and doing bad things. The makers of Windows are acutely aware that spoofing input is a way to circumvent the allowed data exchange channels, so they made it hard to do. You need access to a lot of system assets that most programs won't be given. At a minimum, a program has to run as admin on Windows to accomplish this; then there are libs that will let you do it. Even then, I'm willing to bet there are easier ways to get the job done. On a machine where you have access to anything and everything, it is possible. If you don't have admin access, it should be downright impossible without knowing some unpatched exploit in the system.
Therefore
What you are trying to do goes against what the computer was designed to let you do. If we had more information on what you want to accomplish, maybe some of the wonderful people here could help; getting to the end result you want shouldn't be that hard. But your way of doing it is like trying to run across the ocean when you just need a boat. As it is, my answer is: don't do it, that's not how Windows was designed to work.

Python Login System that supports multiple logins

This is a project that I wanted to work on. I want a program where people can enter their login details; the username and password will be sent to another program that checks whether the details are right and then tells the first program whether the user can log in.
The problem is I have no idea how to make the programs send data to each other. If anyone could help me, maybe introduce me to some new modules, I would be very happy.
Thanks,
Dan
The program you want to send the data to has to support such a method of confirmation, or it won't work. If it does, it may be that you have to supply the credentials to it on the command line.
You could try out https://pypi.python.org/pypi/EasyProcess
If you're talking about a sort of client-server setup here, you could look into using sockets in order to facilitate the communications between the client and server software. However, I would advise that you keep in mind that sending information such as passwords over the net will probably require that they be secured as well.
I'm sorry if I'm a little unhelpful with your project, but given the vague nature of your description I can only really point you towards the modules you may want to look into for the key aspect of your task.
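To show what the socket-based client-server setup looks like, here is a minimal sketch. The protocol ("username:password" in, "OK"/"DENIED" out), the credential store, and all names are invented for illustration; a real system must never store or transmit plaintext passwords like this:

```python
import socket
import threading

USERS = {"dan": "hunter2"}  # toy credential store - illustration only

def serve_once(server_sock):
    """Accept one connection, read 'username:password', reply OK or DENIED."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024).decode()  # one recv suffices for this tiny message
        username, _, password = data.partition(":")
        ok = USERS.get(username) == password
        conn.sendall(b"OK" if ok else b"DENIED")

def try_login(host, port, username, password):
    """Client side: send credentials, return the server's verdict."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"{username}:{password}".encode())
        return sock.recv(1024).decode()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=serve_once, args=(server,))
t.start()
reply = try_login("127.0.0.1", port, "dan", "hunter2")
t.join()
server.close()
print(reply)  # OK
```

In a real deployment you would wrap the connection in TLS (Python's ssl module) and store only salted password hashes, per the security caveat above.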
