I have a programme running on an old laptop which is constantly monitoring a Dropbox folder for new files being added. When it's running the Python process uses close to 50% of the CPU on a dual-core machine, and about 12% on an 8-core machine which suggests it's using close to 100% of one core). This is giving off a lot of heat.
The relevant bit of code is:
while True:
files = dict ([(f, None) for f in os.listdir(path_to_watch)])
if len(files) > 0:
print "You have %s new file/s!" % len(files)
time.sleep(20)
In the case that there is no new file, surely most of the time should be spent in the time.sleep() waiting which I wouldn't have thought would be CPU-intensive - and the answers here seem to say it shouldn't be.
So two questions:
1) Since time.sleep() shouldn't be so CPU-intensive, what is going on here?
2) Is there another way of monitoring a folder for changes which would run cooler?
1) Your sleep only gets called when there are new files.
This should be much better:
while True:
files = dict ([(f, None) for f in os.listdir(path_to_watch)])
if len(files) > 0:
print "You have %s new file/s!" % len(files)
time.sleep(20)
2) Yes, especially if using linux. Gamin would be something I'd recommend looking into.
Example:
import gamin
import time
mydir = /path/to/watch
def callback(path, event):
global mydir
try:
if event == gamin.GAMCreated:
print "New file detected: %s" % (path)
fullname = mydir + "/" + path
print "Goint to read",fullname
data = open(fullname).read()
print "Going to upload",fullname
rez = upload_file(data,path)
print "Response from uploading was",rez
except Exception,e: #Not good practice
print e
import pdb
pdb.set_trace()
mon = gamin.WatchMonitor()
mon.watch_directory(mydir, callback)
time.sleep(1)
while True:
ret = mon.handle_one_event()
mon.stop_watch(mydir)
del mon
There is also a cross platform API to monitor file system changes: Watchdog
Related
I'm currently working on a script for my sensor on my Raspberry Pi. The code underneath should get the values of my sensor and write it into a the data.json file. My problem is, if I run the scipt with my the Thonny editor everything works but if I add the script to my crontab menu the data does not get written to the data.json file.
The Code:
import time
import board
import adafruit_dht
import psutil
import io
import json
import os
from gpiozero import LED
from datetime import date
from datetime import datetime
# We first check if a libgpiod process is running. If yes, we kill it!
for proc in psutil.process_iter():
if proc.name() == "libgpiod_pulsein" or proc.name() == "libgpiod_pulsei":
proc.kill()
sensor = adafruit_dht.DHT11(board.D23)
# init
temp_values = [10]
hum_values = [10]
counter = 0
dataLED = LED(13)
dataList = []
def errSignal():
for i in range(0,3):
dataLED.on()
time.sleep(0.1)
dataLED.off()
time.sleep(0.1)
#on startup
def runSignal():
for i in range(0,5):
dataLED.on()
time.sleep(0.2)
dataLED.off()
time.sleep(0.2)
def getExistingData():
with open('data.json') as fp:
dataList = json.load(fp)
print(dataList)
def startupCheck():
if os.path.isfile("data.json") and os.access("data.json", os.R_OK):
# checks if file exists
print("File exists and is readable.")
# get json data an push into arr on startup
getExistingData()
else:
print("Either file is missing or is not readable, creating file...")
# create json file
with open("data.json", "w") as f:
print("The json file is created.")#
def calc_avgValue(values):
sum = 0
for iterator in values:
sum += iterator
return sum / len(values)
def onOFF():
dataLED.on()
time.sleep(0.7)
dataLED.off()
# data led blinking on startup
runSignal()
# checks if file exists
startupCheck()
while True:
try:
temp_values.insert(counter, sensor.temperature)
hum_values.insert(counter, sensor.humidity)
counter += 1
time.sleep(6)
if counter >= 10:
print(
"Temperature: {}*C Humidity: {}% ".format(
round(calc_avgValue(temp_values), 2),
round(calc_avgValue(hum_values), 2)
)
)
# get time
today = date.today()
now = datetime.now()
# create json obj
data = {
"temperature": round(calc_avgValue(temp_values), 2),
"humidity": round(calc_avgValue(hum_values), 2),
"fullDate": str(today),
"fullDate2": str(today.strftime("%d/%m/%Y")),
"fullDate3": str(today.strftime("%B %d, %Y")),
"fullDate4": str(today.strftime("%b-%d-%Y")),
"date_time": str(now.strftime("%d/%m/%Y %H:%M:%S"))
}
# push data into list
dataList.append(data)
# writing to data.json
with open("data.json", "w") as f:
json.dump(dataList, f, indent=4, separators=(',',': '))
# if data is written signal appears
onOFF()
print("Data has been written to data.json...")
counter = 0
except RuntimeError as error:
continue
except Exception as error:
sensor.exit()
while True:
errSignal()
raise error
time.sleep(0.2)
Crontab Menu:
The line in the center is the script.
Investigation areas:
Do not put & in crontab, it serves no purpose.
You should capture the output of your scripts to see what is going on. You do this by adding >/tmp/stats.out 2>/tmp/stats.err (and similar for the other 2 lines). You will see what output and errors your scripts encounter.
cron does not run your scripts in the same environment, and from the same directory you are running them. Load what you require in the script.
cron might not have permissions to write into data.yml in the directory it is running from. Specify a full path, and ensure cron can write in that directory.
Look at https://unix.stackexchange.com/questions/109804/crontabs-reboot-only-works-for-root for usage of #reboot. Things that should occur at startup should be configured through systemd or init.d (I do not know what Rasperry Pie uses vs distro). Cron is to schedule jobs, not run things at startup.
It could be as simple as not having python3 in the PATH configured in cron.
I wrote a kernel module that writes in /proc/mydev to notify the python program in userspace. I want to trigger a function in the python program whenever there is an update of data in /proc/mydev from the kernel module. What is the best way to listen for an update here? I am thinking about using "watchdog" (https://pythonhosted.org/watchdog/). Is there a better way for this?
This is an easy and efficient way:
import os
from time import sleep
from datetime import datetime
def myfuction(_time):
print("file modified, time: "+datetime.fromtimestamp(_time).strftime("%H:%M:%S"))
if __name__ == "__main__":
_time = 0
while True:
last_modified_time = os.stat("/proc/mydev").st_mtime
if last_modified_time > _time:
myfuction(last_modified_time)
_time = last_modified_time
sleep(1) # prevent high cpu usage
result:
file modified, time: 11:44:09
file modified, time: 11:46:15
file modified, time: 11:46:24
The while loop guarantees that the program keeps listening to changes forever.
You can set the interval by changing the sleep time. Low sleep time causes high CPU usage.
import time
import os
# get the file descriptor for the proc file
fd = os.open("/proc/mydev", os.O_RDONLY)
# create a polling object to monitor the file for updates
poller = select.poll()
poller.register(fd, select.POLLIN)
# create a loop to monitor the file for updates
while True:
events = poller.poll(10000)
if len(events) > 0:
# read the contents of the file if updated
print(os.read(fd, 1024))
sudo pip install inotify
Example
Code for monitoring a simple, flat path (see “Recursive Watching” for watching a hierarchical structure):
import inotify.adapters
def _main():
i = inotify.adapters.Inotify()
i.add_watch('/tmp')
with open('/tmp/test_file', 'w'):
pass
for event in i.event_gen(yield_nones=False):
(_, type_names, path, filename) = event
print("PATH=[{}] FILENAME=[{}] EVENT_TYPES={}".format(
path, filename, type_names))
if __name__ == '__main__':
_main()
Expected output:
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_MODIFY']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_OPEN']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_CLOSE_WRITE']
I'm not sure if this would work for your situation, since it seems that you're wanting to watch a folder, but this program watches a file at a time until the main() loop repeats:
import os
import time
def main():
contents = os.listdir("/proc/mydev")
for file in contents:
f = open("/proc/mydev/" + file, "r")
init = f.read()
f.close()
while different = false:
f = open("/proc/mydev/" + file, "r")
check = f.read()
f.close()
if init !== check:
different = true
else:
different = false
time.sleep(1)
main()
# Write what you would want to happen if a change occured here...
main()
main()
main()
You could then write what you would want to happen right before the last usage of main(), as it would then repeat.
Also, this may contain errors, since I rushed this.
Hope this at least helps!
You can't do this efficiently without modifying your kernel driver.
Instead of using procfs, have it register a new character device under /dev, and write that driver to make new content available to read from that device only when new content has in fact come in from the underlying hardware, such that the application layer can issue a blocking read and have it return only when new content exists.
A good example to work from (which also has plenty of native Python clients) is the evdev devices in the input core.
I have service that as the topic says dies and leave a stale PID behind. This particular service logs every minute. So basically I want to create a python script to check the logfile based on modified time, if file has not updated in say 3 minutes then restart service.
I am new to scripting / programming so I need help with the logic here please...
this is what I have so far to do the check on the file age.
#!/usr/bin/python
import os, datetime, time
from os import path
from datetime import datetime, timedelta
#file = "/var/log/file.log"
def check_file(seconds, file_name):
try:
time_diff = datetime.now() - timedelta(seconds)
file_time = datetime.fromtimestamp(path.getctime(file_name))
if file_time < time_diff:
return [True, "File: %s. Older than: %s, file_time:%s , date_diff:%s " % (file_name, seconds, file_time, time_diff)]
else:
return [True, "File: %s. Older than: %s, file_time:%s , date_diff:%s " % (file_name, seconds, file_time, time_diff)]
except Exception, e:
return [False, e]
Before writing custom code to handle this problem (and now having 2 problems), I'd look at:
fixing the service itself so that it doesn't die,
adding monitoring to that service at the box level (see this
for example).
If these options are not practical, I would start with a bash based solution over Python.
In either case, you'll need to make sure that the restarting script doesn't die either.
I'm trying to use python to detect mouse and keyboard event, and tolerant the hot-plug action during the detection. I write this script to automatically detect the keyboard and mouse plug-ins in run-time and output all the keyboard and mouse events. I use evdev and pyudev packages to realize this function. I have my scripts mostly working, including keyboard and mouse event detection and plug-in detection. However, whenever I plug-out the mouse, many weird things happen and my script could not work properly. I have several confusions here.
(1) Whenever the mouse is plugged into the system, there are two files generated in /dev/input/ folder, including ./mouseX and ./eventX. I try to cat to see the output from both source and there are indeed differences, but I do not understand why linux will have ./mouseX even if ./eventX already exists?
(2) Whenever I unplug my mouse, the ./mouseX unplug event comes first, which I did not use in evdev, and this leads to the failure of the script because ./eventX(where I read the data in the script) is unplugged simultaneously but I could only detect ./eventX in the next round. I use a trick(variable i in my script) to bypass this issue, but even though I could successfully delete the mouse device, the select.select() begins endless input reading even though I did not type anything to the keyboard.
The script is listed below(modified based on answers from previous post), thanks beforehand for your attention!
#!/usr/bin/env python
import pyudev
from evdev import InputDevice, list_devices, categorize
from select import select
context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem='input')
monitor.start()
devices = map(InputDevice, list_devices())
dev_paths = []
finalizers = []
for dev in devices:
if "keyboard" in dev.name.lower():
dev_paths.append(dev.fn)
elif "mouse" in dev.name.lower():
dev_paths.append(dev.fn)
devices = map(InputDevice, dev_paths)
devices = {dev.fd : dev for dev in devices}
devices[monitor.fileno()] = monitor
count = 1
while True:
r, w, x = select(devices, [], [])
if monitor.fileno() in r:
r.remove(monitor.fileno())
for udev in iter(functools.partial(monitor.poll, 0), None):
# we're only interested in devices that have a device node
# (e.g. /dev/input/eventX)
if not udev.device_node:
break
# find the device we're interested in and add it to fds
for name in (i['NAME'] for i in udev.ancestors if 'NAME' in i):
# I used a virtual input device for this test - you
# should adapt this to your needs
if 'mouse' in name.lower() and 'event' in udev.device_node:
if udev.action == 'add':
print('Device added: %s' % udev)
dev = InputDevice(udev.device_node)
devices[dev.fd] = dev
break
if udev.action == 'remove':
print('Device removed: %s' % udev)
finalizers.append(udev.device_node)
break
for path in finalizers:
for dev in devices.keys():
if dev != monitor.fileno() and devices[dev].fn == path:
print "delete the device from list"
del devices[dev]
for i in r:
if i in devices.keys() and count != 0:
count = -1
for event in devices[i].read():
count = count + 1
print(categorize(event))
The difference between mouseX and eventX is, generally speaking, is eventX is the evdev device, whereas mouseX is the "traditional" device (which, for example, doesn't support various evdev ioctls.)
I don't know what's wrong with the code you posted, but here is a code snippet which does the right thing.
#!/usr/bin/env python
import pyudev
import evdev
import select
import sys
import functools
import errno
context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem='input')
# NB: Start monitoring BEFORE we query evdev initially, so that if
# there is a plugin after we evdev.list_devices() we'll pick it up
monitor.start()
# Modify this predicate function for whatever you want to match against
def pred(d):
return "keyboard" in d.name.lower() or "mouse" in d.name.lower()
# Populate the "active devices" map, mapping from /dev/input/eventXX to
# InputDevice
devices = {}
for d in map(evdev.InputDevice, evdev.list_devices()):
if pred(d):
print d
devices[d.fn] = d
# "Special" monitor device
devices['monitor'] = monitor
while True:
rs, _, _ = select.select(devices.values(), [], [])
# Unconditionally ping monitor; if this is spurious this
# will no-op because we pass a zero timeout. Note that
# it takes some time for udev events to get to us.
for udev in iter(functools.partial(monitor.poll, 0), None):
if not udev.device_node: break
if udev.action == 'add':
if udev.device_node not in devices:
print "Device added: %s" % udev
try:
devices[udev.device_node] = evdev.InputDevice(udev.device_node)
except IOError, e:
# udev reports MORE devices than are accessible from
# evdev; a simple way to check is see if the devinfo
# ioctl fails
if e.errno != errno.ENOTTY: raise
pass
elif udev.action == 'remove':
# NB: This code path isn't exercised very frequently,
# because select() will trigger a read immediately when file
# descriptor goes away, whereas the udev event takes some
# time to propagate to us.
if udev.device_node in devices:
print "Device removed (udev): %s" % devices[udev.device_node]
del devices[udev.device_node]
for r in rs:
# You can't read from a monitor
if r.fileno() == monitor.fileno(): continue
if r.fn not in devices: continue
# Select will immediately return an fd for read if it will
# ENODEV. So be sure to handle that.
try:
for event in r.read():
pass
print evdev.categorize(event)
except IOError, e:
if e.errno != errno.ENODEV: raise
print "Device removed: %s" % r
del devices[r.fn]
I have downloaded a bunch of videos from coursera.org and have them stored in one particular folder. There are many individual videos in a particular folder (Coursera breaks a lecture into multiple short videos). I would like to have a python script which gives the combined length of all the videos in a particular directory. The video files are .mp4 format.
First, install the ffprobe command (it's part of FFmpeg) with
sudo apt install ffmpeg
then use subprocess.run() to run this bash command:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 -- <filename>
(which I got from http://trac.ffmpeg.org/wiki/FFprobeTips#Formatcontainerduration), like this:
from pathlib import Path
import subprocess
def video_length_seconds(filename):
result = subprocess.run(
[
"ffprobe",
"-v",
"error",
"-show_entries",
"format=duration",
"-of",
"default=noprint_wrappers=1:nokey=1",
"--",
filename,
],
capture_output=True,
text=True,
)
try:
return float(result.stdout)
except ValueError:
raise ValueError(result.stderr.rstrip("\n"))
# a single video
video_length_seconds('your_video.webm')
# all mp4 files in the current directory (seconds)
print(sum(video_length_seconds(f) for f in Path(".").glob("*.mp4")))
# all mp4 files in the current directory and all its subdirectories
# `rglob` instead of `glob`
print(sum(video_length_seconds(f) for f in Path(".").rglob("*.mp4")))
# all files in the current directory
print(sum(video_length_seconds(f) for f in Path(".").iterdir() if f.is_file()))
This code requires Python 3.7+ because that's when text= and capture_output= were added to subprocess.run. If you're using an older Python version, check the edit history of this answer.
Download MediaInfo and install it (don't install the bundled adware)
Go to the MediaInfo source downloads and in the "Source code, All included" row, choose the link next to "libmediainfo"
Find MediaInfoDLL3.py in the downloaded archive and extract it anywhere.
Example location: libmediainfo_0.7.62_AllInclusive.7z\MediaInfoLib\Source\MediaInfoDLL\MediaInfoDLL3.py
Now make a script for testing (sources below) in the same directory.
Execute the script.
MediaInfo works on POSIX too. The only difference is that an so is loaded instead of a DLL.
Test script (Python 3!)
import os
os.chdir(os.environ["PROGRAMFILES"] + "\\mediainfo")
from MediaInfoDLL3 import MediaInfo, Stream
MI = MediaInfo()
def get_lengths_in_milliseconds_of_directory(prefix):
for f in os.listdir(prefix):
MI.Open(prefix + f)
duration_string = MI.Get(Stream.Video, 0, "Duration")
try:
duration = int(duration_string)
yield duration
print("{} is {} milliseconds long".format(f, duration))
except ValueError:
print("{} ain't no media file!".format(f))
MI.Close()
print(sum(get_lengths_in_milliseconds_of_directory(os.environ["windir"] + "\\Performance\\WinSAT\\"
)), "milliseconds of content in total")
In addition to Janus Troelsen's answer above, I would like to point out a small problem I
encountered when implementing his answer. I followed his instructions one by one but had different results on windows (7) and linux (ubuntu). His instructions worked perfectly under linux but I had to do a small hack to get it to work on windows. I am using a 32-bit python 2.7.2 interpreter on windows so I utilized MediaInfoDLL.py. But that was not enough to get it to work for me I was receiving this error at this point in the process:
"WindowsError: [Error 193] %1 is not a valid Win32 application".
This meant that I was somehow using a resource that was not 32-bit, it had to be the DLL MediaInfoDLL.py was loading. If you look at the MediaInfo intallation directory you will see 3 dlls MediaInfo.dll is 64-bit while MediaInfo_i386.dll is 32-bit. MediaInfo_i386.dll is the one which I had to use because of my python setup. I went to
MediaInfoDLL.py (which I already had included in my project) and changed this line:
MediaInfoDLL_Handler = windll.MediaInfo
to
MediaInfoDLL_Handler = WinDLL("C:\Program Files (x86)\MediaInfo\MediaInfo_i386.dll")
I didn't have to change anything for it to work in linux
Nowadays pymediainfo is available, so Janus Troelsen's answer could be simplified.
You need to install MediaInfo and pip install pymediainfo. Then the following code would print you the total length of all video files:
import os
from pymediainfo import MediaInfo
def get_track_len(file_path):
media_info = MediaInfo.parse(file_path)
for track in media_info.tracks:
if track.track_type == "Video":
return int(track.duration)
return 0
print(sum(get_track_len(f) for f in os.listdir('directory with video files')))
This link shows how to get the length of a video file https://stackoverflow.com/a/3844467/735204
import subprocess
def getLength(filename):
result = subprocess.Popen(["ffprobe", filename],
stdout = subprocess.PIPE, stderr = subprocess.STDOUT)
return [x for x in result.stdout.readlines() if "Duration" in x]
If you're using that function, you can then wrap it up with something like
import os
for f in os.listdir('.'):
print "%s: %s" % (f, getLength(f))
Here's my take. I did this on Windows. I took the answer from Federico above, and changed the python program a little bit to traverse a tree of folders with video files. So you need to go above to see Federico's answer, to install MediaInfo and to pip install pymediainfo, and then write this program, summarize.py:
import os
import sys
from pymediainfo import MediaInfo
number_of_video_files = 0
def get_alternate_len(media_info):
myJson = media_info.to_data()
myArray = myJson['tracks']
for track in myArray:
if track['track_type'] == 'General' or track['track_type'] == 'Video':
if 'duration' in track:
return int(track['duration'] / 1000)
return 0
def get_track_len(file_path):
global number_of_video_files
media_info = MediaInfo.parse(file_path)
for track in media_info.tracks:
if track.track_type == "Video":
number_of_video_files += 1
if type(track.duration) == int:
len_in_sec = int(track.duration / 1000)
elif type(track.duration) == str:
len_in_sec = int(float(track.duration) / 1000)
else:
len_in_sec = get_alternate_len(media_info)
if len_in_sec == 0:
print("File path = " + file_path + ", problem in type of track.duration")
return len_in_sec
return 0
sum_in_secs = 0.0
os.chdir(sys.argv[1])
for root, dirs, files in os.walk("."):
for name in files:
sum_in_secs += get_track_len(os.path.join(root, name))
hours = int(sum_in_secs / 3600)
remain = sum_in_secs - hours * 3600
minutes = int(remain / 60)
seconds = remain - minutes * 60
print("Directory: " + sys.argv[1])
print("Total number of video files is " + str(number_of_video_files))
print("Length: %d:%02d:%02d" % (hours, minutes, seconds))
Run it: python summarize.py <DirPath>
Have fun. I found I have about 1800 hours of videos waiting for me to have some free time. Yeah sure