I wrote a Python script that will run indefinitely. It monitors a directory using PyInotify and uses the multiprocessing module to run any new files created in that directory through an external script. That all works great.
The problem I am having is writing the output to a file. The filename I chose uses the current date (using datetime.now) and should, theoretically, roll on the hour, every hour.
now = datetime.now()
filename = "/data/db/meta/%s-%s-%s-%s.gz" % (now.year, now.month, now.day, now.hour)

with gzip.open(filename, 'ab') as f:
    f.write(json.dumps(data) + "\n")
    f.close()  # Unsure if I need this, here for debug
Unfortunately, when the hour rolls over, the output stops and never resumes. No exceptions are thrown; it just stops working.
total 2.4M
drwxrwxr-x 2 root root 4.0K Sep 8 08:01 .
drwxrwxr-x 4 root root 12K Aug 29 16:04 ..
-rw-r--r-- 1 root root 446K Aug 29 16:59 2016-8-29-16.gz
-rw-r--r-- 1 root root 533K Aug 30 08:59 2016-8-30-8.gz
-rw-r--r-- 1 root root 38K Sep 7 10:59 2016-9-7-10.gz
-rw-r--r-- 1 root root 95K Sep 7 14:59 2016-9-7-14.gz
-rw-r--r-- 1 root root 292K Sep 7 15:59 2016-9-7-15.gz #Manually run
-rw-r--r-- 1 root root 834K Sep 8 08:59 2016-9-8-8.gz
Those files aren't really owned by root; I just changed the ownership for public consumption.
As you can see, all of the file timestamps end at :59 and the next hour's file never appears.
Is there something I should take into consideration when doing this? Is there something I am missing about running a Python script indefinitely?
After taking a peek, it seems as if PyInotify was my problem.
See here (https://unix.stackexchange.com/questions/164794/why-doesnt-inotifywatch-detect-changes-on-added-files)
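For anyone hitting the same thing, here is a minimal pyinotify sketch along those lines: watch the directory itself rather than individual files, and use rec/auto_add so paths created after the watch is set are still covered. The watched path and handler name are made up for illustration, not taken from my script.

import pyinotify

class NewFileHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # event.pathname is the full path of the file that was just written
        print("finished writing: " + event.pathname)

wm = pyinotify.WatchManager()
mask = pyinotify.IN_CREATE | pyinotify.IN_CLOSE_WRITE
# rec=True also watches subdirectories; auto_add=True adds watches for
# directories created after this call, so new paths are not silently missed
wm.add_watch('/data/incoming', mask, rec=True, auto_add=True)

notifier = pyinotify.Notifier(wm, NewFileHandler())
notifier.loop()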
I adjusted your code to change the file name each minute, which speeds up debugging quite a bit and yet still tests the hypothesis.
import datetime
import gzip, time
from os.path import expanduser

while True:
    now = datetime.datetime.now()
    filename = expanduser("~") + "/%s-%s-%s-%s-%s.gz" % (now.year, now.month, now.day, now.hour, now.minute)
    with gzip.open(filename, 'a') as f:
        f.write(str(now) + "\n")
        f.write("Data Dump here" + "\n")
    time.sleep(10)
This seems to run without an issue. Changing the time zone of my PC was also picked up and dealt with. I would suspect, given the above, that your error may lie elsewhere and that some judicious debug printing of values at key points is needed. Try using a more granular file name, as above, to speed up the debugging.
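Following up on that suggestion, here is a hypothetical sketch of wrapping the question's write in logging, so any exception raised inside a multiprocessing worker is recorded somewhere instead of vanishing silently. The log path and the write_record function name are invented for illustration; the write itself mirrors the question's code.

import gzip
import json
import logging
from datetime import datetime

logging.basicConfig(filename="/tmp/meta-writer.log", level=logging.DEBUG)

def write_record(data):
    now = datetime.now()
    filename = "/data/db/meta/%s-%s-%s-%s.gz" % (now.year, now.month, now.day, now.hour)
    try:
        with gzip.open(filename, 'ab') as f:
            # in Python 3 this would need bytes: (json.dumps(data) + "\n").encode()
            f.write(json.dumps(data) + "\n")
        logging.debug("wrote record to %s", filename)
    except Exception:
        # logging.exception records the full traceback as well
        logging.exception("write to %s failed", filename)
        raise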
I want to print the output of a command line by line until it finds the main.py file, and then stop.
I tried the code below, but it prints the whole listing several times instead of going line by line and stopping at the line where the main.py file is found.
import subprocess

# store ls -l to variable
get_ls = subprocess.getoutput("ls -l")
# transform output to string
ls = str(get_ls)

# search for main.py file in ls
for line in ls:
    main_py = line.find('main.py')
    print(ls)
    # if main.py is found, print stop and exit
    if main_py == 'main.py':
        print('stop...')
        exit()
The output keeps looping like this:
-rw-r--r-- 1 runner runner 9009 Feb 19 19:00 poetry.lock
-rw-r--r-- 1 runner runner 354 Feb 19 19:00 pyproject.toml
-rw-r--r-- 1 runner runner 329 Feb 25 00:10 main.py
-rw-r--r-- 1 runner runner 383 Feb 14 17:57 replit.nix
-rw-r--r-- 1 runner runner 61 Feb 19 18:46 urls.tmp
drwxr-xr-x 1 runner runner 56 Oct 26 20:53 venv
I want this output:
-rw-r--r-- 1 runner runner 9009 Feb 19 19:00 poetry.lock
-rw-r--r-- 1 runner runner 354 Feb 19 19:00 pyproject.toml
-rw-r--r-- 1 runner runner 329 Feb 25 00:10 main.py
###### stops here #######
How to fix this?
The line for line in ls isn't doing what you think it is. Instead of going line by line, it goes through ls character by character. What you want is for line in ls.splitlines(). You can then check whether main.py is on a given line with "main.py" in line.
import subprocess

# store ls -l to variable
get_ls = subprocess.getoutput("ls -l")
# transform output to string
ls = str(get_ls)

# search for main.py file in ls
for line in ls.splitlines():
    print(line)
    # if main.py is found, print stop and exit
    if "main.py" in line:
        print('stop...')
        exit()
That should be closer to what you want, I think.
You were also printing the whole of ls on every iteration; that needs to change to printing only the current line.
In my opinion, if you only want to achieve the result and don't mind changing your logic, this is the most elegant and most "pythonic" option. I like the simplicity of the os.walk() method:
import os

for root, dirs, files in os.walk("."):
    for filename in files:
        print(filename)
        if filename == "main.py":
            print("stop")
            break
    else:
        continue
    break  # also leave the outer loop once main.py has been found
I'm trying to use ANTLR on one of the grammars here (specifically, Java 8):
$ antlr4 -Dlanguage=Python3 grammars-v4/java/java8/Java8Lexer.g4
$ antlr4 -Dlanguage=Python3 grammars-v4/java/java8/Java8Parser.g4
This step appears to go smoothly, and when I inspect the directory:
$ ls -l grammars-v4/java/java8/*.py
-rw-r--r-- 1 root root 56583 Nov 30 14:32 grammars-v4/java/java8/Java8Lexer.py
-rw-r--r-- 1 root root 798326 Nov 30 14:32 grammars-v4/java/java8/Java8Parser.py
-rw-r--r-- 1 root root 79928 Nov 30 14:32 grammars-v4/java/java8/Java8ParserListener.py
everything is there. Still, when I try to use the hello world example:
from antlr4 import *
from Java8Lexer import Java8Lexer
from Java8Parser import Java8Parser
input_stream = FileStream("/main.java")
lexer = Java8Lexer(input_stream)
stream = CommonTokenStream(lexer)
parser = Java8Parser(stream)
tree = parser.startRule()
I get an error:
AttributeError: 'Java8Parser' object has no attribute 'startRule'
The parser method(s), startRule in your case, correspond to the parser rules defined in the .g4 grammar.
Look into the Java grammar: there is a parser rule with EOF in it called compilationUnit. Use that instead:
tree = parser.compilationUnit()
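Putting it together with the snippet from the question, only the last line changes; the toStringTree call at the end is just an assumed quick way to eyeball the resulting parse tree, not something required by the grammar.

from antlr4 import FileStream, CommonTokenStream
from Java8Lexer import Java8Lexer
from Java8Parser import Java8Parser

input_stream = FileStream("/main.java")
lexer = Java8Lexer(input_stream)
stream = CommonTokenStream(lexer)
parser = Java8Parser(stream)
tree = parser.compilationUnit()  # the grammar's top-level rule, ends with EOF
print(tree.toStringTree(recog=parser))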
import luigi

class FileToStaging(ImportToTable):
    filename = luigi.Parameter(default='')
    # import file from some folder to a staging database

    def requires(self):
        return luigi.LocalTarget(self.filename)

    # truncate table
    # load the file into staging

class StgToOfficial(RunQuery):
    filename = luigi.Parameter()
    # run a process in the database to load data from staging to the final table

    def requires(self):
        return FileToStaging(self.filename)

    # run query

class LoadFileGroups(luigi.WrapperTask):
    def requires(self):
        list_of_files = get_list_of_files_currently_in_folder()  # The folder can have an arbitrary number of files inside
        for file in list_of_files:
            yield StgToOfficial(filename=file)
Hello, community,
I'm new to Luigi and trying to build an ETL process with the framework.
Imagine I have a process similar to the previous snippet of pseudo code. The process must check a folder and get the list of files inside. Then, one by one, each file must be imported into the staging database, followed by a process that loads the staged data into the final table.
The problem is that, with the previous solution, all of the files are loaded into the staging table (each followed by its loading process) in parallel, which cannot happen. How can I force Luigi to execute the tasks sequentially, so that only when a file has finished loading into the final table is the next one imported, and so on? (Check the draft below for a simplified picture.)
Draft of the structure I'm trying to achieve
I know that I should use the requires method to ensure the sequence, but how can I do it dynamically for an unknown number of files to be loaded?
Thank you very much in advance for the help.
Solved by creating a recursive pattern in the requires() method, following the answer from Peter Weissbrod in the following discussion:
https://groups.google.com/g/luigi-user/c/glvU_HxYmr0/m/JvV3xgsiAwAJ
Here is the solution proposed by Peter:
Here is a snippet which isn't precisely tailored to your needs, but which you might still find useful. Consider a "sql" directory like this:
~ :>ll somedir/sql
-rw-rw-r-- 1 pweissbrod authenticatedusers 513 Jan 27 09:15 stage01_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 1787 Jan 28 13:57 stage02_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 1188 Jan 28 13:57 stage03_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 13364 Jan 29 07:16 stage04_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 1344 Jan 28 13:57 stage05_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 1983 Jan 28 17:03 stage06_test.sqltemplate
-rw-rw-r-- 1 pweissbrod authenticatedusers 1224 Jan 28 16:05 stage07_test.sqltemplate
# Consider a luigi task with a dynamic requires method like this:
class BuildTableTask(luigi.Task):
    table = luigi.Parameter(description='the name of the table this task (re)builds')

    def requires(self):
        tables = [f.split('_')[0] for f in os.listdir('sql') if re.match(f'stage[0-9]+[a-z]*_{config().environment}', f)]
        prereq = next(iter([t for t in sorted(tables, reverse=True) if t < self.table]), None)
        yield BuildTableTask(table=prereq) or []

    def run(self):
        with open(f'sql/{self.table}_{config().environment}.sqltemplate'.format(**config().to_dict())) as sqltemplate:
            sql = sqltemplate.read().format(**config().to_dict())
        db.run(f'create table {config().srcdbname}.{self.table} stored as orc as {sql}')
The task tree is built by observing the files in that directory:
└─--[BuildTableTask-{'table': 'stage07'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage06'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage05'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage04'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage03'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage02'} (COMPLETE)]
└─--[BuildTableTask-{'table': 'stage01'} (COMPLETE)]
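For completeness, here is a rough sketch (not my actual code) of how that recursive pattern maps onto the tasks from the question, assuming the filenames sort in the order they should be loaded; the base classes and the get_list_of_files_currently_in_folder helper are the ones from the pseudo code above. Each FileToStaging requires the StgToOfficial of the previous file, so Luigi ends up with one strict chain instead of a parallel fan-out.

import luigi

class FileToStaging(ImportToTable):      # base class as in the question's pseudo code
    filename = luigi.Parameter(default='')

    def requires(self):
        # sort the folder contents so every file has a well-defined predecessor
        files = sorted(get_list_of_files_currently_in_folder())
        idx = files.index(self.filename)
        if idx > 0:
            # the previous file must be fully loaded into the official table
            # before this one may even be staged
            yield StgToOfficial(filename=files[idx - 1])

class StgToOfficial(RunQuery):
    filename = luigi.Parameter()

    def requires(self):
        return FileToStaging(self.filename)

The LoadFileGroups wrapper can keep yielding one StgToOfficial per file as before; Luigi deduplicates the tasks and the chained requires() forces them to run one after another.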
Currently I am experiencing issues with getting a script to run automatically after the wifi adapter connects to a network.
After ridiculously extended research, I've made several attempts to add the script to /etc/network/if-up.d/. Run manually, my script works; however, it does not run automatically.
User permissions:
ls -al /etc/network/if-up.d/*
-rwxr-xr-x 1 root root 703 Jul 25 2011 /etc/network/if-up.d/000resolvconf
-rwxr-xr-x 1 root root 484 Apr 13 2015 /etc/network/if-up.d/avahi-daemon
-rwxr-xr-x 1 root root 4958 Apr 6 2015 /etc/network/if-up.d/mountnfs
-rwxr-xr-x 1 root root 945 Apr 14 2016 /etc/network/if-up.d/openssh-server
-rwxr-xr-x 1 root root 48 Apr 26 03:21 /etc/network/if-up.d/sendemail
-rwxr-xr-x 1 root root 1483 Jan 6 2013 /etc/network/if-up.d/upstart
lrwxrwxrwx 1 root root 32 Sep 17 2016 /etc/network/if-up.d/wpasupplicant -> ../../wpa_supplicant/ifupdown.sh
Also, I've tried to put the command directly into /etc/network/interfaces
by adding the line
post-up /home/pi/r/sendemail.sh
Contents of sendemail.sh:
#!/bin/sh
python /home/pi/r/pip.py
After the reboot, nothing actually happens. I've even tried putting sudo in front.
I assume that wpasupplicant is what causes this, but I cannot work out how to run my script from the ifupdown.sh script under /etc/wpa_supplicant.
Appreciate your help!
If you have no connectivity prior to initializing the wifi interface, I would suggest adding a cron job that runs a bash or Python script to check for connectivity every X minutes.
Ping the host; if the host is up, then run your Python commands or the external command.
This is rather ambiguous but hopefully is of some help.
Here is an example of a script that will check whether a host is alive:
import re, commands

class CheckAlive:
    def __init__(self):
        # -c 1 sends a single ping so the command returns instead of running forever
        myCommand = commands.getstatusoutput('ping -c 1 ' + 'google.com')
        searchString = r'ping: unknown host'
        match = re.search(searchString, str(myCommand))
        if match:
            # host is not alive
            print "not alive, don't do stuff"
        else:
            # host is alive
            print "alive, time to do stuff"

CheckAlive()  # run the check
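As a hypothetical follow-up, the "if the host is up, run the external command" part could look like this in the same Python 2 style, reusing the script path from the question; the use of the exit status rather than the output text is an assumption on my part.

import commands

# single ping so the call returns quickly; status 0 means the host answered
status, output = commands.getstatusoutput('ping -c 1 google.com')
if status == 0:
    # host is reachable: run the script mentioned in the question
    commands.getstatusoutput('python /home/pi/r/pip.py')
else:
    print 'host not reachable, skipping'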
I have this Python script:
#!/usr/bin/env python
import datetime, os
from time import gmtime, strftime

to_backup = "/home/vmware/tobackup"
var1 = datetime.datetime.now().strftime('%b-%d-%I%p')

for f in os.listdir(to_backup):
    if os.path.isfile(f):
        print f + " is a file"
    if os.path.isdir(f):
        print f + " is a directory"
It is giving me empty output. I don't know where the problem is.
Output for dr jimbob's answer:
total 36
-rwxrwxr-x 1 vmware vmware 440 May 5 07:41 back.py
-rwxrwxr-x 1 vmware vmware 2624 May 4 20:35 backup.sh
drwxr-xr-x 2 vmware vmware 4096 Jun 22 2010 Desktop
drwxrwxr-x 2 vmware vmware 4096 May 5 03:51 destination
drwxr-xr-x 2 root root 4096 May 4 18:49 public_html
drwxrwxr-x 2 vmware vmware 4096 May 1 07:47 python
-rwxrwxr-x 1 vmware vmware 560 May 1 13:20 regex.py
drwxrwxrwx 7 vmware vmware 4096 May 5 03:50 tobackup
total 20
drwxrwxrwx 2 vmware vmware 4096 May 5 03:50 five
drwxrwxrwx 2 vmware vmware 4096 May 5 03:50 four
drwxrwxrwx 2 vmware vmware 4096 May 5 03:50 one
drwxrwxrwx 2 vmware vmware 4096 May 5 03:50 three
drwxrwxrwx 2 vmware vmware 4096 May 5 03:50 two
OK, you have permission, but you aren't in the right directory when you check the files. os.listdir gives you a list of dirs/files without their path, and os.path.isfile('one') and os.path.isdir('one') will check whether 'one' exists in the current directory (wherever you launched the script from), unless you explicitly changed directory with os.chdir or included the path, e.g., os.path.isdir('/home/vmware/tobackup/one').
#!/usr/bin/env python
import datetime, os
from time import gmtime, strftime
import subprocess

to_backup = "/home/vmware/tobackup"
var1 = datetime.datetime.now().strftime('%b-%d-%I%p')

os.chdir(to_backup)
# os.listdir(to_backup) = ['one', 'two', 'three', 'four', 'five']
for f in os.listdir(to_backup):
    if os.path.isfile(f):
        print f + " is a file"
    if os.path.isdir(f):
        print f + " is a directory"
or
to_backup = "/home/vmware/tobackup"
var1 = datetime.datetime.now().strftime('%b-%d-%I%p')

# os.listdir(to_backup) = ['one', 'two', 'three', 'four', 'five']
for f in os.listdir(to_backup):
    if os.path.isfile(os.path.join(to_backup, f)):
        print f + " is a file"
    if os.path.isdir(os.path.join(to_backup, f)):
        print f + " is a directory"
or with walk (but not actually walking through subdirs).
to_backup = "/home/vmware/tobackup"
var1 = datetime.datetime.now().strftime('%b-%d-%I%p')

root, dirs, files = os.walk(to_backup).next()
for file in files:
    print file + " is a file in " + root
for dir in dirs:
    print dir + " is a directory"
EDIT: To be even clearer, the mistake with your original script is that you have a file structure like:
/home/user/bin/your_script.py
/home/vmware/tobackup/
/home/vmware/tobackup/one
/home/vmware/tobackup/two
...
When you go to /home/user/bin to run your script (e.g., python your_script.py), os.listdir('/home/vmware/tobackup') gives you a list of file and dir names in /home/vmware/tobackup, that is ['one','two', ...]. However, when you do os.path.isfile('one') from the directory /home/user/bin, you check to see if /home/user/bin/one is a file, not whether /home/vmware/tobackup/one is a file. Since /home/user/bin/one doesn't exist, you get no output.
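A tiny illustration of that point, using the same hypothetical paths as the example above:

import os

os.chdir('/home/user/bin')  # the directory the script was launched from, per the example above
print os.path.isdir('one')                         # checks /home/user/bin/one -> False
print os.path.isdir('/home/vmware/tobackup/one')   # checks the actual backup path -> True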
It works on my machine (Ubuntu 10.10).
Maybe /home/vmware/tobackup is empty or you have no permissions to read it.
os.listdir(dir_name) returns filenames relative to the named directory. To use those in other commands, you need to either prepend the directory name (via f = os.path.join(to_backup, f) at the start of the loop body) or else change the working directory to the backup directory before starting the loop.
These are the first two alternatives shown in dr jimbob's answer.