I made a script for organizing torrents. I download TV shows and then move the files to another disk, into a folder like /series name/season xx/.
It was working, with some issues, until I added a couple of ifs to deal with case sensitivity. For example, if I downloaded mr robot.mkv and the folder Mr Robot already existed, the script created a second folder named mr robot. I did the same for the season folder, so that a new Season folder is not created when a season folder already exists.
Apparently my script now tries to create the folder continuously and I don't see why.
#!/usr/bin/env python3
import sys, glob, re, os, shutil
from termcolor import colored
# encoding: utf-8
dir_series = "/home/user/series/series/"
buscar = "*[sS][0-9][0-9]*"
series = [s for s in glob.glob(buscar) if s.endswith(('.mp4', '.srt', '.avi', '.mkv'))]
if series:
    arch_encontrados = len(series)
    print(colored("\nArchivos encontrados:",'red', attrs=['bold'] ), colored(arch_encontrados, 'red', attrs=['bold'] ),'\n')
    print(*series, sep = "\n")
    for serie in series:
        # Extract the series name
        nombre = re.findall(r'.*[\. ][sS]\d', serie)[0]
        nombre_final = re.sub(r'[\. ][sS]\d','',nombre).replace('.',' ')
        # Extract the season number
        season = re.findall(r'[\. ][sS]\d\d', serie)[0]
        season_final_numero = re.sub(r'[\. ][sS]','',season)
        season_final = ('Season ' + season_final_numero)
        # Build the final directory
        for series_path in os.listdir(dir_series): # lists the contents of /home/user/series/series/
            if nombre_final.lower() == series_path.lower(): # compares each entry with the series name, ignoring case
                for season_path in os.listdir(dir_series + series_path):
                    if season_final == season_path: # compares the season folders with season_final, which is capitalized
                        path = os.path.join(dir_series, series_path, season_final)
                        print(path)
                    else:
                        path = os.path.join(dir_series, series_path, 'season ', season_final_numero)
            else:
                print(colored("\n\n*****************************************",'cyan', attrs=['bold']))
                print(colored("** Directorio no encontrado, creándolo **",'cyan', attrs=['bold']))
                print(colored("*****************************************\n",'cyan', attrs=['bold']))
                path = os.path.join(dir_series, nombre_final, season_final)
                print(path)
                os.makedirs(path)
        # Move the file
        print(colored('\nCopiando','green'), serie, colored('a', 'green'), path + '/' + serie)
        shutil.move(serie,path)
else:
    print(colored('\nNo hay archivos para organizar.','green', attrs=['bold']))
input(colored("\n\nPresione Enter para continuar ...", attrs=['blink', 'bold']))
I'm not seeing an infinite loop, but I think I do see a bug that is causing the same directory to get made many times.
You are calling os.makedirs in the else branch inside your inner for loop, which means that you will try to make the same directory once for each entry in os.listdir(dir_series) that does NOT match nombre_final.lower().
I think the issue might just be that you (or your IDE) accidentally indented the os.makedirs(path) call two levels too deep when you added the if/else. I think it needs to be outside of the inner loop entirely.
You probably also need to add a break in the case where it does match, and maybe a guard so that no new directory is created when a match was found.
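As a rough sketch of that suggestion (not the asker's exact code): the search for an existing folder is finished first, and the directory is created only once afterwards. The helper names find_case_insensitive and resolve_target are hypothetical; only dir_series, nombre_final and season_final come from the question.

```python
import os
import tempfile


def find_case_insensitive(parent, wanted):
    """Return the entry of `parent` whose name equals `wanted` ignoring case, or None."""
    for entry in os.listdir(parent):
        if entry.lower() == wanted.lower():
            return entry  # stop at the first match (plays the role of `break`)
    return None


def resolve_target(dir_series, nombre_final, season_final):
    """Build the destination path, reusing existing folders regardless of case."""
    # Reuse an existing series folder if one matches, otherwise use the new name.
    series_dir = find_case_insensitive(dir_series, nombre_final) or nombre_final
    series_path = os.path.join(dir_series, series_dir)

    # Same idea for the season folder, but only if the series folder exists.
    season_dir = season_final
    if os.path.isdir(series_path):
        season_dir = find_case_insensitive(series_path, season_final) or season_final

    path = os.path.join(series_path, season_dir)
    # Created once, after both searches are done; a no-op if it already exists.
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "Mr Robot", "season 01"))
    print(resolve_target(root, "mr robot", "Season 01"))
```

With this shape, the file would then be moved with shutil.move(serie, path) exactly as in the question.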
I was looking to organize my torrents once they're downloaded. I wrote this script, which checks for the series name and season and moves the files to another disk where I keep my TV shows.
I want some steps to be printed as they run, and I'm having a problem with one of them.
I want to first print "Archivos encontrados" (which means "Files found") and then print the variable with all the files from the directory (in this case the variable is called "series").
The problem is that, as I wrote it, it prints "Archivos encontrados" once for each file it finds.
As you can see in the commented-out line #if serie == serie[-1]:, I tried to check for the last file, but it doesn't work.
Also, the else at the end, which should run if no file with one of the extensions I've declared is found, is not working.
Thanks in advance
#!/usr/bin/env python3
import sys, glob, re, os, shutil
from termcolor import colored
# encoding: utf-8
dir_series = "/home/user/series/series/"
buscar = "*[sS][0-9][0-9]*"
for serie in glob.glob(buscar):
    if serie.endswith(('.mp4', '.srt', '.avi', '.mkv')):
        # Extract the series name
        nombre = re.findall(r'.*[\. ][sS]\d', serie)[0]
        nombre_final = re.sub(r'[\. ][sS]\d','',nombre).replace('.',' ')
        # Extract the season number
        season = re.findall(r'[\. ][sS]\d\d', serie)[0]
        season_final = re.sub(r'[\. ][sS]','',season)
        #if serie == serie[-1]:
        print(colored("Archivos encontrados: ",'red'))
        print(serie)
        # Build the final directory
        path = os.path.join(dir_series, nombre_final, ('Season '+ season_final))
        # Check whether the directory exists
        if not os.path.exists(path):
            print(colored("\nDirectorio no encontrado, creándolo",'cyan'))
            os.makedirs(path)
        # Move the file
        shutil.move(serie,path)
        print(colored('\nCopiando:','green'), serie, colored('a', 'green'), path + '/' + serie)
else:
    print('No hay archivos para organizar.\n')
input("\n\nPresione Enter para continuar ...")
Your if serie == serie[-1] check doesn't work because instead of checking "if the series is the last one in the list", you check "if the series is also the last character of the same series".
Consider using something like this instead:
series = [s for s in glob.glob(buscar) if s.endswith(('.mp4', '.srt', '.avi', '.mkv'))]
if series:
    print(colored("Archivos encontrados: ",'red'))
    for serie in series:
        print(serie)
        ...
else:
    print('No hay archivos para organizar.\n')
Good evening everyone,
I am currently trying to extract data from this website :
https://classic.sportsbookreview.com/betting-odds/nba-basketball/
The program logic is based on the following loop.
Once the page is open, the first step is to open the calendar and check whether any matches were played this month.
If so, the data for each of these days will be written to an xls file. Then, when there are no more matches to extract, the program clicks on the previous month and performs the same steps.
Otherwise, if no match was played on any day of the month, the program clicks on the previous month and performs the same steps as before.
Here is my code (sorry for the French; if anything is unclear, I can translate it):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import xlsxwriter
## Create the Excel file
tableur = xlsxwriter.Workbook("PrédictionMachine.xlsx") ;
feuille1 = tableur.add_worksheet("10-05-2018") ;
## Open the browser (for Chrome, replace Firefox with Chrome)
navigateur = webdriver.Firefox() ;
navigateur.get("https://classic.sportsbookreview.com/betting-odds/nba-basketball/") ;
time.sleep(3) ;
## Initialize the variables needed for writing to the spreadsheet
row = 0 ;
col = 0 ;
moistraite = 0 ;
matchtreeltrouve = 0 ;
## TEST PHASE, TO BE REMOVED IN THE FINAL VERSION
passage = 0
try :
    ## Loop: the program stops when any key is pressed
    while True :
        ## Open the calendar
        print("Passage : ", passage)
        Calendrier = navigateur.find_element_by_xpath("//a[@class='dd-go-button']").click()
        time.sleep(3)
        ## Get the month/year currently being processed
        Mois = navigateur.find_element_by_xpath("//h2[@class='tbl-cal-top-middle']")
        ## If no match was played this month, the program clicks on the previous month
        if not (navigateur.find_elements_by_xpath("//a[contains(@onclick, '20')]")) :
            print("------------------------------------------------")
            print("Aucun match trouvé pour le mois de :", Mois.text)
            print("------------------------------------------------")
            Moisprecedent = navigateur.find_element_by_xpath("//img[@alt='Left Arrow']").click()
            moistraite += 1
            print("------------------------------------------------")
            print("Nombre de mois traité :", moistraite) ;
            print("------------------------------------------------")
            time.sleep(3) ;
        else :
            ## Get every day of the month on which a match was supposedly played
            JMtheorique = navigateur.find_elements_by_xpath("//a[contains(@onclick, '20')]")
            print("------------------------------------------------")
            print("Matchs supposés trouvés pour le mois de :", Mois.text)
            print("------------------------------------------------")
            for jmtheorique in JMtheorique :
                print("ON Y CROIT", jmtheorique.text)
                navigateur.find_element_by_xpath("//a[contains(@onclick, 'OddsEvent.GetLinkDate')]").click()
                time.sleep(3)
                passage += 1
                Calendrier = navigateur.find_element_by_xpath("//a[@class='dd-go-button']").click()
                time.sleep(2)
            Moisprecedent = navigateur.find_element_by_xpath("//img[@alt='Left Arrow']").click()
            moistraite += 1
## Manual stop of the program
except KeyboardInterrupt :
    print("!/_\/_\/_\! Interruption manuelle du programme !/_\/_\/_\! ")
    print("Le programme a été stoppé au mois de :", Mois.text)
    print("Nombre de mois traité :", moistraite)
    print("Nombre de match écrit :", matchtreeltrouve) ;
    ##navigateur.quit()
For now, I just want to navigate through the different months/days.
For August, the code works fine.
But when I arrive at July, the program loops. It seems that the 'if not' statement isn't written correctly, since the program finds a match even though not a single one was played during that month.
Okay, I think I understand my problem. When I open the calendar, some days of the previous/next month are also shown. When the program reaches July, 4 August (a day on which a game was played) is also present on the calendar, which explains why the program loops between July and August. I need to extract only the a href elements contained in a plain td, and not the ones contained in a td with a class, as shown in these screenshots:
[Image: The calendar]
[Image: The kind of a href I need]
I will investigate further to learn how to do that. Thanks for the help.
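A minimal sketch of that filtering idea, run on a made-up calendar fragment rather than the real page: the markup in CALENDAR_HTML, the class name other-month and the function current_month_links are all assumptions for illustration. It keeps only the links whose parent td has no class attribute. In Selenium itself, an XPath such as //td[not(@class)]/a[contains(@onclick, '20')] would presumably express the same filter, provided the real markup looks like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar fragment: the first cell belongs to the next month
# (its td carries a class), the second is a regular day with a match link,
# the third is a regular day without any match.
CALENDAR_HTML = """
<table>
  <tr>
    <td class="other-month"><a href="#" onclick="OddsEvent.GetLinkDate('20180804')">4</a></td>
    <td><a href="#" onclick="OddsEvent.GetLinkDate('20180705')">5</a></td>
    <td>6</td>
  </tr>
</table>
"""


def current_month_links(html):
    """Return the <a> elements that sit in a <td> without a class attribute."""
    root = ET.fromstring(html)
    links = []
    for td in root.iter("td"):
        if "class" in td.attrib:
            continue  # skip cells that belong to the previous/next month
        for a in td.findall("a"):
            if "20" in a.get("onclick", ""):
                links.append(a)
    return links


if __name__ == "__main__":
    for link in current_month_links(CALENDAR_HTML):
        print(link.text, link.get("onclick"))
```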
I am trying to make my code work like this:
Enter a verb in French: chanter
Output 1: je chante
tu chantes
il ou elle chante
nous chantons
vous chantez
ils ou elles chantent
I succeeded in making the part above, but I cannot succeed in switching je to j' when the user enters, for instance: echapper
Enter a verb in French: echapper
Output 2: j'echappe
tu echappes
il ou elle echappe
nous echappons
vous echappez
ils ou elles echappent
Code:
list = {
    "je": 'e',
    "tu": 'es',
    "il ou elle": 'e',
    "nous": 'ons',
    "vous": 'ez',
    "ils ou elles": 'ent'
}
veb = input("")
for key in list:
    if veb.endswith('er'):
        b = veb[:-2]
        print(key, b + list[key])
I do not know how to change the key list['je'] to list['j''] to succeed with the Output 2.
If you use double quotes around j', i.e. "j'", it will work. Also, I recommend not using the name list for your dictionary because 1) it's a dictionary, not a list, and 2) you should avoid using builtin python names for your variables.
Also, looks like that conjugation is treated differently, with "j'" at the beginning and "e" at the end (instead of "er").
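dictionary = {"je": "j'", "tu": 'es', "il ou elle": 'e', "nous": 'ons',
              "vous": 'ez', "ils ou elles": 'ent'}
veb = input("")
for key in dictionary:
    if veb.endswith('er'):
        b = veb[:-2]
        if key == 'je':
            # print only the elided form, e.g. j'echappe (no separate "je");
            # note that this elides for every verb, not only those starting with a vowel
            print(dictionary[key] + b + 'e')
        else:
            print(key, b + dictionary[key])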
You can simply replace the print statement with an if statement:
if key == "je" and (veb.startswith("a") or veb.startswith("e") or [etc.]):
    print("j'" + b + list[key])
else:
    print(key, b + list[key])
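Putting the two suggestions together, a small self-contained sketch might look like the following. The names ENDINGS, VOWELS and conjugate are hypothetical, and the vowel check is a simplification: it ignores the mute h (habiter), irregular verbs such as aller, and spelling changes such as nous mangeons.

```python
# Present-tense endings for regular -er verbs, keyed by pronoun.
ENDINGS = {
    "je": "e",
    "tu": "es",
    "il ou elle": "e",
    "nous": "ons",
    "vous": "ez",
    "ils ou elles": "ent",
}

# Assumption: elide "je" only before these initial letters.
VOWELS = "aeiouéèê"


def conjugate(verb):
    """Return the six present-tense forms of a regular -er verb as strings."""
    if not verb.endswith("er"):
        raise ValueError("only regular -er verbs are handled in this sketch")
    stem = verb[:-2]
    forms = []
    for pronoun, ending in ENDINGS.items():
        form = stem + ending
        if pronoun == "je" and verb[0].lower() in VOWELS:
            forms.append("j'" + form)  # elision, no space after the apostrophe
        else:
            forms.append(pronoun + " " + form)
    return forms


if __name__ == "__main__":
    for line in conjugate(input("Enter a verb in French: ")):
        print(line)
```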
I'm working on the implementation of several algorithms to compute shortest paths on graphs.
I have managed to implement Dijkstra's algorithm sequentially and I'm now trying to optimize my algorithm through the multiprocessing module of Python.
As a whole the code works. What I am trying to do here is :
First to check how many cpus I can work on with nb_cpu = mp.cpu_count()
Then dividing all the nodes I have in my graph accordingly
Finally calling the method subprocess_dijkstra that should compute the dijkstra algorithm for each of the nodes it is given as an argument (the idea being that each process only has to compute the algorithm for a smaller part of the graph).
When I run my script (called from a main.py file where I just format the data to suit my needs), I have 4 processes launched as I should.
However, they do not seem to execute the for node in nodes loop defined in subprocess_dijkstra.
Each process only computes the code once and then they go on hold indefinitely...
It is my first attempt at multiprocessing under Python so I may have missed a detail. Does anybody have an idea ?
When I interrupt the script, python tells me that the interruption takes place on the p.join() line.
Thanks to anyone helping me :)
Here is my code :
import multiprocessing as mp
def subprocess_dijkstra(do_print, nodes, tab_contenu, tab_distances):
    tab_dist_initial = dict(tab_distances)
    tab_dist = dict()
    for node in nodes:
        visited_nodes = list()
        tab_dist = dict(tab_dist_initial)
        dmin = -1
        resultat = ""
        filename = "dijkstra"+str(node)+".txt"
        if do_print:
            dt = open(filename, 'w')
            tab_dist[node] = 0
            """Initial result line"""
            for valeur in tab_dist.values():
                resultat += str(valeur)
                resultat += " "
            resultat += "\n"
            dt.write(resultat)
        while len(visited_nodes) != len(tab_contenu):
            """ Move to the unvisited node with the minimal distance from the start """
            for cle, valeur in tab_dist.items():
                if cle not in visited_nodes:
                    if dmin ==-1 or valeur<dmin:
                        dmin = valeur
                        node = cle
            """ Check that the node has not already been visited """
            if (node not in visited_nodes):
                """ Look at this node's children and the edge lengths """
                for cle,valeur in tab_contenu[node].items():
                    tab_dist[cle] = min(tab_dist[cle], tab_dist[node]+valeur)
                visited_nodes.append(node)
                if do_print:
                    resultat = ""
                    """ Result line """
                    for valeur in tab_dist.values():
                        resultat += str(valeur)
                        resultat += " "
                    resultat += "\n"
                    dt.write(resultat)
        if do_print:
            dt.close()
def main(do_print,donnees):
    tab_contenu = donnees[1]
    nb_nodes = int(donnees[0])
    tab_distances = {x: float('inf') for x in range(nb_nodes)}
    args=[(do_print, x, tab_contenu, tab_distances) for x in range(nb_nodes)]
    nb_cpu = mp.cpu_count()
    pool = mp.Pool(processes = nb_cpu)
    pool.starmap(subprocess_dijkstra, args)
    pool.close()
    pool.join()
I have found the source of my problems.
The tab_dist[node] = 0 was misplaced and should have been put before the if do_print: statement.
All is now working.
Regards to all.
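For illustration only, here is a compact sketch of the same idea in which the source distance is always set to 0, independently of any printing option. It swaps the linear scan from the question for a heap-based Dijkstra, and the names dijkstra and graph are hypothetical; the graph is assumed to be a dict mapping each node to a dict of neighbor weights, like tab_contenu.

```python
import heapq
import multiprocessing as mp


def dijkstra(source, graph):
    """Return (source, distances) for a graph given as {node: {neighbor: weight}}."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0  # always initialized, whether or not results are printed
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return source, dist


if __name__ == "__main__":
    graph = {0: {1: 4, 2: 1}, 1: {3: 1}, 2: {1: 2, 3: 5}, 3: {}}
    # One task per source node; the pool spreads them over the available CPUs.
    with mp.Pool() as pool:
        results = dict(pool.starmap(dijkstra, [(s, graph) for s in graph]))
    print(results[0])
```

Returning the distances from the worker, rather than only writing files, also makes it easier to see in the parent process whether each source was actually computed.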
I'm developing an image compression system using the Python Imaging Library. The basic workflow is:
Read all images in a certain directory with: find /opt/images -name *.jpg > /opt/rs/images.txt
Read this file and store the result in a Python list
Iterate over the list, create an Image object for each entry and pass it as an argument to the compress function
Copy the resulting image to a directory that is built from the name of the image.
An example. The real name of the image is:
/opt/buzon/17_499999_1_00000000_00000999_1.jpg
The final result is:
17_499999.jpg
The output directory is /opt/ftp, and the image should be stored according to the fields of its name:
1 - first partition
00000000 - second partition
00000999 - third partition
1 - a flag that decides whether we have to compress this image or not (1 is False, 0 is True)
For that reason the final paths of the image are:
/opt/ftp/1/00000000/00000999/17_499999.jpg for the original copy
/opt/ftp/1/00000000/00000999/17_499999_tumb.jpg
Now, where is the problem? When I read the file where I store the result of the find command, each line of the file ends with the \n character.
How can I remove it with regular expressions?
The complete source code is below. Any suggestions are welcome.
import Image, os ,sys, re, subprocess, shlex
import ConfigParser
from symbol import except_clause
CONFIG_FILE = "/opt/scripts/config.txt"
config = ConfigParser.ConfigParser()
config.read(CONFIG_FILE)
entry_dir = config.get('directories', 'entry_dir')
out_dir = config.get('directories', 'out_dir')
def resize(img, box, fit, out):
    '''Downsample the image.
    @param img: an object of the Image class
    @param box: tuple(x, y) - the bounding box of the resulting image
    @param fix: boolean - crop the image to fill the rectangle
    @param out: file-like object - saves the image to standard output
    '''
    # pre-shrink the image by a factor of 2, 4, 8 with the fastest algorithm
    factor = 1
    while img.size[0]/factor > 2*box[0] and img.size[1]*2/factor > 2*box[1]:
        factor *=2
    if factor > 1:
        img.thumbnail((img.size[0]/factor, img.size[1]/factor), Image.NEAREST)
    # Compute the optimal rectangle for the image
    if fit:
        x1 = y1 = 0
        x2, y2 = img.size
        wRatio = 1.0 * x2/box[0]
        hRatio = 1.0 * y2/box[1]
        if hRatio > wRatio:
            y1 = y2/2-box[1]*wRatio/2
            y2 = y2/2+box[1]*wRatio/2
        else:
            x1 = x2/2-box[0]*hRatio/2
            x2 = x2/2+box[0]*hRatio/2
        # crop() works on a rectangle of the image
        # defined by 4 values representing the coordinates: left,
        # upper, right and lower
        img = img.crop((x1,y1,x2,y2))
    # Resize the image with the best-quality algorithm (ANTIALIAS)
    img.thumbnail(box, Image.ANTIALIAS)
    return img
def removeFiles(directory):
    """
    Delete the images after they have been processed
    """
    for f in os.listdir(directory):
        path = os.path.abspath(f)
        if re.match("*.jpg", path):
            try:
                print "Erasing %s ..." % (path)
                os.remove(path)
            except os.error:
                pass
def case_partition(img):
    """
    @desc: This function is the one that actually saves the image in
    partition 0
    @param: image to save
    @output_dir: directory in which to save the images
    """
    nombre_imagen = img
    nombre_import = os.path.splitext(nombre_imagen)
    temp = nombre_import[0]
    nombre_real = temp.split('_')
    if nombre_real[4] == 0:
        ouput_file = nombre_real[0] + nombre_real[1] + ".jpg"
        output_dir = out_dir + "/%d/%d/%d/" % (nombre_real[2], nombre_real[3], nombre_real[4])
        if os.path.isdir(output_dir):
            os.chdir(output_dir)
            img.save(output_file, "JPEG", quality=75)
        else:
            create_out_dir(output_dir)
            os.chdir(output_dir)
            img.save(output_file)
    else:
        print "Esta imagen sera comprimida"
        img = resize(img, 200, 200, 200) ## FIXME Define the image size
        # Save the image to a file object
        # with 75% JPEG quality and the name defined by the specialists
        # This name is not defined yet......
        # FIXME
        output_file = nombre_real[0] + nombre_path[1] + "_.jpg"
        output_dir = out_dir + "/%d/%d" % (nombre_real[2], nombre_real[3], nombre_real[4])
        if os.path.isdir(output_dir):
            os.chdir(out)
            img.save(output_file, "JPEG", quality=75)
        else:
            create_out_dir(output_dir)
            os.chdir(output_dir)
            img.save(output_file, "JPEG", quality=75)
if __name__ == "__main__":
    find = "find %s -name *.jpg > /opt/scripts/images.txt" % entry_dir
    args = shlex.split(find)
    p = subprocess.Popen(args)
    f = open('/opt/scripts/images.txt', "r")
    images = []
    for line in f:
        images.append(line)
    f.close()
    for i in images:
        img = Image.open(filename) # Here is the error when I try to open a file
        case_partition(img)        # with the format '/opt/buzon/15_498_3_00000000_00000000_1.jpg\n'
                                   # and the \n character is what I want to replace with nothing
    removeFiles(entry_dir)
Regards
Assuming s is the string with the trailing newline, you can use s.strip("\n") to remove newlines from both ends of the string. No need for regular expressions there.
I guess the relevant code lines are these:
for line in f:
    images.append(line)
To remove the \n, you can simply call strip() on the string:
for line in f:
    images.append(line.strip())
There are many ways to do that without using regular expressions.
The simplest correct one is images.append(line.rstrip("\n")). You can also use images.append(line[:-1]), but that removes the last character even when the line does not end with a newline (for example, the last line of a file without a trailing newline).
You can also use the glob module instead of calling the find command through the shell. It returns the result as a Python list without having to go through a file, e.g. images = glob.glob("*.jpg"). http://docs.python.org/library/glob.html?highlight=glob
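To make these suggestions concrete, here is a hedged sketch that reads the list file while dropping the trailing newline, and derives the destination path from the file name layout described in the question. The function names read_image_paths and destination_for are hypothetical, the default out_dir of /opt/ftp is taken from the question, and POSIX-style paths are assumed.

```python
import os
import tempfile


def read_image_paths(list_file):
    """Read one path per line, dropping the trailing newline and blank lines."""
    with open(list_file) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def destination_for(path, out_dir="/opt/ftp"):
    """Split a name like 17_499999_1_00000000_00000999_1.jpg into its fields.

    Returns (destination path for the original copy, compression flag).
    """
    name, ext = os.path.splitext(os.path.basename(path))
    image_id, number, part1, part2, part3, flag = name.split("_")
    target_dir = os.path.join(out_dir, part1, part2, part3)
    return os.path.join(target_dir, image_id + "_" + number + ext), flag


if __name__ == "__main__":
    # Simulate the output of the find command in a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("/opt/buzon/17_499999_1_00000000_00000999_1.jpg\n")
    for image_path in read_image_paths(tmp.name):
        print(destination_for(image_path))
    os.remove(tmp.name)
```

Because the fields stay strings here, the flag would be compared with "0" or "1" rather than with the integers 0 or 1.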