It's my first real Python script, so feel free to make comments in order to improve my code.
The purpose of this script is to extract 2 Oracle tables with Python, store them in a dataframe and then join them with pandas.
But for queries returning more than 500k lines I feel that it is slow. Do you know why?
import pandas as pd
from datetime import date
from sqlalchemy import create_engine
import cx_Oracle, time
import pandas as pd
import config
## Variable pour le timer
start = time.time()
## User input en ligne de commande
year = input('Saisir une annee : ')
month = input('Saisir le mois, au fomat MM : ')
societe_var = input('SA (APPLE,PEACH,BANANA,ANANAS,ALL) : ')
## SA + BU correspondantes aux SA
sa_list = ['APPLE','PEACH','BANANA','ANANAS']
bu_list_MERE = ['006111','1311402','1311403','1311404','1340115','13411106','1311407','1111','6115910','1166157','6811207','8311345','1111','1188100','8118101','8811102','8810113','8811104','8118105','8811106','8811107','8118108','1111']
bu_list_GARE = ['131400','310254']
bu_list_VOYA = ['0151100','1110073','1007115','1311335','1113340','1311341','1113342','1331143']
bu_list_RESO = ['1211345','13111345','11113395','73111345']
#Permet de pointre vers la bonne liste en fonction de la SA saisie
bu_list_map = {
'APPLE': bu_list_APPLE,
'PEACH': bu_list_PEACH,
'BANANA': bu_list_BANANA,
'ANANAS' : bu_list_ANANAS
}
if societe_var == 'ALL' :
print('non codé pour le moment')
elif societe_var in sa_list :
bu_list = bu_list_map.get(societe_var)
sa_var = societe_var
i=1
for bu in bu_list :
start_bu = time.time()
## On vient ici charger la requête SQL avec les bonnes variables pour gla_va_parametre -- EPOST
query1 = open('gla_va_parametre - VAR.sql',"r").read()
query1 = query1.replace('#ANNEE',"'" + year + "'").replace('%MOIS%',"'" + month + "'").replace('%SA%',"'" + societe_var + "'").replace('%BUGL%',"'" + bu + "'").replace('%DIVISION%','"C__en__PS_S1_D_OP_UNIT13".OPERATING_UNIT')
## On vient ici charger la requête SQL avec les bonnes variables pour cle-gla_tva -- FPOST
query2 = open('cle-gla_tva - VAR.sql',"r").read()
query2 = query2.replace('#ANNEE',"'" + year + "'").replace('%MOIS%',"'" + month + "'").replace('%SA%',"'" + societe_var + "'").replace('%BUGL%',"'" + bu + "'").replace('%DIVISION%','OPERATING_UNIT')
# Param de connexion
connection_EPOST = cx_Oracle.connect(user=config.user_EPOST, password=config.password_EPOST, dsn=config.host_EPOST, )
connection_FPOST = cx_Oracle.connect(user=config.user_FPOST, password=config.password_FPOST, dsn=config.host_FPOST, )
## Récup partie EPOST
with connection_EPOST :
# On déclare une variable liste vide
dfl = []
# On déclare un DataFrame vide
dfs = pd.DataFrame()
z=1
# Start Chunking
for chunk in pd.read_sql(query1, con=connection_EPOST,chunksize=25000) :
# Start Appending Data Chunks from SQL Result set into List
dfl.append(chunk)
print('chunk num : ' + str(z))
z = z + 1
# Start appending data from list to dataframe
dfs = pd.concat(dfl, ignore_index=True)
print('param récupéré')
## Récup partie FPOST
with connection_FPOST :
# On déclare une variable liste vide
df2 = []
# On déclare un DataFrame vide
dfs2 = pd.DataFrame()
# Start Chunking
for chunk in pd.read_sql(query2, con=connection_FPOST,chunksize=10000) :
# Start Appending Data Chunks from SQL Result set into List
df2.append(chunk)
# Start appending data from list to dataframe
dfs2 = pd.concat(df2, ignore_index=True)
print('clé récupéré')
print('Début de la jointure')
jointure = pd.merge(dfs,dfs2,how='left',left_on=['Code_BU_GL','Code_division','Code_ecriture','Date_comptable','Code_ligne_ecriture','UNPOST_SEQ'],right_on=['BUSINESS_UNIT','OPERATING_UNIT','JOURNAL_ID','JOURNAL_DATE','JOURNAL_LINE','UNPOST_SEQ']).drop(columns= ['BUSINESS_UNIT','OPERATING_UNIT','JOURNAL_ID','JOURNAL_DATE','JOURNAL_LINE'])
jointure.to_csv('out\gla_va_'+year+month+"_"+societe_var+"_"+bu+"_"+date.today().strftime("%Y%m%d")+'.csv', index=False, sep='|')
print('Fichier ' + str(i) + "/" + str(len(bu_list)) + ' généré en : '+ str(time.time() - start_bu)+' secondes')
i = i + 1
print("L'extraction du périmètre de la SA " + societe_var + " s'est effectué en :" + str((time.time() - start)/60) + " min" )
I have sought different articles here about searching data from a list, but nothing seems to be working right or is appropriate in what I am supposed to implement.
I have this pre-created module with over 500 list (they are strings, yes, but is considered as list when called into function; see code below) of names, city, email, etc. The following are just a chunk of it.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
a = empRecords.strip().split(";")
And I have the following code for searching:
import empData as x
def seecity():
empCitylist = list()
for ct in x.a:
empCt = ct.strip().split(",")
empCitylist.append(empCt)
t = sorted(empCitylist, key=lambda x: x[3])
for c in t:
city = (c[3])
print(city)
live_city = input("Enter city: ")
for cy in city:
if live_city in cy:
print(c[1])
# print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
Forgive my idiotic approach as I am new to Python. However, what I am trying to do is user will input the city, then the results should display the employee's last name, first name who are living in that city (I dunno if I made sense lol)
By the way, the code I used above doesn't return any answers. It just loops to the input.
Thank you for helping. Lovelots. <3
PS: the format of the empData is: first name, last name, address, city, country, birthday, zip, phone, and email
You can use the csv module to read easily a file with comma separated values
import csv
with open('test.csv', newline='') as csvfile:
records = list(csv.reader(csvfile))
def search(data, elem, index):
out = list()
for row in data:
if row[index] == elem:
out.append(row)
return out
#test
print(search(records, 'Orlando', 3))
Based on your original code, you can do it like this:
# Make list of list records, sorted by city
t = sorted((ct.strip().split(",") for ct in x.a), key=lambda x: x[3])
# List cities
print("Cities in DB:")
for c in t:
city = (c[3])
print("-", city)
# Define search function
def seecity():
live_city = input("Enter city: ")
for c in t:
if live_city == c[3]:
print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
seecity()
Then, after you understand what's going on, do as #Hoxha Alban suggested, and use the csv module.
The beauty of python lies in list comprehension.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
rows = empRecords.strip().split(";")
data = [ r.strip().split(",") for r in rows ]
then you can use any condition to filter the list, like
print ( [ "Name: " + emp[1] + "," + emp[0] + "| Current City: " + emp[3] for emp in data if emp[3] == "Washington" ] )
['Name: Hixenbaugh,Alesia| Current City: Washington']
I am trying to make my code work like this:
Enter a verb in French: chanter
Output 1: je chante
tu chantes
il ou elle chante
nous chantons
vous chantez
ils ou elles chantent
I succeeded in making the part above, but I cannot succeed in switching je to j' when the user enters, for instance: echapper
Enter a verb in French: echapper
Output 2: j'echappe
tu echappes
il ou elle echappe
nous echappons
vous echappez
ils ou elles echappent
Code:
list = {
"je": 'e',
"tu": 'es',
"il ou elle": 'e',
"nous": 'ons',
"vous": 'ez',
"ils ou elles": 'ent'
}
veb = input("")
for key in list:
if veb.endswith('er'):
b = veb[:-2]
print(key, b + list[key])
I do not know how to change the key list['je'] to list['j''] to succeed with the Output 2.
If you use double quotes around j', i.e. "j'", it will work. Also, I recommend not using the name list for your dictionary because 1) it's a dictionary, not a list, and 2) you should avoid using builtin python names for your variables.
Also, looks like that conjugation is treated differently, with "j'" at the beginning and "e" at the end (instead of "er").
dictionary = {"je":"j'","tu":'es',"il ou elle":'e',"nous":
'ons',"vous":'ez',"ils ou elles":'ent'}
veb = input("")
for key in dictionary:
if veb.endswith('er'):
b = veb[:-2]
if key == 'je':
print(key, dictionary[key] + b + 'e')
else:
print(key,b + dictionary[key])
You should just simply replace the print statement with an if statement:
if key == "je" and (veb.startswith("a") or veb.startswith("e") or [etc.]):
print("j'",b + list[key])
else:
print(key,b + list[key])
I'm touching the goal of my project, but I'm getting a problem on : How I can create a completeness map ?
I have lots of data, a field with maybe 500.000 objects which are represented by dots in my plot with different zoom :
I would like to create a mask, I mean, cut my plot in tiny pixels, and say if I have an object in this pixel, I get the value : 1 (black for example) elif, I have not object in my pixel, I get the value : 0 (white for example).
I'll create a mask and I could divide each field by this mask.
The problem is that I don't know how I can process in order to make that :/
I create a first script in order to get a selection on my data. This one :
#!/usr/bin/python
# coding: utf-8
from astropy.io import fits
from astropy.table import Table
import numpy as np
import matplotlib.pyplot as plt
###################################
# Fichier contenant le champ brut #
###################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits'
# Ouverture du fichier à l'aide d'astropy
field = fits.open(filename)
print "Ouverture du fichier : " + str(filename)
# Lecture des données fits
tbdata = field[1].data
print "Lecture des données du fits"
###############################
# Application du tri sur PROB #
###############################
mask = np.bitwise_and(tbdata['PROB'] < 1.1, tbdata['PROB'] > -0.1)
new_tbdata = tbdata[mask]
print "Création du Masque"
#################################################
# Détermination des valeurs extremales du champ #
#################################################
# Détermination de RA_max et RA_min
RA_max = np.max(new_tbdata['RA'])
RA_min = np.min(new_tbdata['RA'])
print "RA_max vaut : " + str(RA_max)
print "RA_min vaut : " + str(RA_min)
# Détermination de DEC_max et DEC_min
DEC_max = np.max(new_tbdata['DEC'])
DEC_min = np.min(new_tbdata['DEC'])
print "DEC_max vaut : " + str(DEC_max)
print "DEC_min vaut : " + str(DEC_min)
#########################################
# Calcul de la valeur centrale du champ #
#########################################
# Détermination de RA_moyen et DEC_moyen
RA_central = (RA_max + RA_min)/2.
DEC_central = (DEC_max + DEC_min)/2.
print "RA_central vaut : " + str(RA_central)
print "DEC_central vaut : " + str(DEC_central)
print " "
print " ------------------------------- "
print " "
##############################
# Détermination de X et de Y #
##############################
# Creation du tableau
new_col_data_X = array = (new_tbdata['RA'] - RA_central) * np.cos(DEC_central)
new_col_data_Y = array = new_tbdata['DEC'] - DEC_central
print 'Création du tableau'
# Creation des nouvelles colonnes
col_X = fits.Column(name='X', format='D', array=new_col_data_X)
col_Y = fits.Column(name='Y', format='D', array=new_col_data_Y)
print 'Création des nouvelles colonnes X et Y'
# Creation de la nouvelle table
tbdata_final = fits.BinTableHDU.from_columns(new_tbdata.columns + col_X + col_Y)
# Ecriture du fichier de sortie .fits
tbdata_final.writeto('{}_{}'.format(filename,'mask'))
print 'Ecriture du nouveau fichier mask'
field.close()
Ok, it's working ! But now, the second part is this to the moment :
###################################################
###################################################
###################################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits_mask'
print 'Fichier en cours de traitement' + str(filename) + '\n'
# Ouverture du fichier à l'aide d'astropy
field = fits.open(filename)
# Lecture des données fits
tbdata = field[1].data
figure = plt.figure(1)
plt.plot (tbdata['X'], tbdata['Y'], '.')
plt.show()
Do you have any idea how process ?
How I can cut my plot in tiny bin ?
Thank you !
UPDATE :
After the answer from armatita, I updated my script :
###################################################
###################################################
###################################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits_mask'
print 'Fichier en cours de traitement' + str(filename) + '\n'
# Opening file with astropy
field = fits.open(filename)
# fits data reading
tbdata = field[1].data
##### BUILDING A GRID FOR THE DATA ########
nodesx,nodesy = 360,360 # PIXELS IN X, PIXELS IN Y
firstx,firsty = np.min(tbdata['X']),np.min(tbdata['Y'])
sizex = (np.max(tbdata['X'])-np.min(tbdata['X']))/nodesx
sizey = (np.max(tbdata['Y'])-np.min(tbdata['Y']))/nodesy
grid = np.zeros((nodesx+1,nodesy+1),dtype='bool') # PLUS 1 TO ENSURE ALL DATA IS INSIDE GRID
# CALCULATING GRID COORDINATES OF DATA
indx = np.int_((tbdata['X']-firstx)/sizex)
indy = np.int_((tbdata['Y']-firsty)/sizey)
grid[indx,indy] = True # WHERE DATA EXISTS SET TRUE
# PLOT MY FINAL IMAGE
plt.imshow(grid.T,origin='lower',cmap='binary',interpolation='nearest')
plt.show()
I find this plot :
So, when I play with the bin size, I can see more or less blank which indicate object or not in my pixel :)
This is usually a process of inserting your data into a grid (pixel wise, or node wise). The following example builds a grid (2D array) and calculates the "grid coordinates" for the sample data. Once it has those grid coordinates (which in true are nothing but array indexes) you can just set those elements to True. Check the following example:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(0,1,1000)
y = np.random.normal(0,1,1000)
##### BUILDING A GRID FOR THE DATA ########
nodesx,nodesy = 100,100 # PIXELS IN X, PIXELS IN Y
firstx,firsty = x.min(),y.min()
sizex = (x.max()-x.min())/nodesx
sizey = (y.max()-y.min())/nodesy
grid = np.zeros((nodesx+1,nodesy+1),dtype='bool') # PLUS 1 TO ENSURE ALL DATA IS INSIDE GRID
# CALCULATING GRID COORDINATES OF DATA
indx = np.int_((x-firstx)/sizex)
indy = np.int_((y-firsty)/sizey)
grid[indx,indy] = True # WHERE DATA EXISTS SET TRUE
# PLOT MY FINAL IMAGE
plt.imshow(grid.T,origin='lower',cmap='binary',interpolation='nearest')
plt.show()
, which results in:
Notice I'm showing an image with imshow. Should I decrease the number of pixels (20,20 = nodesx, nodesy) I get:
Also for a more automatic plot in matplotlib you can consider hexbin.