Python | Modifying not Bound Objects still modifies both

I have a problem with copying object attributes and making sure the copies are not bound to the originals.
I am implementing data tables in Python, whose attributes are:
rows : list[list]
column_names : list[str]
I want a method that copies a table so that I can modify the copy, for instance, while keeping it fully independent of the original table (a non-destructive approach).
def copy(self):
    print(Fore.YELLOW + "COPY FCN BEGINS" + Fore.WHITE)
    rows = list(self.rows)
    # FOR DEBUG PURPOSES
    pprint(rows)
    pprint(self.rows)
    print(f'{rows is self.rows}')
    names = list(self.column_names)
    # FOR DEBUG PURPOSES
    pprint(names)
    pprint(self.column_names)
    print(f'{names is self.column_names}')
    print(Fore.YELLOW + "COPY FCN ENDS" + Fore.WHITE)
    return Tableau(rows=rows, column_names=names)
Then I test it: I create a table, copy it into another one, modify the new table, and check that only the new table was modified.
Problem: it modifies both.
However, I made sure that the two rows lists were not the same object, so I am a bit confused.
Here are the results:
[image: result of copy function]
And here is the test function (using unittest):
def test_copy(self):
    # Make sure the two tables are identical after the copy, but differ once one of them is modified (they do not share the same list in terms of memory address)
    # copy
    NewTable = self.TableauA.copy()
    self.assertEqual(NewTable.rows, self.TableauA.rows)
    self.assertEqual(NewTable.column_names, self.TableauA.column_names)
    print("Row B")
    pprint(self.rowB)
    print("New Table")
    print(NewTable)
    print("tableau A")
    print(self.TableauA)
    print(Fore.GREEN + "IS THE SAME OBJECT ?" + Fore.WHITE)
    print(f"{NewTable is self.TableauA}")
    print(Fore.GREEN + "ROWS IS THE SAME OBJECT ?" + Fore.WHITE)
    print(f"{NewTable.rows is self.TableauA.rows}")
    print(Fore.GREEN + "NAMES IS THE SAME OBJECT ?" + Fore.WHITE)
    print(f"{NewTable.column_names is self.TableauA.column_names}")
    # modify the new table
    NewTable.add_column(name="NewCol", column=self.rowB)
    print(Fore.YELLOW + "MODIFICATION" + Fore.WHITE)
    print(Fore.GREEN + "New Table" + Fore.WHITE)
    print(NewTable)
    print(Fore.GREEN + "tableau A" + Fore.WHITE)
    print(self.TableauA)
    # make sure the rows were not modified in both tables
    self.assertNotEqual(NewTable.rows, self.TableauA.rows)
    return
And the results:
[image: results of the test function]
And finally, the add_column method:
def add_column(self, name: str, column: list, position: int = -1):
    n = len(self.rows)
    if position == -1:
        position = n
    for k in range(n):
        self.rows[k].insert(position, column[k])
    self.column_names.insert(position, name)
    return
Thank you !

I found it in the end; it was very subtle.
rows is a list of lists. The outermost list, rows, was indeed a distinct object. However, its contents (the inner row lists) were not: list() only makes a shallow copy, so the new outer list still referenced the same inner lists.
Here is how I made them independent:
rows = [ list( self.rows[k] ) for k in range( len(self.rows) ) ]
Here is the final code that works:
def copy(self):
    rows = [ list( self.rows[k] ) for k in range( len(self.rows) ) ]
    names = list(self.column_names)
    return Tableau( rows= rows, column_names= names )
Hope this will help others.
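The difference can be checked in isolation. The sketch below is only an illustration and does not use the asker's Tableau class: it takes a plain list of lists (the sample values are made up) and compares list(), a per-row copy, and the standard library's copy.deepcopy.

```python
import copy

rows = [[1, 2], [3, 4]]  # made-up sample data standing in for self.rows

shallow = list(rows)               # new outer list, same inner lists
per_row = [list(r) for r in rows]  # new outer list and new inner lists
deep = copy.deepcopy(rows)         # recursive copy, works at any nesting depth

shallow[0].append(99)              # also changes rows[0], because it is the same object

shallow_shares_rows = shallow[0] is rows[0]
per_row_shares_rows = per_row[0] is rows[0]
deep_shares_rows = deep[0] is rows[0]

print(rows)     # [[1, 2, 99], [3, 4]]
print(per_row)  # [[1, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

For rows that hold only immutable values, the per-row copy is enough; copy.deepcopy is the more general choice if the cells themselves could be mutable objects.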

Related

More efficient way to manipulate large dataframe

It's my first real Python script, so feel free to comment on how to improve my code.
The purpose of this script is to extract 2 Oracle tables with Python, store them in dataframes and then join them with pandas.
But for queries returning more than 500k rows I feel that it is slow. Do you know why?
import pandas as pd
from datetime import date
from sqlalchemy import create_engine
import cx_Oracle, time
import config

## Timer variable
start = time.time()

## User input from the command line
year = input('Saisir une annee : ')
month = input('Saisir le mois, au fomat MM : ')
societe_var = input('SA (APPLE,PEACH,BANANA,ANANAS,ALL) : ')

## SA + the BUs that belong to each SA
sa_list = ['APPLE','PEACH','BANANA','ANANAS']
bu_list_APPLE = ['006111','1311402','1311403','1311404','1340115','13411106','1311407','1111','6115910','1166157','6811207','8311345','1111','1188100','8118101','8811102','8810113','8811104','8118105','8811106','8811107','8118108','1111']
bu_list_PEACH = ['131400','310254']
bu_list_BANANA = ['0151100','1110073','1007115','1311335','1113340','1311341','1113342','1331143']
bu_list_ANANAS = ['1211345','13111345','11113395','73111345']

# Points to the right list for the SA that was entered
bu_list_map = {
    'APPLE': bu_list_APPLE,
    'PEACH': bu_list_PEACH,
    'BANANA': bu_list_BANANA,
    'ANANAS': bu_list_ANANAS
}

if societe_var == 'ALL':
    print('non codé pour le moment')
elif societe_var in sa_list:
    bu_list = bu_list_map.get(societe_var)
    sa_var = societe_var
    i = 1
    for bu in bu_list:
        start_bu = time.time()
        ## Load the SQL query with the right variables for gla_va_parametre -- EPOST
        query1 = open('gla_va_parametre - VAR.sql',"r").read()
        query1 = query1.replace('#ANNEE',"'" + year + "'").replace('%MOIS%',"'" + month + "'").replace('%SA%',"'" + societe_var + "'").replace('%BUGL%',"'" + bu + "'").replace('%DIVISION%','"C__en__PS_S1_D_OP_UNIT13".OPERATING_UNIT')
        ## Load the SQL query with the right variables for cle-gla_tva -- FPOST
        query2 = open('cle-gla_tva - VAR.sql',"r").read()
        query2 = query2.replace('#ANNEE',"'" + year + "'").replace('%MOIS%',"'" + month + "'").replace('%SA%',"'" + societe_var + "'").replace('%BUGL%',"'" + bu + "'").replace('%DIVISION%','OPERATING_UNIT')
        # Connection parameters
        connection_EPOST = cx_Oracle.connect(user=config.user_EPOST, password=config.password_EPOST, dsn=config.host_EPOST, )
        connection_FPOST = cx_Oracle.connect(user=config.user_FPOST, password=config.password_FPOST, dsn=config.host_FPOST, )
        ## Fetch the EPOST part
        with connection_EPOST:
            # Declare an empty list
            dfl = []
            # Declare an empty DataFrame
            dfs = pd.DataFrame()
            z = 1
            # Start chunking
            for chunk in pd.read_sql(query1, con=connection_EPOST, chunksize=25000):
                # Append each data chunk from the SQL result set to the list
                dfl.append(chunk)
                print('chunk num : ' + str(z))
                z = z + 1
            # Concatenate the list of chunks into one DataFrame
            dfs = pd.concat(dfl, ignore_index=True)
            print('param récupéré')
        ## Fetch the FPOST part
        with connection_FPOST:
            # Declare an empty list
            df2 = []
            # Declare an empty DataFrame
            dfs2 = pd.DataFrame()
            # Start chunking
            for chunk in pd.read_sql(query2, con=connection_FPOST, chunksize=10000):
                # Append each data chunk from the SQL result set to the list
                df2.append(chunk)
            # Concatenate the list of chunks into one DataFrame
            dfs2 = pd.concat(df2, ignore_index=True)
            print('clé récupéré')
        print('Début de la jointure')
        jointure = pd.merge(dfs,dfs2,how='left',left_on=['Code_BU_GL','Code_division','Code_ecriture','Date_comptable','Code_ligne_ecriture','UNPOST_SEQ'],right_on=['BUSINESS_UNIT','OPERATING_UNIT','JOURNAL_ID','JOURNAL_DATE','JOURNAL_LINE','UNPOST_SEQ']).drop(columns= ['BUSINESS_UNIT','OPERATING_UNIT','JOURNAL_ID','JOURNAL_DATE','JOURNAL_LINE'])
        jointure.to_csv('out\\gla_va_'+year+month+"_"+societe_var+"_"+bu+"_"+date.today().strftime("%Y%m%d")+'.csv', index=False, sep='|')
        print('Fichier ' + str(i) + "/" + str(len(bu_list)) + ' généré en : '+ str(time.time() - start_bu)+' secondes')
        i = i + 1
    print("L'extraction du périmètre de la SA " + societe_var + " s'est effectué en :" + str((time.time() - start)/60) + " min" )

Search in List; Display names based on search input

I have looked through different articles here about searching data in a list, but nothing seems to work or fit what I am supposed to implement.
I have this pre-created module with over 500 records (they are strings, yes, but they are treated as a list when called in the function; see the code below) of names, cities, emails, etc. The following is just a chunk of it.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
a = empRecords.strip().split(";")
And I have the following code for searching:
import empData as x

def seecity():
    empCitylist = list()
    for ct in x.a:
        empCt = ct.strip().split(",")
        empCitylist.append(empCt)
    t = sorted(empCitylist, key=lambda x: x[3])
    for c in t:
        city = (c[3])
        print(city)
        live_city = input("Enter city: ")
        for cy in city:
            if live_city in cy:
                print(c[1])
                # print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
Forgive my clumsy approach, as I am new to Python. What I am trying to do is this: the user enters a city, and the results should display the last name and first name of the employees who live in that city (I don't know if that makes sense).
By the way, the code above doesn't return any results. It just loops back to the input.
Thank you for helping. Lovelots. <3
PS: the format of the empData is: first name, last name, address, city, country, birthday, zip, phone, and email

You can use the csv module to easily read a file with comma-separated values:
import csv

with open('test.csv', newline='') as csvfile:
    records = list(csv.reader(csvfile))

def search(data, elem, index):
    out = list()
    for row in data:
        if row[index] == elem:
            out.append(row)
    return out

# test
print(search(records, 'Orlando', 3))
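The same idea can be tried without creating a file. This is a minimal sketch, assuming records shaped like the question's data (city in field 3); the three sample records are made up, and io.StringIO stands in for the opened test.csv.

```python
import csv
import io

# Made-up records in the same layout as the question: the city is field 3.
sample = (
    "Ann,Lee,1 Main St,Orlando,Orange,FL\n"
    "Bob,Ray,2 Oak Ave,Boise,Ada,ID\n"
    "Cy,Fox,3 Elm Rd,Orlando,Orange,FL\n"
)

def search(data, elem, index):
    """Return every row whose field at position `index` equals `elem`."""
    return [row for row in data if row[index] == elem]

# csv.reader accepts any iterable of lines, so an in-memory buffer works too.
records = list(csv.reader(io.StringIO(sample)))
matches = search(records, "Orlando", 3)

for row in matches:
    print(row[1] + ", " + row[0])  # last name, first name
```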

Based on your original code, you can do it like this:
# Make list of list records, sorted by city
t = sorted((ct.strip().split(",") for ct in x.a), key=lambda x: x[3])

# List cities
print("Cities in DB:")
for c in t:
    city = (c[3])
    print("-", city)

# Define search function
def seecity():
    live_city = input("Enter city: ")
    for c in t:
        if live_city == c[3]:
            print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])

seecity()
Then, after you understand what's going on, do as @Hoxha Alban suggested, and use the csv module.

The beauty of Python lies in list comprehensions.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles#gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh#hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai#gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie#gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa#kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer#cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy#yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley#aol.com,http://www.affiliatedwithtravelodge.com
"""
rows = empRecords.strip().split(";")
data = [ r.strip().split(",") for r in rows ]
Then you can use any condition to filter the list, for example:
print ( [ "Name: " + emp[1] + "," + emp[0] + "| Current City: " + emp[3] for emp in data if emp[3] == "Washington" ] )
which prints:
['Name: Hixenbaugh,Alesia| Current City: Washington']

How to scrape multiple websites with different data in urls

I'm scraping data from web pages whose URLs end with the product id. The output seems to rewrite the same data on every row, as if it were not appending the data for the next product. I don't know exactly what is going on: whether my first for loop is wrong, or the indentation. I tried before without the dictionary, and it was appending, but all on the same line; I transposed it, but that didn't work as I wanted. So I did it this way, and now it doesn't append the next lines. Help please.
data_cols = []
cols = {'pro_header': [],
        'pro_id': [],
        .
        .
        .
        'pro_uns5': []
        }
# the id for each product
fileID = open('idProductsList.txt', 'r')
proIDS = fileID.read().split()
for proID in proIDS:
    url = 'https:/website.com/mall/es/mx/Catalog/Product/' + proID
    html = urllib2.urlopen(url).read()
    soup = bs.BeautifulSoup(html , 'lxml')
    table = soup.find("table",{"class": "ProductDetailsTable"})
    rows = table.find_all('tr')
    for row in rows:
        labels.append(str(row.find_all('td')[0].text))
        try:
            data.append(str(row.find_all('td')[1].text))
        except IndexError:
            data.append('')
    cols['pro_header'].append(data[0])
    cols['pro_id'].append(data[1])
    .
    .
    .
    cols['pro_uns5'].append(data[43])
df = pd.DataFrame(cols)
df.set_index
#df.reindex()
df.to_csv('sample1.csv')
The actual output is:
pro_id pro_priceCostumer pro_priceData
1FK7011-5AK24-1AA3 " Mostrar precios
" PM300:Producto activo
1FK7011-5AK24-1AA3 " Mostrar precios
" PM300:Producto activo
1FK7011-5AK24-1AA3 " Mostrar precios
" PM300:Producto activo
Should be something like this (This is just a small representation of the data):
pro_id pro_priceCostumer pro_priceData
1FK7011-5AK24-1AA3 " Mostrar precios
" PM300:Producto activo
1FK7011-5AK24-1JA3 " Mostrar precios
" PM300:Producto activo
1FK7022-5AK21-1UA0 " Mostrar precios
" PM300:Producto activo

I guess labels is being used as a plain variable; to append to it, it needs to be a list.
Add labels = list() at the top of your code as a global variable. The same should be done for data.
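One thing to watch with that suggestion, as an assumption about the asker's full script: if data is created once and never cleared, data[0], data[1], ... keep pointing at the first product's cells on every pass, which would match the repeated rows in the output above. The sketch below avoids that by creating a fresh list for each product. It makes no network calls; fake_pages is a hypothetical stand-in for the scraped tables, and only two of the columns are shown.

```python
# Hypothetical stand-in for the scraped pages: product id -> table cell values.
fake_pages = {
    "1FK7011-5AK24-1AA3": ["header A", "1FK7011-5AK24-1AA3"],
    "1FK7011-5AK24-1JA3": ["header B", "1FK7011-5AK24-1JA3"],
    "1FK7022-5AK21-1UA0": ["header C", "1FK7022-5AK21-1UA0"],
}

cols = {"pro_header": [], "pro_id": []}

for pro_id, cells in fake_pages.items():
    data = []                 # fresh list for every product
    for cell in cells:        # stands in for the loop over the table rows
        data.append(cell)
    # data now holds only this product's cells, so the indexes line up
    cols["pro_header"].append(data[0])
    cols["pro_id"].append(data[1])

print(cols["pro_id"])
```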

I am trying to write code to conjugate verbs in French, but I cannot change the key "je" to "j'"

I am trying to make my code work like this:
Enter a verb in French: chanter
Output 1: je chante
tu chantes
il ou elle chante
nous chantons
vous chantez
ils ou elles chantent
I succeeded in making the part above, but I cannot succeed in switching je to j' when the user enters, for instance: echapper
Enter a verb in French: echapper
Output 2: j'echappe
tu echappes
il ou elle echappe
nous echappons
vous echappez
ils ou elles echappent
Code:
list = {
    "je": 'e',
    "tu": 'es',
    "il ou elle": 'e',
    "nous": 'ons',
    "vous": 'ez',
    "ils ou elles": 'ent'
}
veb = input("")
for key in list:
    if veb.endswith('er'):
        b = veb[:-2]
        print(key, b + list[key])
I do not know how to change the key list['je'] to list['j''] to succeed with the Output 2.

If you use double quotes around j', i.e. "j'", it will work. Also, I recommend not using the name list for your dictionary because 1) it's a dictionary, not a list, and 2) you should avoid using built-in Python names for your variables.
Also, it looks like that conjugation is treated differently, with "j'" at the beginning and "e" at the end (instead of "er").
dictionary = {"je": "j'", "tu": 'es', "il ou elle": 'e', "nous": 'ons',
              "vous": 'ez', "ils ou elles": 'ent'}
veb = input("")
for key in dictionary:
    if veb.endswith('er'):
        b = veb[:-2]
        if key == 'je':
            # "j'" attaches directly to the verb, so the key itself is not printed
            print(dictionary[key] + b + 'e')
        else:
            print(key, b + dictionary[key])
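The advice about not naming a variable list can be seen in a few lines. This is a small illustrative sketch (shadow_demo is a made-up name): inside the function, the name list refers to a dictionary, so calling it like the built-in fails.

```python
def shadow_demo():
    list = {"je": "e"}  # this local name hides the built-in list inside the function
    try:
        list("abc")     # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)
    return None

message = shadow_demo()
print(message)
```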

You should simply replace the print statement with an if statement:
if key == "je" and (veb.startswith("a") or veb.startswith("e") or [etc.]):
    print("j'",b + list[key])
else:
    print(key,b + list[key])
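Putting the two answers together, a complete version might look like the sketch below. It is an illustration under stated assumptions, not a full conjugator: it handles only regular -er verbs, the VOWELS set and the conjugate function are names introduced here, and it treats every leading h as mute, which is wrong for verbs with an aspirated h (for example hurler, which keeps "je").

```python
# Assumption: elide "je" before any of these first letters (simplified rule).
VOWELS = "aeiouhéèêâîôû"

ENDINGS = {
    "je": "e",
    "tu": "es",
    "il ou elle": "e",
    "nous": "ons",
    "vous": "ez",
    "ils ou elles": "ent",
}

def conjugate(verb):
    """Return the present-tense forms of a regular -er verb, eliding je to j'."""
    if not verb.endswith("er"):
        raise ValueError("only regular -er verbs are handled")
    stem = verb[:-2]
    forms = []
    for pronoun, ending in ENDINGS.items():
        if pronoun == "je" and verb[0].lower() in VOWELS:
            forms.append("j'" + stem + ending)          # no space after the apostrophe
        else:
            forms.append(pronoun + " " + stem + ending)
    return forms

for line in conjugate("echapper"):
    print(line)
```

Building the strings by concatenation instead of passing two arguments to print avoids the space that print(key, ...) would put between "j'" and the verb.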

Create a binary completeness map

I'm close to the goal of my project, but I have one remaining problem: how can I create a completeness map?
I have a lot of data: a field with maybe 500,000 objects, which are represented by dots in my plot at different zoom levels.
I would like to create a mask: cut my plot into tiny pixels and, if there is an object in a pixel, give it the value 1 (black, for example); otherwise, if there is no object in the pixel, give it the value 0 (white, for example).
Once I have the mask, I could divide each field by it.
The problem is that I don't know how to go about it.
I wrote a first script to apply a selection to my data:
#!/usr/bin/python
# coding: utf-8
from astropy.io import fits
from astropy.table import Table
import numpy as np
import matplotlib.pyplot as plt
#################################
# File containing the raw field #
#################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits'
# Open the file with astropy
field = fits.open(filename)
print "Ouverture du fichier : " + str(filename)
# Read the FITS data
tbdata = field[1].data
print "Lecture des données du fits"
###############################
# Apply the selection on PROB #
###############################
mask = np.bitwise_and(tbdata['PROB'] < 1.1, tbdata['PROB'] > -0.1)
new_tbdata = tbdata[mask]
print "Création du Masque"
#############################################
# Determine the extreme values of the field #
#############################################
# Determine RA_max and RA_min
RA_max = np.max(new_tbdata['RA'])
RA_min = np.min(new_tbdata['RA'])
print "RA_max vaut : " + str(RA_max)
print "RA_min vaut : " + str(RA_min)
# Determine DEC_max and DEC_min
DEC_max = np.max(new_tbdata['DEC'])
DEC_min = np.min(new_tbdata['DEC'])
print "DEC_max vaut : " + str(DEC_max)
print "DEC_min vaut : " + str(DEC_min)
##########################################
# Compute the central value of the field #
##########################################
# Determine the central RA and DEC
RA_central = (RA_max + RA_min)/2.
DEC_central = (DEC_max + DEC_min)/2.
print "RA_central vaut : " + str(RA_central)
print "DEC_central vaut : " + str(DEC_central)
print " "
print " ------------------------------- "
print " "
#####################
# Determine X and Y #
#####################
# Create the array
new_col_data_X = array = (new_tbdata['RA'] - RA_central) * np.cos(DEC_central)
new_col_data_Y = array = new_tbdata['DEC'] - DEC_central
print 'Création du tableau'
# Create the new columns
col_X = fits.Column(name='X', format='D', array=new_col_data_X)
col_Y = fits.Column(name='Y', format='D', array=new_col_data_Y)
print 'Création des nouvelles colonnes X et Y'
# Create the new table
tbdata_final = fits.BinTableHDU.from_columns(new_tbdata.columns + col_X + col_Y)
# Write the output .fits file
tbdata_final.writeto('{}_{}'.format(filename,'mask'))
print 'Ecriture du nouveau fichier mask'
field.close()
OK, that works! Here is the second part so far:
###################################################
###################################################
###################################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits_mask'
print 'Fichier en cours de traitement' + str(filename) + '\n'
# Open the file with astropy
field = fits.open(filename)
# Read the FITS data
tbdata = field[1].data
figure = plt.figure(1)
plt.plot (tbdata['X'], tbdata['Y'], '.')
plt.show()
Do you have any idea how to proceed?
How can I cut my plot into tiny bins?
Thank you!
UPDATE :
After the answer from armatita, I updated my script :
###################################################
###################################################
###################################################
filename = '/home/valentin/Desktop/Field52_combined_final_roughcal.fits_mask'
print 'Fichier en cours de traitement' + str(filename) + '\n'
# Opening file with astropy
field = fits.open(filename)
# fits data reading
tbdata = field[1].data
##### BUILDING A GRID FOR THE DATA ########
nodesx,nodesy = 360,360 # PIXELS IN X, PIXELS IN Y
firstx,firsty = np.min(tbdata['X']),np.min(tbdata['Y'])
sizex = (np.max(tbdata['X'])-np.min(tbdata['X']))/nodesx
sizey = (np.max(tbdata['Y'])-np.min(tbdata['Y']))/nodesy
grid = np.zeros((nodesx+1,nodesy+1),dtype='bool') # PLUS 1 TO ENSURE ALL DATA IS INSIDE GRID
# CALCULATING GRID COORDINATES OF DATA
indx = np.int_((tbdata['X']-firstx)/sizex)
indy = np.int_((tbdata['Y']-firsty)/sizey)
grid[indx,indy] = True # WHERE DATA EXISTS SET TRUE
# PLOT MY FINAL IMAGE
plt.imshow(grid.T,origin='lower',cmap='binary',interpolation='nearest')
plt.show()
I get this plot:
So, when I play with the bin size, I see more or fewer blank pixels, which indicate whether or not a pixel contains an object :)

This is usually a process of inserting your data into a grid (pixel-wise or node-wise). The following example builds a grid (a 2D array) and calculates the "grid coordinates" for the sample data. Once it has those grid coordinates (which in truth are nothing but array indexes), you can just set those elements to True. Check the following example:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(0,1,1000)
y = np.random.normal(0,1,1000)
##### BUILDING A GRID FOR THE DATA ########
nodesx,nodesy = 100,100 # PIXELS IN X, PIXELS IN Y
firstx,firsty = x.min(),y.min()
sizex = (x.max()-x.min())/nodesx
sizey = (y.max()-y.min())/nodesy
grid = np.zeros((nodesx+1,nodesy+1),dtype='bool') # PLUS 1 TO ENSURE ALL DATA IS INSIDE GRID
# CALCULATING GRID COORDINATES OF DATA
indx = np.int_((x-firstx)/sizex)
indy = np.int_((y-firsty)/sizey)
grid[indx,indy] = True # WHERE DATA EXISTS SET TRUE
# PLOT MY FINAL IMAGE
plt.imshow(grid.T,origin='lower',cmap='binary',interpolation='nearest')
plt.show()
This results in a binary image in which occupied pixels are black. Notice I'm showing the image with imshow. Decreasing the number of pixels (nodesx, nodesy = 20, 20) gives a coarser version of the same map.
Also, for a more automatic plot in matplotlib you can consider hexbin.
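As an alternative to computing the array indexes by hand, NumPy's histogram2d can do the binning. This is not the method used above, only a sketch of the same occupancy mask under the same assumptions: x and y are made-up sample positions, and a pixel counts as occupied when at least one point falls in it.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
x = rng.normal(0, 1, 1000)       # made-up sample positions
y = rng.normal(0, 1, 1000)

nodesx, nodesy = 20, 20          # pixels in x and in y

# counts[i, j] is the number of points that fall in pixel (i, j);
# by default the bins span the full range of the data.
counts, xedges, yedges = np.histogram2d(x, y, bins=(nodesx, nodesy))

mask = counts > 0                # True where a pixel holds at least one object
filled_fraction = mask.mean()    # share of occupied pixels

print(mask.shape, filled_fraction)
```

The mask can be displayed the same way as the grid above, with plt.imshow(mask.T, origin='lower', cmap='binary', interpolation='nearest').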
