Working with Python 3.5 and Pandas 0.19.2
Here is my problem: I have a data frame with different "IdActivo" values sorted by date and time ascending, and a field called Resultado whose values are NaN or 1. I need to calculate, for each row, how long ago the last (N-th previous) occurrence of Resultado being 1 happened for that particular "IdActivo".
This is my dataframe:
import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame({'IdActivo': [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
                   'Fecha': ['1990-01-02','1990-01-03','1990-01-04','1990-01-05','1990-01-08',
                             '1990-01-09','1990-01-10','1990-01-11','1990-01-12','1990-01-15',
                             '1990-01-16','1990-01-17','1990-01-18','1990-01-19','1990-01-22',
                             '1990-01-23','1990-01-24','1990-01-25','1990-01-26','1990-01-29'],
                   'Hora': ['10:10:00','10:11:00','10:12:00','10:13:00','10:10:00',
                            '10:10:00','10:17:00','10:14:00','11:14:00','12:14:00',
                            '10:10:00','10:20:00','14:22:00','15:22:00','16:22:00',
                            '10:10:00','00:00:00','00:00:00','00:00:00','00:00:00']})

def Inicio():
    numHoraDia = '10:10:00'
    numDia = 2  # for us, 2 will be Tuesday, since we add +1 to Monday, which is 0 by default
    nomDiasSemanaHora = " Resultado"; inpfield = "Fecha"; oupfield = "Dia_Semana"
    df_final = Fecha_Dia_Hora(df, inpfield, oupfield, numHoraDia, numDia, nomDiasSemanaHora)
    print(df_final)

def Fecha_Dia_Hora(df, inpfield, oupfield, numHoraDia, numDia, nomDiasSemanaHora):
    ord_df = df.sort_values(by=['IdActivo', 'Fecha'])
    ord_df[inpfield] = pd.to_datetime(ord_df[inpfield])
    ord_df[oupfield] = ord_df[inpfield].dt.dayofweek + 1
    ord_df[nomDiasSemanaHora] = np.nan
    ord_df.loc[np.logical_and(ord_df[oupfield] == numDia, ord_df.Hora == numHoraDia), [nomDiasSemanaHora]] = '1'
    return ord_df.sort_index()

def Fin():
    print("FIN")

if __name__ == "__main__":
    Inicio()
    Fin()
Here is an example derived from the dataframe shown in the code above (screenshot omitted).
Which functions should I investigate to achieve this?
Thanks,
Angel
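A minimal sketch of one direction to investigate (the N = 1 case, i.e. the time elapsed since the most recent previous row flagged 1), assuming df_final is the frame returned by Fecha_Dia_Hora above and the flag column is the " Resultado" column created there (holding '1' or NaN); the pieces to look at are pandas.to_timedelta, Series.where and the GroupBy ffill/shift methods:
# Build a full timestamp per row (Fecha is already datetime64 after Fecha_Dia_Hora)
df_final['FechaHora'] = df_final['Fecha'] + pd.to_timedelta(df_final['Hora'])
df_final = df_final.sort_values(['IdActivo', 'FechaHora'])
# Timestamp only on flagged rows, NaT elsewhere
hits = df_final['FechaHora'].where(df_final[' Resultado'] == '1')
# Carry the most recent flagged timestamp forward within each IdActivo,
# then shift by one row so the current row only sees *previous* occurrences
last_hit = hits.groupby(df_final['IdActivo']).ffill()
last_hit = last_hit.groupby(df_final['IdActivo']).shift(1)
# Elapsed time since the last previous row where the flag was 1 (NaT if none yet)
df_final['TiempoDesdeUltimo1'] = df_final['FechaHora'] - last_hit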
Related
I have been trying unsuccessfully for months to write a function that compares a feature with the other features in a shapefile layer and, if certain conditions are met, deletes the feature currently being examined in the loop. I need it for my final project. The conditions to delete a feature are two: it overlaps the other one, and it is older than the feature it is compared against in the loop.
from datetime import datetime

def superpuestos2(capa, fecha, geom, id):
    listaborrar = []
    fecha_feat = fecha
    fechainf_feat = datetime.strptime(fecha_feat, "%d/%m/%Y")
    feat_geom = geom
    features = capa.getFeatures()
    for f in features:
        id_feat = f.attribute('id')
        if id != id_feat:
            fecha_f = f.attribute('fecha')
            fechainf_f = datetime.strptime(fecha_f, "%d/%m/%Y")
            f_geom = f.geometry()
            inters = f_geom.intersection(feat_geom)
            areageom = feat_geom.area()
            interarea = inters.area()
            fraccion = interarea / areageom
            if fraccion > 0.3:
                if fechainf_feat >= fechainf_f:  # if the date is older:
                    print("intersecta")  # instead of printing, delete the feature f
                    listaborrar.append(f.id())
    print(listaborrar)
    capa.dataProvider().deleteFeatures(listaborrar)
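A minimal driver sketch, assuming capa is the QGIS vector layer and using only calls already present above; list() materializes the features up front so that deleting through the data provider does not invalidate the iterator:
# Run superpuestos2 once per feature of the layer
for feat in list(capa.getFeatures()):
    superpuestos2(capa, feat.attribute('fecha'), feat.geometry(), feat.attribute('id'))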
I'm using this dataframe:
# importing the dataframe
url = 'https://raw.githubusercontent.com/ipauchner/DNC/main/kc_house_data.csv'
df = pd.read_csv(url)
After some analysis I managed to separate 25 rows from this df; the analysis was as follows:
# finding the repeated id's, i.e. the houses that were sold more than once
id_repetido = pd.concat(g for x, g in df.groupby('id') if len(g) > 1)
id_repetido
# finding the first sale
venda2 = id_repetido.duplicated(subset=['id'], keep='first')
venda1 = id_repetido[~venda2]
venda1 = venda1[venda1.id != 795000620]
venda1
venda2 = id_repetido[venda2]
venda2 = venda2[venda2.id != 795000620]
venda2
venda1['id'].value_counts().sort_values()
venda2['id'].value_counts().sort_values()
lucro_prej = pd.merge_asof(venda1, venda2, on='id')  # joining the information of both sales
lucro_prej = lucro_prej.loc[:, ['id', 'price_x', 'price_y']]  # keeping the columns joined by id
lucro_prej = lucro_prej.rename({'price_x': 'primeira_venda'}, axis=1)  # renaming the column
lucro_prej = lucro_prej.rename({'price_y': 'segunda_venda'}, axis=1)  # renaming the column
lucro_prej
lucro_prej['lucro/prejuízo'] = (lucro_prej['segunda_venda'] -
                                lucro_prej['primeira_venda'])  # calculating the profit or loss amount
lucro_prej['variação'] = ((lucro_prej['segunda_venda'] - lucro_prej['primeira_venda']) /
                          lucro_prej['primeira_venda'] * 100).round(decimals=2)  # calculating the profit/loss percentage
lucro_prej.sort_values(by=['variação'], ascending=False, inplace=True)  # sorting in descending order
lucro_prej
maiores_lucros = lucro_prej.head(25)
maiores_lucros
This generated another df (maiores_lucros) with 25 rows.
What I did next was apply multiple filters on the original df (df), for example: bathrooms >= 1 and <= 3, bedrooms >= 2 and <= 4. I got this part with the following code:
lista_casas = df[((df.bedrooms > 2) & (df.bedrooms < 6)) & (df.bathrooms >= 1) & (df.bathrooms <= 3)]
But what I need is a filter so that the ids of the maiores_lucros df do not appear in lista_casas. I tried the following:
id_filtrar = maiores_lucros['id'].tolist()
id_filtrar
lista_casas2 = df[df.id != id_filtrar]
lista_casas2
But it returns the following error:
ValueError: ('Lengths must match to compare', (21528,), (25,))
Is there any way to make this filter?
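One way, as a minimal sketch: Series.isin builds a boolean mask the same length as the frame, which avoids the length-mismatch error raised by comparing the id column against a 25-element list; lista_casas and maiores_lucros are assumed to be the frames built above.
# Keep only the rows of lista_casas whose id is NOT among the maiores_lucros ids
id_filtrar = maiores_lucros['id'].tolist()
lista_casas2 = lista_casas[~lista_casas['id'].isin(id_filtrar)]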
I want to obtain a proper list of boat makes and models from two datasets (one "lambda" and one reference) with fuzzywuzzy (Levenshtein matching in Python), but I have an issue in my code that I don't understand.
the two datasets:
https://www.transfernow.net/dl/202203070QxpVjYJ
Here is my code:
#%%
from fuzzywuzzy import process
import pandas as pd
#%%
BASE_LAMBDA_PATH = '../ressources/marques_modeles_lambda_entier.csv'
BASE_REF_PATH = '../ressources/marques_modeles_ref_entier.csv'
#%%
lambda_df = pd.read_csv(BASE_LAMBDA_PATH, sep=";")
#%%
ref_df = pd.read_csv(BASE_REF_PATH, sep=";")
#%% create the result DataFrame (initialized empty)
df_result = pd.DataFrame(columns=['marque', 'lambda', 'ref', 'score'])
#%% iterate over the table of lambda models
for ind in lambda_df.index:
    marque = lambda_df['MARQUE_REF'][ind]
    modele_lambda = lambda_df['MODELE'][ind]
    ref_list = (ref_df[(ref_df['lib_marque'] == marque)]['lib_model']).to_list()
    choices = process.extract(modele_lambda, ref_list, limit=1)
    approx = choices[0][0]
    score = choices[0][1]
    df2 = pd.DataFrame(data=[(marque, modele_lambda, approx, score)],
                       columns=['marque', 'lambda', 'ref', 'score'])
    df_result = pd.concat([df_result, df2], axis=0, ignore_index=True)
df_result.to_csv('output_matching_groupe.csv', sep=';', index=False)
'''
tdep = time.time()
tfin = time.time()
print(f"duree de {tfin-tdep} secondes")
'''
# %%
the error:
IndexError Traceback (most recent call last)
c:\Users\boats\src\list_matching_groupe.py in <cell line: 1>()
20 ref_list = (ref_df[(ref_df['lib_marque'] == marque)]['lib_model']).to_list()
21 choices = process.extract(modele_lambda, ref_list, limit=1)
----> 22 approx = choices[0][0]
23 score = choices[0][1]
24 df2 = pd.DataFrame(data = [(marque, modele_lambda, approx, score)],\
25 columns=['marque', 'lambda','ref','score'])
IndexError: list index out of range
I don't understand it, because choices[0][0] actually works: I obtain 'Guy Couach 1401'.
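A minimal guard sketch, assuming the IndexError happens on an iteration where ref_list is empty (no row of ref_df has that marque): process.extract then returns an empty list, so choices[0] fails even though it works on other rows.
if ref_list:
    choices = process.extract(modele_lambda, ref_list, limit=1)
    approx, score = choices[0]   # (best match, score)
else:
    approx, score = None, None   # no reference model for this marque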
I am working with a simulated bebop2
These are the commands I am using to run the simulation.
sphinx /opt/parrot-sphinx/usr/share/sphinx/drones/bebop2.drone
roslaunch bebop_driver bebop_node.launch ip:=10.202.0.1
In this case bebop_driver is the subscriber and bebop_commander is the publisher (see code below).
I've been using:
rostopic pub -r 10 cmd_vel geometry_msgs/Twist '{linear: {x: 0.0, y: 0.0, z: 0.0}, angular: {x: 0.0,y: 0.0,z: 0.0}}'
in order to publish to the cmd_vel topic successfully. I need to publish the same message to the same topic using a Python script, but so far I haven't been able to.
This is the Python script I am trying to use:
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
import sys

rospy.init_node("bebop_commander")
movement_publisher = rospy.Publisher('cmd_vel', Twist, queue_size=10)
movement_cmd = Twist()

speed = float(sys.argv[1])
time = float(sys.argv[2])

print("Adelante")

if speed != "" and speed > 0:
    print("Velocidad =", speed, "m/s")
else:
    print("Falta parametro de velocidad o el valor es incorrecto")

if time != "" and time > 0:
    print("Tiempo = ", time, "s")
else:
    print("Falta parametro de tiempo o el valor es incorrecto")

if time != "" and time > 0:
    movement_cmd.linear.x = 0
    movement_cmd.linear.y = 0
    movement_cmd.linear.z = 0
    movement_cmd.angular.x = 0
    movement_cmd.angular.y = 0
    movement_cmd.angular.z = 0
    movement_publisher.publish(movement_msg)
    print("Publishing")
    rospy.spin()
A few mistakes/suggestions in your code:
You are not checking whether the user actually passes all the arguments at the start, namely the script name, speed and time. Try using the code below:
if len(sys.argv) > 2:
    speed = float(sys.argv[1])
    time = float(sys.argv[2])
else:
    print("one or more arguments missing!!")
There is no need for the speed != "" and time != "" checks once you have verified the len(sys.argv) > 2 condition (speed and time are floats, so comparing them with "" is pointless).
You are passing an undefined variable movement_msg inside movement_publisher.publish(). Kindly check the line below:
movement_publisher.publish(movement_msg)
It should be movement_cmd.
Modified code (tested):
Filename: test_publisher.py
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
import sys

rospy.init_node("bebop_commander")
movement_publisher = rospy.Publisher('cmd_vel', Twist, queue_size=10)
movement_cmd = Twist()

if len(sys.argv) > 2:
    speed = float(sys.argv[1])
    time = float(sys.argv[2])
    print("Adelante")

    if speed > 0.0:
        print("Velocidad =", speed, "m/s")
    else:
        print("Falta parametro de velocidad o el valor es incorrecto")

    if time > 0.0:
        print("Tiempo = ", time, "s")
        movement_cmd.linear.x = 0
        movement_cmd.linear.y = 0
        movement_cmd.linear.z = 0
        movement_cmd.angular.x = 0
        movement_cmd.angular.y = 0
        movement_cmd.angular.z = 0
        movement_publisher.publish(movement_cmd)
        print("Publishing")
        rospy.spin()
    else:
        print("Falta parametro de tiempo o el valor es incorrecto")
else:
    print('one or more argument is missing!!')
Note: Don't forget to copy the file test_publisher.py to your package's scripts directory and make it executable via chmod +x test_publisher.py.
Output:
(Terminal 1): Run the roscore command. You must have a roscore running in order for ROS nodes to communicate.
(Terminal 2): Run the Python publisher file with arguments.
(Terminal 3): Check the rostopic information.
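Separately, if a single publish() before rospy.spin() ever gets lost because subscribers have not connected yet, a minimal sketch of the Python equivalent of the rostopic pub -r 10 command from the question is to republish the same Twist at 10 Hz until shutdown:
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

rospy.init_node("bebop_commander")
pub = rospy.Publisher('cmd_vel', Twist, queue_size=10)
cmd = Twist()            # all linear/angular fields default to 0.0
rate = rospy.Rate(10)    # 10 Hz, like the -r 10 flag
while not rospy.is_shutdown():
    pub.publish(cmd)
    rate.sleep()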
First of all, I'm a rookie in programming.
I'm trying to export a MySQL database to Excel format with Python 3, using openpyxl. Now I have an interesting error in Excel: if I run the code with a small SQL query (around 1000 rows) there is no error when opening the Excel file, but if it is bigger (>30k rows), when I try to open the file I get this error:
error135840_01.xmlErrors were detected in file 'C:\Users\id022504\PycharmProjects\GetMySQLdata\Interface planning _mau.xlsx'Removed Records: Formula from /xl/worksheets/sheet1.xml part
Interestingly enough, when I use the Open XML SDK to open the Excel file, it points out that the issue is in the color:
Below is the code:
import mysql.connector
import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Border, Side, Font, Alignment
import datetime
import os
from openpyxl.worksheet import Worksheet

mydb = mysql.connector.connect(
    host="10.10.10.10",
    user="user",
    passwd="password"
)
# Function to auto-size the columns
def auto_column_resize(worksheet):
    for col in worksheet.columns:
        max_length = 0
        column = col[0].column  # Get the column name
        for cell in col:
            try:  # Necessary to avoid error on empty cells
                if len(str(cell.value)) > max_length:
                    max_length = len(str(cell.value))
            except:
                pass
        adjusted_width = (max_length + 2) * 1.2
        if adjusted_width <= 95:
            worksheet.column_dimensions[column].width = adjusted_width
        else:
            worksheet.column_dimensions[column].width = 95
    return worksheet
# define the path
path = 'C:/Users/'
# define the path for the archive
path_arquivo = 'C:/Users/Arquivo/'
# define the archive size (maximum number of files kept)
arquivo_file_size = 26

# Open the current file and save it into the archive
current_wb = openpyxl.load_workbook(path + "Interface planning.xlsx")
current_ws = current_wb["Ports allocation"]
if int(datetime.datetime.now().isocalendar()[1]) > 9:
    current_wb.save("{0}/Interface planning_{1}{2}.xlsx".format(path_arquivo, str(int(datetime.datetime.now().isocalendar()[0])), str(int(datetime.datetime.now().isocalendar()[1]))))
else:
    # introduce a 0 in the week number so it can be used later to work out which file is the oldest
    current_wb.save("{0}/Interface planning_{1}0{2}.xlsx".format(path_arquivo, str(int(datetime.datetime.now().isocalendar()[0])), str(int(datetime.datetime.now().isocalendar()[1]))))
# Open the SQL cursor and run the query
mycursor = mydb.cursor()
mycursor.execute("SELECT hostname, hardware, port_label, ifHighSpeed, ifAdminStatus, ifOperStatus, ifAlias FROM `observium`.`ports` JOIN `observium`.`devices` ON `observium`.`devices`.device_id = `observium`.`ports`.device_id WHERE (port_label LIKE 'xe-%' or port_label LIKE 'et-%' or port_label LIKE 'ge-%' or port_label LIKE '%Ethernet%') and port_label NOT RLIKE '[.][1-9]' ORDER BY hostname, port_label;")
# Fetch the rows returned by the database as tuples
myresult = mycursor.fetchall()
header = mycursor.column_names

# Create the workbook
new_wb = Workbook()
# Create the worksheet
new_ws = new_wb.active
new_ws.title = "Ports allocation"

############################################## Put the SQL data into Excel ##########################################
# Add the header information and formatting
new_ws.append(header)
new_ws["H1"].value = "Person assigned"
for format_row in new_ws:
    for i in range(8):
        format_row[i].font = Font(bold=True)
# Add the content from SQL to Excel
for row in myresult:
    new_ws.append(row)
new_ws.auto_filter.ref = "A:H"
# Check the interface status and carry over the responsible person if the interface is administratively and operationally down
for current_ws_row in current_ws:
    if current_ws_row[7].value is not None:
        for new_ws_row in new_ws:
            if (new_ws_row[4].value != "up" or new_ws_row[5].value != "up") and current_ws_row[0].value == new_ws_row[0].value and current_ws_row[2].value == new_ws_row[2].value:
                new_ws_row[7].value = current_ws_row[7].value
                new_ws_row[6].value = current_ws_row[6].value
for format_row in new_ws:
    for i in range(8):
        format_row[i].border = Border(right=Side(style='thin'),)

# Tidy up the worksheet
new_ws = auto_column_resize(new_ws)
new_ws.sheet_view.zoomScale = 85
c = new_ws['D2']
new_ws.freeze_panes = c
wrap_alignment = Alignment(wrap_text=True)
for row in new_ws.iter_rows():
    for cell in row:
        cell.alignment = Alignment(shrink_to_fit=True)
# Save the workbook
new_wb.save(path + "Interface planning.xlsx")

# Remove the oldest file from the archive
count_files = 0
# sentinel: a year+week value far in the future
file_to_delete = '299952'
for directory in os.walk(path_arquivo):
    for file in directory[2]:
        count_files = count_files + 1
        if str(file)[-11:-5] < file_to_delete:
            file_to_delete = str(file)[-11:-5]
if count_files > arquivo_file_size:
    os.remove(path_arquivo + 'Interface planning_' + file_to_delete + '.xlsx')
I found the issue: in one of the cells there was an "=" symbol, and somehow Excel identified it as a formula. To solve the issue I just cleaned up the "=":
import re

try:
    new_ws_row[7].value = re.sub("=", "", new_ws_row[7].value)
except:
    pass
Extending on the awesome answer provided by Pedro on 12th December 2018:
I found the snippet below to be an efficient way to deal with problem columns containing values that start with an equals sign when using openpyxl.
import re

for cell in ws['R']:
    if "=" in str(cell.value):
        cell.value = re.sub('^=', ' =', cell.value)
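For reference, a hedged generalization of the same idea, assuming ws is the openpyxl worksheet from the snippet above: it prefixes a space to every string cell that starts with "=", across all columns rather than just column R.
import re

# Neutralize would-be formulas in every column of the sheet
for row in ws.iter_rows():
    for cell in row:
        if isinstance(cell.value, str) and cell.value.startswith("="):
            cell.value = re.sub(r"^=", " =", cell.value)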