Sort .txt list by element - Python

I have an issue with a .txt list.
The list contains the following:
Numero de permutaciones de la forma (2,4,3) = 1260
Numero de permutaciones de la forma (7,2,0) = 36
Numero de permutaciones de la forma (5,3,1) = 504
Numero de permutaciones de la forma (4,5,0) = 126
Numero de permutaciones de la forma (1,8,0) = 9
Numero de permutaciones de la forma (0,7,2) = 36
Numero de permutaciones de la forma (0,6,3) = 84
...
My code is this:
with open('resultado_lista_original2.txt', 'r') as r:
    for line in sorted(r):
        print(line, end='')
But I need to sort the list by the number to the right of the "=" to get this order:
Numero de permutaciones de la forma (1,8,0) = 9
Numero de permutaciones de la forma (0,7,2) = 36
Numero de permutaciones de la forma (7,2,0) = 36
Numero de permutaciones de la forma (0,6,3) = 84
Numero de permutaciones de la forma (4,5,0) = 126
Numero de permutaciones de la forma (5,3,1) = 504
Numero de permutaciones de la forma (2,4,3) = 1260
...
I would really appreciate any help or guidance.

Pass in the key argument to decide how to sort your iterable. Most people do this using a lambda function:
for line in sorted(r, key=lambda x: int(x.split('=')[1])):
Or if you prefer to define the function yourself:
def sort_my_txt_lines(line):
    digits_after_equal_sign = line.split('=')[1]
    return int(digits_after_equal_sign)

# ...
for line in sorted(r, key=sort_my_txt_lines):
If you're not used to thinking about sorting with a key function, think of it like this: when sorting the input sequence, only look at the numerical part after the = sign (the function's return value) of each line (the function's input parameter).
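A self-contained sketch of the key-function approach, using the sample lines from the question (io.StringIO stands in for the real file, and sort_key is a hypothetical name). It assumes every line ends in "= <integer>". It also uses the whole line as a secondary key: Python's sort is stable, so with the number alone the two "= 36" lines would keep their file order ((7,2,0) first), whereas the desired output lists (0,7,2) first.

```python
import io

# The question's sample lines; io.StringIO stands in for
# open('resultado_lista_original2.txt') so the sketch runs on its own.
sample = io.StringIO(
    "Numero de permutaciones de la forma (2,4,3) = 1260\n"
    "Numero de permutaciones de la forma (7,2,0) = 36\n"
    "Numero de permutaciones de la forma (5,3,1) = 504\n"
    "Numero de permutaciones de la forma (4,5,0) = 126\n"
    "Numero de permutaciones de la forma (1,8,0) = 9\n"
    "Numero de permutaciones de la forma (0,7,2) = 36\n"
    "Numero de permutaciones de la forma (0,6,3) = 84\n"
)

def sort_key(line):
    # Primary key: the integer after the last "=" (int() ignores the
    # surrounding spaces and the trailing newline).
    # Secondary key: the whole line, so equal numbers are ordered by text.
    number = int(line.rsplit("=", 1)[1])
    return number, line

sorted_lines = sorted(sample, key=sort_key)
for line in sorted_lines:
    print(line, end="")
```

If the file may contain blank lines or lines without "=", they would need to be filtered out first, since int() raises ValueError on them.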

The sorted function has a key parameter, where you can pass a lambda or a function that will be called with each item of the iterable you want to sort.
The results of those calls are used to sort the list instead of the items themselves.

Late answer, but you can also use:
import re

with open("input.txt") as f:
    s = sorted(f, key=lambda x: int(re.search(r"(\d+)$", x).group(1)))

Try this:
for line in sorted(r, key=lambda x: int(x.split(' = ')[1].strip())):

Related

Efficient way to sentence-tokenize and clean text

I have a dataframe consisting of two columns, one with dates and the other with a string of text. I'd like to split the text into sentences and then apply some preprocessing.
Here is a simplified example of what I have:
import nltk
import pandas as pd
from nltk.corpus import stopwords
from pandarallel import pandarallel
pandarallel.initialize()
example_df=pd.DataFrame({'date':['2022-09-01'],'text':'Europa tiene un plan. Son cosas distintas. Perdón, esta es imagen dentro, y el recorte a los beneficios los ingresos totales conocimiento del uso fraudulento Además, el riesgo ha bajado. de gases nocivos, como el CO2. -La justicia europea ha confirmado se ha dado acceso al público por lo menos, intentar entrar. para reducir los contagios, vestido de chaqué. Han tenido que establecer de despido según informa que podría pasar desapercibida El Tribunal Supremo confirma en nuestra página web'})
spanish_tokenizer = nltk.data.load('tokenizers/punkt/PY3/spanish.pickle')
example_df['sentence']=example_df['text'].parallel_apply(lambda x: spanish_tokenizer.tokenize(x))
As you can see, I apply the nltk tokenizer to the raw text to create a new column, "sentence", that contains the list of sentences.
print(example_df['sentence'])
0 [Europa tiene un plan, Son cosas distintas, Perdón, esta es imagen dentro, y el recorte a los beneficios los ingresos totales conocimiento del uso fraudulento Además, el riesgo ha bajado, de gases nocivos, como el CO2, -La justicia europea ha confirmado se ha dado acceso al público por lo menos, intentar entrar, para reducir los contagios, vestido de chaqué, Han tenido que establecer de despido según informa que podría pasar desapercibida El Tribunal Supremo confirma en nuestra página web]
1 [casi todas las restricciones, Socios como Esquerra le echan un servicio público; con terrazas llenas Los voluntarios piden a todos los cuatros juntos en una semana la sentencia de cárcel para Griñán que Griñán no conoció la trama, de las hipotecas, A las afueras de Westminster]
Name: sentence, dtype: object
# Since commas might be misleading:
example_df.sentence[1]
['casi todas las restricciones',
'Socios como Esquerra le echan un servicio público; con terrazas llenas Los voluntarios piden a todos los cuatros juntos en una semana la sentencia de cárcel para Griñán que Griñán no conoció la trama, de las hipotecas',
'A las afueras de Westminster']
My next goal is to clean those sentences. Since the tokenizer needs punctuation to work, I believe I have to do the cleaning afterwards, which implies looping over every sentence for each date. First of all, I am not sure how to do this with the pandas structure; here is one of my attempts to remove stopwords:
from nltk.corpus import stopwords
stop = stopwords.words('spanish')
example_df['sentence'] = example_df['sentence'].parallel_apply(lambda x: ' '.join(
[word for word in i.split() for i in x if word not in (stop)]))
This produces the following error: AttributeError: 'int' object has no attribute 'split'
Is there a more efficient/elegant way to do this?
Since the sentence column is tokenized text (a list of strings), the list comprehension logic needs to change.
For example:
sentences = ['casi todas las restricciones', 'Socios como Esquerra le echan un servicio público; con terrazas llenas Los voluntarios piden a todos los cuatros juntos en una semana la sentencia de cárcel para Griñán que Griñán no conoció la trama, de las hipotecas', 'A las afueras de Westminster']
stopwords_removed = [word for sent in sentences for word in sent.split() if word not in stop]
Here sent is each sentence inside the list and word is each individual word obtained by splitting on whitespace. The for clauses must appear in the same order as nested for loops, so the loop over sentences comes first. In the question's version, i is used before the clause that defines it, so Python looks it up outside the comprehension (apparently an int left over from earlier code), which explains the AttributeError.
Your error is most likely caused by a missing axis parameter:
df.Column.parallel_apply(func, axis=1)
where func is a function that returns your list comprehension result
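A minimal sketch of cleaning each row's list of sentences while keeping one string per sentence. It uses plain pandas apply so it runs without pandarallel (parallel_apply is intended as a drop-in replacement once pandarallel.initialize() has been called), and the small stop set is a hypothetical stand-in for stopwords.words('spanish'). Comparing word.lower() is an added assumption, since NLTK's stopword lists are lowercase.

```python
import pandas as pd

# Hypothetical stand-in for stopwords.words('spanish'), so the sketch
# runs without downloading NLTK data.
stop = {"a", "con", "de", "la", "las", "los", "no", "que", "un"}

def remove_stopwords(sentences):
    # One output string per input sentence. The for clauses are in the
    # same order as nested loops: sentences first, then words.
    return [
        " ".join(word for word in sent.split() if word.lower() not in stop)
        for sent in sentences
    ]

example_df = pd.DataFrame({
    "date": ["2022-09-01"],
    "sentence": [["casi todas las restricciones", "A las afueras de Westminster"]],
})
example_df["clean"] = example_df["sentence"].apply(remove_stopwords)
print(example_df["clean"].iloc[0])
```

Keeping the result as a list of sentences (rather than one flat list of words) preserves the sentence boundaries that the tokenizer produced.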

Python .split() having trouble with txt line

I am trying to split lines in a txt file but am having the following problem:
with open("KARDEX.txt", 'r', encoding="latin-1") as file:
    data = []
    for line in file:
        data.append(line)
print(data[0])
>>ÿþ# Número de artículo Descripción del artículo Clase de operación Código de deudor/acreedor Nombre de deudor/acreedor
print(data[0].split(" "))
>> ['ÿþ#\x00', '\x00N\x00ú\x00m\x00e\x00r\x00o\x00 \x00d\x00e\x00 \x00a\x00r\x00t\x00í\x00c\x00u\x00l\x00o\x00', '\x00D\x00e\x00s\x00c\x00r\x00i\x00p\x00c\x00i\x00ó\x00n\x00 \x00d\x00e\x00l\x00 \x00a\x00r\x00t\x00í\x00c\x00u\x00l\x00o\x00', '\x00C\x00l\x00a\x00s\x00e\x00 \x00d\x00e\x00 \x00o\x00p\x00e\x00r\x00a\x00c\x00i\x00ó\x00n\x00']
If you want to split on 2 or more spaces, use:
import re
re.split(r" {2,}", st)
For example, with st set to the header line (columns separated by at least two spaces):
st = "ÿþ#  Número de artículo  Descripción del artículo  Clase de operación  Código de deudor/acreedor  Nombre de deudor/acreedor"
re.split(r" {2,}", st)
Output:
['ÿþ#',
'Número de artículo',
'Descripción del artículo',
'Clase de operación',
'Código de deudor/acreedor',
'Nombre de deudor/acreedor']
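The leading "ÿþ" (bytes FF FE decoded as Latin-1) and the "\x00" between characters in the question's output are what a UTF-16 little-endian file with a byte-order mark looks like when read as Latin-1, so it seems likely the file is really UTF-16. The sketch below assumes that; the two-line sample text is made up for illustration, and io.TextIOWrapper over io.BytesIO stands in for open("KARDEX.txt", encoding="utf-16").

```python
import io
import re

# Hypothetical sample, encoded the way the file appears to be:
# a UTF-16 little-endian BOM (FF FE) followed by UTF-16-LE text.
text = "#  Número de artículo  Descripción del artículo\n1  A-001  Item de ejemplo\n"
raw = b"\xff\xfe" + text.encode("utf-16-le")

# Decoding as Latin-1 reproduces the symptom: "ÿþ" in front, "\x00" everywhere.
print(repr(raw.decode("latin-1")[:8]))

# Decoding as UTF-16 consumes the BOM and yields clean text, which can then
# be split on runs of two or more spaces.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16") as file:
    rows = [re.split(r" {2,}", line.rstrip("\n")) for line in file]

print(rows)
```

If the columns in the real file are separated by tabs rather than spaces, splitting on "\t" would be the simpler choice once the encoding is right.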

How can I get the exact result of 10**20 + 10**-20 in Python? It gives me 1e+20

I am writing code to solve second-degree (quadratic) equations, and it works well.
However, when I input the following equation:
x^2 + (10^(20) + 10^(-20))x + 1 = 0
(Yes, my input is 10**20 + 10**(-20).)
I get:
x1 = 0 x2 = -1e+20
However, it is evaluating (10^(20) + 10^(-20)) as 1e+20, while if you do the math:
$$10^{20} + 10^{-20} = 100000000000000000000.00000000000000000001$$
which is almost 10^20 but not exactly 10^20.
How can I get the exact result of that operation so I can get the exact value of the equation in x2?
My code is the following:
#=============================== Function to get the coefficients ===============================
# Two functions are nested: first the coefficient one, then the one that solves the equation.
# A recursive function is defined to get the coefficients of the user's equation
def cof():
    # Ask whether the coefficient is a number or a string of operations
    op = input("If your coefficient is a number enter 1, if it is a string of operations enter 2: ")
    # Compare the user's input with the options
    if op == "1":
        # Ask for the number
        num = input("What is your number? ")
        # Check that it really is a number
        try:
            # If the input can be converted to float, that value is the coefficient
            return float(num)
        # If it could not be converted to float...
        except ValueError:
            # ...tell the user about the error
            print("You did not enter a number. Try again.")
            # ...and call the function again
            return cof()
    # If the coefficient is a string of operations (as in 10**20 + 10**-20)
    elif op == "2":
        # Read the user's expression
        entrada = input("Input: ")
        # Try to evaluate it; if that works, its value is the coefficient
        try:
            return eval(entrada)
        # If it could not be evaluated...
        except Exception:
            # ...tell the user
            print("You did not enter a valid string of operations. Try again.")
            # ...and call the function again
            return cof()
    # If neither 1 nor 2 was entered, tell the user
    else:
        print("You did not enter 1 or 2. Try again.")
        # Call the function again
        return cof()

#=============================== Function to solve the equation ===============================
# Solves the equation
def sol_cuadratica():
    # Ask for a and store it
    print("Enter the coefficient for a")
    a = cof()
    # Ask for b and store it
    print("Enter the coefficient for b")
    b = cof()
    # Ask for c and store it
    print("Enter the coefficient for c")
    c = cof()
    # Show the user the equation to be solved
    print("Let's find the roots of the equation {}x² + {}x + {} = 0".format(a, b, c))
    # Compute the discriminant
    discriminante = b**2 - 4*a*c
    # If the discriminant is negative, the roots are complex
    if discriminante < 0:
        print("The roots are imaginary. Try other coefficients.")
        # Call the function again
        return sol_cuadratica()
    # If the discriminant is 0 or positive, solve
    else:
        # Formula for x1
        x1 = (-b + discriminante**(1/2))/(2*a)
        # Formula for x2
        x2 = (-b - discriminante**(1/2))/(2*a)
        # Print the results
        print("X1 = " + str(x1))
        print("X2 = " + str(x2))

sol_cuadratica()
Ignore the comments, I'm from a Spanish-speaking country.
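Because the machine floating-point type has limited precision, when you add a very small number to a very big number, the small number is simply lost.
This phenomenon is called absorption; in the quadratic formula it is then followed by cancellation in -b + sqrt(discriminant), which is why x1 comes out as exactly 0.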
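A short illustration of the absorption described above with native floats, using the coefficients from the question (a = c = 1). math.ulp, which gives the gap to the next representable float, is only available from Python 3.9.

```python
import math

big, small = 1e20, 1e-20

# The gap between 1e20 and the next float is far larger than 1e-20,
# so the sum rounds straight back to 1e20.
print(big + small == big)
print(math.ulp(big))  # spacing of floats near 1e20

# The same absorption inside the quadratic formula from the question:
b = 1e20 + 1e-20      # already equal to 1e20
disc = b**2 - 4       # the "- 4" is absorbed as well
print(disc == b**2)
print(-b + math.sqrt(disc))  # cancellation: this is the numerator of x1
```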
With software floating-point objects (like the ones in the decimal module) you can achieve any precision. Computations are slower, because the arithmetic is emulated in software instead of relying on the machine's FPU.
From the decimal module docs:
Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem
You can raise it by changing the global context parameter decimal.getcontext().prec:
import decimal
decimal.getcontext().prec = 41 # min value for the required range
d = decimal.Decimal(10**20)
d2 = decimal.Decimal(10**-20)
Now:
>>> d+d2
Decimal('100000000000000000000.00000000000000000001')
As suggested in the comments, for the small number it is safer to let the decimal module compute the power itself, by applying the power operator to an existing Decimal object, because 10**-20 is a binary float that is not exactly 10^-20 (here the result happens to be the same after rounding):
d2 = decimal.Decimal(10)**-20
So you can use decimal.Decimal objects for your computations instead of native floats.
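A sketch of solving the question's equation with Decimal objects. Two choices here go beyond the answer and are assumptions of this sketch: decimal.localcontext is used so the higher precision does not leak into the rest of the program, and the roots are computed with the numerically stable form q = -(b + sign(b)·sqrt(disc))/2, x1 = q/a, x2 = c/q, which avoids the cancellation in -b + sqrt(disc). The function name solve_quadratic is hypothetical, a must be nonzero, and coefficients should be passed as int, str or Decimal (not float) to stay exact.

```python
from decimal import Decimal, localcontext

def solve_quadratic(a, b, c, prec=100):
    """Return the two real roots of a*x**2 + b*x + c = 0 as Decimals (a != 0)."""
    with localcontext() as ctx:
        ctx.prec = prec  # significant digits, only inside this block
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the roots are complex")
        root = disc.sqrt()  # correctly rounded at the local precision
        # Add root to b with the same sign, so nothing nearly cancels.
        q = -(b + root) / 2 if b >= 0 else -(b - root) / 2
        if q == 0:  # only when b == 0 and c == 0: both roots are 0
            return Decimal(0), Decimal(0)
        return q / a, c / q

# Build b = 10**20 + 10**-20 with enough digits (the default 28 would round it).
with localcontext() as ctx:
    ctx.prec = 50
    b = Decimal(10) ** 20 + Decimal(10) ** -20

x1, x2 = solve_quadratic(1, b, 1)
print(x1, x2)
```

Since x² + (10^20 + 10^-20)x + 1 = (x + 10^20)(x + 10^-20), the exact roots are -10^20 and -10^-20, which is what this version recovers.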

Find an incorrect line in a .txt file and delete it

I have a .txt file with CSV data like this:
1529074866;29.89;29.41;321;70;1;60;1003.05
1529074868;29.87;29.82;140;79;1;60;1003.52
I made this function to extract the data from the file:
def __init__(self, file):
    self.data_ = {"Temps": [], "Temperature": [], "Humidite": [], "Luminosite": [], "Niveau sonore": [], "Radiations EM": [], "Rythme cardiaque": [], "Pression": []}
    # data_ = {date: [t1, t2, ...], temp: [temp1, temp2, ...], ...}. Organizing the data by quantity (date, luminosity...) makes it easier to display the graphs later.
    try:
        for line in file:  # Go through the lines of the file one by one.
            line = line.rstrip()  # Remove the invisible end-of-line character.
            line = line.rsplit(";")  # Turn the line into a list by splitting on ";".
            self.data_["Temps"].append(int(line[0]))  # Assumes the time is the 1st field (index 0) on the line.
            self.data_["Temperature"].append(float(line[1]))  # Assumes the temperature is the 2nd field (index 1) on the line.
            self.data_["Humidite"].append(float(line[2]))  # etc.
            self.data_["Luminosite"].append(float(line[3]))
            self.data_["Niveau sonore"].append(float(line[4]))
            self.data_["Radiations EM"].append(float(line[5]))
            self.data_["Rythme cardiaque"].append(float(line[6]))
            self.data_["Pression"].append(float(line[7]))
    except Exception as expt:  # Runs if the try block above fails.
        print("\n!!! Import failed - exception raised !!!")
        print(expt)
I would like to create a function which extracts the first parameter of the line, which is the Unix time, and, if the time is not between [1514761200; 1546297200], deletes the line.
How can I do this?
To delete a line from a file, you actually have to read the whole file and then write the filtered content back out.
One approach:
data_typo = "1529074866;29.89;29.41;321;70;1;60;1003.05\n"  # example of a line in the file
with open("file.txt", "r") as f:
    lines = f.readlines()  # Extract all the lines
    data = [line.rstrip().split(";") for line in lines]
# data elements: ['1529074866', '29.89', '29.41', '321', '70', '1', '60', '1003.05']
At that point, a simple approach is to filter the list data.
def criterion(elt):
    # Keep the line only if its Unix time (first field) is inside the range.
    return 1514761200 <= int(elt[0]) <= 1546297200
data_to_rewrite = list(filter(criterion, data)) # Keeps the elements where criterion returns True.
with open("new_file.txt", "w") as f:
    for elt in data_to_rewrite:
        line = ";".join(elt) + "\n"
        f.write(line)
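A variant that filters in a single pass and replaces the original file instead of writing new_file.txt. It writes to a temporary file in the same directory and then swaps it in with os.replace, so a crash halfway does not leave a half-written original. Dropping lines whose first field is not an integer is an assumption of this sketch, and keep_line and filter_file are hypothetical names; the bounds are the inclusive range from the question.

```python
import os
import tempfile

LOW, HIGH = 1514761200, 1546297200  # inclusive bounds from the question

def keep_line(line):
    """Return True if the first ';'-separated field is a Unix time in range."""
    try:
        timestamp = int(line.split(";", 1)[0])
    except ValueError:
        return False  # assumption: malformed or empty lines are dropped too
    return LOW <= timestamp <= HIGH

def filter_file(path):
    """Rewrite `path` in place, keeping only the lines accepted by keep_line."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            if keep_line(line):
                dst.write(line)
    # Both files are closed here; replace the original with the filtered copy.
    os.replace(dst.name, path)
```

Usage would be a single call such as filter_file("file.txt").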

Global name is not defined executing a function

I'm coding a project in which I have two files (structureDonnees.py and calculUser.py) working together, plus a test file.
In structureDonnees.py I have this function, which reads a dataset of cars and builds data structures:
# -*-encoding:utf-8-*-
import csv
import sys  # to use the max and min of the float type
from calculUser import *
from trajetUser import *

def recupVoiture():
    # name of the database file
    nomFichier = 'CO2_passenger_cars_v10.csv'
    # open the file for reading
    opener = open(nomFichier, "r")
    # read the file nomFichier as tab-separated CSV
    lectureFichier = csv.reader(opener, delimiter='\t')
    # dict holding the fuel types
    fuelType = dict()
    # dict holding the cars
    voiture = dict()
    # dict holding the CO2 emissions in g/km
    emission = dict()
    # minimum and maximum emission
    min_emission = sys.float_info.max  # initialized to max(float) so every emission is sure to be smaller
    max_emission = sys.float_info.min  # initialized to min(float) so every emission is sure to be larger
    for row in lectureFichier:
        # If the row is not empty
        if row:
            # build the voiture dictionary
            if voiture.has_key(row[10]):
                if row[11].upper() not in voiture[row[10]]:
                    voiture[row[10]].append("%s" % row[11].upper())  # add the model
            else:
                voiture[row[10]] = []  # create an empty list for the models and their versions
                voiture[row[10]].append("%s" % row[11])  # add the model and its version
            # build the fuelType dictionary
            if fuelType.has_key(row[10]):
                fuelType[row[10]].append(row[19].upper())  # add the fuel type used by the matching car in voiture{}
            else:
                fuelType[row[10]] = []  # create an empty list for the fuel types
                fuelType[row[10]].append(row[19])  # add the fuel type used by the matching car in voiture{}
            # build the emission dictionary
            if emission.has_key(row[10]):
                emission[row[10]].append(row[14])  # add the amount of CO2 emitted by the matching car in voiture{}
                min_emission = minEmission(float(row[14]), min_emission)
                max_emission = maxEmission(float(row[14]), max_emission)
            else:
                emission[row[10]] = []  # create an empty list for the CO2 emissions
                emission[row[10]].append(row[14])  # add the amount of CO2 emitted by the matching car in voiture{}
                min_emission = minEmission(float(row[14]), min_emission)
                max_emission = maxEmission(float(row[14]), max_emission)
    # close the file
    opener.close()
    # The return value is a list holding the generated data structures.
    res = [voiture, fuelType, emission, min_emission, max_emission]
    return res
In calculUser.py, I defined the minEmission and maxEmission functions:
def minEmission(emissionFichier, min_emission):
    if emissionFichier < min_emission:
        min_emission = emissionFichier
    return min_emission

def maxEmission(emissionFichier, max_emission):
    if emissionFichier > max_emission:
        max_emission = emissionFichier
    return max_emission
When I run test.py, I get an error on this line:
tableau = recupVoiture()
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    tableau = recupVoiture()
  File "/home/user/Polytech/ge3/ProjetPython/structureDonnees.py", line 60, in recupVoiture
    min_emission = minEmission(float(row[14]), min_emission)
NameError: global name 'minEmission' is not defined
I don't understand why I get this error. When I run everything except test.py I get no error, but when I run test.py it fails because minEmission and maxEmission are not defined.
Is it because I'm calling a function when I'm defining a function?
How could I fix it?
I fixed the problem. It seems my functions minEmission() and maxEmission() couldn't reference max_emission and min_emission, since those variables are declared in structureDonnees.py and not in calculUser.py.
I fixed it by creating an intermediate variable that takes the value of min_emission and max_emission and is returned instead of min_emission and max_emission.
In addition, I had to do from calculUser import minEmission, maxEmission directly inside the recupVoiture() function. I know that's awful, but it solved the problem. I'll use it until I find a better, cleaner solution.
Thanks for the help, everyone. I'll write a better post/code if I need to ask for help again! :)
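A NameError that disappears when the import is moved inside the function is the typical symptom of a circular import (for example calculUser.py or trajetUser.py importing structureDonnees while structureDonnees is still being loaded). That is an assumption here, since the imports of those files are not shown. One cleaner option, sketched below, removes the dependency altogether: the built-in min() and max() replace minEmission/maxEmission, and defaultdict replaces the has_key checks. It is written for Python 3, recup_voiture is a hypothetical name, and the column positions 10, 11, 14 and 19 are taken from the question's code.

```python
import csv
import math
from collections import defaultdict

# Column positions taken from the question's code (assumed CSV layout).
MAKE, MODEL, CO2, FUEL = 10, 11, 14, 19

def recup_voiture(lines):
    """Build [voiture, fuelType, emission, min_emission, max_emission].

    `lines` is any iterable of tab-separated text lines (an open file works).
    """
    voiture = defaultdict(list)
    fuel_type = defaultdict(list)
    emission = defaultdict(list)
    min_emission, max_emission = math.inf, -math.inf

    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip empty lines
            continue
        make, model = row[MAKE], row[MODEL].upper()
        if model not in voiture[make]:
            voiture[make].append(model)
        fuel_type[make].append(row[FUEL].upper())
        value = float(row[CO2])
        emission[make].append(value)
        # Built-ins instead of helpers imported from another module.
        min_emission = min(min_emission, value)
        max_emission = max(max_emission, value)

    return [dict(voiture), dict(fuel_type), dict(emission), min_emission, max_emission]
```

With the real data this would be called as, for example, with open('CO2_passenger_cars_v10.csv') as f: res = recup_voiture(f).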
