df = df.loc yields an empty DataFrame in a loop [Python] [Pandas]

Sorry if this is a dumb question; I'm very new to coding.
I'm trying to iterate through a folder and create a DataFrame for each file (and then save it).
The thing is, I changed nothing from when the code was outside a for loop (for testing), where it worked perfectly fine, but inside the for loop it yields an empty DataFrame, so I can't do anything with it.
The important (failing) part of the code is as follows:
filePath=(insert pathFile here)
def CDPBot():
    fileItineration = filePath + '/' + files
    file = pd.read_excel(fileItineration)
    # Grab the company name for later
    companyName= str(file['Final Account Name'][2])
    #companyName=file.loc[3:2]
    #companyName=companyName.columns.values[2]
    #DMU
    faID=os.path.basename(files)
    faID=faID[-0:-5]
    # Build the DF using the given filters
    f_file = file.loc[(f_file['Segment'] == ('SMB')) & (f_file['Final Account ID'] == faID) | (f_file['Segment'] == ('Commercial')) & (f_file['Final Account ID'] == faID)]
    if file.empty:
        print('file is empty!')
    elif f_file.empty:
        print('f_file is empty')
    ...
    ...
    ...
for files in os.listdir(filePath):
    CDPBot()
The issue is that f_file is an empty DataFrame no matter what I try. I don't know why it yields that result; any ideas? The rest of the code (which I left out because it's irrelevant) works just fine.
Thanks in advance!
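One possible cause, offered only as a guess: faID is a string cut from the file name, while pandas may have read the 'Final Account ID' column as integers, in which case the equality test never matches and the filter returns nothing. The sketch below works under that assumption; the sample data, the file name "1001.xlsx" and the helper filter_account are hypothetical and not part of the original code.

```python
import os

import pandas as pd


def filter_account(frame: pd.DataFrame, file_name: str) -> pd.DataFrame:
    """Keep SMB/Commercial rows whose account ID matches the file name stem."""
    # splitext drops the extension whatever its length (.xls or .xlsx).
    fa_id = os.path.splitext(os.path.basename(file_name))[0]
    # Compare as stripped strings so an integer column can still match a string ID.
    ids = frame["Final Account ID"].astype(str).str.strip()
    # isin replaces the two separate Segment comparisons joined with "|".
    segment_ok = frame["Segment"].isin(["SMB", "Commercial"])
    return frame.loc[segment_ok & (ids == fa_id)]


# Hypothetical data standing in for one Excel file.
sample = pd.DataFrame({
    "Segment": ["SMB", "Commercial", "Enterprise", "SMB"],
    "Final Account ID": [1001, 1001, 1001, 2002],
})
matches = filter_account(sample, "1001.xlsx")
print(matches)
```

Passing the frame and the file name as arguments, rather than reading the loop variable files from inside the function, also makes it easier to test one file at a time.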

Related

Verifying the existence of a row from one sheet in another one using Python

Hello everyone, I'd like to ask whether anyone has an idea how I can check if a row from one sheet exists in another one, and highlight the row if it does not. I ran into problems verifying line by line. I tried this code:
old = old.set_index('id')
new = new.set_index('id')
resultTest = pd.concat([old,new],sort=False)
result = resultTest.stack().groupby(level=[0,1]).unique().unstack(1).copy()
result.loc[~result.index.isin(new.index),'status'] = 'deleted' # is not new old
result.loc[~result.index.isin(old.index),'status'] = 'added' # is not old new
idx = resultTest.stack().groupby(level=[0,1]).nunique() # cell changed
result.loc[idx.mask(idx <= 1).dropna().index.get_level_values(0),'status'] = 'modified'
result['status'] = result['status'].fillna('same')
result[result["status"] == 'deleted'].style.apply(highlight_max)
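A simpler way to test whether each row of one sheet exists in the other is a left merge with indicator=True. This is only a sketch: the two small frames and the helper highlight_missing are made up for illustration, and it assumes both sheets share the same column names and that old has a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets.
old = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
new = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "x", "d"]})

# Left-merge on every shared column; "_merge" records where each row was found.
checked = old.merge(new.drop_duplicates(), how="left", indicator=True)
missing_in_new = checked["_merge"] == "left_only"


def highlight_missing(row):
    # One CSS string per cell: yellow for rows of `old` that are absent from `new`.
    color = "background-color: yellow" if missing_in_new[row.name] else ""
    return [color] * len(row)


print(old[missing_in_new.to_numpy()])
# Rendering the highlight needs jinja2 installed:
# styled = old.style.apply(highlight_missing, axis=1)
```

Here a row counts as existing only if every column matches, so the row with id 3 is reported as missing because its name changed.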

Django : invalid date format with Pandas - \xa0

I would like to create objects from a CSV file with Django and Pandas.
Everything is fine for FloatFields and CharFields but when I want to add DateFields, Django returns this error: ['The date format of the value "\xa02015/08/03\xa0" is not valid. The correct format is YYYY-MM-DD.']
However, the CSV file contains this kind of value in the columns concerned: '2015/08/03'. There is no space in the data, contrary to what Django seems to suggest...
Here is what I tried in views:
class HomeView(LoginRequiredMixin, View):
    def get(self, request,*args, **kwargs):
        user = User.objects.get(id=request.user.id)
        Dossier.objects.filter(user=user).delete()
        csv_file = user.profile.user_data
        df = pd.read_csv(csv_file, encoding = "UTF-8", delimiter=';', decimal=',')
        df = df.round(2)
        row_iter = df.iterrows()
        objs = [
            Dossier(
                user = user,
                numero_op = row['N° Dossier'],
                porteur = row['Bénéficiaire'],
                libélé = row['Libellé du dossier'],
                descriptif = row["Résumé de l'opération"],
                AAP = row["Référence de l'appel à projet"],
                date_dépôt = row["Date Dépôt"],
                date_réception = row["Accusé de réception"],
                montant_CT = row['Coût total en cours'],
            )
            for index, row in row_iter
        ]
        Dossier.objects.bulk_create(objs)
If I change my Model to CharField, I no longer get an error.
I tried to use the str.strip() function:
df["Date Dépôt"]=df["Date Dépôt"].str.strip()
But without success.
Could someone help me? I could keep the CharField format but it limits the processing of the data I want to propose next.

It seems that you have some garbage in that file, in particular your date is surrounded by a byte "\xa0" on either side.
In some encodings this byte denotes a "non breaking space", which may be why you're not seeing it.
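Even once the "\xa0" characters are gone, the value still uses slashes, while the error message asks for YYYY-MM-DD. One way to handle both, sketched here on made-up sample data, is to remove the non-breaking spaces and then parse the column into real date objects, which a DateField accepts. The column name comes from the question; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the values reported in the error message.
df = pd.DataFrame({"Date Dépôt": ["\xa02015/08/03\xa0", "2016/01/15"]})

cleaned = (
    df["Date Dépôt"]
    .str.replace("\xa0", "", regex=False)  # drop non-breaking spaces anywhere
    .str.strip()                            # drop ordinary surrounding whitespace
)
# Parse the slash-separated text into datetime.date objects.
df["Date Dépôt"] = pd.to_datetime(cleaned, format="%Y/%m/%d").dt.date
print(df["Date Dépôt"].tolist())
```

The same two steps would need to be repeated for the other date column ("Accusé de réception") before building the objects.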

Why doesn't Python save my model to the database using object.save()?

I have a function in views.py that accepts requests containing some text and a book pk, saves the text to a fragments table, and updates the book text with the new fragment.
The fragments are saved correctly, but the book isn't. I get the response, but when I check the database manually the change has not been saved.
This is my code:
profilelogged = validtoken(request.META['HTTP_MYAUTH'])
if not profilelogged:
    return HttpResponse('Unauthorized', status=401)
else:
    index = request.GET.get('id', 0)
    petitiontext = request.GET.get('texto', '')
    petitiontext = petitiontext.strip()
    todaynoformat = datetime.now()
    bookselected = Books.objects.filter(pk=index).first()
    actualwait = Waits.objects.filter(book=bookselected).order_by('ordernum').first()
    if not actualwait:
        response = 'MAL: No hay nadie en espera'
    else:
        profilewaiting = actualwait.profile
        if profilewaiting.pk == profilelogged.pk and actualwait.writting == 1:
            newfragment = Fragments(profile=profilelogged, book=bookselected, date=todaynoformat.isoformat(), texto=petitiontext)
            newfragment.save()
            newtextfull = bookselected.text+" "+petitiontext
            bookselected.text = newtextfull
            bookselected.save()
            actualwait.writting = 2
            actualwait.save()
            response = bookselected.text
        else:
            response = 'MAL: No eres el siguiente en la lista o no estas activado para escribir'
    return HttpResponse(response)
Forget about the waiting part; it's a waitlist I use to check whether the user is allowed to submit fragments, and that works fine.
Any thoughts on why the book is not being saved to the DB? I use this object.save() method in other functions and it works, but here it doesn't.
Thanks.

Ok, my bad.
I was trying to update the same object in two different functions. The solution was to figure out how to update it in a single function.
Thanks anyway.

Trying to get the least common entries in a data frame

I'm using this code to get the 10 most common names in my dataframe:
df_faces_nombre = df_faces.groupby(["Nombre"])
top_names = df_faces_nombre.count().sort_values(by=['foto'], ascending=False).iloc[0:10]["foto"].index
for name in top_names:
    fotos = df_faces_nombre.get_group(name)["foto"]
    os.mkdir(TOP_DATA + name)
    for foto in fotos:
        if type(foto) == str and os.path.exists(DATA_DIR+foto):
            try:
                copyfile(DATA_DIR+foto, TOP_DATA + name + '/' + foto)
            except:
                print(foto)
Now I'm trying to write a new one for the 10 least common names. I thought of using something similar, like this:
low_names = df_faces_nombre.count().sort_values(by=['foto'], ascending=True).iloc[0:100]["foto"].index
for name in low_names:
    fotos = df_faces_nombre.get_group(name)["foto"]
    os.mkdir(LOWER_DATA + name)
    for foto in fotos:
        if type(foto) == str and os.path.exists(DATA_DIR+foto):
            try:
                copyfile(DATA_DIR+foto, LOWER_DATA + name + '/' + foto)
            except:
                print(foto)
The problem is that the second snippet only creates the folders and does not copy the pictures (sad face). I was hoping you could help me solve this. Thanks in advance.

Solved: there were a lot of names without pictures. The code works well.
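That explanation fits how count() behaves: it ignores missing values, so a name whose foto entries are all empty gets a count of 0 and sorts to the top of an ascending ranking. A minimal sketch of skipping such names, using made-up data with the same column names as the question:

```python
import pandas as pd

# Hypothetical data: "sol" has a row but no picture.
df_faces = pd.DataFrame({
    "Nombre": ["ana", "ana", "ana", "luis", "luis", "eva", "sol"],
    "foto": ["a1.jpg", "a2.jpg", "a3.jpg", "l1.jpg", "l2.jpg", "e1.jpg", None],
})

# count() skips missing values, so "sol" ends up with 0 pictures.
counts = df_faces.groupby("Nombre")["foto"].count()
# Keep only names with at least one picture, then take the two rarest.
low_names = counts[counts > 0].sort_values(kind="stable").index[:2]
print(list(low_names))
```

The resulting low_names can then drive the same folder-and-copy loop as before, without creating empty folders.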

Why doesn't Python want to access the second item of my 2-items list?

I'm trying to split an email address under the format jim.smith-royal#smth.edu or jim.smith#smth.edu into a name and a surname, so jim and smith, or jim and smith royal (with a space in between). (I am a beginner so I may not be doing it in the simplest way, but still).
for row in votants:
    mail = row[1]
    full_name = mail.split('#')[0]
    prenom = full_name.split('.')[0]
    #The code works until here, full_name.split gives me ['jim','smith'] and prenom gives me 'jim'
    pre_name = full_name.split('.')
    nom = pre_name[1]
    #The problem is until here, but I kept the rest of my code for anyone who might have the same objective as me
    try:
        nom = nom.split('-')[0] + " " + nom.split('-')[1]
    except Exception:
        pass
    row.append(prenom)
    row.append(nom)
Instead of giving me nom as 'smith', I get "IndexError: list index out of range".

I tried:
mail = "jim.smith#smth.edu"
full_name = mail.split('#')[0]
prenom = full_name.split('.')[0]
#The code works until here, full_name.split gives me ['jim','smith'] and prenom gives me 'jim'
pre_name = full_name.split('.')
nom = pre_name[1]
#The problem is until here, but I kept the rest of my code for anyone who might have the same objective as me
print(full_name)
print(pre_name)
print(nom)
print(prenom)
and it prints:
jim.smith
['jim', 'smith']
smith
jim
Without any exceptions.
Problem:
As the comments said: if there is no "." in the name (before the #), you are probably hitting the IndexError.
Possible Solutions:
Surround it in an if-else block:
if len(pre_name) >= 2:
    nom = pre_name[1]
or use try-except:
try:
    nom = pre_name[1]
except IndexError:
    nom = pre_name[0]
or use:
if "." in full_name:
    pre_name = full_name.split('.')
    nom = pre_name[1]
General Debugging:
Use print() earlier.
# works until here
pre_name = full_name.split('.')
print(pre_name)  # if no '.' was used, you will see the list contains only 1 element
nom = pre_name[1]  # problem here?
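The checks above can also be folded into one small helper. This is only a sketch: split_email is a hypothetical name, and unlike the original loop it returns an empty surname when there is no "." and replaces every hyphen in the surname with a space, not just the first.

```python
def split_email(mail: str, separator: str = "#") -> tuple[str, str]:
    """Return (first name, surname) from an address like jim.smith-royal#smth.edu."""
    local_part = mail.split(separator)[0]
    # partition never raises: with no "." the surname is simply an empty string.
    prenom, _, nom = local_part.partition(".")
    # Hyphenated surnames become space-separated words.
    return prenom, nom.replace("-", " ")


print(split_email("jim.smith-royal#smth.edu"))
print(split_email("jim#smth.edu"))
```

Inside the original loop this would be used as prenom, nom = split_email(row[1]) before the two row.append calls.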
