Add to dictionary in if loop - python

I have an if loop in which I am trying to:
(1) Create a dataframe from a filepath.
(2) Format this dataframe.
(3) Add that dataframe to a dictionary that is a property of an instance of a class.
Here is my code defining the class and the method:
class myClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist
    def formatData(self):
        i = 0
        self.dataframeDict = {}
        if i < (len(self.filepathlist) - 1):
            DFRAW = pd.read_csv(self.filepathlist[i], header = 9) #Row 9 is the row that is not blank (all blank auto-skipped)
            DFRAW['DateTime'], DFRAW['dummycol1'] = DFRAW[' ;W;W;W;W'].str.split(';', 1).str
            DFRAW['Col1'], DFRAW['dummycol2'] = DFRAW['dummycol1'].str.split(';', 1).str
            DFRAW['Col2'], DFRAW['dummycol3'] = DFRAW['dummycol2'].str.split(';', 1).str
            DFRAW['Col3'], DFRAW['Col4'] = DFRAW['dummycol3'].str.split(';', 1).str
            DFRAW = DFRAW.drop([' ;W;W;W;W', 'dummycol1', 'dummycol2', 'dummycol3'], axis = 1)
            dictIndex = self.filepathlist[i][39:44]
            self.dataframeDict.update({dictIndex: DFRAW})
            i = i + 1
Then I create an instance of the class and run the method:
filepathlist = ['filepath1','filepath2']
myINST = myClass('Mydataname', filepathlist)
myINST.formatData()
I then expect myINST.dataframeDict to have two dataframes as per the 2 input filepaths and thus 2 iterations of the if loop. However only 1 is present.
What is the error in my code or my approach?

It is hard to tell whether this will completely solve your problem, because no dummy data is provided. You will, however, get one step closer to your solution if you replace if i < (len(self.filepathlist) - 1): with while i < len(self.filepathlist): (drop the - 1 as well, otherwise the last file is never read).
You are currently just checking once whether i = 0 is smaller than len(self.filepathlist) - 1. If so, the if-block is executed exactly once. What you are actually looking for is a loop that keeps iterating as long as i is smaller than len(self.filepathlist). This is done with a while loop.

You need to change your condition to for i in range(len(self.filepathlist)):
(Also, remove the assignment of i as the for loop does it automatically. For the same reason, you should also remove the line which increments i).
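A minimal sketch of that for loop, written so it runs without pandas or real files. The `load` callable is a hypothetical stand-in for the pd.read_csv call plus the formatting steps from the question, and the full path is used as the dictionary key instead of the question's slice; both are assumptions made only for illustration.

```python
class MyClass:
    def __init__(self, name, filepathlist):
        self.name = name
        self.filepathlist = filepathlist

    def format_data(self, load):
        # `load` is a hypothetical stand-in for pd.read_csv plus the formatting.
        self.dataframe_dict = {}
        for path in self.filepathlist:
            # One dictionary entry per path; no manual counter is needed.
            self.dataframe_dict[path] = load(path)


inst = MyClass('Mydataname', ['filepath1', 'filepath2'])
inst.format_data(load=str.upper)
print(len(inst.dataframe_dict))  # 2
```

Iterating over the list directly avoids the index bookkeeping altogether, which is where the original off-by-one came from.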

If you want to use a while loop, change the if line to while i < len(self.filepathlist):.
Notice that there's no -1. This is because you're using < instead of <=. If you want to use -1, then you also need the <= as this will ensure the loop runs the correct number of times.
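To see the effect of the three possible conditions, here is a small sketch. The helper `visited` is hypothetical and only records which indices a while loop would reach for a given condition.

```python
paths = ['filepath1', 'filepath2']


def visited(keep_going):
    """Return the indices a while loop visits while keep_going(i) is true."""
    seen = []
    i = 0
    while keep_going(i):
        seen.append(i)
        i += 1
    return seen


print(visited(lambda i: i < len(paths)))       # [0, 1]
print(visited(lambda i: i <= len(paths) - 1))  # [0, 1]
print(visited(lambda i: i < len(paths) - 1))   # [0], the last path is skipped
```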

Related

Dataframe Is No Longer Accessible

I am trying to make my code look better and create functions that do all the work from a single call, but it is not working as intended. I am currently pulling data from a table in a PDF into a pandas dataframe. From there I have 4 functions, each calling the next and finally returning the updated dataframe. I can see that it is fully updated when I print it in the last function. However, I am unable to access and use that updated dataframe, even after I return it.
My code is as follows
def data_cleaner(dataFrame):
    #removing random rows
    removed = dataFrame.drop(columns=['Unnamed: 1','Unnamed: 2','Unnamed: 4','Unnamed: 5','Unnamed: 7','Unnamed: 9','Unnamed: 11','Unnamed: 13','Unnamed: 15','Unnamed: 17','Unnamed: 19'])
    #call next method
    col_combiner(removed)
def col_combiner(dataFrame):
    #Grabbing first and second row of table to combine
    first_row = dataFrame.iloc[0]
    second_row = dataFrame.iloc[1]
    #List to combine columns
    newColNames = []
    #Run through each row and combine them into one name
    for i,j in zip(first_row,second_row):
        #Check to see if they are not strings, if they are not convert it
        if not isinstance(i,str):
            i = str(i)
        if not isinstance(j,str):
            j = str(j)
        newString = ''
        #Check for double NAN case and change it to Expenses
        if i == 'nan' and j == 'nan':
            i = 'Expenses'
            newString = newString + i
        #Check for leading NAN and remove it
        elif i == 'nan':
            newString = newString + j
        else:
            newString = newString + i + ' ' + j
        newColNames.append(newString)
    #Now update the dataframes column names
    dataFrame.columns = newColNames
    #Remove the name rows since they are now the column names
    dataFrame = dataFrame.iloc[2:,:]
    #Going to clean the values in the DF
    clean_numbers(dataFrame)
def clean_numbers(dataFrame):
    #Fill NAN values with 0
    noNan = dataFrame.fillna(0)
    #Pull each column, clean the values, then put it back
    for i in range(noNan.shape[1]):
        colList = noNan.iloc[:,i].tolist()
        #calling to clean the column so that it is all ints
        col_checker(colList)
        noNan.iloc[:,i] = colList
    return noNan
def col_checker(col):
    #Going through, checking and cleaning
    for i in range(len(col)):
        #print(type(colList[i]))
        if isinstance(col[i],str):
            col[i] = col[i].replace(',','')
            if col[i].isdigit():
                #print('not here')
                col[i] = int(col[i])
            #If it is not a number then make it 0
            else:
                col[i] = 0
Then when I run this:
doesThisWork = data_cleaner(cleaner)
type(doesThisWork)
I get NoneType. I might be doing this the long way as I am new to this, so any advice is much appreciated!
The reason you are getting NoneType is that data_cleaner does not have a return statement, so when it finishes executing it automatically returns None. (col_combiner has the same problem: only clean_numbers returns anything, and its result is discarded by the callers.) It is the return value of a function that is assigned to a variable var in a statement like this:
var = fun(x)
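A small sketch of the same situation with plain lists, so it runs without pandas. The function names are hypothetical; `inner` plays the role of clean_numbers and the two `outer_*` functions play the role of data_cleaner.

```python
def inner(values):
    # Only the innermost function returns its result.
    return [v * 2 for v in values]


def outer_broken(values):
    # Calls inner() but drops the result, so the caller receives None.
    inner(values)


def outer_fixed(values):
    # Passing the result back up the chain makes it reachable by the caller.
    return inner(values)


print(outer_broken([1, 2]))  # None
print(outer_fixed([1, 2]))   # [2, 4]
```

Applied to the question's code, this would mean writing return col_combiner(removed) and return clean_numbers(dataFrame) in the two functions that currently only call the next one.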
Now, a different thing entirely is whether or not your dataframe cleaner will be changed by the function data_cleaner, which can happen because dataframes are mutable objects in Python.
In other words, your function can read your dataframe and change it, so after the function call cleaner is different than before. At the same time, your function can return a value (which it doesn't) and this value will be assigned to doesThisWork.
Usually, you should prefer that a function does only one thing, so expecting a function to both change its argument and return a value is usually bad practice.
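The difference between changing an argument and rebinding a local name can be sketched with a plain list (the function names here are hypothetical). In the question's code, dataFrame.columns = newColNames is of the first kind, while dataFrame = dataFrame.iloc[2:,:] is of the second kind and is invisible to the caller.

```python
def mutate_in_place(rows):
    # Modifies the object the caller passed in.
    rows.append("extra")


def rebind_locally(rows):
    # Binds the local name to a new object; the caller's list is untouched.
    rows = rows[2:]


data = ["a", "b", "c"]
mutate_in_place(data)
rebind_locally(data)
print(data)  # ['a', 'b', 'c', 'extra']
```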

Separate for - loops to one alternating one

I am writing some Python code and I needed to change the logic of the code when I realized I can't come to a neat and efficient code solution.
So my first version is the following:
#set ranges
range_a=150
range_b=178
range_c=20
#add elements
for x in range(0, range_a):
    # ...do something...
    add_element_a()
for y in range(0, range_b):
    # ...do something...
    add_element_b()
for z in range(0, range_c):
    # ...do something...
    add_element_c()
As you can see, I was adding the elements by type: first for type_a, then for type_b, and finally for type_c. Now I would like to create a while-loop or something similar in order to add them alternately.
For example, we start by adding one from type_a, then one from type_b, and so on, until we reach the range for a certain type, and then we continue with the rest of them.
I know this could be seen as a basic problem, but I am looking for an efficient solution.
Here is the second version, which I find too complicated; I was wondering if there is a more efficient way to do it:
counter_a = counter_b = counter_c = 0
filled_a = False
filled_b = False
filled_c = False
while not (filled_a and filled_b and filled_c):
    if counter_a < range_a:
        add_element_a()
        counter_a += 1
        if counter_a == range_a:
            filled_a = True
    if counter_b < range_b:
        add_element_b()
        counter_b += 1
        if counter_b == range_b:
            filled_b = True
    if counter_c < range_c:
        add_element_c()
        counter_c += 1
        if counter_c == range_c:
            filled_c = True
Iterate according to the greatest range and check which range still wasn't exhausted:
#set ranges
range_a=150
range_b=178
range_c=20
#add elements
for i in range(max(range_a, range_b, range_c)):
    if i < range_a:
        add_element_a()
    if i < range_b:
        add_element_b()
    if i < range_c:
        add_element_c()
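The same idea can be checked on small numbers. The sketch below is only an illustration: `interleave` and the labels are hypothetical, and it yields labels instead of calling add_element_a() and friends.

```python
def interleave(counts):
    """Yield labels round-robin until each label was emitted counts[label] times."""
    for i in range(max(counts.values(), default=0)):
        for label, limit in counts.items():
            # Skip any type whose range is already exhausted.
            if i < limit:
                yield label


order = list(interleave({"a": 3, "b": 4, "c": 1}))
print(order)  # ['a', 'b', 'c', 'a', 'b', 'a', 'b', 'b']
```

Keeping the limits in a dictionary means a fourth type only needs one more entry rather than one more if branch.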
One possible solution you can try:
class RangeCounter:
    def __init__(self, range_type):
        self.range_type = range_type
    def add_element(self):
        # implement your addition logic here
        pass
range_a= RangeCounter(150)
range_b= RangeCounter(178)
range_c= RangeCounter(20)
range_list = [range_a, range_b, range_c]
for range_type in range_list:
    range_type.add_element()
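As written, the loop above calls add_element() only once per object. One way it might be extended so that the objects alternate until each limit is reached is sketched below. The `name` and `remaining` attributes, the `out` list and the `fill_alternating` helper are all assumptions added for illustration and are not part of the answer above.

```python
class RangeCounter:
    def __init__(self, name, limit):
        self.name = name
        self.remaining = limit  # how many elements are still to be added

    def add_element(self, out):
        out.append(self.name)   # placeholder for the real addition logic
        self.remaining -= 1


def fill_alternating(counters):
    out = []
    active = [c for c in counters if c.remaining > 0]
    while active:
        for counter in active:
            counter.add_element(out)
        # Drop every counter that has reached its limit.
        active = [c for c in active if c.remaining > 0]
    return out


print(fill_alternating([RangeCounter("a", 2), RangeCounter("b", 3), RangeCounter("c", 1)]))
# ['a', 'b', 'c', 'a', 'b', 'b']
```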

Single remove clause in while loop is removing two elements

I am writing a simple secret santa script that selects a "GiftReceiver" and a "GiftGiver" from a list. Two lists and an empty dataframe to be populated are produced:
import pandas as pd
import random
santaslist_receivers = ['Rudolf',
                        'Blitzen',
                        'Prancer',
                        'Dasher',
                        'Vixen',
                        'Comet'
                        ]
santaslist_givers = santaslist_receivers
finalDataFrame = pd.DataFrame(columns = ['GiftGiver','GiftReceiver'])
I then have a while loop that selects random elements from each list to pick a gift giver and receiver, then remove from the respective list:
while len(santaslist_receivers) > 0:
    print (len(santaslist_receivers)) #Used for testing.
    gift_receiver = random.choice(santaslist_receivers)
    santaslist_receivers.remove(gift_receiver)
    print (len(santaslist_receivers)) #Used for testing.
    gift_giver = random.choice(santaslist_givers)
    while gift_giver == gift_receiver: #While loop ensures that gift_giver != gift_receiver
        gift_giver = random.choice(santaslist_givers)
    santaslist_givers.remove(gift_giver)
    dummyDF = pd.DataFrame({'GiftGiver':gift_giver,'GiftReceiver':gift_receiver}, index = [0])
    finalDataFrame = finalDataFrame.append(dummyDF)
The final dataframe only contains three elements instead of six:
print(finalDataFrame)
returns
  GiftGiver GiftReceiver
0    Dasher      Prancer
0     Comet        Vixen
0    Rudolf      Blitzen
I have inserted two print lines within the while loop to investigate. These print the length of the list santaslist_receivers before and after the removal of an element. The expected return is to see original list length on the first print, then minus 1 on the second print, then the same length again on the first print of the next iteration of the while loop, then so on. Specifically I expect:
6,5,5,4,4,3,3... and so on.
What is returned is
6,5,4,3,2,1
Which is consistent with the DataFrame having only 3 rows, but I do not see the cause of this.
What is the error in my code or my approach?
You can solve it by simply changing this line
santaslist_givers = santaslist_receivers
to
santaslist_givers = list(santaslist_receivers)
Python variables are essentially references, so both names refer to the same list; that is, santaslist_givers and santaslist_receivers were accessing the same object in memory in your implementation. To make them independent, build a new list with list().
For some extra information, you can refer to copy.deepcopy.
You should make an explicit copy of your list here:
santaslist_givers = santaslist_receivers
There are multiple options for doing this, as explained in this question.
In this case I would recommend (if you have Python >= 3.3):
santaslist_givers = santaslist_receivers.copy()
If you are on an older version of Python, the typical way to do it is:
santaslist_givers = santaslist_receivers[:]
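A short sketch of the difference between a second name for the same list and a real copy. The names `alias` and `independent` are made up for this illustration.

```python
santaslist_receivers = ['Rudolf', 'Blitzen', 'Prancer']
alias = santaslist_receivers              # same list object under a second name
independent = list(santaslist_receivers)  # new list with the same elements

santaslist_receivers.remove('Rudolf')

print(alias is santaslist_receivers)  # True
print(alias)        # ['Blitzen', 'Prancer']
print(independent)  # ['Rudolf', 'Blitzen', 'Prancer']
```

This is why each pass of the question's loop shrank the list twice: the removal from the givers and the removal from the receivers both hit the same object.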

Nested "for" in Django view won't work

I want to generate a JSON type object for an HttpResponse, and in order to build it I'm using a nested "for" structure. I wrote some code and tried it in my Python interpreter, but when I used it in my Django view it refused to work correctly.
My structure is something like this:
tarifas = ['2.0A','2.0DHA','2.0DHSA']
terminos = ['Dia','Hora','GEN','NOC','VHC','COFGEN','COFNOC','COFVHC','PMHGEN','PMHNOC','PMHVHC','SAHGEN','SAHNOC','SAHVHC','FOMGEN','FOMNOC','FOMVHC','FOSGEN','FOSNOC','FOSVHC','INTGEN','INTNOC','INTVHC','PCAPGEN','PCAPNOC','PCAPVHC','TEUGEN','TEUNOC','TEUVHC']
data_json = {}
data_json['datos_TOT'] = []
data_json['datos_TEU'] = []
data_json['fecha'] = fecha
for i in range(3):
    data_json['datos_TOT'].append({})
    data_json['datos_TEU'].append({})
    data_json['datos_TOT'][i]['tarifa'] = tarifas[i]
    data_json['datos_TEU'][i]['tarifa'] = tarifas[i]
    for j in range(0,24):
        data_json['datos_TEU'][i]['values'] = []
        data_json['datos_TEU'][i]['values'].append({})
        data_json['datos_TEU'][i]['values'][j]['periodo'] = "{0}-{1}".format(j,j+1)
return HttpResponse(json.dumps(data_json), content_type="application/json")
In fact it has one more depth level, but as the second one doesn't work I didn't include it here.
With this nested structure I expected a JSON object with (b-a) entries at the first level, each holding (d-c) entries, where the outer loop runs over range(a, b) and the inner one over range(c, d). But what I see is that the second loop only returns the last value! So if the "j" loop goes from 0 to 24 it will just return "23" and nothing more. It seems like it only runs one "lap".
Is there any limit on nesting loops in views? If there is, where could I place them? I'm trying to keep models.py free from logic.
Your problem is that you reset data_json['datos_TEU'][i]['values'] to an empty list at the beginning of every iteration of the j loop, so it will only ever have one element. Move that line to before the nested loop.
Note that your code could be written much more Pythonically:
for tarifa in tarifas:
    tot = {'tarifa': tarifa}
    data_json['datos_TOT'].append(tot)
    teu = {'tarifa': tarifa}
    values = []
    for j, termino in enumerate(terminos):
        value = {'termino': termino, 'periodo': "{0}-{1}".format(j,j+1)}
        values.append(value)
    teu['values'] = values
    data_json['datos_TEU'].append(teu)
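For comparison, the minimal fix (keeping the original index-based loops and only moving the reset out of the inner loop) might look like the sketch below. It is trimmed to the datos_TEU part and leaves out Django and the fecha value so that it runs on its own; those omissions are simplifications for this example.

```python
import json

tarifas = ['2.0A', '2.0DHA', '2.0DHSA']
data_json = {'datos_TEU': []}

for i, tarifa in enumerate(tarifas):
    data_json['datos_TEU'].append({'tarifa': tarifa})
    # Create the list once per tarifa, before the inner loop starts.
    data_json['datos_TEU'][i]['values'] = []
    for j in range(24):
        data_json['datos_TEU'][i]['values'].append(
            {'periodo': "{0}-{1}".format(j, j + 1)}
        )

payload = json.dumps(data_json)
print(len(data_json['datos_TEU'][0]['values']))  # 24
```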

assign an existing list to a blank list python

So basically I'm trying to check if a bunch of strings in a list called list9000 contain an "#" sign. What I want is to empty the list once it has 6 elements, but before clearing it to do a for loop checking the elements for any "#" signs. I've tried using del and other emptying techniques, but it just doesn't seem to work. Here's my work so far:
if( len(list9000) == 6):
    # print(list9000)
    i = list9000.count("#")
    if(i>1):
        amount9000 = amount9000 - i + 1
        numWrong = numWrong - i + 1
    list9000[:] = []
list9000.append(line)
This is just a snippet of my code. There are about 300 lines of other code. I am reading a text file, in which I add the lines of text in the file to my list. If I could solve this problem, I would be basically done with my project!
Edit: I've tried using del list9000[:], but it doesn't work.
Update: I have printed out the length of the list, and it doesn't seem to be 6 most of the time, but rather increased by 6 every time.
You can just reassign list9000 to a new empty list object. But be careful: assignment in Python binds a name to an object rather than copying it, so any other name that refers to the old list will still see its contents. A simple list9000[:] = [] clears the list in place and should do the job.
if( len(list9000) == 6):
    # print(list9000)
    i = 0
    for item in list9000:
        if("#" in item):
            i += 1
    if(i>1):
        amount9000 = amount9000 - i + 1
        numWrong = numWrong - i + 1
    list9000[:] = []
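The loop above differs from the question's list9000.count("#") in one important way: count() only counts elements that are exactly equal to "#", while the in test counts elements that contain it. A small sketch with made-up sample data (the contents of list9000 and the name `other_name` are assumptions for illustration):

```python
list9000 = ["ok", "bad#line", "#", "fine", "x#y", "z"]
other_name = list9000  # second reference to the same list

exact_matches = list9000.count("#")                 # elements equal to "#"
containing = sum("#" in item for item in list9000)  # elements that contain "#"

list9000[:] = []  # clears the existing list in place

print(exact_matches, containing, other_name)  # 1 3 []
```

Because the slice assignment empties the existing object, every other reference to it sees the empty list as well.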
