Create new data frame with the name from loop number - python

I tried to create the loop in python. The code can be seen below.
df=pd.DataFrame.copy(mef_list)
form=['','_M3','_M6','_M9','_M12','_LN','_C']
for i in range(0, len(form)):
    df=pd.DataFrame.copy(mef_list)
    df['Variable_new']=df['Variable']+str(form[i])
When I run the code, only the result of the last iteration survives (Variable + '_C'). I think this is because the data frame (df) is overwritten each time a new iteration starts. To avoid this, I thought that renaming the data frame by appending the loop number would solve the problem.
I used the str function hoping to get df0, df1, ..., df6, but it doesn't work for the data frame's name. Please suggest how to name each data frame using the loop number. I am also open to any alternative approach.
Thanks!

This isn't a Pythonic thing to do. Have you thought about creating a list of dataframes instead?
df=pd.DataFrame.copy(mef_list)
form=['','_M3','_M6','_M9','_M12','_LN','_C']
list_of_df = list()
for i in range(0, len(form)):
    df=pd.DataFrame.copy(mef_list)
    df['Variable_new']=df['Variable']+str(form[i])
    list_of_df.append(df)
Then you can access 'df0' as list_of_df[0]
You also don't need to iterate through a range, you can just loop through the form list itself:
form=['','_M3','_M6','_M9','_M12','_LN','_C']
list_of_df = list()
for i in form:
    df=pd.DataFrame.copy(mef_list)
    df['Variable_new']=df['Variable']+str(i) ## You can remove str() if everything in form is already a string
    list_of_df.append(df)
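If you would rather look frames up by a label than by position, a dictionary keyed by the suffix is another option. This is only a sketch: it assumes mef_list is a DataFrame with a 'Variable' column, and the sample values and the name frames_by_suffix are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's mef_list DataFrame.
mef_list = pd.DataFrame({"Variable": ["UR", "CPI", "CEI"]})
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']

# One independent copy per suffix, keyed by the suffix itself.
frames_by_suffix = {}
for suffix in form:
    df = mef_list.copy()
    df["Variable_new"] = df["Variable"] + suffix
    frames_by_suffix[suffix] = df

# Look a frame up by its suffix instead of by a generated variable name.
print(frames_by_suffix["_M3"]["Variable_new"].tolist())
```

A dictionary keeps the "name" of each frame as ordinary data, so there is no need to create variables such as df0, df1 dynamically.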

mef_list = ["UR", "CPI", "CEI", "Farm", "PCI", "durable", "C_CVM"]
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']
Variable_new = []
foo = 0
for variable in form:
    Variable_new.append(mef_list[foo]+variable)
    foo += 1
print(Variable_new)

Related

How to loop through a pandas data frame using a columns values as the order of the loop?

I have two CSV files which I'm using in a loop. One of the files has a column called "Availability Score". Is there a way to make the loop iterate through the records in descending order of this column? I thought I could use Ob.sort_values(by=['AvailabilityScore'],ascending=False) to change the order of the dataframe first, so that when the loop starts it will already be in the right order. I've tried this and it doesn't seem to make a difference.
# import the data
CF = pd.read_csv (r'CustomerFloat.csv')
Ob = pd.read_csv (r'Orderbook.csv')
# Convert to dataframes
CF = pd.DataFrame(CF)
Ob = pd.DataFrame(Ob)
#Remove SubAssemblies
Ob.drop(Ob[Ob['SubAssembly'] != 0].index, inplace = True)
# Sort the data by their IDs
Ob.sort_values(by=['CustomerFloatID'])
CF.sort_values(by=['FloatID'])
# Sort the orderbook by its availability score
Ob.sort_values(by=['AvailabilityScore'],ascending=False)
# Loop for Urgent Values
for i, rowi in CF.iterrows():
    count = 0
    urgent_value = 1
    for j, rowj in Ob.iterrows():
        if(rowi['FloatID']==rowj['CustomerFloatID'] and count < rowi['Urgent Deficit']):
            Ob.at[j,'CustomerFloatPriority'] = urgent_value
            count+= rowj['Qty']
You need to add inplace=True, like this:
Ob.sort_values(by=['AvailabilityScore'],ascending=False, inplace=True)
sort_values() (like most pandas methods nowadays) is not in-place by default. You should assign the result back to the variable that holds the DataFrame:
Ob = Ob.sort_values(by=['CustomerFloatID'], ascending=False)
# ...
BTW, while you can pass inplace=True as argument to sort_values(), I do not recommend it. Generally speaking, inplace=True is often considered bad practice.
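To see the difference between discarding and keeping the sorted result, here is a small sketch. The data is invented; only the column names follow the question.

```python
import pandas as pd

# Hypothetical order book; values are made up for illustration.
Ob = pd.DataFrame({
    "CustomerFloatID": [1, 2, 3],
    "AvailabilityScore": [20, 90, 50],
})

# The sorted copy is thrown away, so Ob keeps its original order.
Ob.sort_values(by="AvailabilityScore", ascending=False)
order_before = [row["CustomerFloatID"] for _, row in Ob.iterrows()]

# Assigning the result back makes later loops see the sorted order.
Ob = Ob.sort_values(by="AvailabilityScore", ascending=False)
order_after = [row["CustomerFloatID"] for _, row in Ob.iterrows()]

print(order_before, order_after)
```

iterrows() walks the rows in their current physical order, which is why the assignment matters for the loop.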

calling a dataframe column name in the parameters of a function

I have a dataframe with 8 columns, and I would like to turn the code below (which I tested and which works on a single column) into a function that I can map/apply over all 8 columns.
all_adj_noun = []
for i in range(len(bigram_df)):
    if len([bigram_df['adj_noun'][i]]) >= 1:
        for j in range(len(bigram_df['adj_noun'][i])):
            all_adj_noun.append(bigram_df['adj_noun'][i][j])
However, when I tried to define a function, the code returns an empty list even though the column is not empty.
def combine_bigrams(df_name, col_name):
    all_bigrams = []
    for i in range(len(df_name)):
        if len([df_name[col_name][i]]) >= 1:
            for j in range(len(df_name[col_name][i])):
                return all_bigrams.append(df_name[col_name][i][j])
I call the function by
combine_bigrams(bigram_df, 'adj_noun')
Is there anything I may be doing wrong here?
The problem is that you are returning the result of .append, which is None. Because the return sits inside the inner loop, the function also exits on the very first element.
However, there is a better (and faster) way to do this. To return a list with all the values present in the columns, you can leverage Series.agg:
col_name = 'adj_noun'
all_bigrams = bigram_df[col_name].agg(sum)
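If you prefer to keep an explicit function, a corrected version might look like the sketch below. It assumes every cell in the column holds a list. The sample data is invented, and itertools.chain is shown as one alternative way to flatten, not as the method used in the answer above.

```python
import pandas as pd
from itertools import chain

# Hypothetical sample; each cell holds a list of bigrams.
bigram_df = pd.DataFrame({
    "adj_noun": [["big dog", "red car"], [], ["old house"]],
})

def combine_bigrams(df, col_name):
    """Flatten the lists stored in one column into a single list."""
    all_bigrams = []
    for cell in df[col_name]:
        all_bigrams.extend(cell)  # extend mutates in place and returns None
    return all_bigrams            # return once, after the loop has finished

flat = combine_bigrams(bigram_df, "adj_noun")

# Equivalent flattening with the standard library.
same = list(chain.from_iterable(bigram_df["adj_noun"]))
print(flat)
```

The same function can then be called once per column name to cover all 8 columns.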

Is it a correct way of doing loops in python?

df = pd.read_excel("file.xlsx", sheet_name = "Sheet1")
for i in range(0, len(df)):
    cnt = 0
    while True:
        cnt += 1
        driver.get(df['urls'][0])
'df' holds the contents of the Excel file, which has a column headed 'urls' containing 4 URLs.
I'd like to open them one by one using a loop, but I'm not sure this is the correct way to do it. I tried to run it, but it didn't work.
I'd appreciate any helpful tips.
cheers!
It seems like you haven't read the documentation for pandas here. In there you will see that there are builtin methods for dataframes
df = pd.read_excel("file.xlsx", sheet_name = "Sheet1")
for i, row in df.iterrows():
    driver.get(row['urls'])
Or you can do it even more simply:
df.apply(lambda row: driver.get(row['urls']), axis=1)
This applies the lambda function to each row in the entire dataframe.
Everything you need to know to get started with Python can be found here: https://wiki.python.org/moin/BeginnersGuide
For while True to work, you need a condition that ends the loop. The approach I'm giving is a valid way to write loops in general. DataFrames have their own iterators that are potentially more performant, but it seems like some basic Python loops will be beneficial to you.
In your code you are doing a for loop with a while loop inside with a counter, but then never referencing that index in your DataFrame. You can just use a for loop without the while loop.
arr = [1, 2, 3]
for i in range(len(arr)):
    print(arr[i])
The 0 is implied and you use the i variable to index into your array.
Another way is to use enumerate. It will give you the index and the value every iteration.
for i, value in enumerate(arr):
    print(value)
Now if you don't need the index, you can just iterate over the array directly.
for value in arr:
    print(value)
DataFrames have iterators of their own that you can use in a for loop. I don't think this matches your data model from your description, but it shows the general case.
for index, row in df.iterrows():
    print(row['urls'])
You may be able to do:
for url in df['urls']:
    print(url)
if 'urls' is a column.
For reference, when you use a while loop, it would need to look something like this.
# this is not the right way to iterate over this array
cnt = 0
while cnt < len(arr):
    print(arr[cnt])
    cnt += 1
Or for while True
cnt = 0
while True:
    if cnt >= len(arr):
        break
    print(arr[cnt])
    cnt += 1
Hopefully you can see why the for loops are preferred. Less code and it's cleaner and less prone to mistakes.

Modify pandas dataframe within iterrows loop

I'm new to Python.
I am trying to add a prefix to an element (the serial number) within a data frame using a for loop, as part of data cleaning/preparation before analysis.
The code is
a=pd.read_excel('C:/Users/HP/Desktop/WFH/PowerBI/CMM data.xlsx','CMM_unclean')
a['Serial Number'] = a['Serial Number'].apply(str)
print(a.iloc[72,1])
for index,row in a.iterrows():
    if len(row['Serial Number']) == 6:
        row['Serial Number'] = 'SR0' + row['Serial Number']
        print(row['Serial Number'])
print(a.iloc[72,1])
The output is
C:\Users\HP\anaconda3\envs\test\python.exe C:/Users/HP/PycharmProjects/test/first.py
101306
SR0101306
101306
I don't understand why this is happening: inside the for loop the value changes, but outside the loop it is unchanged.
This will never change the actual dataframe named a.
TL;DR: The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. However, you can use the index to access and edit the relevant row of the dataframe.
The solution is this:
import pandas as pd
a = pd.read_excel("Book1.xlsx")
a['Serial Number'] = a['Serial Number'].apply(str)
a.head()
# ID Serial Number
# 0 1 SR0101306
# 1 2 1101306
print(a.iloc[0,1])
#101306
for index,row in a.iterrows():
    row = row.copy()
    if len(row['Serial Number']) == 6:
        # use the index and .loc method to alter the dataframe
        a.loc[index, 'Serial Number'] = 'SR0' + row['Serial Number']
print(a.iloc[0,1])
#SR0101306
In the documentation, I read (emphasis from there)
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
Maybe this means in your case that a copy is made and no reference used. So the change applies temporarily to the copy but not to the data in the data frame.
Since you're already using apply, you could do this straight inside the function you call apply with:
def fix_serial(n):
    n_s = str(n)
    if len(n_s) == 6:
        # same 'SR0' prefix as in the question
        n_s = 'SR0' + n_s
    return n_s
a['Serial Number'] = a['Serial Number'].apply(fix_serial)
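A loop-free variant is also possible with a boolean mask and .loc. This is a sketch on made-up data; the name needs_prefix is hypothetical, and the 'SR0' prefix and the length-6 rule are taken from the question.

```python
import pandas as pd

# Hypothetical sample standing in for the Excel sheet.
a = pd.DataFrame({"ID": [1, 2], "Serial Number": [101306, 1101306]})
a["Serial Number"] = a["Serial Number"].astype(str)

# True for rows whose serial number has exactly 6 characters.
needs_prefix = a["Serial Number"].str.len() == 6

# Write back through .loc so the original frame is modified.
a.loc[needs_prefix, "Serial Number"] = "SR0" + a.loc[needs_prefix, "Serial Number"]

print(a["Serial Number"].tolist())
```

Because the assignment goes through a.loc on the frame itself, there is no row copy involved and the change persists.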

how to append a dataframe to an existing dataframe inside a loop

I made a simple DataFrame named middle_dataframe in python which looks like this and only has one row of data:
display of the existing dataframe
And I want to append a new dataframe generated each time in a loop to this existing dataframe. This is my program:
k = 2
for k in range(2, 32021):
    header = whole_seq_data[k]
    if header.startswith('>'):
        id_name = get_ucsc_ids(header)
        (chromosome, start_p, end_p) = get_chr_coordinates_from_string(header)
        if whole_seq_data[k + 1].startswith('[ATGC]'):
            seq = whole_seq_data[k + 1]
            df_temp = pd.DataFrame(
                {
                    "ucsc_id":[id_name],
                    "chromosome":[chromosome],
                    "start_position":[start_p],
                    "end_position":[end_p],
                    "whole_sequence":[seq]
                }
            )
            middle_dataframe.append(df_temp)
    k = k + 2
The iterations of my for loop seem to be fine, and I checked that the variables hold the correct values after applying the regular expressions. But middle_dataframe doesn't change, and I cannot figure out why.
The DataFrame.append method returns the result of the append, rather than appending in-place (link to the official docs on append). The fix should be to replace this line:
middle_dataframe.append(df_temp)
with this:
middle_dataframe = middle_dataframe.append(df_temp)
Depending on how that works with your data, you might need also to pass in the parameter ignore_index=True.
The docs warn that appending one row at a time to a DataFrame can be more computationally intensive than building a python list and converting it into a DataFrame all at once. That's something to look into if your current approach ends up too slow for your purposes.
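Here is a sketch of the "build a list first" approach. Note that DataFrame.append was removed in pandas 2.0, so on current versions pd.concat is the way to join frames. The parsed tuples and the starting row below are invented placeholders for whatever your parsing helpers return.

```python
import pandas as pd

# Hypothetical existing one-row frame.
middle_dataframe = pd.DataFrame({
    "ucsc_id": ["uc000"], "chromosome": ["chr1"],
    "start_position": [1], "end_position": [50],
    "whole_sequence": ["ATAT"],
})

# Hypothetical parsed records standing in for the loop's results.
parsed = [
    ("uc001", "chr1", 100, 200, "ATGC"),
    ("uc002", "chr2", 300, 400, "GGCA"),
]

# Collect plain dicts inside the loop; this is cheap.
rows = []
for id_name, chromosome, start_p, end_p, seq in parsed:
    rows.append({
        "ucsc_id": id_name, "chromosome": chromosome,
        "start_position": start_p, "end_position": end_p,
        "whole_sequence": seq,
    })

# Build one DataFrame and concatenate once, after the loop.
middle_dataframe = pd.concat(
    [middle_dataframe, pd.DataFrame(rows)], ignore_index=True
)
print(len(middle_dataframe))
```

Concatenating once avoids copying the growing frame on every iteration, which is the cost the docs warn about.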
