How to print the column name on a Pandas DataFrame row? - python

I have this DataFrame:
import pandas as pd

df = pd.DataFrame({'day': ['1/1/2017', '1/2/2017', '1/3/2017', '1/4/2017', '1/5/2017', '1/6/2017', '1/7/2017'],
                   'event': ['Rain', 'Sunny', 'Snow', 'Snow', 'Rain', 'Sunny', 'Sunny'],
                   'temperature': [32, 35, 28, 24, 32, 31, ''],
                   'windspeed': [6, 7, 2, 7, 4, 2, '']})
df
I am trying to find the headers for the missing values at index 6:
for x in df.loc[6]:
    if x == '':
        print(df.columns.values)
    else:
        print(x)
I have tried searching, and the closest I could get was what I have now. Ultimately I'm trying to insert these values into the dataframe: temperature = 34, windspeed = 8.
But my first step was simply trying to build the loop/if statement that says if x == '' and [COLUMN_NAME] == 'temperature'... and that is where I got stuck. I'm new to Python and just trying to learn Pandas. I need to return only the column I'm on, not a list of all the columns.

There are better ways to do this, but this works.
for col, val in df.loc[6].items():  # .items() replaces the now-removed .iteritems()
    if not val:  # catches falsy values such as '' (note it would also match 0)
        print(col)
    else:
        print(val)

Modified from your code:
for i, x in enumerate(df.loc[6]):
    if x == '':
        print(df.columns[i])
    else:
        print(x)

I would use a list comprehension as follows:
listOfNulls = [ind for ind in df.loc[6].index if df.loc[6][ind] == '']
and when I print the listOfNulls, I get:
>>> print(listOfNulls)
['temperature', 'windspeed']
The key here is to understand that df.loc[6] is a pandas Series, which has an index. We check the values of the Series and collect the matching index labels (the column names).
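To then insert the missing values, which was the asker's ultimate goal, the labels found above can be used directly with df.loc. A minimal sketch, assuming the fill values from the question (temperature = 34, windspeed = 8):

# Fill the blanks at index 6 using the column labels found above;
# the fill values come from the question.
fills = {'temperature': 34, 'windspeed': 8}
for col in listOfNulls:
    df.loc[6, col] = fills[col]
print(df.loc[6])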

Related

recursively merging rows pandas dataframe based on the condition

community,
I have a sorted pandas dataframe that looks as follows:
I want to merge rows that have overlapping values in the start and end columns. Meaning that if the end value of the initial row is bigger than the start value of the sequential row, or any other sequential row, they will be merged into one row. Examples are rows 3, 4 and 5. The output I would expect is:
To do so, I am trying to implement a recursive function that would loop over the dataframe until the condition holds and then return a number that would be used to find the location of the end row.
However, the function I am trying to implement returns an empty dataframe. Could you please help me see where I should pay attention, or what alternative I can build if recursion is not a solution?
def row_merger(pd_df):
    counter = 0
    new_df = pd.DataFrame(columns=pd_df.columns)
    for i in range(len(pd_df) - 1):
        def recursion_inside(pd_df, counter=0):
            counter = 0
            if pd_df.iloc[i + 1 + counter]["q.start"] <= pd_df.iloc[i]["q.end"]:
                counter = counter + 1
                recursion_inside(pd_df, counter)
            else:
                return counter
        new_row = {"name": pd_df["name"][i],
                   "q.start": pd_df.iloc[i]["q.start"],
                   "q.end": pd_df.iloc[i + counter]["q.start"]}
        new_df.append(new_row, ignore_index=True)
    return new_df
I don't see the benefit of using recursion here, so I would just iterate over the rows instead, building up the rows for the output dataframe one by one, e.g. like this:
def row_merger(df_in):
    if len(df_in) <= 1:
        return df_in
    rows_out = []
    current_row = df_in.iloc[0].values
    for next_row in df_in.iloc[1:].values:
        if next_row[1] > current_row[2]:  # no overlap: emit the finished row
            rows_out.append(current_row)
            current_row = next_row
        else:  # overlap: extend the current row's end
            current_row[2] = max(current_row[2], next_row[2])
    rows_out.append(current_row)
    return pd.DataFrame(rows_out, columns=df_in.columns)
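For reference, a quick usage sketch. The column names name, q.start and q.end come from the question's code; the sample values are made up, since the original tables were posted as images:

import pandas as pd

# Hypothetical sample; the last three intervals overlap each other.
df = pd.DataFrame({"name": ["a"] * 5,
                   "q.start": [1, 10, 20, 25, 28],
                   "q.end": [5, 15, 26, 30, 40]})
print(row_merger(df))
# The (20, 26), (25, 30) and (28, 40) rows collapse into one (20, 40) row.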

how to count excel rows with the same values in python

I have an excel file containing 3 columns (source, destination and time) and 140400 rows, and I want to count rows that have the same source, destination and time values. By this I mean counting the rows containing packet information from the same source to the same destination at the same time (e.g. row 1: 0, 1, 3 and row 102: 0, 1, 3, so we have 2 matching rows here). All the values are integers. I tried to use df.iloc but it just returns zero, and I tried to use a dictionary but couldn't make it work. I would appreciate it if someone could help me find a solution.
This is one way I tried, but it didn't work:
for t in timestamps:
    for x in range(120):
        for y in range(120):
            while i < 140400 and df.iloc[i, 0] <= t:
                # if df.iloc[i, 0] <= t:
                if df.iloc[i, 0] == t and df.iloc[i, 1] == y and df.iloc[i, 2] == x:
                    TotalArp[x][y] += 1
                i = i + 1
This is the file format:
If I understood correctly, you just want to count rows that all have the same value, right? This should work, though it's probably not the most efficient way:
counter = 0
for index, row in df.iterrows():
    if row[0] == row[1] == row[2]:
        counter += 1
Edit:
OK, since I'm too stupid to comment, I'll just edit it here:
duplicate_count_df = df.groupby(df.columns.tolist(), as_index=False).size().drop_duplicates(subset=list(df.columns))
This should lead you into the right direction.
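For the duplicate-row count the question actually asks about, here is a minimal sketch of that groupby approach; the column names source, destination and time are assumptions based on the question:

import pandas as pd

# Hypothetical packet data; rows 0 and 2 are identical.
df = pd.DataFrame({"source": [0, 1, 0], "destination": [1, 2, 1], "time": [3, 4, 3]})
counts = df.groupby(["source", "destination", "time"]).size()
print(counts)  # (0, 1, 3) appears twice, (1, 2, 4) once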
Suppose you have these columns in your DataFrame:
["Col1" , "Col2" , "Col3" , "Col4"]
Now you want to count rows that contain equal values in each column of your DataFrame:
len(df[(df['Col1'] == df['Col2']) & (df['Col2'] == df['Col3']) & (df['Col3'] == df['Col4'])])
(Note the comparisons have to be paired with &; chaining == across Series raises an error.) Easy as that.
Update:
If you would like to get the count for each element specifically:
# Create a dictionary with a zero count for each distinct element
Properties = {k: 0 for k in set([item for elem in df.columns for item in df[elem]])}
# Then count rows whose values are all equal
for item in range(len(df)):
    if df.iloc[item, 0] == df.iloc[item, 1] == df.iloc[item, 2] == df.iloc[item, 3]:
        Properties[df.iloc[item, 0]] += 1
print(Properties)
Let's see an example:
# Here I have a DataFrame with 2 columns and 3 rows
df = pd.DataFrame({'1': [1, 2, 3], '2': [1, 1, '-']})
df
Output:
   1  2
0  1  1
1  2  1
2  3  -
And then:
Properties = {k: 0 for k in set([item for elem in df.columns for item in df[elem]])}
for item in range(len(df)):
    if df.iloc[item, 0] == df.iloc[item, 1]:
        Properties[df.iloc[item, 0]] += 1
Properties
Output:
{1: 1, 2: 0, 3: 0, '-': 0}

How to fill column based on the condition in dataframe?

I am trying to fill one column's records based on some condition, but I am not getting the result. Can you please help me with how to do this?
Example:
df:
applied_sql_function1  and_or_not_oprtor_pre  comb_fld_order_1
CASE WHEN
WHEN                   AND
WHEN                   AND
WHEN
WHEN                   AND
WHEN                   OR
WHEN
WHEN                                          dummy
WHEN                                          dummy
WHEN
Expected Output:
applied_sql_function1  and_or_not_oprtor_pre  comb_fld_order_1  new
CASE WHEN                                                       CASE WHEN
WHEN                   AND
WHEN                   AND
WHEN                                                            WHEN
WHEN                   AND
WHEN                   OR
WHEN                                                            WHEN
WHEN                                          dummy
WHEN                                          dummy
WHEN                                                            WHEN
I have written some logic for this but it is not working:
df_main1['new'] = ''
for index, row in df_main1.iterrows():
    new = ''
    if (str(row['applied_sql_function1']) != '') and (str(row['and_or_not_oprtor_pre']) == '') and (str(row['comb_fld_order_1']) == ''):
        new += str(row['applied_sql_function1'])
        print(new)
    if (str(row['applied_sql_function1']) != '') and (str(row['and_or_not_oprtor_pre']) != ''):
        new += ''
        print(new)
    else:
        new += ''
    row['new'] = new
print(df_main1['new'])
Using loc:
mask = df.and_or_not_oprtor_pre.fillna("").eq("") \
& df.comb_fld_order_1.fillna("").eq("")
df.loc[mask, 'new'] = df.loc[mask, 'applied_sql_function1']
Try this one; it works in a quick way:
indexes = df.index[(df['and_or_not_oprtor_pre'].isna()) & (df['comb_fld_order_1'].isna())]
df.loc[indexes, 'new'] = df.loc[indexes, 'applied_sql_function1']
Go with np.where all the way! It's easy to understand and vectorized, so the performance is good on really large datasets.
import pandas as pd
import numpy as np
df['new'] = ''
df['new'] = np.where((df['and_or_not_oprtor_pre'] == '') & (df['comb_fld_order_1'] == ''), df['applied_sql_function1'], df['new'])
df
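As a quick check of that np.where approach, a minimal sketch with made-up rows (whether the blanks are empty strings or NaN decides between comparing with '' and using isna(), which is why the answers above differ):

import numpy as np
import pandas as pd

# Hypothetical mini version of the question's frame, with '' for blanks.
df = pd.DataFrame({'applied_sql_function1': ['CASE WHEN', 'WHEN', 'WHEN'],
                   'and_or_not_oprtor_pre': ['', 'AND', ''],
                   'comb_fld_order_1': ['', '', '']})
df['new'] = np.where((df['and_or_not_oprtor_pre'] == '') & (df['comb_fld_order_1'] == ''),
                     df['applied_sql_function1'], '')
print(df)  # 'new' is filled on rows 0 and 2 only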

Search for a value anywhere in a pandas DataFrame

This seems like a simple question, but I couldn't find it asked before (this and this are close but the answers aren't great).
The question is: if I want to search for a value somewhere in my df (I don't know which column it's in), how do I return all rows with a match?
What's the most pandas-idiomatic way to do it? Is there anything better than:
for col in list(df):
    try:
        df[col] == var
        return df[df[col] == var]
    except TypeError:
        continue
?
You can perform an equality comparison on the entire DataFrame:
df[df.eq(var).any(axis=1)]
You should use isin; this returns the columns. If you want a row check, see cold's answer :-)
df.isin(['bal1']).any()
A False
B True
C False
CLASS False
dtype: bool
Or
df[df.isin(['bal1'])].stack() # level 0 index is row index , level 1 index is columns which contain that value
0 B bal1
1 B bal1
dtype: object
You can try the code below:
import pandas as pd

x = pd.read_csv(r"filePath")
x.columns = x.columns.str.lower().str.replace(' ', '_')
y = x.columns.values
z = y.tolist()
print("Note: it takes case-sensitive values.")
keyWord = input("Type a keyword to search: ")
try:
    for k in range(len(z) - 1):
        l = x[x[z[k]].str.match(keyWord)]
        print(l.head(10))
except:
    print("")
This is a solution which will return the actual column you need.
df.columns[df.isin(['Yes']).any()]
Minimal solution:
import pandas as pd
import numpy as np

def locate_in_df(df, value):
    a = df.to_numpy()
    row = np.where(a == value)[0][0]
    col = np.where(a == value)[1][0]
    return row, col
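A quick usage sketch with made-up data (the value 'bal1' echoes the earlier isin example):

# Hypothetical frame; 'bal1' sits at row 1, column 1.
df = pd.DataFrame({'A': ['x', 'y'], 'B': ['z', 'bal1']})
print(locate_in_df(df, 'bal1'))  # -> (1, 1): positional (row, column) of the first match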

Iterate over list

Q6
4;99
3;4;8;9;14;18
2;3;8;12;18
2;3;11;18
2;3;8;18
2;3;4;5;6;7;8;9;11;12;15;16;17;18
2;3;4;8;9;10;11;13;18
1;3;4;5;6;7;13;16;17
2;3;4;5;6;7;8;9;11;12;14;15;18
3;11;18
2;3;5;8;9;11;12;13;15;16;17;18
2;5;11;18
1;2;3;4;5;8;9;11;17;18
3;7;8;11;13;14
2;3;8;18
2;13
2;3;5;8;9;11;12;13;18
2;3;4;9;11;12;18
2;3;5;9;11;18
1;2;3;4;5;6;7;8;9;11;14;15;16;17;18
2;3;8;11;13;18
import pandas as pd

df_1 = pd.read_csv('amazon_final 29082018.csv')
list_6 = list(df_1["Q6"])
list_6 = list(map(str, list_6))
list_7 = list(zip(list_6))
tem_list = []
for x in list_6:
    if ('3' in x[0]):
        tem_list.append('Fire')
    else:
        tem_list.append(None)
df_1.to_csv('final.csv', index=False)
I have many such columns in the data.
I want to extract the value '3' from this; the code I wrote gives me 3 along with 13, 23, 33 and so on. I only want the count of rows having the value 3.
You need to break up the rows and convert each value to an integer. At the moment you are looking for the presence of the string "3", which is why strings like "2;13" pass the test. Try something like this:
list_6 = ["4;99", "3;4;8;9;14;18", "2;3;8;12;18", "2;3;11;18", "2;3;8;18",
"2;3;4;5;6;7;8;9;11;12;15;16;17;18", "2;3;4;8;9;10;11;13;18",
"1;3;4;5;6;7;13;16;17", "2;3;4;5;6;7;8;9;11;12;14;15;18", "3;11;18",
"2;3;5;8;9;11;12;13;15;16;17;18", "2;5;11;18", "1;2;3;4;5;8;9;11;17;18",
"3;7;8;11;13;14", "2;3;8;18", "2;13", "2;3;5;8;9;11;12;13;18",
"2;3;4;9;11;12;18", "2;3;5;9;11;18",
"1;2;3;4;5;6;7;8;9;11;14;15;16;17;18", "2;3;8;11;13;18"]
temp_list = []
for x in list_6:
    numbers = [int(num_string) for num_string in x.split(';')]
    if (3 in numbers):
        temp_list.append('Fire')
    else:
        temp_list.append('None')
print(temp_list)
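Since the goal is ultimately a count rather than a list, here is a sketch of a pandas one-liner along the same lines (the column name Q6 comes from the question):

# Split each Q6 cell on ';' and test for an exact '3' token, then count the matches.
count = df_1['Q6'].astype(str).str.split(';').apply(lambda toks: '3' in toks).sum()
print(count)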
