How can I iterate over the rows of a DataFrame by index? - python

I am looking to apply a loop over the indices of a DataFrame in Python.
My loop looks like this (pseudocode):
for index in DataFrame:
    if index <= 10:
        index = index + 1
        return rows(index)

Use DataFrame.iterrows(), which yields (index, Series) pairs:
for idx, row in pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).iterrows():
    ...  # do something with idx and row

Try this:
for index, row in df.iterrows():
    if index <= 10:
        print(row)
This prints the rows whose index is at most 10 (the first 11 rows of a default RangeIndex).
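If the goal is just the first rows, iterating is unnecessary; positional slicing is simpler and faster. A minimal sketch, assuming a default RangeIndex (the small frame here is illustrative, not the asker's data):

```python
import pandas as pd

# Illustrative frame standing in for the asker's df
df = pd.DataFrame({'a': range(20), 'b': range(20, 40)})

# Rows at positions 0..10, equivalent to the loop above
first_eleven = df.iloc[:11]
print(first_eleven['a'].tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```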

If a condition on the index is required, first build a list of the row positions you need, then collect the corresponding rows as a list of Series:
all_index = []
for i in index:
    l1 = list(range(i - 10, i + 2))
    all_index.extend(l1)
all_index = list(set(all_index))  # drop duplicates

# take the rows as a list of Series
all_series = []
for i in all_index:
    a = df.iloc[i, :]
    all_series.append(a)  # append, not extend: extend() returns None

Python - Pandas - drop specific columns (axis)?

So I have a numeric list [0-12] that matches the number of columns in my spreadsheet, and I replaced the column headers with that list via df.columns = list.
Now I want to drop specific columns out of that spreadsheet, like this.
To create the list of numbers matching the number of columns, I have:
listOfNumbers = []
column_name = []
for i in range(0, len(df.columns)):
    listOfNumbers.append(i)
df.columns = listOfNumbers
for i in range(1, len(df.columns)):
    for j in range(1, len(df.columns)):
        if i != colList[j]:
            df.drop(i, inplace=True)
And I got the list [1,2,3], as seen in the picture.
But I always get this error:
KeyError: '[1] not found in axis'
I tried to replace df.drop(i, inplace=True) with df.drop(i, axis=1, inplace=True), but that didn't work either.
Any suggestions? Thanks.
The proper way is:
columns_to_remove = [1, 2, 3]  # positions of the columns to delete
df = df.drop(columns=df.columns[columns_to_remove])
So for your use case:
for i in range(1, len(df.columns)):
    for j in range(1, len(df.columns)):
        if i != colList[j]:
            df.drop(columns=df.columns[i], inplace=True)
If you want to drop every column that does not appear in colList, this code does it, using set difference:
setOfNumbers = set(range(df.shape[1]))
setRemainColumns = set(colList)
for dropColumn in setOfNumbers.difference(setRemainColumns):
    df.drop(dropColumn, axis=1, inplace=True)
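If colList holds the positions of the columns you want to keep (an assumption based on the question), the loop can be avoided entirely by selecting the survivors directly. A sketch on a hypothetical 5-column frame:

```python
import pandas as pd

# Hypothetical frame with integer column labels 0..4, as in the question
df = pd.DataFrame([[10, 11, 12, 13, 14]], columns=range(5))
colList = [0, 4]  # positions of the columns to keep (assumed meaning)

# select the columns to keep in one step, instead of dropping in a loop
df = df[df.columns[colList]]
print(df.columns.tolist())  # [0, 4]
```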

Searching index position in python

cols = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # selected the columns I want to work with
df = pd.read_csv('mywork.csv')
df1 = df.iloc[:, cols]
b = np.array(df1)
b
outcome
b = [['WV5 6NY' 'RE4 9VU' 'BU4 N90' 'TU3 5RE' 'NE5 4F']
['SA8 7TA' 'BA31 0PO' 'DE3 2FP' 'LR98 4TS' 0]
['MN0 4NU' 'RF5 5FG' 'WA3 0MN' 'EA15 8RE' 'BE1 4RE']
['SB7 0ET' 'SA7 0SB' 'BT7 6NS' 'TA9 0LP' 'BA3 1OE']]
a = np.concatenate(b) #concatenated to get a single array, this worked well
a = np.array([x for x in a if x != 'nan'])
a = a[np.where(a != '0')] #removed the nan
print(np.sort(a)) # to sort alphabetically
#Sorted array
['BA3 1OE' 'BA31 0PO' 'BE1 4RE' 'BT7 6NS' 'BU4 N90'
'DE3 2FP' 'EA15 8RE' 'LR98 4TS' 'MN0 4NU' 'NE5 4F' 'RE4 9VU'
'RF5 5FG' 'SA7 0SB' 'SA8 7TA' 'SB7 0ET' 'TA9 0LP' 'TU3 5RE'
'WA3 0MN' 'WV5 6NY']
# Find the index position of all elements of b in a (the sorted array)
def findall_index(b, a):
    result = []
    for i in range(len(a)):
        for j in range(len(a[i])):
            if b[i][j] == a:
                result.append((i, j))
    return result
print(findall_index(0, result))
I am still very new to Python. I tried to find the index positions of all elements of b in a above, but the code block underneath doesn't seem to give me any result. Please can someone help me?
Thank you in advance.
One way you could approach this is by zipping (creating pairs) the index of elements in b with the actual elements and then sorting this new array based on the elements only. Now you have a mapping from indices of the original array to the new sorted array. You can then just loop over the sorted pairs to map the current index to the original index.
I would highly suggest you code this yourself, since it will help you learn!
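For reference, the zip-and-sort idea described above can be sketched like this on a small list (the data here is illustrative, taken from the question's first row):

```python
b = ['WV5 6NY', 'RE4 9VU', 'BU4 N90']

# pair each element with its original index, then sort the pairs by element
pairs = sorted(enumerate(b), key=lambda p: p[1])

sorted_values = [value for _, value in pairs]
original_positions = [idx for idx, _ in pairs]  # sorted position -> original index
print(sorted_values)       # ['BU4 N90', 'RE4 9VU', 'WV5 6NY']
print(original_positions)  # [2, 1, 0]
```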

How do I know is there a value in another column?

I have a df something like this:
lst = [[30029509,37337567,41511334,41511334,41511334]]
lst2 = [35619048]
lst3 = [[41511334,37337567,41511334]]
lst4 = [[37337567,41511334]]
df = pd.DataFrame()
df['0'] = lst, lst2, lst3, lst4
I need to check how many times '41511334' appears in every row.
I ran this code:
df['new'] = '41511334' in str(df['0'])
and got True in every row, but that is wrong for the second row.
What's wrong?
Thanks
str(df['0']) gives a string representation of column 0 and so includes all the data. You will then see that
'41511334' in str(df['0'])
gives True, and you assign this to every row of the 'new' column. You are looking for something like
df['new'] = df['0'].apply(lambda x: '41511334' in str(x))
or
df['new'] = df['0'].astype(str).str.contains('41511334')
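If the asker really wants a count per row rather than a True/False membership test, a small variation of the same idea works. A sketch using the question's own data:

```python
import pandas as pd

lst = [[30029509, 37337567, 41511334, 41511334, 41511334]]
lst2 = [35619048]
lst3 = [[41511334, 37337567, 41511334]]
lst4 = [[37337567, 41511334]]
df = pd.DataFrame()
df['0'] = lst, lst2, lst3, lst4

# count occurrences of the substring in each row's string form
df['count'] = df['0'].astype(str).str.count('41511334')
print(df['count'].tolist())  # [3, 0, 2, 1]
```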

Compare 1 column of 2D array and remove duplicates Python

Say I have a 2D array like:
array = [['abc', 2, 3],
         ['abc', 2, 3],
         ['bb', 5, 5],
         ['bb', 4, 6],
         ['sa', 3, 5],
         ['tt', 2, 1]]
I want to remove any rows where the first column has duplicates,
i.e. compare on array[0] and return only:
removeDups = [['sa', 3, 5],
              ['tt', 2, 1]]
I think it should be something like:
(set first col as tmp variable, compare tmp with remaining and #set array as returned from compare)
for x in range(len(array)):
    tmpCol = array[x][0]
    del array[x]
    removed = compare(array, tmpCol)
    array = copy.deepcopy(removed)
    print repr(len(removed)) # testing
where compare is:
(compare first col of each remaining array items with tmp, if match remove else return original array)
def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid
I keep getting 'index out of range' error. I've tried other ways of doing this, but I would really appreciate some help!
Similar to other answers, but using a dictionary instead of importing Counter:
counts = {}
for elem in array:
    # add 1 to counts for this string, creating a new entry at this key
    # with an initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1

new_array = []
for elem in array:
    # keep the row only if its first element occurs exactly once
    if counts[elem[0]] == 1:
        new_array.append(elem)
One option is to create a Counter over the first column of your array beforehand, then filter the list based on the count value, i.e. keep a row only if its first element appears exactly once:
from collections import Counter
count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]
You can use a dictionary and count the occurrences of each key.
You can also use Counter from the collections library, which does exactly that.
Do as follows (building the Counter once, outside the loop):
from collections import Counter
counts = Counter(k for k, _, _ in array)
removed = []
for k, val1, val2 in array:
    if counts[k] == 1:
        removed.append([k, val1, val2])
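Since pandas appears elsewhere in this thread, it is worth noting that DataFrame.drop_duplicates with keep=False does the same job in one call. This is a sketch, not part of the original answers:

```python
import pandas as pd

array = [['abc', 2, 3], ['abc', 2, 3], ['bb', 5, 5],
         ['bb', 4, 6], ['sa', 3, 5], ['tt', 2, 1]]

# keep=False drops every member of a duplicate group, not just the extras
result = (pd.DataFrame(array)
            .drop_duplicates(subset=0, keep=False)
            .values.tolist())
print(result)  # [['sa', 3, 5], ['tt', 2, 1]]
```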

Groupby with row index operation?

How can I select rows with a given operation on the row index (say, only even rows, or only rows where row % 5 == 0) in pandas?
Let's say I have a DataFrame df of [120 rows x 10 columns], and I want to create two DataFrames out of it: one from the even rows, df1 [60 rows x 10 columns], and one from the odd rows [60 rows x 10 columns].
You can slice the DataFrame using normal list-style slicing semantics:
first = df.iloc[::2]
second = df.iloc[1::2]
The first steps every 2 rows, starting from the first row;
the second does the same but starts from row 1, the second row.
As stated already, you may use iloc
df0 = df.iloc[::2]
df1 = df.iloc[1::2]
If you have a more complex selection scheme you may pass a boolean vector to iloc, e.g.:
def filter_by(idx):
    # param idx: an index
    # returns True if idx % 4 == 0 or idx % 4 == 1
    return idx % 4 == 0 or idx % 4 == 1

# a boolean vector is created by means of filter_by
df_new = df.iloc[[filter_by(i) for i in range(df.shape[0])]]
The filtering above then becomes:
df0 = df.iloc[[idx % 2 == 0 for idx in range(df.shape[0])]]
df1 = df.iloc[[idx % 2 == 1 for idx in range(df.shape[0])]]
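As the question's title hints, groupby can also produce the same even/odd split by grouping on a key derived from the row position. A sketch on a small illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({'x': range(6)})

# group rows by index parity; each group is one of the two sub-frames
groups = dict(list(df.groupby(df.index % 2)))
df0, df1 = groups[0], groups[1]
print(df0['x'].tolist())  # [0, 2, 4]
print(df1['x'].tolist())  # [1, 3, 5]
```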
