pandas iterate over column values at once and generate range

I have a pandas dataframe like the one below:
df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})
I would like to pass all the values from each column to a for loop at once.
For example, I am trying the below:
cols = ['biz','seg','PID']
for col in cols:
    for i, j in df1.col.values:
        print("D" + str(i) + ":" + "F" + str(j))
        print("Q" + str(i) + ":" + "S" + str(j))
        print("AB" + str(i) + ":" + "AD" + str(j))
but this doesn't work and I get an error:
TypeError: cannot unpack non-iterable numpy.int64 object
I expect my output to be as below:
D18:F23
Q18:S23
AB18:AD23
D30:F34
Q30:S34
AB30:AD34
D40:F52
Q40:S52
AB40:AD52

The mistake is in the innermost for loop.
You are iterating over a 1-dimensional array of values; that iterator yields scalars, which cannot be unpacked into i and j.
Note also that df1.col looks up a column literally named col; use getattr(df1, col) or df1[col] to use the loop variable.
If your dataframe only has 2 items per column, then this should suffice:
cols = ['biz','seg','PID']
for col in cols:
    i, j = getattr(df1, col).values
    print("D" + str(i) + ":" + "F" + str(j))
    print("Q" + str(i) + ":" + "S" + str(j))
    print("AB" + str(i) + ":" + "AD" + str(j))
Alternatives
Pandas using loc
This is actually the simplest way to solve it, though it only occurred to me now. We use the column name col with loc to get all rows (given by : in loc[:, col]):
cols = ['biz','seg','PID']
for col in cols:
    i, j = df1.loc[:, col].values
Attrgetter
We can use attrgetter from the operator library to fetch one attribute, or as many as we want, in a single call:
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
for col in cols:
    i, j = col.values
Attrgetter 2
This approach is similar to the one above, except that we select multiple columns and collect i and j into two tuples, with each entry corresponding to one column.
from operator import attrgetter
cols = ['biz','seg','PID']
cols = attrgetter(*cols)(df1)
cols = [col.values for col in cols]
all_i, all_j = zip(*cols)
Pandas solution
This approach uses just pandas functions. It gets the column index using the df1.columns.get_loc(col_name) function, and then uses .iloc to index the values. In .iloc[a,b] we use : in place of a to select all rows, and index in place of b to select just the column.
cols = ['biz','seg','PID']
for col in cols:
    index = df1.columns.get_loc(col)
    i, j = df1.iloc[:, index]
    # do the printing here
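Putting the pieces above together, here is a minimal sketch that produces the expected output from the question. The letter_pairs list and the ranges variable are names introduced here for illustration, and the unpacking still assumes every column has exactly two rows.

```python
import pandas as pd

df1 = pd.DataFrame({'biz': [18, 23], 'seg': [30, 34], 'PID': [40, 52]})

# Hypothetical helper data: the letter prefixes used in the question's output.
letter_pairs = [('D', 'F'), ('Q', 'S'), ('AB', 'AD')]

ranges = []
for col in ['biz', 'seg', 'PID']:
    # Unpacking works only because each column holds exactly two values.
    start, end = df1[col].to_numpy()
    for left, right in letter_pairs:
        ranges.append(f'{left}{start}:{right}{end}')

print('\n'.join(ranges))
```

Indexing with df1[col] avoids getattr and also works for column names that are not valid Python identifiers.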

# stop one row early so that i+1 stays inside the dataframe
for i in range(len(df1) - 1):
    print('D' + str(df1.iloc[i,0]) + ':' + 'F' + str(df1.iloc[i+1,0]))
    print('Q' + str(df1.iloc[i,0]) + ':' + 'S' + str(df1.iloc[i+1,0]))
    print('AB' + str(df1.iloc[i,0]) + ':' + 'AD' + str(df1.iloc[i+1,0]))
    print('D' + str(df1.iloc[i,1]) + ':' + 'F' + str(df1.iloc[i+1,1]))
    print('Q' + str(df1.iloc[i,1]) + ':' + 'S' + str(df1.iloc[i+1,1]))
    print('AB' + str(df1.iloc[i,1]) + ':' + 'AD' + str(df1.iloc[i+1,1]))
    print('D' + str(df1.iloc[i,2]) + ':' + 'F' + str(df1.iloc[i+1,2]))
    print('Q' + str(df1.iloc[i,2]) + ':' + 'S' + str(df1.iloc[i+1,2]))
    print('AB' + str(df1.iloc[i,2]) + ':' + 'AD' + str(df1.iloc[i+1,2]))

Related

print statement never gets executed in a loop

In the code posted below, the first nested for loop prints its logs as expected, but the second nested for loop, which uses k and l as indices, never prints anything.
Please let me know why the print statement
print(str(x) + ",,,,,,,,,,,,,,,,,,," + str(y))
is never displayed even though polygonCoordinatesInEPSG25832 contains values.
Python code:
for feature in featuresArray:
    polygonCoordinatesInEPSG4326.append(WebServices.fetchCoordinateForForFeature(feature))
for i in range(len(polygonCoordinatesInEPSG4326)):
    for j in range(len(polygonCoordinatesInEPSG4326[i])):
        lon = polygonCoordinatesInEPSG4326[i][j][0]
        lat = polygonCoordinatesInEPSG4326[i][j][1]
        x, y = transform(inputProj, outputProj, lon, lat)
        xy.append([x,y])
        print ("lon:" + str(lon) + "," + "lat:" + str(lat) + "<=>" + "x:" + str(x) + "," + "y:" , str(y))
        print(str(x) + "," + str(y))
        print("xy[%d]: %s"%(len(xy)-1,str(xy[len(xy)-1])))
        print("\n")
    print("len(xy): %d"%(len(xy)))
    polygonCoordinatesInEPSG25832.append(xy)
    print("len(polygonCoordinatesInEPSG25832[%d]: %d"%(i,len(polygonCoordinatesInEPSG25832[i])))
    xy.clear()
print("len(polygonCoordinatesInEPSG25832 = %d" %(len(polygonCoordinatesInEPSG25832)))
for k in range(len(polygonCoordinatesInEPSG25832)):
    for l in range(len(polygonCoordinatesInEPSG25832[k])):
        x = polygonCoordinatesInEPSG25832[k][l][0]
        y = polygonCoordinatesInEPSG25832[k][l][1]
        print(str(x) + ",,,,,,,,,,,,,,,,,,," + str(y))

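polygonCoordinatesInEPSG25832 contains values, but each polygonCoordinatesInEPSG25832[k] is empty.
You append xy itself, not a copy, so every entry refers to the same list object; when you call xy.clear(), the entries become empty too. Append a copy instead (for example a deep copy), or create a new xy list for each polygon.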
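The aliasing described above can be reproduced without any of the coordinate-transform code. The following is a small sketch with made-up stand-in data (polygons, aliased and copied are names chosen here, not from the question) that contrasts the buggy pattern with the copy-based fix.

```python
import copy

# Hypothetical stand-in data: two "polygons", each a list of [x, y] pairs.
polygons = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]

# Buggy pattern: the same list object is appended each time, then cleared.
aliased = []
xy = []
for polygon in polygons:
    for x, y in polygon:
        xy.append([x, y])
    aliased.append(xy)   # stores a reference to xy, not a snapshot
    xy.clear()           # empties the one list every entry refers to

# Fixed pattern: append an independent copy before clearing.
copied = []
xy = []
for polygon in polygons:
    for x, y in polygon:
        xy.append([x, y])
    copied.append(copy.deepcopy(xy))
    xy.clear()

print(aliased)  # [[], []]
print(copied)   # [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
```

Creating a fresh xy = [] at the top of each outer iteration, instead of calling clear(), would avoid the copy altogether.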

Dynamically create string from pandas column

I have two data frames like the ones below; one is df and the other is anomalies:
d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df = pd.DataFrame(data=d)
Basically, anomalies is a mirror copy of df, except that its values are 0 or 1: 1 marks an anomaly and 0 a non-anomaly.
d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
anomalies = pd.DataFrame(data=d)
and I am converting them into a specific format with the code below:
details = (
    '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' +
    '\n' + '10028:' + '\t' + str(df.tail(1)['10028'][0]) + '\t' + str(anomalies['10028'][0]) +
    '\n' + '1058:' + '\t' + '\t' + str(df.tail(1)['1058'][0]) + '\t' + str(anomalies['1058'][0]) +
    '\n' + '20120:' + '\t' + str(df.tail(1)['20120'][0]) + '\t' + str(anomalies['20120'][0]) +
    '\n' + '20121:' + '\t' + str(round(df.tail(1)['20121'][0], 2)) + '\t' + str(anomalies['20121'][0]) +
    '\n' + '20122:' + '\t' + str(round(df.tail(1)['20122'][0], 2)) + '\t' + str(anomalies['20122'][0]) +
    '\n' + '20123:' + '\t' + str(round(df.tail(1)['20123'][0], 3)) + '\t' + str(anomalies['20123'][0]) +
    '\n' + '5043:' + '\t' + str(round(df.tail(1)['5043'][0], 3)) + '\t' + str(anomalies['5043'][0]) +
    '\n' + '5046:' + '\t' + str(round(df.tail(1)['5046'][0], 3)) + '\t' + str(anomalies['5046'][0]) +
    '\n\n' + 'message:' + '\t' +
    'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.
How can I create details dynamically, depending on which columns are present in the data frame, without hard-coding them? In the message I also want to include the names of the columns whose value is 1.
The anomaly value of each column will always be either 1 or 0.

Try this:
# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly'
# dynamically add the data
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for idx, val in df.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{anomalies[idx][0]}'
    # for Python 3.5 and below, use this
    # s += '\n{}\t{}\t{}'.format(idx, val, anomalies[idx][0])
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
)
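The answer above leaves the "[values where anomalies == 1]" placeholder in the message. One possible way to fill it in, sketched under the assumption that anomalies has the same columns as df and holds only 0 or 1, is to collect the flagged column names first. The names last_counts, last_flags and flagged are introduced here for illustration, and the sample data is shortened.

```python
import pandas as pd

# Shortened sample data in the shape used in the question.
df = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29], '20121': [22]})
anomalies = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1], '20121': [0]})

last_counts = df.iloc[-1]        # latest count per metric
last_flags = anomalies.iloc[-1]  # matching 0/1 anomaly flag per metric

# One tab-separated line per column, whatever the columns happen to be.
lines = ['Metric Name\tCount\tAnomaly']
for name, count in last_counts.items():
    lines.append(f'{name}:\t{count}\t{last_flags[name]}')

# Names of the columns whose anomaly flag is 1.
flagged = [name for name, flag in last_flags.items() if flag == 1]

message = ('message:\tSomething wrong with the platform as there is a spike in '
           + ', '.join(flagged) + '.')
details = '\n' + '\n'.join(lines) + '\n\n' + message
print(details)
```

If no column is flagged, the sentence ends with an empty list of names, so a real script would probably want to skip the message in that case.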

I'm trying to create a series of random numbers in Python

I have a script here that I created to generate a string of random numbers. I do not know how to make the numbers come out random every time it is used. For instance, I want it to be like this:
0,1,2,3,5
1,4,7,3,5
2,8,2,1,3
Instead I am getting:
0,1,2,3,5
1,1,2,3,5
2,1,2,3,5
The first number is running in sequence which is great. But I need the other 4 numbers to change.
inter1rand = random.randint(1, 1)
inter1time = inter1rand
inter1 = inter1time
print (inter1)
inter2rand = random.randint(24, 31)
inter2time = inter2rand + inter1time
inter2 = inter2time
print (inter2)
inter3rand = random.randint(34, 55)
inter3time = inter3rand + inter2time
inter3 = inter3time
print (inter3)
mouse1 = random.randint(110, 1100)
mouse2 = random.randint(875, 1100)
mouse3 = random.randint(375, 607)
col_list1 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(0,inter1)]
col_list2 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(inter1,inter2)]
col_list3 = [str(n) + ',1,' + str(mouse1) + ',' + str(mouse2) + ',' + str(mouse3) + ';' for n in range(inter2,inter3)]
#print (col_list1)
#print (col_list1 + col_list2)
#print (col_list1 + col_list2 + col_list3)

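mouse1, mouse2 and mouse3 are each computed once, before the list comprehensions run, so every row reuses the same three values. Change the code to call random.randint directly inside the comprehensions, so that it runs once per row: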
col_list1 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(0,inter1)]
col_list2 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(inter1,inter2)]
col_list3 = [str(n) + ',1,' + str(random.randint(110, 1100)) + ',' + str(random.randint(875, 1100)) + ',' + str(random.randint(375, 607)) + ';' for n in range(inter2,inter3)]
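Since the three comprehensions differ only in their range, the same idea can be written once. This is just a sketch: make_rows is a hypothetical helper name, and the bounds are copied from the question.

```python
import random

def make_rows(start, stop):
    """Build one 'n,1,a,b,c;' string per n, drawing fresh random values for every row."""
    return [
        f"{n},1,{random.randint(110, 1100)},{random.randint(875, 1100)},{random.randint(375, 607)};"
        for n in range(start, stop)
    ]

rows = make_rows(0, 5)
for row in rows:
    print(row)
```

Each call to random.randint happens inside the comprehension, so it is evaluated again for every value of n.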

You can also check out NumPy's np.random.choice() function, which draws pseudo-random values:
>>> np.random.choice(10,3)
array([9, 4, 2])
>>> np.random.choice(10,3)
array([6, 7, 8])
>>> np.random.choice(31,3)
array([11, 28, 10])
>>> np.random.choice([0,1],3)
array([0, 1, 1])
Passing an integer n draws from range(n), so you cannot give a lower bound directly. Values can repeat by default; pass replace=False to rule out repeats.
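For new code, NumPy's Generator interface offers the same kind of sampling. The snippet below is a sketch using np.random.default_rng, which is a swapped-in alternative to the np.random.choice calls shown above rather than what the answer used; the variable names are chosen here.

```python
import numpy as np

# Pass a seed, e.g. default_rng(42), if reproducible output is needed.
rng = np.random.default_rng()

# Five integers from 0 to 9 (upper bound excluded); values may repeat.
with_repeats = rng.integers(0, 10, size=5)

# Three distinct values from 0 to 9; replace=False rules out repeats.
no_repeats = rng.choice(10, size=3, replace=False)

print(with_repeats)
print(no_repeats)
```

rng.integers accepts both a lower and an upper bound, which covers the case where a range such as 110 to 1100 is needed.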

Use the random module, and its randint:
import random
for u in range(3):
    r = [random.randint(0, 10) for i in range(5)]
    for d in r[:-1]:
        print(d, end=', ')
    print(r[-1])
Example output (the values differ on every run):
9, 6, 0, 4, 1
7, 9, 2, 2, 1
4, 10, 6, 5, 8

There are a lot of ways to do this; I've linked some sources below:
https://www.tutorialspoint.com/generating-random-number-list-in-python
https://docs.python.org/3/library/secrets.html
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

Error when accessing pandas series by index

I have a piece of code:
texts = []
for i in range(len(df.index)):
    _text = ''
    for c in df.columns:
        _text += c + ':' + str(df[c][i]) + '<br>'
    texts.append (_text)
and I get the error ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'> at the line _text += c + ':' + str(df[c][i]) + '<br>'.
I changed the [i] to every accessor I know (at, iat, loc, iloc), but none of them works; they all have the same problem.

This works for me: I inserted iloc[i] in line 5:
texts = []
for i in range(len(df.index)):
    _text = ''
    for c in df.columns:
        _text += c + ':' + str(df[c].iloc[i]) + '<br>'
    texts.append (_text)
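As an alternative to indexing by position, the rows can be iterated directly. This is only a sketch on made-up sample data; it assumes the column names are unique and may not address whatever caused the original ValueError.

```python
import pandas as pd

# Hypothetical sample data; the question does not show its dataframe.
df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})

texts = []
for _, row in df.iterrows():
    # row is a Series indexed by column name, so no positional index is needed.
    texts.append(''.join(f'{col}:{row[col]}<br>' for col in df.columns))

print(texts)
```

Because each row is looked up by column label, the loop does not depend on what the dataframe's index looks like.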

Organize dictionary by frequency

I create a dictionary of the most used words and get the top ten. I need the result in sorted order, but I can't do that without making a list, which I can't use. Here is my code. I am aware dictionaries cannot be sorted, but I still need help.
most_used_words = Counter()
zewDict = Counter(most_used_words).most_common(10)
newDict = dict(zewDict)
keys = newDict.keys()
values = newDict.values()
msg = ('Here is your breakdown of your most used words: \n\n'
       'Word | Times Used'
       '\n:--:|:--:'
       '\n' + str(keys[0]).capitalize() + '|' + str(values[0]) +
       '\n' + str(keys[1]).capitalize() + '|' + str(values[1]) +
       '\n' + str(keys[2]).capitalize() + '|' + str(values[2]) +
       '\n' + str(keys[3]).capitalize() + '|' + str(values[3]) +
       '\n' + str(keys[4]).capitalize() + '|' + str(values[4]) +
       '\n' + str(keys[5]).capitalize() + '|' + str(values[5]) +
       '\n' + str(keys[6]).capitalize() + '|' + str(values[6]) +
       '\n' + str(keys[7]).capitalize() + '|' + str(values[7]) +
       '\n' + str(keys[8]).capitalize() + '|' + str(values[8]) +
       '\n' + str(keys[9]).capitalize() + '|' + str(values[9]))
r.send_message(user, 'Most Used Words', msg)
How would I do it so the msg prints the words in order from most used word on the top to least on the bottom with the correct values for the word?
Edit: I know dictionaries cannot be sorted on their own, so can I work around this somehow?

Once you have the values it's as simple as:
import collections
print('Word | Times Used')
for e, t in collections.Counter(values).most_common(10):
    print("%s|%d" % (e,t))
This prints something like:
Word | Times Used
e|4
d|3
a|2
c|2

From the Docs: most_common([n])
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common() returns
all elements in the counter. Elements with equal counts are ordered
arbitrarily:
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
Your code can be:
from collections import Counter
c = Counter(most_used_words)
msg = "Here is your breakdown of your most used words:\n\nWords | Times Used\n:--:|:--:\n"
msg += '\n'.join('%s|%s' % (k.capitalize(), v) for (k, v) in c.most_common(10))
r.send_message(user, 'Most Used Words', msg)

import operator
newDict = dict(zewDict)
# dict.iteritems() exists only in Python 2; items() works in Python 3
sorted_newDict = sorted(newDict.items(), key=operator.itemgetter(1))
msg = ''
for key, value in sorted_newDict:
    # strings have no append(); build the message with +=
    msg += '\n' + str(key).capitalize() + '|' + str(value)
This will sort by the dictionary values, in ascending order. To put the most used word first, add reverse=True to sorted().
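Here is a runnable sketch of the sort-by-value approach with reverse=True applied, so the most used word comes first. The words list is made-up sample data, and the table header is taken from the question.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample input; the question builds its counts elsewhere.
words = ['spam', 'eggs', 'spam', 'ham', 'spam', 'eggs']
counts = dict(Counter(words))

# Sort (word, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)

msg = 'Word | Times Used\n:--:|:--:'
for word, count in ranked:
    msg += '\n' + word.capitalize() + '|' + str(count)

print(msg)
```

sorted() returns a list of pairs, so the dictionary itself is never reordered; only the order in which the message is built changes.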
