Related
I have a file containing multiple lines in format student code and followed by some answer. e.g
N00000047,B,,D,C,C,B,D,D,C,C,D,,A,B,D,C,,D,A,C,,D,B,D,C
N00000048,B,A,D,D,C,B,,D,C,C,D,B,A,B,A,D,B,D,A,C,A,A,B,D,D
N00000049,A,,D,D,C,B,D,,C,C,D,B,,B,A,C,C,D,A,C,A,A,B,D,D
N00000050,,C,,D,,D,D,A,C,A,A,B,A,B,A,D,B,D,A,C,D,A,B,D,D
N00000051,B,A,B,,C,B,D,A,C,C,D,D,A,B,A,C,B,C,A,,A,A,B,D,B
N00000052,B,A,D,D,,B,D,A,D,,D,B,A,B,A,C,B,C,A,C,A,A,B,D,D
N00000053,B,A,D,D,C,B,D,A,C,C,D,B,B,B,C,C,B,D,A,C,A,C,A,D,D
And now I have to find which is the most question was skipped by students by order which question, how many student skipped and how many % student skipped that question.
I was split then make a loop and add every entry of skipped question in a list and then got stuck in find the max duplicates values in a list (it can be more than 1 output).
This is some expected output:
Question that most people answer incorrectly: 10 - 4 - 0.20, 14 - 4 - 0.20, 16 - 4 - 0.20, 19 - 4 - 0.20, 22 - 4 - 0.20. In format : a - b - c which a is question number, b is how much student was skipped, c is it take how many percentage of total student in class. There are 3 question have the most skipped is 10, 14, 19 and 22 and they all have 4 skipped.
Edited:
I put all skipped question in a list and count for which question have a largest duplicate like this:
def find_max_count(list):
item_with_max_count = []
max_count = 0
for item in list:
item_count = list.count(item)
if item_count > max_count:
max_count = list.count
for item1 in list:
if list.count(item1) == max_count:
item_with_max_count.append(item1)
return item_with_max_count
but there is an error:
TypeError: '>' not supported between instances of 'int' and 'builtin_function_or_method'
Start by accumulating a dictionary of all responses to each question and a list of all skipped answers:
from collections import defaultdict
responses = defaultdict(list) # all responses to a given question
skipped = [] # all skiped answers
for record in data.splitlines():
student_id, *answers = record.split(',')
for question_number, answer in enumerate(answers, start=1):
responses[question_number].append(answer)
if answer == '':
skipped.append(question_number)
Next perform the analysis:
from statistics import multimode
print('Most skipped questions:', multimode(skipped))
print('Answer for questions with more than two or more skips')
for question, answers in responses.items():
if answers.count('') >= 2:
print(f'Question {question}: {answers}')
This outputs:
Most skipped questions: [2, 5]
Answer for questions with more than two or more skips
Question 2: ['', 'A', '', 'C', 'A', 'A', 'A']
Question 5: ['C', 'C', 'C', '', 'C', '', 'C']
I'm certain this is what you wanted (a target output wasn't shown), but this should get you started the key techniques for analysis. In particular, the multimode function is super helpful in identifying most frequent occurrences including ties for first place. Also defaultdict is super useful for transposing the data from answers by student to answers by question.
Let's get a dictionary with the student id and the answers.
data = """
N00000047,B,,D,C,C,B,D,D,C,C,D,,A,B,D,C,,D,A,C,,D,B,D,C
N00000048,B,A,D,D,C,B,,D,C,C,D,B,A,B,A,D,B,D,A,C,A,A,B,D,D
N00000049,A,,D,D,C,B,D,,C,C,D,B,,B,A,C,C,D,A,C,A,A,B,D,D
N00000050,,C,,D,,D,D,A,C,A,A,B,A,B,A,D,B,D,A,C,D,A,B,D,D
N00000051,B,A,B,,C,B,D,A,C,C,D,D,A,B,A,C,B,C,A,,A,A,B,D,B
N00000052,B,A,D,D,,B,D,A,D,,D,B,A,B,A,C,B,C,A,C,A,A,B,D,D
N00000053,B,A,D,D,C,B,D,A,C,C,D,B,B,B,C,C,B,D,A,C,A,C,A,D,D
"""
info = {s: a
for line in data.strip().split('\n')
for s, *a in [line.split(',')]}
# {'N00000047': ['B', '', 'D', 'C', 'C', 'B', 'D', 'D', 'C', 'C', 'D', '', 'A', 'B', 'D', 'C', '', 'D', 'A', 'C', '', 'D', 'B', 'D', 'C'],
# 'N00000048': ['B', 'A', 'D', 'D', 'C', 'B', '', 'D', 'C', 'C', 'D', 'B', 'A', 'B', 'A', 'D', 'B', 'D', 'A', 'C', 'A', 'A', 'B', 'D', 'D'],
# 'N00000049': ['A', '', 'D', 'D', 'C', 'B', 'D', '', 'C', 'C', 'D', 'B', '', 'B', 'A', 'C', 'C', 'D', 'A', 'C', 'A', 'A', 'B', 'D', 'D'],
# 'N00000050': ['', 'C', '', 'D', '', 'D', 'D', 'A', 'C', 'A', 'A', 'B', 'A', 'B', 'A', 'D', 'B', 'D', 'A', 'C', 'D', 'A', 'B', 'D', 'D'],
# 'N00000051': ['B', 'A', 'B', '', 'C', 'B', 'D', 'A', 'C', 'C', 'D', 'D', 'A', 'B', 'A', 'C', 'B', 'C', 'A', '', 'A', 'A', 'B', 'D', 'B'],
# 'N00000052': ['B', 'A', 'D', 'D', '', 'B', 'D', 'A', 'D', '', 'D', 'B', 'A', 'B', 'A', 'C', 'B', 'C', 'A', 'C', 'A', 'A', 'B', 'D', 'D'],
# 'N00000053': ['B', 'A', 'D', 'D', 'C', 'B', 'D', 'A', 'C', 'C', 'D', 'B', 'B', 'B', 'C', 'C', 'B', 'D', 'A', 'C', 'A', 'C', 'A', 'D', 'D']}
Now, we can use collections.Counter to count up answers.
from collections import Counter
info = {s: Counter(a)
for line in data.strip().split('\n')
for s, *a in [line.split(',')]}
# {'N00000047': Counter({'D': 8, 'C': 7, 'B': 4, '': 4, 'A': 2}),
# 'N00000048': Counter({'D': 8, 'B': 6, 'A': 6, 'C': 4, '': 1}),
# 'N00000049': Counter({'D': 7, 'C': 6, 'A': 5, 'B': 4, '': 3}),
# 'N00000050': Counter({'D': 8, 'A': 7, 'B': 4, '': 3, 'C': 3}),
# 'N00000051': Counter({'B': 7, 'A': 7, 'C': 5, 'D': 4, '': 2}),
# 'N00000052': Counter({'A': 7, 'D': 7, 'B': 6, 'C': 3, '': 2}),
# 'N00000053': Counter({'D': 7, 'C': 7, 'B': 6, 'A': 5})}
From here, finding the statistical data you're looking for should be much easier. For instance:
(Requires Python 3.8+ for := operator.)
{q: {'all': (c := Counter(a)),
'skipped': (s := c['']),
'percentage': s / len(a)}
for line in data.strip().split('\n')
for q, *a in [line.split(',')]}
# {'N00000047': {'all': Counter({'D': 8, 'C': 7, 'B': 4, '': 4, 'A': 2}), 'skipped': 4, 'percentage': 0.16},
# 'N00000048': {'all': Counter({'D': 8, 'B': 6, 'A': 6, 'C': 4, '': 1}), 'skipped': 1, 'percentage': 0.04},
# 'N00000049': {'all': Counter({'D': 7, 'C': 6, 'A': 5, 'B': 4, '': 3}), 'skipped': 3, 'percentage': 0.12},
# 'N00000050': {'all': Counter({'D': 8, 'A': 7, 'B': 4, '': 3, 'C': 3}), 'skipped': 3, 'percentage': 0.12},
# 'N00000051': {'all': Counter({'B': 7, 'A': 7, 'C': 5, 'D': 4, '': 2}), 'skipped': 2, 'percentage': 0.08},
# 'N00000052': {'all': Counter({'A': 7, 'D': 7, 'B': 6, 'C': 3, '': 2}), 'skipped': 2, 'percentage': 0.08},
# 'N00000053': {'all': Counter({'D': 7, 'C': 7, 'B': 6, 'A': 5}), 'skipped': 0, 'percentage': 0.0}}
Something like?:
cat skipped.csv
N00000047,B,,D,C,C,B,D,D,C,C,D,,A,B,D,C,,D,A,C,,D,B,D,C
N00000048,B,A,D,D,C,B,,D,C,C,D,B,A,B,A,D,B,D,A,C,A,A,B,D,D
N00000049,A,,D,D,C,B,D,,C,C,D,B,,B,A,C,C,D,A,C,A,A,B,D,D
N00000050,,C,,D,,D,D,A,C,A,A,B,A,B,A,D,B,D,A,C,D,A,B,D,D
N00000051,B,A,B,,C,B,D,A,C,C,D,D,A,B,A,C,B,C,A,,A,A,B,D,B
N00000052,B,A,D,D,,B,D,A,D,,D,B,A,B,A,C,B,C,A,C,A,A,B,D,D
N00000053,B,A,D,D,C,B,D,A,C,C,D,B,B,B,C,C,B,D,A,C,A,C,A,D,D
import csv
from collections import Counter
with open("skipped.csv", "r" , newline="") as csv_file:
reader = csv.reader(csv_file)
l = []
for line in reader:
d = {"q": line.pop(0)}
ct = Counter(line)
# In Python 3.10+ you can do ct.total() instead of below.
q_sum = sum(ct.values())
skipped = ct['']
perc = skipped/q_sum
d.update({"skipped": skipped, "percentage": perc})
l.append(d)
l.sort(key=lambda x: x['skipped'], reverse=True)
l
[{'q': 'N00000047', 'skipped': 4, 'percentage': 0.16},
{'q': 'N00000049', 'skipped': 3, 'percentage': 0.12},
{'q': 'N00000050', 'skipped': 3, 'percentage': 0.12},
{'q': 'N00000051', 'skipped': 2, 'percentage': 0.08},
{'q': 'N00000052', 'skipped': 2, 'percentage': 0.08},
{'q': 'N00000048', 'skipped': 1, 'percentage': 0.04},
{'q': 'N00000053', 'skipped': 0, 'percentage': 0.0}]
I have been trying to rotate a list left and right in python
def rotate(l, r):
return l[r:] + l[:r]
l = eval(input())
r = int(input())
print(rotate(l, r))
but if i give input list as ['A','B','C','D',1,2,3,4,5] and r = -34 I'm getting output as
['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
but actual output is this :
['C', 'D', 1, 2, 3, 4, 5, 'A', 'B']
Can anyone tell how can I do it?
First you could use print() and test it for different values
def rotate(l, r):
return l[r:] + l[:r]
l = ['A','B','C','D',1,2,3,4,5]
print('len(l):', len(l))
for r in range(0, -34, -1):
print(f"{r:3}", rotate(l, r))
And you see
len(l): 9
0 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-1 [5, 'A', 'B', 'C', 'D', 1, 2, 3, 4]
-2 [4, 5, 'A', 'B', 'C', 'D', 1, 2, 3]
-3 [3, 4, 5, 'A', 'B', 'C', 'D', 1, 2]
-4 [2, 3, 4, 5, 'A', 'B', 'C', 'D', 1]
-5 [1, 2, 3, 4, 5, 'A', 'B', 'C', 'D']
-6 ['D', 1, 2, 3, 4, 5, 'A', 'B', 'C']
-7 ['C', 'D', 1, 2, 3, 4, 5, 'A', 'B']
-8 ['B', 'C', 'D', 1, 2, 3, 4, 5, 'A']
-9 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-10 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-11 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-12 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-13 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-14 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-15 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-16 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-17 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-18 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-19 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-20 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-21 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-22 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-23 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-24 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-25 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-26 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-27 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-28 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-29 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-30 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-31 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-32 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-33 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
When -r is bigger then len(r) then it doesn't work as you would expect.
It gets empty list + full list or full list + empty list
The same problem is with +34 and -34.
Because you get the same list for r=len(l), r=len(l)*2, ...r=len(l)*n so you would use modulo (r % len(l)) to have value smaller then len(l) and get what you need.
def rotate(l, r):
r = r % len(l)
return l[r:] + l[:r]
l = ['A','B','C','D',1,2,3,4,5]
print('len(l):', len(l))
for r in range(0, -34, -1):
print(f"{r:3}", rotate(l, r))
Result:
len(l): 9
0 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-1 [5, 'A', 'B', 'C', 'D', 1, 2, 3, 4]
-2 [4, 5, 'A', 'B', 'C', 'D', 1, 2, 3]
-3 [3, 4, 5, 'A', 'B', 'C', 'D', 1, 2]
-4 [2, 3, 4, 5, 'A', 'B', 'C', 'D', 1]
-5 [1, 2, 3, 4, 5, 'A', 'B', 'C', 'D']
-6 ['D', 1, 2, 3, 4, 5, 'A', 'B', 'C']
-7 ['C', 'D', 1, 2, 3, 4, 5, 'A', 'B']
-8 ['B', 'C', 'D', 1, 2, 3, 4, 5, 'A']
-9 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-10 [5, 'A', 'B', 'C', 'D', 1, 2, 3, 4]
-11 [4, 5, 'A', 'B', 'C', 'D', 1, 2, 3]
-12 [3, 4, 5, 'A', 'B', 'C', 'D', 1, 2]
-13 [2, 3, 4, 5, 'A', 'B', 'C', 'D', 1]
-14 [1, 2, 3, 4, 5, 'A', 'B', 'C', 'D']
-15 ['D', 1, 2, 3, 4, 5, 'A', 'B', 'C']
-16 ['C', 'D', 1, 2, 3, 4, 5, 'A', 'B']
-17 ['B', 'C', 'D', 1, 2, 3, 4, 5, 'A']
-18 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-19 [5, 'A', 'B', 'C', 'D', 1, 2, 3, 4]
-20 [4, 5, 'A', 'B', 'C', 'D', 1, 2, 3]
-21 [3, 4, 5, 'A', 'B', 'C', 'D', 1, 2]
-22 [2, 3, 4, 5, 'A', 'B', 'C', 'D', 1]
-23 [1, 2, 3, 4, 5, 'A', 'B', 'C', 'D']
-24 ['D', 1, 2, 3, 4, 5, 'A', 'B', 'C']
-25 ['C', 'D', 1, 2, 3, 4, 5, 'A', 'B']
-26 ['B', 'C', 'D', 1, 2, 3, 4, 5, 'A']
-27 ['A', 'B', 'C', 'D', 1, 2, 3, 4, 5]
-28 [5, 'A', 'B', 'C', 'D', 1, 2, 3, 4]
-29 [4, 5, 'A', 'B', 'C', 'D', 1, 2, 3]
-30 [3, 4, 5, 'A', 'B', 'C', 'D', 1, 2]
-31 [2, 3, 4, 5, 'A', 'B', 'C', 'D', 1]
-32 [1, 2, 3, 4, 5, 'A', 'B', 'C', 'D']
-33 ['D', 1, 2, 3, 4, 5, 'A', 'B', 'C']
BTW:
Without modulo you would have to use for-loops with [1:], [:1] or [-1:],[:-1] - but it need many moves - so it may need more time and memory (but for small list it is not visible).
def rotate(l, r):
if r >= 0:
for _ in range(0, r, 1):
l = l[1:] + l[:1]
else:
for _ in range(0, r, -1):
l = l[-1:] + l[:-1]
return l
l = ['A','B','C','D',1,2,3,4,5]
print('len(l):', len(l))
#for r in range(0, -34, -1):
# print(f"{r:3}", rotate(l, r))
for r in range(0, 34, 1):
print(f"{r:3}", rotate(l, r))
The same with one for-loop
def rotate(l, r):
if r >= 0:
s = 1
else:
s = -1
for _ in range(0, r, s):
l = l[s:] + l[:s]
return l
If r can be bigger than the list you need to add the modulo operater as #tim-roberts mentioned:
def rotate(l, r):
r = r % len(l)
return l[r:] + l[:r]
Outputs
l = [1,2,3]
print(rotate(l,0))
[1, 2, 3]
print(rotate(l,1))
[2, 3, 1]
print(rotate(l,-1))
[3, 1, 2]
print(rotate(l,4))
[2, 3, 1]
print(rotate(l,-4))
[3, 1, 2]
(personally I'd also turn around the rotation direction, using e.g. -r)
I want to create a list in Python from the dataset below:
import pandas as pd
df = pd.DataFrame({
`'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',` `'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],`
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
With some data wrangling by transposing the data to convert the value with this code:
#Transpose data
dfT = pd.DataFrame(df.T).reset_index(inplace=False)
dfT
#Data converting
df = df.replace({'C': -1, 'F': -2, 'Z': -3}).astype(int).T
df
The data frame look like this:
For example,
#in column 0, max value is 8,
#in column 1, max value is 8,
#in column 2, max value is 7,
.....
and so on until column 23.
Final result that I expected should be a list that consists a maximum value from each column:
max_val = [8,8,7,6,5,4,3,2,6,5,...,2,2,2,2,2,2,2,2,2,2]
You can try this:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
#Transpose data
dfT = pd.DataFrame(df.T).reset_index(inplace=False)
#Data converting
df = df.replace({'C': -1, 'F': -2, 'Z': -3}).astype(int).T
max_val = list(df.max())
print(max_val)
Output:
[8, 8, 7, 6, 5, 4, 3, 2, 6, 5, 4, 3, 2, 3, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2]
I have a dataframe in Python below:
import pandas as pd
df = pd.DataFrame({
'CRDACCT_DLQ_CYC_1_MNTH_AGO' : [3, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_2_MNTH_AGO': [4, 3, 3, 3, 3, 3, 2, 0, 5, 4, 3, 2, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2],
'CRDACCT_DLQ_CYC_3_MNTH_AGO': [8, 7, 6, 5, 4, 3, 2, 'F', 'F', 0, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'C', 'F', 'F'],
'CRDACCT_DLQ_CYC_4_MNTH_AGO' : [0, 2, 'F', 'F', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'F', 'C', 'F'],
'CRDACCT_DLQ_CYC_5_MNTH_AGO' : [2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_6_MNTH_AGO' : [2, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 0, 2, 0, 2, 0],
'CRDACCT_DLQ_CYC_7_MNTH_AGO' : [3, 3, 2, 'C', 'C', 'C', 'F', 0, 6, 5, 4, 3, 2, 2, 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'CRDACCT_DLQ_CYC_8_MNTH_AGO' : [5, 4, 4, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 'C', 'C', 0, 2, 2, 2, 'C', 'C', 0, 'Z'],
'CRDACCT_DLQ_CYC_9_MNTH_AGO' : [2, 2, 'C', 0, 2, 0, 2, 'C', 'C', 'C', 'C', 'C', 0, 3, 2, 'C', 'F', 'C', 'F', 'F', 'F', 'F', 'F', 'F'],
'CRDACCT_DLQ_CYC_10_MNTH_AGO' : [5, 4, 3, 2, 3, 2, 0, 2, 0, 2, 'C', 'C', 'F', 2, 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'C'],
'CRDACCT_DLQ_CYC_11_MNTH_AGO' : [4, 3, 2, 'F', 2, 0, 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z', 'Z'],
'CRDACCT_DLQ_CYC_12_MNTH_AGO' : ['F', 8, 7, 6, 5, 4, 3, 2, 'C', 'C', 'C', 0, 2, 'C', 'C', 0, 2, 0, 3, 2, 'C', 'C', 'F', 2]
})
df.head()
I want to transform those values (string value: C, F, and Z) into some categories with this condition:
if values in column CRDACCT_DLQ_CYC_1_MNTH_AGO, CRDACCT_DLQ_CYC_2_MNTH_AGO, ......., CRDACCT_DLQ_CYC_12_MNTH_AGO consist:
C = -1
F = -2
Z = -3
else value = value
Then I transpose the table to identify Month Since Dlq (MSD).
dfT =pd.DataFrame(df.T).reset_index(inplace=False)
dfT
I want to create a list with the name of MSD. MSD is identified for value if it is greater than 1 ( value > 1). For example, in index 2 CRDACCT_DLQ_CYC_1_MNTH_AGO = C or after it has changed = -1 which is not greater than 1. Then, check CRDACCT_DLQ_CYC_2_MNTH_AGO is greater than 1? CRDACCT_DLQ_CYC_2_MNTH_AGO = 3 is greater than 1. Hence, the MSD is 2 because it's in CRDACCT_DLQ_CYC_2_MNTH_AGO. Detail flow chart & overview table for identification .
The MSD value is between 1 and 12 depends on i in CRDACCT_DLQ_CYC_i_MNTH_AGO, for i = 1,2,3,...,12.
So the final result is a MSD list with 24 value, identified for each index 0 -23.
Does it what you are looking for:
# From your dataframe
MSD = df.T.apply(pd.to_numeric, errors='coerce').ge(1).idxmax(axis=0) \
.str.extract(r'CYC_(\d+)_MNTH', expand=False).astype(int).tolist()
print(MSD)
# Output:
[1, 1, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 7, 2, 2, 2, 2, 2, 2, 8, 2, 2, 6, 2]
I have a huge list of lists.
I want to Merge all child lists to parent list
and remove duplicates item from parent list after merge.
What is the optimized way to do this?
For Example:
x = [['a', 'b', 'c', 2, 4], ['x', 1, 2, 3, 'z'], ['z', 'b', 'y', 'a' 'x']]
How we can get the value of x like:
['a', 'b', 'c', 1, 2, 3, 4, 'z', 'y', 'x']
Use set and chain:
x = [['a', 'b', 'c', 2, 4], ['x', 1, 2, 3, 'z'], ['z', 'b', 'y', 'a' 'x']]
from itertools import chain
result = list(set(chain.from_iterable(x)))
print(result)
Use set
x = [['a', 'b', 'c', 2, 4], ['x', 1, 2, 3, 'z'], ['z', 'b', 'y', 'a' 'x']]
>>> list(set([item for sublist in x for item in sublist]))
[1, 2, 3, 4, 'z', 'ax', 'a', 'b', 'c', 'x', 'y']
first you can convert the list of list into a one list and than apply set to that list.
x = [['a', 'b', 'c', 2, 4], ['x', 1, 2, 3, 'z'], ['z', 'b', 'y', 'a' 'x']]
new_ls=[]
for ls in x:
new_ls.extend(ls)
print(list(set(new_ls))
output:
[1, 2, 3, 4, 'ax', 'b', 'y', 'x', 'c', 'z', 'a']