How to make a commonality? - python

I'm building a commonality table for some data, and I need a subset table with the number of appearances of each value in descending order.
I have a table:
The end result I want is a list in which the names do not appear and the rows are the months with their commonality, something like this:
I need it to be in descending order both horizontally and vertically (by number of appearances).

In the end, what worked for me is:
import csv
import pandas as pd
# takes a string like "line,num%" and returns num as a float
def take_second(elem):
    return float(elem[elem.find(",") + 1:elem.find("%")])
# sorts a list in descending order by the percentage stored in each cell
def sorting(lst):
    lst.sort(key=take_second, reverse=True)
    return lst
# DataFrame.from_csv has been removed from pandas; this read_csv call is its documented replacement
data = pd.read_csv('commonality_input.csv', index_col=0, parse_dates=True)
# counts how many times each value appeared for a given name
a = pd.Series({col: data[col].value_counts() for col in data.columns})
for i, val in enumerate(a):
    # converts the counts to percentages, rounded to 2 decimal places
    a[i] = round(100*val/val.sum(), 2)
# builds a DataFrame called 'b' with one row per name
b = pd.DataFrame(a.tolist())
# get the data frame size, and lists of headers for future use
line = list(b.columns.values)
name = list(b.index.values)
cols = len(b.columns)
rows = len(b)
for c in b.columns:
    b[c] = b[c].apply(lambda t: "{},{}%".format(c, t))
# copy the DataFrame to a list of lists, and create a new list without the NaN values
lol = b.values.tolist()
data_list = []
for items in lol:
    data_list.append([x for x in items if 'nan%' not in x])
# sort the values in every row in descending order, and add the names back
for l in range(rows):
    sorting(data_list[l])
    # the original popped from an undefined 'parameter'; the saved 'name' list is assumed here
    data_list[l].insert(0, name.pop(0))
# sort the rows by their highest percentage (the second item of each row) in descending order
data_list.sort(key=lambda x: take_second(x[1]), reverse=True)
# write the commonality table to a csv file
with open("test.csv", 'w', newline='') as result_file:
    wr = csv.writer(result_file)
    for item in data_list:
        wr.writerow(item)
Note that I'm pretty new to Python, so there is probably a more efficient way to write this program.
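One more compact way to build the same kind of table is sketched below using only the standard library. It assumes the input has already been loaded as a mapping from each name to a non-empty list of observed values. The `commonality_rows` helper and the sample data are hypothetical and only illustrate the counting and the two sorts.

```python
from collections import Counter


def commonality_rows(values_by_name):
    """Return one row per name: [name, "value,pct%", ...], most common first."""
    rows = []
    for name, values in values_by_name.items():
        counts = Counter(values)
        total = sum(counts.values())
        # most_common() yields (value, count) pairs in descending count order
        cells = [
            "{},{}%".format(value, round(100 * count / total, 2))
            for value, count in counts.most_common()
        ]
        top_pct = 100 * counts.most_common(1)[0][1] / total
        rows.append((top_pct, [name] + cells))
    # order the rows by their highest percentage, largest first
    rows.sort(key=lambda row: row[0], reverse=True)
    return [cells for _, cells in rows]


# hypothetical sample data: the months observed for each name
sample = {
    "alice": ["jan", "jan", "feb", "mar"],
    "bob": ["feb", "feb", "feb", "jan"],
}
table = commonality_rows(sample)
for row in table:
    print(",".join(row))
```

Each row is already sorted horizontally by `most_common()`, so only the vertical sort needs an explicit key.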

Related

Pandas filter list of list values in a dataframe column

I have a dataframe like the one below:
sample_df = pd.DataFrame({'single_proj_name': [['jsfk'],['fhjk'],['ERRW'],['SJBAK']],
                          'single_item_list': [['ABC_123'],['DEF123'],['FAS324'],['HSJD123']],
                          'single_id':[[1234],[5678],[91011],[121314]],
                          'multi_proj_name':[['AAA','VVVV','SASD'],['QEWWQ','SFA','JKKK','fhjk'],['ERRW','TTTT'],['SJBAK','YYYY']],
                          'multi_item_list':[[['XYZAV','ADS23','ABC_123'],['ABC_123','ADC_123']],['XYZAV','DEF123','ABC_123','SAJKF'],['QWER12','FAS324'],['JFAJKA','HSJD123']],
                          'multi_id':[[[2167,2147,29481],[5432,1234]],[2313,57567,2321,7898],[1123,8775],[5237,43512]]})
I would like to do the following:
a) Pick the value from single_item_list for each row.
b) Search for that value in the multi_item_list column of the same row. Please note that it could be a list of lists for some of the rows.
c) If a match is found, keep only the matched values in multi_item_list and remove all non-matching values.
d) Based on the position of the matched item, look up the corresponding value in the multi_id list and keep only that item. Remove the items at all other positions from the list.
So, I tried the code below, but it doesn't work for nested lists of lists:
for a, b, c in zip(sample_df['single_item_list'],sample_df['multi_item_list'],sample_df['multi_id']):
    for i, x in enumerate(b):
        print(x)
        print(a[0])
        if a[0] in x:
            print(x.index(a[0]))
            pos = x.index(a[0])
            print(c[pos-1])
I expect my output to be like the one below. In the real world, I will have more cases like the first input row (nested lists with multiple levels).

Here is one approach which works with any number of nested lists:
def func(z, X, Y):
    A, B = [], []
    for x, y in zip(X, Y):
        if isinstance(x, list):
            # recurse into nested lists, keeping the nesting in the result
            a, b = func(z, x, y)
            A.append(a)
            B.append(b)
        elif x == z:
            A.append(x)
            B.append(y)
    return A, B
c = ['single_item_list', 'multi_item_list', 'multi_id']
# df is the DataFrame from the question (sample_df there)
df[c[1:]] = [func(z, X, Y) for [z], X, Y in df[c].to_numpy()]
Result
   single_proj_name  single_item_list  single_id  multi_proj_name           multi_item_list         multi_id
0  [jsfk]            [ABC_123]         [1234]     [AAA, VVVV, SASD]         [[ABC_123], [ABC_123]]  [[29481], [5432]]
1  [fhjk]            [DEF123]          [5678]     [QEWWQ, SFA, JKKK, fhjk]  [DEF123]                [57567]
2  [ERRW]            [FAS324]          [91011]    [ERRW, TTTT]              [FAS324]                [8775]
3  [SJBAK]           [HSJD123]         [121314]   [SJBAK, YYYY]             [HSJD123]               [43512]
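To see what the recursive function does without involving pandas, it can be called directly on the lists of a single row. The sketch below restates `func` (with the two appends written as separate statements) and applies it to the first two rows of the question's data. The variable names holding the results are only illustrative.

```python
def func(z, X, Y):
    """Keep the entries of X equal to z and the entries of Y at the same positions."""
    A, B = [], []
    for x, y in zip(X, Y):
        if isinstance(x, list):
            # nested list: filter it recursively and keep it as a sublist
            a, b = func(z, x, y)
            A.append(a)
            B.append(b)
        elif x == z:
            A.append(x)
            B.append(y)
    return A, B


# row 0 of the question: a list of lists
nested_items, nested_ids = func(
    "ABC_123",
    [["XYZAV", "ADS23", "ABC_123"], ["ABC_123", "ADC_123"]],
    [[2167, 2147, 29481], [5432, 1234]],
)

# row 1 of the question: a flat list
flat_items, flat_ids = func(
    "DEF123",
    ["XYZAV", "DEF123", "ABC_123", "SAJKF"],
    [2313, 57567, 2321, 7898],
)

print(nested_items, nested_ids)
print(flat_items, flat_ids)
```

Note that a nested sublist without any match comes back as an empty list rather than being dropped, because the recursive result is appended unconditionally.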

The code you've provided uses zip() to iterate over the 'single_item_list', 'multi_item_list', and 'multi_id' columns of the DataFrame simultaneously.
For each row, it uses a nested for loop to iterate over the elements of 'multi_item_list'. It checks with the in operator whether the first element of 'single_item_list' is present in the current element. If it is, it finds the position of the match with the index() method and assigns it to the variable pos. Then it prints c[pos-1], which is the 'multi_id' entry one position before the match.
This code only prints a value from the 'multi_id' column; it does not update the 'multi_item_list' and 'multi_id' columns of the DataFrame.
To update the DataFrame with the matched values, you have to assign to individual cells, either by position with the .iloc indexer or by label with the .at accessor.
e.g.: sample_df.iloc[i,j] = new_val
for i, (single, multi_item, multi_id) in enumerate(zip(sample_df['single_item_list'],sample_df['multi_item_list'],sample_df['multi_id'])):
    for j, item_list in enumerate(multi_item):
        if single[0] in item_list:
            pos = item_list.index(single[0])
            sample_df.at[i,'multi_item_list'] = [item_list]
            sample_df.at[i,'multi_id'] = [multi_id[j]]
print(sample_df)
This will print the updated DataFrame with the filtered values in the 'multi_item_list' and 'multi_id' columns. The print(sample_df) call belongs after the outer for loop, so that the table is printed once all updates are done.
As in the question, the outer loop walks the three columns row by row, and the inner loop walks the elements of 'multi_item_list', testing each one with the in operator. The position that index() stores in pos is not used afterwards.
On a match, the 'multi_item_list' and 'multi_id' cells of the current row are overwritten through .at with the matching element and with the 'multi_id' entry at the same position j.
Rows without a matching item keep their original values. For a row that holds nested lists, the whole matching sublist and the whole corresponding 'multi_id' sublist are kept, and if several sublists match, only the last one remains.

I used isinstance to check whether a row holds a nested list or not and came up with the code below, which produces the expected output. I am open to suggestions and improvements from the experts here.
import numpy as np
for i, (single, multi_item, multi_id) in enumerate(zip(sample_df['single_item_list'],sample_df['multi_item_list'],sample_df['multi_id'])):
    if not any(isinstance(elem, list) for elem in multi_item):
        # flat list: keep the matching item and the id at the same position
        for j, item_list in enumerate(multi_item):
            if single[0] in item_list:
                pos = item_list.index(single[0])
                sample_df.at[i,'multi_item_list'] = [item_list]
                sample_df.at[i,'multi_id'] = [multi_id[j]]
    else:
        print("under nested list")
        # nested list: replace each sublist by its match, or by NaN if it has none
        for j, item_list in enumerate(zip(multi_item,multi_id)):
            if single[0] in multi_item[j]:
                pos = multi_item[j].index(single[0])
                sample_df.at[i,'multi_item_list'][j] = single[0]
                sample_df.at[i,'multi_id'][j] = multi_id[j][pos]
            else:
                sample_df.at[i,'multi_item_list'][j] = np.nan
                sample_df.at[i,'multi_id'][j] = np.nan

How to group two lists of class objects based on an Attribute efficiently in Python?

I have two lists that both contain objects of the same class. I want to group them into a third list that contains lists or tuples of objects with the same attribute value.
Example
Object1.time = 1
Object2.time = 2
Object3.time = 1
Object4.time = 2
Object5.time = 3
list1 = [Object1, Object2]
list2 = [Object3,Object4]
The result of the grouping should look like this:
result_list = [[Object1,Object3], [Object2,Object4], [Object5]]
I should mention that I don't need the lists that contain only one object,
so the final list should look like this:
final_result = [[Object1, Object3], [Object2, Object4]]
list1 contains 1,500 objects and list2 over 70,000. The problem is that if I use two for loops to compare the objects, it takes too long.
Here is my inefficient example:
class Example:
    def __init__(self,time,example_attribute):
        self.time = time
        self.example_attribute = example_attribute
test_list1 = [1,1,2,3,4,5,6,6,7,8,9,9]
test_list2 = ["a","b","c","d","e","f","d","e","f","g","h","i"]
test_list3 = ["j","k","l","m","n","o","p","q","r","s","t","u"]
object_list1 = []
for i,j in zip(test_list1,test_list2):
    object_list1.append(Example(i,j))
object_list2 = []
for i,j in zip(test_list1,test_list3):
    object_list2.append(Example(i,j))
# How to group both lists together by the time attribute? This part takes too long.
group_by_time = []
for i in object_list1:
    my_list = [i]
    for j in object_list2:
        if i.time == j.time:
            my_list.append(j)
    group_by_time.append(my_list)
for sub_list in group_by_time:
    for index, item in enumerate(sub_list):
        if index == 0:
            print(item.time, ",",item.example_attribute,end =",")
        else:
            print(item.example_attribute, end = ",")
    print("")

Use a dictionary, which is how you idiomatically group things:
import itertools
grouped = {}
for obj in itertools.chain(list1, list2):
    grouped.setdefault(obj.time, []).append(obj)
Now you have a dictionary mapping each value of the time attribute to a list of objects. You can get a list of lists if you really want one, something like:
final = list(grouped.values())
If you want to omit lists with only a single value, you can do something like:
final = [v for v in grouped.values() if len(v) > 1]
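As a self-contained illustration of the dictionary approach, the sketch below groups two small lists with `collections.defaultdict`, used here as an equivalent alternative to `setdefault`. The `Example` class follows the question; the sample objects and the `summary` variable are made up for the demonstration.

```python
from collections import defaultdict
from itertools import chain


class Example:
    def __init__(self, time, example_attribute):
        self.time = time
        self.example_attribute = example_attribute


# made-up sample data mirroring the question's Object1..Object5
list1 = [Example(1, "a"), Example(2, "b")]
list2 = [Example(1, "j"), Example(2, "k"), Example(3, "l")]

# one pass over both lists: each object is appended to the bucket for its time
grouped = defaultdict(list)
for obj in chain(list1, list2):
    grouped[obj.time].append(obj)

# drop the groups that contain a single object
final = [group for group in grouped.values() if len(group) > 1]

summary = [[obj.example_attribute for obj in group] for group in final]
print(summary)
```

Every object is visited once, so the work grows with the combined length of the two lists instead of with their product, as it does with two nested loops.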

Sorting a list in order of highest to lowest

Hi there, I'm new to Python. I was recently learning how to code and ran into this problem.
myfile = open('Results.txt')
title = '{0:20} {1:20} {2:20} {3:20} {4:20}'.format('Nickname','Matches Played','Matches Won','Matches Lost','Points')
print(title)
for line in myfile:
    item = line.split(',')
    points = int(item[2]) * 3
    if points != 0:
        result = '{0:20} {1:20} {2:20} {3} {4:20}'.format(item[0], item[1], item[2], item[3], points)
        print(result)
I have been given a file and I am supposed to sort the list from highest to lowest by points. To calculate the points, I need to multiply the number of matches won by 3, and then print a sorted list of the names and the other values from top to bottom. Here's the list.
1)Leeri,19,7,12
2)Jenker,19,8,11
3)Tieer,19,0,19
5)Baby Boss,19,7,12
6)Gamered,19,5,14
7)Dogman,19,3,16
8)Harlock,19,6,13
9)Billies,19,7,12
How do I do this? Do I need some kind of sorting algorithm?

It's actually pretty easy:
f = open("Results.txt")
title = ("{:20}" * 5).format(
    "Nickname",
    "Matches Played",
    "Matches Won",
    "Matches Lost",
    "Points"
)
print(title)
lines = [i.rstrip().split(',') for i in f] # this is a list comprehension
f.close()
lines.sort(reverse=True, key=lambda x: int(x[2]) * 3) # sorts the list by points
# reverse=True reverses the order. Python normally sorts from low to high.
print("\n".join(("{:20}" * 5).format(*(i + [str(int(i[2]) * 3)])) for i in lines))
# f(*l) calls f with the items of l as its arguments
# (so f(*[1, 2, 3]) is the same as f(1, 2, 3))
# list1 + list2 concatenates them.

I would do something like this:
scores = []
myfile = open('Results.txt')
for line in myfile:
    scores.append(line.rstrip().split(','))
myfile.close()
sortedScores = sorted(scores, key=lambda x: int(x[2]) * 3, reverse=True)
This creates a list of lists (each sublist is an item, as you called it), then sorts it from highest to lowest by points, that is, three times the third element (the matches won).
Note:
key=lambda x: int(x[2]) * 3 is a parameter passed to sorted to specify the sorting criterion. The lambda function is called for each item in scores. The item is a list, and we return its third element converted to an integer and multiplied by three; that is the value to sort by. Without int(), x[2] * 3 would repeat the string three times and the comparison would be alphabetical.
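The key-based sort can be tried without a file. The sketch below works on an in-memory list of lines; it assumes each line is plain `name,played,won,lost` text without the leading numbering shown in the question, and the `points` helper is a hypothetical name.

```python
# assumption: the lines carry no "1)" style prefix
raw_lines = [
    "Leeri,19,7,12",
    "Jenker,19,8,11",
    "Tieer,19,0,19",
    "Baby Boss,19,7,12",
    "Gamered,19,5,14",
    "Dogman,19,3,16",
    "Harlock,19,6,13",
    "Billies,19,7,12",
]


def points(row):
    """Three points per match won (third field of the row)."""
    return int(row[2]) * 3


rows = [line.strip().split(",") for line in raw_lines]
# sorted() is stable, so players with equal points keep their original order
ranked = sorted(rows, key=points, reverse=True)

print(("{:20}" * 5).format("Nickname", "Matches Played", "Matches Won", "Matches Lost", "Points"))
for row in ranked:
    print(("{:20}" * 5).format(*row, str(points(row))))

names = [row[0] for row in ranked]
```

No hand-written sorting algorithm is needed; the built-in sort only has to be told which value to compare.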

Compare 1 column of 2D array and remove duplicates Python

Say I have a 2D array like:
array = [['abc',2,3],
         ['abc',2,3],
         ['bb',5,5],
         ['bb',4,6],
         ['sa',3,5],
         ['tt',2,1]]
I want to remove any rows whose first column is duplicated,
i.e. compare the first column of each row and return only:
removeDups = [['sa',3,5],
              ['tt',2,1]]
I think it should be something like:
(store the first column in a temporary variable, compare it with the remaining rows, and set array to whatever compare returns)
for x in range(len(array)):
    tmpCol = array[x][0]
    del array[x]
    removed = compare(array, tmpCol)
    array = copy.deepcopy(removed)
    print(repr(len(removed))) #testing
where compare is:
(compare the first column of each remaining row with the temporary value; remove the row on a match, otherwise return the original array)
def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid
I keep getting 'index out of range' error. I've tried other ways of doing this, but I would really appreciate some help!
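The 'index out of range' error comes from deleting rows while looping over `range(len(array))`: the range is computed once from the original length, but the list gets shorter on every deletion. The sketch below reproduces this on a copy of the data and contrasts it with building a new list; the helper name `delete_while_indexing` is made up for the illustration.

```python
array = [['abc', 2, 3],
         ['abc', 2, 3],
         ['bb', 5, 5],
         ['bb', 4, 6],
         ['sa', 3, 5],
         ['tt', 2, 1]]


def delete_while_indexing(rows):
    """Return True if deleting by index inside a range(len(...)) loop fails."""
    rows = list(rows)  # work on a copy so the caller's list is untouched
    try:
        for x in range(len(rows)):  # the range is fixed at the original length
            del rows[x]             # ...but the list shrinks on every pass
    except IndexError:
        return True
    return False


print(delete_while_indexing(array))

# building a new list avoids mutating the list that is being iterated
kept = [row for row in array if row[0] != 'abc']
print(kept)
```

The answers below all follow the second pattern: they count first and then build a filtered list.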

Similar to the other answers, but using a dictionary instead of importing Counter:
counts = {}
for elem in array:
    # add 1 to counts for this string, creating a new element at this key
    # with an initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1
new_array = []
for elem in array:
    # check that there's only 1 instance of this element.
    if counts[elem[0]] == 1:
        new_array.append(elem)

One option is to create a counter for the first column of your array beforehand and then filter the list based on the count, i.e. keep a row only if its first element appears exactly once:
from collections import Counter
count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]

You can use a dictionary and count the occurrences of each key.
You can also use Counter from the collections module, which does exactly this.
Do as follows:
from collections import Counter
# count the first-column values once, outside the loop
counts = Counter(k for k, _, _ in array)
removed = []
for k, val1, val2 in array:
    if counts[k] == 1:
        removed.append([k, val1, val2])

Python - Sorting a list of strings (names and scores) from highest to lowest

Let's say I have this list of names and scores:
scores_list = ["Username1,9,5,6", "Username2,7,6,8", "Username3,10,10,7"]
Each value in the list is formatted as Name,Score1,Score2,Score3.
How would I sort this list so that the user with the highest score from their past 3 results comes first, all the way down to the lowest score last?
So, in this case, I want it to be ordered like this:
("Username3,10,10,7", "Username1,9,6,5", "Username2,8,7,6")
as 10 is the highest, then 9, then 8.
So, essentially, I need the three scores in each value to be put in order from highest to lowest, and then the values in the list to be ordered by their highest score, from highest to lowest too.
Maybe I need to give each value in the list its own index so it's a 2D list?
Thanks for any help.

You will first need to split up each entry and sort the scores numerically. The resulting list can then be recombined to create the updated scores_list as follows:
scores_list = ["Username1,9,5,6", "Username2,7,6,8", "Username3,10,10,7"]
output = []
for entry in scores_list:
    values = entry.split(',')
    # keep the name and the scores sorted numerically, highest first
    output.append([values[0], sorted([int(x) for x in values[1:]], reverse=True)])
scores_list = ['{},{}'.format(entry[0], ','.join(str(x) for x in entry[1])) for entry in sorted(output, key=lambda x: x[1], reverse=True)]
print(scores_list)
This would display the following:
['Username3,10,10,7', 'Username1,9,6,5', 'Username2,8,7,6']

Using sorted (or the .sort method) with the key argument
Code:
scores_list = ["Username1,9,5,6", "Username2,7,6,8", "Username3,10,10,7"]
def transform(entry):
    # list() is needed on Python 3, where map returns an iterator that cannot be compared
    return list(map(int, entry.split(",")[1:]))
print(sorted(scores_list, key=transform, reverse=True))
Output:
['Username3,10,10,7', 'Username1,9,5,6', 'Username2,7,6,8']
Notes:
We pass the transform function as the key argument of sorted.
In transform we split the entry at "," and convert the scores to a list of integers, which sorted then uses as the sort key.
The scores inside each entry are compared in their original order and are not rearranged.

First, use a function to convert each element like "Username1,9,5,6" to a list like ["Username1", 9, 5, 6]. At the same time, sort the three scores, so that each element ends up like ["Username1", 9, 6, 5]. Then sort the transformed data. Finally, join the elements of each list in the sorted list.
scores_list = ["Username1,9,5,6", "Username2,7,6,8", "Username3,10,10,7"]
def transfer(x):
    items = x.split(",")
    result = [items[0]] + sorted(map(int, items[1:]), reverse=True)
    return result
# a list comprehension instead of map(), so that .sort() also works on Python 3
transferred = [transfer(x) for x in scores_list]
transferred.sort(key=lambda x: x[1], reverse=True)
result = [",".join(map(str, x)) for x in transferred]
print(result)
Output:
['Username3,10,10,7', 'Username1,9,6,5', 'Username2,8,7,6']

If you are reading this from a file, you would be much better off using the csv module to parse the data:
import csv
with open("your_file", newline="") as f:
    # sort each row's scores, then sort the rows by those scores, highest first
    srt = sorted(([row[0]] + sorted(map(int, row[1:]), reverse=True) for row in csv.reader(f)), key=lambda row: row[1:], reverse=True)
for row in srt:
    print(row)
Output:
['Username3', 10, 10, 7]
['Username1', 9, 6, 5]
['Username2', 8, 7, 6]
If you plan on using the data, having it separated will make life much easier.
When you are finished with the data, you can write it back to the file using a csv.writer:
with open("your_file", "w", newline="") as out:
    wr = csv.writer(out)
    wr.writerows(srt)
Which will give you:
Username3,10,10,7
Username1,9,6,5
Username2,8,7,6
