How to match a specific list sequence against a DictReader row? - python

I have the following lists:
main_list:
[4, 1, 5]
iterated lists/two rows from dict:
['John', '1', '4', '3']
['Mary', '4', '1', '5']
The iterated lists come from the code below, where dictionary is csv.DictReader(x):
for row in dictionary:
    print(list(row.values()))
I want the code below to work: if main_list matches the sequence of values in a row, it should print the first column, whose header is 'name':
if main_list in list(row.values()):
    print(row['name'])
For the example above, as Mary's items match 4, 1, 5, the final returned value should be Mary.
I'm new to Python, and I would appreciate any advice on how to work this out.

You can use extended iterable unpacking to split a row into its name and the rest:
name, *therest = ['Mary', '4', '1', '5']
Then make the comparison, converting the numbers to strings because the CSV values are strings:
test = [4, 1, 5]
if therest == [str(thing) for thing in test]:
    print(name)
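The original check `main_list in list(row.values())` fails for two reasons: `in` tests whether the whole list [4, 1, 5] is a single element of the row, and the CSV values are strings ('4') while main_list holds ints (4). Below is a minimal end-to-end sketch applying the unpacking idea to csv.DictReader. It assumes 'name' is the first header (as in the question); the other column names (a, b, c), the sample data, and the helper name names_matching are hypothetical, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical CSV: only the 'name' header comes from the question;
# columns a, b, c are placeholders for the remaining headers.
CSV_TEXT = """name,a,b,c
John,1,4,3
Mary,4,1,5
"""


def names_matching(csv_file, main_list):
    """Return the 'name' of every row whose other values equal main_list."""
    # DictReader yields every value as a string, so compare against strings.
    target = [str(n) for n in main_list]
    matches = []
    for row in csv.DictReader(csv_file):
        # Extended unpacking: first column -> name, remaining columns -> rest.
        # This relies on 'name' being the first header.
        name, *rest = row.values()
        if rest == target:
            matches.append(name)
    return matches


print(names_matching(io.StringIO(CSV_TEXT), [4, 1, 5]))  # ['Mary']
```

With a real file, pass `open(path, newline='')` instead of the StringIO object.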

Related

How to select elements of lists in a list group, if the elements (strings) start with a letter/number?

Here I want to select the elements in each list that meet the condition that they start with '6'. However, I couldn't find a way to achieve it.
The lists are converted from a dataframe:
import pandas as pd

d = {'c1': ['64774', '60240', '60500', '19303', '38724', '11402'],
     'c2': ['', '95868', '95867', '60271', '60502', '19125'],
     'c3': ['', '', '', '', '95867', '60500']}
df = pd.DataFrame(data=d)
df
   c1     c2     c3
64774
60240  95868
60500  95867
19303  60271
38724  60502  95867
11402  19125  60500
list = df.values.tolist()
list = str(list)
list
[['64774', '', ''],
['60240', '95868', ''],
['60500', '95867', ''],
['19303', '60271', ''],
['38724', '60502', '95867'],
['11402', '19125', '60500']]
I tried code like this:
[x for x in list if x.startswith('6')]
However, it only returned '6' for each element that met the condition:
['6', '6', '6', '6', '6', '6', '6', '6', '6']
What I'm looking for is a group of lists like:
"[['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]"
When you do list = str(list), you're converting your list to its string representation, i.e. list becomes:
"[['64774', '', ''], ['60240', '95868', ''], ['60500', '95867', ''], ['19303', '60271', ''], ['38724', '60502', '95867'], ['11402', '19125', '60500']]"
You then loop through that string with the list comprehension:
[x for x in list if x.startswith('6')]
which iterates over the individual characters of the string. Each character is a one-character string, so x.startswith('6') is only true when x is '6' itself; you just find every occurrence of 6 in the string, hence your result:
['6', '6', '6', '6', '6', '6', '6', '6', '6']
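A small sketch contrasting the two cases with the row data printed in the question (no pandas needed): iterating over str(rows) visits single characters, while iterating over the nested list visits the cell strings. The names rows, chars and matches are just illustrative.

```python
# Row data as printed in the question (the result of df.values.tolist()).
rows = [['64774', '', ''],
        ['60240', '95868', ''],
        ['60500', '95867', ''],
        ['19303', '60271', ''],
        ['38724', '60502', '95867'],
        ['11402', '19125', '60500']]

# Buggy version: str() turns the whole structure into one string,
# so the comprehension iterates over single characters.
as_text = str(rows)
chars = [x for x in as_text if x.startswith('6')]
print(chars)  # nine '6' characters

# Fixed version: iterate rows, then cells, and wrap each match in its own list.
matches = [[cell] for row in rows for cell in row if cell.startswith('6')]
print(matches)  # [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]
```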
Side note: don't use variable names that shadow built-ins such as list and dict; it will almost certainly cause issues down the line.
I'm not sure if there is any specific reason to use a DataFrame/pandas for your question. If not, you could simply use a list comprehension:
d = {
    'c1': ['64774', '60240', '60500', '19303', '38724', '11402'],
    'c2': ['', '95868', '95867', '60271', '60502', '19125'],
    'c3': ['', '', '', '', '95867', '60500'],
}
d2 = [[x] for v in d.values() for x in v if x.startswith('6')]
# d2: [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]
You don't need to convert your list with str(list), since its elements are already strings:
lst = df.values.tolist()
lst = [[i] for l in lst for i in l if i.startswith('6') ]
print(lst)
Result:
[['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]
Try this:
def flatten(l):
    # wrap each item of each sublist in its own one-element list
    return [[item] for sublist in l for item in sublist]

print(flatten([df[col][df[col].str.startswith("6")].tolist() for col in df]))
Here, I used a list comprehension that collects the matching cells of each column in a list while iterating over the columns; this yields [['64774', '60240', '60500'], ['60271', '60502'], ['60500']]. To get to your desired output, I defined a function flatten, which (somewhat) flattens that list to [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']].

Python: Inserting into a list using length as index

All,
I've recently picked up Python and am currently working with lists. I'm using a test file containing several lines of tab-separated characters and passing it into my Python program.
The aim of my Python script is to insert each line into a list using its length as the index, so that the list would be automatically sorted. I am considering only the most basic case and am not concerned about more complex cases.
My Python code is below:
import sys

newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList.insert(size, data)
for i in range(len(newList)):
    print(newList[i])
My 'test' file is below:
2 2 2 2
1
3 2
2 3 3 3 3
3 3 3
I expect the script to print the contents of the list in the following order, sorted by length:
['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']
However, when I pass in my test file to my python script, I get the following;
cat test | ./listSort.py
['2', '2', '2', '2']
['1']
['3', '2']
['3', '3', '3']
['2', '3', '3', '3', '3']
The first line of the output, ['2', '2', '2', '2'], is incorrect. I'm trying to figure out why it isn't printed on the 4th line (it has length 4, so I expected it to be inserted at index 4 of the list). Could someone please provide some insight into why this is? My understanding is that I am inserting each 'data' into the list using 'size' as the index, which means that when I print out the contents of the list, they should be printed in sorted order.
Thanks in advance!
Inserting into lists works quite differently from what you think:
>>> newList = []
>>> newList.insert(4, 4)
>>> newList
[4]
>>> newList.insert(1, 1)
>>> newList
[4, 1]
>>> newList.insert(2, 2)
>>> newList
[4, 1, 2]
>>> newList.insert(5, 5)
>>> newList
[4, 1, 2, 5]
>>> newList.insert(3, 3)
>>> newList
[4, 1, 2, 3, 5]
>>> newList.insert(0, 0)
>>> newList
[0, 4, 1, 2, 3, 5]
Hopefully you can see two things from this example:
The list indices are 0-based. That is to say, the first entry has index 0, the second has index 1, etc.
list.insert(idx, val) inserts val at the position that currently has index idx and shifts everything from that position onward one place further back. If idx is greater than or equal to the current length of the list, the new item is silently added at the end.
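To see both rules at work on the actual input, this sketch replays the question's loop on the five lines of the test file (already split on tabs, hard-coded instead of read from sys.stdin). It reproduces the unexpected output exactly: every index up to the fourth line is at or past the end of the list, so those lines are simply appended in file order.

```python
# The question's test lines, already split on tabs, in file order.
lines = [['2', '2', '2', '2'],
         ['1'],
         ['3', '2'],
         ['2', '3', '3', '3', '3'],
         ['3', '3', '3']]

new_list = []
for data in lines:
    # An index >= len(new_list) just appends; a smaller one shifts later items back.
    new_list.insert(len(data), data)

for item in new_list:
    print(item)
# ['2', '2', '2', '2']
# ['1']
# ['3', '2']
# ['3', '3', '3']
# ['2', '3', '3', '3', '3']
```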
There are several ways to implement the functionality you want:
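If you know the maximum number of fields per line in advance (in your test file it equals the number of lines), you can allocate the list beforehand and simply assign to its elements instead of inserting:
import sys

newList = [None] * 5  # one slot per possible line length (1 to 5 fields)
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList[size - 1] = data  # lists are 0-based: length 1 goes to index 0
for i in range(len(newList)):
    print(newList[i])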
If you can only predict a reasonable upper bound on the number of fields per line, you can do the same, but you then need some way to remove the leftover None entries afterwards.
Use a dictionary:
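import sys

newList = {}
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList[size - 1] = data
# Visit the keys in sorted order; range(len(newList)) would raise KeyError
# if some line lengths are missing from the input.
for key in sorted(newList):
    print(newList[key])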
Add elements to the list as necessary, which is probably a little bit more involved:
import sys

newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    # grow the list with None placeholders until index size - 1 exists
    if len(newList) < size:
        newList.extend([None] * (size - len(newList)))
    newList[size - 1] = data
for i in range(len(newList)):
    print(newList[i])
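A minimal sketch of the upper-bound variant with the None entries removed afterwards. MAX_FIELDS is a hypothetical bound, and the lines are hard-coded (with tabs) instead of read from sys.stdin so the example runs on its own.

```python
MAX_FIELDS = 10  # hypothetical upper bound on the number of fields per line

# The question's test file, one string per line, fields separated by tabs.
lines = ['2\t2\t2\t2', '1', '3\t2', '2\t3\t3\t3\t3', '3\t3\t3']

slots = [None] * MAX_FIELDS
for line in lines:
    data = line.strip().split('\t')
    slots[len(data) - 1] = data  # a line with n fields goes into slot n - 1

# Drop the slots that no line filled.
ordered = [data for data in slots if data is not None]
for data in ordered:
    print(data)
```

Like all three approaches above, this keeps only the last line of any given length; two lines with the same number of fields overwrite each other.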
I believe I've figured out the answer to my question, thanks to mkrieger1: I append each line to the list and then sort it using the length as the key:
import sys

newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    newList.append(data)
newList.sort(key=len)  # sort once, after all lines have been read
for i in range(len(newList)):
    print(newList[i])
I got the output I wanted:
./listSort.py < test
['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']
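Python's sort is stable, so lines with the same length keep their input order. A short sketch using sorted(), which returns a new list instead of sorting in place; the ['9', '9'] line is a hypothetical extra two-field line added to show how ties are handled.

```python
lines = [['2', '2', '2', '2'], ['1'], ['3', '2'], ['9', '9'], ['3', '3', '3']]

ordered = sorted(lines, key=len)  # key=len compares lines by number of fields
for item in ordered:
    print(item)
# ['1']
# ['3', '2']
# ['9', '9']   (stays after ['3', '2'], as in the input)
# ['3', '3', '3']
# ['2', '2', '2', '2']
```

With real input, the same call works on a generator such as `(line.strip().split('\t') for line in sys.stdin)`.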

Is there a built-in function in pandas/Python that converts list-like data into a list?

Suppose I have data like [1 2 3 4].
Is there a built-in function in pandas/Python that converts it into a list
like [1, 2, 3, 4]?
Yes, there is the split function. However, before calling it on your string s, you must get rid of the [ and ]; otherwise, instead of ['1', '2', '3', '4'] you will get ['[1', '2', '3', '4]']. So instead of s.split(), do s[1:-1].split(). This also means your list contains strings instead of ints ('1' instead of 1), but that is easily fixed:
[int(i) for i in s[1:-1].split()]
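A small helper wrapping that idea, assuming the input is a string such as '[1 2 3 4]' containing whitespace-separated integers. Using strip('[]') instead of slicing also copes with an empty '[]', and a prior strip() tolerates surrounding whitespace. The function name parse_bracketed_ints is hypothetical.

```python
def parse_bracketed_ints(s):
    """Convert a string like '[1 2 3 4]' into [1, 2, 3, 4]."""
    # Remove surrounding whitespace, then the brackets, then split on whitespace.
    inner = s.strip().strip('[]')
    return [int(tok) for tok in inner.split()]


print(parse_bracketed_ints('[1 2 3 4]'))  # [1, 2, 3, 4]
```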

Reading both digits of a number instead of just the first when sorting

I'm trying to sort data from a text file and show it in python.
So far I have:
text_file = open("Class1.txt", "r")
data = text_file.read().splitlines()
namelist, scorelist = [], []
for li in data:
    namelist.append(li.split(":")[0])
    scorelist.append(li.split(":")[1])
scorelist.sort()
print(scorelist)
text_file.close()
It sorts the data, but it only seems to read the first digit:
['0', '0', '10', '3', '3', '5']
It reads 10 as "1"
This is what my text file looks like:
Harry:3
Jarrod:10
Jacob:0
Harold:5
Charlie:3
Jj:0
It's sorting lexicographically; if you need integer sorting, append the split value as an int:
scorelist.append(int(li.split(":")[1]))
Since scorelist is a list of strings, "10" shows up before "3" because the first character in "10" is less than the first character in "3" (lexicographic sorting -- like words in a dictionary). The trick here is to tell python to sort integers. You can do that as the other answers point out by sorting a list of integers rather than a list of strings, OR you could use a key function to sort:
scorelist.sort(key=int)
This tells python to sort the items as integers rather than as strings. The nice thing here is that you don't need to change the data at all. You still end up with a list of strings rather than a list of integers -- you just tell python to change how it compares the strings. Neat.
demo:
>>> scorelist = ['3', '10', '0', '5', '3', '0']
>>> scorelist_int = [int(s) for s in scorelist]
>>>
>>> scorelist.sort(key=int)
>>> scorelist
['0', '0', '3', '3', '5', '10']
>>>
>>> scorelist_int.sort()
>>> scorelist_int
[0, 0, 3, 3, 5, 10]
The data are actually strings, so they are sorted character by character, like words in a dictionary.
You should convert the scores into ints:
scorelist.append(int(li.split(":")[1]))
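Because namelist and scorelist are built separately, sorting scorelist on its own loses track of which name each score belongs to. A hedged sketch that keeps each (name, score) pair together and sorts by the integer score; it uses the file contents from the question inline, with io.StringIO standing in for Class1.txt, and a with block so the file is closed automatically.

```python
import io

# Stand-in for open("Class1.txt") using the contents shown in the question.
class_file = io.StringIO("Harry:3\nJarrod:10\nJacob:0\nHarold:5\nCharlie:3\nJj:0\n")

with class_file as f:
    pairs = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, score = line.split(":")
        pairs.append((name, int(score)))

# Sort by score; the sort is stable, so equal scores keep their file order.
pairs.sort(key=lambda pair: pair[1])
for name, score in pairs:
    print(name, score)
```

With the real file, replace the StringIO object with `open("Class1.txt")`.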

Using list.index with duplicate items inside the list in Python

I'm working in Python 3.4.
I have a problem with a part of a program that is supposed to return the indices of all nested lists whose first value is the biggest value.
I first tried the following code:
L = [['5', '4', '3'], ['23', '40', '8'], ['33', '24', '29'], ['33', '24', '29'],
     ['13', '66', '54'], ['5', '4', '3']]
BigNumFirst = []
for i in L:
    if i[0] > i[1] and i[0] > i[2]:
        BigNumFirst.append(L.index(i))
print(BigNumFirst)
And got the following output:
[0, 2, 2, 0]
As you can see, the problem is that list.index() only returns the index of the first matching nested list, so the index for the duplicate nested lists is wrong. I want the output to be:
[0, 2, 3, 5]
I can't figure out how to solve this. At first I thought I could just add a variable that counts how many duplicates exist inside BigNumFirst, but of course that only works if there's only one nested list with duplicates, as my attempt showed:
BigNumFirst = []
NumbOfCopys = 0
for i in L:
    if i[0] > i[1] and i[0] > i[2]:
        if L.index(i) in BigNumFirst:
            NumbOfCopys += 1
        BigNumFirst.append(L.index(i) + NumbOfCopys)
print(BigNumFirst)
output:
[0, 2, 3, 2]
As you can see, the last number is still wrong. So how can I make my program "know" what index a nested list has, even if it is a duplicate of a previous nested list?
You can simply use enumerate and a list comprehension:
>>> [i for i,j in enumerate(L) if max(j)==j[0]]
[0, 2, 3, 5]
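One caveat: the values in L are strings, so both the question's `>` checks and `max(j)` compare them lexicographically (for example, '9' > '10' is True). With this data the result happens to be correct, but converting to int first is safer. Note also that `max(j) == j[0]` accepts ties such as ['5', '5', '1'], whereas the question's `>` checks do not; the sketch below keeps the strict comparison. The name big_num_first is illustrative.

```python
L = [['5', '4', '3'], ['23', '40', '8'], ['33', '24', '29'], ['33', '24', '29'],
     ['13', '66', '54'], ['5', '4', '3']]

big_num_first = []
for idx, row in enumerate(L):  # enumerate gives each row's own index, duplicates included
    first, *others = (int(v) for v in row)  # compare numerically, not as strings
    if all(first > other for other in others):  # first strictly larger than the rest
        big_num_first.append(idx)

print(big_num_first)  # [0, 2, 3, 5]
```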
