Comparing nested lists - python

I have two lists, one of the form:
[a1, b1, [[c1, d1, e1], [f1, g1, h1], etc, etc], etc]
and the other, a dictionary, whose entries are in the form:
[[a2, b2, c2], [d2, e2, f2], etc, etc].
I need to compare the first entries of those two sub lists there and find any which are the same, and any in the first which don't appear at all in the second.
For example, if c1 = d2, I'd want to know, and if f1 isn't equal to either a2 or d2, I'd want to know that.
Anyway, I'm having a bit of trouble implementing this properly, any help would be appreciated.
Thanks!
(I'm not sure how clear the list formats are to understand, sorry if they're still confusing)
CODE SAMPLE:
for row in range(0, len(command[2])):
    counter = 0
    for nodeRows in range(0, len(nodeTable[command[0]])):
        if nodeTable[command[0]][nodeRows][0] == command[2][row][0]:
            if ((command[2][row][2]) + 1) < nodeTable[command[0]][nodeRows][2]:
                counter += 1
                newrow = command[2][row]
                newrow[1] = command[1]
                newrow[2] = newrow[2] + 1
                nodeTable[command[0]][nodeRows] = newrow
                change = 'true'
I imagine this doesn't help. The code is a bit monolithic (that's why I didn't post it initially). But I'm basically trying to compare two values. The first values of the items from the list in the 3rd position of another list and the first values of the items from the lists contained in another list.
Um...sorry. I have tried making the code simpler, but it's a bit complicated.

I'm not sure I understand your problem correctly, but I'll give it a try.
I guess you need to compare only the first element of every sublist of 3 elements.
So first I separate out all the first elements, then make the comparison.
Here is the code with some doctests so you can check whether it does what you are asking:
def compare(l0, l1):
    """
    >>> l0 = [1, 2, [[10, 20, 30], [40, 50, 60], [70, 80, 90]], 3]
    >>> l1 = [[11, 21, 31], [41, 51, 61], [71, 81, 91]]
    >>> compare(l0, l1)
    ([], [10, 40, 70])
    >>> l0 = [1, 2, [[10, 20, 30], [40, 50, 60], [70, 80, 90]], 3]
    >>> l1 = [[10, 21, 31], [41, 51, 61], [71, 81, 91]]
    >>> compare(l0, l1)
    ([10], [40, 70])
    >>> l0 = [1, 2, [[10, 20, 30], [40, 50, 60], [70, 80, 90]], 3]
    >>> l1 = [[10, 21, 31], [40, 51, 61], [70, 81, 91]]
    >>> compare(l0, l1)
    ([10, 40, 70], [])
    """
    first_entries_l0 = [x[0] for x in l0[2]]
    first_entries_l1 = [x[0] for x in l1]
    equals = [x for x in first_entries_l0 if x in first_entries_l1]
    unique = [x for x in first_entries_l0 if x not in first_entries_l1]
    return equals, unique
To test the code just copy it to a file 'code.py' and run it with:
python -m doctest code.py
You could make it more efficient using sets and looping only once but I'm not even sure this solves your problem so I'll leave that to you.
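For what it's worth, the set-based variant mentioned above might look roughly like this. It is only a sketch: compare_with_sets is a name made up here, and it assumes the first entries are hashable (numbers or strings, say) and that the sublists live at index 2 of the first list, as in the doctests.

```python
def compare_with_sets(l0, l1):
    """Single pass over l0[2], with O(1) membership tests against l1."""
    # Collect the first entries of the second structure once.
    seen = {row[0] for row in l1}
    equals, unique = [], []
    for row in l0[2]:
        # Route each first entry to the matching result list.
        (equals if row[0] in seen else unique).append(row[0])
    return equals, unique


l0 = [1, 2, [[10, 20, 30], [40, 50, 60], [70, 80, 90]], 3]
l1 = [[10, 21, 31], [41, 51, 61], [71, 81, 91]]
print(compare_with_sets(l0, l1))  # ([10], [40, 70])
```

The order of the results follows the order of l0[2], same as the list-comprehension version.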

The answer is: transform your current data structure into a proper one. Presumably the inputs are defined by yourself, so you should not write better code to deal with ugly structures, but improve the structures. If you are writing against a bad API, map the API to a useful structure.
You will have to post the whole code to get a proper answer, because the problem is in the definitions. I guess you will have to refactor the whole module and start again, because this is simply bad code.
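Some ideas: could command be a tree? A queued list? A matrix? A class? Why does the length of the items vary, and why do you want to compare different subitems? Try using classes and overriding __cmp__ (that is Python 2 only; in Python 3, define __eq__ and the other rich comparison methods instead).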
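As a rough illustration of mapping the raw lists to a more useful structure, here is one possible sketch. Everything named here (Entry, its fields, index_by_key) is hypothetical, since the question never says what the three values in each sublist mean.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical field names; only "key" (the first value) matters here.
    key: int
    second: int
    third: int


def index_by_key(rows):
    """Map each row's first element to a named Entry for direct lookup."""
    return {row[0]: Entry(*row) for row in rows}


table = index_by_key([[10, 21, 31], [41, 51, 61]])
incoming = [Entry(*row) for row in [[10, 20, 30], [40, 50, 60]]]

# With a dict keyed on the first value, the comparison becomes a lookup.
matched = [e.key for e in incoming if e.key in table]
missing = [e.key for e in incoming if e.key not in table]
print(matched, missing)  # [10] [40]
```

A NamedTuple also compares field by field out of the box, so there is no need to write comparison methods by hand for simple cases.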

Related

Trying to swap the first and last elements of a list in Python, but it works only for a predefined list. Not for a list of randomly generated numbers

I was trying to create a python program which swaps the first and last elements of a list. I passed a pre-created list into the algorithm and it worked perfectly. Here's my code:
def swapFirstAndLast(list_to_be_swapped):
    size = len(list_to_be_swapped)
    list_to_be_swapped[0],list_to_be_swapped[size-1] = list_to_be_swapped[size-1],list_to_be_swapped[0]
    return list_to_be_swapped
l = [12,33,42,76,46,97]
swapFirstAndLast(l)
print(l)
Output:
[97, 33, 42, 76, 46, 12]
Then I tried to create functions; one function to create a list of randomly generated numbers, and the second function to perform the swapping operation. Although everything makes sense to me, it is not performing the swapping operation now. This is the code I came up with:
import random
def generateList(size):
    list1 = []
    for i in range(size):
        list1.append(random.randint(0,99))
    return list1
def swapFirstAndLast(list_to_be_swapped):
    size = len(list_to_be_swapped)
    list_to_be_swapped[0],list_to_be_swapped[size-1] = list_to_be_swapped[size-1],list_to_be_swapped[0]
    return list_to_be_swapped
l = generateList(5)
l1 = swapFirstAndLast(l)
print(l,l1)
Output:
[49, 78, 63, 82, 72] [49, 78, 63, 82, 72]
As you can see, it does not perform the swapping operation now. I am not able to understand where I am going wrong.
You are swapping the first and the last element of the initial list (i.e., l) too! Please look at this slightly modified example:
import random
def generateList(size):
    list1 = []
    for i in range(size):
        list1.append(random.randint(0,99))
    return list1
def swapFirstAndLast(list_to_be_swapped):
    size = len(list_to_be_swapped)
    list_to_be_swapped[0],list_to_be_swapped[size-1] = list_to_be_swapped[size-1],list_to_be_swapped[0]
    return list_to_be_swapped
l = generateList(5)
print(l)
l1 = swapFirstAndLast(l)
print(l, l1)
Output:
[54, 14, 3, 38, 87]
[87, 14, 3, 38, 54] [87, 14, 3, 38, 54]
As you can see, the list l has been changed.
The thing here is that you are not creating a new list, but you're modifying the existing one. It doesn't matter if it has a different name within the function.
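One way to convince yourself of this is to check object identity with the is operator. The snippet below is just a small sketch; swap_in_place is a made-up name standing in for the function from the question.

```python
def swap_in_place(items):
    # Mutates the list object that was passed in; no copy is made.
    items[0], items[-1] = items[-1], items[0]
    return items


original = [1, 2, 3]
result = swap_in_place(original)

# Both names refer to the very same list object.
same_object = result is original
print(same_object, original, result)  # True [3, 2, 1] [3, 2, 1]
```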
If you want to retain the original list l, and also return a separate swapped list l1, you have to create a new list! Here is how you can do it:
import random
def generateList(size):
    return [random.randint(0, 99) for _ in range(size)]
def swapFirstAndLast(list_to_be_swapped):
    new_list = list_to_be_swapped.copy()
    new_list[0], new_list[-1] = new_list[-1], new_list[0]
    return new_list
l = generateList(5)
print(l)
l1 = swapFirstAndLast(l)
print(l, l1)
Output:
[38, 59, 86, 26, 19]
[38, 59, 86, 26, 19] [19, 59, 86, 26, 38]
Your program works! Your function just modifies the list directly. You can see it better if you do this:
l = generateList(5)
print(l)
l1 = swapFirstAndLast(l)
print(l1)
It turns out that you have already swapped the list (i.e. l). It only looks unswapped when you print(l, l1) because by then l is already the swapped version. Put the print(l) line above l1 = swapFirstAndLast(l) to see it!
The swap can also be done with index -1 for the last element:
def swapFirstAndLast(lst):
    lst[0], lst[-1] = lst[-1], lst[0]
    return lst
lst = [12,33,42,76,46,97]
print(swapFirstAndLast(lst))
Result: [97, 33, 42, 76, 46, 12]

Trying to figure out how to use append() to create a list like the second one; it makes an additional tuple not in OG list

data = [[10, 20, 30],
[40, 50, 60]]
how would you use append() to make data as:
>>> data
[[10, 20, 30], [40, 50, 60], [70, 80, 90]]
It's pretty simple: just pass the part you want to add, the list [70, 80, 90] (which you call a tuple), to the .append() method of the data list, as shown below.
data = [[10, 20, 30], [40, 50, 60]]
data.append([70, 80, 90])
print(data)
Be aware, however, that in Python a tuple is a different data type than a list. Both can contain multiple elements, but they have the key difference that tuples are immutable (not changeable after being assigned). Here is a tutorial on differences between lists and tuples in Python.
In Python, this
a = (2,3,4)
print(a)
is an immutable tuple.
And this
a = [2,3,4]
print(a)
is a mutable list.
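To make the difference concrete, here is a small sketch (variable names are made up) showing that item assignment fails on a tuple but works on a list, and that append() only exists on the list:

```python
point = (2, 3, 4)
try:
    point[0] = 99          # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

numbers = [2, 3, 4]
numbers[0] = 99            # lists can be changed in place
numbers.append(5)          # and they can grow

print(tuple_changed, point, numbers)  # False (2, 3, 4) [99, 3, 4, 5]
print(hasattr(point, "append"))       # False
```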

Split a list in for loop based on indices of list

This is a simplified version of some code I'm working on, a kind of toy model so I can focus only on the bit that's troubling me. This is why I have a function defined for finding the minimum, rather than simply using the numpy command. In my proper code, a function is required, since it's not as simple as using a pre-existing numpy function.
What I want to do is split my list into 2 lists at the minimum point, so given what I have here, I would ideally get [15, 62, 49, 49, 4] and [100, 71, 16, 70, 62]. However, I don't get this. I get all the points stored in node1, and nothing in node2. I really can't figure out why. As you can see, I've tried 2 ways of doing the for loop. I understand the idea of the code: the loop should run over the indices of the list and then store the values at these indices in the new lists, node1 and node2. What am I doing wrong?
#generate random list of numbers
import random
randomlist = [] #==profile_coarse
for i in range(0,10):
    n = random.randint(1,100)
    randomlist.append(n)
print(randomlist)
#define a function to find minimum
def get_min(list):
    mini=np.min(randomlist)
    return mini
node1=[]
node2=[]
for i in enumerate(randomlist):
    if i<=get_min(randomlist):
        node1.append(randomlist[i])
    else:
        node1.append(randomlist[i])
#OR
for i in range(0,len(randomlist)):
    if i<get_min(randomlist):
        node1.append(randomlist[i])
    else:
        node1.append(randomlist[i])
print(node1,node2)
which yields
[15, 62, 49, 49, 4, 100, 71, 16, 70, 62] []
You could use the built-in functions enumerate and min:
randomlist = [15, 62, 49, 49, 4, 100, 71, 16, 70, 62]
#define a function to find minimum
def get_min(l):
    mini= min(enumerate(l), key=lambda x: x[1])
    return mini
index_min, min_val = get_min(randomlist)
node1 = randomlist[:index_min + 1]
node2 = randomlist[index_min + 1:]
print(node1, node2)
print(node1)
print(node2)
output:
[15, 62, 49, 49, 4] [100, 71, 16, 70, 62]
enumerate returns tuples of a counter (by default starting at 0) and one value from the iterable you are passing.
In your code i would be (0, 15), (1, 62)... etc.
My guess is that you simply want to do for i in randomlist
Keep in mind using a min function will in any case put all the values in one node and not in 2 as you want it to do. Both branches of your if/else also append to node1, so node2 stays empty either way.
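A loop-based split, closer to what the question attempted, could look like the sketch below. It assumes the split point is the first occurrence of the minimum and that the minimum itself belongs to the first list; split_at_min is a name made up for this example.

```python
def split_at_min(values):
    """Split values into two lists at the first occurrence of the minimum."""
    # Find the position of the minimum, not the minimum value itself.
    min_index = values.index(min(values))
    node1, node2 = [], []
    for i, value in enumerate(values):
        # Compare index with index; the minimum goes into node1.
        if i <= min_index:
            node1.append(value)
        else:
            node2.append(value)
    return node1, node2


randomlist = [15, 62, 49, 49, 4, 100, 71, 16, 70, 62]
print(split_at_min(randomlist))
# ([15, 62, 49, 49, 4], [100, 71, 16, 70, 62])
```

The key change from the question's loops is that the index is compared against the index of the minimum rather than against the minimum value, and that the else branch appends to node2.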

Using a function to append a unique identifier to each sub-list within an outputted list of lists?

I have a function that I made to scrape table data from a long list of URLs (baseball statistics). Each URL contains a unique table for a single player, with multiple rows of data. The rows on each URL represent all of the seasons in a player's career. The input parameter is, of course, a list of all the URLs that I am scraping.
So the overall list of lists of lists contains statistical data of several players. For each player, we have multiple rows representing all of the years of their career.
All of the URLs are from the same domain, but with different extensions. Example list:
input_list = ['www.baseball.com/BarryBonds01', 'www.baseball.com/JohnRSmith01', 'www.baseball.com/MickyJMantle01', 'www.baseball.com/JohnJSmith02', 'www.baseball.com/MickySMantle02']
However, the tables on each URL page do not contain a unique identifier. So when I create the final list of lists of lists and the final dataframe, I have a long list of columns with all of my data but nothing that uniquely identifies each sub-list within the overall dataframe.
How can I append a unique identifier for each sub-list (for each player)? An ideal identifier would be the URL extension, but I can't figure out the code to make this happen.
Currently, my output list of lists of lists looks something like this (each list of lists within the broader list of lists of lists is a single player):
output_list = [[[45, 54, 23, 23], [44, 22, 11, 55]], # Player A
[[32, 23, 54, 23], [223, 44, 55, 66], [23, 67, 74, 24]], # Player B
[[32, 46, 77, 44], [24, 65, 24, 44]], # Player C
[[23, 2, 5, 7], [22, 455, 44, 332]], # Player D
[[33, 33, 22, 55], [88, 2, 4, 66], [1, 0, 0, 8], [3, 3, 5, 6]]] # Player E
The output figure, however, looks like this -- with no identification of the row data belonging to particular players.
Here is a better representation of my output list:
output_list = [[45, 54, 23, 23], [44, 22, 11, 55], # Player A
[32, 23, 54, 23], [223, 44, 55, 66], [23, 67, 74, 24], # Player B
[32, 46, 77, 44], [24, 65, 24, 44], # Player C
[23, 2, 5, 7], [22, 455, 44, 332], # Player D
[33, 33, 22, 55], [88, 2, 4, 66], [1, 0, 0, 8], [3, 3, 5, 6]] # Player E
This is not a very robust method to achieve what you want, but without more details it is difficult to advise. In a pinch, this should work:
# Since input and output are the same length and aligned,
# we enumerate output to get the position and value,
# then use the position to find the corresponding element in input,
# slice an identifier from that input element,
# and append it to all relevant output rows
for index, player in enumerate(output_list):
    # Slice URL from '/' onwards
    identifier = input_list[index][input_list[index].find('/'):]
    # loop through all of this player's rows of stats
    for stats in player:
        # append identifier to each list of stats
        stats.append(identifier)
This should give you an additional column with the identifier when you convert the list of list of list to a df.
There are better ways to accomplish this, like zip() or pd.DataFrame.from_dict(), but this should fit right into your code without many changes downstream.
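For reference, a zip()-based variant might look something like the sketch below. It is an assumption-laden example: tag_rows is a made-up name, it builds new rows instead of mutating the scraped ones, and it takes everything after the last '/' as the identifier (so without the leading slash that the slice above keeps).

```python
def tag_rows(urls, players):
    """Pair each URL with its player's rows and append an identifier to each row."""
    tagged = []
    for url, seasons in zip(urls, players):
        # Everything after the last '/' serves as the identifier.
        identifier = url.rsplit('/', 1)[-1]
        # Build new rows so the original scraped data is left untouched.
        tagged.append([row + [identifier] for row in seasons])
    return tagged


urls = ['www.baseball.com/BarryBonds01', 'www.baseball.com/JohnRSmith01']
players = [[[45, 54, 23, 23], [44, 22, 11, 55]],
           [[32, 23, 54, 23]]]
print(tag_rows(urls, players))
```

Like the enumerate version, this relies on the URL list and the scraped list being the same length and in the same order; zip() silently stops at the shorter one if they are not.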
I will leave the original answer for future reference for other users.
With regards to your new output format, there is no possible way to format it after scraping. The 'easiest' way is to add the identifier using the above method, during scraping.
For example:
master_list = []
for URL in input_list:
    # get_identifier and run_scrape stand in for your own functions
    identifier = get_identifier(URL)
    temp_list = run_scrape(URL)
    for stats in temp_list:
        stats.append(identifier)
    master_list = master_list + temp_list
I cannot give you concrete code since I don't know how you are doing it. All the functions you need can be derived from above samples, or can just be as is.
The general idea is to add the identifier every time you get a new set of data corresponding to your URL, before adding the 'identified-data' to a master_list, then go to the next URL.
Depending on how you are doing the scraping, you may or may not have control over the iteration process.
If you do, the above should work.
If you don't, please check the docs for your library (there should be some method that allows for such insertion).
If the method does not exist, you can check the docs for smaller functions that give you more control (though your code will be more complex).
You can check Stack Overflow to see if anyone has had a similar problem with the library you are using, and how they solved it. Otherwise, you can ask a question specific to your library.
This is not a pandas problem anymore.

sum of surrounding elements in a list

I'm writing code that replaces each number in a list with the sum of itself and the numbers beside it.
For example, list1 = [10, 20, 30, 40, 50], the new list = [30 (10+20), 60 (10+20+30), 90 (20+30+40), 120 (30+40+50), 90 (40+50)]. => final list = [30, 60, 90, 120, 90].
At the moment my idea was of using a for loop but it was totally off.
You can do it by creating triplets using zip:
# pad for first and last triplet
lst = [0] + original + [0]
# summarize triplets
sums = [sum(triplet) for triplet in zip(lst, lst[1:], lst[2:])]
Example:
>>> original = [10, 20, 30, 40, 50]
>>> lst = [0] + original + [0]
>>> sums = [sum(triplet) for triplet in zip(lst, lst[1:], lst[2:])]
>>> sums
[30, 60, 90, 120, 90]
>>>
Check out the flatten function in the question "What is the fastest way to flatten arbitrarily nested lists in Python?"
Take the result of the flattened lists of lists and sum the collection normally with a for loop, or a library that provides a count utility for collections.
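Coming back to the neighbour sums the question asks for, an index-based alternative that needs no padding could be sketched as follows. neighbour_sums is a made-up name, and the slice bounds are clamped so the first and last elements only see the neighbours that exist.

```python
def neighbour_sums(values):
    """For each position, sum the element and its immediate neighbours."""
    # max(i - 1, 0) keeps the slice from wrapping around at the start;
    # slicing past the end of a list is safe and simply stops at the end.
    return [sum(values[max(i - 1, 0):i + 2]) for i in range(len(values))]


print(neighbour_sums([10, 20, 30, 40, 50]))  # [30, 60, 90, 120, 90]
```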
