Partially merge two lists in Python - python

Classic, but I'm new to Python... and I have a problem I can't manage to solve. I'm assuming it's fairly easy.
I have two CSV files: one scraped from the web (a = []), containing 20000+ lines, and the other exported from a local system (b = []), with 80+ lines.
I have opened the files and stored the data in lists a and b. They are structured like the example below.
a = [[1,'a','b','a#b',11],
[2,'c','d','c#b',22],
[3,'e','f','e#b',33]]
b = [['a','banana','A',100],
['e','apple','A',100]]
Now I would like to go through list a and, whenever index 1 of a sublist in a is equal to index 0 of a sublist in b, append indices 3 and 4 of that a sublist to the b sublist. So I would end up with
c= [['a','banana','A',100,'a#b',11],
['e','apple','A',100,'e#b',33],]
How can I achieve this? The solution doesn't need to be fast if it teaches something about data structures in Python. But if it is easily solved with pandas, I'm all ears.
If this forum is not for questions like this, I'm sorry for taking your time.

This is not optimized and isn't efficient time-complexity-wise (it compares every row of a with every row of b), but it's readable and does the job.
c = []
for a_entry in a:
    for b_entry in b:
        if a_entry[1] == b_entry[0]:
            c.append(b_entry + a_entry[3:])

Make a dictionary ({index1: [index3, index4], ...}) by iterating over a:
for each item/sublist, use index 1 for the key and indices 3 and 4 for the value.
Do the same thing for b, except use index 0 for the key and [1:] for the value.
Iterate over the items in the b dictionary:
use the key of each item to get a value from the a dictionary;
if there is one, extend the b item's value.
Reconstruct b from the modified dictionary.
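The dictionary idea can be sketched roughly as follows. This is a simplified variant, not the exact steps listed: it builds only the lookup for a and then walks b directly. The name a_lookup is made up for illustration, and the sketch assumes that index 1 is unique within a (a later duplicate would overwrite an earlier one).

```python
a = [[1, 'a', 'b', 'a#b', 11],
     [2, 'c', 'd', 'c#b', 22],
     [3, 'e', 'f', 'e#b', 33]]
b = [['a', 'banana', 'A', 100],
     ['e', 'apple', 'A', 100]]

# Map index 1 of each row of a to the part we want to append (indices 3 and 4).
a_lookup = {row[1]: row[3:] for row in a}

# For every row of b that has a match in a, build the combined row.
c = [row + a_lookup[row[0]] for row in b if row[0] in a_lookup]

print(c)
```

Each dictionary lookup takes roughly constant time, so a is scanned once instead of once per row of b.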

Related

for x in lst: is x a pointer or the value itself?

I have tried to run this code:
a = 1000
d = [a,2]
d[1] = -1
a = 1003
for x in d:
    x = 7
I wonder why the values of the list elements don't all change to 7.
Like when I run
d[1] = -1
This statement changed the value of the second element in the list from 2 to -1.
The way I understand it is this:
In every iteration of the for loop, x is a pointer to some element of the list. For example, in the first iteration we effectively execute the statement
d[0] = 7
and then the first element should change from 1,000 to 7.
Where am I going wrong?
As far as I understand your problem and Python itself, x is neither a pointer into the list nor a copy of an element: on each iteration the name x is bound to the same object as the current element of d. The assignment x = 7 only rebinds the name x to a different object; it never touches the slot in the list, so d stays as it was.
This is the same for every kind of object; nothing is copied. The difference is that a mutable object (a list or a dict, say) can be changed in place through x, and that change is visible through d, whereas integers are immutable, so all you can do with x is rebind it. The same reasoning explains a in your example: a = 1003 rebinds the name a and leaves d[0] at 1000.
If you want to change a list in this manner, you can use the following snippet:
for i, x in enumerate(d):
    d[i] = 7
enumerate is a built-in function that pairs every element with its index. This way you can iterate through d and, for each element, get the element x and its index i, which you can use to assign into the original list.
Please mind that rewriting the list you are iterating over can be ill-advised. This kind of operation is usually better done by creating a new list, for example with a list comprehension:
d = [7 for x in d]
This builds a new list containing a 7 for every element x of the original d, and binds the name d to it.
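A small sketch may make the difference between rebinding a name and changing an object in place more concrete. The list contents here are made up for illustration, with one immutable element (an int) and one mutable element (a list).

```python
d = [1000, [1, 2]]

# Rebinding: x is pointed at a new object, the list slots are untouched.
for x in d:
    x = 7
print(d)  # still [1000, [1, 2]]

# In-place mutation: x refers to the same inner list that d[1] refers to.
for x in d:
    if isinstance(x, list):
        x.append(3)
print(d)  # the inner list has grown: [1000, [1, 2, 3]]

# Assigning through the index changes the list slot itself.
for i in range(len(d)):
    d[i] = 7
print(d)  # [7, 7]
```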

How to filter a list based on elements in another list in python

I have a list A of about 62,000 numbers, and another list B of about 370,000. I would like to filter B so that it contains only elements from A. I tried something like this:
A=[0,3,5,73,88,43,2,1]
B=[0,5,10,42,43,56,83,88,892,1089,3165]
C=[item for item in A if item in set(B)]
Which works, but is obviously very slow for such large lists because (I think?) the search continues through the entire B, even when the element has already been found in B. So the script is going through a list of 370,000 elements 62,000 times.
The elements in A and B are unique (B contains a list of unique values between 0 and 700,000 and A contains a unique subset of those) so once A[i] is found in B, the search can stop. The values are also in ascending order, if that means anything.
Is there some way to do this more quickly?
This is creating a new set(B) for every item in A. Instead, use the built-in set.intersection (note that the result is a set, so the original order is not kept):
C = set(A).intersection(B)
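If the result should stay a list in its original order, one option is to build the set once, outside the comprehension, and filter against it. This is only a sketch; the names A_set, B_set, C_from_A and C_from_B are made up for illustration.

```python
A = [0, 3, 5, 73, 88, 43, 2, 1]
B = [0, 5, 10, 42, 43, 56, 83, 88, 892, 1089, 3165]

# Build each set once; membership tests on a set take roughly constant time.
A_set = set(A)
B_set = set(B)

# Elements of A that also occur in B, in A's order.
C_from_A = [item for item in A if item in B_set]

# Elements of B that also occur in A, in B's order.
C_from_B = [item for item in B if item in A_set]

print(C_from_A)
print(C_from_B)
```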
To take advantage of the ascending order, I would walk through both lists in a single pass (this assumes both are sorted, so the example A is sorted first):
A = sorted([0,3,5,73,88,43,2,1])
B = [0,5,10,42,43,56,83,88,892,1089,3165]
C = []
j = 0  # current position in B
for item in A:
    # skip the elements of B that are smaller than item
    while j < len(B) and B[j] < item:
        j += 1
    if j < len(B) and B[j] == item:
        C.append(item)
Each element of B is visited at most once, and B itself is never modified, so there is no need to copy a 370k-element list.

Overwriting existing dataframe in loop

I am trying to transform elements in various data frames (standardize numerical values to be between 0 and 1, one-hot encode categorical variables) but when I try to overwrite the dataframe in a loop it doesn't modify the existing dataframe, only the loop variable. Here is a dummy example:
t = pd.DataFrame(np.arange(1, 16).reshape(5, 3))
b = pd.DataFrame(np.arange(1, 16).reshape(5, 3))
for hi in [t, b]:
    hi = pd.DataFrame(np.arange(30, 45).reshape(5, 3))
But when I run this code both t and b have their original values. How can I overwrite the original dataframe (t or b) while in a loop?
The specific problem I'm running into is when trying to use get_dummies function in the loop:
hi = pd.get_dummies(hi, columns=['column1'])
Assigning to the loop variable doesn't change the list or the original variables. Search "changing list elements loop python" for a bunch of good Stack Overflow questions on why this is the case. In short, "hi" is just a name that refers to each dataframe in turn; "hi = ..." points that name at a new object and leaves t and b untouched.
If you want to replace elements of a list iteratively, you can use enumerate() or a list comprehension. You might also want to keep the dataframes in a dictionary and iterate over that, instead of using variable names to keep track of them all.
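A minimal sketch of the dictionary approach, assuming the dataframes have a categorical column named column1 as in the question. The dictionary name frames and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'column1' comes from the question.
frames = {
    "t": pd.DataFrame({"column1": ["a", "b", "a"], "value": [1, 2, 3]}),
    "b": pd.DataFrame({"column1": ["b", "b", "a"], "value": [4, 5, 6]}),
}

# Assign the transformed dataframe back under its key instead of to the loop variable.
for name in frames:
    frames[name] = pd.get_dummies(frames[name], columns=["column1"])

print(frames["t"].columns.tolist())
```

After the loop, frames["t"] and frames["b"] hold the encoded dataframes, which is the effect that assigning to the loop variable could not achieve.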

Delete rows in matrix containing certain elements (python)

The following problem I have, might be very trivial for a more advanced python programmer, but I -- as a python beginner -- can't figure out the problem.
I just want to delete a row from a 2D-list, if it matches a certain condition --- in my case, if the row contains a certain character. I wanted to do it in a more functional, python way, rather than looping over all list items. Therefore, my attempt was
alist = [[1,2],[3,4]]
map(lambda ele : (if 2 in ele: tmp3.remove(ele)), alist)
which should just delete the first row, because it contains a "2". But I just get an error "invalid syntax" and I don't know why!
(I also came across some solution which uses dataframes from the pandas package, but as I'm learning python, I want to avoid pandas at this stage ;) )
Thanks in advance!
You can't use an if statement in a lambda, because the body of a lambda must be a single expression. You could use a list comprehension instead, which is clearer:
alist = [row for row in alist if 2 not in row]
This also has the advantage of iterating through the list once, as opposed to using map and list.remove, although you get a new list.
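If the original list object has to be changed rather than replaced (for example because other variables refer to it), slice assignment is one way to do that. A small sketch; the name alias is made up for illustration.

```python
alist = [[1, 2], [3, 4]]
alias = alist  # a second name for the same list object

# Slice assignment replaces the contents of the existing list in place.
alist[:] = [row for row in alist if 2 not in row]

print(alias)  # the change is visible through the other name too
```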
If you are trying to remove elements from a list, you need filter rather than map; map is meant for transforming elements and doesn't change the length of the list. In Python 3, filter returns an iterator, so wrap it in list():
alist = [[1,2],[3,4]]
list(filter(lambda ele: 2 not in ele, alist))
# [[3, 4]]

Append specific rows from one list to another

Having some difficulty trying to take a 2d list with 7 columns and 10 rows, and append all rows from only columns 4,5 and 6 (or 3,4,5 from index 0) to a new list. The original list is actually a csv and is much, much longer but I've just put part of it in the function for troubleshooting purposes.
What I have so far is...
def coords():
    # just an example of the first couple of lines...
    bigList = [['File','FZone','Type','ID','Lat','Lon','Ref','RVec'],
               ['20120505','Cons','mit','3_10','-21.77','119.11','mon_grs','14.3']]
    newList = []
    for row in bigList[1:]:  # skip the header
        newList.append(row[3])
    return newList  # return newList to main so it can be sent to other functions
This code gives me a new list with 'ID' only but I also want 'Lat' and 'Lon'.
The new list should look like...['3_10', '-21.77','119.11']['4_10','-21.10'...]
I tried re-writing newList.append(row[3,4,5])...and of course that doesn't work but not sure how to go about it.
row[3] refers to the fourth element. You seem to want the fourth through sixth elements, so slice it:
row[3:6]
You could also do this all with a list comprehension:
newList = [row[3:6] for row in bigList[1:]]
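Slicing works here because the three columns are next to each other. If the wanted columns were not contiguous, one option is to look up their positions in the header row by name. This is only a sketch; the names header and wanted are made up for illustration.

```python
bigList = [
    ['File', 'FZone', 'Type', 'ID', 'Lat', 'Lon', 'Ref', 'RVec'],
    ['20120505', 'Cons', 'mit', '3_10', '-21.77', '119.11', 'mon_grs', '14.3'],
]

header = bigList[0]
# Positions of the wanted columns, found by name in the header row.
wanted = [header.index(name) for name in ('ID', 'Lat', 'Lon')]

# Pick those positions out of every data row (the header is skipped).
newList = [[row[i] for i in wanted] for row in bigList[1:]]

print(newList)
```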
