Scrapy Only Returning First Result in Loop

I have a loop (shown below) that executes twice (x = 1 and x = 2), but Scrapy only returns the first trackname in both results. The print item line shows a different value on each pass, so I know the loop works, but Scrapy isn't seeing the changing value of x.
Any idea what mistake I have made?
items = []
item = scrapyItem()
for x in range(1, 3):
    str_selector = '//tr[@name="tracks-grid-browse_track_{0}"]/td[contains(@class,"secondColumn")]/a/text()'.format(x)
    item['trackname'] = hxs.select(str_selector).extract()
    print item
    items.append(item)
return items

You need to build a new item on each iteration instead of reusing the same one. As written, you append the same object to items every time, and that object is mutable (as instances of user-defined classes are by default in Python), so when you update item['trackname'], every entry in items reflects the change.
Here is some code to illustrate:
>>> class C(object):
...     # Basic user-defined class
...     def __init__(self):
...         self.test = None
...
>>> c = C()
>>> items = []
>>> for x in range(1, 3):
...     c.test = x
...     print c, c.test
...     items.append(c)
...
<__main__.C object at 0x01CEB130> 1
<__main__.C object at 0x01CEB130> 2
>>> items  # All objects contained are the same!
[<__main__.C object at 0x01CEB130>, <__main__.C object at 0x01CEB130>]
>>> for c in items:
...     print c.test
...
2
2
Now create a new object each time:
>>> items = []
>>> for x in range(1, 3):
...     c = C()
...     c.test = x
...     print c, c.test
...     items.append(c)
...
<__main__.C object at 0x01CEB110> 1
<__main__.C object at 0x011F2270> 2
The objects are now different!
>>> for c in items:
...     print c.test
...
1
2

What you are doing right now is creating a single item object and changing its value inside the loop. You need to create the item inside the loop instead.
items = []
#item = scrapyItem()
for x in range(1, 3):
    item = scrapyItem()
    str_selector = '//tr[@name="tracks-grid-browse_track_{0}"]/td[contains(@class,"secondColumn")]/a/text()'.format(x)
    item['trackname'] = hxs.select(str_selector).extract()
    print item
    items.append(item)
return items
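
The same effect can be reproduced without Scrapy. The sketch below is written for Python 3 and uses plain dicts as stand-ins for scrapyItem, which is an assumption made only so the example runs on its own; the function names collect_shared and collect_fresh are hypothetical. It contrasts reusing one object with building a new one per iteration.

```python
def collect_shared(names):
    """Buggy pattern: one item object is reused for every iteration."""
    items = []
    item = {}                      # created once, outside the loop
    for name in names:
        item['trackname'] = name   # overwrites the value on the same object
        items.append(item)         # appends another reference to that object
    return items


def collect_fresh(names):
    """Fixed pattern: a new item object is built on every iteration."""
    items = []
    for name in names:
        item = {}                  # new object each time round the loop
        item['trackname'] = name
        items.append(item)
    return items


shared = collect_shared(['first', 'second'])
fresh = collect_fresh(['first', 'second'])

print(shared)                  # both entries show 'second'
print(shared[0] is shared[1])  # True: the list holds one object twice
print(fresh)                   # the entries differ
print(fresh[0] is fresh[1])    # False: two distinct objects
```

The `is` comparison is the quickest way to check whether two list entries are the same object rather than two equal ones.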

Related

Is an iterable object a copy of the original object?

nums = [1,2,3,4,5]
it = iter(nums)
print(next(it))
print(next(it))
for i in nums:
    print(i)
Here the result is:
1
2
1
2
3
4
5
So my question is: when we call iter() on an object, does it create a copy of that object, on which next() then runs?

iter(obj) returns an iterator over the object passed to it, provided that object is iterable (for example, it implements __iter__). iter(obj) doesn't create a copy of the object.
>>> l = [[1, 2], [4, 5]]
>>> it = iter(l)
>>> next(it).append(3)  # appending to the object returned by next() mutates the list l
>>> l
[[1, 2, 3], [4, 5]]
>>> next(it).append(6)
>>> l
[[1, 2, 3], [4, 5, 6]]
>>> it = iter(l)
>>> l.pop()  # mutating the list l affects the iterator it
[4, 5, 6]
>>> list(it)
[[1, 2, 3]]

Here is one way to figure it out:
lst = ['Hi', 'I am a copy!']
itr = iter(lst)
print(next(itr))
lst[1] = 'I am _not_ a copy!'
print(next(itr))
(iter(lst) does not create a copy of lst)

No, it doesn't. Some Python types, e.g. all its collections, just support being iterated over multiple times. Multiple iterator objects can hold references to the very same list; each one just maintains its own position within the list.
Notice some effects:
lst = [1,2,3,4,5]
it = iter(lst)
lst.pop() # modify the original list
list(it) # the iterator is affected
# [1,2,3,4]
Even more obvious is the case of exhaustible iterators and calling iter on them:
it1 = iter(range(10))
it2 = iter(it1)
next(it1)
# 0
next(it2)
# 1
next(it1)
# 2
next(it2)
# 3
Clearly the iterators share state.
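
As a rough Python 3 sketch of the two cases described above (the variable names are only illustrative): calling iter() on a list gives independent iterators over the same list, while calling iter() on an iterator hands back that same iterator. In neither case is the list copied.

```python
nums = [1, 2, 3]

# Calling iter() on a list twice gives two independent iterators over the
# same underlying list; each one tracks its own position.
a = iter(nums)
b = iter(nums)
first_a = next(a)     # 1
first_b = next(b)     # 1, because b starts from the beginning as well

# Calling iter() on an iterator returns that very iterator, so both names
# advance the same position.
c = iter(a)
same_object = c is a  # True
second = next(c)      # 2, continues where a left off
third = next(a)       # 3

# Neither call copied the list: a mutation is visible through a new iterator.
nums.append(4)
seen = list(iter(nums))  # [1, 2, 3, 4]

print(first_a, first_b, same_object, second, third, seen)
```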

The = operator assigns the value of its right-hand operand to the name on its left, i.e. c = a + b assigns the value of a + b to c.
The assignment does not alter any variable on its right-hand side: iter(nums) is evaluated, it returns a new iterator object that refers to nums, and that result is bound to the new name it.

What is the difference between del a_list[:] and a_list = [] in a function?

This is just a question asking for the difference in the code.
I have several lists, e.g. a=[], b=[], c=[], d=[]
Say I have code that appends to each list, and I want to reset all these lists to their original empty state. I created a function:
def reset_list():
    del a[:]
    del b[:]
    del c[:]
    del d[:]
So whenever I call reset_list() in my code, it removes all the appended items and leaves every list as []. However, the one below doesn't work:
def reset_list():
    a = []
    b = []
    c = []
    d = []
This might be a stupid question, but I was wondering why the second one doesn't work.

When you do del a[:], Python looks up the variable a (searching outer scopes too, since a is never assigned inside the function) and then deletes every element of the list it finds.
But when you use a = [], it creates a new name a in the current (local) scope and assigns an empty list to it. When the function exits, that local a is no longer accessible (it is destroyed).
So in short: the first works because you change the a from the outer scope. The second does not work because you don't modify the a from the outer scope; you just create a new name a and temporarily (for the duration of the function) assign an empty list to it.
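
A minimal Python 3 sketch of that scoping difference follows. The function names clear_in_place and rebind_locally are made up for illustration, and only two lists are used to keep it short.

```python
a = [1, 2, 3]
b = [1, 2, 3]


def clear_in_place():
    # No assignment to the name `a` here, so Python looks `a` up in the
    # enclosing (module) scope and empties that existing list object.
    del a[:]


def rebind_locally():
    # Assignment makes `b` a local name; the module-level `b` is untouched
    # and the new empty list only survives because it is returned.
    b = []
    return b


clear_in_place()
local_result = rebind_locally()

print(a)             # []
print(b)             # [1, 2, 3]
print(local_result)  # []
```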
There's a difference between del a[:] and a = []
Note that these actually do something different, which becomes apparent if you have additional references (aliases) to the original list (as noted by @juanpa.arrivillaga in the comments).
del a[:] deletes all elements in the list but doesn't create a new list, so the aliases see the change as well:
>>> list_1 = [1,2,3]
>>> alias_1 = list_1
>>> del list_1[:]
>>> list_1
[]
>>> alias_1
[]
However a = [] creates a new list and assigns that to a:
>>> list_2 = [1,2,3]
>>> alias_2 = list_2
>>> list_2 = []
>>> list_2
[]
>>> alias_2
[1, 2, 3]
If you want a more extensive discussion about names and references in Python, I can highly recommend Ned Batchelder's blog post "Facts and myths about Python names and values".
A better solution?
In most cases where you have multiple variables that belong together I would use a class for them. Then instead of reset you could simply create a new instance and work on that:
class FourLists:
    def __init__(self):
        self.a = []
        self.b = []
        self.c = []
        self.d = []
Then you can create a new instance and work with the attributes of that instance:
>>> state = FourLists()
>>> state.a
[]
>>> state.b.append(10)
>>> state.b.extend([1,2,3])
>>> state.b
[10, 1, 2, 3]
Then if you want to reset the state you could simply create a new instance:
>>> new_state = FourLists()
>>> new_state.b
[]

You need to declare a, b, c, d as global if you want Python to rebind the globally defined variables. Otherwise, as pointed out in other answers, the assignments simply create new local variables.
a = [1,2,3]
b = [1,2,3]
c = [1,2,3]
d = [1,2,3]
def reset_list():
    global a,b,c,d
    a = []
    b = []
    c = []
    d = []
print(a,b,c,d)
reset_list()
print(a,b,c,d)
Outputs:
[1, 2, 3] [1, 2, 3] [1, 2, 3] [1, 2, 3]
[] [] [] []
As pointed out by @juanpa.arrivillaga, there is a difference between del a[:] and a = []. See this answer.

The 1st method works because:
reset_list() simply deletes the contents of the four lists. It works on the lists that you define outside the function, provided they have the same names as the ones used inside it. If a list has a different name, calling the function raises an error:
e = [1,2,3,4]
def reset_list():
    del a[:] #no list named a exists, only e
reset_list()
NameError: name 'a' is not defined
The function only has an effect if the lists exist before the function is called, because it looks up the names at call time and empties those lists in place rather than returning anything:
a = [1,2,3,4] #initialize before function definition
def reset_list():
    del a[:]
reset_list() #function call to modify a
print(a)
#[]
By itself the function does not return anything:
print(reset_list())
#None
The 2nd method doesn't work because:
the reset_list() function creates 4 new empty lists that are not connected to the lists defined outside the function. Names assigned inside a function are local to it (this is its scope) and disappear when it returns, unless you return the lists at the end of the function. The new lists are created and returned only when the function is called. Make sure that you declare the parameters, as in reset_list(a, ...), in the function definition:
#function definition
def reset_list(a):
    a = []
    return a
#initialize list after function definition
a = [1,2,3,4]
print("Before function call:{}".format(a))
new_a = reset_list(a)
print("After function call:{}".format(new_a))
#Output:
Before function call:[1, 2, 3, 4]
After function call:[]
As you've seen, returning the result and assigning it to a name at the call site is what makes the work done inside this function visible to the caller. Note that a itself is still [1, 2, 3, 4] here; only new_a is empty.

The second function (with a = [] and so on) initialises 4 new lists with local scope (within the function). It is not the same as deleting the contents of the existing lists.
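
One further variation, offered only as a sketch: pass the lists in as arguments and empty them in place, so the function does not depend on global names at all. The helper name reset_lists is hypothetical, and list.clear() (Python 3.3+) is swapped in here for del lst[:], which has the same effect.

```python
def reset_lists(*lists):
    """Empty every list passed in, in place (hypothetical helper)."""
    for lst in lists:
        lst.clear()   # same effect as `del lst[:]`


a = [1, 2]
b = [3]
alias_of_a = a        # a second name for the same list object

reset_lists(a, b)

print(a, b)           # [] []
print(alias_of_a)     # [] because the list was emptied, not replaced
```

Because the list objects themselves are mutated, every alias sees the reset, which is the behaviour del a[:] gives and a = [] does not.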

Filter object returning an attribute

I'm trying to filter a list of objects and get back a list of one specific attribute. Here is what I've tried:
class Foo:
    def __init__(self,a,b):
        self.a = a
        self.b = b
x = Foo(1,2)
y = Foo(1,3)
z = Foo(2,4)
result = filter(lambda f: f.b if f.a == 1 else None, [x,y,z])
print(list(result))
I was expecting a list like [2, 3], but it returns a list of Foo objects. Is there a way to do it using just filter, without another function? I'd like to avoid combining map and filter, for example.

You can use a list comprehension
result = [i.b for i in [x,y,z] if i.a == 1]
Using filter it would take two steps: one to filter out the objects where i.a != 1 and the second to pull the .b out of each object (which would require map).
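
For comparison, here is a short Python 3 sketch of both routes, reusing the Foo class from the question (the variable names are illustrative). filter only decides which objects to keep based on the truthiness of the lambda's result; it never replaces an object with that result, which is why the original attempt returned Foo objects.

```python
class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b


objects = [Foo(1, 2), Foo(1, 3), Foo(2, 4)]

# Step 1: filter keeps the objects whose predicate is truthy (it never
# transforms them). Step 2: map pulls the attribute out of each survivor.
kept = filter(lambda f: f.a == 1, objects)
two_step = list(map(lambda f: f.b, kept))

# The list comprehension expresses both steps at once.
one_step = [f.b for f in objects if f.a == 1]

print(two_step)  # [2, 3]
print(one_step)  # [2, 3]
```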

Retrieving from list of objects by id

In the following code I create a list of three objects and use the variables a, b and c to store the first, second and third object by their ids. But when I try to store the third object in variable c, it stores a list of the second and third objects.
class Obj1():
    id='obj1'
class Obj2():
    id='obj2'
class Obj3():
    id='obj3'
list1=[Obj1(),Obj2(),Obj3()]
a=list1[id=="obj1"]
print a
b=list1[id!='obj1']
print b
c=list1[id!='obj1'and id!='obj2':]
print c
When I run this code I get :
<__main__.Obj1 instance at 0x02AD5DA0>
<__main__.Obj2 instance at 0x02AD9030>
[<__main__.Obj2 instance at 0x02AD9030>, <__main__.Obj3 instance at 0x02AD90A8>]
Why does variable c contain two objects?

Using a dictionary is probably the best idea in this case, as mentioned by Medhat. However, you can do things in a similar way to what you attempted using list comprehensions:
a = [e for e in list1 if e.id == "obj1"]
print a
b = [e for e in list1 if e.id != "obj1"]
print b
c = [e for e in list1 if e.id != "obj1" and e.id != "obj2"]
# Or:
# [e for e in list1 if e.id not in ("obj1", "obj2")]
print c

You should use a dictionary instead:
obj1 = Obj1()
obj2 = Obj2()
obj3 = Obj3()
list1 = {obj1.id: obj1, obj2.id: obj2, obj3.id: obj3}
Then access your objects like this:
a = list1[obj1.id]
b = list1[obj2.id]
c = list1[obj3.id]
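
If the objects already live in a list, the dictionary can be built from it with a comprehension. This is a Python 3 sketch; the name by_id is an assumption, and the classes are repeated from the question so the example runs on its own.

```python
class Obj1:
    id = 'obj1'


class Obj2:
    id = 'obj2'


class Obj3:
    id = 'obj3'


list1 = [Obj1(), Obj2(), Obj3()]

# Map each object's id attribute to the object itself.
by_id = {obj.id: obj for obj in list1}

c = by_id['obj3']
print(type(c).__name__)      # Obj3
print(by_id.get('missing'))  # None: .get avoids a KeyError for unknown ids
```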

id!='obj1'and id!='obj2' evaluates to True, because id here refers to the built-in function id, which never equals a string. True equals 1 in Python, so c=list1[id!='obj1'and id!='obj2':] is equivalent to c=list1[1:], which of course contains two objects.
BTW, id is the name of a built-in function. Please avoid using it as a variable name.

Your list1 contains 3 elements:
>>> list1
[<__main__.Obj1 instance at 0x103437440>, <__main__.Obj2 instance at 0x1034377a0>, <__main__.Obj3 instance at 0x103437320>]
>>> id!='obj1'
True
>>> id!='obj2'
True
>>> True and True
True
>>> list1[True:]
[<__main__.Obj2 instance at 0x1034377a0>, <__main__.Obj3 instance at 0x103437320>]
>>>
When used as an index, True is 1 and False is 0.
Here is an example:
>>> ls = [1,2]
>>> ls[True]
2
>>> ls[False]
1
>>>
So list1[True:] is equal to list1[1:], which runs from the second element to the last.
In your case, that is the last two elements of list1.
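
The chain of reasoning can be checked step by step. The sketch below is Python 3 and uses a list of strings in place of the objects, purely to make the printed output readable.

```python
items = ['first', 'second', 'third']

# bool is a subclass of int, with True == 1 and False == 0.
print(issubclass(bool, int))   # True
print(True == 1, False == 0)   # True True

# `id` here is the built-in function, so comparing it with a string is
# always unequal, and the whole condition collapses to True.
condition = id != 'obj1' and id != 'obj2'
print(condition)               # True

print(items[condition])        # second
print(items[condition:])       # ['second', 'third']
```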

Intersection of instance data and list

I have a list of instances of a class:
>>> class A:
...
...     def __init__(self,l=None):
...         self.data=l
>>> k=list()
>>> for x in range(5):
...     k.append(A(x))
Now I need to intersect the 'data' field against a given list
>>> m=[0,2]
>>> f=set([r.data for r in k]) & set(m)
>>> f
set([0, 2])
So far so good.
But now I need to get the instances of 'A' whose 'data' is one of the values in the intersection set 'f'.
Is there an easier way to achieve all of this, rather than iterating through the instances again?

You can use a list comprehension:
>>> [x for x in k if x.data in f]
[<__main__.A instance at 0x92b1c0c>, <__main__.A instance at 0x92b1c4c>]
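
If only the matching instances are needed, the intermediate set f can be skipped by testing each instance's data against a set built from m. A Python 3 sketch, where the names wanted and matches are illustrative:

```python
class A:
    def __init__(self, data=None):
        self.data = data


k = [A(x) for x in range(5)]
m = [0, 2]

# Convert m to a set once so each membership test is a hash lookup.
wanted = set(m)

# One pass over the instances; no separate intersection step is needed.
matches = [inst for inst in k if inst.data in wanted]

print([inst.data for inst in matches])  # [0, 2]
```

This still iterates over the instances once, which cannot be avoided when they are stored in a plain list.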

While building the list, you can check whether each item is in m:
class A:
    def __init__(self,l=None):
        self.data=l
result=[]
k=list()
m=[0,2]
for x in range(5):
    some_A = A(x)
    k.append(some_A)
    if x in m:
        result.append(some_A)
print result
[<__main__.A instance at 0x021CBEB8>, <__main__.A instance at 0x021CBF08>]
