make tuples or new list from a list - python

I have a list which comes from a text file that I have parsed using very primitive regular expressions. I would like to reorganize it into a more spartan list that contains only files with a date immediately following. I've tried looping through the list using len(), but that only extracts the files and not the entry after each one. Many thanks in advance.
This:
2014-01-28
part002.csv.gz
2014-01-28
part001.csv.gz
2014-01-28
2014-01-28
2014-01-27
2014-01-27
2014-01-26
2014-01-26
2014-01-25
part002.csv.gz
2014-01-25
Becomes this:
part002.csv.gz
2014-01-28
part001.csv.gz
2014-01-28
part002.csv.gz
2014-01-25

You can use a list comprehension:
filtered = [e for i, e in enumerate(l) if not isDate(e) or (i > 0 and not isDate(l[i-1]))]
Complete example:
l = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz', '2014-01-28', '2014-01-28', '2014-01-27', 'part002.csv.gz', '2014-01-25']
def isDate(s):
    return '.' not in s
filtered = [e for i, e in enumerate(l) if not isDate(e) or (i > 0 and not isDate(l[i-1]))]
print(filtered)
Explained:
l is our original list.
isDate takes a string and tests whether it is a date (in my simple example it just checks that it doesn't contain a period; for better results use a regex or strptime).
enumerate enumerates a list (or anything iterable; I will stick to the word list, just so as not to get too technical). It yields tuples, each containing the index and the element of the list passed to enumerate. For instance, list(enumerate(['a', None, 3])) gives [(0, 'a'), (1, None), (2, 3)].
i, e unpacks each tuple, assigning the index to i and the element to e.
A list comprehension works like this (simplified): [x for x in somewhere if cond(x)] returns a list of all elements of somewhere that satisfy the condition cond(x).
In our case we only add elements to our filtered list if they are not dates (not the fruit), i.e. not isDate(e), or if they are not at the beginning, i > 0, and at the same time their predecessor is not a date, not isDate(l[i-1]) (that is, the predecessor is a file).
In pseudocode:
Take list `l`
Let our filtered list be an empty list
For each item in `l` do
    let `i` be the index of the item
    let `e` be the item itself
    if `e` is not a Date
    or if `i` > 0 (i.e. it is not the first item)
        and at the same time the preceding item is a File
    then and only then add `e` to our filtered list.
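As a sketch of the stricter check mentioned above, here is the same comprehension with a strptime-based date test. It assumes every date is in YYYY-MM-DD form; the name is_date is just an illustrative choice, and the list is the sample data from the question.

```python
from datetime import datetime

def is_date(s):
    """Return True if s parses as a YYYY-MM-DD date (assumed format)."""
    try:
        datetime.strptime(s, '%Y-%m-%d')
    except ValueError:
        return False
    return True

l = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
     '2014-01-28', '2014-01-28', '2014-01-27', '2014-01-27',
     '2014-01-26', '2014-01-26', '2014-01-25', 'part002.csv.gz',
     '2014-01-25']

# Keep every non-date, plus any date whose predecessor is not a date (a file).
filtered = [e for i, e in enumerate(l)
            if not is_date(e) or (i > 0 and not is_date(l[i - 1]))]
print(filtered)
# ['part002.csv.gz', '2014-01-28', 'part001.csv.gz', '2014-01-28', 'part002.csv.gz', '2014-01-25']
```

Note that, like the original, this keeps a trailing file name even if no date follows it.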

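Store the previous line at each step; then you always have it when you need it. When a line is a date and the line before it was a file, append both:
previous_line = None
newlist = []
for line in lines:
    if isdate(line) and previous_line is not None and not isdate(previous_line):
        newlist.extend([previous_line, line])
    previous_line = line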
Defining isdate:
import datetime
def isdate(s):
    try:
        datetime.datetime.strptime(s, '%Y-%m-%d')
    except ValueError:
        return False
    else:
        return True
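Since the question title asks for tuples, here is a minimal sketch of the same previous-line idea that collects (file, date) pairs instead of a flat list. The helper name file_date_pairs is hypothetical, and dates are again assumed to be in YYYY-MM-DD form.

```python
from datetime import datetime

def isdate(s):
    try:
        datetime.strptime(s, '%Y-%m-%d')
    except ValueError:
        return False
    return True

def file_date_pairs(lines):
    """Return (file, date) tuples for each file immediately followed by a date."""
    pairs = []
    previous_line = None
    for line in lines:
        # A date only counts if the line before it was a file name, not a date.
        if isdate(line) and previous_line is not None and not isdate(previous_line):
            pairs.append((previous_line, line))
        previous_line = line
    return pairs

lines = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
         '2014-01-28', '2014-01-28', '2014-01-27', 'part002.csv.gz', '2014-01-25']
print(file_date_pairs(lines))
# [('part002.csv.gz', '2014-01-28'), ('part001.csv.gz', '2014-01-28'), ('part002.csv.gz', '2014-01-25')]
```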

Working through it:
s = """
#that long string, snipped
"""
li = [x for x in s.splitlines() if x]
li
Out[3]:
['2014-01-28',
'part002.csv.gz',
'2014-01-28',
'part001.csv.gz',
'2014-01-28',
'2014-01-28',
'2014-01-27',
'2014-01-27',
'2014-01-26',
'2014-01-26',
'2014-01-25',
'part002.csv.gz',
'2014-01-25']
[tup for tup in zip(li,li[1:]) if 'csv' in tup[0]] #shown for didactic purposes, gen expression used below
Out[7]:
[('part002.csv.gz', '2014-01-28'),
('part001.csv.gz', '2014-01-28'),
('part002.csv.gz', '2014-01-25')]
The actual answer:
from itertools import chain
list(chain.from_iterable(tup for tup in zip(li,li[1:]) if 'csv' in tup[0]))
Out[9]:
['part002.csv.gz',
'2014-01-28',
'part001.csv.gz',
'2014-01-28',
'part002.csv.gz',
'2014-01-25']
Essentially: zip the list together with itself shifted by one index (in Python 2, use itertools.izip). Iterate over the resulting pairs, filtering out those whose first element is not a file-like string. Lastly, flatten the tuples into a list with itertools.chain to get your desired output.
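The slicing in zip(li, li[1:]) makes a copy of the list. A variant, sketched below under the assumption that you want to avoid that copy, uses the pairwise recipe from the itertools documentation (available as itertools.pairwise from Python 3.10); it produces the same pairs from any iterable.

```python
from itertools import chain, tee

def pairwise(iterable):
    """Yield (x0, x1), (x1, x2), ... (the itertools 'pairwise' recipe)."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)

li = ['2014-01-28', 'part002.csv.gz', '2014-01-28', 'part001.csv.gz',
      '2014-01-28', '2014-01-28', '2014-01-27', '2014-01-27',
      '2014-01-26', '2014-01-26', '2014-01-25', 'part002.csv.gz',
      '2014-01-25']

# Keep the (file, date) pairs, then flatten them back into one list.
result = list(chain.from_iterable(
    pair for pair in pairwise(li) if 'csv' in pair[0]))
print(result)
# ['part002.csv.gz', '2014-01-28', 'part001.csv.gz', '2014-01-28', 'part002.csv.gz', '2014-01-25']
```

Because tee works on any iterable, the same pairwise call could be fed the stripped lines of the open file directly instead of a pre-built list.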

Related

Delete item from list upon a condition

I have a list of lists of tuples of integers.
ls = [[(a_1, a_2), (b_1, b_2)], [(c_1, c_2), (d_1, d_2), (e_1, e_2)], ...]
And I need to delete every item of ls that contains a tuple whose second entry is equal to a predetermined integer.
I tried this:
for item in ls:
    for tpl in item:
        if tpl[1] == m:
            ls.remove(item)
But for some reason, this only removes a few of the list items but not all containing a tuple with second entry = m.
Use a list comprehension:
ls = [item for item in ls if all(tpl[1] != m for tpl in item)]
Or use a filter:
ls = list(filter(lambda item: all(tpl[1] != m for tpl in item), ls))  # in Python 3, filter returns an iterator
Code sucks and we need less of it - here's as sparse as it gets.
[l for l in ls if m not in [i[1] for i in l]]
The best way to filter a list in python is to use a list comprehension:
filtered = [item for item in ls if not contains_m(item)]
And then all you need is a function that can tell if an item contains m, for example:
def contains_m(item):
    return any(tpl[1] == m for tpl in item)
Removing an item from a list while iterating over it is not a good idea.
Try this (assuming we are talking about Python here):
ls = [[('a_1', 'a_2'), ('b_1', 'b_2')], [('c_1', 'c_2'), ('d_1', 'd_2'), ('e_1', 'e_2')]]
m = 'c_2'
print([x for x in ls if not [y for y in x if y[1] == m]])
Python's list iterator keeps track of the current index. When you remove an item from the list, every later item shifts one position to the left, so the iterator skips the item that moved into the current slot. For example, say you want to remove all ones from the following list:
[1, 1, 2]
Your for loop starts at index 0:
[1, 1, 2]
 ^
It removes the element and moves on to index 1, so the second 1 is never examined:
[1, 2]
    ^
This example is just to help illustrate the issue. One simple workaround is to loop backwards using the index:
for ind in range(len(ls)-1, -1, -1):
    item = ls[ind]
    for tpl in item:
        if tpl[1] == m:
            del ls[ind]
            break # You can end the inner loop immediately in this case
Another way is to make a copy of the list to iterate over, but remove from the original:
for item in ls[:]:
    for tpl in item:
        if tpl[1] == m:
            ls.remove(item)
            break
The last approach can be simplified into creating an output list that contains only the elements you want. This is easiest to do with a list comprehension; see @AlexeySmirnov's answer for the best way to do that.
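To see the skipping behaviour from the walkthrough above on data shaped like the question's, here is a small self-contained run. The values of m and data are made up for illustration; the buggy loop (with a break added so each item is removed at most once) misses one removal, while the comprehension does not.

```python
m = 5
data = [[(1, 5)], [(2, 5)], [(3, 7)]]  # made-up sample data

# Buggy: removes from the same list the loop is iterating over.
buggy = [list(item) for item in data]  # copy so that data itself stays intact
for item in buggy:
    for tpl in item:
        if tpl[1] == m:
            buggy.remove(item)
            break  # stop checking this item once it has been removed
print(buggy)  # [[(2, 5)], [(3, 7)]]: the [(2, 5)] item was skipped

# Fixed: build a new list instead of mutating the one being iterated.
fixed = [item for item in data if all(tpl[1] != m for tpl in item)]
print(fixed)  # [[(3, 7)]]
```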

Why does removing duplicates from a list produce [None, None] output?

I am new to Python and I'm not able to understand why I am getting the results with None values.
# Remove duplicate items from a list
def remove_duplicates(list):
    unique_list = []
    return [unique_list.append(item) for item in list if item not in unique_list]
print(remove_duplicates([1,1,2,2]))  # -> [None, None]
When I print the result it shows the following: [None, None]
PS: I've seen other solutions and I'm also aware of list(set(list)), but I am trying to understand why the code above gives [None, None] for a list of integers.
Although using a set is the proper way, the problem with your code, as the comments indicated, is that you are not actually returning unique_list from your function, you are returning the result of the list comprehension.
def remove_duplicates(my_list):
    unique_list = []
    do = [unique_list.append(item) for item in my_list if item not in unique_list]
    return unique_list  # Actually return the list!
print(remove_duplicates([1,1,2,2]))  # -> [1, 2]
Here I simply made a throwaway variable, do, that is otherwise useless; it just "runs" the comprehension. Understand?
That comprehension is storing a value each time you call unique_list.append(item) ... and that value is the result of the append method, which is None! So do equals [None, None].
However, your unique_list is in fact being populated correctly, so we can return that and now your function works as expected.
Of course, this is not a normal use for a list comprehension and really weird.
The problem with your code is that the method list.append returns None. You can test this easily with the following code:
myList = [1, 2, 3]
print(myList.append(4))  # prints None
So, a solution for your issue would be:
def remove_duplicates(myList):
    alreadyIncluded = []
    return [item for item in myList if item not in alreadyIncluded and not alreadyIncluded.append(item)]
print(remove_duplicates([1,1,2,2]))
The idea is that you begin with an empty list of already included elements, loop over all the elements in the list, and add each new one to alreadyIncluded. The not is necessary because append returns None, and not None is True, so the append call does not change the outcome of the if test.
You were including a list of the result of the appends (always None), but what you need is a list of the elements that passed the if test.
I hope it helps.
As the other answers have explained, the reason you're getting a list of None values is that list.append returns None, and you're calling it in a list comprehension. That means you're building a list full of None values alongside your list of unique values.
I would like to suggest that you ditch the list comprehension. Because you need to access outside state (the list of unique values seen so far), a comprehension can't easily do what you want. A regular for loop is much more appropriate:
def remove_duplicates(lst):
    unique_list = []
    for item in lst:
        if item not in unique_list:
            unique_list.append(item)
    return unique_list
A more Pythonic approach however would be to use a set to handle the unique items, and to make your function a generator:
def remove_duplicates(lst):
    uniques = set()
    for item in lst:
        if item not in uniques:
            yield item
            uniques.add(item)
The itertools.filterfalse function (ifilterfalse in Python 2) from the standard library can help improve this even further, as shown in the unique_everseen recipe in the itertools docs (you'll have to scroll down a little to find the specific recipe):
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
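If you are on Python 3.7 or later, where dictionaries preserve insertion order, a short order-preserving alternative to the approaches above is dict.fromkeys. This is a different technique from the itertools recipe and, like any set-based approach, it assumes the items are hashable.

```python
def remove_duplicates(lst):
    # Dict keys are unique and, since Python 3.7, keep insertion order,
    # so the first occurrence of each item is kept. Items must be hashable.
    return list(dict.fromkeys(lst))

print(remove_duplicates([1, 1, 2, 2]))     # [1, 2]
print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```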

What does enumerate() mean?

What does for row_number, row in enumerate(cursor): do in Python?
What does enumerate mean in this context?
The enumerate() function adds a counter to an iterable.
So for each element in cursor, a tuple is produced with (counter, element); the for loop binds that to row_number and row, respectively.
Demo:
>>> elements = ('foo', 'bar', 'baz')
>>> for elem in elements:
...     print(elem)
...
foo
bar
baz
>>> for count, elem in enumerate(elements):
...     print(count, elem)
...
0 foo
1 bar
2 baz
By default, enumerate() starts counting at 0 but if you give it a second integer argument, it'll start from that number instead:
>>> for count, elem in enumerate(elements, 42):
...     print(count, elem)
...
42 foo
43 bar
44 baz
If you were to re-implement enumerate() in Python, here are two ways of achieving that; one using itertools.count() to do the counting, the other manually counting in a generator function:
from itertools import count
def enumerate(it, start=0):
    # return an iterator that adds a counter to each element of it
    return zip(count(start), it)
and
def enumerate(it, start=0):
    count = start
    for elem in it:
        yield (count, elem)
        count += 1
The actual implementation in C is closer to the latter, with optimisations to reuse a single tuple object for the common for i, ... unpacking case, and to use a plain C integer for the counter, switching to an (unbounded) Python integer object only when the counter grows too large for the C integer.
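A quick way to convince yourself that the two re-implementations above behave like the built-in is to compare their outputs. They are renamed my_enumerate and my_enumerate_zip here (hypothetical names) so that the built-in enumerate stays available for the comparison.

```python
from itertools import count

def my_enumerate(it, start=0):
    # Generator version: count manually and yield (index, element) pairs.
    n = start
    for elem in it:
        yield (n, elem)
        n += 1

def my_enumerate_zip(it, start=0):
    # itertools.count version: pair an endless counter with the iterable.
    return zip(count(start), it)

elements = ('foo', 'bar', 'baz')
for impl in (my_enumerate, my_enumerate_zip):
    # Both should match the built-in, with and without a start value.
    assert list(impl(elements)) == list(enumerate(elements))
    assert list(impl(elements, 42)) == list(enumerate(elements, 42))
    # They also work on one-shot iterators, not just sequences.
    assert list(impl(iter(elements))) == list(enumerate(elements))

print(list(my_enumerate(elements, 42)))  # [(42, 'foo'), (43, 'bar'), (44, 'baz')]
```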
It's a builtin function that returns an object that can be iterated over. See the documentation.
In short, it loops over the elements of an iterable (like a list), as well as an index number, combined in a tuple:
for item in enumerate(["a", "b", "c"]):
    print(item)
prints
(0, 'a')
(1, 'b')
(2, 'c')
It's helpful if you want to loop over a sequence (or other iterable thing), and also want to have an index counter available. If you want the counter to start from some other value (usually 1), you can give that as second argument to enumerate.
I am reading a book (Effective Python) by Brett Slatkin and he shows another way to iterate over a list and also know the index of the current item in the list but he suggests that it is better not to use it and to use enumerate instead.
I know you asked what enumerate means, but when I understood the following, I also understood how enumerate makes iterating over a list while knowing the index of the current item easier (and more readable).
list_of_letters = ['a', 'b', 'c']
for i in range(len(list_of_letters)):
    letter = list_of_letters[i]
    print(i, letter)
The output is:
0 a
1 b
2 c
I also used to do something even sillier before I read about the enumerate function:
i = 0
for n in list_of_letters:
    print(i, n)
    i += 1
It produces the same output.
But with enumerate I just have to write:
list_of_letters = ['a', 'b', 'c']
for i, letter in enumerate(list_of_letters):
    print(i, letter)
As other users have mentioned, enumerate returns an iterator that adds an incremental index next to each item of an iterable.
So if you have a list say l = ["test_1", "test_2", "test_3"], the list(enumerate(l)) will give you something like this: [(0, 'test_1'), (1, 'test_2'), (2, 'test_3')].
Now, when is this useful? One possible use case is when you want to iterate over items but skip a specific item whose index you know but whose value you don't (because the value is not known at the time).
for index, value in enumerate(joint_values):
    if index == 3:
        continue
    # Do something with the other `value`
So your code reads better: you could also write a regular for loop with range, but then you would have to index into the list to get each item (i.e., joint_values[i]).
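A concrete run of the skip-by-index loop, with made-up joint_values data standing in for whatever the real list holds:

```python
joint_values = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical sample data

kept = []
for index, value in enumerate(joint_values):
    if index == 3:
        continue  # skip position 3, whatever value it holds
    kept.append(value)  # "do something" with every other value

print(kept)  # [0.1, 0.2, 0.3, 0.5]
```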
Although another user mentioned an implementation of enumerate using zip, I think a more pure (but slightly more complex) way without using itertools is the following:
def enumerate(l, start=0):
    return zip(range(start, len(l) + start), l)
Example:
l = ["test_1", "test_2", "test_3"]
print(list(enumerate(l)))
print(list(enumerate(l, 10)))
Output:
[(0, 'test_1'), (1, 'test_2'), (2, 'test_3')]
[(10, 'test_1'), (11, 'test_2'), (12, 'test_3')]
As mentioned in the comments, this approach with range will not work with arbitrary iterables as the original enumerate function does.
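To illustrate that caveat, the sketch below runs the range-based version (renamed enumerate_with_range, a hypothetical name, so it doesn't shadow the built-in) on a list and on a generator. The generator fails because it has no len().

```python
def enumerate_with_range(l, start=0):
    # Needs len(l), so l must be a sized sequence such as a list or tuple.
    return zip(range(start, len(l) + start), l)

print(list(enumerate_with_range(['x', 'y'], 5)))  # [(5, 'x'), (6, 'y')]

gen = (c for c in 'abc')  # a generator has no len()
failed = False
try:
    enumerate_with_range(gen)
except TypeError:
    failed = True
print('range version failed on a generator:', failed)  # True

# The built-in enumerate only needs to iterate, so generators are fine.
print(list(enumerate(c for c in 'abc')))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```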
The enumerate function works as follows:
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
for i in enumerate(doc1):
    print(i)
The output is
(0, 'I like movie')
(1, " But I don't like the cast")
(2, ' The story is very nice')
I am assuming that you know how to iterate over elements in some list:
for el in my_list:
    pass  # do something with el
Sometimes, though, you need not only the elements but also the index for each iteration. One way to do it is:
i = 0
for el in my_list:
    # do something, and use the value of "i" somehow
    i += 1
However, a nicer way is to use the function enumerate. It receives a list (or any iterable) and returns a list-like object (an iterable that you can iterate over) in which each element itself contains 2 elements: the index and the value from the original input list.
So if you have
arr = ['a', 'b', 'c']
Then the command
enumerate(arr)
returns something like:
[(0,'a'), (1,'b'), (2,'c')]
Now if you iterate over a list (or an iterable) where each element itself has 2 sub-elements, you can capture both of those sub-elements in the for loop like below:
for index, value in enumerate(arr):
    print(index, value)
which would print out the sub-elements of the output of enumerate.
And in general you can basically "unpack" multiple items from list into multiple variables like below:
idx,value = (2,'c')
print(idx)
print(value)
which would print
2
c
This is the kind of assignment happening in each iteration of that loop with enumerate(arr) as iterable.
The enumerate function yields each element's index and the element's value at the same time. I believe the following code helps explain what is going on:
for i, item in enumerate(initial_config):
    print(f'index{i} value{item}')

Python: Iterating over two lists and replacing elements in one list1 with the element from list2

I have two lists of strings. In list1, which contains around 1000 string elements, there is a string "Date" that occurs at random positions, immediately followed by a string that contains a particular date, e.g. "17/09/2011". This happens about 70 times. In list2 I have around 80 dates, as strings.
Question:
I want to write a script that loops through both lists simultaneously and replaces the dates in list1 with the dates in list2, in order. So, obviously, the first 70 dates of list2 will replace the 70 occurrences of dates in list1. Afterwards I want to write the modified list1 to a .txt file.
I tried this, but I am totally stuck. I am super noob at Python.
def pairwise(lst):
    """ yield item i and item i+1 in lst. e.g.
    (lst[0], lst[1]), (lst[1], lst[2]), ..., (lst[-1], None)
    """
    if not lst: return
    #yield None, lst[0]
    for i in range(len(lst)-1):
        yield lst[i], lst[i+1]
    yield lst[-1], None
for line in file:
    list1.append(line.strip())
for i,j in pairwise(list1):
    for k in list2:
        if i == "Date":
            list1."replace"(j) # Dont know what to do. And i know this double for looping is wrong also.
Maybe something like this (assuming every 'Date' string is followed by a date):
iter2 = iter(list2)
for idx in (idx for idx, s in enumerate(list1) if s == 'Date'):
    list1[idx + 1] = next(iter2)
with open('out.txt', 'w') as f:
    f.write('\n'.join(list1))  # one element per line
@user1998510, here is a bit of explanation:
enumerate takes a list as an argument and generates tuples of the form (i, i-th element of the list). In my generator expression (i.e. the (x for y in z if a) part) I assign the parts of each tuple to the local variables idx and s. The generator yields only the index, because the item itself (to wit, s) is of no further importance: the generator already filters for the interesting items with if s == 'Date'. In the for loop I iterate over this generator, assigning its yielded values to idx (this is a different idx from the inner one, since a generator expression has its own scope and does not leak its variables). The generator yields every index whose element is 'Date', and the for loop iterates over them. Hence I assign the next date from the second list to item idx + 1 of the original list for every interesting index.

Python - Look in list for a item containing numbers

some_list = ['Name','Surname','R500']
some_list = ['Name','Surname','500']
How would I get the index of the item in the list that contains a number? In both cases I should get back index = 2.
I was looking at something like:
some_list.index(r'%r' % '\d+')
You'll need to loop over the elements:
import re
for i, x in enumerate(some_list):
    if re.search(r"\d", x):
        print(i)
If you're looking just for the first item containing a digit, this works without regular expressions and returns -1 (can be changed to whatever you want) if there is no element with digits:
next((i for i,n in enumerate(some_list) if any(c.isdigit() for c in n)), -1)
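If you want every matching index rather than just the first, the regex test can go into a list comprehension. indices_with_digits is a hypothetical helper name; re.search(r'\d', s) matches any string containing at least one digit.

```python
import re

def indices_with_digits(items):
    """Return the indices of all strings that contain at least one digit."""
    return [i for i, s in enumerate(items) if re.search(r'\d', s)]

print(indices_with_digits(['Name', 'Surname', 'R500']))  # [2]
print(indices_with_digits(['Name', 'Surname', '500']))   # [2]
print(indices_with_digits(['a1', 'b', 'c2']))            # [0, 2]
```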
