Several list comprehensions - one after each other - python

I have written some code, and to try and grasp the concept of list comprehensions, I am trying to convert some of the code into list comprehensions.
I have a nested for loop:
with (Input) as searchfile:
    for line in searchfile:
        if '*' in line:
            ID = line[2:13]
            IDstr = ID.strip()
            print IDstr
            hit = line
            for i, x in enumerate(hit):
                if x=='*':
                    position.append(i)
            print position
I have made the first part of the code into a list comprehension as such:
ID = [line[2:13].strip() for line in Input if '*' in line]
print ID
This works fine. I have tried to convert the next part as well, but it is not working as intended. How do I write several list comprehensions one after another? The "Hit = …" part below works fine if it is the first list comprehension, but not if it is the second. The same goes for the one above: it seems to work only if it is the first. Why is this?
Hit = [line for line in Input if '*' in line]
print Hit
Positions = [(i, x) for i, x in enumerate(Hit) if x == '*']
print Positions

it seems to work only, if it is the first. Why is this?
This is because file objects (Input in your case) are iterators, i.e. they are exhausted once you have iterated over them. In your for loop this is not a problem, because you iterate over the file just once for both ID and position. If you want to use two list comprehensions like this, you either have to open the file anew for the second one, or read the lines from the file into a list and use that list in the list comprehensions.
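The exhaustion can be seen in a minimal sketch. Here io.StringIO stands in for the opened file and the sample text is made up for illustration; a real file object behaves the same way when iterated.

```python
import io

# io.StringIO stands in for an opened text file (hypothetical sample data)
searchfile = io.StringIO("a*b\nplain\n**c\n")

# The first pass consumes the iterator
first = [line for line in searchfile if '*' in line]

# The second pass finds nothing left to read
second = [line for line in searchfile if '*' in line]

# Rewinding (or reopening the file) makes the lines available again
searchfile.seek(0)
third = [line for line in searchfile if '*' in line]

print(first)
print(second)
print(third)
```

Reading the lines into a list once avoids the problem entirely, because a list can be iterated as often as needed.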
Also note that your positions list comprehension is wrong, as it enumerates the Hit list, and not each of the elements in the list, as was the case in your loop.
You could try like this (not tested):
# first, get the lines with '*' just once, cached as a list
star_lines = [line for line in Input if '*' in line]
# now get the IDs using those cached lines
ids = [line[2:13].strip() for line in star_lines]
# for the positions we need a nested list comprehension
positions = [i for line in star_lines for i, x in enumerate(line) if x == '*']
That nested list comprehension is about equivalent to this nested loop:
positions = []
for line in star_lines:
    for i, x in enumerate(line):
        if x == '*':
            positions.append(i)
Basically, you just "flatten" that block of code and put the thing to be appended to the front.
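If the positions should stay grouped per line rather than flattened into one list, the inner loop can become its own comprehension. This is only a sketch: star_lines is filled with made-up sample lines, and grouped_positions is a name introduced here for illustration.

```python
star_lines = ["ab*d\n", "*x*\n"]  # hypothetical sample lines

# Flat: one list of indices across all lines (the nested comprehension above)
flat_positions = [i for line in star_lines for i, x in enumerate(line) if x == '*']

# Grouped: one inner list of indices per line
grouped_positions = [[i for i, x in enumerate(line) if x == '*'] for line in star_lines]

print(flat_positions)
print(grouped_positions)
```

The grouped form keeps each line's positions aligned with the matching entry in ids.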

Related

Explanation of Python code - integers split

RTX_number = [int(x) for x in input().split()]
Could someone explain this line of code to me step by step?
I am having great difficulty understanding it.
As far as I know, .split creates spaces between elements?
I saw this code on a forum and I am trying to get a better understanding of it because I think it might be helpful for a simulation project.
I heard this is called a list comprehension, but I am kind of lost as of for now.
input().split()
Reads a line and breaks it, wherever there is whitespace, into a list of strings.
for x in input().split()
Takes this list, runs over it item by item, and binds this item to x.
int(x) for ...
Takes this x we bound, and runs int(x) on it and returns it.
[int(x) for x in input().split()]
Takes all these results and puts them into a list.
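The same steps can be followed with a fixed string in place of input(), so the intermediate values are visible. The variable names raw, parts and numbers are chosen here for illustration only.

```python
raw = "1 2 34 5"                   # stands in for what input() would return
parts = raw.split()                # list of strings, split on whitespace
numbers = [int(x) for x in parts]  # each string converted to an int

print(parts)
print(numbers)
```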
The short version is that this:
RTX_number = [int(x) for x in input().split()]
is a short-form of this:
RTX_number = []
for x in input().split():
    RTX_number.append(int(x))
where input().split() returns the list of strings that you get from separating whatever input() returned at each run of whitespace (for example, "Hello World" becomes ["Hello", "World"]).
The str.split() function can also be given an argument, such as ',', which it will split on instead of whitespace.
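A short sketch of the difference between splitting on whitespace and splitting on an explicit separator, using made-up sample strings:

```python
text = "1, 2,3"
fields = text.split(',')            # splits only at commas; spaces are kept
values = [int(f) for f in fields]   # int() ignores surrounding whitespace

spaced = "a  b"                     # two spaces between the letters
by_whitespace = spaced.split()      # runs of whitespace count as one separator
by_single_space = spaced.split(' ') # every single space is a separator

print(fields, values)
print(by_whitespace, by_single_space)
```

Note that an explicit separator produces empty strings between adjacent separators, which the no-argument form never does.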
The general syntax of a comprehension is
(expression) for (element) in (iterable) if (condition)
For every element in the iterable, if the condition is true (note that the condition can be omitted entirely), the expression is evaluated and added to the returned list.
We usually use comprehensions as shorthand for full loops, since they often save space and complexity.
Note that list comprehensions aren't the only kind of comprehension - they can be used to make several different data structures:
# list comprehension - produces a list
[expression for element in iterable]
# set comprehension - produces a set instead of a list
{expression for element in iterable}
# dict comprehension - produces a dict with key-value pairs
{key:value for element in iterable}
# generator expression - like a list comprehension, but each value is not
# actually computed until something asks for it.
# The parentheses can be omitted when it is the sole argument of a function call
(expression for element in iterable)
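As a concrete illustration of the four forms, here is a small sketch over a made-up list of words; all variable names are chosen for this example only.

```python
words = ["apple", "bob", "apple", "kiwi"]

lengths_list = [len(w) for w in words]      # keeps order and duplicates
lengths_set = {len(w) for w in words}       # duplicates collapse
lengths_dict = {w: len(w) for w in words}   # one entry per distinct key
lengths_gen = (len(w) for w in words)       # nothing computed yet

# A generator expression passed as the only argument needs no extra parentheses
total = sum(len(w) for w in words)

print(lengths_list, lengths_set, lengths_dict, total)
print(next(lengths_gen))  # values are produced one at a time, on demand
```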
This code is equivalent to:
# ask for user input (it expects something like "1 2 34 5")
input_text = input()
# split on whitespace
split_text_list = input_text.split()
list_of_integers = []
# for each string item in the list
for item in split_text_list:
    # convert to integer
    number = int(item)
    # add to list
    list_of_integers.append(number)
But of course the comprehension avoids all the unnecessary intermediate variables, so it is shorter. It is usually also slightly faster, since it does not need a separate append call for every item.

How to read and create a new list without duplicate words in Python?

I am new in Python and I have the following problem to solve:
"Open the file sample.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order."
I have done the following code, with some good result, but I can't understand the reason my result appears to multiple list. I just need to have the words in one list.
thanks in advance!
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst=fh.read().split()
final_list=list()
for line in lst:
    if line in lst not in final_list:
        final_list.append(line)
        final_list.sort()
        print(final_list)
Your code is largely correct; the major problem is the conditional on your if statement:
if line in lst not in final_list:
Python chains comparison operators, and both in and not in are comparison operators, so this condition is evaluated as:
if (line in lst) and (lst not in final_list):
Both halves are always true here (every line comes from lst, and final_list holds strings, never the list lst itself), so every word gets appended, duplicates included. What you want is simply:
if line not in final_list:
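To check how Python evaluates a condition of the form a in b not in c, the following sketch compares it with its expanded form and with the intended test. The sample values are made up so that the word is already present in final_list.

```python
lst = ["a", "b", "a"]
final_list = ["a"]
line = "a"

# Chained comparison, as written in the question
chained = line in lst not in final_list

# What the chain means: both comparisons share the middle operand
expanded = (line in lst) and (lst not in final_list)

# The intended duplicate check
intended = line not in final_list

print(chained, expanded, intended)
```

The chained form reports that the word should be appended even though it is already in final_list, which is why duplicates slip through.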
Right now, you're sorting and printing your list inside the loop, but it would be better to do that once at the end, making your code look like this:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst=fh.read().split()
final_list=list()
for line in lst:
    if line not in final_list:
        final_list.append(line)
final_list.sort()
print(final_list)
I have a few additional comments on your code:
You don't need to explicitly initialize a variable (as in lst = list()) if you're going to immediately assign something to it. You can just write:
fh = open(fname)
lst=fh.read().split()
On the other hand, you do need to initialize final_list because
you're going to try to call the .append method on it, although it
would be more common to write:
final_list = []
In practice, it would be more common to use a set to
collect the words, since a set will de-duplicate things
automatically:
final_list = set()
for line in lst:
    final_list.add(line)
print(sorted(final_list))
Lastly, if I were to write this code, it might look like this:
fname = input("Enter file name: ")
with open(fname) as fh:
    lst = fh.read().split()
final_list = set(word.lower() for word in lst)
print(sorted(final_list))
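For the line-by-line reading that the assignment asks for, the set-based idea could be wrapped in a small function. This is a sketch only: unique_sorted_words is a hypothetical helper name, and a list of strings stands in for the open file (a file object yields lines in the same way).

```python
def unique_sorted_words(lines):
    """Return the distinct words from an iterable of lines, sorted alphabetically."""
    words = set()
    for line in lines:
        # split() breaks the line into words; update() adds only the new ones
        words.update(line.split())
    return sorted(words)

# Hypothetical sample lines standing in for the contents of sample.txt
sample = ["the quick brown fox", "the lazy dog", "quick dog"]
result = unique_sorted_words(sample)
print(result)
```

With a real file, the call would look like unique_sorted_words(fh) inside a with open(fname) as fh: block. Unlike the version above, this sketch does not lowercase the words.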
Your code has the following problems as is:
if line in lst not in final_list - Not sure what you are trying to do here. I think you expect this to go over all the words in the line and check each one against final_list
Your code also has some indentation issues
The call to the close() method is missing
You need to read all the lines to a list and iterate over the list of lines and perform the splitting and adding elements to the list as:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
lst = fh.read().split()
final_list=list()
for word in lst:
    if word not in final_list:
        final_list.append(word)
final_list.sort()
print(final_list)
fh.close()

Why does this give me an IndexError?

I have the following code that opens a csv, and appends all the values to a list. I then remove all the values that do not start with '2'. However, on the line if lst[k][0] != '2':, it raises an error:
Traceback (most recent call last):
  File "historical_tempo1.py", line 23, in <module>
    if lst[k][0] != '2':
IndexError: list index out of range
Here is the code:
y = open('today.csv')
lst = []
for k in y:
    lst.append(k)
lst = ' '.join(lst).split()
for k in range(0, len(lst)-1):
    if lst[k][0] != '2':
        lst[k:k+1] = ''
Here is the first bit of content from the csv file:
Date,Time,PM2.5 Mass concentration(ug/m3),Status
3/15/2014,4:49:13 PM,START
2014/03/15,16:49,0.5,0
3/15/2014,4:49:45 PM,START
2014/03/15,16:50,5.3,0
2014/03/15,16:51,5.1,0
2014/03/15,16:52,5.0,0
2014/03/15,16:53,5.0,0
2014/03/15,16:54,5.4,0
2014/03/15,16:55,6.4,0
2014/03/15,16:56,6.4,0
2014/03/15,16:57,5.0,0
2014/03/15,16:58,5.2,0
2014/03/15,16:59,5.2,0
3/15/2014,5:03:48 PM,START
2014/03/15,17:04,4.8,0
2014/03/15,17:05,4.9,0
2014/03/15,17:06,4.9,0
2014/03/15,17:07,5.1,0
2014/03/15,17:08,4.6,0
2014/03/15,17:09,4.9,0
2014/03/15,17:10,4.4,0
2014/03/15,17:11,5.7,0
2014/03/15,17:12,4.4,0
2014/03/15,17:13,4.0,0
2014/03/15,17:14,4.6,0
2014/03/15,17:15,4.7,0
2014/03/15,17:16,4.8,0
2014/03/15,17:17,4.5,0
2014/03/15,17:18,4.4,0
2014/03/15,17:19,4.5,0
2014/03/15,17:20,4.8,0
2014/03/15,17:21,4.6,0
2014/03/15,17:22,5.1,0
2014/03/15,17:23,4.2,0
2014/03/15,17:24,4.6,0
2014/03/15,17:25,4.5,0
2014/03/15,17:26,4.4,0
Why do you get an IndexError? Because when you write lst[k:k+1] = '', you remove the element at index k from your list, which makes the list one element shorter. The loop, however, still runs up to the old len(lst)-1, because range() was evaluated once before the loop started, so the index variable k eventually goes past the end of the shrunken list.
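The failure can be reproduced on a tiny list. The sample values below are made up; the loop body is the one from the question, wrapped in try/except so the sketch runs to completion.

```python
lst = ['a', '2x', 'b', 'c']   # hypothetical sample data
raised = False

try:
    # range(0, 3) is computed once, before any element is removed
    for k in range(0, len(lst) - 1):
        if lst[k][0] != '2':
            lst[k:k+1] = ''   # deletes lst[k]; the list is now one shorter
except IndexError:
    raised = True

print(raised, lst)
```

Two deletions leave only two elements, so the third iteration asks for an index that no longer exists.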
How can you fix this? Loop over a copy and delete from the original using list.remove().
The following code loops over the copy.
for s in lst[:]:
    if s[0] != '2':
        lst.remove(s)
The expression lst[k][0] raises an IndexError, which means that either:
# (1) this expression raises it
x = lst[k]
# or (2) this expression raises it
x[0]
If (1) raises it, it means len(lst) <= k, i.e. there are fewer items than you expect.
If (2) raises it, it means x is an empty string, which means you can't access its item at index 0.
Either way, instead of guessing, use pdb. Run your program using pdb, and at the point your script aborts, examine the values of lst, k, lst[k], and lst[k][0].
Basically, your list, 'lst', starts out at length 43. The 'slice' operation lst[k:k+1] doesn't replace two separate indexed values with '', but wipes out one of the list entries. If you did a lst[k:k+5], you would wipe out five entries. Try it in the interpreter.
I'd recommend you don't try to wipe out those entries in the same list you are iterating over. It shrinks as you go, which means you run out of range and get an "IndexError". If you have to remove the lines that don't begin with "2", store the values you want to keep in another list.
List comprehensions work great in this case...
mynewlist = [x for x in lst if x[0] == '2']
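Since the data is a CSV file, the same filter can also be applied row by row with the standard csv module instead of joining and splitting the text. This is an alternative sketch, not what the question's code does: io.StringIO stands in for the opened today.csv, loaded with the first few lines shown in the question.

```python
import csv
import io

# Stand-in for open('today.csv'), using the first lines from the question
sample = io.StringIO(
    "Date,Time,PM2.5 Mass concentration(ug/m3),Status\n"
    "3/15/2014,4:49:13 PM,START\n"
    "2014/03/15,16:49,0.5,0\n"
    "3/15/2014,4:49:45 PM,START\n"
    "2014/03/15,16:50,5.3,0\n"
)

# Keep only the rows whose first field starts with '2'; skip blank rows
rows = [row for row in csv.reader(sample) if row and row[0].startswith('2')]

for row in rows:
    print(row)
```

Each kept row is already split into its fields, and no list is modified while it is being iterated.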

Why can't I append a char to an empty list in Python?

In a program I am writing to create a list of words from a list of chars, I am getting a "list index out of range" exception.
def getlist(filename):
    f = open('alice.txt','r')
    charlist = f.read()
    wordlist = []
    done = False
    while(not done):
        j = 0
        for i in range(0,len(charlist)):
            if charlist[i] != ' ' and charlist[i] != '\n':
                wordlist[j] += charlist[i]
            else: j+= 1
        done = i == len(charlist)-1
    return wordlist
So I started playing around with how lists work, and found that:
list = ['cars']
list[0]+= '!'
gives list = ['cars!']
However, with:
list = []
list[0]+= '!'
I get an out of bounds error. Why doesn't it do what seems logical: list= ['!']? How can I solve this? If I must initialize with something, how will I know the required size of the list? Are there any better, more conventional, ways to do what I'm attempting?
['cars'] is a list containing one element. That element is the string 'cars', which contains 4 characters.
list[0] += '!' actually does 3 separate things. The list[0] part selects the element of list at position 0. The += part both concatenates the two strings (like 'cars' + '!' would), and stores the resulting string back in the 0th slot of list.
When you try to apply that to the empty list, it fails at the "selects the element at position 0" part, because there is no such element. You are expecting it to behave as if you had not the empty list, but rather ['']; the list containing one element which is the empty string. You can easily append ! onto the end of an empty string, but in your example you don't have an empty string.
To add to a list, including an empty one, use the append() method:
>>> mylist = []
>>> mylist.append('!')
>>> mylist
['!']
However, with:
list = []
list[0]+= '!'
I get an out of bounds error. Why doesn't it do what seems logical:
list= ['!']?
Because that isn't logical in Python. To append '!' to list[0], list[0] has to exist in the first place. It will not magically turn into an empty string for you to concatenate the exclamation mark to. In the general case, Python would not have a way to figure out what kind of "empty" element to magic up, anyway.
The append method is provided on lists in order to append an element to the list. However, what you're doing is massively over-complicating things. If all you want is a list consisting of the words in the file, that is as easy as:
def getlist(filename):
    with open(filename) as f:
        return f.read().split()
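If the character-by-character approach from the question is wanted anyway (for instance as an exercise), it can be written with append() so that no index ever has to exist in advance. This is a sketch under that assumption; chars_to_words is a hypothetical name and the sample text is made up.

```python
def chars_to_words(text):
    """Build a list of words by walking through the text one character at a time."""
    words = []
    current = ''                 # the word being assembled
    for ch in text:
        if ch not in (' ', '\n'):
            current += ch        # extend the current word
        elif current:
            words.append(current)  # a separator ends the word
            current = ''
    if current:                  # do not lose the last word
        words.append(current)
    return words

result = chars_to_words("Alice was  beginning\nto get")
print(result)
```

The list grows only when a complete word is available, so its final size never has to be known up front.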
Your error does not come from the += in list[0] += '!'; it comes from accessing index 0 of an empty list, which is out of range:
>>> my_list = list()
>>> my_list[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
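And += applied to list[0] does not append to the list; it concatenates (for strings) or adds (for numbers) to the element at that index. Internally, x += y calls the following method where the type defines it, and otherwise falls back to x = x + y.
__iadd__(...)
x.__iadd__(y) <==> x+=y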
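A small sketch of how += behaves on a string element versus on a list itself. The variable names are made up for illustration.

```python
items = ['cars']
items[0] += '!'     # str defines no __iadd__, so this is items[0] = items[0] + '!'

numbers = [1, 2]
alias = numbers
numbers += [3]      # list.__iadd__ extends the same list object in place

has_str_iadd = hasattr(str, '__iadd__')
has_list_iadd = hasattr(list, '__iadd__')

print(items, alias, has_str_iadd, has_list_iadd)
```

In both cases the left-hand side has to exist before += can be applied, which is why the empty-list version fails.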

Reading and Grouping a List of Data in Python

I have been struggling with managing some data. I have data that I have turned into a list of lists; each basic sublist has a structure like the following:
<1x>begins
<2x>value-1
<3x>value-2
<4x>value-3
some indeterminate number of other values
<1y>next observation begins
<2y>value-1
<3y>value-2
<4y>value-3
some indeterminate number of other values
this continues for an indeterminate number of times in each sublist
EDIT: I need to get all the occurrences of <2, <3 and <4 separated out and grouped together. I am creating a new list of lists: [[<2x>value-1, <3x>value-2, <4x>value-3], [<2y>value-1, <3y>value-2, <4y>value-3]]
EDIT: All of the lines that follow <4x> and <4y> (and, for that matter, <4anyalpha>) have the same type of coding, and I don't know a priori how high the numbers can go. Just think of these as SGML tags that are not closed. I used numbers because my fingers were hurting from all the coding I have been doing today.
The solution I have come up with finally is not very pretty
listINeed=[]
for sublist in biglist:
    for line in sublist:
        if '<2' in line:
            var2=line
        if '<3' in line:
            var3=line
        if '<4' in line:
            var4=line
            templist=[]
            templist.append(var2)
            templist.append(var3)
            templist.append(var4)
            listINeed.append(templist)
            templist=[]
            var4=var2=var3=''
I have looked at ways to try to clean this up but have not been successful. This works fine I just saw this as another opportunity to learn more about python because I would think that this should be processable by a one line function.
itertools.groupby() can get you by.
itertools.groupby(biglist, operator.itemgetter(2))
If you want to pick out the second, third, and fourth elements of each sublist, this should work:
listINeed = [sublist[1:4] for sublist in biglist]
You're off to a good start by noticing that your original solution may work but lacks elegance.
You should parse the string in a loop, creating a new variable for each line.
Here's some sample code:
import re
s = """<1x>begins
<2x>value-1
<3x>value-2
<4x>value-3
some indeterminate number of other values
<1y>next observation begins
<2y>value-1
<3y>value-2
<4y>value-3"""
# a line starting with "<1" followed by a non-digit opens a new observation
firstMatch = re.compile(r'^<1\D')
# any other line that starts with a numbered tag, e.g. "<2x>"
numMatch = re.compile(r'^<(\d+)')
listIneed = []
templist = None
# splitlines() keeps each line whole; split() would break lines at every space
for line in s.splitlines():
    if firstMatch.match(line):
        if templist is not None:
            listIneed.append(templist)
        templist = [line]
    elif numMatch.match(line) and templist is not None:
        # print('The matching number is %s' % numMatch.match(line).group(1))
        templist.append(line)
if templist is not None:
    listIneed.append(templist)
print(listIneed)
If I've understood your question correctly:
import re
def getlines(ori):
    matches = re.finditer(r'(<([1-4])[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.groups()[1]) == 1:
            if sublist != []:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.groups()[0])
    else:
        # for-else: runs once the loop has finished, to store the last group
        mainlist.append(sublist)
    return mainlist
...would do the job for you, if you felt like using regular expressions.
The version below would break all of the data down into sublists (not just the first four in each grouping) which might be more useful depending what else you need to do to the data. Use David's listINeed = [sublist[1:4] for sublist in biglist] to get the first four results from each list for the specific task above.
import re
def getlines(ori):
    # \d+ (rather than \d*) so that int() never receives an empty string
    matches = re.finditer(r'(<(\d+)[a-zA-Z]>.*)', ori)
    mainlist = []
    sublist = []
    for sr in matches:
        if int(sr.groups()[1]) == 1:
            print("1 found!")
            if sublist != []:
                mainlist.append(sublist)
            sublist = []
        else:
            sublist.append(sr.groups()[0])
    else:
        # for-else: runs once the loop has finished, to store the last group
        mainlist.append(sublist)
    return mainlist
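For data that is already a list of lines (as in the question) rather than one string, the same grouping idea could be sketched as follows. The names TAG, group_observations and wanted are hypothetical, and the sample sublist is modelled on the structure described in the question, with one extra made-up <5x> line.

```python
import re

TAG = re.compile(r'^<(\d+)')  # captures the leading tag number, e.g. "<2x>" -> "2"

def group_observations(lines, wanted=('2', '3', '4')):
    """Collect the wanted tagged lines of each observation into its own list."""
    groups = []
    current = None
    for line in lines:
        m = TAG.match(line)
        if m is None:
            continue                  # untagged line: ignore it
        if m.group(1) == '1':         # a "<1" tag opens a new observation
            current = []
            groups.append(current)
        elif current is not None and m.group(1) in wanted:
            current.append(line)
    return groups

sublist = [
    "<1x>begins", "<2x>value-1", "<3x>value-2", "<4x>value-3", "<5x>other",
    "<1y>next observation begins", "<2y>value-1", "<3y>value-2", "<4y>value-3",
]
result = group_observations(sublist)
print(result)
```

For the full list of lists, something like [g for sub in biglist for g in group_observations(sub)] would produce one flat list of groups.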
