How does one ignore lines in a file?
Example:
If you know that the first lines in a file will begin with say, a or b and the remainder of lines end with c, how does one parse the file so that lines beginning a or b are ignored and lines ending c are converted to a nested list?
What I have so far:
fname = raw_input('Enter file name: ')
z = open(fname, 'r')
#I tried this but it converts all lines to a nested list
z_list = [i.strip().split() for i in z]
I am guessing that I need a for loop.
for line in z:
    if line[0] == 'a':
        pass
    if line[0] == 'b':
        pass
    if line[-1] == 'c':
        list_1 = [line.strip().split()]
The above is the general idea but I am expert at making dead code! How does one render it undead?
startswith can take a tuple of strings to match, so you can do this:
[line.strip().split() for line in z if not line.startswith(('a', 'b'))]
This will work even if a and b are words or sentences not just characters.
If there can be cases where lines don't start with a or b but also don't end with c you can extend the list comprehension to this:
[
    line.strip().split()
    for line in z
    if line.rstrip().endswith('c') and not line.startswith(('a', 'b'))
]
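A minimal, self-contained sketch of the filter above, written for Python 3 and using an in-memory list in place of the open file z. The sample lines and the name parse_lines are illustrative assumptions, not part of the question. The trailing newline is stripped before the endswith test, since a line read from a file normally ends in '\n' rather than 'c'.

```python
def parse_lines(lines, skip_prefixes=('a', 'b'), required_suffix='c'):
    """Split the kept lines into lists of whitespace-separated tokens."""
    result = []
    for line in lines:
        stripped = line.strip()  # drop the trailing newline before testing
        if stripped.startswith(skip_prefixes):
            continue  # header-style line: ignore it
        if stripped.endswith(required_suffix):
            result.append(stripped.split())
    return result


# Hypothetical sample data standing in for the file contents.
sample = ["a header\n", "b another header\n", "1 2 c\n", "3 4 c\n", "5 6 d\n"]
parsed = parse_lines(sample)
print(parsed)
```

Passing an open file object instead of the list should work the same way, because both are iterated line by line.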
One very general approach is to "filter" the file by removing some lines:
import itertools
zlist = [l.strip().split() for l in itertools.ifilter(lambda line: line[0] not in 'ab', z)]
You can use itertools.ifilter any time you want to "selectively filter" an iterable, getting another iterable which only contains those items which satisfy some predicate -- which is why I say this approach is very general. itertools has a lot of great, fast tools for dealing with iterables in a myriad way, and is well worth studying.
A similar but syntactically simpler approach, which suffices in your case (and which therefore I would recommend due to the virtue of simplicity), is to do the "filtering" with an if clause in the listcomp:
zlist = [l.strip().split() for l in z if l[0] not in 'ab']
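For readers on Python 3, where itertools.ifilter no longer exists and the built-in filter is already lazy, the two approaches can be sketched side by side as follows. The sample lines and the helper name keep are assumptions made for illustration. The sketch uses startswith rather than line[0], because indexing an empty string raises IndexError.

```python
# Hypothetical sample data standing in for the open file z.
lines = ["a skip me\n", "b skip me too\n", "260 32 0\n", "8 1 0\n"]


def keep(line):
    # True for lines that should survive the filter.
    return not line.startswith(('a', 'b'))


# Python 3: filter() is lazy, like Python 2's itertools.ifilter.
via_filter = [l.strip().split() for l in filter(keep, lines)]

# Equivalent list comprehension with an if clause.
via_listcomp = [l.strip().split() for l in lines if keep(l)]

print(via_filter)
```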
You can add if conditions to list comprehensions.
z_list = [i.strip().split() for i in z if i.rstrip().endswith('c')]
or
z_list = [i.strip().split() for i in z if (i[0] != 'a' and i[0] != 'b')]
One way to do it is to replace 'pass' with 'continue', which skips to the next line in the file without doing anything. You will also need to create list_1 before the loop and append each parsed line to it, stripping the newline before testing the last character:
    if line.rstrip().endswith('c'):
        list_1.append(line.strip().split())
f = open("file")
for line in f:
    li = line.strip()
    if not li[0] in ["a", "b"] and li[-1] == "c":
        print line.rstrip()
f.close()
For those interested in the solution.
And also, another question!
Example file format:
c this is a comment
p m 1468 1 267
260 32 0
8 1 0
Code:
fname = raw_input('Please enter the name of file: ')
z = open(fname, 'r')
required_list = [line.strip().split() for line in z if not line.startswith(('c', 'p'))]
print required_list
Output:
[['260', '32', '0'], ['8', '1', '0']]
Any suggestions on how to convert the strings in the lists to integers and perform arithmetic operations?
Pseudocode to illustrate:
#for the second item in each sublist
#if sum is > than first number in second line of file
#pass
#else
#abort/raise error
Cheers folks for your suggestions so far,
Seafoid.
@Nadia, my day seems a little more worthwhile now! I spent hours (days even) trying to crack this solo! Thanks!
I'm currently working on a program that takes a group of lists from a csv file, and groups them together. The program I came up with is:
List_one = []
with open("trees.csv") as f:
    skiplines = f.readline()
    for line in f:
        res = line.split(" ")
        List_one.append(res)
for i in List_one:
    (i[0]) = (i[0]).rstrip("\n")
print (List_one)
What I get now are a group of lists, but the problem is that these lists are strings and I want them as floats. The lists look like this:
[['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4'], ['5,10.7,81,18.8'], ['6,10.8,83,19.7'], ['7,11.0,66,15.6'], ['8,11.0,75,18.2'], ['9,11.1,80,22.6'], ['10,11.2,75,19.9'], ['11,11.3,79,24.2'], ['12,11.4,76,21.0'], ['13,11.4,76,21.4'], ['14,11.7,69,21.3'], ['15,12.0,75,19.1'], ['16,12.9,74,22.2'], ['17,12.9,85,33.8'], ['18,13.3,86,27.4'], ['19,13.7,71,25.7'], ['20,13.8,64,24.9'], ['21,14.0,78,34.5'], ['22,14.2,80,31.7'], ['23,14.5,74,36.3'], ['24,16.0,72,38.3'], ['25,16.3,77,42.6'], ['26,17.3,81,55.4'], ['27,17.5,82,55.7'], ['28,17.9,80,58.3'], ['29,18.0,80,51.5'], ['30,18.0,80,51.0'], ['31,20.6,87,77.0']]
As you guys can see I also can't use float() on list one either, because the list is a whole string on its own. Is there a way I can split the lists by indexing so I get:
['1', '8.3', '70', '10.3'].....
Any help is welcome.
"line.split(',')" splits the string on "," and returns a list of strings.
For the string '1,8.3,70,10.3' it returns ['1', '8.3', '70', '10.3'].
You can split the strings by the commas if you want. You should probably do everything before you append them to List_one though.
res = [float(x) for x in line.split(" ")[0].split(",")]
List_one.append(res)
Does this work how you want it to? Sorry I'm not sure what format the input is in so I'm kind of guessing
You could say:
res = line.strip()
# map takes a function as the first arg and an iterable as the second
list_of_floats = list(map(float, res.split(",")))
# then you can
List_one.append(list_of_floats)
Which will still give you a nested list because you are pushing a list during each iteration of for line in f:, but each list would at least be floats as you've specified.
If you wanted to just get one flat list of floats instead of doing the initial line.split(' ') you could use regex to split the line read from the csv:
import re # at the top of your file
res = re.split(r'[\s,]+', line.strip())
list_of_floats = list(map(lambda n: float(n), res))
List_one.append(list_of_floats)
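Another option is the standard csv module, which does the comma splitting for you. This is only a sketch: io.StringIO stands in for open("trees.csv"), and the header names are made up for illustration because the question does not show the first line of the file.

```python
import csv
import io

# Stand-in for open("trees.csv"); the header row is a hypothetical example.
sample = io.StringIO("Index,Girth,Height,Volume\n1,8.3,70,10.3\n2,8.6,65,10.3\n")

reader = csv.reader(sample)
next(reader)  # skip the header row, like f.readline() in the question

# Each row arrives as a list of strings; convert every field to float.
rows = [[float(value) for value in row] for row in reader if row]
print(rows)
```

The `if row` guard skips blank lines, which csv.reader yields as empty lists.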
This might help:
l = [['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4']]
l2 = []
for x in l:
    a = x[0].split(",")
    l2.append(a)
print(l2)
Enjoy!
Say I have some list with files of the form *.1243.*, and I wish to obtain everything before these 4 digits. How do I do this efficiently?
An ugly, inefficient example of working code is:
names = []
for file in file_list:
    words = file.split('.')
    for i, word in enumerate(words):
        if word.isdigit():
            if int(word)>999 and int(word)<10000:
                names.append(' '.join(words[:i]))
                break
print(names)
Obviously though, this is far from ideal and I was wondering about better ways to do this.
You may want to use regular expressions for this.
import re
name = []
for file in file_list:
    m = re.match(r'^(.+?)\.\d{4}\.', file)
    if m:
        name.append(m.groups()[0])
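A runnable sketch of the same idea, with the pattern compiled once. The file names below are hypothetical; only the `*.NNNN.*` shape comes from the question. Names that contain no four-digit component are skipped.

```python
import re

# Hypothetical file names used for illustration.
file_list = ["hello.1235.sas", "my.data.5678.txt", "no_digits.txt", "short.123.txt"]

# Lazily capture everything before the first ".<exactly four digits>." group.
pattern = re.compile(r'^(.+?)\.\d{4}\.')

names = []
for file_name in file_list:
    m = pattern.match(file_name)
    if m:
        names.append(m.group(1))
print(names)
```

Because the capture group is lazy, a prefix that itself contains dots (such as "my.data") is kept whole up to the first four-digit component.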
Using a regular expression, this would become simpler
import re
names = ['hello.1235.sas','test.5678.hai']
for fn in names:
    myreg = r'(.*)\.(?:\d{4})\..*'
    output = re.findall(myreg,fn)
    print(output)
output:
['hello']
['test']
If you know that all entries have the same three-part format, here is a list comprehension approach:
[parts[0] for parts in filter(lambda parts: len(parts[1]) == 4, (item.split('.') for item in file_list))]
To be fair, I also like the solution provided by @James. Note that the downside of this list comprehension is three loops:
1. Over all items, to split them.
2. Filtering the items that match.
3. Returning the result.
With a regular for loop it could be more efficient:
output = []
for item in file_list:
    beginning, digits, end = item.split('.')
    if len(digits) == 4:
        output.append(beginning)
It does only one loop, which is way better.
You can use Positive Lookahead (?=(\.\d{4}))
import re
pattern=r'(.*)(?=(\.\d{4}))'
text=['*hello.1243.*','*.1243.*','hello.1235.sas','test.5678.hai','a.9999']
print(list(map(lambda x:re.search(pattern,x).group(0),text)))
output:
['*hello', '*', 'hello', 'test', 'a']
I am a python newbie.
I want to read a text file which reads something like this
1345..
245..
..456
and store it in a list of lists of integers. I want to keep the numbers and replace the periods with 0s. How do I do it?
EDIT:
Apologies for the ambiguous output spec.
P.S. I want the output to be a list of lists:
[ [1,3,4,5,0,0],
[2,4,5,0,0],
[0,0,4,5,6]]
with open('yourfile') as f:
    lst = [map(int, x.strip().replace('.', '0')) for x in f]
Which is the same thing as the following nested list-comp:
    lst = [[int(val) for val in line.strip().replace('.', '0')] for line in f]
Here I used str.strip to drop the trailing newline and str.replace to change each '.' to '0' before converting to an integer.
with open(file) as f:
    lis = [[int(y) for y in x.replace('.','0').strip()] for x in f]
Here's an answer in the form of classic for loops, which is easier for a newbie to understand:
a_list = []
l = []
with open('a') as f:
    for line in f:
        for c in line.rstrip('\n').replace('.', '0'):
            l.append(int(c))
        a_list.append(l)
        #next line
        l = []
print a_list
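The same conversion can be sketched in Python 3 without touching the disk. Here io.StringIO is an assumed stand-in for the text file, and the name grid is illustrative. Each character is mapped directly to 0 or to its integer value, so no intermediate replace is needed.

```python
import io

# Stand-in for the text file from the question.
sample = io.StringIO("1345..\n245..\n..456\n")

grid = []
for line in sample:
    # strip() removes the newline; '.' becomes 0, digits become ints.
    row = [0 if ch == '.' else int(ch) for ch in line.strip()]
    grid.append(row)
print(grid)
```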
I have a file format like this:
9 8 1
3 4 1
...
...
Now, I want to get each line as three integers.
When I used
for line in f.readlines():
    print line.split(" ")
The script printed this:
['9', '8', '1\r\n']
['3', '4', '1\r\n']
...
...
How can I get each line as three integers?
Using the code you have and addressing your specific question of how to convert your list to integers:
You can iterate through each line and convert the strings to int with the following example using list comprehension:
Given:
line =['3', '4', '1\r\n']
then:
int_list = [int(i) for i in line]
will yield a list of integers
[3, 4, 1]
that you can then access via subscripts (0 to 2). e.g.
int_list[0] contains 3,
int_list[1] contains 4,
etc.
A more streamlined version for your consideration:
with open('data.txt') as f:
    for line in f:
        int_list = [int(i) for i in line.split()]
        print int_list
The advantage of using with is that it will automatically close your file for you when you are done, or if you encounter an exception.
UPDATE:
Based on your comments below, if you want the numbers in 3 different variables, say a, b and c, you can do the following:
for line in f:
    a, b, c = [int(i) for i in line.split()]
    print 'a = %d, b = %d, c = %d\n' %(a, b, c)
and get this:
a = 9, b = 8, c = 1
This counts on there being 3 numbers on each line.
Aside:
Note that in place of "list comprehension" (LC) you can also use a "generator expression" (GE) of this form:
a, b, c = (int(i) for i in line.split())
For your particular problem with 3 integers this doesn't make much difference, but I show it for completeness. For larger problems, an LC requires more memory because it builds the complete list in memory at once, while a GE generates values one at a time as needed. The SO question "Generator Expressions vs. List Comprehension" will give you more information if you are curious.
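The difference can be seen in a small sketch (Python 3 syntax; the sample line is taken from the question's output). Unpacking works with either form, but a generator can only be consumed once.

```python
line = "9 8 1\r\n"

as_list = [int(i) for i in line.split()]  # builds the whole list now
as_gen = (int(i) for i in line.split())   # produces values on demand

a, b, c = as_gen         # unpacking consumes the generator
print(a, b, c)

leftover = list(as_gen)  # nothing is left after the first pass
first = as_list[0]       # a list can still be indexed and reused
```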
with open("myfile.txt") as f:
    for line in f:
        int_list = [int(x) for x in line.split()]
You don't say what you want to do with the list of integers; there may be a better way to iterate over them, depending on that.
If you "need the values as three different variables," then:
a, b, c = int_list
though you could also use:
int_list[0]
int_list[1]
int_list[2]
as desired.
line.strip().split(" ")
would do.
A more complete version, reading all lines into one large string:
data = f.read().strip() # lose the final \n
[[int(n) for n in x.split()] for x in data.split('\n')]
would give you a list with the answers you want for each line.
If you wanna store the integers in three variables :
with open('data1.txt') as f:
    for line in f:
        a,b,c=(int(x) for x in line.split())
        print a,b,c
output:
9 8 1
3 4 1
This block of code should solve your problem:
f = open(filepath)
for line in f:
    intList = map(int, line.strip().split())
    print intList
f.close()
When I do the following list comprehension I end up with nested lists:
channel_values = [x for x in [ y.split(' ') for y in
open(channel_output_file).readlines() ] if x and not x == '\n']
Basically I have a file composed of this:
7656 7653 7649 7646 7643 7640 7637 7634 7631 7627 7624 7621 7618 7615
8626 8623 8620 8617 8614 8610 8607 8604 8600 8597 8594 8597 8594 4444
<snip several thousand lines>
Where each line of this file is terminated by a new line.
Basically I need to add each number (they are all separated by a single space) into a list.
Is there a better way to do this via list comprehension?
You don't need list comprehensions for this:
channel_values = open(channel_output_file).read().split()
Just do this:
channel_values = open(channel_output_file).read().split()
split() will split according to whitespace that includes ' ' '\t' and '\n'. It will split all the values into one list.
If you want integer values you can do:
channel_values = map(int, open(channel_output_file).read().split())
or with list comprehensions:
channel_values = [int(x) for x in open(channel_output_file).read().split()]
Also, the reason the original list comprehension had nested lists is because you added an extra level of list comprehension with the inner set of square brackets. You meant this:
channel_values = [x for y in open(channel_output_file)
                  for x in y.split(' ') if x and not x == '\n']
The other answers are still better ways to write the code, but that was the cause of the problem.
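To make the difference concrete, here is a small sketch with an in-memory list standing in for the file (the sample values are the first few numbers from the question). In a flattening comprehension the for clauses read in the same order as the equivalent nested for loops: outer loop first, inner loop second.

```python
# Stand-in for the lines of channel_output_file.
lines = ["7656 7653 7649\n", "8626 8623 8620\n"]

# Nested result: an inner comprehension produces one list per line.
nested = [y.split() for y in lines]

# Flat result: the outer loop (lines) comes first, the inner loop (tokens) second.
flat = [int(x) for y in lines for x in y.split()]
print(flat)
```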
Well another problem is that you're leaving the file open. Note that open is an alias for file.
try this:
f = file(channel_output_file)
channel_values = f.read().split()
f.close()
Note they'll be string values so if you want integer ones change the second line to
channel_values = [int(x) for x in f.read().split()]
int(x) will throw a ValueError if you have a non integer value in the file.
If you don't care about dangling file references, and you really must have a list read into memory all at once, the one-liner mentioned in other answers does work:
channel_values = open(channel_output_path).read().split()
In production code, I would probably use a generator; why read all those lines if you don't need them?
def generate_values_for_filename(filename):
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value
You can always make a list later if you really need to do something other than iterate over values:
channel_values = list(generate_values_for_filename(channel_output_path))
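As a usage sketch, the generator can feed an aggregate directly, so the full list of values is never built. The temporary file, its contents, and the shortened name generate_values are assumptions made so the example runs on its own (Python 3).

```python
import os
import tempfile


def generate_values(filename):
    """Yield whitespace-separated tokens from the file one at a time."""
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value


# Write a small throwaway file so the sketch is runnable as-is.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("7656 7653\n8626 8623\n")
    path = tmp.name

# sum() pulls one value at a time from the generator.
total = sum(int(v) for v in generate_values(path))
os.remove(path)
print(total)
```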
Is there a better way to do this via list comprehension?
Sort of..
Instead of reading each line into a list with the .readlines() method, you can just use .read() and split on any whitespace:
channel_values = [x for x in open(channel_output_file).read().split()]
If you need to do anything more complicated, particularly if it involves multiple list comprehensions, you're almost always better off expanding it into a regular for loop.
out = []
for y in open(channel_output_file).readlines():
    for x in y.split(' '):
        if x not in [' ', '\n']:
            out.append(x)
Or using a for loop and a list-comprehension:
out = []
for y in open(channel_output_file).readlines():
    out.extend(
        [x for x in y.split(' ')
         if x != ' ' and x != '\n'])
Basically, if you can't do something simply with a list comprehension (or need to nest them), list-comprehensions are probably not the best solution.