Splitting a string of values into multiple strings


I'm currently working on a program that takes a group of lists from a csv file, and groups them together. The program I came up with is:
List_one = []
with open("trees.csv") as f:
    skiplines = f.readline()
    for line in f:
        res = line.split(" ")
        List_one.append(res)
for i in List_one:
    i[0] = i[0].rstrip("\n")
print(List_one)
What I get now are a group of lists, but the problem is that these lists are strings and I want them as floats. The lists look like this:
[['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4'], ['5,10.7,81,18.8'], ['6,10.8,83,19.7'], ['7,11.0,66,15.6'], ['8,11.0,75,18.2'], ['9,11.1,80,22.6'], ['10,11.2,75,19.9'], ['11,11.3,79,24.2'], ['12,11.4,76,21.0'], ['13,11.4,76,21.4'], ['14,11.7,69,21.3'], ['15,12.0,75,19.1'], ['16,12.9,74,22.2'], ['17,12.9,85,33.8'], ['18,13.3,86,27.4'], ['19,13.7,71,25.7'], ['20,13.8,64,24.9'], ['21,14.0,78,34.5'], ['22,14.2,80,31.7'], ['23,14.5,74,36.3'], ['24,16.0,72,38.3'], ['25,16.3,77,42.6'], ['26,17.3,81,55.4'], ['27,17.5,82,55.7'], ['28,17.9,80,58.3'], ['29,18.0,80,51.5'], ['30,18.0,80,51.0'], ['31,20.6,87,77.0']]
As you can see, I can't use float() on the items of List_one either, because each inner list holds one whole string. Is there a way to split those strings so I get:
['1', '8.3', '70', '10.3'].....
Any help is welcome.

"line.split(',')" split the string with "," and returns list.
For the string '1,8.3,70,10.3' it will return ['1', '8.3', '70', '10.3'], a list of strings; call float() on each item to get numbers.

You can split the strings by the commas if you want. You should probably do everything before you append them to List_one though.
res = [float(x) for x in line.split(" ")[0].split(",")]
List_one.append(res)
Does this work how you want it to? Sorry I'm not sure what format the input is in so I'm kind of guessing

You could say:
res = line.strip()
# map takes a function as the first arg and an iterable as the second
list_of_floats = list(map(float, res.split(",")))
# then you can
List_one.append(list_of_floats)
Which will still give you a nested list because you are pushing a list during each iteration of for line in f:, but each list would at least be floats as you've specified.
If you wanted just one flat list of floats, you could extend List_one instead of appending to it, and use a regex to split each line read from the csv on whitespace or commas:
import re # at the top of your file
res = re.split(r'[\s,]+', line.strip())
list_of_floats = list(map(float, res))
List_one.extend(list_of_floats)

This might help:
l = [['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4']]
l2 = []
for x in l:
    a = x[0].split(",")
    l2.append(a)
print(l2)
Enjoy!
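Here is a minimal sketch that goes one step further and converts the split strings to floats. It assumes the file has one header line followed by comma-separated numeric rows, as the output in the question suggests. The function name parse_rows, the header names, and the in-memory sample are made up for illustration; the splitting is done by the standard csv module.

```python
import csv
import io

def parse_rows(lines):
    """Skip the header line, then convert every field of each row to float."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    return [[float(field) for field in row] for row in reader if row]

# Stand-in for open("trees.csv"); the header names are hypothetical.
sample = io.StringIO("Index,Girth,Height,Volume\n1,8.3,70,10.3\n2,8.6,65,10.3\n")
rows = parse_rows(sample)
print(rows)  # [[1.0, 8.3, 70.0, 10.3], [2.0, 8.6, 65.0, 10.3]]
```

With a real file you would pass the open file object instead of the StringIO stand-in.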

Related

How to convert a for loop output into a python list

I'm new to Python, sorry for the level of this question.
This is my output (prices from a website). I'm wondering how to convert them into a list of ints
for price_list_items in price_list:
    for values in price_list_items:
        x = values.rstrip(' zł')
        print(x)
479 000
355 000
269 000
499 000
289 000
The desired result would look like this: [479 000, 355 000, ...]. I also want to be able to perform basic operations with the values.
I found this thread How to convert a for loop output into a list (python), but it didn't help me.

lista = []
for price_list_items in price_list:
    for values in price_list_items:
        x = values.rstrip(' zł')
        lista.append(x)
lista = ['479 000', '350 000']
for idx, item in enumerate(lista):
    item = item.split()
    item = ''.join(item)
    lista[idx] = int(item)
print(lista)
~/python/stack$ python3.7 sum.py
[479000, 350000]
Change your last line to append to lista instead of print. Now we have lista = ['479 000', ...] but we want ints to perform operations on.
So we can then enumerate our list, from there we can split() and join() to get to here lista = ['479000', ...] then we can just use int(item) and put them back into lista as ints
For fun we could do some map and just go from:
lista = ['479 000', '350 000']
lista = list(map(lambda x: int(''.join((x.split()))), lista))

It looks like your string is meant to be a series of 6-digit numbers, where the parts of each number are separated by spaces and the numbers themselves are separated by newlines. The solution, therefore, is to remove the space between the number parts and convert the result to an integer, like this:
int(part.replace(' ', '')) # Replaces every space with nothing, then converts to int
Putting this in a list comprehension, where text is the multi-line string, we have:
numbers = [int(l.replace(' ', '')) for l in text.splitlines()]
UPDATE
Since you've posted your code, I can give you a better answer.
[ int(v.rstrip(' zł').replace(' ', '')) for price_list_items in price_list for v in price_list_items ]
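A small sketch of the same idea as a reusable helper, in case it is easier to read than the one-liner. The name parse_price and the sample price_list are assumptions for illustration; the real price_list comes from the asker's scraping code, which isn't shown.

```python
def parse_price(text, suffix=" zł"):
    """Turn a string such as '479 000 zł' into the integer 479000."""
    text = text.strip()
    if text.endswith(suffix):
        text = text[: -len(suffix)]
    # Spaces are used as thousands separators, so drop them before int().
    return int(text.replace(" ", ""))

# Hypothetical stand-in for the scraped nested list in the question.
price_list = [["479 000 zł", "355 000 zł"], ["269 000 zł"]]
prices = [parse_price(v) for group in price_list for v in group]
print(prices)       # [479000, 355000, 269000]
print(sum(prices))  # 1103000
```

Once the values are ints, ordinary arithmetic such as sum(), min() or max() works on the list.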

How can you terminate a string after k consecutive numbers have been found?

Say I have some list with files of the form *.1243.*, and I wish to obtain everything before these 4 digits. How do I do this efficiently?
An ugly, inefficient example of working code is:
names = []
for file in file_list:
    words = file.split('.')
    for i, word in enumerate(words):
        if word.isdigit():
            if int(word) > 999 and int(word) < 10000:
                names.append(' '.join(words[:i]))
                break
print(names)
Obviously though, this is far from ideal and I was wondering about better ways to do this.

You may want to use regular expressions for this.
import re
name = []
for file in file_list:
    m = re.match(r'^(.+?)\.\d{4}\.', file)
    if m:
        name.append(m.groups()[0])

Using a regular expression, this would become simpler
import re
names = ['hello.1235.sas', 'test.5678.hai']
for fn in names:
    myreg = r'(.*)\.(?:\d{4})\..*'
    output = re.findall(myreg, fn)
    print(output)
output:
['hello']
['test']

If you know that all entries have the same format (exactly three dot-separated parts), here is a list comprehension approach:
[parts[0] for parts in filter(lambda parts: len(parts[1]) == 4, (item.split('.') for item in file_list))]
To be fair, I also like the solution provided by @James. Note that the downside of this list comprehension is that it chains three steps:
1. Splitting all items
2. Filtering the items that match
3. Returning the result.
With a regular for loop it could be more efficient:
output = []
for item in file_list:
    beginning, digits, end = item.split('.')
    if len(digits) == 4:
        output.append(beginning)
It does only one loop, which is way better.

You can use Positive Lookahead (?=(\.\d{4}))
import re
pattern=r'(.*)(?=(\.\d{4}))'
text=['*hello.1243.*','*.1243.*','hello.1235.sas','test.5678.hai','a.9999']
print(list(map(lambda x:re.search(pattern,x).group(0),text)))
output:
['*hello', '*', 'hello', 'test', 'a']
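For comparison, here is a sketch that wraps the regex approach in a helper and also handles names with no four-digit part. The names FOUR_DIGITS and prefix_before_digits and the sample file_list are made up for illustration. The pattern is an assumption about what "4 digits" should mean: a dot-separated component of exactly four digits.

```python
import re

# Everything before the first ".dddd" component of exactly four digits,
# which must be followed by another dot or the end of the name.
FOUR_DIGITS = re.compile(r"^(.+?)\.\d{4}(?:\.|$)")

def prefix_before_digits(name):
    """Return the part of name before a 4-digit component, or None."""
    match = FOUR_DIGITS.match(name)
    return match.group(1) if match else None

file_list = ["hello.1235.sas", "a.b.5678.hai", "short.123.txt", "a.9999"]
names = [prefix_before_digits(f) for f in file_list]
print(names)  # ['hello', 'a.b', None, 'a']
```

Returning None for non-matching names lets the caller decide whether to skip them or treat them as errors.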

Building a List from a text file in python

I am a python newbie.
I want to read a text file which reads something like this
1345..
245..
..456
and store it in a list of lists of integers. I want to keep the numbers and replace the periods with 0s. How do I do it?
EDIT:
Apologies for the ambiguous output spec.
P.S. I want the output to be a list of lists:
[ [1,3,4,5,0,0],
[2,4,5,0,0],
[0,0,4,5,6]]

with open('yourfile') as f:
    lst = [list(map(int, x.strip().replace('.', '0'))) for x in f]
Which is the same thing as the following nested list-comp:
lst = [[int(val) for val in line.strip().replace('.', '0')] for line in f]
Here I used str.strip to drop the trailing newline and str.replace to change the '.' to '0' before converting to an integer.

with open(file) as f:
    lis = [[int(y) for y in x.replace('.', '0').strip()] for x in f]

Here's an answer in the form of classic for loops, which is easier for a newbie to understand:
a_list = []
l = []
with open('a') as f:
    for line in f:
        for c in line.rstrip('\n').replace('.', '0'):
            l.append(int(c))
        a_list.append(l)
        # next line
        l = []
print(a_list)
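The answers above can be combined into one small function. This is only a sketch: the name parse_grid is made up, and an in-memory StringIO stands in for the real file so the example runs on its own.

```python
import io

def parse_grid(lines):
    """Convert lines of digits and periods into lists of ints (period -> 0)."""
    grid = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        grid.append([0 if ch == "." else int(ch) for ch in stripped])
    return grid

# Stand-in for an open text file with the contents from the question.
sample = io.StringIO("1345..\n245..\n..456\n")
grid = parse_grid(sample)
print(grid)
# [[1, 3, 4, 5, 0, 0], [2, 4, 5, 0, 0], [0, 0, 4, 5, 6]]
```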

How to split the file content by space and end-of-line character?

When I do the following list comprehension I end up with nested lists:
channel_values = [x for x in [ y.split(' ') for y in
open(channel_output_file).readlines() ] if x and not x == '\n']
Basically I have a file composed of this:
7656 7653 7649 7646 7643 7640 7637 7634 7631 7627 7624 7621 7618 7615
8626 8623 8620 8617 8614 8610 8607 8604 8600 8597 8594 8597 8594 4444
<snip several thousand lines>
Where each line of this file is terminated by a new line.
Basically I need to add each number (they are all separated by a single space) into a list.
Is there a better way to do this via list comprehension?

You don't need list comprehensions for this:
channel_values = open(channel_output_file).read().split()

Just do this:
channel_values = open(channel_output_file).read().split()
split() will split according to whitespace that includes ' ' '\t' and '\n'. It will split all the values into one list.
If you want integer values you can do:
channel_values = list(map(int, open(channel_output_file).read().split()))
or with list comprehensions:
channel_values = [int(x) for x in open(channel_output_file).read().split()]

Also, the reason the original list comprehension had nested lists is because you added an extra level of list comprehension with the inner set of square brackets. You meant this:
channel_values = [x for y in open(channel_output_file)
                  for x in y.split(' ') if x and not x == '\n']
The other answers are still better ways to write the code, but that was the cause of the problem.
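To make the difference concrete, here is a short sketch with two made-up lines standing in for the file. It shows that an inner pair of brackets produces one list per line, while two for clauses in a single comprehension produce one flat list. The for clauses are written in the same order as the equivalent nested for loops.

```python
# Two sample lines standing in for the file contents.
lines = ["7656 7653 7649\n", "8626 8623 8620\n"]

# Nested comprehension: one inner list per line.
nested = [line.split() for line in lines]

# Flat comprehension: the for clauses read left to right,
# exactly like the equivalent nested for loops.
flat = [value for line in lines for value in line.split()]

print(nested)  # [['7656', '7653', '7649'], ['8626', '8623', '8620']]
print(flat)    # ['7656', '7653', '7649', '8626', '8623', '8620']
```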

Well, another problem is that you're leaving the file open. Note that in Python 2 open returned a file object and file(...) worked the same way; Python 3 has only open.
Try this:
f = open(channel_output_file)
channel_values = f.read().split()
f.close()
Note they'll be string values, so if you want integers change the second line to
channel_values = [int(x) for x in f.read().split()]
int(x) will raise a ValueError if you have a non-integer value in the file.

If you don't care about dangling file references, and you really must have a list read into memory all at once, the one-liner mentioned in other answers does work:
channel_values = open(channel_output_path).read().split()
In production code, I would probably use a generator: why read all those lines if you don't need them?
def generate_values_for_filename(filename):
    with open(filename) as f:
        for line in f:
            for value in line.split():
                yield value
You can always make a list later if you really need to do something other than iterate over values:
channel_values = list(generate_values_for_filename(channel_output_path))
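A sketch of the same generator idea with the integer conversion folded in. The name generate_ints and the temporary file are assumptions made so the example is self-contained; in practice you would pass your own path.

```python
import os
import tempfile

def generate_ints(filename):
    """Yield every whitespace-separated token in the file as an int."""
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield int(token)  # raises ValueError on a non-integer token

# Write a small temporary file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("7656 7653 7649\n8626 8623 8620\n")
    path = tmp.name

try:
    total = sum(generate_ints(path))    # consumes values one at a time
    values = list(generate_ints(path))  # or materialize the full list
finally:
    os.remove(path)

print(total)   # 48827
print(values)  # [7656, 7653, 7649, 8626, 8623, 8620]
```

The generator closes the file itself once it is exhausted, because the with block ends when the loop finishes.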

Is there a better way to do this via list comprehension?
Sort of..
Instead of reading the lines into a list with the .readlines() method, you can just use .read() and split the whole text on whitespace. Calling split() with no argument already discards spaces and newlines, so no filter is needed:
channel_values = [x for x in open(channel_output_file).read().split()]
If you need to do anything more complicated, particularly if it involves multiple list comprehensions, you're almost always better off expanding it into a regular for loop.
out = []
for y in open(channel_output_file).readlines():
    for x in y.split(' '):
        if x not in [' ', '\n']:
            out.append(x)
Or using a for loop and a list-comprehension:
out = []
for y in open(channel_output_file).readlines():
    out.extend(
        [x for x in y.split(' ')
         if x != ' ' and x != '\n'])
Basically, if you can't do something simply with a list comprehension (or need to nest them), list-comprehensions are probably not the best solution.

Python - ignore lines in a file

How does one ignore lines in a file?
Example:
If you know that the first lines in a file will begin with say, a or b and the remainder of lines end with c, how does one parse the file so that lines beginning a or b are ignored and lines ending c are converted to a nested list?
What I have so far:
fname = raw_input('Enter file name: ')
z = open(fname, 'r')
#I tried this but it converts all lines to a nested list
z_list = [i.strip().split() for i in z]
I am guessing that I need a for loop.
for line in z:
    if line[0] == 'a':
        pass
    if line[0] == 'b':
        pass
    if line[-1] == 'c':
        list_1 = [line.strip().split()]
The above is the general idea but I am expert at making dead code! How does one render it undead?

startswith can take a tuple of strings to match, so you can do this:
[line.strip().split() for line in z if not line.startswith(('a', 'b'))]
This will work even if a and b are words or sentences not just characters.
If there can be cases where lines don't start with a or b but also don't end with c you can extend the list comprehension to this:
[
    # rstrip() first: each line read from the file still ends with '\n'
    line.strip().split()
    for line in z
    if line.rstrip().endswith('c') and not line.startswith(('a', 'b'))
]
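The same filter written as a plain function may be easier to extend. This is only a sketch: parse_data_lines, its parameter names, and the sample text are hypothetical. It strips each line first so the trailing newline does not interfere with the endswith check.

```python
import io

def parse_data_lines(lines, skip_prefixes=("a", "b"), required_suffix="c"):
    """Split the lines that pass both checks into lists of tokens."""
    result = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(skip_prefixes):
            continue  # blank line or header line
        if stripped.endswith(required_suffix):
            result.append(stripped.split())
    return result

# Hypothetical file contents for illustration.
sample = io.StringIO("a header\nb another header\n1 2 c\n3 4 c\n5 6 d\n")
parsed = parse_data_lines(sample)
print(parsed)  # [['1', '2', 'c'], ['3', '4', 'c']]
```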

One very general approach is to "filter" the file by removing some lines:
import itertools
zlist = [l.strip().split() for l in itertools.ifilter(lambda line: line[0] not in 'ab', z)]
You can use itertools.ifilter any time you want to "selectively filter" an iterable, getting another iterable which only contains those items which satisfy some predicate, which is why I say this approach is very general. (itertools.ifilter exists only in Python 2; in Python 3 the built-in filter is already lazy and does the same job.) itertools has a lot of great, fast tools for dealing with iterables in a myriad of ways, and is well worth studying.
A similar but syntactically simpler approach, which suffices in your case (and which therefore I would recommend due to the virtue of simplicity), is to do the "filtering" with an if clause in the listcomp:
zlist = [l.strip().split() for l in z if l[0] not in 'ab']

You can add if conditions to list comprehensions.
z_list = [i.strip().split() for i in z if i.rstrip().endswith('c')]
or
z_list = [i.strip().split() for i in z if (i[0] != 'a' and i[0] != 'b')]

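One way to do it is to replace 'pass' with 'continue'. This will continue to the next line in the file without doing anything. You will also need to append each parsed line to list_1 (created as an empty list before the loop), stripping the newline before testing the last character:
if line.rstrip().endswith('c'):
    list_1.append(line.strip().split())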

f = open("file")
for line in f:
    li = line.strip()
    # skip blank lines, then apply both conditions
    if li and li[0] not in ["a", "b"] and li[-1] == "c":
        print(line.rstrip())
f.close()

For those interested in the solution.
And also, another question!
Example file format:
c this is a comment
p m 1468 1 267
260 32 0
8 1 0
Code:
fname = raw_input('Please enter the name of file: ')
z = open(fname, 'r')
required_list = [line.strip().split() for line in z if not line.startswith(('c', 'p'))]
print required_list
Output:
[['260', '32', '0'], ['8', '1', '0']]
Any suggestions on how to convert the strings in the lists to integers and perform arithmetic operations?
Pseudocode to illustrate:
#for the second item in each sublist
#if sum is > than first number in second line of file
#pass
#else
#abort/raise error
Cheers folks for your suggestions so far,
Seafoid.
@Nadia, my day seems a little more worthwhile now! I spent hours (days even) trying to crack this solo! Thanks!
