Python: split line by comma, then by space

Python: split line by comma, then by space - python

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
X = [str(elem) for elem in line.split(" , ")]
num = [Fraction(elem) for elem in X[0].split()]
den = [Fraction(elem) for elem in X[1].split()]
but all I get is a list index out of range error: den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.

The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.

With valid input, your code actually works.
You probably get an invalid line, with too much space or even an empty line or so. So first thing inside the loop, print line. Then you know what's going on, you can see right above the error message what the problematic line was.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt

How about this one:
pairs = [line.split(",") for line in stdin]
num = [fraction(elem[0]) for elem in pairs if len(elem) == 2]
den = [fraction(elem[1]) for elem in pairs if len(elem) == 2]

Related

Removing \n from readlines with rstrip("\n")

im reading 2 files per lines that contains strings
and grouping per other one here in my code im grooping 1 line from proxyfile with 3 lines from websitefiles
major problem when i try to print the lines from 2nd file i cant remove the \n
i have tried to use rstrip("\n") but i got this exception
AttributeError: 'tuple' object has no attribute 'rstrip'
this is my code :
proxy = open("proxy.txt",'r')
readproxy = proxy.readlines()
sites = open("websites.txt",'r')
readsites = sites.readlines()
for i,j in zip(readproxy, zip(*(iter(readsites),) * 3)):
try:
i = i.rstrip("\n")
#j = j.rstrip("\n")
print(i,j)
except ValueError:
print 'whatever'
i removed the \n from "i" successfully , but coudent remove it from "j"
this screenshot can explain all i think

A tuple doesn't have a method that magically propagates to all string elements.
To do that you have to apply rstrip for all items in a list comprehension to create a list.
j = [x.rstrip("\n") for x in j]
that rebuilds a list with items stripped. Convert to tuple if really needed:
j = tuple(x.rstrip("\n") for x in j)
note that you could avoid this post processing by doing:
readsites = sites.read().splitlines()
instead of sites.readlines() since splitlines splits into lines while removing the termination character if no argument is specified. The only drawback is in the case when the file is huge. In that case, you could encounter a memory issue (memory needed by read + split is doubled). In which case, you could do:
with open("websites.txt",'r') as sites:
readsites = [line.rstrip("\n") for line in sites]

Interpreting a string received from a socket

I am trying to interpret a string that I have received from a socket. The first set of data is seen below:
2 -> 1
1 -> 2
2 -> 0
0 -> 2
0 -> 2
1 -> 2
2 -> 0
I am using the following code to get the numerical values:
for i in range(0,len(data)-1):
if data[i] == "-":
n1 = data[i-2]
n2 = data[i+3]
moves.append([int(n1),int(n2)])
But when a number greater than 9 appears in the data, the program only takes the second digit of that number (eg. with 10 the program would get 0). How would I get both of the digits from the code while maintaining the ability to get single digit numbers?

Well you just grab one character on each side ..
for the second value you can make it like this: data[i+3,len(data)-1]
for the first one: : data[0,i-2]

Use the split() function
numlist = data[i].split('->')
moves.append([int(numlist[0]),int(numlist[1])])

I assume each line is available as a (byte) string in a variable named line. If it's a whole bunch of lines then you can split it into individual lines with
lines = data.splitlines()
and work on each line inside a for statement:
for line in lines:
# do something with the line
If you are confident the lines will always be correctly formatted the easiest way to get the values you want uses the string split method. A full code starting from the data would then read like this.
lines = data.splitlines()
for line in lines:
first, _, second = line.split()
moves.append([int(first), int(second)])

Returning every instance of whatever's between two strings in a file [Python 3]

What I'm trying to do is open a file, then find every instance of '[\x06I"' and '\x06;', then return whatever is between the two.
Since this is not a standard text file (it's map data from RPG maker) readline() will not work for my purposes, as the file is not at all formatted in such a way that the data I want is always neatly within one line by itself.
What I'm doing right now is loading the file into a list with read(), then simply deleting characters from the very beginning until I hit the string '[\x06I'. Then I scan ahead to find '\x06;', store what's between them as a string, append said string to a list, then resume at the character after the semicolon I found.
It works, and I ended up with pretty much exactly what I wanted, but I feel like that's the worst possible way to go about it. Is there a more efficient way?
My relevant code:
while eofget == 0:
savor = 0
while savor == 0 or eofget == 0:
if line[0:4] == '[\x06I"':
x = 4
spork = 0
while spork == 0:
x += 1
if line[x] == '\x06':
if line[x+1] == ';':
spork = x
savor = line[5:spork] + "\n"
line = line[x+1:]
linefinal[lineinc] = savor
lineinc += 1
elif line[x:x+7] == '#widthi':
print("eof reached")
spork = 1
eofget = 1
savor = 0
elif line[x:x+7] == '#widthi':
print("finished map " + mapname)
eofget = 1
savor = 0
break
else:
line = line[1:]
You can just ignore the variable names. I just name things the first thing that comes to mind when I'm doing one-offs like this. And yes, I am aware a few things in there don't make any sense, but I'm saving cleanup for when I finalize the code.
When eofget gets flipped on this subroutine terminates and the next map is loaded. Then it repeats. The '#widthi' check is basically there to save time, since it's present in every map and indicates the beginning of the map data, AKA data I don't care about.

I feel this is a natural case to use regular expressions. Using the findall method:
>>> s = 'testing[\x06I"text in between 1\x06;filler text[\x06I"text in between 2\x06;more filler[\x06I"text in between \n with some line breaks \n included in the text\x06;ending'
>>> import re
>>> p = re.compile('\[\x06I"(.+?)\x06;', re.DOTALL)
>>> print(p.findall(s))
['text in between 1', 'text in between 2', 'text in between \n with some line breaks \n included in the text']
The regex string '\[\x06I"(.+?)\x06;'can be interpreted as follows:
Match as little as possible (denoted by ?) of an undetermined number of unspecified characters (denoted by .+) surrounded by '[\x06I"' and '\x06;', and only return the enclosed text (denoted by the parentheses around .+?)
Adding re.DOTALL in the compile makes the .? match line breaks as well, allowing multi-line text to be captured.

I would use split():
fulltext = 'adsfasgaseg[\x06I"thisiswhatyouneed\x06;sdfaesgaegegaadsf[\x06I"this is the second what you need \x06;asdfeagaeef'
parts = fulltext.split('[\x06I"') # split by first label
results = []
for part in parts:
if '\x06;' in part: # if second label exists in part
results.append(part.split('\x06;')[0]) # get the part until the second label
print results

finding and replacing a string within a line with an if statement

I am trying to parse a particular text file. I am trying to open the text file and line by line ask if a particular string is there (In the following example case its the presence of the number 01 in the curly brackets), then manipulate a particular string either forwards backwards, or keep it the same. Here's that example, with one line named arbitrarily "go"... (other lines in the full file have similar format but have {01}, {00} etc...
go = 'USC_45774-1111-0 <hkxhk> {10} ; 78'
go = go.replace(go[22:24],go[23:21:-1])
>>> go
'USC_45774-1111-0 <khxkh> {10} ; 78'
I am trying to manipulate the first "hk" (go[22:24]) by replacing it with the same letters but backwards (go[23:21:-1).What I want is to see khxhk but as you can see, the result I am getting is that both are turned backwards to khxkh.
I am also having a problem of executing the specific if statement for each line. Many lines that dont have {01} are being manipulated as if they were....
with open('c:/LG 1A.txt', 'r') as rfp:
with open('C:/output5.txt', 'w') as wfp:
for line in rfp.readlines():
if "{01}" or "{-1}" in line:
line = line.replace(line[25:27],line[26:24:-1])
line = line.replace("<"," ")
line = line.replace(">"," ")
line = line.replace("x"," ")
wfp.write(line)
elif "{10}" or "{1-}" in line:
line = line.replace(line[22:24],line[23:21:-1])
line = line.replace("<"," ")
line = line.replace(">"," ")
line = line.replace("x"," ")
wfp.write(line)
elif "{11}" in line:
line = line.replace(line[22:27],line[26:21:-1])
line = line.replace("<"," ")
line = line.replace(">"," ")
line = line.replace("x"," ")
wfp.write(line)
wfp.close()
Am I missing something simple?

The string replace method does not replace characters by position, it replaces them by what characters they are.
>>> 'apple aardvark'.replace('a', '!')
'!pple !!rdv!rk'
So in your first case, you are telling to replace "hk" with "kh". It doesn't "know" that you want to only replace one of the occurrences; it just knows you want to replace "hk" with "kh", so it replaces all occurrences.
You can use the count argument to replace to specify that you only want to replace the first occurrence:
>>> go = 'USC_45774-1111-0 <hkxhk> {10} ; 78'
... go.replace(go[22:24],go[23:21:-1],1)
'USC_45774-1111-0 <khxhk> {10} ; 78'
Note, though, that this will always replace the first occurrence, not necessarily the occurrence at the position in the string you specified. In this case I guess that's what you want, but it may not work directly for other similar tasks. (That is, there is no way to use this method as-is to replace the second occurrence or the third occurrence; you can only replace the first, or the first two, or the first three, etc. To replace the second or third occurrence you'd need to do a bit more.)
As for the second part of your question, you are misunderstanding what if "{01}" or "{-1}" in line means. It means, in layman's terms, if "{01}" or if "{-1}" in line. Since if "{01}" is always true (i.e., the string "{01}" is not a false value), the whole condition is always true. What you want is if "{01}" in line or "{-1}" in line".

I don't know what it is about Python, but your problem is one that gets posted here at least a couple times every day.
if "{01}" or "{-1}" in line:
This doesn't do what you think it does. It asks, "is "{01}" true"? Because it's a non-zero-length string, it is. Because or short-circuits, the rest of the condition is not tested because the first argument is true. Therefore the body of your if statement is always executed.
In other words, Python evaluates as if you'd written this:
if ("{01}") or ("{-1}" in line):
You want something like:
if "{01}" in line or "{-1}" in line:
Or if you have a lot of similar conditions:
if any(x in line for x in ("{01}", "{-1}")):

you can use count argument of replace():
'USC_45774-1111-0 <hkxhk> {10} ; 78'.replace("hk","kh",1)
For your second question, you need change the condition to:
if "{01}" in line or "{-1}" in line:
...

python: find and replace numbers < 1 in text file

I'm pretty new to Python programming and would appreciate some help to a problem I have...
Basically I have multiple text files which contain velocity values as such:
0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00
etc for many lines...
What I need to do is convert all the values in the text file that are less than 1 (e.g. 0.137865E+00 above) to an arbitrary value of 0.100000E+01. While it seems pretty simple to replace specific values with the 'replace()' method and a while loop, how do you do this if you want to replace a range?
thanks

I think when you are beginning programming, it's useful to see some examples; and I assume you've tried this problem on your own first!
Here is a break-down of how you could approach this:
contents='0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00'
The split method works on strings. It returns a list of strings. By default, it splits on whitespace:
string_numbers=contents.split()
print(string_numbers)
# ['0.259515E+03', '0.235095E+03', '0.208262E+03', '0.230223E+03', '0.267333E+03', '0.217889E+03', '0.156233E+03', '0.144876E+03', '0.136187E+03', '0.137865E+00']
The map command applies its first argument (the function float) to each of the elements of its second argument (the list string_numbers). The float function converts each string into a floating-point object.
float_numbers=map(float,string_numbers)
print(float_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 0.13786499999999999]
You can use a list comprehension to process the list, converting numbers less than 1 into the number 1. The conditional expression (1 if num<1 else num) equals 1 when num is less than 1, otherwise, it equals num.
processed_numbers=[(1 if num<1 else num) for num in float_numbers]
print(processed_numbers)
# [259.51499999999999, 235.095, 208.262, 230.22300000000001, 267.33300000000003, 217.88900000000001, 156.233, 144.876, 136.18700000000001, 1]
This is the same thing, all in one line:
processed_numbers=[(1 if num<1 else num) for num in map(float,contents.split())]
To generate a string out of the elements of processed_numbers, you could use the str.join method:
comma_separated_string=', '.join(map(str,processed_numbers))
# '259.515, 235.095, 208.262, 230.223, 267.333, 217.889, 156.233, 144.876, 136.187, 1'

typical technique would be:
read file line by line
split each line into a list of strings
convert each string to the float
compare converted value with 1
replace when needed
write back to the new file
As I don't see you having any code yet, I hope that this would be a good start

def float_filter(input):
for number in input.split():
if float(number) < 1.0:
yield "0.100000E+01"
else:
yield number
input = "0.259515E+03 0.235095E+03 0.208262E+03 0.230223E+03 0.267333E+03 0.217889E+03 0.156233E+03 0.144876E+03 0.136187E+03 0.137865E+00"
print " ".join(float_filter(input))

import numpy as np
a = np.genfromtxt('file.txt') # read file
a[a<1] = 0.1 # replace
np.savetxt('converted.txt', a) # save to file

You could use regular expressions for parsing the string. I'm assuming here that the mantissa is never larger than 1 (ie, begins with 0). This means that for the number to be less than 1, the exponent must be either 0 or negative. The following regular expression matches '0', '.', unlimited number of decimal digits (at least 1), 'E' and either '+00' or '-' and two decimal digits.
0\.\d+E(-\d\d|\+00)
Assuming that you have the file read into variable 'text', you can use the regexp with the following python code:
result = re.sub(r"0\.\d*E(-\d\d|\+00)", "0.100000E+01", text)
Edit: Just realized that the description doesn't limit the valid range of input numbers to positive numbers. Negative numbers can be matched with the following regexp:
-0\.\d+E[-+]\d\d
This can be alternated with the first one using the (pattern1|pattern2) syntax which results in the following Python code:
result = re.sub(r"(0\.\d+E(-\d\d|\+00)|-0\.\d+E[-+]\d\d)", "0.100000E+00", subject)
Also if there's a chance that the exponent goes past 99, the regexp can be further modified by adding a '+' sign after the '\d\d' patterns. This allows matching digits ending in two OR MORE digits.

I've got the script working as I want now...thanks people.
When writing the list to a new file I used the replace method to get rid of the brackets and commas - is there a simpler way?
ftext = open("C:\\Users\\hhp06\\Desktop\\out.grd", "r")
otext = open("C:\\Users\\hhp06\\Desktop\\out2.grd", "w+")
for line in ftext:
stringnum = line.split()
floatnum = map(float, stringnum)
procnum = [(1.0 if num<1 else num) for num in floatnum]
stringproc = str(procnum)
s = (stringproc).replace(",", " ").replace("[", " ").replace("]", "")
otext.writelines(s + "\n")
otext.close()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: split line by comma, then by space - python

How about this one: pairs = [line.split(",") for line in stdin] num = [fraction(elem[0]) for elem in pairs if len(elem) == 2] den = [fraction(elem[1]) for elem in pairs if len(elem) == 2]

Related

Removing \n from readlines with rstrip("\n")

Interpreting a string received from a socket

Returning every instance of whatever's between two strings in a file [Python 3]

finding and replacing a string within a line with an if statement

python: find and replace numbers < 1 in text file

Categories

Resources