Python len() confusion - python

I'm not sure what I'm doing wrong here; I've spent all day googling and reading Python books.
I have the following function:
def extract(inNo, inputFile2, outputFile):
    ifile = open(inputFile2, 'r')
    ofile = open(outputFile, 'w')
    lines = ifile.readlines()
    for line in lines:
        print(str(len(line)))
        if str(len(line)) == str(inNo):
            ofile.write(line)
I'm trying to understand len(), I seem to get odd results when using it.
My input file is the following:
1
22
333
4444
55555
666666
7777777
88888888
Now if I use '7' as the inNo variable, the output (i.e., print) I get is:
2
3
4
5
6
7
8
8
and the output file becomes:
666666
From checking in python.exe, I'm sure the length count starts from 1, i.e.:
len('123')
would give a result of
3
Is my understanding of len() wrong, or am I coding it wrong?
Essentially, this function takes an input file, an output file and a character length as arguments. These come from a different function, which calls this one with the arguments.
This function reads lines from the input file.
For every line, it gets the character length and compares it to the input number.
If they are equal, it should write that line to the output file. I assume that, as I have no "else:", it should carry on with the iteration.
The print() function allows me to see exactly what it has calculated as the length.
When I try to add '- 1' or '+ 1' to the len(), i.e. (len(line) + 1), even more odd things start to happen.

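len() also counts the newline character \n at the end of each line, which is why you're getting one more than expected for every line except the last one (which has no trailing newline). Strip it first, e.g. with line.rstrip('\n'), before comparing lengths.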
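To make the effect of the trailing newline concrete, here is a minimal sketch. The helper name line_lengths and the sample list are made up for illustration; the list mimics what readlines() returns for a file whose last line has no newline.

```python
def line_lengths(lines):
    """Return (raw_length, length_without_newline) for each line."""
    return [(len(line), len(line.rstrip("\n"))) for line in lines]

# Lines as readlines() would return them; the last one has no trailing "\n".
sample = ["1\n", "22\n", "333"]
lengths = line_lengths(sample)
print(lengths)
```

In the extract() function from the question, comparing len(line.rstrip("\n")) with int(inNo) should then select the line with seven visible characters.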

Related

python file output bugs

So I have to take the numbers from a certain file
containing:
1 5
2 300
3 3
9 155
7 73
7 0
Multiply them and add them to a new file
I used the script under here but for some reason, it now gives a syntax error.
f=open('multiply.txt')
f2=open('resulted.txt','w')
while True:
    line=f.readline()
    if len(line)==0:
        break
    line=line.strip()
    result=line.split(" ")
    multiply=int(result[0])*int(result[1])
    multiply=str(multiply)
    answer=print(result[0],"*",result[1],"=",multiply)
    f2.write(str(multiply))
f.close()
f2.close()
I found out that f2.write(multiply) works,
but I get all the answers as one string (5600913955110).
How do I get a properly formatted text file with the right calculation?
Update:
f=open('multiply.txt')
f2=open('result.txt','w')
while True:
    line=f.readline()
    if len(line)==0:
        break
    line=line.strip()
    result=line.split(" ")
    multiply=int(result[0])*int(result[1])
    multiply=str(multiply)
    answer=print(result[0],"*",result[1],"=",multiply)
    answer=str(answer)
    f2.write(str(answer))
    f2.write(str(multiply))
f.close()
f2.close()
output:
None5None600None9None1395None511None0
At the end of the code you have this line:
f2.write(str(answer)
Notice that there is only one ) at the end, but the line has two (.
Try this:
f2.write(str(answer))
Also, the title of the post sounds like it's provoking an opinion response. Try to change it so that it doesn't mention your friend but the problem at hand.
Most programming languages have escape sequences, which allow you to do many things. In your case you need to add the escape sequence
"\n"
This will add a new line after each thing you write to the file,
like this:
answer=str(result[0])+"*"+str(result[1])+"="+str(multiply)
print(answer)
f2.write(str(answer)+"\n")
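The "None" in the updated output comes from print(), which always returns None, so answer=print(...) stores None rather than the text. The sketch below builds the text first and adds "\n" to each entry. The function name format_products is hypothetical, and the input is given as a list of lines rather than read from multiply.txt.

```python
def format_products(lines):
    """Build one 'a*b=product' output line per 'a b' input line."""
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = int(parts[0]), int(parts[1])
        out.append(f"{a}*{b}={a * b}\n")
    return out

formatted = format_products(["1 5\n", "2 300\n", "3 3\n"])
print("".join(formatted), end="")
```

The resulting list can be written in one call with f2.writelines(formatted), assuming f2 is the output file opened for writing as in the question.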

Editing a column by a multiplier, and then replacing the column with the multiplier and saving the file as a new .txt file -python

So I have a file that looks like this
mass (GeV) spectrum (1-100 GeV)
10 0.06751019803888393
20 0.11048827045815585
30 0.1399367785958526
40 0.1628781532692572
I want to multiply the spectrum by half or any percentage, then create a new file with the same data, but the spectrum is replaced with the new spectrum multiplied by the multiplier
DM_file=input("Name of DM.in file: ") #name of file is DMmumu.in
print(DM_file)
n=float(input('Enter the percentage of annihilation: '))
N=n*100
pct=(1-n)
counter = 0
with open (DM_file,'r+') as f:
    with open ('test.txt','w') as output:
        lines=f.readlines()
        print(type(lines))
        Spectrumnew=[]
        Spectrum=[]
        for i in range(8,58):
            single_line=lines[i].split("\t")
            old_number = single_line[1]
            new_number = float(single_line[1])*pct
            Spectrumnew.append(new_number)
            Spectrum.append(old_number)
            f.replace(Spectrum,Spectrumnew)
            output.write(str(new_number))
The problem I'm having is f.replace(Spectrum,Spectrumnew) is not working, and if I were to comment it out, a new file is created called test.txt with just Spectrumnew nothing else. What is wrong with f.replace, am I using the wrong string method?
replace is a function that works on strings. f is not a string. (For that matter, neither is Spectrum or Spectrumnew.)
You need to construct the line you want in the output file as a string and then write it out. You already have string output working. To construct the output line, you can just concatenate the first number from the input, a tab character, and the product of the second number and the multiplier. You can convert a number to a string with the str() function and you can concatenate strings with the + operator.
There are several more specific answers on this site already that may be helpful, such as replacing text in a file with Python.
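The approach described above (build each output line as a string, then write it) could look roughly like the following sketch. The function name scale_spectrum is hypothetical, and it assumes every data line holds two whitespace-separated fields; lines that do not parse as numbers, such as the header, are passed through unchanged.

```python
def scale_spectrum(lines, multiplier):
    """Return new lines with the second column multiplied by `multiplier`."""
    out = []
    for line in lines:
        fields = line.split()
        try:
            mass, value = fields[0], float(fields[1])
        except (IndexError, ValueError):
            out.append(line)  # header or blank line: keep as-is
            continue
        # first number, a tab, then the scaled spectrum value
        out.append(mass + "\t" + str(value * multiplier) + "\n")
    return out

scaled = scale_spectrum(
    ["mass (GeV) spectrum (1-100 GeV)\n", "10 0.5\n", "20 0.25\n"], 0.5
)
print("".join(scaled), end="")
```

With the files from the question, the result could be written with output.writelines(scale_spectrum(f.readlines(), pct)).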

Interpreting a string received from a socket

I am trying to interpret a string that I have received from a socket. The first set of data is seen below:
2 -> 1
1 -> 2
2 -> 0
0 -> 2
0 -> 2
1 -> 2
2 -> 0
I am using the following code to get the numerical values:
for i in range(0,len(data)-1):
    if data[i] == "-":
        n1 = data[i-2]
        n2 = data[i+3]
        moves.append([int(n1),int(n2)])
But when a number greater than 9 appears in the data, the program only takes the second digit of that number (eg. with 10 the program would get 0). How would I get both of the digits from the code while maintaining the ability to get single digit numbers?
Well, you just grab one character on each side.
For the second value you can take a slice instead (assuming data holds a single line): data[i+3:]
For the first one: data[:i-1]
Use the split() function
numlist = data[i].split('->')
moves.append([int(numlist[0]),int(numlist[1])])
I assume each line is available as a (byte) string in a variable named line. If it's a whole bunch of lines, then you can split it into individual lines with
lines = data.splitlines()
and work on each line inside a for statement:
for line in lines:
    pass  # do something with the line
If you are confident the lines will always be correctly formatted, the easiest way to get the values you want is the string split method. The full code, starting from the data, would then read like this:
lines = data.splitlines()
for line in lines:
    first, _, second = line.split()
    moves.append([int(first), int(second)])
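As a self-contained sketch of the split-based approach, the function below handles numbers of any length. The name parse_moves is made up for illustration, and it assumes data is already a str (bytes received from a socket would first need something like data.decode()).

```python
def parse_moves(data):
    """Parse lines of the form 'A -> B' into [A, B] integer pairs."""
    moves = []
    for line in data.splitlines():
        parts = line.split("->")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # int() ignores the surrounding spaces left by the split
        moves.append([int(parts[0]), int(parts[1])])
    return moves

moves = parse_moves("2 -> 1\n10 -> 2\n0 -> 12\n")
print(moves)
```

Because the split happens on the arrow rather than at fixed character offsets, multi-digit values such as 10 and 12 are read whole.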

Python: split line by comma, then by space

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
    X = [str(elem) for elem in line.split(" , ")]
    num = [Fraction(elem) for elem in X[0].split()]
    den = [Fraction(elem) for elem in X[1].split()]
but all I get is a list index out of range error: den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.
The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.
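One way to handle both cases (a line without a comma and a blank line) is sketched below. The function name parse_line and the choice to return None for unusable lines are assumptions made for illustration, not part of the original code.

```python
from fractions import Fraction

def parse_line(line):
    """Split 'a b , c d' into two lists of Fractions, or None if malformed."""
    parts = line.split(",")
    if len(parts) != 2:
        return None  # blank line or no comma: nothing to parse
    # split() with no argument also discards the trailing newline
    num = [Fraction(tok) for tok in parts[0].split()]
    den = [Fraction(tok) for tok in parts[1].split()]
    return num, den

parsed = parse_line("1/2 17/12 , 1 0 1 1\n")
print(parsed)
```

Inside the stdin loop, a None result can simply be skipped with continue.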
With valid input, your code actually works.
You probably get an invalid line, with too much space or even an empty line or so. So first thing inside the loop, print line. Then you know what's going on, you can see right above the error message what the problematic line was.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt
How about this one:
pairs = [line.split(",") for line in stdin]
num = [[Fraction(x) for x in elem[0].split()] for elem in pairs if len(elem) == 2]
den = [[Fraction(x) for x in elem[1].split()] for elem in pairs if len(elem) == 2]

Python: Read large file in chunks

Hey there, I have a rather large file that I want to process using Python and I'm kind of stuck as to how to do it.
The format of my file is like this:
0 xxx xxxx xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1 xxx xxxx xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
So I basically want to read in the chunk up from 0-1, do my processing on it, then move on to the chunk between 1 and 2.
So far I've tried using a regex to match the number and then keep iterating, but I'm sure there has to be a better way of going about this. Any suggestion/info would be greatly appreciated.
If each chunk is on a single line, that is, there are no line breaks between "1." and "2.", then you can iterate over the lines of the file like this:
for line in open("myfile.txt"):
    pass  # do stuff with line
The line is discarded and overwritten at each iteration, so you can handle large files with ease. If a chunk spans several lines:
import re
parsed_line = ""
for line in open("myfile.txt"):
    if re.match(r"\d+ ", line):  # assumed pattern for the start of a new chunk
        # parsed_line holds the previous complete chunk here; process it before overwriting
        parsed_line = line
    else:
        parsed_line += line
and the rest of your code.
Why don't you just read the file char by char using file.read(1)?
Then, you could - in each iteration - check whether you arrived at the char 1. Then you have to make sure that storing the string is fast.
If the "N " can only start a line, then why not use the "simple" solution? (It sounds like this is already being done; I am trying to reinforce/support it ;-))
That is, just read a line at a time and build up the data representing the current N object. After, say, N=0 and N=1 are loaded, process them together, then move on to the next pair (N=2, N=3). The only thing that is even remotely tricky is making sure not to throw out a read line. (The line read that determined the end condition, e.g. "N ", also contains the data for the next N.)
Unless seeking is required (or IO caching is disabled, or there is an absurd amount of data per item), there is really no reason not to use readline AFAIK.
Happy coding.
Here is some off-the-cuff code, which may still contain errors. In any case, it shows the general idea using a minimized side-effect approach.
import re
# given an input and previous item data, return either
# [item_number, data, next_overflow] if another item is read
# or None if there are no more items
def read_item(inp, overflow):
    data = overflow or ""
    # this can be replaced with any method to "read the header"
    # the regex is just "the easiest". the contract is just:
    # given "N ....", return N. given anything else, return None
    def get_num(d):
        m = re.match(r"(\d+) ", d)
        return int(m.group(1)) if m else None
    for line in inp:
        if data and get_num(line) is not None:
            # already in an item (have data); current line "overflows".
            # item number is still at start of current data
            return [get_num(data), data, line]
        # not in item, or new item not found yet
        data += line
    # at end of input, with data. only returns above
    # if a "new" item was encountered; this covers case of
    # no more items (or no items at all)
    if data:
        return [get_num(data), data, None]
    else:
        return None
And usage might be akin to the following, where f represents an open file:
# check for error conditions (e.g. None returned)
# note feed-through of "overflow"
num1, data1, overflow = read_item(f, None)
num2, data2, overflow = read_item(f, overflow)
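The same line-by-line idea can also be written as a generator, which avoids passing the overflow line around by hand. This is only a sketch: the names HEADER and iter_items are hypothetical, and it assumes that an item starts on any line beginning with digits followed by a space.

```python
import re

HEADER = re.compile(r"(\d+) ")  # assumed item header: "N " at line start

def iter_items(lines):
    """Yield (item_number, text) for each item in an iterable of lines."""
    number, buf = None, []
    for line in lines:
        m = HEADER.match(line)
        if m:
            if buf:
                # a new header closes the item collected so far
                yield number, "".join(buf)
                buf = []
            number = int(m.group(1))
        buf.append(line)
    if buf:
        yield number, "".join(buf)  # last item, closed by end of input

items = list(iter_items(["0 aaa\n", "bbb\n", "1 ccc\n"]))
print(items)
```

Since an open file is itself an iterable of lines, it can be passed directly as iter_items(f), and only one item is held in memory at a time.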
If the format is fixed, why not just read 3 lines at a time with readline()?
If the file is small, you could read the whole file in and split() on number digits (might want to use strip() to get rid of whitespace and newlines), then fold over the list to process each string in the list. You'll probably have to check that the resultant string you are processing on is not initially empty in case two digits were next to each other.
If the file's content can be loaded in memory, and that's what you answered, then the following code (needs to have filename defined) may be a solution.
import re
regx = re.compile(r'^((\d+).*?)(?=^\d|\Z)', re.DOTALL|re.MULTILINE)
with open(filename) as f:
    text = f.read()
def treat(inp, regx=regx):
    m1 = regx.search(inp)
    numb, chunk = m1.group(2, 1)
    li = [chunk]
    for mat in regx.finditer(inp, m1.end()):
        n, ch = mat.group(2, 1)
        if int(n) == int(numb) + 1:
            # the next consecutive number starts a new chunk
            yield ''.join(li)
            numb = n
            li = []
        li.append(ch)
        chunk = ch
    yield ''.join(li)
for y in treat(text):
    print(repr(y))
This code, run on a file containing :
1 mountain
orange 2
apple
produce
2 gas
solemn
enlightment
protectorate
3 grimace
song
4 snow
wheat
51 guludururu
kelemekinonoto
52asabi dabada
5 yellow
6 pink
music
air
7 guitar
blank 8
8 Canada
9 Rimini
produces:
'1 mountain\norange 2\napple\nproduce\n'
'2 gas\nsolemn\nenlightment\nprotectorate\n'
'3 grimace\nsong\n'
'4 snow\nwheat\n51 guludururu\nkelemekinonoto\n52asabi dabada\n'
'5 yellow\n'
'6 pink \nmusic\nair\n'
'7 guitar\nblank 8\n'
'8 Canada\n'
'9 Rimini'
