Python: File Writing Adding Unintentional Newlines on Linux Only

I am using Python 2.7.9. I'm working on a program that is supposed to produce the following output in a .csv file per loop:
URL,number
Here's the main loop of the code I'm using:
csvlist = open(listfile,'w')
f = open(list, "r")
def hasQuality(item):
    for quality in qualities:
        if quality in item:
            return True
    return False
for line in f:
    line = line.split('\n')
    line = line[0]
    # print line
    itemname = urllib.unquote(line).decode('utf8')
    # print itemhash
    if hasQuality(itemname):
        try:
            looptime = time.time()
            url = baseUrl + line
            results = json.loads(urlopen(url).read())
            # status = results.status_code
            content = results
            if 'median_price' in content:
                medianstr = str(content['median_price']).replace('$','')
                medianstr = medianstr.replace('.','')
                median = float(medianstr)
                volume = content['volume']
                print url+'\n'+itemname
                print 'Median: $'+medianstr
                print 'Volume: '+str(volume)
                if (median > minprice) and (volume > minvol):
                    csvlist.write(line + ',' + medianstr + '\n')
                    print '+ADDED TO LIST'
            else:
                print 'No median price given for '+itemname+'.\nGiving up on item.'
            print "Finished loop in " + str(round(time.time() - looptime,3)) + " seconds."
        except ValueError:
            print "we blacklisted fool?? cause we skippin beats"
    else:
        print itemname+'is a commodity.\nGiving up on item.'
csvlist.close()
f.close()
print "Finished script in " + str(round(time.time() - runtime, 3)) + " seconds."
It should be generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29,4202
But it's actually generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29
,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29
,4202
Whenever it is run on a Windows machine, I have no issue. Whenever I run it on my EC2 instance, however, it adds that extra newline. Any ideas why? Running commands on the file like
awk 'NR%2{printf $0" ";next;}1' output.csv
do not do anything. I have transferred it to my Windows machine and it still reads the same. However, when I paste the output into Steam's chat client it concatenates it in the way that I want.
Thanks in advance!

The problem occurs on this line:
csvlist.write(line + ',' + medianstr + '\n')
It goes away if you strip the trailing whitespace from line first:
csvlist.write(line.strip() + ',' + medianstr + '\n')
Explanation:
You are reading raw lines from the input file, and every raw line except possibly the last still carries its line terminator.
Splitting on '\n' removes only the '\n'. If the input file was created on Windows, each line most likely ends in '\r\n', so on Linux a stray '\r' stays attached to line and is written in front of the comma. On Windows, text mode translates '\r\n' to '\n' on read, which would explain why the problem does not show up there.
To check:
Add print(repr(line)) before the write and look at the output.
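A minimal sketch of the difference between splitting on '\n' and calling strip(), assuming the input file has Windows (CRLF) line endings. The sample line and the price value are hypothetical stand-ins for what the script would read and write.

```python
# Hypothetical raw line, as it would be read on Linux from a file
# that was saved with Windows (CRLF) line endings.
raw_line = "AWP%20%7C%20Asiimov%20%28Field-Tested%29\r\n"

# What the question's code does: only the '\n' is removed, the '\r' stays.
split_only = raw_line.split('\n')[0]

# What the answer suggests: strip() removes all leading and trailing
# whitespace, including the '\r'.
stripped = raw_line.strip()

# repr() makes the otherwise invisible '\r' visible.
print(repr(split_only + ',' + '3911'))
print(repr(stripped + ',' + '3911'))
```

If the stray character really is a '\r', rstrip('\r\n') would be a narrower alternative that leaves other whitespace untouched.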

Related

How to write an integer variable to log file with f.write?

This script is a simple while True loop that checks a voltage value and writes a log file. However, I am not able to get the voltage value written in the log.
Simple log files with a text string and date / timestamp work fine but the write fails when I try to use the variable name.
ina3221 = SDL_Pi_INA3221.SDL_Pi_INA3221(addr=0x40)
LIPO_BATTERY_CHANNEL = 1
busvoltage1 = ina3221.getBusVoltage_V(LIPO_BATTERY_CHANNEL)
while True:
    if busvoltage1 <= 3.70:
        with open("/<path>/voltagecheck.log", "a") as f:
            f.write("battery voltage below threshold: " + busvoltage1 + "\n")
            f.write("timestamp: " + time.asctime() + "\n")
            f.write("-------------------------------------------" + "\n")
            f.close()
    else:
        time.sleep(3600)
I've also tried:
with open("/<path>/voltagecheck.log", "a") as f:
    f.write("battery voltage below threshold: " + str(busvoltage1) + "\n")
    f.write("timestamp: " + time.asctime() + "\n")
    f.write("-------------------------------------------" + "\n")
    f.close()
Without trying to add the busvoltage1 value to the log, the log is created and the timestamp line works fine.
With the busvoltage1 value, the log is created but nothing is written.
When running this in the terminal, the errors for "str(busvoltage1)" and just the plain "busvoltage1" are:
ValueError: I/O operation on closed file
and
TypeError: cannot concatenate 'str' and 'float' objects
The value needs to be converted to a string before it is passed to f.write.
I added v = str(busvoltage1) and then referenced v in f.write: f.write("battery voltage below threshold: " + v + "\n")
This is covered in the fifth example of section 7.2.1 here: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files. Thanks @PatrickArtner
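The fix above can be sketched as a small helper. This is only an illustration: the function name log_low_voltage, the temporary log path, and the sample reading 3.65 are assumptions, and the sensor call from the question is left out. It also relies on the with block to close the file, so no explicit f.close() is needed.

```python
import os
import tempfile
import time

def log_low_voltage(log_path, voltage, threshold=3.70):
    """Append a log entry if voltage is at or below threshold.

    Returns True if an entry was written, False otherwise.
    """
    if voltage > threshold:
        return False
    # The with block closes the file on exit.
    with open(log_path, "a") as f:
        # format() converts the float to text, so no str + float concatenation occurs.
        f.write("battery voltage below threshold: {:.2f}\n".format(voltage))
        f.write("timestamp: " + time.asctime() + "\n")
        f.write("-" * 40 + "\n")
    return True

# Usage with a temporary file standing in for the real log path.
log_path = os.path.join(tempfile.mkdtemp(), "voltagecheck.log")
wrote = log_low_voltage(log_path, 3.65)
with open(log_path) as f:
    first_line = f.readline()
```

In a long-running loop, the voltage would also have to be read again on each pass, since a value read once before the loop never changes.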

read and write only a line not the full file

I have the following problem with the code below.
I open a file and load it into "csproperties" (comment #open path). In every opened file I want to make three changes (comment #change parameters). Then I want to write the three changes to the file and close it. I want to do this file by file.
When I now open the changed file, it contains the same content three times. In the first copy I can see my first change, in the second copy the second change, and so on.
I do not understand why my tool writes the full file content three times into a changed file.
I think it has something to do with the #write File block. I tried several things, but nothing worked the right way.
Any suggestions?
Kind regards
for instance in cs_id:
    cspath.append(cs_id[n] + '/mypath/conf/myfile.txt')
    # open path
    f = open(cspath[n], "r")
    csproperties = f.read()
    f.close()
    #change parameters
    CS_License_Key_New = csproperties.replace(oms + "CSLicenseKey=", oms + "CSLicenseKey="+ keystore[n])
    Logfile_New = csproperties.replace(oms + "LogFile=", oms + "LogFile=" + logs + 'ContentServer_' + cs_id[n] +'.log')
    Pse_New = csproperties.replace(oms + "PABName=", oms + "PABName=" + pse + 'ContentServer_' + cs_id[n] + '.PSE')
    #write File
    f = open(cspath[n],'w')
    f.write(CS_License_Key_New)
    f.write(Logfile_New)
    f.write(Pse_New)
    f.close()
    n += 1
You're doing 3 different replaces on the same content. You should chain the replaces instead:
result = (csproperties
          .replace(oms + "CSLicenseKey=", oms + "CSLicenseKey="+ keystore[n])
          .replace(oms + "LogFile=",
                   oms + "LogFile=" + logs + 'ContentServer_' + cs_id[n] +'.log')
          .replace(oms + "PABName=",
                   oms + "PABName=" + pse + 'ContentServer_' + cs_id[n] + '.PSE'))
...
f.write(result)
CS_License_Key_New = csproperties.replace(...)
Logfile_New = csproperties.replace(...)
Pse_New = csproperties.replace(...)
These three lines create three different copies of the content.
Each copy contains only one of the changes, and you then write all three copies to the file.
Apply all three replacements to the same string and write it once.
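A small sketch of why the separate replaces triple the file while the chained version does not. The sample content and the replacement values are hypothetical; the real file and the oms, keystore, logs, and pse variables from the question are not used here.

```python
# Hypothetical stand-in for the text read into csproperties.
csproperties = "CSLicenseKey=\nLogFile=\nPABName=\n"

# Each replace starts from the original text, so each result is a full
# copy of the file with only one of the three changes applied.
separate = [
    csproperties.replace("CSLicenseKey=", "CSLicenseKey=ABC123"),
    csproperties.replace("LogFile=", "LogFile=/logs/ContentServer_1.log"),
    csproperties.replace("PABName=", "PABName=/pse/ContentServer_1.PSE"),
]
written_by_question = "".join(separate)  # what three f.write calls produce

# Chaining applies each replace to the result of the previous one,
# so a single string carries all three changes.
result = (csproperties
          .replace("CSLicenseKey=", "CSLicenseKey=ABC123")
          .replace("LogFile=", "LogFile=/logs/ContentServer_1.log")
          .replace("PABName=", "PABName=/pse/ContentServer_1.PSE"))
```

Because str.replace returns a new string and leaves the original unchanged, reassigning the same variable three times would work just as well as chaining.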

Python - Readline skipping characters

I ran into a curious problem while parsing json objects in large text files, and the solution I found doesn't really make much sense. I was working with the following script. It copies bz2 files, unzips them, then parses each line as a json object.
import os, sys, json
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# USER INPUT
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
args = sys.argv
extractDir = outputDir = ""
if (len(args) >= 2):
    extractDir = args[1]
else:
    extractDir = raw_input('Directory to extract from: ')
if (len(args) >= 3):
    outputDir = args[2]
else:
    outputDir = raw_input('Directory to output to: ')
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# RETRIEVE FILE
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tweetModel = [u'id', u'text', u'lang', u'created_at', u'retweeted', u'retweet_count', u'in_reply_to_user_id', u'coordinates', u'place', u'hashtags', u'in_reply_to_status_id']
filenames = next(os.walk(extractDir))[2]
for file in filenames:
    if file[-4:] != ".bz2":
        continue
    os.system("cp " + extractDir + '/' + file + ' ' + outputDir)
    os.system("bunzip2 " + outputDir + '/' + file)
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # PARSE DATA
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    input = open (outputDir + '/' + file[:-4], 'r')
    output = open (outputDir + '/p_' + file[:-4], 'w+')
    for line in input.readlines():
        try:
            tweet = json.loads(line)
            for field in enumerate(tweetModel):
                if tweet.has_key(field[1]) and tweet[field[1]] != None:
                    if field[0] != 0:
                        output.write('\t')
                    fieldData = tweet[field[1]]
                    if not isinstance(fieldData, unicode):
                        fieldData = unicode(str(fieldData), "utf-8")
                    output.write(fieldData.encode('utf8'))
                else:
                    output.write('\t')
        except ValueError as e:
            print ("Parse Error: " + str(e))
            print line
            line = input.readline()
            quit()
            continue
        print "Success! " + str(len(line))
        input.flush()
        output.write('\n')
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # REMOVE OLD FILE
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    os.system("rm " + outputDir + '/' + file[:-4])
While reading in certain lines in the for line in input.readlines(): loop, the lines would occasionally be truncated at inconsistent locations. Since the newline character was truncated as well, it would keep reading until it found the newline character at the end of the next json object. The result was an incomplete json object followed by a complete json object, all considered one line by the parser. I could not find the reason for this issue, but I did find that changing the loop to
filedata = input.read()
for line in filedata.splitlines():
worked. Does anyone know what is going on here?
After looking at the source code for file.readlines and string.splitlines, I think I see what's up. Note: this is Python 2.7 source code, so if you're using another version, this answer may or may not apply.
readlines uses the function Py_UniversalNewlineFread to test for a newline, whereas splitlines uses a constant, STRINGLIB_ISLINEBREAK, that just tests for \n or \r. I suspect Py_UniversalNewlineFread is picking up some character in the file stream as a line break when it isn't really intended as one, possibly because of the encoding; I don't know for sure. When you dump that same data into a string, splitlines checks it only against \r and \n, finds no match, and moves on until the real line break is encountered, so you get the line you intended.

Python Replace String in File in With clause

I am trying to replace a string in a file.
Below code is simply modifying certain substrings within the bigger string from the file. Any ideas on how I can actually replace line with current_line in the filename?
from sys import *
import os
import re
import datetime
import fileinput
script, filename = argv
userhome = os.path.expanduser('~')
username = os.path.split(userhome)[-1]
print "\n"
print "User: " + username
today = datetime.date.today().strftime("%Y/%m/%d")
time = datetime.datetime.now().strftime("%H:%M:%S")
print "Date: " + str(today)
print "Current time: " + str(time)
print "Filename: %s\n" % filename
def replace_string():
    found = False
    with open(filename, 'r+') as f:
        for line in f:
            if re.search("CVS Header", line):
                print line
                ####################################################################################
                # Below logic: #
                # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
                # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
                ####################################################################################
                if len(line.split("$Revision: ")[1].split()[0]) == 4:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
                elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                    new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
                ###
                ###
                newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
                newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
                current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
                print current_line
                found = True
    if not found:
        print "No CVS Header exists in %s" % filename
if __name__ == "__main__":
    replace_string()
I tried adding something like..
f.write(f.replace(line, current_line))
but this just clears all the contents out of the file and leaves it blank so obviously that is incorrect.
The fileinput module provides a way to edit a file in place. If you use the inplace parameter, the file is moved to a backup file and standard output is directed to the input file.
import fileinput
def clause(line):
    return len(line) < 5
for line in fileinput.input('file.txt', inplace=1):
    if clause(line):
        print '+ ' + line[:-1]
fileinput.close()
Trying to apply this idea to your example, it could be something like this:
def replace_string():
    found = False
    for line in fileinput.input(filename, inplace=1): # <-
        if re.search("CVS Header", line):
            #print line
            ####################################################################################
            # Below logic: #
            # if length of revision number is 4 characters (e.g. 1.15) then increment by 0.01 #
            # else if it is 3 characters (e.g. 1.5) then increment by 0.1 #
            ####################################################################################
            if len(line.split("$Revision: ")[1].split()[0]) == 4:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.01))
            elif len(line.split("$Revision: ")[1].split()[0]) == 3:
                new_line = str.replace(line, line.split("$Revision: ")[1].split()[0], str(float(line.split("$Revision: ")[1].split()[0]) + 0.1))
            ###
            ###
            newer_line = str.replace(new_line, line.split("$Author: ")[1].split()[0], username)
            newest_line = str.replace(newer_line, line.split("$Date: ")[1].split()[0], today)
            current_line = str.replace(newest_line, line.split("$Date: ")[1].split()[1], time)
            print current_line[:-1] # <-
            found = True
        else:
            print line[:-1] # <- keep original line otherwise
    fileinput.close() # <-
    if not found:
        print "No CVS Header exists in %s" % filename
The solution proposed by user2040251 is the correct way, and the way used by all text editors I know. The reason is that if a major problem occurs while writing the file, you keep the previous version unmodified until the new version is ready.
But of course you can edit in place if you want, provided you accept the risk of completely losing the file in case of a crash. That can be acceptable for a file under version control, since you can always get the previous committed version.
The principle is then to read before writing, ensuring that you never overwrite something you have not yet read.
At the simplest level, you load everything into memory with readlines, replace the line, rewind the file to the correct position (or to the beginning), and write it back.
Edit : here is a simple implementation when all lines can be loaded in memory :
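fd = open(filename, "r+")
lines = fd.readlines()
for i, line in enumerate(lines):
    # test whether this is the searched line
    # ("found" stands for whatever condition identifies it)
    if found:
        lines[i] = replacement_line
        break
fd.seek(0)
fd.writelines(lines)
fd.truncate()  # drop leftover bytes if the new content is shorter
fd.close()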
It could be done even for a big file using readlines(16384) for example instead of readlines() to read by chunks of little more than 16K, and always reading one chunk before writing previous, but it is really much more complicated and anyway you should use a backup file when processing big files.
You can create another file and write the output to it. After that, you can just remove the original file and rename the new file.
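The write-to-another-file approach could look roughly like the sketch below. The function name replace_line_in_file and its two callback parameters are hypothetical, and os.replace assumes Python 3.3 or later (on Python 2, os.rename plays a similar role on POSIX systems). The original file stays untouched until the new one is complete.

```python
import os
import tempfile

def replace_line_in_file(path, is_target, make_replacement):
    """Rewrite path through a temporary file.

    Lines for which is_target(line) is true are replaced by
    make_replacement(line). Returns True if any line was replaced.
    """
    directory = os.path.dirname(os.path.abspath(path))
    found = False
    # Create the temporary file next to the original so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                if is_target(line):
                    line = make_replacement(line)
                    found = True
                tmp.write(line)
        # Swap the finished copy into place (Python 3.3+).
        os.replace(tmp_path, path)
    except Exception:
        # On failure, discard the partial copy and keep the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return found

# Usage on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as f:
    f.write("first\nCVS Header old\nlast\n")
changed = replace_line_in_file(
    demo_path,
    lambda line: "CVS Header" in line,
    lambda line: "CVS Header new\n",
)
with open(demo_path) as f:
    demo_content = f.read()
```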

How to Parse text file in python based on condition

I have a text file that I want to parse based on a condition: if I find the match phrase in a line, I have to jump to the next line to fetch the value (unfortunately, that's how the report logs are generated). I have created _dict to check my key and fetch my values in the next line.
Lines = f1.readlines()
numlines = len(Lines)
f1.close()
f1 = open('Testlog.txt','r')
f2 =open('writetoFile','r+')
f3 =open('Results.txt','w')
new_line="Test Name SubTest passed failed status "
f3.write(new_line)
f3.write("\n")
while i < numlines:
    line=f1.readline()
    if "Test Name" in line:
        f2.write(line)
        i=i+1
        line =f1.readline()
        if "true" in line:
            f2.write(line)
            line = line.strip('\n ')
            #print line
            data = re.split(r"\s{2,}",line)
            Test_Name=data[4]
            SubTest=data[6]
            passed=data[7]
            failed=data[8]
            status=data[9]
            result = Test_Name + " " + SubTest + " " + passed + " " + failed + " " + status
            print result
            f3.write(result)
            f3.write("\n")
    i=i+1
I was wondering if there is a better way to do this.
What is your method for parsing the line? Can you post sample code, that will help.
To answer your second question, you could make a dictionary in which each key refers to a list. You can then use a for loop to iterate through each of the values (or whatever you need):
foo = { 1 : ['a','b','c'] }
for value in foo[1]:
    print(value)
This prints a, b, and c, one per line.
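For the "match a line, then read the value from the next line" part of the question, one option is to share a single iterator between the loop and a next() call. This is only a sketch: the sample lines and the helper name values_after_marker are made up, since the real report layout is not shown, and it assumes a marker line is always followed by its value line.

```python
import re

# Hypothetical log lines; the real report layout is not shown in the question.
sample_lines = [
    "Test Name      SubTest    passed   failed   status",
    "boot_test      init       3        0        true",
    "unrelated line",
    "Test Name      SubTest    passed   failed   status",
    "net_test       ping       1        2        false",
]

def values_after_marker(lines, marker="Test Name"):
    """Yield the fields of the line that follows each marker line."""
    it = iter(lines)
    for line in it:
        if marker in line:
            # Pull the following line from the same iterator, so the
            # for loop does not see it again.
            following = next(it, None)
            if following is None:
                break  # marker was the last line; nothing to read
            # Split on runs of two or more whitespace characters,
            # as the question's code does.
            yield re.split(r"\s{2,}", following.strip())

rows = list(values_after_marker(sample_lines))
```

Because a file object is itself an iterator over its lines, the same function should work when passed an open file instead of a list, without a line counter or a separate readlines() pass.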
