I need to review some pages, so I made a janky queue for efficiency. I read URLs from one CSV and write my notes to another. For each page I open from the read CSV, I call input() and type some notes, which should then be saved to the write CSV. Code below.
import csv
with open("readfile.csv") as r:
    csv_file = csv.DictReader(r)
    with open("writefile.csv", 'w') as w:
        headers = {'URL': None, 'JUDGEMENT': None}
        writer = csv.DictWriter(w, fieldnames=headers)
        writer.writeheader()
        for row in csv_file:
            url = row.get("Profile URL")
            browser.get(url)  # Selenium opening URL
            judgement = input("What say you?")
            # keys must match the fieldnames above, hence "URL"
            writer.writerow({"URL": url, "JUDGEMENT": judgement})
This works fine when I go through the entire CSV, but sometimes I only want to do half. When I press CTRL+Z to escape the script, nothing is saved to the write file. I tried adding an exception handler around the input, like
try:
    judgement = input("What say you?")
except Exception as e:
    pass  # but can't find what to put here
That doesn't work, since I can't seem to find what to put here.
Maybe try w.close() in the exception handler - this should flush the buffer to the file, write the data, and then exit.
with open("readfile.csv") as r:
    csv_file = csv.DictReader(r)
    with open("writefile.csv", 'w') as w:
        try:
            headers = {'URL': None, 'JUDGEMENT': None}
            writer = csv.DictWriter(w, fieldnames=headers)
            writer.writeheader()
            for row in csv_file:
                url = row.get("Profile URL")
                browser.get(url)  # Selenium opening URL
                judgement = input("What say you?")
                writer.writerow({"URL": url, "JUDGEMENT": judgement})
        except KeyboardInterrupt:
            if not w.closed:
                w.close()  # Flushes buffer, and closes file
Alternatively, you could open the file for writing without the default buffering: 1 for line buffering, which is what I suggest (0, unbuffered, is only allowed in binary mode):
with open("writefile.csv", 'w', buffering=1) as w:
This post may help you understand further.
EDIT:
It seems as though both of these approaches are needed to solve this, opening with a line buffer, and catching the keyboard interrupt, rather than one of the two.
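A minimal sketch of the combined approach, assuming the reviewing loop is wrapped in a function. The name review and the ask parameter are hypothetical and exist only so the prompt can be swapped out; the Selenium call is left out. It flushes after every row and catches KeyboardInterrupt, which is what Ctrl+C raises:

```python
import csv


def review(rows, out_path, ask=input):
    """Write one output row per input row, flushing after each answer."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["URL", "JUDGEMENT"])
        writer.writeheader()
        try:
            for row in rows:
                url = row.get("Profile URL")
                judgement = ask(f"What say you about {url}? ")
                writer.writerow({"URL": url, "JUDGEMENT": judgement})
                out.flush()  # push this row to disk before the next prompt
        except KeyboardInterrupt:
            # Ctrl+C lands here; leaving the with-block closes the file.
            print("Stopped early; rows written so far are saved.")
```

Calling flush() after each row has much the same effect as buffering=1 here, since every row ends with a newline.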
What I need to do is write some messages to a .txt file, close it and send it to a server. This happens in an infinite loop, so the code should look more or less like this:
import time
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
num = 0
while True:
    num += 1
    filename = f"example{num}.txt"
    with open(filename, "w") as f:
        f.write("Hello")
        f.close()
    mp_encoder = MultipartEncoder(
        fields={
            'file': ("file", open(filename, 'rb'), 'text/plain')
        }
    )
    r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
    time.sleep(10)
The post works if the file is created manually inside my working directory, but if I try to create it and write on it through code, I receive this response message:
500 - Internal Server Error
System.IO.IOException: Unexpected end of Stream, the content may have already been read by another component.
I don't see the file appear in the project window of PyCharm. I even added time.sleep(10) because at first I thought it could be a timing problem, but that didn't help. In fact, the file appears in my working directory only when I stop the code, so it seems the program keeps hold of the file even after I explicitly call f.close(). I know the with statement should take care of closing files, but it didn't look that way, so I added a close() to check whether that was the problem (spoiler: it was not).
I solved the problem by using another file:
with open(filename, "r") as firstfile, open("new.txt", "a+") as secondfile:
    secondfile.write(firstfile.read())
with open(filename, 'w'):
    pass
r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
if r.status_code == requests.codes.ok:
    os.remove("new.txt")
else:
    print("File not saved")
I make a copy of the file, empty the original file to save space and send the copy to the server (and then delete the copy). Looks like the problem was that the original file was held open by the Python logging module.
First, change open(f, 'rb') to open("example.txt", 'rb'). open() expects a file name, not a closed file object.
Second, you can use os.path.abspath to find out where the file is being written.
import os
os.path.abspath('.')
Third, when you open a file with the with context manager, you don't close it yourself. The context manager is supposed to do that.
with open("example.txt", "w") as f:
    f.write("Hello")
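The points above can be put together in a small sketch. The helper names write_message and read_for_upload are hypothetical, and the actual upload is left out; the sketch only shows that once the with-block ends, the data is on disk and the file can be reopened in binary mode:

```python
import os


def write_message(path, text):
    """Write text and close the file by leaving the with-block."""
    with open(path, "w") as f:
        f.write(text)
    # No f.close() needed: the file is already flushed and closed here.
    return os.path.abspath(path)  # shows where the file actually landed


def read_for_upload(path):
    """Reopen the finished file in binary mode, as an upload would."""
    with open(path, "rb") as f:
        return f.read()  # hypothetical stand-in for handing f to the uploader
```

Opening the file for upload inside its own with-block also makes sure the read handle is closed again on every loop iteration.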
I've got a little problem with my code. It is supposed to check a list of phone numbers and group them into text files by provider, but it doesn't work as expected: it saves only a single number in each provider's file instead of all of them. This is my code; if anyone could help I'd be grateful. Sorry if my code is too traditional.
def main():
    dead = open('invalid_no.txt', 'a+')
    print('-------------------------------------------------------')
    print('-------------------------------------------------------')
    list = input('Your Phone Numbers List : ')
    base_url = "http://apilayer.net/api/validate"
    params = {
        'access_key': '3246123d1d67e385b1d9fa11d0e84959',
        'number': '',
    }
    numero = open(list, 'r')
    for num in numero:
        num = num.strip()
        if num:
            lines = num.split(':')
            params['number'] = lines[0]
            response = requests.get(base_url, params=params)
            print('status:', response.status_code)
            print('-------------------------------------')
            try:
                resp = response.json()
                print('number:', resp['valid'])
                print('number:', resp['international_format'])
                print('country:', resp['country_name'])
                print('location:',resp['carrier'])
                print('-------------------------------------')
                mok = open(resp['carrier'],'w+')
                if resp['carrier'] == mok.name:
                    mok.write(num +'\n')
            except FileNotFoundError:
                if resp['carrier'] == '':
                    print('skipping')
                else:
                    mok = open(resp['carrier'],'w+')
                    if resp['carrier'] == mok.name:
                        mok.write(num)
        else:
            print('No')
if __name__ == '__main__': main()
Opening a file with mode "w" erases the existing file and starts with an empty new one. That is why you are getting only one number: every time you open the file, you throw away whatever was there before. Mode "w+" is no different in this respect. It is a valid mode (open for reading and writing), but like "w" it truncates the file first.
From the documentation for open():
The second argument is another string containing a few characters
describing the way in which the file will be used. mode can be 'r'
when the file will only be read, 'w' for only writing (an existing
file with the same name will be erased), and 'a' opens the file for
appending; any data written to the file is automatically added to the
end. 'r+' opens the file for both reading and writing. The mode
argument is optional; 'r' will be assumed if it’s omitted.
This passage does not list "w+", but the reference documentation for the built-in open() does: "+" opens the file for updating, and "w+" opens and truncates it.
I think you want mode "a" for append. Mode "a" creates the file if it does not exist yet, so it also works the first time a carrier is seen. The FileNotFoundError you ran into most likely comes from an empty carrier string, since open('') cannot succeed.
If you would rather make the choice explicit, check whether the file is there before writing to it. If not, open it for writing, otherwise open it for appending.
if os.path.exists(resp['carrier']):
    mok = open(resp['carrier'],'a')
else:
    mok = open(resp['carrier'],'w')
or, if you have a taste for one-liners,
mok = open(resp['carrier'],'a' if os.path.exists(resp['carrier']) else 'w')
Also your code never calls close() on the file after it is finished writing to it. It should. Forgetting it can result in missing data or other baffling behaviour.
The best way not to forget it is to use a context manager:
with open(resp['carrier'],'a' if os.path.exists(resp['carrier']) else 'w') as mok:
    # writes within the with-block here
# rest of program here
# after the with-block ends, the context manager closes the file for you.
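As a rough sketch of the grouping step on its own, the helper below appends each number to a file named after its carrier. The function name save_number, the directory argument and the ".txt" suffix are assumptions made for illustration, not part of the original code. It relies on mode "a" creating the file on first use:

```python
import os


def save_number(directory, carrier, number):
    """Append one number to the file named after its carrier."""
    if not carrier:
        return None  # no carrier reported, nothing to group by
    path = os.path.join(directory, carrier + ".txt")
    # "a" creates the file on first use and appends on later calls.
    with open(path, "a") as carrier_file:
        carrier_file.write(number + "\n")
    return path
```

Because the file is opened and closed for every number, nothing is left sitting in a buffer if the script stops halfway through the list.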
I have a csv file from which I read URLs line by line and make a request to each endpoint. Each response is parsed and the data is written to output.csv. This process runs in parallel.
The issue is with the written data. Some portions of it are partially or totally missing (blank lines). I suppose this happens because of collisions or conflicts between the async processes. Can you please advise how to fix that?
def parse_data(url, line_num):
    print line_num, url
    r = requests.get(url)
    htmltext = r.text.encode("utf-8")
    pois = re.findall(re.compile('<pois>(.+?)</pois>'), htmltext)
    for poi in pois:
        write_data(poi)
def write_data(poi):
    with open('output.csv', 'ab') as resfile:
        writer = csv.writer(resfile)
        writer.writerow([poi])
    resfile.close()
def main():
    pool = Pool(processes=4)
    with open("input.csv", "rb") as f:
        reader = csv.reader(f)
        for line_num, line in enumerate(reader):
            url = line[0]
            pool.apply_async(parse_data, args=(url, line_num))
    pool.close()
    pool.join()
Try adding file locking (fcntl is available on Unix only):
import fcntl
def write_data(poi):
    with open('output.csv', 'ab') as resfile:
        writer = csv.writer(resfile)
        fcntl.flock(resfile, fcntl.LOCK_EX)
        writer.writerow([poi])
        resfile.flush()  # write the row out while the lock is still held
        fcntl.flock(resfile, fcntl.LOCK_UN)
    # Note that you don't have to close the file. The 'with' will take care of it
Concurrent writes to the same file are indeed a known cause of data loss and file corruption. The safe solution here is the "map / reduce" pattern: each process writes to its own result file (map), then you concatenate those files together (reduce).
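A minimal Python 3 sketch of that map / reduce idea, under the assumption that the work can be split into chunks up front. The names write_part, merge_parts and run, and the part_N.csv file naming, are hypothetical. Each worker writes only to its own part file, and a single process concatenates the parts afterwards:

```python
import csv
import os
from multiprocessing import Pool


def write_part(part_path, values):
    """Map step: one worker writes only to its own file."""
    with open(part_path, "w", newline="") as part:
        writer = csv.writer(part)
        for value in values:
            writer.writerow([value])
    return part_path


def merge_parts(part_paths, out_path):
    """Reduce step: a single process concatenates the part files."""
    with open(out_path, "w", newline="") as out:
        for part_path in part_paths:
            with open(part_path, newline="") as part:
                out.write(part.read())


def run(chunks, out_path, workdir):
    """Write each chunk in its own process, then merge in the parent."""
    jobs = [(os.path.join(workdir, f"part_{i}.csv"), chunk)
            for i, chunk in enumerate(chunks)]
    with Pool(processes=4) as pool:
        part_paths = pool.starmap(write_part, jobs)
    merge_parts(part_paths, out_path)
```

No locking is needed because no two processes ever have the same file open for writing.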
Hi, I am getting an I/O error while looping over a file. The code raises 'ValueError: I/O operation on closed file.' when it runs. Does anyone have any idea why it says the file is closed, given that I open a new file on each iteration? Many thanks.
Code below:
with open('inputlist.csv', 'r') as f: #input list reading
    reader = csv.reader(f)
    queries2Google = reader
    print(queries2Google)
def QGN(query2Google):
    s = '"'+query2Google+'"' #Keywords for query, to solve the + for space
    s = s.replace(" ","+")
    date = str(datetime.datetime.now().date()) #timestamp
    filename =query2Google+"_"+date+"_"+'SearchNews.csv' #csv filename
    f = open(filename,"wb") #open output file
    pass
    df = np.reshape(df,(-1,3))
    itemnum,col=df.shape
    itemnum=str(itemnum)
    df1 = pd.DataFrame(df,columns=['Title','URL','Brief'])
    print("Done! "+itemnum+" pieces found.")
    df1.to_csv(filename, index=False,encoding='utf-8')
    f.close()
    return
for query2Google in queries2Google:
    QGN(query2Google) #output should be multiple files
with closes the file you are trying to read as soon as its block ends. So you are opening the file, making a csv reader, closing the underlying file, and only then trying to read from it. See more about file i/o here
The solution is to do all of your work on your queries2Google reader INSIDE the with statement:
with open('inputlist.csv', 'r') as f: #input list reading
    reader = csv.reader(f)
    for q2g in reader:
        QGN(q2g)
Some additional stuff:
That pass isn't doing anything and you should probably be using with again inside the QGN function since the file is opened and closed in there. Python doesn't need empty returns. You also don't seem to even be using f in the QGN function.
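Another option, sketched here under the assumption that each row holds one query in its first column, is to copy the queries into a plain list while the file is open. The helper name read_queries is hypothetical. Note that csv.reader yields each row as a list, so the query string is row[0]:

```python
import csv


def read_queries(path):
    """Collect the first column of every non-empty row while the file is open."""
    queries = []
    with open(path, newline="") as f:
        for row in csv.reader(f):  # iterate inside the with-block
            if row:
                queries.append(row[0])
    return queries  # a plain list, safe to use after the file is closed
```

The returned list can then be looped over anywhere in the program, because it no longer depends on the open file.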
I am opening URLs listed in a text file, follow.txt, and clicking a specific button on each page. The problem is that sometimes I get an error saying the element cannot be located and the button cannot be clicked.
When that error occurs, I want the script to ignore it and move on to the next line of the text file. I tried the code below to handle the problem, but it still does not work properly. I think there is something wrong with my code. How can I solve this? Here is the code I used for error handling.
try:
    f = open('follow.txt', 'r', encoding='UTF-8', errors='ignore')
    line = f.readline()
    while line:
        line = f.readline()
        browser.get(line)
        browser.find_element_by_xpath(""".//*[@id='react-root']/section/main/article/header/div[2]/div[1]/span/button""").click()
        time.sleep(50)
    f.close();
except Exception as e:
    f = open('follow.txt', 'r', encoding='UTF-8', errors='ignore')
    line = f.readline()
    while line:
        line = f.readline()
        browser.get(line)
        browser.find_element_by_xpath(""".//*[@id='react-root']/section/main/article/header/div[2]/div[1]/span/button""").click()
        time.sleep(20)
        browser.find_element_by_tag_name("body").send_keys(Keys.ALT + Keys.NUMPAD2)
        browser.switch_to_window(main_window)
        time.sleep(10)
    f.close();
With the code written this way, the answer to a question like "What happens when there is an error on the second line as well?" would be scary. You definitely do NOT want to write as many nested try/except blocks as there are lines in the file.
So put the try/except around the statement where you expect the error. That lets you keep using the already opened file object without having to reopen the file. Something similar to the following:
f = open('follow.txt', 'r', encoding='UTF-8', errors='ignore')
line = f.readline()
while line:
    browser.get(line)
    try:
        browser.find_element_by_xpath(""".//*[@id='react-root']/section/main/article/header/div[2]/div[1]/span/button""").click()
    except Exception as e:
        print(e)  # Or better log the error
    time.sleep(50)
    browser.find_element_by_tag_name("body").send_keys(Keys.ALT + Keys.NUMPAD2)
    browser.switch_to_window(main_window)
    time.sleep(10)
    line = f.readline()  # read the next line last, so the first line is not skipped
f.close();
This should let you continue with the next line even if there is an error at ".click()". Note that you do not want to close the file before you have read everything you need from it.
Moving the try/except deep into the logic doesn't mean that you shouldn't use try/except elsewhere, for example while opening the file. A better way is to use 'with', in which case you don't need to worry about closing the file: it is closed for you, even if an exception is raised inside the block.
with open('follow.txt', 'r', encoding='UTF-8', errors='ignore') as f:
    ....
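Putting both ideas together, here is a minimal sketch that is not tied to Selenium. The function name process_lines and the action callback are hypothetical; action stands in for the browser.get(...) and .click() calls. The file is opened with 'with', and the try/except wraps only the step that may fail:

```python
def process_lines(path, action):
    """Call action(url) for every non-empty line; log failures and continue."""
    failed = []
    with open(path, "r", encoding="UTF-8", errors="ignore") as f:
        for line in f:  # iterating the file visits every line exactly once
            url = line.strip()
            if not url:
                continue
            try:
                action(url)
            except Exception as e:
                print(f"Skipping {url}: {e}")
                failed.append(url)
    return failed
```

Returning the failed URLs makes it easy to retry them later or write them to a log.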