I'm using httplib2 to pull csv data directly from an internal website. The data is already in csv format, so I'm trying to save it directly to a file using a simple file.write.
If I run the script on Linux, this works fine. If I run the script on Windows (which is what I'll eventually be doing), it inserts an extra line between each row. Inspecting the file in Notepad++ shows a carriage return after each record, followed by a line feed/carriage return on the empty line.
edit: code
resp, content = httplib2.Http().request(request_string)
filename="data.csv"
abs_path=os.path.join(abs_path,filename)
file=open(abs_path,"w")
file.write(content)
file.close()
Fixed it. Just replaced \n with a space before closing the file.
file.read().replace('\n',' ')
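A likely cause, stated as an assumption since the server's response isn't shown: the CSV already ends each row with \r\n, and a file opened with "w" on Windows translates every \n into \r\n on write, which produces \r\r\n. A minimal sketch of writing the bytes untouched in binary mode instead; `save_bytes` and the sample data are made up for illustration:

```python
import os
import tempfile


def save_bytes(content, path):
    """Write the response body exactly as received, with no newline translation."""
    with open(path, "wb") as fh:
        fh.write(content)


# Stand-in for the `content` returned by httplib2 (hypothetical sample data).
sample = b"a,b\r\n1,2\r\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    save_bytes(sample, csv_path)
    with open(csv_path, "rb") as fh:
        saved = fh.read()  # identical to `sample` on any platform
```

With "wb" the file on disk matches what the server sent, so nothing needs to be replaced afterwards.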
I am trying to read and write to a text file. The reading and writing parts work fine, but the actual file does not update until after the program has finished executing. I understand that this is because the data is held in a buffer and written out afterwards, so I came across this: How come a file doesn't get written until I stop the program?
and tried .flush, os.sync, etc. This did not have an effect, though; maybe I'm not seeing something.
Note that .flush does not work in the Postdata sub; I think it's because of the way that subroutine is coded.
Reading works as expected.
Post takes an index and a line index and edits the text at that position.
def Getdata(Index,lineindex):#indexed so can say get data at index 3 and it will return it
    Datafile = open("Trade data/3rd file", "a+")
    Linetoget = linecache.getline('Trade data/Databaseforbot', lineindex).split("|")
    Traddetail = Linetoget[Index]
    print(Traddetail)
    return Traddetail
def Postdata(index,lineindex,data):#will work fine the first time, but run it as PostdataV1(3,2) it will convert
    Getdata(3,2)
    with fileinput.FileInput('Trade data/Databaseforbot', inplace=True, backup='.bak') as file:
        entireline= linecache.getline('Trade data/Databaseforbot', lineindex)
        splitted = entireline.split("|")
        Traddetail = splitted[index]
        Newline = entireline.replace(Traddetail, str(index+1)+"*"+data)
        for line in file:
            print(line.replace(entireline, Newline), end='')
        #os.fsync(file)
    file.close()
Getdata(3, 2)
Postdata(3,2,"QW")
Getdata(3, 2)
The data file stores this data:
1|https://app.libertex.com/products/stock/BA/|3*45#4|4*0|5*0|6*0|7*0|8*0|9*Up|CDwindow-5C5C0883A51583A013B50FDC5A1798B7
2|https://app.libertex.com/products/energetics/NG/|3*56#5|4*0|5*0|6*0|7*0|8*0|9*Up|CDwindow-5C5C0883A51583A013B50FDC5A1798B7
3|https://app.libertex.com/products/metal/XAUUSD/|3*45#4|4*0|5*0|6*0|7*0|8*0|9*Up|CDwindow-5C5C0883A51583A013
Is there a way to live-update the file so I can call other parts of the code to read the data from it? I will be using something like getch to run other stuff, and I don't mind if I have to pause posting data while reading. I also tried using a second file: Getdata() reads from filex, while Postdata first writes to filey and then copies everything to filex, but that did not work either.
Also, there will be around 10-50 lines in the text file, if that helps.
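One possible explanation, offered as an assumption rather than a diagnosis: linecache keeps the file's lines in memory after the first getline call, so later calls can return the old contents unless linecache.checkcache() is called after the file changes. With only 10-50 lines, a simple way around it is to re-read the file on every call. A rough sketch; `get_field`, `set_field` and the shortened sample rows are hypothetical:

```python
import os
import tempfile


def get_field(path, line_number, index):
    """Re-read the file on every call so the result is never stale."""
    with open(path) as fh:
        lines = fh.readlines()
    return lines[line_number - 1].rstrip("\n").split("|")[index]


def set_field(path, line_number, index, value):
    """Rewrite one '|'-separated field, then swap the new file into place."""
    with open(path) as fh:
        lines = fh.readlines()
    fields = lines[line_number - 1].rstrip("\n").split("|")
    fields[index] = value
    lines[line_number - 1] = "|".join(fields) + "\n"
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        fh.writelines(lines)
    os.replace(tmp_path, path)  # readers see either the old or the new file


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "Databaseforbot")
    with open(db_path, "w") as fh:
        fh.write("1|url-a|3*45#4|4*0\n2|url-b|3*56#5|4*0\n")
    before = get_field(db_path, 2, 3)
    set_field(db_path, 2, 3, "4*QW")
    after = get_field(db_path, 2, 3)
```

Because each call opens and closes the file itself, the change is on disk as soon as `set_field` returns, and no flush or fsync is needed for other parts of the same program to see it.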
I am working on a project that requires me to read a file with a .dif extension. DIF stands for Data Interchange Format. The file opens nicely in OpenOffice Calc, where you can easily save it as a csv file; however, when I open it in Python, all I get are random characters that don't make sense. Here is the last code that I tried, just to see if I could read it.
txt = open(r'C:\myfile.dif', 'rb').read()
print txt
I would even be open to programmatically converting the file to csv before opening it, if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how do you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
    for line in f:
        print (line)
In the first code block, that long line that has a b'' (for bytes!) in front of it can be iterated on \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
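If the goal is to get from DIF to CSV without going through Calc, the lines above can be walked two at a time. A minimal sketch, assuming the layout seen in the sample: a header ending in the DATA chunk, then -1,0 / BOT starting each row, 0,<number> / V for numeric cells, 1,0 / "text" for string cells, and -1,0 / EOD at the end. `dif_to_rows` and the inline sample are made up for illustration, and real files may contain indicators this does not handle:

```python
import csv
import io


def dif_to_rows(text):
    """Turn the DATA section of a DIF document into a list of rows."""
    lines = text.splitlines()
    # The DATA chunk is three lines long: 'DATA', '0,0' and '""'.
    i = lines.index("DATA") + 3
    rows = []
    while i + 1 < len(lines):
        header, value = lines[i], lines[i + 1]
        i += 2
        kind, _, number = header.partition(",")
        if kind == "-1":
            if value == "EOD":          # end of data
                break
            if value == "BOT":          # beginning of a new row
                rows.append([])
        elif kind == "0":               # numeric cell: value is in the header line
            rows[-1].append(number)
        elif kind == "1":               # string cell: drop the surrounding quotes
            rows[-1].append(value[1:-1])
    return rows


# Shortened, hypothetical sample in the same shape as the output above.
sample = (
    'TABLE\r\n0,1\r\n"EXCEL"\r\n'
    'VECTORS\r\n0,2\r\n""\r\n'
    'TUPLES\r\n0,2\r\n""\r\n'
    'DATA\r\n0,0\r\n""\r\n'
    '-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n'
    '-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n'
    '-1,0\r\nEOD\r\n'
)

rows = dif_to_rows(sample)

# Write the rows out as CSV (to memory here; a real file works the same way).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
```

For a file on disk, the bytes from open(..., 'rb').read() would first need decoding to text; which encoding to use depends on the program that wrote the file.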
I use multiple python scripts that collect data and write it into one single json data file.
It is not possible to combine the scripts.
The scripts write quickly and often, and errors frequently occur (e.g. some characters at the end get duplicated), which is fatal, especially since I am using the JSON format.
Is there a way to prevent a Python script from writing to a file while other scripts are trying to write to it? (It would be absolutely OK if the data that the script tries to write gets lost, but it is important that the file syntax does not get damaged.)
Code snippet:
This opens the file and retrieves the data:
data = json.loads(open("data.json").read())
This appends a new dictionary:
data.append(new_dict)
And the old file is overwritten:
open("data.json","w").write( json.dumps(data) )
Info: data is a list which contains dicts.
Operating system: The whole process takes place on a Linux server.
On Windows, you could try to create the file, and bail out if an exception occurs (because file is locked by another script). But on Linux, your approach is bound to fail.
Instead, I would:
write one file per new dictionary, suffixing the filename with the process ID and a counter
have the consuming process(es) read not a single file but all the files, sorted by modification time, and build the data from them
So in each script:
filename = "data_{}_{}.json".format(os.getpid(),counter)
counter+=1
open(filename ,"w").write( json.dumps(new_dict) )
and in the consumers (reading each dict of sorted files in a protected loop):
# pass the function itself as the key; do not call it
files = sorted(glob.glob("*.json"),key=os.path.getmtime)
data = []
for f in files:
    try:
        with open(f) as fh:
            data.append(json.load(fh))
    except Exception:
        # IO error, malformed json file: ignore
        pass
I will post my own solution, since it works for me:
Every single Python script checks (before opening and writing the data file) whether a file called data_check exists. If so, the script does not try to read or write the data file and discards the data that was supposed to be written to it. If not, the script creates the file data_check and then reads and writes the data file. After the writing process is done, the file data_check is removed.
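One caveat with checking first and creating afterwards is that two scripts can both find data_check missing at the same moment. A possible variation, sketched under the assumption that all scripts share one directory on a local filesystem, is to let the operating system do the check and the creation in a single step with O_CREAT | O_EXCL. `append_record` is a hypothetical helper, not part of the solution above:

```python
import json
import os
import tempfile


def append_record(new_dict, data_path="data.json", lock_path="data_check"):
    """Append new_dict to the JSON list, or drop it if another script holds the lock."""
    try:
        # Fails with FileExistsError if the lock file is already there.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else is writing: discard the data
    try:
        os.close(fd)
        try:
            with open(data_path) as fh:
                data = json.load(fh)
        except FileNotFoundError:
            data = []
        data.append(new_dict)
        # Write to a temporary file first so data.json is never half-written.
        tmp_path = data_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(data, fh)
        os.replace(tmp_path, data_path)
        return True
    finally:
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.json")
    lock_path = os.path.join(tmp_dir, "data_check")
    first = append_record({"a": 1}, data_path, lock_path)
    open(lock_path, "w").close()  # simulate another script holding the lock
    skipped = append_record({"b": 2}, data_path, lock_path)
    os.remove(lock_path)
    with open(data_path) as fh:
        stored = json.load(fh)
```

If a script crashes between creating and removing data_check, the lock file stays behind and has to be removed by hand; this sketch does not try to detect that case.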
I am currently trying to read a txt file from a website.
My script so far is:
webFile = urllib.urlopen(currURL)
This way, I can work with the file. However, when I try to store the file (in webFile), I only get a link to the socket. Another solution I tried was to use read()
webFile = urllib.urlopen(currURL).read()
However, this seems to remove the formatting (\n, \t, etc.).
If I open the file like this:
webFile = urllib.urlopen(currURL)
I can read it line by line:
for line in webFile:
    print line
This should result in:
"this"
"is"
"a"
"textfile"
But I get:
't'
'h'
'i'
...
I wish to get the file on my computer, but maintain the format at the same time.
You should use readlines() to read the file as a list of lines:
response = urllib.urlopen(currURL)
lines = response.readlines()
for line in lines:
    .
    .
But I strongly recommend that you use the requests library.
Link here http://docs.python-requests.org/en/latest/
This is because you are iterating over a string, which yields one character at a time.
Why not save the whole file at once?
import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()
f = open('destination.txt', 'w+')
f.write(txt)
f.close()
If you really want to loop over the file line for line use txt = webf.readlines() and iterate over that.
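The snippet above uses the Python 2 urllib module. On Python 3 the same idea goes through urllib.request, and copying the raw bytes in binary mode keeps every \n and \t exactly as the server sent them. A minimal sketch; `download` is a made-up helper, and a file:// URL to a temporary file stands in for the real address so the example runs without a network:

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the response body to a local file without decoding it."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for the remote text file.
    source = Path(tmp_dir) / "remote.txt"
    source.write_bytes(b"this\nis\n\ta\ntextfile\n")

    target = os.path.join(tmp_dir, "destination.txt")
    download(source.as_uri(), target)

    with open(target, "rb") as fh:
        copied = fh.read()
    # Decode only when the text is needed line by line (UTF-8 is assumed here).
    lines = copied.decode("utf-8").splitlines()
```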
If you're just trying to save a remote file to your local server as part of a python script, you could use the PycURL library to download and save it without parsing it. More info here - http://pycurl.sourceforge.net
Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:
# Assign the open file to a variable
webFile = urllib.urlopen(currURL)
# Read the file contents to a variable
file_contents = webFile.read()
print(file_contents)
# > This will be the file contents
# Then write to a new local file
f = open('local file.txt', 'w')
f.write(file_contents)
f.close()
If neither applies, please update the question to clarify.
You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.
!pip install wget
import wget
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java"
wget.download(url, 'myFile.java')
I have an assignment for class that has me transfer text data from Excel and work with it in Python. But every time I run my code, only hex is displayed. This is the code I have so far. Is it possible to print the data as ASCII in the shell?
infile = open("data.txt", 'r')
listName = [line.rstrip() for line in infile]
print (listName)
infile.close()
The reason it's not working is that you are opening an Excel file, which is in a special format and is not a plain text file.
You can test this by yourself by opening the file in a text editor like Notepad; and you'll see the contents aren't in text.
To open the file and read its contents in Python you will need to do one of these two things:
Open the file in Excel, then save it as a text file (or a comma separated file CSV). Keep in mind if you do this, then you can only save one sheet at a time.
Use a module like pyexcel which will allow you to read the Excel file correctly in Python.
Just opening the file as plain text (or changing its extension) doesn't convert it.
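For the first option, once the sheet has been saved from Excel as CSV, the standard csv module reads it as ordinary text. A minimal sketch; the file name and contents are made up for illustration, and the utf-8-sig encoding is an assumption that covers the byte-order mark some Excel export options add:

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every row of a CSV file as a list of strings."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.reader(fh))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for the file saved from Excel.
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w", newline="") as fh:
        fh.write("name,score\r\nAda,90\r\n")
    rows = read_rows(csv_path)

for row in rows:
    print(row)
```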