I am creating a simple file in python to reorganize some text data I grabbed from a website. I put the data in a .txt file and then want to use the "tail" command to get rid of the first 5 lines. I'm able to make this work for a simple filename shown below, but when I try to change the filename (to what I'd actually like it to be) I get an error. My code:
start = 2010
end = 2010
for i in range(start,end+1)
year = str(i)
...write data to a file called file...
teamname=open(file).readline() # want to use this in the new filename
teamfname=teamname.replace(" ","") #getting rid of spaces
file2 = "gotdata2_"+year+".txt"
os.system("tail -n +5 gotdata_"+year+".txt > "+file2)
The above code works as intended, creating file, then creating file2 that excludes the first 5 lines of file. However, when I change the name of file2 to be:
file2 = teamfname+"_"+year+".txt"
I get the error:
sh: line 1: _2010.txt: command not found
It's as if the end of my file2 statement is getting chopped off and the .txt part isn't being recognized. In this case, my code outputs a file but is missing the _2010.txt at the end. I've double checked that both year and teamfname are strings. I've also tried it with and without spaces in the teamfname string. I get the same error when I try to include a os.system mv statement that would rename the file to what I want it to be, so there must be something wrong with my understanding of how to specify the string here.
Does anyone have any ideas about what causes this? I haven't been able to find a solution, but I've found this problem difficult to search for.
Without knowing what your actual strings are, it's impossible to be sure what the problem is. However, it's almost certainly something to do with failing to properly quote and/or escape arguments for the command line.
My first guess would be that you have a newline in the middle of your filename, and the shell is truncating the command at the newline. But I wouldn't bet too heavily on that. If you actually printed out the repr of the pathname, I could tell you for sure. But why go through all this headache?
The solution to almost any problem with os.system is to not use os.system.
If you look at the docs, they even tell you this:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.
If you use subprocess instead of os.system, you can avoid the shell entirely. You can also pass arguments as a list instead of trying to figure out how to quote them and escape them properly. Which would completely avoid the exact problem you're having.
For example, if you do this:
file2 = "gotdata2_"+year+".txt"
with open(file2, 'wb') as f:
subprocess.check_call(['tail', '-n', '+5', "gotdata_"+year+".txt"], stdout=f)
Then, if you change that first line to this:
file2 = teamfname+"_"+year+".txt"
It will still work even if teamfname has a space or a quote or another special character in it.
That being said, I'm not sure why you want to use tail in the first place. You can skip the first 5 lines just as easily directly in Python.
Related
I am trying to read from a file which has contents like this:
#\5\5\5
...
#\5\5\10
This file content is then fed into subprocess module of python like this:
for lines in file.readlines():
print(lines)
cmd = ls
p = subprocess.run([cmd, lines])
The output turns into something like this:
CompletedProcess(args=['ls', "'#5\\5\\5'\n"], returncode=1)
I don't understand why the contents of the file is appended with a double quote and another backward slash is getting appended.
The real problem here isn't Python or the subprocess module. The problem the use of subprocess to invoke shell commands, and then trying to parse the results. In this case, it looks like the command is ls, and the plan appears to be to read some filesystem paths from a text file (each path on a separate line), and list the files at that location on the filesystem.
Using subprocess to invoke ls is really, really, really not the way to accomplish that in Python. This is basically an attempt to use Python like a shell script (this use of ls would still be problematic, but that's a different discussion).
If a shell script is the right tool for the job, then write a shell script. If you want to use Python, then use one of the API's that it provides for interacting with the OS and the filesystem. There is no need to bring in external programs to achieve this.
import os
with open("list_of_paths.txt", "r") as fin:
for line in fin.readlines():
w = os.listdir(line.strip())
print(w)
Note the use of .strip(), this is a string method that will remove invisible characters like spaces and newlines from the ends of the input.
The listdir method provided by the os module will return a list of the files in a directory. Other options are os.scandir, os.walk, and the pathlib module.
But please do not use subprocess. 95% of the time, when someone thinks "should I use Python's subprocess module for this?" the ansewr is "NO".
It is because \ with a relevant character or digit becomes something else other than the string. For example, \n is not just \ and n but it means next line. If you really want a \n, then you would add another backslash to it (\\n). Likewise \5 means something else. here is what I found when i ran \5:
and hence the \\ being added, if I am not wrong
This question already has an answer here:
Using subprocess.Popen (shell=True) with windows folders
(1 answer)
Closed 2 years ago.
The following code can open the test.txt file with notepad
subprocess.Popen(['start','test.txt'],shell=True)
But this won't open a file with a space in its name like 'the test.txt'.
I've tried using raw string(r'...'), double quote on cmd, '%20' to substitute the space, but none of that works.
Passing a list of arguments with shell=True happens to generally work on Windows, but is really never correct. Of course, start is a shell (cmd) command, so you need a shell here; but then, you also need to quote the string for the shell.
subproces.run('start "the test.txt"', shell=True)
Notice also that I switched to subprocess.run(); you generally want to avoid raw Popen whenever you can, as also suggested in the documentation.
It's much easier to rename the file with an underscore or dash etc.
I had a similar issue once with a client. I'm fairly certain we concluded by renaming by hand all of the necessary files to replace a space with an underscore.
Unless python have released a patch (or indeed, I was oblivious), I think it may be impossible.
Python reads a space and goes "okay, thats that command, what's next?" and then reads whatever follows the space, excluding certain things like string variables.
Take these lines
import os
os.system("start EXCEL.EXE my workbook.xlsb")
Python of course imports the os module here, then in line two it runs the os.system command with the parameters; start, EXCEL.EXE, my, workbook.xlsb
in fact it doesn't, once it reaches the filename "my" without an extension it quits and throws an error in the shell.
I'm working inside a system that has Jython2.5 but I need to be able to call some of Google's apis so I wrote an offline script that I wanted to call from my Jython environment and return to me small pieces of data. Like a JobID or a sheet URL or something from Google.
I've tried a number of things but I always get an error back from Windows, saying that it cannot find the file specified.
Path is done in two ways.
The first way using a string
stringPath = r"C:\GooglePipes\Scripts\filetobq.py C:\GooglePipes\Keys\DEV-BigQueryKey.json nofile C:\GooglePipes\BQ_Downtime\TESTFILE.CSV dataset1 table1"
And the second way, as a sequence (per the docs, using shell=false supply a sequence)
seqPath = [r"C:\GooglePipes\Scripts\filetobq.py",r"C:\GooglePipes\Keys\DEV-BigQueryKey.json","nofile",r"C:\GooglePipes\BQ_Downtime\TESTFILE.CSV","dataset1","table1"]
Called with
data, err = Popen(seqPath, shell=True, stderr=PIPE, stdout=PIPE).communicate()
#Read values back in
print data
print err
Replacing seqPath with stringPath to try it either way.
I've been at this all weekend, every time I run it I get from Windows
The system cannot find the path specified.
from the err print. I've been unable to debug much further than this. I'm not really sure what's happening. When I paste the stringPath variable directly into my computer's command window it executes.
I've also called subprocess.list2cmdline(seqPath) to see what it's outputting. It's giving me a ? in front of the string, but I haven't been able to figure out what that means. I can paste the rest of the string, starting after the question mark into the command window and it executes.
?C:\GooglePipes\Scripts\filetobq.py C:\GooglePipes...
I've tried a number of different combinations of true and false on shell, passing different args into Popen, double slashes, and I have no less than 30 tabs open from stack overflow and other help forums. I just have no idea what to do at this point and any help is appreciated.
Edit
The ? at the start of the sting is actually a NULL character when I did some additional logging. This seems to be the root of my problem. I can't figure out why it shows up, but it was present in my copy pastes. I started manually typing, and I got it working. When I feed the path with my Jython program it is present again.
Ultimately the error was the ?/NULL character.
I went back to the source value where the program was grabbing the path and it was present there. After I hand-re keyed it in, everything started working.
If you copy and paste what I put in the question, you can see the NULL character in the string if you run it through a string->ASCII converter.
>C:
>NULL 67 58
What a bunch of bullsh***.
I'm having a strange problem where some Python code that prints to a file is not inserting an EOF character. Basically, the Python script generates runscripts to later be submitted as jobs on a cluster. I essentially wrote the entire runscript between """'s, allowing for variables to be plugged in (to vary some parameters in my simulation). I write the runscripts using the
with open(file_name, 'w') as runscrpt:
runscrpt.write("""ENTIRE_FILE_CONTENTS_HERE""")
syntax. I can give the actual code if necessary but it's not much more than above. Despite the script running fine and generating all of my runsripts, whenever I submitted them nothing happened. It took me a long time to figure out why, but it's because they're missing an EOF character. I can fix it by, for example, opening one, adding some trailing whitespace or a blank line somewhere in vim, and resaving the file.
Why isn't Python inserting the EOF character, and is there a better way to fix this than manually making trivial edits to all the files with vim?
Sounds like you mean there is no EOL (not EOF!) at the end, because that's what diff will typically tell you. Just add a newline at the end of the write (make sure there is a newline before the final """ terminator, or write a separate newline explicitly).
with open(file_name, 'w') as runscript:
runscript.write("""ENTIRE_FILE_CONTENTS_HERE\n""")
(As a bonus, I added the missing vowel.)
I have a small problem with reading in my file. My code:
import csv as csv
import numpy
with open("train_data.csv","rb") as training:
csv_file_object = csv.reader(training)
header = csv_file_object.next()
data = []
for row in csv_file_object:
data.append(row)
data = numpy.array(data)
I get the error no such file "train_data.csv", so I know the problem lies with the location. But whenever I specify the pad like this: open("C:\Desktop...etc) it doesn't work either. What am I doing wrong?
If you give the full file path, your script should work. Since it is not, it must be that you have escape characters in your path. To fix this, use a raw-string to specify the file path:
# Put an 'r' at the start of the string to make it a raw-string.
with open(r"C:\path\to\file\train_data.csv","rb") as training:
Raw strings do not process escape characters.
Also, just a technical fact, not giving the full file path causes Python to look for the file in the directory that the script is launched from. If it is not there, an error is thrown.
When you use open() and Windows you need to deal with the backslashes properly.
Option 1.) Use the raw string, this will be the string prefixed with an r.
open(r'C:\Users\Me\Desktop\train_data.csv')
Option 2.) Escape the backslashes
open('C:\\Users\\Me\\Desktop\\train_data.csv')
Option 3.) Use forward slashes
open('C:/Users/Me/Desktop/train_data.csv')
As for finding the file you are using, if you just do open('train_data.csv') it is looking in the directory you are running the python script from. So, if you are running it from C:\Users\Me\Desktop\, your train_data.csv needs to be on the desktop as well.