Parsing any file entered on the command line using Python

The problem is to read any XML file given by the user on the command line (the format of the file stays the same; only the content differs). The file contains a number of test cases; I need to parse it and generate another XML file as output.
Currently I am using minidom:
document = parse(sys.argv[1])
This reads only one specific file.
This is the only part I am stuck on; everything else works fine.
I need to submit it as soon as possible.

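sys.argv[0] is the script name, so sys.argv[1] is the first argument after it: if your command is python foo.py abc.xml def.xml, sys.argv[1] is 'abc.xml'. To handle every file given on the command line, loop over all of the arguments:
for f in sys.argv[1:]:
    document = parse(f)  # parse each file in turn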
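A minimal sketch of that loop with minidom, under some assumptions: the helper name count_elements and the element name "testcase" are hypothetical, since the question does not show the XML layout.

```python
import sys
from xml.dom.minidom import parse


def count_elements(paths, tag_name="testcase"):
    """Parse each XML file and return {path: number of <tag_name> elements}.

    tag_name is an assumed element name; adjust it to the real file format.
    """
    counts = {}
    for path in paths:
        document = parse(path)  # same call as in the question, once per file
        counts[path] = len(document.getElementsByTagName(tag_name))
        document.unlink()  # release the DOM tree once we are done with it
    return counts


if __name__ == "__main__":
    # sys.argv[0] is the script name; the file names start at index 1.
    for path, count in count_elements(sys.argv[1:]).items():
        print(path, count)
```

Run as python foo.py abc.xml def.xml, this would process both files rather than only the first.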

Related

Read and extract information from 3 files (python)

I am writing Python code to extract information from an XML file, using a function that returns two variables. The code works with one file:
import re
def Readfiles(XFile):
    Id=''
    des=''
    with open(XFile,"r",encoding="utf-8") as h:
        for line in h:
            wline = line.rstrip("\n")
            if re.search("^ID\s{3}",wline):
                res=re.search(r"^ID\s{3}",wline)
                Id=res.group(1)
            if re.search("^DE\s{3}",wline):
                res=re.search("^DE\s{3}",wline)
                des=res.group(1)
    return(Id,des)
(Identificator,desc)=Readfiles("rte.xml", "pre.xml", "ytl.xml")
print("Nom:",Identificator)
print("Descrip:",desc)
However, I want to read more files (three XML files in the code) at the same time, but it gives me an error.
Thanks for your help.
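The error is that Readfiles is called with three arguments but has only one parameter. Call it once per file instead:
for f in ("rte.xml", "pre.xml", "ytl.xml"):
    (Identificator,desc)=Readfiles(f)
    print("Nom:",Identificator)
    print("Descrip:",desc)
Note also that res.group(1) only works if the pattern contains a capture group, for example r"^ID\s{3}(.*)"; without one it raises an IndexError.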
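Putting both points together, here is a minimal sketch of a per-file reader. The names read_file and read_files are hypothetical, and the capture group (.*) is an assumption about what the question intended to extract (the rest of the line after the three whitespace characters).

```python
import re

# Assumed intent: capture everything after "ID" / "DE" plus three whitespace characters.
ID_PATTERN = re.compile(r"^ID\s{3}(.*)")
DE_PATTERN = re.compile(r"^DE\s{3}(.*)")


def read_file(path):
    """Return (identifier, description) from one file; the last match of each kind wins."""
    identifier = ""
    description = ""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            match = ID_PATTERN.search(line)
            if match:
                identifier = match.group(1)
            match = DE_PATTERN.search(line)
            if match:
                description = match.group(1)
    return identifier, description


def read_files(paths):
    """Call read_file once per path and collect the results in order."""
    return [read_file(path) for path in paths]
```

read_files(["rte.xml", "pre.xml", "ytl.xml"]) would then return one (identifier, description) pair per file.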

How to prevent multiple Python scripts from overwriting the same file?

I use multiple python scripts that collect data and write it into one single json data file.
It is not possible to combine the scripts.
The writes are fast and frequent, and errors often occur (e.g. some characters at the end are duplicated), which is fatal, especially since I am using the JSON format.
Is there a way to prevent a Python script from writing to a file while another script is writing to it? (It would be absolutely fine if the data that script was trying to write gets lost, but it is important that the file syntax does not get damaged.)
Code snippet:
This opens the file and retrieves the data:
data = json.loads(open("data.json").read())
This appends a new dictionary:
data.append(new_dict)
And the old file is overwritten:
open("data.json","w").write( json.dumps(data) )
Info: data is a list which contains dicts.
Operating system: the whole process takes place on a Linux server.
On Windows, you could try to create the file, and bail out if an exception occurs (because the file is locked by another script). But on Linux, your approach is bound to fail.
Instead, I would:
write one file per new dictionary, suffixing the filename with the process ID and a counter
have the consuming process(es) read not a single file but all the files, sorted by modification time, and build the data from them
So in each script:
# counter starts at 0 in each script
filename = "data_{}_{}.json".format(os.getpid(), counter)
counter += 1
with open(filename, "w") as fh:
    json.dump(new_dict, fh)
and in the consumers (reading each dict of sorted files in a protected loop):
files = sorted(glob.glob("*.json"), key=os.path.getmtime)
data = []
for f in files:
    try:
        with open(f) as fh:
            data.append(json.load(fh))
    except Exception:
        # IO error, malformed json file: ignore
        pass
I will post my own solution, since it works for me:
Before opening and writing the data file, every Python script checks whether a file called data_check exists. If it does, the script does not read or write the data file and discards the data it was going to write. If it does not, the script creates data_check and then reads and writes the data file. Once the write is done, data_check is removed.
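One caveat with checking for data_check and then creating it in two separate steps is that two scripts can both pass the check before either creates the file. A minimal sketch of a variation that makes the check and the creation a single step, using os.open with O_CREAT | O_EXCL, is shown here. The function name append_if_unlocked is hypothetical; the file names follow the question and the answer above.

```python
import json
import os

LOCK_PATH = "data_check"  # marker-file name taken from the answer above
DATA_PATH = "data.json"   # data file name taken from the question


def append_if_unlocked(new_dict, data_path=DATA_PATH, lock_path=LOCK_PATH):
    """Append new_dict to the JSON list in data_path.

    Returns False (and drops the record) if another writer holds the lock.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists,
        # so "check" and "create" happen as one step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        os.close(fd)
        if os.path.exists(data_path):
            with open(data_path) as fh:
                data = json.load(fh)
        else:
            data = []
        data.append(new_dict)
        with open(data_path, "w") as fh:
            json.dump(data, fh)
        return True
    finally:
        # Always release the lock, even if reading or writing failed.
        os.remove(lock_path)
```

As with the original approach, a script that is killed while holding the lock leaves data_check behind, and it then has to be removed by hand.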

Python saving csv data with extra line break

I'm using httplib2 to pull csv data directly from an internal website. The data is already in csv format, so I'm trying to save it directly to a file using a simple file.write.
If I run the script on Linux, this works fine. If I run the script on Windows (which is what I'll eventually be doing), it inserts an extra line between each row. Inspecting the file in Notepad++ shows a carriage return after each record, followed by a carriage return/line feed on the empty line.
edit: code
resp, content = httplib2.Http().request(request_string)
filename="data.csv"
abs_path=os.path.join(abs_path,filename)
file=open(abs_path,"w")
file.write(content)
file.close()
Fixed it. Just replaced \n with a space before closing the file.
file.read().replace('\n',' ')
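A likely explanation for the extra lines is newline translation: a file opened in text mode on Windows writes each '\n' as '\r\n', so data whose lines already end in '\r\n' comes out as '\r\r\n'. A minimal sketch of an alternative that keeps the rows intact is to write the response body in binary mode. The function name save_response_body is hypothetical, and the UTF-8 encoding applied to str input is an assumption.

```python
def save_response_body(content, path):
    """Write the downloaded body to path exactly as received."""
    if isinstance(content, str):
        # Assumption: text bodies are encoded as UTF-8 before writing.
        content = content.encode("utf-8")
    # "wb" disables newline translation, so existing "\r\n" line endings are preserved.
    with open(path, "wb") as handle:
        handle.write(content)
```

With the code from the question, this would be called as save_response_body(content, abs_path).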

How to make Python Configuration File from user input, and insert runs?

I've got a bunch of Python webdriver runs that I've converted from Selenium IDE. These runs have 1 or 2 settings that I would like to be able to change using a configuration file that would be created by running a script that collects user input that would set these 2 variables.
Here is my attempt at using the ConfigParser module:
import ConfigParser
file_path_input = raw_input("Enter path to 'webdriver' directory ex: '/home/user/': ")
print "you entered", file_path_input
url_input = raw_input("Enter url that the application will point to, ex: 172.31.13.56 or vtm55.example.com: ")
print "you entered", url_input
def createConfig(file_path_input):
    """
    Create a config file
    """
    config = ConfigParser.ConfigParser()
    config.add_section("application_settings")
    config.set("application_settings", "file_path_to_use", "file_path_input")
    config.set("application_settings", "url_to_use", "url_input")
    config.set("application_settings", "settings_info",
               "Your application directory is in %(file_path_to_use)s and your application url is %(url_to_use)s")
    with open(file_path_input, "wb") as config_file:
        config.write(config_file)
The raw_input() and print() lines work, but the configuration file doesn't appear to be generated at all. Once the file has been created, I'd like to be able to insert the variables file_path_to_use and url_to_use in my various Python webdriver runs.
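Your indentation is problematic, particularly in the createConfig function.
"file_path_input" (a string literal) is different from file_path_input (the variable); the same goes for "url_input".
Use 'w' (a normal text write) instead of 'wb' (write bytes).
In the code shown, createConfig is defined but never called, so no file is written.
If you write the file yourself instead of going through ConfigParser, call write on config_file and pass it the text you want to write (config.write(config_file) is only correct when config is a ConfigParser object). If you have a list of strings, simply loop through them and write each one with basic file I/O: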
configs = ['file_path_to_use:' + file_path_input,
           'url_to_use:' + url_input]
with open(file_path_input, 'w') as config_file:
    for line in configs:
        config_file.write(line + '\n')
Sample result:
file_path_to_use:/home/user/
url_to_use:vtm.example.com
I didn't include the "settings_info" portion because it's just a recap.
You can then split or partition on ':' when reading the file back in.
I would recommend writing your output as plain text rather than using ConfigParser. I've had odd issues using it to create config files in the past (though better luck using it to read them). If you write a text file with standard file I/O, you have more control over how it's written, while still producing a format that ConfigParser can read later.
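A minimal sketch of the write-then-read round trip described above, assuming one key:value pair per line. The function names write_settings and read_settings are hypothetical. Using partition rather than split means only the first ':' separates key from value, so a value such as a URL with a port survives intact.

```python
def write_settings(path, settings):
    """Write each setting as one 'key:value' line."""
    with open(path, "w") as config_file:
        for key, value in settings.items():
            config_file.write("{}:{}\n".format(key, value))


def read_settings(path):
    """Read 'key:value' lines back into a dict of strings."""
    settings = {}
    with open(path) as config_file:
        for line in config_file:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # partition splits on the first ':' only
            key, _, value = line.partition(":")
            settings[key] = value
    return settings
```

A webdriver run could then call read_settings once at start-up and look up "file_path_to_use" and "url_to_use" in the returned dict.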
You could drop ConfigParser and instead create a .py file that stores the variables in KEY=VALUE format, written the same way you are creating the config file.
Then, in your code, write your own function that opens the .py file (using the with syntax, so that the file is closed automatically without calling .close()) and reads it into a string.
You can then use string manipulation to extract the value you want and assign it to a new variable (remember to convert it to the data type you need, e.g. int() for an integer). Use type() to check the type of a variable if you aren't sure.
In Python 3 the module name is lowercase, so use import configparser and configparser.ConfigParser().
Please update.
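For comparison, a minimal Python 3 sketch that stays with configparser. The function names create_config and read_config and the separate config_path parameter are hypothetical; the section and option names are taken from the question.

```python
import configparser


def create_config(config_path, file_path_input, url_input):
    """Write the two settings to config_path in INI format."""
    config = configparser.ConfigParser()
    config.add_section("application_settings")
    # Pass the variables themselves, not their names as string literals.
    config.set("application_settings", "file_path_to_use", file_path_input)
    config.set("application_settings", "url_to_use", url_input)
    # In Python 3, ConfigParser.write expects a file opened in text mode.
    with open(config_path, "w") as config_file:
        config.write(config_file)


def read_config(config_path):
    """Return (file_path_to_use, url_to_use) read back from config_path."""
    config = configparser.ConfigParser()
    config.read(config_path)
    section = config["application_settings"]
    return section["file_path_to_use"], section["url_to_use"]
```

The function has to be called, for example create_config("settings.ini", file_path_input, url_input), for the file to appear; "settings.ini" is only an example name.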

Reading command Line Args

I am running a script in python like this from the prompt:
python gp.py /home/cdn/test.in..........
Inside the script I need to take the path of the input file test.in, then read and print the file's content. The script below works, but the file path is hard-coded. Now I want to pass the path as a command-line argument.
Working Script
#!/usr/bin/python
import sys
inputfile='home/cdn/test.in'
f = open (inputfile,"r")
data = f.read()
print data
f.close()
Script Not Working
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
data = argv[1:].read()
print data
f.close()
What change do I need to make?
While Brandon's answer is a useful solution, the reason your code is not working also deserves explanation.
In short, a list of strings is not a file object. In your first script, you open a file and operate on the resulting file object. But writing ['foo','bar'].read() does not make sense: lists aren't read()able, and neither are strings, so 'foo'.read() fails as well. It would be like writing inputfile.read() in your first script.
To make things explicit, here is an example of getting all of the content from all of the files specified on the commandline. This does not use fileinput, so you can see exactly what actually happens.
import sys
# iterate over the filenames passed on the commandline
for filename in sys.argv[1:]:
    # open the file, assigning the file-object to the variable 'f'
    with open(filename, 'r') as f:
        # print the content of this file.
        print f.read()
# Done.
Check out the fileinput module: it interprets command line arguments as filenames and hands you the resulting data in a single step!
http://docs.python.org/2/library/fileinput.html
For example:
import fileinput
for line in fileinput.input():
    print line
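A minimal Python 3 sketch of the same idea, wrapped in a function so the file list is explicit. The name read_lines is hypothetical. Note that fileinput falls back to standard input when the file list is empty.

```python
import fileinput


def read_lines(paths):
    """Return every line from the given files, in order, without trailing newlines."""
    # fileinput.input chains the files together into one stream of lines.
    with fileinput.input(files=paths) as stream:
        return [line.rstrip("\n") for line in stream]
```

In a script, read_lines(sys.argv[1:]) would collect the lines of every file named on the command line.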
In the script that isn't working for you, you are simply not opening the file before reading it. So change it to
#!/usr/bin/python
import sys
print "\n".join(sys.argv[1:])
f = open(sys.argv[1], "r")
data = f.read()
print data
f.close()
Also, the f.close() call in your script would raise an error because f was never defined. The changes above take care of that.
BTW, some coding standards recommend variable names at least three characters long.
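For reference, a minimal Python 3 sketch of the corrected script, with a check for a missing argument. The function names read_file and main are hypothetical.

```python
import sys


def read_file(path):
    """Return the full content of the file at path."""
    with open(path, "r") as handle:  # the with block closes the file automatically
        return handle.read()


def main(argv):
    # argv[0] is the script name; argv[1] is the first command-line argument.
    if len(argv) < 2:
        print("usage: {} FILE".format(argv[0]), file=sys.stderr)
        return
    print(read_file(argv[1]), end="")


if __name__ == "__main__":
    main(sys.argv)
```

Invoked as python gp.py followed by the path to the input file, this prints the file's content; with no argument it prints a usage line instead of failing with an IndexError.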
