Delineating a Read File - python

I'm not really sure how to word this question, so if it's unclear I can try again.
I have a file called example.txt that I'd like to read into my Python program, where I will do some calculations with what it contains (and other things that are irrelevant here).
Instead of me reading this file, going through it line by line and extracting the information I want, can Python do it for me? That is, if I structure the .txt correctly (say, key/value pairs separated by an equals sign on each line), is there an existing Python 'way' that handles all of that so I can just work with the result?

with open("example.txt") as f:
    for line in f:
        key, value = line.strip().split("=")
        do_something(key, value)
looks like a starting point if I understand you correctly. You need Python 2.6 or 3.x for this.
Another place to look is the csv module that can parse comma-separated value files - and you can tell it to use = as a separator instead. This will abstract away some of the "manual work" in that previous example - but it seems your example doesn't especially need that kind of abstraction.
Another idea:
with open("example.txt") as f:
    d = dict([line.strip().split("=") for line in f])
Now that's concise and pythonic :)

for line in open("file"):
    key, value = line.strip().split("=")
    key = key.strip()
    value = value.strip()
    do_something(key, value)

There's also another method: you can make the file valid Python (a list, a dict definition or whatever else) and read its content using
f = open('file.txt', 'r')
content = f.read()  # assuming the file isn't too long
And then just parse it:
parsedContent = eval(content)
You can pass a custom environment to eval (see the docs), so that it does not have access to your globals and locals. This is evil and wrong, but in a small program that won't be distributed and won't get 'file.txt' from the network or from a so-called malicious user, you can use it.
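If the file only holds Python literals (a dict, a list, numbers, strings), the standard library's ast.literal_eval is a safer stand-in for eval, because it refuses to evaluate anything that is not a literal. A minimal sketch; the `content` string below is an assumption standing in for what f.read() would return:

```python
import ast

def parse_literal(content):
    """Parse a string holding a Python literal without executing any code."""
    return ast.literal_eval(content)

# Hypothetical file content, used here instead of reading file.txt.
content = "{'name': 'example', 'values': [1, 2, 3]}"
parsed = parse_literal(content)
print(parsed["values"])
```

Anything beyond a literal, such as a function call, raises ValueError instead of running.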

Related

Python pointers

I was asked to write a Python program that finds the string "error" in a file and prints the matching lines.
First I open the file in read mode.
I use fh.readlines() and store the result in a variable.
After this, I use a for loop to iterate line by line, check for the string "error" and print the lines where it is found.
I was asked to use pointers in Python, since assigning the file content to a variable consumes time when the log file contains huge output.
I did some research on Python pointers, but found nothing useful.
Could anyone help me write the above code using pointers instead of storing the whole content in a variable?
There are no pointers in Python. Something like a pointer can be implemented, but it is not worth the effort in your case.
As pointed out in the answers to this question,
Read large text files in Python, line by line without loading it in to memory
you can use something like:
with open("log.txt") as infile:
    for line in infile:
        if "error" in line:
            print(line.strip())
The context manager closes the file automatically, and the loop only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else.
You can use a dictionary of key-value pairs. Just dump the log file into a dictionary in which the keys are the words and the values are the line numbers they occur on. Then, if you search for the string "error", you get the line numbers it is present in and can print those lines accordingly. Since a lookup in a dictionary (hash table) takes constant time, O(1), on average, the search itself takes less time. Storing the data might take time, though, depending on how well collisions are avoided.
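The idea can be sketched roughly as follows. The helper name build_index and the sample lines are made up for illustration. Note that this matches whole whitespace-separated words only, and that building the index still means reading every line once:

```python
from collections import defaultdict

def build_index(lines):
    """Map each whitespace-separated word to the line numbers containing it."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # set() so a word repeated on one line is recorded only once
        for word in set(line.split()):
            index[word].append(number)
    return index

# Hypothetical log lines; a real program would pass an open file instead.
sample = ["starting up", "error opening file", "all good", "another error here"]
index = build_index(sample)
error_lines = index.get("error", [])
print(error_lines)
```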
I used the code below instead of putting the data in a variable and then looping over it.
for line in open('c182573.log','r').readlines():
    if ('Executing' in line):
        print line
So there is no way to implement pointers or references in Python.
Thanks, all.
There are no pointers in Python.
Something like a pointer can be implemented, but for your case it's not required.
Try the code below:
with open('test.txt') as f:
    content = f.readlines()
for i in content:
    if "error" in i:
        print(i.strip())
If you still want to understand Python variables as pointers, see this link:
http://scottlobdell.me/2013/08/understanding-python-variables-as-pointers/

Python - Handling an nth line hop with readlines()

I'm having a go at fixing a broken lib on GitHub that I want to use.
I have locally "fixed" the problem, but I don't think it's a very clean method...
I'm poking at the WARC library by the Internet Archive, and specifically the arc.py part (https://github.com/internetarchive/warc/blob/master/warc/arc.py).
Since the lib was written, the tools that make the ARC files have changed a bit, and as a result the built-in parser fails, as it's not expecting to see some metadata in the file.
My local fix looks like this:
if header.startswith("<arcmetadata"):
    while not header.endswith("</arcmetadata>\n"):
        header = self.fileobj.readline()
    header = self.fileobj.readline()
    header = self.fileobj.readline()
And I'm not sure that calling readline() twice to strip off the next two empty lines (containing "\n") is the cleanest way of advancing through the file object.
Is this good Python, or is there a better way?
The code looks like a copy/paste error. There is nothing wrong with using .readline(); just document what you are doing:
# skip metadata
if header.startswith("<arcmetadata"):
    while not header.endswith("</arcmetadata>\n"):
        header = self.fileobj.readline()
    #NOTE: header ends with `"</arc..."` here i.e., it is not blank
# skip blank lines
while not header.strip():
    header = self.fileobj.readline()
By the way, if the file contains XML, then use an XML parser to parse it. Don't do it by hand.
Although there's nothing inherently wrong with what you're doing, it might be more semantic to write:
next(self.fileobj, None)
without a variable assignment to signify that you are tossing the next line.
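A small sketch of that idea, using io.StringIO as a stand-in for the real file object (the helper name skip_lines and the sample text are assumptions for illustration):

```python
import io

def skip_lines(fileobj, count):
    """Advance fileobj by up to `count` lines, discarding them."""
    for _ in range(count):
        next(fileobj, None)  # the None default avoids StopIteration at EOF

# Hypothetical content: closing metadata tag, two blank lines, then a record.
fileobj = io.StringIO("</arcmetadata>\n\n\nnext record\n")
fileobj.readline()       # consume the closing metadata line
skip_lines(fileobj, 2)   # toss the two blank lines
remaining = fileobj.readline()
print(remaining)
```

One caveat: on Python 2, built-in file objects use a read-ahead buffer for next(), so mixing next() with readline() on the same file can skip data there. The sketch above assumes Python 3.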
itertools may be of use here:
from itertools import islice, dropwhile
if header.startswith("<arcmetadata"):
    fileobj = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), fileobj)
    fileobj = islice(fileobj, 2, None)

asking a person for a file to save in

What I'm trying to do is ask the user for the name of a file to create and then save some data in that file.
My portion of the program looks like this:
if saving == 1:
    ask=raw_input("Type the name file: ")
    fileout=open(ask.csv,"w")
    fileout.write(output)
I want the format to be .csv. I tried different options but can't seem to get it to work.
The issue here is you need to pass open() a string. ask is a variable that contains a string, but we also want to append the other string ".csv" to it to make it a filename. In python + is the concatenation operator for strings, so ask+".csv" means the contents of ask, followed by .csv. What you currently have is looking for the csv attribute of the ask variable, which will throw an error.
with open(ask+".csv", "w") as file:
    file.write(output)
You might also want to do a check first if the user has already typed the extension:
ask = ask if ask.endswith(".csv") else ask+".csv"
with open(ask, "w") as file:
    file.write(output)
Note my use of the with statement when opening files. It's good practice as it's more readable and ensures the file is closed properly, even on exceptions.
I am also using the python ternary operator here to do a simple variable assignment based on a condition (setting ask to itself if it already ends in ".csv", otherwise concatenating it).
Also, this is presuming your output is already suitable for a CSV file, the extension alone won't make it CSV. When dealing with CSV data in general, you probably want to check out the csv module.
You need to use ask+'.csv' to concatenate the required extension on to the end of the user input.
However, simply naming the file with a .csv extension is not enough to make it a comma-separated file. You need to format the output. Use csv.writer to do that. The Python documentation has some simple examples of how to do this.
I advise you not to attempt to generate the formatted comma-separated output yourself. That's a surprisingly hard task and utterly pointless in the presence of the built-in functionality.
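To illustrate why the csv module is worth using, here is a small sketch in which one field contains both commas and quotes. It assumes Python 3, and it writes to an io.StringIO buffer as a stand-in for the file that open(ask + ".csv", "w", newline="") would return; the rows themselves are made up:

```python
import csv
import io

# Hypothetical rows; the second one needs quoting and escaping.
rows = [["name", "comment"],
        ["alice", 'said "hi", then left'],
        ["bob", "plain"]]

buffer = io.StringIO()        # stand-in for the opened .csv file
writer = csv.writer(buffer)
writer.writerows(rows)        # quoting and escaping are handled for us
output = buffer.getvalue()
print(output)
```

Reading the text back with csv.reader returns the original rows, which hand-rolled ",".join(...) output would not guarantee.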
Your variable ask is gonna be of type string after the raw_input.
So, if you want to append the extension .csv to it, you should do:
fileout = open(ask + ".csv", "w")
That should work.

How to copy a JSON file in another JSON file, with Python

I want to copy the contents of a JSON file in another JSON file, with Python
Any ideas ?
Thank you :)
Given the lack of research effort, I normally wouldn't answer, but given the poor suggestions in comments, I'll bite and give a better option.
Now, this largely depends on what you mean: do you wish to overwrite the contents of one file with another, or insert? The latter can be done like so:
import json

# `from` is a reserved word in Python, so the file handles are named src and dst.
with open("from.json", "r") as src, open("to.json", "r") as dst:
    to_insert = json.load(src)
    destination = json.load(dst)
    destination.append(to_insert)  # The exact nature of this line varies. See below.
with open("to.json", "w") as dst:
    json.dump(destination, dst)  # json.dump takes the object first, then the file
This uses python's json module, which allows us to do this very easily.
We open the two files for reading, then open the destination file again in writing mode to truncate it and write to it.
The marked line depends on the JSON data structure. Here I am appending to the root list element (which might not exist), but you may want to place the data at a particular dict key, or some such.
In the case of replacing the contents, it becomes easier:
with open("from.json", "r") as src, open("to.json", "w") as dst:
    dst.write(src.read())
Here we literally just read the data out of one file and write it into the other file.
Of course, you may wish to check the data is JSON, in which case, you can use the JSON methods as in the first solution, which will throw exceptions on invalid data.
Another, arguably better, solution to this could also be shutil's copy methods, which would avoid actually reading or writing the file contents manually.
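A sketch of the shutil route. The temporary directory and the sample data are only there to keep the example self-contained; in practice you would pass your own two paths to shutil.copyfile:

```python
import json
import os
import shutil
import tempfile

# Set up a throwaway source file (illustrative data only).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "from.json")
dst_path = os.path.join(workdir, "to.json")
with open(src_path, "w") as src:
    json.dump({"answer": 42}, src)

# Copy the file contents; an existing destination file is overwritten.
shutil.copyfile(src_path, dst_path)

with open(dst_path) as dst:
    copied = json.load(dst)
print(copied)
```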
Using the with statement gives us the benefit of automatically closing our files - even if exceptions occur. It's best to always use them where we can.
Note that in versions of Python before 2.7, multiple context managers are not handled by the with statement, instead you will need to nest them:
with open("from.json", "r") as src:
    with open("to.json", "r+") as dst:
        ...

what would be a quick way to read a property file in python?

I have a file with the format
VarName=Value
.
.
I want to read it into a hash such that H("VarName") will return the value.
What would be a quick way? (Read a set of strings, split each of them at the equals sign, and then put the pieces into a hash?)
I am working with python.
The oneliner answer:
H = dict(line.strip().split('=') for line in open('filename.txt'))
(optionally use .split() with maxsplit=1 if the values could also contain the "=" character)
Maybe ConfigParser can help you.
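One wrinkle: configparser normally expects a [section] header, which this file format lacks. A common workaround, sketched here under the assumption of Python 3 (where the module is named configparser and has read_string), is to prepend a made-up header before parsing. The function name read_sectionless and the section name "root" are arbitrary:

```python
import configparser

def read_sectionless(text, section="root"):
    """Parse KEY=VALUE text that has no [section] header into a dict."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep key case; the default lower-cases keys
    # Prepend a made-up section header so configparser accepts the text.
    parser.read_string("[{}]\n{}".format(section, text))
    return dict(parser.items(section))

# Hypothetical file content.
text = "VarName=Value\nOther = 2\n# a comment\n"
H = read_sectionless(text)
print(H["VarName"])
```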
d = {}
with open('filename') as f:
    for line in f:
        # strip the trailing newline so it does not end up in the value
        key, value = line.rstrip('\n').split('=')
        d[key] = value
Edit:
As suggested by foret, you could change it to
for line in f:
    tokens = line.rstrip('\n').split('=')
    d[tokens[0]] = '='.join(tokens[1:])
which would handle the case where equals signs were allowed in the value, but would still fail if the name could have equals signs as well -- for that you would need a true parser.
@Steven's answer doesn't account for comments and blank lines in the properties file; this one does:
H = dict(line.strip().split('=') for line in open('file.properties') if not line.startswith('#') and not line.startswith('\n'))
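For readability, the same filtering can be spelled out as a small function. This is only a sketch: the name parse_properties is made up, lines without an equals sign are silently skipped (a design assumption), and only the first "=" on a line separates key from value:

```python
def parse_properties(lines):
    """Build a dict from KEY=VALUE lines, skipping blanks and # comments."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first '=' only, so values may contain '='
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line; ignored by assumption
        result[key.strip()] = value.strip()
    return result

# Hypothetical lines; an open file object can be passed in the same way.
sample = ["# comment\n", "\n", "VarName=Value\n", "url = http://x/?a=b\n", "garbage\n"]
H = parse_properties(sample)
print(H)
```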
Or ConfigObj
The csv module will let you do this easily enough:
import csv
H = dict([(row[0], row[1]) for row in csv.reader(open('the_file', 'r'), delimiter='=' )])
This may be a stupid answer, but who knows, maybe it can help you :)
Change the extension of your file to .py and make the necessary changes, like this:
file.py
VarName="Value" # if it's a string
VarName_2=1
# and you can also assign a dict or a list to a variable, how cool is that?
Put it in your package tree or on sys.path, and you can then use it in your script like this:
>>> import file
>>> file.VarName
'Value'
The reason I'm writing this answer is: what the hell is this file? I have never seen a conf file like this, with no sections, no nothing. Why would you want to create a config file like this? It looks like a bad config file; it should look like the Django settings, and I prefer using a Django-settings-like config file whenever I can.
Now you can put your -1 on the left :)
For Python 2 there is jproperties: https://pypi.python.org/pypi/jproperties/1.0.1
For Python 2/3 there is javaproperties: http://javaproperties.readthedocs.io/en/v0.1.0/
It is as simple as:
import os, javaproperties
with open(file, 'rb') as f:
    properties_dict = javaproperties.load(f)
OK nobody else in the answers has mentioned it, so I guess I'm going to. If you're writing Python, and have control over your interpreter, maybe you can force the use of the Jython interpreter.
Jython is a Python interpreter implemented entirely in Java. You have all the Python standard libraries at your fingertips, with the additional advantage of all your Java SE libraries available.
I haven't actually executed any of the following (think of it more like pseudo-code without exception handling), but you can mix and match Python and Java libraries, and your code might end up looking something like:
from java.util import Properties
from java.io import File, FileInputStream
import os
javasPropertyObject = Properties()
pathToPropFile = os.path.join('path', 'to', 'property', 'file.properties')
if os.path.isfile(pathToPropFile):
    #this is java.io.File, not Python's file descriptor
    propFile = File(pathToPropFile )
    javasFileInputStreamObject = FileInputStream(propFile)
    javasPropertyObject.load(javasFileInputStreamObject)
    #now we can pull out Java properties as defined by the .property file grammar
    myProp = javasPropertyObject.getProperty('myPropName')
A file like the following is valid for this approach, but would not be handled by the simple split-on-'=' solutions:
myPropName1:value
myPropName2=value
myPropName3=\
value
#this is a = comment
myPropName4:my \
value
myPropNameWithUnicode=\u0009
The downside is that you lose portability among the various Python interpreters and are now locked into Jython. You would be locked into a library if you attempted that approach as well. The reason I like Jython is the added flexibility of having all of the Java SE libraries available.
If you need to read all values from a section in a properties file in a simple manner:
Your config.properties file layout:
[SECTION_NAME]
key1 = value1
key2 = value2
Your code:
import configparser
config = configparser.RawConfigParser()
config.read('path_to_config.properties file')
details_dict = dict(config.items('SECTION_NAME'))
This will give you a dictionary whose keys are the same as in the config file, with their corresponding values.
details_dict becomes
{'key1':'value1', 'key2':'value2'}
Now, to get key1's value:
value_1 = details_dict['key1']
Putting it all in a function that reads that section from the config file only once (the first time the function is called during a program run):
def get_config_dict():
    if not hasattr(get_config_dict, 'config_dict'):
        get_config_dict.config_dict = dict(config.items('SECTION_NAME'))
    return get_config_dict.config_dict
Now call the above function and get the required key's value:
config_details = get_config_dict()
key_1_value = config_details['key1']
