Python pointers

I was asked to write a Python program that finds the string "error" in a file and prints the matching lines.
My current approach: first open the file in read mode,
then use fh.readlines() and store the result in a variable.
After this, I iterate line by line with a for loop, check for the string "error", and print those lines if found.
I was asked to use pointers in Python, since assigning the whole file content to a variable takes time when the log file is huge.
I did some research on Python pointers but found nothing useful.
Could anyone help me write the above code using pointers instead of storing the whole content in a variable?

There are no pointers in Python. Something pointer-like can be simulated, but it is not worth the effort for your case.
As pointed out in the answers to this question:
Read large text files in Python, line by line without loading it into memory
You can use something like:
with open("log.txt") as infile:
for line in infile:
if "error" in line:
print(line.strip()) .
The context manager will close the file automatically, and only one line is read at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else.
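If you want to reuse that pattern elsewhere, one option (just a sketch; the file name and search string are the ones from the question) is to wrap the loop in a generator so matching lines are produced lazily:
def grep_lines(path, needle):
    # Yield matching lines one at a time; the whole file is never held in memory.
    with open(path) as infile:
        for line in infile:
            if needle in line:
                yield line.strip()

for match in grep_lines("log.txt", "error"):
    print(match)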

You could also use a dictionary as an index of key-value pairs: dump the log file into a dictionary where the keys are words and the values are the line numbers where they occur. If you then look up the string "error", you get the line numbers it appears on and can print those lines accordingly. Since lookup in a dictionary (hash table) is constant time, O(1), the search itself is fast; building the index still takes time, though, and the cost depends on how collisions are handled.
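A minimal sketch of that indexing idea (the file name is a placeholder, and note it still reads the whole file once to build the index):
from collections import defaultdict

index = defaultdict(list)                 # word -> line numbers where it appears
with open("log.txt") as infile:
    for lineno, line in enumerate(infile, 1):
        for word in line.split():
            index[word].append(lineno)

print(index.get("error", []))             # line numbers containing the word "error"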

I used the code below instead of putting the data in a variable and then looping over it:
for line in open('c182573.log', 'r').readlines():
    if 'Executing' in line:
        print(line)
So there is no way to implement pointers or references in Python.
Thanks all
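For completeness, the same loop also works without readlines(), so the whole log is never held in memory; a small sketch that additionally prints the line number of each match:
with open('c182573.log') as infile:
    for lineno, line in enumerate(infile, 1):
        if 'Executing' in line:
            print(lineno, line.strip())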

There are no pointers in Python.
Something pointer-like can be simulated, but for your case it's not required.
Try the code below:
with open('test.txt') as f:
    content = f.readlines()

for i in content:
    if "error" in i:
        print(i.strip())
If you want to understand Python variables as pointers, have a look at this link:
http://scottlobdell.me/2013/08/understanding-python-variables-as-pointers/

Related

Python ValueError

So I keep receiving this error:
ValueError: Mixing iteration and read methods would lose data
And 1) I don't quite understand why I'm receiving it, and 2) people with similar problems seem to be doing things with their code that are much more complex than a beginner (like myself) can follow.
The idea of my code is to read a data_file.txt and convert each line into its own individual array.
so far I have this:
array = []  # declaring a list with name 'array'
with open('file.txt', 'r') as input_file:
    for line in input_file:
        line = input_file.readlines()
        array.append(line)
        print('done 1')  # for test purposes
return array
And I keep receiving an error.
"Value error: Mixing iteration and read methods would lose data "message while extracting numbers from a string from a .txt file using python
The above question seemed to be doing something similar, reading items into an array; however, that code was skipping lines and using a range to pull in only certain parts, which I don't need. All I need is to read in all the lines and have them made into an array.
Python: Mixing files and loops
In this question, once again, something much more than I can understand was being asked. From what I understood, he just wanted a code that would restart after an error and continue, and the answers were about that part. Once again not what I'm looking for.
The error is pretty much self-explanatory (once you know what it is about), so here goes.
You start with the loop for line in input_file:. File objects are iterable in Python. They iterate over the lines in the file. This means that for each iteration of the loop, line will contain the next line in your file.
Next you read manually with line = input_file.readlines(). This attempts to read the remaining lines from the file, but you are already iterating over the lines in the for loop.
Files are usually read sequentially, with no going backwards, so what you end up with is a conflict. If you read lines with readlines(), the iterator in the loop would be forced to skip ahead, since it cannot go back; however, it has promised to return the next line. The error is telling you that readlines() knows there is an active iterator and that calling it would interfere with the loop.
If you take out line = input_file.readlines(), the loop will do what you expect it to.
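That is, the corrected version of the snippet (a minimal sketch keeping the names from the question) is simply:
array = []
with open('file.txt', 'r') as input_file:
    for line in input_file:
        array.append(line)      # each element is one line of the file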
To make an array of the lines of the file, with one line per array element:
with open('file.txt', 'r') as input_file:
    array = input_file.readlines()
return array
since readlines will give you the whole file in one shot. Alternatively,
return list(open('file.txt','r'))
will do the same per the docs.

How to load a big text file efficiently in python

I have a text file containing 7000 lines of strings. I have to search for a specific string based on a few params.
Some are saying that the below code wouldn't be efficient (speed and memory usage).
f = open("file.txt")
data = f.read().split() # strings as list
First of all, if I don't even turn it into a list, how would I start searching at all?
Is it efficient to load the entire file? If not, how should I do it?
To filter anything we need to search, and to search we need to read it, right?
A bit confused.
Iterate over each line of the file without storing it. This will make the program memory-efficient.
with open(filename) as f:
    for line in f:
        if "search_term" in line:
            break
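If you need every matching line rather than just the first, a sketch along the same lines (filename and search_term are placeholders) collects them as it goes:
matches = []
with open(filename) as f:
    for line in f:
        if "search_term" in line:
            matches.append(line.strip())   # keep only the lines that match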

How to create a dictionary from file?

I want to create a dictionary with values from a file.
The problem is that it would have to be read line by line to be added to the dictionary because I don't think I have enough memory to load in all the information to be appended to the dictionary.
The key can be a default, but the value will be one selected from each line in the file. The file is not CSV, but I always split the lines so I can select a value from each one.
import sys

def prod_check(dirname):
    dict1 = {}
    k = 0
    with open('select_sha_sub_hashes.out') as inf:
        for line in inf:
            pline = line.split('|')
            value = pline[3]
            dict1[line] = dict1[k]
            k += 1
    print dict1

if __name__ == "__main__":
    dirname = sys.argv[1]
    prod_check(dirname)
This is the code I am working with; the variable I have set as value is the index into the split line that I am pulling data from. I run into a problem when I try to call the dictionary to print the values, and I think it may be a problem in my syntax or in an assignment I made. I want the values added under the keys, but the keys should remain plain numbers like 0-100.
If you don't have enough memory to store the entire dictionary in RAM at once, try anydbm, bsddb and/or gdbm. These are dictionary-like objects that keep key-value pairs on disk in a single-table, keystring-valuestring database.
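Those Python 2 modules live under the single dbm package in Python 3; a minimal sketch of the idea, using the file name and the pipe-separated fourth field from the question (both just assumptions about your data):
import dbm  # Python 3 successor to anydbm

with dbm.open("hashes.db", "c") as db:            # "c" creates the database file if needed
    with open("select_sha_sub_hashes.out") as inf:
        for lineno, line in enumerate(inf):
            db[str(lineno)] = line.split("|")[3]  # keys and values are stored on disk
    print(db["0"])                                # values come back as bytes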
Optionally, consider:
http://stromberg.dnsalias.org/~strombrg/cachedb.html
...which will let you convert between serialized and non-serialized representations pretty transparently.
Have a look at something like Tokyo Cabinet (http://fallabs.com/tokyocabinet/), which has Python bindings and is fairly efficient. There's also Kyoto Cabinet, but its licensing is a little restrictive.
Also check out this previous S/O post: Reliable and efficient key-value database for Linux?
So it sounds as if the main problem is reading the file line-by-line. To read a file line-by-line you can do this:
with open('data.txt') as inf:
    for line in inf:
        # do the rest of your processing here
        pass
The advantage of using with is that the file is closed for you automagically when you are done or an exception occurs.
--
Note, the original post didn't contain any code, it now seems to have incorporated a copy of this code to help further explain the problem.
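Applying that skeleton to the question's code (keeping the pipe-split and the plain numeric keys the asker describes) might look like this sketch; the field index 3 is just carried over from the question:
def prod_check(filename):
    dict1 = {}
    with open(filename) as inf:
        for k, line in enumerate(inf):
            dict1[k] = line.split('|')[3]   # key: line number, value: 4th pipe-separated field
    print(dict1)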

Is it possible to determine how many lines exist in a file without per-line iteration? [duplicate]

Possible Duplicate:
How to get line count cheaply in Python?
Good day. I have some code below, which implements per-line file reading and counter incrementing.
def __set_quantity_filled_lines_in_file(self):
    count = 0
    with open(self.filename, 'r') as f:
        for line in f:
            count += 1
    return count
My question is: are there methods to determine how many lines of text data are in the current file without per-line iteration?
Thanks!
In general it's not possible to do better than reading every character in the file and counting newline characters.
It may be possible if you know details about the internal structure of the file. For example, if the file is 1024kB long, and every line is 1kB in length, then you can deduce there are 1024 lines in the file.
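A sketch of that special case, assuming every line (terminator included) really is exactly the same length:
import os

LINE_LENGTH = 1024                               # assumed fixed length of every line, in bytes
nlines = os.path.getsize("myfile") // LINE_LENGTH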
I'm not sure whether Python has such a function (I highly doubt it), but it would essentially require reading the whole file anyway. A newline is signified by the \n character (actually system-dependent), so there is no way to know how many of those exist in a file without going through the whole file.
You could use the readlines() file method; this is probably the easiest.
If you want to be different, you could use the read() method to get the entire file and count the CR, LF, CRLF and LFCR terminator combinations yourself. Note that collections.Counter only counts single characters, so the two-character terminators have to be counted with count(); you will also have to deal with the various ways of terminating lines.
Something like:
with open("myfile", "rb") as f:
    d = f.read()

lines1 = d.count(b"\r\n")                   # CRLF pairs
lines2 = d.count(b"\n\r")                   # LFCR pairs
lines3 = d.count(b"\r") - lines1 - lines2   # lone CR
lines4 = d.count(b"\n") - lines1 - lines2   # lone LF
nlines = lines1 + lines2 + lines3 + lines4
No, such information can only be retrieved by iterating over the whole file's content (or by reading the whole file into memory, but unless you know for sure that the files will always be small, don't even think about doing that).
Even if you do not loop over the file contents, the functions you call do. For example, len(f.readlines()) will read the whole file into a list just to count the number of elements. That's horribly inefficient since you don't need to store the file contents at all.
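If all you need is the count, a common idiom (a sketch, not taken from the answers above) iterates without storing anything:
with open("myfile") as f:
    nlines = sum(1 for _ in f)      # reads line by line, keeps no lines in memory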
This gives the answer, but reads the whole file and stores the lines in a list
len(f.readlines())

Delineating a Read File

Not really too sure how to word this question, therefore if you don't particularly understand it then I can try again.
I have a file called example.txt and I'd like to import this into my Python program. Here I will do some calculations with what it contains and other things that are irrelevant.
Instead of me importing this file, going through it line by line and extracting the information I want... can Python do it for me? That is, if I structure the .txt correctly (say, key/value pairs separated by an equals sign on each line), is there an existing Python 'way' that can handle it all so I just work with the result?
with open("example.txt") as f:
for line in f:
key, value = line.strip().split("=")
do_something(key,value)
looks like a starting point if I understand you correctly. You need Python 2.6 or 3.x for this.
Another place to look is the csv module that can parse comma-separated value files - and you can tell it to use = as a separator instead. This will abstract away some of the "manual work" in that previous example - but it seems your example doesn't especially need that kind of abstraction.
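A sketch of that csv variant, assuming exactly one key=value pair per line (do_something is the placeholder from the example above, and the Python 3 form of open is shown):
import csv

with open("example.txt", newline="") as f:
    reader = csv.reader(f, delimiter="=")
    for key, value in reader:                   # assumes exactly one '=' per line
        do_something(key.strip(), value.strip())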
Another idea:
with open("example.txt") as f:
d = dict([line.strip().split("=") for line in f])
Now that's concise and pythonic :)
for line in open("file")
key, value = line.strip().split("=")
key=key.strip()
value=value.strip()
do_something(key,value)
There's also another method - you can create a valid Python file (let it be a list, a dict definition or whatever else) and read its content using
f = open('file.txt', 'r')
content = f.read()  # assuming the file isn't too long
And then just parse it:
parsedContent = eval(content)
You can pass an environment to eval (see the docs), so it does not have to have access to your globals and locals. This is evil and wrong, but in a small program that won't be distributed and won't get 'file.txt' from the network or from a so-called malicious user, you can use it.
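For what it's worth, a somewhat safer variant of the same idea (not mentioned above) is ast.literal_eval, which only accepts Python literals such as dicts, lists, strings and numbers, and rejects arbitrary code:
import ast

with open('file.txt') as f:
    parsed = ast.literal_eval(f.read())   # raises ValueError/SyntaxError on anything but literals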
