Process is not coming out of for loop for ijson items - python

I am using Python 3.9 and trying:
import ijson

with open(file, 'r') as fl:
    val = ijson.items(fl, '<my_key>.item', use_float=True)
    for i in val:
        print(i)
After some time the print statement stops printing anything on the Jupyter console, but the Jupyter cell still runs for a very long time.
Does ijson scan the complete file from start to end even if I parse only specific elements? If yes, how can I restrict this behaviour (if that is possible)?
Note: instead of printing the content I am actually writing it into another file; I can see that the file's contents stop changing after some time, but the process keeps running.
I have tried all sorts of file-closing operations etc. Nothing has worked so far.
Thanks in advance.

The only way to break out of the ijson iteration is to break it yourself (i.e., actually break out of the for loop). This is because, as you suspect, ijson reads input files fully. This in turn is because your path (<my_key>.item) could appear again in your file after the initial set of results you are seeing (keys are not required to be unique in JSON).
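If you only need some of the matches (for example, the first N), you can stop the scan yourself by breaking out of the loop; a minimal sketch, where max_items is a hypothetical limit:
import ijson

max_items = 1000  # hypothetical cap on how many items you actually need

with open(file, 'r') as fl:
    for count, item in enumerate(ijson.items(fl, '<my_key>.item', use_float=True), start=1):
        print(item)
        if count >= max_items:
            break  # leaving the loop stops ijson from reading the rest of the file
If the items you need are spread throughout the file, though, a full scan is unavoidable.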


Python: JSON file becomes empty

Here is my code for accessing and editing the file:
def edit_default_settings(self, setting_type, value):
    with open("cam_settings.json", "r") as f:
        cam_settings = json.load(f)

    cam_settings[setting_type] = value

    with open("cam_settings.json", 'w') as f:
        json.dump(cam_settings, f, indent=4)
I use it in a program that runs for several hours a day, and roughly once a week I notice that cam_settings.json has become empty (literally empty: the file explorer shows 0 bytes), but I can't imagine how that is possible.
I would be glad to hear some comments on what could go wrong.
I can't see any issues with the code itself, but there can be an issue with the execution environment. Are you running the code in a multi-threaded environment, or running multiple instances of the same program at once?
This situation can arise if the code is executed in parallel and multiple threads/processes try to access the file at the same time. Try logging each time the function is executed and whether it completed successfully, and add exception handlers and error logging.
If this is the problem, using buffers or a singleton pattern can solve the issue.
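If the function can be called from multiple threads within one process, a module-level lock can serialise the read-modify-write cycle; a minimal sketch (the lock name is illustrative, and this does not protect against separate processes):
import json
import threading

_settings_lock = threading.Lock()  # shared by every caller in this process

def edit_default_settings(self, setting_type, value):
    with _settings_lock:  # only one thread may rewrite the file at a time
        with open("cam_settings.json", "r") as f:
            cam_settings = json.load(f)
        cam_settings[setting_type] = value
        with open("cam_settings.json", 'w') as f:
            json.dump(cam_settings, f, indent=4)
For separate processes you would need file locking or the atomic-replace approach described in the next answer.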
As @Chels said, the file is truncated when it's opened with 'w'. That doesn't explain why it stays that way; I can only imagine that happening if your code crashed mid-write. Maybe you need to check the logs for crashes (or change how your code is run so that crash reasons get logged, if they aren't).
But there's a way to make this process safer in case of crashes: write to a separate file, and only replace the old file with the new one after the new file is fully written. You can use os.replace() for this. You could do this simply with a differently-named file:
import os

with open(".cam_settings.json.tmp", 'w') as f:
    json.dump(cam_settings, f, indent=4)
os.replace(".cam_settings.json.tmp", "cam_settings.json")
Or you could use a temporary file from the tempfile module.
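A minimal sketch of the tempfile variant, using a hypothetical save_settings helper; creating the temporary file in the target's own directory matters because os.replace() is only atomic within one filesystem:
import json
import os
import tempfile

def save_settings(cam_settings, path="cam_settings.json"):
    # create the temp file next to the target so the final rename is atomic
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(cam_settings, f, indent=4)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial file on failure
        raise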
When you open a file with the "w" mode, its existing content is erased immediately (you then effectively replace what was written before).
Not sure if this is what you are looking for, but it could be one of the reasons why cam_settings.json becomes empty after the call open("cam_settings.json", 'w')!
In such a case, to append text instead of overwriting it, use the "a" mode:
open("cam_settings.json", 'a')

Python pointers

I was asked to write a program to find the string "error" in a file and print the matched lines in Python.
I first open the file in read mode.
I use fh.readlines() and store the result in a variable.
After this, I use a for loop to iterate line by line, check for the string "error", and print those lines if found.
I was asked to use pointers in Python, since assigning the whole file content to a variable consumes time when the log file contains huge output.
I did some research on Python pointers but did not find anything useful.
Could anyone help me rewrite the above code using pointers instead of storing the whole content in a variable?
There are no pointers in Python. Something like a pointer can be implemented, but it is not worth the effort for your case.
As pointed out in the answers to Read large text files in Python, line by line without loading it in to memory, you can use something like:
with open("log.txt") as infile:
    for line in infile:
        if "error" in line:
            print(line.strip())
The context manager will close the file automatically, and it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else.
You can use a dictionary of key-value pairs: load the log file into a dictionary where the keys are words and the values are line numbers. If you then search for the string "error" you get the line numbers where it occurs and can print those lines. Since lookup in a dictionary (hash table) is constant time, O(1), the search itself is fast, although building the index takes time up front (and depends on how collisions are avoided).
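A minimal sketch of that idea (indexing whole words only, with the file name taken from the question's own code below):
from collections import defaultdict

index = defaultdict(list)  # word -> list of line numbers
lines = []

with open("c182573.log") as fh:
    for lineno, line in enumerate(fh):
        lines.append(line)
        for word in line.split():
            index[word].append(lineno)

for lineno in index.get("error", []):
    print(lines[lineno].strip())
Note that this keeps the whole file in memory, so it only pays off if the same log is searched repeatedly.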
I used the code below instead of putting the data in a variable and then using a for loop:
for line in open('c182573.log', 'r').readlines():
    if 'Executing' in line:
        print(line)
So there is no way to implement pointers or references in Python.
Thanks all.
There are no pointers in Python.
Something like a pointer can be implemented, but it is not required for your case.
Try the code below:
with open('test.txt') as f:
    content = f.readlines()

for i in content:
    if "error" in i:
        print(i.strip())
If you want to understand Python variables as pointers, see this link:
http://scottlobdell.me/2013/08/understanding-python-variables-as-pointers/

Program copies specific lines from HTML file; certain lines break it with no visible pattern?

I have written a program in Python to take the text from specific elements in an HTML file ('input') and write it all into another document ('output'). The program works by searching for the particular HTML tag that precedes all elements of the desired type, and then writing out the next line. This is the code, generalized:
input = open(filepath, 'r')
output = open(filepath2, 'w')
collect = 0
onstring = "string to be searched for"

for i in range(numberOfLines):
    line = input.readline()
    if onstring in line:
        collect = 1
    elif collect == 1:
        output.write(line)
        collect = 0
I doubt it is optimal, but it functions as intended except for one hang-up: for every HTML file I try this on, between 5 and 15 of the last elements that should be copied get completely cut off. There is seemingly no pattern to how many are cut off, so I was wondering if someone more experienced with Python could point out an apparent flaw.
If it helps, some things I have tested:
If I append two HTML files, the same number of posts get cut off as are cut off from the second one alone.
If I remove the last element that gets copied, more elements that were previously cut off after it are copied as normal, but usually some posts get cut off later, which made me suspect that the particular element being copied is responsible for this issue. There is no discernible pattern to which ones 'break' the program, though.
Expanding on my comment from above: you have opened the file for writing, but each write operation does not go directly to disk. Instead, it is sent to a write buffer; when the buffer fills up, all of the write operations in the buffer are written to the physical disk. Closing the file forces whatever write operations are still in the buffer to be written out.
Since your program exits without closing the file, you have writes left in the memory buffer that were never written to disk. Try using:
output.close()
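Alternatively, with blocks close (and therefore flush) both files automatically, even if an exception is raised partway through; a minimal sketch of the same loop, slightly restructured to iterate over the input file directly:
collect = 0
onstring = "string to be searched for"

with open(filepath, 'r') as infile, open(filepath2, 'w') as outfile:
    for line in infile:
        if onstring in line:
            collect = 1
        elif collect == 1:
            outfile.write(line)
            collect = 0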
I solved the problem by properly closing the files with output.close().
Credit goes to James for his helpful comment:
"If not, you may have writes left in the memory buffer that were never written to disk. Try using output.close()."

Python, run commands in specific order

I'm writing a script that gets the most recently modified file from a unix directory.
I'm certain it works, but I have to create a unittest to prove it.
The problem is the setUp function. I want to be able to predict the order the files are created in.
self.filenames = ["test1.txt", "test2.txt", "test3.txt", "filename.txt", "test4"]
newest = ''
for fn in self.filenames:
    if pattern.match(fn): newest = fn
    with open(fn, "w") as f: f.write("some text")
The pattern is "test.*.txt", so it matches just the first three in the list. In multiple test runs, newest sometimes comes out as 'test3.txt' and sometimes as 'test1.txt'.
Use os.utime to explicitly set modified time on the files that you have created. That way your test will run faster.
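A minimal sketch of that inside the same setUp loop, stamping each file with a modified time one second later than the previous one (os.utime takes an (atime, mtime) tuple):
import os
import time

base = time.time()
for i, fn in enumerate(self.filenames):
    with open(fn, "w") as f:
        f.write("some text")
    # explicitly set access and modified times, one second apart per file
    os.utime(fn, (base + i, base + i))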
I doubt that the filesystem you are using supports fractional seconds in file creation times.
I suggest you insert a call to time.sleep(1) in your loop so that the filesystem records a different timestamp for each created file.
It could be due to syncing: just because you call write() on files in a certain order, it doesn't mean the data will be flushed by the OS in that order.
Try calling f.flush() followed by os.fsync() on your file object before moving on to the next file. Leaving some time between the calls (using sleep()) might also help.
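A minimal sketch of that suggestion applied to the loop from the question (note that os.fsync() takes a file descriptor, so it is passed f.fileno()):
import os

for fn in self.filenames:
    with open(fn, "w") as f:
        f.write("some text")
        f.flush()              # push Python's internal buffer to the OS
        os.fsync(f.fileno())   # ask the OS to commit it to disk before moving on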

Reading from a file using pickle and for loop in python

I have a file into which I have dumped a huge number of lists. Now I want to load this file into memory and use the data inside it. I tried to load the file using the load method of pickle; however, for some reason it only gives me the first item in the file. I noticed that it only loads my first list into memory, and that if I want to load the whole file (a number of lists) I have to iterate over the file and call pickle.load on each iteration.
The problem is that I don't know how to actually implement this with a loop (for or while), because I don't know when I have reached the end of the file.
An example would help me a lot.
Thanks
How about this:
import pickle

lists = []
infile = open('yourfilename.pickle', 'rb')  # pickle files must be opened in binary mode
while True:
    try:
        lists.append(pickle.load(infile))
    except (EOFError, pickle.UnpicklingError):
        break
infile.close()
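For completeness, the reading loop above pairs with a writer that simply calls pickle.dump repeatedly on one file object; a minimal sketch with hypothetical data:
import pickle

all_lists = [[1, 2, 3], ['a', 'b'], [4.5, 6.7]]  # hypothetical lists to store

with open('yourfilename.pickle', 'wb') as outfile:
    for lst in all_lists:
        pickle.dump(lst, outfile)  # each dump appends one complete pickled object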
