How to correctly read bitstreams in Python

How do I write a bitstream to a file correctly?
Here is my code. It works, but it seems to me that I am not writing the data to the file correctly.
fd = stream.open()
filename = 'file.ts'
while True:
    data = fd.read(8192)
    file_output = open(filename, "ab")
    file_output.write(data)
fd is a data stream; it keeps being replenished, so I have to read from it until it ends.
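A minimal sketch of the usual pattern, assuming stream.open() returns a file-like object whose read() returns b'' once the stream is exhausted: open the output file once, read fixed-size chunks, and stop on an empty read.
# Sketch only; assumes fd.read() returns b'' when the stream ends.
fd = stream.open()
filename = 'file.ts'

with open(filename, 'wb') as file_output:  # open the output file once, not per chunk
    while True:
        data = fd.read(8192)
        if not data:  # empty read -> end of stream
            break
        file_output.write(data)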

Related

Displaying the contents of the file

I'm having problems with displaying the contents of the file:
import socket

def NRecieve_CnD():
    host = "localhost"
    port = 8080
    NServ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    NServ.bind((host, port))
    NServ.listen(5)
    print("Ballot count is waiting for ballots")
    conn, addr = NServ.accept()
    File = "Ballot1.txt"
    f = open(File, 'w+')
    data = conn.recv(1024)
    print('Ballots recieved initializing information decoding procedures')
    while data:
        data1 = data.decode('utf8', 'strict')
        f.write(data1)
        data = conn.recv(1024)
    print("Ballot saved, Now displaying vote... ")
    files = open("Ballot1.txt", "r")
    print(files.read())
When the program runs, the part of the output where the contents of the file are supposed to appear is blank.
You're writing to a file and then opening it again and reading it without flushing your earlier writes, either explicitly or by closing the file handle. The result is that you're reading the file before you've finished writing it.
    f.write(data1)
    f.flush()  # Make sure the data is written to disk
    data = conn.recv(1024)
print("Ballot saved, Now displaying vote... ")
files = open("Ballot1.txt", "r")
Beyond that, it's probably best if you don't keep the file open for longer than necessary to avoid surprises like this:
with open(File, 'w+') as f:
    f.write(data1)
    data = conn.recv(1024)
print("Ballot saved, Now displaying vote... ")
with open(File, "r") as f:
    print(f.read())
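For completeness, a sketch of how the whole receive loop might look with the file wrapped in a with block, reusing the names from the question (conn, File); the file is flushed and closed automatically when the block exits:
with open(File, 'w+') as f:
    data = conn.recv(1024)
    while data:
        f.write(data.decode('utf8', 'strict'))
        data = conn.recv(1024)
# leaving the with-block flushes and closes the file

print("Ballot saved, Now displaying vote... ")
with open(File, "r") as f:
    print(f.read())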

File content not reading without seek in python

In my case I write some content to a file as a bytearray and then try to read back the content I have written. The problem is that if I don't call seek, the content read back is empty. My understanding was that by default the reference point is the beginning of the file, which is the same as seek(0). Please help me understand this. I'll give both scenarios as examples here.
Without seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
#file_handle.seek(0) #Here commenting the seek(0) part
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console
file_handle- b''
With seek command
filename = "my_file"
Arr = [0x1, 0x2]
file_handle = open(filename, "wb+")
binary_format = bytearray(Arr)
file_handle.write(binary_format)
file_handle.seek(0)
print("file_handle-",file_handle.read())
file_handle.close()
Output in the console is
file_handle- b'\x01\x02'
Is seek(0) mandatory here, even though by default the reference point is supposed to be the beginning of the file?
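For illustration: the position only starts at the beginning when the file is opened; every write advances it, so a subsequent read() begins where the write left off. A small check along those lines, reusing the question's file, shows the position with tell():
filename = "my_file"
file_handle = open(filename, "wb+")
file_handle.write(bytearray([0x1, 0x2]))

print(file_handle.tell())  # 2 -> the position is just past the written bytes
print(file_handle.read())  # b'' -> reading starts from the current position

file_handle.seek(0)        # move back to the start of the file
print(file_handle.read())  # b'\x01\x02'
file_handle.close()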

Read from file while it is being written to in Python?

I followed the solution proposed here
In order to test it, I used two programs, writer.py and reader.py respectively.
# writer.py
import time

with open('pipe.txt', 'w', encoding='utf-8') as f:
    i = 0
    while True:
        f.write('{}'.format(i))
        print('I wrote {}'.format(i))
        time.sleep(3)
        i += 1
# reader.py
import time, os

# Set the filename and open the file
filename = 'pipe.txt'
file = open(filename, 'r', encoding='utf-8')

# Find the size of the file and move to the end
st_results = os.stat(filename)
st_size = st_results[6]
file.seek(st_size)

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print(line)
But when I run:
> python writer.py
> python reader.py
the reader only prints the lines after the writer has exited (i.e. when I kill the process).
Is there another way to read the contents while they are being written?
[EDIT]
The program that actually writes to the file is an .exe application and I don't have access to the source code.
You need to flush your writes/prints to files, or they'll default to being block-buffered (so you'd have to write several kilobytes before the user-mode buffer would actually be sent to the OS for writing).
The simplest solution is to call .flush() after the write calls:
f.write('{}'.format(i))
f.flush()
There are two different problems here:
The OS and file system must allow concurrent access to the file. If you get no error, that is the case, but on some systems it could be disallowed.
The writer must flush its output so that it reaches the file and the reader can see it. If you don't, the output stays in an in-memory buffer until the buffer fills, which can take several kilobytes.
So writer should become:
# writer.py
import time

with open('pipe.txt', 'w', encoding='utf-8') as f:
    i = 0
    while True:
        f.write('{}'.format(i))
        f.flush()
        print('I wrote {}'.format(i))
        time.sleep(3)
        i += 1
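If the goal is durability on disk rather than just visibility to another process on the same machine, os.fsync can additionally be called after the flush; for this reader/writer pair the flush alone is what matters. A hedged variant of the write step:
import os

f.write('{}'.format(i))
f.flush()             # hand Python's buffer to the OS; the reader can now see the data
os.fsync(f.fileno())  # optional: force the OS to write the data to disk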

Converting cloud-init logs to json using a script

I am trying to convert the cloud-init logs to JSON so that Filebeat can pick them up and send them to Kibana. I want to do this with a shell script or Python script. Is there a script that converts such logs to JSON?
My Python script is below.
import json
import subprocess

filename = "/home/umesh/Downloads/scripts/cloud-init.log"

def convert_to_json_log(line):
    """ convert each line to json format """
    log = {}
    log['msg'] = line
    log['logger-name'] = 'cloud-init'
    log['ServiceName'] = 'Contentprocessing'
    return json.dumps(log)

def log_as_json(filename):
    f = subprocess.Popen(['cat', '-F', filename],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        line = f.stdout.readline()
        log = convert_to_json_log(line)
        print(log)
        with open("/home/umesh/Downloads/outputs/cloud-init-json.log", 'a') as new:
            new.write(log + '\n')

log_as_json(filename)
The script produces a file in JSON format, but the msg field is an empty string. I want each line of the log to become the message string.
Firstly, try reading the raw log file using Python's built-in functions rather than running OS commands through subprocess, because:
It is more portable (works across OSes)
It is faster and less prone to errors
Re-writing your log_as_json function as follows worked for me:
inputfile = "cloud-init.log"
outputfile = "cloud-init-json.log"

def log_as_json(filename):
    # Open cloud-init log file for reading
    with open(inputfile, 'r') as log:
        # Open the output file to append json entries
        with open(outputfile, 'a') as jsonlog:
            # Read line by line
            for line in log.readlines():
                # Convert to json and write to file
                jsonlog.write(convert_to_json_log(line) + "\n")
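A call such as the following then runs the conversion end to end; note that this version still takes its paths from the two module-level variables rather than from the filename argument:
log_as_json(inputfile)  # paths come from inputfile/outputfile defined above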
After taking some time to prepare a customised script, I finally ended up with the script below. It might be helpful to others.
import json

def convert_to_json_log(line):
    """ convert each line to json format """
    log = {}
    log['msg'] = json.dumps(line)
    log['logger-name'] = 'cloud-init'
    log['serviceName'] = 'content-processing'
    return json.dumps(log)

# Open the file with read only permit
f = open('/var/log/cloud-init.log', "r")
# use readlines to read all lines in the file
# The variable "lines" is a list containing all lines in the file
lines = f.readlines()
# close the file after reading the lines.
f.close()

jsonData = ''
for line in lines:
    jsonLine = convert_to_json_log(line)
    jsonData = jsonData + "\n" + jsonLine

with open("/var/log/cloud-init/cloud-init-json.log", 'w') as new:
    new.write(jsonData)

ThreadPoolExecutor behaving unexpectedly

I've got a directory of files such as:
input_0.data
input_1.data
and so forth. I want to parse these files with a function that has been shown to output 47 lines for input_0.data when run by itself. However, when I bring a ThreadPoolExecutor into the mix and actually run more than one thread, the output from input_0.data becomes huge, quickly exceeding the known good 47 lines.
The code I'm trying to use is as follows, with needless details cut fairly obviously:
from concurrent.futures import ThreadPoolExecutor
from glob import glob
from os import path
from sys import argv

# do_log_line and format_log are defined elsewhere (omitted from the question)
def find_moves(param_list):
    input_filename = param_list[0]
    output_filename = param_list[1]
    print(input_filename + " " + output_filename, flush=True)
    input_file = open(input_filename, "r")
    output_file = open(output_filename, "w")
    for line in input_file:
        if do_log_line(line):
            log = format_log(line)
            print(log, file=output_file, flush=True)
    input_file.close()
    output_file.close()

if len(argv) != 3:
    print("Usage:\n\tmoves.py [input_dir] [output_dir]")
    quit()

input_files = list()
for file in glob(path.join(argv[1], "input_*.data")):
    input_files.append(file)
input_files = sorted(input_files)

with ThreadPoolExecutor(max_workers=8) as executor:
    for file_number, input_filename in enumerate(input_files):
        output_filename = "moves_" + str(file_number) + ".csv"
        output_filename = path.join(argv[2], output_filename)
        executor.submit(find_moves, (input_filename, output_filename))
It's obvious I'm using this tool incorrectly, but it's not obvious to me where my mistake is. I'd appreciate some guidance in the matter.
It seems like the threads are writing to each other's files, even though they explicitly state they're working on the right file.
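One hedged way to check whether the omitted do_log_line/format_log helpers share module-level state (one possible cause of output from different inputs getting mixed) is to swap the thread pool for a ProcessPoolExecutor, which gives each worker its own interpreter state; the submit call is unchanged because the argument tuple of strings is picklable. A sketch, not a diagnosis:
# Sketch only: uses processes instead of threads to isolate any shared state
# in the parsing helpers; find_moves and its arguments stay the same.
# On Windows this needs to run under an `if __name__ == '__main__':` guard.
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=8) as executor:
    for file_number, input_filename in enumerate(input_files):
        output_filename = path.join(argv[2], "moves_" + str(file_number) + ".csv")
        executor.submit(find_moves, (input_filename, output_filename))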
