I have a lot of live data coming from a sensor. Currently, I store the data in a CSV file as follows:
0 2 1 437 464 385 171 0:44:4 dog.jpg
1 1 3 452 254 444 525 0:56:2 cat.jpg
2 3 2 552 525 785 522 0:52:8 car.jpg
3 8 4 552 525 233 555 0:52:8 car.jpg
4 7 5 552 525 433 522 1:52:8 phone.jpg
5 9 3 552 525 555 522 1:52:8 car.jpg
6 6 6 444 392 111 232 1:43:4 dog.jpg
7 1 1 234 322 191 112 1:43:4 dog.jpg
.
.
.
.
The third column contains numbers from 1 to 6. I want to read columns #4 and #5 for all rows that have 2 or 5 in the third column. I also want to write them to another CSV file line by line, one line every 2 seconds.
I do this because I have other code that reads the data from that file. How can I write out the information for the lines that have 2 or 5 in their third column? Please advise!
for example:
2 552 525
5 552 525
......
......
.....
.
import csv
with open('newfilename.csv', 'w') as f2:
    with open('mydata.csv', mode='r') as infile:
        reader = csv.reader(infile) # no conversion to list
        header = next(reader) # get first line
        for row in reader: # continue to read one line per loop
            if row[5] == 2 & 5:
The third column has index 2 so you should be checking if row[2] is one of '2' or '5'. I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.
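To see why the original test `row[5] == 2 & 5` never matches, here is a small sketch. The sample row is taken from the question (written with commas, as in "in.txt"); the variable names are only illustrative. `2 & 5` is a bitwise AND, and `csv.reader` yields every field as a string, so the original condition compares a string with an integer.

```python
import csv
import io

# One sample line from the question, parsed the way csv.reader parses it
row = next(csv.reader(io.StringIO("2,3,2,552,525,785,522,0:52:8,car.jpg")))

bitwise = 2 & 5                      # bitwise AND of 0b010 and 0b101
original_test = row[5] == bitwise    # compares the string '785' with an int
fixed_test = row[2] in {"2", "5"}    # membership test on the third column

print(bitwise, original_test, fixed_test)
```

The membership test works because both sides are strings and `row[2]` is the third column.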
I don't see what you are using header for but I assume you have more code that processes header somewhere. If you don't need header and just want to skip the first line, just do next(reader) without assigning it to header but I have kept header in my code under the assumption you use it later.
We can use time.sleep(2) from the time module to help us write a row every 2 seconds.
Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.
Code
import csv
import time
select = {'2', '5'}
with open("in.txt") as f_in, open("out.txt", "w", newline="") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    header = next(reader)
    for row in reader:
        if row[2] in select:
            print(f"Writing {row[2:5]} at {time.time()}")
            writer.writerow(row[2:5])
            f_out.flush()  # make the row visible to the other program right away
            time.sleep(2)
Output
Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758
"out.txt"
2,552,525
5,552,525
Input
"in.txt"
0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg
I think you just need to change your if statement to get the rows you want.
For example:
import csv
with open('newfilename.csv', 'w', newline='') as f2:
    with open('mydata.csv', mode='r') as infile:
        reader = csv.reader(infile) # no conversion to list
        writer = csv.writer(f2)
        header = next(reader) # get first line
        for row in reader: # continue to read one line per loop
            if row[2] in ['2', '5']: # third column; csv fields are strings
                writer.writerow(row[2:5])
Inside the if, you'll get the rows that have 2 or 5 in the third column.
I'm trying to obtain the difference between two csv files A.csv and B.csv in order to obtain new rows added in the second file. A.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
B.csv has the following data.
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
To write the new rows added into an output file I'm using the following script.
input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"
with open(input_file1, 'r') as t1:
    fileone = set(t1)
with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
    for line in t2:
        if line not in fileone:
            outFile.write(line)
Expected output is :
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
Output obtained through the above script is :
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 Redundant/RSK
acct ABC 88888888 99999999 ABC-GHD 4/1/18 4 1 2018 DT/89
I'm not sure where I'm making a mistake, tried debugging it but with no progress.
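You need to be careful with trailing newlines. If the last line of A.csv does not end with a newline, it will not match the same line in B.csv, which does. It is safer to remove the newlines before comparing and then add them back when writing: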
input_file1 = "A.csv"
input_file2 = "B.csv"
output_path = "out.csv"
with open(input_file1, 'r') as t1:
    fileone = set(t1.read().splitlines())
with open(input_file2, 'r') as t2, open(output_path, 'w') as outFile:
    for line in t2:
        line = line.rstrip('\n')  # drop only the line ending, as splitlines() does
        if line not in fileone:
            outFile.write(line + '\n')
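The effect can be reproduced without any files. The sketch below is only an illustration: the function names and the shortened sample lines are made up, and it assumes that A.csv has no newline after its last line, which would explain the output in the question.

```python
import io

def naive_new_lines(old_text, new_text):
    # Mirrors the question's script: lines are compared with their line endings
    seen = set(io.StringIO(old_text))
    return [line for line in io.StringIO(new_text) if line not in seen]

def new_lines(old_text, new_text):
    # Compares lines with the line endings removed
    seen = set(old_text.splitlines())
    return [line for line in new_text.splitlines() if line not in seen]

old = "acct ABC Redundant/RSK"                      # assumed: no final newline
new = "acct ABC Redundant/RSK\nacct ABC DT/89\n"

print(naive_new_lines(old, new))   # both lines are reported as new
print(new_lines(old, new))         # only the DT/89 line is reported
```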
Following up on this question: How to extract and copy lines from csv file to another csv file in python? I ran the code below on this data:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
This is the code:
import csv
# read
data = []
with open('test.csv', 'r') as f:
    f_csv = csv.reader(f)
    # header = next(f_csv)
    for row in f_csv:
        data.append(row)
# write
with open('newtest.csv', 'w+') as f:
    writer = csv.writer(f)
    for i in range(int(len(data) * 30 / 100)):
        writer.writerow(data[i])
This is the result in the newtest.csv file:
frame.number frame.len frame.cap_len frame.Type
empty line....
1 100 100 ICMP_tt
empty line....
2 64 64 UDP
However, I hope that the result looks like this:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
Also, test.csv is still the same, meaning that the two copied lines are not deleted from it. I want test.csv to end up like this:
frame.number frame.len frame.cap_len frame.Type
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
I hope that you can help me please.
To do the read and write separately in Python 2.x, you could use the following approach:
import csv
with open('test.csv', 'rb') as f_input:
    data = list(csv.reader(f_input))
with open('newtest.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(data)
This would mean data holds all of the rows as a list of lists. Make sure the files are opened in binary mode for both reading and writing. If this is not done, you will get empty lines.
To do this one row at a time (useful if the CSV file is too large for memory), you could do the following:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        csv_output.writerow(row)
Or even:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerows(csv_input)
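The snippets above target Python 2. In Python 3 the csv module works on text-mode files opened with newline='', which is what prevents the empty lines there. The following is a rough sketch of how the first 30% of the rows could be moved out of test.csv, as the question asks. The function names `split_rows` and `move_first_rows` and the `percent` parameter are hypothetical, and the sketch assumes the file is non-empty, fits in memory, and has its header on the first line.

```python
import csv

def split_rows(rows, percent=30):
    """Return (moved, kept): the first `percent` of rows, and header plus the rest."""
    header = rows[0]
    n = max(1, len(rows) * percent // 100)  # the header always counts as moved
    return rows[:n], [header] + rows[n:]

def move_first_rows(src="test.csv", dst="newtest.csv", percent=30):
    # newline="" lets the csv module handle line endings itself (Python 3)
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    moved, kept = split_rows(rows, percent)
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(moved)
    with open(src, "w", newline="") as f:  # rewrite the source without the moved rows
        csv.writer(f).writerows(kept)
```

With the ten lines from the question (header plus nine rows), `split_rows` moves the header and rows 1 and 2, and keeps the header plus rows 3 to 9.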
I was writing a Python script which converts an ASCII file containing one pair of numbers per line to a straight binary representation. Here is my script:
in_file = open("p02_0609.bin", 'r')
out_file = open("sta013.bin", 'w')
out_data = bytearray()
for line in in_file:
    addr, i2c_data = [int(x) for x in line.split(" ")]
    out_data.append(addr)
    out_data.append(i2c_data)
out_file.write(out_data)
out_file.close()
in_file.close()
and a sample of the file it's reading (about 2000 lines total)
58 1
42 4
40 0
41 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 1
40 2
33 143
40 3
33 0
40 4
40 5
40 6
40 7
40 8
40 9
40 10
40 11
The output file ends on an odd byte, which it shouldn't since all the data is in pairs, and is about 80 bytes longer than expected. After poking around with a hex editor, I finally found the culprit. Every instance of "10" (ASCII LF) has had a CR inserted in front of it. How do I make it stop doing that?
Tl;dr: Python is being a dumbass and adding CR to LF in binary data where that makes no sense. How to fix?
You opened the output file in text mode, so on Windows every "\n" (byte 10) you write is translated to "\r\n". Use binary mode in open whenever you work with raw bytes ('rb' for reading, 'wb' for writing); here the output file should be opened with 'wb'.
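A minimal sketch of the corrected script, keeping the file names from the question. The helper name `pairs_to_bytes` is made up for illustration, and the sketch assumes every non-blank line holds exactly two integers in the range 0 to 255.

```python
def pairs_to_bytes(lines):
    """Turn lines like '40 10' into a flat sequence of bytes."""
    out = bytearray()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        addr, i2c_data = (int(x) for x in line.split())
        out.append(addr)
        out.append(i2c_data)
    return bytes(out)

def convert(src="p02_0609.bin", dst="sta013.bin"):
    # The input is text, so 'r' is fine; the output must be opened with 'wb'
    # so that byte 10 is written as-is instead of being expanded to CR LF.
    with open(src, "r") as in_file, open(dst, "wb") as out_file:
        out_file.write(pairs_to_bytes(in_file))
```

Using `with` also closes both files even if a line fails to parse.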
I would like to read in a file with multiple columns and write out a new file with columns in a different order than the original file. One of the columns has some extra text that I want eliminated in the new file as well.
For instance, if I read in file: data.txt
1 6 omi=11 16 21 26
2 7 omi=12 17 22 27
3 8 omi=13 18 23 28
4 9 omi=14 19 24 29
5 10 omi=15 20 25 30
I would like the written file to be: dataNEW.txt
26 1 11 16
27 2 12 17
28 3 13 18
29 4 14 19
30 5 15 20
With the help of inspectorG4dget, I came up with this:
import csv as csv
import sys as sys
infile = open('Rearrange Column Test.txt')
sys.stdout = open('Rearrange Column TestNEW.txt' , 'w')
for line in csv.reader(infile, delimiter='\t'):
    newline = [line[i] for i in [5, 0, 2, 3]]
    newline[2] = newline[2].split('=')[1]
    print newline[0], newline[1], newline[2], newline[3]
sys.stdout.close()
Is there a more concise way to get an output without any commas than listing each line index from 0 to the total number of lines?
import csv
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    writer = csv.writer(outfile, delimiter=' ')  # space-separated, so no commas
    for line in csv.reader(infile, delimiter='\t'):
        newline = [line[i] for i in [-1, 0, 2, 3]]  # -1 is the last column
        newline[2] = newline[2].split('=')[1]  # 'omi=11' -> '11'
        writer.writerow(newline)
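If the csv module is not needed, the same reordering can be sketched with plain string methods. This assumes the columns are separated by whitespace (tabs or spaces); the function name `reorder` is hypothetical. Taking the part after the last '=' leaves fields without an '=' unchanged.

```python
def reorder(line, order=(-1, 0, 2, 3)):
    """Pick the columns in `order` and drop any 'name=' prefix from each."""
    fields = line.split()
    return " ".join(fields[i].split("=")[-1] for i in order)

sample = ["1 6 omi=11 16 21 26", "2 7 omi=12 17 22 27"]
for line in sample:
    print(reorder(line))
```

For the two sample lines this prints "26 1 11 16" and "27 2 12 17", matching the layout wanted in dataNEW.txt.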