I have a lot of live data coming from a sensor. Currently, I store the data in a CSV file as follows:
0 2 1 437 464 385 171 0:44:4 dog.jpg
1 1 3 452 254 444 525 0:56:2 cat.jpg
2 3 2 552 525 785 522 0:52:8 car.jpg
3 8 4 552 525 233 555 0:52:8 car.jpg
4 7 5 552 525 433 522 1:52:8 phone.jpg
5 9 3 552 525 555 522 1:52:8 car.jpg
6 6 6 444 392 111 232 1:43:4 dog.jpg
7 1 1 234 322 191 112 1:43:4 dog.jpg
.
.
.
.
The third column has numbers between 1 and 6. I want to read columns #4 and #5 for all the rows that have 2 or 5 in the third column, and write them to another CSV file one line at a time, every 2 seconds.
I do this because I have another program that reads the data from that file. How can I write out the information for the lines that have 2 or 5 in their third column? Please advise!
for example:
2 552 525
5 552 525
......
......
.....
.
import csv

with open('newfilename.csv', 'w') as f2:
    with open('mydata.csv', mode='r') as infile:
        reader = csv.reader(infile)  # no conversion to list
        header = next(reader)        # get first line
        for row in reader:           # continue to read one line per loop
            if row[5] == 2 & 5:
The third column has index 2, so you should be checking whether row[2] is one of '2' or '5' (note that the csv module reads every field as a string). I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.
I don't see what you are using header for, but I assume you have more code that processes it somewhere. If you don't need header and just want to skip the first line, call next(reader) without assigning the result; I have kept header in my code under the assumption that you use it later.
We can use time.sleep(2) from the time module to help us write a row every 2 seconds.
Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.
Code
import csv
import time

select = {'2', '5'}

with open("in.txt") as f_in, open("out.txt", "w") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    header = next(reader)
    for row in reader:
        if row[2] in select:
            print(f"Writing {row[2:5]} at {time.time()}")
            writer.writerow(row[2:5])
            # f_out.flush() may need to be run here
            time.sleep(2)
Output
Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758
"out.txt"
2,552,525
5,552,525
Input
"in.txt"
0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg
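One caveat: since your other program reads the file while this one is still writing, rows can sit in Python's write buffer instead of appearing in "out.txt" right away. If the reader needs to see each row as soon as it is written, flushing after each write should help, as the comment in the code hints (only the loop body changes):

            writer.writerow(row[2:5])
            f_out.flush()  # push the row out of the buffer so the other program sees it
            time.sleep(2)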
I think you'd just need to change your if statement to be able to get the rows you want.
for example:
import csv

with open('newfilename.csv', 'w') as f2:
    with open('mydata.csv', mode='r') as infile:
        reader = csv.reader(infile)  # no conversion to list
        header = next(reader)        # get first line
        for row in reader:           # continue to read one line per loop
            if row[2] in ('2', '5'):

Inside the if, you'll get the rows that have 2 or 5 in the third column. Note that the third column is row[2] (indexing starts at 0), and that csv.reader yields strings, so compare against '2' and '5' rather than the integers.
Related
With reference to this question: How to extract and copy lines from csv file to another csv file in python? Here is the array I am working with:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
This is the code:
import csv

# read
data = []
with open('test.csv', 'r') as f:
    f_csv = csv.reader(f)
    # header = next(f_csv)
    for row in f_csv:
        data.append(row)

# write
with open('newtest.csv', 'w+') as f:
    writer = csv.writer(f)
    for i in range(int(len(data) * 30 / 100)):
        writer.writerow(data[i])
This is the result in the newtest.csv file:
frame.number frame.len frame.cap_len frame.Type
(empty line)
1 100 100 ICMP_tt
(empty line)
2 64 64 UDP
However, I want the result to look like this:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
Also, test.csv currently stays the same: the two copied lines are not deleted. I want them removed from test.csv as well, so that it contains:
frame.number frame.len frame.cap_len frame.Type
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
I hope you can help me, please.
To do the read and write separately in Python 2.x, you could use the following approach:
import csv

with open('test.csv', 'rb') as f_input:
    data = list(csv.reader(f_input))

with open('newtest.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(data)
This means data holds all of the rows as a list of lists. Make sure the files are opened in binary mode for both reading and writing; if this is not done, you will get the empty lines you are seeing.
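If you are on Python 3, binary mode will not work with the csv module; the equivalent fix there is to open the files with newline='' (a sketch of the same copy under that assumption):

import csv

with open('test.csv', newline='') as f_input, \
        open('newtest.csv', 'w', newline='') as f_output:
    csv.writer(f_output).writerows(csv.reader(f_input))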
To do this one row at a time (useful if the CSV file is too large for memory), you could do the following:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        csv_output.writerow(row)
Or even:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerows(csv_input)
In a CSV file, if a line starts with a # sign or is empty, I can remove or ignore it easily.
# some description here
# 1 is for good , 2 is bad and 3 for worse
(empty line)
I can ignore the empty lines and the lines starting with # using the following logic in Python:
while True:
    if len(data[0]) == 0 or data[0][0][0] == '#':
        data.pop(0)
    else:
        break
return data
But below is header data that has a few empty spaces at the start, before the data begins:
0 temp_data 1 temp_flow 2 temp_record 3 temp_all
22 33 434 344
34 43 434 355
In some files I get header data like the example below, where I have to ignore only the # sign and not the column names:
#0 temp_data 1 temp_flow 2 temp_record 3 temp_all
22 33 434 344
34 43 434 355
But I have no clue how to deal with these two situations; my logic above fails on both of them. I would be grateful if someone could help me.
You can use the string strip() function to remove leading and trailing whitespace first...
>>> ' 0 temp_data 1 temp_flow 2 temp_record 3 temp_all'.strip()
'0 temp_data 1 temp_flow 2 temp_record 3 temp_all'
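Building on that, here is one way your loop could handle both cases. This is only a sketch, assuming data is a list of rows of strings as in your code, and that a commented-out header always has the '#' glued to the first column name (as in your second example):

while data:
    row = data[0]
    if len(row) == 0:           # empty line: drop it
        data.pop(0)
        continue
    first = row[0].strip()      # remove the leading whitespace first
    if first.startswith('#'):
        rest = first.lstrip('#')
        if rest:                # '#0 temp_data ...': a header, keep it minus the '#'
            row[0] = rest
            break
        data.pop(0)             # '# some description': a plain comment, drop it
    else:
        row[0] = first          # a header that only had leading spaces
        break
return data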
I have read other similar posts, but they don't seem to work in my case. Hence, I'm posting this as a new question.
I have a text file which has varying row and column sizes. I am interested in the rows of values which have a specific parameter. E.g. in the sample text file below, I want the last two values of each line which has the number '1' in the second position. That is, I want the values '1, 101', '101, 2', '2, 102' and '102, 3' from the lines starting with the values '101 to 104' because they have the number '1' in the second position.
$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
425
.
.
$EndNodes
$Elements
630
.
97 15 2 0 193 97
98 15 2 0 195 98
99 15 2 0 197 99
100 15 2 0 199 100
101 1 2 0 201 1 101
102 1 2 0 201 101 2
103 1 2 0 202 2 102
104 1 2 0 202 102 3
301 2 2 0 303 178 78 250
302 2 2 0 303 250 79 178
303 2 2 0 303 198 98 249
304 2 2 0 303 249 99 198
.
.
.
$EndElements
The problem is that the code I have come up with (below) starts from '101' but keeps reading values from the following lines up to '304' or more. What am I doing wrong, or does someone have a better way to tackle this?
# Here, (additional_lines + anz_knoten_gmsh - 2) are additional lines that need to be skipped
# at the beginning of the .txt file. Initially I find out where the range
# of the lines lies which I need.
# The two_noded_elem_start is the first line having the '1' at the second position
# and four_noded_elem_start is the first line number having '2' in the second position.
# So, basically I'm reading between these two parameters.
input_file = open(os.path.join(gmsh_path, "mesh_outer_region.msh"))
output_file = open(os.path.join(gmsh_path, "mesh_skip_nodes.txt"), "w")

for i, line in enumerate(input_file):
    if i == (additional_lines + anz_knoten_gmsh + two_noded_elem_start - 2):
        break

for i, line in enumerate(input_file):
    if i == additional_lines + anz_knoten_gmsh + four_noded_elem_start - 2:
        break
    elem_list = line.strip().split()
    del elem_list[:5]
    writer = csv.writer(output_file)
    writer.writerow(elem_list)

input_file.close()
output_file.close()
EDIT: The piece of code used to find parameters like two_noded_elem_start is as follows:
# anz_elemente_ueberg_gmsh is another parameter that is found out
# from a previous piece of code and '$EndElements' is what
# is at the end of the text file "mesh_outer_region.msh".
input_file = open(os.path.join(gmsh_path, "mesh_outer_region.msh"), "r")

for i, line in enumerate(input_file):
    if line.strip() == anz_elemente_ueberg_gmsh:
        break

for i, line in enumerate(input_file):
    if line.strip() == '$EndElements':
        break
    element_list = line.strip().split()
    if element_list[1] == '1':
        two_noded_elem_start = int(element_list[0])
        break

input_file.close()
>>> with open('filename') as fh:              # Open the file
...     for line in fh:                       # For each line in the file
...         values = line.split()             # Split the values into a list
...         if values[1] == '1':              # Compare the second value
...             print values[-2], values[-1]  # Print the 2nd from last and last
1 101
101 2
2 102
102 3
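If you want to write those pairs to a file rather than print them, the same filter can feed csv.writer. A sketch only (Python 3 here; the len(values) guard is an addition, since short lines such as '$Elements' would make values[1] raise an IndexError):

import csv

# file names taken from the question; adjust the paths as needed
with open("mesh_outer_region.msh") as f_in, \
        open("mesh_skip_nodes.txt", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    for line in f_in:
        values = line.split()
        # skip section markers, counts and other short lines
        if len(values) > 2 and values[1] == '1':
            writer.writerow(values[-2:])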
import csv

with open('Met.csv', 'r') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print row
I am not able to work out how to get a column from the CSV file. I tried
print row[:column_name]
name    id    nametype  recclass  mass (g)  fall  year        GeoLocation
Aachen  1     Valid     L5        21        Fell  01/01/1880  (50.775000, 6.083330)
Aarhus  2     Valid     H6        720       Fell  1/1/1951    (53.775000, 6.586560)
Abee    6     Valid     EH4       --        Fell  1/1/1952    (50.775000, 6.083330)
Acapul  10    Valid     A         353       Fell  1/1/1952    (50.775000, 6.083330)
Acapul  1914  valid     A         --        Fell  1/1/1952    (50.775000, 6.083330)
AdhiK   379   Valid     EH4       56655     Fell  1/1/1919    (50.775000, 6.083330)
and I want the average of the mass (g) column.
Try pandas instead of reading the file with the csv module:
import pandas as pd
data = pd.read_csv('Met.csv')
It is far easier to grab columns and perform operations using pandas.
Here I am loading the csv contents to a dataframe.
Loaded data : (sample data)
>>> data
name id nametype recclass mass
0 Aarhus 2 Valid H6 720
1 Abee 6 Valid EH4 107000
2 Acapulco 10 Valid Acapulcoite 914
3 Achiras 370 Valid L6 780
4 Adhi Kot 379 Valid EH4 4239
5 Adzhi 390 Valid LL3-6 910
6 Agen 392 Valid H5 30000
Just the mass column:
You can access individual columns as data['column name'].
>>> data['mass']
0 720
1 107000
2 914
3 780
4 4239
5 910
6 30000
Name: mass, dtype: int64
Average of the mass column:
>>> data['mass'].mean()
20651.857142857141
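One note: your sample has '--' placeholders in the mass (g) column. If your real file does too, read_csv can be told to treat them as missing values so that mean() skips them (the exact column name is assumed from your header):

import pandas as pd

data = pd.read_csv('Met.csv', na_values='--')  # read '--' as NaN
print(data['mass (g)'].mean())                 # NaN values are skipped by default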
You can use csv.DictReader() instead of csv.reader(). The following code works fine for me:
import csv

mass_list = []
with open("../data/Met.csv", "r") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        mass = row["mass"]
        # use != for string comparison; 'is not' checks identity, not equality
        if mass is not None and mass != "--":
            mass_list.append(float(mass))
avg_mass = sum(mass_list) / len(mass_list)
print "avg of mass: ", avg_mass
Hope it helps.
I am new to Python and learning as fast as possible. I know how to solve my problem in bash and am trying to do it in Python.
I have a data file (data_array.csv in the example) and an index file (index.csv). I want to extract the rows from the data file whose IDs appear in the index file and store them in a new file, Out.txt. I also want to write NA in Out.txt for the IDs that have no row in the data file. I know how to do this for one column, but my data has more than 1000 columns (1 to 1344). Could you help me with a script that can do this quickly? My data file, index file and proposed output are as follows.
data_array.csv
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
3 30 40 50 . . -3
4 40 50 60 . . -4
6 60 60 70 . . -5
8 80 70 80 . . -6
10 100 80 90 . . -7
index.csv
Id
1
2
8
9
10
Required Output is
Out.txt
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
8 80 70 80 . . -6
9 NA NA NA NA
10 100 80 90 . . -7
I tried
#! /usr/bin/python
import csv

with open('data_array.csv', 'r') as lookuplist:
    with open('index.csv', 'r') as csvinput:
        with open('VlookupOut', 'w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)

            for i in reader2:
                for xl in reader:
                    if i[0] == xl[0]:
                        i.append(xl[1:])
                        writer.writerow(i)
But it only works for the first row. I want the program to work for all the rows and columns of my data files.
It only outputs the first row because after the inner for xl in reader loop runs once, you are at the end of the file; you would need to point back to the beginning of the file before reading it again. To avoid that entirely, and to be more efficient, you can read the index file into a dictionary first, then use dictionary lookups instead of rescanning the data file:
#! /usr/bin/python
import csv

with open('data_array.csv', 'r') as lookuplist:
    with open('index.csv', 'r') as csvinput:
        with open('VlookupOut', 'w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)

            d = {}
            for xl in reader2:  # read the index ids into a dictionary once
                d[xl[0]] = True
            for i in reader:    # then make a single pass over the data file
                if i[0] in d:   # dictionary lookup instead of a rescan
                    writer.writerow(i)
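Note that this writes only the data rows whose ID appears in the index; it does not produce the NA lines from your desired output. Here is a sketch that does, keying the dictionary on the data file instead. It assumes both files are genuinely comma-separated, as their names suggest, and that each starts with a header row:

import csv

with open('data_array.csv') as lookuplist, \
        open('index.csv') as csvinput, \
        open('Out.txt', 'w') as output:
    reader = csv.reader(lookuplist)
    reader2 = csv.reader(csvinput)
    writer = csv.writer(output)

    header = next(reader)
    writer.writerow(header)
    data = {row[0]: row[1:] for row in reader}  # id -> the remaining columns

    next(reader2)                               # skip the 'Id' header line
    for row in reader2:
        key = row[0]
        # fall back to NA cells when the id is missing from the data file
        writer.writerow([key] + data.get(key, ['NA'] * (len(header) - 1)))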
When you read a CSV file using for xl in reader, it will go through every row until it reaches the end, but it will only do this once. You can tell it to go back to the first row of the file by calling .seek(0) on the underlying file object. (Note that this rescans the whole data file once per index row; the dictionary approach above avoids that.)
#! /usr/bin/python
import csv

with open('data_array.csv', 'r') as lookuplist:
    with open('index.csv', 'r') as csvinput:
        with open('VlookupOut', 'w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)

            for i in reader2:
                for xl in reader:
                    if i[0] == xl[0]:
                        i.extend(xl[1:])  # extend, not append, to keep the row flat
                        writer.writerow(i)
                lookuplist.seek(0)  # rewind the data file before the next index row