How can I read only certain inputs into my list? - python

My data looks like this :
image id name xMin xMax yMin yMax
24-5.png 1 4632 4695 42 57
24-5.png 2 2910 2962 60 74
24-5.png 3 2976 3045 60 74
24-5.png 4 2902 2980 84 99
45-11.png 1463 1209 1240 3455 3469
45-11.png 1464 1246 1300 3459 3470
As shown above, I have a .csv file (let's call it data.csv). How can I read only the rows for image 24-5.png for further processing?
This is how I am currently reading the file:
import csv

labels1 = []
with open("data.csv", 'r') as f:
    reader = csv.DictReader(f, delimiter='\t')
    for line in reader:
        labels1.append(line)
Basically, I want labels1 to keep the same data format but contain only the rows with a specific value of image.

import csv

labels1 = []
with open("data.csv", 'r') as f:
    reader = csv.DictReader(f, delimiter='\t')
    for line in reader:
        if line["image"] == "24-5.png":  # keep only rows whose "image" value matches
            labels1.append(line)
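The same filter can be wrapped in a small function so the image name is a parameter. This is only a sketch: the function name rows_for_image is made up for illustration, and the in-memory sample assumes a tab-delimited file whose header has one name per column (the header shown in the question lists one more name than the rows have values).

```python
import csv
import io


def rows_for_image(lines, image_name, delimiter="\t"):
    """Return the DictReader rows whose "image" field equals image_name."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return [row for row in reader if row["image"] == image_name]


# Small in-memory sample standing in for data.csv (assumed layout)
sample = io.StringIO(
    "image\tid\txMin\txMax\tyMin\tyMax\n"
    "24-5.png\t1\t4632\t4695\t42\t57\n"
    "45-11.png\t1463\t1209\t1240\t3455\t3469\n"
    "24-5.png\t2\t2910\t2962\t60\t74\n"
)
labels1 = rows_for_image(sample, "24-5.png")
print([row["id"] for row in labels1])
```

With a real file, pass the open file object instead of the StringIO sample, for example rows_for_image(open("data.csv", newline=""), "24-5.png").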

Related

How to read specific columns in the csv file?

I have a lot of live data coming from a sensor. Currently, I store the data in a csv file as follows:
0 2 1 437 464 385 171 0:44:4 dog.jpg
1 1 3 452 254 444 525 0:56:2 cat.jpg
2 3 2 552 525 785 522 0:52:8 car.jpg
3 8 4 552 525 233 555 0:52:8 car.jpg
4 7 5 552 525 433 522 1:52:8 phone.jpg
5 9 3 552 525 555 522 1:52:8 car.jpg
6 6 6 444 392 111 232 1:43:4 dog.jpg
7 1 1 234 322 191 112 1:43:4 dog.jpg
.
.
.
.
The third column has numbers from 1 to 6. I want to read columns #4 and #5 for all rows that have 2 or 5 in the third column. I also want to write them to another csv file line by line, one line every 2 seconds.
I do this because I have other code that goes through that file and reads the data from it. How can I write out the information for the lines that have 2 or 5 in their third column? Please advise!
For example:
2 552 525
5 552 525
......
......
.....
.
import csv
with open('newfilename.csv', 'w') as f2:
    with open('mydata.csv', mode='r') as infile:
        reader = csv.reader(infile) # no conversion to list
        header = next(reader) # get first line
        for row in reader: # continue to read one line per loop
            if row[5] == 2 & 5:
The third column has index 2 so you should be checking if row[2] is one of '2' or '5'. I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.
I don't see what you are using header for but I assume you have more code that processes header somewhere. If you don't need header and just want to skip the first line, just do next(reader) without assigning it to header but I have kept header in my code under the assumption you use it later.
We can use time.sleep(2) from the time module to help us write a row every 2 seconds.
Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.
Code
import csv
import time

select = {'2', '5'}
with open("in.txt") as f_in, open("out.txt", "w") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    header = next(reader)
    for row in reader:
        if row[2] in select:
            print(f"Writing {row[2:5]} at {time.time()}")
            writer.writerow(row[2:5])
            # f_out.flush() may need to be run here
            time.sleep(2)
Output
Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758
"out.txt"
2,552,525
5,552,525
Input
"in.txt"
0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg
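Since another program is meant to read the output while it is being written, the flush mentioned in the code comment is probably worth making explicit. The sketch below is one way to do that, not the only one: copy_selected and its parameters are hypothetical names, and the delay is a parameter so the demo can run without waiting. Like the code above, it treats the first line as a header and skips it.

```python
import csv
import io
import time


def copy_selected(f_in, f_out, select, column=2, delay=2.0):
    """Write three columns of each matching row, pausing `delay` seconds per row."""
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    next(reader)  # skip the first line, as the code above does
    written = 0
    for row in reader:
        if row[column] in select:  # csv.reader yields strings
            writer.writerow(row[column:column + 3])
            f_out.flush()  # push the row out of Python's buffer right away
            written += 1
            time.sleep(delay)
    return written


# In-memory stand-ins for "in.txt" and "out.txt"
source = io.StringIO(
    "0,2,1,437,464,385,171,0:44:4,dog.jpg\n"
    "1,1,3,452,254,444,525,0:56:2,cat.jpg\n"
    "2,3,2,552,525,785,522,0:52:8,car.jpg\n"
    "4,7,5,552,525,433,522,1:52:8,phone.jpg\n"
)
target = io.StringIO()
count = copy_selected(source, target, {"2", "5"}, delay=0)
print(target.getvalue())
```

With real files, open the output with newline="" and keep the default delay of 2 seconds.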
I think you'd just need to change your if statement to be able to get the rows you want.
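For example:
import csv

with open('newfilename.csv', 'w', newline='') as f2:
    writer = csv.writer(f2)
    with open('mydata.csv', mode='r', newline='') as infile:
        reader = csv.reader(infile)  # no conversion to list
        header = next(reader)  # get first line
        for row in reader:  # continue to read one line per loop
            # csv.reader yields strings, and the third column has index 2
            if row[2] in ('2', '5'):
                writer.writerow(row[2:5])
Inside the if, you get only the rows that have 2 or 5 in the third column.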

I want to extract values of cx,cy,r which all reside in column 3 [duplicate]

Below is my sample data in a CSV file:
filename, file_size, region_shape_attributes
1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}
2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}
3.jpg, 2551045, {"name":"circle","cx":619,"cy":2865,"r":83}
4.jpg, 2551045, {"name":"circle","cx":769,"cy":2793,"r":82}
5.jpg, 2551045, {"name":"circle","cx":885,"cy":2669,"r":87}
I want the output as follows:
name cx cy r
circle 371 2921 73
circle 371 2921 73
circle 371 2921 73
import ast

import pandas as pd

# read your data
d = pd.read_clipboard()
# transform string to dictionary
d["region_shape_attributes"] = d["region_shape_attributes"].apply(lambda x: ast.literal_eval(x))
# convert column of dictionary to dataframe
pd.DataFrame(list(d['region_shape_attributes']))
This gives you the result:
cx cy name r
0 371 2921 circle 73
1 505 2951 circle 62
2 619 2865 circle 83
3 769 2793 circle 82
4 885 2669 circle 87
Read the CSV file into a DataFrame (built inline here for illustration):
import pandas as pd

df = pd.DataFrame({'img': ['1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg'],
                   'id': [2551045, 2551045, 2551045, 2551045, 2551045],
                   'dict': [{"name": "circle", "cx": 371, "cy": 2921, "r": 73},
                            {"name": "circle", "cx": 505, "cy": 2951, "r": 62},
                            {"name": "circle", "cx": 619, "cy": 2865, "r": 83},
                            {"name": "circle", "cx": 769, "cy": 2793, "r": 82},
                            {"name": "circle", "cx": 885, "cy": 2669, "r": 87}]})
Use .apply(pd.Series):
df['dict'].apply(pd.Series)
Output:
cx cy name r
0 371 2921 circle 73
1 505 2951 circle 62
2 619 2865 circle 83
3 769 2793 circle 82
4 885 2669 circle 87
Old School way (without any package/module):
list.txt:
filename, file_size, region_shape_attributes
1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}
2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}
3.jpg, 2551045, {"name":"circle","cx":619,"cy":2865,"r":83}
4.jpg, 2551045, {"name":"circle","cx":769,"cy":2793,"r":82}
5.jpg, 2551045, {"name":"circle","cx":885,"cy":2669,"r":87}
and then:
logFile = "list.txt"
with open(logFile) as f:
    content = f.readlines()
# you may also want to remove empty lines
content = [l.strip() for l in content if l.strip()]
dict_list = []
for line in content[1:]:
    l = line.split("{", 1)[1].strip("}")
    dict_list.append(l)
print("name \t", end="")
print("cx \t\t", end="")
print("cy \t\t", end="")
print("r \t", )
for elem in dict_list:
    x = elem.split(",")
    print(x[0].split(":", 2)[1].replace('"', " "), end = "")
    print(x[1].split(":", 2)[1].replace('"', " "), "\t", end = "")
    print(x[2].split(":", 2)[1].replace('"', " "), "\t", end = "")
    print(x[3].split(":", 2)[1].replace('"', " "), "\t")
OUTPUT:
name cx cy r
circle 371 2921 73
circle 505 2951 62
circle 619 2865 83
circle 769 2793 82
circle 885 2669 87
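Because the third field in the sample is valid JSON, the hand-written splitting on ":" and '"' can be replaced by json.loads from the standard library. The following is a minimal sketch under that assumption; parse_shapes is a hypothetical helper name, and the lines list stands in for the contents of list.txt.

```python
import json

# Stand-in for the lines of list.txt
lines = [
    'filename, file_size, region_shape_attributes',
    '1.jpg, 2551045, {"name":"circle","cx":371,"cy":2921,"r":73}',
    '2.jpg, 2551045, {"name":"circle","cx":505,"cy":2951,"r":62}',
]


def parse_shapes(lines):
    """Return one dict per data line, parsed from the third field."""
    shapes = []
    for line in lines[1:]:  # skip the header line
        line = line.strip()
        if not line:
            continue
        # Split on the first two commas only, so commas inside {...} survive
        _filename, _size, attributes = line.split(",", 2)
        shapes.append(json.loads(attributes))
    return shapes


shapes = parse_shapes(lines)
print("name\tcx\tcy\tr")
for s in shapes:
    print(f'{s["name"]}\t{s["cx"]}\t{s["cy"]}\t{s["r"]}')
```

Unlike the string-splitting version, this keeps cx, cy and r as integers and does not depend on the order of the keys inside the braces.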
Use the code below:
import pandas as pd

file_path = "<file path>"  # replace with the path to your CSV file
csv_data = pd.read_csv(file_path, sep=' ')
csv_data.columns = ['Field1', 'Field2', 'Field3']
name = []
cx = []
cy = []
r = []
for i in csv_data['Field3']:
    list_i = i.split(',')
    name.append(list_i[0].split(':')[1])
    cx.append(list_i[1].split(':')[1])
    cy.append(list_i[2].split(':')[1])
    r.append(list_i[3].split(':')[1].replace('}', ''))
df_result = pd.DataFrame({'name': name, 'cx': cx, 'cy': cy, 'r': r})
print(df_result)
Output based on the input given above:
cx cy name r
0 371 2921 "circle" 73
1 505 2951 "circle" 62
2 619 2865 "circle" 83
3 769 2793 "circle" 82
4 885 2669 "circle" 87

How to extract lines from csv file to another csv file?

With reference to the question "How to extract and copy lines from csv file to another csv file in python?", this is the data I am working with:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
This is the code:
import csv

# read
data = []
with open('test.csv', 'r') as f:
    f_csv = csv.reader(f)
    # header = next(f_csv)
    for row in f_csv:
        data.append(row)
# write
with open('newtest.csv', 'w+') as f:
    writer = csv.writer(f)
    for i in range(int(len(data) * 30 / 100)):
        writer.writerow(data[i])
This is the result in the newtest.csv file:
frame.number frame.len frame.cap_len frame.Type
empty line....
1 100 100 ICMP_tt
empty line....
2 64 64 UDP
However, I hope that the result looks like this:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
Also, the test.csv file is still the same; that is, the two copied lines are not deleted from it. I want test.csv to contain:
frame.number frame.len frame.cap_len frame.Type
3 100 100 ICMP_tt
4 87 64 ICMP_nn
5 100 100 ICMP_tt
6 87 64 ICMP_nn
7 100 100 ICMP_tt
8 87 64 ICMP_nn
9 87 64 ICMP_nn
I hope that you can help me please.
To do the read and write separately in Python 2.x, you could use the following approach:
import csv

with open('test.csv', 'rb') as f_input:
    data = list(csv.reader(f_input))
with open('newtest.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(data)
This would mean data holds all of the rows as a list of lists. Make sure the files are opened in binary mode for both reading and writing. If this is not done, you will get empty lines.
To do this one row at a time (useful if the CSV file is too large for memory), you could do the following:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        csv_output.writerow(row)
Or even:
with open('test.csv', 'rb') as f_input, open('newtest.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerows(csv_input)
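The binary-mode advice above applies to Python 2.x. In Python 3 the csv module works on text-mode files, and the documented way to avoid the extra blank lines is to open them with newline=''. The sketch below combines that with the second half of the question (removing the copied lines from test.csv). It is an illustration only: move_leading_rows is a hypothetical name, the 30 percent figure comes from the question's code, and keeping the header in both files is an assumption based on the desired output shown there.

```python
import csv
import os
import tempfile


def move_leading_rows(src_path, dst_path, percent=30):
    """Move the first `percent` of src's lines to dst; keep the header in both."""
    # newline='' lets the csv module control line endings (no blank lines)
    with open(src_path, newline="") as f_input:
        rows = list(csv.reader(f_input))
    header, body = rows[0], rows[1:]
    # As in the question, the line count includes the header line
    n_moved = max(len(rows) * percent // 100 - 1, 0)
    with open(dst_path, "w", newline="") as f_output:
        writer = csv.writer(f_output)
        writer.writerow(header)
        writer.writerows(body[:n_moved])
    with open(src_path, "w", newline="") as f_output:
        writer = csv.writer(f_output)
        writer.writerow(header)
        writer.writerows(body[n_moved:])
    return n_moved


# Demo on temporary files with a header and nine data rows
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.csv")
    dst = os.path.join(tmp, "newtest.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows(
            [["frame.number", "frame.len"]] + [[str(i), "100"] for i in range(1, 10)]
        )
    moved = move_leading_rows(src, dst)
    with open(dst, newline="") as f:
        moved_rows = list(csv.reader(f))
    with open(src, newline="") as f:
        remaining_rows = list(csv.reader(f))
print(moved, len(moved_rows), len(remaining_rows))
```

This reads the whole file into memory before rewriting it, so it suits files of moderate size.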

How to find the average of a column of a csv file

import csv

with open('Met.csv', 'r') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
I am stuck on how to get a single column from the csv file. I tried:
print(row[:column_name])
name id name reccla mass (g) fall year GeoLocation
Aachen 1 Valid L5 21 Fell 01/01/1880 (50.775000, 6.083330)
Aarhus 2 Valid H6 720 Fell 1/1/1951 (53.775000, 6.586560)
Abee 6 Valid EH4 -- Fell 1/1/1952 (50.775000, 6.083330)
Acapul 10 Valid A 353 Fell 1/1/1952 (50.775000, 6.083330)
Acapul 1914 valid A -- Fell 1/1/1952 (50.775000, 6.083330)
AdhiK 379 Valid EH4 56655 Fell 1/1/1919 (50.775000, 6.083330)
I want the average of the mass (g) column.
Try pandas instead of the csv module:
import pandas as pd
data = pd.read_csv('Met.csv')
It is far easier to grab columns and perform operations using pandas.
Here I am loading the csv contents to a dataframe.
Loaded data : (sample data)
>>> data
name id nametype recclass mass
0 Aarhus 2 Valid H6 720
1 Abee 6 Valid EH4 107000
2 Acapulco 10 Valid Acapulcoite 914
3 Achiras 370 Valid L6 780
4 Adhi Kot 379 Valid EH4 4239
5 Adzhi 390 Valid LL3-6 910
6 Agen 392 Valid H5 30000
Just the Mass column :
You can access individual columns as data['column name']
>>> data['mass']
0 720
1 107000
2 914
3 780
4 4239
5 910
6 30000
Name: mass, dtype: int64
Average of Mass column :
>>> data['mass'].mean()
20651.857142857141
You can use csv.DictReader() instead of csv.reader(). The following code works fine for me:
import csv

mass_list = []
with open("../data/Met.csv", "r") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        mass = row["mass"]
        # compare strings with != ("is not" tests identity, not equality)
        if mass is not None and mass != "--":
            mass_list.append(float(mass))
avg_mass = sum(mass_list) / len(mass_list)
print("avg of mass:", avg_mass)
Hope it helps.
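The same idea can be written as a reusable function that takes the column name as a parameter, which helps when the header is "mass (g)" as in the question rather than "mass". This is a sketch under assumptions: column_mean is a hypothetical name, the file is assumed to be tab-delimited, and "--" or an empty cell is treated as a missing value.

```python
import csv
import io
from statistics import mean


def column_mean(lines, column, delimiter="\t", missing=("", "--")):
    """Average a numeric column, skipping cells listed in `missing`."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    values = [
        float(row[column])
        for row in reader
        if row[column] is not None and row[column].strip() not in missing
    ]
    return mean(values) if values else None


# In-memory stand-in for Met.csv (assumed tab-delimited)
sample = io.StringIO(
    "name\tid\tmass (g)\n"
    "Aachen\t1\t21\n"
    "Aarhus\t2\t720\n"
    "Abee\t6\t--\n"
    "Acapul\t10\t353\n"
)
avg_mass = column_mean(sample, "mass (g)")
print(avg_mass)
```

Returning None for a column with no usable values avoids the division by zero that sum(...) / len(...) would raise on an empty list.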

Vlookup in python

I am new to Python and learning as fast as possible. I know how to solve my problem in bash and am trying to do it in Python.
I have a data file (data_array.csv in the example) and an index file, index.csv. I want to extract the rows of the data file whose ID appears in the index file and store them in a new file, Out.txt. I also want to put NA in Out.txt for those IDs that have no value in the data file. I know how to do it for one column, but my data has more than 1000 columns (from 1 to 1344). I would like help with a script that can do this faster. My data file, index file and desired output are as follows.
data_array.csv
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
3 30 40 50 . . -3
4 40 50 60 . . -4
6 60 60 70 . . -5
8 80 70 80 . . -6
10 100 80 90 . . -7
index.csv
Id
1
2
8
9
10
Required Output is
Out.txt
Id 1 2 3 . . 1344
1 10 20 30 . . -1
2 20 30 40 . . -2
8 80 70 80 . . -6
9 NA NA NA NA
10 100 80 90 . . -7
I tried:
#! /usr/bin/python
import csv

with open('data_array.csv','r') as lookuplist:
    with open('index.csv', "r") as csvinput:
        with open('VlookupOut','w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)
            for i in reader2:
                for xl in reader:
                    if i[0] == xl[0]:
                        i.append(xl[1:])
                        writer.writerow(i)
But it only works for the first row. I want the program to work for all rows and columns of my data files.
It only outputs the first row because after the first pass of for xl in reader you are at the end of the data file; you would need to go back to the beginning of the file after each pass. To increase efficiency, you can instead read the data file into a dictionary first, then use a dictionary lookup to get the row for each ID in the index file:
#! /usr/bin/python
import csv

with open('data_array.csv','r') as lookuplist:
    with open('index.csv', "r") as csvinput:
        with open('VlookupOut','w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)
            # map each Id in the data file to the rest of its row
            d = {}
            for xl in reader:
                d[xl[0]] = xl[1:]
            # look up each Id from the index file
            for i in reader2:
                if i[0] in d:
                    i.extend(d[i[0]])  # extend, so each value is its own column
                    writer.writerow(i)
When you read a CSV file using for xl in reader, it will go through every row until it reaches the end, but it will only do this once. You can tell it to go back to the first row of the CSV file by calling .seek(0) on the underlying file.
#! /usr/bin/python
import csv

with open('data_array.csv','r') as lookuplist:
    with open('index.csv', "r") as csvinput:
        with open('VlookupOut','w') as output:
            reader = csv.reader(lookuplist)
            reader2 = csv.reader(csvinput)
            writer = csv.writer(output)
            for i in reader2:
                for xl in reader:
                    if i[0] == xl[0]:
                        i.extend(xl[1:])  # extend, so each value is its own column
                        writer.writerow(i)
                # rewind the data file before handling the next index row
                lookuplist.seek(0)
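The question also asks for NA in every column when an ID from the index file is missing from the data file. A dictionary lookup with a default value covers that case. The following is a minimal sketch, assuming comma-delimited files that each start with a header line; vlookup and its fill parameter are hypothetical names, and the in-memory samples use three data columns instead of 1344.

```python
import csv
import io


def vlookup(data_lines, index_lines, fill="NA"):
    """Return the header plus one row per index Id, padded with `fill` if missing."""
    data = csv.reader(data_lines)
    header = next(data)
    # Map each Id to the rest of its row; the data file is read only once
    table = {row[0]: row[1:] for row in data if row}
    index = csv.reader(index_lines)
    next(index)  # skip the "Id" header of the index file
    width = len(header) - 1
    result = [header]
    for row in index:
        if row:
            result.append([row[0]] + table.get(row[0], [fill] * width))
    return result


# In-memory stand-ins for data_array.csv and index.csv
data_file = io.StringIO("Id,1,2,3\n1,10,20,30\n2,20,30,40\n8,80,70,80\n10,100,80,90\n")
index_file = io.StringIO("Id\n1\n2\n8\n9\n10\n")
out_rows = vlookup(data_file, index_file)
for row in out_rows:
    print(",".join(row))
```

Because each lookup is a dictionary access, the cost grows with the number of rows in the two files rather than with their product, which matters more than the number of columns.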
