I am learning how to read CSV files using Python 3. Playing around with my code, I have managed to read either the whole document or certain columns, but now I am trying to read only the records that contain a certain value.
For example, I want to read all records where the car is blue. How would I make it read only those records? I can't figure this out and would be grateful for any help or guidance!
import csv
with open('cars.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['ID'], row['Make'], row['Colour'])
A simple "if" statement should suffice. See control flow docs.
import csv
with open('cars.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Only print the rows whose Colour column is exactly 'blue'.
        if row['Colour'] == 'blue':
            print(row['ID'], row['Make'], row['Colour'])
You can check the values while reading the rows.
import csv
with open('cars.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Check your values here, e.g. only handle blue cars.
        if row['Colour'] == 'blue':
            # Do something with blue cars.
            print(row['ID'], row['Make'], row['Colour'])
You read the rows one by one and use an explicit check to filter the ones you want to deal with. Then you can add them to a list, for example, or process them in place.
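As a minimal sketch of the "add them to a list" option: the helper name `rows_matching` and the inline sample data are assumptions made for illustration, while the column names come from the question.

```python
import csv
import io

def rows_matching(csvfile, column, value):
    """Return every row (as a dict) whose `column` equals `value`."""
    reader = csv.DictReader(csvfile)
    return [row for row in reader if row[column] == value]

# Hypothetical sample data standing in for cars.csv.
sample = io.StringIO("ID,Make,Colour\n1,Ford,blue\n2,Fiat,red\n3,Audi,blue\n")
blue_cars = rows_matching(sample, 'Colour', 'blue')
for row in blue_cars:
    print(row['ID'], row['Make'], row['Colour'])
```

With a real file, the object returned by `open('cars.csv', newline='')` can be passed in place of `sample`. The comparison is case-sensitive, so 'Blue' would not match 'blue'.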
Related
I have a massive CSV file with over 12 million rows and 4 columns. The first column is just an index running from 0 to 12 million, the second has the name of the region, the third is a city (each city is a number) and the fourth has the number of visitors.
What I would like to do is plot the third column against the fourth (one on the x axis and one on the y axis), but only for a certain region. I have tried many things to read just the part of the file that says 'Essex', but nothing works. The second column is called "region" and the region I am interested in is 'Essex'. Any help? Thank you!
You should look into the standard library module called "csv". Something like this to get you going:
import csv
with open("name of csv file") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Check for Essex
        if row[1] == 'Essex':
            # Do whatever
            pass
The above example assumes there is no header line in your CSV file. If you do have a header, you can skip it like this:
with open("name of csv file") as csvfile:
    reader = csv.reader(csvfile)
    # Read and skip the header line.
    header = next(reader)
    for row in reader:
        # As above
        if row[1] == 'Essex':
            pass
or look into csv.DictReader().
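A rough sketch of the `csv.DictReader` route for the Essex case follows. Only the "region" column name is given in the question; the names 'city' and 'visitors', the helper `columns_for_region` and the sample data are assumptions. The file is streamed row by row, so only the matching rows are kept in memory.

```python
import csv
import io

def columns_for_region(csvfile, region, x_col='city', y_col='visitors'):
    """Stream the file once and return the x and y values for one region."""
    xs, ys = [], []
    for row in csv.DictReader(csvfile):
        if row['region'] == region:
            xs.append(int(row[x_col]))
            ys.append(int(row[y_col]))
    return xs, ys

# Hypothetical sample data; the column names other than 'region' are assumed.
sample = io.StringIO(
    "index,region,city,visitors\n"
    "0,Essex,12,340\n"
    "1,Kent,7,120\n"
    "2,Essex,15,95\n"
)
cities, visitors = columns_for_region(sample, 'Essex')
print(cities, visitors)
```

The two lists can then be handed to whatever plotting library is in use as the x and y values.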
I want to work with data imported from CSV files. However, the files contain many lines of information that I don't need. Let's say the first three rows and all rows after row 125 should be removed. How can I do this in Python? I have figured out how to remove the first three rows, but I am still having problems with the rest.
import csv
csv_file = open('Raman_060320.csv')
csv_reader = csv.reader(csv_file, delimiter='\t')
for skip in range(3):
    next(csv_reader)
for row in csv_reader:
    print(row)
csv_file.close()
I am from the field of hydrology and don't know much about programming (I've just begun to learn), so I would appreciate all the help I can get.
As suggested by Damzaky, using pandas:
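import pandas as pd
# The file in the question is tab-delimited. Skip the first 3 lines and
# read the next 122, i.e. rows 4 - 125 of the file, without treating any
# of them as a header.
df = pd.read_csv('Raman_060320.csv', sep='\t', header=None, skiprows=3, nrows=122)
# Save to csv (this overwrites the original file)
df.to_csv('Raman_060320.csv', sep='\t', index=False, header=False)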
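The same row range can also be selected with the standard `csv` module by using `itertools.islice`. This is only a sketch: the helper name `rows_between` and the generated sample data are assumptions, and the tab delimiter is taken from the code in the question.

```python
import csv
import io
from itertools import islice

def rows_between(csvfile, first, last, delimiter='\t'):
    """Return rows first..last (1-based, inclusive) of a delimited file."""
    reader = csv.reader(csvfile, delimiter=delimiter)
    # islice counts from 0 and excludes the stop index.
    return list(islice(reader, first - 1, last))

# Hypothetical 200-line tab-delimited file standing in for the real data.
sample = io.StringIO("".join(f"{n}\tvalue{n}\n" for n in range(1, 201)))
kept = rows_between(sample, 4, 125)
print(len(kept), kept[0], kept[-1])
```

Because `islice` stops as soon as the last wanted row has been read, the rest of the file is never parsed.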
import csv
list3 = []
with open('**directory**') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        list3.append(row)
I'm completely new to data analysis using Python, and require some assistance.
The file I'm accessing contains data from 5 people (CSV file). There are 3 columns - participant number, pre-task Score, and post-task Score.
I'm essentially trying to access this file (using csv.DictReader) and manipulate the data. By this, I mean I want to calculate the difference between the post-task score and pre-task score, for each participant, and print this to the screen.
However, I'm not sure how to do this. I can print each row to the screen, and I can save each row in a list (as I've done above) - but I'm clueless as to how I am to manipulate/deal with this data. I'm wondering if there is something better than the module I'm currently using.
Calculating the difference between the second and third columns in a CSV file can be accomplished as follows:
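import csv
# In Python 3, open the file in text mode with newline='' (not 'rb').
with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # skip the header row, remove this next line if there is no header
    next(reader, None)
    for row in reader:
        # post-task score (third column) minus pre-task score (second column)
        difference = float(row[2]) - float(row[1])
        print(difference)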
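Since the question already uses `csv.DictReader`, the same calculation can be sketched with named columns instead of positions. The column names 'participant', 'pre' and 'post', the helper `score_differences` and the sample scores are all assumptions; the real header names in the file should be substituted.

```python
import csv
import io

def score_differences(csvfile, id_col='participant', pre_col='pre', post_col='post'):
    """Map each participant to post-task score minus pre-task score."""
    diffs = {}
    for row in csv.DictReader(csvfile):
        diffs[row[id_col]] = float(row[post_col]) - float(row[pre_col])
    return diffs

# Hypothetical sample data; the header names are assumed, not from the question.
sample = io.StringIO("participant,pre,post\n1,10,14\n2,8,7.5\n")
differences = score_differences(sample)
for participant, diff in differences.items():
    print(participant, diff)
```

Values read by the `csv` module are always strings, which is why each score is converted with `float` before subtracting.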
I am trying to add one duplicated column next to the existing column in my csv file. For example, a dataset looks like this.
A,B,C,D
D,E,F,G
Then to add one duplicated column.
A,A,B,B,C,C,D,D
D,D,E,E,F,F,G,G
Below is code I have tried but apparently it does not work.
import csv
with open('in.csv','r') as csvin:
    with open('out.csv', 'wb') as csvout:
        writer = csv.writer(csvout, lineterminator=',')
        reader = csv.reader(csvin, lineterminator=',')
        goal = []
        for line in reader:
            for i in range(1,len(line)+1,2):
                line.append(line[i])
            goal.append(line)
        writer.writerows(goal)
Any hints please?
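Well, you can do it succinctly as follows:
import csv
from itertools import repeat
# open the file, create a reader
with open('in.csv', newline='') as csvin:
    reader = csv.reader(csvin)
    for row in reader:
        # repeat(item, 2) yields each item twice, side by side
        row_ = [i for item in row for i in repeat(item, 2)]
        # now do whatever you want to do with row_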
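Putting the pieces together, a complete read, duplicate and write pass might look like the sketch below. The helper name `duplicate_columns` is an assumption, and in-memory buffers stand in for in.csv and out.csv so the example runs on its own.

```python
import csv
import io

def duplicate_columns(csvin, csvout):
    """Write every row of csvin to csvout with each field repeated once."""
    reader = csv.reader(csvin)
    writer = csv.writer(csvout, lineterminator='\n')
    for row in reader:
        writer.writerow([item for item in row for _ in range(2)])

# In-memory stand-ins for in.csv and out.csv, using the data from the question.
source = io.StringIO("A,B,C,D\nD,E,F,G\n")
result = io.StringIO()
duplicate_columns(source, result)
print(result.getvalue())
```

With real files in Python 3, both would be opened in text mode with `newline=''`, for example `open('out.csv', 'w', newline='')`, rather than with 'wb'.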
I think that
for line in reader:
    new_line = []
    for i in range(0, len(line)):
        new_line.append(line[i])
        new_line.append(line[i])
    goal.append(new_line)
is not the best implementation, but it should work.
I am trying to read in a table from a .CSV file which should have 5 columns.
But some rows have corrupt data, which gives them more than 5 columns.
How do I reject those rows and continue reading?
Using
temp = read_table(folder + r'\temp.txt', sep=r'\t')
just gives an error and stops the program.
I am new to Python...please help
Thanks
Look into using Python's csv module.
Without testing the damaged file it is difficult to say whether this will do the trick. However, csv.reader returns each row of the file as a list of strings, so you could check whether the list has 5 elements and only keep the rows that do.
A code example:
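import csv
out = []
# In Python 3, open in text mode with newline='' (not 'rb').
with open('file.csv', newline='') as csvfile:
    # Tab-separated, as in the question; change the delimiter if needed.
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        # Keep only rows with exactly 5 columns.
        if len(row) == 5:
            out.append(row)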
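A small extension of the same idea also records which lines were rejected, which can help when tracking down the corrupt data. This is a sketch only: the helper name `split_rows` and the sample data are assumptions, and `reader.line_num` is the csv module's count of source lines read so far.

```python
import csv
import io

def split_rows(csvfile, expected=5, delimiter='\t'):
    """Return (good_rows, rejected_line_numbers) for a delimited file."""
    good, rejected = [], []
    reader = csv.reader(csvfile, delimiter=delimiter)
    for row in reader:
        if len(row) == expected:
            good.append(row)
        else:
            # line_num is the number of source lines read so far.
            rejected.append(reader.line_num)
    return good, rejected

# Hypothetical sample: the second line has 6 fields instead of 5.
sample = io.StringIO("a\tb\tc\td\te\n1\t2\t3\t4\t5\t6\n7\t8\t9\t10\t11\n")
good, rejected = split_rows(sample)
print(len(good), rejected)
```

The rows in `good` are still lists of strings, so any numeric columns need converting before further processing.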