Searching a CSV file - python

I need to filter and do some math on data coming from CSV files.
I've written a simple Python script to isolate the rows I need (they should contain certain keywords like "Kite"), but my script does not work and I can't find out why. Can you tell me what is wrong with it? Another thing: once I get to the chosen row(s), how can I point to each (comma-separated) column?
Thanks in advance.
R.
import csv
with open('sales-2013.csv', 'rb') as csvfile:
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite":
            print ",".join(row)

You are reading the file in bytes. In Python 3, csv.reader expects text, so change the command to open('filepathAndName.csv', 'r') (in Python 2, 'rb' is fine for the csv module). The second mistake is that each row is a list of cells, so row == "Kite" compares a list with a string and is never true. In this case you have to use if "Kite" in row:, which is true when one of the cells is exactly "Kite".
with open('sales-2013.csv', 'rb') as csvfile: # <- change 'rb' to 'r'
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite": # <- this would be better: if "Kite" in row:
            print ",".join(row)
Read this:
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files

To find the rows that contain the word "Kite", you should use
for row in sales: # here you iterate over every row (a *list* of cells)
    if "Kite" in row:
        # do stuff
        pass
Now that you know how to find the required rows, you can access the desired cells by indexing the rows. For example, if you want to select the second cell of a row, you simply do
cell = row[1] # remember, indexes start with 0
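Putting the two steps together, here is a minimal Python 3 sketch. The sample data, the column layout (product, quantity, price) and the helper name rows_with_keyword are made up for illustration; only the "Kite" keyword comes from the question.

```python
import csv
import io

# Hypothetical sample standing in for sales-2013.csv (the columns are assumed).
SAMPLE = """product,quantity,price
Kite,3,12.50
Ball,10,2.00
Stunt Kite,1,40.00
"""

def rows_with_keyword(lines, keyword):
    """Return the rows in which at least one cell contains keyword."""
    reader = csv.reader(lines)
    return [row for row in reader if any(keyword in cell for cell in row)]

matches = rows_with_keyword(io.StringIO(SAMPLE), "Kite")
for row in matches:
    # Cells are addressed by position: row[0] is the first column.
    print(row[0], int(row[1]) * float(row[2]))
```

Note that any(keyword in cell for cell in row) matches substrings inside a cell (so "Stunt Kite" is kept), whereas "Kite" in row only matches a cell that is exactly "Kite". Which one is wanted depends on the data.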

Related

Problems isolating a column based on what is written in it in a CSV file with Python

I have a massive CSV file with over 12 million rows and 4 columns. The first column is just an index running from 0 to 12 million, the second holds the name of the region, the third is a city (each city is a number), and the fourth holds the number of visitors.
I would like to plot the third column against the fourth (one on the x axis and one on the y axis), but only for a certain region. I have tried many things to read just the part of the file that says 'Essex', but nothing works. The second column is called "region", and the region I am interested in is 'Essex'. Any help? Thank you!
You should look into the standard library module called "csv". Something like this should get you going:
import csv
with open("name of csv file") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Check for Essex
        if row[1] == 'Essex':
            # Do whatever
            pass
The above example assumes there is no header line in your CSV file. If you do have a header, you can skip it like this:
with open("name of csv file") as csvfile:
    # Read and skip a header line.
    header = csvfile.readline()
    reader = csv.reader(csvfile)
    for row in reader:
        # As above
        pass
Alternatively, look into csv.DictReader(), which uses the header line to let you access each cell by column name.
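As a rough sketch of the csv.DictReader route: the sample data, the column names "city" and "visitors", and the helper select_region are assumptions for illustration; only the "region" column name and the 'Essex' value come from the question.

```python
import csv
import io

# Hypothetical sample; only the "region" column name comes from the question.
SAMPLE = """index,region,city,visitors
0,Essex,101,250
1,Kent,202,90
2,Essex,103,40
"""

def select_region(lines, region):
    """Return (city, visitors) pairs for the rows of one region."""
    reader = csv.DictReader(lines)  # the first line supplies the keys
    return [(int(r["city"]), int(r["visitors"]))
            for r in reader if r["region"] == region]

points = select_region(io.StringIO(SAMPLE), "Essex")
xs = [city for city, _ in points]          # values for the x axis
ys = [visitors for _, visitors in points]  # values for the y axis
print(xs, ys)
```

Because the reader yields one row at a time, only the matching rows are kept in memory, which matters for a file with millions of rows. The xs and ys lists can then be passed to whatever plotting library is in use.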

Trouble Accessing and Aggregating a Specific Column within a File

I need to access a .txt file and add up the integers in all of the last columns using an accumulation pattern. I know I've accessed and opened the file correctly, however, it's the aggregation of the last column that's stumped me. The current code is giving me a 0 (while playing around with it, I've run into a few different errors.)
I'm aware that each line is a string and that I need to split the lines into a list of values in order to continue. Any suggestions/help would be extremely helpful.
the_File = open("DoT_Info.txt", "r")
num_accidents = 0
for char in the_File.readlines():
    new_splt = char.split(',')
    num_accidents += int(new_splt[-1])
print('Total Incidents: ', num_accidents)
the_File.close()
Something like this ought to work, assuming the last element in each row is always the number of accidents.
import csv
with open("DoT_Info.txt", "r") as f:
    reader = csv.reader(f)
    # next(reader) - do this if there is a header row
    num_accidents = sum(int(row[-1]) for row in reader)
print('Total Incidents: ', num_accidents)
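If the file may contain a header, blank lines, or non-numeric values in the last column, the same accumulation pattern can be written more defensively. This is only a sketch: the sample contents and the helper name sum_last_column are made up, since the real layout of DoT_Info.txt is not shown.

```python
import csv
import io

# Hypothetical stand-in for DoT_Info.txt; the real layout is unknown.
SAMPLE = """state,year,incidents
TX,2012,14

CA,2012,9
NY,2012,n/a
"""

def sum_last_column(lines):
    """Add up the last cell of every row, ignoring cells that are not integers."""
    total = 0
    for row in csv.reader(lines):
        if not row:              # blank line: csv.reader yields an empty list
            continue
        try:
            total += int(row[-1])
        except ValueError:       # header text or a malformed value
            continue
    return total

total = sum_last_column(io.StringIO(SAMPLE))
print('Total Incidents:', total)
```

Silently skipping bad values is a design choice; if every row is expected to be valid, letting the ValueError propagate is the safer way to notice corrupt data.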

Need a way to take three csv files and put into one as well as remove duplicates and replace values in Python

I'm new to Python but I need help creating a script that will take in three different csv files, combine them together, remove duplicates from the first column as well as remove any rows that are blank, then change a revenue area to a number.
The three CSV files are set up the same way.
The first column is a phone number and the second column is a revenue area (city).
The first column will need all duplicates & blank values removed.
The second column will have values like "Macon", "Marceline", "Brookfield", which will need to be changed to a specific value like:
Macon = 1
Marceline = 8
Brookfield = 4
And then if it doesn't match one of those values put a default value of 9.
Welcome to Stack Overflow!
Firstly, you'll want to be using the csv library for the "reader" and "writer" functions, so import the csv module.
Then, you'll want to open the new file to be written to, and use the csv.writer function on it.
After that, you'll want to define a set (I name it seen). This will be used to prevent duplicates from being written. Since duplicates are judged on the first column, the set stores phone numbers.
Write your headers (if you need them) to the new file using the writer.
Open your first old file, using csv module's "reader". Iterate through the rows using a for loop, and add each phone number to the "seen" set. If a phone number is blank or has been seen, simply "continue" instead of writing to the file. Repeat this for the next two files.
To assign the values to the cities, you'll want to define a dictionary that holds the old names as the keys, and new values for the names as the values. Looking the city up with .get() lets you fall back to the default value of 9.
So, your code should look something like this (Python 3):
import csv

# Map each revenue area to its number; any other city becomes 9.
myDict = {'Macon': 1, 'Marceline': 8, 'Brookfield': 4}
seen = set()  # phone numbers that have already been written

# newline='' prevents the writer from writing extra newlines, preventing empty rows.
with open('newFile.csv', 'w', newline='') as newFile:
    writer = csv.writer(newFile)
    writer.writerow(['Phone Number', 'City'])  # This will write a header row for you.
    # Open each file in turn, read each row, skip empty rows and duplicate
    # phone numbers, change the value of "City", write to the new file.
    for name in ('firstFile.csv', 'secondFile.csv', 'thirdFile.csv'):
        with open(name, 'r', newline='') as inFile:
            for row in csv.reader(inFile):
                if len(row) < 2 or not row[0].strip() or row[0] in seen:
                    continue
                seen.add(row[0])
                row[1] = myDict.get(row[1], 9)
                writer.writerow(row)
# The output file is closed automatically when the with block ends.
I have not tested this myself, but it is very similar to two different programs that I wrote, and I have attempted to combine them into one. Let me know if this helps, or if there is something wrong with it!
-JCoder96

Creating individual rows based on a cell value in a column

I am looking to take a CSV file and sort the file using Python 2.7 to get an individual value based on two columns for a block and lot. My data currently looks like the example in the link below:
Beginning
Based on the lot value, I want to use Python to automatically create extra lines in a new CSV, so that the values look like this:
End Result
So I know that I need to read each row and check the cell in the lot column. If it contains a ",", the row is copied to the new CSV once for each comma-separated value: first with only the value before the first comma, then with the second, the third, and so on.
After the commas are separated out, the ranges will be handled in a similar way in a third CSV. If there is a single value, the whole row will be copied as is.
Thank you for the help in advance.
This should work.
In Python 2 on Windows, open the files in binary mode or else you get double newlines.
I assumed the fields are separated by ; because the cells contain ,.
The code first splits by , and then checks for ranges.
print line is for debugging.
Error checking is left as an exercise for the reader.
Code:
import csv
file_in = csv.reader(open('input.csv', 'rb'), delimiter=';')
file_out = csv.writer(open('output.csv', 'wb'), delimiter=';')
for i, line in enumerate(file_in):
    if i == 0:
        # write header
        file_out.writerow(line)
        print line
        continue
    for j in line[1].split(','):
        if len(j.split('-')) > 1:
            # lines with -
            start = int(j.split('-')[0])
            end = int(j.split('-')[1])
            for k in xrange(start, end + 1):
                line[1] = k
                file_out.writerow(line)
                print line
        else:
            # lines with ,
            line[1] = j
            file_out.writerow(line)
            print line
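The expansion logic can also be isolated from the file handling, which makes it easier to check on its own. The following is a Python 3 sketch (the answer above targets Python 2.7); the helper names expand_lots and expand_row and the sample row are hypothetical, and the lot column is assumed to sit at index 1 as in the code above.

```python
def expand_lots(spec):
    """Expand a lot specification such as "1,3-5" into ["1", "3", "4", "5"]."""
    lots = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            # A range: include both end points.
            start, end = part.split('-', 1)
            lots.extend(str(n) for n in range(int(start), int(end) + 1))
        elif part:
            lots.append(part)
    return lots

def expand_row(row, lot_index=1):
    """Return one copy of row per individual lot value."""
    return [row[:lot_index] + [lot] + row[lot_index + 1:]
            for lot in expand_lots(row[lot_index])]

# Made-up row: block, lot specification, street.
print(expand_row(["12", "1,3-5", "Main St"]))
```

Each returned list can be passed to a csv.writer's writerow. Building a new list per lot, instead of overwriting line[1] in place, avoids sharing one mutable row between output lines.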

How to print csv rows in ascending order Python

I am trying to read a CSV file, parse the data, and return a row's start_date only if the date is before September 6, 2010, then print the corresponding values from the words column in ascending order. I can accomplish the first half using the following:
import csv
with open('sample_data.csv', 'rb') as f:
    read = csv.reader(f, delimiter =',')
    for row in read:
        if row[13] <= '1283774400':
            print(row[13]+"\t \t"+row[16])
It returns the correct start_date range, and corresponding word column values, but they are not returning in ascending order which would display a message if done correctly.
I have tried to use the sort() and sorted() functions, after creating an empty list to populate then appending it to the rows, but I am just not sure where or how to incorporate that into the existing code, and have been terribly unsuccessful. Any help would be greatly appreciated.
Just read the rows, filter them according to the < date criterion, and sort them by the column at index 13, converted to an integer.
Note that a common mistake is to compare the values as strings (which may appear to work), but integer conversion is really required to avoid sorting problems.
import csv
with open('sample_data.csv', 'r') as f:
    read = csv.reader(f, delimiter=',')
    # csv has a title, we have to skip it (comment if no title)
    title_row = next(read)
    # read csv and filter out to keep only earlier rows
    lines = filter(lambda row: int(row[13]) < 1283774400, read)
    # sort the filtered rows by the column at index 13, as numerical
    slist = sorted(lines, key=lambda row: int(row[13]))
# print the result, including title line
for row in [title_row] + slist:
    #print(row[13]+"\t \t"+row[16])
    print(row)
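To see why the integer conversion matters, here is a small self-contained illustration. The timestamp values are made up, apart from the 1283774400 cutoff used above.

```python
# Made-up timestamps stored as text, as they come out of csv.reader.
timestamps = ['1283774400', '999999999', '1280000000']

as_text = sorted(timestamps)               # character-by-character comparison
as_numbers = sorted(timestamps, key=int)   # numeric comparison

print(as_text)
print(as_numbers)

# The text comparison also gives the wrong answer when filtering:
# '999999999' is numerically smaller, but '9' sorts after '1'.
print('999999999' <= '1283774400')
print(int('999999999') <= 1283774400)
```

String comparison only agrees with numeric comparison when all values have the same number of digits, which is why the original filter may appear to work on some data.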
