I am trying to read a csv file, and parse the data and return on row (start_date) only if the date is before September 6, 2010. Then print the corresponding values from row (words) in ascending order. I can accomplish the first half using the following:
import csv
with open('sample_data.csv', 'rb') as f:
read = csv.reader(f, delimiter =',')
for row in read:
if row[13] <= '1283774400':
print(row[13]+"\t \t"+row[16])
It returns the correct start_date range, and corresponding word column values, but they are not returning in ascending order which would display a message if done correctly.
I have tried to use the sort() and sorted() functions, after creating an empty list to populate then appending it to the rows, but I am just not sure where or how to incorporate that into the existing code, and have been terribly unsuccessful. Any help would be greatly appreciated.
Just read the list, filter the list according to the < date criteria and sort it according to the 13th row as integer
Note that the common mistake would be to filter as ASCII (which may appear to work), but integer conversion is relaly required to avoid sort problems.
import csv
with open('sample_data.csv', 'r') as f:
read = csv.reader(f, delimiter =',')
# csv has a title, we have to skip it (comment if no title)
title_row = next(read)
# read csv and filter out to keep only earlier rows
lines = filter(lambda row : int(row[13]) < 1283774400,read)
# sort the filtered list according to the 13th row, as numerical
slist = sorted(lines,key=lambda row : int(row[13]))
# print the result, including title line
for row in title_row+slist:
#print(row[13]+"\t \t"+row[16])
print(row)
Related
I'm trying to create an array in Python, so I can access the last cell in it without defining how many cells there are in it.
Example:
from csv import reader
a = []
i = -1
with open("ccc.csv","r") as f:
csv_reader = reader(f)
for row in csv_reader:
a[i] = row
i = i-1
Here I'm trying to take the first row in the CSV file and put it in the last cell on the array, in order to put it in reverse order on another file.
In this case, I don't know how many rows are in the CSV file, so I can not set the cells in the array as the number of the rows in the file
I tried to use f.append(row), but it inserts the values to the first cell of the array, and I want it to insert the values to the last cell of the array.
Read all the rows in the normal order, and then reverse the list:
from csv import reader
with open('ccc.csv') as f:
a = list(reader(f))
a.reverse()
First up, your current code is going to raise an index error on account of there being no elements, so a[-1] points to nothing at all.
The function you're looking for is list.insert which it inherits from the generic sequence types. list.insert takes two arguments, the index to insert a value in and the value to be inserted.
To rewrite your current code for this, you'd end up with something like
import dbf
from csv import reader
a = []
with open("ccc.csv", "r") as f:
csv_reader = reader(f)
for row in csv_reader:
a.insert(0, row)
This would reverse the contents of the csv file, which you can then write to a new file or use as you need
I'm new to Python but I need help creating a script that will take in three different csv files, combine them together, remove duplicates from the first column as well as remove any rows that are blank, then change a revenue area to a number.
The three CSV files are setup the same.
The first column is a phone number and the second column is a revenue area (city).
The first column will need all duplicates & blank values removed.
The second column will have values like "Macon", "Marceline", "Brookfield", which will need to be changed to a specific value like:
Macon = 1
Marceline = 8
Brookfield = 4
And then if it doesn't match one of those values put a default value of 9.
Welcome to Stack Overflow!
Firstly, you'll want to be using the csv library for the "reader" and "writer" functions, so import the csv module.
Then, you'll want to open the new file to be written to, and use the csv.writer function on it.
After that, you'll want to define a set (I name it seen). This will be used to prevent duplicates from being written.
Write your headers (if you need them) to the new file using the writer.
Open your first old file, using csv module's "reader". Iterate through the rows using a for loop, and add the rows to the "seen" set. If a row has been seen, simply "continue" instead of writing to the file. Repeat this for the next two files.
To assign the values to the cities, you'll want to define a dictionary that holds the old names as the keys, and new values for the names as the values.
So, your code should look something like this:
import csv
myDict = {'Macon' : 1, 'Marceline' : 8, 'Brookfield' : 4}
seen = set()
newFile = open('newFile.csv', 'wb', newline='') #newline argument will prevent the writer from writing extra newlines, preventing empty rows.
writer = csv.writer(newFile)
writer.writerow(['Phone Number', 'City']) #This will write a header row for you.
#Open the first file, read each row, skip empty rows, skip duplicate rows, change value of "City", write to new file.
with open('firstFile.csv', 'rb') as inFile:
for row in csv.reader(inFile):
if any(row):
row[1] = myDict[row[1]]
if row in seen:
continue
seen.add(row)
writer.writerow(row)
#Open the second file, read each row, skip if row is empty, skip duplicate rows, change value of "City", write to new file.
with open('secondFile.csv', 'rb') as inFile:
for row in csv.reader(inFile):
if any(row):
row[1] = myDict[row[1]]
if row in seen:
continue
seen.add(row)
writer.writerow(row)
#Open the third file, read each row, skip empty rows, skip duplicate rows, change value of "City", write to new file.
with open('thirdFile.csv', 'rb') as inFile:
for row in csv.reader(inFile):
if any(row):
row[1] = myDict[row[1]]
if row in seen:
continue
seen.add(row)
writer.writerow(row)
#Close the output file
newFile.close()
I have not tested this myself, but it is very similar to two different programs that I wrote, and I have attempted to combine them into one. Let me know if this helps, or if there is something wrong with it!
-JCoder96
I am trying to determine the type of data contained in each column of a .csv file so that I can make CREATE TABLE statements for MySQL. The program makes a list of all the column headers and then grabs the first row of data and determines each data type and appends it to the column header for proper syntax. For example:
ID Number Decimal Word
0 17 4.8 Joe
That would produce something like CREATE TABLE table_name (ID int, Number int, Decimal float, Word varchar());.
The problem is that in some of the .csv files the first row contains a NULL value that is read as an empty string and messes up this process. My goal is to then search each row until one is found that contains no NULL values and use that one when forming the statement. This is what I have done so far, except it sometimes still returns rows that contains empty strings:
def notNull(p): # where p is a .csv file that has been read in another function
tempCol = next(p)
tempRow = next(p)
col = tempCol[:-1]
row = tempRow[:-1]
if any('' in row for row in p):
tempRow = next(p)
row = tempRow[:-1]
else:
rowNN = row
return rowNN
Note: The .csv file reading is done in a different function, whereas this function simply uses the already read .csv file as input p. Also each row is ended with a , that is treated as an extra empty string so I slice the last value off of each row before checking it for empty strings.
Question: What is wrong with the function that I created that causes it to not always return a row without empty strings? I feel that it is because the loop is not repeating itself as necessary but I am not quite sure how to fix this issue.
I cannot really decipher your code. This is what I would do to only get rows without the empty string.
import csv
def g(name):
with open('file.csv', 'r') as f:
r = csv.reader(f)
# Skip headers
row = next(r)
for row in r:
if '' not in row:
yield row
for row in g('file.csv'):
print('row without empty values: {}'.format(row))
I have the following data in a file:
Sarah,10
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
I would like to keep the last three rows for each person. The output would be:
John,5
Sarah,7
Sarah,8
John,4
Sarah,2
In the example, the first row for Sarah was removed since there where three later rows. The rows in the output also maintain the same order as the rows in the input. How can I do this?
Additional Information
You are all amazing - Thank you so much. Final code which seems to have been deleted from this post is -
import collections
with open("Class2.txt", mode="r",encoding="utf-8") as fp:
count = collections.defaultdict(int)
rev = reversed(fp.readlines())
rev_out = []
for line in rev:
name, value = line.split(',')
if count[name] >= 3:
continue
count[name] += 1
rev_out.append((name, value))
out = list(reversed(rev_out))
print (out)
Since this looks like csv data, use the csv module to read and write it. As you read each line, store the rows grouped by the first column. Store the line number along with the row so that they can be written out maintaining the same order as the input. Use a bound deque to keep only the last three rows for each name. Finally, sort the rows and write them out.
import csv
by_name = defaultdict(lambda x: deque(x, maxlen=3))
with open('my_data.csv') as f_in
for i, row in enumerate(csv.reader(f_in)):
by_name[row[0]].append((i, row))
# sort the rows for each name by line number, discarding the number
rows = sorted(row[1] for value in by_name.values() for row in value, key=lambda row: row[0])
with open('out_data.csv', 'w') as f_out:
csv.writer(f_out).writerows(rows)
I need to filter and do some math on data coming from CSV files.
I've wrote a simple Pyhton script to isolate the rows I need to get (they should contain certain keywords like "Kite"), but my script does not work and I can't find why. Can you tell me what is wrong with it? Another thing: once I get to the chosen row/s, how can I point to each (comma separated) column?
Thanks in advance.
R.
import csv
with open('sales-2013.csv', 'rb') as csvfile:
sales = csv.reader(csvfile)
for row in sales:
if row == "Kite":
print ",".join(row)
You are reading the file in bytes. Change the open('filepathAndName.csv, 'r') command or convert your strings like "Kite".encode('UTF-8'). The second mistake could be that you are looking for a line with the word "Kite", but if "Kite" is a substring of that line it will not be found. In this case you have to use if "Kite" in row:.
with open('sales-2013.csv', 'rb') as csvfile: # <- change 'rb' to 'r'
sales = csv.reader(csvfile)
for row in sales:
if row == "Kite": # <- this would be better: if "Kite" in row:
print ",".join(row)
Read this:
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
To find the rows than contain the word "Kite", then you should use
for row in sales: # here you iterate over every row (a *list* of cells)
if "Kite" in row:
# do stuff
Now that you know how to find the required rows, you can access the desired cells by indexing the rows. For example, if you want to select the second cell of a row, you simply do
cell = row[1] # remember, indexes start with 0