I'm trying to create an array in Python, so I can access the last cell in it without defining how many cells there are in it.
Example:
from csv import reader
a = []
i = -1
with open("ccc.csv", "r") as f:
    csv_reader = reader(f)
    for row in csv_reader:
        a[i] = row
        i = i - 1
Here I'm trying to take the first row of the CSV file and put it in the last cell of the array, so that I can write the rows in reverse order to another file.
In this case, I don't know how many rows are in the CSV file, so I can't size the array to match the number of rows in advance.
I tried to use a.append(row), but that puts the first row in the first cell of the array, and I want it to end up in the last cell.
Read all the rows in the normal order, and then reverse the list:
from csv import reader

with open('ccc.csv') as f:
    a = list(reader(f))
a.reverse()
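Since the stated goal is to write the rows back out in reverse order to another file, here is a minimal sketch of that last step (the output name out.csv is just a placeholder):

```python
from csv import reader, writer

def reverse_csv(src, dst):
    # Read every row of src, then write them to dst in reverse order.
    with open(src, newline='') as f:
        rows = list(reader(f))
    with open(dst, 'w', newline='') as f:
        writer(f).writerows(reversed(rows))
```

reversed() avoids building a second list the way rows[::-1] would.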
First up, your current code is going to raise an IndexError, because the list starts out empty, so a[-1] doesn't point to any element at all.
The method you're looking for is list.insert, which lists inherit from the mutable sequence types. list.insert takes two arguments: the index at which to insert and the value to be inserted.
To rewrite your current code for this, you'd end up with something like
from csv import reader

a = []
with open("ccc.csv", "r") as f:
    csv_reader = reader(f)
    for row in csv_reader:
        a.insert(0, row)
This reverses the contents of the CSV file, which you can then write to a new file or use as you need.
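One caveat: list.insert(0, ...) shifts every existing element, so it costs O(n) per call. If the files get large, collections.deque gives the same reversal with O(1) appendleft; a small sketch:

```python
from collections import deque
from csv import reader

def read_reversed(path):
    # appendleft pushes each new row to the front in O(1),
    # so the final deque holds the rows in reverse file order.
    rows = deque()
    with open(path, newline='') as f:
        for row in reader(f):
            rows.appendleft(row)
    return list(rows)
```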
I am new to Python and I am attempting to read .csv files sequentially in a for or while loop (I have about 250 .csv files). The aim is to read each .csv file and keep only the rows where a specific column (let's call it "wanted_column") is non-empty, preserving all columns of those rows. Then, save all those rows to a new .csv file.
Hence, at the end, I want to have 250 .csv files, each containing every column of every row that has a non-empty value in the "wanted_column".
Hopefully this is clear. I would appreciate any ideas.
George
I wrote the code below just to give you an idea of how to do it. Beware that this code does not check for any errors. Its behavior is undefined if one of your CSV files is empty, if a file can't be found, or if the column you defined does not exist in one of the files. There could be more cases, so you would want to build error checking around it. Also, how your CSV is parsed depends greatly on Python's csv package.
Now to the code explanation. The "paths" variable accepts a string, a tuple, or a list; if you give it a string, it is converted to a one-element tuple. You give that variable the file(s) that you want to work with.
The "column" variable should be a string. You need to build error checking for that if needed.
As for the routine itself: the function reads every CSV file in the paths list. For each file, it reads the first line and saves its contents in a variable (rowFields).
After that, it builds a dict of header fields (fields) mapping each column name (key) to its position (value). That dict is used to look up a column's position by name. Alternatively, you could go through each field once and, when a field matches the column name, save its position; that saved position can then be used instead of repeatedly searching the dict by name. This latter method should be the fastest.
The function then reads each remaining row of the CSV file. For every row, it checks whether the string in the column defined by the "column" variable has a length greater than zero; if so, it appends that row to the contentRows variable.
Once the function has finished reading the CSV file, it writes the contents of "rowFields" and "contentRows" to the CSV file named by the "outfile" variable. To make it easy, outfile is simply the input file name plus ".new"; you can just change that.
import csv

def getNoneEmpty(paths, column):
    if isinstance(paths, str):
        paths = (paths,)
    if not isinstance(paths, (list, tuple)):
        raise TypeError("paths has to be a string, list, or tuple")

    quotechar = '"'
    delimiter = ","
    lineterminator = "\n"

    for f in paths:
        outfile = f + ".new"  # change this area to how you want to generate the new file
        fields = {}
        rowFields = None
        contentRows = []
        with open(f, newline='') as csvfile:
            csvReader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar)
            rowFields = next(csvReader)
            for i in range(len(rowFields)):
                fields[rowFields[i]] = i
            for row in csvReader:
                if len(row[fields[column]]) != 0:
                    contentRows.append(row)
        with open(outfile, 'w', newline='') as csvfile:
            csvWriter = csv.writer(csvfile, delimiter=delimiter, quotechar=quotechar, quoting=csv.QUOTE_MINIMAL, lineterminator=lineterminator)
            csvWriter.writerow(rowFields)
            csvWriter.writerows(contentRows)
getNoneEmpty(["test.csv","test2.csv"], "1958")
test.csv content:
"Month","1958","1959","1960"
"JAN",115,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",11,420,472
"JUN",124,472,535
"JUL",158,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
test2.csv content:
"Month","1958","1959","1960"
"JAN",,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",,420,472
"JUN",,472,535
"JUL",,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
Hopefully it will work:
import csv

def main():
    temp = []
    with open(r'old_csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        for row in csv_reader:
            for x in row:
                temp.append(x)
    with open(r'new_csv', mode='w', newline='') as new_file:
        writer = csv.writer(new_file, delimiter=',', lineterminator='\n')
        for col in temp:
            list_ = col.split(',')
            writer.writerow(list_)
I'm new to Python, but I need help creating a script that will take in three different csv files, combine them, remove duplicates from the first column, remove any blank rows, and then change each revenue area to a number.
The three CSV files are setup the same.
The first column is a phone number and the second column is a revenue area (city).
The first column will need all duplicates & blank values removed.
The second column will have values like "Macon", "Marceline", "Brookfield", which will need to be changed to a specific value like:
Macon = 1
Marceline = 8
Brookfield = 4
And then if it doesn't match one of those values put a default value of 9.
Welcome to Stack Overflow!
Firstly, you'll want to be using the csv library for the "reader" and "writer" functions, so import the csv module.
Then, you'll want to open the new file to be written to, and use the csv.writer function on it.
After that, you'll want to define a set (I name it seen). This will be used to prevent duplicates from being written.
Write your headers (if you need them) to the new file using the writer.
Open your first old file, using the csv module's "reader". Iterate through the rows using a for loop, tracking each phone number in the "seen" set; if a number has already been seen, simply "continue" instead of writing the row. Repeat this for the next two files.
To assign the values to the cities, you'll want to define a dictionary that holds the old names as the keys, and new values for the names as the values.
So, your code should look something like this:
import csv

myDict = {'Macon': 1, 'Marceline': 8, 'Brookfield': 4}
seen = set()
newFile = open('newFile.csv', 'w', newline='')  # newline='' prevents the writer from writing extra newlines, preventing empty rows
writer = csv.writer(newFile)
writer.writerow(['Phone Number', 'City'])  # This will write a header row for you.

# Open each file, read each row, skip empty rows, skip duplicate phone numbers,
# change the value of "City" (defaulting to 9), and write to the new file.
for inName in ('firstFile.csv', 'secondFile.csv', 'thirdFile.csv'):
    with open(inName, newline='') as inFile:
        for row in csv.reader(inFile):
            if not any(row):
                continue  # skip blank rows
            if row[0] in seen:
                continue  # skip rows whose phone number was already seen
            seen.add(row[0])
            row[1] = myDict.get(row[1], 9)  # unknown cities get the default value 9
            writer.writerow(row)

# Close the output file
newFile.close()
I have not tested this myself, but it is very similar to two different programs that I wrote, and I have attempted to combine them into one. Let me know if this helps, or if there is something wrong with it!
-JCoder96
I am trying to read a csv file and parse the data, returning a row only if its date (the start_date column) is before September 6, 2010, then printing the corresponding values from the words column in ascending order. I can accomplish the first half using the following:
import csv

with open('sample_data.csv', 'rb') as f:
    read = csv.reader(f, delimiter=',')
    for row in read:
        if row[13] <= '1283774400':
            print(row[13] + "\t \t" + row[16])
It returns the correct start_date range, and corresponding word column values, but they are not returning in ascending order which would display a message if done correctly.
I have tried to use the sort() and sorted() functions, after creating an empty list to populate then appending it to the rows, but I am just not sure where or how to incorporate that into the existing code, and have been terribly unsuccessful. Any help would be greatly appreciated.
Just read the list, filter it according to the "< date" criterion, and sort it according to field 13 as an integer.
Note that a common mistake would be to compare as ASCII strings (which may appear to work), but integer conversion is really required to avoid sort problems.
import csv

with open('sample_data.csv', 'r') as f:
    read = csv.reader(f, delimiter=',')
    # csv has a title row, which we have to skip (comment out if there is no title)
    title_row = next(read)
    # read the csv and filter it to keep only the earlier rows
    lines = filter(lambda row: int(row[13]) < 1283774400, read)
    # sort the filtered list numerically on field 13
    slist = sorted(lines, key=lambda row: int(row[13]))

# print the result, including the title line
for row in [title_row] + slist:
    # print(row[13] + "\t \t" + row[16])
    print(row)
I imported my CSV File and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance if I wanted the value in the 2nd row, 2nd column.
Also how would I go about adding the two values together? Thanks.
import csv
import numpy as np

f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(np.array(row))
f.close()
There is no need to use the csv module.
This code reads the csv file and prints the value of the cell in the second row, second column. I am assuming that fields are separated by commas.
with open("Test.csv") as fo:
table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
print table[1][1]
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
#!/usr/bin/python
import csv
# import numpy as np

csvaslist = []
f = open("TechCrunchcontinentalUSA.csv")
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()

# Now your data is in a list of lists. Everything past this point is just playing
# Add together a couple of arbitrary values...
print(int(csvaslist[2][7]) + int(csvaslist[11][7]))

# Add using a conditional...
print("\nNow let's see what Facebook has received...")
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print(sublist)
        fbsum += int(sublist[7])
print("Facebook has received", fbsum)
I've commented out lines at a couple of points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then recasting as int, float, or whatever you need. This works because the csv file has been turned into a list of lists.
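As a variation, the per-company sum in the loop above can also be written as a generator expression; a small sketch, with the function name chosen just for illustration:

```python
def company_total(rows, name, field=7):
    # Equivalent of the explicit loop: keep only the rows whose first
    # field matches the company name, cast the funding field to int, and sum.
    return sum(int(row[field]) for row in rows if row[0] == name)
```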
To get specific values within your array/file and add them together:
import csv

f = open("Test.csv")
csv_f = list(csv.reader(f))
f.close()

# prints the value in the second row, second column of your file
print(csv_f[1][1])

# prints the sum of two specific values (here, second row/second column and first row/first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])
print(total)
I am an absolute programming novice trying to work with some csv files. Though what I am trying to do overall is more complex, I am currently stuck on this problem:
The csv files I have contain a fixed number of 'columns' and a variable number of rows. What I want to do is open each csv file in a directory, while in memory store the files values to a 2d list, and then pull one 'column' of data from that list. By doing this in a loop, I could append a list with one column of data from each csv file.
When I do this for a single file, it works:
csvFile = 'testdata.csv'
currentFile = csv.reader(open(csvFile), delimiter=';')
errorValues = []
for data in currentFile:
    rows = [r for r in currentFile]          # Store current csv file into a 2d list
    errorColumn = [row[34] for row in rows]  # Get position 34 of each row in 2D list
    errorColumn = filter(None, errorColumn)  # Filter out empty strings
    errorValues.append(errorColumn)          # Append one 'column' of data to overall list
When I try to loop it for all files in my directory, I get a 'list index out of range' error:
dirListing = os.listdir(os.getcwd())
errorValues = []
for dataFile in dirListing:
    currentFile = csv.reader(open(dataFile), delimiter=';')
    for data in currentFile:
        rows = [r for r in currentFile]          # Store current csv file into a 2d list
        errorColumn = [row[34] for row in rows]  # Get position 34 of each row in 2D list
        errorColumn = filter(None, errorColumn)  # Filter out empty strings
        errorValues.append(errorColumn)          # Append one 'column' of data to overall list
        errorColumn = []                         # Clear out errorColumn for next iteration
The error occurs at 'errorColumn = [row[34] for row in rows]'. I have tried all sorts of ways to do this, always failing to an index out of range error. The fault is not with my csv files as I have used the working script to test them one by one. What could be the problem?
Many thanks for any help.
I'm a bit surprised that the error you mention happens at the [r for r in currentFile] stage; at worst, your rows list would be empty...
Are you 100% sure all your lines have at least 35 columns? That you don't have an empty line somewhere, perhaps at the very end? It'd be worth checking whether
errorColumn = [row[34] for row in rows if row]
still gives an error, provided that you first got rid of the for data in currentFile line (which you don't use and which, more importantly, consumes your currentFile, leaving you with rows == []).
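The "consumed iterator" point is easy to demonstrate: a csv.reader can only be iterated once, so a second pass over it yields nothing.

```python
import csv
import io

# An in-memory stand-in for an open CSV file.
f = io.StringIO("a;b\nc;d\n")
rdr = csv.reader(f, delimiter=';')
for _ in rdr:            # first pass consumes the reader
    pass
rows = [r for r in rdr]  # second pass: the iterator is already exhausted
print(rows)              # prints []
```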
The for loop goes through the lines of the CSV file, and the reader converts each line into a row of elements; this way, the data variable in the loop is already a row. The nested construct then also iterates through the open file. This is wrong.
There is also a problem with your open(). In Python 2, the file must be opened in binary mode.
Try the following (I did not put everything you wanted inside):
import csv
import os

dirListing = os.listdir(os.getcwd())
errorValues = []
rows = []  # empty list of rows initially
for fname in dirListing:
    f = open(fname, 'rb')  # open in binary mode (see the doc)
    reader = csv.reader(f, delimiter=';')
    errorColumn = []  # initialized for each file
    for row in reader:
        rows.append(row)  # store the current csv file in a 2d list
        if len(row) > 34:
            errorColumn.append(row[34])  # get position 34 of the row
    errorValues.append(errorColumn)
    f.close()  # you should always close your files
Beware! os.listdir() also returns the names of subdirectories. Try adding
if os.path.isfile(fname):
...
By the way, you should clearly describe what your actual goal is. There may be a better way to solve it, and you may be mentally fixed on the solution that came to mind first. Take advantage of this medium to have more eyes and more heads suggest solutions.