Writing in separate columns in csv python - python

from operator import itemgetter
COLS = 15,21,27
COLS1 = 16,22,28
filename = "result.csv"
getters = itemgetter(*(col-1 for col in COLS))
getters1 = itemgetter(*(col-1 for col in COLS1))
with open('result.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        row = zip(getters(row))
    for row1 in csv.reader(csvfile):
        row1 = zip(getters1(row1))
print(row)
print(row1)
with open('results1.csv', "w", newline='') as f:
    fieldnames = ['AAA','BBB']
    writer = csv.writer(f,delimiter=",")
    for row in row:
        writer.writerow(row)
        writer.writerow(row1)
I am getting a NameError: name 'row1' is not defined error. I want to write each of the COLS in a separate column in the results1 file. How would I go about this?

So, there are a few things going on in the code that lead to errors.
First is the way csv.reader(csvfile) works in Python. A csv.reader is an iterator over the open file: each step reads the next line and returns it as a list of strings, rather than the single string of text you get from the plain file object. This is fine for a lot of use cases, but the issue we are running into here is that when you run:
for row in csv.reader(csvfile):
    row = zip(getters(row))
the for loop reads every row in the file and only stops when it runs out of data in "result.csv". So if you want to use the data from each row, you need to store it somewhere before the file runs out. I think that's what you are trying to achieve with row = zip(getters(row)), but row is also the loop variable, so it is overwritten on every iteration. After the loop only the last row's value is left, and nothing else has been stored.
In order to store your csv data, try this:
data = []
for row in csv.reader(csvfile):
    data.append(list(getters(row)))
This will store the selected columns of every row in a list called data.
Then, the second error is the one you are asking about, which is row1 not being defined. This happened because the first for loop ran through every row in the csv file. When you call csv.reader again in the second for loop, it can't read anything: the file position is already at the end, and nothing tells it to start over at the beginning. The body of the second loop never runs, so row1 is never assigned, and when you later use it in print(row1) or writer.writerow(row1) it doesn't exist.
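To see the exhausted reader for yourself, here is a minimal sketch. It uses an in-memory io.StringIO with made-up data in place of the real file, so the names and values are only for illustration:

```python
import csv
import io

# In-memory stand-in for an open CSV file (hypothetical data).
csvfile = io.StringIO("1,2\n3,4\n")

first_pass = [row for row in csv.reader(csvfile)]
# The file position is now at the end, so a new reader finds nothing.
second_pass = [row for row in csv.reader(csvfile)]

csvfile.seek(0)  # rewind to the start of the file
third_pass = [row for row in csv.reader(csvfile)]

print(first_pass, second_pass, third_pass)
```

The second pass comes back empty, which is exactly why the loop that assigns row1 never runs.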
There are a couple of ways to fix this. You could rewind the file with csvfile.seek(0) (or close it and reopen it) and read it again from the beginning. Or you could collect both sets of columns in the same for loop, like this:
data = []
data1 = []
for row in csv.reader(csvfile):
    data.append(list(getters(row)))
    data1.append(list(getters1(row)))
Now you will have 3 columns of data in both data and data1.
Now for writing to the "results1.csv" file. Here you used row both as the for loop variable and as the iterable to run through, which overwrites the data you are looping over. Also, calling writer.writerow(row) and then writer.writerow(row1) writes two separate lines instead of putting the values side by side in one line. Try this instead:
with open('results1.csv', "w", newline='') as f:
    writer = csv.writer(f,delimiter=",")
    for row, row1 in zip(data, data1):
        writer.writerow(list(row) + list(row1))
Now it also looks like you want to add headers for each column with fieldnames = ['AAA','BBB']. csv.writer has no dedicated header method, but a header is just another row, so you can write it with writer.writerow(fieldnames) before the data. (csv.DictWriter and its writeheader() method are the alternative when your rows are dicts.) You need one name per output column:
with open('results1.csv', "w", newline='') as f:
    fieldnames = ['A','A','A','B','B','B']
    writer = csv.writer(f,delimiter=",")
    writer.writerow(fieldnames)  # header row
    for row, row1 in zip(data, data1):
        writer.writerow(list(row) + list(row1))
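Putting the pieces together, a minimal end-to-end sketch might look like the following. The sample data, the column numbers and the select_columns helper are all made up for illustration (the real file has more columns), and io.StringIO stands in for the input and output files:

```python
import csv
import io
from operator import itemgetter

# Hypothetical sample input: 4 columns, 3 rows (stands in for result.csv).
SAMPLE = "a1,b1,c1,d1\na2,b2,c2,d2\na3,b3,c3,d3\n"

COLS = 1, 3    # 1-based columns for the first group (assumed for this sketch)
COLS1 = 2, 4   # 1-based columns for the second group

getters = itemgetter(*(col - 1 for col in COLS))
getters1 = itemgetter(*(col - 1 for col in COLS1))

def select_columns(infile, outfile, header):
    """Read every row once and write both column groups side by side."""
    writer = csv.writer(outfile)
    writer.writerow(header)  # the header is just an ordinary first row
    for row in csv.reader(infile):
        writer.writerow(list(getters(row)) + list(getters1(row)))

out = io.StringIO()
select_columns(io.StringIO(SAMPLE), out, ["A", "A", "B", "B"])
result = out.getvalue()
print(result)
```

Because each input row is handled once and written immediately, nothing has to be kept in memory and the reader is never iterated twice.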
Hope this helps!

Related

Updating a specific csv column based on randomname

My code pulls a random name from a csv file. When a button is pressed i want my code to search through the csv file, and update the cell next to the name generated previously in the code.
The variable in which the name is stored in is called name
The index which pulls the random name from the csv file is stored in the variable y
The function looks like this. I have asked this question previously however have had no luck in receiving answers, so i have made edits to the function and hopefully made it more clear.
namelist_file = open('StudentNames&Questions.csv')
reader = csv.reader(namelist_file)
writer = csv.writer(namelist_file)
rownum=0
array=[]
for row in reader:
    if row == name:
        writer.writerow([y], "hello")
Only the first two columns of the csv file are relevant
This is the function which pulls a random name from the csv file.
def NameGenerator():
    namelist_file = open('StudentNames&Questions.csv')
    reader = csv.reader(namelist_file)
    rownum=0
    array=[]
    for row in reader:
        if row[0] != '':
            array.append(row[0])
            rownum=rownum+1
    length = len(array)-1
    i = random.randint(1,length)
    global name
    name = array[i]
    return name
There are a number of issues with your code:
You're trying to have both a reader object and a writer on the same file at the same time. Instead, you should read the file contents in, make any changes necessary and then write the whole file back out at the end.
You need to open the file in write mode in order to actually make changes to the contents. Currently, you don't specify what mode you're using so it defaults to read mode.
row is actually a list representing all data in the row. Therefore, it cannot be equal to the name you're searching, only the 0th index might be.
The following should work:
with open('StudentNames&Questions.csv', 'r', newline='') as infile:
    reader = csv.reader(infile)
    data = [row for row in reader]
for row in data:
    if row and row[0] == name:
        # csv values are strings, so convert before incrementing
        row[1] = str(int(row[1] or 0) + 1)
with open('StudentNames&Questions.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(data)
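Here is the same read, modify, write-back pattern as a small self-contained sketch. The increment_cell helper, the demo file and the names in it are hypothetical, and it assumes the second column holds a counter:

```python
import csv
import os
import tempfile

def increment_cell(path, name):
    """Add 1 to the second column of every row whose first column equals name."""
    with open(path, newline='') as infile:
        rows = list(csv.reader(infile))
    for row in rows:
        if len(row) >= 2 and row[0] == name:
            # Values read from a CSV file are strings; treat blank as 0.
            row[1] = str(int(row[1] or 0) + 1)
    with open(path, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(rows)
    return rows

# Demo on a throwaway file with made-up names.
demo_path = os.path.join(tempfile.mkdtemp(), 'students.csv')
with open(demo_path, 'w', newline='') as f:
    csv.writer(f).writerows([['Ann', '0'], ['Bob', '2']])

updated = increment_cell(demo_path, 'Bob')
print(updated)
```

The file is fully read and closed before it is opened again for writing, so the reader and the writer never share a file handle.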

Add one identical column to next column respectively

I am trying to add one duplicated column next to the existing column in my csv file. For example, a dataset looks like this.
A,B,C,D
D,E,F,G
Then to add one duplicated column.
A,A,B,B,C,C,D,D
D,D,E,E,F,F,G,G
Below is code I have tried but apparently it does not work.
import csv
with open('in.csv','r') as csvin:
    with open('out.csv', 'wb') as csvout:
        writer = csv.writer(csvout, lineterminator=',')
        reader = csv.reader(csvin, lineterminator=',')
        goal = []
        for line in reader:
            for i in range(1,len(line)+1,2):
                line.append(line[i])
            goal.append(line)
        writer.writerows(goal)
Any hints please?
Well, you can do it succinctly as follows:
from itertools import repeat
# open the file, create a reader
for row in reader:
    row_ = [i for item in row for i in repeat(item, 2)]
    # now do whatever you want to do with row_
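For completeness, a runnable sketch of that idea with the reading and writing filled in. The duplicate_columns helper is a made-up name, and io.StringIO stands in for in.csv and out.csv:

```python
import csv
import io
from itertools import repeat

def duplicate_columns(infile, outfile):
    """Write each input row with every value repeated twice in a row."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow([i for item in row for i in repeat(item, 2)])

source = io.StringIO("A,B,C,D\nD,E,F,G\n")  # the example data from the question
target = io.StringIO()
duplicate_columns(source, target)
doubled_lines = target.getvalue().splitlines()
print(doubled_lines)
```

Note that the default delimiter and line terminator are left alone here; setting lineterminator=',' as in the question would join all rows into one line.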
I think that
for line in reader:
    doubled = []
    for i in range(0, len(line)):
        doubled.append(line[i])
        doubled.append(line[i])
    goal.append(doubled)
is not the best implementation, but it should work.

Trying to convert a CSV file to int in Python [duplicate]

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0) # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader) # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        pass  # Process row ...
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Skip the unwanted line on the file object first, then pass the file to csv.DictReader.
with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
You would normally use next(incsv), which advances the iterator one row, so you skip the header. The other option (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
    pass  # process
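A small sketch of the islice approach, skipping 2 rows instead of 30 so the result is easy to check. The data is generated on the spot and is only for illustration:

```python
import csv
import io
from itertools import islice

# Five made-up rows: "0,0", "1,1", "2,4", "3,9", "4,16".
lines = "".join(f"{n},{n * n}\n" for n in range(5))
reader = csv.reader(io.StringIO(lines))

# islice(reader, 2, None) discards the first 2 rows and yields the rest.
remaining = [row for row in islice(reader, 2, None)]
print(remaining)
```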
Use csv.DictReader instead of csv.reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"] etc.
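As a sketch of how that could look for the question's task, assuming a file whose header row is the column numbers "0" and "1" (the sample values are made up):

```python
import csv
import io

# Stand-in for all16.csv: first line is the header, the rest is data.
sample = io.StringIO("0,1\n10,7.5\n11,2.25\n12,9.0\n")

reader = csv.DictReader(sample)  # consumes the header row as field names
least_value = min(float(row["1"]) for row in reader)
print(least_value)
```

The header never reaches float() because DictReader uses it for the keys rather than yielding it as data.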
Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
next(csv_data) # skip first row (calls csv_data.__next__())
for row in csv_data:
    print(row) # first row printed is the second row of the file
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method, passing it a sample of the file's text, to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
# sample is the text read with csvfile.read(1024) above
if csv.Sniffer().has_header(sample):
    next(reader)  # skip the header row
for data_row in reader:
    pass  # do something with the row
This might be a very old question, but with pandas we have a very easy solution:
import pandas as pd
data = pd.read_csv('all16.csv', skiprows=1, header=None)
data[1].min()
With skiprows=1 we skip the first row, and header=None stops pandas from treating the next row as column names. The columns are then numbered from 0, so data[1].min() finds the least value in the second column.
The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Because this is related to something I was doing, I'll share here.
What if you're not sure whether there's a header, and you also don't feel like importing Sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
import csv
array = []
# Let's say there's 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # read first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know the header of the column at index one, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader) # Convert to list
    data.pop(0) # Removes the first row
for row in data:
    print(row)
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
Just add [1:]. Example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]
That works for me in IPython.
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header; there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM (only if the file actually starts with one).
        txt = f.read().lstrip('\ufeff')
    # Remove header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]
    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))
    for row in csv_rows:
        value = row[INDEX_HERE]
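A possible alternative, not part of the answer above: Python's 'utf-8-sig' codec drops a leading BOM when one is present and otherwise behaves like UTF-8, so the manual stripping can be left to open(). A minimal sketch, where read_csv_with_bom and the demo file are made-up names:

```python
import csv
import os
import tempfile

def read_csv_with_bom(csv_path, delimiter=','):
    """Return (header, rows); a leading UTF-8 BOM is removed by the codec."""
    with open(csv_path, newline='', encoding='utf-8-sig') as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)  # None if the file is empty
        rows = list(reader)
    return header, rows

# Demo file written with a BOM (writing with 'utf-8-sig' adds one).
bom_path = os.path.join(tempfile.mkdtemp(), 'bom.csv')
with open(bom_path, 'w', newline='', encoding='utf-8-sig') as f:
    f.write('id,value\n1,3.5\n2,1.25\n')

header, rows = read_csv_with_bom(bom_path)
print(header, rows)
```

Letting csv.reader consume the file directly also keeps quoted fields that contain newlines intact, which splitting the text into lines first would not.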
A simple solution is to use csv.DictReader():
import csv
def read_csv(path):
    with open(path, 'r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"]) # Replace with the name of your column header.

Writing a filtered CSV file to a new file and iterating through a folder

I have been trying initially to create a program to go through one file and select certain columns that will then be moved to a new text file. So far I have
import os, sys, csv
os.chdir("C://Users//nelsonj//Desktop//Master_Project")
with open('CHS_2009_test.txt', "rb") as sitefile:
    reader = csv.reader(sitefile, delimiter=',')
    pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]
    for row in reader:
        new_cols = list(row[i] for i in pref_cols)
        print new_cols
I have been trying to use the csv functions to write the new file but I am continuously getting errors. I will eventually need to do this over a folder of files, but thought I would try to do it on one before tackling that.
Code I attempted to use to write this data to a new file
for row in reader:
    with open("CHS_2009_edit.txt", 'w') as file:
        new_cols = list(row[i] for i in pref_cols)
        newfile = csv.writer(file)
        newfile.writerows(new_cols)
This kind of works in that I get a new file, but it only prints the second row of values from my csv, i.e., not the header values, and it places commas in between each individual character instead of copying over the original columns as they were.
I am using PythonWin with Python 2.6(from ArcGIS)
Thanks for the help!
NEW UPDATED CODE
import os, sys, csv
path = ('C://Users//nelsonj//Desktop//Master_Project')
for filename in os.listdir(path):
    pref_cols = [0,1,2,4,6,8,10,12,14,18,20,22,24,26,30,34,36,40]
    with open(filename, "rb") as sitefile:
        with open(filename.rsplit('.',1)[0] + "_Master.txt", 'w') as output_file:
            reader = csv.reader(sitefile, delimiter=',')
            writer = csv.writer(output_file)
            for row in reader:
                new_row = list(row[i] for i in pref_cols)
                writer.writerow(new_row)
                print new_row
Getting list index out of range for the new_row, but it seems to still be processing the file. Only thing I can't get it to do now is loop through all files in my directory. Here's a hyperlink to Screenshot of data text file
Try this:
new_row = list(row[i] for i in pref_cols if i < len(row))
That should avoid the error by skipping any index the row is too short for, but it may not fix the underlying problem (some rows having fewer columns than expected). Would you paste your CSV file somewhere that I can access, and I'll fix this for you?
For your purpose of filtering, you don't have to treat the header differently from the rest of the data. You can go ahead remove the following block:
headers = reader.next()
for row in headers:
    new_header = list(row[i] for i in pref_cols)
    print new_header
Your code did not work because you treated headers as a list of rows, but headers is just one row.
Update
This update deals with writing the CSV data to a new file. You should move the open statement above the for row... loop, so the output file is opened once instead of being overwritten for every row:
with open("CHS_2009_edit.txt", 'w') as output_file:
    writer = csv.writer(output_file)
    for row in reader:
        new_cols = list(row[i] for i in pref_cols)
        writer.writerow(new_cols)
Update 2
This update deals with the header output problem. If you followed my suggestions, you should not have this problem. I don't know what your current code looks like, but it looks like you supply a string where the code expects a list. Here is the code that I tried on my system (using my made-up data) and it seems to work:
pref_cols = [...] # <<=== Should be set before entering the loop
with open('CHS_2009_test.txt', "rb") as sitefile:
    with open('CHS_2009_edit.txt', 'w') as output_file:
        reader = csv.reader(sitefile, delimiter=',')
        writer = csv.writer(output_file)
        for row in reader:
            new_row = list(row[i] for i in pref_cols)
            writer.writerow(new_row)
One thing to notice: I use writerow() to write a single row, where you use writerows() -- that makes a difference.
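To make that difference concrete, here is a small sketch. It is written for Python 3 (the thread itself uses Python 2), and the row values are made up:

```python
import csv
import io

new_cols = ["site", "2009", "CHS"]  # made-up values for one filtered row

wrong = io.StringIO()
# writerows() expects a list of rows, so each string is treated as a row
# and each character becomes its own field.
csv.writer(wrong).writerows(new_cols)

right = io.StringIO()
# writerow() writes the whole list as a single row.
csv.writer(right).writerow(new_cols)

wrong_lines = wrong.getvalue().splitlines()
right_lines = right.getvalue().splitlines()
print(wrong_lines)
print(right_lines)
```

The first result shows the "commas in between each individual character" symptom from the question; the second is the intended single line.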

