I'm new to Python and don't know why I get this kind of error.
I have a csv file from which I read some data. I compare the data with another csv file and if I find similarities I want to copy some data from the second file. However here's the problem:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = csv.reader(woche, delimiter=';')
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        for row2 in readCSV1:
                            if row2[6]==zug: #find all trainnumbers
                                #write data to csv
                                writer = csv.writer(fund, delimiter=';')
                                writer.writerow(row2)
In my second for loop it appears as if the first row is skipped. Every time the for loop starts, the first row of data isn't written in the new file.
Dataset I read from
Dataset that is written
Can someone tell me why the first one is always missing?
If I add a dummy-row in the dataset I read from I get exactly what I want written, but I don't want to add all dummies.
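A csv reader is an iterator, so it gets 'used up' as you iterate over it. Both loops share the same reader, so the inner loop does not start again at the top of the file: it continues from the row after the one the outer loop has just read. That is why the second loop never sees the row that triggered the match; the first loop has already 'used' it. We can show the exhaustion by making a simple reader over a list of strings: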
>>> import csv
>>> test = ["foo", "bar", "baz"]
>>> reader = csv.reader(test)
>>> for row in reader:
...     print(row)
...
['foo']
['bar']
['baz']
>>> for row in reader:
...     print(row)
...
>>>
The second time it prints nothing because the iterator has already been exhausted. If your dataset is not too large you can solve this by storing the rows in a list, and thus in memory, instead:
data = [row for row in readCSV1]
If the document is too big you will need to make a second file reader and feed it to a second csv reader.
The final code becomes:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = [row for row in csv.reader(woche, delimiter=';')]
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        for row2 in readCSV1:
                            if row2[6]==zug: #find all trainnumbers
                                #write data to csv
                                writer = csv.writer(fund, delimiter=';')
                                writer.writerow(row2)
That is the version that stores the rows in memory. If you want to use a second reader instead, it becomes:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = [row for row in csv.reader(woche, delimiter=';')]
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        with open('WeselVorlageRE5Woche.csv') as woche2:
                            readCSV2 = csv.reader(woche2, delimiter=';')
                            for row2 in readCSV2:
                                if row2[6]==zug: #find all trainnumbers
                                    #write data to csv
                                    writer = csv.writer(fund, delimiter=';')
                                    writer.writerow(row2)
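The difference between the two behaviours can be reproduced on a small in-memory example. This is only a sketch: the sample rows, the two-column layout and the helper names `matches_with_reader` and `matches_with_list` are made up for illustration and only loosely mirror the train data from the question.

```python
import csv
import io

# Hypothetical sample data: column 0 is a stop, column 1 a train number.
SAMPLE = "A;101\nB;101\nC;202\nD;101\n"

def matches_with_reader(text, zug):
    """Nested loops over ONE reader: the inner loop continues where the outer stopped."""
    reader = csv.reader(io.StringIO(text), delimiter=';')
    found = []
    for row1 in reader:
        if row1[1] == zug:
            for row2 in reader:  # row1 itself has already been consumed
                if row2[1] == zug:
                    found.append(row2)
    return found

def matches_with_list(text, zug):
    """Store the rows in a list first, so every pass starts at the first row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=';'))
    return [row for row in rows if row[1] == zug]

print(matches_with_reader(SAMPLE, "101"))  # the first matching row is missing
print(matches_with_list(SAMPLE, "101"))    # all three matching rows
```

With the shared reader the row that triggered the match never reaches the inner loop, which is the same symptom as the missing first row in the question.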
Related
from operator import itemgetter
COLS = 15,21,27
COLS1 = 16,22,28
filename = "result.csv"
getters = itemgetter(*(col-1 for col in COLS))
getters1 = itemgetter(*(col-1 for col in COLS1))
with open('result.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        row = zip(getters(row))
    for row1 in csv.reader(csvfile):
        row1 = zip(getters1(row1))
print(row)
print(row1)
with open('results1.csv', "w", newline='') as f:
    fieldnames = ['AAA','BBB']
    writer = csv.writer(f,delimiter=",")
    for row in row:
        writer.writerow(row)
        writer.writerow(row1)
I am getting a NameError: name 'row1' is not defined error. I want to write each of the COLS in a separate column in the results1 file. How would I go about this?
There are a few things going on in the code that lead to errors.
First is the way csv.reader(csvfile) works in Python. The reader is an iterator: each step of the for loop reads the next line from the file, parses it as CSV, and returns it as a list of strings rather than the single string that plain file iteration would give. This is fine for a lot of use cases, but the issue we run into here is that when you run:
for row in csv.reader(csvfile):
    row = zip(getters(row))
the loop keeps pulling rows from csv.reader(csvfile) and only stops when it runs out of data in the "result.csv" file. So if you want to use the data from each row, you need to store it somewhere before the file is exhausted. I think that's what you are trying to achieve with row = zip(getters(row)), but row is also the loop variable: it is overwritten on every iteration, so after the loop only the result for the last row survives and nothing else has been stored.
In order to store your csv data, try this:
data = []
for row in csv.reader(csvfile):
    data.append(list(getters(row)))  # the selected columns of this row
This will store the selected columns of every row in a list called data.
The second error is the one you are asking about: row1 is not defined. The first for loop has already read through the entire file, so when you call csv.reader again on the same file object in the second for loop there is nothing left to read, and the file does not rewind to the beginning by itself. The body of the second loop never runs, so row1 is never assigned, and the first later use of row1 raises the NameError.
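The effect is easy to reproduce. In this sketch io.StringIO stands in for the open file and the contents are made up; calling seek(0) to rewind is one possible remedy, offered here as an assumption rather than something the question's code does.

```python
import csv
import io

# io.StringIO stands in for an open file; the contents are made up.
csvfile = io.StringIO("a,b\nc,d\n")

first_pass = [row for row in csv.reader(csvfile)]
second_pass = [row for row in csv.reader(csvfile)]  # file position is already at the end

csvfile.seek(0)                                      # rewind to the start
third_pass = [row for row in csv.reader(csvfile)]

print(first_pass)   # both rows
print(second_pass)  # empty: nothing was left to read
print(third_pass)   # both rows again after rewinding
```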
There are a couple of ways to fix this. You could close the file, reopen it and start from the beginning again. Or you could store both selections at the same time in the first for loop, like this:
data = []
data1 = []
for row in csv.reader(csvfile):
    data.append(list(getters(row)))
    data1.append(list(getters1(row)))
Now you will have 3 columns of data in both data and data1.
Now for writing to the "results1.csv" file. Here you used row both as the loop variable and as the iterable, which is confusing and loses the original value. Also, calling writer.writerow(row) and then writer.writerow(row1) writes two separate lines rather than putting the columns side by side. Try this instead:
with open('results1.csv', "w", newline='') as f:
    writer = csv.writer(f,delimiter=",")
    for left, right in zip(data, data1):
        writer.writerow(list(left) + list(right))
It also looks like you want to add headers for each column with fieldnames = ['AAA','BBB']. With csv.writer a header is just another row, so write it with writer.writerow(fieldnames) before the data; csv.DictWriter and writeheader() are only needed if your rows are dictionaries. You need one name per output column:
with open('results1.csv', "w", newline='') as f:
    fieldnames = ['A','A','A','B','B','B']
    writer = csv.writer(f,delimiter=",")
    writer.writerow(fieldnames)
    for left, right in zip(data, data1):
        writer.writerow(list(left) + list(right))
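As a variation, the selection and the writing can also be done in a single pass without intermediate lists. The following is a minimal sketch under stated assumptions: the six-column input, the column choices and the header names are made up, and io.StringIO stands in for both files so the example runs on its own.

```python
import csv
import io
from operator import itemgetter

# Made-up input with six columns; the 1-based column choices are illustrative.
source = io.StringIO("a1,b1,c1,d1,e1,f1\na2,b2,c2,d2,e2,f2\n")
COLS = 1, 3, 5
COLS1 = 2, 4, 6
getters = itemgetter(*(col - 1 for col in COLS))
getters1 = itemgetter(*(col - 1 for col in COLS1))

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])  # header row
for row in csv.reader(source):
    # itemgetter returns a tuple here, so the two selections concatenate with +
    writer.writerow(getters(row) + getters1(row))

print(out.getvalue())
```

Because each input row is written as soon as it is read, nothing has to be kept in memory and the reader is only iterated once.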
Hope this helps!
SOLVED! Proper code in bottom half.
--
I included a link to a sample from a long CSV file, with any identifying data changed. I need every row that begins with "W", and then every row before it as well. The code I included writes every "W" row to a list. The final line, of course, doesn't work. I would like every previous row from a "W" row to be in its own list. Ultimately, I will combine them into an 8-column csv (using the zip function?), since each of these are 2-row associated data.
(To clarify - the associated rows in the whole table are sometimes in sets of 2, and sometimes in sets of 3. So I can't approach it by counting rows. I don't care about the 3rd row, when it exists. The key is the "W" rows)
What am I not figuring out? I've been searching all day and am not nailing this.
Sample from table
import csv
rows1 = [] #all 'W' rows
rows2 = [] #all rows before 'W' rows
with open ('Businesses.csv', 'r') as file1:
    csvreader = csv.reader(file1)
    for row in csvreader:
        previousrow = row
        if row[0].startswith('W'):
            rows1.append(row)
            rows2.append(previousrow)
#FIGURED IT OUT! With this -
import csv
rows1 = [] #all 'W' rows
rows2 = [] #all rows before 'W' rows
with open ('Businesses.csv', 'r') as file1:
    csvreader = csv.reader(file1)
    templist = []
    for row in csvreader:
        if not row[0].startswith('W'):
            templist.append(row)
        if row[0].startswith('W'):
            rows1.append(row)
            rows2.append(templist[-1])
On the last line, row - 1 takes a list and subtracts 1 from it, which isn't meaningful (it raises a TypeError).
What will work is to store the current row temporarily (as previousrow, for example) at the end of each iteration. Then, when you check the next row and it matches, save both that row and the stored previous row to your accumulator lists.
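A minimal sketch of that idea, assuming made-up sample data in which each 'W' row follows the row it belongs to and is sometimes followed by a third row that should be ignored. The names `w_rows`, `before_rows` and `combined` are illustrative.

```python
import csv
import io

# Made-up sample: each 'W' row follows its partner row; a third row may follow.
SAMPLE = "Acme,1\nW100,x\nnote,z\nBolt,2\nW200,y\n"

w_rows = []       # every row whose first cell starts with 'W'
before_rows = []  # the row immediately before each 'W' row

previous_row = None
for row in csv.reader(io.StringIO(SAMPLE)):
    if row and row[0].startswith('W') and previous_row is not None:
        w_rows.append(row)
        before_rows.append(previous_row)
    previous_row = row  # update AFTER the check, so it still holds the earlier row

# Join each preceding row with its 'W' row to get one combined record per pair.
combined = [before + w for before, w in zip(before_rows, w_rows)]
print(combined)
```

Updating `previous_row` only after the check is the important detail: assigning it at the top of the loop would make it identical to the current row.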
I have about 20 rows of data with 4 columns each.
How do I print only a certain row. Like for instance print only Row 15, Row 16 and Row 17.
When I try row[0] it only prints out the first column but not the entire row. I am confused here.
Right now I can read out each of the rows by doing:
for lines in reader:
    print(lines)
If you have a relatively small dataset you can read the entire thing and select the rows you want:
with open("somefile.csv", newline="") as f:
    table = list(csv.reader(f))  # reads entire file
for row in table[14:17]:  # rows 15, 16 and 17, counting from 1
    print(row)
You could also save a bit of time by only reading as much as you need:
with open("somefile.csv") as f:
    reader = csv.reader(f)
    for _ in range(14):
        next(reader)  # discard
    for _ in range(3):
        print(next(reader))
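The same skip-then-read pattern can be written with itertools.islice from the standard library. This is a sketch on made-up data (20 generated rows of 4 columns, with io.StringIO standing in for the file), not a drop-in replacement for your file handling.

```python
import csv
import io
from itertools import islice

# Made-up data: 20 rows of 4 columns, standing in for the real file.
text = "".join(f"r{n},a,b,c\n" for n in range(1, 21))

reader = csv.reader(io.StringIO(text))
# islice(reader, 14, 17) skips rows 1-14 and yields rows 15, 16 and 17 (1-based).
selected = list(islice(reader, 14, 17))
for row in selected:
    print(row)
```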
Try the iloc method:
import pandas as pd
# suppose you want only the 15th data row (read_csv treats the first line as the header)
data=pd.read_csv("data.csv").iloc[14,:]
I have a CSV file with 100 rows.
How do I read specific rows?
I want to read say the 9th line or the 23rd line etc?
You could use a list comprehension to filter the file like so:
with open('file.csv') as fd:
    reader=csv.reader(fd)
    interestingrows=[row for idx, row in enumerate(reader) if idx in (28,62)]
# now interestingrows contains the 28th and the 62nd row after the header
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python
import csv
with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)
print(rows[8])
print(rows[22])
You simply skip the necessary number of rows:
with open("test.csv", newline="") as infile:
    r = csv.reader(infile)
    for i in range(8): # count from 0 to 7
        next(r)     # and discard the rows
    row = next(r)   # "row" contains row number 9 now
You could read all of them and then use normal list indexing to find them.
with open('bigfile.csv', newline='') as longishfile:
    reader=csv.reader(longishfile)
    rows=[r for r in reader]
print(rows[9])   # 10th row, since indexes start at 0
print(rows[88])  # 89th row
If you have a massive file, this can kill your memory but if the file's got less than 10,000 lines you shouldn't run into any big slowdowns.
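For a file that is too large to hold in memory, the rows can be picked while streaming instead. The following is a sketch, assuming 1-based row numbers and a hypothetical helper called `pick_rows`; the sample data is generated in memory so the example runs on its own.

```python
import csv
import io

def pick_rows(lines, wanted):
    """Return {row_number: row} for the 1-based row numbers in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    picked = {}
    for number, row in enumerate(csv.reader(lines), start=1):
        if number in wanted:
            picked[number] = row
        if number >= last:
            break  # nothing further is needed, so stop reading
    return picked

# Made-up data: 100 rows, standing in for the real file.
sample = io.StringIO("".join(f"line{n},x\n" for n in range(1, 101)))
rows = pick_rows(sample, [9, 23])
print(rows[9])
print(rows[23])
```

Only one row is held at a time, and reading stops as soon as the highest requested row has been seen.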
You can do something like this :
with open('raw_data.csv') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))
    row_you_want = readCSV[index_of_row_you_want]
Maybe this could help you: using pandas you can easily do it with loc.
'''
Reading the 3rd record using pandas -> (loc)
Note: the index starts from 0,
so to read the third record use 3-1 -> 2
`loc[[2], :]` -> `[2]` selects the third row and `:` selects all of its columns
'''
import pandas as pd
df = pd.read_csv('employee_details.csv')
df.loc[[2],:]
I have two files, the first one is called book1.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
The second file is called book2.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,4
My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.
The problem with my code seems to be that it is not appending correctly, nor is it selecting just the index that I want to copy. It also gives an error that I have selected an incorrect index position. The output is as follows:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,41,2,3,4,5
Here is my code:
import csv
with open('C:/Users/SAM/Desktop/book2.csv','a') as csvout:
    write=csv.writer(csvout, delimiter=',')
    with open('C:/Users/SAM/Desktop/book1.csv','rb') as csvfile1:
        read=csv.reader(csvfile1, delimiter=',')
        header=next(read)
        for row in read:
            row[5]=write.writerow(row)
What should I do to get this to append properly?
Thanks for any help!
What about something like this? I read in both books and, for every row in book2, append the last element of the matching book1 row; the results are stored in a list. Then I write the contents of that list to a new .csv file.
import csv
with open('book1.csv', 'r', newline='') as book1:
    with open('book2.csv', 'r', newline='') as book2:
        reader1 = csv.reader(book1, delimiter=',')
        reader2 = csv.reader(book2, delimiter=',')
        both = []
        fields = next(reader1) # read header row
        next(reader2) # read and ignore header row
        for row1, row2 in zip(reader1, reader2):
            row2.append(row1[-1])
            both.append(row2)
        with open('output.csv', 'w', newline='') as output:
            writer = csv.writer(output, delimiter=',')
            writer.writerow(fields) # write a header row
            writer.writerows(both)
Although some of the code above will work it is not really scalable and a vectorised approach is needed. Getting to work with numpy or pandas will make some of these tasks easier so it is great to learn a bit of it.
You can download pandas from the Pandas Website
# Load pandas
import pandas as pd
# Load each file into a pandas DataFrame
data1 = pd.read_csv('csv1.csv', sep=',')
data2 = pd.read_csv('csv2.csv', sep=',')
# Now add 'header5' from data1 to data2
data2['header5'] = data1['header5']
# Save it back to csv, without the extra index column
data2.to_csv('output.csv', index=False)
Regarding the "error that I have selected an incorrect index position," I suspect this is because you're using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5] then to get the 5 you would do print(A[4]).
Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:
import csv
# Open the two input files, which I've renamed to be more descriptive,
# and also an output file that we'll be creating
with open("four_col.csv", mode='r') as four_col, \
     open("five_col.csv", mode='r') as five_col, \
     open("five_output.csv", mode='w', newline='') as outfile:
    four_reader = csv.reader(four_col)
    five_reader = csv.reader(five_col)
    five_writer = csv.writer(outfile)
    _ = next(four_reader)  # Ignore headers for the 4-column file
    headers = next(five_reader)
    five_writer.writerow(headers)
    for four_row, five_row in zip(four_reader, five_reader):
        last_col = five_row[-1]  # Or use five_row[4]
        four_row.append(last_col)
        five_writer.writerow(four_row)
Why not read the files line by line and use the -1 index to find the last item?
endings=[]
with open('book1.csv') as book1:
    for line in book1:
        # if not header line:
        endings.append(line.rstrip('\n').split(',')[-1])  # strip the newline first
linecounter=0
with open('book2.csv') as book2:
    for line in book2:
        # if not header line:
        print(line.rstrip('\n')+','+endings[linecounter]) # or write to file
        linecounter+=1
You should also catch errors if row numbers don't match.
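One way to detect a mismatch is itertools.zip_longest, which pads the shorter input with None instead of silently stopping the way zip does. This is a minimal sketch: the helper name `append_last_column` is hypothetical, io.StringIO stands in for the two files, and raising ValueError is just one possible reaction.

```python
import csv
import io
from itertools import zip_longest

def append_last_column(five_col_lines, four_col_lines):
    """Append the last cell of each five-column row to the matching four-column row."""
    five_reader = csv.reader(five_col_lines)
    four_reader = csv.reader(four_col_lines)
    header = next(five_reader)  # keep the header of the five-column file
    next(four_reader)           # skip the header of the four-column file
    merged = [header]
    pairs = zip_longest(five_reader, four_reader)
    for number, (five_row, four_row) in enumerate(pairs, start=1):
        if five_row is None or four_row is None:
            raise ValueError(f"row count mismatch at data row {number}")
        merged.append(four_row + [five_row[-1]])
    return merged

book1 = io.StringIO("h1,h2,h3,h4,h5\n1,2,3,4,5\n1,2,3,4,5\n")
book2 = io.StringIO("h1,h2,h3,h4,h5\n1,2,3,4\n1,2,3,4\n")
result = append_last_column(book1, book2)
print(result)
```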