CSV reading with Python

I have about 20 rows of data with 4 columns each.
How do I print only certain rows? For instance, print only rows 15, 16 and 17.
When I try row[0] it prints only the first column, not the entire row. I am confused here.
Right now I can print each of the rows by doing:
for lines in reader:
    print(lines)

If you have a relatively small dataset you can read the entire thing and select the rows you want:
import csv
with open("somefile.csv", newline="") as f:
    table = list(csv.reader(f))  # reads the entire file
for row in table[14:17]:  # rows 15, 16 and 17, counting from 1
    print(row)
You could also save a bit of time by reading only as much as you need:
import csv
with open("somefile.csv", newline="") as f:
    reader = csv.reader(f)
    for _ in range(14):
        next(reader)  # discard the first 14 rows
    for _ in range(3):
        print(next(reader))  # rows 15, 16 and 17
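The skip-then-read idea above can also be written with itertools.islice, which stops reading once the requested rows have been produced. This is a minimal sketch, assuming rows are counted from 1 and the file has no header line; the helper name rows_between and the in-memory sample data are made up for illustration.

```python
import csv
import io
from itertools import islice


def rows_between(lines, first, last):
    """Return rows first..last (1-based, inclusive) from an iterable of CSV lines."""
    reader = csv.reader(lines)
    # islice uses 0-based, end-exclusive bounds, hence first - 1
    return list(islice(reader, first - 1, last))


# Hypothetical sample: 20 rows with 4 columns each
sample = io.StringIO("".join(f"r{n},a,b,c\n" for n in range(1, 21)))
for row in rows_between(sample, 15, 17):
    print(row)
```

With a real file, pass the open file object (opened with newline="") in place of the StringIO.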

Try the iloc method:
import pandas as pd
# suppose you want only the 15th row (read_csv treats the first line as the header by default)
data = pd.read_csv("data.csv").iloc[14, :]

Related

Python Writer Skips first row

I'm new to python and don't know why I get this kind of error.
I have a csv file from which I read some data. I compare the data with another csv file and if I find similarities I want to copy some data from the second file. However here's the problem:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = csv.reader(woche, delimiter=';')
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        for row2 in readCSV1:
                            if row2[6]==zug: #find all trainnumbers
                                #write data to csv
                                writer = csv.writer(fund, delimiter=';')
                                writer.writerow(row2)
In my second for loop it appears as if the first row is skipped. Every time the for loop starts, the first row of data isn't written in the new file.
Can someone tell me why the first one is always missing?
If I add a dummy-row in the dataset I read from I get exactly what I want written, but I don't want to add all dummies.
A csv reader gets 'used up' if you iterate over it. This is why the second loop doesn't see the first row, because the first loop has already 'used' it. We can show this by making a simple reader over a list of terms:
>>> import csv
>>> test = ["foo", "bar", "baz"]
>>> reader = csv.reader(test)
>>> for row in reader:
...     print(row)
...
['foo']
['bar']
['baz']
>>> for row in reader:
...     print(row)
...
>>>
The second time it prints nothing because the iterator has already been exhausted. If your dataset is not too large you can solve this by storing the rows in a list, and thus in memory, instead:
data = [row for row in readCSV1]
If the document is too big you will need to make a second file reader and feed it to a second csv reader.
The final code becomes:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = [row for row in csv.reader(woche, delimiter=';')]
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        for row2 in readCSV1:
                            if row2[6]==zug: #find all trainnumbers
                                #write data to csv
                                writer = csv.writer(fund, delimiter=';')
                                writer.writerow(row2)
That version stores the rows in memory. If you want to use a second reader instead, it becomes:
with open('WeselVorlageRE5Woche.csv') as woche:
    with open('weselfund.csv','+a',newline='') as fund:
        readCSV1 = csv.reader(woche, delimiter=';')  # nothing is held in memory
        for row1 in readCSV1:
            if row[1]==row1[4]: #find starting time
                if row[3]==row1[1]: # find same train
                    if row[2]=='cancelled': # condition for taking row
                        zug=row1[6] #copy trainnumber
                        print(zug)
                        # a second, independent file object and reader
                        with open('WeselVorlageRE5Woche.csv') as woche2:
                            readCSV2 = csv.reader(woche2, delimiter=';')
                            for row2 in readCSV2:
                                if row2[6]==zug: #find all trainnumbers
                                    #write data to csv
                                    writer = csv.writer(fund, delimiter=';')
                                    writer.writerow(row2)
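Another way to get a second pass over the same data is to rewind the file object with seek(0) and build a fresh reader. The sketch below only demonstrates the exhaustion and the rewinding on a small in-memory file; the sample rows are invented.

```python
import csv
import io

# Hypothetical sample data standing in for the real file
f = io.StringIO("1;a\n2;b\n3;c\n")

reader = csv.reader(f, delimiter=';')
first_pass = [row for row in reader]   # consumes the reader
exhausted = [row for row in reader]    # nothing left to read

f.seek(0)                              # rewind the underlying file
second_pass = [row for row in csv.reader(f, delimiter=';')]

print(exhausted)                  # []
print(first_pass == second_pass)  # True
```

Rewinding inside a loop that is still iterating over the same file object would also disturb that outer loop, so this fits best when the outer rows come from a list or from a different file.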

How to create a matrix from a csv file?

I have been trying for two hours to create a table of values (a matrix) from a csv file, and so far I have only been able to create a column. I know this is going to be easy for everybody, but I can't seem to phrase it right when reading from a csv file, so would people please point me in the right direction?
import csv
file = open('data.csv', newline='')
reader = csv.reader(file)
for row in reader:
    print(row[0])
So far I can only print out the first column. Any advice, guys?
You can do it with a list comprehension:
import csv
with open('data.csv', newline='') as file:
    table = [row for row in csv.reader(file)]
print(table)
This will create a list of lists where each sublist is a row of the csv file.
In your code, row is a list of all the columns in one line, so
row[0] is the first column, row[1] is the second, etc.
You can write:
print(str(row))
to print all the columns, or iterate over them with:
for column in row:
    print(column)
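Once the rows are in a list of lists, columns can be read off by transposing with zip. This is a small sketch, assuming every cell is numeric and every row has the same length; the sample data is invented.

```python
import csv
import io

# Hypothetical contents of data.csv
sample = io.StringIO("1,2,3\n4,5,6\n")

# Convert every cell to float so the table behaves like a numeric matrix
matrix = [[float(cell) for cell in row] for row in csv.reader(sample)]

# zip(*matrix) regroups the values column by column
columns = [list(col) for col in zip(*matrix)]

print(matrix[1][2])  # row 1, column 2 (both 0-based)
print(columns[0])    # the first column
```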

Add one identical column to next column respectively

I am trying to add one duplicated column next to the existing column in my csv file. For example, a dataset looks like this.
A,B,C,D
D,E,F,G
Then to add one duplicated column.
A,A,B,B,C,C,D,D
D,D,E,E,F,F,G,G
Below is code I have tried but apparently it does not work.
import csv
with open('in.csv','r') as csvin:
    with open('out.csv', 'wb') as csvout:
        writer = csv.writer(csvout, lineterminator=',')
        reader = csv.reader(csvin, lineterminator=',')
        goal = []
        for line in reader:
            for i in range(1,len(line)+1,2):
                line.append(line[i])
                goal.append(line)
        writer.writerows(goal)
Any hints please?
Well, you can do it succinctly as follows:
import csv
from itertools import repeat
with open('in.csv', newline='') as csvin:  # open the file, create a reader
    reader = csv.reader(csvin)
    for row in reader:
        row_ = [i for item in row for i in repeat(item, 2)]
        # now do whatever you want to do with row_
I think that
goal = []  # the widened row for the current line
for i in range(len(line)):
    goal.append(line[i])
    goal.append(line[i])
is not the best implementation, but it should work.
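Putting the pieces together, a complete read-and-write version might look like the sketch below. It uses in-memory files so it can be run as is; the function name duplicate_columns is made up, and with real files you would pass objects opened with newline=''.

```python
import csv
import io


def duplicate_columns(src, dst):
    """Copy CSV rows from src to dst, writing every field twice in a row."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        doubled = [item for item in row for _ in range(2)]
        writer.writerow(doubled)


# Hypothetical in-memory stand-ins for in.csv and out.csv
src = io.StringIO("A,B,C,D\nD,E,F,G\n")
dst = io.StringIO()
duplicate_columns(src, dst)
print(dst.getvalue())
```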

CSV read specific row

I have a CSV file with 100 rows.
How do I read specific rows?
I want to read say the 9th line or the 23rd line etc?
You could use a list comprehension to filter the file like so:
import csv
with open('file.csv', newline='') as fd:
    reader = csv.reader(fd)
    interestingrows = [row for idx, row in enumerate(reader) if idx in (28, 62)]
# now interestingrows contains the 28th and the 62nd row after the header
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python
import csv
with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)
print(rows[8])   # 9th line
print(rows[22])  # 23rd line
You simply skip the necessary number of rows:
import csv
with open("test.csv", newline="") as infile:
    r = csv.reader(infile)
    for i in range(8):  # count from 0 to 7
        next(r)  # and discard the rows
    row = next(r)  # "row" contains row number 9 now
You could read all of them and then use normal list indexing to find them.
import csv
with open('bigfile.csv', newline='') as longishfile:
    reader = csv.reader(longishfile)
    rows = [r for r in reader]
print(rows[9])
print(rows[88])
If you have a massive file, this can kill your memory, but if the file has fewer than 10,000 lines you shouldn't run into any big slowdowns.
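For a file too large to hold in memory, one option is to walk the reader once, keep only the wanted lines, and stop as soon as the last one has been seen. This is a sketch under the assumption that lines are counted from 1; pick_rows and the sample data are hypothetical.

```python
import csv
import io


def pick_rows(lines, wanted):
    """Return {line_number: row} for the 1-based line numbers in wanted."""
    wanted = set(wanted)
    last = max(wanted, default=0)
    found = {}
    for number, row in enumerate(csv.reader(lines), start=1):
        if number in wanted:
            found[number] = row
        if number >= last:
            break  # nothing further is needed, so stop reading
    return found


# Hypothetical 100-line file held in memory
sample = io.StringIO("".join(f"{n},value{n}\n" for n in range(1, 101)))
rows = pick_rows(sample, [9, 23])
print(rows[9])
print(rows[23])
```

Line numbers beyond the end of the file are simply absent from the returned dictionary.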
You can do something like this:
with open('raw_data.csv') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))
row_you_want = readCSV[index_of_row_you_want]
Maybe this could help you: using pandas you can easily do it with loc.
'''
Reading the 3rd record using pandas -> (loc)
Note: the index starts from 0
To read the third record, use 3-1 -> 2
`loc[[2], :]` -> `[2]` selects the third row and `:` -> all of its columns
'''
import pandas as pd
df = pd.read_csv('employee_details.csv')
df.loc[[2], :]

Copying one column of a CSV file and adding it to another file using python

I have two files, the first one is called book1.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
The second file is called book2.csv, and looks like this:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,4
My goal is to copy the column that contains the 5's in book1.csv to the corresponding column in book2.csv.
The problem with my code seems to be that it is not appending correctly, nor is it selecting just the index that I want to copy. It also gives an error that I have selected an incorrect index position. The output is as follows:
header1,header2,header3,header4,header5
1,2,3,4
1,2,3,4
1,2,3,41,2,3,4,5
Here is my code:
import csv
with open('C:/Users/SAM/Desktop/book2.csv','a') as csvout:
    write=csv.writer(csvout, delimiter=',')
    with open('C:/Users/SAM/Desktop/book1.csv','rb') as csvfile1:
        read=csv.reader(csvfile1, delimiter=',')
        header=next(read)
        for row in read:
            row[5]=write.writerow(row)
What should I do to get this to append properly?
Thanks for any help!
What about something like this? I read in both books and, for every row in book2, append the last element of the matching book1 row, storing the result in a list. Then I write the contents of that list to a new .csv file.
import csv
with open('book1.csv', newline='') as book1:
    with open('book2.csv', newline='') as book2:
        reader1 = csv.reader(book1, delimiter=',')
        reader2 = csv.reader(book2, delimiter=',')
        both = []
        fields = next(reader1)  # read header row
        next(reader2)  # read and ignore header row
        for row1, row2 in zip(reader1, reader2):
            row2.append(row1[-1])
            both.append(row2)
with open('output.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(fields)  # write a header row
    writer.writerows(both)
Although some of the code above will work, it is not really scalable, and a vectorised approach is a better fit for larger files. Learning a bit of numpy or pandas will make some of these tasks easier.
You can download pandas from the Pandas Website
# Load pandas
import pandas as pd
# Load each file into a pandas DataFrame (which is built on numpy arrays)
data1 = pd.read_csv('csv1.csv', sep=',')
data2 = pd.read_csv('csv2.csv', sep=',')
# Now add 'header5' from data1 to data2 (rows are matched by their row index)
data2['header5'] = data1['header5']
# Save it back to csv, without writing the index as an extra column
data2.to_csv('output.csv', index=False)
Regarding the "error that I have selected an incorrect index position," I suspect this is because you're using row[5] in your code. Indexing in Python starts from 0, so if you have A = [1, 2, 3, 4, 5] then to get the 5 you would do print(A[4]).
Assuming the two files have the same number of rows and the rows are in the same order, I think you want to do something like this:
import csv
# Open the two input files, which I've renamed to be more descriptive,
# and also an output file that we'll be creating
with open("four_col.csv", mode='r') as four_col, \
     open("five_col.csv", mode='r') as five_col, \
     open("five_output.csv", mode='w', newline='') as outfile:
    four_reader = csv.reader(four_col)
    five_reader = csv.reader(five_col)
    five_writer = csv.writer(outfile)
    _ = next(four_reader)  # Ignore headers for the 4-column file
    headers = next(five_reader)
    five_writer.writerow(headers)
    for four_row, five_row in zip(four_reader, five_reader):
        last_col = five_row[-1]  # or use five_row[4]
        four_row.append(last_col)
        five_writer.writerow(four_row)
Why not read the files line by line and use the -1 index to find the last item?
endings = []
with open('book1.csv') as book1:
    next(book1)  # skip the header line
    for line in book1:
        endings.append(line.rstrip('\n').split(',')[-1])
with open('book2.csv') as book2:
    print(next(book2).rstrip('\n'))  # header line, unchanged
    for linecounter, line in enumerate(book2):
        print(line.rstrip('\n') + ',' + endings[linecounter])  # or write to file
You should also catch errors if the row counts don't match.
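One way to detect mismatched row counts is to pair the rows with itertools.zip_longest and treat a missing partner as an error. The sketch below assumes the header lines have already been removed; append_last_column and the sample data are made up for illustration.

```python
import csv
import io
from itertools import zip_longest


def append_last_column(src_lines, dst_lines):
    """Append the last field of each src row to the matching dst row."""
    missing = object()  # sentinel marking a file that ran out of rows
    merged = []
    pairs = zip_longest(csv.reader(src_lines), csv.reader(dst_lines),
                        fillvalue=missing)
    for number, (src_row, dst_row) in enumerate(pairs, start=1):
        if src_row is missing or dst_row is missing:
            raise ValueError(f"row counts differ at row {number}")
        merged.append(dst_row + [src_row[-1]])
    return merged


# Hypothetical data rows (headers already removed)
book1 = io.StringIO("1,2,3,4,5\n1,2,3,4,5\n")
book2 = io.StringIO("1,2,3,4\n1,2,3,4\n")
print(append_last_column(book1, book2))
```

Unlike plain zip, which silently stops at the shorter input, this raises as soon as one file runs out before the other.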
