Python: Effective reading from a file using csv module

I have just started learning the csv module. Suppose we have this CSV file:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
And we want to read this file and create a dictionary containing names (as keys) and totals of each column (as values). So in this case we would end up with:
d = {"John" : 284, "Jeff" : 275, "Judy" : 258}
So I wrote this code, which apparently works well, but I am not satisfied with it and was wondering if anyone knows of a better or more efficient/elegant way of doing this, because there are just too many lines in there :D (Or maybe a way we could generalize it a bit, i.e. for when we do not know how many fields there are.)
d = {}
import csv
with open("file.csv") as f:
    readObject = csv.reader(f)
    totals0 = 0
    totals1 = 0
    totals2 = 0
    totals3 = 0
    currentRowTotal = 0
    for row in readObject:
        currentRowTotal += 1
        if currentRowTotal == 1:
            continue
        totals0 += int(row[0])
        totals1 += int(row[1])
        totals2 += int(row[2])
        if row[3] == "":
            totals3 += 0
f.close()
with open(filename) as f:
    readObject = csv.reader(f)
    currentRow = 0
    for row in readObject:
        while currentRow <= 0:
            d.update({row[0] : totals0})
            d.update({row[1] : totals1})
            d.update({row[2] : totals2})
            d.update({row[3] : totals3})
            currentRow += 1
    return(d)
f.close()
Thanks very much for any answer :)

Not sure if you can use pandas, but you can get your dict as follows:
import pandas as pd
df = pd.read_csv('data.csv')
print(dict(df.sum()))
Gives:
{'Jeff': 275, 'Judy': 258, 'John': 284}

Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.
import csv
with open("file.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    while titles[-1] == '':
        titles.pop()
    num_titles = len(titles)
    totals = { title: 0 for title in titles }
    for row in reader:
        for i in range(num_titles):
            totals[titles[i]] += int(row[i])
print(totals)
Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.
Also, let me mention that the data you posted appears to have four columns:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
That's why I did this:
while titles[-1] == '':
    titles.pop()
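To see where the fourth column comes from, here is a minimal sketch that parses the same sample data. It assumes the data is held in an in-memory string (io.StringIO stands in for file.csv only so the example is self-contained); the names sample, header and rows are illustrative and not part of the original code.

```python
import csv
import io

# The sample data from the question, held in memory instead of file.csv.
sample = "John,Jeff,Judy,\n21,19,32,\n178,182,169,\n85,74,57,\n"

rows = list(csv.reader(io.StringIO(sample)))
header = rows[0]  # the trailing comma yields an empty fourth field

# Drop empty titles from the end, as in the answer above.
titles = list(header)
while titles and titles[-1] == '':
    titles.pop()

print(header)
print(titles)
```

The extra `titles and` check is only a guard against a completely empty header line.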

It's a little dirty, but try this (operating without the empty last column):
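#!/usr/bin/python
import csv
from functools import reduce  # reduce is not a builtin in Python 3
import numpy
with open("file.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    # list(map(...)) so that numpy.add receives lists on Python 3 as well
    sums = reduce(numpy.add, [list(map(int, x)) for x in reader], [0]*len(headers))
for name, total in zip(headers, sums):
    print("{}'s total is {}".format(name, total))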

Based on Michasel's solution, I would try with less code, fewer variables and no dependency on NumPy:
import csv
from functools import reduce  # reduce is not a builtin in Python 3
with open("so.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    sum_result = reduce(lambda x, y: [int(a) + int(b) for a, b in zip(x, y)], list(reader))
print(dict(zip(titles, sum_result)))
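For comparison, the column totals can also be computed without reduce by transposing the rows with zip. This is only a sketch under the assumption that the file has no trailing empty column; the in-memory sample string and the name totals are illustrative stand-ins for the real file and result.

```python
import csv
import io

# In-memory stand-in for the CSV file (no trailing commas here).
sample = "John,Jeff,Judy\n21,19,32\n178,182,169\n85,74,57\n"

reader = csv.reader(io.StringIO(sample))
titles = next(reader)
# zip(*reader) transposes the remaining rows into columns.
totals = {title: sum(int(v) for v in column)
          for title, column in zip(titles, zip(*reader))}
print(totals)
```

Because zip stops at the shortest input, the number of columns does not need to be known in advance.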

Related

Python output every nth row without Pandas

I'm really new to Python and my task is to rewrite a CSV with Python. I managed to program a working script for my task already. Now I would like to get only every 10th row of the CSV as output.
Is there an easy way to do this?
I already tried to use Jason Reeks answer.
Now it works, thank you!
import csv
import sys
userInputFileName = sys.argv[1]
outPutFileSkipped = userInputFileName.split('.')[0] + '-Skipped.csv'
cnt = 0
first = True
with open(outPutFileSkipped, 'w', newline='') as outputCSV:
    csv_reader_object_skipped = csv.reader((x.replace('\0', '') for x in open(userInputFileName)), delimiter=',')
    csv_writer_object_skipped = csv.writer(outputCSV, delimiter=',')
    for row, line in enumerate(csv_reader_object_skipped):
        if row % 10 == 0:
            print(line)
            csv_writer_object_skipped.writerow(line)
            cnt += 1  # count the rows actually written
print('Successfully formatted ' + str(cnt) + ' rows!')

Here's a native way to do it without pandas:
import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    for row, line in enumerate(reader):
        # Depending on your reference point you may want to + 1 to row
        # to get every 10th row.
        if row % 10 == 0:
            print(line)
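An alternative to the modulo test is itertools.islice with a step argument. The sketch below assumes 25 made-up rows held in memory (io.StringIO stands in for file.csv); the names sample and every_tenth are illustrative.

```python
import csv
import io
from itertools import islice

# 25 numbered rows held in memory as a stand-in for file.csv.
sample = "".join("row{},{}\n".format(i, i * i) for i in range(25))

reader = csv.reader(io.StringIO(sample))
# islice(iterable, start, stop, step): start at row 0, take every 10th row.
every_tenth = list(islice(reader, 0, None, 10))
print(every_tenth)
```

Changing the start argument (for example to 9) shifts which rows count as "every 10th".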

There's an easy way with Pandas:
import pandas as pd
df = pd.DataFrame({"a": range(100), "b": range(100, 200)})
df.loc[::10]

Extract two columns sorted from CSV

I have a large csv file, containing multiple values, in the form
Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status
2020-01-03 07:10:01,aart-m1-m1,204,57,302xxxxxxxxx,0,0,down
I want to extract the Dslam_Name and Ani values, sort them by Dslam_name and write them to a new csv in two different columns.
So far my code is as follows:
import csv
import operator
with open('bad_voice_ports.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    sortedlist = sorted(readCSV, key=operator.itemgetter(1))
    for row in sortedlist:
        bad_port = row[1][:4],row[4][2::]
        print(bad_port)
        f = open("bad_voice_portsnew20200103SORTED.csv","a+")
        f.write(row[1][:4] + " " + row[4][2::] + '\n')
        f.close()
But my Dslam_Name and Ani values are kept in the same column.
As a next step I would like to count how many times the same value appears in the 1st column.

You are forcing them to be a single column. Joining the two into a single string means Python no longer regards them as separate.
But try this instead:
import csv
import operator
with open('bad_voice_ports.csv') as readfile, open('bad_voice_portsnew20200103SORTED.csv', 'w') as writefile:
    readCSV = csv.reader(readfile)
    writeCSV = csv.writer(writefile)
    for row in sorted(readCSV, key=operator.itemgetter(1)):
        bad_port = row[1][:4],row[4][2::]
        print(bad_port)
        writeCSV.writerow(bad_port)
If you want to include the number of times each key occurred, you can easily include that in the program, too. I would refactor slightly to separate the reading and the writing.
import csv
import operator
from collections import Counter
with open('bad_voice_ports.csv') as readfile:
    readCSV = csv.reader(readfile)
    rows = []
    counts = Counter()
    for row in readCSV:
        rows.append([row[1][:4], row[4][2::]])
        counts[row[1][:4]] += 1
with open('bad_voice_portsnew20200103SORTED.csv', 'w') as writefile:
    writeCSV = csv.writer(writefile)
    for row in sorted(rows):
        print(row)
        writeCSV.writerow([counts[row[0]]] + row)
I would recommend removing the header line from the CSV file entirely; throwing away (or separating out and prepending back) the first line should be an easy change if you want to keep it.
(Also, hard-coding input and output file names is problematic; maybe have the program read them from sys.argv[1:] instead.)
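As a rough sketch of the header handling mentioned above, the first line can be set aside with next() before the rows are collected, and a new header written back afterwards. The sample rows, the "Count" column title and the in-memory io.StringIO objects are assumptions made so the example runs without real files.

```python
import csv
import io
from collections import Counter

# Made-up rows in the question's layout; io.StringIO stands in for the input file.
sample = (
    "Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status\n"
    "2020-01-03 07:10:01,bbbb-m1-m1,204,57,302222,0,0,down\n"
    "2020-01-03 07:10:01,aart-m1-m1,204,58,301111,0,0,down\n"
    "2020-01-03 07:10:01,bbbb-m1-m1,204,59,303333,0,0,down\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # set the header aside so it is not sorted with the data
rows = [[row[1][:4], row[4][2:]] for row in reader]
counts = Counter(name for name, _ in rows)

out = io.StringIO()  # stand-in for the output file
writer = csv.writer(out)
writer.writerow(["Count", header[1], header[4]])  # prepend a header again
for row in sorted(rows):
    writer.writerow([counts[row[0]]] + row)
print(out.getvalue())
```

With real files, the two io.StringIO objects would be replaced by open() calls, with the names possibly taken from sys.argv[1:] as suggested.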

So my suggestion is fairly simple. As I stated in a previous comment, there is good documentation on reading and writing CSV in Python here: https://realpython.com/python-csv/
For example, to read just the columns you need from a CSV file, you can simply do this:
>>> import csv
>>> file = open('some.csv', mode='r')
>>> csv_reader = csv.DictReader(file)
>>> for line in csv_reader:
...     print(line["Dslam_Name"] + " " + line["Ani"])
...
This would print:
aart-m1-m1 302xxxxxxxxx
Now you can just as easily store the column values in a variable and write them to a file later, or open a new file while reading lines and write the column values there. I hope this helps you.
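One way to write the selected columns straight to a new file is to pair csv.DictReader with csv.DictWriter. The following is a sketch only: the sample line is the one from the question, and the io.StringIO objects are assumed stand-ins for the real input and output files.

```python
import csv
import io

# The header and sample line from the question, held in memory.
sample = (
    "Date,Dslam_Name,Card,Port,Ani,DownStream,UpStream,Status\n"
    "2020-01-03 07:10:01,aart-m1-m1,204,57,302xxxxxxxxx,0,0,down\n"
)

out = io.StringIO()  # stand-in for the output file
reader = csv.DictReader(io.StringIO(sample))
# extrasaction="ignore" makes DictWriter skip keys not listed in fieldnames.
writer = csv.DictWriter(out, fieldnames=["Dslam_Name", "Ani"], extrasaction="ignore")
writer.writeheader()
for line in reader:
    writer.writerow(line)
print(out.getvalue())
```

This keeps the two values in separate columns because the writer, not string concatenation, inserts the delimiter.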

After the help from @tripleee and @marxmacher, my final code is
import csv
import operator
from collections import Counter
with open('bad_voice_ports.csv') as csv_file:
    readCSV = csv.reader(csv_file, delimiter=',')
    sortedlist = sorted(readCSV, key=operator.itemgetter(1))
    line_count = 0
    rows = []
    counts = Counter()
    for row in sortedlist:
        Dslam = row[1][:4]
        Ani = row[4][2:]
        if line_count == 0:
            print(row[1], row[4])
            line_count += 1
        else:
            rows.append([row[1][:4], row[4][2::]])
            counts[row[1][:4]] += 1
            print(Dslam, Ani)
            line_count += 1
    for row in sorted(rows):
        f = open("bad_voice_portsnew202001061917.xls","a+")
        f.write(row[0] + '\t' + row[1] + '\t' + str(counts[row[0]]) + '\n')
        f.close()
    print('Total of Bad ports =', str(line_count-1))
This way the desired columns are extracted from the initial CSV file, and a new xls file is generated with those values stored in separate columns, together with the count per key and the total number of entries.
Thanks for all the help, and feel free to suggest any improvements!

You can use sorted:
import csv
_h, *data = csv.reader(open('filename.csv'))
with open('new_csv.csv', 'w') as f:
    write = csv.writer(f)
    # writerows is a method of the writer object, not of the csv module
    write.writerows([_h, *sorted([(i[1], i[4]) for i in data], key=lambda x:x[0])])

Unsure how to write text into specific column in csv file.

Hey I'm working on this project where I take this text and translate it and store it back into the same CSV file. The next open column is at index 10 or Column K. I've been trying to write the data but I just can't get it.
Reading works fine. I tried to do all this into single while loop but I couldn't get it to work. Sorry for any formatting errors!
from googletrans import Translator
import csv
translater = Translator()
f = open("#ElNuevoDia.csv", "r+")
csv_f = csv.reader(f)
csv_wf = csv.writer(f)
tmp = {}
x = 0
for row in csv_f:
    tmp[x] = translater.translate(row[4], dest="en")
    #print(tmp[x].text)
    #print("\n")
    #print(tmp[x].text)
    x = x + 1
x = 0
f.close()
csv_wf = csv.writer(f)
for row in csv_wf:
    csv_wf[10].writerow(tmp[x].text)
f.close()

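You should update the row while reading and then write it back (as you mentioned in the comment, a writer is not iterable). Something like this (adapted from part of your code):
for row in csv_f:
    row[10] = translater.translate(row[4], dest="en")
    tmp[x] = row
    x = x + 1
x = 0
f.close()
# Reopen the file for writing: a closed file object cannot be written to
f = open("#ElNuevoDia.csv", "w", newline="")
csv_wf = csv.writer(f)
for row in tmp.values():  # iterate over the stored rows, not the dict keys
    csv_wf.writerow(row)
f.close()
Edit 1:
To store the translated text rather than the result object, you can do this:
row[10] = translater.translate(row[4], dest="en").text
and you can write the rows back in one step:
csv_wf.writerows(tmp.values())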
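The read, modify and write-back cycle can be sketched end to end as follows. Everything here is an assumption for illustration: fake_translate is a hypothetical stand-in for the googletrans call, the two sample rows are made up, and io.StringIO replaces the real file.

```python
import csv
import io

def fake_translate(text):
    # Hypothetical stand-in for translater.translate(text, dest="en").text
    return text.upper()

# Two made-up rows with ten columns each (indices 0-9).
sample = "a,b,c,d,hola,f,g,h,i,j\na,b,c,d,adios,f,g,h,i,j\n"

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    # Pad short rows so that index 10 exists, then fill it.
    row.extend([""] * (11 - len(row)))
    row[10] = fake_translate(row[4])

out = io.StringIO()  # with a real file: open(path, "w", newline="")
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Padding the row first avoids an IndexError when a row has fewer than eleven fields.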

Python: How can I sum integers in a CSV file, while only summing the integers of a certain variable?

I'm trying to process some data in a CSV file using Python. I have a list of countries and results of the Eurovision Song Contest, and it looks like this:
Country,Points,Year
Belgium;181;2016
Netherlands;153;2016
Australia;511;2016
Belgium;217;2015
Australia;196;2015
Et cetera.
In summary, I want to sum the total of points that any country received throughout the years, so the output should look something like this:
'Belgium: 398','Netherlands: 153','Australia: 707' and so on.
This is what my code looks like:
import csv
with open('euro20042016.csv', 'r') as csvfile:
    pointsallyears = []
    countriesallyears = []
    readFILE = csv.reader(csvfile, delimiter=';')
    for row in readFILE:
        countriesallyears.append(row[0])
        pointsallyears.append(row[1])
csvfile.close()
results = []
for result in pointsallyears:
    result = int(result)
    results.append(result)
scorebord = zip(countriesallyears,results)
So I already made sure that the results / points are actual integers and I filtered out the third column (Year), but I have no idea how to proceed from here. Thanks a lot in advance!

Just putting @Mikk's comment into an actual answer. Two lines, not counting the import:
import pandas as pd
df = pd.read_csv('euro20042016.csv', sep = ';')
print(df.groupby('Country')['Points'].sum())
The only extra thing you need to do is change the first line of your file to be delimited by ; instead of ,.

I slightly changed your code to use a dictionary with country names as keys. The resulting dictionary d has the country names as keys and the total points as values.
import csv
d = dict()
with open('euro20042016.csv', 'r') as csvfile:
    readFILE = csv.reader(csvfile, delimiter=';')
    print (readFILE)
    c_list = []
    for row in readFILE:
        if row[0] in c_list:
            d[row[0]] = d[row[0]] + int(row[1])
        else:
            c_list.append(row[0])
            d[row[0]] = int(row[1])
csvfile.close()
print(d)

I decided to play around a bit with your code, and this is what I came up with. Here, row[0] contains the country names, and row[1] contains the values we need. We check if the country already exists in the dictionary we use to maintain the aggregates, and if it doesn't we create it.
import csv
with open('euro20042016.csv', 'r') as csvfile:
    score_dict={}
    readFILE = csv.reader(csvfile, delimiter=';')
    for row in readFILE:
        # Only rows with 3 elements have the data we need
        if len(row) == 3:
            if row[0] in score_dict:
                score_dict[row[0]]+=int(row[1])
            else:
                score_dict[row[0]]=int(row[1])
csvfile.close()
print(score_dict)
What I get as output is this
{'Belgium': 398, 'Australia': 707, 'Netherlands': 153}
which I believe is what you were aiming for.
Let me know in the comments if you face a problem understanding anything.
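The "create the key if it does not exist" step can also be left to collections.defaultdict. This is a sketch rather than a replacement for the code above; it assumes the five sample rows from the question, held in memory with io.StringIO instead of euro20042016.csv.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the question, without the header line.
sample = (
    "Belgium;181;2016\n"
    "Netherlands;153;2016\n"
    "Australia;511;2016\n"
    "Belgium;217;2015\n"
    "Australia;196;2015\n"
)

totals = defaultdict(int)  # missing countries start at 0
for row in csv.reader(io.StringIO(sample), delimiter=';'):
    if len(row) == 3:  # skip the header or malformed lines
        totals[row[0]] += int(row[1])
print(dict(totals))
```

The len(row) == 3 check plays the same role as in the answer above: a comma-delimited header line parses as a single field and is skipped.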

I have a solution for that, but make sure your euro20042016.csv file looks like this:
Belgium;181;2016
Netherlands;153;2016
Australia;511;2016
Belgium;217;2015
Australia;196;2015
This code produces the output as a list, like
[('Belgium', 398), ('Australia', 707), ('Netherlands', 153)]
Here is the code:
try:
    f = open('euro20042016.csv', 'r+')
    s = f.read()
    lst = list(map(lambda x: x.split(';'), s.split('\n')))
    points, country = [], []
    for line in lst:
        points.append(int(line[1]))
        country.append(line[0])
    countrypoints = sorted(zip(country, points), key=lambda x: x[1])
    country = list(set(country))
    total = [0]*len(country)
    for rec in countrypoints:
        total[country.index(rec[0])] = total[country.index(
            rec[0])] + rec[1]
    f.close()
    finalTotal = list(zip(country, total))
    print(finalTotal)
except IOError as ex:
    print(ex)
except Exception as ex:
    print(ex)
I hope this will help you.

How can I get a specific field of a csv file?

I need a way to get a specific item (field) of a CSV file. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second passwords. For example, I want to get the password for the email in row 38, so I need only the item in the 2nd column of row 38.
Say I have a csv file:
aaaaa@aaa.com,bbbbb
ccccc@ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...

import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments to the SO question here, a better, more robust version would be:
import csv
with open(myfilepath, newline='') as f:  # text mode with newline='' for Python 3
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
............
Update: If what the OP actually wants is the last string in the last row of the csv file, there are several approaches that do not necessarily need csv. For example,
fulltxt = open(myfilepath).read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because you load the complete text in memory but could be ok for small files. Note that laststring could include a newline character so strip it before use.
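For large files, one possible way to avoid holding the whole text in memory is to stream the rows through a collections.deque that keeps only the last one. This is a sketch under assumptions: the two sample lines stand in for the real file, and a non-empty file is assumed (pop() raises IndexError otherwise).

```python
import csv
import io
from collections import deque

# Two sample lines standing in for the real file.
sample = "aaaaa@aaa.com,bbbbb\nccccc@ccc.com,ddddd\n"

reader = csv.reader(io.StringIO(sample))
# A deque with maxlen=1 keeps only the most recent row while iterating.
last_row = deque(reader, maxlen=1).pop()
laststring = last_row[-1]
print(laststring)
```

Because csv.reader already strips the line ending, no extra strip() is needed here.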
And finally, if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F.Sebastian (credit goes to him):
import csv
line_number = 2  # used as a 0-based index into the list of rows
with open(myfilepath, newline='') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
............

#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.
USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys
filename = sys.argv[1]
# Convert the 1-based command-line numbers to 0-based indexes
row_number, column_number = [int(arg, 10)-1 for arg in sys.argv[2:]]
with open(filename, newline='') as f:
    rows = list(csv.reader(f))
print(rows[row_number][column_number])
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file in memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, newline='') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number+1))
print(row[column_number])

import csv
def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1
print (read_cell(4, 8))
This example prints cell 4, 8 in Python 3.

There is an interesting point to note about the csv.reader object: it is not a list, and it is not subscriptable.
This works:
for r in csv.reader(file_obj): # file not closed
    print(r)
This does not:
r = csv.reader(file_obj)
print(r[0])
So, you first have to convert it to a list in order to make the above code work.
r = list( csv.reader(file_obj) )
print(r[0])
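The difference can be checked with a small sketch. It assumes the two sample lines from the question, held in memory with io.StringIO in place of an open file; the names subscriptable and field are illustrative.

```python
import csv
import io

# Two sample lines standing in for an open file object.
sample = "aaaaa@aaa.com,bbbbb\nccccc@ccc.com,ddddd\n"

reader = csv.reader(io.StringIO(sample))
try:
    reader[0]
    subscriptable = True
except TypeError:
    subscriptable = False  # csv.reader objects do not support indexing

# Converting to a list first makes row and column indexing possible.
rows = list(csv.reader(io.StringIO(sample)))
field = rows[1][1]
print(subscriptable, field)
```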

Finally I got it!!!
import csv
def select_index(index):
    csv_file = open('oscar_age_female.csv', 'r')
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        l = line['Index']
        if l == index:
            print(line[' "Name"'])
select_index('11')
"Bette Davis"

The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
#where row_number is 38 maybe

import csv
inf = csv.reader(open('yourfile.csv','r'))
for row in inf:
    print(row[1])
