How to parse CSV files in Python? - python

I have downloaded the billing reports from AWS, which are in CSV format, onto my server.
Now, I have to parse those CSV files in Python so that it shows consolidated/individual cost information on a daily/weekly/monthly basis.
Can anyone please help me with this?
import csv
with open('588399947422-aws-billing-detailed-line-items-2015-09.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row)
CSV headers
"InvoiceID","PayerAccountId","LinkedAccountId","RecordType","RecordId","ProductName","RateId","SubscriptionId","PricingPlanId","UsageType","Operation","AvailabilityZone","ReservedInstance","ItemDescription","UsageStartDate","UsageEndDate","UsageQuantity","BlendedRate","BlendedCost","UnBlendedRate","UnBlendedCost","ResourceId","user:Application Name","user:Business Unit"

Use the built-in csv module.
Adapted from the docs (Python 3):
>>> import csv
>>> with open(path_to_your_file, newline='') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',', quotechar='"')
...     for row in reader:  # iterate over the reader row by row (each row is a list of strings)
...         print(row)  # list of values
First, open the file; the best option is with open(filename, newline='') as f: (in Python 2, use open(filename, 'rb') instead).
Then instantiate the reader, specifying the delimiter (a comma in most cases) and the quotechar (a double quote for the headers shown in the question).
Then you can iterate over the reader row by row.
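For the consolidated cost per day that the question asks about, something along these lines might work. This is only a sketch: the function name daily_cost and the sample rows are made up for illustration, and it assumes that UsageStartDate begins with a YYYY-MM-DD date and that UnBlendedCost holds a plain decimal number, neither of which is confirmed by the question.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def daily_cost(csvfile, cost_column="UnBlendedCost", date_column="UsageStartDate"):
    """Sum cost_column per calendar day (hypothetical helper)."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(csvfile):
        # Assumption: timestamps look like 'YYYY-MM-DD HH:MM:SS'.
        day = (row.get(date_column) or "")[:10]
        try:
            cost = Decimal(row.get(cost_column) or "")
        except InvalidOperation:
            continue  # skip rows without a numeric cost (e.g. summary rows)
        if day:
            totals[day] += cost
    return dict(totals)


# Made-up sample data standing in for the real billing file.
sample = io.StringIO(
    '"UsageStartDate","UnBlendedCost"\n'
    '"2015-09-01 00:00:00","0.10"\n'
    '"2015-09-01 01:00:00","0.25"\n'
    '"2015-09-02 00:00:00","1.00"\n'
)
daily_totals = daily_cost(sample)
```

With a real file you would pass open(path, newline='') instead of the StringIO object. Grouping by month would use the first 7 characters of the date instead of the first 10, under the same format assumption.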

Related

Writing the output to a different csv file column

I have some vocabulary and their counterparts to create an Anki deck. I need the program to write the output of my code in two columns of a csv file: the first for the vocabulary and the second for the meaning. I've tried two approaches but neither of them worked. How can I solve this problem?
Notebook content(vocab):
obligatory,義務的
sole,単独,唯一
defined,一定
obey,従う
...
First try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.reader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w", newline="") as new_file:
        csv_writer = csv.writer(new_file, delimiter=",")
        for line in csv_reader:
            csv_writer.writerow(line)
Second try:
with open("C:/Users/berka/Desktop/Vocab.txt") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    with open("C:/Users/berka/Desktop/v.csv", "w",) as f:
        field_names = ["Vocabulary", "Meaning"]
        csv_writer = csv.DictWriter(f, fieldnames=field_names, extrasaction="ignore")
        csv_writer.writeheader()
        for line in csv_reader:
            csv_writer.writerow(line)
Result of the first try:
https://cdn.discordapp.com/attachments/696432733882155138/746404430123106374/unknown.png
#Second try was not even close
Expected result:
https://cdn.discordapp.com/attachments/734460259560849542/746432094825087086/unknown.png
Like Kevin said, Excel splits CSV columns on the list separator from your regional settings, which is ";" in many locales, while your code writes a comma-delimited file. That's why the commas still show up when you open the file. You can pass ";" as the delimiter if you want Excel to read your file correctly. You can also open the file your code produces in Notepad to see which delimiter it actually uses.
Your first try works, it's the app you're using for importing that is not recognizing the , as the delimiter. I'm not sure where you're importing this to, but at least in Google Sheets you can choose what the delimiter is, even after the fact.
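If you do want a ";"-delimited file with exactly two columns, a rough sketch could look like the following. The helper name to_two_columns is made up, the header names are taken from the second try, and splitting on the first comma only is an assumption about what should happen with lines such as "sole,単独,唯一" that carry more than one meaning.

```python
import csv
import io


def to_two_columns(lines, out, delimiter=";"):
    """Write word/meaning pairs as two columns (hypothetical helper)."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(["Vocabulary", "Meaning"])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: everything after the first comma is the meaning.
        word, _, meaning = line.partition(",")
        writer.writerow([word, meaning])


# Sample lines copied from the notebook content in the question.
vocab = ["obligatory,義務的\n", "sole,単独,唯一\n", "defined,一定\n", "obey,従う\n"]
buffer = io.StringIO()
to_two_columns(vocab, buffer)
two_column_text = buffer.getvalue()
```

For real files, pass the opened Vocab.txt as lines and a file opened with "w", newline="" and a suitable encoding as out.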

Can't skip header row in csv file with python

I'm using the CSV module in python to read a CSV into memory and I need to skip the header row.
I am using the next command to skip the headers, but it isn't working.
import csv
with open(aws_env_list) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)
The headers are still being produced, crashing my script. My script produces the following line:
Working in AWS Account: companyAccountName,AWSAccountName,Description,LOB,AWSAccountNumber,CIDRBlock,ConnectedtoMontvale,PeninsulaorIsland,URL,Owner,EngagementCode,CloudOpsAccessType
In the original CSV the headers are only on the first line.
The first few lines of my csv file look like this.
What's wrong with the above and why is this not skipping the headers? Is there a better way?
I don't think you are using the next() function correctly.
Here's an example from the documentation:
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
csv.reader returns an iterator that yields each row of the file as a list of strings. So, if you want to skip the first row (the header row), simply make this change.
with open(aws_env_list) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in list(csv_reader)[1:]:
        """ do whatever action """
Converting the reader to a list and slicing it with [1:] selects everything from the second element onward (index 0 is the first), so the loop runs over a copy that does not contain the header row. Note that list() reads the whole file into memory.
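For comparison, here is a small sketch of skipping the header with next() on the same reader object that is iterated afterwards, which avoids building a list. The sample rows are invented for illustration; only the two column names come from the header line quoted in the question.

```python
import csv
import io

# Invented sample data; a real script would open the file with newline=''.
sample = io.StringIO(
    "AWSAccountName,AWSAccountNumber\n"
    "dev,111111111111\n"
    "prod,222222222222\n"
)

reader = csv.reader(sample, delimiter=",")
header = next(reader, None)  # consumes the first row; None if the file is empty
data_rows = [row for row in reader]  # iteration continues after the header
```

The skip only takes effect if the loop uses the very same reader; creating a second csv.reader on a file that has been reopened or rewound starts from the header again.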

Python Read/Write csv file

Reading and writing data from/to a csv file. When I run the program it's formatted correctly in the console window; however, the formatting is off in the csv file I'm writing to (it has a comma after each letter). What am I missing here?
import csv
with open("WJU stats.csv", 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    with open('wjudata.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file)
        for row in csv_reader:
            csv_writer.writerow(row[0])
            print(row[0])
The writerow function takes an iterable, for example a list, and writes each element of the iterable as one field of a comma-separated row. The catch is that strings are also iterables, whose elements are their characters, so writerow(row[0]) writes every character as its own field. If you want a single-column csv, wrap the value in a list:
csv_writer.writerow([row[0]])
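The difference is easy to see in memory. A minimal sketch, where the helper name written and the value "Smith" are made up for the demonstration:

```python
import csv
import io


def written(row):
    """Return the text csv.writer produces for one writerow call."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


as_string = written("Smith")    # the string is iterated character by character
as_list = written(["Smith"])    # the list holds one field
```

Here as_string is "S,m,i,t,h\r\n" and as_list is "Smith\r\n" (the csv module ends rows with \r\n by default).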

Convert csv file to pipe delimited file in Python

I want to convert a comma-delimited CSV file to a pipe-delimited file with Python:
This is how I am reading my csv file:
with open('C://Path//InputFile.csv') as fOpen:
    reader = csv.DictReader(fOpen)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)
c = csv.writer(open("C://Path//OutputFile.txt","wb"), delimiter="|")
How would I write it as pipe delimited file?
Adapting martineau's answer to fix newline issues in Python 3.
import csv
with open('C:/Path/InputFile.csv') as fin:
    # newline='' prevents extra newlines when using Python 3 on Windows
    # https://stackoverflow.com/a/3348664/3357935
    with open('C:/Path/OutputFile.txt', 'w', newline='') as fout:
        reader = csv.DictReader(fin, delimiter=',')
        writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
        writer.writeheader()
        writer.writerows(reader)
This does what I think you want (written for Python 2; in Python 3, open both files in text mode with newline='' instead of 'rb'/'wb'):
import csv
with open('C:/Path/InputFile.csv', 'rb') as fin, \
     open('C:/Path/OutputFile.txt', 'wb') as fout:
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    writer.writeheader()
    writer.writerows(reader)
I found a quick way to change the comma delimiter to a pipe with pandas: when writing the dataframe to csv, pass "|" as the separator:
df.to_csv(fileName, sep="|")
I don't have much experience with the csv module so if these solutions aren't interchangeable then someone might have to chime in. But this worked surprisingly well for me.
You can use pandas to achieve the conversion of csv to pipe-delimited (or desired delimited) file.
import pandas as pd
df = pd.read_csv(r'C:\Users\gupta\Documents\inputfile.csv') #read inputfile in a dataframe
df.to_csv(r'C:\Users\gupta\Desktop\outputfile.txt', sep = '|', index=False) #write dataframe df to the outputfile with pipe delimited
https://docs.python.org/2/library/csv.html for Python 2.x
https://docs.python.org/3.3/library/csv.html for Python 3.x
These pages explain how to use csv.writer.
Without testing it, your code looks syntactically valid.
All you need to do is add some calls like c.writerow(['data', 'here']) to write your data; writerow takes a single iterable of fields.
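To see what the delimiter change does to quoting, here is a small in-memory sketch using plain csv.reader and csv.writer. The function name comma_to_pipe and the sample rows are made up for illustration.

```python
import csv
import io


def comma_to_pipe(src, dst):
    """Copy rows from a comma-delimited source to a pipe-delimited target."""
    writer = csv.writer(dst, delimiter="|")
    writer.writerows(csv.reader(src))


# One field contains a comma, another contains a pipe.
source = io.StringIO('name,note\n"Doe, Jane",a|b\n')
target = io.StringIO()
comma_to_pipe(source, target)
pipe_text = target.getvalue()
```

The field containing a comma no longer needs quotes in the output, while the field containing a pipe is now quoted, so pipe_text is 'name|note\r\nDoe, Jane|"a|b"\r\n'. A naive text replace of "," with "|" would get both of those cases wrong.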

Inline CSV File Editing with Python

Can I modify a CSV file inline using Python's CSV library, or similar technique?
Currently I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:
with open('tmpEmployeeDatabase-out.csv', 'w') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')
    with open('tmpEmployeeDatabase.csv', 'r') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')
        for row in reader:
            row[0] = row[0].title()
            writer.writerow(row)
The philosophy works, but I am curious if I can do an inline edit so that I'm not duplicating the file.
I've tried the following, but this appends the new records to the end of the file instead of replacing them.
with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)
No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row but it is not advisable, especially if you are writing back more data than you read.
The canonical method is to write to a new, temporary file and move that into place over the old file you read from.
from tempfile import NamedTemporaryFile
import shutil
import csv
filename = 'tmpEmployeeDatabase.csv'
tempfile = NamedTemporaryFile('w+t', newline='', delete=False)
with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')
    for row in reader:
        row[1] = row[1].title()
        writer.writerow(row)
shutil.move(tempfile.name, filename)
I've made use of the tempfile and shutil libraries here to make the task easier.
There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.
As such, the two ways to do this are either (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open up a temporary output file where you write your results while you read the input file, and then replace the old file with the new one once you get to the end. One method uses more RAM, the other uses more disk space.
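Option (a) could be sketched roughly as follows. The function name title_case_first_column and the demo file contents are made up; the demo writes to a fresh temporary directory so that it does not touch any real file.

```python
import csv
import os
import tempfile


def title_case_first_column(path):
    """Read the whole file into memory, edit it, then overwrite it in place."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))  # entire file held in memory
    for row in rows:
        if row:  # skip empty lines
            row[0] = row[0].title()
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


# Demo on a throwaway file with invented contents.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "employees.csv")
with open(demo_path, "w", newline="") as f:
    f.write("john smith,engineering\r\njane doe,sales\r\n")

title_case_first_column(demo_path)

with open(demo_path, newline="") as f:
    result_rows = list(csv.reader(f))
```

The trade-off compared with the temporary-file approach is that a crash during the second open() can leave the original file truncated, so this is probably best kept for small files you can regenerate.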
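If you just want to modify a csv file in place using Python, you can also use pandas, which reads the whole file into memory and writes it back:
import pandas as pd
df = pd.read_csv('yourfilename.csv')
# set "name" to "Lebron James" in the row with index label 1 (the second data row)
df.loc[1, 'name'] = "Lebron James"
# save the file under the same name; index=False avoids adding an extra index column
df.to_csv("yourfilename.csv", index=False)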
