I'm new to programming and trying to supplement my learning by doing some online tutorials. Today, I started looking at working with CSV files using a tutorial that seemed easy enough to follow, but I've run into what amounts to an immaterial problem, and it's frustrating me that I can't figure it out, haha. I've spent around two hours Googling and testing things, but I'm just not savvy enough to know what to try next. Help, please! haha.
Here's the code in question:
# importing the csv module
import csv
# csv filename
filename = r'C:\Users\XXX\Documents\AAPL.csv'
# initialize the titles and row list
fields = []
rows = []
# read the csv file
with open(filename, 'r') as csvfile:
    # create the csv reader object
    csvreader = csv.reader(csvfile)
    # extract field names through the first row
    fields = next(csvreader)
    # extract each data row one by one
    for row in csvreader:
        rows.append(row)
    # get total number of rows
    print("total no. of rows: %d"%(csvreader.line_num))
# print the field names
print("Field names are: " + ", ".join(field for field in fields))
# print the first 5 rows of data
print("\nFirst 5 rows are:\n")
for row in rows[:5]:
    #parse each column of a row
    for col in row:
        print("%10s"%col),
    print("\n")
The tutorial was actually written for Python 2.X, so I found the updated formatting for 3.6 and changed that last statement to be:
    for col in row:
        print('{:>10}'.format(col))
    print("\n")
Either way it's written, the results come out in this format:
First 5 rows are:
2013-09-18
66.168571
66.621429
65.808571
66.382858
60.492519
114215500
...
instead of the expected columnar format shown on the tutorial.
I thought I finally found the solution when I read somewhere that you needed the formatting for each item, so I tried:
    for col in row:
        print('{:>10} {:>10} {:>10} {:>10} {:>10} {:>10} {:>10}'.format(*col))
    print("\n")
so that the formatting was there for each column, however that seems to create a column for each letter in the field, e.g:
2 0 1 3 - 0 9
The CSV is just a file of AAPL's stock prices. Here are the first 9 rows of data if you want to create a CSV for testing:
Date,Open,High,Low,Close,Adj Close,Volume
2013-09-18,66.168571,66.621429,65.808571,66.382858,60.492519,114215500
2013-09-19,67.242859,67.975716,67.035713,67.471428,61.484497,101135300
2013-09-20,68.285713,68.364288,66.571426,66.772858,60.847912,174825700
2013-09-23,70.871429,70.987144,68.942856,70.091431,63.872025,190526700
2013-09-24,70.697144,70.781425,69.688568,69.871429,63.671543,91086100
2013-09-25,69.885712,69.948570,68.775711,68.790001,62.686062,79239300
2013-09-26,69.428574,69.794289,69.128571,69.459999,63.296616,59305400
2013-09-27,69.111427,69.238571,68.674286,68.964287,62.844891,57010100
2013-09-30,68.178574,68.808571,67.772858,68.107140,62.063782,65039100
# importing csv module
import csv
# csv file name
filename = r'C:\Users\XXX\Documents\AAPL.csv'
# initialize the titles and row list
fields = []
rows = []
# read the csv file
with open(filename, 'r') as csvfile:
    # create the csv reader object
    csvreader = csv.reader(csvfile)
    # extract field names through the first row
    fields = next(csvreader)
    # extract each data row one by one
    for row in csvreader:
        rows.append(row)
    # get total number of rows
    print("total no. of rows: %d"%(csvreader.line_num))
# print the field names
print("Field names are: " + ", ".join(field for field in fields))
# print the first 5 rows of data
print("\nFirst 5 rows are:\n")
for row in rows[:5]:
    #parse each column of a row
    for col in row:
        print("%10s"%col,end=',')
    print("\n")
You need to replace
print("%10s"%col), with print("%10s"%col,end=',')
Krishnaa208's answer didn't quite give me the right format. print("%10s"%col,end=',') gave a table that included the comma and each field was surrounded by quotes. But it did point me in the right direction, which was:
# print the first 5 rows of data
print("\nFirst 5 rows are:\n")
for row in rows[:5]:
    #parse each column of a row
    for col in row:
        print('{:>12}'.format(col), end = '')
    print("\n")
and my results were:
First 5 rows are:
  2013-09-18   66.168571   66.621429   65.808571   66.382858   60.492519   114215500
  2013-09-19   67.242859   67.975716   67.035713   67.471428   61.484497   101135300
  2013-09-20   68.285713   68.364288   66.571426   66.772858   60.847912   174825700
  2013-09-23   70.871429   70.987144   68.942856   70.091431   63.872025   190526700
  2013-09-24   70.697144   70.781425   69.688568   69.871429   63.671543    91086100
({:>10} was a little too close together since my CSV had the prices down to six decimal places.)
Thanks for the answer, though. It really did help!
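For anyone landing here later, a minimal Python 3 sketch of the same idea, formatting a whole row at once instead of printing cell by cell. It also shows why the `.format(*col)` attempt produced one column per letter: `col` is a single string, so `*col` unpacks its characters, whereas the row is the list of cells. The `format_row` helper and the in-memory `sample` are only illustrations (a real script would open the CSV file instead).

```python
import csv
import io

# Sample rows copied from the question; a real script would open the CSV file.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2013-09-18,66.168571,66.621429,65.808571,66.382858,60.492519,114215500\n"
    "2013-09-19,67.242859,67.975716,67.035713,67.471428,61.484497,101135300\n"
)


def format_row(row, width=12):
    """Right-align every cell to the same width and join them into one line."""
    return "".join("{:>{w}}".format(col, w=width) for col in row)


reader = csv.reader(sample)
fields = next(reader)  # header row
lines = [format_row(row) for row in reader]

print(format_row(fields))
for line in lines:
    print(line)
```

Building the line first and printing it once avoids juggling the `end=` argument, and the width can be changed in one place.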
Hi guys, my teacher has assigned me to get the integer from a raw string in one column. All of this is to be done by reading a CSV file with Python. My terminal doesn't show an error, but I don't get anything that points me to the problem. I want to take the integer from every row and print it.
Here is my code :
import pandas as pd
tx = [ "T4.csv" ]
for name_csv in tx :
    df = pd.read_csv( name_csv, names=["A"])
    for row in df:
        if row == ('NSIT ,A: ,'):
            # I don't know how to use split to take the integer and print it!
            print("A",row)
        else:
            # I don't know how to use split to take the integer and print it!
            print("B",row)
Here is what the CSV file contains (I just have it all in column A):
NSIT ,A: ,-213
NSIT ,A: ,-43652
NSIT ,B: ,-39
NSIT ,A: ,-2
NSIT ,B: ,-46
I have included my Python attempt; I hope you can understand the problem I have.
df = pd.read_csv( "T4.csv", names=["c1", "c2", "c3"])
print(df.c3)
Read the file one line at a time. Split each line on comma. Print the last item in the resulting list.
with open('T4.csv') as data:
    for line in data:
        if len(tokens := line.split(',')) == 3:
            print(tokens[2])
Alternative:
with open('T4.csv') as data:
    d = {}
    for line in data:
        if len(tokens := line.split(',')) == 3:
            _, b, c = map(str.strip, tokens)
            d.setdefault(b, []).append(c)
    for k, v in d.items():
        print(k, end='')
        print(*v, sep=',', end='')
        print(f' sum={sum(map(int, v))}')
Output:
A:-213,-43652,-2 sum=-43867
B:-39,-46 sum=-85
Your question was not very clear. So I assume you want to print out the 3rd column of the CSV file. I also think that you opened the CSV file in Excel, which is why you see that all the data is put in Column A.
A CSV (comma-separated values) file is a plain text file that contains data organised as a table of rows and columns, where each row represents a record, and each column represents a field or attribute of that record.
A newline character typically separates each row of data in a CSV file, and the values in each column are separated by a delimiter character, such as a comma (,). For example, here is a simple CSV file with three rows and three columns:
S.No, Student Name, Student Roll No.
1, Alpha, 123
2, Beta, 456
3, Gamma, 789
For a simple application like the one you mention, Pandas might not be required. You can use Python's standard csv module to do this.
Please find the code below to print out the 3rd column of your CSV file.
import csv
with open("T4.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    headers = next(csv_reader)  # Get the column headers
    print(headers[2])  # Print the 3rd column header
    for row in csv_reader:
        print(row[2])  # Print the 3rd column data
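One caveat: the T4.csv sample shown in the question has no header row, so calling next() first would treat the first data line as headers. A rough sketch for that header-less case, which also converts the third field to an int, could look like this. The in-memory `sample` stands in for the file, and the `label`, `number` and `values` names are only illustrative.

```python
import csv
import io

# Stand-in for T4.csv, using the rows shown in the question (no header row).
sample = io.StringIO(
    "NSIT ,A: ,-213\n"
    "NSIT ,A: ,-43652\n"
    "NSIT ,B: ,-39\n"
    "NSIT ,A: ,-2\n"
    "NSIT ,B: ,-46\n"
)

values = []
for row in csv.reader(sample):
    if len(row) != 3:
        continue  # skip blank or malformed lines
    label = row[1].strip().rstrip(":")  # "A: " becomes "A"
    number = int(row[2])                # "-213" becomes the integer -213
    values.append((label, number))
    print(label, number)
```

With a real file, `sample` would be replaced by the object returned from `open("T4.csv", newline="")`.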
I have a massive CSV file with over 12 million rows and 4 columns: the first column just numbers the rows from 0 to 12 million, the second has the name of the region, the third is a city (each city is a number), and the fourth has the number of visitors.
What I would like to do is plot the third and fourth columns (one on the x-axis and one on the y-axis), but just for a certain region. I have tried many things to read only the part of the file that says 'Essex', but nothing works. The second column is called "region" and the region I am interested in is 'Essex'. Any help? Thank you!
You should look into the standard library called "csv". Something like this to get you going:
import csv
with open("name of csv file") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Check for Essex
        if row[1] == 'Essex':
            # Do whatever
            pass
The above example assumes there is no header line in your CSV file. If you do have a header, you can skip it like this:
with open("name of csv file") as csvfile:
    # Read and skip a header line.
    header = csvfile.readline()
    reader = csv.reader(csvfile)
    for row in reader:
        # As above
        if row[1] == 'Essex':
            pass
or look into csv.DictReader().
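A small sketch of the csv.DictReader() route, in case it helps. Only the "region" column name comes from the question; the "city" and "visitors" headers and the sample rows are assumptions made up for illustration. The file is read row by row, so the 12 million rows never have to sit in memory at once; only the Essex values are kept.

```python
import csv
import io

# Hypothetical stand-in for the large file. Only "region" is named in the
# question; the "city" and "visitors" headers are assumptions.
sample = io.StringIO(
    ",region,city,visitors\n"
    "0,Essex,12,340\n"
    "1,Kent,7,125\n"
    "2,Essex,15,98\n"
)

cities, visitors = [], []
for row in csv.DictReader(sample):
    # Each row is a dict keyed by the header names.
    if row["region"] == "Essex":
        cities.append(int(row["city"]))
        visitors.append(int(row["visitors"]))

print(cities, visitors)
```

The two lists can then be handed to whatever plotting library you use, one as x and one as y.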
I have about 20 rows of data with 4 columns each.
How do I print only a certain row. Like for instance print only Row 15, Row 16 and Row 17.
When I try row[0] it only prints out the first column but not the entire row. I am confused here.
Right now I can read out each of the rows by doing:
for lines in reader:
    print(lines)
If you have a relatively small dataset, you can read the entire thing and select the rows you want
import csv
reader = csv.reader(open("somefile.csv"))
table = list(reader)  # reads entire file
for row in table[15:18]:  # list indexes 15, 16 and 17 (counting from 0)
    print(row)
You could also save a bit of time by only reading as much as you need
with open("somefile.csv") as f:
    reader = csv.reader(f)
    for _ in range(14):
        next(reader)  # discard
    for _ in range(3):
        print(next(reader))
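The skip-then-read pattern above can also be written with itertools.islice, which does the counting for you. This is only a sketch: the in-memory `sample` (20 rows of 4 columns, no header) is an assumption standing in for somefile.csv.

```python
import csv
import io
from itertools import islice

# Stand-in for somefile.csv: 20 rows of 4 columns, no header (an assumption).
sample = io.StringIO(
    "".join("r{0}a,r{0}b,r{0}c,r{0}d\n".format(n) for n in range(1, 21))
)

reader = csv.reader(sample)
# islice(reader, 14, 17) skips the first 14 rows and yields rows 15, 16 and 17
# (counting from 1), then stops reading.
selected = list(islice(reader, 14, 17))
for row in selected:
    print(row)
```

If the file has a header line, it counts as one of the skipped rows, so the start and stop values would each need to go up by one.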
Try iloc method
import pandas as pd
# suppose you want only the 15th row
data = pd.read_csv("data.csv").iloc[14, :]
So I'm trying to clean up a .csv that our badging system exports. One of the issues with this export is that it doesn't separate the badging info (badge ID, activation state, company, etc.) into separate columns.
Here's what I need to do:
1. Create new .csv with only some of the columns
2. Rename top row
3. Clean up the CREDENTIALS column so it only outputs the activated badge number
Problem: I already did steps 1 and 2, however I need help going through the CREDENTIALS [3] column, to find the "Active" keyword and delete everything except for the first set of numbers. However, some credentials will have multiple badges separated by a |.
For instance, here is what the original .csv looks like:
COMMAND,PERSONID,PARTITION,CREDENTIALS,EMAIL,FIRSTNAME,LASTNAME
NoCommand,43,Master,{9065~9065~Company~Active~~~},personone#company.com,person,one
NoCommand,57,Master,{9482~9482~Company~Active~~~},persontwo#company.com,person,two
NoCommand,323,Master,{8045~8045~Company~Disabled~~~},personthree#company.com,person,three
NoCommand,84,Master,{8283~8283~Company~Disabled~~~|9861~9861~Company~Active~~~},personfour#company.com,person,four
NoCommand,46,Master,{9693~9693~Company~Lost~~~|9648~9648~Company~Active~~~},personfive#company.com,person,five
As you can see, the CREDENTIALS column [3] has a bunch of data included. It will also have multiple badge credentials separated by a |.
Here's what I have so far to complete steps 1 and 2:
import csv
# Empty data set that will eventually be written with the new sanitized data
data = []
# Keyword to search for
word = 'Active'
# Source .csv file that we will be working with
input_filename = '/path/to/original/csv'
# Output .csv file that we will create with the data from input_filename
output_filename = '/path/to/new/csv'
with open(input_filename, "rb") as the_file:
    reader = csv.reader(the_file, delimiter=",")
    next(reader, None)
    # Test sanitizing column 3
    for row in reader:
        for col in row[3]:
            if word in row[3]:
                print col
        new_row = [row[3], row[5], row[6], row[4]]
        data.append(new_row)
with open(output_filename, "w+") as to_file:
    writer = csv.writer(to_file, delimiter=",")
    writer.writerow(['BadgeID', 'FirstName', 'LastName', 'EmployeeEmail'])
    for new_row in data:
        writer.writerow(new_row)
So far the new .csv is looking like this:
BadgeID,FirstName,LastName,EmployeeEmail
{9065~9065~Company~Active~~~},person,one,personone#company.com
{9482~9482~Company~Active~~~},person,two,persontwo#company.com
{8045~8045~Company~Disabled~~~},person,three,personthree#company.com
{8283~8283~Company~Disabled~~~|9861~9861~Company~Active~~~},person,four,personfour#company.com
{9693~9693~Company~Lost~~~|9648~9648~Company~Active~~~},person,five,personfive#company.com
I want it to look like this, with the "Active" credentials:
BadgeID,FirstName,LastName,EmployeeEmail
9065,person,one,personone#company.com
9482,person,two,persontwo#company.com
8045,person,three,personthree#company.com
8283,person,four,personfour#company.com
9693,person,five,personfive#company.com
However, for my column 3 testing code block, I'm trying to at least make sure I'm grabbing the correct data. The weird thing is that when I print that column it comes out looking weird:
# Test sanitizing column 3
for row in reader:
    for col in row[3]:
        if word in row[3]:
            print col
It outputs something like this:
C
a
r
d
s
~
A
c
t
i
v
e
~
~
~
}
{
8
8
2
4
~
8
8
2
4
~
Anyone have any thoughts?
Going by your output, you're grabbing the correct data! The problem is: Column 3 is a string. You're treating it like a list from the outset, resulting in characters being pulled from words. Use string methods to get lists of words first.
Step by step with pseudo-code:
Strip those brackets
column3 = column3.strip("{}")
Since you might have multiple badges separated by "|", you should split on that character
badges_str = column3.split("|")
Now you have a list of strings, each representing a single badge.
badges = []
for badge in badges_str:
    badges.append(badge.split("~"))
Now you have a list of individual badge listings that you can use indexes on.
for badge in badges:
    # test for the Active badges, then do things
    if badge[3] == "Active":
        do_something(badge[0])
        do_something_else(badge[1])
        # etc...
That doesn't give you actual code, but should get you to the next steps to get there.
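Putting those steps together, one possible runnable version is sketched below. The `active_badge_id` function name is made up for illustration, and it assumes the status always sits in the fourth "~"-separated field, as in the sample rows from the question.

```python
def active_badge_id(credentials):
    """Return the first number of the first badge marked Active, or None."""
    # Strip the surrounding braces, then split into one string per badge.
    for badge in credentials.strip("{}").split("|"):
        parts = badge.split("~")
        # parts looks like ['9861', '9861', 'Company', 'Active', '', '', '']
        if len(parts) > 3 and parts[3] == "Active":
            return parts[0]
    return None  # no Active badge in this cell


print(active_badge_id("{8283~8283~Company~Disabled~~~|9861~9861~Company~Active~~~}"))
print(active_badge_id("{8045~8045~Company~Disabled~~~}"))
```

In the question's loop, this would be called as `active_badge_id(row[3])` when building `new_row`. What to write for a person with no Active badge (the function returns None there) is a decision left to you.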
I need to filter and do some math on data coming from CSV files.
I've written a simple Python script to isolate the rows I need to get (they should contain certain keywords like "Kite"), but my script does not work and I can't find why. Can you tell me what is wrong with it? Another thing: once I get to the chosen row/s, how can I point to each (comma separated) column?
Thanks in advance.
R.
import csv
with open('sales-2013.csv', 'rb') as csvfile:
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite":
            print ",".join(row)
You are opening the file in binary mode. In Python 3, csv.reader expects text, so change the call to open('filepathAndName.csv', 'r') (in Python 2, 'rb' is the mode the csv module expects). The second mistake is the comparison: row is a list of cells, so row == "Kite" is never true. Use if "Kite" in row: to check whether any cell equals "Kite".
with open('sales-2013.csv', 'rb') as csvfile: # <- change 'rb' to 'r'
    sales = csv.reader(csvfile)
    for row in sales:
        if row == "Kite": # <- this would be better: if "Kite" in row:
            print ",".join(row)
Read this:
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
To find the rows that contain the word "Kite", you should use
for row in sales: # here you iterate over every row (a *list* of cells)
    if "Kite" in row:
        # do stuff
Now that you know how to find the required rows, you can access the desired cells by indexing the rows. For example, if you want to select the second cell of a row, you simply do
cell = row[1] # remember, indexes start with 0
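As a rough Python 3 sketch combining both points (finding the rows, then doing some math on their cells): note that `"Kite" in row` only matches a cell that is exactly "Kite", so the version below checks each cell for the substring instead. The sample rows and the column layout (date, product, quantity, price) are assumptions, since the question does not show sales-2013.csv.

```python
import csv
import io

# Hypothetical stand-in for sales-2013.csv; the column layout is an assumption.
sample = io.StringIO(
    "2013-01-05,Kite,2,19.99\n"
    "2013-01-06,Frisbee,1,4.50\n"
    "2013-01-09,Stunt Kite,1,49.00\n"
)

total = 0.0
matches = []
for row in csv.reader(sample):
    # any(...) is true when "Kite" appears inside at least one cell,
    # so "Stunt Kite" matches as well as "Kite".
    if any("Kite" in cell for cell in row):
        quantity = int(row[2])   # third column
        price = float(row[3])    # fourth column
        total += quantity * price
        matches.append(row[1])   # second column

print(matches, round(total, 2))
```

Cells always come back as strings, so they need int() or float() before any arithmetic.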