Top 5 scores using csv - python

I am having difficulty with presenting the top 5 scores from a csv file.
This is my current code:
def leaderboards():
with open('userData.csv', 'r') as data:
reader = csv.reader(data)
for idx, row in enumerate(reader):
print('{} | {} | {}'.format(str(idx), row[0], row[2]))
This is what I currently get:
>>> leaderboards()
0 | Name | Score
1 | Ryan | 3
2 | Liam | 12
3 | Steve | 3
4 | Donald | 3
5 | Alan | 3
6 | Harry | 1
There are two main elements that I would like to implement but don't know how.
They are to sort the score so that the highest score is at the top, and to limit the number of players the leaderboards show to 5.
Current code:
with open('userData.csv', 'r') as data:
reader = csv.DictReader(data)
newList = sorted(reader, key=lambda row: row['Score'],
reverse=False)[0:5]
print (' | Name | Score')
for i, r in enumerate(newList):
print('{} | {} | {}'.format(str(i), r['Name'], r['Score']))
Current result:
| Name | Score
0 | Liam | 12
1 | Harry | 2
2 | Ryan | 3
3 | Steve | 3
4 | Donald | 3
It works for the most part, however, the 2 is higher up than the 3. I'm not really sure why but maybe you might realise something. Once again, thank you so much and I'm sorry that I'm just asking so much but this is out of my league in terms of coding knowledge. :D

If you're trying to sort on the last element of each row, the blank space is throwing an error. Also, I think you're script will also consider the header row for sorting, so if you're not looking to consider that, you should skip that row when sorting.
If you want to print out the score the way you want according to your file structure, you could do something like the following:
with open('...file name here...', 'r') as data:
reader = csv.reader(data)
for idx, row in enumerate(reader):
print '{} | {} | {}'.format(str(idx), row[0], row[2])
That will print things out line by line according to how you've indexed your csv. Since you have a header row, I would also suggest that you take a look at DictReader within the Python csv module. That way you won't have to rely on index position in the future.
Also, if you've having issues with extra lines in csv's, take a look at this answer.
Update for sorting based on the request in comments:
To sort your reader you can just use lambda based on the appropriate key in your csv columns. Assuming that Score is the column you wish to sort on and that Score is of index 2, you can do:
newList = sorted(reader, key=lambda row: row[2], reverse=True)
That will give you a sorted list in descending order by the 2nd index. Remove reverse=True for an ascending list order.
Then, if you want to take only the first five entries, you can just slice the list:
slicedList = newList[0:5]
Then, you can print out your list as usual.
Update 2 to accommodate second request:
with open('...name of your file...', 'r') as data:
reader = csv.reader(data)
newList = sorted(reader, key=lambda row: row[2], reverse=True)[0:5]
for i, r in enumerate(newList):
print('{} | {} | {}'.format(str(i), r[0], r[2]))
The above code assumes (from your original csv file) that r[0] is the "Name" column and r[2] is the "Score" column.
Also, as an alternative, you could use csv.DictReader just in case your column headers change index. The alternative code would be something like:
with open('...your file name here...', 'r') as data:
reader = csv.DictReader(data)
#### COMMENTING THIS OUT in favor of the newList below
# newList = sorted(reader, key=lambda row: row['Score'], reverse=True)[0:5]
##### EDIT to evaluate the Score as an int()
# cast row['Score'] to an int() below so that sorted() treats the row like and int rather than str
newList = sorted(reader, key=lambda row: int(row['Score']), reverse=True)[0:5]
print (' | Name | Score')
for i, r in enumerate(newList):
print('{} | {} | {}'.format(str(i), r['Name'], r['Score']))

Related

How to generate table using Python

I am quite struggling with as I tried many libraries to print table but no success - so I thought to post here and ask.
My data is in a text file (resource.txt) which looks like this (the exact same way it prints)
pipelined 8 8 0 17 0 0
nonpipelined 2 2 0 10 0 0
I want my data print in the following manner
Design name LUT Lut as m Lut as I FF DSP BRAM
-------------------------------------------------------------------
pipelined 8 8 0 17 0 0
Non piplined 2 2 0 10 0 0
Some time data may be more line column remain same but rows may increase.
(i have python 2.7 version)
I am using this part in my python code all code working but am couldn't able print data which i extracted to text file in tabular form. As I can't use panda library as it won't support for python 2.7, but I can use tabulate and all library. Can anyone please help me?
I tried using tabulate and all but I keep getting errors.
I tried at end simple method to print but its not working (same code works if I put at top of code but at the end of code this won't work). Does anyone have any idea?
q11=open( "resource.txt","r")
for line in q11:
print(line)
Here's a self contained function that makes a left-justified, technical paper styled table.
def makeTable(headerRow,columnizedData,columnSpacing=2):
"""Creates a technical paper style, left justified table
Author: Christopher Collett
Date: 6/1/2019"""
from numpy import array,max,vectorize
cols = array(columnizedData,dtype=str)
colSizes = [max(vectorize(len)(col)) for col in cols]
header = ''
rows = ['' for i in cols[0]]
for i in range(0,len(headerRow)):
if len(headerRow[i]) > colSizes[i]: colSizes[i]=len(headerRow[i])
headerRow[i]+=' '*(colSizes[i]-len(headerRow[i]))
header+=headerRow[i]
if not i == len(headerRow)-1: header+=' '*columnSpacing
for j in range(0,len(cols[i])):
if len(cols[i][j]) < colSizes[i]:
cols[i][j]+=' '*(colSizes[i]-len(cols[i][j])+columnSpacing)
rows[j]+=cols[i][j]
if not i == len(headerRow)-1: rows[j]+=' '*columnSpacing
line = '-'*len(header)
print(line)
print(header)
print(line)
for row in rows: print(row)
print(line)
And here's an example using this function.
>>> header = ['Name','Age']
>>> names = ['George','Alberta','Frank']
>>> ages = [8,9,11]
>>> makeTable(header,[names,ages])
------------
Name Age
------------
George 8
Alberta 9
Frank 11
------------
Since the number of columns remains the same, you could just print out the first line with ample spaces as required. Ex-
print("Design name",' ',"LUT",' ',"Lut as m",' ',"and continue
like that")
Then read the csv file. datafile will be
datafile = open('resource.csv','r')
reader = csv.reader(datafile)
for col in reader:
print(col[0],' ',col[1],' ',col[2],' ',"and
continue depending on the number of columns")
This is not he optimized solution but since it looks like you are new, therefore this will help you understand better. Or else you can use row_format print options in python 2.7.
Here is code to print table in nice table, you trasfer all your data to sets then you can data or else you can trasfer data in text file line to one set and print it
from beautifultable import BeautifulTable
h0=["jkgjkg"]
h1=[2,3]
h2=[2,3]
h3=[2,3]
h4=[2,3]
h5=[2,3]
h0.append("FPGA resources")
table = BeautifulTable()
table.column_headers = h0
table.append_row(h1)
table.append_row(h2)
table.append_row(h3)
table.append_row(h4)
table.append_row(h5)
print(table)
Out Put:
+--------+----------------+
| jkgjkg | FPGA resources |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+

Extract 2 lists out of csv file in python [duplicate]

I'm trying to parse through a csv file and extract the data from only specific columns.
Example csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
I'm trying to capture only specific columns, say ID, Name, Zip and Phone.
Code I've looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.
Here's what I've done so far:
import sys, argparse, csv
from settings import *
# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
fromfile_prefix_chars="#" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file
# open csv file
with open(csv_file, 'rb') as csvfile:
# get number of columns
for line in csvfile.readlines():
array = line.split(',')
first_item = array[0]
num_columns = len(array)
csvfile.seek(0)
reader = csv.reader(csvfile, delimiter=' ')
included_cols = [1, 2, 6, 7]
for row in reader:
content = list(row[i] for i in included_cols)
print content
and I'm expecting that this will print out only the specific columns I want for each row except it doesn't, I get the last column only.
The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.
This is most likely the end of your code:
for row in reader:
content = list(row[i] for i in included_cols)
print content
You want it to be this:
for row in reader:
content = list(row[i] for i in included_cols)
print content
Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:
names = df.Names
It's a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn't happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!
import csv
from collections import defaultdict
columns = defaultdict(list) # each value in each column is appended to a list
with open('file.txt') as f:
reader = csv.DictReader(f) # read rows into a dictionary format
for row in reader: # read a row as {column1: value1, column2: value2,...}
for (k,v) in row.items(): # go over each column name and value
columns[k].append(v) # append the value into the appropriate list
# based on column name k
print(columns['name'])
print(columns['phone'])
print(columns['street'])
With a file like
name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.
Will output
>>>
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']
Or alternatively if you want numerical indexing for the columns:
with open('file.txt') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
for (i,v) in enumerate(row):
columns[i].append(v)
print(columns[0])
>>>
['Bob', 'James', 'Smithers']
To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")
Use pandas:
import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']
Discard unneeded columns at parse time:
my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
P.S. I'm just aggregating what other's have said in a simple manner. Actual answers are taken from here and here.
You can use numpy.loadtext(filename). For example if this is your database .csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
And you want the Name column:
import numpy as np
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
More easily you can use genfromtext:
b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
With pandas you can use read_csv with usecols parameter:
df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
Example:
import pandas as pd
import io
s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''
df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)
total_bill day size
0 16.99 Sun 2
1 10.34 Sun 3
2 21.01 Sun 3
Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you've done pip install petl. The documentation is excellent.
Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.
from petl import fromcsv, look, cut, tocsv
#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')
I think there is an easier way
import pandas as pd
dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values
So in here iloc[:, 0], : means all values, 0 means the position of the column.
in the example below ID will be selected
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
import pandas as pd
csv_file = pd.read_csv("file.csv")
column_val_list = csv_file.column_name._ndarray_values
Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:
myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']
A few things to consider:
The snippet above will produce a pandas Series and not dataframe.
The suggestion from ayhan with usecols will also be faster if speed is an issue.
Testing the two different approaches using %timeit on a 2122 KB sized csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.
And don't forget import pandas as pd
If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:
ids, names, zips, phones = zip(*(
(row[1], row[2], row[6], row[7])
for row in reader
))
import pandas as pd
dataset = pd.read_csv('Train.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
X is a a bunch of columns, use it if you want to read more that one column
y is single column, use it to read one column
[:, 1:-1] are [row_index : to_row_index, column_index : to_column_index]
SAMPLE.CSV
a, 1, +
b, 2, -
c, 3, *
d, 4, /
column_names = ["Letter", "Number", "Symbol"]
df = pd.read_csv("sample.csv", names=column_names)
print(df)
OUTPUT
Letter Number Symbol
0 a 1 +
1 b 2 -
2 c 3 *
3 d 4 /
letters = df.Letter.to_list()
print(letters)
OUTPUT
['a', 'b', 'c', 'd']
import csv
with open('input.csv', encoding='utf-8-sig') as csv_file:
# the below statement will skip the first row
next(csv_file)
reader= csv.DictReader(csv_file)
Time_col ={'Time' : []}
#print(Time_col)
for record in reader :
Time_col['Time'].append(record['Time'])
print(Time_col)
From CSV File Reading and Writing you can import csv and use this code:
with open('names.csv', newline='') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(row['first_name'], row['last_name'])
To fetch column name, instead of using readlines() better use readline() to avoid loop & reading the complete file & storing it in the array.
with open(csv_file, 'rb') as csvfile:
# get number of columns
line = csvfile.readline()
first_item = line.split(',')

How can I write part of dictionary value string as headers to csv file in python? Is there a better way?

I have a dictionary that uses log file names as keys. When reading through a text file if it contains a value I was searching for, it saves the entire line as a new value in the list of values for that key(log file name).
I.e Key: logFile0 Value: [Val1 5, Val2 3, Val3 72].
I want to output to a csv file the values as headers and which log file it was found and the value.
Key | Value
log1 | [Value_name Val_num], [Value1_name Val_num]
log2 | [Value1_name Val_num], [Value3_name Valu_num]
log3 | [Value2_name Val_num], [Value3_name Val_num]
I want it to be displayed in the csv file where for each time the value was found in the log file, it's value is displayed as:
Value_name | Value1_name | Value2_name | Value3_name
log1 val_num | val_num | |
log2 | val_num | | val_num
log3 | | val_num | val_num
Does anybody know how to do this? Or is there a better way to store all of this information and then display?
Here is my approach:
import csv
d = {"log1":[[0,"Val_num"],[1,"Val_num"]],
"log2":[[1,"Val_num"],[3,"Val_num"]],
"log3":[[2,"Val_num"],[3,"Val_num"]]}
I made a dictionary. Use yours but the "Val_num" are meant to be the actual numbers. The first number is the value_name e.g. 1 -> Value1_name
headers = ["Log File","Value_num","Value1_num","Value2_num","Value3_num"]
log1Row = ["log1","","","",""]
log2Row = ["log2","","","",""]
log3Row = ["log3","","","",""]
Here I simply initialised the rows to be written into the csv file.
for i in d["log1"]:
log1Row[i[0] + 1] = i[1]
for i in d["log2"]:
log2Row[i[0] + 1] = i[1]
for i in d["log3"]:
log3Row[i[0] + 1] = i[1]
Now I am going to iterate over each key in the dictionary and append them into the arrays. The + 1 is needed as the array position for where theses numbers begin from is index 1 not 0 as this is the word "log".
with open("file.csv","w") as file:
writer = csv.writer(file)
writer.writerow(headers)
writer.writerow(log1Row)
writer.writerow(log2Row)
writer.writerow(log3Row)
Just open the file to write to and write these rows.
Hopefully this works as long as your dictionary follows the way mine does.
Let me know of any erros.
Thank you.

How to read in only rows with specific values on a specific column? [duplicate]

I'm trying to parse through a csv file and extract the data from only specific columns.
Example csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
I'm trying to capture only specific columns, say ID, Name, Zip and Phone.
Code I've looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.
Here's what I've done so far:
import sys, argparse, csv
from settings import *
# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
fromfile_prefix_chars="#" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file
# open csv file
with open(csv_file, 'rb') as csvfile:
# get number of columns
for line in csvfile.readlines():
array = line.split(',')
first_item = array[0]
num_columns = len(array)
csvfile.seek(0)
reader = csv.reader(csvfile, delimiter=' ')
included_cols = [1, 2, 6, 7]
for row in reader:
content = list(row[i] for i in included_cols)
print content
and I'm expecting that this will print out only the specific columns I want for each row except it doesn't, I get the last column only.
The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.
This is most likely the end of your code:
for row in reader:
content = list(row[i] for i in included_cols)
print content
You want it to be this:
for row in reader:
content = list(row[i] for i in included_cols)
print content
Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:
names = df.Names
It's a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn't happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!
import csv
from collections import defaultdict
columns = defaultdict(list) # each value in each column is appended to a list
with open('file.txt') as f:
reader = csv.DictReader(f) # read rows into a dictionary format
for row in reader: # read a row as {column1: value1, column2: value2,...}
for (k,v) in row.items(): # go over each column name and value
columns[k].append(v) # append the value into the appropriate list
# based on column name k
print(columns['name'])
print(columns['phone'])
print(columns['street'])
With a file like
name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.
Will output
>>>
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']
Or alternatively if you want numerical indexing for the columns:
with open('file.txt') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
for (i,v) in enumerate(row):
columns[i].append(v)
print(columns[0])
>>>
['Bob', 'James', 'Smithers']
To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")
Use pandas:
import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']
Discard unneeded columns at parse time:
my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
P.S. I'm just aggregating what other's have said in a simple manner. Actual answers are taken from here and here.
You can use numpy.loadtext(filename). For example if this is your database .csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
And you want the Name column:
import numpy as np
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
More easily you can use genfromtext:
b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
With pandas you can use read_csv with usecols parameter:
df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
Example:
import pandas as pd
import io
s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''
df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)
total_bill day size
0 16.99 Sun 2
1 10.34 Sun 3
2 21.01 Sun 3
Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you've done pip install petl. The documentation is excellent.
Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.
from petl import fromcsv, look, cut, tocsv
#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')
I think there is an easier way
import pandas as pd
dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values
So in here iloc[:, 0], : means all values, 0 means the position of the column.
in the example below ID will be selected
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
import pandas as pd
csv_file = pd.read_csv("file.csv")
column_val_list = csv_file.column_name._ndarray_values
Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:
myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']
A few things to consider:
The snippet above will produce a pandas Series and not dataframe.
The suggestion from ayhan with usecols will also be faster if speed is an issue.
Testing the two different approaches using %timeit on a 2122 KB sized csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.
And don't forget import pandas as pd
If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:
ids, names, zips, phones = zip(*(
(row[1], row[2], row[6], row[7])
for row in reader
))
import pandas as pd
dataset = pd.read_csv('Train.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
X is a a bunch of columns, use it if you want to read more that one column
y is single column, use it to read one column
[:, 1:-1] are [row_index : to_row_index, column_index : to_column_index]
SAMPLE.CSV
a, 1, +
b, 2, -
c, 3, *
d, 4, /
column_names = ["Letter", "Number", "Symbol"]
df = pd.read_csv("sample.csv", names=column_names)
print(df)
OUTPUT
Letter Number Symbol
0 a 1 +
1 b 2 -
2 c 3 *
3 d 4 /
letters = df.Letter.to_list()
print(letters)
OUTPUT
['a', 'b', 'c', 'd']
import csv
with open('input.csv', encoding='utf-8-sig') as csv_file:
# the below statement will skip the first row
next(csv_file)
reader= csv.DictReader(csv_file)
Time_col ={'Time' : []}
#print(Time_col)
for record in reader :
Time_col['Time'].append(record['Time'])
print(Time_col)
From CSV File Reading and Writing you can import csv and use this code:
with open('names.csv', newline='') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(row['first_name'], row['last_name'])
To fetch column name, instead of using readlines() better use readline() to avoid loop & reading the complete file & storing it in the array.
with open(csv_file, 'rb') as csvfile:
# get number of columns
line = csvfile.readline()
first_item = line.split(',')

Read specific columns from a csv file with csv module?

I'm trying to parse through a csv file and extract the data from only specific columns.
Example csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
I'm trying to capture only specific columns, say ID, Name, Zip and Phone.
Code I've looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.
Here's what I've done so far:
import sys, argparse, csv
from settings import *
# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
fromfile_prefix_chars="#" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file
# open csv file
with open(csv_file, 'rb') as csvfile:
# get number of columns
for line in csvfile.readlines():
array = line.split(',')
first_item = array[0]
num_columns = len(array)
csvfile.seek(0)
reader = csv.reader(csvfile, delimiter=' ')
included_cols = [1, 2, 6, 7]
for row in reader:
content = list(row[i] for i in included_cols)
print content
and I'm expecting that this will print out only the specific columns I want for each row except it doesn't, I get the last column only.
The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.
This is most likely the end of your code:
for row in reader:
content = list(row[i] for i in included_cols)
print content
You want it to be this:
for row in reader:
content = list(row[i] for i in included_cols)
print content
Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:
names = df.Names
It's a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn't happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!
import csv
from collections import defaultdict
columns = defaultdict(list) # each value in each column is appended to a list
with open('file.txt') as f:
reader = csv.DictReader(f) # read rows into a dictionary format
for row in reader: # read a row as {column1: value1, column2: value2,...}
for (k,v) in row.items(): # go over each column name and value
columns[k].append(v) # append the value into the appropriate list
# based on column name k
print(columns['name'])
print(columns['phone'])
print(columns['street'])
With a file like
name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.
Will output
>>>
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']
Or alternatively if you want numerical indexing for the columns:
with open('file.txt') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
for (i,v) in enumerate(row):
columns[i].append(v)
print(columns[0])
>>>
['Bob', 'James', 'Smithers']
To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")
Use pandas:
import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']
Discard unneeded columns at parse time:
my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
P.S. I'm just aggregating what other's have said in a simple manner. Actual answers are taken from here and here.
You can use numpy.loadtext(filename). For example if this is your database .csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
And you want the Name column:
import numpy as np
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
More easily you can use genfromtext:
b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
With pandas you can use read_csv with usecols parameter:
df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
Example:
import pandas as pd
import io
s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''
df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)
total_bill day size
0 16.99 Sun 2
1 10.34 Sun 3
2 21.01 Sun 3
Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things 'manually' with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you've done pip install petl. The documentation is excellent.
Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.
from petl import fromcsv, look, cut, tocsv
#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')
I think there is an easier way
import pandas as pd
dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values
So in here iloc[:, 0], : means all values, 0 means the position of the column.
in the example below ID will be selected
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
import pandas as pd
csv_file = pd.read_csv("file.csv")
column_val_list = csv_file.column_name._ndarray_values
Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:
myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']
A few things to consider:
The snippet above will produce a pandas Series and not dataframe.
The suggestion from ayhan with usecols will also be faster if speed is an issue.
Testing the two different approaches using %timeit on a 2122 KB sized csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.
And don't forget import pandas as pd
If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:
ids, names, zips, phones = zip(*(
(row[1], row[2], row[6], row[7])
for row in reader
))
import pandas as pd
dataset = pd.read_csv('Train.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
X is a a bunch of columns, use it if you want to read more that one column
y is single column, use it to read one column
[:, 1:-1] are [row_index : to_row_index, column_index : to_column_index]
SAMPLE.CSV
a, 1, +
b, 2, -
c, 3, *
d, 4, /
column_names = ["Letter", "Number", "Symbol"]
df = pd.read_csv("sample.csv", names=column_names)
print(df)
OUTPUT
Letter Number Symbol
0 a 1 +
1 b 2 -
2 c 3 *
3 d 4 /
letters = df.Letter.to_list()
print(letters)
OUTPUT
['a', 'b', 'c', 'd']
import csv
with open('input.csv', encoding='utf-8-sig') as csv_file:
# the below statement will skip the first row
next(csv_file)
reader= csv.DictReader(csv_file)
Time_col ={'Time' : []}
#print(Time_col)
for record in reader :
Time_col['Time'].append(record['Time'])
print(Time_col)
From CSV File Reading and Writing you can import csv and use this code:
with open('names.csv', newline='') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(row['first_name'], row['last_name'])
To fetch column name, instead of using readlines() better use readline() to avoid loop & reading the complete file & storing it in the array.
with open(csv_file, 'rb') as csvfile:
# get number of columns
line = csvfile.readline()
first_item = line.split(',')

Categories