csv module: ordered dictionary manipulation? - python

I have a csv with two fields, 'positive' and 'negative'. I am trying to add the positive words from the csv to a list using csv.DictReader(). Here is the code.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        if n == 'positive' and csv_reader[n] != None :
            positive_list.append(csv_reader[n])
However the program returns an empty list. Any idea how to get around this issue? Or what am I doing wrong?

That's because you can only read once from the csv_reader generator. In this case you do this with the print statement.
With a little re-arranging it should work fine:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        # put your print statement inside of the generator loop.
        # otherwise the generator will be empty by the time you run the logic.
        print(n)
        # as n is a dict, you want to grab the right value from that dict.
        # if it contains a value, then do something with it.
        if n['positive']:
            # Here you want to call the value from your dict.
            # Don't try to call the csv_reader - but use the given data.
            positive_list.append(n['positive'])
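The one-pass behaviour described above can be checked without a file. This is only a minimal sketch: the sample data is made up, and an io.StringIO buffer stands in for pos_neg_cleaned.csv.

```python
import csv
import io

# Hypothetical sample data standing in for pos_neg_cleaned.csv.
sample = "positive,negative\ngood,bad\n,awful\ngreat,\n"

csv_reader = csv.DictReader(io.StringIO(sample))

# The first pass consumes every row of the reader.
first_pass = [row["positive"] for row in csv_reader if row["positive"]]

# A second pass over the same reader yields nothing: it is already exhausted.
second_pass = [row["positive"] for row in csv_reader if row["positive"]]

print(first_pass)
print(second_pass)
```

If the rows are needed more than once, one option is to store them first with list(csv_reader) and loop over that list instead.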

Every row produced by DictReader is a dictionary, so you can retrieve a column's value by using the column name as the key, like this:
positive_column_values = []
for row in csv_dict_reader:
    positive_column_value = row["positive"]
    positive_column_values.append(positive_column_value)
After this code runs, "positive_column_values" will hold all values from the "positive" column.
You can replace your code with this to get the desired result:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for row in csv_reader:
        positive_list.append(row["positive"])
    print(positive_list)

Here's a short way with a list comprehension. It assumes there is a header called header that holds (either) positive or negative values.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = [line for line in csv_reader if line.get('header') == 'positive']
    print(positive_list)
Alternatively, if your csv's header is positive:
positive_list = [line for line in csv_reader if line.get('positive')]
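Note that a comprehension of this shape collects whole row dictionaries. If only the words themselves are wanted, the value can be selected inside the comprehension. A small sketch, assuming made-up sample data in place of the real file:

```python
import csv
import io

# Hypothetical sample data; the real file is pos_neg_cleaned.csv.
sample = "positive,negative\ngood,bad\n,awful\ngreat,\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Keeps the full row (a dict) whenever the 'positive' field is non-empty.
matching_rows = [row for row in rows if row.get("positive")]

# Keeps only the value of the 'positive' field.
positive_list = [row["positive"] for row in rows if row.get("positive")]

print(matching_rows)
print(positive_list)
```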

Related

List of strings in python

I have a csv file which has a column of dates, and I'm importing it using the code below.
The problem is that when I map it to a list of strings, it is printed as below.
["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
With this I'm unable to check whether the list contains my value (e.g. another date) after doing the necessary formatting.
I would like this to be
['05/06/2020', '1/6/2020', '5/22/2020']
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
    print(listDates)
You can just simply add one extra line like so:
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
    listDates = [x.split("'")[1] for x in listDates]
    print(listDates)
Hope this helps :)
Use ast.literal_eval in a list comprehension to evaluate individual elements and capture the first entry:
import ast
lst = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
res = [ast.literal_eval(x)[0] for x in lst]
# ['05/06/2020', '1/6/2020', '5/22/2020']
Like this:
l = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
l = [s[2:-2] for s in l]
print(l)
Output:
['05/06/2020', '1/6/2020', '5/22/2020']
If your file looks like this
05/06/2020
01/06/2020
05/22/2020
all you need is
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = [row[0] for row in csv_Reader]
Each row will be a list of fields, even if there is only one field.
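To see why the original output had the extra brackets and quotes, here is a small sketch. The header name "date" and the in-memory buffer are assumptions standing in for holidays.csv.

```python
import csv
import io

# Hypothetical contents of holidays.csv, with an assumed header line.
sample = "date\n05/06/2020\n1/6/2020\n5/22/2020\n"

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
rows = list(reader)

# str() on a one-element list gives its repr, brackets and quotes included.
as_strings = [str(row) for row in rows]

# Indexing the row gives the field itself.
list_dates = [row[0] for row in rows]

print(as_strings)
print(list_dates)
print("1/6/2020" in list_dates)
```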

Can I print lines randomly from a csv in Python?

I'm trying to print lines randomly from a csv.
Let's say the csv has the below 10 lines -
1,One
2,Two
3,Three
4,Four
5,Five
6,Six
7,Seven
8,Eight
9,Nine
10,Ten
If I write code like the below, it prints each line as a list, in the same order as in the CSV.
import csv
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        print(row)
Instead, I'd like it to be random.
It's just a print for now. I'll later pass each line as a list to a function.
This should work. You can reuse the lines list in your code as it is shuffled.
import random
with open("tmp.csv", "r") as f:
    lines = f.readlines()
random.shuffle(lines)
print(lines)
import csv
import random
csv_elems = []
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        csv_elems.append(row)
random.shuffle(csv_elems)
print(csv_elems[0])
As you can see, I'm just printing the first element; you can iterate over the list, or keep shuffling and printing.
Well, you can define a list, append all rows of the csv file to it, then shuffle it and print them. Assume that the name of this list is temp.
import csv
import random
temp = []
with open("your csv file.csv") as file:
    reader = csv.reader(file)
    for row_num, row in enumerate(reader):
        temp.append(row)
random.shuffle(temp)
for row in temp:
    print(row)
Why not use pandas to handle the csv?
import pandas as pd
data = pd.read_csv("MyCSV.csv")
And to get the samples you are looking for, just write:
data.sample() # returns one random row
data.sample(5) # returns 5 random rows
Also, if you want to pass each line to a function:
data_after_function = data.apply(function_name, axis=1) # axis=1 passes one row at a time
and inside the function you can cast the line into a list with list()
Hope this helps!
A couple of things to do:
1. Store the CSV in a sequence of some sort
2. Get the data randomly
For 1, it's probably best to use some form of sequence comprehension (I've gone for a list of nested tuples, as it seems you want the row numbers and dictionaries can't be shuffled).
We can use the random module for number 2.
import random
import csv
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    my_csv = [(row_num, row) for row_num, row in enumerate(reader)]
# get only 1 item from the list at random
random_row = random.choice(my_csv)
# randomise the order of all the rows; shuffle() works in place and returns None
random.shuffle(my_csv)
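One detail worth knowing about the random module here: random.shuffle reorders the list in place and returns None, while random.sample returns a new list and leaves the original untouched. A minimal sketch, with a few hard-coded rows assumed in place of the CSV contents:

```python
import random

# Hypothetical rows, as csv.reader would produce them for the first lines of MyCSV.csv.
rows = [["1", "One"], ["2", "Two"], ["3", "Three"]]

# sample() with k=len(rows) returns a shuffled copy; rows keeps its order.
shuffled_copy = random.sample(rows, k=len(rows))

# shuffle() reorders rows itself and returns None.
result = random.shuffle(rows)

print(shuffled_copy)
print(rows)
print(result)
```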

How to output certain element from a row from CSV file, Python

I am using python to parse CSV file but I face an issue how to extract "Davies" element from second row.
CSV looks like this
"_submissionusersID","_submissionresponseID","username","firstname","lastname","userid","phone","emailaddress","load_date"
"b838b35d-ca18-4c7c-874a-828298ae3345","e9cde2ff-33a7-477e-b3b9-12ceb0d214e0","DAVIESJO","John","Davies","16293","","john_davies#test2.com","2019-08-30 15:37:03"
"00ec3205-6fcb-4d6d-b806-25579b49911a","e9cde2ff-11a7-477e-b3b9-12ceb0d934e0","MORANJO","John","Moran","16972","+1 (425) 7404555","brian_moran2#test2.com","2019-08-30 15:37:03"
"cc44e6bb-af76-4165-8839-433ed8cf6036","e9cde2ff-33a7-477e-b3b9-12ceb0d934e0","TESTNAN","Nancy","Test","75791","+1 (412) 7402344","nancy_test#test2.com","2019-08-30 15:37:03"
"a8ecd4db-6c8d-453c-a2a7-032553e2f0e6","e9cde2ff-33a7-477e-b3b9-12ceb0d234e0","SMITHJO","John","Smith","197448","+1 (415) 5940445","john_smith#test2.com","2019-08-30 15:37:03"
I'm stuck here:
with open('Docs/CSV/submis/submis.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
Your code is correct so far. Each row is returned as a dict, so you read the values you need from it by column name, as shown below.
import csv
with open('/home/liferay172/Documents/Sundeep/stackoverflow/text.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        print(row)
        print("Username :: "+row['username'])
        print("Firstname :: "+row['firstname'])
        print("Lastname :: "+row['lastname'])
For a specific row
import csv
rowNumber = 1
with open('/home/liferay172/Documents/Sundeep/stackoverflow/text.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    print(list(csv_reader)[rowNumber-1]['lastname']) # -1 as the index starts from 0
Returns > Davies
Here's how to put, for example, the "Davies" record in a match variable and also print its data if found.
import csv
match = None  # stays None if no row matches
with open('/home/liferay172/Documents/Sundeep/stackoverflow/text.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        # "Davies" is a last name; the username column holds values like "DAVIESJO"
        if row['lastname'] == "Davies":
            match = row
            print("Username:\t" + row['username'])
            print("Firstname:\t" + row['firstname'])
            print("Lastname:\t" + row['lastname'])
            break
print(match)
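The same search can also be written with the built-in next() and a default value, which avoids an undefined variable when nothing matches. This is only a sketch: it uses a trimmed, in-memory copy of the sample data with three of the columns.

```python
import csv
import io

# Trimmed, hypothetical copy of the sample file (only three of its columns).
sample = (
    '"username","firstname","lastname"\n'
    '"DAVIESJO","John","Davies"\n'
    '"MORANJO","John","Moran"\n'
)

reader = csv.DictReader(io.StringIO(sample))
# next() returns the first matching row, or None if the generator is empty.
match = next((row for row in reader if row["lastname"] == "Davies"), None)

reader = csv.DictReader(io.StringIO(sample))
missing = next((row for row in reader if row["lastname"] == "Nobody"), None)

print(match)
print(missing)
```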
You can convert the CSV reader object to a list and then it can be accessed by index.
import csv
with open('Docs/CSV/submis/submis.csv') as csv_file:
    csv_reader = list(csv.reader(csv_file))
    # 2nd row
    print(csv_reader[1])
    # 2nd row 3rd column
    print(csv_reader[1][2])

Reading column names alone in a csv file

I have a csv file with the following columns:
id,name,age,sex
Followed by a lot of values for the above columns.
I am trying to read the column names alone and put them inside a list.
I am using Dictreader and this gives out the correct details:
with open('details.csv') as csvfile:
    i=["name","age","sex"]
    re=csv.DictReader(csvfile)
    for row in re:
        for x in i:
            print row[x]
But what I want to do is have the list of columns ("i" in the above case) parsed automatically from the input csv, rather than hardcoding them inside a list.
with open('details.csv') as csvfile:
    rows=iter(csv.reader(csvfile)).next()
    header=rows[1:]
    re=csv.DictReader(csvfile)
    for row in re:
        print row
        for x in header:
            print row[x]
This gives an error
KeyError: 'name'
in the line print row[x]. Where am I going wrong? Is it possible to fetch the column names using DictReader?
Though you already have an accepted answer, I figured I'd add this for anyone else interested in a different solution-
Python's DictReader object in the CSV module (as of Python 2.6 and above) has a public attribute called fieldnames.
https://docs.python.org/3.4/library/csv.html#csv.csvreader.fieldnames
An implementation could be as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    d_reader = csv.DictReader(f)
    #get fieldnames from DictReader object and store in list
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
In the above, d_reader.fieldnames returns a list of your headers (assuming the headers are in the top row).
Which allows...
>>> print(headers)
['MyCol1', 'MyCol2', 'MyCol3']
If your headers are in, say the 2nd row (with the very top row being row 1), you could do as follows:
import csv
with open('C:/mypath/to/csvfile.csv', 'r') as f:
    #you can eat the first line before creating DictReader.
    #if no "fieldnames" param is passed into
    #DictReader object upon creation, DictReader
    #will read the upper-most line as the headers
    f.readline()
    d_reader = csv.DictReader(f)
    headers = d_reader.fieldnames
    for line in d_reader:
        #print value in MyCol1 for each row
        print(line['MyCol1'])
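The fact that DictReader treats whatever line it reads first as the header also explains the KeyError in the question: the plain csv.reader had already consumed the real header. A small sketch of both cases, using made-up rows under the question's id,name,age,sex header:

```python
import csv
import io

# Hypothetical data using the column names from the question.
sample = "id,name,age,sex\n1,Ann,30,F\n2,Bob,25,M\n"

# Normal case: fieldnames reads the header line, the data rows remain.
d_reader = csv.DictReader(io.StringIO(sample))
headers = d_reader.fieldnames
rows = list(d_reader)

# Case from the question: another reader consumes the header first,
# so DictReader takes the first data row as its field names.
buffer = io.StringIO(sample)
next(csv.reader(buffer))
wrong_headers = csv.DictReader(buffer).fieldnames

print(headers)
print(rows[0]["name"])
print(wrong_headers)
```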
You can read the header by using the next() function, which returns the next row of the reader's iterable object as a list. Then you can add the rest of the file's content to a list.
import csv
with open("C:/path/to/.filecsv", "rb") as f:
    reader = csv.reader(f)
    i = reader.next()
    rest = list(reader)
Now i has the column names as a list.
print i
>>>['id', 'name', 'age', 'sex']
Also note that reader.next() does not work in Python 3. Instead use the built-in next() to get the first line of the csv immediately after creating the reader, and open the file in text mode (in Python 3, csv.reader needs strings, not bytes):
import csv
with open("C:/path/to/.filecsv", newline="") as f:
    reader = csv.reader(f)
    i = next(reader)
    print(i)
>>>['id', 'name', 'age', 'sex']
The csv.DictReader object exposes an attribute called fieldnames, and that is what you'd use. Here's example code, followed by input and corresponding output:
import csv
file = "/path/to/file.csv"
with open(file, mode='r', encoding='utf-8') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print([col + '=' + row[col] for col in reader.fieldnames])
Input file contents:
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
00,01,02,03,04,05,06,07,08,09
10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29
30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69
70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89
90,91,92,93,94,95,96,97,98,99
Output of print statements:
['col0=00', 'col1=01', 'col2=02', 'col3=03', 'col4=04', 'col5=05', 'col6=06', 'col7=07', 'col8=08', 'col9=09']
['col0=10', 'col1=11', 'col2=12', 'col3=13', 'col4=14', 'col5=15', 'col6=16', 'col7=17', 'col8=18', 'col9=19']
['col0=20', 'col1=21', 'col2=22', 'col3=23', 'col4=24', 'col5=25', 'col6=26', 'col7=27', 'col8=28', 'col9=29']
['col0=30', 'col1=31', 'col2=32', 'col3=33', 'col4=34', 'col5=35', 'col6=36', 'col7=37', 'col8=38', 'col9=39']
['col0=40', 'col1=41', 'col2=42', 'col3=43', 'col4=44', 'col5=45', 'col6=46', 'col7=47', 'col8=48', 'col9=49']
['col0=50', 'col1=51', 'col2=52', 'col3=53', 'col4=54', 'col5=55', 'col6=56', 'col7=57', 'col8=58', 'col9=59']
['col0=60', 'col1=61', 'col2=62', 'col3=63', 'col4=64', 'col5=65', 'col6=66', 'col7=67', 'col8=68', 'col9=69']
['col0=70', 'col1=71', 'col2=72', 'col3=73', 'col4=74', 'col5=75', 'col6=76', 'col7=77', 'col8=78', 'col9=79']
['col0=80', 'col1=81', 'col2=82', 'col3=83', 'col4=84', 'col5=85', 'col6=86', 'col7=87', 'col8=88', 'col9=89']
['col0=90', 'col1=91', 'col2=92', 'col3=93', 'col4=94', 'col5=95', 'col6=96', 'col7=97', 'col8=98', 'col9=99']
How about
with open(csv_input_path + file, 'r') as ft:
    header = ft.readline() # read only first line; returns string
    header_list = header.strip().split(',') # strip the trailing newline, then split; returns list
I am assuming your input file is in CSV format.
Using pandas takes more time if the file is large, because it loads the entire file into a DataFrame.
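One caveat with splitting the header line by hand: it only works when no column name contains a quoted comma. A short sketch comparing the two approaches, with a made-up header that includes such a name:

```python
import csv
import io

# Hypothetical file whose second column name contains a comma inside quotes.
sample = 'id,"last, first",age\n1,"Doe, Jane",30\n'

# Plain string splitting breaks the quoted name into two pieces.
naive = sample.splitlines()[0].split(",")

# csv.reader understands the quoting and keeps the name whole.
parsed = next(csv.reader(io.StringIO(sample)))

print(naive)
print(parsed)
```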
I am just mentioning how to get all the column names from a csv file.
I am using pandas library.
First we read the file.
import pandas as pd
file = pd.read_csv('details.csv')
Then, in order to just get all the column names as a list from input file use:-
columns = list(file.head(0))
Thanks to Daniel Jimenez for his perfect solution for fetching the column names alone from my csv. I extended his solution to use DictReader so we can iterate over the rows using column names as indexes. Thanks Jimenez.
with open('myfile.csv') as csvfile:
    rest = []
    with open("myfile.csv", "rb") as f:
        reader = csv.reader(f)
        i = reader.next()
        i=i[1:]
    re=csv.DictReader(csvfile)
    for row in re:
        for x in i:
            print row[x]
Here is the code to print only the headers (column names) of the csv file.
import csv
HEADERS = next(csv.reader(open('filepath.csv')))
print (HEADERS)
Another method with pandas
import pandas as pd
HEADERS = list(pd.read_csv('filepath.csv').head(0))
print (HEADERS)
import pandas as pd
data = pd.read_csv("data.csv")
cols = data.columns
I literally just wanted the first row of my data which are the headers I need and didn't want to iterate over all my data to get them, so I just did this:
with open(data, 'r', newline='') as csvfile:
    t = 0
    for i in csv.reader(csvfile, delimiter=',', quotechar='|'):
        if t > 0:
            break
        else:
            dbh = i
            t += 1
Using pandas is also an option.
But instead of loading the full file in memory, you can retrieve only the first chunk of it to get the field names by using an iterator.
import pandas as pd
file = pd.read_csv('details.csv', iterator=True)
column_names_full = file.get_chunk(1)
column_names = [column for column in column_names_full]
print(column_names)

How to read one single line of csv data in Python?

There are a lot of examples of reading csv data using Python, like this one:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the csv file, use next() on the reader object.
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    # now do something here
    # if first row is the header, then you can do one more next() to get the next row:
    # row2 = next(reader)
or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
You could get just the first row like:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
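Once a single row is in hand, its fields can be unpacked straight into separate variables, which is what the question asks for. A minimal sketch, assuming a made-up file with a header line and four integer columns (the variable names follow the question):

```python
import csv
import io

# Hypothetical contents: one header line and one data line of integers.
sample = "i,a1,b1,c1\n7,10,20,30\n"

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

# Convert each field to int and unpack the row into named variables.
# This raises ValueError if the row does not have exactly four fields.
i, a1, b1, c1 = (int(value) for value in next(reader))

print(i, a1, b1, c1)
```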
You can use Pandas library to read the first few lines from the huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can mention the number of lines to be read in the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    for row in reader:
        print(row) # prints rows 2 and onward
From the Python documentation:
And while the module doesn't directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
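A simple way to get any row in a csv file is to load all rows into a list and index it:
import csv
csvFileArray = []
# text mode with newline='' is what csv.reader expects in Python 3
with open('some.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter = '.'):
        csvFileArray.append(row)
print(csvFileArray[0])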
To print a range of lines, in this case the rows at indices 4 to 6 (the 5th to 7th lines of the file):
import csv
with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
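For a large file, the same range can be taken without building the full list first by using itertools.islice, which stops reading once the range is done. A sketch with generated sample rows standing in for the real file:

```python
import csv
import io
from itertools import islice

# Hypothetical data: ten rows numbered 0 to 9.
sample = "".join(f"{n},row{n}\n" for n in range(10))

reader = csv.reader(io.StringIO(sample))

# Skips the first four rows, then yields the rows at indices 4, 5 and 6.
selected = list(islice(reader, 4, 7))

for row in selected:
    print(row)
```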
I think the simplest way is the best way, and in this case (and in most others) that means not using external libraries (pandas) or modules (csv). So, here is the simple answer.
# no need to give any mode, keep it simple
with open('some.csv') as f:
    # store in a variable to be used later
    my_line = f.readline()
# do what you like with 'my_line' now
