I am reading a CSV file, looping over its rows with a for loop, and filtering them with an if condition. The code below prints the matching rows. Instead of printing them, I would like to store them in an array. How do I do this?
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            print(line['tckr'], line['Rel Volume'])
Try it like this:
arr = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            # append() takes exactly one argument, so pass the pair as a tuple
            arr.append((line['tckr'], line['Rel Volume']))
If you want a nested list of lists (where each inner list contains two elements), do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.append([line['tckr'], line['Rel Volume']])
If you want a flat (one-dimensional) list, do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.extend([line['tckr'], line['Rel Volume']])
The only difference between the two examples is that one uses extend() and the other uses append(). Note that in each case, we pass a single list to whichever method we choose, by putting square brackets around line['tckr'],line['Rel Volume'].
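To see the difference side by side, here is a small sketch that fills both kinds of list in one pass. The sample data is made up and read from an in-memory io.StringIO instead of the real _Stocks.csv, so it runs as-is.

```python
import csv
import io

# Hypothetical sample data standing in for _Stocks.csv.
sample = io.StringIO(
    "tckr,Rel Volume\n"
    "abcd,3.2\n"
    "wxyz,1.1\n"
    "efgh,3.0\n"
)

nested = []  # filled with append(): one inner list per matching row
flat = []    # filled with extend(): every value lands at the top level

for line in csv.DictReader(sample):
    if float(line['Rel Volume']) > 2.5:
        pair = [line['tckr'], line['Rel Volume']]
        nested.append(pair)
        flat.extend(pair)

print(nested)  # [['abcd', '3.2'], ['efgh', '3.0']]
print(flat)    # ['abcd', '3.2', 'efgh', '3.0']
```

Note that the values stay strings, because DictReader does not convert types for you.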
You can use list comprehension to do this! My example below will use a namedtuple, but you don't have to include that, it's not necessary.
import csv
from collections import namedtuple
CSVLine = namedtuple('CSVLine', ['tckr', 'Rel_Volume'])
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # DictReader yields strings, so convert to float before comparing with 2.5
    csvfilter = [CSVLine(line['tckr'], float(line['Rel Volume'])) for line in csv_reader if float(line['Rel Volume']) > 2.5]
Using the namedtuple, your array data will look something like this:
CSVLine(tckr='abcd', Rel_Volume=3.2), CSVLine(tckr='efgh', Rel_Volume=3.0), CSVLine(tckr='ijkl', Rel_Volume=4.2)
Without the namedtuple, it will simply look like this:
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # the values are strings, so convert to float before comparing with 2.5
    csvfilter = [(line['tckr'], line['Rel Volume']) for line in csv_reader if float(line['Rel Volume']) > 2.5]
I used a tuple in both examples, because I assumed you would want to pair the data from each line together, for later use.
Pandas module is great for munging:
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 09 17:21:57 2018
@author: soyab
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#%%
## Load as a Pandas Dataframe ; Select rows based on Column logic
new_df = pd.read_csv('_Stocks.csv')
df_over_two_five = new_df.loc[new_df['Rel Volume'] > 2.5].copy()
df_over_two_five
It's good to post a chunk of your data with a question like this. Then I can make sure to catch silly errors.
I have a csv with two fields, 'positive' and 'negative'. I am trying to add the positive words from the csv to a list using csv.DictReader(). Here is my code.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        if n == 'positive' and csv_reader[n] != None :
            positive_list.append(csv_reader[n])
However the program returns an empty list. Any idea how to get around this issue? Or what am I doing wrong?
That's because you can only read once from the csv_reader generator. In this case you already do this with the print statement.
With a little re-arranging it should work fine:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        # put your print statement inside of the generator loop.
        # otherwise the generator will be empty by the time you run the logic.
        print(n)
        # as n is a dict, you want to grab the right value from that dict.
        # if it contains a value, then do something with it.
        if n['positive']:
            # Here you want to call the value from your dict.
            # Don't try to call the csv_reader - but use the given data.
            positive_list.append(n['positive'])
Every row in DictReader is a dictionary, so you can retrieve "columns values" using column name as "key" like this:
positive_column_values = []
for row in csv_dict_reader:
    positive_column_value = row["positive"]
    positive_column_values.append(positive_column_value)
After execution of this code, "positive_column_values" will have all values from "positive" column.
You can replace this code with your code to get desired result:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for row in csv_reader:
        positive_list.append(row["positive"])
print(positive_list)
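If some rows have no positive word, the loop above also collects empty strings. A minimal sketch of filtering those out, assuming the file has 'positive' and 'negative' header columns; the sample rows here are invented and read from io.StringIO so the snippet is self-contained:

```python
import csv
import io

# Hypothetical contents for pos_neg_cleaned.csv; the second data row has no positive word.
sample = io.StringIO(
    "positive,negative\n"
    "good,bad\n"
    ",awful\n"
    "great,poor\n"
)

positive_list = []
for row in csv.DictReader(sample):
    word = row["positive"]
    if word:  # an empty field is read as '', which is falsy
        positive_list.append(word)

print(positive_list)  # ['good', 'great']
```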
Here's a short way with a list comprehension. It assumes there is a header called header that holds (either) positive or negative values.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = [line for line in csv_reader if line.get('header') == 'positive']
print(positive_list)
Alternatively, if your csv's header is positive:
positive_list = [line for line in csv_reader if line.get('positive')]
I have a csv file which has a column of dates and I'm importing it using the code below.
The problem is that when I map it to a list of strings, it is printed as below.
["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
With this I'm unable to check if the list contains my value(eg: another date) after doing necessary formatting.
I would like this to be
['05/06/2020', '1/6/2020', '5/22/2020']
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
    print(listDates)
You can just simply add one extra line like so:
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
    listDates = [x.split("'")[1] for x in listDates]
    print(listDates)
Hope this helps :)
Use ast.literal_eval in a list comprehension to evaluate individual elements and capture the first entry:
import ast
lst = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
res = [ast.literal_eval(x)[0] for x in lst]
# ['05/06/2020', '1/6/2020', '5/22/2020']
Like this:
l = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
l = [s[2:-2] for s in l]
print(l)
Output:
['05/06/2020', '1/6/2020', '5/22/2020']
If your file looks like this
05/06/2020
01/06/2020
05/22/2020
all you need is
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = [row[0] for row in csv_Reader]
Each row will be a list of fields, even if there is only one field.
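Here is a small sketch of that point, using made-up data in an io.StringIO in place of holidays.csv (a header line followed by one date per row is an assumption). It shows what the reader yields and why indexing with row[0] makes the membership check work:

```python
import csv
import io

# Hypothetical holidays.csv: a header line, then one date per row.
sample = io.StringIO("date\n05/06/2020\n1/6/2020\n5/22/2020\n")

reader = csv.reader(sample)
next(reader)          # skip the header
rows = list(reader)   # each row is a one-element list
print(rows)           # [['05/06/2020'], ['1/6/2020'], ['5/22/2020']]

list_dates = [row[0] for row in rows]   # take the single field from each row
print(list_dates)                       # ['05/06/2020', '1/6/2020', '5/22/2020']
print('1/6/2020' in list_dates)         # True
```

Calling str() on each row is what produced strings like "['05/06/2020']" in the question.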
I am trying to read a line of numbers in a csv file, then call on them individually to compute with them. Here is my code so far:
import sys
import os
import csv
import numpy as np
with open('data.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        x = np.array(line[3])
        print(x)
Within this line of the csv file there are numbers with decimals like 4.65 that I need to use in the calculations. I tried things like:
print(5 + x[14]) but it won't work.
I assume I need to convert the line into a list of integers or something, please help.
Thank you in advance.
According to your example line you want to add delimiter=' ' to the csv.reader()
csv_data = csv.reader(csv_file, delimiter=' ')
Taking a guess at the structure of your csv, but under the assumption that it contains no headings and you want to keep the decimals:
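with open('new_data.txt') as csv_file:
    csv_data = csv.reader(csv_file, delimiter=' ')
    for line in csv_data:
        # csv.reader has already split the row into a list of strings;
        # skip the empty strings that repeated spaces produce
        x = [float(n) for n in line if n]
        print(x)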
This will fail if you have string values, such as 'A string'
Here is an alternative to @GiantsLoveDeathMetal's solution with map (it also shows a way to provide us a copy/paste-able piece of code containing a sample of your csv file with io.StringIO):
EDITED the StringIO to contain data in columns and with empty rows
import csv
import io
f = io.StringIO("""
7.057
7.029
5.843
5.557
4.186
4.1
2.286""")
csv_reader = csv.reader(f, delimiter=' ')
for line in csv_reader:
    line = list(map(float, filter(None, line)))
    print(line)
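The filter(None, line) call matters because a space-delimited reader returns an empty string for every repeated space, and float('') raises a ValueError. A small sketch with invented data (the double space and the blank line are deliberate):

```python
import csv
import io

# Hypothetical data: note the double space in the first line and the blank second line.
sample = io.StringIO("7.057  7.029 5.843\n\n4.186 4.1\n")

rows = []
for fields in csv.reader(sample, delimiter=' '):
    # fields for the first line is ['7.057', '', '7.029', '5.843']
    values = [float(v) for v in fields if v]  # drop the empty strings
    if values:                                # a blank line gives an empty row
        rows.append(values)

print(rows)  # [[7.057, 7.029, 5.843], [4.186, 4.1]]
```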
In Python 2, map returns a list, so you don't need to convert its result. In Python 3, map returns a lazy iterator, so you can skip the list() call only when the result is consumed exactly once (which is not the case in the example above, where the list is printed).
In those cases line = list(map(float, line)) can be replaced by the shorter line = map(float, line), which can be considered cleaner and more explicit than a list comprehension (or a generator expression).
For instance this will work :
import csv
import io
f = io.StringIO("""7.057 7.029 5.843 5.557 4.186 4.1 2.286""")
csv_reader = csv.reader(f, delimiter=' ')
for line in csv_reader:
    line = map(float, line)
    print(sum(line))
# 36.05800000000001
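One thing to keep in mind in Python 3 is that a map object can be consumed only once. A short sketch with made-up values shows what happens if you try to reuse it:

```python
values = ['7.057', '7.029', '5.843']

lazy = map(float, values)   # nothing is converted yet
total = sum(lazy)           # sum() consumes the iterator
leftover = list(lazy)       # the iterator is now exhausted

print(leftover)             # []

# If you need the numbers more than once, materialise them first.
as_list = list(map(float, values))
print(len(as_list), sum(as_list) == total)  # 3 True
```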
If you're interested in the map vs list comprehension debate, here you go for more details.
I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
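As a sketch of how that fits the question, the snippet below skips the first line on the file object and then takes the minimum of the second column. The data is invented and comes from io.StringIO, which behaves like an open text file here:

```python
import csv
import io

# Hypothetical all16.csv: the first line holds column numbers, the rest is data.
inf = io.StringIO("1,2\n3.5,0.9\n2.0,0.4\n1.5,1.7\n")

next(inf)                 # advance past the header line
incsv = csv.reader(inf)   # the reader starts at the first data row

# row[1] is the second field of each row, still a string, so convert it
least_value = min(float(row[1]) for row in incsv)
print(least_value)        # 0.4
```

Skipping on the file object drops one physical line. If the header could contain a quoted line break, call next() on the csv reader instead so that a full CSV record is skipped.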
Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Process row ...
        pass
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Skip the unwanted line on the file object first, then pass the file to csv.DictReader.
with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
    # process
    pass
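A runnable sketch of the islice approach, skipping two rows instead of thirty so the result is easy to check. The data is generated in memory and is purely illustrative:

```python
import csv
import io
from itertools import islice

# Hypothetical six-row file: "0,0", "1,1", "2,4", ...
sample = io.StringIO("".join(f"{n},{n * n}\n" for n in range(6)))
incsv = csv.reader(sample)

# islice(iterable, start, stop): start=2 skips the first two rows,
# stop=None means "keep going to the end".
rows = list(islice(incsv, 2, None))
print(rows)  # [['2', '4'], ['3', '9'], ['4', '16'], ['5', '25']]
```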
Use csv.DictReader instead of csv.reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"] etc.
Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
next(csv_data) # skip first row; next() calls csv_data.__next__() for you
for row in csv_data:
    print(row) # should print second row
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
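with open('example.csv', newline='') as csvfile:
    # has_header() needs a sample of the file's text to inspect
    has_header = csv.Sniffer().has_header(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile)
    if has_header:
        next(reader)  # skip the header row
    for data_row in reader:
        pass  # do something with the row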
This might be a very old question, but with pandas we have a very easy solution:
import pandas as pd
data = pd.read_csv('all16.csv', skiprows=1, header=None)
data[1].min()
With skiprows=1 we skip the first row, and header=None stops pandas from treating the next row as column names. The columns are then labelled by position, so data[1].min() finds the least value in the second column.
The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
import csv
array = []
# Let's say there's 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # read first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader) # Convert to list
    data.pop(0) # Removes the first row
for row in data:
    print(row)
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
just add [1:]
example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]
that works for me in iPython
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM.
        txt = f.read()[1:]
    # Remove header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]
    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))
    for row in csv_rows:
        value = row[INDEX_HERE]
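An alternative worth considering is Python's built-in 'utf-8-sig' codec, which removes a leading BOM only when one is actually present, so files without a BOM do not lose their first character. The sketch below is an illustration, not the answer's original method; it builds the bytes in memory, and with a real file you would pass encoding='utf-8-sig' to open() instead:

```python
import csv
import io

# Hypothetical file contents; encoding with 'utf-8-sig' prepends a BOM.
raw = "name,value\r\nabc,1\r\n".encode("utf-8-sig")
print(raw[:3])  # b'\xef\xbb\xbf'

# Decoding with 'utf-8-sig' strips the BOM if it is there.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)   # first row, without the BOM character
    rows = list(reader)

print(header)  # ['name', 'value']
print(rows)    # [['abc', '1']]
```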
Simple Solution is to use csv.DictReader()
import csv
def read_csv(file):
    with open(file, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"]) # Replace the name of column header.
There are a lot of examples of reading csv data using Python, like this one:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the csv file use next() on the reader object.
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    # now do something here
    # if first row is the header, then you can do one more next() to get the next row:
    # row2 = next(reader)
or :
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
you could get just the first row like:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
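To get the fields of that one row into separate variables, as the question asks, you can unpack the row directly. A minimal sketch, assuming the row holds exactly four integers; the data and the in-memory file are made up:

```python
import csv
import io

# Hypothetical some.csv with no header; only the first row is needed.
sample = io.StringIO("1,10,20,30\n2,11,21,31\n")

reader = csv.reader(sample)
first_row = next(reader)  # ['1', '10', '20', '30']

# Unpacking raises ValueError if the number of fields does not match.
i, a1, b1, c1 = (int(value) for value in first_row)
print(i, a1, b1, c1)  # 1 10 20 30
```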
You can use Pandas library to read the first few lines from the huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can mention the number of lines to be read in the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    for row in reader:
        print(row) # prints rows 2 and onward
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
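The same idea extends to a multi-line string: csv.reader accepts any iterable of lines, so str.splitlines() is enough as long as no quoted field contains a line break. A small sketch with made-up text:

```python
import csv

# Hypothetical CSV text; the quoted field contains a comma.
text = 'one,two,three\n"a,b",c,d'

rows = list(csv.reader(text.splitlines()))
print(rows)  # [['one', 'two', 'three'], ['a,b', 'c', 'd']]
```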
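The simple way to get any row in a csv file:
import csv
csvFileArray = []
# In Python 3 the csv module needs a text-mode file, not 'rb'
with open('some.csv', newline='') as csvfile:
    # use whatever delimiter your file actually uses
    for row in csv.reader(csvfile, delimiter=','):
        csvFileArray.append(row)
print(csvFileArray[0])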
To print a range of rows, slice the list. The slice [4:7] selects indices 4, 5 and 6, that is, the fifth through seventh rows:
import csv
with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
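For a large file, a possible variation is itertools.islice, which selects the same rows without first loading the whole file into a list and stops reading once the last wanted row is reached. This is a sketch on generated in-memory data, not the housing file:

```python
import csv
import io
from itertools import islice

# Hypothetical ten-row file: "row0", "row1", ..., "row9".
sample = io.StringIO("".join(f"row{n}\n" for n in range(10)))
data = csv.reader(sample)

# Same indices as list(data)[4:7], read lazily.
selected = list(islice(data, 4, 7))
print(selected)  # [['row4'], ['row5'], ['row6']]
```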
I think the simplest way is the best way, and in this case (and in most others) is one without using external libraries (pandas) or modules (csv). So, here is the simple answer.
# no need to give any mode, keep it simple
with open('some.csv') as f:
    # store in a variable to be used later
    my_line = f.readline()
# do what you like with 'my_line' now