List of strings in Python

I have a CSV file with a column of dates, which I'm importing using the code below.
The problem is that when I map it to a list of strings, it is printed as below.
["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
With this, I'm unable to check whether the list contains my value (e.g. another date) after doing the necessary formatting.
I would like this to be
['05/06/2020', '1/6/2020', '5/22/2020']
import csv
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
print(listDates)

You can just simply add one extra line like so:
import csv
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)
    listDates = list(map(str,csv_Reader))
    # each string looks like "['05/06/2020']", so the date sits between the quotes
    listDates = [x.split("'")[1] for x in listDates]
print(listDates)
Hope this helps :)

Use ast.literal_eval in a list comprehension to evaluate individual elements and capture the first entry:
import ast
lst = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
res = [ast.literal_eval(x)[0] for x in lst]
# ['05/06/2020', '1/6/2020', '5/22/2020']

Like this:
l = ["['05/06/2020']", "['1/6/2020']", "['5/22/2020']"]
l = [s[2:-2] for s in l]
print(l)
Output:
['05/06/2020', '1/6/2020', '5/22/2020']

If your file looks like this
05/06/2020
01/06/2020
05/22/2020
all you need is
import csv
with open('holidays.csv','r') as csv_file:
    csv_Reader = csv.reader(csv_file)
    next(csv_Reader)  # skip the header row
    listDates = [row[0] for row in csv_Reader]
Each row will be a list of fields, even if there is only one field.
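To see why the original attempt produced strings like "['05/06/2020']", and how the membership check mentioned in the question might work, here is a minimal sketch. It assumes the dates are in month/day/year order and uses an in-memory sample instead of holidays.csv; the header text and the helper name parse_date are made up for illustration.

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for holidays.csv (assumed layout: one header, one date per row).
sample = io.StringIO("Date\n05/06/2020\n1/6/2020\n5/22/2020\n")

def parse_date(text):
    """Hypothetical helper: parse a month/day/year string into a date."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()

reader = csv.reader(sample)
next(reader)                               # skip the header row
rows = list(reader)                        # each row is a list: ['05/06/2020']
as_strings = [str(row) for row in rows]    # what map(str, ...) produced
list_dates = [row[0] for row in rows]      # the flat list of strings

# Normalising to date objects makes '1/6/2020' and '01/06/2020' compare equal.
holidays = {parse_date(d) for d in list_dates}
is_holiday = parse_date("01/06/2020") in holidays
```

Comparing date objects rather than raw strings avoids mismatches caused by leading zeros, which seems to be the kind of formatting problem the question hints at.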

Related

csv module: ordered dictionary manipulation?

I have a csv with two fields, 'positive' and 'negative'. I am trying to add the positive words from the csv to a list using csv.DictReader. Here is the code.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        if n == 'positive' and csv_reader[n] != None :
            positive_list.append(csv_reader[n])
However the program returns an empty list. Any idea how to get around this issue? Or what am I doing wrong?
That's because you can only read from the csv_reader iterator once. In this case you do that with the print statement.
With a little re-arranging it should work fine:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        # put your print statement inside of the reader loop.
        # otherwise the reader will be exhausted by the time you run the logic.
        print(n)
        # as n is a dict, you want to grab the right value from that dict.
        # if it contains a value, then do something with it.
        if n['positive']:
            # Here you want to call the value from your dict.
            # Don't try to call the csv_reader - but use the given data.
            positive_list.append(n['positive'])
Every row in DictReader is a dictionary, so you can retrieve "columns values" using column name as "key" like this:
positive_column_values = []
for row in csv_dict_reader:
    positive_column_value = row["positive"]
    positive_column_values.append(positive_column_value)
After execution of this code, "positive_column_values" will have all values from "positive" column.
You can replace this code with your code to get desired result:
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for row in csv_reader:
        positive_list.append(row["positive"])
print(positive_list)
Here's a short way with a list comprehension. It assumes there is a header called header that holds (either) positive or negative values.
import csv
with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = [line for line in csv_reader if line.get('header') == 'positive']
print(positive_list)
Alternatively, if your CSV's header is positive:
positive_list = [line for line in csv_reader if line.get('positive')]
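As a quick way to check the behaviour described in these answers without the real file, here is a small sketch using an in-memory CSV. The sample words are invented; only the 'positive' and 'negative' headers come from the question.

```python
import csv
import io

# In-memory stand-in for pos_neg_cleaned.csv; the words are invented sample data.
sample = io.StringIO("positive,negative\ngood,bad\n,awful\ngreat,\n")

reader = csv.DictReader(sample)
# Each row is a dict keyed by header name, e.g. {'positive': 'good', 'negative': 'bad'}.
# Empty cells come back as '' (falsy), so the filter drops them.
positive_list = [row["positive"] for row in reader if row["positive"]]
```

Note that this collects the words themselves rather than the whole row dictionaries, which appears to be what the question was after.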

Store in array instead of printing

I am trying to create an array after I put a for loop through an if argument that is read from a csv file. In this code below, I print the results. Instead of printing the results, I would like to store them in an array. How do I do this?
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            print(line['tckr'],line['Rel Volume'])
Try it like this:
import csv
arr = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            # append() takes exactly one argument, so pass the pair as a tuple
            arr.append((line['tckr'], line['Rel Volume']))
If you want a nested list of lists (where each inner list contains two elements), do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.append([line['tckr'],line['Rel Volume']])
If you want a flat (one-dimensional) list, do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.extend([line['tckr'],line['Rel Volume']])
The only difference between the two examples is that one uses extend() and the other uses append(). Note that in each case, we pass a single list to whichever method we choose, by putting square brackets around line['tckr'],line['Rel Volume'].
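The difference between append() and extend() can be seen side by side in a short sketch. The rows here are invented sample data standing in for the lines of _Stocks.csv.

```python
# Invented sample data standing in for (tckr, Rel Volume) pairs read from the CSV.
rows = [("abcd", "3.2"), ("efgh", "1.0"), ("ijkl", "4.2")]

nested, flat = [], []
for tckr, rel_volume in rows:
    if float(rel_volume) > 2.5:
        nested.append([tckr, rel_volume])  # adds one element: the inner list
        flat.extend([tckr, rel_volume])    # adds each item of the inner list
```

After the loop, nested holds two two-element lists, while flat holds four strings in a single list.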
You can use list comprehension to do this! My example below will use a namedtuple, but you don't have to include that, it's not necessary.
import csv
from collections import namedtuple
CSVLine = namedtuple('CSVLine', ['tckr', 'Rel_Volume'])
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # DictReader yields strings, so convert to float before comparing or storing
    csvfilter = [CSVLine(line['tckr'], float(line['Rel Volume'])) for line in csv_reader if float(line['Rel Volume']) > 2.5]
Using the namedtuple, your array data will look something like this:
CSVLine(tckr='abcd', Rel_Volume=3.2), CSVLine(tckr='efgh', Rel_Volume=3.0), CSVLine(tckr='ijkl', Rel_Volume=4.2)
Without the namedtuple, it will simply look like this:
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # DictReader yields strings, so convert to float before comparing or storing
    csvfilter = [(line['tckr'], float(line['Rel Volume'])) for line in csv_reader if float(line['Rel Volume']) > 2.5]
I used a tuple in both examples, because I assumed you would want to pair the data from each line together, for later use.
Pandas module is great for munging:
"""
Created on Fri Mar 09 17:21:57 2018
@author: soyab
"""
import pandas as pd

# Load as a pandas DataFrame; select rows based on column logic
new_df = pd.read_csv('_Stocks.csv')
df_over_two_five = new_df.loc[new_df['Rel Volume'] > 2.5].copy()
print(df_over_two_five)
It's good to post a chunk of your data with a question like this. Then I can make sure to catch silly errors.

Python: How can I sum integers in a CSV file, while only summing the integers of a certain variable?

I'm trying to process some data in a CSV file using Python. I have a list of countries and results of the Eurovision Song Contest, and it looks like this:
Country,Points,Year
Belgium;181;2016
Netherlands;153;2016
Australia;511;2016
Belgium;217;2015
Australia;196;2015
Et cetera.
In summary, I want to sum the total of points that any country received throughout the years, so the output should look something like this:
'Belgium: 398','Netherlands: 153','Australia: 707' and so on.
This is what my code looks like:
import csv
with open('euro20042016.csv', 'r') as csvfile:
    pointsallyears = []
    countriesallyears = []
    readFILE = csv.reader(csvfile, delimiter=';')
    for row in readFILE:
        countriesallyears.append(row[0])
        pointsallyears.append(row[1])
csvfile.close()
results = []
for result in pointsallyears:
    result = int(result)
    results.append(result)
scorebord = zip(countriesallyears,results)
So I already made sure that the results / points are actual integers and I filtered out the third row (Year), but I have no idea how to proceed from here. Thanks a lot in advance!
This just puts @Mikk's comment into an actual answer: two lines, not counting the import.
import pandas as pd
df = pd.read_csv('euro20042016.csv', sep = ';')
print(df.groupby('Country')['Points'].sum())
The only extra thing you need to do is to change the first line of your file to be delimited by ; instead of ,.
I slightly changed your code to use a dictionary with country names as keys. The resulting dictionary d maps each country name to its total points.
import csv
d = dict()
with open('euro20042016.csv', 'r') as csvfile:
    readFILE = csv.reader(csvfile, delimiter=';')
    c_list = []
    for row in readFILE:
        if row[0] in c_list:
            d[row[0]] = d[row[0]] + int(row[1])
        else:
            c_list.append(row[0])
            d[row[0]] = int(row[1])
print(d)
I decided to play around a bit with your code, and this is what I came up with. Here, row[0] contains the country names, and row[1] contains the values we need. We check if the country already exists in the dictionary we use to maintain the aggregates, and if it doesn't we create it.
import csv
with open('euro20042016.csv', 'r') as csvfile:
    score_dict={}
    readFILE = csv.reader(csvfile, delimiter=';')
    for row in readFILE:
        # Only rows with 3 elements have the data we need
        if len(row) == 3:
            if row[0] in score_dict:
                score_dict[row[0]]+=int(row[1])
            else:
                score_dict[row[0]]=int(row[1])
print(score_dict)
What I get as output is this
{'Belgium': 398, 'Australia': 707, 'Netherlands': 153}
which I believe is what you were aiming for.
Let me know in the comments if you face a problem understanding anything.
I have a solution for that, but make sure your euro20042016.csv file looks the same as this:
Belgium;181;2016
Netherlands;153;2016
Australia;511;2016
Belgium;217;2015
Australia;196;2015
This code produces its output as a list, like this:
[('Belgium', 398), ('Australia', 707), ('Netherlands', 153)]
The code is here:
try:
    f = open('euro20042016.csv', 'r')
    s = f.read()
    # splitlines() avoids an empty trailing entry when the file ends with a newline
    lst = list(map(lambda x: x.split(';'), s.splitlines()))
    points, country = [], []
    for line in lst:
        points.append(int(line[1]))
        country.append(line[0])
    countrypoints = sorted(zip(country, points), key=lambda x: x[1])
    country = list(set(country))
    total = [0]*len(country)
    for rec in countrypoints:
        total[country.index(rec[0])] = total[country.index(
            rec[0])] + rec[1]
    f.close()
    finalTotal = list(zip(country, total))
    print(finalTotal)
except IOError as ex:
    print(ex)
except Exception as ex:
    print(ex)
I hope this will help you.
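For comparison, the same aggregation can be sketched with collections.defaultdict, which removes the "is the country already in the dictionary" check. This is only an illustration: it reads the question's sample rows from an in-memory string rather than euro20042016.csv, and it assumes there is no header line (or that any such line does not split into three fields).

```python
import csv
import io
from collections import defaultdict

# In-memory copy of the sample rows from the question (no header line).
sample = io.StringIO(
    "Belgium;181;2016\n"
    "Netherlands;153;2016\n"
    "Australia;511;2016\n"
    "Belgium;217;2015\n"
    "Australia;196;2015\n"
)

totals = defaultdict(int)          # missing countries start at 0
for row in csv.reader(sample, delimiter=";"):
    if len(row) != 3:              # skip blank or malformed lines
        continue
    country, points, _year = row
    totals[country] += int(points)

# Format the totals the way the question asked for them.
summary = [f"{country}: {points}" for country, points in totals.items()]
```

The summary list keeps the order in which each country first appears in the data.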

Read in the first column of a CSV in Python

I have a CSV (mylist.csv) with 2 columns that look similar to this:
jfj840398jgg item-2f
hd883hb2kjsd item-9k
jie9hgtrbu43 item-12
fjoi439jgnso item-3i
I need to read the first column into a variable so I just get:
jfj840398jgg
hd883hb2kjsd
jie9hgtrbu43
fjoi439jgnso
I tried the following, but it is only giving me the first letter of each column:
import csv
list2 = []
with open("mylist.csv") as f:
    for row in f:
        list2.append(row[0])
So the results of the above code are giving me list2 as:
['j', 'h', 'j', 'f']
You should split the row and then append the first item
list2 = []
with open("mylist.csv") as f:
    for row in f:
        list2.append(row.split()[0])
You could also use a list comprehension, which is a pretty standard way of creating lists:
with open("mylist.csv") as f:
    list2 = [row.split()[0] for row in f]
You can also use pandas here:
import pandas as pd
# the sample data is whitespace-separated and has no header row
df = pd.read_csv('mylist.csv', sep=r'\s+', header=None)
Then, getting the first column is as easy as:
matrix2 = df[df.columns[0]].to_numpy()  # as_matrix() was removed in pandas 1.0
list2 = matrix2.tolist()
This will return only the first column, as a list. You might want to consider leaving the data in NumPy if you're doing further operations on the result.
You can use the csv module:
import csv
with open("processes_infos.csv", "r", newline="") as file:
    reader = csv.reader(file, delimiter=",")
    for row in reader:
        print(row[0], row[1])
You can change the delimiter "," into " ".
You import csv, but then never use it to actually read the CSV. Then you open mylist.csv as a normal file, so when you write:
for row in f:
    list2.append(row[0])
What you're actually telling Python to do is "iterate through the lines, and append the first element of each line (which is its first character) to list2". What you need to do, if you want to use the csv module, is:
import csv
list2 = []
with open('mylist.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=' ')
    for row in csv_reader:
        list2.append(row[0])
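The difference between indexing a line and indexing a parsed row can be checked with a small sketch. It uses an in-memory stand-in for mylist.csv and assumes the two columns are separated by a single space, as in the sample shown in the question.

```python
import csv
import io

# In-memory stand-in for mylist.csv, assuming single spaces between columns.
sample = io.StringIO(
    "jfj840398jgg item-2f\n"
    "hd883hb2kjsd item-9k\n"
)

by_char = [line[0] for line in sample]      # indexing a str gives one character
sample.seek(0)                              # rewind to read the data again
by_field = [row[0] for row in csv.reader(sample, delimiter=" ")]
```

Here by_char reproduces the single-letter result from the question, while by_field holds the full first-column values.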
The simplest answer:
import pandas as pd
# the sample data is whitespace-separated and has no header row
df = pd.read_csv('mylist.csv', sep=r'\s+', header=None)
matrix2 = df[df.columns[0]].to_numpy()
list1 = matrix2.tolist()
print(list1)

How to read one single line of csv data in Python?

There are a lot of examples of reading csv data using python, like this one:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the csv file, use next() on the reader object.
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    # now do something here
    # if first row is the header, then you can do one more next() to get the next row:
    # row2 = next(reader)
Or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
You could get just the headings and the first data row like this:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
You can use Pandas library to read the first few lines from the huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can mention the number of lines to be read in the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader) # gets the first line
    for row in reader:
        print(row) # prints rows 2 and onward
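Since the question asks for one line's values to end up in separate variables, here is a minimal sketch of that step. The header names and numbers are invented, and it assumes the row has exactly as many integer fields as there are variables on the left-hand side.

```python
import csv
import io

# In-memory stand-in for some.csv; the header names and values are made up.
sample = io.StringIO("i,a1,b1,c1\n7,10,20,30\n8,11,21,31\n")

reader = csv.reader(sample, delimiter=",", quotechar='"')
header = next(reader)                    # ['i', 'a1', 'b1', 'c1']
i, a1, b1, c1 = map(int, next(reader))   # one row, one value per variable
```

If the number of fields and the number of variables differ, the unpacking raises a ValueError, which makes a mismatched row easy to notice.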
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
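A simple way to get any row in a csv file:
import csv
csvFileArray = []
# newline='' is the text mode the csv module expects in Python 3
with open('some.csv', newline='') as csvfile:
    # delimiter kept as in the original answer; use ',' for a comma-separated file
    for row in csv.reader(csvfile, delimiter = '.'):
        csvFileArray.append(row)
print(csvFileArray[0])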
To print a range of lines, in this case the rows at indices 4 to 6 (the 5th to 7th lines of the file):
import csv
with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
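For a large file, a variant that avoids building the full list first is itertools.islice. The sketch below is an illustration on invented in-memory data rather than on california_housing_test.csv.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for a larger file: ten one-column rows, "row0" to "row9".
sample = io.StringIO("".join(f"row{n}\n" for n in range(10)))

reader = csv.reader(sample)
# islice(reader, 4, 7) yields the rows at indices 4, 5 and 6 and stops reading there.
selected = list(islice(reader, 4, 7))
```

The slice bounds follow the same convention as list slicing, so the result matches list(data)[4:7] without holding every row in memory.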
I think the simplest way is the best way, and in this case (and in most others) that is one without external libraries (pandas) or modules (csv). So, here is the simple answer.
# no need to give any mode, keep it simple
with open('some.csv') as f:
    # store in a variable to be used later
    my_line = f.readline()
# do what you like with 'my_line' now
