How to index rows in csv files and take columns as arguments? - python

Here is a sample of my csv file:
Date Open High Low Close
9/2/2021 34.05 40.34 33.03 36.7
9/3/2021 35.9 41.98 34.9 36.89
Here is a sample of my code:
import csv
def StockMarket():
    while True:
        command = input('$')
        if command.lower() == 'quit':
            break
        elif command.lower() == 'readfiles':
            mrna_data, pfe_data = ReadFiles('MRNA.csv.numbers', 'PFE.1.csv')
        #elif command.lower() == 'pricesondate'
def ReadFiles(MRNA, PFE):
    # the with block closes the file; csv.reader objects have no close() method
    with open(MRNA, 'r', newline='') as file:
        reader = csv.reader(file, delimiter='\t')
        next(reader)  # skip the header row
        mrna_data = []
        for row in reader:
            #index rows only need row 0 and 4th row
            mrna_data.append(row)
    with open(PFE, 'r', newline='') as file:
        reader = csv.reader(file, delimiter='\t')
        next(reader)  # skip the header row
        pfe_data = []
        for row in reader:
            pfe_data.append(row)
    return mrna_data, pfe_data
I want to keep only columns 0 and 4 (Date and Close), since they are the only ones I'm using. Then I would like to pass a date in "YYYY-MM-DD" format as an argument and get back the corresponding value from column 4 (this would be a separate function).
I've tried several methods from examples online, but none of them work. If someone could help, I would really appreciate it.
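One possible approach, as a minimal sketch: read only the Date and Close columns into a dictionary keyed by the date in "YYYY-MM-DD" form, then look the date up in a separate function. The names load_closes and close_on_date are hypothetical, and the sketch assumes a tab-delimited file with M/D/YYYY dates as in the sample above.

```python
import csv
import io
from datetime import datetime

def load_closes(lines, delimiter="\t"):
    """Map 'YYYY-MM-DD' -> closing price from an open file (or any iterable of lines)."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    closes = {}
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or short lines
        # Assumption: dates look like 9/2/2021 (month/day/year)
        date = datetime.strptime(row[0], "%m/%d/%Y").date().isoformat()
        closes[date] = float(row[4])  # column 4 is Close
    return closes

def close_on_date(closes, date_str):
    """Return the close for a 'YYYY-MM-DD' date, or None if the date is missing."""
    return closes.get(date_str)

# In-memory stand-in for the real file, using the two sample rows
sample = (
    "Date\tOpen\tHigh\tLow\tClose\n"
    "9/2/2021\t34.05\t40.34\t33.03\t36.7\n"
    "9/3/2021\t35.9\t41.98\t34.9\t36.89\n"
)
closes = load_closes(io.StringIO(sample))
print(close_on_date(closes, "2021-09-02"))  # 36.7
```

With a real file, the same function could be called as load_closes(open(path, newline='')) inside a with block.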

Related

How to print the 2nd column of a searched string in a CSV?

I am trying to search for a file name in a CSV (in column A). If it finds it, then I want to print only the second column (column B), not the whole row.
The CSV is like this:
File Name,ID
1234.bmp,1A
1111.bmp,2B
This is what I have so far, but it prints both the columns:
import os
import csv
f_name = os.listdir(r'C:\Users\Peter\Documents\Python test\Files')[0]
data = []
with open("test.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
col = [x[0] for x in data]
if f_name in col:
    for x in range(len(data)):
        if f_name == data[x][0]:
            action = print(data[x])
else:
    print("File not listed")
You were close. You only had a problem with the indexing (and the print statement).
After this part of the code:
data = []
with open("test.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
The data would now be a list of lists:
[
['File Name', 'ID'],
['1234.bmp', '1A'],
['1111.bmp', '2B']
]
In the part where you check the 1st column:
if f_name == data[x][0]:
    action = print(data[x])
You printed data[x] which would be one row. You need to index it further to access the 2nd column:
print(data[x]) # ['1234.bmp', '1A']
print(data[x][1]) # 1A
Furthermore, print returns None, so None would be saved into action:
>>> action = print("123")
123
>>> print(action)
None
You need to assign the value to action then print(action):
if f_name == data[x][0]:
    action = data[x][1]
    print(action) # 1A or 2B
You can also further improve the code by eliminating col. I understand that it's for checking whether f_name is in the 1st column ("File Name") of the CSV. Since you are already iterating over each row, you can check right there whether f_name is in row. If it is, store the index of that row in a variable (e.g. idx_fname_in_csv) so that you can access it directly from data later. This eliminates the extra variable col and avoids iterating over the data twice.
import os
import csv
f_name = os.listdir(r'C:\Users\Peter\Documents\Python test\Files')[0]
data = []
idx_fname_in_csv = -1 # invalid
with open("test.csv") as csvfile:
    reader = csv.reader(csvfile)
    for idx, row in enumerate(reader):
        data.append(row)
        if f_name in row:
            idx_fname_in_csv = idx
if idx_fname_in_csv > 0:
    action = data[idx_fname_in_csv][1]
    print(action)
else:
    print("File not listed")
Here data would still have the same contents (list of lists) but I used enumerate to keep track of the index.
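If many file names need to be looked up, a dictionary keyed on the first column may be simpler than tracking an index. The following is only a sketch of that alternative: build_id_lookup is a hypothetical helper, and it assumes the header names "File Name" and "ID" from the sample CSV.

```python
import csv
import io

def build_id_lookup(lines):
    """Map each 'File Name' value to its 'ID' value."""
    reader = csv.DictReader(lines)  # uses the first row as column names
    return {row["File Name"]: row["ID"] for row in reader}

# In-memory stand-in for test.csv
sample = "File Name,ID\n1234.bmp,1A\n1111.bmp,2B\n"
lookup = build_id_lookup(io.StringIO(sample))

print(lookup.get("1234.bmp", "File not listed"))  # 1A
print(lookup.get("9999.bmp", "File not listed"))  # File not listed
```

Each lookup is then a single dictionary access, and only the second column is ever returned.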

Split CSV by columns [duplicate]

I am trying to split a CSV file containing stock data of 1500+ companies. The first column contains dates and subsequent columns contain company data.
Goal 1: I'm trying to split the huge CSV file into smaller pieces. Let's say 30 companies per smaller file. To do this, I need to split the CSV by column number, not rows. I've been looking up code snippets but I haven't found anything that does this exactly. Also, each separate file would need to contain the first column, i.e. the dates.
Goal 2: I want to make the company name a column of its own, the date a column of its own and the indicators columns of their own. So, I can call the data for a company as a single record (row) in Django - I don't need all the dates, just the last day of every quarter. Right now, I'm having to filter the data by date and indicator and set that as an object to display in my frontend.
If you have questions, just ask.
EDIT:
Following is some code I patched together.
import os
import csv
from math import floor
from datetime import datetime
import re
class SimFinDataset:
    def __init__(self, dataFilePath, csvDelimiter = "semicolon"):
        self.numIndicators = None
        self.numCompanies = 1
        # load data
        self.loadData(dataFilePath, csvDelimiter)
    def loadData(self, filePath, delimiter):
        numRow = 0
        delimiterChar = ";" if delimiter == "semicolon" else ","
        csvfile = open(filePath, 'rb')
        reader = csv.reader(csvfile, delimiter=delimiterChar, quotechar='"')
        header = next(reader)
        row_count = sum(1 for _ in reader)
        csvfile.seek(0)
        for row in reader:
            numRow += 1
            if numRow > 1 and numRow != row_count and numRow != row_count-1:
                # company id row
                if numRow == 2:
                    rowLen = len(row)
                    idVal = None
                    for index, columnVal in enumerate(row):
                        if index > 0:
                            if idVal is not None and idVal != columnVal:
                                self.numCompanies += 1
                                if self.numIndicators is None:
                                    self.numIndicators = index - 1
                            if index + 1 == rowLen:
                                if self.numIndicators is None:
                                    self.numIndicators = index
                            idVal = columnVal
                if numRow > 2 and self.numIndicators is None:
                    return
                else:
                    filename = 1
                    with open(str(filename) + '.csv', 'wb') as csvfile:
                        if self.numCompanies % 30 == 0:
                            print ("im working")
                            spamwriter = csv.writer(csvfile, delimiter=';')
                            spamwriter.writerow(header)
                            spamwriter.writerow(row)
                            filename += 1
        #print (self.numIndicators)
dataset = SimFinDataset('new-data.csv','semicolon')
A solution for Goal 1 is in the linked question, "splitting CSV file by columns".
Alternatively, there is the pandas way:
import pandas as pd
# let's say first 10 columns
csv_path="mycsv.csv"
out_path ="\\...\\out.csv"
pd.read_csv(csv_path).iloc[:, :10].to_csv(out_path)
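You can also do something like the following (assuming your data has "date" and "company_name" columns):
mydf.set_index(["date", "company_name"]).unstack("company_name")
to make each company a column of its own.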
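For Goal 1 without pandas, the column split can be sketched with the csv module alone: keep the first (date) column in every piece and slice the remaining columns into groups. This is a rough sketch, not the linked answer's method. split_by_columns is a hypothetical helper, the sample data is made up, and it assumes every row has the same layout with the date in column 0.

```python
import csv
import io

def split_by_columns(rows, per_file=30):
    """Split rows into pieces of at most per_file data columns, each keeping column 0."""
    rows = list(rows)
    if not rows:
        return []
    width = max(len(row) for row in rows)
    pieces = []
    for start in range(1, width, per_file):
        # first column (dates) plus the next group of company columns
        pieces.append([[row[0]] + row[start:start + per_file] for row in rows])
    return pieces

# Tiny stand-in for the real file: 5 company columns, split 2 per piece
sample = "Date;A;B;C;D;E\n2018-01-01;1;2;3;4;5\n2018-01-02;6;7;8;9;10\n"
pieces = split_by_columns(csv.reader(io.StringIO(sample), delimiter=";"), per_file=2)

for number, piece in enumerate(pieces, start=1):
    print(number, piece[0])  # header row of each piece
```

Each piece could then be written to its own file with csv.writer(...).writerows(piece).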

Dump rows of a CSV file which contain a sequence of blank fields

I am trying to write a python program to clean survey data coming from a CSV file.
I would like to dump rows which contain a sequence of blank fields, like the first and the third line in the following example.
"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
Here is my unsuccessful code:
import csv
fname = raw_input("Enter input file name: ")
if len(fname) < 1:
    fname = "survey.csv"
foutput = raw_input("Enter output file name: ")
if len(foutput) < 1:
    foutput = "output_" + fname
input = open(fname, 'rb')
output = open(foutput, 'wb')
searchFor = 5*['']
writer = csv.writer(output)
for row in csv.reader(input):
    if searchFor not in row:
        writer.writerow(row)
input.close()
output.close()
Use a Counter to check whether one list is a subset of another, as below. If you want to remove the empty elements instead, pass None, bool or len to filter to discard the blanks:
import csv
from itertools import repeat
from collections import Counter
input = open(fname, 'rb')
output = open(foutput, 'wb')
writer = csv.writer(output)
#Helper function
def counterSubset(list1, list2):
    c1, c2 = Counter(list1), Counter(list2)
    for k, n in c1.items():
        if n > c2[k]:
            return False
    return True
for row in csv.reader(input):
    if not counterSubset(list(repeat('',5)),row):# i used 5 for five '' you can change it
        writer.writerow(row)#use filter(None,row) or filter(bool,row) or filter(len,row) to remove empty elements
input.close()
output.close()
Output for the sample above:
2,a,b,c,d,e,f,,h
4,a,z,u,d,i,f,x,h
5,d,c,c,,c,f,g,z
How about
# csv.reader returns empty fields as empty strings;
# change this if your blanks look different
blank_item = ""
for row in csv.reader(input):
    # collect all blank elements
    blanks = [x for x in row if x == blank_item]
    if len(blanks) < 5:
        writer.writerow(row)
This will count the number of blanks in a row and let you drop rows as desired.
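Both answers above count blanks anywhere in the row. If only a run of consecutive blank fields should disqualify a row, as the question's wording suggests, the longest run can be measured with itertools.groupby. This is a sketch under that reading. longest_blank_run and drop_rows_with_blank_run are hypothetical names, and the threshold of 5 is taken from the question's code.

```python
import csv
import io
from itertools import groupby

def longest_blank_run(row):
    """Length of the longest run of consecutive empty fields in row."""
    runs = [
        sum(1 for _ in group)
        for is_blank, group in groupby(row, key=lambda field: field == "")
        if is_blank
    ]
    return max(runs, default=0)

def drop_rows_with_blank_run(rows, min_run=5):
    """Keep only rows whose longest blank run is shorter than min_run."""
    return [row for row in rows if longest_blank_run(row) < min_run]

# The sample data from the question
sample = '''"1","a","b","c",,,,,
"2","a","b","c","d","e","f",,"h"
"3","a","b","c",,,,,
"4","a","z","u","d","i","f","x","h"
"5","d","c","c",,"c","f","g","z"
'''
kept = drop_rows_with_blank_run(csv.reader(io.StringIO(sample)))
for row in kept:
    print(row)
```

Rows 1 and 3 each end in five consecutive blanks and are dropped. Rows 2, 4 and 5 are kept.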

Read and Compare 2 CSV files on a row and column basis

I have two CSV files, data.csv and data2.csv.
I would like to first strip the two data files down to the data I am interested in. I have figured this part out for data.csv. I would then like to compare them row by row, making sure that if a row is missing, it gets added.
Next I want to look at column 2. If there is a value there, then I want to write to column 3; if there is data in column 3, then write to column 4, etc.
My current program looks like this. I need some guidance.
Oh, and I am using Python 3.4.
#!/usr/bin/python
__author__ = 'krisarmstrong'
import csv
searched = ['aircheck', 'linkrunner at', 'onetouch at']
def find_group(row):
    """Return the group index of a row
    0 if the row contains searched[0]
    1 if the row contains searched[1]
    etc
    -1 if not found
    """
    for col in row:
        col = col.lower()
        for j, s in enumerate(searched):
            if s in col:
                return j
    return -1
inFile = open('data.csv')
reader = csv.reader(inFile)
inFile2 = open('data2.csv')
reader2 = csv.reader(inFile2)
outFile = open('data3.csv', "w")
writer = csv.writer(outFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
header = next(reader)
header2 = next(reader2)
"""Build a list of items to sort. If row 12 contains 'LinkRunner AT' (group 1),
one stores a triple (1, 12, row)
When the triples are sorted later, all rows in group 0 will come first, then
all rows in group 1, etc.
"""
stored = []
writer.writerow([header[0], header[3]])
for i, row in enumerate(reader):
    g = find_group(row)
    if g >= 0:
        stored.append((g, i, row))
stored.sort()
for g, i, row in stored:
    writer.writerow([row[0], row[3]])
inFile.close()
inFile2.close()
outFile.close()
Perhaps try:
import csv
col1, col2 = [], []
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        col1.append(row[0])
        col2.append(row[1])
for i in range(len(col1)):
    if col1[i] == '':
        pass  # thing to do if there is nothing for col1
    if col2[i] == '':
        pass  # thing to do if there is nothing for col2
This is a start at "making sure that if a row is missing to add it".
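The two steps in the question could be sketched roughly as follows: add rows from the second file whose first column does not appear in the first file, and write a value into the first empty column from column 2 onward. Both helper names are hypothetical. Matching rows on the first column is an assumption, since the question does not say how rows are identified.

```python
def add_missing_rows(rows_a, rows_b):
    """Return rows_a plus every row of rows_b whose first column is not already present."""
    seen = {row[0] for row in rows_a if row}
    merged = [list(row) for row in rows_a]  # copy so the inputs stay untouched
    for row in rows_b:
        if row and row[0] not in seen:
            merged.append(list(row))
            seen.add(row[0])
    return merged

def write_to_first_empty(row, value, start=1):
    """Put value in the first empty field at index >= start, appending a column if none is empty."""
    for i in range(start, len(row)):
        if row[i] == "":
            row[i] = value
            return row
    row.append(value)
    return row

# Made-up rows standing in for the stripped-down data.csv and data2.csv
rows_a = [["aircheck", "x", ""], ["linkrunner at", "y", "z"]]
rows_b = [["aircheck", "x", ""], ["onetouch at", "q", ""]]

merged = add_missing_rows(rows_a, rows_b)
write_to_first_empty(merged[0], "new")  # column 2 is taken, so column 3 is filled
write_to_first_empty(merged[1], "new")  # columns 2 and 3 are taken, so a column is appended
for row in merged:
    print(row)
```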

Replace element in column with previous one in CSV file using python

3rd UPDATE: To describe the problem precisely:
================================================
This is my first post, so I'm not able to format it well. Sorry for this.
I have a CSV file called sample.csv. I need to add additional columns to this file, which I can do with the script below. What is missing from my script is the following:
If the present value in the column named "row" is different from the previous row's value, then update the column named "value" with the previous row's "row" value. If not, set the "value" column to zero.
Hope my question is clear. Thanks a lot for your support.
My script:
#!/usr/local/bin/python3
import csv, os, sys, time
inputfile='sample.csv'
with open(inputfile, 'r') as input, open('input.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')
    list1 = []
    header = next(reader)
    header.insert(1,'value')
    header.insert(2,'Id')
    list1.append(header)
    count = 0
    for column in reader:
        count += 1
        list1.append(column)
        myvalue = []
        myvalue.append(column[4])
        if count == 1:
            firstmyvalue = myvalue
        if count > 2 and myvalue != firstmyvalue:
            column.insert(0, myvalue[0])
        else:
            column.insert(0, 0)
        if column[0] != column[8]:
            del column[0]
            column.insert(0,0)
        else:
            del column[0]
            column.insert(0,myvalue[0])
        column.insert(1, count)
        column.insert(0, 1)
    writer.writerows(list1)
sample.csv:-
rate;sec;core;Ser;row;AC;PCI;RP;ne;net
244000;262399;7;5;323;29110;163;-90.38;2;244
244001;262527;6;5;323;29110;163;-89.19;2;244
244002;262531;6;5;323;29110;163;-90.69;2;244
244003;262571;6;5;325;29110;163;-88.75;2;244
244004;262665;7;5;320;29110;163;-90.31;2;244
244005;262686;7;5;326;29110;163;-91.69;2;244
244006;262718;7;5;323;29110;163;-89.5;2;244
244007;262753;7;5;324;29110;163;-90.25;2;244
244008;277482;5;5;325;29110;203;-87.13;2;244
My expected output:-
rate;value;Id;sec;core;Ser;row;AC;PCI;RP;ne;net
1;0;1;244000;262399;7;5;323;29110;163;-90.38;2;244
1;0;2;244001;262527;6;5;323;29110;163;-89.19;2;244
1;0;3;244002;262531;6;5;323;29110;163;-90.69;2;244
1;323;4;244003;262571;6;5;325;29110;163;-88.75;2;244
1;325;5;244004;262665;7;5;320;29110;163;-90.31;2;244
1;320;6;244005;262686;7;5;326;29110;163;-91.69;2;244
1;326;7;244006;262718;7;5;323;29110;163;-89.5;2;244
1;323;8;244007;262753;7;5;324;29110;163;-90.25;2;244
1;324;9;244008;277482;5;5;325;29110;203;-87.13;2;244
This will do the part you were asking for in a generic way. However, your output clearly has more changes in it than the question asks for. I added the Id column just to show how you can order the column output too:
import pandas as pd
df = pd.read_csv('sample.csv', sep=";")
df.loc[:, 'value'] = None
df.loc[:, 'Id'] = df.index + 1
prev = None
for i, row in df.iterrows():
    if prev is not None:
        if row['row'] == prev['row']:
            # df.loc avoids chained assignment (df.value[i] = ...)
            df.loc[i, 'value'] = prev['value']
        else:
            df.loc[i, 'value'] = prev['row']
    prev = row
# to_csv takes the column order via columns= (older pandas called it cols=)
df.to_csv('output.csv', index=False, columns=['rate','value','Id','sec','core','Ser','row','AC','PCI','RP','ne','net'], sep=';')
import csv
previous = []
# csv.reader needs an open file object, not a bare file name
with open('test.csv', newline='') as f:
    for i, entry in enumerate(csv.reader(f)):
        if not i: # do this on first entry only
            previous = entry # initialize here
            print(entry)
        else: # other entries
            if entry[2] != previous[2]: # check whether this entry's row differs from the previous entry's row
                entry[1] = previous[2] # store the previous entry's row value in this entry's value
            previous = entry
            print(entry)
import csv
with open('test.csv') as f, open('output.csv','w') as o:
    out = csv.writer(o, delimiter='\t')
    out.writerow(["id", 'value', 'row'])
    reader = csv.DictReader(f, delimiter="\t") #Assuming file is tab delimited
    prev_row = '100'
    for line in reader:
        if prev_row != line["row"]:
            prev_row = line["row"]
            out.writerow([line["id"],prev_row,line["row"]])
        else:
            out.writerow(line.values())
content of output.csv:
id value row
1 0 100
2 0 100
3 110 110
4 140 140
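For comparison, the rule stated in the question (put the previous row's "row" value in "value" when it differs from the current one, otherwise 0) can be sketched with the csv module alone. This is not taken from any of the answers above. with_previous_value is a hypothetical helper. It assumes "row" is the fifth field of sample.csv, and it follows the expected output in prefixing each data row with a constant 1 and a running Id. Header handling is left out.

```python
import csv
import io

def with_previous_value(rows, key_index=4):
    """Prefix each row with [1, value, Id], where value is the previous row's
    key field if it differs from the current one, else 0."""
    result = []
    prev = None
    for number, row in enumerate(rows, start=1):
        current = row[key_index]
        value = prev if prev is not None and current != prev else 0
        result.append([1, value, number] + list(row))
        prev = current
    return result

# First five data rows of sample.csv from the question
sample = """rate;sec;core;Ser;row;AC;PCI;RP;ne;net
244000;262399;7;5;323;29110;163;-90.38;2;244
244001;262527;6;5;323;29110;163;-89.19;2;244
244002;262531;6;5;323;29110;163;-90.69;2;244
244003;262571;6;5;325;29110;163;-88.75;2;244
244004;262665;7;5;320;29110;163;-90.31;2;244
"""
reader = csv.reader(io.StringIO(sample), delimiter=";")
next(reader)  # skip the header row
result = with_previous_value(reader)
for row in result:
    print(";".join(str(field) for field in row))
```

The fourth printed line starts with 1;323;4, matching the expected output in the question.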
