How to check certain value of column and skip certain row

import ctypes
import csv
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)
    for row in reader:
        if(int(row[3])>=5):
            print(row)
            mymessage = 'A message'
            title = 'Popup window'
            ctypes.windll.user32.MessageBoxA(0, mymessage, title, 0)
        else:
            continue
I currently have the CSV file and code above. However, when I run print(row), the output looks like this:
['string1','00','00','21','00'..]
['string2','00','00','84','00'..]
['string3','00','00','21','00'..]
['string4','00','00','21','00'..]
.
.
.
['string7','00','00','21','00'..]
['string8','00','00','84','00'..]
['string9','00','00','15','00'..]
['string10','00','00','84','00'..]
[' ','precision','recall','f1-score','support'..]
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)
    for row in reader[1:-8]:
        print(row)
        if(int(row[3])>=5):
            mymessage = 'A message'
            title = 'Popup window'
            ctypes.windll.user32.MessageBoxA(0, mymessage, title, 0)
        else:
            continue
To ignore the red part of the picture, I thought I could skip the last eight lines with the code above, but there is a problem.
My two problems with this approach are:
1. I don't know the total number of lines, because the number of rows (string1 to stringN) changes every time I write the CSV file.
2. As in my attempt above, I would like to drop the unnecessary trailing lines while checking whether each value in a particular column is 5 or greater.

This gives you the number of lines in your CSV before you load it in Python:
lines = sum(1 for line in open('data.csv')) # total number of lines, counted without parsing the csv
Then you can load it without the last 8 rows (like the pandas example below):
import pandas as pd
# skipfooter is only supported by the Python parser engine
data = pd.read_csv('data.csv', delimiter=';', decimal='.', header=0, skipfooter=8, engine='python') # loads the csv without last 8 rows
Or you can keep your own code and stop before the last 8 rows.
Combined with your code, it should be:
import csv
import ctypes
from itertools import islice
lines = sum(1 for line in open('data.csv')) # total number of lines, counted without parsing the csv
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader) # skip the header
    # lines - 9 = all lines minus the header and the last 8 lines, which are never parsed
    for row in islice(reader, lines - 9):
        print(row)
        if(int(row[3])>=5):
            mymessage = 'A message'
            title = 'Popup window'
            ctypes.windll.user32.MessageBoxA(0, mymessage, title, 0)
        else:
            continue

I finally found the answer. We can make the csv reader read only a certain range of rows with itertools.islice.
from itertools import islice
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file)
    #next(reader) We don't need this anymore because we read only certain rows.
    for row in islice(reader, 0, 10): #reads the first 10 rows (indices 0 to 9)
        print(row)
        if(int(row[3])>=5):
            mymessage = 'A message'
            title = 'Popup window'
            ctypes.windll.user32.MessageBoxA(0, mymessage, title, 0)
        else:
            continue
If the total number of rows in your CSV is not fixed, you can count the lines first and use that count as the stop value:
lines = sum(1 for line in open('data.csv'))
for row in islice(reader, 0, lines - 8): # stops before the last 8 lines
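A variation that avoids counting the lines first, as a sketch only. The helper name skip_last and the inline sample data are hypothetical, and it assumes the unwanted trailing block always has a fixed number of rows. Rows are held back in a small buffer, so the final n are never yielded.

```python
import csv
import io
from collections import deque

def skip_last(rows, n):
    """Yield every item of `rows` except the final `n`, without knowing the length."""
    buffer = deque()
    for row in rows:
        buffer.append(row)
        if len(buffer) > n:
            # Only release a row once n newer rows are waiting behind it.
            yield buffer.popleft()

# Hypothetical sample: a header, three data rows, then two trailing summary rows.
sample = (
    "name,a,b,score\n"
    "string1,00,00,21\n"
    "string2,00,00,03\n"
    "string3,00,00,84\n"
    " ,precision,recall,f1-score\n"
    "summary,x,y,z\n"
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header
matches = [row for row in skip_last(reader, 2) if int(row[3]) >= 5]
print(matches)
```

With a real file, the io.StringIO object would be replaced by the file opened with open('data.csv', newline=''), and 2 by 8.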

Related

Print first 5 rows of large csv file (not using pandas)

I'm attempting to simplify Python code that prints the first five rows (plus the header) of a large CSV file, ideally with more condensed output. I would normally prefer pandas, but in this case I would like to use only import csv and import os (Mac user).
Code as follows:
import csv
filename = "/Users/xx/Desktop/xx.csv"
fields = []
rows = []
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    fields = next(csvreader)
    for row in csvreader:
        rows.append(row)
    print("Total no. of rows:%d"%(csvreader.line_num))
print('Field names are:' + ', '.join(field for field in fields))
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    for col in row:
        print("%10s"%col,end=" ")
    print('\n')
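One possible condensed version, as a sketch only: itertools.islice stops after five rows, so the rest of a large file is never read (which also means the total row count is not available). The helper name head and the in-memory sample standing in for the real file are assumptions.

```python
import csv
import io
from itertools import islice

def head(csvfile, n=5):
    """Return the header and the first n data rows of an open CSV file."""
    reader = csv.reader(csvfile)
    fields = next(reader)
    return fields, list(islice(reader, n))

# Hypothetical sample with 8 data rows; with a real file use
# `with open(filename, newline='') as csvfile:` instead.
sample = io.StringIO("id,name\n" + "".join(f"{i},row{i}\n" for i in range(1, 9)))

fields, rows = head(sample)
print("Field names are: " + ", ".join(fields))
for row in rows:
    print(" ".join(f"{col:>10}" for col in row))
```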

Row reading issue in csv containing html format data

I have an HTML file containing a table with around 3500 rows. I want to read and print the rows that share the same value. Please find attached an image of the HTML data.
I converted the data to CSV, where I can see the same data in HTML format.
As shown in the image, I want to print all the rows containing "MyData", write them to another CSV, and then mail it.
I tried using BeautifulSoup but was not able to get the result.
I tried using csv and pandas, but they do not return the expected output.
My Python code is as follows:
import csv
import numpy as np
import pandas as pd
import sys
csv.field_size_limit(sys.maxsize)
df = pd.read_csv('test.csv')
data = print (df.iloc[0:5])
Another code I tried
search_string = "MyData"
with open('test.csv') as f, open('test2.csv', 'w') as g:
    reader = csv.reader(f)
    next(reader, None) # discard the header
    writer = csv.writer(g)
    for row in reader:
        if row[2] == search_string:
            writer.writerow(row[:2])
            print(row)
When I enter the complete value from info_data, it gives me that particular row, but not the other rows where the string "MyData" is present.
Thanks !

You are currently testing the entry for an exact match with your search string. That entry contains a JSON string, so you could use the in operator to check whether it contains search_string rather than matching it exactly, for example:
search_string = "MyData"
with open('test.csv') as f, open('test2.csv', 'w') as g:
    reader = csv.reader(f)
    next(reader, None) # discard the header
    writer = csv.writer(g)
    for row in reader:
        if search_string in row[2]:
            writer.writerow(row[:2])
            print(row)
You would then want to add code to further decode your JSON data.
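A minimal sketch of that decoding step, assuming the third column holds valid JSON text. The column names and the keys source and value are made up for illustration, and the sample is built in memory instead of reading test.csv.

```python
import csv
import io
import json

# Build a small hypothetical CSV in memory; csv.writer handles quoting the JSON text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name", "info_data"])
writer.writerow(["1", "first", json.dumps({"source": "MyData", "value": 10})])
writer.writerow(["2", "second", json.dumps({"source": "Other", "value": 20})])
buffer.seek(0)

search_string = "MyData"
reader = csv.reader(buffer)
next(reader, None)  # discard the header
decoded = []
for row in reader:
    if search_string in row[2]:
        # json.loads turns the JSON text into Python dicts and lists.
        decoded.append(json.loads(row[2]))
print(decoded)
```

If some cells might not be valid JSON, wrapping json.loads in a try/except for json.JSONDecodeError would let the loop skip them.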

Python3.7 CSV file with multiples tables, how to get the middle table only

I have a CSV file with multiple headers and tables, created by our system. The number of rows is dynamic, but each table's title is always the same. There is a blank row between tables.
I'm using Python 3.7.3 and want to get the middle table (the Device table) and then upload it to our database.
How can I get only the middle table? Can regex work with a CSV file in this case?
Original file:
Report title:ABC
Created Date:Jul-15-2019

Model
Model Name,Number
abc,1
abc,2

Device
Device Name,Number
efg,1
efg,2
efg,3

Missing Device
Device Name,Number
xyz,3
xyz,4
The table I want to have(without table name):
Device Name,Number
efg,1
efg,2
efg,3

If you know that all tables are separated by a blank line, you could just count the blank lines and then parse the target table. Something like this:
import csv
table_ix = 2 # number of blank lines that precede the target table
with open('test.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    empty_line_count = 0
    for row in csv_reader:
        if len(row) == 0:
            empty_line_count += 1
        elif empty_line_count == table_ix:
            # do your parsing here; the first row is the table title
            print(row)
It's not beautiful, but it works. I would still suggest looking at tools like pandas.

Here is an approach:
1. Open the file for input
2. Skip all the lines until you reach the one that contains the header
3. From there, take all the lines that are not empty
4. Feed these lines into a CSV reader
Code
import csv
import itertools
with open('report.txt') as fh:
    fh = itertools.dropwhile(lambda line: 'Device Name,Number' not in line, fh)
    fh = itertools.takewhile(lambda line: line != '\n', fh)
    reader = csv.reader(fh)
    for row in reader:
        print(row)
Output
['Device Name', 'Number']
['efg', '1']
['efg', '2']
['efg', '3']
Notes
I used itertools.dropwhile to perform step 2
... and itertools.takewhile for step 3
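The same steps can also be written as a plain loop keyed on the table title instead of the header text. This is only a sketch: read_table is a hypothetical helper, and it assumes the title sits alone on its own line and that a blank line ends each table.

```python
import csv
import io

def read_table(lines, title):
    """Return the parsed rows (header included) of the table named `title`."""
    table_lines = []
    in_table = False
    for line in lines:
        stripped = line.strip()
        if not in_table:
            # Exact match, so "Missing Device" is not mistaken for "Device".
            in_table = stripped == title
        elif not stripped:
            break  # a blank line ends the table
        else:
            table_lines.append(stripped)
    return list(csv.reader(table_lines))

# In-memory copy of the report layout described in the question.
report = io.StringIO(
    "Report title:ABC\nCreated Date:Jul-15-2019\n\n"
    "Model\nModel Name,Number\nabc,1\nabc,2\n\n"
    "Device\nDevice Name,Number\nefg,1\nefg,2\nefg,3\n\n"
    "Missing Device\nDevice Name,Number\nxyz,3\nxyz,4\n"
)

device_rows = read_table(report, "Device")
for row in device_rows:
    print(row)
```

Matching on the title rather than on 'Device Name,Number' matters here because the Missing Device table uses the same header line.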

Counting using DictReader

Still new to Python, this is how far I've managed to get:
import csv
import sys
import os.path
#VARIABLES
reader = None
col_header = None
total_rows = None
rows = None
#METHODS
def read_csv(csv_file):
    #Read and display CSV file w/ HEADERS
    global reader, col_header, total_rows, rows
    #Open assign dictionaries to reader
    with open(csv_file, newline='') as csv_file:
        #restval = blank columns = - /// restkey = extra columns +
        reader = csv.DictReader(csv_file, fieldnames=None, restkey='+', restval='-', delimiter=',',
                                quotechar='"')
        try:
            col_header = reader.fieldnames
            print('The headers: ' + str(reader.fieldnames))
            for row in reader:
                print(row)
            #Calculate number of rows
            rows = list(reader)
            total_rows = len(rows)
        except csv.Error as e:
            sys.exit('file {}, line {}: {}'.format(csv_file, reader.line_num, e))
def calc_total_rows():
    print('\nTotal number of rows: ' + str(total_rows))
My issue is that when I attempt to count the number of rows, it comes up as 0 (impossible, because csv_file contains 4 rows and they print on screen).
If I place the '#Calculate number of rows' code above my print row loop, the count works, but then the rows don't print. It's as if each task is stealing the dictionary from the other. How do I solve this?

The problem is that the reader object behaves like a file: it is consumed as it iterates through the CSV. First you iterate through it in the for loop and print each row. Then you try to create a list from what's left, which is nothing, because you've already iterated through the whole file. The length of this empty list is 0.
Try this instead:
rows = list(reader)
for row in rows:
    print(row)
total_rows = len(rows)
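A small sketch of that behaviour, using a made-up two-row sample held in memory rather than the original file: the first pass consumes the reader, so a second pass finds nothing left.

```python
import csv
import io

# Hypothetical sample data: one header line and two data rows.
sample = "name,age\nann,30\nbob,25\n"

reader = csv.DictReader(io.StringIO(sample))
first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the reader is already exhausted
print(len(first_pass), len(second_pass))
```

Storing the rows in a list once, as above, is what lets you both print them and count them.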

How can I get a specific field of a csv file?

I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second passwords. For example, I want to get the password of the email in row 38, so I need only the item from the 2nd column of row 38.
Say I have a csv file:
aaaaa#aaa.com,bbbbb
ccccc#ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...

import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments on this question, better, more robust code would be:
import csv
with open(myfilepath, newline='') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
............
Update: If what the OP actually wants is the last string in the last row of the CSV file, there are several approaches that don't necessarily need csv. For example,
fulltxt = open(myfilepath).read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because you load the complete text into memory, but it could be OK for small files. Note that laststring could include a newline character, so strip it before use.
And finally, if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F.Sebastian. (The credit goes to him):
import csv
line_number = 2
with open(myfilepath, newline='') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
............

#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.
USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys
filename = sys.argv[1]
# row and column numbers are 1-based on the command line
row_number, column_number = [int(arg, 10)-1 for arg in sys.argv[2:]]
with open(filename, newline='') as f:
    rows = list(csv.reader(f))
    print(rows[row_number][column_number])
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file into memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, newline='') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number+1))
    print(row[column_number])

import csv
def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1
print (read_cell(4, 8))
This example prints cell 4, 8 in Python 3.

One point to note about the csv.reader object: it is not a list and is not subscriptable.
This works:
for r in csv.reader(file_obj): # file not closed
    print(r)
This does not:
r = csv.reader(file_obj)
print(r[0])
So, you first have to convert it to a list in order to make the above code work.
r = list( csv.reader(file_obj) )
print(r[0])
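A short sketch that shows both cases side by side, using a made-up two-row sample in memory instead of a real file_obj: indexing the reader raises TypeError, while indexing the list built from it works.

```python
import csv
import io

# Hypothetical two-row sample standing in for the file contents.
sample = "aaaaa,bbbbb\nccccc,ddddd\n"

reader = csv.reader(io.StringIO(sample))
try:
    reader[0]
    subscriptable = True
except TypeError:
    # csv.reader is an iterator, so indexing it is not supported.
    subscriptable = False

rows = list(csv.reader(io.StringIO(sample)))
print(subscriptable, rows[1][1])
```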

Finally I got it!
import csv
def select_index(index):
    with open('oscar_age_female.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        for line in csv_reader:
            l = line['Index']
            if l == index:
                print(line[' "Name"'])
select_index('11')
"Bette Davis"

The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
# where row_number is, for example, 38

import csv
inf = csv.reader(open('yourfile.csv','r'))
for row in inf:
    print(row[1])
