Deleting specific lines in csv files

The CSV file looks like this (with a thousand more lines):
step0:
141
step1:
140
step2:
4
step3:
139
step4:
137
step5:
136
15
step6:
134
13
139
step7:
133
19
I am trying to read each line and remove the lines that contain only a number greater than, say, 27.
The file was originally plain text, so every line is read as a string.
What I have done so far:
First, loop through the lines that do not contain "step".
Then, convert them to float.
Finally, remove all that are greater than 27.
Now I want to save (overwrite) my file after deleting these numbers, but I am stuck.
Could someone assist?
import csv

with open('list.csv', 'r') as f:
    reader = csv.reader(f)  # each line holds a single field
    for row in reader:
        for e in row:
            if 'step' not in e:
                d = float(e)
                if d > 27:
                    del d  # only deletes the local variable d, not the line in the file

import csv

with open('output.csv', 'w') as output_file:
    with open('input.csv', newline='') as input_file:  # change your file name here
        reader = csv.reader(input_file)
        line_index = 0  # debugging
        for row in reader:
            line_index += 1
            if not row:
                continue  # skip blank lines
            line = row[0]
            if 'step' in line:
                output_file.write(line)
                output_file.write('\n')
            else:
                try:
                    number = int(line)  # float() also works if the file contains decimals
                    if number <= 27:
                        output_file.write(line)
                        output_file.write('\n')
                except ValueError:
                    print("Abnormal data at line %s" % line_index)
I assume your input file is input.csv. The program writes to a new file, output.csv, whose content is:
step0:
step1:
step2:
4
step3:
step4:
step5:
15
step6:
13
step7:
19
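The filtering rule used above (keep every "step" line, keep numbers of 27 or less) can also be pulled out into a small function, which makes it easy to test on a few sample lines before touching the real file. This is a minimal sketch assuming every non-"step" line holds a single number; the name keep_lines and the threshold parameter are hypothetical.

```python
def keep_lines(lines, threshold=27):
    """Return the lines to keep: every "step" label, plus numbers <= threshold.

    `lines` is any iterable of strings (for example an open file);
    surrounding whitespace, including the newline, is stripped first.
    """
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if 'step' in line:
            kept.append(line)  # always keep the step labels
        elif float(line) <= threshold:
            kept.append(line)  # keep numbers that are not greater than threshold
    return kept


sample = ["step0:\n", "141\n", "step2:\n", "4\n", "step5:\n", "136\n", "15\n"]
print(keep_lines(sample))  # ['step0:', 'step2:', '4', 'step5:', '15']
```

With a real file you can pass the open file object and write the result with '\n'.join(kept) + '\n'.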

One solution with re module:
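import re

with open('file.txt', 'r') as f_in:
    data = f_in.read()

# Drop every standalone number greater than 27 together with its trailing newline(s);
# \b keeps digits inside words such as "step0" from matching
data = re.sub(r'\b(\d+)\n*', lambda g: '' if int(g.group(1)) > 27 else g.group(), data)

with open('file_out.txt', 'w') as f_out:
    f_out.write(data)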
The content of file_out.txt will be:
step0:
step1:
step2:
4
step3:
step4:
step5:
15
step6:
13
step7:
19
26

import csv

with open('list.csv', 'r', newline='') as src:
    with open('new_list.csv', 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator='\n')
        for row in reader:
            if not row:
                continue  # skip blank lines
            e = row[0]
            if 'step' not in e:
                if float(e) <= 27:
                    writer.writerow([e])
            else:
                writer.writerow([e])  # always keep "step" lines
Essentially, you copy only the rows you want into the new file: if the line is a "step" label, write it; if it is a number of 27 or less, write it. If you'd prefer to just overwrite your file when you're done, collect the rows first and write them back once the input file is closed:
import csv

rows_to_keep = []
with open('list.csv', 'r', newline='') as src:
    reader = csv.reader(src)
    for row in reader:
        if not row:
            continue  # skip blank lines
        e = row[0]
        if 'step' not in e:
            if float(e) <= 27:
                rows_to_keep.append([e])
        else:
            rows_to_keep.append([e])

# the input file is closed now, so it is safe to overwrite it
with open('list.csv', 'w', newline='') as new_list:
    writer = csv.writer(new_list, lineterminator='\n')
    writer.writerows(rows_to_keep)
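If the file is large, collecting everything in rows_to_keep first may not be desirable. A common alternative, sketched here as an assumption rather than as part of the answer above, is to stream the kept lines into a temporary file in the same directory and then swap it in with os.replace(), so the original is only replaced once the filtered copy is complete. The function name filter_file_in_place is hypothetical.

```python
import os
import tempfile


def filter_file_in_place(path, threshold=27):
    """Overwrite `path`, keeping "step" lines and numbers <= threshold.

    The kept lines are written to a temporary file next to `path`;
    os.replace() then swaps it in for the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as dst, open(path, 'r') as src:
            for raw in src:
                line = raw.strip()
                if not line:
                    continue  # drop blank lines
                if 'step' in line or float(line) <= threshold:
                    dst.write(line + '\n')
        os.replace(tmp_path, path)  # replace the original with the filtered copy
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on any failure
        raise
```

Usage: filter_file_in_place('list.csv'). If a non-numeric line makes float() fail, the original file is left untouched.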

Related

How to add random values to a column of a CSV file?

I want to append a column to a prefilled CSV file with 3 million rows using Python, and then fill that column with random values in the range (1, 50), something like this:
Input CSV file:
awareness trip amount
25 1 30
30 2 35
Output CSV file:
awareness trip amount size
25 1 30 49
30 2 35 20
How can I do this?
The code I have written is as follows:
import csv
import numpy as np

with open('2019-01-1.csv', 'r') as CSVIN:
    with open('2019-01-2.csv', 'w') as CSVOUT:
        CSVWrite = csv.writer(CSVOUT, lineterminator='\n')
        CSVRead = csv.reader(CSVIN)
        NewDict = []
        row = next(CSVRead)
        row.append('Size')
        NewDict.append(row)
        print(NewDict.append(row))  # appends the header a second time and prints None
        for row in CSVRead:
            randSize = np.random.randint(1, 50)  # note: the upper bound 50 is excluded
            row.append(row[0])  # appends the first column, not randSize
            NewDict.append(row)
        CSVWrite.writerows(NewDict)

Check out this answer: Python Add string to each line in a file
I find it much easier to work with plain file objects (with open(...)) than to import csv or other file-format libraries, unless my use case is very specific.
So in your case, it would be something like:
import random

input_file_name = "2019-01-1.csv"
output_file_name = "2019-01-2.csv"

with open(input_file_name, 'r') as f:
    lines = f.read().splitlines()  # drops the trailing newline of each line

# The header gets the new column name; every data line gets a random value (1 to 50 inclusive)
file_lines = [lines[0] + ",Size\n"] + [
    "{},{}\n".format(line, random.randint(1, 50)) for line in lines[1:]
]

with open(output_file_name, 'w') as f:
    f.writelines(file_lines)
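For a file with 3 million rows, a sketch that streams row by row with the csv module, instead of building the whole output in memory, could look like this. It assumes the file is comma-separated with a header row; the function name add_random_column and its parameters are hypothetical. Note that random.randint(1, 50) includes 50, whereas np.random.randint(1, 50) in the question excludes it.

```python
import csv
import random


def add_random_column(in_path, out_path, name='Size', low=1, high=50):
    """Copy in_path to out_path, appending a column of random integers.

    Rows are processed one at a time, so memory use does not grow with
    the size of the file. random.randint(low, high) includes both ends.
    """
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator='\n')
        header = next(reader)
        writer.writerow(header + [name])  # new column name in the header
        for row in reader:
            writer.writerow(row + [random.randint(low, high)])
```

Usage: add_random_column('2019-01-1.csv', '2019-01-2.csv').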

How to extract and copy lines from a CSV file to another CSV file in Python?

Suppose I have a large CSV file. This is a set of lines from it:
frame.number frame.len frame.cap_len frame.Type
1 100 100 ICMP_tt
2 64 64 UDP
3 100 100 ICMP_tt
4 87 64 ICMP_nn
I want to extract 30% of this file and put it in another CSV file.
I tried the following code, but it selects columns from each row instead of selecting whole lines:
import csv

data = []  # buffer list
with open("E:\\Test.csv", "r", newline='') as the_file:
    reader = csv.reader(the_file, delimiter=",")
    for line in reader:
        try:
            new_line = [line[0], line[1]]
            # basically, write the rows to a list
            data.append(new_line)
        except IndexError as e:
            print(e)

with open("E:\\NewTest.csv", "w", newline='') as to_file:
    writer = csv.writer(to_file, delimiter=",")
    for new_line in data:
        writer.writerow(new_line)

The Python csv module uses , as its default delimiter, so you don't need to specify it unless your file uses a different one. I assume you have the following CSV file:
frame.number,frame.len,frame.cap_len,frame.Type
1,100,100,ICMP_tt
2,64,64,UDP
3,100,100,ICMP_tt
4,87,64,ICMP_nn
Each line in the file represents a row.
import csv

# read
data = []
with open('test.csv', 'r', newline='') as f:
    f_csv = csv.reader(f)
    # header = next(f_csv)
    for row in f_csv:
        data.append(row)

# write the first 30% of the rows
with open('newtest.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for i in range(int(len(data) * 30 / 100)):
        writer.writerow(data[i])
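The loop above writes the first 30% of the lines (the header counts as one of them, since it is not skipped). If you want a random 30% of the data rows instead, one possible sketch is the following. It assumes the file fits in memory and has a header row; the function name sample_rows and its parameters are hypothetical.

```python
import csv
import random


def sample_rows(in_path, out_path, fraction=0.3, seed=None):
    """Write the header plus a random `fraction` of the data rows.

    The sampled rows keep their original order; `seed` makes the
    selection reproducible.
    """
    with open(in_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    k = int(len(rows) * fraction)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(rows)), k))  # indices of the kept rows

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(header)
        writer.writerows(rows[i] for i in chosen)
```

Usage: sample_rows('test.csv', 'newtest.csv', seed=42).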

Remove extra rows in a file in Python

I have a text file with 8 columns: the first is ID and the 8th is type. Each ID appears on many rows with a different type on each, and exactly one of those rows per ID has type H.
ID type
E0 B
E0 H
E0 S
B4 B
B4 H
I want to make another file with only one row per ID (the row that has H in the 8th column). For this example, the output would be:
ID type
E0 H
B4 H

Here is inspectorG4dget's solution, updated for Python 2.7.3.
It considers only two columns in the input CSV file, ID and type, separated by \t.
Code:
import csv

with open('/home/vivek/Desktop/input.csv', 'rb') as infile, open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    reader_row = next(reader)
    writer.writerow([reader_row[0], reader_row[1]])  # copy the header
    for row in reader:
        if row[1] == "H":
            writer.writerow(row)
Output:
ID type
E0 H
B4 H
For Python 2.6.6, which does not allow several context managers in one with statement, try the following. I have not tested it on 2.6.6 because I have Python 2.7.3 on my machine.
import csv

with open('/home/vivek/Desktop/input.csv', 'rb') as infile:
    with open('/home/vivek/Desktop/output.csv', 'wb') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        writer = csv.writer(outfile, delimiter='\t')
        reader_row = next(reader)
        writer.writerow([reader_row[0], reader_row[1]])  # copy the header
        for row in reader:
            if row[1] == "H":
                writer.writerow(row)
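On Python 3, the csv module expects files opened in text mode with newline='' rather than 'rb'/'wb'. A sketch along the same lines that checks the 8th column of the full file directly might look like this; the tab delimiter, the function name keep_h_rows, and the type_index parameter are assumptions.

```python
import csv


def keep_h_rows(in_path, out_path, type_index=7):
    """Copy the header and every row whose type column equals 'H'.

    Assumes a tab-separated file in which the type is column
    `type_index` (0-based, so 7 is the 8th column).
    """
    with open(in_path, newline='') as infile, open(out_path, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        writer.writerow(next(reader))  # header row
        for row in reader:
            if len(row) > type_index and row[type_index] == 'H':
                writer.writerow(row)
```

For the two-column sample in the question, call it with type_index=1.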

Assuming your file is simply a text file with spaces/tabs delimiting columns, and the column containing 'type' is right at the end of the row:
with open('input.txt', 'r') as input_file:
    input_lines = input_file.readlines()

# Take the header line, and all the subsequent lines whose last character before the newline is 'H'
output_lines = input_lines[:1] + [line for line in input_lines if line[-2] == 'H']
output_string = ''.join(output_lines)

with open('output.txt', 'w') as output_file:
    output_file.write(output_string)
The above code assumes that every line ends with a newline and that the single-character type code comes immediately before it. If there can be whitespace after the data, or if you can have multi-character type codes such as 'AH', replace the line below the comment with:
output_lines = input_lines[:1] + [line for line in input_lines if line.split()[-1] == 'H']
Edit: If your file turns out to be huge and you don't want to load it all into memory, you can use a generator expression, which is evaluated lazily. Both files must stay open while writing, because the generator reads from the input file as it goes:
with open('input.txt', 'r') as input_file, open('output.txt', 'w') as output_file:
    # Keep the header (i == 0) and every line whose type code is 'H'
    output_lines = (line for i, line in enumerate(input_file)
                    if i == 0 or line[-2] == 'H')
    for line in output_lines:
        output_file.write(line)

Search column, return row from Excel using Python

I have a CSV file whose column A contains dates. I want to search the dates and return the corresponding row in array format (this is very important). How would I do that using Python? Thank you.
Excel file:
A B C
1 12202014 403 302
2 12212014 312 674
3 12222014 193 310
input:
Search csv file for 12212014
output:
[12212014,312,674]
My attempt:
import csv

date = '12212014'
with open('dates.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if date == row[0]:
            print("found the date")
How do I return the row instead of just a print statement?

The basic idea is to create a dictionary/mapping date->line and get the value by the key:
import csv

data = {}
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for line in reader:
        data[line[0]] = line  # map date -> whole row

date_to_find = '12212014'
print(data.get(date_to_find, 'Date not found'))
prints:
['12212014', '312', '674']
This approach helps if you need the mapping afterwards to look up more dates.
Another approach is to use a generator expression to iterate over the lines in the file until we find the date:
import csv

date_to_find = '12212014'
with open('test.csv', 'r') as f:
    print(next((line for line in csv.reader(f) if line[0] == date_to_find),
               'Date not found'))
prints:
['12212014', '312', '674']
Hope that helps.
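To address the "return instead of print" part directly, the search can be wrapped in a function that returns the matching row as a list. This sketch assumes a comma-separated file with the date in the first column; find_row is a hypothetical name.

```python
import csv


def find_row(path, date):
    """Return the first row whose first column equals `date`, or None."""
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if row and row[0] == date:
                return row  # a list of strings, e.g. ['12212014', '312', '674']
    return None
```

If you need numbers rather than strings, convert the items afterwards, for example [int(x) for x in row].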

CSV calculation with Python. Get data from list

Input txt:
May 2014, 156
May 2013, 556
May 2013, 651
I write this data to an input.csv file:
import csv

with open("text_file.txt") as inputFile:
    for line in inputFile:
        # note: this loop reads the first line itself, so the comprehension
        # below only sees the remaining lines and vars misses the first one
        vars = [line.strip().split(",") for line in inputFile]
print(vars)

# set True if you want to export input as csv file
convert_to_csv = True
if convert_to_csv == True:
    with open('input.csv', 'w') as fp:
        a = csv.writer(fp, delimiter=',')
        data = [['Text', 'Count']] + vars
        a.writerows(data)
How can I take the values from vars and add 10 to each 'Count' value (for example, 156 + 10)?
print vars output:
[['May 2013', 156], ['May 2013', 556]]
The new count should be Count + 10, and I also want to write the result to a CSV file (output.csv) with the header ['Text', 'Count after calc'].

Refactoring a bit:
#!/usr/bin/python
import csv

with open('text_file.txt') as input_file:
    csv_data = []
    for row in csv.reader(input_file, delimiter=','):
        csv_data.append([row[0], int(row[1].strip()) + 10])

convert_to_csv = True
if convert_to_csv:
    with open('output.csv', 'w', newline='') as output_file:
        csv_file = csv.writer(output_file, delimiter=',')
        csv_file.writerow(['Text', 'Count after calc'])
        for row in csv_data:
            csv_file.writerow(row)
It works well for what you've described. Good luck! :)

What you need is to parse the string into an integer, and add 10 to it.
vars = [line.strip().split(",") for line in inputFile]
vars = [[v[0],int(v[1])+10] for v in vars]
#             ^^^^^^^^^^^^
print(vars)
Note that this is not complete code; you will need a few more checks. For example, what if the second item is not an integer? And what if splitting on , does not produce exactly two items?
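Putting those checks together, a sketch that also writes output.csv with the requested header might look like this. The function name add_to_counts, the amount parameter, and the choice to print and skip bad lines are assumptions.

```python
import csv


def add_to_counts(in_path, out_path, amount=10):
    """Read "Text, Count" lines, add `amount` to Count, and write a CSV.

    Blank lines are skipped silently; lines that do not split into
    exactly two items, or whose second item is not an integer, are
    reported and skipped.
    """
    rows = []
    with open(in_path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            parts = [p.strip() for p in line.split(',')]
            if len(parts) != 2:
                print('Skipping line %d: expected 2 items, got %d' % (lineno, len(parts)))
                continue
            try:
                count = int(parts[1])
            except ValueError:
                print('Skipping line %d: %r is not an integer' % (lineno, parts[1]))
                continue
            rows.append([parts[0], count + amount])

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(['Text', 'Count after calc'])
        writer.writerows(rows)
    return rows
```

Usage: add_to_counts('text_file.txt', 'output.csv').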
