I am trying to read a file with below data
Et1, Arista2, Ethernet1
Et2, Arista2, Ethernet2
Ma1, Arista2, Management1
I need to read the file replace Et with Ethernet and Ma with Management. At the end of them the digit should be the same. The actual output should be as follows
Ethernet1, Arista2, Ethernet1
Ethernet2, Arista2, Ethernet2
Management1, Arista2, Management1
I tried a code with Regular expressions, I am able to get to the point I can parse all Et1, Et2 and Ma1. But unable to replace them.
import re
with open('test.txt','r') as fin:
for line in fin:
data = re.findall(r'\A[A-Z][a-z]\Z\d[0-9]*', line)
print(data)
The output looks like this..
['Et1']
['Et2']
['Ma1']
import re
#to avoid compile in each iteration
re_et = re.compile(r'^Et(\d+),')
re_ma = re.compile(r'^Ma(\d+),')
with open('test.txt') as fin:
for line in fin:
data = re_et.sub('Ethernet\g<1>,', line.strip())
data = re_ma.sub('Management\g<1>,', data)
print(data)
This example follows Joseph Farah's suggestion
import csv
file_name = 'data.csv'
output_file_name = "corrected_data.csv"
data = []
with open(file_name, "rb") as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for row in reader:
data.append(row)
corrected_data = []
for row in data:
tmp_row = []
for col in row:
if 'Et' in col and not "Ethernet" in col:
col = col.replace("Et", "Ethernet")
elif 'Ma' in col and not "Management" in col:
col = col.replace("Ma", "Management")
tmp_row.append(col)
corrected_data.append(tmp_row)
with open(output_file_name, "wb") as csvfile:
writer = csv.writer(csvfile, delimiter=',')
for row in corrected_data:
writer.writerow(row)
print data
Here are the steps you should take:
Read each line in the file
Separate each line into smaller list items using the comments as delimiters
Use str.replace() to replace the characters with the words you want; keep in mind that anything that says "Et" (including the beginning of the word "ethernet") will be replaced, so remember to account for that. Same goes for Ma and Management.
Roll it back into one big list and put it back in the file with file.write(). You may have to overwrite the original file.
Related
I have a csv file with the following structure:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
I need him to stay like this:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
I received this .csv file from someone else, so I do not know how the conversion was done. I am trying unsuccessfully with the code below:
input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')
for line in input_fd.readlines():
line.replace("\"","")
output_fd.write(line)
input_fd.close()
output_fd.close()
Is it possible to make this change or will I have to do the conversion from an xml file to a csv, and make this change during the conversion?
First: tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This will properly split your second line which is a string literal containing your delimiter, which you desire to be split. We'll tweak that data to remove the quotation marks (otherwise our output will be quoted strings like '"txNomeParlamentar"', etc).
import csv
with open('file.txt') as f:
reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
data = [list(map(lambda s: s.replace('"', ''), row)) for row in reader]
Then: we write the file back out, with the delimiter=";", and quoting=csv.QUOTE_ALL to ensure each item is set in quotes
with open('out.txt', 'w', newline='') as o:
writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
writer.writerows(data)
Input:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
Output:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
A couple things. First, you do NOT have a csv file because in a csv file, the delimiter is a comma by definition. I'm assuming you want the values in your data file to (1) remain separated by semicolons [why not fix it and make it commas?] and (2) you want each value to be in quotation marks.
If so, I think this will work:
# data reader
in_file = 'data.txt'
out_file = 'fixed.txt'
output = open(out_file, 'w')
with open(in_file, 'r') as source:
for line in source:
# split by semicolon
data = line.strip().split(';')
# remove all quotes found
data = [t.replace('"','') for t in data]
for item in data[:-1]:
output.write(''.join(['"', item, '"',';']))
# write the last item separately, without the trailing ';'
output.write(''.join(['"', item, '"']))
output.write('\n')
output.close()
If your target user is python, you should consider replacing the semicolons with commas (correct csv format) and forgoing the quotes. Everything python reads from csv is taken in as string anyhow.
Using csv module.
Ex:
import csv
with open(filename) as csvfile:
reader = csv.reader(csvfile, delimiter=";")
headers = next(reader) #Read Headers
data = [row.strip('"').split(";") for row in csvfile] #Format data
with open(filename, "w") as csvfile_out:
writer = csv.writer(csvfile_out, delimiter=";")
writer.writerow(headers) #Write Headers
writer.writerows(data) #Write data
You could use the csv module to do it if you massage the input data a little first.
import csv
#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'
with open(input_csv, 'r', newline='') as input_fd, \
open(output_csv, 'w', newline='') as output_fd:
reader = csv.DictReader(input_fd, delimiter=';')
writer = csv.DictWriter(output_fd, delimiter=';',
fieldnames=reader.fieldnames,
quoting=csv.QUOTE_ALL)
first_field = reader.fieldnames[0]
for row in reader:
fields = row[first_field].split(';')
newrow = dict(zip(reader.fieldnames, fields))
writer.writerow(newrow)
print('done')
I would like to read each row of the csv file and match each word in the row with a list of strings. If any of the strings appears in the row, then write that string at the end of the line separated by comma.
The code below doesn't give me what I want.
file = 'test.csv'
read_files = open(file)
lines=read_files.read()
text_lines = lines.split("\n")
name=''
with open('testnew2.csv','a') as f:
for line in text_lines:
line=str(line)
#words = line.split()
with open('names.csv', 'r') as fd:
reader = csv.reader(fd, delimiter=',')
for row in reader:
if row[0] in line:
name=row
print(name)
f.write(line+","+name[0]+'\n')
A sample of test.csv would look like this:
A,B,C,D
ABCD,,,
Total,Robert,,
Name,Annie,,
Total,Robert,,
And the names.csv would look:
Robert
Annie
Amanda
The output I want is:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
Currently the code will get rid of lines that don't result in a match, so I got:
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
Process each line by testing row[1] and appending the 5th column, then writing it. The name list isn't really a csv. If it's really long use a set for lookup. Read it only once for efficiency as well.
import csv
with open('names.txt') as f:
names = set(f.read().strip().splitlines())
# newline='' per Python 3 csv documentation...
with open('input.csv',newline='') as inf:
with open('output.csv','w',newline='') as outf:
r = csv.reader(inf)
w = csv.writer(outf)
for row in r:
row.append(row[1] if row[1] in names else '')
w.writerow(row)
Output:
A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
I think the problem is you're only writing when the name is in the row. To fix that move the writing call outside of the if conditional:
if row[0] in line:
name=row
print(name)
f.write(line+","+name[0]+'\n')
I'm guessing that print statement is for testing purposes?
EDIT: On second thought, you may need to move name='' inside the loop as well so it is reset after each row is written, that way you don't get names from matched rows bleeding into unmatched rows.
EDIT: Decided to show an implementation that should avoid the (possible) problem of two matched names in a row:
EDIT: Changed name=row and the call of name[0] in f.write() to name=row[0] and a call of name in f.write()
file = 'test.csv'
read_files = open(file)
lines=read_files.read()
text_lines = lines.split("\n")
with open('testnew2.csv','a') as f:
for line in text_lines:
name=''
line=str(line)
#words = line.split()
with open('names.csv', 'r') as fd:
reader = csv.reader(fd, delimiter=',')
match=False
while match == False:
for row in reader:
if row[0] in line:
name=row[0]
print(name)
match=True
f.write(line+","+name+'\n')
Try this as well:
import csv
myFile = open('testnew2.csv', 'wb+')
writer = csv.writer(myFile)
reader2 = open('names.csv').readlines()
with open('test.csv') as File1:
reader1 = csv.reader(File1)
for row in reader1:
name = ""
for record in reader2:
record = record.replace("\n","")
if record in row:
row.append(record)
writer.writerow(row)
break
I am probably making a stupid mistake, but I can't find where it is. I want to count the number of lines in my csv file. I wrote this, and obviously isn't working: I have row_count = 0 while it should be 400. Cheers.
f = open(adresse,"r")
reader = csv.reader(f,delimiter = ",")
data = [l for l in reader]
row_count = sum(1 for row in reader)
print row_count
with open(adresse,"r") as f:
reader = csv.reader(f,delimiter = ",")
data = list(reader)
row_count = len(data)
You are trying to read the file twice, when the file pointer has already reached the end of file after saving the data list.
First you have to open the file with open
input_file = open("nameOfFile.csv","r+")
Then use the csv.reader for open the csv
reader_file = csv.reader(input_file)
At the last, you can take the number of row with the instruction 'len'
value = len(list(reader_file))
The total code is this:
input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))
Remember that if you want to reuse the csv file, you have to make a input_file.fseek(0), because when you use a list for the reader_file, it reads all file, and the pointer in the file change its position
If you are working with python3 and have pandas library installed you can go with
import pandas as pd
results = pd.read_csv('f.csv')
print(len(results))
I would consider using a generator. It would do the job and keeps you safe from MemoryError of any kind
def generator_count_file_rows(input_file):
for row in open(input_file,'r'):
yield row
And then
for row in generator_count_file_rows('very_large_set.csv'):
count+=1
The important stuff is hidden in comments section of solution which is marked correct.
Re-sharing Erdős-Bacon's solution here for better visibility.
Why ?
Because: It saves lot of memory without having to create list.
So I think it is better do this way
def read_raw_csv(file_name):
with open(file_name, 'r') as file:
csvreader = csv.reader(file)
# count number of rows
entry_count = sum(1 for row in csvreader)
print(entry_count-1) # -1 is for discarding header row.
Checkout this link for more info
# with built in libraries
opened_file = open('f.csv')
from csv import reader
read_file = reader(opened_file)
apps_data = list(read_file)
rowcount = len(apps_data) #which incudes header row
print("Total rows incuding header: " + str(rowcount))
Simply Open the csv file in Notepad++. It shows the total row count in a jiffy. :)
Or
in cmd prompt , Provide file path and key in the command
find \c \v "some meaningless string" Filename.csv
I am trying to read a specific comma value from a csv file but i am getting the full row value how can i get the specific comma value
My csv looks like this
Index,Time,Energy
1,1.0,45.034
i need to get the values of Energy in each column.
import csv
with open('somefile.csv') as f:
reader = csv.DictReader(f, delimiter=',')
rows = list(reader)
for row in rows:
print(row['Energy'])
f = open('file.txt')
f.readline() # To skip header
for line in f:
print(line.split(',')[2])
f.close()
If you want it working even if the position of column Energy changes, you can do this:
with open('your_file.csv') as f:
# Get header
header = f.readline()
energy_index = header.index('Energy')
# Get Energy value
for line in f.readlines():
energy = line.split(',')[energy_index]
# do whatever you want to do with Energy
Check the below code. Hoping this is what you are looking for.
import csv
try:
fobj = open(file_name, 'r')
file_content = csv.reader(fobj, delimiter=',', quotechar='|')
except:
fobj.close()
file_content = False
if file_content:
for row_data in file_content:
try:
# This will return the 3rd columns value i.e 'energy'
row_data[2] = row_data[2].strip()
print row_data[2]
except:
pass
fobj.close()
I have a CSV file that is being constantly appended. It has multiple headers and the only common thing among the headers is that the first column is always "NAME".
How do I split the single CSV file into separate CSV files, one for each header row?
here is a sample file:
"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3
If the size of the csv files is not huge -- so all can be in memory at once -- just use read() to read the file into a string and then use a regex on this string:
import re
with open(ur_csv) as f:
data=f.read()
chunks=re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)',data,re.S | re.M)
for i, chunk in enumerate(chunks, 1):
with open('/path/{}.csv'.format(i), 'w') as fout:
fout.write(chunk.group(1))
If the size of the file is a concern, you can use mmap to create something that looks like a big string but is not all in memory at the same time.
Then use the mmap string with a regex to separate the csv chunks like so:
import mmap
import re
with open(ur_csv) as f:
mf=mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
chunks=re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)',mf,re.S | re.M)
for i, chunk in enumerate(chunks, 1):
with open('/path/{}.csv'.format(i), 'w') as fout:
fout.write(chunk.group(1))
In either case, this will write all the chunks in files named 1.csv, 2.csv etc.
Copy the input to a new output file each time you see a header line. Something like this (not checked for errors):
partNum = 1
outHandle = None
for line in open("yourfile.csv","r").readlines():
if line.startswith('"NAME"'):
if outHandle is not None:
outHandle.close()
outHandle = open("part%d.csv" % (partNum,), "w")
partNum += 1
outHandle.write(line)
outHandle.close()
The above will break if the input does not begin with a header line or if the input is empty.
You can use the python csv package to read your source file and write multile csv files based on the rule that if element 0 in your row == "NAME", spawn off a new file. Something like this...
import csv
outfile_name = "out_%.csv"
out_num = 1
with open('nameslist.csv', 'rb') as csvfile:
csvreader = csv.reader(csvfile, delimiter=',')
csv_buffer = []
for row in csvreader:
if row[0] != "NAME":
csv_buffer.append(row)
else:
with open(outfile_name % out_num, 'wb') as csvout:
for b_row in csv_buffer:
csvout.writerow(b_row)
out_num += 1
csv_buffer = [row]
P.S. I haven't actually tested this but that's the general concept
Given the other answers, the only modification that I would suggest would be to open using csv.DictReader. pseudo code would be like this. Assuming that the first line in the file is the first header
Note that this assumes that there is no blank line or other indicator between the entries so that a 'NAME' header occurs right after data. If there were a blank line between appended files the you could use that as an indicator to use infile.fieldnames() on the next row. If you need to handle the inputs as a list, then the previous answers are better.
ifile = open(filename, 'rb')
infile = cvs.Dictreader(ifile)
infields = infile.fieldnames
filenum = 1
ofile = open('outfile'+str(filenum), 'wb')
outfields = infields # This allows you to change the header field
outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
outfile.writerow(dict((fn, fn) for fn in outfields))
for row in infile:
if row['NAME'] != 'NAME':
#process this row here and do whatever is needed
else:
close(ofile)
# build infields again from this row
infields = [row["NAME"], ...] # This assumes you know the names & order
# Dict cannot be pulled as a list and keep the order that you want.
filenum += 1
ofile = open('outfile'+str(filenum), 'wb')
outfields = infields # This allows you to change the header field
outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
outfile.writerow(dict((fn, fn) for fn in outfields))
# This is the end of the loop. All data has been read and processed
close(ofile)
close(ifile)
If the exact order of the new header does not matter except for the name in the first entry, then you can transfer the new list as follows:
infileds = [row['NAME']
for k in row.keys():
if k != 'NAME':
infields.append(row[k])
This will create the new header with NAME in entry 0 but the others will not be in any particular order.