I am new to data processing with the csv module. I have an input file and I am using this code:
import csv
path1 = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\charity.a.data"
csv_file_path = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\output.csv.bak"
with open(path1, 'r') as in_file:
    in_file.__next__()
    stripped = (line.strip() for line in in_file)
    lines = (line.split(":$%:") for line in stripped if line)
    with open(csv_file_path, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('id', 'donor_id','last_name','first_name','year','city','state','postal_code','gift_amount'))
        writer.writerows(lines)
Is it possible to remove the colon (:) from the first and last columns of the CSV file? I want the output to be like
Please help me.
If you just want to remove the ':' from the first and last columns, this should work. Keep in mind that your dataset should be tab-separated (or use some separator other than a comma) before you read it because, as I commented on your question, there are commas in your data.
path1 = '/path/input.csv'
path2 = '/path/output.csv'
with open(path1, 'r') as input, open(path2, 'w') as output:
    file = iter(input.readlines())
    output.write(next(file))  # copy the header row unchanged
    for row in file:
        # drop the leading ':' and the trailing ':' plus the newline
        output.write(row[1:][:-2] + '\n')
Update
Now that you have posted your code, I made a small change so that the whole process starts from the initial file. The idea is the same: exclude the first and the last character of each line. Because line.strip() already removes the trailing newline, use line.strip()[1:-1] instead of line.strip().
import csv
path1 = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\charity.a.data"
csv_file_path = "C:\\Users\\apple\\Downloads\\Challenge\\raw\\output.csv.bak"
with open(path1, 'r') as in_file:
    in_file.__next__()
    stripped = (line.strip()[1:-1] for line in in_file)
    lines = (line.split(":$%:") for line in stripped if line)
    with open(csv_file_path, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('id', 'donor_id','last_name','first_name','year','city','state','postal_code','gift_amount'))
        writer.writerows(lines)
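As a rough sketch of the same idea, the strip-and-split step can be pulled into a small function that only removes a colon when one is really there. The function name parse_record and the sample line are made up for illustration; the real layout of charity.a.data is an assumption here.

```python
def parse_record(line, sep=":$%:"):
    """Split one raw line into fields, dropping one leading and one trailing ':'."""
    text = line.strip()
    # Only remove the colon if it is actually present (hypothetical layout).
    if text.startswith(":"):
        text = text[1:]
    if text.endswith(":"):
        text = text[:-1]
    return text.split(sep)


# Hypothetical sample line, not taken from the real data file.
sample = ":1:$%:D100:$%:Smith:$%:25.00:\n"
fields = parse_record(sample)
print(fields)
```

Checking for the colon first means a line without the wrapping colons keeps all of its characters.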
Related
This is my original .txt data:
HKEY_CURRENT_USER\SOFTWARE\7-Zip
HKEY_CURRENT_USER\SOFTWARE\AppDataLow
HKEY_CURRENT_USER\SOFTWARE\Chromium
HKEY_CURRENT_USER\SOFTWARE\Clients
HKEY_CURRENT_USER\SOFTWARE\CodeBlocks
HKEY_CURRENT_USER\SOFTWARE\Discord
HKEY_CURRENT_USER\SOFTWARE\Dropbox
HKEY_CURRENT_USER\SOFTWARE\DropboxUpdate
HKEY_CURRENT_USER\SOFTWARE\ej-technologies
HKEY_CURRENT_USER\SOFTWARE\Evernote
HKEY_CURRENT_USER\SOFTWARE\GNU
And I need to have a new file where the new lines contain only part of those strings, like:
7-Zip
AppDataLow
Chromium
Clients
...
How can I do this in Python?
Try this:
## read file content as string
with open("file.txt", "r") as file:
    string = file.read()
## convert each line to list
lines = string.split("\n")
## write only last part after "\" in each line
with open("new.txt", "w") as file:
    for line in lines:
        file.write(line.split("\\")[-1] + "\n")
One approach would be to read the entire text file into a Python string, split it into lines, and then use split on each line to find the final path component.
import re
with open('file.txt', 'r') as file:
    data = file.read()
lines = re.split(r'\r?\n', data)
output = [x.split("\\")[-1] for x in lines]
# write to file if desired
text = '\n'.join(output)
with open('output.txt', 'w') as f_out:
    f_out.write(text)
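A slightly shorter variant, as a sketch: rsplit with a maximum of one split only cuts at the last backslash, and skipping blank lines avoids writing empty entries. The helper name last_key is hypothetical, and the sample list stands in for the lines of the real file.

```python
def last_key(path):
    """Return the part of a registry path after the final backslash."""
    return path.strip().rsplit("\\", 1)[-1]


# Two lines from the question plus a blank line, as they would come from a file.
sample = [
    "HKEY_CURRENT_USER\\SOFTWARE\\7-Zip\n",
    "HKEY_CURRENT_USER\\SOFTWARE\\AppDataLow\n",
    "\n",
]
names = [last_key(line) for line in sample if line.strip()]
print("\n".join(names))
```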
I'm trying to write code that will search for specific data in multiple report files and write it into columns in a single CSV.
The report lines I'm looking for aren't always at the same line number, so I'm searching for the data on the lines below:
Estimate file: pog_example.bef
Estimate ID: o1_p1
61078 (100.0%) estimated.
And I want to write the data from each text file into columns in a csv as below:
example.bef, o1_p1, 61078 (100.0%) estimated
So far I have this script, which lists the first of my criteria, but I can't figure out how to loop through to find my second and third lines to populate the second and third columns.
from glob import glob
import fileinput
import csv
with open('percentage_estimated.csv', 'w', newline='') as est_report:
    writer = csv.writer(est_report)
    for line in fileinput.input(glob('*.bef*')):
        if 'Estimate file' in line:
            writer.writerow([line.split('pog_')[1].strip()])
I'm pretty new to Python, so any help would be appreciated!
I think I see what you're trying to do, but I'm not sure.
I think your BEF file might look something like this:
a line
another line
Estimate file: pog_example.bef
Estimate ID: o1_p1
61078 (100.0%) estimated.
still more lines
If that's true, then once you find a line with 'Estimate file', you need to take control from the for-loop and start manually iterating the lines because you know which lines are coming up.
This is a very simple example script which opens my mock BEF file (above) and automatically iterates the lines till it finds 'Estimate file'. From there it processes each line specifically, using next(bef_file) to iterate to the next line, expecting them to have the correct text:
import csv
all_rows = []
with open('input.bef') as bef_file:
    for line in bef_file:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()
            line = next(bef_file)
            est_id = line.split('Estimate ID:')[1].strip()
            line = next(bef_file)
            value = line.strip()
            row = [fname, est_id, value]
            all_rows.append(row)
            break  # stop iterating lines in this file
with open('output.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    writer.writerows(all_rows)
When I run that I get this for output.csv:
File name,Est ID,Est Value
example.bef,o1_p1,61078 (100.0%) estimated.
If there are blank lines in your data between the lines you care about, manually step over them with next(bef_file) statements.
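If the number of blank lines varies, one possible sketch is a small helper that keeps calling the iterator until it reaches a non-blank line. The names next_nonblank and extract_row are made up for this example, and the mock text is an assumption about what a BEF file might look like.

```python
import io


def next_nonblank(lines):
    """Advance the iterator and return the next non-blank line, stripped."""
    for line in lines:
        if line.strip():
            return line.strip()
    raise ValueError("no more non-blank lines")


def extract_row(lines):
    """Find 'Estimate file' and read the two non-blank lines that follow it."""
    lines = iter(lines)  # make sure both loops share one iterator
    for line in lines:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()
            est_id = next_nonblank(lines).split('Estimate ID:')[1].strip()
            value = next_nonblank(lines)
            return [fname, est_id, value]
    return None


# Mock file content with blank lines between the lines of interest.
mock = io.StringIO(
    "a line\n\nEstimate file: pog_example.bef\n\n"
    "Estimate ID: o1_p1\n\n\n61078 (100.0%) estimated.\nstill more lines\n"
)
row = extract_row(mock)
print(row)
```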
If anyone wants to see what finally worked for me:
from glob import glob
import csv
all_rows = []
with open('percentage_estimated.csv', 'w', newline='') as bef_report:
    writer = csv.writer(bef_report)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    for file in glob('*.bef*'):
        with open(file,'r') as f:
            for line in f:
                if 'Estimate file' in line:
                    fname = line.split('pog_')[1].strip()
                    line = next(f)
                    est_id = line.split('Estimate ID:')[1].strip()
                    # in my files the value is on the seventh line after the Estimate ID line
                    for _ in range(7):
                        line = next(f)
                    value = line.strip()
                    row = [fname, est_id, value]
                    all_rows.append(row)
                    break
    writer.writerows(all_rows)
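Counting a fixed number of lines works only while the report layout stays the same. A possible alternative, sketched below, is to look for a marker on each line instead. It assumes the value line always ends with 'estimated.', which is a guess based on the single example in the question; the function name extract_estimate is hypothetical.

```python
import io


def extract_estimate(lines):
    """Collect file name, estimate ID and value line by matching text markers."""
    fname = est_id = None
    for line in lines:
        text = line.strip()
        if 'Estimate file' in text:
            fname = text.split('pog_')[1]
        elif 'Estimate ID:' in text:
            est_id = text.split('Estimate ID:')[1].strip()
        elif fname and est_id and text.endswith('estimated.'):
            # assumption: the value line is the first one ending in 'estimated.'
            return [fname, est_id, text]
    return None


# Mock report with an unknown number of filler lines before the value.
mock = io.StringIO(
    "Estimate file: pog_example.bef\n"
    "Estimate ID: o1_p1\n"
    "filler 1\nfiller 2\nfiller 3\n"
    "61078 (100.0%) estimated.\n"
)
row = extract_estimate(mock)
print(row)
```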
I obtain multiple CSV files from an API, and I need to remove the newlines present inside the records and join each record back together. Consider the data provided below:
My code to remove the newlines:
## Loading necessary libraries
import glob
import os
import shutil
import csv
## Assigning necessary path
source_path = "/home/Desktop/Space/"
dest_path = "/home/Desktop/Output/"
# Assigning file_read path to modify the copied CSV files
file_read_path = "/home/Desktop/Output/*.csv"
## Code to copy .csv files from one folder to another
for csv_file in glob.iglob(os.path.join(source_path, "*.csv"), recursive = True):
    shutil.copy(csv_file, dest_path)
## Code to remove newlines inside the fields of all copied .csv files
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding = 'ISO-8859-1') as file:
        reader = list(csv.reader(file , delimiter = ","))
    for i in range(0,len(reader)):
        reader[i] = [row_space.replace("\n", "") for row_space in reader[i]]
    with open(filename, "w") as output:
        writer = csv.writer(output, delimiter = ",", dialect = 'unix')
        for row in reader:
            writer.writerow(row)
I actually copy the CSV files into a new folder and then use the above code to remove any new line present in the file.
You are repairing CSV files that contain stray \n characters, so the real problem is how to tell whether a line is a new record or a continuation of the previous one. If every record starts with a specific word, like SV_a5d15EwfI8Zk1Zr in your example, or just SV_, you can do something like this:
import glob
# this is the FIX PART
# I have file ./data.csv (contains your example); the fixed version is in data.csv.FIXED
file_read_path = "./*.csv"
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file, open(filename + '.FIXED', "w", encoding='ISO-8859-1') as target:
        previous_line = ''
        for line in file:
            # check if it's a new line or a part of the previous line
            if line.startswith('SV_'):
                if previous_line:
                    target.write(previous_line + '\n')
                # remove \n (rstrip keeps the last character if the file has no final newline)
                previous_line = line.rstrip('\n')
            else:
                # concatenate the broken part with previous_line
                previous_line += line.rstrip('\n')  # remove \n
        # add last line
        target.write(previous_line + '\n')
Output:
SV_a5d15EwfI8Zk1Zr;QID4;"<span style=""font-size:16px;""><strong>HOUR</strong> Interview completed at:</span>";HOUR;TE;SL;;;true;ValidNumber;0;23.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID6;"<span style=""font-size:16px;""><strong>MINUTE</strong> Interview completed:</span>";MIN;TE;SL;;;true;ValidNumber;0;59.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID8;Number of Refusals - no language<br />For <strong>Zero Refusals - no language</strong> use 0;REFUSAL1;TE;SL;;;true;ValidNumber;0;99.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID10;<strong>DAY OF WEEK:</strong>;WEEKDAY;MC;SACOL;TX;;true;;0;;;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID45;"<span style=""font-size:16px;"">Using points from 0 to 10, how likely would you be recommend Gatwick Airport to a friend or colleague?</span><div> </div>";NPSCORE;MC;NPS;;;true;;0;;;882;-873;
EDITS:
This can be simpler using split too; this version fixes the file itself:
import glob
# this is the FIX PART
# I have file ./data.csv; the fixed version is written to the same file
file_read_path = "./*.csv"
# assuming that all lines start with SV_
STARTING_KEYWORD = 'SV_'
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file:
        lines = file.read().split(STARTING_KEYWORD)
    with open(filename, 'w', encoding='ISO-8859-1') as file:
        file.write('\n'.join(STARTING_KEYWORD + l.replace('\n', '') for l in lines if l))
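For testing the joining logic without touching any files, it can be written as a function over a list of lines. This is only a sketch: the name join_broken_records and the shortened sample records are invented here, and it relies on the same assumption that every real record starts with SV_.

```python
def join_broken_records(lines, prefix="SV_"):
    """Merge continuation lines into the preceding record that starts with prefix."""
    records = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith(prefix) or not records:
            # a new record (or a first line such as a header)
            records.append(text)
        else:
            # a broken-off piece: glue it onto the previous record
            records[-1] += text
    return records


# Shortened, made-up records: the first one is split across two lines.
broken = ["SV_1;QID4;first half\n", " second half;0\n", "SV_1;QID6;whole;0\n"]
fixed = join_broken_records(broken)
print("\n".join(fixed))
```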
Well, I'm not sure what restrictions you have, but if you can use the pandas library, this is simple.
import pandas as pd
# data_file and target_file are the paths of the input and output CSV files
data_set = pd.read_csv(data_file, skip_blank_lines=True)
data_set.to_csv(target_file, index=False)
This will create a CSV file with the blank lines removed; a newline inside a quoted field is kept as part of that field's value. You can save a lot of time with available libraries.
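If pandas is not an option, something similar can be sketched with the standard csv module. This assumes the stray newlines sit inside quoted fields, in which case csv.reader already returns each record as one row and the newline only has to be removed from the field text. The function name flatten_newlines and the sample data are made up for the example.

```python
import csv
import io


def flatten_newlines(text):
    """Parse CSV text and rewrite it with newlines removed from every field."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace("\n", "") for field in row])
    return out.getvalue()


# Made-up data: the second record has a newline inside a quoted field.
raw = 'id,text\n1,"first \nsecond"\n2,plain\n'
cleaned = flatten_newlines(raw)
print(cleaned)
```

If the newlines are not inside quotes, the parser cannot know that the record continues, and a prefix-based approach like the one in the other answer is needed instead.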
I am trying to convert a txt file into a csv file in Python. The txt file currently consists of several strings separated by spaces. I would like to write each string into one cell in the csv file.
The txt file has got following structure:
UserID Desktop Display (Version) (Server/Port handle), Date
UserID Desktop Display (Version) (Server/Port handle), Date
etc.
My approach would be following:
with open('licfile.txt', "r+") as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(" ") for line in stripped if line)
with open('licfile.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('user', 'desktop', 'display', 'version', 'server', 'handle', 'date'))
    writer.writerows(lines)
Unfortunately this is not working as expected. I get the following error: ValueError: I/O operation on closed file. Additionally, the intended column headers all end up in a single cell of the csv file.
Any tips on how to proceed? Many thanks in advance.
How about:
with open('licfile.txt', 'r') as in_file, open('licfile.csv', 'w') as out_file:
    for line in in_file:
        if line.strip():
            out_file.write(line.strip().replace(' ', ',') + '\n')
And for the German Excel enthusiasts...
...
...
...
... .replace(' ', ';') + '\n')
:)
You can also use the built in csv module to accomplish this easily:
import csv
with open('licfile.txt', 'r') as in_file, open('licfile.csv', 'w') as out_file:
    reader = csv.reader(in_file, delimiter=" ")
    writer = csv.writer(out_file, lineterminator='\n')
    writer.writerows(reader)
I used the lineterminator='\n' argument here because the writer's default is '\r\n', which typically shows up as an extra blank line after each row when the output file is opened without newline='' (notably on Windows).
There are also a few arguments you could use if say quoting is needed or a different delimiter is desired: https://docs.python.org/3/library/csv.html#csv-fmt-params
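To see what the space delimiter does with a line shaped like the one in the question, here is a small in-memory sketch. The sample values are invented; only the overall shape follows the structure described above.

```python
import csv
import io

# Invented sample line following the "UserID Desktop Display (Version) (Server/Port handle), Date" shape.
sample = io.StringIO("u1 host1 :0 (v1.0) (srv/27000 101), 1/2\n")
reader = csv.reader(sample, delimiter=" ")
rows = [row for row in reader if row]  # skip empty rows from blank lines

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())

# Two spaces in a row produce an empty field between them.
double_space = list(csv.reader(io.StringIO("a  b\n"), delimiter=" "))
```

Note that the writer quotes the field that still contains a comma, and that consecutive spaces in the input give empty cells, so the input may need cleaning first if it is aligned with padding.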
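You are using comprehensions with round brackets, which create generator objects that are evaluated lazily: nothing is read from in_file until writerows consumes them, and by then the file is already closed. Use square brackets instead, which build the lists immediately while the file is still open. See the example below: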
stripped = [line.strip() for line in in_file]
lines = [line.split(" ") for line in stripped if line]
licfile_df = pd.read_csv('licfile.txt',sep=",", header=None)
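The difference between the round-bracket and square-bracket versions can be reproduced with a throwaway file. This is only a demonstration sketch; the temporary file and its two lines are made up.

```python
import os
import tempfile

# Create a small throwaway input file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a b\nc d\n")
    path = tmp.name

# Generator expression: nothing is read inside the with block.
with open(path) as in_file:
    lazy = (line.split() for line in in_file)
try:
    list(lazy)  # first read happens here, after the file was closed
    error = None
except ValueError as exc:
    error = str(exc)

# List comprehension: all lines are read while the file is still open.
with open(path) as in_file:
    eager = [line.split() for line in in_file]

os.remove(path)
print(error)
print(eager)
```

Keeping the generator and moving the writing code inside the same with block would also work, since the file is then still open when the rows are consumed.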
The problem is that I have a text (CSV) file which is missing commas, and I would like to insert them so that I can use the file in LaTeX to make a table. I have an MWE of code from another problem, which I ran, but it did not work. Could someone guide me on how to change it?
I have tried one Python script that produces a blank file, another that produces a blank document, and another that removes the spaces.
import fileinput
input_file = 'C:/Users/Light_Wisdom/Documents/Python Notes/test.txt'
output= open('out.txt','w+')
with open('out.txt', 'w+') as output:
    for each_line in fileinput.input(input_file):
        output.write("\n".join(x.strip() for x in each_line.split(',')))
The text file contains more numbers, but it looks like this:
0 2.58612
0.00616025 2.20018
0.0123205 1.56186
0.0184807 0.371172
0.024641 0.327379
0.0308012 0.368863
0.0369615 0.322228
0.0431217 0.171899
Outcome
0.049282, -0.0635003
0.0554422, -0.110747
0.0616025, 0.0701394
0.0677627, 0.202381
0.073923, 0.241264
0.0800832, 0.193697
Renewed Attempt:
with open("CSV.txt","r") as file:
    new = list(map(lambda x: ''.join(x.split()[0:1]+[","]+x.split()[0:2]),file.readlines()))
with open("New_CSV.txt","w+") as output:
    for i in new:
        output.writelines(i)
        output.writelines("\n")
This can be done using .split and .join: split the line into a list, then join the list with commas. Because split() with no argument splits on any run of whitespace, this also handles several consecutive spaces in the file:
with open(input_file, "r") as f1, open("out.txt", 'w') as f2:
    for line in f1:
        f2.write(",".join(line.split()) + "\n")
You can also use csv to handle the writing automatically:
import csv
with open(input_file, "r") as f1, open("out.txt", 'w', newline='') as f2:
    writer = csv.writer(f2)
    for line in f1:
        writer.writerow(line.split())
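Since the stated goal is a LaTeX table, the same split could also produce table rows directly, which might save the CSV step. This is just a sketch under that assumption; the helper name to_latex_row is made up, and the rows would still need to be placed inside a tabular environment by hand.

```python
def to_latex_row(line):
    """Turn one whitespace-separated line into a LaTeX table row."""
    # cells are joined with ' & ' and the row ends with a double backslash
    return " & ".join(line.split()) + r" \\"


# First lines of the data from the question, plus a blank line.
sample = ["0 2.58612\n", "0.00616025   2.20018\n", "\n"]
rows = [to_latex_row(line) for line in sample if line.strip()]
print("\n".join(rows))
```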