I'm trying to shave the top 7 lines off a CSV file.
There is probably a more concise way to do this, but right now I am reading one file and writing every line other than the first 7 to another file. When I write to the file, though, all the contents of the line show up in the first cell instead of being spread out in organized columns.
Here is my code:
import csv

with open('file1.csv', 'r') as file_org:
    with open("file2.csv", "w") as file_stripped:
        writer = csv.writer(file_stripped)
        for i, line in enumerate(file_org, -7):
            if i >= 0:
                writer.writerow([line])
Thank you!
Reading a CSV may require you to specify the separator, which is often ";"; you can find the reader's usage in the manual. Also, open the file and look at the raw content directly, not through some other tool like Excel.
If you aren't meant to change the content, you could just treat the lines as plain text, or split and join them manually:
line.split(";")
or
";".join(split_line)
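Building on that "treat them as plain lines" idea, here is a minimal sketch that skips the first 7 lines and copies the rest through verbatim, so each line's commas are preserved exactly as they are (the filenames match the question; no csv module is needed):

with open('file1.csv', 'r') as file_org, open('file2.csv', 'w') as file_stripped:
    for i, line in enumerate(file_org):
        if i >= 7:  # skip the first 7 lines (indices 0-6)
            file_stripped.write(line)  # copy the line through unchanged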
I'm struggling with one task that could save plenty of time. I'm new to Python, so please don't kill me :)
I've got a huge txt file with millions of records. I used to split them in MS Access with the "|" delimiter, filter the data down to about 400K records, and then copy it to Excel.
So basically the file looks like:
What I would like to have:
I'm using Spyder, so it would be great to see the data in the Variable Explorer so I can easily check it and (after additional filtering) export it to Excel.
I use LibreOffice so I'm not 100% sure about Excel, but if you change the .txt to .csv and try to open the file with Excel, it should let you change the delimiter from a comma to '|' and then import it directly. That works with LibreOffice Calc, anyway.
You have to split the file into lines, then split each line on the '|' character and map the data to a list of dicts:
with open('filename') as f:
    data = [{'id': fields[0], 'fname': fields[1]}
            for fields in (line.strip().split('|') for line in f)]
You have to fill in the rest of the fields.
Doing this with pandas will be much easier
Note: I am assuming that each entry is on a new line.
import pandas as pd
data = pd.read_csv("data.txt", delimiter='|')
# Do something here or let it be if you want to just convert text file to excel file
data.to_excel("data.xlsx")
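If you also want to filter before exporting, as the question mentions, pandas boolean indexing works on the DataFrame; a minimal sketch, where the column name 'id' is only a hypothetical placeholder for whatever your header row actually contains:

# keep only the rows where the hypothetical 'id' column is non-empty
filtered = data[data['id'].notna()]
filtered.to_excel("filtered.xlsx")

Run inside Spyder, both data and filtered will also show up in the Variable Explorer for inspection.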
I'm new to Python and I need to perform the following two tasks on a .txt file that contains more than 500 lines with lots of information: dates, hours, comments, names, etc.
(1) Replace the substrings "p. m." and "a. m." with "PM" and "AM". (Already done.)
(2) Save the output into another file, since I need to keep the original one. (This is the main issue.)
I'm familiar with the concepts of open, read and close. But I have not solved this task yet:
with open('Dates of arrival.txt', 'r', encoding='utf-8') as file:
    filedata = file.read()
    filedata.replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM")  # This output is the one I want to save as a .txt file.
I know I have to open another file to contain the information, but the file 'dates of arrival1.txt' is empty.
with open('dates of arrival1.txt', 'w') as wf:
    wf.write(file)  # I am not sure if file is the correct word to put there.
So, the main problem is how to combine these two pieces of code in order to perform tasks (1) and (2) and save the output into a .txt file. It may not be as difficult as I think, but I need a little help on this one.
Thanks for the help.
Assuming you're happy with your string replace statements, the code can be simplified to the following:
with open('Dates of arrival.txt', 'r', encoding='utf-8') as file, open('dates of arrival1.txt', 'w') as wf:
    wf.write(file.read().replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM"))
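If the one-liner feels dense, the same logic can be written out in explicit steps; an equivalent sketch, the key point being that replace() returns a new string which must be assigned:

# read the original file
with open('Dates of arrival.txt', 'r', encoding='utf-8') as file:
    filedata = file.read()

# replace() returns a new string; assign the result
filedata = filedata.replace("p.\xa0m.", "PM").replace("a.\xa0m.", "AM")

# write the modified text to a second file, leaving the original intact
with open('dates of arrival1.txt', 'w', encoding='utf-8') as wf:
    wf.write(filedata)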
Is there a way I can extract multiple pieces of data from a text file in Python and save them as a row in a new .csv file? I need to do this for multiple input files and save the output as a single .csv file for all of the input files.
I have never used Python before, so I am quite clueless. I have used MATLAB before, and I know how I would do it in MATLAB if it were numbers (but unfortunately it is text, which is why I am trying Python). So to be clear, I need a new line in the .csv output file for each "ID" in the input files.
An example of the data is shown below (2 separate files):
EXAMPLE DATA - FILE 1:
id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/29
data,er,corbp001,2
id,ARI201803300
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/30
data,er,delaj001,0
EXAMPLE DATA - FILE 2:
id,NYN201803290
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,usedh,false
info,date,2018/03/29
data,er,famij001,0
id,NYN201803310
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,date,2018/03/31
data,er,gselr001,0
I'm hoping to get the data into .csv format with all the details from one "id" on one line. There are multiple "id"s per text file, and there are multiple files. I want to repeat this process for multiple text files so that the outputs all land in the same .csv output file. I want the output to look as follows in the .csv file, with each piece of info in its own cell:
ARI201803290 COL ARI PHO01 2018/03/29 2
ARI201803300 COL ARI PHO01 2018/03/30 0
NYN201803290 SLN NYN NYC20 2018/03/29 0
NYN201803310 SLN NYN NYC20 2018/03/31 0
If I were doing it in MATLAB I'd use a for loop and an if statement, and say:
j = 1;
k = 1;
for i = 1:size(myMatrix, 1)
    if file1(i,1) == id
        output(k,1) = file1(i,2);
        k = k + 1;
    elseif file1(i,1) == info
        output(j,2) = file1(i,3);
        j = j + 1;
    etc.....
However, I obviously can't do this in MATLAB because I have comma-separated text files, not a matrix. Does anyone have any suggestions for how I can translate my idea into Python code? Or any other suggestion. I am super new to Python, so I'm willing to try anything that might work.
Thank you very much in advance!
Python is very flexible and can do these jobs very easily.
There are a lot of CSV tools/modules in Python that handle pretty much every type of CSV and Excel file. However, I prefer to handle a CSV the same way as a text file, because a CSV is simply a text file with comma-separated text, and simple is better than complicated.
Below is the code, with comments to explain most of it; you can tweak it to match your needs exactly.
import os

input_folder = 'myfolder/'  # path of the folder containing the text files on your disk

# create a list of file names with their full paths using a list comprehension
data_files = [os.path.join(input_folder, file) for file in os.listdir(input_folder)]

# open our csv file for writing
csv = open('myoutput.csv', 'w')  # better to open files with a context manager like below, but i am trying to show you different methods

def write_to_csv(line):
    print(line)
    csv.write(line)

# loop through your text files
for file in data_files:
    with open(file, 'r') as f:  # use a context manager to open files (best practice)
        buff = []
        for line in f:
            line = line.strip()  # remove spaces and newlines
            line = line.split(',')  # split the line into a list of values
            if buff and line[0] == 'id':  # hit another 'id'
                write_to_csv(','.join(buff) + '\n')
                buff = []
            buff.append(line[-1])  # add the last word in the line
        write_to_csv(','.join(buff) + '\n')

csv.close()  # must close any file handle opened manually (no context manager, i.e. no "with")
output:
ARI201803290,2,COL,ARI,PHO01,2018/03/29,2
ARI201803300,2,COL,ARI,PHO01,2018/03/30,0
NYN201803290,2,SLN,NYN,NYC20,false,2018/03/29,0
NYN201803310,2,SLN,NYN,NYC20,2018/03/31,0
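Note that this output still carries the version value in every row (and the usedh value in the third), which the desired output in the question omits. A hedged tweak, assuming you simply want those fields dropped, is to skip them in the inner loop right after the line is split:

if line[0] == 'version' or (line[0] == 'info' and line[1] == 'usedh'):
    continue  # skip fields that should not appear in the output row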
I have a text file which contains text in the first 20 or so lines, followed by CSV data. Some of the text in the text section contains commas, so trying csv.reader or csv.DictReader doesn't work well.
I want to skip past the text section and only then start to parse the CSV data.
Searches don't yield much other than instructions to either use csv.reader/csv.DictReader and iterate through the rows that are returned (which doesn't work because of the commas in the text), or to read the file line by line and split the lines using ',' as the delimiter.
The latter works up to a point, but it produces strings, not numbers. I could convert the strings to numbers, but I'm hoping that there's a simple way to do this with either the csv or numpy libraries.
As requested - Sample data:
This is the first line. This is all just text to be skipped.
The first line doesn't always have a comma - maybe it's in the third line
Still no commas, or was there?
Yes, there was. And there it is again.
and so on
There are more lines but they finally stop when you get to
EndOfHeader
1,2,3,4,5
8,9,10,11,12
3, 6, 9, 12, 15
Thanks for the help.
Edit#2
A suggested answer gave the following link entitled Read file from line 2...
That's kind of what I'm looking for, but I want to be able to read through the lines until I find the "EndOfHeader" and then call on the CSV library to handle the remainder of the file.
The reply by saimadhu.polamuri is part of what I've tried, specifically:
with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        # test if line equals EndOfHeader. If true then parse as CSV
But that's where it comes apart - I can't see how to have CSV work with the data from this point forward.
With thanks to @Mike for the suggestion, the code is actually reasonably straightforward.
import csv

with open('data.csv') as f:  # open the file
    for i in range(7):  # loop over the first 7 lines
        line = f.readline()  # just read them. Could also do next(f)
    r = csv.reader(f, delimiter=',')  # now pass the file handle to a csv reader
    for row in r:  # and loop over the resulting rows
        print(row)  # print the row. Or do something else.
In my actual code, it will search for the EndOfHeader line and use that to decide where to start parsing the CSV; a sketch of that variant follows.
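A minimal sketch of that variant, using the sample data above; converting each field with float() is just one assumption about how the numeric rows should be typed:

import csv

with open('data.csv') as f:
    for line in f:  # consume lines until the sentinel is found
        if line.strip() == 'EndOfHeader':
            break
    r = csv.reader(f)  # the reader picks up right after the header
    rows = [[float(x) for x in row] for row in r]  # convert the strings to numbers
    print(rows)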
I'm posting this as an answer, as the question that this one supposedly duplicates doesn't explicitly consider this issue of the file handle and how it can be passed to a CSV reader, and so it may help someone else.
Thanks to all who took time to help.
I'm very new to Python. So far I have written the code below, which allows me to search for text files in a folder, read all the lines from each one, open an Excel file, and save the read lines in it. (I'm still unsure whether this does it for all the text files one by one.)
Having run this, I only see one file's text data being read and saved into the Excel file (first column). Or it could be that it is overwriting the data from multiple text files into the same column until it finishes.
Could anyone point me in the right direction on how to get it to write the stripped data to the next available column in Excel for each text file?
import os
import glob

list_of_files = glob.glob('./*.txt')
for fileName in list_of_files:
    fin = open(fileName, "r")
    data_list = fin.readlines()
    fin.close()  # closes file
    del data_list[0:17]
    del data_list[1:27]  # [*:*]
    fout = open("stripD.xls", "w")
    fout.writelines(data_list)
    fout.flush()
    fout.close()
This can be condensed to:
import glob

list_of_files = glob.glob('./*.txt')
with open("stripD.xls", "w") as fout:
    for fileName in list_of_files:
        data_list = open(fileName, "r").readlines()
        fout.write(data_list[17])
        fout.writelines(data_list[44:])
Are you aware that writelines() doesn't introduce newlines? readlines() keeps the newlines as it reads, so they are present in the elements of data_list that writelines() writes to the file, but writelines() itself doesn't add any.
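A minimal illustration of that point (the filename is just a placeholder):

lines = ['a', 'b']  # note: no trailing newlines
with open('out.txt', 'w') as fout:
    fout.writelines(lines)                    # writes 'ab' -- nothing is inserted between elements
    fout.writelines(l + '\n' for l in lines)  # writes 'a\nb\n' -- newlines supplied explicitly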
You may like to check this, and for simple needs also the csv module.
These lines are "interesting":
del data_list[0:17]
del data_list[1:27] # [*:*]
You are deleting as many of the first 17 lines of your input file as exist, keeping the 18th (if it exists), deleting another 26 (if they exist), and keeping any following lines. This is a very unusual procedure, and is not mentioned at all in your description of what you are trying to do.
Secondly, you are writing the output lines (if any) from each input file to the same output file, so at the end of the script the output file will contain data from only the last input file. Don't change your code to use append mode ... opening and closing the same file all the time just to append records is very wasteful, and only justified if you have a real need to make sure that the data is flushed to disk in case of a power or other failure. Open your output file once, before you start reading files, and close it once when you have finished with all the input files.
Thirdly, any old arbitrary text file doesn't become an "excel file" just because you have named it "something.xls". You should write it with the csv module and name it "something.csv" (see the sketch after these points). If you want more control over how Excel will interpret it, write an xls file using xlwt.
Fourthly, you mention "column" several times, but as you have not given any details about how your input lines are to be split into "columns", it is rather difficult to guess what you mean by "next available column". It is even possible to suspect that you are confusing columns and rows ... assuming fewer than 43 lines in each input file, the 18th ROW of the last input file will be all you will see in the output file.
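Here is a minimal sketch of the third point above, assuming each kept input line splits on commas into the column values you want (the filenames are placeholders):

import csv
import glob

with open('stripped.csv', 'w', newline='') as fout:  # open the output once, before reading
    writer = csv.writer(fout)
    for fileName in glob.glob('./*.txt'):
        with open(fileName) as fin:
            for line in fin:
                writer.writerow(line.strip().split(','))  # one CSV row per input line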