How to read this particular file format?

I have the following text in a csv file:
b'DataMart\n\nDate/Time Generated,11/7/16 8:54 PM\nReport Time Zone,America/New_York\nAccount ID,8967\nDate Range,10/8/16 - 11/6/16\n\nReport Fields\nSite (DCM),Creative\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter'
Essentially, the file contains several newline characters rather than being one long string, so the same text can be pictured as follows:
DataMart
Date/Time Generated,11/7/16 8:54 PM
Report Time Zone,America/New_York
Account ID,8967
Date Range,10/8/16 - 11/6/16
Report Fields
Site (DCM),Creative
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
I need to grab the last two lines, which is basically the data. I tried doing a for loop:
with open('file.csv','r') as f:
    for line in f:
        print(line)
Instead, it prints the whole text again as a single line, with the \n sequences still in it.

Just read the file and get the last two lines:
with open("/path/to/file") as f:  # the file() builtin only exists in Python 2
    my_file = f.read()
print(my_file.splitlines()[-2:])
The [-2:] is known as slicing: it creates a slice, starting from the second to last element, going to the end.
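As a small illustration of the slice, here is a sketch that assumes the file's text has already been read into a string. The sample text is abbreviated from the question, and row_one and row_two are placeholder values rather than the real data.

```python
# Abbreviated sample text from the question, held in a string for illustration.
# "row_one" and "row_two" are placeholders, not the real creative names.
text = (
    "DataMart\n\n"
    "Report Fields\n"
    "Site (DCM),Creative\n"
    "Globest.com,row_one\n"
    "Globest.com,row_two"
)

lines = text.splitlines()

# [-2:] starts at the second-to-last element and runs to the end of the list.
last_two = lines[-2:]
print(last_two)

# Slicing does not raise IndexError, even when the list has fewer than two items.
short = ["only line"][-2:]
print(short)
```

Because slicing tolerates short lists, this works on a file with a single line as well; it simply returns whatever is there.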

OK, after struggling for a bit, I found out that I need to decode the file's contents from binary to 'utf-8' before I can apply the split functions. The problem was that the split functions I was using are not applicable to the binary data.
This is the actual code that seems to be working for me now:
with open('BinaryFile.csv','rb') as f1:
    data=f1.read()
text=data.decode('utf-8')
with open('TextFile.csv', 'w') as f2:
    f2.write(text)
with open('TextFile.csv','r') as f3:
    for line in f3:
        print(line.split('\\n')[9:])
Thanks for your help, guys.
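For anyone landing here later, a minimal sketch of the same idea without the intermediate text file. It assumes the bytes contain real newline characters and that the data always follows a "Report Fields" line, as in the sample. read_report_rows and its marker parameter are names made up for this example, and the sample bytes are abbreviated from the question.

```python
import csv

# Sample bytes abbreviated from the question.
raw = (
    b"DataMart\n\nDate/Time Generated,11/7/16 8:54 PM\n"
    b"Account ID,8967\n\nReport Fields\nSite (DCM),Creative\n"
    b"Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter\n"
    b"Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter"
)


def read_report_rows(data, marker="Report Fields"):
    """Return (column names, data rows) found after the marker line.

    Hypothetical helper for this sketch; `marker` is an assumed parameter.
    """
    lines = data.decode("utf-8").splitlines()
    start = lines.index(marker) + 1  # raises ValueError if the marker is missing
    # csv.reader accepts any iterable of strings, so a list of lines works.
    header, *rows = list(csv.reader(lines[start:]))
    return header, rows


header, rows = read_report_rows(raw)
print(header)
print(rows)
```

Looking for the marker line instead of hard-coding an index such as [9:] keeps the code working if the number of metadata lines at the top of the report changes.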

Related

File has two parts - 1st is text 2nd is CSV. How to parse only the CSV part with python

I have a text file which contains text in the first 20 or so lines, followed by CSV data. Some of the text in the text section contains commas, so trying csv.reader or csv.DictReader doesn't work well.
I want to skip past the text section and only then start to parse the CSV data.
Searches don't yield much other than instructions to either use csv.reader/csv.DictReader and iterate through the rows that are returned (which doesn't work because of the commas in the text), or to read the file line by line and split the lines using ',' as the delimiter.
The latter works up to a point, but it produces strings, not numbers. I could convert the strings to numbers but I'm hoping that there's a simple way to do this either with the csv or numpy libraries.
As requested - Sample data:
This is the first line. This is all just text to be skipped.
The first line doesn't always have a comma - maybe it's in the third line
Still no commas, or was there?
Yes, there was. And there it is again.
and so on
There are more lines but they finally stop when you get to
EndOfHeader
1,2,3,4,5
8,9,10,11,12
3, 6, 9, 12, 15
Thanks for the help.
Edit#2
A suggested answer gave the following link entitled Read file from line 2...
That's kind of what I'm looking for, but I want to be able to read through the lines until I find the "EndOfHeader" and then call on the CSV library to handle the remainder of the file.
The reply by saimadhu.polamuri is part of what I've tried, specifically
with open(filename , 'r') as f:
    first_line = f.readline()
    for line in f:
        #test if line equals EndOfHeader. If true then parse as CSV
        pass
But that's where it comes apart - I can't see how to have CSV work with the data from this point forward.

With thanks to @Mike for the suggestion, the code is actually reasonably straightforward.
import csv
with open('data.csv') as f:  # open the file
    for i in range(7):  # Loop over first 7 lines
        f.readline()  # just read them. Could also do next(f)
    r = csv.reader(f, delimiter=',')  # Now pass the file handle to a csv reader
    for row in r:  # and loop over the resulting rows
        print(row)  # Print the row. Or do something else.
In my actual code, it will search for the EndOfHeader line and use that to decide where to start parsing the CSV
I'm posting this as an answer, as the question that this one supposedly duplicates doesn't explicitly consider this issue of the file handle and how it can be passed to a CSV reader, and so it may help someone else.
Thanks to all who took time to help.
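A rough sketch of the EndOfHeader variant described above, for reference. It is not the original poster's actual code: read_numbers_after_header is a made-up name, io.StringIO stands in for the real file, and it assumes every data cell is an integer, as in the sample data.

```python
import csv
import io

# Stand-in for the real file; io.StringIO behaves like an open text file.
sample = io.StringIO(
    "This is the first line. This is all just text to be skipped.\n"
    "Yes, there was. And there it is again.\n"
    "EndOfHeader\n"
    "1,2,3,4,5\n"
    "8,9,10,11,12\n"
    "3, 6, 9, 12, 15\n"
)


def read_numbers_after_header(f, marker="EndOfHeader"):
    """Skip lines up to and including the marker, then parse the rest as CSV."""
    for line in f:
        if line.strip() == marker:
            break
    else:
        raise ValueError("marker line not found")
    # The handle now sits just past the marker, so csv.reader sees only the data.
    reader = csv.reader(f, skipinitialspace=True)
    # Assumes integer cells; swap int for float if the data has decimals.
    return [[int(cell) for cell in row] for row in reader if row]


rows = read_numbers_after_header(sample)
print(rows)
```

The csv module always yields strings, so the conversion to numbers still has to happen somewhere; here it is a single list comprehension over the rows.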

Lines were dropped when merging two text files using the write() method

I started with relevant_files, which was a list of paths of two CSV files. I then attempted to create a file with path output_filename, using the following block of code.
new_file = open(output_filename, 'w+')
for x in relevant_files:
    for line in open(x):
        new_file.write(line)
The code looks perfectly reasonable, but I totally randomly decided to check the lengths (in lines) before and after the merge. file_1 had length 6,740,108 and file_2 had length 4,938,459. Those sum to 11,678,567. However, the new file has length 11,678,550, which is 17 lines shorter than the combined length of the two source files. I then checked the CSV files by hand, and indeed it was exactly the final 17 lines of the 2nd text file (i.e., the 2nd entry in relevant_files) that had gotten dropped.
What went wrong? Is there a maximum file length or something?

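I'm not sure exactly what is wrong with your script, but it's good to use with statements when working with files in Python. They close the file for you, which it seems you haven't done here. Writes are buffered, so if new_file is never closed or flushed, the last chunk of data may not have reached the disk when you inspect the output, which would be consistent with losing the final lines.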
with open(output_file, 'w') as f:
    for path in relevant_files:
        with open(path, 'r') as src:  # each input is closed when its block ends
            for line in src:
                f.write(line)  # lines keep their own '\n', so nothing is added
This is what I would use to complete your task.
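To check that nothing gets dropped, here is a small self-contained sketch that merges two temporary files and compares line counts before and after. merge_files and count_lines are names made up for this example, the tiny generated files only stand in for the real CSVs, and shutil.copyfileobj is used in place of the line-by-line loop.

```python
import os
import shutil
import tempfile


def merge_files(paths, output_path):
    """Concatenate the files in `paths` into `output_path`, in order."""
    with open(output_path, "w") as out:
        for path in paths:
            with open(path) as src:
                shutil.copyfileobj(src, out)
    # Leaving the with block flushes and closes `out`, so nothing stays buffered.


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path) as f:
        return sum(1 for _ in f)


# Demo with two small temporary files standing in for the real CSVs.
tmp_dir = tempfile.mkdtemp()
relevant_files = []
for name, n_rows in (("file_1.csv", 3), ("file_2.csv", 2)):
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.writelines("{},{}\n".format(name, i) for i in range(n_rows))
    relevant_files.append(path)

output_filename = os.path.join(tmp_dir, "merged.csv")
merge_files(relevant_files, output_filename)

total_in = sum(count_lines(p) for p in relevant_files)
total_out = count_lines(output_filename)
print(total_in, total_out)

shutil.rmtree(tmp_dir)  # clean up the demo files
```

The important part is that the line count is taken only after the with block has closed the output file; counting while the file is still open can miss whatever is still sitting in the write buffer.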

IndexError when printing with readlines()

I keep encountering an index error when trying to print a line from a text file. I'm new to python and I'm still trying to learn so I'd appreciate if you can try to be patient with me; if there is something else needed from me, please let me know!
The traceback reads as
...
print(f2.readlines()[1]):
IndexError: list index out of range
When trying to print line 2 (...[1]), I am getting this out of range error.
Here's the current script.
with open("f2.txt", "r") as f2:
    print(f2.readlines()[1])
There are 3 lines with text in the file.
contents of f2.txt
peaqwenasd
lasnebsat
kikaswmors

It seems that f2.seek(0) was necessary here to solve the issue.
with open("f2.txt", "r") as f2:
    f2.seek(0)
    print(f2.readlines()[1])

You haven't given all the code needed to solve your problem, but your given symptoms point to multiple calls to readlines.
Read the documentation: readlines() reads the entire file and returns a list of its lines. As a consequence, the file pointer is at the end of the file. If you call readlines() again at this point, it returns an empty list, and indexing that list raises the IndexError.
You apparently have a readlines() call before the code you gave us. seek(0) resets the file pointer to the start of the file, and you're reading the entire file a second time.
There are many tutorials that show you canonical ways to iterate through the contents of a file. I strongly recommend that you use one of those. For instance:
with open("f2.txt", "r") as f2:
    for line in f2.readlines():
        # Here you can work with the lines in sequence
        print(line)
If you need to deal with the lines in non-sequential order, then
with open("f2.txt", "r") as f2:
    content = f2.readlines()  # readlines() already returns a list
# Now you can access content[2], content[1], etc.
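The behaviour described above is easy to reproduce. In this sketch, io.StringIO stands in for the opened f2.txt (an assumption made to keep it self-contained); a real file object opened with open() moves its position in the same way.

```python
import io

# io.StringIO stands in for the opened f2.txt so the sketch is self-contained.
f2 = io.StringIO("peaqwenasd\nlasnebsat\nkikaswmors\n")

first = f2.readlines()   # reads everything; the position is now at the end
second = f2.readlines()  # nothing left to read, so this is an empty list

f2.seek(0)               # move back to the start of the file
third = f2.readlines()   # the full contents are available again

print(len(first), len(second), len(third))
print(third[1].rstrip("\n"))
```

Indexing second[1] here would raise the same IndexError as in the question, which is why an earlier read of the file is the likely culprit.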

Replace string in specific line of nonstandard text file

Similar to the posting "Replace string in a specific line using python"; however, results were not forthcoming in my slightly different case.
I am working with Python 3 on Windows 7. I am attempting to batch edit some files in a directory. They are basically text files with a .LIC extension. I'm not sure if that is relevant to my issue here. I am able to read the file into Python without issue.
My aim is to replace a specific string on a specific line in this file.
import os
import re
groupname = 'Oldtext'
aliasname = 'Newtext'
with open('filename') as f:
    data = f.readlines()
    data[1] = re.sub(groupname,aliasname, data[1])
    f.writelines(data[1])
    print(data[1])
print('done')
When running the above code I get an UnsupportedOperation: not writable error, so I am having trouble writing the changes back to the file. Based on suggestions in other posts, I added the w option, i.e. open('filename', "w"). This causes all text in the file to be deleted.
Based on another suggestion, I tried the r+ option. This edits the file successfully; however, instead of replacing the correct line, the edited line is appended to the end of the file, leaving the original intact.

Writing a changed line into the middle of a text file is not going to work unless it's exactly the same length as the original - which is the case in your example, but you've got some obvious placeholder text there so I have no idea if the same is true of your actual application code. Here's an approach that doesn't make any such assumption:
with open('filename', 'r') as f:
    data = f.readlines()
data[1] = re.sub(groupname,aliasname, data[1])
with open('filename', 'w') as f:
    f.writelines(data)
EDIT: If you really wanted to write only the single line back into the file, you'd need to use f.tell() BEFORE reading the line, to remember its position within the file, and then f.seek() to go back to that position before writing.
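Wrapped up as a function, the read, modify, and rewrite approach might look like the sketch below. replace_in_line is a made-up helper name, and the temporary file and its contents are placeholders used only for the demo.

```python
import os
import re
import tempfile


def replace_in_line(path, line_index, pattern, replacement):
    """Rewrite `path` with `pattern` replaced on one line only (0-based index)."""
    with open(path, "r") as f:
        data = f.readlines()
    data[line_index] = re.sub(pattern, replacement, data[line_index])
    # Reopening in 'w' mode truncates the file, then every line is written back.
    with open(path, "w") as f:
        f.writelines(data)


# Demo on a temporary file; the contents are placeholder text.
fd, demo_path = tempfile.mkstemp(suffix=".LIC")
with os.fdopen(fd, "w") as f:
    f.write("header Oldtext\ngroup Oldtext\nfooter Oldtext\n")

replace_in_line(demo_path, 1, "Oldtext", "Newtext")

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

Note that re.sub treats its first argument as a regular expression, so if the text to replace may contain characters such as . or (, passing it through re.escape first is the safer choice.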

Writing to the end of specific line in python

I have a text file that contains key value pairs separated by a tab like this:
KEY\tVALUE
I have opened this file in append mode (a+) so I can both read and write. Now it may happen that a particular key has more than one value. In that case I want to go to that particular key and write the next value beside the original one, separated by some delimiter (e.g. a comma).
Here is what I wish to do:
import io
ft = io.open("test.txt",'a+')
ft.seek(0)
for line in ft:
    if (line.split('\t')[0] == "querykey"):
        ft.write(unicode("nextvalue"))  # Write the next value beside the original one
Now there are two problems with it:
I will iterate through the file to see on which line the key is present(Is there a faster way?)
I will write a string to the end of that line.
I would be grateful if I can get help with the second point.
The write function always writes at the end of the file. How should I write to the end of a specific line? I have searched but have not found a clear answer on how to do that.

You can read the whole file, make your edit, and write the edited content back to the file.
with open('test.txt') as f:
    lines = f.readlines()
with open('test.txt', 'w') as f:  # reopen the file for writing
    for line in lines:
        if line.split('\t')[0] == "querykey":
            # add the new value before the line's trailing newline
            line = line.rstrip('\n') + ',newkey\n'
        f.write(line)  # each line already ends with '\n'
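The same idea as a reusable function, as a sketch only. append_value and its sep parameter are names made up for this example, and the demo file contents are placeholders in the KEY\tVALUE layout from the question.

```python
import os
import tempfile


def append_value(path, key, value, sep=","):
    """Append `value` to every line whose first tab-separated field is `key`."""
    with open(path) as f:
        lines = f.read().splitlines()  # lines without their trailing newlines
    updated = [
        line + sep + value if line.split("\t")[0] == key else line
        for line in lines
    ]
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in updated)


# Demo on a temporary file with placeholder key/value pairs.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("otherkey\tvalue0\nquerykey\tvalue1\n")

append_value(demo_path, "querykey", "nextvalue")

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

This still scans every line, so it does not answer the "faster way" part of the question; it only shows how to get the new value onto the end of the right line.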
