I am trying to read a fixed-width file that has lines/records of different lengths. I need to pad the lines that are shorter than the standard length with spaces at the end.
Any help appreciated.
You can use str.format to pad a string to a specific length.
The documentation says that < left-aligns the value, padding it on the right, so to pad a string with spaces up to a given length you can do something like this:
>>> "{:<30}".format("foo")
'foo                           '
You could consider using the str.ljust method.
If line is a line read from your file:
line = line.ljust(50)
will pad the end of the line with spaces to produce a 50-character line. If line is longer than 50 characters, it is returned unchanged.
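For instance, a minimal sketch that pads every line of a file to an assumed standard length of 50 (the file name and the width are placeholders for whatever your format actually uses):

STANDARD_LENGTH = 50  # assumed record length

with open('records.txt') as f:
    # strip the newline, pad the record; longer lines are left untouched by ljust
    padded = [line.rstrip('\n').ljust(STANDARD_LENGTH) for line in f]

Each element of padded is then a fixed-width record you can slice by column.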
Related
After parsing, I've got a lot of URLs that have unfortunately ended up joined together on one line. It would take a long time to re-parse, so I'm asking whether there is a way to turn one long line of URLs into multiple lines - 1 URL per line?
What I have:
'https:// url1.com/bla1','https:// url1.com/bla2',..thousands of urls..,'https:// url999.com/blaN'
What I need:
'https:// url1.com/bla-1',
'https:// url1.com/bla-2',
etc
'https:// url999.com/bla-N'
I've already tried unchecking Line breaks in Python - Wrapping and Braces and checking Ensure right margin is not exceeded - it doesn't work.
So how can I fix it?
Yes.
First set Code Style -> Wrapping and Braces -> Method parameters / Method call arguments to wrap if long or chop down if long.
After that, simply run Reformat Code on the line (Command+Alt+L).
Let's try a simple method, if I understand your question correctly: read the file, replace the commas with newline characters, and write the result back to the same file.
urlsfile = open('test1.txt', 'r+')  # in case you are getting the data from a file
urls = urlsfile.readline()          # the single long line of URLs
urlsfile.close()

newlines = urls.replace(",", "\n")  # if the data is already in a variable, just call replace on that instead

newfile = open('test1.txt', 'w+')
newfile.write(newlines)
newfile.close()
I want to compare two text files in Python, and return the lines that are different. My attempt uses difflib, but I'm open to other suggestions. I need to get the lines that are different, as well as the lines that appear in one file but not the other. Order is somewhat important, but if a good solution exists that doesn't take order into consideration, I can let go of that.
The problem is that one file has lines with multiple trailing \t and \n characters, while the other doesn't; I don't want to consider that a difference. For other file pairs, the first file has only \n and the other file has \t characters at the end. The lines contain elements that are separated by tabs or spaces, so those are important; I just don't care about the trailing \t and \n characters.
My solution:
from difflib import Differ

with open(file_path) as actual:
    with open(test_file_path) as test:
        differ = Differ()
        for line in differ.compare(actual.readlines(), test.readlines()):
            if line.startswith('-'):
                log.error('EXPECTED: {}'.format(line[2:]))
            if line.startswith('+'):
                log.error('TEST FILE: {}'.format(line[2:]))
I expect the output to show EXPECTED and TEST FILE lines when there's a difference, and just EXPECTED or just TEST FILE when one contains a line the other doesn't. Right now, I'm seeing a lot of the following types of errors:
00:02:40: ERROR EXPECTED: Issuer Type OBal Net WAC OTerm WAM Age GrossCpn HighRemTerm Grp
00:02:40: ERROR TEST FILE: Issuer Type OBal Net WAC OTerm WAM Age GrossCpn HighRemTerm Grp
As you can see (if you highlight it), the first line contains a number of spaces after 'Grp' and the other line doesn't. I want to consider these two lines the same.
I've tried to explicitly specify the tabs and line breaks:
actual_file = actual.readlines()
expected_file = []
for line in actual_file:
    if line[-1] == '\n':
        expected_file.append(line.rstrip('\n').rstrip('\t') + '\n')
    else:
        expected_file.append(line.rstrip('\t'))
However, it (a) slows the process down quite a bit, and (b) is required for every file type in a different way, since some files have trailing tabs followed by line breaks, some have just line breaks, and some have nothing at all. If there's no better way, I can strip every line of every trailing tab and linebreak, but it seems like a lot of processing power (I have to run a lot of files) for something that seems fairly easy to resolve.
Take a look at string.rstrip() here: https://docs.python.org/2/library/string.html#string.rstrip
string.rstrip() (equivalently, the string method s.rstrip()) should do exactly what you need: it strips all trailing whitespace, including \t and \n, off the end of a string, while leaving \t characters in the middle of the line alone.
Check it out:
>>> import string
>>> s = "This \t is \t a \t line \t\t\t\n\n\n"
>>> print(s)
This is a line
>>>
>>> s = string.rstrip(s)
>>> s
'This \t is \t a \t line'
>>> print(s)
This is a line
>>>
Hope this helps!
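To tie this back to the difflib loop from the question, one possible sketch (reusing file_path, test_file_path, and log from the question's code) strips trailing whitespace from each line before passing the lists to compare():

from difflib import Differ

with open(file_path) as actual, open(test_file_path) as test:
    # normalise trailing tabs/newlines so they don't show up as differences
    actual_lines = [line.rstrip() + '\n' for line in actual]
    test_lines = [line.rstrip() + '\n' for line in test]

differ = Differ()
for line in differ.compare(actual_lines, test_lines):
    if line.startswith('-'):
        log.error('EXPECTED: {}'.format(line[2:]))
    elif line.startswith('+'):
        log.error('TEST FILE: {}'.format(line[2:]))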
I have to read data from a text file from the command line. It is not too difficult to read in each line, but I need a way to separate each part of the line.
The file contains the following in order for several hundred lines:
String (Sometimes more than 1 word)
Integer
String (Sometimes more than 1 word)
Integer
So for example the input could have:
Hello 5 Sample String 10
The current implementation I have for reading in each line is as follows... how can I modify it to separate it into what I want? I have tried splitting the line, but I always end up getting only one character of the first string this way with no integers or any part of the second string.
with open(sys.argv[1], "r") as f:
    for line in f:
        print(line)
The desired output would be:
Hello
5
Sample String
10
and so on for each line in the file. There could be thousands of lines in the file. I just need to separate each part so I can work with them separately.
The program can't magically split lines the way you want. You will need to read in one line at a time and parse it yourself based on the format.
Since there are two integers and an indeterminate number of (what I assume are) space-delimited words, you may be able to use a regular expression to find the integers and then use them as delimiters to split up the line, as sketched below.
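A minimal sketch of that idea, assuming the integers are always surrounded by whitespace (which may not hold for your real data):

import re
import sys

with open(sys.argv[1], "r") as f:
    for line in f:
        # split on runs of digits, keeping the digits as fields of their own
        parts = [p for p in re.split(r'\s*(\d+)\s*', line.strip()) if p]
        for part in parts:
            print(part)

For the example line "Hello 5 Sample String 10" this prints Hello, 5, Sample String, and 10 on separate lines.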
I have a long text file where each line looks something like /MM0001 (Table(12,)) or /MM0015 (Table(11,)). I want to keep only the four-digit number next to /MM. If it weren't for the "Table(12,)" part I could just strip all the non-numeric characters, but I don't know how to extract the four-digit numbers only. Any advice on getting started?
If it's exactly that format, you could just print out line[3:7]
You could read the text line by line and then take the 4th to 7th character of each line:
ln[3:7]
import re

R = re.compile(r'/MM(\d+)')
for line in file:
    L = R.match(line)
    if L:
        print L.group(1)
or, more succinctly...
lines = [R.match(line).group(1) for line in file]  # works if the lines are guaranteed to start with /MM
This should give you only the integers following a /MM and should work no matter how long the strings of integers are. If they're guaranteed to be a certain length, then you're better off with one of the other examples (which don't use regex).
If each line starts with /MM then just go through the file and print out line[3:7], e.g.

for line in file:
    print line[3:7]
I have a peculiar problem. I need to read from a txt file, using Python, only those substrings that are present at predefined ranges of offsets. Let's say 5-8 and 12-16.
For example, if a line in the file is something like:
abcdefghi akdhflskdhfhglskdjfhghsldk
then I would like to read the two substrings "efgh" and "kdhfl", because in "efgh" the offset of character "e" is 5 and that of "h" is 8, and similarly for "kdhfl".
Please note that the whitespace also adds to the offset. In fact, the whitespace in my file does not occur consistently on every line and cannot be relied upon to extract the words of interest. That is why I have to bank on the offsets.
I hope I've been able to make the question clear.
Awaiting answers!
Edit -
Yes, the whitespace amount in each line can change and counts toward the offsets as well. For example, consider these two lines -
abcz d
a bc d
In both cases, I view the offset of the final character "d" as the same. As I said, the white spaces in the file are not consistent and I cannot rely on them. I need to pick up the characters based on their offsets. Does your answer still hold?
Assuming it's a file:

for line in open("file"):
    print line[4:8], line[11:16]
To extract pieces from offsets simply read each line into a string and then access a substring with a slice ([from:to]).
It's unclear what you're saying about the inconsistent whitespace. If whitespace adds to the offset, it must be consistent to be meaningful. If the whitespace amount can change but actually accounts for the offsets, you can't reliably extract your data.
In your added example, as long as d's offset stays the same, you can extract it with slicing.
>>> s = 'a bc d'
>>> s[5:6]
'd'
>>> s = 'abcz d'
>>> s[5:6]
'd'
What's to stop you from using a regular expression? Besides the whitespace, do the offsets vary?
/.{4}(.{4}).{3}(.{5})/
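A rough sketch of how that could look in Python (the quantifiers match the 5-8 and 12-16 offsets from the question, and "file" is just a placeholder name):

import re

pattern = re.compile(r'.{4}(.{4}).{3}(.{5})')

with open("file") as f:
    for line in f:
        m = pattern.match(line)
        if m:
            # for the example line this prints 'efgh' and 'kdhfl'
            print(m.group(1), m.group(2))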