How can I remove the last two characters from my file?


I'm taking out all of the coordinates from a kml file. This works, but my issue is that at the end of my file I end up with "}, }" instead of a "}}". I realize I can just manually edit the end of the file after I make it, but I'd rather have that be automatically done in the code. The commented-out section contains the code that I found in another answer, but it doesn't do anything for me.
import re
import os
KML = open('NYC_Tri-State_Area.kml','r')
NYC_Coords = open('NYC_Coords.txt', 'w')
coords = re.findall(r'((?<=<coordinates>).*(?=<\/coordinates>))', KML.read())
NYC_Coords.write("{")
for coord in coords:
    NYC_Coords.write("{" + str(coord) + "}, ")
...
with open('NYC_Coords.txt', 'rb+') as filehandle:
    filehandle.seek(-2, os.SEEK_END)
    filehandle.truncate()
...
NYC_Coords.write("}")
KML.close()
NYC_Coords.close()

There are a number of suggestions for fixing your problem. First, it's probably a bad idea to use a regex to parse XML-derived documents; there are many dedicated modules for parsing KML, such as pyKML.
Second, you can eliminate the need to truncate completely by correctly generating your string. In this case, by replacing:
for coord in coords:
    NYC_Coords.write("{" + str(coord) + "}, ")
with the very simple one-liner:
NYC_Coords.write(', '.join('{{{}}}'.format(coord) for coord in coords))
You will now no longer have an extra trailing ', ' at the end of your document.

for coord in coords:
    NYC_Coords.write("{" + str(coord) + "}, ")
Here, you write ", " at the end of every coord. But what you really want to do is write ", " between each coord. join can be used to interleave strings in this way.
NYC_Coords.write(", ".join("{" + str(coord) + "}" for coord in coords))
Now you will have no trailing comma at the end of your final coord.
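To see the join approach end to end, here is a minimal sketch that builds the whole output string in memory. The sample KML text and the helper name format_coords are made up for illustration; the regex is the one from the question, with the redundant outer group and escape dropped.

```python
import re

def format_coords(coords):
    """Wrap each coordinate in braces, join with ', ', and add the outer braces."""
    return "{" + ", ".join("{" + str(c) + "}" for c in coords) + "}"

# Hypothetical stand-in for the contents of the KML file.
kml_text = "<coordinates>1,2,0</coordinates>\n<coordinates>3,4,0</coordinates>"

# '.' does not match newlines, so each line yields one coordinate string.
coords = re.findall(r"(?<=<coordinates>).*(?=</coordinates>)", kml_text)
result = format_coords(coords)
print(result)  # {{1,2,0}, {3,4,0}}
```

Because the separator is only ever placed between items, there is nothing to truncate afterwards; the result can be written to the output file with a single write() call.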

As coords is a list of strings, you could do:
NYC_Coords.write("{{{{{0}}}}}".format("}, {".join(coords)))
Unfortunately, your output uses the same syntax as str.format, so you need to escape a lot of curly braces in the template... Demo:
>>> coords = ["foo", "bar", "baz"]
>>> "{{{{{0}}}}}".format("}, {".join(coords))
'{{foo}, {bar}, {baz}}'
You could avoid the escapes with C-style string formatting:
>>> "{{%s}}" % "}, {".join(coords)
'{{foo}, {bar}, {baz}}'

Related

Does Python have a string function to find a section of a string

I'm trying to filter some log files that are in the format of a table/dataset, but .endswith() and .startswith() are not meeting my requirements. I'm using an anonymous function but need to adapt my Python code to check if a string contains .jpg
logfilejpg = sc.textFile("/loudacre/logs/*.log").filter(lambda line: line.endswith('.jpg'))

Use in:
'.jpg' in 'something.jpg foo'
Out: True
You can also put it in your lambda expression:
lambda line: '.jpg' in line
Example:
list(filter(lambda line: '.jpg' in line, ["foo", "foo.jpg.bar", "bar.jpg"]))
Out: ['foo.jpg.bar', 'bar.jpg']

To get the index of where the ".jpg" starts at:
hello = "world.jpg"
print(hello.find(".jpg"))

You can split the initial string by " " (space), then by ".", and take the second value in the resulting array. Of course, it depends on what your initial string looks like. The basic idea is that you can isolate the "jpg" part and check it for equality.
To verify that the file is actually a JPG, you can try to open it. If that fails, the file is either in another format or corrupt; see also the exception you get.

Using str.find() and len(), you could find the substring like so:
a_string = 'there is a .jpg here.'
start = a_string.find('.jpg') # The lowest index in a_string where '.jpg' is found
end = start + len('.jpg')
print(a_string[start:end])
# .jpg
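One thing to watch with this approach: str.find() returns -1 when the substring is absent, and slicing with that value silently gives an empty or misleading result. A small sketch that guards against it (the function name extract_substring is just for illustration):

```python
def extract_substring(text, target):
    """Return the slice of text equal to target, or None if target is absent."""
    start = text.find(target)  # -1 when target does not occur in text
    if start == -1:
        return None
    return text[start:start + len(target)]

print(extract_substring("there is a .jpg here.", ".jpg"))  # .jpg
print(extract_substring("no image here", ".jpg"))          # None
```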

Python formatting: How to insert blank spaces in between array data elements of data type float

I have a question regarding formatting. I am trying to extract relevant data and insert this data into a Fortran file. Thankfully, I am using Python to accomplish this task. It just so happens that the Fortran file is sensitive to the number of spaces between text. So, this brings me to my question. My array data looks like:
[[ -1.80251269 12.14048223 15.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
[[-14.80251269 1.14048223 1.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
[[ -0.80251269 0.14048223 0.47522331]
[ -2.63865822 13.1656285 15.97462801]
[ -1.76966256 11.35311123 16.13958474]
[ -0.76320052 12.45171386 15.34209158]
[ -2.12634889 11.84315415 14.48020468]]
These elements are floats, not strings. For example, I want the first row (and every row thereafter) of the data to look like:
     -1.80251269     12.14048223     15.47522331
How would I accomplish this? To be specific, there are 5 white spaces that separate the left margin from the 1st number, -1.80251269, and 5 white spaces that separate each of the three numbers. Notice also that I need the array brackets gone, but I suspect I can do this with a trim function. Sorry for my lack of knowledge guys; I do not even know how to begin this problem as my knowledge of Python syntax is limited. Any help or tips would be appreciated. Thanks!
EDIT: this is the code I am using to generate the array:
fo = np.genfromtxt("multlines.inp")
data=scipy.delete(fo, 0, 1)
txt = np.hsplit(data,3)
all_data = np.vsplit(data, 4)
i=0
num_molecules = int(raw_input("Enter the number of molecules: "))
print "List of unaltered coordinates:"
while i < (num_molecules):
    print all_data[i]
    i += 1

If you are using NumPy, you can use np.savetxt:
np.savetxt('a.txt', a.reshape(15,3), '%16.8f')
To get
     -1.80251269      12.14048223      15.47522331
     -2.63865822      13.16562850      15.97462801
     -1.76966256      11.35311123      16.13958474
...
(You need to reshape your array into 2-dimensions to do what I think you want).
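If NumPy is not available for the writing step, the same fixed-width idea can be sketched with plain string formatting. This is only an illustration: the function name format_row and the defaults (width 16, 8 decimals) are assumptions chosen so that an 11-character number such as -1.80251269 ends up preceded by exactly five spaces.

```python
def format_row(row, width=16, precision=8):
    """Right-align each float in a fixed-width column and concatenate the columns."""
    return "".join("{:{w}.{p}f}".format(value, w=width, p=precision) for value in row)

rows = [
    [-1.80251269, 12.14048223, 15.47522331],
    [-2.63865822, 13.1656285, 15.97462801],
]
for row in rows:
    print(format_row(row))
```

Note that the columns have a fixed total width, so a longer number such as -14.80251269 is preceded by four spaces rather than five. If the Fortran side reads fixed-width fields, that is usually what you want, but check it against the format the program actually expects.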

If you have your data formatted as a list, then I suspect that @kamik423's answer will help you. If it is formatted as a string, you may wish to try something like the following.
def properly_format(line):
    nums = line.strip(' []\t').split()
    spaces = '     '  # five spaces, as the question asks
    return spaces + nums[0] + spaces + nums[1] + spaces + nums[2]
lines = my_array_string.splitlines() #if your data is a multiline string
for line in lines:
    formatted_line = properly_format(line)
    # do something with formatted_line
Edit: forgot to split the string.

If you don't care about the length of each block, you can just do
for i in whateverYouArrayIsCalled:
    print str(i[0]) + " " + str(i[1]) + " " + str(i[2])
If, however, you want all the elements to line up, try
for i in whateverYouArrayIsCalled:
    print (str(i[0]) + " " * 20)[:20] + (str(i[1]) + " " * 20)[:20] + str(i[2])
where 20 is the length of each block
(for 2.7)

I will assume that the data array is saved in a data.txt file and you want to save the result into fortran.txt, then:
fortran_file = open('fortran.txt', 'w')  # Open fortran.txt for writing
with open('data.txt', 'r') as data_file:  # Open data.txt for reading
    while True:
        line = data_file.readline()
        if not line: break  # EOF
        result = line.strip(' []\n').split()  # drop brackets, spaces and the newline
        result = "     " + "     ".join(result)  # five spaces before and between values
        fortran_file.write(result + "\n")
fortran_file.close()

try this:
import numpy
numpy.set_printoptions(sign=' ')

how to manipulate SREC file

I have an S19 file looking something like below:
S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8
I want to separate the first two characters and also the next two characters, and so on... I want it to look like below (last two characters are also to be separated for each line):
S0, 03, 0000, FC
S3, 0D, 0003C000, 0F00000000000000, 20
S3, FD, 00000000, 782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, ED, 000000F8, 3D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, 15, 00000400, FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF, 7D
S3, FD, 00000410, 10B5DFF828000468012147F22C10C4F20300016047F22010C4F20300, 00
S7, 05, 00008EB4, B8
How can I do this in Python?
I have something like this:
#!/usr/bin/python
import string,os,sys,re,fileinput
print "hi"
inputfile = "k60.S19"
outputfile = "k60_out.S19"
# open the source file and read it
fh = file(inputfile, 'r')
subject = fh.read()
fh.close()
# create the pattern object. Note the "r". In case you're unfamiliar with Python
# this is to set the string as raw so we don't have to escape our escape characters
pattern2 = re.compile(r'S3')
pattern3 = re.compile(r'S7')
pattern1 = re.compile(r'S0')
# do the replace
result1 = pattern1.sub("S0, ", subject)
result2 = pattern2.sub("S3, ", subject)
result3 = pattern3.sub("S7, ", subject)
# write the file
f_out = file(outputfile, 'w')
f_out.write(result1)
f_out.write(result2)
f_out.write(result3)
f_out.close()
#EoF
but it does not work the way I want. Can someone help me come up with a proper regular expression for this?

Try the bincopy package; it may be what you need.
bincopy - Interpret strings as packed binary data
Mangling of various file formats that conveys binary information (Motorola S-Record, Intel HEX and binary files).
import bincopy
f = bincopy.BinFile()
f.add_srec_file("path/to/your/s19/file.s19")
f.as_binary() # returns the S19 contents as binary data
Or you can simply use open() and slice each line:
with open("path/to/your/s19/file.s19") as s19:
    for line in s19:
        line = line.strip()  # drop the trailing newline so line[-2:] is the checksum
        type = line[0:2]
        count = line[2:4]
        address = line[4:12]  # 8 hex digits: right for S3/S7 records; S0 has only 4
        data = line[12:-2]
        crc = line[-2:]
        print type + ", "+ count + ", " + address + ", " + data + ", " + crc
Hope it helps.
Motorola S-record file format
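The address field of an S-record is not always 8 hex digits wide: it is 2 bytes for S0, S1, S5 and S9, 3 bytes for S2, S6 and S8, and 4 bytes for S3 and S7. A hedged sketch of the slicing approach that takes this into account (the names ADDRESS_HEX_DIGITS and split_srec_line are made up for this example, and no checksum validation is attempted):

```python
# Number of hex digits in the address field for each record type.
ADDRESS_HEX_DIGITS = {
    "S0": 4, "S1": 4, "S2": 6, "S3": 8,
    "S5": 4, "S6": 6, "S7": 8, "S8": 6, "S9": 4,
}

def split_srec_line(line):
    """Split one S-record into type, count, address, data and checksum fields."""
    line = line.strip()
    rec_type = line[0:2]
    addr_end = 4 + ADDRESS_HEX_DIGITS[rec_type]
    fields = [rec_type, line[2:4], line[4:addr_end], line[addr_end:-2], line[-2:]]
    # Skip the data field when it is empty (e.g. S7 termination records).
    return ", ".join(field for field in fields if field)

for record in ["S0030000FC", "S30D0003C0000F0000000000000020", "S70500008EB4B8"]:
    print(split_srec_line(record))
```

For the three sample records this gives the layout asked for in the question, including the short S0 header line.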

You can do it using a callback function as replacement with re.sub:
#!/usr/bin/python
import re
data = r'''S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8'''
pattern = re.compile(r'^(..)(..)((?:.{4}){1,2})(.*)(?=..)', re.M)
def repl(m):
    repstr = ''
    for g in m.groups():
        if (g):
            repstr += g + ', '
    return repstr
print re.sub(pattern, repl, data)
However, as Mark Setchell points out, there is probably a nice way to do it with slicing.

I know you are thinking Python and regexes, but this was made for awk and the following will maybe help you work out the way to do it using slicing:
awk '{r=length($0);print substr($0,1,2),substr($0,3,2),substr($0,5,8),substr($0,13,r-14),substr($0,r-1)}' OFS=, k60.s19
That says "get the length of the line in variable r, then print the first two characters, the next two characters, the next 8 characters and so on... and use a comma as the field separator".
EDITED
Here are a few more hints to get you started...
if you want to avoid printing line 1, you can do
awk 'FNR==1{next} ...rest of awk script above ... '
If you want to only process lines longer than 40 characters, you can do
awk 'length($0)>40 {print}' yourfile
If you only want to process lines where the second field is "xx", you can do
awk '$2 ~ "xx" {print}' yourfile

Python RegEx Woes

I'm not sure why this isn't working:
import re
import csv
def check(q, s):
    match = re.search(r'%s' % q, s, re.IGNORECASE)
    if match:
        return True
    else:
        return False
tstr = []
# test strings
tstr.append('testthisisnotworking')
tstr.append('This is a TEsT')
tstr.append('This is a TEST mon!')
f = open('testwords.txt', 'rU')
reader = csv.reader(f)
for type, term, exp in reader:
    for i in range(2):
        if check(exp, tstr[i]):
            print exp + " hit on " + tstr[i]
        else:
            print exp + " did NOT hit on " + tstr[i]
f.close()
testwords.txt contains this line:
blah, blah, test
So essentially 'test' is the RegEx pattern. Nothing complex, just a simple word. Here's the output:
test did NOT hit on testthisisnotworking
test hit on This is a TEsT
test hit on This is a TEST mon!
Why does it NOT hit on the first string? I also tried \s*test\s* with no luck. Help?

The csv module by default keeps the blank spaces around words in the input (this can be changed by using a different "dialect", or by passing skipinitialspace=True to csv.reader). So exp contains " test" with a leading space.
A quick way to fix this would be to add:
exp = exp.strip()
after you read from the CSV file.
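The effect is easy to reproduce without the file. The sketch below feeds the same line through csv.reader from an in-memory buffer (io.StringIO stands in for testwords.txt) and compares the default behaviour with skipinitialspace=True, which is an alternative to calling strip():

```python
import csv
import io
import re

line = "blah, blah, test\n"

# Default dialect: the space after each comma stays in the field.
default_row = next(csv.reader(io.StringIO(line)))
print(default_row)  # ['blah', ' blah', ' test']

# skipinitialspace=True drops whitespace that directly follows a delimiter.
clean_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(clean_row)  # ['blah', 'blah', 'test']

# ' test' needs a literal space before 'test', so the first string cannot match.
print(bool(re.search(default_row[2], "testthisisnotworking", re.IGNORECASE)))  # False
print(bool(re.search(clean_row[2], "testthisisnotworking", re.IGNORECASE)))    # True
```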

Adding a print repr(exp) to the top of the first for loop shows that exp is ' test', note the leading space.
This isn't that surprising, since csv.reader() splits on commas and keeps the whitespace that follows them. Try changing your code to the following:
for type, term, exp in reader:
    exp = exp.strip()
    for s in tstr:
        if check(exp, s):
            print exp + " hit on " + s
        else:
            print exp + " did NOT hit on " + s
Note that in addition to the strip() call, which removes the leading and trailing whitespace, I changed your second for loop to loop directly over the strings in tstr instead of over a range. There was actually a bug in your current code: tstr contains three values, but you only checked the first two, because for i in range(2) only gives you i=0 and i=1.

Python Textwrap - forcing 'hard' breaks

I am trying to use textwrap to format an import file that is quite particular in how it is formatted. Basically, it is as follows (line length shortened for simplicity):
abcdef <- Ok line
abcdef
 ghijk <- Note leading space to indicate wrapped line
 lm
Now, I have got code to work as follows:
wrapper = TextWrapper(width=80, subsequent_indent=' ', break_long_words=True, break_on_hyphens=False)
for l in lines:
    wrapline=wrapper.wrap(l)
This works nearly perfectly, however, the text wrapping code doesn't do a hard break at the 80 character mark, it tries to be smart and break on a space (at approx 20 chars in).
I have got round this by replacing all spaces in the string list with a unique character (#), wrapping them and then removing the character, but surely there must be a cleaner way?
N.B Any possible answers need to work on Python 2.4 - sorry!

A generator-based version might be a better solution for you, since it wouldn't need to load the entire string in memory at once:
def hard_wrap(input, width, indent=' '):
    for line in input:
        indent_width = width - len(indent)
        yield line[:width]
        line = line[width:]
        while line:
            yield '\n' + indent + line[:indent_width]
            line = line[indent_width:]
Use it like this:
from StringIO import StringIO # Makes strings look like files
s = """abcdefg
abcdefghijklmnopqrstuvwxyz"""
for line in hard_wrap(StringIO(s), 12):
    print line,
Which prints:
abcdefg
abcdefghijkl
 mnopqrstuvw
 xyz

It sounds like you are disabling most of the functionality of TextWrapper, and then trying to add a little of your own. I think you'd be better off writing your own function or class. If I understand you right, you're simply looking for lines longer than 80 chars, and breaking them at the 80-char mark, and indenting the remainder by one space.
For example, this:
s = """\
This line is fine.
This line is very long and should wrap, It'll end up on a few lines.
A short line.
"""
def hard_wrap(s, n, indent):
    wrapped = ""
    n_next = n - len(indent)
    for l in s.split('\n'):
        first, rest = l[:n], l[n:]
        wrapped += first + "\n"
        while rest:
            next, rest = rest[:n_next], rest[n_next:]
            wrapped += indent + next + "\n"
    return wrapped
print hard_wrap(s, 20, " ")
produces:
This line is fine.
This line is very lo
 ng and should wrap,
  It'll end up on a
 few lines.
A short line.
