Python - go to two lines above match

In a text file like this:
First Name last name #
secone name
Address Line 1
Address Line 2
Work Phone:
Home Phone:
Status:
First Name last name #
....same as above...
I need to match the string 'Work Phone:', then go two lines up and insert the character '|' at the beginning of that line. So the pseudocode would be:
if "Work Phone:" in line:
go up two lines:
write | + line
write rest of the lines.
The file is about 10 MB and there are about 1000 paragraphs like this.
Then I need to write it to another file. So the desired result would be:
First Name last name #
secone name
|Address Line 1
Address Line 2
Work Phone:
Home Phone:
Status:
thanks for any help.

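This solution doesn't read the whole file into memory:
p = None  # the line two back
q = None  # the previous line
for line in open("file"):
    line = line.rstrip()
    # p is None only while fewer than two lines have been read
    if "Work Phone" in line and p is not None:
        p = "|" + p
    if p is not None:
        print(p)
    p, q = q, line
# flush the two lines that are still held back
if p is not None:
    print(p)
if q is not None:
    print(q)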
output
$ python test.py
First Name last name #
secone name
|Address Line 1
Address Line 2
Work Phone:
Home Phone:
Status:
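A sketch of the same look-behind idea as a reusable generator, which can also write to a second file as the question asks. The names `mark_lines_above`, `offset`, and `mark`, and the file names in the trailing comment, are assumptions made for illustration and do not come from the original post.

```python
from collections import deque


def mark_lines_above(lines, needle="Work Phone:", offset=2, mark="|"):
    """Yield lines, prefixing `mark` to the line `offset` lines above a match.

    At most `offset` lines are held in memory at any time.
    """
    held = deque()  # the most recent `offset` lines, not yet emitted
    for line in lines:
        if needle in line and len(held) == offset:
            held[0] = mark + held[0]  # oldest held line is `offset` lines up
        held.append(line)
        if len(held) > offset:
            yield held.popleft()
    # Flush whatever is still held once the input is exhausted.
    while held:
        yield held.popleft()


sample = [
    "First Name last name #\n",
    "secone name\n",
    "Address Line 1\n",
    "Address Line 2\n",
    "Work Phone:\n",
    "Home Phone:\n",
    "Status:\n",
]
result = list(mark_lines_above(sample))
print("".join(result), end="")

# With real files (hypothetical names):
# with open("input.txt") as src, open("output.txt", "w") as dst:
#     dst.writelines(mark_lines_above(src))
```

Because the generator only ever holds `offset` lines, it should cope with files far larger than 10 MB.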

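You can use this regex (the two preceding lines sit inside one capturing group, because a repeated group like (.*\n){2} only keeps its last repetition)
((?:.*\n){2})(Work Phone:)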
and replace the matches with
|\1\2
You don't even need Python, you can do such a thing in any modern text editor, like Vim.
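For completeness, a sketch of a regex substitution like this in Python with `re.sub`. It reads the whole text into memory, which should be acceptable for a file of about 10 MB. The pattern keeps both preceding lines in a single capturing group, since a repeated group would retain only its last repetition; the sample text is the paragraph from the question.

```python
import re

# Two full lines (kept together in group 1) followed by the marker line.
# re.MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r"^((?:.*\n){2})(Work Phone:)", re.MULTILINE)

text = (
    "First Name last name #\n"
    "secone name\n"
    "Address Line 1\n"
    "Address Line 2\n"
    "Work Phone:\n"
    "Home Phone:\n"
    "Status:\n"
)

# Put "|" in front of the captured two-line block, then restore the marker.
marked = pattern.sub(r"|\1\2", text)
print(marked, end="")
```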

Something like this?
lines = text.splitlines()
for i, line in enumerate(lines):
    # i >= 2 stops lines[i-2] from wrapping around to the end of the list
    if 'Work Phone:' in line and i >= 2:
        lines[i-2] = '|' + lines[i-2]

Related

How to add an empty line after a certain string within a text file?

I need to find a way to add a new line after a certain string within a text file.
discount_percent :20
discounted :true
final_price :1199
id :893850
name : THELONGING
original_price :1499
discount_percent :40
discounted :true
final_price :119
id :1476450
name : CyberHentai
original_price :199
discount_percent :30
discounted :true
final_price :139
id :1478030
name : MamboWave
original_price :199
discount_percent :15
discounted :true
final_price :84
id :1506230
name : BigfootForest
original_price :99
discount_percent :40
discounted :true
final_price :59
id :1502600
name : AlienX
original_price :99
I have a .txt file like the one above, and I need a way to add an empty line after every line that contains 'original_price' and the price that follows it.
There is probably an easy solution to this, but I can't seem to figure it out, and I have practically zero knowledge of regex.
I have tried the following:
def fixSpacing():
    file1 = open('data.txt', 'w')
    file2 = open('tempdata.txt', 'r+')
    for line in file2.readlines():
        if line.startswith('original_price :'):
            pass  # this is where I got stuck
But I couldn't think of a way to add a new line after the numbers for price.
To me this looks like a task for the built-in fileinput module, since it can work in inplace mode. Consider the following example. Let the content of file.txt be:
title: 1
price: 100
desc: one
title: 2
price: 200
desc: two
then
import fileinput
for line in fileinput.input("file.txt", inplace=True):
    end = "\n" if line.startswith("price") else ""
    print(line, end=end)
will change said file content to
title: 1
price: 100

desc: one
title: 2
price: 200

desc: two
Explanation: in inplace mode, standard output is redirected to the input file (if a file with the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place, so we can just use print. Because each line read from the file keeps its trailing newline, using "\n" as print's end adds an empty line after it, while using an empty string keeps the line unchanged. Keep in mind that after running this you will no longer have the original file, only the altered one.
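If keeping the original matters, `fileinput.input` also accepts a `backup` argument that preserves a copy under the given extension. A minimal sketch, run against a throwaway file in a temporary directory so that nothing real is overwritten; the name `file.txt` and the `.bak` extension are only examples.

```python
import fileinput
import os
import tempfile

# Work in a temporary directory so the demo cannot clobber a real file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "file.txt")
original = "title: 1\nprice: 100\ndesc: one\ntitle: 2\nprice: 200\ndesc: two\n"
with open(path, "w") as f:
    f.write(original)

# inplace=True redirects print() into the file; backup=".bak" keeps the original.
with fileinput.input(path, inplace=True, backup=".bak") as lines:
    for line in lines:
        # Each line still ends with "\n"; an extra "\n" yields an empty line.
        end = "\n" if line.startswith("price") else ""
        print(line, end=end)

with open(path) as f:
    updated = f.read()
with open(path + ".bak") as f:
    kept = f.read()
```

After this runs, `updated` holds the rewritten content and `kept` holds the untouched original read back from `file.txt.bak`.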
Thanks, I tinkered around with it for about 10 minutes and got it to work with the following:
def fixSpacing():
    file1 = open('data.txt', 'w')
    file2 = open('tempdata.txt', 'r+')
    for line in file2.readlines():
        if 'original_price' in line:
            file1.write(line + '\n')
        else:
            file1.write(line)
    file1.close()
    file2.close()
Thanks for the Help.

Python or Bash - How do you add words listed in a file in the middle of two sentences and put the output into another file?

I don't know how to take the first word of each line in file2 ('set'), add a space, insert a name from file1, add another space, and then append the last word of that line ('username' or 'admin'), repeating this for every name listed in file1.
file1 = [/home/smith/file1.txt]
file2 = [/home/smith/file2.txt]
file3 = file1 + file2
Example:
[file1 - Names]
smith
jerry
summer
aaron
[file2 - Sentences]
set username
set admin
[file3 - Output]
set smith username
set smith admin
set jerry username
set jerry admin
set summer username
set summer admin
set aaron username
set aaron admin
Can you be more specific about your problem? And have you already tried something? If that is the case, please share it.
The way I see it, you can open file2, read every line, and split it into its two words on the space (storing the pairs in a list, for example). Then loop over every line in file1. For every name in file1 and every pair from file2: take the first word of the pair, add a space, add the name, then add another space and the second word of the pair.
You now have a new string that you can append to a new file, for example. You should probably append that string to the new file in the same loop where you created it.
But then again, I'm not sure whether this answers your question.
Try this one in Bash, it answers your question
#!/bin/bash
file1=".../f1.txt"
file2=".../f2.txt"
file3=".../f3.txt"
while read -r p1; do
    while read -r p2; do
        word1=$(echo "$p2" | cut -f1 -d' ')
        word2=$(echo "$p2" | cut -f2 -d' ')
        echo "$word1 $p1 $word2" >> "$file3"
    done < "$file2"
done < "$file1"
Something like this, perhaps..
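# Read both files, dropping trailing newlines and blank lines
with open("/home/smith/file1.txt") as f:
    names = [line.strip() for line in f if line.strip()]
with open("/home/smith/file2.txt") as f:
    commands = [line.split() for line in f if line.strip()]
res = []
for name in names:
    for command in commands:
        # command is e.g. ["set", "username"]; the name goes in between
        res.append(" ".join([command[0], name, command[1]]))
with open("/home/smith/file3.txt", "w") as f:
    f.write("\n".join(res) + "\n")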
I'm sure this is not the prettiest way, but should work. But why do you want to do something like this...?
Yet another solution using utilities only:
join -1 2 -2 3 file1 file2 | awk '{printf "%s %s %s\n", $2, $1, $3}' > file3

How to count & print specific strings from a .txt file in python?

I'm having some trouble with the output I am receiving on this problem. Basically, I have a text file (https://www.py4e.com/code3/mbox.txt) and I am attempting to first have python print how many email addresses are found in it and then print each of those addresses on subsequent lines. A sample of my output is looking like this:
Received: (from apache@localhost)
There were 22003 email addresses in mbox.txt
for source@collab.sakaiproject.org; Thu, 18 Oct 2007 11:31:49 -0400
There were 22004 email addresses in mbox.txt
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to zach.thomas@txstate.edu using -f
There were 22005 email addresses in mbox.txt
What am I doing wrong here? Here's my code:
fhand = open('mbox.txt')
count = 0
for line in fhand:
    line = line.rstrip()
    if '@' in line:
        count = count + 1
        print('There were', count, 'email addresses in mbox.txt')
    if '@' in line:
        print(line)
The following modifies your code to use a regular expression to find emails in text lines.
import re
# Pattern for email
# (see https://www.geeksforgeeks.org/extracting-email-addresses-using-regular-expressions-python/)
pattern = re.compile(r'\S+@\S+')
with open('mbox.txt') as fhand:
    emails = []
    for line in fhand:
        # Detect all emails in line using regex pattern
        found_emails = pattern.findall(line)
        if found_emails:
            emails.extend(found_emails)
print('There were', len(emails), 'email addresses in mbox.txt')
if emails:
    print(*emails, sep="\n")
Output
There were 44018 email addresses in mbox.txt
stephen.marquard@uct.ac.za
<postmaster@collab.sakaiproject.org>
<200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
<source@collab.sakaiproject.org>;
<source@collab.sakaiproject.org>;
<source@collab.sakaiproject.org>;
apache@localhost)
source@collab.sakaiproject.org;
stephen.marquard@uct.ac.za
source@collab.sakaiproject.org
....
....
...etc...
Can you make it clearer what your expected output is compared to your actual output?
You have two if '@' in line statements that should be combined; there's no reason to ask the same question twice.
You count the number of lines that contain an @ symbol and then, for each such line, print the current count.
If you want to print the count only once, put the print outside (after) your for loop.
If you want to print the email addresses and not the whole lines that contain them, then you'll need to do some more string processing to extract the email from the line.
Don't forget to close your file when you've finished with it.
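Putting that advice together, a minimal sketch: collect the addresses while reading, print the count once after the loop, and let a with block close the file. Addresses in mbox.txt contain an @ sign, so a deliberately rough `\S+@\S+` pattern is used. Stripping the surrounding `<>();,` characters is an added assumption about what counts as "the address", and the helper name `find_addresses` is made up. The sample lines are taken from the output shown in the question, plus one hypothetical line without an address.

```python
import re

ADDRESS = re.compile(r"\S+@\S+")  # rough pattern: non-space runs around an @


def find_addresses(lines):
    """Return every address-like token, without surrounding punctuation."""
    found = []
    for line in lines:
        for token in ADDRESS.findall(line):
            found.append(token.strip("<>();,"))
    return found


sample = [
    "Received: (from apache@localhost)\n",
    "for source@collab.sakaiproject.org; Thu, 18 Oct 2007 11:31:49 -0400\n",
    "X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender "
    "to zach.thomas@txstate.edu using -f\n",
    "Subject: a line without any address\n",  # hypothetical line
]
emails = find_addresses(sample)

# The count is printed once, after all lines have been processed.
print("There were", len(emails), "email addresses in the sample")
print(*emails, sep="\n")

# With the real file, the with block closes it automatically:
# with open("mbox.txt") as fhand:
#     emails = find_addresses(fhand)
```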

Convert a Column oriented file to CSV output using shell

I have a file that comes from MapReduce output in the format below, and it needs to be converted to CSV using a shell script.
25-MAY-15
04:20
Client
0000000010
127.0.0.1
PAY
ISO20022
PAIN000
100
1
CUST
API
ABF07
ABC03_LIFE.xml
AFF07/LIFE
100000
Standard Life
================================================
==================================================
AFF07-B000001
2000
ABC Corp
..
BE900000075000027
AFF07-B000002
2000
XYZ corp
..
BE900000075000027
AFF07-B000003
2000
3MM corp
..
BE900000075000027
I need the output in the CSV format below, where some of the values in the file are repeated and the TRANSACTION ID is added, as follows:
25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,AFF07-B000001, 2000,ABC Corp,..,BE900000075000027
25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,AFF07-B000002,2000,XYZ Corp,..,BE900000075000027
The TRANSACTION IDs are AFF07-B000001, AFF07-B000002 and AFF07-B000003, each with different values, and I have added a marker line showing where the transaction IDs start. The values before the demarcation should be repeated, and the transaction ID columns should be appended to those repeated values, as in the format above.
I may need a Bash shell script; the Linux flavour is CentOS.
I get the error below when I execute the code:
Traceback (most recent call last):
File "abc.py", line 37, in <module>
main()
File "abc.py", line 36, in main
createTxns(fh)
File "abc.py", line 7, in createTxns
first17.append( fh.readLine().rstrip() )
AttributeError: 'file' object has no attribute 'readLine'
Can someone help me out
Is this a correct description of the input file and output format?
The input file consists of:
17 lines, followed by
groups of 10 lines each - each group holding one transaction id
Each output row consists of:
29 common fields, followed by
5 fields derived from each of the 10-line groups above
So we just translate this into some Python:
def createTxns(fh):
    """fh is the file handle of the input file"""
    # 1. Read 17 lines from fh
    first17 = []
    for i in range(17):
        first17.append( fh.readline().rstrip() )  # readline, not readLine
    # 2. Form the common fields.
    commonFields = first17 + first17[0:12]
    # 3. Process the rest of the file in groups of ten lines.
    while True:
        # read 10 lines
        group = []
        for i in range(10):
            x = fh.readline()
            if x == '':
                break
            group.append( x.rstrip() )
        if len(group) != 10:
            break # we've reached the end of the file
        fields = commonFields + [ group[2], group[4], group[6], group[7], group[9] ]
        row = ",".join(fields)
        print(row)
def main():
    with open("input-file", "r") as fh:
        createTxns(fh)
main()
This code shows how to:
open a file handle
read lines from a file handle
strip off the ending newline
check for end of input when reading from a file
concatenate lists together
join strings together
I would recommend that you read Input and Output if you are going the Python route.
You just have to break the problem down and try it. For the first 17 lines, use f.readline() and concatenate them into one string. Then use the replace method to get the beginning of the string that you want in the CSV.
str.replace("\n", ",")
Then use the split method to break the remaining lines down into a list.
str.split("\n")
Then write the file out in a loop. Use a counter to make your life easier. First write out the header string
25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API
Then write each item in the list, preceded by a ",".
,AFF07-B000001, 2000,ABC Corp,..,BE900000075000027
At a count of 5, write "\n" followed by the header again, and don't forget to reset your counter so it can begin again.
\n25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API,ABF07,ABC03_LIFE.xml,AFF07/LIFE,100000,Standard Life, 25-MAY-15,04:20,Client,0000000010,127.0.0.1,PAY,ISO2002,PAIN000,100,1,CUST,API
Give it a try and let us know if you need more assistance. I assumed that you have some scripting background :) Good luck!!
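A sketch of one way to do this with Python's csv module. It assumes the layout exactly as displayed in the question, which may differ from the real file: 17 header lines, then separator lines made only of = characters, then 5 lines per transaction, with blank lines ignored. Following the sample output, each row is the 17 header values, the first 12 of them again, and then the 5 transaction values. The function name `to_rows` and the three constants are made up for illustration.

```python
import csv
import io

HEADER_LINES = 17  # assumed: lines before the ===== separator
REPEATED = 12      # assumed: leading header values repeated in each row
GROUP_SIZE = 5     # assumed: lines per transaction


def to_rows(lines):
    """Build one CSV row (a list of fields) per transaction group."""
    # Drop blank lines and separator lines made only of '=' characters.
    values = [s.strip() for s in lines if s.strip().strip("=")]
    header, rest = values[:HEADER_LINES], values[HEADER_LINES:]
    common = header + header[:REPEATED]
    rows = []
    # Incomplete trailing groups are skipped.
    for start in range(0, len(rest) - GROUP_SIZE + 1, GROUP_SIZE):
        rows.append(common + rest[start:start + GROUP_SIZE])
    return rows


sample = """25-MAY-15
04:20
Client
0000000010
127.0.0.1
PAY
ISO20022
PAIN000
100
1
CUST
API
ABF07
ABC03_LIFE.xml
AFF07/LIFE
100000
Standard Life
================================================
==================================================
AFF07-B000001
2000
ABC Corp
..
BE900000075000027
AFF07-B000002
2000
XYZ corp
..
BE900000075000027
""".splitlines()

rows = to_rows(sample)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue(), end="")
```

Using csv.writer rather than a plain ",".join means any value that happens to contain a comma would be quoted correctly.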

delete only lines after match1 up to match2

I have checked and played with various examples and it appears that my problem is a bit more complex than what I have been able to find. What I need to do is search for a particular string and then delete the following line and keep deleting lines until another string is found. So an example would be the following:
a
b
color [
0 0 0,
1 1 1,
3 3 3,
] #color
y
z
Here, "color [" is match1, and "] #color" is match2. So then what is desired is the following:
a
b
color [
] #color
y
z
This simple-to-follow code example will get you started; you can tweak it as needed. Note that it processes the file line by line, so it will work with a file of any size.
start_marker = 'startdel'
end_marker = 'enddel'
with open('data.txt') as inf:
    ignoreLines = False
    for line in inf:
        if start_marker in line:
            print(line, end='')
            ignoreLines = True
        if end_marker in line:
            ignoreLines = False
        if not ignoreLines:
            print(line, end='')
It uses startdel and enddel as "markers" for starting and ending the ignoring of data.
Update:
Modified code based on a request in the comments, this will now include/print the lines that contain the "markers".
Given this input data (borrowed from @drewk):
Beginning of the file...
stuff
startdel
delete this line
delete this line also
enddel
stuff as well
the rest of the file...
it yields:
Beginning of the file...
stuff
startdel
enddel
stuff as well
the rest of the file...
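The same flag-based loop, sketched as a generator and applied to the markers from the question ('color [' and '] #color'). The function name `drop_between` is made up; it keeps both marker lines and drops everything in between.

```python
def drop_between(lines, start_marker, end_marker):
    """Yield lines, skipping those strictly between the two marker lines."""
    skipping = False
    for line in lines:
        if skipping:
            if end_marker in line:
                skipping = False
                yield line  # keep the closing marker line
            continue  # drop every other line inside the block
        yield line
        if start_marker in line:
            skipping = True  # start dropping after the opening marker


sample = ["a", "b", "color [", "0 0 0,", "1 1 1,", "3 3 3,", "] #color", "y", "z"]
kept = list(drop_between(sample, "color [", "] #color"))
print("\n".join(kept))
```

Since it accepts any iterable of lines, an open file object can be passed in directly and the file is still processed one line at a time.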
You can do this with a single regex by using the non-greedy *?. E.g., assuming you want to keep both the "look for this line" and the "until this line is found" lines, and discard only the lines in between, you could do:
>>> import re
>>> my_regex = re.compile("(look for this line)"+
...                       ".*?"+ # match as few chars as possible
...                       "(until this line is found)",
...                       re.DOTALL)
>>> new_str = my_regex.sub(r"\1\n\2", old_str)
A few notes:
The re.DOTALL flag tells Python that "." can match newlines -- by default it matches any character except a newline
The parentheses define "numbered match groups", which are then used later when I say r"\1\n\2" to make sure that we don't discard the first and last line; the \n restores the line break between them, and the raw string (r"...") keeps Python from reading \1 and \2 as control characters. If you did want to discard either or both of those lines, then just get rid of the \1 and/or the \2. E.g., to keep the first but not the last use my_regex.sub(r"\1", old_str); or to get rid of both use my_regex.sub("", old_str)
For further explanation, see: http://docs.python.org/library/re.html or search for "non-greedy regular expression" in your favorite search engine.
This works:
s="""Beginning of the file...
stuff
look for this line
delete this line
delete this line also
until this line is found
stuff as well
the rest of the file... """
import re
print(re.sub(r'(^look for this line$).*?(^until this line is found$)',
             r'\1\n\2', s, count=1, flags=re.DOTALL | re.MULTILINE))
prints:
Beginning of the file...
stuff
look for this line
until this line is found
stuff as well
the rest of the file...
You can also use list slices to do this:
mStart='look for this line'
mStop='until this line is found'
li=s.split('\n')
print('\n'.join(li[0:li.index(mStart)+1]+li[li.index(mStop):]))
Same output.
I like re for this (being a Perl guy at heart...)
