Cannot configure output in quoted text - Python

I need some help.
I think it's an easy task, but I cannot get it configured.
I want to compare two text files and write the differences out as quoted text.
I tried this answer - https://stackoverflow.com/a/28216365/12031112 - and it works fine, but I cannot
get the output in quoted text.
def compare(File1, File2):
    with open(File1, 'r') as f:
        d = set(f.readlines())
    with open(File2, 'r') as f:
        e = set(f.readlines())
    open('file3.txt', 'w').close()  # Create (or truncate) the file
    with open('file3.txt', 'a') as f:
        for line in list(d - e):
            f.write(line)

File1 = 'Updatedfile1.txt'
File2 = 'Rawfile2.txt'
compare(File1, File2)
For example, in the text file each line is unquoted, i.e.
boofx.com
uooos.com
jiooq.com
But I want the output in file3.txt to look like this:
"boofx.com",
"uooos.com",
"jiooq.com",
"zcrcd.com",
"jeoce.com",
"xcoxc.com",
"cdfaq.com",
and the last line should have no trailing comma.
Any solution will be highly appreciated.
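One way to get that output - a minimal sketch, assuming the same file names as above - is to quote each difference and join the lines with ',\n', so that every line except the last one ends with a comma:
def compare(File1, File2):
    with open(File1, 'r') as f:
        d = set(line.strip() for line in f)
    with open(File2, 'r') as f:
        e = set(line.strip() for line in f)
    with open('file3.txt', 'w') as f:
        # wrap each entry in double quotes; joining with ',\n' puts a
        # comma after every line except the last
        f.write(',\n'.join('"%s"' % line for line in sorted(d - e)))
Calling compare('Updatedfile1.txt', 'Rawfile2.txt') as before then writes the quoted list to file3.txt with no comma on the last line.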

Related

Python CSV remove new lines denoted by \r

I have a BCP file that contains lots of carriage return (\r) symbols. They are not meant to be there, and I have no control over the original output, so I am left with trying to parse the file to remove them.
A sample of the data looks like:
"test1","apples","this is
some sample","3877"
"test66","bananas","this represents more
wrong data","378"
I am trying to end up with:
"test1","apples","this is some sample","3877"
"test66","bananas","this represents more wrong data","378"
Is there a simple way to do this, preferably using Python's csv module?
You can try:
import re

# newline='' keeps the stray \r characters visible to the regex;
# otherwise Python's text mode translates them on read
with open("old.csv", newline='') as f, open("new.csv", "w") as w:
    for line in f:
        # delete each carriage return plus any whitespace after it
        line = re.sub(r"\r\s*", "", line)
        w.write(line)
This produces:
"test1","apples","this is some sample","3877"
"test66","bananas","this represents more wrong data","378"

How to import and print separate txt files in Spyder?

I am trying to import several text files into my Spyder file, which I want to add to a list later on.
Why does
test1 = open("test1.txt")
result in test1 being a TextIOWrapper? How would I bring the contents over into the Python file?
Thanks in advance
You need to read the lines into your list after opening the file. For example:
with open('test1.txt') as f:
    test1 = f.readlines()
The above code reads the contents of your text file into the list test1. However, if the data in your text file is spread over multiple lines, the escape character '\n' will be included in the list entries.
To avoid this, use this refined version:
test1 = [line.rstrip('\n') for line in open('test1.txt')]
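To extend that to the several files mentioned in the question, a minimal sketch (the file names test1.txt through test3.txt are assumed for illustration):
all_lines = []
for name in ['test1.txt', 'test2.txt', 'test3.txt']:
    with open(name) as f:
        # strip each trailing newline before collecting the line
        all_lines.extend(line.rstrip('\n') for line in f)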

How to read this particular file format?

I have the following text in a csv file:
b'DataMart\n\nDate/Time Generated,11/7/16 8:54 PM\nReport Time Zone,America/New_York\nAccount ID,8967\nDate Range,10/8/16 - 11/6/16\n\nReport Fields\nSite (DCM),Creative\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter'
Essentially there are multiple newline characters in this file rather than one big single-line string, so you can picture the same text as follows:
DataMart
Date/Time Generated,11/7/16 8:54 PM
Report Time Zone,America/New_York
Account ID,8967
Date Range,10/8/16 - 11/6/16
Report Fields
Site (DCM),Creative
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
I need to grab the last two lines, which are basically the data. I tried doing a for loop:
with open('file.csv', 'r') as f:
    for line in f:
        print(line)
It instead prints the entire file as one long line, with the \n sequences still in it.
Just read the file and get the last two lines:
my_file = open("/path/to/file").read()
print(my_file.splitlines()[-2:])
The [-2:] is known as slicing: it creates a slice, starting from the second to last element, going to the end.
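For instance, a quick illustration on a hypothetical four-line string:
lines = "a\nb\nc\nd".splitlines()
print(lines[-2:])  # prints ['c', 'd']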
OK, after struggling for a bit, I found out that I needed to change the decoding of the file from binary to 'utf-8', and then I could apply the split functions. The problem was that the split functions were not applicable to the binary file.
This is the actual code that seems to be working for me now:
with open('BinaryFile.csv', 'rb') as f1:
    data = f1.read()
text = data.decode('utf-8')
with open('TextFile.csv', 'w') as f2:
    f2.write(text)
with open('TextFile.csv', 'r') as f3:
    for line in f3:
        print(line.split('\\n')[9:])
Thanks for your help, guys.

python search for string in file return entire line + next line into new text file

I have a very large text file (50,000+ lines) that should always be in the same sequence. In Python I want to search the text file for each of the $INGGA lines and join each one with the subsequent $INHDT line to create a new text file. I need to do this without reading the whole file into memory, as that makes it crash every time. I can find and return the $INGGA line, but I'm not sure of the best way of then getting the next line and joining them into a new string in a memory-efficient way.
Thanks
Phil
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2016.05.06 09:11:34 =~=~=~=~=~=~=~=~=~=~=~= > $PRDID,2.15,-0.10,31.87*6E
$INGGA,091124.00,5249.8336,N,00120.9619,W,1,20,0.6,95.0,M,49.4,M,,*50
$INHDT,31.9,T*1E $INZDA,091124.0055,06,05,2016,,*7F
$INVTG,22.0,T,,M,4.4,N,8.1,K,A*24 $PRDID,2.13,-0.06,34.09*6C
$INGGA,091124.20,5249.8338,N,00120.9618,W,1,20,0.6,95.0,M,49.4,M,,*5D
$INHDT,34.1,T*13 $INZDA,091124.2055,06,05,2016,,*7D
$INVTG,24.9,T,,M,4.4,N,8.1,K,A*2B $PRDID,2.16,-0.03,36.24*61
$INGGA,091124.40,5249.8340,N,00120.9616,W,1,20,0.6,95.0,M,49.4,M,,*5A
$INHDT,36.3,T*13 $INZDA,091124.4055,06,05,2016,,*7B
$INVTG,27.3,T,,M,4.4,N,8.1,K,A*22 $PRDID,2.11,-0.05,38.33*68
$INGGA,091124.60,5249.8343,N,00120.9614,W,1,20,0.6,95.1,M,49.4,M,,*58
$INHDT,38.4,T*1A $INZDA,091124.6055,06,05,2016,,*79
$INVTG,29.5,T,,M,4.4,N,8.1,K,A*2A $PRDID,2.09,-0.02,40.37*6D
$INGGA,091124.80,5249.8345,N,00120.9612,W,1,20,0.6,95.1,M,49.4,M,,*56
$INHDT,40.4,T*15 $INZDA,091124.8055,06,05,2016,,*77
$INVTG,31.7,T,,M,4.4,N,8.1,K,A*21 $PRDID,2.09,0.02,42.42*40
$INGGA,091125.00,5249.8347,N,00120.9610,W,1,20,0.6,95.1,M,49.4,M,,*5F
$INHDT,42.4,T*17
You can just read the file one line at a time and write matching lines to another new file, like this:
import re

# open the new file for appending
nf = open('newfile', 'at')
# open the source file for reading
with open('file', 'rt') as f:
    for line in f:
        r = re.match(r'\$INGGA', line)
        if r is not None:
            nf.write(line)
            nf.write("$INHDT,31.9,T*1E" + '\n')
nf.close()
You can use mode 'at' to append and 'rt' to read text line by line.
I tried this on a 150,000-line file and it ran well.
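Note that the $INHDT value is hardcoded above. A sketch of a variant that instead writes whatever line actually follows each $INGGA (same hypothetical file names):
with open('newfile', 'at') as nf, open('file', 'rt') as f:
    for line in f:
        if line.startswith('$INGGA'):
            nf.write(line)
            # also write the line that follows, if there is one
            nf.write(next(f, ''))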
I suggest using a simple regex that will parse and capture the parts you care about. Here is an example that will capture the piece you care about:
(\$INGGA.*\n\$INHDT.*\n)
https://regex101.com/r/tK1hF0/3
As in the link above, you'll notice that I used the "global" (g) setting on the regex, telling it to capture all groups that match; otherwise it stops after the first match.
I also had trouble determining where the actual line breaks are in your example file, so you may need to tweak the pattern to match exactly where the breaks occur.
Here is some starter Python example code:
import re

test_str = open('file').read()  # load your file here
p = re.compile(r'(\$INGGA.*\n\$INHDT.*\n)')
matches = re.findall(p, test_str)
In the example PuTTY log you give, it's all one line, separated with spaces.
So in this case you can use this to replace each space with a newline and get a new file:
cat large_file | sed 's/ /\n/g' > new_large_file
To iterate over the file separated by newlines, run this:
cat new_large_file | python your_script.py
Your script gets its input line by line, so your computer should not crash.
your_script.py:
import sys

INGGA_line = ""
for line in sys.stdin:
    line_stripped = line.strip()
    if line_stripped.startswith("$INGGA"):
        INGGA_line = line_stripped
    elif line_stripped.startswith("$INZDA"):
        print(line_stripped, INGGA_line)
    else:
        print(line_stripped)
This answer is aimed at Python 3.
According to this other answer (and the docs), you can iterate over your file line by line, memory-efficiently:
with open(filename, 'r') as f:
    for line in f:
        ...  # process each line
An example of how you could fulfill the above criteria:
# Target file write-only, source file read-only
with open(targetfile, 'w') as tf, open(sourcefile, 'r') as sf:
    # Flag for whether we are looking for the 1st or 2nd part
    look_for_ingga = True
    for line in sf:
        if look_for_ingga:
            if line.startswith('$INGGA,'):
                tf.write(line)
                look_for_ingga = False
        elif line.startswith('$INHDT,'):
            tf.write(line)
            look_for_ingga = True
In the case where you have multiple '$INGGA,' lines prior to the '$INHDT,', this grabs the first one and disregards the rest. If you want to take only the last '$INGGA,' before the '$INHDT,', store the most recent '$INGGA,' in a variable instead of writing it to disk; then, when you find your '$INHDT,', write both, as sketched below.
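A minimal sketch of that variant, using the same targetfile and sourcefile names as above:
with open(targetfile, 'w') as tf, open(sourcefile, 'r') as sf:
    last_ingga = None
    for line in sf:
        if line.startswith('$INGGA,'):
            # remember only the most recent $INGGA line
            last_ingga = line
        elif line.startswith('$INHDT,') and last_ingga is not None:
            # write the pair together, then reset
            tf.write(last_ingga)
            tf.write(line)
            last_ingga = None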
In case you meant that you want to write to a separate new file for each INGGA-INHDT pair, the target file with-statement should be nested inside for line in sf instead, or the results should be buffered in a list for later storage.
Refer to the docs for introductions to with-statements and file reading/writing.

How do I delete a line that contains a certain string?

What I am trying to do here is:
1. Read lines from a text file.
2. Find lines that contain a certain string.
3. Delete those lines and write the result to a new text file.
For instance, if I have a text like this:
Starting text
How are you?
Nice to meet you
That meat is rare
Shake your body
And if my certain string is 'are'
I want the output as:
Starting text
Nice to meet you
Shake your body
I don't want something like:
Starting text

Nice to meet you

Shake your body
I was trying something like this:
opentxt = open.('original.txt','w')
readtxt = opentxt.read()
result = readtxt.line.replace('are', '')
newtxt = open.('revised.txt','w')
newtxt.write(result)
newtxt.close()
But it doesn't seem to work...
Any suggestions? Any help would be great!
Thanks in advance.
Same as always: open the source file, open the destination file, copy only the lines you want from the source file into the destination file, close both files, and rename the destination file to the source file.
with open('data.txt') as f, open('out.txt', 'w') as f2:
    for x in f:
        if 'are' not in x:
            # strip the line first and then add a '\n', so you won't get
            # an empty line between two lines
            f2.write(x.strip() + '\n')
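If you also want the final rename step mentioned above, a small sketch (same file names assumed):
import os

# atomically replace the original file with the filtered copy
os.replace('out.txt', 'data.txt')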
