join separate files with python - python

I want to join 100 different files into one.
Example of file with data:
example1.txt has data in this format:
something
something
something
example2.txt has data in this format:
something
something
something
All 100 files have the same data format and share a common name, example1 ... example100: the name "example" stays the same and only the number changes.
from itertools import chain
infiles = [open('example{}.txt'.format(i + 1), 'r') for i in xrange(100)]
with open('example.txt', 'w') as fout:
    for line in chain(*infiles):
        fout.write(line)
I used this, but the problem is that the first line of the next file ends up joined to the last line of the previous file.

If you have 100 files, it is better to just use a list of file objects:
from itertools import izip_longest

separator = ' '  # undefined in the original answer; a space is assumed
infiles = [open('example{}.txt'.format(i + 1), 'r') for i in xrange(100)]
with open('Join.txt', 'w') as fout:
    for lines in izip_longest(*infiles, fillvalue=''):
        lines = [line.rstrip('\n') for line in lines]
        print >> fout, separator.join(lines)

I would open a new file as writable, Join.txt, and then loop through the files you want with range(1, 101):
join = open('Join.txt', 'w')
for i in range(1, 101):
    with open('example{}.txt'.format(i), 'r') as infile:
        for line in infile.readlines():
            join.write(line)
join.close()
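Either answer can be made robust against the problem from the question (a file whose last line has no trailing newline, which glues it to the next file's first line) by appending the newline explicitly. This is a minimal Python 3 sketch; the demo file names are invented:

```python
def join_files(filenames, out_path):
    # Guarantee every copied line ends with '\n', so the first line of
    # the next file never glues onto the last line of the previous one.
    with open(out_path, 'w') as fout:
        for name in filenames:
            with open(name) as fin:
                for line in fin:
                    fout.write(line if line.endswith('\n') else line + '\n')

# Demo with two invented files; demo1.txt has no trailing newline on purpose.
with open('demo1.txt', 'w') as f:
    f.write('a\nb')
with open('demo2.txt', 'w') as f:
    f.write('c\n')
join_files(['demo1.txt', 'demo2.txt'], 'joined.txt')
with open('joined.txt') as f:
    print(f.read())  # a, b, c each on their own line
```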


Python read .txt and split words after symbol #

I have a large 11 GB .txt file with email addresses. I would like to save only the part of each string before the # symbol, but my output only generates the first line. I reused this code from an earlier project. I would like to save the output to a different .txt file. I hope someone can help me out.
my code:
import re

def get_html_string(file, start_string, end_string):
    answer = "nothing"
    with open(file, 'rb') as open_file:
        for line in open_file:
            line = line.rstrip()
            if re.search(start_string, line):
                answer = line
                break
    start = answer.find(start_string) + len(start_string)
    end = answer.find(end_string)
    # print(start, end, answer)
    return answer[start:end]

beginstr = ''
end = '#'
file = 'test.txt'
readstring = str(get_html_string(file, beginstr, end))
print readstring
Your file is quite big (11 GB), so you shouldn't keep all those strings in memory. Instead, process the file line by line and write each result before reading the next line.
This should be efficient:
with open('test.txt', 'r') as input_file:
    with open('result.txt', 'w') as output_file:
        for line in input_file:
            prefix = line.split('#')[0]
            output_file.write(prefix + '\n')
If your file looks like this example:
user#google.com
user2#jshds.com
Useruser#jsnl.com
You can use this:
def get_email_name(file_name):
    with open(file_name) as file:
        lines = file.readlines()
    result = list()
    for line in lines:
        result.append(line.split('#')[0])
    return result

get_email_name('emails.txt')
Out:
['user', 'user2', 'Useruser']
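Since the file in the question is 11 GB, readlines() would pull everything into memory. A generator variant keeps the line-by-line behaviour of the first answer while still returning the names lazily; this is a sketch, and the demo file name is invented:

```python
def iter_email_names(file_name):
    # Yield the part before '#' of each line without loading the whole file.
    with open(file_name) as f:
        for line in f:
            yield line.split('#')[0].rstrip('\n')

# Demo using the sample data from the answer above.
with open('emails_demo.txt', 'w') as f:
    f.write('user#google.com\nuser2#jshds.com\nUseruser#jsnl.com\n')
print(list(iter_email_names('emails_demo.txt')))  # ['user', 'user2', 'Useruser']
```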

Python regex from txt file

I have a text file that contains this data:
PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER=
I want to extract the lines from that file where nothing comes after the equals sign.
So in my new text file, I want to get
CMD_REINIT
CMD_OLIVIER
How do I do this?
My code looks like this right now:
import os, os.path

DIR_DAT = "dat"
DIR_OUTPUT = "output"
print("Psst go check in the output folder ;)")
for roots, dir, files in os.walk(DIR_DAT):
    for filename in files:
        filename_output = "/" + os.path.splitext(filename)[0]
        with open(DIR_DAT + "/" + filename) as infile, \
                open(DIR_OUTPUT + "/bonjour.txt", "w") as outfile:
            for line in infile:
                if not line.strip().split("=")[-1]:
                    outfile.write(line)
I want to collect all the data in a single file, but it doesn't work. Can anyone help me?
The next step is to crawl that new file and keep only unique values: since four files are appended into a single one, some data may appear four, three, or two times. Then, in a new file that I will call output.txt, I need to keep only the lines that are common to all the files.
You can use regex:
import re
data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""
found = re.findall(r"^\s*(.*)=\s*$", data, re.M)
print( found )
Output:
['CMD_REINIT', 'CMD_OLIVIER']
The expression looks for
^\s* line start + optional whitespace
(.*)= anything before a = which is captured as a group
\s*$ followed by optional whitespace and line end
using the re.M (multiline) flag.
Read your files text like so:
with open("yourfile.txt", "r") as f:
    data = f.read()
Write your new file like so:
with open("newfile.txt", "w") as f:
    f.write("\n".join(found))
You can use http://www.regex101.com to evaluate test text against regex patterns; make sure to switch to its Python mode.
I suggest you the following short solution using comprehension:
with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
    for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
        newf.write(f'{x}\n')
Try this pattern: \w+(?==$).
Using a simple iteration.
Ex:
with open(filename) as infile, open(filename2, "w") as outfile:
    for line in infile:                          # iterate each line
        if not line.strip().split("=")[-1]:      # nothing after '='
            print(line.strip().strip("="))
            outfile.write(line)                  # write to new file
Output:
CMD_REINIT
CMD_OLIVIER
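The follow-up step from the question (after appending the four files, keep only the lines common to all of them) is not covered by the answers. It can be sketched with set intersection; the helper name and demo file names are invented:

```python
def common_lines(filenames):
    # Intersect the line sets of all files; line order is not preserved.
    common = None
    for name in filenames:
        with open(name) as f:
            lines = {line.rstrip('\n') for line in f}
        common = lines if common is None else common & lines
    return common or set()

# Demo with two invented input files.
with open('part1.txt', 'w') as f:
    f.write('CMD_REINIT\nCMD_OLIVIER\nCMD_VERS\n')
with open('part2.txt', 'w') as f:
    f.write('CMD_REINIT\nCMD_OLIVIER\n')
print(sorted(common_lines(['part1.txt', 'part2.txt'])))
```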

Python read file by line and write into a different file

So basically, I am trying to read a file line by line and then have certain text added after the text that was read.
For Ex.
Code:
file = open("testlist.txt", 'w')
file2 = open("testerlist.txt", 'r+')
# This gives me a syntax error obviously.
file.write1("" + file + "" + file2 + "")
Textlist
In my testlist.txt it lists as:
os
Testerlist
In my testerlist.txt it lists as:
010101
I am trying to copy the text from one file, read another file, and add the combination to the beginning of a new file, e.g. accounts.txt.
My End Result
For my end result I am trying to have it be like:
os010101
(btw I have all the correct code; I am just using this as an example, so if I am missing any values it is because I was too lazy to add them.)
You can use file.read() to read the contents of a file. Then just concatenate the data from two files and write to the output file:
with open("testlist.txt") as f1, open("testerlist.txt") as f2, \
        open("accounts.txt", "w") as f3:
    f3.write(f1.read().strip() + f2.read().strip())
Note that 'mode' is not required when opening files for reading.
If you need to write the lines in particular order, you could use file.readlines() to read the lines into a list and file.writelines() to write multiple lines to the output file, e.g.:
with open("testlist.txt") as f1, open("testerlist.txt") as f2, \
        open("accounts.txt", "w") as f3:
    f1_lines = f1.readlines()
    f3.write(f1_lines[0].strip())
    f3.write(f2.read().strip())
    f3.writelines(f1_lines[1:])
Try with something like this:
with open('testlist.txt', 'r') as f:
    input1 = f.read().strip()  # strip the newline so the parts join on one line
with open('testerlist.txt', 'r') as f:
    input2 = f.read().strip()
output = input1 + input2
with open("accounts.txt", "a") as myfile:
    myfile.write(output)
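Putting it together: to get exactly os010101 the newlines must be stripped before concatenating. A sketch with the sample data from the question, using invented demo file names:

```python
# Recreate the question's two input files.
with open('testlist_demo.txt', 'w') as f:
    f.write('os\n')
with open('testerlist_demo.txt', 'w') as f:
    f.write('010101\n')

# Strip trailing newlines so the two pieces land on one line.
with open('testlist_demo.txt') as f1, open('testerlist_demo.txt') as f2, \
        open('accounts_demo.txt', 'w') as f3:
    f3.write(f1.read().strip() + f2.read().strip())

with open('accounts_demo.txt') as f:
    print(f.read())  # os010101
```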

python for save each list into some output file

I already have this code:
#!/usr/bin/env python
with open('honeyd.txt', 'r') as infile, open('test.rule', 'w') as outfile:
    for line in infile:
        outfile.write('alert {} {} -> {} {}\n'.format(*line.split()))
This code splits every line and saves everything into one file. My goal is to split all lines and save each one into its own output file, so there are as many output files as there are lines in honeyd.txt: one line per output file. If I have 3 lines, I get 3 output files; if I have 10 lines, 10 output files.
Can anyone help with this?
Assuming you're ok with sequential numbering for your filenames:
with open('honeyd.txt', 'r') as infile:
    for index, line in enumerate(infile, 1):
        with open('test.rule{}'.format(index), 'w') as outfile:
            outfile.write('alert {} {} -> {} {}\n'.format(*line.split()))
This will create files named test.rule1, test.rule2, etc.
Try this:
with open('honeyd.txt') as f:
    lines = [line.strip().split() for line in f]  # a list of lists
for i in range(len(lines)):
    with open('test_{}.rule'.format(i), 'w') as f2:
        f2.write("alert {} {} -> {} {}\n".format(*lines[i]))
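The first answer can be exercised end to end. This demo invents a two-line honeyd-style input (the field layout is an assumption, not the real honeyd format) and writes each line to its own numbered file:

```python
# Invented two-line input; each line has the four fields the template expects.
with open('honeyd_demo.txt', 'w') as f:
    f.write('tcp 10.0.0.1 any 80\n'
            'udp 10.0.0.2 any 53\n')

# One output file per input line, numbered from 1.
with open('honeyd_demo.txt') as infile:
    for index, line in enumerate(infile, 1):
        with open('demo_{}.rule'.format(index), 'w') as outfile:
            outfile.write('alert {} {} -> {} {}\n'.format(*line.split()))
```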

Reading 1 line at a time from multiple files

I have a bunch of filenames. I need to read one line at a time from each of these files, do some processing and then read the one line again from each of these files, do some processing and so on.
I'm looking for suggestions on how to do this in a more Pythonic way. I know the number of lines present in each file so I'm hard-coding it for now, but I'd like to not have to do that.
UPDATE:
The files all have the same number of lines.
UPDATE2:
There are at least 30 different files.
filenames = []
line_count = 400
fileobjs = [open(i, 'r') for i in filenames]
for i in xrange(line_count):
    lines = []
    for each_fo in fileobjs:
        for each_line in each_fo:
            lines.append(each_line)
            break
    process(lines)
What about this?
from itertools import izip_longest

for file_lines in izip_longest(*map(open, filenames)):
    for line in file_lines:
        if line:
            pass  # process line
Since all the files have the same number of lines, you can also pull one line from each open file object per step:
lines = [next(fo) for fo in fileobjs]
process(lines)
This will read both files in parallel, one line at a time:
with open('File1', 'r') as FileA, open('File2', 'r') as FileB:
    for lineA, lineB in zip(FileA, FileB):
        print lineA, lineB
filenames = []
files = [open(f, mode='r') for f in filenames]
while True:
    lines = [f.readline() for f in files]
    if not lines[0]:  # first file exhausted; all files have the same length
        break
    process(lines)
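On Python 3 the same lockstep reading can be written with contextlib.ExitStack, which also guarantees every file gets closed. A sketch, with invented demo file names:

```python
from contextlib import ExitStack

def read_lockstep(filenames):
    # Yield one tuple per step, holding the next line from each file.
    with ExitStack() as stack:
        files = [stack.enter_context(open(n)) for n in filenames]
        for lines in zip(*files):
            yield tuple(line.rstrip('\n') for line in lines)

# Demo with two invented three-line files.
with open('lock1.txt', 'w') as f:
    f.write('a1\na2\na3\n')
with open('lock2.txt', 'w') as f:
    f.write('b1\nb2\nb3\n')
print(list(read_lockstep(['lock1.txt', 'lock2.txt'])))
```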
