I have two .csv files that I need to either join into a new file or append one to the other:
filea:
jan,feb,mar
80,50,52
74,73,56
fileb:
apr,may,jun
64,75,64
75,63,63
What I need is:
jan,feb,mar,apr,may,jun
80,50,52,64,75,64
74,73,56,75,63,63
What I'm getting:
jan,feb,mar
80,50,52
74,73,56
apr,may,jun
64,75,64
75,63,63
I'm using the simplest code I can find. A bit too simple, I guess:
sourceFile = open('fileb.csv', 'r')
data = sourceFile.read()
with open('filea.csv', 'a') as destFile:
    destFile.write(data)
I'd be very grateful if anyone could tell me what I'm doing wrong and how to get them to append 'horizontally' instead of 'vertically'.
from itertools import zip_longest  # izip_longest on Python 2

with open("filea.csv") as source1, open("fileb.csv") as source2, open("filec.csv", "a") as dest2:
    # zip_longest pads the shorter file with the fillvalue, so no lines are lost with uneven-length files
    for line1, line2 in zip_longest(source1, source2, fillvalue=""):
        if line1 and line2:  # we have two lines to join
            dest2.write("{},{}\n".format(line1.rstrip("\n"), line2.rstrip("\n")))
        else:  # we are into the longer file; write whichever line exists as-is
            dest2.write(line1 or line2)
In case your files have the same number of lines, or the shorter one is padded with blank fields:
filea.csv
jan,feb,mar
80,50,52
74,73,56
,,
fileb.csv
apr,may,jun
64,75,64
75,63,63
77,88,99
Script:
with open("filea.csv", "r") as source1, open("fileb.csv", "r") as source2, open("filec.csv", "w") as dest:
    for line1, line2 in zip(source1, source2):
        dest.write(line1.strip() + ',' + line2)
If you need a more compact version:
with open("filea.csv", "r") as source1, open("fileb.csv", "r") as source2, open("filec.csv", "w") as dest:
    [dest.write(line1.strip() + ',' + line2) for line1, line2 in zip(source1, source2)]
Result (filec.csv):
jan,feb,mar,apr,may,jun
80,50,52,64,75,64
74,73,56,75,63,63
,,,77,88,99
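An aside on the compact version above (my suggestion, not part of the original answer): a comprehension used only for its side effects builds a list that is immediately thrown away; dest.writelines with a generator expression does the same job without it:
with open("filea.csv", "r") as source1, open("fileb.csv", "r") as source2, open("filec.csv", "w") as dest:
    # writelines consumes the generator lazily; no intermediate list is built
    dest.writelines(line1.strip() + ',' + line2 for line1, line2 in zip(source1, source2))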
Related
How can you delete all the lines in a text document up to a certain line?
I find the line number using this code:
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"
with open(filename) as text_file:
    for num, line in enumerate(text_file, 1):
        if lookup in line:
            print(num)
print(num) outputs the matching line number, for example 66.
How do I delete all the lines up to line 66, i.e. up to the line found by the keyword?
As proposed here, with a small modification for your case:
read all lines of the file;
iterate the lines list until you reach the keyword;
write all remaining lines.
with open("yourfile.txt", "r") as f:
lines = iter(f.readlines())
with open("yourfile.txt", "w") as f:
for line in lines:
if lookup in line:
f.write(line)
break
for line in lines:
f.write(line)
That's easy:
filename = "test.txt"
lookup = '00:00:00'

with open(filename, 'r') as text_file:
    lines = text_file.readlines()

res = []
for i in range(len(lines)):
    if lookup in lines[i]:
        res = lines[i:]  # keep everything from the matching line onward
        break

with open(filename, 'w') as text_file:
    text_file.writelines(res)
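One edge case worth noting (my observation, not in the original answer): if lookup never occurs, res stays [] and the rewrite empties the file. A small guard keeps the original content in that case, assuming that is the behavior you want:
with open(filename, 'w') as text_file:
    # fall back to the original lines when the keyword was never found
    text_file.writelines(res if res else lines)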
Do you know what lines you want to delete?
#!/usr/bin/env python
lookup = '00:00:00'
filename = "test.txt"
with open(filename) as text_file, open('okfile.txt', 'w') as ok:
    lines = text_file.readlines()
    ok.writelines(lines[4:])
This will delete the first 4 lines and store the rest in a different document, in case you want to keep the original. No need to close the files explicitly here: the with statement does that for you when the block ends :)
Providing three alternative solutions. All begin with the same first part, reading the file:
filename = "test.txt"
lookup = '00:00:00'
with open(filename) as text_file:
    lines = text_file.readlines()
The variations for the second parts are:
Using itertools.dropwhile, which discards items from the iterator until the predicate (condition) returns False, i.e. it discards while the predicate is True. From that point on it yields all the remaining items without re-checking the predicate:
import itertools

with open(filename, 'w') as text_file:
    text_file.writelines(itertools.dropwhile(lambda line: lookup not in line, lines))
Note that the predicate says not in, so all the lines before lookup is found are discarded.
Bonus: If you wanted to do the opposite - write lines until you find the lookup and then stop, replace itertools.dropwhile with itertools.takewhile.
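A sketch of that takewhile variant, for completeness (using the same lines and lookup as above):
import itertools

with open(filename, 'w') as text_file:
    # keep writing lines while the lookup has not appeared yet, then stop
    text_file.writelines(itertools.takewhile(lambda line: lookup not in line, lines))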
Using a flag-value (found) to determine when to start writing the file:
with open(filename, 'w') as text_file:
    found = False
    for line in lines:
        if not found and lookup in line:  # 2nd expression not checked once `found` is True
            found = True  # value remains True for all remaining iterations
        if found:
            text_file.write(line)
Similar to #c yj's answer, with some refinements: use enumerate instead of range, then use the final index (idx) to write the lines from that point on, with no other intermediate variables needed:
for idx, line in enumerate(lines):
    if lookup in line:
        break

with open(filename, 'w') as text_file:
    text_file.writelines(lines[idx:])
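One caveat (my addition, not from the original answer): if lookup never appears, the loop runs to completion and idx is left pointing at the last line, so almost everything would be dropped. Initializing idx first makes the not-found case explicit, assuming you then want to keep the whole file:
idx = 0  # assumption: keep the whole file when the lookup is never found
for i, line in enumerate(lines):
    if lookup in line:
        idx = i
        break

with open(filename, 'w') as text_file:
    text_file.writelines(lines[idx:])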
I want to remove reversed order string tuples from my large text file (>16M lines).
For example, if I have the following two lines in my file:
352_0F, 352_1F, 0.913
352_1F, 352_0F, 0.913
The expected output would keep either of those lines (instead of both):
352_0F, 352_1F, 0.913
FYI: the third column col3 will be the same for a tuple and its reversed-order tuple.
I tried the following code, but it is not working as expected.
from collections import defaultdict

data = defaultdict(list)
with open("OUTPUT.txt", "w") as output:
    for fileName in ["Large_INPUT.txt"]:
        with open(fileName, 'r') as file1:
            for line in file1:
                col1, col2, value = line.split(",")
                if (col1, col2) not in data:
                    if (col2, col1) not in data:
                        data[(col1, col2, value)]
                        output.write(f"{col1},{col2} {value}\n")
Can anybody please help me with this?
Seeing that your code has a list containing a single file, I am assuming you are generalizing it to work with multiple files. In that case you failed to mention something: do you want the combinations to persist across files? You are close with your implementation. Instead of using a dictionary to get O(1) lookups, you can use a simpler structure, a set, and still get O(1) membership tests.
Persistent across the list of files:
found_combinations = set()
with open("OUTPUT.txt", "w") as output:
    for fileName in ["Large_INPUT.txt"]:
        with open(fileName, 'r') as file1:
            for line in file1:
                cols = [col.strip() for col in line.strip().split(',')]
                new_combination = frozenset(cols)
                if new_combination not in found_combinations:
                    found_combinations.add(new_combination)
                    out = ', '.join(cols) + '\n'
                    output.write(out)
Not persistent across files:
with open("OUTPUT.txt", "w") as output:
for fileName in ["Large_INPUT.txt"]:
found_combinations = set()
with open(fileName, 'r') as file1:
for line in file1:
cols = [col.strip() for col in line.strip().split(',')]
new_combination = frozenset(cols)
if new_combination not in found_combinations:
found_combinations.add(new_combination)
out = ', '.join(cols) + '\n'
output.write(out)
Note that the only difference between the two versions is the placement of found_combinations = set().
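A quick illustration of why frozenset does the deduplication here (a sketch using the sample values from the question): reversed rows hash equally. One caveat: a frozenset also collapses duplicate elements, so a row whose two IDs happened to be identical would produce a smaller set.
a = frozenset(['352_0F', '352_1F', '0.913'])
b = frozenset(['352_1F', '352_0F', '0.913'])
print(a == b)  # True: element order is irrelevant, so the reversed row is seen as a duplicate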
I have a text file that contains data:
PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER=
I want to extract the lines from that file where nothing appears after the equals sign.
So in my new text file, I want to get:
CMD_REINIT
CMD_OLIVIER
How do I do this?
My code looks like this right now:
import os, os.path

DIR_DAT = "dat"
DIR_OUTPUT = "output"

print("Psst go check in the output folder ;)")

for roots, dir, files in os.walk(DIR_DAT):
    for filename in files:
        filename_output = "/" + os.path.splitext(filename)[0]
        with open(DIR_DAT + "/" + filename) as infile, open(DIR_OUTPUT + "/bonjour.txt", "w") as outfile:
            for line in infile:
                if not line.strip().split("=")[-1]:
                    outfile.write(line)
I want to collect all the data in a single file, but it doesn't work. Can anyone help me?
As a third step, the script should crawl that new file and keep only unique values: since four files are appended into a single one, some data might appear four, three, or two times.
Then I need to keep, in a new file that I will call output.txt, only the lines that are common to all the files.
You can use a regex:
import re

data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""

found = re.findall(r"^\s*(.*)=\s*$", data, re.M)
print(found)
Output:
['CMD_REINIT', 'CMD_OLIVIER']
The expression looks for:
^\s* line start plus optional whitespace
(.*)= anything before a =, which is captured as a group
\s*$ followed by optional whitespace and line end
using the re.M (multiline) flag.
Read your file's text like so:
with open("yourfile.txt", "r") as f:
    data = f.read()
Write your new file like so:
with open("newfile.txt", "w") as f:
    f.write("\n".join(found))
You can use http://www.regex101.com to evaluate test text against regex patterns; make sure to switch to its Python mode.
I suggest the following short solution using a comprehension:
with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
    for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
        newf.write(f'{x}\n')
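On Python 3.9 or newer (an assumption about your interpreter), str.removesuffix reads more clearly than slicing off the last character:
with open('file.txt') as f, open('newfile.txt', 'w') as newf:
    for line in f:
        stripped = line.strip()
        if stripped.endswith('='):
            # removesuffix drops the trailing '=' without relying on index arithmetic
            newf.write(stripped.removesuffix('=') + '\n')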
Try this pattern with the re.M flag: \w+(?==$). The lookahead requires a = at the very end of the line but keeps it out of the match.
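A minimal sketch of how that pattern could be applied (the file name is illustrative):
import re

with open("file.txt") as f:
    text = f.read()

# re.M makes $ match at the end of every line, not just the end of the string
names = re.findall(r"\w+(?==$)", text, re.M)
print(names)  # ['CMD_REINIT', 'CMD_OLIVIER']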
Using a simple iteration.
Ex:
with open(filename) as infile, open(filename2, "w") as outfile:
    for line in infile:  # iterate over each line
        if not line.strip().split("=")[-1]:  # check whether anything follows the '='
            print(line.strip().strip("="))
            outfile.write(line)  # write the matching line to the new file
Output:
CMD_REINIT
CMD_OLIVIER
I've been trying to do this task all day, and I really want to learn how to do it using Python. I want to take two tab-delimited files, one with an ID only and the other with the same ID and some description. I can easily merge these files on the shared ID field with unix join, but for that I need to sort both, and I want to keep the ordering of the first file.
I've tried some code below; my approach has been to add things to a tuple since, from my understanding, tuples keep their order as you add to them. I haven't been able to get anything to work, though. Can anyone help?
Sample files:
file1 ->
111889
1437390
123
27998
2525778
12
1345
file2 ->
2525778'\t'item778
1345'\t'item110
123'\t'item1000
12'\t'item8889
111889'\t'item1111
1437390'\t'item222
27998'\t'item12
output ->
111889'\t'item1111
1437390'\t'item222
123'\t'item1000
27998'\t'item12
2525778'\t'item778
12'\t'item8889
1345'\t'item110
This is what I have so far:
import sys

add_list = ()
with open(sys.argv[1], 'rb') as file1, open(sys.argv[2], 'rb') as file2:
    for line2 in file2:
        f1, f2, f3 = line2.split('\t')
        #print f1, f2, f3
        for row in file1:
            #print row
            if row != f1:
                break
            else:
                add_list.append(f1, f2, '\n')
                break
The key is to use Python dictionaries; they are perfect for this task…
Here is a complete answer:
import sys

# Each id is mapped to its item name
# (split() splits at whitespace, including tabs and newlines, with no empty output strings):
items = dict(line.split() for line in open(sys.argv[2]))  # Inspired by mgilson's answer

with open(sys.argv[1]) as ids:
    for line in ids:
        id = line.rstrip()  # newline removed
        print('{}\t{}'.format(id, items[id]))
Here is the result:
% python out.py file1.txt file2.txt
111889 item1111
1437390 item222
123 item1000
27998 item12
2525778 item778
12 item8889
1345 item110
PS: Note that I did not open the files in rb mode, as there is no need to keep the original newline bytes here, since we get rid of trailing newlines.
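One behavior worth knowing (my note, not from the original answer): dict() built from key/value pairs keeps the last value for a repeated key, so if file2 listed an ID twice, the later item name would silently win:
pairs = [('12', 'item8889'), ('12', 'item_new')]
print(dict(pairs))  # {'12': 'item_new'} - the last occurrence wins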
I would create a dictionary which maps the ID to the field value from the second file:
with open('file2') as fin:
    # strip the trailing newline so the stored value is clean
    d = dict(x.strip().split(None, 1) for x in fin)
Then I would use the first file to construct the output in order from the dictionary:
with open('file1') as fin, open('output', 'w') as fout:
    for line in fin:
        key = line.strip()
        fout.write('{key}\t{value}\n'.format(key=key, value=d[key]))
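If file1 might contain IDs that are absent from file2 (an assumption; the sample data has none), d[key] raises KeyError. dict.get with a default keeps the script running:
fout.write('{key}\t{value}\n'.format(key=key, value=d.get(key, 'MISSING')))  # 'MISSING' is an illustrative placeholder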
import sys

out = {}
with open(sys.argv[1]) as file1, open(sys.argv[2]) as file2:
    d2 = {}
    for line in file2:
        key, val = line.rstrip('\n').split('\t')
        d2[key] = val
    keys = [line.strip() for line in file1]  # strip newlines so the keys match d2
    out = {x: d2[x] for x in keys}
I am not sure what you want to use as the sorting basis.
I'm new to Python, and the way it handles variables and arrays of variables in lists is quite alien to me. I would normally read a text file into a vector, determine its size, and then copy the last three elements into a new array by looping with a for loop. I don't understand how for loops work in Python, so I can't do that.
So far I have:
#read text file into line list
numberOfLinesInChat = 3
text_file = open("Output.txt", "r")
lines = text_file.readlines()
text_file.close()
writeLines = []

if len(lines) > numberOfLinesInChat:
    i = 0
    while ((numberOfLinesInChat - i) >= 0):
        writeLine[i] = lines[(len(lines) - (numberOfLinesInChat - i))]
        i += 1

#write what people say to text file
text_file = open("Output.txt", "w")
text_file.write(writeLines)
text_file.close()
To get the last three lines of a file efficiently, use deque:
from collections import deque

with open('somefile') as fin:
    last3 = deque(fin, 3)
This saves reading the whole file into memory to slice off what you didn't actually want.
To reflect your comment - your complete code would be:
from collections import deque

with open('somefile') as fin, open('outputfile', 'w') as fout:
    fout.writelines(deque(fin, 3))
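For the curious, deque(fin, 3) works because a deque constructed with a maxlen silently discards items from the left as new ones arrive; a tiny demo:
from collections import deque

d = deque(['a', 'b', 'c', 'd', 'e'], 3)
print(list(d))  # ['c', 'd', 'e'] - only the last three items are kept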
As long as you're okay with holding all of the file's lines in memory, you can slice the list of lines to get the last x items. See http://docs.python.org/2/tutorial/introduction.html and search for 'slice notation'.
def get_chat_lines(file_path, num_chat_lines):
    with open(file_path) as src:
        lines = src.readlines()
    return lines[-num_chat_lines:]

>>> lines = get_chat_lines('Output.txt', 3)
>>> print(lines)
['line n-3\n', 'line n-2\n', 'line n-1']
First, to answer your question: my guess is that you had an index error; you should replace the line writeLine[i] = ... with writeLines.append(...). After that, you should also use a loop to write the output:
text_file = open("Output.txt", "w")
for row in writeLines:
    text_file.write(row)
text_file.close()
May I suggest a more Pythonic way to write this? It would be as follows:
with open("Input.txt") as f_in, open("Output.txt", "w") as f_out :
for row in f_in.readlines()[-3:] :
f_out.write(row)
A possible solution:
lines = [l for l in open("Output.txt")]
with open("Output.txt", "w") as out_file:
    out_file.writelines(lines[-3:])
This might be a little clearer if you do not know Python syntax:
lst_lines = text_file.read().splitlines()
This will create a list containing all the lines in the text file.
Then for the last lines you can do:
last = lst_lines[-1]
second_last = lst_lines[-2]
and so on; list and string indexes can be counted from the end with a negative index.
Or you can loop through them and print specific ones using range(start, stop, step), where start is the first index, stop is one past the last index you want, and step is the increment:
for i in range(start, stop, step):
    string = lst_lines[i]
Then just write them to a file.
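A sketch of that last step (the output file name is illustrative):
# write the last three lines; lst_lines from splitlines() has no trailing newlines
with open("LastLines.txt", "w") as out_file:
    for line in lst_lines[-3:]:
        out_file.write(line + "\n")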