Filling tabs until the maximum length of column - python

I have a tab-delimited txt that looks like
11 22 33 44
53 25 36 25
74 89 24 35 and
But there is no "tab" after 44 and 25, so the 1st and 2nd rows have 4 columns while the 3rd row has 5 columns.
Rewritten so that the tabs are shown:
11\t22\t33\t44
53\t25\t36\t25
74\t89\t24\t35\tand
I need a way to mass-add tabs where there are no entries.
If the maximum number of columns is n (n=5 in the above example), then I want to fill in tabs up to that nth column for all rows, to make
11\t22\t33\t44\t
53\t25\t36\t25\t
74\t89\t24\t35\tand
I tried to do it in Notepad++, and in Python using replacer code like
map_dict = {'': '\t'}
but it seems I need more logic to do it.

I am assuming your file also contains newlines so it would actually look like this:
11\t22\t33\t44\n
53\t25\t36\t25\n
74\t89\t24\t35\tand\n
If you know for sure that the maximum length of your columns is 5, you can do it like this:
with open('my_file.txt') as my_file:
    y = lambda x: len(x.strip().split('\t'))
    a = [line if y(line) == 5 else '%s%s\n' % (line.strip(), '\t' * (5 - y(line)))
         for line in my_file.readlines()]
# ['11\t22\t33\t44\t\n', '53\t25\t36\t25\t\n', '74\t89\t24\t35\tand\n']
This will add ending tabs until you reach 5 columns. You will get a list of lines that you need to write back to a file (I have used 'my_file2.txt', but you can write back to the original one if you want).
with open('my_file2.txt', 'w+') as out_file:
    for line in a:
        out_file.write(line)
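If the maximum number of columns is not known ahead of time, a small variation of the same idea (a sketch; it reads the whole file first so it can find the widest row) can compute n before padding:
with open('my_file.txt') as my_file:
    lines = my_file.readlines()

# The widest row determines how many columns every row should have.
n = max(len(line.rstrip('\n').split('\t')) for line in lines)

with open('my_file2.txt', 'w') as out_file:
    for line in lines:
        cols = len(line.rstrip('\n').split('\t'))
        # Pad each row with trailing tabs up to n columns.
        out_file.write(line.rstrip('\n') + '\t' * (n - cols) + '\n')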

If I understood it correctly, you can achieve this in Notepad++ alone using the following:
And yes, if you have several files on which you want to perform this, you can record it as a macro and bind it to a key as a shortcut.

Related

Python to remove extra delimiter

We have a 100 MB pipe-delimited file that has 5 columns/4 delimiters, each separated by a pipe. However, there are a few rows where the second column has an extra pipe. For these few rows the total delimiter count is 5.
For example, in the below 4 rows, the 3rd is a problematic one as it has an extra pipe.
1|B|3|D|5
A|1|2|34|5
D|This is a |text|3|5|7
B|4|5|5|6
Is there any way we can remove the extra pipe from the second position where the delimiter count for the row is 5? So, post-correction, the file needs to look like the below.
1|B|3|D|5
A|1|2|34|5
D|This is a text|3|5|7
B|4|5|5|6
Please note that the file size is 100 MB. Any help is appreciated.
Source: my_file.txt
1|B|3|D|5
A|1|2|34|5
D|This is a |text|3|5|7
B|4|5|5|6
E|1 |9 |2 |8 |Not| a |text|!!!|3|7|4
Code
# If using Python 3.10, these can be combined with parenthesized context managers:
# https://docs.python.org/3.10/whatsnew/3.10.html#parenthesized-context-managers
with open('./my_file.txt') as file_src, open('./my_file_parsed.txt', 'w') as file_dst:
    for line in file_src.readlines():
        # Split the line by the character '|'
        line_list = line.split('|')
        if len(line_list) <= 5:
            # If the number of columns doesn't exceed 5, just write the original line as is.
            file_dst.write(line)
        else:
            # If the number of columns exceeds 5, count the number of columns that should be merged.
            to_merge_columns_count = (len(line_list) - 5) + 1
            # Merge the columns from index 1 to index x, which includes all the columns to be merged.
            merged_column = "".join(line_list[1:1 + to_merge_columns_count])
            # Replace all the items from index 1 to index x with the single merged column.
            line_list[1:1 + to_merge_columns_count] = [merged_column]
            # Write the updated line.
            file_dst.write("|".join(line_list))
Result: my_file_parsed.txt
1|B|3|D|5
A|1|2|34|5
D|This is a text|3|5|7
B|4|5|5|6
E|1 9 2 8 Not a text!!!|3|7|4
A simple regular expression pattern like this works on Python 3.7.3:
from re import compile

bad_pipe_re = compile(r"[ \w]+\|[ \w]+(\|)[ \w]+\|[ \w]+\|[ \w]+\|[ \w]+\n")
with open("input", "r") as fp_1, open("output", "w") as fp_2:
    line = fp_1.readline()
    while line != "":
        mo = bad_pipe_re.fullmatch(line)
        if mo is not None:
            # Drop the captured extra pipe by slicing around its span.
            line = line[:mo.start(1)] + line[mo.end(1):]
        fp_2.write(line)
        line = fp_1.readline()
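As a quick check (a hypothetical snippet reusing the problematic sample row from above), the pattern only full-matches rows with the extra pipe, and slicing out group 1 removes it:
import re

bad_pipe_re = re.compile(r"[ \w]+\|[ \w]+(\|)[ \w]+\|[ \w]+\|[ \w]+\|[ \w]+\n")

line = "D|This is a |text|3|5|7\n"
mo = bad_pipe_re.fullmatch(line)
if mo is not None:
    # Removing the span of group 1 drops the stray pipe.
    print(line[:mo.start(1)] + line[mo.end(1):], end="")  # D|This is a text|3|5|7

print(bad_pipe_re.fullmatch("1|B|3|D|5\n"))  # None -- well-formed rows are written unchanged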

Editing a column by a multiplier, then replacing the column with the multiplied values and saving the file as a new .txt file - python

So I have a file that looks like this
mass (GeV) spectrum (1-100 GeV)
10 0.06751019803888393
20 0.11048827045815585
30 0.1399367785958526
40 0.1628781532692572
I want to multiply the spectrum by a half, or any percentage, and then create a new file with the same data but with the spectrum column replaced by the multiplied values.
DM_file = input("Name of DM.in file: ")  # name of file is DMmumu.in
print(DM_file)
n = float(input('Enter the percentage of annihilation: '))
N = n * 100
pct = (1 - n)
counter = 0
with open(DM_file, 'r+') as f:
    with open('test.txt', 'w') as output:
        lines = f.readlines()
        print(type(lines))
        Spectrumnew = []
        Spectrum = []
        for i in range(8, 58):
            single_line = lines[i].split("\t")
            old_number = single_line[1]
            new_number = float(single_line[1]) * pct
            Spectrumnew.append(new_number)
            Spectrum.append(old_number)
            f.replace(Spectrum, Spectrumnew)
            output.write(str(new_number))
The problem I'm having is that f.replace(Spectrum, Spectrumnew) is not working, and if I comment it out, a new file called test.txt is created containing just Spectrumnew and nothing else. What is wrong with f.replace? Am I using the wrong string method?
replace is a function that works on strings. f is not a string. (For that matter, neither is Spectrum or Spectrumnew.)
You need to construct the line you want in the output file as a string and then write it out. You already have string output working. To construct the output line, you can just concatenate the first number from the input, a tab character, and the product of the second number and the multiplier. You can convert a number to a string with the str() function and you can concatenate strings with the + operator.
There are several more specific answers on this site already that may be helpful, such as replacing text in a file with Python.
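A minimal sketch along those lines, assuming (as in the question) that the data rows are lines 9-58 of DMmumu.in and are tab-separated:
pct = 0.5  # example multiplier; the question derives this from user input

with open('DMmumu.in') as f, open('test.txt', 'w') as output:
    lines = f.readlines()
    for i in range(8, 58):
        cols = lines[i].split('\t')
        new_number = float(cols[1]) * pct
        # Build the output row as a string: mass column, a tab, then the scaled spectrum value.
        output.write(cols[0] + '\t' + str(new_number) + '\n')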

Interpreting a string received from a socket

I am trying to interpret a string that I have received from a socket. The first set of data is seen below:
2 -> 1
1 -> 2
2 -> 0
0 -> 2
0 -> 2
1 -> 2
2 -> 0
I am using the following code to get the numerical values:
for i in range(0, len(data) - 1):
    if data[i] == "-":
        n1 = data[i-2]
        n2 = data[i+3]
        moves.append([int(n1), int(n2)])
But when a number greater than 9 appears in the data, the program only takes the second digit of that number (e.g. with 10 the program would get 0). How would I get both of the digits while maintaining the ability to get single-digit numbers?
Well, you are just grabbing one character on each side.
For the second value you could take a slice instead: data[i+3:len(data)-1]
and for the first one: data[0:i-2]
Use the split() function
numlist = data[i].split('->')
moves.append([int(numlist[0]),int(numlist[1])])
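For example (a quick sketch, assuming data has already been broken into lines such as "2 -> 1"; the two-line sample string here is hypothetical):
moves = []
for line in "2 -> 1\n12 -> 3".splitlines():  # hypothetical sample input
    numlist = line.split('->')
    # int() tolerates the surrounding spaces, and multi-digit numbers stay whole
    moves.append([int(numlist[0]), int(numlist[1])])

print(moves)  # [[2, 1], [12, 3]]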
I assume each line is available as a (byte) string in a variable named line. If it's a whole bunch of lines then you can split it into individual lines with
lines = data.splitlines()
and work on each line inside a for statement:
for line in lines:
    # do something with the line
If you are confident the lines will always be correctly formatted, the easiest way to get the values you want is the string split method. Full code starting from the data would then read like this:
lines = data.splitlines()
for line in lines:
    first, _, second = line.split()
    moves.append([int(first), int(second)])
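With the sample data above, moves would come out as [[2, 1], [1, 2], [2, 0], [0, 2], [0, 2], [1, 2], [2, 0]], and multi-digit numbers such as 10 parse correctly because split() keeps each number whole.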

Compare columns and print smaller and bigger rows

I have 2 files like this.
file1 has one column of values:
6
4
13
25
35
50
65
75
and so on.....
file2 also has one column of values:
24
45
76
and so on.....
I want to take each value (one at a time) in file2 and compare it with file1. If a value in file1 is less than that number, keep it in a list, then sort the list by number and print the largest value.
for example:
I take the number 24 from file2 and compare it with file1, and see that 6, 4 and 13 are below that number; I extract them into a list, sort it, and print the largest value (i.e. 13).
Read each file into a list, converting each line to int. Then sort both lists so we can iterate efficiently:
file1 = sorted([int(l) for l in open('file1.txt').read().split()])
file2 = sorted([int(l) for l in open('file2.txt').read().split()])

i = 0
for file2_number in file2:
    while i + 1 < len(file1) and file1[i+1] < file2_number:
        i += 1
    print(file1[i])
This currently prints the answers (13 35 75) but you can easily modify it to return a list if wanted.
Using Python, after first reading all of the lines of file1 and file2 into two separate lists, you can simply traverse them, comparing each number from file1 to each number in file2, like so:
#First load each of the lines in the data files into two separate lists
file1Numbers = [6, 4, 13, 25, 35, 50, 65, 75]
file2Numbers = [24, 45, 76]

extractedNumbers = []

#Loop through each number in file2
for file2Number in file2Numbers:
    #Loop through each number in file1
    for file1Number in file1Numbers:
        #Compare the current values of the numbers from file1 and file2
        if (file1Number < file2Number):
            #If the number in file1 is less than the number in file2, add
            #the current number to the list of extracted numbers
            extractedNumbers.append(file1Number)

#Sort the list of extracted numbers from least to greatest
extractedNumbers.sort()
#Print out the greatest number in the list,
#which is the number located at the end of the sorted list (position -1)
print(extractedNumbers[-1])
awk solution:
awk 'NR==FNR{a[$0];next} {b[FNR]=$0}
END{
    n=asort(b)
    for(j in a)
        for(i=n;i>0;i--)
            if(b[i]<j){
                print "for "j" in file2 we found : "b[i]
                break
            }
}' file2 file1
output:
for 45 in file2 we found : 35
for 76 in file2 we found : 75
for 24 in file2 we found : 13
Note: there is room to optimize. If performance is critical, you could consider the following (just a suggestion; see the sketch below):
sort file1 descending
sort file2 ascending
take the first value from the sorted file2 and go through file1 from the start; when you find the smaller one, record its position/index x
take the 2nd value from the sorted file2, start comparing from file1[x], and when you find the right one, update x
continue until the end of file2
The brute-force way takes O(m×n) or O(n×m), depending on which of n and m is bigger. The algorithm above (I didn't analyze it in detail) should be faster than O(m×n). ;)
Both python and awk could do the job. If possible, load the two files into memory; if you have monster files, that's another algorithmic problem, e.g. sorting a huge file.
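A rough Python sketch of that single-pass idea, here with both lists sorted ascending so the resume index x only ever moves forward (essentially the same two-pointer scan as the Python answer above, and assuming, as in the example, that the smallest file1 value is below every file2 value):
file1 = sorted(int(l) for l in open('file1.txt').read().split())
file2 = sorted(int(l) for l in open('file2.txt').read().split())

x = 0  # resume position in file1; it never moves backwards
for threshold in file2:
    # Advance while the next file1 value is still below the current threshold.
    while x + 1 < len(file1) and file1[x + 1] < threshold:
        x += 1
    print("for %d in file2 we found : %d" % (threshold, file1[x]))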

Writing CSV values to file results in values being separated into single characters (Python)

Kinda new to Python:
I have the following code:
def printCSV(output, values, header):
    """
    prints the output data as comma-separated values
    """

    try:
        with open(output, 'w') as csvFile:
            #print headers
            csvFile.write(header)

            for value in values:
                #print value, "\n"
                csvFile.write(",".join(value))
                csvFile.write("\n")
    except:
        print "Error occured while writing CSV file..."
Values is a list constructed somewhat like this:
values = []
for i in range(0, 5):
    row = "A,%s,%s,%s" % (0, stringval, intval)
    values.append(row)
When I open the file created by the above function, I expect to see something like this:
Col1,Col2,Col3,Col4
A,0,'hello',123
A,0,'foobar',42
Instead, I am seeing data like this:
Col1,Col2,Col3,Col4
A,0,'h','e','l','l','o',1,2,3
A,0,'f','o','o','b','a','r',4,2
Does anyone know what is causing this?
I even tried using fopen and fwrite() directly, but the same problem still exists.
What's causing this?
The problem you're encountering is that you're doing ",".join(value) with value being a string. Strings act like a collection of characters, so the command translates to "Join each character with a comma."
What you could do instead is use a tuple instead of a string for the row values you pass to printCSV, like this:
values = []
for i in range(0, 5):
    row = ('A', 0, stringval, intval)
    values.append(row)
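One caveat: ",".join() only accepts strings, so once value is a tuple of mixed types, the write inside printCSV also needs a small change along these lines:
# Convert each field to str before joining, since join() rejects non-string items.
csvFile.write(",".join(str(field) for field in value))
csvFile.write("\n")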
