Jump to a specific line in a txt file - Python

I have a big text file and I want to open it and jump to a line that contains a specific string.
What I have managed to do is jump to a specific line number:
import os
os.system('sudo gedit +50 test.txt')
How can I edit the code so that it searches for a specific string instead of a line number?

You can open the file at the first occurrence of the text you are looking for with the following line:
gedit +$(grep -n "id" test.txt | head -n 1 | cut -f 1 -d ':') test.txt
This grep -n "text_to_find" test.txt | head -n 1 | cut -f 1 -d ':' means:
grep ...: list all lines containing "text_to_find", prefixed with their line numbers
head ...: keep only the first occurrence
cut ...: extract the line number
You will have to handle the case where there are no occurrences.
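A minimal sketch of that guard (assuming the same test.txt and search string "id" as above), falling back to line 1 when grep finds nothing:
line=$(grep -n "id" test.txt | head -n 1 | cut -f 1 -d ':')
gedit +${line:-1} test.txt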

You're not really going to be able to open the file and jump straight to a line without first doing some logic to figure out where that line is.
This should do the trick:
with open('test.txt', 'r') as myFile:
    for line in myFile:
        if 'My Search String Here' in line:
            print(line)
You may also want to look at this: https://docs.python.org/2/library/linecache.html
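If you still want the gedit jump from the question, a small sketch (assuming the same test.txt and search string) can find the line number in Python first and then hand it to gedit:
import os

line_no = None
with open('test.txt') as my_file:
    # enumerate gives 1-based line numbers to pass to gedit's +N option
    for number, line in enumerate(my_file, start=1):
        if 'My Search String Here' in line:
            line_no = number
            break

if line_no is not None:
    os.system('gedit +{} test.txt'.format(line_no))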

Related

How to create a script to read a file line by line and concatenate them into a string? (Python or Bash)

I am trying to create a script that automates a command multiple times. I have a text file containing links to directories/files, formatted line by line vertically. An example would be:
mv (X) /home/me
The X variable would change for every line in the directory/file text document. The script would execute the same command but change X each time. How would I go about doing this? Can someone point me in the right direction?
I appreciate the help!
Thanks a bunch!
That's a job for xargs:
xargs -d '\n' -I{} mv {} /path < file
xargs will read standard input and, for each element delimited by a newline, substitute the {} with what it read and execute mv.
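For example, if file contained these two (hypothetical) lines:
/tmp/a.txt
/tmp/b.png
the command would run mv /tmp/a.txt /path and then mv /tmp/b.png /path.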
import os

command = "mv {path} /home/me"  # your command example; {path} will be replaced with each path
with open("path_to_file_list.txt", "r") as file:
    paths = [s.strip() for s in file.readlines()]  # assuming each line in the file is a path to a target file; .strip() clears the newlines
for path in paths:
    os.system(command.format(path=path))  # call each command, replacing {path} with each file path from the text file
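A variant of the Python approach above that avoids shelling out, using shutil.move (a sketch; it assumes, as above, that each line of path_to_file_list.txt is one path to move):
import shutil

with open("path_to_file_list.txt") as file_list:
    for line in file_list:
        path = line.strip()
        if path:  # skip blank lines
            shutil.move(path, "/home/me")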
cat file.txt | while read x; do
mv "$x" /home/me/
done

delete specific line numbers from history

The command grep -n blink ~/.bash_history outputs all lines that contain blink. I need a command that outputs only the line numbers and then runs history -d linenum on each of them.
In Python:
import os

# list generated from the command
linenumbers = [1, 2, 3, 4, 5]
for count in linenumbers:
    os.system("history -d {}".format(count))
How do I do this?
In bash:
for offset in $(history | awk '/blink/ {print $1}' | tac)
do
history -d $offset
done
You can get the offsets directly from the history command; there is no need to generate line numbers with grep. Also, you need to delete the lines in reverse order (hence the use of tac), because the offsets of the commands following the one being deleted are shifted down.

Split Command - Choose Output Name

I have a text file named myfile.txt. The file contains 50,000 lines and I would like to split it into 50 text files. I know that this is easy with the split command:
split myfile.txt
This will output 50 1000-line files: xaa, xab, xac, and so on.
My question: how do I run split on my text file so that it names the output files:
1.txt
2.txt
3.txt
...
50.txt
Seeking answers in Python or bash, please. Thank you!
Here is a potential solution using itertools.islice to get the chunks and string formatting for the different file names:
from itertools import islice

with open('myfile.txt') as in_file:
    for i in range(1, 51):
        with open('{0}.txt'.format(i), 'w') as out_file:
            lines = islice(in_file, 1000)
            out_file.writelines(lines)
It's not exactly what you are looking for, but running
split -d myfile.txt
will output
x00
x01
x02
...
To generate test data in an empty directory, you can use
seq 50000 | split -d
To rename in the way that you want, you can use
ls x* | awk '{print $0, (substr($0,2)+1) ".txt"}' | xargs -n2 mv
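A plain bash loop doing the same rename (a sketch; it assumes the default x prefix and the numeric suffixes produced by split -d):
for f in x*; do
    mv "$f" "$((10#${f#x} + 1)).txt"   # strip the x, force base 10 so 08/09 work, then add 1
done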
Here's a funny one: if your split command supports the --filter option, you can use it!
If you call
split --filter=./banana myfile.txt
then the command ./banana will be executed with the environment variable FILE set to the name split would have used for the chunk it's processing. This command receives the chunk being processed on its standard input. If the command returns a non-zero status code, split interrupts its operation.
Together with the -d option, that's exactly what you want: with -d, the names split chooses for the chunks will be x00, x01, etc.
Make a script:
#!/bin/bash
# remove the leading x from FILE
n=${FILE#x}
# check that n is a number
[[ $n = +([[:digit:]]) ]] || exit 1
# remove the leading zeroes from n
n=$((10#$n))
# send stdin to file
cat > "$n.txt"
Call this script banana, chmod +x it and let's go:
split -d --filter=./banana myfile.txt
This --filter option is really funny.
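As an aside, the same mechanism can add the .txt suffix without a helper script, e.g. (a sketch):
split -d --filter='cat > $FILE.txt' myfile.txt
which writes the chunks to x00.txt, x01.txt, and so on.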
Here's an example of how you could split this file in bash:
split -l 1000 -d --additional-suffix=.txt myfile.txt
The -l argument determines the number of lines included in each split file (1000 in this case, for 50 total files), the -d argument uses numbers instead of letters for the suffixes, and the value we pass to the --additional-suffix argument here gives each file a .txt file extension.
This will create
x00.txt
x01.txt
x02.txt
etc.
If you wanted to change the 'x' portion of the output files, you'd want to add a prefix after the input file (e.g. myfile.txt f would create f00.txt, f01.txt, etc.)
Note that without --additional-suffix, your files will all lack filename extensions.
I've looked to see if there's a way to split a file and name them with only the suffix, but I haven't found anything.
A simple approach:
# open the input and start a new numbered output file every 1000 lines
f = open('your_file')
count_line, file_number = 0, 1
for x in f:
    count_line += 1
    if count_line % 1000 == 1:
        f1 = open(str(file_number) + '.txt', 'w')
        f1.write(x)
        file_number += 1
    elif count_line % 1000 == 0:
        f1.write(x)
        f1.close()
    else:
        f1.write(x)
f.close()

Compare 2 files and remove any lines in file2 when they match values found in file1

I have two files. I am trying to remove any lines in file2 that match values found in file1. One file has a listing like so:
File1
ZNI008
ZNI009
ZNI010
ZNI011
ZNI012
... over 19463 lines
The second file includes lines that match the items listed in first:
File2
copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI010_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI012_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI009_162635.xml \\server\foldername\version\folder\
... continues listing until line 51360
What I've tried so far:
grep -v -i -f file1.txt file2.txt > f3.txt
does not produce any output to f3.txt or remove any lines. I verified by running
wc -l file2.txt
and the result is
51360 file2.txt
I believe the reason is that there are no exact matches. When I run the following it shows nothing
comm -1 -2 file1.txt file2.txt
Running
( tr '\0' '\n' < file1.txt; tr '\0' '\n' < file2.txt ) | sort | uniq -c | egrep -v '^ +1'
shows only one match, even though I can clearly see there is more than one match.
Alternatively putting all the data into one file and running the following:
grep -Ev "$(cat file1.txt)" 1>LinesRemoved.log
says argument has too many lines to process.
I need to remove lines matching the items in file1 from file2.
I am also trying this in Python:
#!/usr/bin/python
s = set()
# load each line of file1 into memory as elements of a set, 's'
f1 = open("file1.txt", "r")
for line in f1:
    s.add(line.strip())
f1.close()
# open file2 and split each line on "_" separator,
# second field contains the value ZNIxxx
f2 = open("file2.txt", "r")
for line in f2:
    if line[0:4] == "copy":
        fields = line.split("_")
        # check if the field exists in the set 's'
        if fields[1] not in s:
            match = line
        else:
            match = 0
    else:
        if match:
            print match, line,
It is not working well, as I'm getting:
Traceback (most recent call last):
  File "./test.py", line 14, in ?
    if fields[1] not in s:
IndexError: list index out of range
What about:
grep -F -v -f file1 file2 > file3
Here -F makes grep treat each line of file1 as a fixed string rather than a regular expression, -v inverts the match, and -f reads the patterns from file1.
I like the grep solution from byrondrossos better, but here's another option:
sed $(awk '{printf("-e /%s/d ", $1)}' file1) file2 > file3
This is using Bash and GNU sed, because of the -i switch:
cp file2 file3
while read -r; do
sed -i "/$REPLY/d" file3
done < file1
There is surely a better way, but here's a hack around -i :D
cp file2 file3
while read -r; do
(rm file3; sed "/$REPLY/d" > file3) < file3
done < file1
This exploits shell evaluation order.
Alright, I guess the correct way with this idea is to use ed. This should be POSIX too.
cp file2 file3
while read -r line; do
ed file3 <<EOF
/$line/d
wq
EOF
done < file1
In any case, grep seems to be the right tool for the job; @byrondrossos's answer should work well for you ;)
This is admittedly ugly, but it does work. However, the path must be the same for all of them (except, of course, the ZNI### portion). All but the ZNI### of the path is removed so that grep -vf can run correctly on the sorted files.
First, convert "testfile2" to "testfileconverted" so that it shows just the ZNI###:
cat /testfile2 | sed 's:^.*_ZNI:ZNI:g' | sed 's:_.*::g' > /testfileconverted
Second, inverse-grep the converted file against "testfile1" and write the reformatted output to "testfile3":
bash -c 'grep -vf <(sort /testfileconverted) <(sort /testfile1)' | sed "s:^:\copy /Y \\\|server\\\foldername\\\version\\\20050001_:g" | sed "s:$:_162635\.xml \\\|server\\\foldername\\\version\\\folder\\\:g" | sed "s:|:\\\:g" > /testfile3
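For completeness, a Python sketch of the same filtering, with a guard for lines that have no "_" field (which is what raised the IndexError in the question). The file names follow the question; adjust as needed:
# collect the ZNIxxx ids listed in file1
remove_ids = set()
with open("file1.txt") as f1:
    for line in f1:
        remove_ids.add(line.strip())

# keep only the lines of file2 whose ZNI field is not in that set
with open("file2.txt") as f2, open("file3.txt", "w") as out:
    for line in f2:
        fields = line.split("_")
        if len(fields) < 2 or fields[1] not in remove_ids:
            out.write(line)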

Bash or python for changing spacing in files

I have a set of 10000 files. In all of them, the second line looks like:
AAA 3.429 3.84
so there is just one space (a requirement) between AAA and the two other columns. The rest of the lines in each file are completely different and correspond to 10 columns of numbers.
Randomly, in around 20% of the files, and due to some errors, one gets
BBB  3.429 3.84
so now there are two spaces between the first and second column.
This is a big error so I need to fix it, changing from 2 to 1 space in the files where the error takes place.
The first approach I thought of was to write a bash script that, for each file, reads the 3 values of the second line and then prints them with just one space.
I wonder what you think about this approach and whether you could suggest something better, in bash, Python, or some other approach.
Thanks
Performing line-based changes to text files is often simplest to do in sed.
sed -e '2s/  */ /g' infile.txt
will replace any runs of multiple spaces with a single space. This may be changing more than you want, though.
sed -e '2s/^\([^ ]*\)  /\1 /' infile.txt
should just replace instances of two spaces after the first block of space-free text with a single space (though I have not tested this).
(edit: inserted 2 before s in each instance to tie the edit to the second line, specifically.)
Use sed.
for file in *
do
sed -i '' '2s/  / /' "$file"
done
The -i '' flag means to edit in-place without a backup.
Or use ed!
for file in *
do
printf "2s/  / /\nwq\n" |ed -s "$file"
done
If the error always occurs at the 2nd line:
for file in file*
do
awk 'NR==2{$1=$1}1' "$file" > temp
mv temp "$file"
done
or sed
sed -i.bak '2s/  */ /' file* # do 2nd line
Or just pure bash scripting (the unquoted echo $line for line 2 lets word splitting collapse the double space, while the quoted "$line" leaves the other lines untouched):
i=1
while read -r line
do
if [ "$i" -eq 2 ];then
echo $line
else
echo "$line"
fi
((i++))
done <"file"
Since it seems every column is separated by one space, another approach not yet mentioned is to use tr to squeeze all multi spaces into single spaces:
tr -s " " < infile > outfile
I am going to be different and go with AWK:
awk '{print $1,$2,$3}' file.txt > file1.txt
This will handle any number of spaces between fields, and replace them with one space
To handle a specific line you can add line addresses:
awk 'NR==2{print $1,$2,$3} NR!=2{print $0}' file.txt > file1.txt
i.e. rewrite line 2, but leave unchanged the other lines.
A line address can be a regular expression as well:
awk '/regexp/{print $1,$2,$3} !/regexp/{print}' file.txt > file1.txt
This answer assumes you don't want to mess with any line except the second.
#!/usr/bin/env python
import sys, os

for fname in sys.argv[1:]:
    with open(fname, "r") as fin:
        line1 = fin.readline()
        line2 = fin.readline()
        # collapse whitespace runs in the second line down to single spaces
        fixedLine2 = " ".join(line2.split()) + '\n'
        if fixedLine2 == line2:
            continue
        with open(fname + ".fixed", "w") as fout:
            fout.write(line1)
            fout.write(fixedLine2)
            for line in fin:
                fout.write(line)
    # Enable these lines if you want the old files replaced with the new ones.
    #os.remove(fname)
    #os.rename(fname + ".fixed", fname)
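Invoked as, say, python fix_spacing.py *.dat (the script name and glob are just placeholders), it writes a name.fixed copy alongside each file whose second line actually needed the change.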
I don't quite understand, but yes, sed is an option. I don't think any POSIX-compliant version of sed has an in-place option (-i), so a fully POSIX-compliant solution would be:
sed -e 's/^BBB  /BBB /' <file> > <newfile>
Use sed:
sed -e 's/[[:space:]][[:space:]]/ /g' yourfile.txt >> newfile.txt
This will replace any two adjacent spaces with one. The use of [[:space:]] just makes it a little bit clearer
sed -i -e '2s/  / /g' input.txt
-i: edit files in place
