I have 50000 files and each one has 10000 lines. Each line is in the form:
value_1 (TAB) value_2 (TAB) ... value_n
I wanted to remove specific values from every line in every file (I used cut to remove values 14-17) and write the results to a new file.
To do that for one file, I wrote this code:
file=nameOfFile
newfile=$file".new"
i=0
while read line
do
    let i=i+1
    echo line: $i
    a=$i"p"
    lineFirstPart=$(sed -n -e $a $file | cut -f 1-13)
    #echo lineFirstPart: $lineFirstPart
    lineSecondPart=$(sed -n -e $a $file | cut -f 18-)
    #echo lineSecondPart: $lineSecondPart
    newline=$lineFirstPart$lineSecondPart
    echo $newline >> $newfile
done < $file
This takes ~45 secs for one file, which means all of them will take about 45 s x 50,000 = 2,250,000 s ~= 625 h ~= 26 days!
Well, I need something faster, e.g. a solution that cats the whole file and applies the two cut commands simultaneously, or something like that, I guess.
Solutions in Python are also accepted and appreciated, but bash scripting is preferable!
The entire while loop can be replaced with one line:
cut -f1-13,18- $file > $newfile
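And since there are 50,000 files, a minimal sketch to run that over all of them (assuming they sit in one directory; adjust the *.txt glob to your actual names):
# Apply the one-line cut to every input file, writing FILE.new next to FILE
for file in *.txt; do
    cut -f1-13,18- "$file" > "$file.new"
done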
I am trying to write a bash script to use in Python code.
Multi-line bash command (this works perfectly when run directly from the terminal):
mydatefile="/var/tmp/date"
while IFS= read line
do
    echo $line
    sh /opt/setup/Script/EPS.sh $(echo $line) | grep "WORD" | awk -F ',' '{print $6}'
    sleep 1
done <"$mydatefile"
My single-line conversion:
mydatefile="/var/tmp/date;" while IFS= read line do echo $line; sh /opt/setup/Script/EPS.sh $(echo $line) | grep "WORD" | awk -F ',' '{print $6}'; sleep 1; done <"$mydatefile";
ERROR
-bash: syntax error near unexpected token `done'
Missing a ; (fatal syntax error):
while IFS= read line; do echo ...
#                   ^
#                   here
More in depth:
I combined grep + awk into a single command:
mydatefile="/var/tmp/date"
while IFS= read line; do
echo "$line"
sh /opt/setup/Script/EPS.sh "$line" |
awk -F ',' '/WORD/{print $6}'
sleep 1
done < "$mydatefile"
Use more quotes!
Learn how to quote properly in shell; it's very important:
"Double quote" every literal that contains spaces/metacharacters and every expansion: "$var", "$(command "$var")", "${array[@]}", "a & b". Use 'single quotes' for code or literal $'s: 'Costs $5 US', ssh host 'echo "$HOSTNAME"'. See:
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/Arguments
http://wiki.bash-hackers.org/syntax/words
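A quick illustration of why the quotes matter (hypothetical values):
var="two  spaces"
echo $var     # unquoted: word splitting collapses the run of spaces
echo "$var"   # quoted: prints 'two  spaces' intact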
Finally:
mydatefile="/var/tmp/date"; while IFS= read line; do echo "$line"; sh /opt/setup/Script/EPS.sh "$line" | awk -F ',' '/WORD/{print $6}'; sleep 1; done < "$mydatefile"
(Note that the stray semicolon inside the quotes of the original, "/var/tmp/date;", had to move outside them.)
One way to do this conversion might be to paste the script onto the command line, then retrieve it from the history with the up arrow (though this might depend on the version of command-line editor you have). Note that you do need a semicolon before do, but NOT after. You are punished for too many semicolons as well as too few.
Another way would be to fold your script one line at a time, testing as you go.
The binary-chop approach is to fold the first half, test, then undo or continue.
Once you have it down to one line that works, you can paste it into Python.
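As a minimal check of the semicolon rule, here is the skeleton of that loop folded correctly; a ';' replaces each newline that ended a command, but none goes after do:
while IFS= read line; do echo "$line"; done < "$mydatefile"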
I am trying to make a Python program based on the 20w14infinite Minecraft snapshot. The 'world gen.' was going to be done in Python 3 using os.system(), but the lines were very long, so I made an SH script to make the worlds for me. It should append a random number between 0 and 32767, the signed 16-bit limit, to the end of a file.
Here's my code:
Python:
# imports
import random
import os
# variables
game_name = "testing-world"
# functions
def mk_world():
    os.system(f"./mk_world.sh {game_name}")

mk_world()
Bash (mk_world.sh):
#!/bin/bash
game_name=$1
cd ./games/$game_name/worlds/
seed=$RANDOM
mkdir $seed
cd $seed
touch world.dimension
echo $RANDOM
ls
for i in {1..100}; do
    echo $RANDOM > world.dimension
done
cat world.dimension
for i in {1..100}; do
    echo $RANDOM > world.dimension
done
This part will execute "echo $RANDOM > world.dimension" 100 times, and the ">" redirection means that world.dimension will be overwritten each time, so you should use ">>" to append to the file.
Possibly you want to do just the following:
echo $RANDOM >> world.dimension
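With that change, the loop appends one random number per iteration:
for i in {1..100}; do
    echo $RANDOM >> world.dimension
done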
A possible Python solution would be something like:
import random
# Open world.dimension in append mode
with open("world.dimension", "a") as world_dimension:
    # 100 times
    for i in range(100):
        # Write a random integer between 0 and 32767 to the file and
        # append a trailing newline character to split the lines
        world_dimension.write("{}\n".format(random.randint(0, 32767)))
I'm looking for a way to rename a list of image files, whose numbering has gaps, to be sequential. I also want to give them zero-padding of 4 digits. I'm currently using Python 2.7 and Linux bash to program this.
Example:
1.png
2.png
3.png
20.png
21.png
50.png
Should turn into:
0001.png
0002.png
0003.png
0004.png
0005.png
0006.png
I would also like the file names to be prefixed with the directory that they are currently in.
Example:
c_users_johnny_desktop_images.0001.png
c_users_johnny_desktop_images.0002.png
c_users_johnny_desktop_images.0003.png
c_users_johnny_desktop_images.0004.png
c_users_johnny_desktop_images.0005.png
c_users_johnny_desktop_images.0006.png
Any help would be greatly appreciated! :)
Cheers
This is Python:
import os

# first collect all files that start with a number and end with .png
my_files = [f for f in os.listdir(some_directory) if f[0].isdigit() and f.endswith(".png")]
# sort them based on the number
sorted_files = sorted(my_files, key=lambda x: int(x.split(".")[0]))  # sort the file names by starting number
# rename them sequentially
for i, fn in enumerate(sorted_files, 1):  # thanks wim
    os.rename(os.path.join(some_directory, fn),
              os.path.join(some_directory, "{0:04d}.png".format(i)))
I could have used list.sort(key=...) to sort in place, but I figured this way is marginally more verbose and more readable...
Try doing this in a shell:
rename -n '
$s = substr(join("_", split("/", $ENV{PWD})), 1) . ".";
s/(\d+)\.png/$s . sprintf("%04d", ++$c) . ".png"/e
' *.png
Output :
1.png -> c_users_johnny_desktop_images.0001.png
2.png -> c_users_johnny_desktop_images.0002.png
3.png -> c_users_johnny_desktop_images.0003.png
20.png -> c_users_johnny_desktop_images.0004.png
21.png -> c_users_johnny_desktop_images.0005.png
50.png -> c_users_johnny_desktop_images.0006.png
rename is http://search.cpan.org/~pederst/rename/ and is the default rename command on many distros.
Once you have tested the command, you can remove the -n (dry-run) switch to do it for real.
Blah Blah Blah. CSH is bad. BASH is good. Python is better. Bah humbug. I still use TCSH...
% set i = 1
% foreach FILE ( `ls *[0-9].png | sort -n` )
echo mv $FILE `printf %04d $i`.png ; @ i++
end
Output:
mv 1.png 0001.png
mv 2.png 0002.png
mv 3.png 0003.png
mv 20.png 0004.png
mv 21.png 0005.png
mv 50.png 0006.png
Responding to comments:
Still need c_users_johnny_desktop_images.
Ok, so use:
echo mv $FILE c_users_johnny_desktop_images.`printf %04d $i`.png ; @ i++
It's not like my example was hard to read.
Correction: Perhaps you meant to automatically extract the current directory name and incorporate it, e.g.:
echo mv $FILE `echo $cwd | sed -e 's|^/||' -e 's|/|_|g'`.`printf %04d $i`.png ; @ i++
Are globs not present in tcsh? Your parsing of ls seems scary.
Of course globs are present. That's what we are passing into ls. But globbing gives us a list that is sorted alphabetically, as in 1,2,20,21,3,50. We want a numerical sort, as in 1,2,3,20,21,50. Standard problem when we don't have leading zeros in the numbers.
sort -n does a numeric sort. ls gives us a newline after each filename. We could just as easily write:
foreach FILE ( `echo *[0-9].png | tr ' ' '\012' | sort -n` )
But I'm lazy and ls does the newline for me. What's so scary about it?
I have some files consisting of end-of-day stock data in the following format:
Filename: NYSE_20120116.txt
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120116,36.15,36.36,35.59,36.19,3327400
AA,20120116,10.73,10.78,10.53,10.64,20457600
How can I create files for every symbol?
For example for the company A
Filename : A.txt
<ticker>,<date>,<open>,<high>,<low>,<close>,<vol>
A,20120116,36.15,36.36,35.59,36.19,3327400
A,20120117,39.76,40.39,39.7,39.99,4157900
You want to split the first file at record level, then route each row to a different file based on the value of the first field?
# To skip first line, see later
cat endday.txt | while read line; do
    # Careful with backquotes here - they're not quote signs
    # If supported, use:
    # symbol=$( echo "$line" | cut -f1 -d, )
    symbol=`echo "$line" | cut -f1 -d,`
    # If file is not there, create it with a header
    # if [ ! -r $symbol.txt ]; then
    #     head -n 1 endday.txt > $symbol.txt
    # fi
    echo "$line" >> $symbol.txt
done
Not very efficient: Perl or Python would have been better.
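For instance, a single awk pass avoids the per-line subshells entirely. A sketch, assuming the same endday.txt input and that appending to existing <ticker>.txt files is acceptable:
# One pass: skip the header (NR > 1) and append each row to <ticker>.txt.
# (With very many distinct tickers you may need to close() files as you go.)
awk -F, 'NR > 1 { print >> ($1 ".txt") }' endday.txt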
If you have several files in a directory (mind you, you have to remove them yourself, or they will be processed again and again...), you can do:
for file in *.txt; do
    echo "Now processing $file..."
    # A quick and dirty way of ignoring line number 1 --- start at line 2.
    tail -n +2 $file | while read line; do
        # Careful with backquotes here - they're not quote signs
        # If supported, use:
        # symbol=$( echo "$line" | cut -f1 -d, )
        symbol=`echo "$line" | cut -f1 -d,`
        # If file is not there, create it with a header
        # if [ ! -r $symbol.csv ]; then
        #     head -n 1 $file > $symbol.csv
        # fi
        # Output file is named .CSV so as not to create new .txt files
        # which this script might find
        echo "$line" >> $symbol.csv
    done
    # Change the name from .txt to .txt.ok, so it won't be found again
    mv $file $file.ok
    # or better, move it elsewhere to avoid clogging this directory
    # mv $file /var/data/files/already-processed
done
I would like to create 1000+ text files with some text in them, to test a script. How can I create that many text files in one go, using a shell script or Perl? Could anyone please help me?
for i in {0001..1000}
do
echo "some text" > "file_${i}.txt"
done
Or, if you want to use Python (< 2.6):
for x in range(1000):
open("file%03d.txt" % x,"w").write("some text")
#!/bin/bash
seq 1 1000 | split -l 1 -a 3 -d - file
The above will create 1000 files, each containing one number from 1 to 1000. The files will be named file000 through file999.
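For example:
$ seq 1 1000 | split -l 1 -a 3 -d - file
$ cat file000
1
$ cat file999
1000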
In Perl:
use strict;
use warnings;
for my $i (1..1000) {
    open(my $out, ">", sprintf("file%04d", $i));
    print $out "some text\n";
    close $out;
}
Why the first two lines? Because they are good practice, so I use them even in one-shot programs like these.
For variety:
#!/usr/bin/perl
use strict; use warnings;
use File::Slurp;
write_file $_, "$_\n" for map sprintf('file%04d.txt', $_), 1 .. 1000;
#!/bin/bash
for suf in $(seq -w 1000)
do
    cat << EOF > myfile.$suf
this is my text file
there are many like it
but this one is mine.
EOF
done
I don't know about shell or Perl, but in Python it would be:
#!/usr/bin/python
for i in xrange(1000):
    with open('file%0.3d' % i, 'w') as fd:
        fd.write('some text')
I think what it does is pretty straightforward.
You can use only Bash with no externals and still be able to pad the numbers so the filenames sort properly (if needed):
read -r -d '' text << 'EOF'
Some text for
my files
EOF
for i in {1..1000}
do
    printf -v filename "file%04d" "$i"
    echo "$text" > "$filename"
done
Bash 4 can do it like this:
for filename in file{0001..1000}; do echo "$text" > "$filename"; done
Both versions produce filenames like "file0001" and "file1000".
Just take any big file that has more than 1000 bytes (for 1000 files with content). There are lots of them on your computer. Then do (for example):
split -n 1000 /usr/bin/firefox
This is instantly fast.
Or a bigger file:
split -n 10000 /usr/bin/cat
This took only 0.253 seconds for creating 10000 files.
For 100k files:
split -n 100000 /usr/bin/gcc
Only 1.974 seconds for 100k files with about 5 bytes each.
If you only want files with text, look at your /etc directory. Create one million text files with almost random text:
split -n 1000000 /etc/gconf/schemas/gnome-terminal.schemas
20.203 seconds for 1M files with about 2 bytes each. If you divide this big file into only 10k parts, it takes only 0.220 seconds and each file gets 256 bytes of text.
Here is a short command-line Perl program:
perl -E'say $_ $_ for grep {open $_, ">f$_"} 1..1000'
(For each of 1..1000 it opens a file f<n> through a filehandle named after the number, then says the number into its own file.)