Input File - input.csv
#######A Result:#########
2016-07-27 bar 51 14
2015-06-27 roujri 30 86
#######B Result:#########
2016-08-26 foo 34 83
2016-08-26 foo 34 83
#########################
Output result
A result:
Col-1: 81
Col-2: 100
B result:
Col-1: 68
Col-2: 166
I am trying to solve a problem based on the input and output above. So far I can read only the first block of text. I want a more generic function, so that I only initialise the variable that needs to be read within the block rather than hard-coding the delimiter (e.g. #######A Result:#########), and then pass the block info to another function that sums up the values. Any suggestions will be greatly appreciated. Thanks :)
import re

def reading_block_text_file(infile):
    with open(infile) as fp:
        for result in re.findall('#######A Result:#########(.*?)#######B Result:#########', fp.read(), re.S):
            print result,

reading_block_text_file(input_file)
Throw in a little bit of regex:
$ cat a
#######A Result:#########
2016-07-27 bar 51 14
2015-06-27 roujri 30 86
#######B Result:#########
2016-08-26 foo 34 83
2016-08-26 foo 34 83
#########################
$ cat a.py
import re

col_names = ['abc', 'xyz']

with open("/tmp/a", "r") as f:
    tables = re.findall(r'#+(\w+ Result:)#+([^#]*)', f.read(), re.S)
    for table in tables:
        name = table[0]
        rows = table[1].strip().split('\n')
        print name
        for i in range(len(col_names)):
            print "\t{}: {}".format(col_names[i], sum(map(lambda x: int(x.split()[i + 2]), rows)))
$ python a.py
A Result:
abc: 81
xyz: 100
B Result:
abc: 68
xyz: 166
Regex Explanation:
#+(\w+ Result:)#+([^#]*)
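Reading the pattern left to right: `#+` matches the run of hashes, `(\w+ Result:)` captures the block name, the second `#+` matches the closing hashes, and `([^#]*)` captures everything up to the next hash. Here is a minimal Python 3 sketch of the "parse the blocks, then pass them to a summing function" idea from the question. The names `parse_blocks` and `sum_columns` are made up for illustration, and the sketch assumes the numeric columns start at the third field, as in the sample input.

```python
import re

# Header line such as "#######A Result:#########", followed by the block body.
BLOCK_RE = re.compile(r'#+(\w+ Result:)#+([^#]*)')

def parse_blocks(text):
    """Return {block name: list of rows}, each row split on whitespace."""
    blocks = {}
    for name, body in BLOCK_RE.findall(text):
        blocks[name] = [line.split() for line in body.strip().splitlines()]
    return blocks

def sum_columns(rows, first_col=2):
    """Sum every column from first_col onwards (assumed to hold integers)."""
    numeric = (row[first_col:] for row in rows)
    return [sum(int(value) for value in column) for column in zip(*numeric)]

sample = """#######A Result:#########
2016-07-27 bar 51 14
2015-06-27 roujri 30 86
#######B Result:#########
2016-08-26 foo 34 83
2016-08-26 foo 34 83
#########################
"""

for name, rows in parse_blocks(sample).items():
    print(name, sum_columns(rows))
```

Keeping the parsing and the summing separate means the delimiter is described once in the regex, and the block names never need to be hard-coded.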
I have the following task: I have to find a specific pattern (word) in my file.txt (a song, centered on the page) and print the row number plus each row that contains the pattern, stripping the leading spaces.
You can see the correct output here:
92 Meant in croaking "Nevermore."
99 She shall press, ah, nevermore!
107 Quoth the Raven, "Nevermore."
115 Quoth the Raven, "Nevermore."
and without this: my_str += ' '+str(count)+ ' ' + line.lstrip(), it will print:
92 Meant in croaking "Nevermore."
99 She shall press, ah, nevermore!
107 Quoth the Raven, "Nevermore."
115 Quoth the Raven, "Nevermore."
This is my code, but I want to have only 4 lines of code:
```python
def find_in_file(pattern,filename):
    my_str = ''
    with open(filename, 'r') as file:
        for count,line in enumerate(file):
            if pattern in line.lower():
                if count >= 10 and count <= 99:
                    my_str += ' '+str(count)+ ' ' + line.lstrip()
                else:
                    my_str += str(count)+ ' ' + line.lstrip()
    print(my_str)
```
In fact, it can be done in one line:
''.join(f' {count} {line.lstrip()}' if 10 <= count <= 99 else f'{count} {line.lstrip()}' for count, line in enumerate(file) if pattern in line.lower())
However, this seems a little too long...
Following a suggestion from the comments, it can be simplified:
''.join(f'{count:3} {line.lstrip()}' for count, line in enumerate(file) if pattern in line.lower())
def find_in_file(pattern,filename):
    with open(filename, 'r') as file:
        # 0 based line numbering, for 1 based use enumerate(file,1)
        for count,line in enumerate(file):
            if pattern in line.lower():
                print(f"{count:>3} {line.strip()}")
This is 4 lines of code (inside the function) and should be equivalent to what you have.
Possible in one line as well (once the file is open):
def find_in_file(pattern,filename):
    # 1 based line numbering
    with open(filename, 'r') as file:
        return '\n'.join(f'{count:>3} {line.strip()}' for count, line in enumerate(file,1) if pattern in line.lower())
See Python's format specification mini-language.
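As a quick illustration of the width specifier, here is a small sketch. The sample numbers are arbitrary and the trailing `|` is only there to make the padding visible.

```python
# Width 3 in the format spec pads short numbers on the left,
# so 1-, 2- and 3-digit line numbers line up in one column.
for count in (7, 42, 115):
    print(f'{count:>3}|')

# The same padded strings collected in a list.
rows = [f'{count:>3}' for count in (7, 42, 115)]
```

Numbers are right-aligned by default, so `{count:3}` and `{count:>3}` give the same result here; the explicit `>` matters only for strings, which are left-aligned by default.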
You can use formatted strings to make sure the numbers always use three characters, even when they have only 1 or 2 digits.
I also prefer to use str.strip rather than str.lstrip, to get rid of trailing whitespace; in particular, lines read from the file will typically end with a linebreak, and then print will add a second linebreak, and we end up with too many linebreaks if we don't strip them away.
def find_in_file(pattern,filename):
    with open(filename, 'r') as file:
        for count,line in enumerate(file):
            if pattern in line.lower():
                print('{:3d} {}'.format(count, line.strip()))

find_in_file('nevermore','theraven.txt')
# 55 Quoth the Raven "Nevermore."
# 62 With such name as "Nevermore."
# 69 Then the bird said "Nevermore."
# 76 Of 'Never—nevermore'."
# 83 Meant in croaking "Nevermore."
# 90 She shall press, ah, nevermore!
# 97 Quoth the Raven "Nevermore."
# 104 Quoth the Raven "Nevermore."
# 111 Quoth the Raven "Nevermore."
# 118 Quoth the Raven "Nevermore."
# 125 Shall be lifted—nevermore!
How can I print the first 52 numbers in the first column, the next 52 in the second column, and so on, for 6 columns in total, with the pattern then repeating? I have many float numbers, listed in lines and separated by single spaces in a file.txt document. Each column should hold 52 consecutive numbers before a new column starts. So in the end I want to have:
1 53 105 157 209 261
2
...
52 104 156 208 260 312
313 ... ... ... ... ...
...(another 52 numbers and so on)
I have tried this:
with open('file.txt') as f:
    line = f.read().split()
    line1 = "\n".join("%-20s %s"%(line[i+len(line)/52],line[i+len(line)/6]) for i in range(len(line)/6))
    print(line1)
However, this of course prints only 2 columns of numbers. I have tried adding line[i+len(line)/52] six times, but the code still does not work.
for row in range(52):
    for col in range(6):
        print line[row + 52*col],   # Dangling comma to stay on this line
    print   # Now go to the next line
Granted, you can do this in more Pythonic ways, but this will show you the algorithm structure and let you tighten the code as you wish.
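One possible way to tighten it, as a sketch in Python 3: the function name `column_blocks` is made up, and it assumes the numbers have already been split into a list. Unlike the loop above, it also repeats the layout for every further block of 52*6 values and tolerates a short final block.

```python
def column_blocks(values, rows=52, cols=6):
    """Lay values out column-major in blocks of rows*cols items.

    Returns a list of text lines; a short final block just gives shorter lines.
    """
    block_size = rows * cols
    lines = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        for row in range(rows):
            # Items row, row+rows, row+2*rows, ... share one printed line.
            cells = block[row::rows]
            if cells:
                lines.append(" ".join(str(v) for v in cells))
    return lines

# Small demonstration with 3 rows and 2 columns instead of 52 and 6.
for text in column_blocks(list(range(1, 9)), rows=3, cols=2):
    print(text)
```

The slice `block[row::rows]` picks every `rows`-th item starting at `row`, which is exactly the `row + 52*col` indexing from the nested loop.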
I'm trying to extract two ranges per line of an opened text file in python 3 by looping through.
The application has an Entry widget whose value is stored in self.search. I then loop through a text file containing the values I want, write the results to self.searchresults, and display them in a Textbox.
I've tried variations of the third line below but am not getting anywhere.
I want to write out characters in each line from 3:24 and 81:83 ...
for line in self.searchfile:
    if self.search in line:
        line = line[3:24]+[81:83]
        self.searchresults.write(line+"\n")
Here's an abridged version of the text file I'm working with (original here):
! Author: Greg Thompson NCAR/RAP
! please mail corrections to gthompsn (at) ucar (dot) edu
! Date: 24 Feb 2015
! This file is continuously maintained at:
! http://www.rap.ucar.edu/weather/surface/stations.txt
!
! [... more comments ...]
ALASKA 19-SEP-14
CD STATION ICAO IATA SYNOP LAT LONG ELEV M N V U A C
AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7 US
AK AKHIOK PAKH AKK 56 56N 154 11W 14 X 8 US
AK AKUTAN PAUT 54 09N 165 36W 25 X 7 US
AK AMBLER PAFM AFM 67 06N 157 51W 88 X 7 US
AK ANAKTUVUK PASS PAKP AKP 68 08N 151 44W 642 X 7 US
AK ANCHORAGE INTL PANC ANC 70273 61 10N 150 01W 38 X T X A 5 US
You're not far off - your problem is that you need to specify what you're slicing each time you slice it:
for line in self.searchfile:
    if self.search in line:
        line = line[3:24] + line[81:83]
        self.searchresults.write(line+"\n")
... but you'll probably want to separate the two fields with a space:
line = line[3:24] + " " + line[81:83]
However, if you find yourself using + more than once to construct a string, you should think about using Python's built-in string-formatting abilities instead (and while you're at it, you can add that newline in the same operation):
for line in self.searchfile:
    if self.search in line:
        formatted = "%s %s\n" % (line[3:24], line[81:83])
        self.searchresults.write(formatted)
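If more than two ranges are ever needed, the slicing can be pulled into a small helper. This is only a sketch: `pick_fields` is a made-up name, the sample record is constructed by hand to be 83 characters wide rather than copied from the real file, and stripping the padding from each field is my own addition.

```python
def pick_fields(line, ranges, sep=" "):
    """Join the slices line[start:stop] for each (start, stop) pair."""
    return sep.join(line[start:stop].strip() for start, stop in ranges)

# Hypothetical 83-character record: a name field at 3:24 and a country code at 81:83.
record = "AK " + "ANCHORAGE INTL".ljust(21) + " " * 57 + "US"
print(pick_fields(record, [(3, 24), (81, 83)]))
```

Inside the loop this would read `self.searchresults.write(pick_fields(line, [(3, 24), (81, 83)]) + "\n")`.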
I know there are questions on how to extract numbers from a text file, which have helped partially. Here is my problem. I have a text file that looks like:
Some crap here: 3434
A couple more lines
of crap.
34 56 56
34 55 55
A bunch more crap here
More crap here: 23
And more: 33
54 545 54
4555 55 55
I am trying to write a script that extracts the lines with the three numbers and puts each chunk into a separate text file. For example, I'd have one file:
34 56 56
34 55 55
And another file:
54 545 54
4555 55 55
Right now I have:
for line in file_in:
    try:
        float(line[1])
        file_out.write(line)
    except ValueError:
        print "Just using this as placeholder"
This successfully puts both chunks of numbers into a single file. But I need it to put one chunk in one file, and another chunk in another file, and I'm lost on how to accomplish this.
You didn't specify which version of Python you are using, but you might approach it this way in Python 2.7.
string.translate takes a translation table (which can be None) and a group of characters to translate (or delete if table is None).
You can set your delete_chars to everything but 0-9 and space by slicing string.printable correctly:
>>> import string
>>> remove_chars = string.printable[10:-6] + string.printable[-4:]
>>> string.translate('Some crap 3434', None, remove_chars)
'  3434'
>>> string.translate('34 45 56', None, remove_chars)
'34 45 56'
Adding a strip to trim white space on the left and right and iterating over a testfile containing the data from your question:
>>> with open('testfile.txt') as testfile:
...     for line in testfile:
...         trans = line.translate(None, remove_chars).strip()
...         if trans:
...             print trans
...
3434
34 56 56
34 55 55
23
33
54 545 54
4555 55 55
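For Python 3, where `string.translate` no longer exists and `str.translate` takes a single table argument, a roughly equivalent sketch could use `str.maketrans`. The helper name `digits_only` is made up, and unlike the slicing above, this version builds the delete set explicitly and also deletes tabs.

```python
import string

# Delete every printable character that is not a digit or a space.
keep = string.digits + " "
remove_chars = "".join(c for c in string.printable if c not in keep)
table = str.maketrans("", "", remove_chars)

def digits_only(line):
    """Strip everything except digits and spaces, then trim the ends."""
    return line.translate(table).strip()

print(repr(digits_only("Some crap here: 3434\n")))
print(repr(digits_only("34 56 56\n")))
```

As in the Python 2.7 session, lines with no digits collapse to an empty string and can be skipped with a simple truth test.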
You can use a regex here, but this requires reading the whole file into a variable with file.read() or similar (fine if the file is not huge).
((?:(?:\d+ ){2}\d+(?:\n|$))+)
See demo.
https://regex101.com/r/tX2bH4/20
import re
p = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)', re.IGNORECASE)
test_str = "Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n34 55 55\nA bunch more crap here\nMore crap here: 23\nAnd more: 33\n54 545 54\n4555 55 55"
re.findall(p, test_str)
re.findall returns a list. You can easily write each item of the list to a new file.
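A rough Python 3 sketch of that last step, writing each match to its own file: the function name `write_chunks` and the `chunk1.txt`, `chunk2.txt` naming are assumptions for illustration, and a temporary directory is used so the demonstration leaves nothing behind.

```python
import os
import re
import tempfile

CHUNK_RE = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)')

def write_chunks(text, out_dir, prefix="chunk"):
    """Write each run of three-number lines to its own file; return the paths."""
    paths = []
    for index, chunk in enumerate(CHUNK_RE.findall(text), start=1):
        path = os.path.join(out_dir, f"{prefix}{index}.txt")
        with open(path, "w") as out:
            out.write(chunk)
        paths.append(path)
    return paths

test_str = ("Some crap here: 3434\nA couple more lines\nof crap.\n"
            "34 56 56\n34 55 55\nA bunch more crap here\n"
            "More crap here: 23\nAnd more: 33\n54 545 54\n4555 55 55")

with tempfile.TemporaryDirectory() as tmp:
    for path in write_chunks(test_str, tmp):
        with open(path) as f:
            print(os.path.basename(path), repr(f.read()))
```

Note that the pattern is not anchored to the start of a line, so a text line that happens to end in three space-separated numbers would also be picked up.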
To check whether a string is a number, you can use str.isdigit:
split = 0            # index of the current output file
should_split = True  # start a new file at the next numeric line
for line in file_in:
    # split line to parts
    parts = line.split()
    # check the line is non-empty and all parts are numbers
    if parts and all(part.isdigit() for part in parts):
        if should_split:
            split += 1
            # don't split again until we skip a line
            should_split = False
        with open('split%d' % split, 'a') as f:
            f.write(line)
    else:
        # skipped line means we should split
        should_split = True
I have an output that looks like this, where the first number corresponds to the count of the type below (e.g. 72 for Type 4, etc)
72
Type
4
51
Type
5
66
Type
6
78
Type
7
..etc
Is there a way to organize this data to look something like this:
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
etc..
Essentially, the question is how to take a single column of data and organize it into something more readable using bash, awk, python, etc. (ideally in bash, but I'm interested to know how to do it in Python too).
Thank you.
Use paste to join 3 consecutive lines from stdin, then just rearrange the fields.
paste - - - < file | awk '{print $2, $3, "=", $1, "times"}'
It's simple enough with Python to read three lines of data at a time:
def perthree(iterable):
    return zip(*[iter(iterable)] * 3)

with open(inputfile) as infile:
    for count, type_, type_num in perthree(infile):
        print('{} {} = {} times'.format(type_.strip(), type_num.strip(), count.strip()))
The .strip() calls remove any extra whitespace, including the newline at the end of each line of input text.
Demo:
>>> with open(inputfile) as infile:
...     for count, type_, type_num in perthree(infile):
...         print('{} {} = {} times'.format(type_.strip(), type_num.strip(), count.strip()))
...
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
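In case the `zip(*[iter(iterable)] * 3)` idiom looks opaque, here is the same idea spelled out as a sketch. The list `lines` is a hand-made stand-in for the file, with one leftover line at the end to show what happens to incomplete groups.

```python
def perthree(iterable):
    # One iterator, referenced three times: each tuple zip builds
    # pulls three consecutive items from the same underlying stream.
    shared = iter(iterable)
    return zip(shared, shared, shared)

lines = ["72\n", "Type\n", "4\n", "51\n", "Type\n", "5\n", "66\n"]
for count, type_, type_num in perthree(lines):
    print(f"{type_.strip()} {type_num.strip()} = {count.strip()} times")
```

Because `zip` stops as soon as any of its inputs runs dry, a trailing group with fewer than three lines is silently dropped.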
In Bash:
#!/bin/bash
A=() I=0
while read -r LINE; do
    if (( (M = ++I % 3) )); then
        A[M]=$LINE
    else
        printf "%s %s = %s times\n" "${A[2]}" "$LINE" "${A[1]}"
    fi
done
Running bash script.sh < file creates:
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
Note: With a default IFS ($' \t\n'), read would remove leading and trailing spaces by default.
Try this awk one liner:
$ awk 'NR%3==1{n=$1}NR%3==2{t=$1}NR%3==0{print t,$1,"=",n,"times"}' file
Type 4 = 72 times
Type 5 = 51 times
Type 6 = 66 times
Type 7 = 78 times
How does it work?
awk '
NR%3==1{ # if we are on lines 1,4,7, etc. (NR is the record number, i.e. the line number)
    n=$1 # set the variable n to the first (and only) word
}
NR%3==2{ # if we are on lines 2,5,8, etc.
    t=$1 # set the variable t to the first (and only) word
}
NR%3==0{ # if we are on lines 3,6,9, etc.
    print t,$1,"=",n,"times" # print the desired output
}' file