Collapse rows based on column 1 - python

I want to parse InterProScan results for TopGO R package.
I would like to produce a file in a slightly different format from what I have.
# input file (gene_ID GO_ID1, GO_ID2, GO_ID3, ....)
Q97R95 GO:0004349, GO:0005737, GO:0006561
Q97R95 GO:0004349, GO:0006561
Q97R95 GO:0005737, GO:0006561
Q97R95 GO:0006561
# desired output (removed duplicates and rows collapsed)
Q97R95 GO:0004349,GO:0005737,GO:0006561
You can test your tool with the whole data file here:
https://drive.google.com/file/d/0B8-ZAuZe8jldMHRsbGgtZmVlZVU/view?usp=sharing

You can make use of GNU awk's two-dimensional arrays:
awk -F'[, ]+' '{for(i=2;i<=NF;i++)r[$1][$i]}
END{for(x in r){
printf "%s ",x;b=0;
for(y in r[x]){printf "%s%s",(b?",":""),y;b=1}
print ""}
}' file
It gives:
Q97R95 GO:0005737,GO:0006561,GO:0004349
The duplicate fields are removed; however, the order is not preserved.
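If you'd rather do the collapsing in Python (as the question title suggests) while keeping the first-seen order of the GO terms, here is a minimal sketch; reading the file name from the command line is my assumption, not part of the original post:
import sys
from collections import OrderedDict

merged = OrderedDict()
with open(sys.argv[1]) as fh:
    for line in fh:
        if not line.strip():
            continue
        gene, _, terms = line.partition(' ')
        bucket = merged.setdefault(gene, OrderedDict())
        for term in terms.split(','):
            term = term.strip()
            if term:
                bucket[term] = None   # OrderedDict keys act as an ordered set

for gene, terms in merged.items():
    print('%s %s' % (gene, ','.join(terms)))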

Here is a, hopefully tidy, Perl solution. It preserves order of keys and values as far as possible, and doesn't keep the whole file contents in memory, only as much as necessary to do the job.
#!perl
use strict;
use warnings;

my ($prev_key, @seen_values, %seen_values);

while (<>) {
    # Parse the input
    chomp;
    my ($key, $values) = split /\s+/, $_, 2;
    my @values = split /,\s*/, $values;

    # If we have a new key...
    if ($key ne $prev_key) {
        # output the old data, as long as there is some,
        if (@seen_values) {
            print "$prev_key\t", join(", ", @seen_values), "\n";
        }
        # clear it out,
        @seen_values = %seen_values = ();
        # and remember the new key for next time.
        $prev_key = $key;
    }

    # Merge this line's values with previous ones, de-duplicating
    # but preserving order.
    for my $value (@values) {
        push @seen_values, $value unless $seen_values{$value}++;
    }
}

# Output what's left after the last line
if (@seen_values) {
    print "$prev_key\t", join(", ", @seen_values), "\n";
}

Related

Adding text in the middle of the file

I have these files:
actions.js - append before }
import {constants} from "./constants";
export const setUser = (value) => ({
type: constants.SET_USER,
payload: value,
});
//here
constants.js - append to the end
export const constants = {
SET_USER: "SET_USER",
//here
};
reducers.js - add a const above export and inside the combineReducers object
import {constants} from "./constants";
import {combineReducers} from "redux";
const user = (state = null, action) => action.type === constants.SET_USER ? action.payload : state;
//here
export const reducers = combineReducers({
user,
// here
})
And I want to add code to these files at the places where I put //here. How can I do that with Python? I know I can overwrite a file with open('file', 'w').write('string'), but how can I add text without losing and overwriting the existing contents? I don't want to create or overwrite the file; I want it to keep the old text and have the new text added to it. How can I achieve this with Python?
I made it append to the actions.js like this:
import sys
import os
reducer = sys.argv[1]
constant = "SET_{}".format(reducer.upper())  # e.g. "user" -> "SET_USER"
open("actions.js", "a").write("""export const set{reducer} = (value) => ({{
type: constants.{constant},
payload: value,
}});
""".format(reducer=reducer.capitalize(), constant=constant))
But I have no idea how to do the others.
Read the file, slice the string at the index you want, concatenate the pieces in order, and then write back to the file with the cursor at 0. Let x.txt be your file. The "export" passed to index() here is a unique, non-repeating word; you can use unique comments to slice the string at the respective positions.
with open("x.txt","r+") as f:
old=f.read()
print(old)
constant_text= "What you want to add??"
result=old[0:old.index("export")] + constant_text + old[old.index("export"):]
# print(result)
f.seek(0)
f.write(result)
print("######################################")
print(result)
Make sure the index keywords are unique if you want to slice in multiple locations using keywords!
To my knowledge, this is not possible in a single operation the way you suggest. My solution of choice would be to iterate over the file's lines and, once you hit your //here marker, insert the code.
new_content = ""
with open(file_name) as f:
    for line in f.readlines():
        new_content += line
        if line.strip() == "// here":
            new_content += text_to_insert
After this loop, new_content should hold the old text with the new text* inserted at the right place, which you can then write to any file you like.
*assuming that your input is properly formatted, including line breaks and so on.
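If it helps, here is a small sketch that wraps the loop above into a reusable helper and overwrites the original file with the result; the marker string and the example call are just illustrations:
def insert_at_marker(file_name, marker, text_to_insert):
    new_content = ""
    with open(file_name) as f:
        for line in f:
            new_content += line
            if line.strip() == marker:
                new_content += text_to_insert
    # Write the old text plus the inserted snippet back over the original file.
    with open(file_name, "w") as f:
        f.write(new_content)

# e.g. insert_at_marker("constants.js", "//here", 'SET_NAME: "SET_NAME",\n')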

How can I elegantly combine/concat files by section with python?

Like many an unfortunate programmer soul before me, I am currently dealing with an archaic file format that refuses to die. I'm talking ~1970 format specification archaic. If it were solely up to me, we would throw out both the file format and any tool that ever knew how to handle it, and start from scratch. I can dream, but unfortunately that won't resolve my issue.
The format: pretty loosely defined, as years of nonsensical revisions have destroyed almost all the backward compatibility it once had. Basically, the only constant is that there are section headings, with few rules about what comes before or after these lines. The headings are sequential (e.g. HEADING1, HEADING2, HEADING3, ...), but not numbered and not required (e.g. HEADING1, HEADING3, HEADING7). Thankfully, all possible heading permutations are known. Here's a fake example:
# Bunch of comments
SHOES # First heading
# bunch text and numbers here
HATS # Second heading
# bunch of text here
SUNGLASSES # Third heading
...
My problem: I need to concatenate multiple of these files by these section headings. I have a perl script that does this quite nicely:
while(my $l=<>) {
    if($l=~/^SHOES/i) { $r=\$shoes; name($r);}
    elsif($l=~/^HATS/i) { $r=\$hats; name($r);}
    elsif($l=~/^SUNGLASSES/i) { $r=\$sung; name($r);}
    elsif($l=~/^DRESS/i || $l=~/^SKIRT/i ) { $r=\$dress; name($r);}
    ...
    ...
    elsif($l=~/^END/i) { $r=\$end; name($r);}
    else {
        $$r .= $l;
    }
    print STDERR "Finished processing $ARGV\n" if eof;
}
As you can see, with the perl script I basically just change where a reference points to when I get to a certain pattern match, and concatenate each line of the file to its respective string until I get to the next pattern match. These are then printed out later as one big concatenated file.
I would and could stick with perl, but my needs are becoming more complex every day and I would really like to see how this problem can be solved elegantly with python (can it?). As of right now my method in python is basically to load the entire file as a string, search for the heading locations, then split up the string based on the heading indices and concat the strings. This requires a lot of regex, if-statements and variables for something that seems so simple in another language.
It seems that this really boils down to a fundamental language issue. I found a very nice SO discussion about python's "call-by-object" style as compared with that of other languages that are call-by-reference.
How do I pass a variable by reference?
Yet, I still can't think of an elegant way to do this in python. If anyone can help kick my brain in the right direction, it would be greatly appreciated.
That's not even elegant Perl.
my @headers = qw( shoes hats sunglasses dress );
my $header_pat = join "|", map quotemeta, @headers;
my $header_re = qr/$header_pat/i;

my ( $section, %sections );
while (<>) {
    if (/($header_re)/) { name( $section = \$sections{ $1 } ); }
    elsif (/skirt/i)    { name( $section = \$sections{'dress'} ); }
    else                { $$section .= $_; }
    print STDERR "Finished processing $ARGV\n" if eof;
}
Or if you have many exceptions:
my @headers = qw( shoes hats sunglasses dress );
my %aliases = ( 'skirt' => 'dress' );
my $header_pat = join "|", map quotemeta, @headers, keys(%aliases);
my $header_re = qr/$header_pat/i;

my ( $section, %sections );
while (<>) {
    if (/($header_re)/) {
        name( $section = \$sections{ $aliases{$1} // $1 } );
    } else {
        $$section .= $_;
    }
    print STDERR "Finished processing $ARGV\n" if eof;
}
Using a hash saves the countless my declarations you didn't show.
You could also do $header_name = $1; name(\$sections{$header_name}); and $sections{$header_name} .= $_ for a bit more readability.
I'm not sure if I understand your whole problem, but this seems to do everything you need:
import sys

headers = [None, 'SHOES', 'HATS', 'SUNGLASSES']
sections = [[] for header in headers]
for arg in sys.argv[1:]:
    section_index = 0
    with open(arg) as f:
        for line in f:
            # Guard against running past the last known header.
            if (section_index + 1 < len(headers)
                    and line.startswith(headers[section_index + 1])):
                section_index = section_index + 1
            else:
                sections[section_index].append(line)
Obviously you could change this to read or mmap the whole file, then re.search or just buf.find for the next header. Something like this (untested pseudocode):
import sys
from collections import defaultdict

headers = [None, 'SHOES', 'HATS', 'SUNGLASSES']
sections = defaultdict(list)
for arg in sys.argv[1:]:
    with open(arg) as f:
        buf = f.read()
    section = None
    start = 0
    for header in headers[1:]:
        idx = buf.find('\n' + header, start)
        if idx != -1:
            sections[section].append(buf[start:idx])
            section = header
            start = buf.find('\n', idx + 1)
            if start == -1:
                break
    else:
        sections[section].append(buf[start:])
And there are plenty of other alternatives, too.
But the point is, I can't see anywhere where you'd need to pass a variable by reference in any of those solutions, so I'm not sure where you're stumbling on whichever one you've chosen.
So, what if you want to treat two different headings as the same section?
Easy: create a dict mapping headers to sections. For example, for the second version:
headers_to_sections = {None: None, 'SHOES': 'SHOES', 'HATS': 'HATS',
'DRESSES': 'DRESSES', 'SKIRTS': 'DRESSES'}
Now, in the code that does sections[section], just do sections[headers_to_sections[section]].
For the first, just make this a mapping from strings to indices instead of strings to strings, or replace sections with a dict. Or just flatten the two collections by using a collections.OrderedDict.
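For instance, a minimal sketch of that last idea (the header names and aliases are just the ones from the example):
from collections import OrderedDict

# One ordered mapping from every header (including aliases) to its canonical section;
# its ordering doubles as the output order, so the separate headers list goes away.
headers_to_sections = OrderedDict([
    ('SHOES', 'SHOES'),
    ('HATS', 'HATS'),
    ('SUNGLASSES', 'SUNGLASSES'),
    ('DRESSES', 'DRESSES'),
    ('SKIRTS', 'DRESSES'),   # alias
])
sections = {name: [] for name in headers_to_sections.values()}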
My deepest sympathies!
Here's some code (please excuse minor syntax errors)
def foundSectionHeader(l, secHdrs):
    # Return the matching section header, or None if this line isn't a header.
    for s in secHdrs:
        if s in l:
            return s
    return None

def main():
    fileList = ['file1.txt', 'file2.txt', ...]
    sectionHeaders = ['SHOES', 'HATS', ...]
    sectionContents = dict()
    for section in sectionHeaders:
        sectionContents[section] = []
    for file in fileList:
        fp = open(file)
        lines = fp.readlines()
        idx = 0
        while idx < len(lines):
            sec = foundSectionHeader(lines[idx], sectionHeaders)
            if sec:
                idx += 1
                while idx < len(lines) and not foundSectionHeader(lines[idx], sectionHeaders):
                    sectionContents[sec].append(lines[idx])
                    idx += 1
            else:
                idx += 1
This assumes that you don't have content lines which look like "SHOES"/"HATS" etc.
Assuming you're reading from stdin, as in the perl script, this should do it:
import sys
import collections

# Map every heading (and its aliases) to a canonical section name.
headings = {'SHOES': 'SHOES', 'HATS': 'HATS', 'DRESS': 'DRESS', 'SKIRT': 'DRESS'}  # etc...
sections = collections.defaultdict(list)
key = None
for line in sys.stdin:
    sline = line.strip()
    if sline not in headings:
        sections[headings.get(key)].append(sline)
    else:
        key = sline
You'll end up with a dictionary like this:
{
None: [ <all lines before any heading> ],
'HATS': [ <all lines below the HATS heading and before the next heading> ],
etc...
}
The headings list does not have to be defined in the same order as the headings appear in the input.
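If you then want to write the merged sections back out in a fixed heading order (the order below is just an example), a small follow-up sketch:
for heading in ['SHOES', 'HATS', 'DRESS']:
    if sections[heading]:
        print(heading)
        print('\n'.join(sections[heading]))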

Search/Replace/Delete Jekyll YAML Front Matter Category Tags

I've inherited a Jekyll website and I'm coming from a .NET world so it's a learning curve for me.
This Jekyll site takes forever to build and I think it is because there are literally thousands of category tags that require those pages to be removed. I'm able to get a list of all the categories and created a CSV that I'd like to loop through and figure out if a category tag is still needed. The structure of the CSV is:
old_tag,new_tag
Clearly I'd like to update the tags based on those (e.g. make all C#, C-Sharp, C # and C Sharp categories just C-Sharp). But, I'd also like to delete some where the old tag field exists and the new one is blank:
old_tag,new_tag
C#, C-Sharp
C Sharp, C-Sharp
Crazy,
C #, C-Sharp
Using Ruby or Python I'd like to figure out how to loop through over 4000 markdown files and use the CSV to conditionally update each one. The database person in me just can't think how this would work with flat files.
I'd recommend starting with a Hash, using it like a translation table. Hash lookups are very fast, and can organize your tags and their replacements nicely.
hash = {
# old_tag => new_tag
'C#' => 'C-Sharp',
'C Sharp' => 'C-Sharp',
'Crazy' => '',
'C #' => 'C-Sharp',
}
You can see there's a lot of redundancy in the values, which could be fixed by reversing the hash, which reduces it nicely:
hash = {
# new_tag => old_tag
'C-Sharp' => ['C#', 'C Sharp', 'C #'],
}
'Crazy' is an outlier, but we will deal with that.
Ruby's String#gsub has a nice but little-used feature: we can pass it a regular expression and a hash, and it will replace every regex match with the corresponding value from the hash. We can build that regex easily:
regex = /(?:#{ Regexp.union(hash.keys).source })/
=> /(?:C\-Sharp)/
Now, you're probably saying, "but wait, I have a lot more tags to find!", and, because of the way the hash is built, they're hidden in the values. To remedy that, we'll reverse the hash's keys and values, exploding the value arrays into their individual elements:
reversed_hash = Hash[hash.flat_map{ |k,v| v.map{ |i| [i,k] } }]
=> {
"C#" => "C-Sharp",
"C Sharp" => "C-Sharp",
"C #" => "C-Sharp",
}
Adding in 'Crazy' is easy, by merging a second hash of the "special cases":
special_cases = {
'Crazy' => ''
}
reversed_hash = Hash[hash.flat_map{ |k,v| v.map{ |i| [i,k] } }].merge(special_cases)
=> {
"C#" => "C-Sharp",
"C Sharp" => "C-Sharp",
"C #" => "C-Sharp",
"Crazy" => ""
}
Using that with the regex-building code:
regex = /(?:#{ Regexp.union(reversed_hash.keys).source })/
=> /(?:C\#|C\ Sharp|C\ \#|Crazy)/
That will find the tags using an auto-generated regex. If it needs to be case-insensitive, use:
regex = /(?:#{ Regexp.union(reversed_hash.keys).source })/i
Creating some text to test against:
text =<<EOT
This is "#C#"
This is "C Sharp"
This is "C #"
This is "Crazy"
EOT
=> "This is \"#C#\"\nThis is \"C Sharp\"\nThis is \"C #\"\nThis is \"Crazy\"\n"
And testing the gsub:
puts text.gsub(regex, reversed_hash)
Which outputs:
This is "#C-Sharp"
This is "#C-Sharp"
This is "#C-Sharp"
This is "#"
Now, I'm not a big fan of slurping big files into memory, because that doesn't scale well. Today's machines usually have many GB of memory, but I see files that still exceed the RAM in a machine. So, instead of using a File.read to load the file, then a single gsub to process it, I recommend using File.foreach. Using that changes the code.
Here's how I'd do it:
file_to_read = '/path/to/file/to/read'
File.open(file_to_read + '.new', 'w') do |fo|
  File.foreach(file_to_read) do |li|
    fo.puts li.gsub(regex, reversed_hash)
  end
end
File.rename(file_to_read, file_to_read + '.bak')
File.rename(file_to_read + '.new', file_to_read)
This will create a .bak version of each file processed, so if something goes wrong you have a fall-back, which is always a good practice.
Edit: I forgot about the CSV file:
You can read/create one easily with Ruby using the CSV module; however, I'd go with a YAML file, because it lets you easily create your hash layout in a file that is easy to edit by hand or to generate from the CSV file.
Edit: More about CSV, YAML and generating one from the other
Here's how to read the CSV and convert it into the recommended hash format:
require 'csv'
text = <<EOT
C#, C-Sharp
C Sharp, C-Sharp
Crazy,
C #, C-Sharp
EOT
hash = Hash.new{ |h,k| h[k] = [] }
special_cases = []
CSV.parse(text) do |k,v|
  ((v.nil? || v.strip.empty?) ? special_cases : hash[v.strip]) << k.strip
end
Picking up from before:
reversed_hash = Hash[hash.flat_map{ |k,v| v.map{ |i| [i,k] } }].merge(Hash[special_cases.map { |k| [k, ''] }])
puts reversed_hash
# => {"C#"=>"C-Sharp", "C Sharp"=>"C-Sharp", "C #"=>"C-Sharp", "Crazy"=>""}
To convert the CSV file to something more editable and useful, use the above code to create hash and special_cases, then:
require 'yaml'
puts ({
'hash' => hash,
'special_cases' => special_cases
}).to_yaml
Which looks like:
---
hash:
C-Sharp:
- C#
- C Sharp
- ! 'C #'
special_cases:
- Crazy
The rest you can figure out from the YAML docs.
Here's one possible approach; not sure how well it will work for large amounts of data:
require "stringio"
require "csv"
class MarkdownTidy
  def initialize(rules)
    @csv = CSV.new(rules.is_a?(IO) ? rules : StringIO.new(rules))
    @from_to = {}.tap do |hsh|
      @csv.each do |from, to|
        re = Regexp.new(Regexp.escape(from.strip))
        hsh[re] = to.to_s.strip   # a blank new tag (nil) becomes an empty string
      end
    end
  end

  def tidy(str)
    cpy = str.dup
    @from_to.each do |re, canonical|
      cpy.gsub! re, canonical
    end
    cpy
  end
end
csv = <<-TEXT
C#, C-Sharp
C Sharp, C-Sharp
Crazy,
C #, C-Sharp
TEXT
markdown = <<-TEXT
C# some text C # some text Crazy
C#, C Sharp
TEXT
mt = MarkdownTidy.new(csv)
[markdown].each do |str|
  puts mt.tidy(str)
end
The idea is that you would replace the loop at the very end with one that opens up the files, reads them and then saves them back to disk.
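The question also allows Python; for comparison, a rough Python sketch of the same CSV-driven replacement (the CSV path, the _posts/*.md glob, and the in-place rewrite strategy are all assumptions, and it rewrites matching tags anywhere in each file, not just the YAML front matter):
import csv
import glob
import re

# Build an old_tag -> new_tag lookup from the CSV; a blank new_tag means "drop the tag".
mapping = {}
with open('tags.csv') as fh:              # hypothetical name for the mapping CSV
    reader = csv.reader(fh)
    next(reader)                          # skip the old_tag,new_tag header row
    for row in reader:
        if len(row) >= 2:
            mapping[row[0].strip()] = row[1].strip()

# Longest tags first so "C Sharp" wins over any shorter overlapping alternative.
pattern = re.compile('|'.join(re.escape(tag)
                              for tag in sorted(mapping, key=len, reverse=True)))

for path in glob.glob('_posts/*.md'):     # hypothetical location of the posts
    with open(path) as fh:
        text = fh.read()
    new_text = pattern.sub(lambda m: mapping[m.group(0)], text)
    if new_text != text:
        with open(path, 'w') as fh:
            fh.write(new_text)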

How can I sort out this data?

http://img32.imageshack.us/img32/6649/workspace1001.png
big version
I have this product data in a csv file, but some of the fields are wrong.
Look at the screenshot. Some of the images are like this:
image.jpg#foobar
When they need to be
image.jpg
Not all of them have this. They are all .jpg
Is there something I can do in Sed or Python/Perl to fix this?
sed -i.bk -e 's/jpg#[^,]*/jpg/g' filename
So all you want to do is strip the #... from column S, the image column, right?
Perl can do this neatly. It handles quoted columns in the CSV and only updates the columns you specify.
use IO::File;
use Text::CSV_XS;

my $in  = IO::File->new( "<old.csv" );
my $out = IO::File->new( ">new.csv" );
my $csv = Text::CSV_XS->new();

while ( my $rec = $csv->getline($in) ) {
    $rec->[18] =~ s/\#.*$//s;    # column S is index 18
    $csv->print( $out, $rec );
}
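If you'd rather use Python, here is a hedged sketch with the csv module; the file names and the column index are assumptions carried over from the Perl answer above:
import csv

# Rewrite new.csv from old.csv, stripping everything after '#' in the image column (S = index 18).
with open('old.csv', newline='') as src, open('new.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if len(row) > 18:
            row[18] = row[18].split('#')[0]
        writer.writerow(row)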

How would I go about parsing the following log?

I need to parse a log in the following format:
===== Item 5483/14800 =====
This is the item title
Info: some note
===== Item 5483/14800 (Update 1/3) =====
This is the item title
Info: some other note
===== Item 5483/14800 (Update 2/3) =====
This is the item title
Info: some more notes
===== Item 5483/14800 (Update 3/3) =====
This is the item title
Info: some other note
Test finished. Result Foo. Time 12 secunds.
Stats: CPU 0.5 MEM 5.3
===== Item 5484/14800 =====
This is this items title
Info: some note
Test finished. Result Bar. Time 4 secunds.
Stats: CPU 0.9 MEM 4.7
===== Item 5485/14800 =====
This is the title of this item
Info: some note
Test finished. Result FooBar. Time 7 secunds.
Stats: CPU 2.5 MEM 2.8
I only need to extract each item's title (next line after ===== Item 5484/14800 =====) and the result.
So I need to keep only the line with the item title and the result for that title, and discard everything else.
The issue is that sometimes an item has notes (maximum 3) and sometimes the result is displayed without additional notes, which makes this tricky.
Any help would be appreciated. I'm writing the parser in Python, but I don't need the actual code, just some pointers on how I could achieve this.
Later edit: The result I'm looking for is to discard everything else and get something like:
('This is the item title','Foo')
then
('This is this items title','Bar')
1) Loop through every line in the log.
   a) If the line matches the appropriate regex:
      - Display/store the next line as the item title.
      - Look for the next line containing "Result XXXX." and parse out that result for inclusion in the result set.
EDIT: added a bit more now that I see the result you're looking for.
I know you didn't ask for real code but this is too great an opportunity for a generator-based text muncher to pass up:
# data is a multiline string containing your log, but this
# function could be easily rewritten to accept a file handle.
def get_stats(data):
    title = ""
    grab_title = False
    for line in data.split('\n'):
        if line.startswith("====="):
            grab_title = True
        elif grab_title:
            grab_title = False
            title = line
        elif line.startswith("Test finished."):
            start = line.index("Result") + 7
            end = line.index("Time") - 2
            yield (title, line[start:end])

for d in get_stats(data):
    print d

# Returns:
# ('This is the item title', 'Foo')
# ('This is this items title', 'Bar')
# ('This is the title of this item', 'FooBar')
Hopefully this is straightforward enough. Do ask if you have questions on how exactly the above works.
Maybe something like (log.log is your file):
def doOutput(s): # process or store data
    print s

s = ''
for line in open('log.log').readlines():
    if line.startswith('====='):
        if len(s):
            doOutput(s)
        s = ''
    else:
        s += line
if len(s):
    doOutput(s)
I would recommend starting a loop that looks for the "===" in the line. Let that key you off to the Title which is the next line. Set a flag that looks for the results, and if you don't find the results before you hit the next "===", say no results. Else, log the results with the title. Reset your flag and repeat. You could store the results with the Title in a dictionary as well, just store "No Results" when you don't find the results between the Title and the next "===" line.
This looks pretty simple to do based on the output.
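For what it's worth, a rough Python sketch of that flag-based approach (the log file name and the "No Results" placeholder are just illustrative):
results = {}
title = None
expecting_title = False
have_result = False

for raw in open('log.log'):
    line = raw.strip()
    if line.startswith('====='):
        # Hit the next header: if the previous title never got a result, say so.
        if title is not None and not have_result:
            results[title] = 'No Results'
        expecting_title = True
        have_result = False
    elif expecting_title:
        title = line
        expecting_title = False
    elif line.startswith('Test finished.'):
        # Grab the word between "Result " and the following ".".
        results[title] = line.split('Result ')[1].split('.')[0]
        have_result = True

# Handle the last item in the file.
if title is not None and not have_result:
    results[title] = 'No Results'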
Regular expression with group matching seems to do the job in python:
import re
data = """===== Item 5483/14800 =====
This is the item title
Info: some note
===== Item 5483/14800 (Update 1/3) =====
This is the item title
Info: some other note
===== Item 5483/14800 (Update 2/3) =====
This is the item title
Info: some more notes
===== Item 5483/14800 (Update 3/3) =====
This is the item title
Info: some other note
Test finished. Result Foo. Time 12 secunds.
Stats: CPU 0.5 MEM 5.3
===== Item 5484/14800 =====
This is this items title
Info: some note
Test finished. Result Bar. Time 4 secunds.
Stats: CPU 0.9 MEM 4.7
===== Item 5485/14800 =====
This is the title of this item
Info: some note
Test finished. Result FooBar. Time 7 secunds.
Stats: CPU 2.5 MEM 2.8"""
p = re.compile("^=====[^=]*=====\n(.*)$\nInfo: .*\n.*Result ([^\.]*)\.",
               re.MULTILINE)
for m in re.finditer(p, data):
    print "title:", m.group(1), "result:", m.group(2)
If you need more info about regular expressions, check the Python docs.
This is sort of a continuation of maciejka's solution (see the comments there). If the data is in the file daniels.log, then we could go through it item by item with itertools.groupby, and apply a multi-line regexp to each item. This should scale fine.
import itertools, re

p = re.compile("Result ([^.]*)\.", re.MULTILINE)
for sep, item in itertools.groupby(file('daniels.log'),
                                   lambda x: x.startswith('===== Item ')):
    if not sep:
        title = item.next().strip()
        m = p.search(''.join(item))
        if m:
            print (title, m.group(1))
You could try something like this (in C-like pseudocode, since I don't know Python):
string line = getline();
regex boundary = "^==== [^=]+ ====$";
regex info = "^Info: (.*)$";
regex test_data = "Test ([^.]*)\. Result ([^.]*)\. Time ([^.]*)\.$";
regex stats = "Stats: (.*)$";
while (!eof())
{
    // sanity check
    test line against boundary, if they don't match, throw exception
    string title = getline();
    while (1)
    {
        // end the loop if we finished the data
        if (eof()) break;
        line = getline();
        test line against boundary, if they match, break
        test line against info, if they match, load the first matched group into "info"
        test line against test_data, if they match, load the first matched group into "test_result", load the 2nd matched group into "result", load the 3rd matched group into "time"
        test line against stats, if they match, load the first matched group into "statistics"
    }
    // at this point you can use the variables set above to do whatever with a line
    // for example, you want to use title and, if set, test_result/result/time.
}
Parsing is not done using regexes. If you have reasonably well-structured text (which it looks like you do), you can use faster tests (e.g. line.startswith() or similar).
A list of dictionaries seems to be a suitable data type for such key-value pairs. Not sure what else to tell you. This seems pretty trivial.
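A minimal sketch of that startswith-based, list-of-dictionaries approach (the log file name and field names are just illustrative):
items = []
with open('log.log') as fh:
    lines = iter(fh)
    for line in lines:
        if line.startswith('===== Item'):
            # The title is always the line right after the header.
            items.append({'title': next(lines).strip()})
        elif line.startswith('Test finished.') and items:
            items[-1]['result'] = line.split('Result ')[1].split('.')[0]

# Update blocks repeat the header, so keep only the entries that actually got a result.
items = [d for d in items if 'result' in d]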
OK, so the regexp way proved to be more suitable in this case:
import re
re.findall("=\n(.*)\n", s)
is faster than list comprehensions
[item.split('\n', 1)[0] for item in s.split('=\n')]
Here's what I got:
>>> len(s)
337000000
>>> test(get1, s) #list comprehensions
0:00:04.923529
>>> test(get2, s) #re.findall()
0:00:02.737103
Lesson learned.
Here's some not-so-good-looking Perl code that does the job. Perhaps you can find it useful in some way. It's a quick hack; there are other ways of doing it (I feel that this code needs defending).
#!/usr/bin/perl -w
#
# $Id$
#
use strict;
use warnings;
my @ITEMS;
my $item;
my $state = 0;

open(FD, "< data.txt") or die "Failed to open file.";
while (my $line = <FD>) {
    $line =~ s/(\r|\n)//g;
    if ($line =~ /^===== Item (\d+)\/\d+/) {
        my $item_number = $1;
        if ($item) {
            # Just to make sure we don't have two lines that seem to be a headline in a row.
            # If we have an item but haven't set the title it means that there are two in a row that match.
            die "Something seems to be wrong, better safe than sorry. Line $. : $line\n" if (not $item->{title});
            # If we have a new item number, add the previous item and create a new one.
            if ($item_number != $item->{item_number}) {
                push(@ITEMS, $item);
                $item = {};
                $item->{item_number} = $item_number;
            }
        } else {
            # First entry, don't have an item.
            $item = {}; # Create new item.
            $item->{item_number} = $item_number;
        }
        $state = 1;
    } elsif ($state == 1) {
        die "Data must start with a headline." if (not $item);
        # If we already have a title make sure it matches.
        if ($item->{title}) {
            if ($item->{title} ne $line) {
                die "Title doesn't match for item " . $item->{item_number} . ", line $. : $line\n";
            }
        } else {
            $item->{title} = $line;
        }
        $state++;
    } elsif (($state == 2) && ($line =~ /^Info:/)) {
        # Just make sure that for state 2 we have a line that matches Info.
        $state++;
    } elsif (($state == 3) && ($line =~ /^Test finished\. Result ([^.]+)\. Time \d+ secunds{0,1}\.$/)) {
        $item->{status} = $1;
        $state++;
    } elsif (($state == 4) && ($line =~ /^Stats:/)) {
        $state++; # After Stats we must have a new item or we should fail.
    } else {
        die "Invalid data, line $.: $line\n";
    }
}

# Need to take care of the last item too.
push(@ITEMS, $item) if ($item);
close FD;

# Loop our items and print the info we stored.
for $item (@ITEMS) {
    print $item->{item_number} . " (" . $item->{status} . ") " . $item->{title} . "\n";
}
