I have a data.dat file with this format:
REAL PART
FREQ 1.6 5.4 2.1 13.15 13.15 17.71
FREQ 51.64 51.64 82.11 133.15 133.15 167.71
.
.
.
IMAGINARY PART
FREQ 51.64 51.64 82.12 132.15 129.15 161.71
FREQ 5.64 51.64 83.09 131.15 120.15 160.7
.
.
.
REAL PART
FREQ 1.6 5.4 2.1 13.15 15.15 17.71
FREQ 51.64 57.64 82.11 183.15 133.15 167.71
.
.
.
IMAGINARY PART
FREQ 53.64 53.64 81.12 132.15 129.15 161.71
FREQ 5.64 55.64 83.09 131.15 120.15 160.7
REAL PART and IMAGINARY PART blocks alternate throughout the document. Within each REAL PART block, I would like to split every line that starts with FREQ.
I have managed to:
1) split the lines and extract the FREQ values,
2) append the result to a list of lists, and
3) create a final list, All_frequencies:
import itertools

FREQ = []
fname = 'data.dat'
f = open(fname, 'r')
for line in f:
    if line.startswith(' FREQ'):
        FREQS = line.split()
        FREQ.append(FREQS)
print 'Final FREQ = ', FREQ
All_frequencies = list(itertools.chain.from_iterable(FREQ))
print 'All_frequencies = ', All_frequencies
The problem with this code is that it also extracts the FREQ values from the IMAGINARY PART blocks; only the REAL PART values should be extracted.
I have tried to make something like:
if line.startswith('REAL PART'):
    if line.startswith('IMAGINARY PART'):
        code...
or:
if line.startswith(' REAL') and line.startswith(' FREQ'):
    code...
But this does not work. I would appreciate it if you could help me.
It appears based on the sample data in the question that lines starting with 'REAL' or 'IMAGINARY' don't have any data on them; they just mark the beginning of a block. If that's the case, you just need to keep track of which block you're in. You can also use yield instead of building up an ever-larger list of frequencies, as long as this code is in a function.
def read_real_parts(fname):
    f = open(fname, 'r')
    real_part = False
    for line in f:
        if line.startswith(' REAL'):
            real_part = True
        elif line.startswith(' IMAGINARY'):
            real_part = False
        elif line.startswith(' FREQ') and real_part:
            FREQS = line.split()
            yield FREQS
import itertools

FREQ = read_real_parts('data.dat')  # this gives you a generator
All_frequencies = list(itertools.chain.from_iterable(FREQ))  # then convert to list
Think of this as a state machine with two states. When the program reads a line beginning with REAL, it enters the REAL state and aggregates values; when it reads a line beginning with IMAGINARY, it enters the alternate state and ignores values.
REAL, IMAGINARY = 1, 2
FREQ = []
fname = 'data.dat'
f = open(fname)
state = None
for line in f:
    line = line.strip()
    if not line:
        continue
    if line.startswith('REAL'):
        state = REAL
        continue
    elif line.startswith('IMAGINARY'):
        state = IMAGINARY
        continue
    if state == IMAGINARY:
        continue
    freqs = line.split()[1:]
    FREQ.extend(freqs)
I assume that you want only the numeric values; hence the [1:] at the end of the assignment to freqs near the end of the script, which drops the leading FREQ token.
Using your data file, without the ellipsis lines, produces the following result in FREQ:
['1.6', '5.4', '2.1', '13.15', '13.15', '17.71', '51.64', '51.64', '82.11', '133.15', '133.15', '167.71', '1.6', '5.4', '2.1', '13.15', '15.15', '17.71', '51.64', '57.64', '82.11', '183.15', '133.15', '167.71']
You would need to keep track of which part you are looking at, so you can use a flag to do this:
section = None  # will change to either "real" or "imag"
for line in f:
    if line.startswith("IMAGINARY PART"):
        section = "imag"
    elif line.startswith('REAL PART'):
        section = "real"
    else:
        freqs = line.split()
        if section == "real":
            FREQ.append(freqs)
        #elif section == "imag":
        #    IMAG_FREQ.append(freqs)
By the way, instead of appending to FREQ and then needing itertools.chain.from_iterable, you might consider just extending FREQ instead, as in the sketch below.
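A minimal sketch of that extend-based variant, assuming the same block layout as the sample data (the [1:] slice drops the leading FREQ token so only the numbers are kept):

FREQ = []
section = None
with open('data.dat') as f:
    for line in f:
        if 'REAL PART' in line:
            section = 'real'
        elif 'IMAGINARY PART' in line:
            section = 'imag'
        elif section == 'real' and line.lstrip().startswith('FREQ'):
            FREQ.extend(line.split()[1:])  # already a flat list; no chaining needed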
We start with a flag set to False. If we find a line that contains "REAL", we set the flag to True and start copying the data below the REAL part. When we find a line that contains "IMAGINARY", we set the flag back to False and skip lines until another "REAL" is found (at which point the flag turns back to True).
Using the flag concept in a simple way:
with open('this.txt', 'r') as content:
    my_lines = content.readlines()

f = open('another.txt', 'w')
my_real_flag = False
for line in my_lines:
    if "REAL" in line:
        my_real_flag = True
    elif "IMAGINARY" in line:
        my_real_flag = False
    if my_real_flag:
        # do code here because we found real frequencies
        f.write(line)
    else:
        continue  # my_real_flag isn't true, so we must have found an imaginary line
f.close()
this.txt looks like this:
REAL
1
2
3
IMAGINARY
4
5
6
REAL
1
2
3
IMAGINARY
4
5
6
another.txt ends up looking like this:
REAL
1
2
3
REAL
1
2
3
Original answer that only works when there is one REAL section
If the file is "small" enough to be read as an entire string and there is only one instance of "IMAGINARY PART", you can do this:
file_str = file_str.split("IMAGINARY PART")[0]
which would get you everything above the "IMAGINARY PART" line.
You can then apply the rest of your code to this file_str string, which contains only the real part.
To elaborate, file_str is a str obtained as follows:
with open('data.dat', 'r') as my_data:
    file_str = my_data.read()
The "with" block is referenced all over Stack Exchange, so there may be a better explanation for it than mine. I intuitively think of it as: "open a file named 'data.dat' with read-only access and call it my_data; once it's open, read the entire file into the str file_str using my_data.read(), then close 'data.dat'."
Now you have a str, and you can apply all the applicable str functions to it.
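For instance, a small sketch combining the two steps: read the whole file, cut it at the first "IMAGINARY PART", then pull the FREQ values out of what remains (this assumes the single-IMAGINARY-PART case described above):

with open('data.dat', 'r') as my_data:
    file_str = my_data.read()
real_str = file_str.split("IMAGINARY PART")[0]  # everything above that line
FREQ = [line.split()[1:] for line in real_str.splitlines()
        if line.lstrip().startswith('FREQ')]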
If "IMAGINARY PART" occurs frequently throughout the file, or the file is too big, Tadgh's suggestion of a flag and a break works well:
for line in f:
    if "IMAGINARY PART" not in line:
        # do stuff
    else:
        f.close()
        break
I have found so much information from previous searches on this website, but I seem to be stuck on the following issue.
I have two text files that look like this:
Inter.txt (n lines, but only 4 are shown; you get the idea)
7275
30000
6693
855
....
rules.txt (2n lines)
7275
8500
6693
7555
....
3
1000
8
5
....
I want to compare each line of Inter.txt with the first half of rules.txt and, in case of a match, jump n lines ahead to get the score for that line. (E.g. with 7275 there is a match, so I jump n lines to get the score 3.)
I produced the following code, but for some reason I only get output for the first line, when I should have one for each match from my first file. With the previous example, I should also have 8 as an output for 6693.
import linecache

inter = open("Inter.txt", "r")
rules = open("rules.txt", "r")
iScore = 0
jump = 266
i = 0
for lineInt in inter:
    #i = i+1
    #print(i)
    for lineRul in rules:
        i = i+1
        #print(i)
        if lineInt == lineRul:
            print("Match")
            inc = linecache.getline("rules.txt", i + jump)
            #print(inc)
            iScore = iScore + int(inc)
            print(iScore)
            #break
        else:
            continue
All the print(i) calls are there because I checked that all the lines were read. I am a novice in Python.
To sum up, I don't understand why I only get one output. Thanks in advance!
OK, I think the main thing that blocks you from getting forward is that a for loop over a file advances the file pointer to the end of the file, and the pointer doesn't reset when you start the loop again.
So when you open rules.txt only once and use that instance in the inner loop, it only goes through all the lines during the first iteration of the outer loop; the second time, it tries to go over the remaining lines, of which there are none.
The solution is to reopen (and close) the file on each iteration of the outer loop.
This code worked for me.
import linecache
inter = open("Inter.txt", "r")
iScore = 0
jump = 4
for lineInt in inter:
i=0
#i = i+1
#print(i)
rules = open("rules.txt", "r")
for lineRul in rules:
i = i+1
#print(i)
if lineInt == lineRul:
print("Match")
inc = linecache.getline("rules.txt", i + jump)
#print(inc)
iScore = iScore + int(inc)
print(iScore)
#break
else:
continue
rules.close()
I also moved setting i to 0 to the beginning of the outer loop, but I guess you'd have found that yourself.
And I changed jump to 4 to fit the example files you gave :p
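An alternative to reopening rules.txt on every pass is to rewind the same file object with seek(0); a minimal sketch under the same assumptions (jump = 4 to match the example files):

import linecache

inter = open("Inter.txt", "r")
rules = open("rules.txt", "r")
iScore = 0
jump = 4
for lineInt in inter:
    rules.seek(0)  # rewind instead of reopening
    for i, lineRul in enumerate(rules, 1):
        if lineInt == lineRul:
            print("Match")
            iScore = iScore + int(linecache.getline("rules.txt", i + jump))
            print(iScore)
rules.close()
inter.close()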
Can you please try this solution:
def get_rules_values(rules_file):
    with open(rules_file, "r") as rules:
        return map(int, rules.readlines())

def get_rules_dict(rules_values):
    # first half of rules.txt holds the keys, second half the scores
    # (Python 2: map returns a list and / is integer division)
    return dict(zip(rules_values[:len(rules_values)/2], rules_values[len(rules_values)/2:]))

def get_inter_values(inter_file):
    with open(inter_file, "r") as inter:
        return map(int, inter.readlines())

rules_dict = get_rules_dict(get_rules_values("rules.txt"))
inter_values = get_inter_values("inter.txt")
for inter_value in inter_values:
    print inter_value, rules_dict[inter_value]
Hope it's working for you!
I have two files A and B in FASTQ format, which are basically several hundred million lines of text organized in groups of 4 lines starting with a #, as follows:
#120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
GCCAATGGCATGGTTTCATGGATGTTAGCAGAAGACATGAGACTTCTGGGACAGGAGCAAAACACTTCATGATGGCAAAAGATCGGAAGAGCACACGTCTGAACTCN
+120412_SN549_0058_BD0UMKACXX:5:1101:1156:2031#0/1
bbbeee_[_ccdccegeeghhiiehghifhfhhhiiihhfhghigbeffeefddd]aegggdffhfhhihbghhdfffgdb^beeabcccabbcb`ccacacbbccB
I need to compare the
5:1101:1156:2031#0/
part between files A and B, and write the groups of 4 lines in file B that match to a new file. I have a piece of Python code that does this, but it only works for small files because it parses through all the #-lines of file B for every #-line in file A, and both files contain hundreds of millions of lines.
Someone suggested that I should create an index for file B; I have googled around without success and would be very grateful if someone could point out how to do this or let me know of a tutorial so I can learn. Thanks.
==EDIT==
In theory each group of 4 lines should only exist once in each file. Would it increase the speed enough if breaking the parsing after each match or do I need a different algorithm altogether?
An index is just a shortened version of the information you are working with. In this case, you will want the "key" - the text between the first colon(':') on the #-line and the final slash('/') near the end - as well as some kind of value.
Since the "value" in this case is the entire contents of the 4-line block, and since our index is going to store a separate entry for each block, we would be storing the entire file in memory if we used the actual value in the index.
Instead, let's use the file position of the beginning of the 4-line block. That way, you can move to that file position, print 4 lines, and stop. Total cost is the 4 or 8 or however many bytes it takes to store an integer file position, instead of however-many bytes of actual genome data.
Here is some code that does the job, but also does a lot of validation and checking. You might want to throw stuff away that you don't use.
import sys

def build_index(path):
    index = {}
    for key, pos, data in parse_fastq(path):
        if key not in index:
            # Don't overwrite duplicates - use first occurrence.
            index[key] = pos
    return index

def error(s):
    sys.stderr.write(s + "\n")

def extract_key(s):
    # This much is fairly constant:
    assert(s.startswith('#'))
    (machine_name, rest) = s.split(':', 1)
    # Per Wikipedia, this changes in different variants of FASTQ format:
    (key, rest) = rest.split('/', 1)
    return key

def parse_fastq(path):
    """
    Parse the 4-line FASTQ groups in path.
    Validate the contents, somewhat.
    """
    f = open(path)
    i = 0
    # Note: iterating a file is incompatible with fh.tell(). Fake it.
    pos = offset = 0
    for line in f:
        offset += len(line)
        lx = i % 4
        i += 1
        if lx == 0:  # #machine: key
            key = extract_key(line)
            len1 = len2 = 0
            data = [line]
        elif lx == 1:
            data.append(line)
            len1 = len(line)
        elif lx == 2:  # +machine: key or something
            assert(line.startswith('+'))
            data.append(line)
        else:  # lx == 3 : quality data
            data.append(line)
            len2 = len(line)
            if len2 != len1:
                error("Data length mismatch at line "
                      + str(i-2)
                      + " (len: " + str(len1) + ") and line "
                      + str(i)
                      + " (len: " + str(len2) + ")\n")
            #print "Yielding #%i: %s" % (pos, key)
            yield key, pos, data
            pos = offset
    if i % 4 != 0:
        error("EOF encountered in mid-record at line " + str(i))

def match_records(path, index):
    results = []
    for key, pos, d in parse_fastq(path):
        if key in index:
            # found a match!
            results.append(key)
    return results

def write_matches(inpath, matches, outpath):
    rf = open(inpath)
    wf = open(outpath, 'w')
    for m in matches:
        rf.seek(m)
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
        wf.write(rf.readline())
    rf.close()
    wf.close()

#import pdb; pdb.set_trace()
index = build_index('afile.fastq')
matches = match_records('bfile.fastq', index)
posns = [index[k] for k in matches]
write_matches('afile.fastq', posns, 'outfile.fastq')
Note that this code goes back to the first file to get the blocks of data. If your data is identical between files, you would be able to copy the block from the second file when a match occurs.
Note also that depending on what you are trying to extract, you may want to change the order of the output blocks, and you may want to make sure that the keys are unique, or perhaps make sure the keys are not unique but are repeated in the order they match. That's up to you - I'm not sure what you're doing with the data.
These folks claim to parse files of a few gigabytes using a dedicated library; see http://www.biostars.org/p/15113/
from Bio import SeqIO

fastq_parser = SeqIO.parse(fastq_filename, "fastq")
wanted = (rec for rec in fastq_parser if ...)
SeqIO.write(wanted, output_file, "fastq")
A better approach IMO would be to parse the file once and load the data into a database (e.g. MySQL) instead of that output_file, and later run the queries there, as sketched below.
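A minimal sketch of that database idea using the standard library's sqlite3 rather than MySQL (the table layout and the use of rec.id as the key are assumptions for illustration; a real key would mirror the extract_key() logic above):

import sqlite3
from Bio import SeqIO

conn = sqlite3.connect('reads.db')
conn.execute("CREATE TABLE IF NOT EXISTS reads (key TEXT PRIMARY KEY, record TEXT)")
for rec in SeqIO.parse('bfile.fastq', 'fastq'):
    conn.execute("INSERT OR IGNORE INTO reads VALUES (?, ?)",
                 (rec.id, rec.format('fastq')))  # store each 4-line record whole
conn.commit()
# later, each lookup from file A becomes a single indexed query:
row = conn.execute("SELECT record FROM reads WHERE key = ?", ('some_key',)).fetchone()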
This seems fairly trivial, but I can't seem to work it out.
I have a text file with the contents:
B>F
I am reading this with the code below, stripping the '>' and trying to convert the strings into their corresponding ASCII values, minus 65, to give me values that correspond to indexes in another list
def readRoute():
    routeFile = open('route.txt', 'r')
    for line in routeFile.readlines():
        route = line.strip('\n' '\r')
        route = line.split('>')
        #startNode, endNode = route
        startNode = ord(route[0])-65
        endNode = ord(route[1])-65
        # Debug (this comment was for my use to explain below the print values)
        print 'Route Entered:'
        print line
        print startNode, ',', endNode, '\n'
    return [startNode, endNode]
However, I am having slight trouble doing the conversion nicely. The text file only contains one line at the moment, but ideally it needs to support more than one line, running a piece of code for each line.
For example it could contain:
B>F
A>D
C>F
E>D
So I would want to run the same code outside this function 4 times with the different inputs.
Anyone able to give me a hand?
Edit:
Not sure I made my issue that clear, sorry.
What I need it to do is parse the text file (possibly containing one line or multiple lines like above). I am able to do it for one line with the lines
startNode = ord(route[0])-65
endNode = ord(route[1])-65
But I get errors when trying to do more than one line, because ord() ends up receiving the wrong input.
If I have (below) in the route.txt
B>F
A>D
This is the error it gives me:
line 43, in readRoute endNode = ord(route[1])-65
TypeError: ord() expected a character, but string of length 2 found
My code above should read the route.txt file and see that B>F is the first route, strip the '>', and convert the B and F to ASCII (66 and 70 respectively), then subtract 65 from both to give 1 and 5 (in this example).
The 1 and 5 are indexes into another "array" (a list of lists) used for computations and other things.
Once the other code has completed, it can then go to the next line in route.txt, which could be A>D, and perform the above again.
Perhaps this will work for you. I turned the file read into a generator, so you can do as you please with the parsed results in the for loop.
def readRoute(file_name):
    with open(file_name, 'r') as r:
        for line in r:
            yield (ord(line[0])-65, ord(line[2])-65)

filename = 'route.txt'
for startnode, endnode in readRoute(filename):
    print startnode, endnode
If you can't change readRoute, change the contents of the file before each call. Better yet, make readRoute take the filename as a parameter (defaulting to 'route.txt' to preserve the current behavior) so you can have it process other files, as sketched below.
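A minimal sketch of that parameterized version, keeping the body of the original function (with the strip/split slip fixed so the '>' split is applied to the stripped line); 'other_routes.txt' is a hypothetical second file:

def readRoute(filename='route.txt'):  # default preserves the current behavior
    routeFile = open(filename, 'r')
    for line in routeFile.readlines():
        route = line.strip().split('>')
        startNode = ord(route[0])-65
        endNode = ord(route[1])-65
    return [startNode, endNode]

readRoute()                    # existing callers keep working
readRoute('other_routes.txt')  # and other files can be processed too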
What about something like this? It takes the routes defined in your file and turns them into path objects with start and end member variables. As an added bonus, PathManager.readFile() allows you to load multiple route files without overwriting the existing paths.
import re

class Path:
    def __init__(self, start, end):
        self.start = ord(start) - 65  # Scale the values as desired
        self.end = ord(end) - 65      # Scale the values as desired

class PathManager:
    def __init__(self):
        self.expr = re.compile("^([A-Za-z])[>]([A-Za-z])$")  # looks for string "C>C"
                                                             # where C is a char
        self.paths = []

    def do_logic_routine(self, start, end):
        # Do custom logic here that will execute before the next line is read
        # Return True for 'continue reading' or False to stop parsing the file
        return True

    def readFile(self, path):
        file = open(path, "r")
        for line in file:
            item = self.expr.match(line.strip())  # strip whitespace before parsing
            if item:
                '''
                item.group(0) is *not* used here; it matches the whole expression
                item.group(1) matches the first parenthesis in the regular expression
                item.group(2) matches the second
                '''
                self.paths.append(Path(item.group(1), item.group(2)))
                if not self.do_logic_routine(self.paths[-1].start, self.paths[-1].end):
                    break

# Running the example
MyManager = PathManager()
MyManager.readFile('route.txt')
for path in MyManager.paths:
    print "Start: %s End: %s" % (path.start, path.end)
Output is:
Start: 1 End: 5
Start: 0 End: 3
Start: 2 End: 5
Start: 4 End: 3
Hey there, I have a rather large file that I want to process using Python, and I'm kind of stuck as to how to do it.
The format of my file is like this:
0 xxx xxxx xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1 xxx xxxx xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
So I basically want to read in the chunk from 0 to 1, do my processing on it, then move on to the chunk between 1 and 2.
So far I've tried using a regex to match the number and then keep iterating, but I'm sure there has to be a better way of going about this. Any suggestion/info would be greatly appreciated.
If they are all within the same line, that is, there are no line breaks between "1." and "2.", then you can iterate over the lines of the file like this:
for line in open("myfile.txt"):
    #do stuff
The line will be disposed of and overwritten at each iteration, meaning you can handle large file sizes with ease. If they're not on the same line:
import re

parsed_line = ''
for line in open("myfile.txt"):
    if re.match(r'\d+ ', line):  # regex matches the start of a new chunk
        parsed_line = line
    else:
        parsed_line += line
and the rest of your code.
Why don't you just read the file char by char using file.read(1)?
Then, in each iteration, you could check whether you have arrived at the char '1'. You then have to make sure that building up the string is fast.
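A rough sketch of the read(1) approach, under the assumption that a new chunk starts wherever a digit follows a newline; characters are collected in a list and joined at the end, since building a string by repeated concatenation is slow:

chunks = []
buf = []
prev = '\n'
with open('myfile.txt') as f:
    while True:
        c = f.read(1)
        if not c:  # end of file
            break
        if prev == '\n' and c.isdigit() and buf:
            chunks.append(''.join(buf))  # close the previous chunk
            buf = []
        buf.append(c)
        prev = c
if buf:
    chunks.append(''.join(buf))  # final chunk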
If the "N " can only start a line, then why not use the "simple" solution? (It sounds like this is already being done; I am trying to reinforce/support it ;-))
That is, just read a line at a time and build up the data representing the current N object. After, say, N=0 and N=1 are loaded, process them together, then move on to the next pair (N=2, N=3). The only thing that is even remotely tricky is making sure not to throw out a read line. (The line whose read determined the end condition, e.g. "N ", also contains the data for the next N.)
Unless seeking is required (or IO caching is disabled, or there is an absurd amount of data per item), there is really no reason not to use readline, AFAIK.
Happy coding.
Here is some off-the-cuff code, which likely contains multiple errors. In any case, it shows the general idea using a minimized side-effect approach.
import re

# given an input and previous item data, return either
# [item_number, data, next_overflow] if another item is read
# or None if there are no more items
def read_item(inp, overflow):
    data = overflow or ""
    # this can be replaced with any method to "read the header";
    # the regex is just "the easiest". the contract is:
    # given "N ....", return N. given anything else, return None
    def get_num(d):
        m = re.match(r"(\d+) ", d)
        return int(m.group(1)) if m else None
    for line in inp:
        if data and get_num(line) is not None:
            # already in an item (have data); current line "overflows".
            # item number is still at start of current data
            return [get_num(data), data, line]
        # not in item, or new item not found yet
        data += line
    # at end of input, with data. only returns above
    # if a "new" item was encountered; this covers the case of
    # no more items (or no items at all)
    if data:
        return [get_num(data), data, None]
    else:
        return None
And usage might be akin to the following, where f represents an open file:
# check for error conditions (e.g. None returned)
# note feed-through of "overflow"
num1, data1, overflow = read_item(f, None)
num2, data2, overflow = read_item(f, overflow)
If the format is fixed, why not just read 3 lines at a time with readline()? A sketch follows.
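A minimal sketch of that fixed-format idea; that each record is exactly 3 lines is an assumption here, and process() is a placeholder for whatever handling you do per chunk:

with open('myfile.txt') as f:
    while True:
        record = [f.readline() for _ in range(3)]  # read 3 lines at a time
        if not record[0]:  # readline() returns '' at end of file
            break
        process(record)  # placeholder for your per-chunk work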
If the file is small, you could read the whole file in and split() on the chunk numbers (you might want to use strip() to get rid of whitespace and newlines), then iterate over the resulting list to process each string. You'll probably have to check that the string you are processing is not empty, in case two numbers were next to each other. A sketch follows.
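A hedged sketch of that whole-file split, using re.split with a capturing group so the leading numbers survive the split (the pattern assumes chunk numbers sit at the start of a line):

import re

regx = re.compile(r'^(\d+)', re.MULTILINE)
with open('myfile.txt') as f:
    parts = regx.split(f.read())
# parts alternates: [prefix, num, body, num, body, ...]
chunks = [num + body for num, body in zip(parts[1::2], parts[2::2])]
for chunk in chunks:
    chunk = chunk.strip()  # drop surrounding whitespace/newlines
    if chunk:  # guard against empty strings
        pass  # process the chunk here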
If the file's content can be loaded in memory, and that's what you answered, then the following code (it needs filename to be defined) may be a solution.
import re

regx = re.compile(r'^((\d+).*?)(?=^\d|\Z)', re.DOTALL | re.MULTILINE)

with open(filename) as f:
    text = f.read()

def treat(inp, regx=regx):
    m1 = regx.search(inp)
    numb, chunk = m1.group(2, 1)
    li = [chunk]
    for mat in regx.finditer(inp, m1.end()):
        n, ch = mat.group(2, 1)
        if int(n) == int(numb) + 1:
            yield ''.join(li)
            numb = n
            li = []
        li.append(ch)
        chunk = ch
    yield ''.join(li)

for y in treat(text):
    print repr(y)
This code, run on a file containing :
1 mountain
orange 2
apple
produce
2 gas
solemn
enlightment
protectorate
3 grimace
song
4 snow
wheat
51 guludururu
kelemekinonoto
52asabi dabada
5 yellow
6 pink
music
air
7 guitar
blank 8
8 Canada
9 Rimini
produces:
'1 mountain\norange 2\napple\nproduce\n'
'2 gas\nsolemn\nenlightment\nprotectorate\n'
'3 grimace\nsong\n'
'4 snow\nwheat\n51 guludururu\nkelemekinonoto\n52asabi dabada\n'
'5 yellow\n'
'6 pink \nmusic\nair\n'
'7 guitar\nblank 8\n'
'8 Canada\n'
'9 Rimini'
I'm trying to test for a '\t' or a space character, and I can't understand why this bit of code won't work. What I am doing is reading in a file, counting the loc for the file, and then recording the names of each function present within the file along with their individual lines of code. The bit of code below is where I attempt to count the loc for the functions.
import re
...
else:
    loc += 1
    for line in infile:
        line_t = line.lstrip()
        if len(line_t) > 0 \
                and not line_t.startswith('#') \
                and not line_t.startswith('"""'):
            if not line.startswith('\s'):
                print ('line = ' + repr(line))
                loc += 1
                return (loc, name)
            else:
                loc += 1
        elif line_t.startswith('"""'):
            while True:
                if line_t.rstrip().endswith('"""'):
                    break
                line_t = infile.readline().rstrip()
    return (loc, name)
Output:
Enter the file name: test.txt
line = '\tloc = 0\n'
There were 19 lines of code in "test.txt"
Function names:
count_loc -- 2 lines of code
As you can see, my test print for the line shows a '\t', but the if statement explicitly says (or so I thought) that it should only execute with no whitespace characters present.
Here is my full test file I have been using:
def count_loc(infile):
    """ Receives a file and then returns the amount
        of actual lines of code by not counting commented
        or blank lines """
    loc = 0
    for line in infile:
        line = line.strip()
        if len(line) > 0 \
                and not line.startswith('//') \
                and not line.startswith('/*'):
            loc += 1
            func_loc, func_name = checkForFunction(line)
        elif line.startswith('/*'):
            while True:
                if line.endswith('*/'):
                    break
                line = infile.readline().rstrip()
    return loc

if __name__ == "__main__":
    print ("Hi")
Function LOC = 15
File LOC = 19
\s only means whitespace to the re package, when doing pattern matching.
For startswith, an ordinary method of ordinary strings, '\s' is nothing special: not a pattern, just the two characters backslash and s.
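A tiny demonstration of the difference; re.match treats \s as a whitespace class, while startswith compares literal characters:

import re

line = '\tloc = 0\n'
print(bool(re.match(r'\s', line)))   # True: the regex \s matches the leading tab
print(line.startswith('\s'))         # False: looks for a literal backslash and 's'
print(line.startswith(('\t', ' ')))  # True: a tuple of literal prefixes works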
Your question has already been answered, and this is slightly off-topic, but...
If you want to parse code, it is often easier and less error-prone to use a parser. If your code is Python code, Python comes with a couple of parsers (tokenize, ast, parser). For other languages, you can find many parsers on the internet. ANTLR is a well-known one with Python bindings.
As an example, the following couple of lines of code print all lines of a Python module that are not comments and not doc-strings:
import tokenize

ignored_tokens = [tokenize.NEWLINE, tokenize.COMMENT, tokenize.N_TOKENS,
                  tokenize.STRING, tokenize.ENDMARKER, tokenize.INDENT,
                  tokenize.DEDENT, tokenize.NL]
with open('test.py', 'r') as f:
    g = tokenize.generate_tokens(f.readline)
    line_num = 0
    for a_token in g:
        if a_token[2][0] != line_num and a_token[0] not in ignored_tokens:
            line_num = a_token[2][0]
            print(a_token)
As a_token above is already parsed, you can easily check for a function definition, too. You can also keep track of where a function ends by looking at the start column, a_token[2][1]. If you want to do more complex things, you should use ast; a small sketch follows.
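For example, a minimal ast sketch that lists each function with its first and last body line (taking the last statement's line number as the function's end is a simplification):

import ast

with open('test.py') as f:
    tree = ast.parse(f.read())
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        last = node.body[-1]  # last top-level statement in the function
        print(node.name, node.lineno, last.lineno)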
Your string literals aren't what you think they are.
You can specify a space or TAB like so:
space = ' '
tab = '\t'
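With those literals (or a tuple of prefixes, which startswith also accepts), the check the question was aiming for becomes the following, where line is any line read from the file:

space = ' '
tab = '\t'
if line.startswith((space, tab)):  # tuple of literal prefixes
    loc += 1  # the line really does begin with whitespace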