Python: import data from a file and autopopulate a dictionary

I am a python newbie and am trying to accomplish the following.
A text file contains data in a slightly weird format and I was wondering whether there is an easy way to parse it and auto-fill an empty dictionary with the correct keys and values.
The data looks something like this
01> A B 2 ##01> denotes the line number, that's all
02> EWMWEM
03> C D 3
04> EWWMWWST
05> Q R 4
06> WESTMMMWW
Each pair of lines describes a full set of instructions for a robot arm: lines 1-2 are for arm 1, lines 3-4 are for arm 2, and so on. The first line of each pair states the location and the second line states the set of instructions (movement, changes in direction, turns, etc.).
What I am looking for is a way to import this text file, parse it properly, and populate a dictionary with automatically generated keys. Note that the file only contains values, which is why I am having a hard time. How do I tell the program to generate armX (where X is the ID from 1 to n) and assign a tuple (or a pair) to it, so that the dictionary reads:
dict = {'arm1': ('A B 2', 'EWMWEM'), ...}
I am sorry if the newbie-ish vocab is redundant or unclear. Please let me know and I will be happy to clarify.
Commented code that is easy to understand will help me learn the concepts and motivation.
To provide some context: the point of the program is to load all the instructions and then execute the methods on the arms. So if you think there is a more elegant way to do it without loading all the instructions, please suggest it.

def get_instructions_dict(instructions_file):
    even_lines = []
    odd_lines = []
    with open(instructions_file) as f:
        i = 1
        for line in f:
            # split the lines into id and command lines
            if i % 2 == 0:
                # command line
                even_lines.append(line.strip())
            else:
                # id line
                odd_lines.append(line.strip())
            i += 1
    # create tuples of (id, cmd) and zip them with armX ( armX, (id, command) )
    # and combine them into a dict
    result = dict(zip(tuple("arm%s" % i for i in range(1, len(odd_lines) + 1)),
                      tuple(zip(odd_lines, even_lines))))
    return result
>>> print get_instructions_dict('instructions.txt')
{'arm3': ('Q R 4', 'WESTMMMWW'), 'arm1': ('A B 2', 'EWMWEM'), 'arm2': ('C D 3', 'EWWMWWST')}
Note that dict keys are not ordered here. If that matters, use collections.OrderedDict.
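A more compact variant of the same idea, as a sketch. It assumes the file holds only the location and instruction lines (no `01>` prefixes) and it skips blank lines. The name `parse_arms` and the in-memory sample are illustrative, not from the question. On Python 3.7+ a plain dict keeps insertion order, so OrderedDict is only needed on older versions.

```python
import io

def parse_arms(lines):
    """Pair every location line with the instruction line that follows it."""
    # Drop surrounding whitespace and ignore blank lines.
    cleaned = [line.strip() for line in lines if line.strip()]
    # cleaned[0::2] are the location lines, cleaned[1::2] the instruction lines.
    pairs = zip(cleaned[0::2], cleaned[1::2])
    # enumerate(..., start=1) generates the arm IDs 1..n.
    return {"arm%d" % n: pair for n, pair in enumerate(pairs, start=1)}

# Stand-in for open('instructions.txt'); any iterable of lines works.
sample = io.StringIO("A B 2\nEWMWEM\nC D 3\nEWWMWWST\nQ R 4\nWESTMMMWW\n")
arms = parse_arms(sample)
print(arms["arm1"])
```

Passing an open file object instead of the StringIO works the same way, since both are iterables of lines.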

I would do something like this:
mydict = {}  # empty dict
buffer = ''
for line in open('myFile'):  # open the file, read line by line
    linelist = line.strip().replace(' ', '').split('>')  # line 1 would become ['01', 'AB2']
    if len(linelist) > 1:  # eliminates empty lines
        number = int(linelist[0])
        if number % 2:  # location line
            buffer = linelist[1]  # we keep this till we know the instruction
        else:
            # we know the instructions, so we write everything to the dict.
            # The parentheses are needed: without them 'arm%i' % number is
            # evaluated first and the resulting string is then divided by 2.
            mydict['arm%i' % (number // 2)] = (buffer, linelist[1])

robot_dict = {}
arm_number = 1
key = None
for line in open('sample.txt'):
    line = line.strip()  # strip() already removes the trailing newline
    if not key:
        location = line
        key = 'arm' + str(arm_number)  # setting key for dict
    else:
        instruction = line
        robot_dict[key] = (location, instruction)
        key = None  # reset key
        arm_number = arm_number + 1
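On the question of whether everything has to be loaded first: a generator can hand out one arm at a time instead. This is only a sketch. `iter_arms` is a made-up name, and it assumes the file contains just the location and instruction lines, without the `01>` prefixes.

```python
import io

def iter_arms(lines):
    """Yield (key, location, instructions) one arm at a time."""
    # A single shared iterator over the non-blank, stripped lines.
    cleaned = (line.strip() for line in lines if line.strip())
    # zip(cleaned, cleaned) pulls two consecutive lines per step.
    for n, (location, instructions) in enumerate(zip(cleaned, cleaned), start=1):
        yield "arm%d" % n, location, instructions

# Stand-in for an open file; any iterable of lines works.
sample = io.StringIO("A B 2\nEWMWEM\nC D 3\nEWWMWWST\n")
for key, location, instructions in iter_arms(sample):
    print(key, location, instructions)
```

Each arm can then be executed as soon as its two lines have been read, without building the whole dictionary.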

Related

How to make a text file (name1:hobby1 name2:hobby2) into this (name1:hobby1, hobby2 name2:hobby1, hobby2)?

I'm new to programming and I need some help. I have a text file with lots of names and hobbies that looks something like this:
Jack:crafting
Peter:hiking
Wendy:gaming
Monica:tennis
Chris:origami
Sophie:sport
Monica:design
Some of the names and hobbies are repeated. I'm trying to make the program display something like this:
Jack: crafting, movies, yoga
Wendy: gaming, hiking, sport
This is my program so far, but the 4 lines from the end are incorrect.
def create_dictionary(file):
    newlist = []
    dict = {}
    file = open("hobbies_database.txt", "r")
    hobbies = file.readlines()
    for rows in hobbies:
        rows1 = rows.split(":")
        k = rows1[0]  # name
        v = (rows1[1]).rstrip("\n")  # hobby
        dict = {k: v}
        for k, v in dict.items():
            if v in dict[k]:
In this case I would use defaultdict.
import sys
from collections import defaultdict
def create_dictionary(inputfile):
    d = defaultdict(list)
    for line in inputfile:
        name, hobby = line.split(':', 1)
        d[name].append(hobby.strip())
    return d
with open(sys.argv[1]) as fp:
    for name, hobbies in create_dictionary(fp).items():
        print(name, ': ', sep='', end='')
        print(*hobbies, sep=', ')
Your example gives me this result:
Sophie: sport
Chris: origami
Peter: hiking
Jack: crafting
Wendy: gaming
Monica: tennis, design
You may try this one (Python 2):
data = map(lambda x: x.strip(), open('hobbies_database.txt'))
tmp = {}
for i in data:
    k, v = i.strip().split(':')
    if not tmp.get(k, []):
        tmp[k] = []
    tmp[k].append(v)
for k, v in tmp.iteritems():
    print k, ':', ','.join(v)
output:
Monica : tennis,design
Jack : crafting
Wendy : gaming
Chris : origami
Sophie : sport
Peter : hiking
You could try something like this. I've deliberately rewritten this as I'm trying to show you how you would go about this in a more "Pythonic way". At least making use of the language a bit more.
For example, you can create arrays within dictionaries to represent the data more intuitively. It will then be easier to print the information out in the way you want.
def create_dictionary(file):
    names = {}  # create the dictionary to store your data
    # using a with statement ensures the file is closed properly
    # even if there is an error thrown
    with open("hobbies_database.txt", "r") as file:
        # This reads the file one line at a time, whereas
        # readlines() loads the whole file into memory in one go.
        # Reading line by line is far better for large data files that won't fit into memory
        for row in file:
            # strip() removes end of line characters and trailing white space
            # split returns an array [] which can be unpacked direct to single variables
            name, hobby = row.strip().split(":")
            # this checks to see if 'name' has been seen before
            # is there already an entry in the dictionary
            if name not in names:
                # if not, assign an empty array to the dictionary key 'name'
                names[name] = []
            # this adds the hobby seen in this line to the array
            names[name].append(hobby)
    # This iterates through all the keys in the dictionary
    for name in names:
        # using the string format function you can build up
        # the output string and print it to the screen
        # ", ".join(array) will join all the elements of the array
        # into a single string and place a comma and a space between each
        # set(array) creates a "list/array" of unique objects
        # this means that if a hobby is added twice you will only see it once in the set
        # names[name] is the list [] of hobby strings for that 'name'
        print("{0}: {1}\n".format(name, ", ".join(set(names[name]))))
Hope this helps, and perhaps points you in the direction of a few more Python concepts. If you haven't been through the introductory tutorial yet, I'd definitely recommend it.
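One caveat with set(): it does not keep the hobbies in the order they appeared in the file. If order matters, dict.fromkeys() removes duplicates while preserving first-seen order (Python 3.7+). A small sketch with made-up sample rows standing in for the file:

```python
from collections import defaultdict

# Hypothetical rows; in practice these would come from hobbies_database.txt.
rows = ["Jack:crafting", "Monica:tennis", "Jack:yoga", "Monica:design", "Jack:crafting"]

hobbies_by_name = defaultdict(list)
for row in rows:
    name, hobby = row.strip().split(":", 1)
    hobbies_by_name[name].append(hobby)

# dict.fromkeys keeps the first occurrence of each hobby, in file order.
summary = {name: list(dict.fromkeys(hobbies))
           for name, hobbies in hobbies_by_name.items()}

for name, hobbies in summary.items():
    print("{0}: {1}".format(name, ", ".join(hobbies)))
```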

write a whole line in a .txt file if not in a .yaml file

I am trying to write to a text file (download.txt) the lines from open.txt whose 'id' is not repeated and that are not in the exception lists (idexception, classexception). I have managed to handle the non-repeated ids and the idexception check.
MY QUESTION is how to add the 'classexception' condition. I tried, but I could not get it to work. Any idea which dictionaries/conditionals I have to use?
c = open('open.txt','r') #structure: name:xxx; id:xxxx; class:xxxx; name:xxx; id:xxxx;class:xxxx etc
t = c.read()
d = open('download.txt','a')
allLines = t.split("\n")
lines = {}
# 'class' is a reserved word in Python, so the list is named 'classes' here
classes = [s[10:-1] for s in t.split() if s.startswith("class")]
for line in allLines:
    idPos = line.find("id:")
    colPos = line.find(";", idPos)
    if idPos > -1:
        id = line[idPos+4: colPos if colPos > -1 else None]
        if id not in idexception:
            lines.setdefault(id, line)
for l in lines:
    d.write(lines[l]+'\n')
c.close()
d.close()
Your question is not entirely clear, but if I understand correctly, here is my approach to your problem, with a lot of comments inside:
import re
id_exceptions = ['id_ex_1', 'id_ex_2']
class_exceptions = ['class_ex_1', 'class_ex_2']
# Values to be written to the download.txt file
# Since ids need to be unique, the structure of this dict should be like this:
# {[single key as value of an id]: {name: xxx, class: xxx}}
unique_values = dict()
# All files should be opened using the 'with' statement
with open('open.txt') as source:
    # Read the whole file into one single long string
    all_lines = source.read().replace('\n', '')
# Prepare a regular expression for getting the values of name, id and class as a dict
# (Python writes named groups as (?P<name>...))
# Read https://regex101.com/r/Kby3fY/1 for extra explanation of what it does
reg_exp = re.compile(r'name:(?P<name>[a-zA-Z0-9_-]*);\sid:(?P<id>[a-zA-Z0-9_-]*);\sclass:(?P<class>[a-zA-Z0-9_-]*);')
# Read the single long string and match it against the regular expression above
for match in reg_exp.finditer(all_lines):
    # This will produce a single dict {'name': xxx, 'id': xxx, 'class': xxx}
    single_line = match.groupdict()
    # Now we check all conditions at once; only if every one of them
    # holds do we add the values under their unique id
    if (single_line['id'] not in unique_values and  # Check it is not present already
            single_line['id'] not in id_exceptions and  # Check it is not in id exceptions
            single_line['class'] not in class_exceptions):  # Check it is not in class exceptions
        # Add unique id values
        unique_values[single_line['id']] = {'name': single_line['name'],
                                            'class': single_line['class']}
# Now we just need to write it to the download.txt file
with open('download.txt', 'w') as destination:
    for key, value in unique_values.items():  # In Python 2.x use unique_values.iteritems()
        line = "id:{}; name:{}; class:{}".format(key, value['name'], value['class'])
        destination.write(line + '\n')
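To check the filtering logic without touching any files, the same idea can be run on an in-memory string. This is a sketch: the sample records, the exception sets, the group name `cls`, and the relaxed `\s*` separators are all assumptions, since the exact layout of open.txt is not shown.

```python
import re

# Hypothetical records in the "name:...; id:...; class:...;" layout.
text = ("name:a; id:1; class:x; name:b; id:2; class:bad; "
        "name:c; id:1; class:x; name:d; id:9; class:y;")
id_exceptions = {"9"}
class_exceptions = {"bad"}

# Python named groups are written (?P<name>...).
record = re.compile(
    r"name:(?P<name>[\w-]*);\s*id:(?P<id>[\w-]*);\s*class:(?P<cls>[\w-]*);")

kept = {}
for match in record.finditer(text):
    fields = match.groupdict()
    # Keep a record only if its id is new AND neither field is excluded.
    if (fields["id"] not in kept
            and fields["id"] not in id_exceptions
            and fields["cls"] not in class_exceptions):
        kept[fields["id"]] = fields

for fields in kept.values():
    print("id:{id}; name:{name}; class:{cls}".format(**fields))
```

The three checks are joined with `and` because a line should be written only when none of the exclusion rules applies.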

python newbie - where is my if/else wrong?

Complete beginner so I'm sorry if this is obvious!
I have a file which is name | +/- or IG_name | 0 in a long list like so -
S1 +
IG_1 0
S2 -
IG_S3 0
S3 +
S4 -
dnaA +
IG_dnaA 0
Everything which starts with IG_ has a corresponding name. I want to add the + or - to the IG_name. e.g. IG_S3 is + like S3 is.
The information is gene names and strand information, IG = intergenic region. Basically I want to know which strand the intergenic region is on.
What I think I want:
open file
for every line, if the line starts with IG_*
    find the line with *
    print("IG_" and the line it found)
else
    print line
What I have:
with open(sys.argv[2]) as geneInfo:
    with open(sys.argv[1]) as origin:
        for line in origin:
            if line.startswith("IG_"):
                name = line.split("_")[1]
                nname = name[:-3]
                for newline in geneInfo:
                    if re.match(nname, newline):
                        print("IG_"+newline)
            else:
                print(line)
where origin is the mixed list and geneInfo has only the names not IG_names.
With this code I end up with a list containing only the else statements.
S1 +
S2 -
S3 +
S4 -
dnaA +
My problem is that I don't know what is wrong to search so I can (attempt) to fix it!
Below is some step-by-step annotated code that hopefully does what you want (though instead of using print I have aggregated the results into a list so you can actually make use of it). I'm not quite sure what happened with your existing code (especially how you're processing two files?)
s_dict = {}
ig_list = []
with open('genes.txt', 'r') as infile:  # Simulating reading the file you pass in sys.argv
    for line in infile:
        if line.startswith('IG_'):
            ig_list.append(line.split()[0])  # Collect all our IG values for later
        else:
            s_name, value = line.split()  # Separate out the S value and its operator
            s_dict[s_name] = value.strip()  # Add to dictionary to map S to operator
# Now you can go back through your list of IG values and append the appropriate operator
pulled_together = []
for item in ig_list:
    s_value = item.split('_')[1]
    # The following will look for the operator mapped to the S value. If it is
    # not found, it will instead give you 'Not found'
    corresponding_operator = s_dict.get(s_value, 'Not found')
    pulled_together.append([item, corresponding_operator])
print('List structure')
print(pulled_together)
print('\n')
print('Printout of each item in list')
for item in pulled_together:
    print(item[0] + '\t' + item[1])
nname = name[:-3]
Python's slicing is very powerful, but can be tricky to understand correctly.
When you write [:-3], you take everything except the last three items. The thing is, if the sequence has fewer than three elements, slicing does not raise an error; it returns an empty sequence.
I think this is where things go wrong: as there are not many elements per line, it returns an empty result. If you could tell us exactly what you want it to return there, with an example, it would help a lot, as I don't really know what you're trying to get with your slicing.
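A quick illustration of that slicing behaviour, using a string like the ones in the question. The tab-separated sample line is an assumption about the file layout, which the question does not spell out.

```python
short = "ab"
print(repr(short[:-3]))      # slicing never raises; too-short input gives an empty string

line = "IG_S3\t0\n"          # assumed layout: name, tab, value, newline
name = line.split("_")[1]    # the part after the first underscore
print(repr(name[:-3]))       # drops the last three characters

# Splitting on whitespace is less fragile than counting characters.
robust = line.split()[0].split("_", 1)[1]
print(repr(robust))
```

Counting characters from the end only works while the separator and value always have the same width, which is why the whitespace split is the safer choice.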
Does this do what you want?
from __future__ import print_function
import sys
# Read and store all the gene info lines, keyed by name
gene_info = dict()
with open(sys.argv[2]) as gene_info_file:
    for line in gene_info_file:
        tokens = line.split()
        name = tokens[0].strip()
        gene_info[name] = line
# Read the other file and lookup the names
with open(sys.argv[1]) as origin_file:
    for line in origin_file:
        if line.startswith("IG_"):
            name = line.split("_")[1]
            nname = name[:-3].strip()
            if nname in gene_info:
                lookup_line = gene_info[nname]
                print("IG_" + lookup_line)
            else:
                pass  # what do you want to do in this case?
        else:
            print(line)
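If both kinds of lines live in one file, as in the sample at the top of the question, a two-pass version is enough: first collect the strand of every gene, then rewrite the IG_ lines. A sketch with the sample data inlined; the `?` placeholder for intergenic regions without a matching gene is my own choice.

```python
lines = ["S1 +", "IG_1 0", "S2 -", "IG_S3 0", "S3 +", "S4 -", "dnaA +", "IG_dnaA 0"]

# Pass 1: map each gene name to its strand.
strand = {}
for line in lines:
    name, value = line.split()
    if not name.startswith("IG_"):
        strand[name] = value

# Pass 2: replace the 0 on IG_ lines with the strand of the matching gene.
result = []
for line in lines:
    name, value = line.split()
    if name.startswith("IG_"):
        gene = name[len("IG_"):]
        value = strand.get(gene, "?")   # '?' marks genes with no match
    result.append("{0} {1}".format(name, value))

print("\n".join(result))
```

Two passes are needed because a gene line such as `S3 +` can appear after the `IG_S3` line that refers to it.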

Python read .txt File -> list

I have a .txt File and I want to get the values in a list.
The format of the txt file should be:
value0,timestamp0
value1,timestamp1
...
...
...
In the end I want to get a list with
[[value0,timestamp0],[value1,timestamp1],.....]
I know it's easy to get these values by
directions = []
for line in open(filename):
    direction, t = line.strip().split(',')
    direction = float(direction)
    t = long(t)
    directions.append([direction, t])
return directions
But I have a big problem: when creating the data I forgot to insert a "\n" after each row.
That's why I have this format:
value0, timestamp0value1,timestamp1value2,timestamp2value3.....
Every timestamp has exactly 13 characters.
Is there a way to get this data into a list as I want it? It would be a lot of work to collect the data again.
Thanks
Max
import re
input = "value0,0123456789012value1,0123456789012value2,0123456789012value3"
for (line, value, timestamp) in re.findall("(([^,]+),(.{13}))", input):
    print value, timestamp
You will have to strip the trailing comma, but you can insert a comma after every 13 chars following a comma:
import re
s = "-0.1351197,1466615025472-0.25672746,1466615025501-0.3661744,1466615025531-0.46467665,1466615025561-0.5533287,1466615025591-0.63311553,1466615025621-0.7049236,1466615025652-0.7695509,1466615025681-1.7158673,1466615025711-1.6896278,1466615025741-1.65375,1466615025772-1.6092329,1466615025801"
print(re.sub("(?<=,)(.{13})", r"\1" + ",", s))
Which will give you:
-0.1351197,1466615025472,-0.25672746,1466615025501,-0.3661744,1466615025531,-0.46467665,1466615025561,-0.5533287,1466615025591,-0.63311553,1466615025621,-0.7049236,1466615025652,-0.7695509,1466615025681,-1.7158673,1466615025711,-1.6896278,1466615025741,-1.65375,1466615025772,-1.6092329,1466615025801,
I coded a quickie using your example, and not using 13 but len("timestamp") so you can adapt
instr = "value,timestampvalue2,timestampvalue3,timestampvalue4,timestamp"
previous_i = 0
for i, c in enumerate(instr):
    if c == ",":
        next_i = i + len("timestamp") + 1
        print(instr[previous_i:next_i])
        previous_i = next_i
output is descrambled:
value,timestamp
value2,timestamp
value3,timestamp
value4,timestamp
I think you could do something like this:
direction = []
for line in open(filename):
    parts = line.split(',')
    v = parts[0]
    for s in parts[1:]:
        t = s[:13]
        direction.append([float(v), long(t)])
        v = s[13:]
If you're using Python 3.x, then the long function no longer exists; use int instead.
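For Python 3, the run-together data can also be split with a single regular expression that relies on the fixed 13-digit timestamps. A sketch, using a shortened version of the sample string from the answer above; `parse_records` is an illustrative name.

```python
import re

def parse_records(text):
    """Return [[value, timestamp], ...] from 'v,tv,tv,t...' with 13-digit timestamps."""
    # [^,]+ is the value (it cannot cross a comma); \d{13} is the timestamp.
    pairs = re.findall(r"([^,]+),\s*(\d{13})", text)
    return [[float(value), int(timestamp)] for value, timestamp in pairs]

s = "-0.1351197,1466615025472-0.25672746,1466615025501-0.3661744,1466615025531"
print(parse_records(s))
```

Because the timestamp pattern consumes exactly 13 digits, the next value starts right where the previous timestamp ends, even when the value is positive and begins with a digit.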

Compare configuration data text with a default data text

I am in the process of understanding how to compare data from two text files and print the data that does not match into a new document or output.
The Program Goal:
Allow the user to compare the data in a file that contains many lines of data with a default file that has the correct values of the data.
Compare multiple lines of different data with the same parameters against a default list of the data with the same parameters
Example:
Let's say I have the following text document, which has these parameters and data.
Let's call it Config.txt:
<231931844151>
Bird = 3
Cat = 4
Dog = 5
Bat = 10
Tiger = 11
Fish = 16
<92103884812>
Bird = 4
Cat = 40
Dog = 10
Bat = Null
Tiger = 19
Fish = 24
etc. etc.
Let's call this my configuration data. Now I need to make sure that the values of these parameters inside my Config.txt file are correct.
So I have a default data file that has the correct values for these parameters/variables. Let's call it Default.txt:
<Correct Parameters>
Bird = 3
Cat = 40
Dog = 10
Bat = 10
Tiger = 19
Fish = 234
This text file is the default configuration or the correct configuration for the data.
Now I want to compare these two files and print out the data that is incorrect.
So, in theory, if I were to compare these two text document I should get an output of the following: Lets call this Output.txt
<231931844151>
Cat = 4
Dog = 5
Tiger = 11
Fish = 16
<92103884812>
Bird = 4
Bat = Null
Fish = 24
etc. etc.
Since these are the parameters that are incorrect or do not match. So in this case we see that for <231931844151> the parameters Cat, Dog, Tiger, and Fish did not match the default text file so those get printed. In the case of <92103884812> Bird, Bat, and Fish do not match the default parameters so those get printed.
So that's the gist of it for now.
Code:
This is my current approach. However, I'm not sure how to compare a data file that has several sets of lines with the same parameters against a default data file.
# open() takes a file name, not an already opened file object
with open("Config.txt") as f:
    dataConfig = f.read().splitlines()
with open("Default.txt") as d:
    dataDefault = d.read().splitlines()
def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)
defdict = make_dict(dataDefault)
outdict = make_dict(dataConfig)
#Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print allkeys
difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        #key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)
for line in difflines:
    print line
Summary:
I want to compare two text documents that have data/parameters in them, One text document will have a series of data with the same parameters while the other will have just one series of data with the same parameters. I need to compare those parameters and print out the ones that do not match the default. How can I go about doing this in Python?
EDIT:
Okay, so thanks to @Maria's code I think I am almost there. Now I just need to figure out how to compare the dictionary to the list and print out the differences. Here's an example of what I am trying to do:
for i in range(len(setNames)):
    print setNames[i]
    for k in setData[i]:
        if k in dataDefault:
            print dataDefault
Obviously the print line is just there to see whether it worked, but I'm not sure this is the proper way to go about it.
Sample code for parsing the file into separate dictionaries. This works by finding the group separators (blank lines). setNames[i] is the name of the set of parameters in the dictionary at setData[i]. Alternatively you can create an object which has a string name member and a dictionary data member and keep a list of those. Doing the comparisons and outputting it how you want is up to you, this just regurgitates the input file to the command line in a slightly different format.
# The function you wrote
def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)
# open the file and read the lines into a list of strings
with open("Config.txt", "rb") as f:
    dataConfig = f.read().splitlines()
# get rid of trailing '', as they cause problems and are unnecessary
while (len(dataConfig) > 0) and (dataConfig[len(dataConfig) - 1] == ''):
    dataConfig.pop()
# find the indexes of all the ''. They amount to one index past the end of each set of parameters
setEnds = []
index = 0
while '' in dataConfig[index:]:
    setEnds.append(dataConfig[index:].index('') + index)
    index = setEnds[len(setEnds) - 1] + 1
# separate out your input into separate dictionaries, and keep track of the name of each dictionary
setNames = []
setData = []
i = 0
j = 0
while j < len(setEnds):
    setNames.append(dataConfig[i])
    setData.append(make_dict(dataConfig[i+1:setEnds[j]]))
    i = setEnds[j] + 1
    j += 1
# handle the last set, which runs to the end of the list (this also covers a file with a single set
# and no blank lines). Alternatively you could add len(dataConfig) to the end of setEnds and you wouldn't need this
if i < len(dataConfig):
    setNames.append(dataConfig[i])
    setData.append(make_dict(dataConfig[i+1:]))
# regurgitate the input to prove it worked the way you wanted.
for i in range(len(setNames)):
    print setNames[i]
    for k in setData[i]:
        print "\t" + k + ": " + setData[i][k]
    print ""
Why not just use those dicts and loop through them to compare?
for keys in outdict:
    # print only the lines whose value differs from the default
    if defdict.get(keys) != outdict.get(keys):
        print outdict.get(keys)
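Putting the pieces together, one way to get the Output.txt described in the question is to parse the configuration into one dict per `<...>` header and compare each against the defaults. This is a sketch in Python 3 with the sample data inlined; `parse_blocks` and `diff_blocks` are hypothetical helper names, and the blank line between blocks is optional for this parser.

```python
CONFIG = """<231931844151>
Bird = 3
Cat = 4
Dog = 5
Bat = 10
Tiger = 11
Fish = 16

<92103884812>
Bird = 4
Cat = 40
Dog = 10
Bat = Null
Tiger = 19
Fish = 24
"""

DEFAULT = """<Correct Parameters>
Bird = 3
Cat = 40
Dog = 10
Bat = 10
Tiger = 19
Fish = 234
"""

def parse_blocks(text):
    """Return {header: {parameter: value}} for every '<...>' block."""
    blocks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("<") and line.endswith(">"):
            current = blocks.setdefault(line, {})   # start a new block
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return blocks

def diff_blocks(config, defaults):
    """Yield output lines for parameters that differ from the defaults."""
    for header, params in config.items():
        wrong = [(k, v) for k, v in params.items() if defaults.get(k) != v]
        if wrong:
            yield header
            for key, value in wrong:
                yield "{0} = {1}".format(key, value)

# The default file has a single block; take its parameter dict.
defaults = next(iter(parse_blocks(DEFAULT).values()))
report = list(diff_blocks(parse_blocks(CONFIG), defaults))
print("\n".join(report))
```

Values are compared as strings, so `Null` simply counts as a mismatch against `10`; writing `report` to Output.txt is then a single loop over its lines.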
