Trying to read in an external file into a dictionary [duplicate]

This question already has an answer here:
How to read in a file into a dictionary
(1 answer)
Closed 8 years ago.
I am trying to read an external file into a dictionary, but I am getting some syntax errors. The clues that are read in must then replace the letters they are paired with in the list of coded words.
My code for reading into a dictionary and replacing the symbols is as follows.
d = {}
def read_clues(clues):
    global d
    with open("hey.txt") as f:
        for line in f:
            (key, val) = line[1], line[0]
            d[key] = val
def replace_symbols(clues, words):
    global d
    for word in range(len(words)):
        for key, value in d.items():
            words[word] = words[word].replace(key, value)
In the main part of my program I have the code for calling the replace_symbols. However I am getting a syntax error after print key, in the last line. The code for this is shown below.
#REPLACES LETTERS
print("======== The clues have been replaced ===========")
replace_symbols(clues, words)
for key, value in d.items():
    print key, value // This will print the symbols and letters

Assuming that hey.txt has the keys and values separated by a space, the following code should work:
def read_clues(clues):
    global d
    with open("hey.txt") as f:
        for line in f:
            stuff = line.strip().split(" ") #drop the trailing newline, then split each line into parts
            (key, val) = stuff[1], stuff[0]
            d[key] = val
If the separator is other than a space, just include it as an argument to split().
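As a rough sketch of the whole round trip (reading the clues, then replacing symbols in the coded words), the version below avoids the global dictionary by returning the mapping. The file layout ("letter symbol" per line), the sample contents, and the use of io.StringIO in place of open("hey.txt") are all assumptions made for illustration.

```python
import io

def read_clues(lines):
    """Build a symbol -> letter mapping from lines shaped like 'A #'."""
    mapping = {}
    for line in lines:
        parts = line.split()      # splits on any whitespace and drops the newline
        if len(parts) != 2:
            continue              # skip blank or malformed lines
        letter, symbol = parts
        mapping[symbol] = letter
    return mapping

def replace_symbols(mapping, words):
    """Return a new list with every known symbol replaced by its letter."""
    decoded = []
    for word in words:
        for symbol, letter in mapping.items():
            word = word.replace(symbol, letter)
        decoded.append(word)
    return decoded

# io.StringIO stands in for open("hey.txt"); the contents are made up.
clues = read_clues(io.StringIO("A #\nT %\nC &\n"))
print(replace_symbols(clues, ["&#%", "#%"]))  # ['CAT', 'AT']
```

Returning the dictionary instead of mutating a global makes each function easier to test on its own.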

There are some other problems in your code, but since you're asking about the syntax error, it's almost certainly this line:
print key, value // This will print the symbols and letters
First, // does not mean "comment" in Python, it means "integer division". So, you're asking it to divide value by This (which would probably raise a NameError, because it's unlikely you have anything named This in your code), and then including a bunch of other identifiers starting with will. A string of two identifiers in a row isn't valid syntax.
How do you write a comment in Python? Use #, not //:
print key, value # This will print the symbols and letters
Second, if you're using Python 3.x, print is a normal function, like anything else, so its arguments have to go in parentheses, like all of your other function calls. (And given the print call a few lines up, I'm willing to bet you are using Python 3.x.) Most likely you've copied this from some code for Python 2.x. There are some important differences between Python 2 and 3, which means that not all code for Python 2 can be copied and pasted into your Python 3. And this is one of the cases where it doesn't work. So:
print(key, value) # This will print the symbols and letters
But don't make that second change if you're using Python 2.x; otherwise, you'll just end up printing a tuple instead of two strings separated by a space. (For example, print 1, 2 prints 1 2, but print(1, 2) prints (1, 2).)

Related

What causes this return() to create a SyntaxError?

I need this program to create a sheet as a list of strings of ' ' chars and distribute text strings (from a list) into it. I have already coded return statements in python 3 but this one keeps giving
return(riplns)
^
SyntaxError: invalid syntax
It's the return(riplns) on line 39. I want the function to create a number of random numbers (randint) inside a range built around another randint, coming from the function ripimg() that calls this one.
I can see clearly where the program declares the list I want this return() to give me. I know its type. I can see where I feed variables (of the int type) to it, through .append(). I know from internet research that SyntaxErrors on Python's return statements usually come from typos, but that doesn't seem to be the case here.
#loads the asciified image ("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
#creates a sheet "foglio1", same number of lines as the asciified image, and distributes text on it on a randomised line
#create the sheet foglio1
def create():
    ref = open("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
    charcount = ""
    field = []
    for line in ref:
        for c in line:
            if c != '\n':
                charcount += ' '
            if c == '\n':
                charcount += '*' #<--- YOU GONNA NEED TO MAKE THIS A SPACE IN A FOLLOWING FUNCTION IN THE WRITER.PY PROGRAM
                for i in range(50):#<------- VALUE ADJUSTMENT FROM WRITER.PY GOES HERE(default : 50):
                    charcount += ' '
                charcount += '\n'
        break
    for line in ref:
        field.append(charcount)
    return(field)
#turn text in a list of lines and trasforms the lines in a list of strings
def poemln():
    txt = open("/home/gcg/Documents/Programmazione/Python projects/imgascii/writer/poem")
    arrays = []
    for line in txt:
        arrays.append(line)
    txt.close()
    return(arrays)
#rander is to be called in ripimg()
def rander(rando, fldepth):
    riplns = []
    for i in range(fldepth):
        riplns.append(randint((rando)-1,(rando)+1)
    return(riplns) #<---- THIS RETURN GIVES SyntaxError upon execution
#opens a rip on the side of the image.
def ripimg():
    upmost = randint(160, 168)
    positions = []
    fldepth = 52 #<-----value is manually input as in DISTRIB function.
    positions = rander(upmost,fldepth)
    return(positions)
I omitted the rest of the program, I believe these functions are enough to get the idea, please tell me if I need to add more.

The parentheses on the previous line are not balanced.
In this line:
riplns.append(randint((rando)-1,(rando)+1)
you have to add one more closing parenthesis at the end. This caused the error because Python kept reading the unfinished expression and treated the return statement as part of the previous, incomplete line.
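For reference, a corrected version of the function might look like the sketch below. The indentation is assumed, since the posted code lost it, and the redundant parentheses around rando and the return value are dropped (return is a statement, not a function).

```python
from random import randint

def rander(rando, fldepth):
    """Return fldepth random ints, each within 1 of rando."""
    riplns = []
    for i in range(fldepth):
        # every opening parenthesis on this line now has a matching close
        riplns.append(randint(rando - 1, rando + 1))
    return riplns

positions = rander(164, 52)
print(len(positions))  # 52
```

When a SyntaxError points at a line that looks fine, the line just before it is usually the first place to check for an unclosed bracket.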

How to replace letters with numbers and re-convert at anytime (Caesar cipher)?

I've been coding this for almost 2 days now but can't get it. I've coded two different bits trying to find it.
Code #1
So this one will list the letters but won't change them to numbers (a->1, b->2, etc.)
import re
text = input('Write Something- ')
word = '{}'.format(text)
for letter in word:
    print(letter)
#lists down
Outcome-
Write something- test
t
e
s
t
Then I have this code that changes the letters into numbers, but I haven't been able to convert it back into letters.
Code #2
u = input('Write Something')
a = ord(u[-1])
print(a)
#converts to number and prints ^^
print('')
print(????)
#need to convert from numbers back to letters.
Outcome:
Write Something- test
116
How can I send a text through (test) and make it convert it to either set numbers (a->1, b->2) or random numbers, save it to a .txt file and be able to go back and read it at any time?

What you're trying to achieve here is called "Caesar encryption" (a Caesar cipher).
For example, say you would normally have: A=1, a=2, B=3, b=4, etc.
Then you would have a "key" which "shifts" the letters. Let's say the key is "3", so you would shift all letters 3 numbers up and end up with: A=4, a=5, B=6, b=7, etc.
This is of course only ONE way of doing a Caesar encryption, and the most basic example. You could also say your key is "G", which would give you:
A=G, a=g, B=H, b=h, etc.. or
A=G, a=H, B=I, b=J, etc...
Hope you understand what I'm talking about. Again, this is only one very simple example.
Now, for your program/script you need to define this key. And if the key should be variable, you need to save it somewhere (write it down). Put your words in a string, then check and convert each letter and write it into a new string.
You then could say (pseudo code!):
var key = READKEYFROMFILE;
string old = READKEYFROMFILE_OR_JUST_A_NORMAL_STRING_:)
string new = "";
for (int i=0; i<old.length; i++){
    get the character at i;
    compare with your "key";
    shift it;
    write it in new;
}
Hope i could help you.
edit:
You could also use a dictionary (like the other answer says), but this is a very static (but easy) way.
Also, maybe watch some guides/tutorials on programming. You don't seem to be that experienced. And also, google "Caesar encryption" to understand this topic better (it's very interesting).
edit2:
Ok, so basically:
You have a variable called "key". In this variable, you store your key (you understood what I wrote above about the key?).
You then have a string variable called "old", and another one called "new".
In old, you write the string that you want to convert.
New will be empty for now.
You then do a "for loop", which runs as long as the ".length" of your "old" string. (That means if your sentence has 15 letters, the loop will run 15 times, counting the little "i" variable (from the for loop) up each time.)
You then need to get the letter at position i from "old" (and save it in another variable for short, for example char temp = "").
After this, you need to compare your current letter with the key and decide how to shift it.
Once that's done, just add your converted letter to the "new" string.
Here is some more precise pseudo code (it's not Python code, I don't know Python well); btw char stands for "character" (letter):
var key = g;
string old = "teststring";
string new = "";
char oldchar = "";
char newchar = "";
for (int i=0; i<old.length; i++){
    oldchar = old.charAt(i);
    newchar = oldchar; //shift here!!!
    new.addChar(newchar);
}
Hope i could help you ;)
edit3:
maybe also take a look at this:
https://inventwithpython.com/chapter14.html
Caesar Cipher Function in Python
https://www.youtube.com/watch?v=WXIHuQU6Vrs
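The loop described in this answer could be sketched in Python roughly as follows. This is only one possible reading of it: the function name caesar_shift is made up here, the key is assumed to be an integer shift, and only ASCII letters are shifted (within their own case) while everything else passes through unchanged.

```python
import string

def caesar_shift(text, key):
    """Shift each ASCII letter by `key` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            base = ord('a')
        elif ch in string.ascii_uppercase:
            base = ord('A')
        else:
            result.append(ch)     # leave digits, spaces and punctuation alone
            continue
        result.append(chr((ord(ch) - base + key) % 26 + base))
    return ''.join(result)

secret = caesar_shift("teststring", 3)
print(secret)                     # whvwvwulqj
print(caesar_shift(secret, -3))   # teststring
```

Decoding is just the same shift with the sign of the key flipped, so only the key has to be stored alongside the text.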

Just use dictionary:
letters = {'a': 1, 'b': 2, ... }
And in the loop:
for letter in word:
    print(letters[letter])
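Rather than typing the dictionary out by hand, it could be generated, together with a reverse dictionary for converting back. This is a small sketch that assumes only lowercase ASCII letters need mapping; the name numbers is introduced here for illustration.

```python
import string

# 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
letters = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}
# reverse lookup: 1 -> 'a', 2 -> 'b', ...
numbers = {i: ch for ch, i in letters.items()}

word = "test"
codes = [letters[ch] for ch in word]
print(codes)                                  # [20, 5, 19, 20]
print(''.join(numbers[n] for n in codes))     # test
```

Any character outside a-z raises a KeyError with this mapping, so real input would need either a larger alphabet or a fallback.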

To convert to symbol codes and back to characters:
text = input('Write Something')
for t in text:
    d = ord(t)
    n = chr(d)
    print(t,d,n)
To write into file:
f = open("a.txt", "w")
f.write("someline\n")
f.close()
To read lines from file:
f = open("a.txt", "r")
lines = f.readlines()
for line in lines:
    print(line, end='') # all lines have newline character at the end
f.close()
Please see documentation for Python 3: https://docs.python.org/3/

Here are a couple of examples. My method involves mapping the character to the string representation of an integer padded with zeros so it's 3 characters long using str.zfill.
Eg 0 -> '000', 42 -> '042', 125 -> '125'
This makes it much easier to convert a string of numbers back to characters since it will be in lots of 3
Examples
from string import printable
#'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
from random import sample
# Option 1
char_to_num_dict = {key : str(val).zfill(3) for key, val in zip(printable, sample(range(1000), len(printable))) }
# Option 2
char_to_num_dict = {key : str(val).zfill(3) for key, val in zip(printable, range(len(printable))) }
# Reverse mapping - applies to both options
num_to_char_dict = {char_to_num_dict[key] : key for key in char_to_num_dict }
Here are two sets of dictionaries to map a character to a number. The first option uses random numbers, e.g. 'a' = '042', 'b' = '756', 'c' = '000'. The problem with this is that the mapping is regenerated on every run, so if you use it once and close the program, the next mapping will almost certainly not match. If you want to use random values, you will need to save the dictionary to a file so you can open it later to get the key.
The second option creates a dictionary mapping a character to a number and maintains order, so it follows the same sequence every time, e.g. 'a' = '010', 'b' = '011', 'c' = '012'.
Now that I've explained the mapping choices, here are the functions to convert between text and numbers:
def text_to_num(s):
    return ''.join( char_to_num_dict.get(char, '') for char in s )
def num_to_text(s):
    slices = [ s[ i : i + 3 ] for i in range(0, len(s), 3) ]
    return ''.join( num_to_char_dict.get(char, '') for char in slices )
Example of use ( with option 2 dictionary )
>>> text_to_num('Hello World!')
'043014021021024094058024027021013062'
>>> num_to_text('043014021021024094058024027021013062')
'Hello World!'
And finally, if you don't want to use a dictionary, you can use ord and chr while still padding each number with zeros:
def text_to_num2(s):
    return ''.join( str(ord(char)).zfill(3) for char in s )
def num_to_text2(s):
    slices = [ s[ i : i + 3] for i in range(0, len(s), 3) ]
    return ''.join( chr(int(val)) for val in slices )
Example of use
>>> text_to_num2('Hello World!')
'072101108108111032087111114108100033'
>>> num_to_text2('072101108108111032087111114108100033')
'Hello World!'
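For the random mapping (option 1), the dictionary has to be stored so the same key can be reloaded later. One way to do that, sketched here with the standard json module, is shown below; the helper names and the temporary file location are assumptions for the example, not part of the answer above.

```python
import json
import os
import tempfile
from random import sample
from string import printable

def make_random_mapping():
    """Map each printable character to a unique, zero-padded 3-digit code."""
    codes = sample(range(1000), len(printable))
    return {ch: str(code).zfill(3) for ch, code in zip(printable, codes)}

def save_mapping(mapping, path):
    with open(path, "w") as f:
        json.dump(mapping, f)     # control characters are escaped by json

def load_mapping(path):
    with open(path) as f:
        return json.load(f)

# Hypothetical location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "mapping.json")
mapping = make_random_mapping()
save_mapping(mapping, path)
restored = load_mapping(path)
print(restored == mapping)        # True
```

The reverse dictionary can be rebuilt from the loaded mapping with the same comprehension used for num_to_char_dict, so only one file is needed.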

Regex vs readline for text processing

I have a text to process (router output) and want to generate a useful data structure from it (a dictionary with interface names as keys and packet counts as values). I have two approaches to the same task. I would like to know which one I should use for efficiency and which one looks more likely to fail on bigger data samples.
ReadLine1 gets a list of lines from splitlines, processes the output, and writes into the dictionary with the interface name as the key and the next three items as the values.
Readline2 uses the re module to match groups, and from the groups it writes the dictionary keys and values.
The input self.output to these functions will be something like this:
message =
"""
Interface 1/1\n\t
input : 1234\n\t
output : 3456\n\t
dropped : 12\n
\n
Interface 1/2\n\t
input : 7123\n\t
output : 2345\n\t
dropped : 31\n\t
"""
def ReadLine1(self):
    lines = self.output.splitlines()
    for index, line in enumerate(lines):
        if "Interface" in line:
            valuelist = []
            for i in [1,2,3]:
                valuelist.append((lines[index+i].split(':'))[1].strip())
            self.IFlist[line.split()[1]] = valuelist
    return self.IFlist
def Readline2(self):
    #print repr(self.output)
    n = re.compile(r"\n*Interface (./.)\n\s*input : ([0-9]+)\n\s*output : ([0-9]+)\n\s*dropped : ([0-9]+)",re.MULTILINE|re.DOTALL)
    blocks = self.output.split('\n\n')
    for block in blocks:
        m_object = re.match(n, block)
        self.IFlist[m_object.group(1)] = [m_object.group(i) for i in (2,3,4)]

Both of your methods use specific aspects of the format to achieve the parsing you are trying to do, and if that format was changed / broken one of the methods could also break...
For example if you added a space in the empty line between the two entries (which you cannot see) then the blocks = self.output.split('\n\n') would fail to find two consecutive newline characters and the regex version would miss out on the second entry:
{'1/1': ['1234', '3456', '13']}
Or if you added an extra newline between input and output like this:
Interface 1/2
input : 7123

output : 2345
dropped : 31
The regex \s* would deal with the extra whitespace fine, but the non-regex parsing would assume that lines[index+i].split(':') has an index [1], so it would raise an IndexError with that data.
Or if you added some extra space at the end of any line, then the regex would fail to see the newline right after the content and re.match(n, block) would return None, so the next line would raise an AttributeError: 'NoneType' object has no attribute 'group'.
Or if you changed Interface to interface for one of the entries (no longer capital I), then the regex would raise the same error as above but the non-regex would simply ignore that entry.
While I was testing it I found that the regex was easier to mess up with small edits to the sample message, but I also found that the version I made using a generator expression and str.partition was significantly more robust than both of them:
def readline3(self):
    gen_lines = (line for line in self.output.splitlines()
                 if line and not line.isspace())
    try:
        while True: #ended when next() throws a StopIteration
            start,_,key = next(gen_lines).partition(" ")
            if start == "Interface":
                self.IFlist[key] = [next(gen_lines).rpartition(" : ")[2]
                                    for _ in "123"]
    except StopIteration: # reached end of output
        return self.IFlist
This succeeded in every case mentioned above and a few more, and since the only method it relies on is str.partition, which always returns a 3-item tuple, there is nothing to raise any unexpected errors unless self.output is something other than a string.
Also, running a benchmark using timeit, your readline1 was consistently faster than readline2, and my readline3 was usually slightly slower than readline1:
#using the default 1000000 loops using 'message'
<function readline1 at 0x100756f28>
11.225649802014232
<function readline2 at 0x1057e3950>
14.838601427007234
<function readline3 at 0x1057e39d8>
11.693351223017089
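To try the partition-based idea outside of a class, a standalone variant might look like the sketch below. It is not the exact function benchmarked above: the name parse_interfaces is made up, it strips each line before splitting, and it splits on a bare ":" so that spacing around the colon does not matter.

```python
def parse_interfaces(output):
    """Map interface name -> [input, output, dropped] using str.partition."""
    # Skip blank and whitespace-only lines, and drop indentation.
    lines = (line.strip() for line in output.splitlines() if line.strip())
    counters = {}
    for line in lines:
        start, _, name = line.partition(" ")
        if start != "Interface":
            continue
        values = []
        for _ in range(3):
            nxt = next(lines, None)   # None once the output runs out
            if nxt is None:
                break
            values.append(nxt.rpartition(":")[2].strip())
        counters[name] = values
    return counters

message = ("Interface 1/1\n\tinput : 1234\n\toutput : 3456\n\tdropped : 12\n"
           " \n"
           "Interface 1/2\n\tinput : 7123\n\toutput : 2345\n\tdropped : 31\n")
print(parse_interfaces(message))
```

The whitespace-only separator line in this sample is the case that broke the split('\n\n') approach; here it is simply filtered out before parsing.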

Wit's end with file to dict

Python: 2.7.9
I erased all of my code because I'm going nuts.
Here's the gist (it's for a Rosalind challenge):
I want to take a file that looks like this (no quotes on carets)
">"Rosalind_0304
actgatcgtcgctgtactcg
actcgactacgtagctacgtacgctgcatagt
">"Rosalind_2480
gctatcggtactgcgctgctacgtg
ccccccgaagaatagatag
">"Rosalind_2452
cgtacgatctagc
aaattcgcctcgaactcg
etc...
What I can't figure out how to do is basically everything at this point; my mind is so muddled. I'll just show roughly what I was trying, and failing, to do.
1. Search the file for '>'.
2. Assign the rest of that line to the dictionary as a key.
3. Read the next lines up until the next '>', do some calculations, and store the findings as the value for that key.
4. Go through the file and do this for every string.
5. Compare all values and return the key of whichever one is highest.
Can anyone help?
It might help if I just take a break. I've been coding all day and I think I smell colors.
def func(dna_str):
    bla
    return gcp #gc count percentage returned to the value in dict

With my_function somewhere that returns that percentage value:
with open('rosalind.txt', 'r') as ros:
    rosa = {line[1:].split(' ')[0]:my_function(line.split(' ')[1].strip()) for line in ros if line.strip()}
top_key = max(rosa, key=rosa.get)
print(top_key, rosa.get(top_key))
For each line in the file, that will first check if there's anything left of the line after stripping trailing whitespace, then discard the blank lines. Next, it adds each non-blank line as an entry to a dictionary, with the key being everything to the left of the space except for the unneeded >, and the value being the result of sending everything to the right of the space to your function.
Then it saves the key corresponding to the highest value, then prints that key along with its corresponding value. You're left with a dictionary rosa that you can process however you like.
Complete code of the module:
def my_function(dna):
    return 100 * len(dna.replace('A','').replace('T',''))/len(dna)
with open('rosalind.txt', 'r') as ros:
    with open('rosalind_clean.txt', 'w') as output:
        for line in ros:
            if line.startswith('>'):
                output.write('\n'+line.strip())
            elif line.strip():
                output.write(line.strip())
with open('rosalind_clean.txt', 'r') as ros:
    rosa = {line[1:].split(' ')[0]:my_function(line.split(' ')[1].strip()) for line in ros if line.strip()}
top_key = max(rosa, key=rosa.get)
print(top_key, rosa.get(top_key))
Complete content of rosalind.txt:
>Rosalind_6404 CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCG
TTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>Rosalind_5959 CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCA
GGCGCTCCGCCGAAGGTCTATATCCA
TTTGTCAGCAGACACGC
>Rosalind_0808 CCACCCTCGTGGT
ATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT
Result when running the module:
Rosalind_0808 60.91954022988506
This should properly handle an input file that doesn't necessarily have one entry per line.
See SO's formatting guide to learn how to make inline or block code tags to get past things like ">". If you want it to appear as regular text rather than code, escape the > with a backslash:
Type:
\>Rosalind
Result:
>Rosalind

I think I got that part down now. Thanks so much. BUUUUT. It's throwing an error about it.
rosa = {line[1:].split(' ')[0]:calc(line.split(' ')[1].strip()) for line in ros if line.strip()}
IndexError: list index out of range
this is my func btw.
def calc(dna_str):
    for x in dna_str:
        if x == 'G':
            gc += 1
            divc += 1
        elif x == 'C':
            gc += 1
            divc += 1
        else:
            divc += 1
    gcp = float(gc/divc)
    return gcp

Exact test file. no blank lines before or after.
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
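In a file laid out like this one, each header sits on its own line and the sequence wraps over several lines, so there is no space to split on and line.split(' ')[1] raises the IndexError. One possible way around that is to collect the lines under each header first and compute the percentage afterwards. The sketch below uses made-up helper names (read_fasta, gc_percent) and a tiny invented sample instead of the real data.

```python
def gc_percent(dna):
    """Percentage of G and C characters in a DNA string."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)

def read_fasta(lines):
    """Collect wrapped sequence lines under the most recent '>' header."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                  # ignore blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line     # join wrapped sequence lines
    return records

# Invented sample; with a real file, pass the open file object instead.
sample = [">seq_a\n", "GGCC\n", "AT\n", ">seq_b\n", "ATAT\n", "GC\n"]
records = read_fasta(sample)
scores = {name: gc_percent(seq) for name, seq in records.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Initialising the counters is handled by str.count here, which also sidesteps the unassigned gc and divc variables in calc. On Python 2.7 the last line prints a tuple, so the parentheses would need to be dropped there.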

Put function outputs to a list in Python [duplicate]

This question already has answers here:
How can I use `return` to get back multiple values from a loop? Can I put them in a list?
(2 answers)
How to concatenate (join) items in a list to a single string
(11 answers)
How can I print multiple things on the same line, one at a time?
(18 answers)
Closed 8 months ago.
The aim of the following program is to mask 4-character words, e.g. turning "This" into "T***". I have done the hard part, getting the list and len working.
The problem is that the program outputs the answer line by line. Is there any way I can store the output in a list and print it out as a whole sentence?
Thanks.
#Define function to translate imported list information
def translate(i):
    if len(i) == 4: #Execute if the length of the text is 4
        translate = i[0] + "***" #Return ***
        return (translate)
    else:
        return (i) #Return original value
#User input sentense for translation
orgSent = input("Pleae enter a sentence:")
orgSent = orgSent.split (" ")
#Print lines
for i in orgSent:
    print(translate(i))

On py 2.x you can add a , after print:
for i in orgSent:
    print translate(i),
If you're on py 3.x, then try:
for i in orgSent:
    print(translate(i),end=" ")
The default value of end is a newline (\n), which is why each word gets printed on a new line.

Use a list comprehension and the join method:
translated = [translate(i) for i in orgSent]
print(' '.join(translated))
List comprehensions basically store the return values of functions in a list, exactly what you want. You could do something like this, for instance:
print([i**2 for i in range(5)])
# [0, 1, 4, 9, 16]
The map function could also be useful - it 'maps' a function to each element of an iterable. In Python 2, it returns a list. However in Python 3 (which I assume you're using) it returns a map object, which is also an iterable that you can pass into the join function.
translated = map(translate, orgSent)
The join method joins the elements of the iterable inside the parentheses, using the string before the dot as the separator. For example:
lis = ['Hello', 'World!']
print(' '.join(lis))
# Hello World!
It's not limited to spaces, you could do something crazy like this:
print('foo'.join(lis))
# HellofooWorld!
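Putting the pieces together, the whole program could be reduced to something like the sketch below. The wrapper translate_sentence is a name introduced here, and a fixed sentence replaces the input() call so the example runs unattended.

```python
def translate(word):
    """Mask 4-letter words: keep the first letter, replace the rest with ***."""
    if len(word) == 4:
        return word[0] + "***"
    return word

def translate_sentence(sentence):
    # Translate word by word, then join the results back into one string.
    return " ".join(translate(word) for word in sentence.split(" "))

print(translate_sentence("my name is suku john george"))
# my n*** is s*** j*** george
```

Building the full string first keeps the output logic in one place, which also makes it easy to write the result to a file instead of printing it.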

sgeorge-mn:tmp sgeorge$ python s
Pleae enter a sentence:"my name is suku john george"
my n*** is s*** j*** george
You just need to end the print statement with a comma (this is Python 2). See the last line of the code below.
#Print lines
for i in orgSent:
    print (translate(i)),
For a better understanding:
sgeorge-mn:~ sgeorge$ cat tmp.py
import sys
print "print without ending comma"
print "print without ending comma | ",
sys.stdout.write("print using sys.stdout.write ")
sgeorge-mn:~ sgeorge$ python tmp.py
print without ending comma
print without ending comma | print using sys.stdout.write sgeorge-mn:~ sgeorge$
