I need my dictionary keys in this format: '/ cat/', but I keep getting multiple forward slashes. Here is my code:
# Defining the Digraph method #
def digraphs(s):
    dictionary = {}
    count = 0;
    while count <= len(s):
        string = s[count:count + 2]
        count += 1
        dictionary[string] = s.count(string)
    for entry in dictionary:
        dictionary['/' + entry + '/'] = dictionary[entry]
        del dictionary[entry]
    print(dictionary)
#--End of the Digraph Method---#
Here is the output I get when I run:
digraphs('my cat is in the hat')
{'///in///': 1, '/// t///': 1, '/// c///': 1, '//s //': 1, '/my/': 1, '/n /': 1, '/e /': 1, '/ h/': 1, '////ha////': 1, '//////': 21, '/is/': 1, '///ca///': 1, '/he/': 1, '//th//': 1, '/t/': 3, '//at//': 2, '/t /': 1, '////y ////': 1, '/// i///': 2}
In Python, you generally shouldn't iterate over objects while modifying them. Instead of modifying your dictionary, make a new one:
new_dict = {}
for entry in dictionary:
    new_dict['/' + entry + '/'] = dictionary[entry]
return new_dict
Or more compactly (Python 2.7 and above):
return {'/' + key + '/': val for key, val in dictionary.items()}
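If you would rather rename the keys in place, one option (not from the original answer) is to loop over a snapshot of the keys, list(dictionary), so the entries you add are never visited by the loop; Python 3 raises a RuntimeError when a dict changes size while you iterate over it directly. A minimal sketch, where the helper name add_slashes is hypothetical:

```python
def add_slashes(dictionary):
    # list(dictionary) copies the current keys up front, so keys added
    # inside the loop are never revisited and never get extra slashes.
    for entry in list(dictionary):
        # pop() removes the old key and returns its value in one step
        dictionary['/' + entry + '/'] = dictionary.pop(entry)
    return dictionary

print(add_slashes({'ca': 1, 'at': 2}))  # {'/ca/': 1, '/at/': 2}
```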
An even better approach would be to skip creating your original dictionary in the first place:
# Defining the Digraph method #
def digraphs(s):
    dictionary = {}
    for count in range(len(s)):
        string = s[count:count + 2]
        dictionary['/' + string + '/'] = s.count(string)
    return dictionary
#--End of the Digraph Method---#
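A variant of the same idea, not taken from the answer above: collections.Counter can tally the two-character windows directly. This sketch (the name digraphs_counter is hypothetical) deliberately differs from the code above in two ways: it skips the final single-character slice, and it counts overlapping pairs, so for 'aaa' it reports '/aa/' twice where s.count('aa') gives 1.

```python
from collections import Counter

def digraphs_counter(s):
    # Slide a two-character window over s. Stopping at len(s) - 1 means the
    # trailing single character is never recorded as a "digraph".
    pairs = (s[i:i + 2] for i in range(len(s) - 1))
    # Counter tallies every window, so overlapping occurrences are counted
    # (str.count counts non-overlapping matches only).
    counts = Counter(pairs)
    # Build the '/xy/' keys in a single pass, so no key is rewritten twice
    return {'/' + pair + '/': n for pair, n in counts.items()}

print(digraphs_counter('my cat is in the hat')['/at/'])  # 2
```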
You are adding entries to the dictionary as you loop over it, so your new entries are included in the loop too and get extra slashes added again. A better approach is to make a new dictionary containing the new keys you want:
newDict = dict(('/' + key + '/', val) for key, val in oldDict.iteritems())
As @Blender points out, you can also use a dictionary comprehension (Python 2.7 and later; in Python 3, use items() instead of iteritems(), which no longer exists):
{'/'+key+'/': val for key, val in oldDict.items()}
Link to problem statement
Please help. I am very confused about how to implement this:
This is what I currently have:
def similarityAnalysis(paragraph1, paragraph2):
    dict = {}
    for word in lst:
        if word in dict:
            dict[word] = dict[word] + 1
        else:
            dict[word] = 1
    for key, vale in dict.items():
        print(key, val)
See below.
To find common words, we use set intersection.
For counting, we use a dict.
Code:
lst1 = ['jack','Jim','apple']
lst2 = ['chair','jack','ball','steve']
common = set.intersection(set(lst1),set(lst2))
print('common words below:')
print(common)
print()
print('counter below:')
counter = dict()
for word in lst1:
    if word not in counter:
        counter[word] = [0,0]
    counter[word][0] += 1
for word in lst2:
    if word not in counter:
        counter[word] = [0,0]
    counter[word][1] += 1
print(counter)
Output:
common words below:
{'jack'}
counter below:
{'jack': [1, 1], 'Jim': [1, 0], 'apple': [1, 0], 'chair': [0, 1], 'ball': [0, 1], 'steve': [0, 1]}
Analysing your code:
You use the variable name dict, which is not a reserved keyword but is the name of the built-in dict type (used for creating dictionaries). By using it as a variable name, you shadow the built-in and lose the ability to call dict() in that scope.
The function uses a variable named lst which is not one of its arguments. Where do the values for this variable come from?
In the second for loop, you use the variable name vale but then later reference a different variable called val.
Otherwise, it looks good. There may be other issues; that's as far as I got.
I recommend searching for the following and seeing what code you find:
"Python count the number of words in a paragraph"
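To see the first point (shadowing dict) in action, here is a tiny illustration; the function name shadowing_demo is purely illustrative. Once a local variable is named dict, the call dict(a=1) tries to call the empty dictionary object instead of the built-in type:

```python
def shadowing_demo():
    dict = {}  # this local name now hides the built-in dict type
    try:
        return dict(a=1)  # calls the empty dict object, not the built-in
    except TypeError as exc:
        return f"TypeError: {exc}"

print(shadowing_demo())  # TypeError: 'dict' object is not callable
```

Renaming the local variable (for example to counts) restores access to the built-in.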
Update:
There are many ways to do this, but here's one answer:
def word_counts(lst):
    counts = {}
    for word in lst:
        counts[word] = counts.get(word, 0) + 1
    return counts

def similarityAnalysis(paragraph1, paragraph2):
    lst1 = paragraph1.split()
    lst2 = paragraph2.split()
    counts1 = word_counts(lst1)
    counts2 = word_counts(lst2)
    common_words = set(lst1).intersection(lst2)
    return {word: (counts1[word], counts2[word]) for word in common_words}

paragraph1 = 'one three two one two four'
paragraph2 = 'one two one three three one'
print(similarityAnalysis(paragraph1, paragraph2))
Output:
{'three': (1, 2), 'one': (2, 3), 'two': (2, 1)}
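For comparison, collections.Counter from the standard library can replace the hand-written word_counts helper, and dict key views support set operations, so & gives the shared words directly. A minimal sketch (similarity_analysis is just an illustrative name); because the common words come from a set operation, the order of the printed dict may vary:

```python
from collections import Counter

def similarity_analysis(paragraph1, paragraph2):
    # Counter does the same counting that word_counts() does by hand
    counts1 = Counter(paragraph1.split())
    counts2 = Counter(paragraph2.split())
    # keys() views support set operations; & keeps words present in both
    common_words = counts1.keys() & counts2.keys()
    return {word: (counts1[word], counts2[word]) for word in common_words}

print(similarity_analysis('one three two one two four',
                          'one two one three three one'))
```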
I have a dictionary with positive integers as keys, the values don't matter for my question.
Separately, I am iterating through a list of integers, and I want to reference the largest key in my dictionary, that is smaller than the current integer that I am iterating over in my list (if it exists!).
For example:
from collections import defaultdict
def Loep(obstacles):
    my_dict = defaultdict(int)
    output = []
    for i in range(len(obstacles)):
        if max(j for j in my_dict.keys() if j<= obstacles[i]):
            temp = max(j for j in my_dict.keys() if j<= obstacles[i])
            my_dict[obstacles[i]] = temp + 1
            output.append(my_dict[obstacles[i]])
        else:
            my_dict[obstacles[i]] = 1
            output.append(my_dict[obstacles[i]])
print(Loep([3,1,5,6,4,2]))
I am getting an error on the 'if' statement above. I believe it is because I have one too many arguments in max(). Any ideas how to amend the code?
The error is: ValueError: max() arg is an empty sequence
I've tried separating it, but I can't quite do it.
Something like this:
from collections import defaultdict
def Loep(obstacles):
    my_dict = defaultdict(int)
    my_dict.update({
        1: 0,
        2: 0,
        3: 0,
        4: 0,
        5: 0,
        6: 0,
    })
    output = []
    for obstacle in obstacles:
        keys = [j for j in my_dict.keys() if j <= obstacle]
        if keys:
            # there is at least one qualifying key
            key = max(keys)
            my_dict[obstacle] = key + 1
            output.append(my_dict[obstacle])
        else:
            my_dict[obstacle] = 1
            output.append(my_dict[obstacle])
    return output
print(Loep([3, 1, 5, 6, 4, 2]))
In response to your comment about doing it in one line: yes, you could condense it like so:
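for obstacle in obstacles:
    # default=None (Python 3.4+): Python 3 can't compare None with int,
    # so the older [None] + [...] trick only works in Python 2
    key = max((j for j in my_dict.keys() if j <= obstacle), default=None)
    if key is not None:
        # etc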
There are certainly other ways to do it (using filter, for example), but at the end of the day you are trying to get not just the max, but the max that does not exceed a specific value. Unless you're working with a very large amount of data or need extreme speed, this is the easiest way.
Try this. Is it what you want?
from collections import defaultdict
def Loep(obstacles):
    my_dict = defaultdict(int)
    output = []
    for i in range(len(obstacles)):
        founds = [j for j in my_dict.keys() if j <= obstacles[i]]
        if founds:
            max_val = max(founds)
            my_dict[obstacles[i]] = max_val + 1
        else:
            my_dict[obstacles[i]] = 1
        output.append(my_dict[obstacles[i]])
    return output
print(Loep([3, 1, 5, 6, 4, 2]))
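For the very-large-data case mentioned in the first answer, one option not taken from either answer is to keep the keys in a sorted list and use the standard-library bisect module, which finds the largest key <= the current value in O(log n) instead of scanning every key. A sketch that keeps the same "qualifying key + 1, else 1" rule as the code above; the helper largest_key_at_most and the list sorted_keys are hypothetical names:

```python
import bisect

def largest_key_at_most(sorted_keys, value):
    """Return the largest element of sorted_keys that is <= value, or None.

    sorted_keys must be a list kept in ascending order.
    """
    # bisect_right returns the index just past any elements equal to value,
    # so every element to its left is <= value.
    idx = bisect.bisect_right(sorted_keys, value)
    return sorted_keys[idx - 1] if idx else None

def Loep(obstacles):
    my_dict = {}
    sorted_keys = []  # mirrors my_dict's keys, always kept sorted
    output = []
    for obstacle in obstacles:
        # look up before inserting, like the list-comprehension version
        key = largest_key_at_most(sorted_keys, obstacle)
        if obstacle not in my_dict:
            # insort keeps the list sorted; the search is O(log n), but
            # inserting into a Python list is still O(n)
            bisect.insort(sorted_keys, obstacle)
        my_dict[obstacle] = 1 if key is None else key + 1
        output.append(my_dict[obstacle])
    return output

print(Loep([3, 1, 5, 6, 4, 2]))  # [1, 1, 4, 6, 4, 2]
```

Because list insertion is linear, this mainly pays off when lookups dominate; a balanced tree or a third-party sorted container would be needed for fully logarithmic inserts.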
I have the following dictionary:
mydict = {'mindestens': 2,
          'Situation': 3,
          'österreichische': 2,
          'habe.': 1,
          'Über': 1,
          }
How can I get a list (or text) out of it in which each string is repeated as many times as the number it maps to in the dictionary, like this:
mylist = ['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation',.., 'Über']
mytext = 'mindestens mindestens Situation Situation Situation ... Über'
You might just use loops:
mylist = []
for word,times in mydict.items():
    for i in range(times):
        mylist.append(word)
The itertools library has convenient tools for such cases:
from itertools import chain, repeat
mydict = {'mindestens': 2, 'Situation': 3, 'österreichische': 2,
'habe.': 1, 'Über': 1,
}
res = list(chain.from_iterable(repeat(k, v) for k, v in mydict.items()))
print(res)
The output:
['mindestens', 'mindestens', 'Situation', 'Situation', 'Situation', 'österreichische', 'österreichische', 'habe.', 'Über']
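Another standard-library route to the same list is collections.Counter, whose elements() method yields each key as many times as its count (keys with a count below 1 are skipped). A short sketch, assuming Python 3.7+ so that the dictionary's insertion order is preserved:

```python
from collections import Counter

mydict = {'mindestens': 2, 'Situation': 3, 'österreichische': 2,
          'habe.': 1, 'Über': 1}

# Counter(mydict) keeps mydict's key order; elements() repeats each key
mylist = list(Counter(mydict).elements())
print(mylist)
```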
For the text version, joining the list items is trivial: ' '.join(res)
I have a string like this: str = "aabcccdfffeeeeettaaaattiioccc"
I need output like this: Result = {aa: 1;b:1;ccc:2;d:1;fff:1;eeeee:1;tt:2;aaaa:1;ii:1;o:1;ccc:1}
I have tried it like this so far:
def repeating_letters(the_string):
    temp = []
    count = 0
    for i in range(len(the_string)):
        if(the_string[i] == the_string[i]):
            if(the_string[i] == the_string[i+1]):
                temp = the_string[i]
                # count = count+1
                print(the_string[i])

if __name__ == "__main__":
    the_string = "aaafassskfahfioejwwa"
    repeating_letters(the_string)
Hints
I would follow these steps:
Create a list where I will store my partial strings
Start iterating the string
Store the initial position and the current character
Keep iterating until the character is different
Store in the list the partial string from the initial position you stored until 1 less than the current position
Update the initial position to the current one and the current character
Use the list to create a collections.Counter
About your code, the_string[i] == the_string[i] will always be true.
SPOILER: solution
from collections import Counter
def repeating_letters(the_string):
    partials = []
    initial = 0
    for i, character in enumerate(the_string):
        if character == the_string[initial]:
            continue
        partials.append(the_string[initial:i])
        initial = i
    partials.append(the_string[initial:])  # Needed for the last partial string
    return Counter(partials)
As @prahantrana mentions in a comment, getting the partials can be done in a one-liner with the groupby function from the itertools module.
from collections import Counter
from itertools import groupby
def repeating_letters(the_string):
    return Counter(''.join(group) for _, group in groupby(the_string))
Or
from collections import Counter
from itertools import groupby
def repeating_letters(the_string):
    return Counter(char*len(list(group)) for char, group in groupby(the_string))
I'm not sure which of them is faster.
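One way to settle that on your own data is the standard-library timeit module. A minimal sketch: via_join and via_multiply are just labels for the two versions above, and the sample string is arbitrary, so the timings will depend on your machine and input:

```python
import timeit
from collections import Counter
from itertools import groupby

def via_join(the_string):
    # rebuild each run by joining the characters in its group
    return Counter(''.join(group) for _, group in groupby(the_string))

def via_multiply(the_string):
    # rebuild each run as character * run length
    return Counter(char * len(list(group)) for char, group in groupby(the_string))

# arbitrary benchmark input; swap in data that resembles your real strings
sample = "aabcccdfffeeeeettaaaattiioccc" * 200

# both versions must agree before timing them is meaningful
assert via_join(sample) == via_multiply(sample)

for func in (via_join, via_multiply):
    seconds = timeit.timeit(lambda: func(sample), number=50)
    print(f"{func.__name__}: {seconds:.4f} s for 50 calls")
```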
from collections import Counter
from itertools import groupby
def splitter(text):
    """
    text: str
    return : frequency of continuous characters
    """
    string = [''.join(group) for key, group in groupby(text)]
    return Counter(string)
l = 'aaaabcccdfffeeeeettfffaaaattiioccceeeeeeaaaa'
print(splitter(l))
output
Counter({'aaaa': 3, 'ccc': 2, 'fff': 2, 'tt': 2, 'b': 1, 'd': 1, 'eeeee': 1, 'ii': 1, 'o': 1, 'eeeeee': 1})
Another way: a hand-coded method that does the grouping without itertools (Counter is still used for the final tally):
from collections import Counter
def function(string):
    """
    string: str
    return: frequency of continuous same character
    """
    res = []
    tmp = []
    if len(string)==0:
        return Counter('')
    val = string[0]
    for i in range(1, len(string)):
        if string[i] == val:
            tmp.append(val)
            val =string[i]
        else:
            tmp.append(val)
            res.append(tmp)
            tmp = []
            val = string[i]
    tmp.append(val)
    res.append(tmp)
    p = [''.join(i) for i in res]
    return Counter(p)
l ='aaaabcccdfffeeeeettfffaaaattiioccceeeeeeaaaa'
print(function(l))
output
Counter({'aaaa': 3, 'ccc': 2, 'fff': 2, 'tt': 2, 'b': 1, 'd': 1, 'eeeee': 1, 'ii': 1, 'o': 1, 'eeeeee': 1})
I have a system log that looks like the following:
{
 a = 1
 b = 2
 c = [
    x:1,
    y:2,
    z:3,
    ]
 d = 4
}
I want to parse this in Python into a dictionary object, with = splitting the key-value pairs. At the same time, the array enclosed by [] should also be preserved. I want to keep this as generic as possible so the parsing can also handle future variations.
What I tried so far (code will be written): split each line by "=" into a key-value pair, determine where [ and ] start and end, and then split the lines in between by ":" into key-value pairs. That seems a little hard-coded. Any better ideas?
This can be fairly easily converted to YAML. Run pip install pyyaml, then set it up like so:
import string, yaml
data = """
{
a = 1
b = 2
c = [
x:1,
y:2,
z:3,
]
d = 4
}
"""
With this setup, you can use the following to parse your data:
data2 = data.replace(":", ": ").replace("=", ":").replace("[","{").replace("]","}")
lines = data2.splitlines()
for i, line in enumerate(lines):
    if len(line)>0 and line[-1] in string.digits and not line.endswith(",") or i < len(lines) - 1 and line.endswith("}"):
        lines[i] += ","
data3 = "\n".join(lines)
yaml.safe_load(data3) # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}
Explanation
In the first line, we perform some simple substitutions:
YAML requires that there is a space after colons in key/value pairs. So with replace(":", ": "), we can ensure this.
Since YAML key/value pairs are always denoted by a colon and your format sometimes uses equals signs, we replace equals signs with colons using .replace("=", ":")
Your format sometimes uses square brackets where curly brackets should be used in YAML. We fix this using .replace("[","{").replace("]","}")
At this point, your data looks like this:
{
a : 1
b : 2
c : {
x: 1,
y: 2,
z: 3,
}
d : 4
}
Next, we have a for loop, which is simply responsible for adding commas after lines where they're missing. The two cases in which commas are missing are:
- They're absent after a numeric value
- They're absent after a closing bracket
We match the first of these cases using len(line)>0 and line[-1] in string.digits (the last character in the line is a digit)
The second case is matched using i < len(lines) - 1 and line.endswith("}"). This checks if the line ends with }, and also checks that the line is not the last, since YAML won't allow a comma after the last bracket.
After the loop, we have:
{
a : 1,
b : 2,
c : {
x: 1,
y: 2,
z: 3,
},
d : 4,
}
which is valid YAML. All that's left is yaml.safe_load, and you've got yourself a Python dict.
If anything isn't clear please leave a comment and I'll happily elaborate.
There is probably a better answer, but I would take advantage of all your dictionary keys being at the same indentation level. There's no obvious way to do this with newline splitting, JSON loading, or that sort of thing, since the list structure is a bit weird (it seems like a cross between a list and a dictionary).
Here's an implementation that parses keys based on indentation level:
import re
log = '''{
 a = 1
 b = 2
 c = [
    x:1,
    y:2,
    z:3,
    ]
 d = 4
}'''
log_lines = log.split('\n')[1:-1] # strip bracket lines
KEY_REGEX = re.compile(r' [^ ]')  # a key line is indented by exactly one space
d = {}
current_pair = ''
for i, line in enumerate(log_lines):
    if KEY_REGEX.match(line):
        if current_pair:
            key, value = current_pair.split('=')
            d[key.strip()] = value.strip()
        current_pair = line
    else:
        current_pair += line.strip()
if current_pair:
    key, value = current_pair.split('=')
    d[key.strip()] = value.strip()
print(d)
Output:
{'d': '4', 'c': '[x:1,y:2,z:3,]', 'a': '1', 'b': '2'}
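If you'd rather avoid the PyYAML dependency and get real nested values instead of the string '[x:1,y:2,z:3,]', the approach described in the question (split on "=" or ":", track where [ and ] start and end) can be generalized with a stack of open blocks. A minimal sketch, assuming (none of this is stated in the question) one entry per line, that a value of [ or { opens a nested block closed by a line holding only ] or }, and that all-digit values should become ints; parse_log and PAIR_RE are hypothetical names:

```python
import re

# key, then '=' or ':', then the value, with an optional trailing comma
PAIR_RE = re.compile(r'^([^=:\s]+)\s*[=:]\s*(.*?)\s*,?$')

def parse_log(text):
    """Parse the brace/bracket log format into nested dicts (a sketch)."""
    root = {}
    stack = [root]  # stack[-1] is the block currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ('{', '['):
            continue  # blank line or the outermost opening brace
        if line.rstrip(',') in ('}', ']'):
            if len(stack) > 1:
                stack.pop()  # leave the innermost nested block
            continue
        match = PAIR_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {raw!r}")
        key, value = match.groups()
        if value in ('[', '{'):
            child = {}  # a nested block becomes a nested dict
            stack[-1][key] = child
            stack.append(child)
        else:
            # assumption: all-digit values are ints, anything else stays a string
            stack[-1][key] = int(value) if value.isdigit() else value
    return root

log = """{
 a = 1
 b = 2
 c = [
    x:1,
    y:2,
    z:3,
    ]
 d = 4
}"""
print(parse_log(log))  # {'a': 1, 'b': 2, 'c': {'x': 1, 'y': 2, 'z': 3}, 'd': 4}
```

Because nesting is tracked with a stack rather than by position, deeper or additional blocks need no extra code, which addresses the "too hard-coded" concern; the value-conversion rule is the part most likely to need adjusting for future variations.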