In the process of writing an app, I've come across a bizarre python code piece that I can't understand:
The Code:
msg = [108878, [[314.06, 2, 4.744018], [314.03, 1, 15.9059], [314.02, 2, 79.8338531], [314, 1, 54.90047253], [313.56, 2, 1.75392219], [313.53, 2, 15.61132219], [313.5, 1, 0.554316], [313.42, 1, 1.5976], [313.27, 1, 0.43344], [313.26, 1, 62.724], [313.25, 1, 2.57518855], [313.24, 1, 0.04], [313.09, 2, 22.51784808], [312.9, 1, 40], [312.82, 1, 26.65592034], [312.7, 1, 35.53791008], [312.62, 1, 0.46912], [312.61, 1, 100], [312.6, 1, 48.33502123], [312.57, 1, 4.24547326], [312.56, 2, 0.2], [312.5, 2, 109.76863639], [312.43, 1, 100], [312.42, 1, 0.11142352], [312.4, 1, 7.815571], [314.09, 2, -3.01187461], [314.14, 1, -1.27056771], [314.39, 2, -9.31898324], [314.46, 1, -0.01930229], [314.49, 1, -0.40344], [314.5, 1, -3.40637161], [314.53, 1, -0.2], [314.54, 2, -0.46432889], [314.57, 1, -0.04200538], [314.71, 1, -0.050949], [314.84, 1, -0.02153813], [314.88, 1, -0.04200538], [314.93, 1, -68.439], [314.94, 2, -7.477782], [314.95, 1, -5], [315.1, 1, -5.97], [315.12, 1, -0.01], [315.16, 1, -40], [315.2, 1, -0.04200538], [315.22, 1, -25.7525], [315.23, 1, -78.54523718], [315.38, 1, -80], [315.42, 1, -6.65060488], [315.47, 1, -20], [315.48, 1, -5.36]]]
bids = asks = {}
lenbids = lenasks = 0
for order in msg[1]:
    if float(order[2]) > 0:
        lenbids += 1
        bids[order[0]] = order[2]
    elif float(order[2]) < 0:
        lenasks += 1
        asks[order[0]] = -order[2]
print(len(bids), len(asks), lenbids, lenasks)
Output:
50 50 25 25
It seems to me that Python is handling the lenbids/lenasks counters correctly, but is executing the dictionary assignment from both branches regardless of whether the condition is met?
I'm running Pycharm with Anaconda3 if that makes any difference.
Any help greatly appreciated!
You have set bids and asks to be the same dictionary:
bids = asks = {}
You want them to be two different dictionaries:
bids, asks = {}, {}
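A quick demonstration of the difference (the key and value here are just placeholder numbers):

```python
# Chained assignment binds both names to the SAME dict object
bids = asks = {}
bids[314.06] = 4.74
print(asks)           # {314.06: 4.74} -- asks sees the change too
print(bids is asks)   # True

# Tuple assignment creates two distinct dict objects
bids, asks = {}, {}
bids[314.06] = 4.74
print(asks)           # {}
print(bids is asks)   # False
```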
I have a dataframe such as below:

user_id  sales  example_flag_1  example_flag_2  quartile_1  quartile_2
1        10     0               1               1           1
2        21     1               1               2           2
3        300    0               1               3           3
4        41     0               1               4           4
5        55     0               1               1           1
...
I'm attempting to iterate through all possible combinations of (in my example) example_flag_1, example_flag_2, quartile_1, and quartile_2. Then, for each combination, what is the sum of sales for users who fit that combination profile?
For example, for all users with 1, 1, 1, 1, what is the sum of their sales?
What about 0, 1, 1, 1?
I want the computer to go through all possible combinations and tell me.
I hope that's clear, but let me know if you have any questions.
Sure.
Use itertools.product() to generate the combinations, functools.reduce() to generate the mask, and you're off to the races:
import itertools
from functools import reduce

import pandas as pd

data = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 5],
        "sales": [10, 21, 300, 41, 55],
        "example_flag_1": [0, 1, 0, 0, 0],
        "example_flag_2": [1, 1, 1, 1, 1],
        "quartile_1": [1, 2, 3, 4, 1],
        "quartile_2": [1, 2, 3, 4, 1],
    }
)
flag_columns = ["example_flag_1", "example_flag_2", "quartile_1", "quartile_2"]
flag_options = [set(data[col].unique()) for col in flag_columns]

for combo_options in itertools.product(*flag_options):
    combo = {col: option for col, option in zip(flag_columns, combo_options)}
    mask = reduce(lambda x, y: x & y, [data[col] == option for col, option in combo.items()])
    sales_sum = data[mask].sales.sum()
    print(combo, sales_sum)
This prints out (e.g.)
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 1, 'quartile_2': 1} 65
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 1, 'quartile_2': 2} 0
...
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 3, 'quartile_2': 1} 0
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 3, 'quartile_2': 2} 0
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 3, 'quartile_2': 3} 300
{'example_flag_1': 0, 'example_flag_2': 1, 'quartile_1': 3, 'quartile_2': 4} 0
...
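As a side note: if you only need the combinations that actually occur in the data (i.e. the zero-sum rows can be skipped), a plain groupby is simpler. A sketch, rebuilding the same DataFrame:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 5],
        "sales": [10, 21, 300, 41, 55],
        "example_flag_1": [0, 1, 0, 0, 0],
        "example_flag_2": [1, 1, 1, 1, 1],
        "quartile_1": [1, 2, 3, 4, 1],
        "quartile_2": [1, 2, 3, 4, 1],
    }
)
flag_columns = ["example_flag_1", "example_flag_2", "quartile_1", "quartile_2"]

# One row per observed combination, with the summed sales
totals = data.groupby(flag_columns)["sales"].sum()
print(totals)
```

Unlike the itertools.product() version, this omits combinations with no matching users.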
I have this dataset in which one of the columns has empty rows and some strings, whereas I only need to keep the numeric ones.
I have tried this for the strings:
df_3 = df_cor_inc[['Person ID','rt']]
df5 = df_3.to_csv('Documents/a.csv',index=False)
df5['rt'].apply(lambda x: pd.to_numeric(x, errors = 'coerce')).dropna()
But I get: AttributeError: 'NoneType' object has no attribute 'dropna'.
This doesn't work either, failing with AttributeError: 'NoneType' object has no attribute 'rt':
df5[df5.rt.apply(lambda x: x.isnumeric())]
Same thing happens when I try to get rid of the empty rows, I get an error because I have 'NoneType'. How do I get rid of it so that I only keep the numeric values of that column and delete all the rows that don't have them?
This is how the data looks:
Person ID,rt
0,445
0,445
0,445
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,Wait success
1,
1,
1,
1,
1,
1,1230
1,1230
1,1230
1,1230
1,1230
1,1230
1,1721
1,1721
1,1721
1,1721
1,1721
1,1721
The problem here is that the output variable is set to None, because DataFrame.to_csv returns None when writing to a file:

df5 = df_3.to_csv('Documents/a.csv', index=False)

The solution is to keep working with df_3 itself:

df_3 = df_cor_inc[['Person ID','rt']]

# write to file without assigning the (None) return value
df_3.to_csv('Documents/a.csv', index=False)

# convert the values to numeric; non-numeric strings become NaN
df_3['rt'] = pd.to_numeric(df_3['rt'], errors='coerce')

# drop the rows whose rt value is NaN
df5 = df_3.dropna(subset=['rt'])
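The underlying gotcha is that DataFrame.to_csv returns None when given a path or buffer to write to, and only returns the CSV text when called without one. A minimal sketch with made-up data:

```python
import io

import pandas as pd

df = pd.DataFrame({"Person ID": [0, 1], "rt": ["445", "Wait success"]})

buf = io.StringIO()
result = df.to_csv(buf, index=False)  # writing to a target...
print(result)                         # ...returns None

text = df.to_csv(index=False)         # no target: returns the CSV string
print(text)
```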
Why is it that, when I do not set the default value of my defaultdict to zero (int), the program below does not give me results?
>>> doc
'A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. I am alone, and feel the charm of existence in this spot, which was created for the bliss of souls like mine. I am so happy'
>>> some = defaultdict()
>>> for i in doc.split():
...     some[i] = some[i]+1
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
KeyError: 'A'
>>> some
defaultdict(None, {})
>>> i
'A'
yet it works with a default value
>>> some = defaultdict(int)
>>> for i in doc.split():
...     some[i] = some[i]+1
...
>>> some
defaultdict(<class 'int'>, {'A': 1, 'wonderful': 1, 'serenity': 1, 'has': 1, 'taken': 1, 'possession': 1, 'of': 4, 'my': 2, 'entire': 1, 'soul,': 1, 'like': 2, 'these': 1, 'sweet': 1, 'mornings': 1, 'spring': 1, 'which': 2, 'I': 3, 'enjoy': 1, 'with': 1, 'whole': 1, 'heart.': 1, 'am': 2, 'alone,': 1, 'and': 1, 'feel': 1, 'the': 2, 'charm': 1, 'existence': 1, 'in': 1, 'this': 1, 'spot,': 1, 'was': 1, 'created': 1, 'for': 1, 'bliss': 1, 'souls': 1, 'mine.': 1, 'so': 1, 'happy': 1})
>>>
Could you tell me why it works like this?
As the documentation says:
The first argument provides the initial value for the default_factory
attribute; it defaults to None. All remaining arguments are treated
the same as if they were passed to the dict constructor, including
keyword arguments.
Therefore, if you just write defaultdict() without passing any value to the constructor, default_factory is set to None.
See the output:
some = defaultdict()
print(some) # defaultdict(None, {})
And when default_factory is None, a missing key raises KeyError, so you cannot execute some[i] = some[i]+1.
Thus, you have to set the default value to int explicitly: some = defaultdict(int)
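A minimal script illustrating both cases:

```python
from collections import defaultdict

words = "a b a".split()

counts = defaultdict(int)         # missing keys are created as int() == 0
for w in words:
    counts[w] = counts[w] + 1     # equivalently: counts[w] += 1
print(dict(counts))               # {'a': 2, 'b': 1}

broken = defaultdict()            # default_factory is None
try:
    broken["a"] = broken["a"] + 1
except KeyError as e:
    print("KeyError:", e)         # behaves like a plain dict on missing keys
```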
I have a default dictionary with name df:
defaultdict(<type 'int'>, {u'DE': 1, u'WV': 1, u'HI': 1, u'WY': 1, u'NH': 2, u'NJ': 1, u'NM': 1, u'TX': 1, u'LA': 1, u'NC': 1, u'NE': 1, u'TN': 1, u'RI': 1, u'VA': 1, u'CO': 1, u'AK': 1, u'AR': 1, u'IL': 1, u'GA': 1, u'IA': 1, u'MA': 1, u'ID': 1, u'ME': 1, u'OK': 2, u'MN': 1, u'MI': 1, u'KS': 1, u'MT': 1, u'MS': 1, u'SC': 2, u'KY': 1, u'OR': 1, u'SD': 1})
How do I get the keys of this dictionary whose values are greater than 1?
If I do [df[val] for val in df if df[val]>1], I get the output [2, 2, 2].
If I print [df.keys() for val in df if df[val]>1], I still don't get the keys. I need the keys whose values are greater than 1, like this: ['SC', 'OK', 'NH'].
How do I do that?
Reading from a dictionary created using defaultdict() is the same as a normal dict.
To get the keys which have values > 1, you would do:
my_dict = defaultdict(...)
print [key for key, value in my_dict.iteritems() if value > 1]
If you're using Python 3 then it's my_dict.items().
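For reference, a runnable Python 3 version of the same idea, seeded with a few of the counts from the question:

```python
from collections import defaultdict

df = defaultdict(int, {"DE": 1, "NH": 2, "OK": 2, "SC": 2, "KY": 1})

# items() replaces Python 2's iteritems()
big = [key for key, value in df.items() if value > 1]
print(sorted(big))  # ['NH', 'OK', 'SC']
```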
We can use a list comprehension.
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['HI'] = 1
>>> d['NH'] = 2
>>> d['WY'] = 1
>>> d['OK'] = 2
>>> [i[0] for i in d.items() if i[1]>1]
['NH', 'OK']
My problem is that I can't figure out how to display the word counts, keyed by word length, using a dictionary. For example, consider the following piece of text:
"This is the sample text to get an idea!. "
Then the required output would be
3 2
2 3
0 5
as there are 3 words of length 2, 2 words of length 3, and 0 words of length 5 in the
given sample text.
I got as far as displaying the list the word occurrence frequency:
def word_frequency(filename):
    word_count_list = []
    word_freq = {}
    text = open(filename, "r").read().lower().split()
    word_freq = [text.count(p) for p in text]
    dictionary = dict(zip(text, word_freq))
    return dictionary

print word_frequency("text.txt")
which displays the dict in this format:
{'all': 3, 'show': 1, 'welcomed': 1, 'not': 2, 'availability': 1, 'television,': 1, '28': 1, 'to': 11, 'has': 2, 'ehealth,': 1, 'do': 1, 'get': 1, 'they': 1, 'milestone': 1, 'kroes,': 1, 'now': 3, 'bringing': 2, 'eu.': 1, 'like': 1, 'states.': 1, 'them.': 1, 'european': 2, 'essential': 1, 'available': 4, 'because': 2, 'people': 3, 'generation': 1, 'economic': 1, '99.4%': 1, 'are': 3, 'eu': 1, 'achievement,': 1, 'said': 3, 'for': 3, 'broadband': 7, 'networks': 2, 'access': 2, 'internet': 1, 'across': 2, 'europe': 1, 'subscriptions': 1, 'million': 1, 'target.': 1, '2020,': 1, 'news': 1, 'neelie': 1, 'by': 1, 'improve': 1, 'fixed': 2, 'of': 8, '100%': 1, '30': 1, 'affordable': 1, 'union,': 2, 'countries.': 1, 'products': 1, 'or': 3, 'speeds': 1, 'cars."': 1, 'via': 1, 'reached': 1, 'cloud': 1, 'from': 1, 'needed': 1, '50%': 1, 'been': 1, 'next': 2, 'households': 3, 'commission': 5, 'live': 1, 'basic': 1, 'was': 1, 'said:': 1, 'more': 1, 'higher.': 1, '30mbps': 2, 'that': 4, 'but': 2, 'aware': 1, '50mbps': 1, 'line': 1, 'statement,': 1, 'with': 2, 'population': 1, "europe's": 1, 'target': 1, 'these': 1, 'reliable': 1, 'work': 1, '96%': 1, 'can': 1, 'ms': 1, 'many': 1, 'further.': 1, 'and': 6, 'computing': 1, 'is': 4, 'it': 2, 'according': 1, 'have': 2, 'in': 5, 'claimed': 1, 'their': 1, 'respective': 1, 'kroes': 1, 'areas.': 1, 'responsible': 1, 'isolated': 1, 'member': 1, '100mbps': 1, 'digital': 2, 'figures': 1, 'out': 1, 'higher': 1, 'development': 1, 'satellite': 4, 'who': 1, 'connected': 2, 'coverage': 2, 'services': 2, 'president': 1, 'a': 1, 'vice': 1, 'mobile': 2, "commission's": 1, 'points': 1, '"access': 1, 'rural': 1, 'the': 16, 'agenda,': 1, 'having': 1}
def freqCounter(infilepath):
    answer = {}
    with open(infilepath) as infile:
        for line in infile:  # iterate over the file handle, not the path string
            for word in line.strip().split():
                l = len(word)
                if l not in answer:
                    answer[l] = 0
                answer[l] += 1
    return answer
Alternatively:
import collections

def freqCounter(infilepath):
    with open(infilepath) as infile:
        return collections.Counter(len(word) for line in infile for word in line.strip().split())
Use collections.Counter
import collections
sentence = "This is the sample text to get an idea"
Count = collections.Counter([len(a) for a in sentence.split()])
print Count
To count how many words in a text have each given length (a size -> frequency distribution), you could use a regular expression to extract the words:
#!/usr/bin/env python3
import re
from collections import Counter

text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = Counter(map(len, words)).most_common()
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in frequencies]))
Output
3 word(s) of length 2
3 word(s) of length 4
2 word(s) of length 3
1 word(s) of length 6
Note: unlike .split()-based solutions, it automatically ignores punctuation such as the !. after 'idea'.
To read words from a file, you could read lines and extract words from them in the same way as is done for text in the first code example:
from itertools import chain

with open(filename) as file:
    words = chain.from_iterable(re.findall(r'\w+', line.casefold())
                                for line in file)
    # use words here.. (the same as above)
    frequencies = Counter(map(len, words)).most_common()
    print("\n".join(["%d word(s) of length %d" % (n, length)
                     for length, n in frequencies]))
In practice, you could use a list to find the length frequency distribution if you ignore words that are longer than a threshold:
def count_lengths(words, maxlen=100):
    frequencies = [0] * (maxlen + 1)
    for length in map(len, words):
        if length <= maxlen:
            frequencies[length] += 1
    return frequencies
Example
import re

text = "This is the sample text to get an idea!. "
words = re.findall(r'\w+', text.casefold())
frequencies = count_lengths(words)
print("\n".join(["%d word(s) of length %d" % (n, length)
                 for length, n in enumerate(frequencies) if n > 0]))
Output
3 word(s) of length 2
2 word(s) of length 3
3 word(s) of length 4
1 word(s) of length 6