Selecting specific results from print out in Python - 2.7 - python

I am looking to take the three most recent (based on time) lists from the printed code below. These are not actual files but text parsed and stored in dictionaries:
from operator import itemgetter

list4 = sorted(data1.values(), key=itemgetter(4))
for complete in list4:
    if complete[1] == 'Completed':
        print complete
returns:
['aaa664847', ' Completed', ' location' , ' mode', ' 2014-xx-ddT20:00:00.000']
['aaa665487', ' Completed', ' location' , ' mode', ' 2014-xx-ddT19:00:00.000']
['aaa661965', ' Completed', ' location' , ' mode', ' 2014-xx-ddT18:00:00.000']
['aaa669696', ' Completed', ' location' , ' mode', ' 2014-xx-ddT17:00:00.000']
['aaa665376', ' Completed', ' location' , ' mode', ' 2014-xx-ddT16:00:00.000']
I have tried to append these results to another list, but got this:
[['aaa664847', ' Completed', ' location' , ' mode', ' 2014-xx-ddT20:00:00.000']]
[['aaa665487', ' Completed', ' location' , ' mode', ' 2014-xx-ddT19:00:00.000']]
I would like a single list that I could then slice with [-3:] to print out the three most recent:
storage = [['aaa664847', ' Completed', ' location' , ' mode', ' 2014-xx-ddT20:00:00.000'],['aaa665487', ' Completed', ' location' , ' mode', ' 2014-xx-ddT19:00:00.000']]

So,
import itertools
storage = list(itertools.islice((c for c in list4 if c[1]=='Completed'), 3))
maybe...?
Added: an explanation might help. The (c for c in list4 if c[1]=='Completed') part is a generator expression ("genexp") -- it walks list4 from the start and only yields, one at a time, items (sub-lists here) satisfying the condition.
The () around it are needed because itertools.islice takes another argument (a genexp must always be surrounded by parentheses, though when it's the only argument to a callable the parentheses that call the callable are enough and need not be doubled up).
islice is told (via its second argument) to yield only the first up to 3 items of the iterable it receives as the first argument. Once it's done that it stops looping, doing no further work (which would be useless).
We do need a call to list over this all because we require, as a result, a list, not an iterator (which is what islice's result is).
People who are uncomfortable with generators and iterators might choose the following less-elegant, probably less-performant, but simpler approach:
storage = []
for c in list4:
    if c[1] == 'Completed':
        storage.append(c)
        # stop once three matching items have been collected
        if len(storage) == 3: break
This is perfectly valid Python, too (and it would have worked just fine as far back as Python 1.5.4 if not earlier). But modern Python usage leans far more towards generators, iterators, and itertools, where applicable...
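As a minimal sketch of the islice approach on data shaped like the question's, the snippet below makes two assumptions that are not in the original code: the timestamps are ISO-style strings (so sorting them as text with reverse=True puts the newest first), and the status field carries a leading space as in the printed output, so it is stripped before comparing. The records and dates are hypothetical.

```python
from itertools import islice
from operator import itemgetter

# Hypothetical records shaped like the ones printed in the question.
data1 = {
    'k1': ['aaa664847', ' Completed', ' location', ' mode', ' 2014-01-01T20:00:00.000'],
    'k2': ['aaa665487', ' Running', ' location', ' mode', ' 2014-01-01T19:00:00.000'],
    'k3': ['aaa661965', ' Completed', ' location', ' mode', ' 2014-01-01T18:00:00.000'],
    'k4': ['aaa669696', ' Completed', ' location', ' mode', ' 2014-01-01T17:00:00.000'],
    'k5': ['aaa665376', ' Completed', ' location', ' mode', ' 2014-01-01T16:00:00.000'],
}

# Assumption: ISO-style timestamps sort chronologically as plain strings.
newest_first = sorted(data1.values(), key=itemgetter(4), reverse=True)

# Lazily keep only completed records; strip() handles the leading space.
completed = (rec for rec in newest_first if rec[1].strip() == 'Completed')

# Take at most three and stop iterating as soon as they are found.
storage = list(islice(completed, 3))

for rec in storage:
    print(rec[0])
```

Because the list is already newest-first, taking the first three is equivalent to sorting ascending and slicing with [-3:], but it avoids building the full filtered list.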

Related

parsing a list of strings based on values in the string

I scraped data from a website using beautifulsoup and requests and collected the results in a list, which looks like this:
['1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
'orthodox\n',
'Guadalajara, Mexico',
'2\n',
' Tyson Fury',
'1030\n',
'\n\n',
' heavy\n',
' 32\n',
' 30\xa00\xa01\n',
' \n',
'orthodox\n',
'Wilmslow, United Kingdom',
'3\n',
' Errol Spence Jr',
'697.2\n',
'\n\n',
' welter\n',
' 30\n',
' 27\xa00\xa00\n',
' \n',
'southpaw\n',
'Desoto, USA',
'4\n',
' Terence Crawford',
'658.9\n',
'\n\n',
' welter\n',
...
I'm having difficulty splitting this list wherever an element is an integer followed by '\n'.
So ideally I would like the output to be a list of lists :
[[
'1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
'orthodox\n',
'Guadalajara, Mexico'
],
['2\n',
' Tyson Fury',
'1030\n',
'\n\n',
' heavy\n',
' 32\n',
' 30\xa00\xa01\n',
' \n',
'orthodox\n',
'Wilmslow, United Kingdom'],
['3\n',
' Errol Spence Jr',
'697.2\n',
'\n\n',
' welter\n',
' 30\n',
' 27\xa00\xa00\n',
' \n',
'southpaw\n',
'Desoto, USA'],
...]
Well, there are 2 things going on, and I'll address only the first.
You can drop blanks and '\n' because those are newline characters, i.e. linefeeds.
li = ['1\n',
' Saul Alvarez*',
'1545\n',
'\n\n',
' middle\n',
' 30\n',
' 53\xa01\xa02\n',
' \n',
]
# strip() removes leading/trailing whitespace, including '\n';
# entries that are empty after stripping are dropped
li = [val.strip() for val in li if val.strip()]
print(li)
That outputs:
['1', 'Saul Alvarez*', '1545', 'middle', '30', '53\xa01\xa02']
The second issue, which I won't address here because you haven't shown the page's HTML, is that you are grabbing the text of every element without looking at the structure of the markup. That's the wrong approach to take.
I assume that if you look at the page's source you might find something like <div class="name">Saul Alvarez</div><div class="weightclass">middle</div>. Using the markup's class names and semantic context is more productive than trying to guess the structure from the flat list above. BeautifulSoup can do that; try soup.select("div.name"), for example.
The nice thing about soup.select, which uses CSS selectors, is that you can pre-test your query in your browser's dev tools.
Just remember that soup.select returns a list of HTML elements, and you'll then want to read the text of each one.
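For the list of lists the question asks for, here is a minimal sketch working on the flat list alone. It assumes every boxer contributes exactly ten raw entries, which is what the sample output suggests but is not guaranteed by the page; the name FIELDS_PER_BOXER is made up for this example.

```python
# First two records from the question's output.
raw = [
    '1\n', ' Saul Alvarez*', '1545\n', '\n\n', ' middle\n', ' 30\n',
    ' 53\xa01\xa02\n', ' \n', 'orthodox\n', 'Guadalajara, Mexico',
    '2\n', ' Tyson Fury', '1030\n', '\n\n', ' heavy\n', ' 32\n',
    ' 30\xa00\xa01\n', ' \n', 'orthodox\n', 'Wilmslow, United Kingdom',
]

FIELDS_PER_BOXER = 10  # assumption read off the sample, not from the HTML

# Cut the flat list into consecutive chunks of ten entries.
rows = [raw[i:i + FIELDS_PER_BOXER]
        for i in range(0, len(raw), FIELDS_PER_BOXER)]

# Optionally strip whitespace and drop entries that are empty afterwards.
cleaned = [[field.strip() for field in row if field.strip()] for row in rows]

for row in cleaned:
    print(row)
```

If a record is ever missing a field, fixed-size chunking will silently misalign everything after it, which is one more reason to select on the HTML structure instead.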

Question about changing the argument in range function through iterations

I'm a newbie so I'm really sorry if this is too basic of a question, but I just couldn't solve it on my own. Perhaps it's not considered complex enough ( at all ) which would explain why I couldn't find an adequate answer online.
I've made a tic-tac-toe program following the Automate the Boring Stuff with Python textbook, but modified it a tiny bit so it doesn't allow players to enter 'X'/'O' in already filled slots. Here's what it looks like :
theBoard = {'top-L': ' ', 'top-M': ' ', 'top-R': ' ',
            'mid-L': ' ', 'mid-M': ' ', 'mid-R': ' ',
            'low-L': ' ', 'low-M': ' ', 'low-R': ' '}
def printBoard(board):
    print(board['top-L']+'|'+board['top-M']+'|'+board['top-R'])
    print('-+-+-')
    print(board['mid-L']+'|'+board['mid-M']+'|'+board['mid-R'])
    print('-+-+-')
    print(board['low-L']+'|'+board['low-M']+'|'+board['low-R'])
turn='X'
_range=9
for i in range(_range):
    printBoard(theBoard)
    print('''It is the '''+turn+''' player's turn.''')
    move=input()
    if theBoard[move]==' ':
        theBoard[move]=turn
    else:
        print('The slot is already filled !')
        _range+=1
    if turn=='X':
        turn ='O'
    else:
        turn='X'
printBoard(theBoard)
However, it doesn't seem like _range is being increased by one on iterations where I intentionally enter 'X'/'O' in a slot that is already filled.
Is there something that I'm missing here? Is there any way I could make this work as I planned?
Thank you in advance.
It could be easier to sanitize the input immediately via a while loop instead of increasing the range.
move = input(f"It is the {turn} player's turn.")
while theBoard[move]!=' ':
    move=input('The slot is already filled')
Your attempt did not work because range(_range) is evaluated only once, before the first iteration. Changing _range inside the loop happens after it has been used, and the new value is never read again.
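A small standalone demonstration of that point (the names limit and iterations are made up for this sketch): the range object is built once from the value the variable had at that moment, so rebinding the variable inside the loop does not add iterations.

```python
limit = 3
iterations = 0

for i in range(limit):   # range(3) is created here, exactly once
    limit += 1           # rebinds the name; the existing range is unaffected
    iterations += 1

print(iterations)  # 3
print(limit)       # 6
```

A while loop re-checks its condition on every pass, which is why it suits input validation better than trying to stretch a for loop.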

How do I find the source code for a method in Pandas?

The following is the GitHub link for Python's Pandas package.
https://github.com/pandas-dev/pandas
I would like to find the source code for a specific method (for instance, iterrows). What would be the file path for this?
Python, in general, is easily introspectable. You can use the inspect module if you want to do this programmatically. For example:
In [8]: import pandas as pd
In [9]: import inspect
In [10]: pd.DataFrame.iterrows
Out[10]: <function pandas.core.frame.DataFrame.iterrows(self)>
In [11]: inspect.getsourcefile(pd.DataFrame.iterrows)
Out[11]: '/Users/juan/anaconda3/envs/py38/lib/python3.8/site-packages/pandas/core/frame.py'
So you can go to pandas/core/frame.py. Note that this won't work if it is, say, a method written in C as an extension, but it should for Python source code. In fact, you can even get the source code lines using inspect.getsourcelines, which returns a tuple of (lines, line_number):
In [12]: inspect.getsourcelines(pd.DataFrame.iterrows)
Out[12]:
([' def iterrows(self):\n',
' """\n',
' Iterate over DataFrame rows as (index, Series) pairs.\n',
'\n',
' Yields\n',
' ------\n',
' index : label or tuple of label\n',
' The index of the row. A tuple for a `MultiIndex`.\n',
' data : Series\n',
' The data of the row as a Series.\n',
'\n',
' it : generator\n',
' A generator that iterates over the rows of the frame.\n',
'\n',
' See Also\n',
' --------\n',
' itertuples : Iterate over DataFrame rows as namedtuples of the values.\n',
' items : Iterate over (column name, Series) pairs.\n',
'\n',
' Notes\n',
' -----\n',
'\n',
' 1. Because ``iterrows`` returns a Series for each row,\n',
' it does **not** preserve dtypes across the rows (dtypes are\n',
' preserved across columns for DataFrames). For example,\n',
'\n',
" >>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])\n",
' >>> row = next(df.iterrows())[1]\n',
' >>> row\n',
' int 1.0\n',
' float 1.5\n',
' Name: 0, dtype: float64\n',
" >>> print(row['int'].dtype)\n",
' float64\n',
" >>> print(df['int'].dtype)\n",
' int64\n',
'\n',
' To preserve dtypes while iterating over the rows, it is better\n',
' to use :meth:`itertuples` which returns namedtuples of the values\n',
' and which is generally faster than ``iterrows``.\n',
'\n',
' 2. You should **never modify** something you are iterating over.\n',
' This is not guaranteed to work in all cases. Depending on the\n',
' data types, the iterator returns a copy and not a view, and writing\n',
' to it will have no effect.\n',
' """\n',
' columns = self.columns\n',
' klass = self._constructor_sliced\n',
' for k, v in zip(self.index, self.values):\n',
' s = klass(v, index=columns, name=k)\n',
' yield k, s\n'],
860)
Generally, also, you can just print the function/method and look at the information in the string representation, and pretty much figure it out:
In [19]: pd.DataFrame.iterrows
Out[19]: <function pandas.core.frame.DataFrame.iterrows(self)>
So just from that you could see it is in pandas.core.frame.
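The same calls can be tried without pandas installed. The sketch below applies them to json.dumps from the standard library, chosen here only as a convenient pure-Python function, and shows what happens for a function implemented in C.

```python
import inspect
import json

# Path of the file that defines json.dumps.
source_file = inspect.getsourcefile(json.dumps)

# getsourcelines returns (list_of_source_lines, starting_line_number).
lines, first_line_number = inspect.getsourcelines(json.dumps)

print(source_file)
print(first_line_number, lines[0].rstrip())

# Built-ins implemented in C have no Python source file to report.
try:
    inspect.getsourcefile(len)
except TypeError as exc:
    print("no Python source:", exc)
```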
This site and this one have a button with a link (source). I usually just google the method I need and add the word source

Python Django Lambda Expressions For Lists

I need to update a list in Python, which is:
data = [{' Customers ','null,blank '},{' CustomersName ','max=50,null,blank '},{' CustomersAddress ','max=150,blank '},{' CustomersActive ','Active '}]
I want to write a lambda expression that stores the names (Customers, CustomersName, and so on) in a list and removes the white space.
I am absolutely new to Python and don't have any knowledge of it!
As I see it, you have declared dictionaries inside a list, but the dict syntax is wrong: it should be {"key": "value"}. So I assume you need to change them to lists, like this:
data = [[' Customers ','null,blank '],[' CustomersName ','max=50,null,blank '],[' CustomersAddress ','max=150,blank '],[' CustomersActive ','Active ']]
Then the following would get you the desired result:
data_NameExtracted = [x[0].strip() for x in data]
You don't need a lambda expression for this; a list comprehension does the job:
# please note that i have used tuples instead of sets,
# because sets are unordered
data = [
(' Customers ','null,blank '),
(' CustomersName ','max=50,null,blank '),
(' CustomersAddress ','max=150,blank '),
(' CustomersActive ','Active ')
]
# Indexing is not allowed for set objects
values = [item[0].strip() for item in data]
see:
https://wiki.python.org/moin/Generators
https://docs.python.org/3/tutorial/datastructures.html#sets
EDIT:
If you want to use dictionaries, you could use something like this:
data = [
{' Customers ': 'null,blank '},
{' CustomersName ': 'max=50,null,blank '},
{' CustomersAddress ': 'max=150,blank '},
{' CustomersActive ': 'Active '}
]
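# expecting a single key/value pair in each dict;
# dict views are not indexable in Python 3, so take the first item via an iterator
values = [next(iter(item.values())).strip() for item in data]
# for the names (the keys) instead, use next(iter(item)).strip()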
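Putting the pieces together, here is a minimal runnable sketch using the tuple form of the data. The variable names (names, names_via_lambda, options) are made up for illustration, and the lambda variant is included only because the question asked for one; the comprehension is the more common idiom.

```python
data = [
    (' Customers ', 'null,blank '),
    (' CustomersName ', 'max=50,null,blank '),
    (' CustomersAddress ', 'max=150,blank '),
    (' CustomersActive ', 'Active '),
]

# List comprehension: take the first element of each pair and strip it.
names = [name.strip() for name, _ in data]

# Same result with map and a lambda.
names_via_lambda = list(map(lambda pair: pair[0].strip(), data))

# Optional: map each cleaned name to its list of comma-separated options.
options = {name.strip(): opts.strip().split(',') for name, opts in data}

print(names)
print(options['CustomersName'])
```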

Test if sentences contain smaller sentences

So I have 100 million sentences, and for each sentence I'd like to see whether it contains one of 6000 smaller sentences (matching whole words only). So far my code is
smaller_sentences = [...]
for large_sentence in file:
    for small_sentence in smaller_sentences:
        if ((' ' + small_sentence + ' ') in large_sentence
                or large_sentence.startswith(small_sentence + ' ')
                or large_sentence.endswith(' ' + small_sentence)):
            outfile.write(large_sentence)
            break
But this code runs prohibitively slowly. Do you know of a faster way to go about doing this?
Without knowing more about the domain (word/sentence length), the frequency of reads/writes/queries and the specifics of the algorithm, it is hard to give a definitive answer.
But, in the first instance you can switch your condition around.
This checks the whole string (slow), then the head (fast), then the tail (fast).
((' ' + small_sentence + ' ') in large_sentence
 or large_sentence.startswith(small_sentence + ' ')
 or large_sentence.endswith(' ' + small_sentence))
This checks the head (fast), then the tail (fast), then the whole string (slow). It is not a huge bump in the Big-O sense, but it might add some speed if you know that the matches are more likely to be at the start or the end.
(large_sentence.startswith(small_sentence + ' ')
 or large_sentence.endswith(' ' + small_sentence)
 or (' ' + small_sentence + ' ') in large_sentence)
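A different technique, not part of the answer above, is to stop looping over all 6000 small sentences for every large one. The sketch below assumes words are separated by whitespace: it stores each small sentence as a tuple of words in a set, grouped by word count, and then looks up every run of words of those lengths in the large sentence. The function names build_index and contains_any are hypothetical.

```python
from collections import defaultdict

def build_index(smaller_sentences):
    """Group the small sentences, as tuples of words, by their word count."""
    index = defaultdict(set)
    for sentence in smaller_sentences:
        words = tuple(sentence.split())
        if words:
            index[len(words)].add(words)
    return index

def contains_any(large_sentence, index):
    """Return True if any indexed small sentence occurs as whole words."""
    words = large_sentence.split()
    for n, phrases in index.items():
        # Slide a window of n words across the sentence; set lookup is O(1).
        for start in range(len(words) - n + 1):
            if tuple(words[start:start + n]) in phrases:
                return True
    return False

index = build_index(["the cat", "dog"])
print(contains_any("I saw the cat today", index))
print(contains_any("the category is wrong", index))
```

The work per large sentence then depends on its length and on the number of distinct small-sentence lengths rather than on 6000 substring scans. Unlike the original condition, this also counts a large sentence that is exactly equal to a small one as a match, and it ignores trailing newlines.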
