python replace \n from text - python

Here is a text file sample:
'15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147'
I want to remove some pieces like
\n\nR\n\nE\nM\nO\n\nV\nE\nD\n
and
\n\nB\n\nU\n\nT\n\nH\nO\nW\n
I use REMOVED and BUTHOW to illustrate the problem, but in real practice these are other words, timestamps, etc.

le = ['15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147']
print([value for value in le if '\n' not in value])
Output:
['15235457345', '', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '1445123147']

s='15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick', '', '1236500', 'git', '', '', '123face', '2015-10-18 ', '2015-10-23 ', 'name', 'great', 'sha', '\n\nB\n\nU\n\nT\n\nH\nO\nW\n', '1445123147'
for i in range(0,len(s)):
    print(s[i].replace('\n',''))
Output:
15235457345
REMOVED
1445133666
nick
1236500
git
123face
2015-10-18
2015-10-23
name
great
sha
BUTHOW
1445123147
Hope this is what you are looking for.
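The two approaches above differ in what they keep: the first drops every item that contains a newline, the second keeps the item and strips the newlines from it. A minimal sketch of both as list comprehensions, using a shortened copy of the sample list (the names `dropped` and `cleaned` are only illustrative):

```python
# Shortened copy of the sample data from the question.
le = ['15235457345', '', '\n\nR\n\nE\nM\nO\n\nV\nE\nD\n', '1445133666', 'nick']

# Variant 1: discard any item that contains a newline.
dropped = [value for value in le if '\n' not in value]

# Variant 2: keep every item but strip the newlines out of it.
cleaned = [value.replace('\n', '') for value in le]

print(dropped)  # ['15235457345', '', '1445133666', 'nick']
print(cleaned)  # ['15235457345', '', 'REMOVED', '1445133666', 'nick']
```

Both variants return a new list instead of printing item by item, which makes the result easier to reuse.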

Related

Requests object not filtering correctly

I'm trying to retrieve all URLs from a page using Python's Requests library. I can't figure out why my filterer is returning hundreds of items more than I am expecting. Code:
import requests
import re
r = requests.get('http://exrx.net/Lists/ExList/NeckWt', headers=headers_dict, timeout=3)
counter = 0
raw_html = r.text
listly = re.split('\"', raw_html)
for i in listly:
    if "https://exrx.net" in i or "../../" in i:
        pass
    else:
        listly.remove(i)
        counter += 1
print(listly)
print('-'*5)
print('the list is now', len(listly), 'objects long')
print(counter, ' objects were removed')
print('-'*5)
The final list, however, contains 487 items (down from >900), including the following, which confusingly do not match the conditions in my if / else block.
I cannot figure out why they are not being deleted:
['en', 'Content-Type', 'text/html; charset=utf-8', '... func = ', '... func.apply: ', "----- F'D: ", '... file = ', "----- ERR'D: ", "----- F'D: ", '', 'load', '_', ' blocked = TIME DELAY!', ' blocked = ', ' blocked = ', 'markLoaded dummyfile: ', '1', "let's go", 'on', 'on', 'on', 'on', 'script', 'text/javascript', 'head', '/detroitchicago/grapefruit.gif', 'prerender', '?orig=', '&v=', '/porpoiseant/army.gif', 'compid', '0', '', 'impression', '', 'impression', 'prerender', '?orig=', '&sts=', 'domain_id', '&visit_uuid=', 'undefined', 'false', 'false', 'function', 'CustomEvent', 'false', 'false', 'content-type', 'text/html; charset=UTF-8', 'generator', 'concrete5', 'shortcut icon', 'https://exrx.net/application/files/8014/4923/2704/Runner3.jpg', 'image/x-icon', 'icon', 'https://exrx.net/application/files/8014/4923/2704/Runner3.jpg', 'image/x-icon', 'canonical', 'https://exrx.net/Lists/ExList/NeckWt', 'text/javascript', '/index.php', '/updates/concrete5-8.5.7/concrete/images', '/index.php/tools/required', 'https://exrx.net', '', 'en_US', 'text/css', 'Logo', '79715', '3471', 'text/javascript', 'https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js?ccm_nocache=1a72ca0f3692b16db9673a9a89faff0649086c52', 'text/javascript', '/updates/concrete5-8.5.7/concrete/js/ie/html5-shiv.js?ccm_nocache=1a72ca0f3692b16db9673a9a89faff0649086c52', 'text/javascript', '/updates/concrete5-8.5.7/concrete/js/ie/respond.js?ccm_nocache=1a72ca0f3692b16db9673a9a89faff0649086c52', 'text/javascript', '', 'touchstart', 'https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,700,900', 'stylesheet', 'text/css', '/application/files/cache/css/fruitful/iGotStyle.css?ts=1644387679', 'stylesheet', 'text/css', 'all', 'viewport', 'width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no', '/application/files/cache/css/fruitful/accessory.css?ts=1644387679', 'stylesheet', 'text/css', 'all', 'https://use.fontawesome.com/bf47fdcc0a.js', '', 'text/css', '', 
'//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js', 'ca-pub-6329449765532083', 'text/javascript', '1', 'https://exrx.net/Lists/ExList/NeckWt', 'false', 'false', 'text/javascript', 'false', 'ad_cache_level', 'ad_lazyload_version', 'ad_load_version', 'city', 'Sydney', 'country', 'AU', 'days_since_last_visit', 'domain_id', 'domain_test_group', 'engaged_time_visit', 'ezcache_level', 'ezcache_skip_code', 'form_factor_id', 'framework_id', 'is_return_visitor', 'is_sitespeed', 'last_page_load', '', 'last_pageview_id', '', 'lt_cache_level', 'metro_code', 'page_ad_positions', '', 'page_view_count', 'page_view_id', '578b3a09-c637-461b-4c42-c6c83546001c', 'position_selection_id', 'postal_code', '2000', 'pv_event_count', 'response_size_orig', 'response_time_orig', 'serverid', '54.66.141.238:27055', 'state', 'NSW', 't_epoch', 'template_id', 'time_on_site_visit', 'url', 'https://exrx.net/Lists/ExList/NeckWt', 'user_id', 'weather_precipitation', 'weather_summary', '', 'weather_temperature', 'word_count', 'worst_bad_word_level', '&ez_orig=1', 'expires=', 'ezux_lpl_107151=', '|', '|', '; ', 'complete', 'onload', 'attach_ezolpl', 'attach_ezolpl', '578b3a09-c637-461b-4c42-c6c83546001c', 'false', 'page527', 'ccm-page ccm-page-id-527 page-type-page page-template-directory-template', 'siteHeader', 'container', 'row', 'logo', 'col-xs-6 col-md-3', 'ccm-custom-style-container ccm-custom-style-logo-79715', 'https://exrx.net/', '/application/files/3114/3635/4565/logo_same_proportion_5_2_2015.gif', 'ExRx.net: Exercise Prescription on Internet', 'ccm-image-block img-responsive bID-79715', 'mainNav', 'clearfix hidden-xs hidden-sm col-sm-9', 'nav', '', 'https://exrx.net/Lists/Directory', '_self', '', '', '/Lists/Directory', '_self', '', '', '/WeightTraining/Instructions', '_self', '', '', '/Lists/Muscle', '_self', '', '', '/Lists/Articulations', '_self', '', '', '/Calculators', '_self', '', '', 'https://exrx.net/Beginning', '_self', '', '', '/Beginning', '_self', '', '', 
'/WeightTraining', '_self', '', '', '/Kinesiology', '_self', '', '', '/Aerobic', '_self', '', '', '/ExInfo', '_self', '', '', '/Sports', '_self', '', '', '/Bodybuilding', '_self', '', '', '/Drugs', '_self', '', '', '/Psychology', '_self', '', '', '/FatLoss', '_self', '', '', '/Nutrition', '_self', '', '', '/Testing', '_self', '', '', 'https://exrx.net/Notes/SiteJournal', '_self', '', '', '/Notes/SiteJournal', '_self', '', '', '/People/Contact', '_self', '', '', '/Notes/Feedback', '_self', '', '', '/Notes/Archive/Feedback10', '_self', '', '', '/Questions', '_self', '', '', '/forum/', '_blank', '', '', '/Links', '_self', '', '', '/Abstracts', '_self', '', '', '/Journals', '_self', '', '', '/Videos', '_self', '', '', '/Talks', '_self', '', '', '/Notes/Donations', '_self', '', '', 'https://exrx.net/Store', '_self', '', 'mobileAssets', 'col-xs-6 visible-xs-block visible-sm-block text-right', 'icoMobileNav', 'fa fa-bars', 'text/javascript', '/packages/fruitful/themes/fruitful/js/initExRx.js', 'headerShell', 'container', 'row', 'col-sm-12', 'fruitful-page-title fruitfull-title-padding', 'page-title', 'row Breadcrumb-Container Add-Margin-Top', 'container', 'col-sm-9', 'http://exrx.net', '../Directory', 'col-sm-3', 'google_translate_element', 'text/javascript', 'mainShell', 'container ', 'row', 'col-sm-12', 'ccm-custom-style-container ccm-custom-style-directorytopadvertise-86906 Add-Margin-Bottom', 'ezoic-pub-ad-placeholder-103', '', '//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js', 'adsbygoogle', 'display:block; height:90px;', 'ca-pub-6329449765532083', '4409668012', 'container', 'row', 'col-sm-12', 'Sternocleidomastoid', '../../Muscles/Sternocleidomastoid', 'container', 'row', 'col-sm-12', 'row', 'col-sm-6', '../../WeightExercises/Sternocleidomastoid/CBNeckFlx', '../../WeightExercises/Sternocleidomastoid/CBNeckFlxBelt', '../../WeightExercises/Sternocleidomastoid/CBNeckRotationBelt', '../../WeightExercises/Sternocleidomastoid/CBNeckLtrFlxBelt', '_top', 
'../../WeightExercises/Sternocleidomastoid/LVNeckFlexionH', '_top', '../../WeightExercises/Sternocleidomastoid/LVLateralNeckFlexionH', '_top', '../../WeightExercises/Sternocleidomastoid/LVNeckFlx', '_top', '../../WeightExercises/Sternocleidomastoid/LVNeckLtrFlx', '_top', '../../WeightExercises/Sternocleidomastoid/WtLyingNeckFlexion', '../../WeightExercises/Sternocleidomastoid/WtNeckFlx', '_top', '../../WeightExercises/Sternocleidomastoid/WtNeckLateralFlex', '_top', 'col-sm-6', '../../WeightExercises/Sternocleidomastoid/BWFrontNeckBridge', '../../WeightExercises/Sternocleidomastoid/BWWallFrontNeckBridge', '../../WeightExercises/Sternocleidomastoid/BWWallSideNeckBridge', '../../Stretches/Sternocleidomastoid/NeckRetraction', '../../Stretches/Sternocleidomastoid/NeckRotation', 'https://exrx.net/WeightExercises/Sternocleidomastoid/STNeckFlexion', 'https://exrx.net/WeightExercises/Sternocleidomastoid/STNeckLateralFlexion', '', '//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js', 'adsbygoogle', 'display:inline-block;width:300px;height:250px', 'ca-pub-6329449765532083', '2861896011', 'container', 'row', 'col-sm-12', 'Splenius', '../../Muscles/Splenius', 'container', 'row', 'col-sm-12', 'row', 'col-sm-6', '../../WeightExercises/Splenius/CBNeckExt', '_top', '../../WeightExercises/Splenius/CBNeckExtBelt', '../../WeightExercises/Splenius/LVNeckExtentionH', '../../WeightExercises/Splenius/LVNeckExt', '_top', '../../WeightExercises/Splenius/WtLyingNeckExtension', '../../WeightExercises/Splenius/WtNeckExtension', '../../WeightExercises/Splenius/WtNeckExt', '_top', '../../WeightExercises/Splenius/WtNeckHarnessExt', '#Sternocleidomastoid', 'col-sm-6', 'https://exrx.net/WeightExercises/Splenius/BRNeckRetraction', '../../WeightExercises/Splenius/BWRearNeckBridge', '../../WeightExercises/Splenius/BWWallRearNeckBridge', '../../WeightExercises/Splenius/LyingIsometricNeckRetr', '../../Stretches/Splenius/Neck', 'https://exrx.net/WeightExercises/Splenius/STNeckExtension', 
'../../Stretches/ErectorSpinae/Plow', 'WaistWt#Erector', 'container', 'row', 'col-sm-12', 'BackWt', 'BackWt#UpperTrap', 'WaistWt', 'WaistWt#Erector', 'container', 'row', 'col-sm-12 Add-Margin-Top', 'container', 'subfooter no-print', 'text-align: center;', 'text-align: center;', '../../Lists/Directory', '../../Notes/Notes', '_parent', 'site-footer', 'container ', 'row', 'copyright', 'col-xs-12 col-sm-3', 'col-xs-12 col-sm-9', 'margin:0px !important', 'https://exrx.net/People/Contact', 'https://exrx.net/Notes/Privacy', 'https://exrx.net/Notes/Legal', 'https://exrx.net/Notes/ADA', 'https://www.facebook.com/pages/ExRxnet/1685475628344232', 'https://exrx.net/Notes/Feedback', 'ajax', 'https://exrx.net/Notes/Archive/Feedback1', 'https://exrx.net/Store', 'amzn-assoc-ad-d457ebf0-12d4-46d4-a3f1-6d2aa75f0d88', '', '//z-na.amazon-adsystem.com/widgets/onejs?MarketPlace=US&adInstanceId=d457ebf0-12d4-46d4-a3f1-6d2aa75f0d88', '/packages/fruitful/themes/fruitful/js/functions.js', 'text/javascript', '', '/packages/fruitful/themes/fruitful/js/bootstrap.min.js', 'text/javascript', '', 'text/javascript', '', '#mainNav', 'body', 'id', 'mobileNav', 'visible-xs-block visible-sm-block', 'hidden-xs hidden-sm', '#icoMobileNav', '.ccm-page, #mobileNav', 'slideOver', 'text/javascript', '/updates/concrete5-8.5.7/concrete/js/picturefill.js?ccm_nocache=1a72ca0f3692b16db9673a9a89faff0649086c52', 'exrx_net', 'audins.js', '__ez.script.add', '//go.ezoic.net/detroitchicago/audins.js?cb=195-3', 'display:none;', '//pixel.quantserve.com/pixel/p-31iz6hfFutd16.gif?labels=Domain.exrx_net,DomainId.107151', '0', '1', '1', 'Quantcast', 'text/javascript', 'false']
Take a look at BeautifulSoup, the main Python web scraping library. The best way imo to get all the links on the page is by doing something like:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
req = Request("http://exrx.net/Lists/ExList/NeckWt")
page_source = urlopen(req)
soup = BeautifulSoup(page_source , "lxml")
links = []
for link in soup.find_all('a'):
    links.append(link.get('href'))
This would get all the links on the page without you having to parse the HTML of the page manually.
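If installing BeautifulSoup is not an option, a rough equivalent can be sketched with the standard library alone. This swaps in `html.parser.HTMLParser` for BeautifulSoup and uses `urllib.parse.urljoin` to turn relative links such as `../../Muscles/Splenius` into absolute URLs. The `LinkCollector` class and the inline `sample_html` string are hypothetical, chosen only so the example runs without a network request:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:  # skip anchors that have no href attribute
                self.links.append(urljoin(self.base_url, href))


# Hypothetical snippet standing in for the downloaded page source.
sample_html = (
    '<a href="../../Muscles/Splenius">Splenius</a>'
    '<a href="https://exrx.net/Store">Store</a>'
    '<a name="top">no href here</a>'
)

collector = LinkCollector('https://exrx.net/Lists/ExList/NeckWt')
collector.feed(sample_html)
print(collector.links)
```

In real use the string passed to `feed()` would be the page text, for example `r.text` from the question's code.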
Removing elements from a list while iterating over it, which is what your for loop does, makes the iteration skip the element that follows each removal, so some unwanted items survive. Instead, try adding the desired elements to another list, or use a list comprehension.
Example of list comprehension:
listly = [s for s in listly if "https://exrx.net" in s or "../../" in s]
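To see why items slip through, here is a small sketch that runs the remove-while-iterating loop and the list comprehension side by side. The five-item `items` list is a made-up sample in the style of the question's output, and `wanted` is a hypothetical helper wrapping the question's condition:

```python
# Made-up sample in the style of the strings from the question.
items = ['en', 'Content-Type', '../../Muscles/Splenius', 'text/css',
         'https://exrx.net/Store']


def wanted(s):
    """The keep-condition from the question."""
    return "https://exrx.net" in s or "../../" in s


# Buggy: each removal shifts the remaining items left, so the loop
# never looks at the item that slides into the current position.
buggy = list(items)
for s in buggy:
    if not wanted(s):
        buggy.remove(s)

# Fixed: build a new list and leave the original untouched.
fixed = [s for s in items if wanted(s)]

print(buggy)  # ['Content-Type', '../../Muscles/Splenius', 'https://exrx.net/Store']
print(fixed)  # ['../../Muscles/Splenius', 'https://exrx.net/Store']
```

'Content-Type' survives the buggy loop because it moved into index 0 right after 'en' was removed, and the loop had already advanced to index 1.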

Iterating over list with while sentence and values not being excluded with list.remove

I'm running a code to clean a database. Basically, if some value appears in a list they should be removed.
Below you can see the code:
pattern = re.compile("((?:\d{10}|\d{9}|\d{8}|\d{7}|\d{6}|\d{5}|\d{4})(?:-?[\d]))?(?!\S)")
cc = pattern.findall(a)
print("cpf:", cpf)
print("ag:", ag)
print("cc start:",cc)
for i in cc:
    print("i:",i)
    try:
        while i in ag: cc.remove(i)
    except:pass
    try:
        while i in cpf:cc.remove(i)
    except:pass
    try:
        while "" in i:cc.remove(i)
    except:pass
print("final cc:",cc)
It prints in my screen the following:
cpf: ['00770991092']
ag: 3527
cc start: ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '00770991092', '', '', '', '', '', '', '', '', '01068651-0', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
i:
i: 01068651-0
final cc: ['00770991092']
Well, the '' values are removed, so that seems to be working fine. However, since '00770991092' is a value inside cpf it should have been removed, but it hasn't. In the "final cc" that's the value I'm getting, and it should be '01068651-0'.
Even If I run this check:
if cc in cpf:print(True)
It confirms it is True.
What am I missing?
PS.: I find it quite intriguing that when I print(i) inside the for loop only two values show up (and one is empty).
Modifying a list as you're iterating over it doesn't work very well. Is building a new list an option? Something like:
filtered_cc = [
    i for i in cc
    if not (i in ag or i in cpf or i == "")
]
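Two details explain the output in the question. First, `"" in i` is true for every string, so `while "" in i: cc.remove(i)` removes whatever `i` currently is until `remove` raises `ValueError`. Second, the removals shift the list under the for loop, so only two items are ever visited. Below is a minimal sketch of the rebuild-the-list approach using exact matches against a set. It assumes `ag` is the string '3527' (the printed output does not show its type) and uses a shortened `cc`:

```python
cpf = ['00770991092']
ag = '3527'  # assumption: ag is a string; its real type is not shown
cc = ['', '', '00770991092', '', '01068651-0', '']  # shortened sample

# Exact-match exclusions: every cpf value, the ag value, and the empty string.
excluded = set(cpf) | {ag, ''}

filtered_cc = [i for i in cc if i not in excluded]
print(filtered_cc)  # ['01068651-0']
```

Using a set makes each membership test an exact comparison, which avoids the substring semantics of `in` on strings.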

How do I split a name in a list then reinsert the split items?

I have lists inside the csv list:
newlist = [
['id', 'name', 'lastContactedTime', 'email', 'phone_phones', 'home_phones', 'mobile_phones', 'work_phones', 'fax_phones', 'other_phones', 'address_1', 'address_2', 'address_3', 'city', 'state', 'postal_code', 'country', 'tags'],
['12-contacts', 'Courtney James', '', 'courtney#forlanchema.com', '+1 3455463849', '', '', '', '', '', '654 Rodney Franklin street', '', '', 'Birmingham', 'AL', '45678', 'US', ''],
['4-contacts', 'Joe Malcoun', '2019-08-13 14:41:12', 'ceo#nutshell.com', '', '', '', '', '', '', '212 South Fifth Ave', '', '', 'Ann Arbor', 'MI', '48103', 'US', ''],
['8-contacts', 'Rafael Acosta', '', 'racosta#forlanchema.com', '+1 338551534', '', '', '', '', '', '13 Jordan Avenue SW', '', '', 'Birmingham', 'AL', '45302', 'US', '']
]
I want to create a recurring event where I split the names like: "Courtney James" in each list and add it to a new list.
I have tried to split and append each name separately using a while loop to a list but it did not work out
#Splitting an item in the list and adding it to a new list
m = 1
while newlist[m][1] != None:
    splitter = newlist[m][1].split()
    namelist = splitter
    m+1
    print(namelist)
else:
    break
I get errors or the code does not compile. I expect the names to be split and added to a new list.
My desired output would be recurring lists to be able to add it to a new excel worksheet using xlsxwriter:
Headers= ['Lastname','First name','Company','Title','Willing To share', 'Willing to introduce', 'Work phone', 'Work email', 'Work street', 'Work City', ' Work State', 'Work Zip', 'Personal Street', 'Personal City', 'Personal State', 'Personal Zip', 'Mobile Phone', 'Personal email', 'Note', 'Note Category']
List1= ['Doe1', 'John', 'company1', 'CIO', 'Yes', 'Yes', '999-999-999', 'email#email.com', '123 work street', 'workville', 'IL', '12345', '1234 personal street', 'peronville', 'Il', '12345', '999-999-999', 'personemail#email.com', 'public note visible to everyone', 'Public']
List2=
List3=
When you split each name, you get a list such as [first_name, last_name]. Assuming you wanted to build up a "list of these lists", then you want to do the following using your code as a basis:
namelist = [] # new, empty list
for i in range(1, len(newlist)):
    names = newlist[i][1].split()  # this yields [first_name, last_name]
    #print(names)
    namelist.append([names[1], names[0]])  # [last_name, first_name]
range(1, len(newlist)) generates the numbers 1, 2, ... length of newlist - 1
namelist.append([names[1], names[0]]) appends the split names to our new list
The result:
[['James', 'Courtney'], ['Malcoun', 'Joe'], ['Acosta', 'Rafael']]
What you are looking for is a more complicated list with other elements in it. But at least the above code shows how to properly loop through your original list.
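As a rough sketch of how the same loop could carry other columns along, the example below builds rows of [last name, first name, email]. The rows are trimmed to the first four columns of the original data, and mapping 'email' to the 'Work email' column of the desired output is an assumption. `str.partition` is used in place of `split()` so that a name without a space, or with more than two parts, does not raise an IndexError:

```python
# Trimmed copy of the question's data: only the first four columns.
newlist = [
    ['id', 'name', 'lastContactedTime', 'email'],
    ['12-contacts', 'Courtney James', '', 'courtney#forlanchema.com'],
    ['4-contacts', 'Joe Malcoun', '2019-08-13 14:41:12', 'ceo#nutshell.com'],
]

rows = []
for record in newlist[1:]:  # skip the header row
    # Split at the first space: everything before it is the first name,
    # everything after it is treated as the last name.
    first, _, last = record[1].partition(' ')
    rows.append([last, first, record[3]])

print(rows)
```

Each entry in `rows` is a plain list, so it can be written out one row at a time.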

How to know index of a decimal value in a python list

I have a list like the following
['UIS', '', '', '', '', '', '', '', '', '02/05/2014', 'N', '', '', '', '', '9:30:00', '', '', '', '', '', '', '', '', '31.8000', '', '', '', '', '', '', '3591', 'O', '', '', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '0']
How can I find out which element is a decimal here? Basically, I want to locate the 31.8000 value in the list. Is it possible?
You can reliably check whether a string holds a floating point number by evaluating it with ast.literal_eval and checking whether the result is of type float, like this
from ast import literal_eval
result = []
for item in data:
    temp = ""
    try:
        temp = literal_eval(item)
    except (SyntaxError, ValueError):
        pass
    if isinstance(temp, float):
        result.append(item)
print(result)
# ['31.8000']
If you want to get the indexes, just enumerate the data like this
for idx, item in enumerate(data):
    ...
    ...
and while preparing the result, add the index instead of the actual element
result.append(idx)
Iterate over the list and check if float() succeeds (note that this also accepts integer strings such as '3591' and '0'):
floatables = []
for i,item in enumerate(data):
    try:
        float(item)
        floatables.append(i)
    except ValueError:
        pass
print(floatables)
Alternatively, if you want to match the decimal format you can use
import re
decimals = []
for i,item in enumerate(data):
    if re.match(r"^\d+?\.\d+?$", item) is not None:
        decimals.append(i)
print(decimals)
Using a list comprehension and a regular expression match:
>>> import re
>>> [float(i) for i in x if re.match(r'^[+-]?\d+?[.]\d+$',i)]
[31.8]
If you want to track the indexes of the floats:
>>> [x.index(i) for i in x if re.match(r'[+-]?\d+?[.]\d+',i)]
[24]
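One caveat with `x.index(i)`: it always returns the position of the first matching value, so a decimal that appears twice is reported at the same index twice. A minimal sketch that takes the index from `enumerate` instead, using a shortened made-up list with a duplicated value to show the difference:

```python
import re

# Shortened made-up sample with a duplicated decimal value.
data = ['UIS', '', '31.8000', '3591', '31.8000', '0']

# Optional sign, digits, a dot, digits; fullmatch requires the whole string to fit.
decimal_pattern = re.compile(r'[+-]?\d+\.\d+')

with_index = [data.index(item) for item in data if decimal_pattern.fullmatch(item)]
with_enumerate = [idx for idx, item in enumerate(data) if decimal_pattern.fullmatch(item)]

print(with_index)      # [2, 2]
print(with_enumerate)  # [2, 4]
```

`enumerate` also avoids the repeated linear search that `index` performs for each match.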
data = ['UIS', '', '', '', '', '', '', '', '', '02/05/2014', 'N', '', '', '', '', '9:30:00', '', '', '', '', '', '', '', '', '31.8000', '', '', '', '', '', '', '3591', 'O', '', '', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '0']
import decimal
target = decimal.Decimal('31.8000')
def is_target(value):
    try:
        return decimal.Decimal(value) == target
    except decimal.InvalidOperation:
        # Not a valid decimal literal (e.g. '' or 'UIS'), so not a match.
        return False
output = list(filter(is_target, data))
print(output)

Python array deleting items

I have array
a=['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '151 ihi Chun', '151 ihi Chun', '149 st Hg', '149 st Hg', '125 Tatane', '125 Tatane', '174 Sunnygat', '174 Sunnygat', '174 Sunnygat', '126 Nank', '126 Nank', '162 Rass', '162 Rass']
I want to remove all '' objects, but can't.
a.remove('')
or while a.index(''): a.remove('')
Neither of these helps.
Use a filter() call with None as the filter (tests for truth, so non-emptiness); in Python 3, wrap it in list() because filter() returns an iterator:
a = list(filter(None, a))
or a list comprehension:
a = [e for e in a if e]
If you need to explicitly allow other 'false' values and only want to filter out empty strings, use:
a = [e for e in a if e != '']
If those items are actually '', in other words, empty strings, then you can use the following:
a = [x for x in a if x]
This works because an empty string evaluates to false in a truth test.
Try:
for i in a:
    a.remove('')
    a.remove('')
I am also not sure why the first remove does not clear all of them, but with the second one it removes all the blanks.
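The attempts with `remove` fall short for two reasons. `a.remove('')` deletes only the first match, and `a.index('')` returns 0 when the first element is empty, which is falsy, so `while a.index(''):` never runs its body (and `index` raises `ValueError` once no empty string is left). Removing inside a for loop also skips items as the list shrinks. A small sketch on a shortened made-up list, comparing the loop with a `while '' in ...` test:

```python
# Shortened made-up sample in the style of the question's list.
a = ['', '', '151 ihi Chun', '', '149 st Hg', '']

# Removing while iterating: the list shrinks under the loop, so it
# stops before every blank has been visited.
looped = list(a)
for item in looped:
    if item == '':
        looped.remove(item)

# Testing membership directly removes blanks until none are left.
cleared = list(a)
while '' in cleared:
    cleared.remove('')

print(looped)   # ['151 ihi Chun', '149 st Hg', '']
print(cleared)  # ['151 ihi Chun', '149 st Hg']
```

The while loop rescans the list on every pass, so for long lists the list comprehension shown earlier is the cheaper choice.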
