remove specific characters from text file string - python

I have a text file with the following line in it: ('larry', 3, 100)
I need to assign the three parts of it to different variables using a
method close to what I have below. So far I can split it down,
but I cannot remove the brackets and apostrophes from the result.
I have tried other solutions on Stack Overflow for hours to no avail.
filecontent = filename.read().strip().split(',')
for i in filecontent:
    name = filecontent[0]
    weeks_worked = filecontent[1]
    weekly_payment = filecontent[2]
print("name:" + name)
print("weeks worked:" + weeks_worked)
print("weekly payment:" + weekly_payment)
gives the result:
name:('larry'
weeks worked:3
weekly payment:100)
How do I make it show just:
name:larry
weeks worked:3
weekly payment:100

You'll want to use ast.literal_eval here, which will convert the string to a tuple:
import ast
filecontent = ast.literal_eval(filename.read().strip())
You also don't need a for loop; you can unpack the tuple directly:
name, weeks_worked, weekly_payment = filecontent
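As a minimal sketch of the above, assuming the file's content is already available as a string (the variable `line` below is only a stand-in for `filename.read()`): `ast.literal_eval` parses the text into a real tuple, so the two numbers come back as `int` and need `str()` before they can be concatenated.

```python
import ast

# Stand-in for filename.read(); the sample line is the one from the question.
line = "('larry', 3, 100)\n"

# literal_eval turns the text into the tuple ('larry', 3, 100).
name, weeks_worked, weekly_payment = ast.literal_eval(line.strip())

print("name:" + name)
print("weeks worked:" + str(weeks_worked))      # ints must be converted
print("weekly payment:" + str(weekly_payment))
```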

You can use a regex either before or after the split.
import re
filecontent = filename.read().strip().split(',')
name = re.sub(r'\(|\)|\'','',filecontent[0])
weeks_worked = re.sub(r'\(|\)|\'','',filecontent[1])
weekly_payment = re.sub(r'\(|\)|\'','',filecontent[2])
Or, as I would prefer, strip the characters before splitting:
filecontent = re.sub(r'\(|\)|\'','',filename.read().strip()).split(',')
Then carry on as before.


Delete all characters that come after a given string

How exactly can I delete the characters after .jpg? Is there a way in Python to separate the extension from whatever follows it?
For example, I have a link like this:
https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC
How can I delete everything after .jpg?
I tried replacing, but it didn't work.
Is there another way?
Should I use a loop to count characters, or something like that?
I tried to get the .jpg links with this:
for link in links:
    res = requests.get(link).text
    soup = BeautifulSoup(res, 'html.parser')
    img_links = []
    for img in soup.select('a.thumbnail img[src]'):
        print(img["src"])
        with open('links'+'.csv', 'a', encoding = 'utf-8', newline='') as csv_file:
            file_is_empty = os.stat(self.filename+'.csv').st_size == 0
            fieldname = ['links']
            writer = csv.DictWriter(csv_file, fieldnames = fieldname)
            if file_is_empty:
                writer.writeheader()
            writer.writerow({'links':img["src"]})
        img_links.append(img["src"])
You could use split. This assumes the string contains '.jpg'; if it does not, the code below returns the original URL with '.jpg' appended.
string = 'https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC'
jpg_removed = string.split('.jpg')[0]+'.jpg'
Example
string = 'www.google.com'
com_removed = string.split('.com')[0]
# com_removed = 'www.google'
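If the URL might not contain the extension at all, a variant based on `str.partition` avoids appending it by mistake. This is only a sketch; the helper name `truncate_after` and its `ext` parameter are hypothetical and not part of the answer above.

```python
def truncate_after(url, ext=".jpg"):
    """Return url cut off right after the first occurrence of ext.

    If ext does not occur, url is returned unchanged.
    """
    head, sep, _tail = url.partition(ext)
    # sep is the empty string when ext was not found, so head + sep == url.
    return head + sep


print(truncate_after("images/photo.jpg10DCC3DD"))
print(truncate_after("images/photo.png"))
```

`partition` splits at the first occurrence only, so anything after the first `.jpg` is discarded.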
You can use a regular expression. You just want to ignore the characters after .jpg, so you can use something like this:
import re
new_url = re.findall(r"(.*\.jpg).*", old_url)[0]
(.*\.jpg) is a capturing group that matches any number of characters followed by .jpg. Since . has a special meaning in a regex, the . in .jpg must be escaped as \. to match a literal dot. The trailing .* matches any remaining characters, but because it is outside the capturing group () that part is matched without being extracted.
You can use the .find method to locate the characters .jpg, then slice the string to keep everything up to and including them. Example:
string = 'https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC'
index = string.find(".jpg")
new_string = string[:index + 4]
You have to add four because that is the length of .jpg, so the slice does not cut the extension off too.
The find() method returns the lowest index of the substring if it is found in the given string. If it is not found, it returns -1.
url = 'https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC'
result = url.find('jpg')
print(result)
new_url = url[:result]
print(new_url + 'jpg')
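Because find() returns -1 when the substring is missing, slicing with that result silently produces a wrong string (a slice ending at -1 drops the last character). A minimal sketch of a guarded version follows; the function name `cut_after_jpg` is hypothetical and chosen only for illustration.

```python
def cut_after_jpg(url):
    """Keep everything up to and including the first '.jpg', if present."""
    index = url.find(".jpg")
    if index == -1:
        # No '.jpg' in the string: leave it untouched instead of slicing.
        return url
    return url[:index + len(".jpg")]


print(cut_after_jpg("res_cd1f.jpg10DCC3DD"))
print(cut_after_jpg("res_cd1f.png"))
```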
See: Extracting extension from filename in Python
Instead of extracting the extension, we extract the filename and add the extension (if we know it's always .jpg, it's fine!)
import os
filename, file_extension = os.path.splitext('/path/to/somefile.jpg_corruptedpath')
result = filename + '.jpg'
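To illustrate why this works on the URL from the question: os.path.splitext splits at the last dot of the final path component, so everything from .jpg onward ends up in the extension part. The sketch below assumes the junk appended after .jpg contains no further dot.

```python
import os

url = ('https://s13emagst.akamaized.net/products/29146/29145166/images/'
       'res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC')

# root is everything before the last dot; ext is '.jpg' plus the trailing junk.
root, ext = os.path.splitext(url)
result = root + '.jpg'
print(result)
```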
Now, outside of the original question, I think there might be something wrong with how you got that piece of information in the first place. There must be a better way to extract that JPEG link without messing around with the path. Sadly, I can't help you with that since I am a novice with BeautifulSoup.
You could use a regular expression to replace everything after .jpg with an empty string:
import re
url ='https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg10DCC3DD9E74DC1D10104F623D7E9BDC'
name = re.sub(r'(?<=\.jpg).*',"",url)
print(name)
https://s13emagst.akamaized.net/products/29146/29145166/images/res_cd1fa80f252e88faa70ffd465c516741.jpg

regex.sub unexpectedly modifying the substituting string with some kind of encoding?

I have a path string "...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." that I am inserting into an existing text file in place of another string. I compile a regex pattern and find bounding text to get the location to insert, and then use regex.sub to replace it. I'm doing something like this...
with open(imextXML, 'r') as file:
    filedata = file.read()
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
filedatatemp = redirpath.sub(newdir,filedata)
The inserted text is messed up though, with "\\20" being replaced with "\x8" and "\\" replaced with "\" (single slash)
i.e.
"...\\JustStuff\\2017GrainHarvest_GQimagesTestStand\\..." becomes
"...\\JustStuff\x817GrainHarvest_GQimagesTestStand\..."
What simple thing am I missing here to fix it?
Update:
to break this down even further to copy and paste to reproduce the issue...
t2 = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
redirpath = re.compile("(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
temp = r"<directoryPath>aasdfgsdagewweags</directoryPath>"
redirpath.sub(t2,temp)
produces...
>>'<directoryPath>\\JustStuff\x817GrainHarvest_GQimagesTestStand\te</directoryPath>'
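re.sub processes backslash escapes in the replacement string, so \201 is read as an octal character escape and becomes \x81. The replacement therefore needs doubled backslashes; when you define the string that you want to insert, prefix it with an r to make it a raw string literal so that both backslashes are kept: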
>>> rex = re.compile('a')
>>> s = 'path\\with\\2017'
>>> sr = r'path\\with\\2017'
>>> rex.sub(s, 'ab')
'path\\with\x817b'
>>> rex.sub(sr, 'ab')
'path\\with\\2017b'
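Another way to sidestep the escape processing, sketched here with the strings from the question's update, is to pass a function as the replacement. When the replacement is a callable, re.sub inserts its return value as-is. The variable names `new_dir`, `pattern` and `text` are placeholders chosen for this example.

```python
import re

new_dir = r'\JustStuff\2017GrainHarvest_GQimagesTestStand\te'
pattern = re.compile(r"(?<=<directoryPath>).*(?=</directoryPath>)", re.ASCII)
text = "<directoryPath>aasdfgsdagewweags</directoryPath>"

# The lambda ignores the match object and returns the path unchanged;
# no backslash escapes are interpreted in a callable's return value.
result = pattern.sub(lambda match: new_dir, text)
print(result)
```

If a plain string replacement is preferred, doubling the backslashes first with `new_dir.replace('\\', '\\\\')` should have the same effect.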

simple way convert python string to quoted string

I'm new to Python and I have a select statement containing help_category_id, name. What is the most effective way to convert this string to this:
'help_category_id', 'name'
I've currently done this, which works fine, but is there a nicer and cleaner way to do the same?
test_string = 'help_category_id, name'
column_sort_list = []
if test_string is not None:
    for col in test_string.split(','):
        column = "'{column}'".format(column=col)
        column_sort_list.append(column)
    column_sort = ','.join(column_sort_list)
    print(column_sort)
A simple one-liner using a list comprehension:
result = ", ".join(["'" + i.strip() + "'" for i in myString.split(",")])
Here we create a list that contains all the substrings of your original string, with the quotes added. Then, using join, we turn that list into a comma-delimited string.
Deconstructed, the list comprehension looks like this:
resultList = []
for i in myString.split(","):
    resultList.append("'" + i.strip() + "'")
Note the call to i.strip(), which removes extraneous spaces around each substring.
Note: You can use format syntax to make this code even cleaner:
New syntax:
result = ", ".join(["'{}'".format(i.strip()) for i in myString.split(",")])
Old syntax:
result = ", ".join(["'%s'" % i.strip() for i in myString.split(",")])
It can also be achieved like this:
','.join("'{}'".format(value) for value in map(lambda text: text.strip(), test_string.split(",")))
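Putting the pieces together, here is a small self-contained sketch using the question's sample string. The function name `quote_columns` is hypothetical; the `None` check mirrors the one in the question.

```python
def quote_columns(column_string):
    """Wrap each comma-separated name in single quotes."""
    if column_string is None:
        return ""
    # strip() removes the space left behind after each comma.
    return ", ".join("'{}'".format(col.strip()) for col in column_string.split(","))


print(quote_columns('help_category_id, name'))
```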

Python to print string from substring from list

I am a newbie to Python. Consider that I have a list ['python','java','ruby']
and a text file like this:
jrubyk
knwdjavawe
weqkpythonqwe
1ruby.e
Expected output:
ruby
java
python
ruby
I need to print the strings from the list that are hidden inside each line as substrings.
Is there a way to do that?
I tend to use regular expressions when I want to extract certain substrings from larger strings. Here is an inelegant but readable way to do this.
import re
python_matcher = re.compile('python')
java_matcher = re.compile('java')
ruby_matcher = re.compile('ruby')
with open('hidden.txt', 'r') as hidden_file:
    hidden_text_list = hidden_file.readlines()
for line in hidden_text_list:
    python_matched = python_matcher.search(line)
    java_matched = java_matcher.search(line)
    ruby_matched = ruby_matcher.search(line)
    # only the first language that matches on a line is printed
    if python_matched:
        print(python_matched.group())
    elif java_matched:
        print(java_matched.group())
    elif ruby_matched:
        print(ruby_matched.group())
The brute force approach is:
hidden_strings = ['python','java','ruby']
with open('path/to/textfile/as/in/example.txt') as infile:
    for line in infile:
        for hidden_string in hidden_strings:
            if hidden_string in line:
                print(hidden_string)
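The two approaches can also be combined: one compiled pattern that matches any word from the list. The sketch below uses the sample lines from the question in place of a file; the names `lines`, `pattern` and `found` are chosen only for this example.

```python
import re

hidden_strings = ['python', 'java', 'ruby']
# Stand-in for the lines read from the text file in the question.
lines = ['jrubyk', 'knwdjavawe', 'weqkpythonqwe', '1ruby.e']

# re.escape guards against words that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(word) for word in hidden_strings))

found = []
for line in lines:
    # findall returns every hidden word in the line, in order of appearance.
    found.extend(pattern.findall(line))

print('\n'.join(found))
```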

Pyparsing: How can I parse data and then edit a specific value in a .txt file?

My data is located in a .txt file (no, I can't change it to a different format) and it looks like this:
variablename = value
something = thisvalue
youget = the_idea
Here is my code so far (taken from the examples in Pyparsing):
from pyparsing import Word, alphas, alphanums, Literal, restOfLine, OneOrMore, \
    empty, Suppress, replaceWith
with open("text.txt", "r") as infile:
    src = infile.read()
# simple grammar to match #define's
ident = Word(alphas + alphanums + "_")
macroDef = ident.setResultsName("name") + "= " + ident.setResultsName("value") + Literal("#") + restOfLine.setResultsName("desc")
for t,s,e in macroDef.scanString(src):
    print(t.name, "=", t.value)
So how can I tell my script to edit a specific value for a specific variable?
Example:
I want to change the value of variablename, from value to new_value.
So essentially variable = (the data we want to edit).
To be clear, I don't want to open the file and change value to new_value by hand; I want to parse the data, find the variable, and then give it a new value.
Even though you have already selected another answer, let me answer your original question, which was how to do this using pyparsing.
If you are trying to make selective changes in some body of text, then transformString is a better choice than scanString (although scanString or searchString are fine for validating your grammar expression by looking for matching text). transformString will apply token suppression or parse action modifications to your input string as it scans through the text looking for matches.
from pyparsing import Word, alphanums, ParseException
# alphas + alphanums is unnecessary, since alphanums includes all alphas
ident = Word(alphanums + "_")
# I find this shorthand form of setResultsName is a little more readable
macroDef = ident("name") + "=" + ident("value")
# define values to be updated, and their new values
valuesToUpdate = {
    "variablename" : "new_value"
    }
# define a parse action to apply value updates, and attach to macroDef
def updateSelectedDefinitions(tokens):
    if tokens.name in valuesToUpdate:
        newval = valuesToUpdate[tokens.name]
        return "%s = %s" % (tokens.name, newval)
    else:
        raise ParseException("no update defined for this definition")
macroDef.setParseAction(updateSelectedDefinitions)
# now let transformString do all the work!
print(macroDef.transformString(src))
Gives:
variablename = new_value
something = thisvalue
youget = the_idea
For this task you do not need a special utility or module.
What you need is to read the lines and split each one into a list, so the first element is the left side and the second element is the right side.
If you need these values later, you might want to store them in a dictionary.
Here is a simple way, for somebody new to Python. Uncomment the lines with print to use them for debugging.
f = open("conf.txt", "r")
txt = f.read()  # all text is in txt
f.close()
fwrite = open("modified.txt", "w")
splitedlines = txt.splitlines()
#print(splitedlines)
for line in splitedlines:
    #print(line)
    conf = line.split('=')
    # conf[0] is the left side and conf[1] is the right side
    #print(conf)
    if conf[0].strip() == "youget":  # strip the space before '='
        # we found the variable to change
        conf[1] = " the_super_idea"  # the_idea is now the_super_idea
    # join conf with '=' and write
    newline = '='.join(conf)
    #print(newline)
    fwrite.write(newline + "\n")
fwrite.close()
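The same idea can be driven by a dictionary of updates, which is easier to extend to several variables. This is a sketch that works on a string rather than on files; the function name `update_values` and the `updates` parameter are hypothetical.

```python
def update_values(text, updates):
    """Return text with 'name = value' lines rewritten according to updates."""
    new_lines = []
    for line in text.splitlines():
        name, sep, _value = line.partition('=')
        # sep is empty for lines without '=', which are passed through as-is.
        if sep and name.strip() in updates:
            line = "%s = %s" % (name.strip(), updates[name.strip()])
        new_lines.append(line)
    return "\n".join(new_lines)


src = "variablename = value\nsomething = thisvalue\nyouget = the_idea"
print(update_values(src, {"variablename": "new_value"}))
```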
Actually, you should have a look at the ConfigParser module (named configparser in Python 3),
which parses exactly your syntax (you only need to add a [section] header at the beginning).
If you insist on your implementation, you can create a dictionary:
dictt = {}
for t,s,e in macroDef.scanString(src):
    dictt[t.name] = t.value
dictt[variable] = new_value
ConfigParser:
import configparser  # ConfigParser in Python 2
config = configparser.RawConfigParser()
config.read('example.txt')
variablename = config.get('section', 'variablename')
It'll yell at you if you don't have a [section] header, but that's OK, you can fake one.
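One way to fake the header, sketched here for Python 3's configparser, is to prepend it to the file's text and parse the result with read_string. The section name `section` is made up, and `src` stands in for the contents of the .txt file.

```python
import configparser

# Stand-in for the text read from the .txt file in the question.
src = "variablename = value\nsomething = thisvalue\nyouget = the_idea\n"

config = configparser.RawConfigParser()
# Prepend a made-up header so the parser accepts the header-less text.
config.read_string("[section]\n" + src)

# Change one value in memory and read it back.
config.set("section", "variablename", "new_value")
print(config.get("section", "variablename"))
print(config.get("section", "youget"))
```

Note that writing the parser back out with `config.write()` would include the `[section]` line, so it would have to be removed again if the file must keep its original format.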
