Verifying multiple words only found in a string in Python

I have some data
data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
I want to check that the string consists only of the words in data. My plan is to look up the words of data one by one in the string variable and remove each one, so that in the end I am left with the empty string, meaning all words of data were found in the string variable and nothing else remained.
So far, I have tried this, but no luck:
found_words = []
for i in range(len(data)):
    if data[i] in string:
        found_words.append(string.replace(data[i], ''))
found_words
Output: ['montrealparis', 'montrealvancouver', 'vancouverparis']
Desired Output: string = ''
Thanks

Your variables are a mess – What is var ? Looks like it should be data[i] instead of 'var[i]'. And data_new is not defined either. This should work:
data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
buffer = []
for i in range(len(data)):
    if data[i] in string:
        buffer.append(string.replace(data[i], ''))
print(buffer)
['montrealparis', 'montrealvancouver', 'vancouverparis']
This gives you the output you stated. If you instead want to end up with the empty string, you need to update the string you are searching on each iteration, so that the words found previously are removed from it:
for i in range(len(data)):
    if data[i] in string:
        string = string.replace(data[i], '')
print(f"Final String: '{string}'")
Final String: ''
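The same idea can be wrapped in a small function. This is only a sketch: the helper name `remainder_after_removing` is hypothetical, and it assumes each word should be removed once. `str.replace` without a count removes every occurrence of a word, so the count of 1 is passed deliberately here.

```python
def remainder_after_removing(words, text):
    """Remove the first occurrence of each word from text and return what is left."""
    for word in words:
        if word in text:
            # The third argument (count=1) removes only the first occurrence
            text = text.replace(word, '', 1)
    return text


data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'

leftover = remainder_after_removing(data, string)
# An empty result means the words account for the whole string
print(repr(leftover))
```

Because removing one word can join its neighbours into a new match, the result may depend on the order of the words when they overlap.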

IIUC, you can just use list comprehension:
>>> [word for word in data if word in string]
['vancouver', 'paris', 'montreal']

data = ['vancouver', 'paris', 'montreal']
string = 'montrealvancouverparis'
found_words = []
for i in range(len(data)):
    if data[i] in string:
        found_words.append(data[i])
found_words

Related

Extract numeric values from a string for python

I have a string which contains numeric values inside quotes. I need to extract the numeric values from it and get rid of the quotes and the [ and ].
sample string: texts = ['13007807', '13007779']
texts = ['13007807', '13007779']
texts.replace("'", "")
texts..strip("'")
print texts
# this will return ['13007807', '13007779']
So what I need to extract from the string is:
13007807
13007779
If your texts variable is a string as I understood from your reply, then you can use Regular expressions:
import re
text = "['13007807', '13007779']"
regex = r"\['(\d+)', '(\d+)'\]"
values = re.search(regex, text)
if values:
    value1 = int(values.group(1))
    value2 = int(values.group(2))
output:
value1=13007807
value2=13007779
You can use the * unpacking operator:
texts = ['13007807', '13007779']
print (*texts)
output:
13007807 13007779
If you have a string instead:
data = "['13007807', '13007779']"
print (*eval(data))
output:
13007807 13007779
The easiest way is to use map and wrap the result in list:
list(map(int,texts))
Output
[13007807, 13007779]
If your input data is of format data = "['13007807', '13007779']" then
import re
data = "['13007807', '13007779']"
list(map(int, re.findall(r'\d+', data)))
or
list(map(int, eval(data)))
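Since eval will execute whatever expression the string contains, a safer variant for this kind of input is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch, assuming the string really is a list literal of digit strings as in the question:

```python
import ast

data = "['13007807', '13007779']"

# literal_eval parses literals only (lists, strings, numbers, ...),
# so it does not run arbitrary code the way eval can
values = [int(item) for item in ast.literal_eval(data)]

for value in values:
    print(value)
```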

x.split has no effect

For some reason x.split(':', 1)[-1] doesn't do anything. Could someone explain and maybe help me?
I'm trying to remove the data before : (including ":") but it keeps that data anyway
Code
data = { 'state': 1, 'endTime': 1518852709307, 'fileSize': 000000 }
data = data.strip('{}')
data = data.split(',')
for x in data:
    x.split(':', 1)[-1]
    print(x)
Output
"state":1
"endTime":1518852709307
"fileSize":16777216
It's a dictionary, not a list of strings.
I think this is what you're looking for:
data = str({"state":1,"endTime":1518852709307,"fileSize":000000}) #add a str() here
data = data.strip('{}')
data = data.split(',')
for x in data:
    x = x.split(':')[-1]  # set x to x.split(...)
    print(x)
The script above prints out:
1
1518852709307
0
Here is a one-liner version (applied to the original dictionary, not to the split list):
print (list(map(lambda x:x[1],data.items())))
Prints out:
[1, 1518852709307, 0]
Which is a list of integers.
Seems like you just want the values in the dictionary
data = {"state":1,"endTime":1518852709307,"fileSize":000000}
for x in data:
    print(data[x])
I'm not sure, but I think it's because the computer treats "state" and 1 as separate objects. Therefore, it is merely stripping the string "state" of its colons, of which there are none.
You could make the entire dictionary into a string by putting:
data = str({ Your Dictionary Here })
Then split that string into its items and print what is left of each one, like so:
for x in data.strip('{}').split(','):
    b = x.split(':', 1)[-1]  # creating a new string
    print(b)
data in your code is a dictionary, so you can just access its values directly, like data['state'], which evaluates to 1.
If you get this data as a string like:
data = "{'state':1, 'endTime':1518852709307, 'fileSize':000000}"
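You could use json.loads to convert it into a dictionary and access the data as explained above. JSON requires double-quoted keys and does not allow leading zeros in numbers, hence the slightly different string here: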
import json
data = '{"state":1, "endTime":1518852709307, "fileSize":0}'
data = json.loads(data)
for _, v in data.items():
    print(v)
If you want to parse the string yourself this should work:
data = '{"state":1,"endTime":1518852709307,"fileSize":000000}'
data = data.strip('{}')
data = data.split(',')
for x in data:
    x = x.split(':')[-1]
    print(x)
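As for why the call in the question appeared to do nothing: Python strings are immutable, and split() returns a new list rather than changing the string it is called on. A short sketch of the difference, using one of the items from the question's output as sample input:

```python
pair = '"state":1'

# The result of split() is discarded here, so pair is unchanged
pair.split(':', 1)
unchanged = pair

# Keeping the result requires binding it to a name
value = pair.split(':', 1)[-1]

print(unchanged)
print(value)
```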

Can't get rid of hex characters

This program makes an array of verbs which come from a text file.
file = open("Verbs.txt", "r")
data = str(file.read())
table = eval(data)
num_table = len(table)
new_table = []
for x in range(0, num_table):
    newstr = table[x].replace(")", "")
    split = newstr.rsplit("(")
    numx = len(split)
    for y in range(0, numx):
        split[y] = split[y].split(",", 1)[0]
        new_table.append(split[y])
num_new_table = len(new_table)
for z in range(0, num_new_table):
    print(new_table[z])
However the text itself contains hex characters such as in
('a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]', ':', 17.6044921875)('A\\xc4\\x9fr\\xc4\\xb1[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]', ':', 11.5615234375)
I'm trying to get rid of those. How am I supposed to do that?
I've looked pretty much everywhere, and decode() returns an error (even after importing codecs).
You could use parse, a Python module that lets you search a string for regularly formatted components. From the components it returns, you can extract the corresponding integers and substitute them into the original string.
For example (untested alert!):
import parse
# Parse all hex-like items
list_of_findings = parse.findall("\\x{:w}", your_string)
# For each item
for hex_item in list_of_findings:
    # Replace the item in the string
    your_string = your_string.replace(
        # Retrieve the value from the Parse Data Format
        hex_item[0],
        # Convert the parsed hex digits to int (base 16),
        # then to string again
        str(int(hex_item[0], 16))
    )
Obs: instead of the integer, you could convert the found hex-like values to characters, using chr, as in:
chr(int(hex_item[0], 16))
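A standard-library alternative, offered only as a sketch: the escapes in the sample look like UTF-8 bytes written out as literal `\xNN` text, so they can either be decoded back into the characters they stand for or stripped with a regular expression. The function names below are hypothetical, and the decoding variant assumes the text is otherwise plain ASCII and that the escaped bytes are valid UTF-8.

```python
import re

# One entry from the question, with literal backslashes in the text
raw = 'a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]'


def decode_escaped_utf8(text):
    """Turn literal \\xNN escapes back into the UTF-8 characters they encode."""
    # unicode_escape maps each \xNN to the code point NN; re-encoding as
    # latin-1 turns those code points into raw bytes, which are then
    # decoded as UTF-8
    as_bytes = text.encode('latin-1').decode('unicode_escape').encode('latin-1')
    return as_bytes.decode('utf-8')


def strip_escapes(text):
    """Drop every literal \\xNN escape from the text."""
    return re.sub(r'\\x[0-9a-fA-F]{2}', '', text)


decoded = decode_escaped_utf8(raw)
stripped = strip_escapes(raw)
print(decoded)
print(stripped)
```

Decoding keeps the original Turkish letters, while stripping simply discards them, so which one is appropriate depends on what the verb table is used for.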

Python adding to the list

I have to strip whitespace from extracted strings, one string at a time, for which I'm using split(). The split() function returns a list with the whitespace removed. I want to store this in my own growing list, since I have to aggregate all of the strings.
The snippet of my code:
while rec_id = "ffff"
    output = procs.run_cmd("get sensor info", command)
    sdr_li = []
    if output:
        byte_str = output[0]
        str_1 = byte_str.split(' ')
        for byte in str_1:
            sdr_li.append(byte)
    rec_id = get_rec_id()
Output = ['23 0a 06 01 52 2D 12']
str_1 = ['23','0a','06','01','52','2D','12']
This does not look very elegant, transferring from one list to another. Is there another way to achieve this?
list.extend():
sdr_li.extend(str_1)
str.split() returns a list, so just add that list's items to the main list. Use extend: https://docs.python.org/2/tutorial/datastructures.html
So, rewriting your code into something legible and properly indented, you'd get:
my_list = []  # an empty list instance; extend() needs an instance, not the list type
while rec_id != "ffff":  # assumed loop condition; the question's snippet wrote `=`
    output = procs.run_cmd("get sensor info", command)
    if output:
        result_string = output[0]
        # extend my_list with the list resulting from the whitespace
        # separated tokens of the output
        my_list.extend(result_string.split())
        pass  # end if
    rec_id = get_rec_id()
    ...
    pass  # end while
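A self-contained sketch of the same aggregation, assuming the command outputs look like the one in the question. The `outputs` list is a hypothetical stand-in for successive results of procs.run_cmd, which is not available here.

```python
# Hypothetical stand-in for successive results of procs.run_cmd
outputs = [
    ['23 0a 06 01 52 2D 12'],
    [],  # an empty result is skipped by the truthiness check
    ['ff ff 51 01'],
]

sdr_li = []
for output in outputs:
    if output:
        # split() with no argument splits on any run of whitespace,
        # and extend() appends each resulting token to sdr_li
        sdr_li.extend(output[0].split())

print(sdr_li)
```

Note that the accumulating list is created once, before the loop. Creating it inside the loop, as in the question's snippet, would reset it on every iteration.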

how to remove \t from result

I have parsed a web page and written all the links into a csv file; when I try to read these links from the csv I am getting this:
[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],
['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
A \t appears after each and every letter. I have tried this to remove the \t characters from the result, but no luck.
Here is my code
out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[row[1]] for row in data]
new_data = new_data.strip('\t\n\r')
print new_data
This is giving an error
AttributeError: 'list' object has no attribute 'strip'
You can use the re.sub function for easy substitution in strings:
import re
string = "[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]"
new_string = re.sub(r'\t', '', string)
print new_string
======= OUTPUT:
[['http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011'], ['http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011']]
Note that the strip method only removes characters from the two ends of a string, not from the middle.
Try the below approach:
out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[''.join(row[0].strip().split('\t'))] for row in data]
print new_data
As a kludge solution:
x = [['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
for y in x:
    for z in y:
        print("".join(z.split('\t')))
Returns:
> http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011
> http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011
You need to index down to each string and then do a simple replace:
string = [[...],[...]...]
lst = []
for ylst in string:
    for ln in ylst:
        lst.append(ln.replace('\t',''))
lst will contain each line without the '\t's
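The nested loop can also be written as a nested list comprehension that keeps the row structure. A minimal sketch, using shortened sample rows in place of the data read from categories.csv:

```python
# Shortened sample rows; each cell has a tab before every character
rows = [['\th\tt\tt\tp'], ['\tw\tw\tw']]

# str.replace works on each string, whereas the outer lists have no
# strip/replace methods (hence the AttributeError in the question)
cleaned = [[cell.replace('\t', '') for cell in row] for row in rows]

print(cleaned)
```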
