How to split findall results that contain "," in the data

x = re.findall(r'FROM\s(.*?\s)(WHERE|INNER|OUTER|JOIN|GROUP)', data, re.DOTALL)
I am using the expression above to parse an Oracle SQL query and get the result.
I get multiple matches and want to print each one on its own line.
How can I do that?
Some results even have "," in them.

You can try this:
for elt in x:
    print('\n'.join(elt.split(',')))
split returns a list of the comma-separated elements, which are then joined with \n (newline). Therefore, you get one result per line.
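One caveat worth checking: the pattern in the question has two capturing groups, so re.findall returns a list of tuples rather than plain strings, and each match has to be unpacked before calling split. A minimal sketch, assuming a made-up query (the `data` string and its table names are hypothetical, not the asker's real input):

```python
import re

# Hypothetical sample query; the real `data` is the asker's SQL text.
data = "SELECT a FROM emp e, dept d WHERE e.id = d.id GROUP BY a"

pattern = r'FROM\s(.*?\s)(WHERE|INNER|OUTER|JOIN|GROUP)'
# Two groups in the pattern -> each match is a (tables, keyword) tuple.
matches = re.findall(pattern, data, re.DOTALL)

lines = []
for tables, _keyword in matches:
    # Split the table list on commas and drop surrounding whitespace.
    for part in tables.split(','):
        part = part.strip()
        if part:
            lines.append(part)

print('\n'.join(lines))
```

If only the table list is needed, making the second group non-capturing, (?:WHERE|...), should give back plain strings instead of tuples.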

Your result is returned in a list.
from https://docs.python.org/2/library/re.html:
re.findall(pattern, string, flags=0) Return all non-overlapping
matches of pattern in string, as a list of strings.
If you are not familiar with data structures, more information here
You should be able to iterate over the returned list with a for loop:
for matchedString in x:
    # remove the commas
    n = matchedString.replace(',', '')
    # add to a new list, print, or apply any other logic
    print(n)

Related

Split in Python on a special character

I am looping over an array of strings and splitting each one. The split must follow this rule:
Split the string into two parts at the first special character, and select the first part as the result.
SCRIPT
array = [
    'srv1 #s',
    'srv2;192.168.9.1'
]
result = []
for x in array:
    outfinally = [line.split(';')[0] and line.split()[0] for line in x.splitlines() if line and line[0].isalpha()]
    for srv in outfinally:
        if srv != None:
            result.append(srv)
for i in result:
    print(i)
OUTPUT
srv1
srv2;192.168.9.1
DESIRED OUTPUT
srv1
srv2

This should split on any special character and append the first part of the split to a new list:
import re
array = [
    'srv1 #s',
    'srv2;192.168.9.1'
]
sep = (r'[`\-=~!##$%^&*()_+\[\]{};\'\\:"|<,./<>?]')
new_array = []
for i in array:
    new_array.append(re.split(sep, i)[0])
Output:
['srv1 ', 'srv2']

You can split twice with the two different separators instead:
result = [s.split()[0].split(';')[0] for s in array]
result becomes:
['srv1', 'srv2']
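If more separators turn up later, a single regular expression can stand in for the chained splits. A minimal sketch, assuming whitespace and ; are the only separators that matter (the helper name `first_field` is made up for illustration):

```python
import re

def first_field(text):
    """Return the part of text before the first whitespace or ';'."""
    # maxsplit=1: only the first separator matters, so stop after one split.
    return re.split(r'[;\s]', text.strip(), maxsplit=1)[0]

array = ['srv1 #s', 'srv2;192.168.9.1']
result = [first_field(s) for s in array]
print(result)  # ['srv1', 'srv2']
```

One difference from s.split()[0]: on an empty string this returns an empty string instead of raising IndexError.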

The problem is here: line.split(';')[0] and line.split()[0]
Your second condition splits on whitespace. As a result, it'll always return the whitespace-split version unless there's a semicolon at the start of the input (in which case you get empty string).
You probably want to chain the two splits instead:
line.split(';')[0].split()[0]
To see what the code in your question is doing, take a look at what your conditional expression does in a few different cases:
>>> array = ['srv1 s', 'srv2;192.168.9.1', ';192.168.1.1', 'srv1;srv2 192.168.1.1']
>>> for item in array:
...     print("Original: {}\n\tSplit: {}".format(item, item.split(';')[0] and item.split()[0]))
...
Original: srv1 s
	Split: srv1 # split on whitespace
Original: srv2;192.168.9.1
	Split: srv2;192.168.9.1 # split on whitespace!
Original: ;192.168.1.1
	Split: # split on special char, returned empty which is falsey, returns empty str
Original: srv1;srv2 192.168.1.1
	Split: srv1;srv2 # split only on whitespace

Change
outfinally = [line.split(';')[0] and line.split()[0] for line in x.splitlines() if line and line[0].isalpha()]
To
outfinally = [line.replace(';', ' ').split()[0] for line in x.splitlines() if line and line[0].isalpha()]
When you use and like that, the expression evaluates to the second operand whenever the first one is truthy. The split method returns the full string in a one-element list when the separator is not found, so line.split(';')[0] is truthy for any line that does not start with ';', and you always end up with line.split()[0], the whitespace split. (With or, as I first tried, you get the first operand whenever it is truthy and never reach the second.) Instead of having two conditions, combine them into one: something like line.replace(';', ' ').split()[0], or blhsing's solution, which is even better.
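To make the and/or behaviour concrete, here is a small sketch using one of the strings from the question (the variable names are only for illustration):

```python
line = 'srv2;192.168.9.1'

first = line.split(';')[0]   # 'srv2'
second = line.split()[0]     # 'srv2;192.168.9.1' (no whitespace to split on)

# `and` yields its second operand when the first is truthy.
and_result = first and second
# `or` yields its first operand when that operand is truthy.
or_result = first or second

# Chaining applies both splits to the same value.
chained = line.split(';')[0].split()[0]

print(and_result)  # srv2;192.168.9.1
print(or_result)   # srv2
print(chained)     # srv2
```

Here or happens to give the right answer, but only because this line contains no whitespace; for 'srv1 #s' it would return the whole string, which is why chaining the splits is the safer route.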

How to split a list item based on digits in item

I am currently parsing this huge rpt file. In each item there is a value in parentheses. For example, "item_number_one(3.14)". How could I extract that 3.14 using the split function in python? Or is there another way to do that?
#Splits all items by comma
items = line.split(',')
#splits items within comma, just gives name
name_only = [i.split('_')[0] for i in items]
# print(name_only)
#splits items within comma, just gives full name
full_name= [i.split('(')[0] for i in items]
# print(full_Name)
#splits items within comma, just gives value in parentheses
parenth_value = [i.split('0-9')[0] for i in items]
# parenth_value = [int(s) for s in items.split() if s.isdigit()]
print(parenth_value)

For a more general way of extracting numbers from strings, you should read about regular expressions.
For this very specific case, you can split by ( and then by ) to get the value between them,
like this:
line = "item_number_one(3.14)"
num = line.split('(')[1].split(')')[0]
print(num)

You could simply find the indices of the opening and closing parentheses and take the text between them:
start_paren = line.index('(')
end_paren = line.index(')')
item = line[start_paren + 1:end_paren]
# item = '3.14'
Alternatively, you could use regex, which offers an arguably more elegant solution:
import re
...
# anything can come before the parentheses, anything can come afterwards.
# We have to escape the parentheses and put a group inside them
# (this is notated with its own parentheses inside the pair that is escaped)
item = re.match(r'.*\(([0-9.-]*)\).*', line).group(1)
# item = '3.14'

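You can use a regex and do something like the following (the original pattern, \d.+, would also capture the closing parenthesis, so the character class below limits the match to digits and dots):
import re
sentence = "item_number_one(3.14)"
re.findall(r'\d[\d.]*', sentence)  # ['3.14']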

You could get the decimal value by using the following regular expression:
import re
text = 'item_number_one(3.14)'
re.findall(r'\d+\.\d+', text)
Output: ['3.14']
Explanation:
"\d" - matches any decimal digit; this is equivalent to the class [0-9].
"+" - one or more repetitions of the preceding token
"\." - a literal dot (an unescaped "." would match any character)
In the same way, you can parse the rpt file, split the lines, and fetch the value in the parentheses.
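Putting the pieces together for a whole line of comma-separated items, a rough sketch might look like this. The sample line and the helper name `parse_items` are invented here, and it assumes every value in parentheses is a plain decimal number:

```python
import re

# A name, then a number in parentheses, e.g. item_number_one(3.14)
ITEM_RE = re.compile(r'\s*([^(,]+)\((-?\d+(?:\.\d+)?)\)')

def parse_items(line):
    """Map each item name on the line to the float in its parentheses."""
    return {name.strip(): float(value) for name, value in ITEM_RE.findall(line)}

# Hypothetical line in the shape described in the question.
line = "item_number_one(3.14), item_number_two(-2), item_three(10.5)"
print(parse_items(line))
```

Items whose parentheses hold something other than a number are simply skipped by this pattern, which may or may not be what the real rpt file needs.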

creating a regular expression in python with variables and a comma

I am trying to match and find names exactly from one list with another list using python.
# First List
file = 'Last_First.csv'
filename = file.split('_')
last = filename[0]
first = filename[1]
# search a large list of names, where names are saved as "Last, First"
pattern = re.compile(re.escape(last+','+first))
# Second List
['63', 'Last, First', '65164345']
When I search the list line by line, I get an empty list:
matches = pattern.findall(line)
Printing the pattern, I get:
re.compile(r'Last\,First', re.UNICODE)
How can I get rid of the \ ?

The \ is an escape character added by re.escape; it does not change what the pattern matches. You are getting an empty list because 'Last, First' has a space after the comma, and your regular expression does not match that space.
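One way to allow for that space is to escape the two names separately and put an optional-whitespace pattern between them. A minimal sketch, assuming the file extension should be dropped before splitting (the question does not show that step, so the `stem` line is a guess):

```python
import re

file = 'Last_First.csv'
# Assumption: drop the extension so `first` is 'First', not 'First.csv'.
stem = file.rsplit('.', 1)[0]
last, first = stem.split('_')

# Escape the names separately and allow optional whitespace after the comma.
pattern = re.compile(re.escape(last) + r',\s*' + re.escape(first))

row = ['63', 'Last, First', '65164345']
# fullmatch requires the whole cell to be the name, i.e. an exact match.
matches = [cell for cell in row if pattern.fullmatch(cell)]
print(matches)  # ['Last, First']
```

Using fullmatch rather than findall keeps 'Last, Firstname' from counting as a hit, which seems closer to the "exactly" in the question.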

Return a string of country codes from an argument that is a string of prices

So here's the question:
Write a function that will return a string of country codes from an argument that is a string of prices (containing dollar amounts following the country codes). Your function will take as an argument a string of prices like the following: "US$40, AU$89, JP$200". In this example, the function would return the string "US, AU, JP".
Hint: You may want to break the original string into a list, manipulate the individual elements, then make it into a string again.
Example:
> testEqual(get_country_codes("NZ$300, KR$1200, DK$5"), "NZ, KR, DK")
As of now, I'm clueless as to how to separate the $ and the numbers. I'm very lost.

I would advise looking up regular expressions:
https://docs.python.org/2/library/re.html
If you use re.findall, it will return a list of all matching strings, and you can use a pattern like [A-Z]{2}(?=\$) to find all the two-letter uppercase codes that come right before a $.
After that, you can just create a string from the resulting list.
Let me know if that is not clear.

def test(string):
    return ", ".join([item.split("$")[0] for item in string.split(", ")])
string = "NZ$300, KR$1200, DK$5"
print(test(string))

Use a regular expression pattern and join the matches into a string. (\w{2})\$ matches exactly 2 word characters followed by a $.
import re
def get_country_codes(string):
    matches = re.findall(r"(\w{2})\$", string)
    return ", ".join(matches)

How to Check if the substring is matching in a list of strings in Python

I have a list and I want to find if the string is present in the list of strings.
li = ['Convenience','Telecom Pharmacy']
txt = '1 convenience store'
I want to match the txt with the Convenience from the list.
I have tried
if any(txt.lower() in s.lower() for s in li):
    print(s)
print([s for s in li if txt in s])
Neither method gave the expected output.
How can I match the substring against the list?

You could use set() and intersection:
In [19]: set.intersection(set(txt.lower().split()), set(s.lower() for s in list1))
Out[19]: {'convenience'}

I think split is your answer. Here is the description from the python documentation:
string.split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary
strings of whitespace characters (space, tab, newline, return,
formfeed). If the second argument sep is present and not None, it
specifies a string to be used as the word separator. The returned list
will then have one more item than the number of non-overlapping
occurrences of the separator in the string. If maxsplit is given, at
most maxsplit number of splits occur, and the remainder of the string
is returned as the final element of the list (thus, the list will have
at most maxsplit+1 elements). If maxsplit is not specified or -1, then
there is no limit on the number of splits (all possible splits are
made).
The behavior of split on an empty string depends on the value of sep. If sep is not specified, or specified as None, the result will be
an empty list. If sep is specified as any string, the result will be a
list containing one element which is an empty string.
Use the split command on your txt variable. It will give you a list back. You can then do a compare on the two lists to find any matches. I personally would write the nested for loops to check the lists manually, but python provides lots of tools for the job. The following link discusses different approaches to matching two lists.
How can I compare two lists in python and return matches
Enjoy. :-)
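The split-and-compare idea described above could look roughly like this, using the question's li and txt. Note the assumption baked in: it only matches list entries that are a single word, so 'Telecom Pharmacy' would never match word by word.

```python
li = ['Convenience', 'Telecom Pharmacy']
txt = '1 convenience store'

# Lower-case the words of txt once so the comparison is case-insensitive.
words = set(txt.lower().split())

# Keep the original spelling of every list entry that appears as a word in txt.
matches = [s for s in li if s.lower() in words]
print(matches)  # ['Convenience']
```

For multi-word entries, a substring test such as s.lower() in txt.lower() is probably the better fit.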

I see two things.
Do you want to find out whether the pattern string matches EXACTLY an item in the list? In that case, nothing could be simpler:
if txt in list1:
    pass  # do something
You can also apply .upper() or .lower() to both sides if you want the comparison to be case-insensitive.
But if, as I understand it, you want to find a string in the list that is part of txt, you have to use a for loop:
def find(list1, txt):
    # return the item if found, False otherwise
    for i in list1:
        if i.upper() in txt.upper():
            return i
    return False
It should work.
Console output:
>>> print(find(['Convenience','Telecom Pharmacy'], '1 convenience store'))
Convenience
>>>

You can try this:
>>> list1 = ['Convenience','Telecom Pharmacy']
>>> txt = '1 convenience store'
>>> list(filter(lambda x: txt.lower().find(x.lower()) >= 0, list1))
['Convenience']
# Or you can use this as well
>>> list(filter(lambda x: x.lower() in txt.lower(), list1))
['Convenience']
