Comparing two lists with different formats - Python

I have two lists:
influx = [u'mphhos-fnwp-010101-2',
u'mphhos-fnwp-010101-1',
u'mphhos-fnwp-010101-7',
u'mphhos-fnwp-010101-10',
u'mphhos-fnwp-010101-9',
u'mphhos-fnwp-010101-4',
u'mphhos-fnwp-010101-3',
u'mphhos-fnwp-010101-8',
u'mphhos-fnwp-010101-6',
u'mphhos-fnwp-010101-5',
u'mphhos-fnwp-010101-11']
etcd =[u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-4',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-9',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-1',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-10',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-3',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-6',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-7',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-8',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-11',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-2',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-5']
etcd is the parent list, and I want to compare influx with etcd.
1.) I want to get all elements of etcd that are not present in influx and return them.
2.) How can I convert the etcd list into the influx format by omitting /xymon/fnwp/mphhos/?
An answer to either question will solve my problem.
I have tried lots of methods, but none of them work because the lists are in different formats.
I could get my answer with set(etcd) - set(influx), but because the formats differ, I get every item in the list.

Use str.rsplit:
[x for x in etcd if x.rsplit('/', 1)[1] not in influx]
Per rafaelc's suggestion, convert influx to a set first so each membership test is fast:
infx = set(influx)
[x for x in etcd if x.rsplit('/', 1)[1] not in infx]
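A quick self-contained check of the rsplit approach. The sample below is a shortened, made-up subset (not the question's full data) in which influx is missing one host, so the result is non-empty:

```python
# Shortened, made-up sample: influx is missing '...-3'
influx = ['mphhos-fnwp-010101-1', 'mphhos-fnwp-010101-2']
etcd = ['/xymon/fnwp/mphhos/mphhos-fnwp-010101-1',
        '/xymon/fnwp/mphhos/mphhos-fnwp-010101-2',
        '/xymon/fnwp/mphhos/mphhos-fnwp-010101-3']

infx = set(influx)  # set membership tests are O(1) on average
# rsplit('/', 1)[1] keeps only the part after the last '/'
missing = [x for x in etcd if x.rsplit('/', 1)[1] not in infx]
print(missing)  # ['/xymon/fnwp/mphhos/mphhos-fnwp-010101-3']
```

Note that this returns the full etcd paths, which answers question 1.) directly.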

One simple solution would be to remove the prefixes:
for i, path in enumerate(etcd):
    etcd[i] = path.replace('/xymon/fnwp/mphhos/', '')
Then you can find the differences using set().
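str.replace removes the prefix wherever it occurs in the string. If you only want to strip it from the start, a sketch that checks startswith first might look like this (PREFIX is my own name, and the data is again a shortened, made-up sample):

```python
PREFIX = '/xymon/fnwp/mphhos/'

influx = ['mphhos-fnwp-010101-1', 'mphhos-fnwp-010101-2']
etcd = ['/xymon/fnwp/mphhos/mphhos-fnwp-010101-1',
        '/xymon/fnwp/mphhos/mphhos-fnwp-010101-2',
        '/xymon/fnwp/mphhos/mphhos-fnwp-010101-3']

# Strip PREFIX only when the string starts with it; leave others unchanged
stripped = [x[len(PREFIX):] if x.startswith(PREFIX) else x for x in etcd]

diff = set(stripped) - set(influx)
print(diff)  # {'mphhos-fnwp-010101-3'}
```

Building a new list also leaves the original etcd untouched, unlike the enumerate loop, which overwrites it in place.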

influx = [u'mphhos-fnwp-010101-2',
u'mphhos-fnwp-010101-1',
u'mphhos-fnwp-010101-7',
u'mphhos-fnwp-010101-10',
u'mphhos-fnwp-010101-9',
u'mphhos-fnwp-010101-4',
u'mphhos-fnwp-010101-3',
u'mphhos-fnwp-010101-8',
u'mphhos-fnwp-010101-6',
u'mphhos-fnwp-010101-5',
u'mphhos-fnwp-010101-11']
etcd =[u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-4',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-9',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-1',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-10',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-3',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-6',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-7',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-8',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-11',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-2',
u'/xymon/fnwp/mphhos/mphhos-fnwp-010101-5']
etcd = [x.replace('/xymon/fnwp/mphhos/', '') for x in etcd]
# or using regex (requires import re)
# etcd = [re.sub('/xymon/fnwp/mphhos/', '', x) for x in etcd]
diff = set(etcd) - set(influx)
print(diff)


PySpark / Python Slicing and Indexing Issue

Can someone let me know how to pull out certain values from a Python output.
I would like to retrieve the value 'ocweeklyreports' from the following output using either indexing or slicing:
'config': '{"hiveView":"ocweeklycur.ocweeklyreports"}'
This should be relatively easy; however, I'm having problems defining the slicing/indexing configuration.
The following will successfully give me 'ocweeklyreports'
myslice = config['hiveView'][12:30]
However, I need the indexing or slicing modified so that I get whatever value comes after 'ocweeklycur'.
I'm not sure what output you're dealing with or how robust you need it to be, but if it's just a string, you can do something like this (a quick and dirty solution):
raw = '{"hiveView":"ocweeklycur.ocweeklyreports"}'  # your input string
indexStart = raw.index('.') + 1  # start collecting just after the '.'
finalResponse = raw[indexStart:-2]  # drop the trailing '"}'
print(finalResponse)  # Prints ocweeklyreports
Again, not the most elegant solution, but hopefully it helps or at least offers a starting point. A more robust solution would be to use regex, but I'm not that skilled in regex at the moment.
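If you want to stay with plain slicing, as the question asks, one option is to compute the start index from the known prefix instead of hard-coding 12. This sketch assumes the value always begins with 'ocweeklycur.'; hive_view stands in for config['hiveView']:

```python
hive_view = 'ocweeklycur.ocweeklyreports'  # stands in for config['hiveView']
prefix = 'ocweeklycur.'

# Start the slice where the known prefix ends instead of hard-coding 12
if hive_view.startswith(prefix):
    table = hive_view[len(prefix):]
else:
    table = None  # prefix not present; handle this case as you see fit
print(table)  # ocweeklyreports
```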
You could do almost all of it using regex.
See if this helps:
import re

def search_word(di):
    st = di["config"]["hiveView"]
    # Escape the dot so it matches a literal '.', not any character
    p = re.compile(r'^ocweeklycur\.(?P<word>\w+)')
    m = p.search(st)
    return m.group('word') if m else None

if __name__ == "__main__":
    d = {'config': {"hiveView": "ocweeklycur.ocweeklyreports"}}
    print(search_word(d))
The following worked best for me:
# Extract the value of the "hiveView" key
hive_view = config['hiveView']
# Split the string on the '.' character
parts = hive_view.split('.')
# The value you want is the second part of the split string
desired_value = parts[1]
print(desired_value) # Output: "ocweeklyreports"
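In the question, the 'config' value is shown as a JSON string rather than a dict, in which case config['hiveView'] would fail until it is parsed. A sketch under that assumption, using json.loads and then str.partition (the names database and table are my own guess that hiveView holds a Hive "database.table" reference):

```python
import json

# Assumes 'config' holds a JSON string, as shown in the question
record = {'config': '{"hiveView":"ocweeklycur.ocweeklyreports"}'}

config = json.loads(record['config'])  # now a dict
hive_view = config['hiveView']

# partition splits on the first '.' only and does not raise if '.' is missing
database, _, table = hive_view.partition('.')
print(table)  # ocweeklyreports
```

partition is used instead of split so that a value without a '.' yields an empty table rather than an IndexError.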

In Python, how to remove items in a list based on the specific string format?

I have a Python list as below:
merged_cells_lst = [
'P19:Q19
'P20:Q20
'P21:Q21
'P22:Q22
'P23:Q23
'P14:Q14
'P15:Q15
'P16:Q16
'P17:Q17
'P18:Q18
'AU9:AV9
'P10:Q10
'P11:Q11
'P12:Q12
'P13:Q13
'A6:P6
'A7:P7
'D9:AJ9
'AK9:AQ9
'AR9:AT9'
'A1:P1'
]
I only want to unmerge the cells in the P and Q columns. Therefore, I want to remove any items in merged_cells_lst that do not have the format "P##:Q##".
I think regex is the best and simplest way to go about this. So far I have the following:
for item in merge_cell_lst:
    if re.match(r'P*:Q*'):
        pass
    else:
        merged_cell_lst.pop(item)
print(merge_cell_lst)
The code however is not working. I could use any additional tips/help. Thank you!
Modifying a list while looping over it causes trouble. You can use a list comprehension instead to create a new list.
Also, you need a different regex expression. The current pattern P*:Q* matches PP:QQQ, :Q, or even :, but not P19:Q19.
import re
merged_cells_lst = ['P19:Q19', 'P20:Q20', 'P21:Q21', 'P22:Q22', 'P23:Q23', 'P14:Q14', 'P15:Q15', 'P16:Q16', 'P17:Q17', 'P18:Q18', 'AU9:AV9', 'P10:Q10', 'P11:Q11', 'P12:Q12', 'P13:Q13', 'A6:P6', 'A7:P7', 'D9:AJ9', 'AK9:AQ9', 'AR9:AT9', 'A1:P1']
p = re.compile(r"P\d+:Q\d+")
output = [x for x in merged_cells_lst if p.match(x)]
print(output)
# ['P19:Q19', 'P20:Q20', 'P21:Q21', 'P22:Q22', 'P23:Q23', 'P14:Q14', 'P15:Q15',
# 'P16:Q16', 'P17:Q17', 'P18:Q18', 'P10:Q10', 'P11:Q11', 'P12:Q12', 'P13:Q13']
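One caveat: Pattern.match only anchors at the start of the string, so an entry with trailing characters would still be kept. If that matters, Pattern.fullmatch requires the whole string to fit. The entry 'P1:Q1X' below is a made-up malformed value added only to show the difference:

```python
import re

# 'P1:Q1X' is a hypothetical malformed entry, added for illustration
merged_cells_lst = ['P19:Q19', 'AU9:AV9', 'A6:P6', 'P10:Q10', 'P1:Q1X']

pattern = re.compile(r"P\d+:Q\d+")

# match() anchors only at the start, so trailing junk still matches
with_match = [x for x in merged_cells_lst if pattern.match(x)]
# fullmatch() requires the entire string to match the pattern
with_fullmatch = [x for x in merged_cells_lst if pattern.fullmatch(x)]

print(with_match)      # ['P19:Q19', 'P10:Q10', 'P1:Q1X']
print(with_fullmatch)  # ['P19:Q19', 'P10:Q10']
```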
Your list has some typos; it should look something like this:
merged_cells_lst = [
'P19:Q19',
'P20:Q20',
'P21:Q21', ...]
Then something as simple as:
x = [k for k in merged_cells_lst if k[0] == 'P']
would work. This assumes you know a priori that the items you want to keep follow the Pxx:Qxx format. If you want a more general solution, you can replace the condition in the list comprehension with a regex match.

Transform continuous date string (20190327200000000W) into datetime

I'm writing an application that parses XML from an HTTP request, and one of the attributes is a date.
The problem is that the value is a string without separators, for example '20190327200000000W', and I need to transform it into a datetime to send it to a database.
All the information I have found assumes some kind of separator character (2019-03-23 ...). Can you help me?
Thanks!!!
Maybe this? (in a Jupyter notebook)
from datetime import datetime
datetime_object = datetime.strptime('20190327200000000W', '%Y%m%d%H%M%S%fW')
datetime_object
Well, I have solved this. At first I did what Xenobiologist said, but I had a format problem, so I decided to delete the last character (the trailing letter). Then I realized that I didn't have a string, I had a list, so I converted it to a string and did the operations. Here is my code (only the inside of the for loop, without the parsing part):
import time

for parse in tree.iter(aaa):
    a = parse.get(m)
    respon = a.split(' ')
    if m == 'Fh':
        x = str(respon[0])
        x3 = x[:-1]  # drop the trailing letter, e.g. 'W'
        print(x3)
        y = time.strptime(x3, "%Y%m%d%H%M%S%f")
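For comparison, here is a sketch that strips the trailing letter and parses with datetime.strptime, which returns a datetime object (usually what a database driver expects) rather than the time.struct_time that time.strptime returns. It assumes the suffix is always a single non-digit character; the question doesn't say what that letter means, so it is simply discarded. parse_compact_timestamp is a hypothetical helper name:

```python
from datetime import datetime

def parse_compact_timestamp(raw):
    """Parse a string like '20190327200000000W' (hypothetical helper)."""
    # Drop a single trailing non-digit suffix such as 'W'
    if raw and not raw[-1].isdigit():
        raw = raw[:-1]
    # YYYYmmddHHMMSS followed by fractional seconds; %f accepts 1-6 digits
    return datetime.strptime(raw, '%Y%m%d%H%M%S%f')

dt = parse_compact_timestamp('20190327200000000W')
print(dt)  # 2019-03-27 20:00:00
```

Because %f pads on the right, the three trailing digits are read as milliseconds ('123' would become 123000 microseconds).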

How do I turn each element in a list into a string with quotes

I am using PyCharm IDE.
I frequently work with large data sets, and sometimes I have to iterate through each item.
For instance, I have a list
ticker_symbols = [500.SI, 502.SI, 504.SI, 505.SI, 508.SI, 510.SI, 519.SI...]
How do I automatically format each element into a string with quotes, i.e.
ticker_symbols = ['500.SI', '502.SI', '504.SI', '505.SI', '508.SI', '510.SI', '519.SI'...] ?
Is there a short-cut on PyCharm?
You can just do something like:
ticker_symbols = '[500.SI,502.SI,504.SI,505.SI,508.SI,510.SI,519.SI]'
print(ticker_symbols[1:-1].split(','))
Or like your string:
ticker_symbols = '[500.SI, 502.SI, 504.SI, 505.SI, 508.SI, 510.SI, 519.SI]'
print(ticker_symbols[1:-1].split(', '))
Both reproduce:
['500.SI', '502.SI', '504.SI', '505.SI', '508.SI', '510.SI', '519.SI']
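You can use a list comprehension:
temp_list = ["'{}'".format(x) for x in ticker_symbols]
Note that this wraps each element in literal quote characters, so if ticker_symbols already holds strings, the result is:
["'500.SI'", "'502.SI'", "'504.SI'",...]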
This will convert your list elements to strings and give you the quoted list as a single string, which you can paste back into your code:
ticker_symbols = str(ticker_symbols[1:-1].split(', '))
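If the unquoted list is really just text you pasted in (it isn't valid Python as written), here is a sketch that also tolerates inconsistent spacing after the commas and uses repr() to add the quotes. The variable name raw is my own:

```python
raw = '[500.SI, 502.SI, 504.SI, 505.SI, 508.SI, 510.SI, 519.SI]'

# Drop the surrounding brackets, split on commas, and trim whitespace
symbols = [s.strip() for s in raw.strip('[]').split(',') if s.strip()]

# repr() of a list of str adds the quotes, giving text you can paste back
print('ticker_symbols = ' + repr(symbols))
# ticker_symbols = ['500.SI', '502.SI', '504.SI', '505.SI', '508.SI', '510.SI', '519.SI']
```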

Sorting with two digits in string - Python

I am new to Python and I have a hard time solving this.
I am trying to "human sort" a list, 1) by the first number and 2) by the second number. I would like to have something like this:
'1-1bird'
'1-1mouse'
'1-1nmouses'
'1-2mouse'
'1-2nmouses'
'1-3bird'
'10-1birds'
(...)
The numbers can range from 1 to 99, e.g. 99-99bird is possible.
This is the code I have after a couple of headaches. Being able to then sort by the first letter of the word that follows would be a bonus.
Here is what I've tried:
#!/usr/bin/python
myList = list()
myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
def mysort(x,y):
    x1=""
    y1=""
    for myletter in x :
        if myletter.isdigit() or "-" in myletter:
            x1=x1+myletter
    x1 = x1.split("-")
    for myletter in y :
        if myletter.isdigit() or "-" in myletter:
            y1=y1+myletter
    y1 = y1.split("-")
    if x1[0]>y1[0]:
        return 1
    elif x1[0]==y1[0]:
        if x1[1]>y1[1]:
            return 1
        elif x1==y1:
            return 0
        else :
            return -1
    else :
        return -1
myList.sort(mysort)
print myList
Thanks !
Martin
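For reference, the comparator above has two problems: it compares the numbers as strings (so '10' < '2'), and in Python 3 list.sort no longer accepts a comparison function positionally. A sketch that keeps the comparator idea but converts to int and wraps it with functools.cmp_to_key (numbers is a hypothetical helper name; ties on the numbers keep their original order, since the word is not compared):

```python
from functools import cmp_to_key

def numbers(item):
    # Keep digits and '-', e.g. '1-10bird' -> '1-10' -> [1, 10]
    digits = ''.join(ch for ch in item if ch.isdigit() or ch == '-')
    return [int(part) for part in digits.split('-')]

def mysort(x, y):
    # Compare numerically; returns 1, 0 or -1 like the original
    a, b = numbers(x), numbers(y)
    return (a > b) - (a < b)

my_list = ['1-10bird', '1-2mouse', '10-1birds', '1-1bird']
result = sorted(my_list, key=cmp_to_key(mysort))
print(result)  # ['1-1bird', '1-2mouse', '1-10bird', '10-1birds']
```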
You have some good ideas with splitting on '-' and using isdigit(). We'll build on those, adding isalpha(), to create a function that takes an item and returns a "clean" version of it that can be easily sorted. It builds a three-digit, zero-padded representation of the first number, then does the same with the second number, then appends the "word" portion (the whole word, not just its first character). The result looks something like "001001bird"; it is never displayed, only used internally. The built-in function sorted() uses this callback as its key: it passes each element to the callback and orders the elements by the returned values. In the test, I use the * operator and the sep argument to print the result without writing a loop, but a loop is perfectly fine as well.
def callback(item):
    phrase = item.split('-')
    first = phrase[0].rjust(3, '0')
    second = ''.join(filter(str.isdigit, phrase[1])).rjust(3, '0')
    word = ''.join(filter(str.isalpha, phrase[1]))
    return first + second + word
Test:
>>> myList = ['1-10bird', '1-10mouse', '1-10nmouses', '1-10person', '1-10cat', '1-11bird', '1-11mouse', '1-11nmouses', '1-11person', '1-11cat', '1-12bird', '1-12mouse', '1-12nmouses', '1-12person', '1-13mouse', '1-13nmouses', '1-13person', '1-14bird', '1-14mouse', '1-14nmouses', '1-14person', '1-14cat', '1-15cat', '1-1bird', '1-1mouse', '1-1nmouses', '1-1person', '1-1cat', '1-2bird', '1-2mouse', '1-2nmouses', '1-2person', '1-2cat', '1-3bird', '1-3mouse', '1-3nmouses', '1-3person', '1-3cat', '2-14cat', '2-15cat', '2-16cat', '2-1bird', '2-1mouse', '2-1nmouses', '2-1person', '2-1cat', '2-2bird', '2-2mouse', '2-2nmouses', '2-2person']
>>> print(*sorted(myList, key=callback), sep='\n')
1-1bird
1-1cat
1-1mouse
1-1nmouses
1-1person
1-2bird
1-2cat
1-2mouse
1-2nmouses
1-2person
1-3bird
1-3cat
1-3mouse
1-3nmouses
1-3person
1-10bird
1-10cat
1-10mouse
1-10nmouses
1-10person
1-11bird
1-11cat
1-11mouse
1-11nmouses
1-11person
1-12bird
1-12mouse
1-12nmouses
1-12person
1-13mouse
1-13nmouses
1-13person
1-14bird
1-14cat
1-14mouse
1-14nmouses
1-14person
1-15cat
2-1bird
2-1cat
2-1mouse
2-1nmouses
2-1person
2-2bird
2-2mouse
2-2nmouses
2-2person
2-14cat
2-15cat
2-16cat
You need leading zeros. Strings are compared character by character, so '10' sorts before '2' even though 10 > 2. It should be
'01-1bird'
'01-1mouse'
'01-1nmouses'
'01-2mouse'
'01-2nmouses'
'01-3bird'
'10-1birds'
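As you can see, 1 sorts after 0. The second number needs the same padding; otherwise '01-10bird' would still sort before '01-2mouse'.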
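An alternative to padding the stored strings is to leave them unchanged and sort on a tuple of integers plus the word. This sketch assumes every item has the form '<number>-<number><word>'; an item that doesn't would make re.fullmatch return None and raise an AttributeError. sort_key is a hypothetical name:

```python
import re

def sort_key(item):
    # Assumes items like '1-10bird': first number, '-', second number, word
    m = re.fullmatch(r'(\d+)-(\d+)(.*)', item)
    first, second, word = m.groups()
    return (int(first), int(second), word)

my_list = ['1-10bird', '1-2mouse', '10-1birds', '1-1mouse', '2-1cat', '1-1bird']
result = sorted(my_list, key=sort_key)
print(result)
# ['1-1bird', '1-1mouse', '1-2mouse', '1-10bird', '2-1cat', '10-1birds']
```

Comparing tuples handles the word tie-break from the question's "bonus" for free, since the word is the third element.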
The other answers here are very respectable, I'm sure, but for full credit you should ensure that your answer fits on a single line and uses as many list comprehensions as possible:
import itertools
[''.join(r) for r in sorted([[''.join(x) for _, x in
itertools.groupby(v, key=str.isdigit)]
for v in myList], key=lambda v: (int(v[0]), int(v[2]), v[3]))]
That should do nicely:
['1-1bird',
'1-1cat',
'1-1mouse',
'1-1nmouses',
'1-1person',
'1-2bird',
'1-2cat',
'1-2mouse',
...
'2-2person',
'2-14cat',
'2-15cat',
'2-16cat']
