List of List to Key-Value Pairs - python

I have a string which is semicolon delimited and then space delimited:
'gene_id EFNB2; Gene_type cDNA_supported; transcript_id EFNB2.aAug10; product_id EFNB2.aAug10;'
I want to create a dictionary in one line by splitting based on the delimiters but so far I can only get to a list of lists:
filter(None,[x.split() for x in atts.split(';')])
Which gives me:
[['gene_id', 'EFNB2'], ['Gene_type', 'cDNA_supported'], ['transcript_id', 'EFNB2.aAug10'], ['product_id', 'EFNB2.aAug10']]
When what I want is:
{'gene_id': 'EFNB2', 'Gene_type': 'cDNA_supported', 'transcript_id': 'EFNB2.aAug10', 'product_id': 'EFNB2.aAug10'}
I have tried:
filter(None,{k:v for k,v in x.split() for x in atts.split(';')})
but it gives me nothing. Anybody know how to accomplish this?

You are very close. You can just call dict() on your list of lists:
>>> lst = [['gene_id', 'EFNB2'], ['Gene_type', 'cDNA_supported'], ['transcript_id', 'EFNB2.aAug10'], ['product_id', 'EFNB2.aAug10']]
>>> dict(lst)
{'Gene_type': 'cDNA_supported',
'gene_id': 'EFNB2',
'product_id': 'EFNB2.aAug10',
'transcript_id': 'EFNB2.aAug10'}
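Putting the split and the dict() call together, a one-line version might look like the sketch below. It assumes every non-empty chunk between semicolons holds exactly one key and one value separated by whitespace; the name result is only illustrative.

```python
atts = 'gene_id EFNB2; Gene_type cDNA_supported; transcript_id EFNB2.aAug10; product_id EFNB2.aAug10;'

# Split on ';', skip the empty chunk after the trailing ';',
# then let dict() consume the resulting [key, value] pairs.
result = dict(x.split() for x in atts.split(';') if x.strip())
print(result)
```

If a value could itself contain spaces, x.split(None, 1) would keep everything after the first run of whitespace as the value.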

Related

How to split a list of lists dynamically with next line separator

My list of lists looks like this:
list_of_lists = [['England', '90.0%'], ['Scotland', '10.0%']]
I would like to have this output:
England, 90.0%
Scotland, 10.0%
I have tried unpacking the list of lists and printing using the following:
a,b = list_of_lists
print(a,'\n',b)
but I would like to print them dynamically based on the length of list_of_lists. So if list_of_lists has n rows, I want the equivalent of print(x[0], '\n', ..., x[n-1]).
You can join the elements per row using ', ' as separator, then join the rows with a newline and print:
list_of_lists = [['England', '90.0%'], ['Scotland', '10.0%']]
print('\n'.join(map(', '.join, list_of_lists)))
output:
England, 90.0%
Scotland, 10.0%
Simple for loop?
for row in list_of_lists:
    print(', '.join(row))
You could use unpacking of a generator that formats each entry and use a new line as the print separator:
print(*(f'{c}, {p}' for c,p in list_of_lists),sep='\n')
This supports different data types and gives you full control over the order, spacing, alignment and delimiters.
If your data is only made up of strings that you want to separate with commas, you can use join instead of a format string:
print(*map(', '.join,list_of_lists),sep='\n') # only works with string data
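To illustrate the point about data types and alignment, here is a small sketch in which the percentages are floats rather than strings. The column widths (10 and 6) and the names rows and lines are arbitrary choices for the example, not part of the original question.

```python
rows = [['England', 90.0], ['Scotland', 10.0]]

# Left-align the name in 10 characters, right-align the number in 6
# with one decimal place, then append a literal percent sign.
lines = [f'{country:<10}{share:>6.1f}%' for country, share in rows]
print(*lines, sep='\n')
```

The ', '.join approach would raise a TypeError on this data because join only accepts strings.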

How to split parts of a string in a list based on predefined part of in the string

Please help me with the question below.
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman',
'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
You can split each string in the list on '.' to get a new list. If you are only interested in the first split, pass the second argument of split (maxsplit, the maximum number of splits to perform). This assumes each prefix is a single label, as in 'Antman.googled.net':
first_list =[x.split('.')[0] for x in sample_list]
For the second list:
second_list =[x.split('.',1)[1] for x in sample_list]
A better way is to iterate through sample_list only once and build both results together (zip returns tuples, so convert them if you need lists):
first_list, second_list = zip(* [x.split('.',1) for x in sample_list])
Using a list comprehension along with split:
sample_list = ['Ironman.googlesuite.net', 'Hulk.googlekey.net',
'Thor.googlestream.net', 'Antman.googled.net', 'Loki.googlesuite.net',
'Captain.googlekey.net']
result_list1 = [i.split('.')[0] for i in sample_list]
print(result_list1)
This prints:
['Ironman', 'Hulk', 'Thor', 'Antman', 'Loki', 'Captain']
This strategy is to retain, for each input domain, just the component up to, but not including, the first dot separator. For the second list, we can use re.sub here:
import re
result_list2 = [re.sub(r'^[^.]+\.', '', i) for i in sample_list]
print(result_list2)
This prints:
['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
Thank you for the answers, they do help, but what if I have a list like this:
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net', 'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman', 'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
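For this multi-label case, one possible approach is to split from the right instead of the left. The sketch below assumes the suffix is always the last two dot-separated labels, which holds for the sample data but is an assumption about the real input.

```python
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
               'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
               'Loki.media.googlesuite.net', 'Captain.googlekey.net']

result_list1 = []
result_list2 = []
for name in sample_list:
    # rsplit with maxsplit=2 peels off the last two labels only,
    # leaving any dots inside the prefix untouched.
    prefix, domain, tld = name.rsplit('.', 2)
    result_list1.append(prefix)
    result_list2.append(domain + '.' + tld)

print(result_list1)
print(result_list2)
```

An entry with fewer than three labels would make the unpacking fail with a ValueError, so real data may need a guard for that case.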

Python list extend, append every extend value with new line

Please help. I am using list.extend() to append multiple values to a list.
I need each extend to start a new line.
>>> list1 = []
>>> list1 = (['Para','op','qa', 'reason'])
>>> list1.extend(['Power','pass','ok', 'NA'])
>>> print list1
['Para', 'op', 'qa', 'reason', 'Power', 'pass', 'ok', 'NA']
I need to write this list to a CSV file, and it has to come out as two lines:
Para, op, qa, reason
Power, pass, ok, NA
If you wanted separate lists, make them separate. Don't use list.extend(), use appending:
list1 = [['Para','op','qa', 'reason']] # brackets, creating a list with a list
list1.append(['Power','pass','ok', 'NA'])
Now list1 is a list with two objects, each itself a list:
>>> list1
[['Para', 'op', 'qa', 'reason'], ['Power', 'pass', 'ok', 'NA']]
If you are using the csv module to write out your CSV file, use the csvwriter.writerows() method to write each row into a separate line:
>>> import csv
>>> import sys
>>> writer = csv.writer(sys.stdout)
>>> writer.writerows(list1)
Para,op,qa,reason
Power,pass,ok,NA
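The same idea in Python 3 can be checked without touching the file system. In this sketch io.StringIO stands in for a real file, and the names rows, buffer and text are only illustrative.

```python
import csv
import io

rows = [['Para', 'op', 'qa', 'reason']]      # a list holding one row
rows.append(['Power', 'pass', 'ok', 'NA'])   # append adds a second row

buffer = io.StringIO()                        # in-memory stand-in for a file
csv.writer(buffer).writerows(rows)            # one CSV line per inner list
text = buffer.getvalue()
print(text)
```

When writing to a real file in Python 3, open it with newline='' so the csv module controls the line endings itself.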
Your desired result, list1, should be a list of two elements, each of which is itself a list.
list1 = ['Para','op','qa', 'reason']
# Wrapping list1 in [] creates a new list whose first element is the original list1.
# In your case, this gives a list of lines containing a single line.
# Only then can another list of lines, containing one more line, be added.
list1 = [list1] + [['Power','pass','ok', 'NA']]
print(list1)

Creating RDD from input data with repeated delimiters - Spark

I have input data as key:value pairs delimited by pipes, as below. Some of the values contain the delimiters themselves.
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Expected output is below.
value1|val:ue2|valu||e3
value4|value5|value6
I tried to create the RDD as below:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
The mapping above works when the value fields do not contain these delimiters, as below.
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
I also tried a regex, as below:
rdd=sc.textFile("path").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
Input data without delimiters
key1:value1|key2:value2|key3:value3
key1:value4|key2:value5|key3:value6
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
[(u'value1', u'value2', u'value3'), (u'value4', u'value5', u'value6')]
Input data with delimiters
key1:value1|key2:val:ue2|key3:valu||e3
key1:value4|key2:value5|key3:value6
Without regex
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("|")]).map(tuple)
>>> rdd.collect()
Error: IndexError: list index out of range
With regex
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1] for x in l.split("((?<!\|)\|(?!\|))")]).map(tuple)
>>> rdd.collect()
[(u'value1|key2'), (u'value4|key2')]
How can I achieve the result below from the input?
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]
From this I will create a DataFrame and do some processing.
Suggestions in pure Python are also welcome. Thanks in advance!
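Here is a solution.
The main issue is that str.split() only splits on a fixed delimiter string; it does not interpret regular expressions. Use re.split() instead, with no capturing group in the pattern (a capturing group would make re.split() return the delimiters as well):
>>> import re
>>> rdd=sc.textFile("testcwp").map(lambda l: [x.split(":")[1:] for x in re.split(r"(?<!\|)\|(?!\|)",l)]).map(tuple)
>>> rdd.collect()
[([u'value1'], [u'val', u'ue2'], [u'valu||e3']), ([u'value4'], [u'value5'], [u'value6'])]
The following RDD rejoins the pieces inside each list with ':', restoring the colons that were part of the values:
>>> rdd2=rdd.map(lambda l: [':'.join(x) for x in l]).map(tuple)
>>> rdd2.collect()
[(u'value1', u'val:ue2', u'valu||e3'), (u'value4', u'value5', u'value6')]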
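Since pure Python suggestions were also requested, here is a sketch of the per-line parsing on its own. It assumes that a field boundary is always a single pipe, that doubled pipes only occur inside values, and that every field contains at least one colon. The names SINGLE_PIPE and parse_line are hypothetical helpers introduced for this example.

```python
import re

# A pipe that is neither preceded nor followed by another pipe.
SINGLE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")

def parse_line(line):
    """Return the values of one 'key:value|key:value' line as a tuple."""
    # split(":", 1) cuts at the first colon only, so colons inside
    # a value are kept as part of that value.
    return tuple(field.split(":", 1)[1] for field in SINGLE_PIPE.split(line))

lines = ["key1:value1|key2:val:ue2|key3:valu||e3",
         "key1:value4|key2:value5|key3:value6"]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

A function like this could be passed to map on the RDD in place of the lambda, which also avoids the second rejoin step.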

sorting a list and separating the different features

So I am given a list and I am supposed to sort it down into two lists, one with the names of the companies and one with the prices in a nested list.
['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
I used the following code to separate the names properly:
print('\n'.join(data))
namelist = [i.split(' ', 1)[0] for i in data]
print(namelist)
But now I need to separate all the prices from the list and put them together in a single nested list, and I don't know how to do that.
To build two separate lists, just use a regular loop:
names = []
prices = []
for entry in data:
    name, price = entry.split()
    names.append(name)
    prices.append(price)
If you needed the entries together in one list, each entry a list containing the name and the price separately, just split in a list comprehension like you did, but don't pick one or the other value from the result:
names_and_prices = [entry.split() for entry in data]
I used str.split() without arguments to split on arbitrary whitespace. This assumes you always have exactly two entries in your strings. You can still limit the split, but then use None as the first argument, and strip the line beforehand to get rid of the \n separately:
names_and_prices = [entry.strip().split(None, 1) for entry in data]
Demo for the 'nested' approach:
>>> data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
>>> [entry.split() for entry in data]
[['Acer', '481242.74'], ['Beko', '966071.86'], ['Cemex', '187242.16'], ['Datsun', '748502.91'], ['Equifax', '146517.59'], ['Gerdau', '898579.89'], ['Haribo', '265333.85']]
split() is the right approach, as it will give you everything you need if you don't limit it to just one split (the , 1 argument in your code). If you provide no arguments to it at all, it splits on runs of whitespace of any length.
>>> data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
>>> nested_list = [i.split() for i in data]
>>> nested_list
[['Acer', '481242.74'], ['Beko', '966071.86'], ['Cemex', '187242.16'], ['Datsun', '748502.91'], ['Equifax', '146517.59'], ['Gerdau', '898579.89'], ['Haribo', '265333.85']]
>>> print(*nested_list, sep='\n')
['Acer', '481242.74']
['Beko', '966071.86']
['Cemex', '187242.16']
['Datsun', '748502.91']
['Equifax', '146517.59']
['Gerdau', '898579.89']
['Haribo', '265333.85']
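If the goal is one list of names and one list of prices held together in a nested list, a sketch along these lines may help. Converting the prices to float is an assumption about what the exercise wants, and the data is shortened to three entries for brevity.

```python
data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n']

# zip(*...) transposes the [name, price] pairs into two tuples.
names, prices = zip(*(entry.split() for entry in data))
names = list(names)
prices = [float(p) for p in prices]   # assumed: prices wanted as numbers

nested = [names, prices]
print(nested)
```

Dropping the float() conversion keeps the prices as strings, matching the other answers.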
