Python list extend, append every extend value with new line - python

Please help. I am using list.extend() to add multiple values to a list.
I need every extend call to end up as a new line (row) in the output.
>>> list1 = []
>>> list1 = (['Para','op','qa', 'reason'])
>>> list1.extend(['Power','pass','ok', 'NA'])
>>> print list1
['Para', 'op', 'qa', 'reason', 'Power', 'pass', 'ok', 'NA']
I need to write this list to a CSV file, and it has to come out as two lines:
Para, op, qa, reason
Power, pass, ok, NA

If you wanted separate lists, make them separate. Don't use list.extend(), use appending:
list1 = [['Para','op','qa', 'reason']] # brackets, creating a list with a list
list1.append(['Power','pass','ok', 'NA'])
Now list1 is a list with two objects, each itself a list:
>>> list1
[['Para', 'op', 'qa', 'reason'], ['Power', 'pass', 'ok', 'NA']]
If you are using the csv module to write out your CSV file, use the csvwriter.writerows() method to write each row into a separate line:
>>> import csv
>>> import sys
>>> writer = csv.writer(sys.stdout)
>>> writer.writerows(list1)
Para,op,qa,reason
Power,pass,ok,NA
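To write the same rows to a real file instead of sys.stdout, a minimal Python 3 sketch could look like the following. The output path is an assumption made only for this example. Opening the file with newline='' follows the csv module documentation, so the writer controls the line endings itself.

```python
import csv
import os
import tempfile

# Build a list of rows: each inner list becomes one CSV line.
rows = [['Para', 'op', 'qa', 'reason']]
rows.append(['Power', 'pass', 'ok', 'NA'])

# Hypothetical output location, chosen only for this example.
path = os.path.join(tempfile.gettempdir(), 'output.csv')

# newline='' lets the csv writer manage line endings on every platform.
with open(path, 'w', newline='') as handle:
    csv.writer(handle).writerows(rows)

# Read the file back to confirm that each inner list became its own row.
with open(path, newline='') as handle:
    written = list(csv.reader(handle))
print(written)
```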

Your desired result, list1, should be a list of two elements, each of which is itself a list.
list1 = ['Para','op','qa', 'reason']
# Wrapping list1 in [] creates a new list whose first element is the original list1.
# In your case, this gives a list of lines that contains a single line.
# Only then can you add another list of lines that contains the second line.
list1 = [list1] + [['Power','pass','ok', 'NA']]
print(list1)

Related

Permutations with very large list

I'm trying to run a very large permutation using Python. The goal is to pair items in groups of four or less, separated by 1) periods, 2) dashes, and 3) without any separation. The order is important.
# input
food = ['', 'apple', 'banana', 'bread', 'tomato', 'yogurt', ...]
# ideal output would be a list that contains strings like the following:
apple-banana-bread (no dashes before or after!)
apple.banana.bread (using periods)
applebananabread (no spaces)
apple-banana (by combining with the first item in the list, I also get shorter groups but need to delete empty items before joining)
... for all the possible groups of 4, order is important
# Requirements:
# Avoiding a symbol at the beginning or end of a resulting string
# Also creating groups of length 1, 2, and 3
I've used itertools.permutations to create an itertools.chain (perms). But then, this fails with a MemoryError when removing empty elements after converting to a list. Even when using a machine with a large amount of RAM.
import itertools
food = ['', 'apple', 'banana', 'bread', 'tomato', 'yogurt', ...]
perms = itertools.permutations(food, 4)
perms = [list(filter(None, tup)) for tup in perms] # remove empty nested elements, to prevent two symbols in a row or a symbol before/after
perms = list(filter(None, perms)) # remove empty lists, to prevent two symbols in a row or a symbol before/after
names_t = (
    ['.'.join(group) for group in perms] + # join using periods
    ['-'.join(group) for group in perms] + # join using dashes
    [''.join(group) for group in perms] # join without spaces
)
names_t = list(set(names_t)) # remove all duplicates
How can I make this code more memory efficient so that it doesn't crash for a large list? If I need to, I can run the code separately for each item separator (dashes, periods, directly joined).
I'm not sure what you would do with a saved list of 6B things, but I think you have two strategies if you want to go forward.
First, you could reduce the size of the items in the list by substituting something like a numpy uint8 for each item. That would reduce the size of the resulting list by a lot, but you would not have the format you want.
In [15]: import sys
In [16]: import numpy as np
In [17]: list_of_strings = ['dog food'] * 1000000
In [18]: list_of_uint8s = np.ones(1000000, dtype=np.uint8)
In [19]: sys.getsizeof(list_of_strings)
Out[19]: 8000056
In [20]: sys.getsizeof(list_of_uint8s)
Out[20]: 1000096
Second, if you just want to "save" the items to some kind of massive file, you do NOT need to realize the list in memory. Just use itertools.permutations and write the objects to the file on-the-fly. No need to create the list in memory if you just want to push it to a file...
In [48]: from itertools import permutations
In [49]: stuff = ['dog', 'cat', 'mouse']
In [50]: perms = permutations(stuff, 2)
In [51]: with open('output.csv', 'w') as tgt:
    ...:     for p in perms:
    ...:         line = '-'.join(p)
    ...:         tgt.write(line)
    ...:         tgt.write('\n')
    ...:
In [52]: %more output.csv
dog-cat
dog-mouse
cat-dog
cat-mouse
mouse-dog
mouse-cat
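The on-the-fly idea can be extended to cover the group sizes and separators from the question. The sketch below is only one possible arrangement: the function name iter_names and its parameters are hypothetical, and it assumes the food list does not contain the empty-string placeholder, because shorter groups are produced by asking permutations for each size directly. Nothing is stored in a list, so memory use stays flat no matter how many results there are.

```python
from itertools import permutations


def iter_names(items, max_len=4, separators=('-', '.', '')):
    """Lazily yield joined permutations of every size from 1 to max_len."""
    for size in range(1, max_len + 1):
        for group in permutations(items, size):
            if size == 1:
                # A single item looks the same with any separator.
                yield group[0]
                continue
            for sep in separators:
                # A separator never appears at the start or the end.
                yield sep.join(group)


# Small demonstration; a large input would be streamed to a file instead.
for name in iter_names(['dog', 'cat'], max_len=2):
    print(name)
```

Each yielded name can be written straight to a file inside the loop. Joining without a separator can still produce the same string from different groups, so removing duplicates would need a separate step that does not hold everything in memory.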

How to split parts of a string in a list based on predefined part of in the string

Please help me with the question below.
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman',
'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
You can always split each string in the list on '.' and get a new list. In this case, if you are only interested in the first split, pass the second argument of the split method (maxsplit, the maximum number of splits to perform):
first_list =[x.split('.')[0] for x in sample_list]
For the second list:
second_list =[x.split('.',1)[1] for x in sample_list]
A better way is to iterate only once through the sample_list and get both the lists. As shown below:
first_list, second_list = zip(* [x.split('.',1) for x in sample_list])
Using a list comprehension along with split:
sample_list = ['Ironman.googlesuite.net', 'Hulk.googlekey.net',
'Thor.googlestream.net', 'Antman.googled.net', 'Loki.googlesuite.net',
'Captain.googlekey.net']
result_list1 = [i.split('.')[0] for i in sample_list]
print(result_list1)
This prints:
['Ironman', 'Hulk', 'Thor', 'Antman', 'Loki', 'Captain']
This strategy is to retain, for each input domain, just the component up to, but not including, the first dot separator. For the second list, we can use re.sub here:
import re
result_list2 = [re.sub(r'^[^.]+\.', '', i) for i in sample_list]
print(result_list2)
This prints:
['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
Thank you for the answers, they do help, but what if I have a list like this:
sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net', 'Loki.media.googlesuite.net','Captain.googlekey.net']
I want everything preceding 'googlesuite.net', 'googlekey.net', 'googlestream.net' and 'googled.net' in one list, and the corresponding suffixes in another list:
result_list1=['Ironman.mdc', 'Hulk.nba.abc', 'Thor.web.gg.hh', 'Antman', 'Loki.media', 'Captain']
result_list2=['googlesuite.net', 'googlekey.net', 'googlestream.net', 'googled.net',
'googlesuite.net', 'googlekey.net']
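When the part before the domain can itself contain dots, splitting on the first dot no longer works. One possible approach, sketched below, is to match each string against the list of known suffixes. The helper name split_on_suffix and the KNOWN_SUFFIXES tuple are hypothetical names introduced for this example, and the sketch assumes the set of suffixes is known in advance.

```python
KNOWN_SUFFIXES = ('googlesuite.net', 'googlekey.net',
                  'googlestream.net', 'googled.net')


def split_on_suffix(host, suffixes=KNOWN_SUFFIXES):
    """Return (prefix, suffix); suffix is None when nothing matches."""
    for suffix in suffixes:
        if host.endswith('.' + suffix):
            # Drop the suffix and the dot that precedes it.
            return host[:-(len(suffix) + 1)], suffix
    return host, None


sample_list = ['Ironman.mdc.googlesuite.net', 'Hulk.nba.abc.googlekey.net',
               'Thor.web.gg.hh.googlestream.net', 'Antman.googled.net',
               'Loki.media.googlesuite.net', 'Captain.googlekey.net']

pairs = [split_on_suffix(host) for host in sample_list]
result_list1 = [prefix for prefix, _ in pairs]
result_list2 = [suffix for _, suffix in pairs]
print(result_list1)
print(result_list2)
```

If every suffix is known to consist of exactly two labels, host.rsplit('.', 2) would be a shorter alternative, at the cost of hard-coding that assumption.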

How do I turn a repeated list element with delimiters into a list?

I imported a CSV file that's basically a table with 5 headers and data sets with 5 elements.
With this code I turned that data into a list of individuals with 5 bits of information (list within a list):
import csv
readFile = open('Category.csv','r')
categoryList = []
for row in csv.reader(readFile):
    categoryList.append(row)
readFile.close()
Now I have a list of lists [[a,b,c,d,e],[a,b,c,d,e],[a,b,c,d,e]...]
However element 2 (categoryList[i][2]) or 'c' in each list within the overall list is a string separated by a delimiter (':') of variable length. How do I turn element 2 into a list itself? Basically making it look like this:
[[a,b,[1,2,3...],d,e],[a,b,[1,2,3...],d,e],[a,b,[1,2,3...],d,e]...]
I thought about looping through each list element and finding element 2, then use the .split(':') command to separate those values out.
The solution you suggested is feasible. You just don't need to do it after you have read the file. You can do it while reading the input in the first place.
for row in csv.reader(readFile):
    row[2] = row[2].split(":") # Split element 2 of each row before appending
    categoryList.append(row)
Edit: I assume you know what the split function does, so I will explain row[2].
Your data looks like [[a,b,c,d,e],[a,b,c,d,e],[a,b,c,d,e]...], which means each row looks like [a,b,c,d,e], so every row[2] corresponds to c. This way, you alter every c before you append, and end up with [[a,b,[1,2,3...],d,e],[a,b,[1,2,3...],d,e],[a,b,[1,2,3...],d,e]...].
I'm not really clear about your structure, but if c is a string separated by ':' then try
list(c.split(':'))
Let me know if it solved your problem
You can use a list comprehension on each row and split items containing ':' into a new sublist:
for row in csv.reader(readFile):
    new_row = [i.split(':') if ':' in i else i for i in row]
    categoryList.append(new_row)
This works if you also have other items in the row that you need to split on ':'.
Otherwise, you can directly split on the index if you only have one item containing ':':
for row in csv.reader(readFile):
    row[2] = row[2].split(':')
    categoryList.append(row)
Assume that you have a row like this:
row = ["foo", "bar", "1:2:3:4:5", "baz"]
To convert item [2] into a sublist, you can use
row[2] = row[2].split(":") # elements can be assigned to, yawn.
Now the row is ['foo', 'bar', ['1', '2', '3', '4', '5'], 'baz']
To graft the split items to the "top level" of the row, you can use
row[2:3] = row[2].split(":") # slices can be assigned to, too, yay!
Now the row is ['foo', 'bar', '1', '2', '3', '4', '5', 'baz']
This of course skips any defensive checks of the row data (can it at all be split?) that a real robust application should have.
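As a rough illustration of what such defensive checks might look like, the sketch below only splits the item when the row is long enough and the item is a string. The helper name split_field and its parameters are hypothetical, and a real application may prefer to raise an error or log the bad row instead of passing it through unchanged.

```python
def split_field(row, index=2, sep=':'):
    """Return a copy of row with row[index] split on sep when possible."""
    new_row = list(row)  # copy, so the caller's row is left untouched
    # Only split when the position exists and holds a string.
    if 0 <= index < len(new_row) and isinstance(new_row[index], str):
        new_row[index] = new_row[index].split(sep)
    return new_row


print(split_field(["foo", "bar", "1:2:3:4:5", "baz"]))
print(split_field(["too", "short"]))
```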

List of List to Key-Value Pairs

I have a string which is semicolon delimited and then space delimited:
'gene_id EFNB2; Gene_type cDNA_supported; transcript_id EFNB2.aAug10; product_id EFNB2.aAug10;'
I want to create a dictionary in one line by splitting based on the delimiters but so far I can only get to a list of lists:
filter(None,[x.split() for x in atts.split(';')])
Which gives me:
[['gene_id', 'EFNB2'], ['Gene_type', 'cDNA_supported'], ['transcript_id', 'EFNB2.aAug10'], ['product_id', 'EFNB2.aAug10']]
When what I want is:
{'gene_id': 'EFNB2', 'Gene_type': 'cDNA_supported', 'transcript_id': 'EFNB2.aAug10', 'product_id': 'EFNB2.aAug10'}
I have tried:
filter(None,{k:v for k,v in x.split() for x in atts.split(';')})
but it gives me nothing. Anybody know how to accomplish this?
You are very close. You can just call dict on your list of lists:
>>> lst = [['gene_id', 'EFNB2'], ['Gene_type', 'cDNA_supported'], ['transcript_id', 'EFNB2.aAug10'], ['product_id', 'EFNB2.aAug10']]
>>> dict(lst)
{'Gene_type': 'cDNA_supported',
'gene_id': 'EFNB2',
'product_id': 'EFNB2.aAug10',
'transcript_id': 'EFNB2.aAug10'}
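Starting from the original string, the two steps can be combined into a single expression. This is a minimal sketch that assumes every non-empty segment contains exactly one key and one value separated by whitespace; a segment with more or fewer parts would make dict raise a ValueError.

```python
atts = ('gene_id EFNB2; Gene_type cDNA_supported; '
        'transcript_id EFNB2.aAug10; product_id EFNB2.aAug10;')

# Split on ';', skip the empty piece after the final ';',
# then split each piece on whitespace into a (key, value) pair.
result = dict(part.split() for part in atts.split(';') if part.strip())
print(result)
```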

sorting a list and separating the different features

So I am given a list and I am supposed to sort it down into two lists, one with the names of the companies and one with the prices in a nested list.
['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
I used the following code to separate the names properly:
print('\n'.join(data))
namelist = [i.split(' ', 1)[0] for i in data]
print(namelist)
But now it wants me to separate all the prices from the list and put them together in a single nested list, and I don't know how to do that.
To build two separate lists, just use a regular loop:
names = []
prices = []
for entry in data:
    name, price = entry.split()
    names.append(name)
    prices.append(price)
If you needed the entries together in one list, each entry a list containing the name and the price separately, just split in a list comprehension like you did, but don't pick one or the other value from the result:
names_and_prices = [entry.split() for entry in data]
I used str.split() without arguments to split on arbitrary whitespace. This assumes you always have exactly two entries in your strings. You can still limit the split, but then use None as the first argument, and strip the line beforehand to get rid of the \n separately:
names_and_prices = [entry.strip().split(None, 1) for entry in data]
Demo for the 'nested' approach:
>>> data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
>>> [entry.split() for entry in data]
[['Acer', '481242.74'], ['Beko', '966071.86'], ['Cemex', '187242.16'], ['Datsun', '748502.91'], ['Equifax', '146517.59'], ['Gerdau', '898579.89'], ['Haribo', '265333.85']]
split() is the right approach: it gives you everything you need if you don't limit it to one split (the , 1 in your code) and don't keep only the first element. Called with no arguments at all, it splits on any run of whitespace, which also drops the trailing \n.
>>> data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n', 'Datsun 748502.91\n', 'Equifax 146517.59\n', 'Gerdau 898579.89\n', 'Haribo 265333.85\n']
>>> nested_list = [i.split() for i in data]
>>> nested_list
[['Acer', '481242.74'], ['Beko', '966071.86'], ['Cemex', '187242.16'], ['Datsun', '748502.91'], ['Equifax', '146517.59'], ['Gerdau', '898579.89'], ['Haribo', '265333.85']]
>>> print(*nested_list, sep='\n')
['Acer', '481242.74']
['Beko', '966071.86']
['Cemex', '187242.16']
['Datsun', '748502.91']
['Equifax', '146517.59']
['Gerdau', '898579.89']
['Haribo', '265333.85']
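If the prices are needed as numbers rather than strings, the same split can feed float(). The sketch below uses a shortened copy of the data and assumes every entry has exactly one name and one price; whether the prices should become floats at all is an assumption, since the question only asks for them to be separated.

```python
data = ['Acer 481242.74\n', 'Beko 966071.86\n', 'Cemex 187242.16\n']

names = []
prices = []
for entry in data:
    # split() with no arguments also discards the trailing newline.
    name, price = entry.split()
    names.append(name)
    prices.append(float(price))

print(names)
print(prices)
```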
