Transfer string to data frame Python

I have a large string which I have to transfer into a data frame. For example the string is:
meals_string = "APPETIZERS Southern Fried Quail with
Greens,Huckleberries,Pecans & Blue Cheese 14.00 Park Avenue Cafe
Chopped Salad Goat Feta Cheese,Nigoise Olives,Marinated White [...]
ENTREES Horseradish Crusted Canadian Salmon,Potato Fritters, Marinated
Cucumbers,Chive Vinaigrette 27.00 Sautéed Prawns with Mushroom
Tortellini,Grilled Tomato Vinaigrette & Sweet Corn 29.50"
meals = meals_string.splitlines()
This gives me the variable "meals" as a list, but I am stuck on how to convert the string into a dataframe with 3 columns: Category; Meal_name; Price

A relatively simple parser for your string can be built and then passed directly to pandas.DataFrame like:
Code:
def meal_string_parser(meal_string):
    category = ''
    meal = []
    price = 0
    for word in meal_string.split():
        if word:
            try:
                price = float(word)
                yield category, ' '.join(meal), price
                meal = []
            except ValueError:
                # this is not a number, so not a price
                if word.upper() == word and word.isalnum():
                    # found category
                    category = word
                else:
                    meal.append(word)
    if meal:
        yield category, ' '.join(meal), price
Test Code:
meals_string = """
APPETIZERS
Southern Fried Quail with Greens,Huckleberries,Pecans & Blue Cheese 14.00
Park Avenue Cafe Chopped Salad Goat Feta Cheese,Nigoise Olives,Marinated White 13.00
ENTREES
Horseradish Crusted Canadian Salmon,Potato Fritters, Marinated Cucumbers,Chive Vinaigrette 27.00
Sautéed Prawns with Mushroom Tortellini,Grilled Tomato Vinaigrette & Sweet Corn 29.50
"""
import pandas as pd
df = pd.DataFrame(meal_string_parser(meals_string),
columns='Category Meal_name Price'.split())
print(df)
Results:
Category Meal_name Price
0 APPETIZERS Southern Fried Quail with Greens,Huckleberries... 14.0
1 APPETIZERS Park Avenue Cafe Chopped Salad Goat Feta Chees... 13.0
2 ENTREES Horseradish Crusted Canadian Salmon,Potato Fri... 27.0
3 ENTREES Sautéed Prawns with Mushroom Tortellini,Grille... 29.5

Related

Classifying excel data row by row in n level columns

I have a problem classifying the data in an Excel file that has several columns and rows. I need to turn each merged cell into a single row paired with the value in the next column, and then repeat this for each following column, as in these pictures:
Input:
Output for Dairy:
Summary:
First we take the Dairy row, then we go to the second column and get the data in front of Dairy. Then, for each of those values (Milk to Mr. 1 and so on), we go to the next column and get the data in front of it (Butter to Mrs. 1 and Butter to Mrs. 2), and so on.
After that we want to export the result into an Excel file, as in the Output picture.
I have written code that takes the first column's data and finds all the data in front of it, but I need to change it to get the data row by row, as in the Output picture:
import pandas
import openpyxl
import xlwt
from xlwt import Workbook
df = pandas.read_excel('excel.xlsx')
result_first_level = []
for i, item in enumerate(df[df.columns[0]].values, 2):
    if pandas.isna(item):
        result_first_level[-1]['index'] = i
    else:
        result_first_level.append(dict(name=item, index=i, levels_name=[]))
for level in df.columns[1:]:
    move_index = 0
    for i, obj in enumerate(result_first_level):
        if i == 0:
            for item in df[level].values[0:obj['index'] - 1]:
                if pandas.isna(item):
                    move_index += 1
                    continue
                else:
                    obj['levels_name'].append(item)
                    move_index += 1
        else:
            for item in df[level].values[move_index:obj['index'] - 1]:
                if pandas.isna(item):
                    move_index += 1
                    continue
                else:
                    obj['levels_name'].append(item)
                    move_index += 1
# Workbook is created
wb = Workbook()
# add_sheet is used to create sheet.
sheet1 = wb.add_sheet('Sheet 1')
style = xlwt.easyxf('font: bold 1')
move_index = 0
for item in result_first_level:
    for member in item['levels_name']:
        sheet1.write(move_index, 0, item['name'], style)
        sheet1.write(move_index, 1, member)
        move_index += 1
wb.save('test.xls')
download Input File excel from here
Thanks for helping!

First, forward-fill your data to fill blank cells with the last valid value, then create an ordered collection using pd.CategoricalDtype to sort the product column. Finally, you just have to iterate over the columns pairwise and rename them so the pieces can be concatenated. The last step is to sort your rows by product value.
import pandas as pd
# Prepare your dataframe
df = pd.read_excel('input.xlsx').dropna(how='all')
df.update(df.iloc[:, :-1].ffill())
df = df.drop_duplicates()
# Get keys to sort data in the final output
cats = pd.CategoricalDtype(df.T.melt()['value'].dropna().unique(), ordered=True)
# Group pairwise values
data = []
for cols in zip(df.columns, df.columns[1:]):
    col_mapping = dict(zip(cols, ['product', 'subproduct']))
    data.append(df[list(cols)].rename(columns=col_mapping))
# Merge all data
out = pd.concat(data).drop_duplicates().dropna() \
.astype(cats).sort_values('product').reset_index(drop=True)
Output:
>>> cats
CategoricalDtype(categories=['Dairy', 'Milk to Mr.1', 'Butter to Mrs.1',
'Butter to Mrs.2', 'Cheese to Miss 2 ', 'Cheese to Mr.2',
'Milk to Miss.1', 'Milk to Mr.5', 'yoghurt to Mr.3',
'Milk to Mr.6', 'Fruits', 'Apples to Mr.6',
'Limes to Miss 5', 'Oranges to Mr.7', 'Plumbs to Miss 5',
'apple for mr 2', 'Foods & Drinks', 'Chips to Mr1',
'Jam to Mr 2.', 'Coca to Mr 5', 'Cookies to Mr1.',
'Coca to Mr 7', 'Coca to Mr 6', 'Juice to Miss 1',
'Jam to Mr 3.', 'Ice cream to Miss 3.', 'Honey to Mr 5',
'Cake to Mrs. 2', 'Honey to Miss 2',
'Chewing gum to Miss 7.'], ordered=True)
>>> out
product subproduct
0 Dairy Milk to Mr.1
1 Dairy Cheese to Mr.2
2 Milk to Mr.1 Butter to Mrs.1
3 Milk to Mr.1 Butter to Mrs.2
4 Butter to Mrs.2 Cheese to Miss 2
5 Cheese to Mr.2 Milk to Miss.1
6 Cheese to Mr.2 yoghurt to Mr.3
7 Milk to Miss.1 Milk to Mr.5
8 yoghurt to Mr.3 Milk to Mr.6
9 Fruits Apples to Mr.6
10 Fruits Oranges to Mr.7
11 Apples to Mr.6 Limes to Miss 5
12 Oranges to Mr.7 Plumbs to Miss 5
13 Plumbs to Miss 5 apple for mr 2
14 Foods & Drinks Chips to Mr1
15 Foods & Drinks Juice to Miss 1
16 Foods & Drinks Cake to Mrs. 2
17 Chips to Mr1 Jam to Mr 2.
18 Chips to Mr1 Cookies to Mr1.
19 Jam to Mr 2. Coca to Mr 5
20 Cookies to Mr1. Coca to Mr 6
21 Cookies to Mr1. Coca to Mr 7
22 Juice to Miss 1 Honey to Mr 5
23 Juice to Miss 1 Jam to Mr 3.
24 Jam to Mr 3. Ice cream to Miss 3.
25 Cake to Mrs. 2 Chewing gum to Miss 7.
26 Cake to Mrs. 2 Honey to Miss 2
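The forward-fill step is what handles the merged cells: when a merged Excel cell is read, only its first row holds the value and the remaining rows come back as NaN. A small sketch on an in-memory frame, since the input file is not reproduced here (the column names `level1`/`level2` and the sample values are made up for illustration):

```python
import pandas as pd

# Stand-in for two columns read from a sheet with merged cells:
# only the first row of each merged block carries the value.
df = pd.DataFrame({
    "level1": ["Dairy", None, "Fruits", None],
    "level2": ["Milk", "Cheese", "Apples", "Oranges"],
})

# Forward-fill copies the last valid value down into the blank rows.
df["level1"] = df["level1"].ffill()

# Rename the pair of columns, as the answer does before concatenating.
pairs = df.rename(columns={"level1": "product", "level2": "subproduct"})
print(pairs)
```

After the fill, every row is a complete (product, subproduct) pair, which is what makes the pairwise concatenation in the answer possible.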

If column contains substring from list, create new column with removed substring from list

I'm trying to create a simplified name column. I have a brand name column and a list of strings as shown below. If a value in the brand name column contains any string from the list, the simplified brand name column should hold the value with the matched string removed. Brand names that do not contain any string from the list should be carried over to the simplified column unchanged.
l = ['co', 'ltd', 'company']
df:
Brand
Nike
Adidas co
Apple company
Intel
Google ltd
Walmart co
Burger King
Desired df:
Brand Simplified
Nike Nike
Adidas co Adidas
Apple company Apple
Intel Intel
Google ltd Google
Walmart co Walmart
Burger King Burger King
Thanks in advance! Any help is appreciated!!

How about using this to remove the substrings and any leftover surrounding whitespace:
list_substring = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = df['Brand'].str.replace('|'.join(list_substring), '', regex=True).str.strip()

In [28]: df
Out[28]:
Brand
0 Nike
1 Adidas co
2 Apple company
3 Intel
4 Google ltd
5 Walmart co
6 Burger King
In [30]: df["Simplified"] = df.Brand.apply(lambda x: x.split()[0] if x.split()[-1] in l else x)
In [31]: df
Out[31]:
Brand Simplified
0 Nike Nike
1 Adidas co Adidas
2 Apple company Apple
3 Intel Intel
4 Google ltd Google
5 Walmart co Walmart
6 Burger King Burger King

Using str.replace
Ex:
import pandas as pd
l = ['co', 'ltd', 'company']
df = pd.DataFrame({'Brand': ['Nike', 'Adidas co', 'Apple company', 'Intel', 'Google ltd', 'Walmart co', 'Burger King']})
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern literally by default
df['Simplified'] = df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b", "", regex=True).str.strip()
#or df['Brand'].str.replace(r"\b(" + "|".join(l) + r")\b$", "", regex=True).str.strip() # to remove only at the END of the string
print(df)
Output:
Brand Simplified
0 Nike Nike
1 Adidas co Adidas
2 Apple company Apple
3 Intel Intel
4 Google ltd Google
5 Walmart co Walmart
6 Burger King Burger King

df = {"Brand":["Nike","Adidas co","Apple company","Google ltd","Berger King"]}
df = pd.DataFrame(df)
list_items = ['ltd', 'company', 'co'] # 'company' will be evaluated first before 'co'
df['Simplified'] = [' '.join(w) for w in df['Brand'].str.split().apply(lambda x: [i for i in x if i not in list_items])]
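For reference, a minimal sketch of the regex approach that does not depend on the pandas version's default for `regex`. It assumes a suffix should only be dropped when it is the last whole word of the name; the helper name `simplify_brands` is illustrative and does not come from the answers above.

```python
import re
import pandas as pd

def simplify_brands(brands, suffixes):
    """Strip any whole-word suffix from the end of each brand name."""
    # Escape each suffix so characters such as '.' are matched literally,
    # and list longer suffixes first ('company' before 'co').
    ordered = sorted(suffixes, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    pattern = r"\s+(?:" + alternatives + r")$"
    # regex=True is passed explicitly because the default changed in pandas 2.0.
    cleaned = brands.str.replace(pattern, "", regex=True, flags=re.IGNORECASE)
    return cleaned.str.strip()

df = pd.DataFrame({"Brand": ["Nike", "Adidas co", "Apple company", "Intel",
                             "Google ltd", "Walmart co", "Burger King"]})
df["Simplified"] = simplify_brands(df["Brand"], ["co", "ltd", "company"])
print(df)
```

Anchoring the pattern at the end of the string, with whitespace required before the suffix, keeps a name that merely contains "co" somewhere inside it from being altered.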

Populate value for data frame row based on condition

Background
I have a dataset that looks like the following:
product_name price
Women's pant 20.00
Men's Shirt 30.00
Women's Dress 40.00
Blue Shirt 30.00
...
I am looking to create a new column called gender, which will contain the values women, men, or unisex based on the string in product_name.
The desired result would look like this:
product_name price gender
Women's pant 20.00 women
Men's Shirt 30.00 men
Women's Dress 40.00 women
Blue Shirt 30.00 unisex
My Approach
I figured that first I should create a new column with a blank value for each row. Then I should loop through each row in the dataframe, check the string in df['product_name'] to see whether it is men's, women's, or unisex, and fill in the gender value for that row.
Here is my code:
df['gender'] = ""
for product_name in df['product_name']:
    if 'women' in product_name.lower():
        df['gender'] = 'women'
    elif 'men' in product_name.lower():
        df['gender'] = 'men'
    else:
        df['gender'] = 'unisex'
However, I get the following result:
product_name price gender
Women's pant 20.00 men
Men's Shirt 30.00 men
Women's Dress 40.00 men
Blue Shirt 30.00 men
I would really appreciate some help here as I am new to python and pandas library.

You could use a list comprehension with if/else to get your output:
df['gender'] = ['women' if 'women' in word
else "men" if "men" in word
else "unisex"
for word in df.product_name.str.lower()]
df
product_name price gender
0 Women's pant 20.0 women
1 Men's Shirt 30.0 men
2 Women's Dress 40.0 women
3 Blue Shirt 30.0 unisex
Alternatively, you could use numpy's select to achieve the same result:
import numpy as np
cond1 = df.product_name.str.lower().str.contains("women")
cond2 = df.product_name.str.lower().str.contains("men")
condlist = [cond1, cond2]
choicelist = ["women", "men"]
df["gender"] = np.select(condlist, choicelist, default="unisex")
For string columns, plain Python iteration is often faster than the pandas string methods, but you should time both on your own data.

Try turning your for statement into a function and using apply. So something like -
def label_gender(product_name):
    '''product_name is a str'''
    if 'women' in product_name.lower():
        return 'women'
    elif 'men' in product_name.lower():
        return 'men'
    else:
        return 'unisex'
df['gender'] = df.apply(lambda x: label_gender(x['product_name']),axis=1)
A good breakdown of using apply/lambda can be found here: https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7

You can also use np.where + Series.str.contains,
import numpy as np
df['gender'] = (
np.where(df.product_name.str.contains("women", case=False), 'women',
np.where(df.product_name.str.contains("men", case=False), "men", 'unisex'))
)
product_name price gender
0 Women's pant 20.0 women
1 Men's Shirt 30.0 men
2 Women's Dress 40.0 women
3 Blue Shirt 30.0 unisex

Use np.where with .str.contains, and a regex that captures the first word of the phrase:
# np.where(product_name contains Women or Men, first word of the phrase, otherwise 'Unisex')
df['Gender'] = np.where(df.product_name.str.contains('Women|Men'),
                        df.product_name.str.split(r'(^\w+)').str[1], 'Unisex')
product_name price Gender
0 Women's pant 20.0 Women
1 Men's Shirt 30.0 Men
2 Women's Dress 40.0 Women
3 Blue Shirt 30.0 Unisex
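None of the answers spells out why the loop in the question fails: `df['gender'] = 'women'` assigns to the whole column, so each iteration overwrites every row and only the label of the last product survives. The sketch below reproduces that on the four sample rows and then labels row by row with `np.select`. The word-boundary patterns are my own assumption about what should count as a match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Women's pant", "Men's Shirt", "Women's Dress", "Blue Shirt"],
    "price": [20.0, 30.0, 40.0, 30.0],
})

# The question's loop: each assignment broadcasts one scalar to the whole column.
for product_name in df["product_name"]:
    if "women" in product_name.lower():
        df["gender"] = "women"
    elif "men" in product_name.lower():
        df["gender"] = "men"
    else:
        df["gender"] = "unisex"
broken = df["gender"].tolist()  # every row carries the label of the last product

# Row-wise labelling: conditions are checked in order and the first match wins.
names = df["product_name"].str.lower()
is_women = names.str.contains(r"\bwomen", regex=True)
is_men = names.str.contains(r"\bmen", regex=True)
df["gender"] = np.select([is_women, is_men], ["women", "men"], default="unisex")
print(df)
```

With the sample data the last product is "Blue Shirt", so the loop leaves "unisex" in every row; the question's full dataset evidently ended with a men's product.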

Python - Finding the top 5 rows containing a word in a dataframe

I'm trying to make a function that prints the top 5 products and their prices, and the bottom 5 products and their prices of the product listings that contain words from a wordlist. I've tried making it like this -
def wordlist_top_costs(filename, wordlist):
    xlsfile = pd.ExcelFile(filename)
    dframe = xlsfile.parse('Sheet1')
    dframe['Product'].fillna('', inplace=True)
    dframe['Price'].fillna(0, inplace=True)
    price = {}
    for word in wordlist:
        mask = dframe.Product.str.contains(word, case=False, na=False)
        price[mask] = dframe.loc[mask, 'Price']
    top = sorted(Score.items(), key=operator.itemgetter(1), reverse=True)
    print("Top 10 product prices for: ", wordlist.name)
    for i in range(0, 5):
        print(top[i][0], " | ", t[i][1])
    bottom = sorted(Score.items(), key=operator.itemgetter(1), reverse=False)
    print("Bottom 10 product prices for: ", wordlist.name)
    for i in range(0, 5):
        print(top[i][0], " | ", t[i][1])
However, the above function throws an error at the line
price[mask] = dframe.loc[mask, 'Price'] that says -
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Any help to correct/modify this appreciated. Thanks!
Edit -
For eg.
wordlist - alu, co, vin
Product | Price
Aluminium Crown - 22.20
Coca Cola - 1.0
Brass Box - 28.75
Vincent Kettle - 12.00
Vinyl Stickers - 0.50
Doritos - 2.0
Colin's Hair Oil - 5.0
Vincent Chase Sunglasses - 75.40
American Tourister - $120.90
Output :
Top 3 Product Prices:
Vincent Chase Sunglasses - 75.40
Aluminium Crown - 22.20
Vincent Kettle - 12.0
Bottom 3 Product Prices:
Vinyl Stickers - 0.50
Coca Cola - 1.0
Colin's Hair Oil - 5.0

You can use nlargest and nsmallest:
#remove $ and convert column Price to floats
dframe['Price'] = dframe['Price'].str.replace('$', '', regex=False).astype(float)
#filter by regex - joined all values of list by |
wordlist = ['alu', 'co', 'vin']
pat = '|'.join(wordlist)
mask = dframe.Product.str.contains(pat, case=False, na=False)
dframe = dframe.loc[mask, ['Product','Price']]
top = dframe.nlargest(3, 'Price')
#top = dframe.sort_values('Price', ascending=False).head(3)
print (top)
Product Price
7 Vincent Chase Sunglasses 75.4
0 Aluminium Crown 22.2
3 Vincent Kettle 12.0
bottom = dframe.nsmallest(3, 'Price')
#bottom = dframe.sort_values('Price').head(3)
print (bottom)
Product Price
4 Vinyl Stickers 0.5
1 Coca Cola 1.0
6 Colin's Hair Oil 5.0
Setup:
dframe = pd.DataFrame({'Price': ['22.20', '1.0', '28.75', '12.00', '0.50', '2.0', '5.0', '75.40', '$120.90'], 'Product': ['Aluminium Crown', 'Coca Cola', 'Brass Box', 'Vincent Kettle', 'Vinyl Stickers', 'Doritos', "Colin's Hair Oil", 'Vincent Chase Sunglasses', 'American Tourister']}, columns=['Product','Price'])
print (dframe)
Product Price
0 Aluminium Crown 22.20
1 Coca Cola 1.0
2 Brass Box 28.75
3 Vincent Kettle 12.00
4 Vinyl Stickers 0.50
5 Doritos 2.0
6 Colin's Hair Oil 5.0
7 Vincent Chase Sunglasses 75.40
8 American Tourister $120.90
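The pieces of this answer can be collected into one function. A sketch, assuming the frame has `Product` and `Price` columns as in the setup above; the function name `top_and_bottom` and its parameter `n` are illustrative, and `re.escape` is added on the assumption that the words should be matched literally rather than as regex fragments.

```python
import re
import pandas as pd

def top_and_bottom(dframe, wordlist, n=3):
    """Return the n most and n least expensive products matching any word."""
    prices = (dframe["Price"].astype(str)
              .str.replace("$", "", regex=False)  # strip a literal dollar sign
              .astype(float))
    # One alternation pattern for all words, matched case-insensitively.
    pattern = "|".join(re.escape(word) for word in wordlist)
    mask = dframe["Product"].str.contains(pattern, case=False, na=False)
    matched = dframe.loc[mask, ["Product"]].assign(Price=prices[mask])
    return matched.nlargest(n, "Price"), matched.nsmallest(n, "Price")

dframe = pd.DataFrame({
    "Product": ["Aluminium Crown", "Coca Cola", "Brass Box", "Vincent Kettle",
                "Vinyl Stickers", "Doritos", "Colin's Hair Oil",
                "Vincent Chase Sunglasses", "American Tourister"],
    "Price": ["22.20", "1.0", "28.75", "12.00", "0.50", "2.0", "5.0",
              "75.40", "$120.90"],
})
top, bottom = top_and_bottom(dframe, ["alu", "co", "vin"])
print(top)
print(bottom)
```

Returning the two frames instead of printing inside the function leaves the formatting of the report to the caller.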

Print python list in groups of 3

I have a list of 107 names, I would like to print them out in groups of 3 or so, with each name separated by a tab, a newline after each line, until the end. How can I do this?
with for item in list print item i only get 1 name per line of course, which is fine I guess but i'd like to fit more in the console at once so I'd like to print 3 or so names on each line as I go through the list, so instead of:
name1
name2
name3
name4
name5
name6
i would get:
name1 name2 name3
name4 name5 name6
It's kindof hard to search for an answer to this, i haven't been able to come up with quite what I need or that I could understand, most things I did find just deal with len() or range() and confused me. Is there some simple way to do this? Thank you!
[edit:update] using @inspectorG4dget's example of:
for i in range(0, len(listnames), 5):
    print '\t\t'.join(listnames[i:i+5])
i get the following: http://www.pasteall.org/pic/show.php?id=41159
how can I get that cleaned up so everything is nicely aligned in each column? Is what I want possible to do easily?

1)
li = ['sea','mountain','desert',
'Emma','Cathy','Kate',
'ii','uuuuuuuuuuuuuuuuuuu','aaa',
'round','flat','sharp',
'blueberry','banana','apple',
'red','purple','white',
'hen','tiger']
a,b = divmod(len(li),3)
itn = iter(li).next
print ''.join('%s\t%s\t%s\n' % (itn(),itn(),itn())
for i in xrange(a))\
+ ('%s\t%s\t\n' % (itn(),itn()) if b==2
else '%s\t\n' % itn() if b==1
else '')
result
sea mountain desert
Emma Cathy Kate
ii uuuuuuuuuuuuuuuuuuu aaa
round flat sharp
blueberry banana apple
red purple white
hen tiger
.
2)
And to align in columns whose width depends on the longest element of the list:
li = ['sea','mountain','desert',
'Emma','Cathy','Kate',
'HH','VVVVVVV','AAA',
'round','flat','sharp',
'blueberry','banana','apple',
'red','purple','white',
'hen','tiger']
maxel = max(len(el) for el in li)
a,b = divmod(len(li),3)
itn = iter(li).next
form = '%%-%ds\t%%-%ds\t%%-%ds\n' % (maxel,maxel,maxel)
print ''.join(form % (itn(),itn(),itn())
for i in xrange(a))\
+ ('%%-%ds\t%%-%ds\t\n' %(maxel,maxel) % (itn(),itn()) if b==2
else '%%-%ds\t\n' % maxel % itn() if b==1
else '')
result
sea mountain desert
Emma Cathy Kate
HH VVVVVVV AAA
round flat sharp
blueberry banana apple
red purple white
hen tiger
.
3)
To align in column, the width of each column depending upon the longest element in it:
li = ['sea','mountain','desert',
'Emma','Cathy','Kate',
'HH','VVVVVVV','AAA',
'round','flat','sharp',
'nut','banana','apple',
'red','purple','white',
'hen','tiger']
maxel0 = max(len(li[i]) for i in xrange(0,len(li),3))
maxel1 = max(len(li[i]) for i in xrange(1,len(li),3))
maxel2 = max(len(li[i]) for i in xrange(2,len(li),3))
a,b = divmod(len(li),3)
itn = iter(li).next
form = '%%-%ds\t%%-%ds\t%%-%ds\n' % (maxel0,maxel1,maxel2)
print ''.join(form % (itn(),itn(),itn())
for i in xrange(a))\
+ ('%%-%ds\t%%-%ds\t\n' %(maxel0,maxel1) % (itn(),itn()) if b==2
else '%%-%ds\t\n' % maxel0 % itn() if b==1
else '')
result
sea mountain desert
Emma Cathy Kate
HH VVVVVVV AAA
round flat sharp
nut banana apple
red purple white
hen tiger
4)
I've modified the algorithm in order to generalize to any number of columns wanted.
The wanted number of columns must be passed as argument to parameter nc :
from itertools import imap,islice
li = ['sea','mountain','desert',
'Emma','Cathy','Kate',
'HH','VVVVVVV','AAA',
'round','flat','sharp',
'nut','banana','apple',
'heeeeeeeeeeen','tiger','snake'
'red','purple','white',
'atlantic','pacific','antarctic',
'Bellini']
print 'len of li == %d\n' % len(li)
def cols_print(li,nc):
    maxel = tuple(max(imap(len,islice(li,st,None,nc)))
                  for st in xrange(nc))
    nblines,tail = divmod(len(li),nc)
    stakes = (nc-1)*['%%-%ds\t'] + ['%%-%ds']
    form = ''.join(stakes) % maxel
    itn = iter(li).next
    print '\n'.join(form % tuple(itn() for g in xrange(nc))
                    for i in xrange(nblines))
    if tail:
        print ''.join(stakes[nc-tail:]) % maxel[0:tail] % tuple(li[-tail:]) + '\n'
    else:
        print
for nc in xrange(3,8):
    cols_print(li,nc)
    print '-----------------------------------------------------------'
result
len of li == 24
sea mountain desert
Emma Cathy Kate
HH VVVVVVV AAA
round flat sharp
nut banana apple
heeeeeeeeeeen tiger snakered
purple white atlantic
pacific antarctic Bellini
-----------------------------------------------------------
sea mountain desert Emma
Cathy Kate HH VVVVVVV
AAA round flat sharp
nut banana apple heeeeeeeeeeen
tiger snakered purple white
atlantic pacific antarctic Bellini
-----------------------------------------------------------
sea mountain desert Emma Cathy
Kate HH VVVVVVV AAA round
flat sharp nut banana apple
heeeeeeeeeeen tiger snakered purple white
atlantic pacific antarctic Bellini
-----------------------------------------------------------
sea mountain desert Emma Cathy Kate
HH VVVVVVV AAA round flat sharp
nut banana apple heeeeeeeeeeen tiger snakered
purple white atlantic pacific antarctic Bellini
-----------------------------------------------------------
sea mountain desert Emma Cathy Kate HH
VVVVVVV AAA round flat sharp nut banana
apple heeeeeeeeeeen tiger snakered purple white atlantic
pacific antarctic Bellini
-----------------------------------------------------------
.
But I prefer a display with no tabs between columns, only a given number of characters.
In the following code, I chose to separate the columns by 2 characters: it is the 2 in the line
maxel = tuple(max(imap(len,islice(li,st,None,nc)))+2
The code
from itertools import imap,islice
li = ['sea','mountain','desert',
'Emma','Cathy','Kate',
'HH','VVVVVVV','AAA',
'round','flat','sharp',
'nut','banana','apple',
'heeeeeeeeeeen','tiger','snake'
'red','purple','white',
'atlantic','pacific','antarctic',
'Bellini']
print 'len of li == %d\n' % len(li)
def cols_print(li,nc):
    maxel = tuple(max(imap(len,islice(li,st,None,nc)))+2
                  for st in xrange(nc))
    nblines,tail = divmod(len(li),nc)
    stakes = nc*['%%-%ds']
    form = ''.join(stakes) % maxel
    itn = iter(li).next
    print '\n'.join(form % tuple(itn() for g in xrange(nc))
                    for i in xrange(nblines))
    if tail:
        print ''.join(stakes[nc-tail:]) % maxel[0:tail] % tuple(li[-tail:]) + '\n'
    else:
        print
for nc in xrange(3,8):
    cols_print(li,nc)
    print 'mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm'
the result
len of li == 24
sea mountain desert
Emma Cathy Kate
HH VVVVVVV AAA
round flat sharp
nut banana apple
heeeeeeeeeeen tiger snakered
purple white atlantic
pacific antarctic Bellini
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
sea mountain desert Emma
Cathy Kate HH VVVVVVV
AAA round flat sharp
nut banana apple heeeeeeeeeeen
tiger snakered purple white
atlantic pacific antarctic Bellini
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
sea mountain desert Emma Cathy
Kate HH VVVVVVV AAA round
flat sharp nut banana apple
heeeeeeeeeeen tiger snakered purple white
atlantic pacific antarctic Bellini
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
sea mountain desert Emma Cathy Kate
HH VVVVVVV AAA round flat sharp
nut banana apple heeeeeeeeeeen tiger snakered
purple white atlantic pacific antarctic Bellini
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
sea mountain desert Emma Cathy Kate HH
VVVVVVV AAA round flat sharp nut banana
apple heeeeeeeeeeen tiger snakered purple white atlantic
pacific antarctic Bellini
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm

You could also try this:
from itertools import izip
l = ['name1', 'name2', 'name3', 'name4', 'name5', 'name6']
for t in izip(*[iter(l)]*3):
    print '\t'.join(t)
name1 name2 name3
name4 name5 name6
If you're not certain that the list length will be a multiple of 3, you could use izip_longest, applying the same idea:
from itertools import izip_longest as izipl
l = ['name1', 'name2', 'name3', 'name4', 'name5', 'name6', 'name7']
for t in izipl(fillvalue='', *[iter(l)]*3):
    print '\t'.join(t)
name1 name2 name3
name4 name5 name6
name7

This should do it:
In [12]: L
Out[12]: ['name1', 'name2', 'name3', 'name4', 'name5', 'name6']
In [13]: for i in range(0,len(L),3): print ' '.join(L[i:i+3])
name1 name2 name3
name4 name5 name6
EDIT: to get everything into a fixed width (some code that I wrote a while ago to turn columnar data into a table; all you have to do is columnize your data and call this old code):
from collections import defaultdict
from itertools import izip_longest
def tabularize(infilepath, outfilepath, delim='\t', largeFile=False):
    """ Return nothing
    Write into the file in outfilepath, the contents of infilepath, expressed in tabular form.
    The tabular form is similar to the way in which SQL tables are displayed.
    If largeFile is set to True, then no caching of lines occurs. However, two passes of the infile are required"""
    if largeFile:
        widths = getWidths(infilepath, delim)
        # second pass: stream the rows again instead of caching them
        lines = (line.strip().split(delim) for line in open(infilepath) if line.strip())
    else:
        with open(infilepath) as infile:
            lines = [line.strip().split(delim) for line in infile.readlines() if line.strip()]
        widths = [max([len(row) for row in rows])+2 for rows in izip_longest(*lines, fillvalue="")]
    with open(outfilepath, 'w') as outfile:
        outfile.write("+")
        for width in widths:
            outfile.write('-'*width + "+")
        outfile.write('\n')
        for line in lines:
            outfile.write("|")
            for col,width in izip_longest(line,widths, fillvalue=""):
                outfile.write("%s%s%s|" %(' '*((width-len(col))/2), col, ' '*((width+1-len(col))/2)))
            outfile.write('\n+')
            for width in widths:
                outfile.write('-'*width + "+")
            outfile.write('\n')
def getWidths(infilepath, delim):
    answer = defaultdict(int)
    with open(infilepath) as infile:
        for line in infile:
            cols = line.strip().split(delim)
            lens = map(len, cols)
            for i,l in enumerate(lens):
                if answer[i] < l:
                    answer[i] = l
    return [answer[k] for k in sorted(answer)]
def main(L, n, infilepath, outfilepath):
    iterator = iter(L)
    with open(infilepath, 'w') as infile:
        # n references to the same iterator yield rows of n items
        for row in izip_longest(*[iterator]*n, fillvalue=''):
            infile.write('\t'.join(row)+'\n')
    largeFile = len(L) > 10**6
    tabularize(infilepath, outfilepath, delim='\t', largeFile=largeFile)

Try using itertools; I think it's a much simpler solution.
from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
names = ['name1', 'name2', 'name3', 'name4', 'name5', 'name6']
for item1 in grouper(3, names, ''):
    print '\t'.join(item1)
Result:
name1 name2 name3
name4 name5 name6
