Group duplicate rows with different column values, then send to CSV - Python
I have this csv file favsites.csv:
Emails               Favorite Site
batman#email.com     something.com
batman#email.com     hamburgers.com
poisonivy#email.com  yonder.com
superman#email.com   cookies.com
catgirl#email.com    cattreats.com
catgirl#email.com    fishcaviar.com
catgirl#email.com    elegantfashion.com
joker#email.com      cards.com
supergirl#email.com  nailart.com
I want to group the duplicates, then merge the columns, and then send to a csv.
So once grouped and merged it should look like this:
Emails               Favorite Site
batman#email.com     something.com
                     hamburgers.com
poisonivy#email.com  yonder.com
superman#email.com   cookies.com
catgirl#email.com    cattreats.com
                     fishcaviar.com
                     elegantfashion.com
joker#email.com      cards.com
supergirl#email.com  nailart.com
How would I send this to a csv file and have it look like this, with something.com and hamburgers.com in one cell for batman, and cattreats.com, fishcaviar.com, and elegantfashion.com in one cell for catgirl? Alternatively, the merged sites could sit in the same row but in different columns, like this:
Emails               Favorite Site
batman#email.com     something.com  hamburgers.com
poisonivy#email.com  yonder.com
superman#email.com   cookies.com
catgirl#email.com    cattreats.com  fishcaviar.com  elegantfashion.com
joker#email.com      cards.com
supergirl#email.com  nailart.com
Here is my code so far:
import pandas as pd
Dir='favsites.csv'
sendcsv='mergednames.csv'
df = pd.read_csv(Dir)
df = pd.DataFrame(df)
df_sort = df.sort_values('Emails')
grouped = df_sort.groupby(['Emails', 'Favorite Site']).agg('sum')
When I print grouped it shows:
Empty DataFrame
Columns: []
Index: [(batman#email.com, hamburgers.com), (batman#email.com, something.com), (catgirl#email.com, cattreats.com), (catgirl#email.com, elegantfashion.com), (catgirl#email.com, fishcaviar.com), (joker#email.com, cards.com), (poisonivy#email.com, yonder.com), (supergirl#email.com, nailart.com), (superman#email.com, cookies.com)]
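A note on why that result is empty: grouping by both 'Emails' and 'Favorite Site' moves both columns into the index, leaving no columns for agg('sum') to aggregate. Grouping on 'Emails' alone keeps 'Favorite Site' available to merge; a minimal sketch (not the answers' exact code, and assuming favsites.csv parses into the two columns shown):

import pandas as pd

df = pd.read_csv('favsites.csv')
# one row per email, all of its sites joined into a single cell
merged = df.groupby('Emails', sort=False)['Favorite Site'].agg(', '.join).reset_index()
merged.to_csv('mergednames.csv', index=False)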
You can replace duplicated values with empty strings:

import pandas as pd

emails = ['batman#email.com', 'poisonivy#email.com', 'superman#email.com', 'batman#email.com']
favs = ['something.com', 'hamburgers.com', 'yonder.com', 'cookies.com']
df = pd.DataFrame({'Emails': emails, 'Favorite Site': favs})

df_sorted = df.sort_values('Emails')
# blank out every repeated email after its first occurrence in the sorted order
df_sorted.loc[df_sorted['Emails'].duplicated(), 'Emails'] = ''
Output:

Emails               Favorite Site
batman#email.com     something.com
                     cookies.com
poisonivy#email.com  hamburgers.com
superman#email.com   yonder.com
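To actually write that result to a file, presumably the plain to_csv call is all that is missing (file name taken from the question):

df_sorted.to_csv('mergednames.csv', index=False)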
IIUC, you can use pandas.Series.str.ljust and pandas.DataFrame.to_csv with a tab ("\t") as the separator:
df.loc[df["Emails"].duplicated(), "Emails"] = ""
len_emails = df["Emails"].str.len().max()
len_sites = df["Favorite Site"].str.len().max()
df = df.T.reset_index().T.reset_index(drop=True)
df[0] = df[0].str.ljust(len_emails)
df[1] = df[1].str.ljust(len_sites)
df.to_csv("/tmp/out1.csv", index=False, header=False, sep="\t")
Output: a screenshot of the padded file in Notepad (not reproduced here).
For the second format, you can use pandas.DataFrame.groupby:

df = (
    pd.read_csv("/tmp/input.csv", sep=r"\s\s+", engine="python")
      # collect each email's sites into one comma-separated string
      .groupby("Emails", as_index=False, sort=False).agg(",".join)
      # demote the header into a regular data row
      .T.reset_index().T.reset_index(drop=True)
      # re-split the joined sites into separate columns
      .pipe(lambda d: d[[0]].join(d[1].str.split(",", expand=True), rsuffix="_"))
      # pad every column to the width of its longest value
      .pipe(lambda d: pd.concat([d[col].str.ljust(d[col].fillna("").str.len().max())
                                 for col in d.columns], axis=1))
)
df.to_csv("/tmp/out2.csv", index=False, header=False, sep="\t")
Output: a screenshot of the padded file in Notepad (not reproduced here).
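If the fixed-width padding is not needed, a plainer sketch of the second format, with each site in its own column (file names are hypothetical, and favsites.csv is assumed to parse into the two columns shown in the question):

import pandas as pd

df = pd.read_csv('favsites.csv')
grouped = df.groupby('Emails', as_index=False, sort=False)['Favorite Site'].agg(list)
sites = pd.DataFrame(grouped['Favorite Site'].tolist())  # one column per site, padded with NaN
out = pd.concat([grouped[['Emails']], sites], axis=1)
out.to_csv('out2_plain.csv', index=False)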
Related
How to use input() with pandas to get all the value_counts linked to this input
My dataframe's columns look like this:

Index(['#Organism/Name', 'TaxID', 'BioProject Accession', 'BioProject ID', 'Group', 'SubGroup', 'Size (Mb)', 'GC%', 'Replicons', 'WGS', 'Scaffolds', 'Genes', 'Proteins', 'Release Date', 'Modify Date', 'Status', 'Center', 'BioSample Accession', 'Assembly Accession', 'Reference', 'FTP Path', 'Pubmed ID', 'Strain'], dtype='object')

I ask the user to enter the name of the species with this script:

print("bacterie species?")
species = input()

I want to look for the rows where "#Organism/Name" equals the species entered by the user (the input), then compute value_counts() on the Status column, and finally retrieve 'FTP Path'. Here is the code I tried, but it does not work (it is not even valid syntax):

if (data.loc[(data["Organism/Name"]==species)
    print(Data['Status'].value_counts())
else:
    print("This species not found")

if (data.loc[(data["Organism/Name"]==species)
    print(Data['Status'].value_counts())
else:
    print(Data.get["FTP Path"]
If I understand your question correctly, this is what you're trying to achieve:

import wget
import numpy as np
import pandas as pd

URL = 'https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt'
data = pd.read_csv(wget.download(URL), sep='\t', header=0)

species = input("Enter the bacteria species: ")

if data["#Organism/Name"].str.contains(species, case=False).any():
    print(data.loc[data["#Organism/Name"].str.contains(species, case=False)]['Status'].value_counts())
    FTP_list = data.loc[data["#Organism/Name"].str.contains(species, case=False)]["FTP Path"].values
else:
    print("This species not found")

To write all the FTP Path urls into a txt file, you can do this:

with open('/path/urls.txt', mode='wt') as file:
    file.write('\n'.join(FTP_list))
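Note that str.contains does substring matching, while the question's own attempt compared with ==. If an exact (case-insensitive) match is wanted instead, a rough sketch:

mask = data["#Organism/Name"].str.lower() == species.lower()
if mask.any():
    print(data.loc[mask, "Status"].value_counts())
    FTP_list = data.loc[mask, "FTP Path"].values
else:
    print("This species not found")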
How to merge two or more lists in a custom order in Python
I have the following code:

import pandas as pd

y = pd.ExcelFile('C:\\Users\\vibhu\\Desktop\\Training docs\\excel training\\super store data transformation\\Sample - Superstore data transformation by Vaibhav.xlsx')
superstore_orders = y.parse(sheet_name='Orders Input data')
superstore_orders.dtypes

factual_table = superstore_orders[['Order ID', 'Customer ID', 'Postal Code', 'Product ID', 'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit']]

Order_table = superstore_orders[['Order ID', 'Order Date', 'Ship Date', 'Ship Mode']]
Order_table1 = Order_table.drop_duplicates(subset='Order ID', keep='first', inplace=False)

Customer_table = superstore_orders[['Customer ID', 'Customer Name', 'Segment']]
Customer_table1 = Customer_table.drop_duplicates(subset='Customer ID', keep='first', inplace=False)

Geographical_table = superstore_orders[['Postal Code', 'Country', 'City', 'State', 'Region']]
Geographical_table1 = Geographical_table.drop_duplicates(subset='Postal Code', keep='first', inplace=False)

Product_table = superstore_orders[['Product ID', 'Category', 'Sub-Category', 'Product Name']]
Product_table1 = Product_table.drop_duplicates(subset=['Product ID', 'Product Name'], keep='first', inplace=False)

Final_factual_data = pd.merge(Order_table1, factual_table, how='left', on='Order ID')
Final_factual_data = pd.merge(Customer_table1, Final_factual_data, how='left', on='Customer ID')
Final_factual_data = pd.merge(Geographical_table1, Final_factual_data, how='left', on='Postal Code')
Final_factual_data = pd.merge(Product_table1, Final_factual_data, how='left', on=['Product ID', 'Product Name'])

The output has the columns in this order:

Product ID, Category, Sub-Category, Product Name, Postal Code, Country, City, State, Region, Customer ID, Customer Name, Segment, Order ID, Order Date, Ship Date, Ship Mode, Sales, Quantity, Discount, Profit

I need them reordered like this:

Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, Postal Code, Country, City, State, Region, Product ID, Product Name, Product Key, Category, Sub-Category, Sales, Quantity, Discount, Profit
This code got me the desired result:

Final_factual_data1 = Final_factual_data[['Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit']]
Just assign the intended ordered sequence to the columns attribute:

Final_factual_data.columns = ['Order ID', 'order date', 'ship date', 'ship mode',
                              'Customer ID', 'cutomer name', 'segment', 'Postal Code',
                              'country', 'city', 'state reion', 'Product ID',
                              'Product Name', 'product key', 'cateory', 'subcategory',
                              'Sales', 'Quantity', 'Discount', 'Profit']
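A caveat on the answer above: assigning to .columns only relabels the columns in their current positions; it does not move any data. To actually reorder, select the columns in the intended order (as in the previous answer) or use reindex. A minimal sketch with cleaned-up column names:

cols = ['Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Postal Code', 'Country', 'City', 'State',
        'Region', 'Product ID', 'Product Name', 'Category', 'Sub-Category',
        'Sales', 'Quantity', 'Discount', 'Profit']
Final_factual_data = Final_factual_data.reindex(columns=cols)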
Nested dictionary groups from Excel
I'm new to Python and openpyxl. I started learning them in order to make my everyday tasks at my workplace easier and faster.

Task: There is an Excel file with a lot of rows (see the "excel file" screenshot in the original post). I want to create a daily report based on this file. In my example, today is 2019/05/08.

Expected result: Only show the info where the date matches today's date (see the "required outcome" screenshot in the original post).

My solution: I create a list of the row numbers that hold only today's values. After that I read only those rows and create dictionaries. But the result is nothing. I am also in trouble about how to work with multiple keys, because the same issue number appears several times in the list.

from datetime import datetime
import openpyxl

# Open the excel file
excel_path = "\\REE.xlsx"
wb = openpyxl.load_workbook(excel_path, data_only=True)
ws_1 = wb.worksheets[1]

# Today's date; needs some formatting due to excel date handling
today = datetime.today()
today = today.replace(hour=0, minute=0, second=0, microsecond=0)

# Create a list of the lines where only today's values are present
issue_line_list = []
for cell in ws_1["B"]:
    if cell.value == today:
        issue_line = cell.row
        issue_line_list.append(issue_line)

# Create a txt file for the output
file = open("daily_report.txt", "w")

# The dict that I want to use
dict = []
issue_numbers_list = []
issue = []

# Create a dict for the issues
for line in issue_line_list:
    issue_number_value = ws_1.cell(row=line, column=3).value
    issue_numbers_list.append(issue_number_value)

# Create a dict for the other information
for line in issue_line_list:
    issue_number_value = ws_1.cell(row=line, column=3).value
    by_value = ws_1.cell(row=line, column=2).value
    group_value = ws_1.cell(row=line, column=4).value
    events_value = ws_1.cell(row=line, column=5).value
    deadline_value = ws_1.cell(row=line, column=6).value
    try:
        deadline_value = deadline_value.strftime('%Y.%m.%d')
    except:
        deadline_value = ""
    issue.append(issue_number_value)
    issue.append(by_value)
    issue.append(group_value)
    issue.append(events_value)
    issue.append(deadline_value)
    issue.append(deadline_value)

# Append the two dicts
dict.append(issue_numbers_list)
dict.append(issue)

# Save it to the txt file
file.write(dict)
file.close()

Questions:
- How do I solve the multiple-same-key issue?
- How do I create nested groups?
- What should I add to or remove from my code in order to get the expected result?

Remark: openpyxl is not the only option. If you have a better/easier/faster way, I am open to every idea. Thank you in advance for your support!
Can you try the following:

import pandas as pd

cols = ['date', 'by', 'issue_number', 'group', 'events', 'deadline']
req_cols = ['events', 'deadline']
data = [
    ['2019-05-07', 'john', '113140', '#issue_closed', 'something different', ''],
    ['2019-05-08', 'david', '113140', '#task', 'something different', ''],
    ['2019-05-08', 'victor', '114761', '#task_result', 'something different', ''],
    ['2019-05-08', 'john', '114761', '#task', 'something different', '2019-05-10'],
    ['2019-05-08', 'david', '114761', '#task', 'something different', '2019-05-08'],
    ['2019-05-08', 'victor', '113140', '#task_result', 'something different', ''],
    ['2019-05-07', 'john', '113140', '#issue_created', 'something different', '2019-05-09'],
    ['2019-05-07', 'david', '113140', '#location', 'something different', ''],
    ['2019-05-07', 'victor', '113140', '#issue_closed', 'something different', 'done'],
    ['2019-05-07', 'john', '113140', '#task_result', 'something different', ''],
    ['2019-05-07', 'david', '113140', '#task', 'something different', '2019-05-10'],
]
df = pd.DataFrame(data, columns=cols)

df1 = df.groupby(['issue_number', 'group']).describe()[req_cols].droplevel(0, axis=1)['top']
df1.columns = req_cols
print(df1)

Output:

                                          events    deadline
issue_number group
113140       #issue_closed   something different        done
             #issue_created  something different  2019-05-09
             #location       something different
             #task           something different  2019-05-10
             #task_result    something different
114761       #task           something different  2019-05-08
             #task_result    something different

To open an excel file, you can do the following:

df = pd.read_excel(excel_path, sheet_name=my_sheet)
req_cols = ['EVENTS', 'DEADLINE']
df1 = df.groupby(['ISSUE NUMBER', 'GROUP']).describe()[req_cols].droplevel(0, axis=1)['top']
df1.columns = req_cols
print(df1)
The task is almost solved, but I've run into a new issue. The code:

excel_path = "\\REE.xlsx"
my_sheet = 'Events'
cols = ['DATE', 'BY', 'ISSUE NUMBER', 'GROUP', 'EVENTS', 'DEADLINE']
req_cols = ['EVENTS', 'DEADLINE']
df = pd.read_excel(excel_path, sheet_name=my_sheet, columns=cols)

today = datetime.today().strftime('%Y-%m-%d')
today_filter = (df[(df['DATE'] == today)])
df = pd.DataFrame(today_filter, columns=cols)

df1 = df.groupby(['ISSUE NUMBER', 'GROUP']).describe()[req_cols].droplevel(0, axis=1)['top']
df1.columns = req_cols
print(df1)

In the 'BY' column there are repeated values, e.g. '#task', but the script prints each of them only once. In this case the required result is:

114761  #task         Jane  another words   2019-05-10
        #task result  John  something
        #task         John  something else  2019-05-08
...

My code's result is:

114761  #task         Jane  another words   2019-05-10
        #task result  John  something
...

The row John / #task / something else / 2019-05-08 is not printed out. Why? The same happens in other cases as well: whenever there are several values in the 'BY' column, the script prints only the first and skips the rest.
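The likely cause: describe() reduces each (ISSUE NUMBER, GROUP) pair to summary statistics, and ['top'] keeps only the single most frequent value per pair, so every other row in the group is dropped. To keep all rows, one sketch aggregates with a string join instead of describe() (column names as above):

df1 = (df.groupby(['ISSUE NUMBER', 'GROUP'])[req_cols]
         .agg(lambda s: ' | '.join(s.astype(str))))
print(df1)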
Trying to include a column based on input and file name in a pandas DataFrame in Python
I have several csv files with the following structure:

            Erster     Hoch     Tief  Schlusskurs  Stuecke     Volumen
Datum
14.02.2017  151.55   152.35   151.05       152.25  110.043  16.687.376
13.02.2017  149.85   152.20   149.25       151.25   415.76  62.835.200
10.02.2017  149.00   150.05   148.65       149.40  473.664  70.746.088
09.02.2017  144.75   148.45   144.35       148.00  642.175  94.348.392

            Erster     Hoch     Tief  Schlusskurs  Stuecke  Volumen
Datum
14.02.2017  111.454  111.776  111.454      111.776       44    4.918
13.02.2017  110.570  110.989  110.570      110.989      122   13.535
10.02.2017  109.796  110.705  109.796      110.705        0        0
09.02.2017  107.993  108.750  107.993      108.750      496   53.933

The files differ only in the WKN embedded in the file name: wkn_A1EWWW_historic.csv, wkn_A0YAQA_historic.csv.

I want to have the following output (headers translated, plus a wkn column taken from the file name):

Date        wkn     Open     High     Low      Close    Pieces   Volume
14.02.2017  A1EWWW  151.55   152.35   151.05   152.25   110.043  16.687.376
13.02.2017  A1EWWW  149.85   152.20   149.25   151.25   415.76   62.835.200
10.02.2017  A1EWWW  149.00   150.05   148.65   149.40   473.664  70.746.088
09.02.2017  A1EWWW  144.75   148.45   144.35   148.00   642.175  94.348.392

Date        wkn     Open     High     Low      Close    Pieces  Volume
14.02.2017  A0YAQA  111.454  111.776  111.454  111.776  44      4.918
13.02.2017  A0YAQA  110.570  110.989  110.570  110.989  122     13.535
10.02.2017  A0YAQA  109.796  110.705  109.796  110.705  0       0
09.02.2017  A0YAQA  107.993  108.750  107.993  108.750  496     53.933

The code looks like the following:

import pandas as pd

wkn_list_dummy = {'A0YAQA', 'A1EWWW'}
for w_list in wkn_list_dummy:
    url = 'C:/wkn_' + str(w_list) + '_historic.csv'
    df = pd.read_csv(url, encoding='cp1252', sep=';', decimal=',', index_col=0)
    print(df)

I tried using melt, but it was not working.
You can add a column by just assigning a value to it:

df['new_column'] = 'string'

All together:

import pandas as pd

wkn_list_dummy = {'A0YAQA', 'A1EWWW'}
final_df = pd.DataFrame()
for w_list in wkn_list_dummy:
    url = 'C:/wkn_' + str(w_list) + '_historic.csv'
    df = pd.read_csv(url, encoding='cp1252', sep=';', decimal=',', index_col=0)
    df['wkn'] = w_list  # tag every row with the WKN taken from the file name
    final_df = final_df.append(df)  # note: DataFrame.append was removed in pandas 2.0
final_df.reset_index(inplace=True)
print(final_df)
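On pandas 2.0+, where DataFrame.append no longer exists, the same idea can be written with assign and pd.concat; a rough equivalent:

import pandas as pd

wkn_list_dummy = {'A0YAQA', 'A1EWWW'}
# read each file, tag it with its wkn, and stack everything in one go
frames = [
    pd.read_csv('C:/wkn_' + w + '_historic.csv',
                encoding='cp1252', sep=';', decimal=',', index_col=0).assign(wkn=w)
    for w in wkn_list_dummy
]
final_df = pd.concat(frames).reset_index()
print(final_df)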
How do I select only certain rows based on label in pandas?
Here is my function:

def get_historical_closes(ticker, start_date, end_date):
    my_dir = '/home/manish/Desktop/Equity/subset'
    os.chdir(my_dir)
    dfs = []
    for files in glob.glob('*.txt'):
        dfs.append(pd.read_csv(files, names=['Ticker', 'Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Null'], parse_dates=[1]))
    p = pd.concat(dfs)
    d = p.reset_index(['Date', 'Ticker', 'Close'])
    pivoted = d.pivot_table(index=['Date'], columns=['Ticker'])
    pivoted.columns = pivoted.columns.droplevel(0)
    return pivoted

closes = get_historical_closes(['LT' or 'HDFC' or 'ACC'], '1999-01-01', '2014-12-31')

My problem is that I just want data for a few tickers, namely LT, HDFC and ACC, for all the dates, but when I execute the function I get data for all of them (approx. 1500 tickers). How can I slice the dataframe so that I get only the selected rows and not the entire dataframe?

Raw input data is a collection of text files like so:

20MICRONS,20150401,36.5,38.95,35.8,37.35,64023,0 3IINFOTECH,20150401,5.9,6.3,5.8,6.2,1602365,0 3MINDIA,20150401,7905,7905,7850,7879.6,310,0 8KMILES,20150401,710.05,721,706,712.9,20196,0 A2ZINFRA,20150401,15.5,16.55,15.2,16,218219,0 AARTIDRUGS,20150401,648.95,665.5,639.65,648.25,42927,0 AARTIIND,20150401,348,349.4,340.3,341.85,122071,0 AARVEEDEN,20150401,42,42.9,41.55,42.3,627,0 ABAN,20150401,422,434.3,419,429.1,625857,0 ABB,20150401,1266.05,1284,1266,1277.45,70294,0 ABBOTINDIA,20150401,3979.25,4009.95,3955.3,3981.25,2677,0 ABCIL,20150401,217.8,222.95,217,221.65,11583,0 ABGSHIP,20150401,225,225,215.3,220.2,237737,0 ABIRLANUVO,20150401,1677,1677,1639.25,1666.7,106336,0 ACC,20150401,1563.7,1591.3,1553.2,1585.9,176063,0 ACCELYA,20150401,932,953.8,923,950.5,4297,0 ACE,20150401,40.1,41.7,40.05,41.15,356130,0 ACROPETAL,20150401,2.75,3,2.7,2.85,33380,0 ADANIENT,20150401,608.8,615.8,603,612.4,868006,0 ADANIPORTS,20150401,308.45,312.05,306.1,310.95,1026200,0 ADANIPOWER,20150401,46.7,48,46.7,47.75,3015649,0 ADFFOODS,20150401,60.5,60.5,58.65,59.75,23532,0 ADHUNIK,20150401,20.95,21.75,20.8,21.2,149431,0 ADORWELD,20150401,224.9,224.9,215.65,219.2,2743,0 ADSL,20150401,19,20,18.7,19.65,35053,0 ADVANIHOTR,20150401,43.1,43.1,43,43,100,0 ADVANTA,20150401,419.9,430.05,418,428,16206,0 AEGISCHEM,20150401,609,668,600,658.4,264828,0 AFL,20150401,65.25,70,65.25,68.65,9507,0 AGARIND,20150401,95,100,87.25,97.45,14387,0 AGCNET,20150401,91.95,93.75,91.4,93,2453,0 AGRITECH,20150401,5.5,6.1,5.5,5.75,540,0 AGRODUTCH,20150401,2.7,2.7,2.6,2.7,451,0 AHLEAST,20150401,196,202.4,185,192.25,357,0 AHLUCONT,20150401,249.5,258.3,246,251.3,44541,0 AHLWEST,20150401,123.9,129.85,123.9,128.35,688,0 AHMEDFORGE,20150401,229.5,237.35,228,231.45,332680,0 AIAENG,20150401,1268,1268,1204.95,1214.1,48950,0 AIL,20150401,735,747.9,725.1,734.8,31780,0 AJANTPHARM,20150401,1235,1252,1207.05,1223.3,126442,0 AJMERA,20150401,118.7,121.9,117.2,118.45,23005,0 AKSHOPTFBR,20150401,14.3,14.8,14.15,14.7,214028,0 AKZOINDIA,20150401,1403.95,1412,1392,1400.7,17115,0 ALBK,20150401,99.1,101.65,99.1,101.4,2129046,0 ALCHEM,20150401,27.9,32.5,27.15,31.6,32338,0 ALEMBICLTD,20150401,34.6,36.7,34.3,36.45,692688,0 ALICON,20150401,280,288,279.05,281.05,5937,0 ALKALI,20150401,31.6,34.2,31.6,33.95,4663,0 ALKYLAMINE,20150401,314,334,313.1,328.8,1515,0 ALLCARGO,20150401,317,323.5,315,319.15,31056,0 ALLSEC,20150401,21.65,22.5,21.6,21.6,435,0 ALMONDZ,20150401,10.6,10.95,10.5,10.75,23600,0 ALOKTEXT,20150401,7.5,8.2,7.4,7.95,8145264,0 ALPA,20150401,11.85,11.85,10.75,11.8,3600,0 ALPHAGEO,20150401,384.3,425.05,383.95,419.75,13308,0
ALPSINDUS,20150401,1.85,1.85,1.85,1.85,1050,0 ALSTOMT&D,20150401,585.85,595,576.65,588.4,49234,0 AMARAJABAT,20150401,836.5,847.75,831,843.9,121150,0 AMBIKCO,20150401,790,809,780.25,802.6,4879,0 AMBUJACEM,20150401,254.95,261.4,253.4,260.25,1346375,0 AMDIND,20150401,20.5,22.75,20.5,22.3,693,0 AMRUTANJAN,20150401,480,527.05,478.35,518.3,216407,0 AMTEKAUTO,20150401,144.5,148.45,144.2,147.45,552874,0 AMTEKINDIA,20150401,55.6,58.3,55.1,57.6,700465,0 AMTL,20150401,13.75,14.45,13.6,14.45,2111,0 ANANTRAJ,20150401,39.9,40.3,39.35,40.05,376564,0 ANDHRABANK,20150401,78.35,80.8,78.2,80.55,993038,0 ANDHRACEMT,20150401,8.85,9.3,8.75,9.1,15848,0 ANDHRSUGAR,20150401,92.05,98.95,91.55,96.15,11551,0 ANGIND,20150401,36.5,36.9,35.6,36.5,34758,0 ANIKINDS,20150401,22.95,24.05,22.95,24.05,1936,0 ANKITMETAL,20150401,2.85,3.25,2.85,3.15,29101,0 ANSALAPI,20150401,23.45,24,23.45,23.8,76723,0 ANSALHSG,20150401,29.9,29.9,28.75,29.65,7748,0 ANTGRAPHIC,20150401,0.1,0.15,0.1,0.15,23500,0 APARINDS,20150401,368.3,375.6,368.3,373.45,2719,0 APCOTEXIND,20150401,505,505,481.1,495.85,3906,0 APLAPOLLO,20150401,411.5,434,411.5,428.65,88113,0 APLLTD,20150401,458.9,464,450,454.7,72075,0 APOLLOHOSP,20150401,1351,1393.85,1351,1390,132827,0 APOLLOTYRE,20150401,169.65,175.9,169,175.2,3515274,0 APOLSINHOT,20150401,195,197,194.3,195.2,71,0 APTECHT,20150401,57.6,61,57,59.7,206475,0 ARCHIDPLY,20150401,32.95,35.8,32.5,35.35,103036,0 ARCHIES,20150401,19.05,19.4,18.8,19.25,46840,0 ARCOTECH,20150401,342.5,350,339.1,345.2,44142,0 ARIES,20150401,106.75,113.9,105,112.7,96825,0 ARIHANT,20150401,43.5,50,43.5,49.3,1647,0 AROGRANITE,20150401,61.5,62,59.55,60.15,2293,0 ARROWTEX,20150401,25.7,27.8,25.1,26.55,17431,0 ARSHIYA,20150401,39.55,41.5,39,40,69880,0 ARSSINFRA,20150401,34.65,36.5,34.6,36.3,71442,0 ARVIND,20150401,260.85,268.2,259,267.2,1169433,0 ARVINDREM,20150401,15.9,17.6,15.5,17.6,5407412,0 ASAHIINDIA,20150401,145,145,141,142.45,16240,0 ASAHISONG,20150401,113,116.7,112.15,115.85,5475,0 ASAL,20150401,45.8,45.8,38,43.95,7429,0 ASHAPURMIN,20150401,74,75.4,74,74.05,36406,0 ASHIANA,20150401,248,259,246.3,249.5,21284,0 ASHIMASYN,20150401,8.4,8.85,8.05,8.25,3253,0 ASHOKA,20150401,175.1,185.4,175.1,183.75,1319134,0 ASHOKLEY,20150401,72.7,74.75,72.7,74.05,17233199,0 ASIANHOTNR,20150401,104.45,107.8,101.1,105.15,780,0 ASIANPAINT,20150401,810,825.9,803.5,821.7,898480,0 ASIANTILES,20150401,116.25,124.4,116.25,123.05,31440,0 ASSAMCO,20150401,4.05,4.3,4.05,4.3,476091,0 ASTEC,20150401,148.5,154.5,146,149.2,322308,0 ASTRAL,20150401,447.3,451.3,435.15,448.6,64889,0 ASTRAMICRO,20150401,146.5,151.9,145.2,150.05,735681,0 ASTRAZEN,20150401,908,940.95,908,920.35,3291,0 ATFL,20150401,635,648,625.2,629.25,6202,0 ATLANTA,20150401,67.2,71,67.2,68.6,238683,0 ATLASCYCLE,20150401,203.9,210.4,203,208.05,25208,0 ATNINTER,20150401,0.2,0.2,0.2,0.2,1704,0 ATUL,20150401,1116,1160,1113,1153.05,32969,0 ATULAUTO,20150401,556.55,576.9,555.9,566.25,59117,0 AURIONPRO,20150401,192.3,224.95,191.8,217.55,115464,0 AUROPHARMA,20150401,1215,1252,1215,1247.4,1140111,0 AUSOMENT,20150401,22.6,22.6,21.7,21.7,2952,0 AUSTRAL,20150401,0.5,0.55,0.5,0.5,50407,0 AUTOAXLES,20150401,834.15,834.15,803,810.2,4054,0 AUTOIND,20150401,60,65,59.15,63.6,212036,0 AUTOLITIND,20150401,36,39,35.2,37.65,14334,0 AVTNPL,20150401,27,28,26.7,27.9,44803,0 AXISBANK,20150401,557.7,572,555.25,569.65,3753262,0 AXISCADES,20150401,335.4,345,331.4,339.65,524538,0 AXISGOLD,20150401,2473.95,2493,2461.1,2483.15,138,0 BAFNAPHARM,20150401,29.95,31.45,29.95,30.95,21136,0 BAGFILMS,20150401,3.05,3.1,2.9,3,31278,0 
BAJAJ-AUTO,20150401,2027.05,2035,2002.95,2019.8,208545,0 BAJAJCORP,20150401,459,482,454,466.95,121972,0 BAJAJELEC,20150401,230,234.8,229,232.4,95432,0 BAJAJFINSV,20150401,1412,1447.5,1396,1427.55,44811,0 BAJAJHIND,20150401,14.5,14.8,14.2,14.6,671746,0 BAJAJHLDNG,20150401,1302.3,1329.85,1285.05,1299.9,24626,0 BAJFINANCE,20150401,4158,4158,4062.2,4140.05,12923,0 BALAJITELE,20150401,65.75,67.9,65.3,67.5,47063,0 BALAMINES,20150401,81.5,83.5,81.5,83.45,6674,0 BALKRISIND,20150401,649,661,640,655,16919,0 BALLARPUR,20150401,13.75,13.95,13.5,13.9,271962,0 BALMLAWRIE,20150401,568.05,580.9,562.2,576.75,17423,0 BALPHARMA,20150401,68.9,74.2,67.1,68.85,84178,0 BALRAMCHIN,20150401,50.95,50.95,49.3,50,84400,0 BANARBEADS,20150401,33,39.5,33,39.25,1077,0 BANARISUG,20150401,834.7,855,820,849.85,618,0 BANCOINDIA,20150401,105,107.5,103.25,106.8,11765,0 BANG,20150401,6.2,6.35,6.1,6.35,9639,0 BANKBARODA,20150401,162.75,170.4,162.05,168.9,2949846,0 BANKBEES,20150401,1813.45,1863,1807,1859.78,19071,0 BANKINDIA,20150401,194.6,209.8,194.05,205.75,3396490,0 BANSWRAS,20150401,65,65,60.1,63.9,6238,0 BARTRONICS,20150401,11.45,11.85,11.35,11.6,109658,0 BASF,20150401,1115,1142,1115,1124.65,14009,0 BASML,20150401,184,192,183.65,191.6,642,0 BATAINDIA,20150401,1095,1104.9,1085,1094.7,137166,0 BAYERCROP,20150401,3333,3408.3,3286.05,3304.55,8839,0 BBL,20150401,627.95,641.4,622.2,629.8,5261,0 BBTC,20150401,441,458,431.3,449.15,141334,0 BEDMUTHA,20150401,16.85,18,16.25,17.95,16412,0 BEL,20150401,3355,3595,3350,3494.2,582755,0 BEML,20150401,1100,1163.8,1086,1139.2,631231,0 BEPL,20150401,22.1,22.45,21.15,22.3,5459,0 BERGEPAINT,20150401,209.3,216.9,208.35,215.15,675963,0 BFINVEST,20150401,168.8,176.8,159.5,172.7,113352,0 BFUTILITIE,20150401,707.4,741,702.05,736.05,1048274,0 BGLOBAL,20150401,2.9,3.05,2.9,3.05,16500,0 BGRENERGY,20150401,117.35,124,117.35,122.3,207979,0 BHAGYNAGAR,20150401,17.9,17.9,16.95,17.5,1136,0 BHARATFORG,20150401,1265.05,1333.1,1265.05,1322.6,704419,0 BHARATGEAR,20150401,73.5,77.7,72.7,75.9,13730,0 BHARATRAS,20150401,810,840,800,821.4,981,0 BHARTIARTL,20150401,393.3,404.85,393.05,402.3,5494883,0 BHEL,20150401,235.8,236,229.6,230.7,3346075,0 BHUSANSTL,20150401,65.15,67.9,63.65,64,1108540,0 BIL,20150401,401.3,422,401.3,419.35,2335,0 BILENERGY,20150401,0.8,0.95,0.8,0.95,8520,0 BINANIIND,20150401,90.55,93.95,90.2,93.3,27564,0 BINDALAGRO,20150401,23.4,23.4,22.25,22.8,111558,0 BIOCON,20150401,472.5,478.85,462.7,466.05,1942983,0 BIRLACORPN,20150401,415,420,402.8,414.7,11345,0 BIRLACOT,20150401,0.05,0.1,0.05,0.1,439292,0 BIRLAERIC,20150401,52.3,54.45,52.15,53.7,9454,0 BIRLAMONEY,20150401,24.35,28.85,23.9,28.65,78710,0 BLBLIMITED,20150401,3.7,3.7,3.65,3.65,550,0 BLISSGVS,20150401,128,132.55,124.3,126.15,261958,0 BLKASHYAP,20150401,13.7,15.15,13.7,14.15,118455,0 BLUEDART,20150401,7297.35,7315,7200,7285.55,2036,0 BLUESTARCO,20150401,308.75,315,302,311.35,19046,0 BLUESTINFO,20150401,199,199.9,196.05,199.45,1268,0 BODALCHEM,20150401,34.5,34.8,33.05,34.65,65623,0 BOMDYEING,20150401,64,66.3,63.7,65.95,1168851,0 BOSCHLTD,20150401,25488,25708,25201,25570.7,16121,0 BPCL,20150401,810.95,818,796.5,804.2,1065969,0 BPL,20150401,30.55,32.5,30.55,31.75,116804,0 BRFL,20150401,146,147.9,142.45,144.3,7257,0 BRIGADE,20150401,143.8,145.15,140.25,144.05,36484,0 BRITANNIA,20150401,2155.5,2215.3,2141.35,2177.55,245908,0 BROADCAST,20150401,3.35,3.5,3.3,3.3,4298,0 BROOKS,20150401,38.4,39.5,38.4,39.3,19724,0 BSELINFRA,20150401,1.9,2.15,1.85,2.05,97575,0 BSL,20150401,29.55,31.9,27.75,31,9708,0 BSLGOLDETF,20150401,2535,2535,2501.5,2501.5,122,0 
BSLIMITED,20150401,27.5,27.5,25.45,27.15,728818,0 BURNPUR,20150401,9.85,9.85,9.1,9.15,144864,0 BUTTERFLY,20150401,190.95,194,186.1,192.35,25447,0 BVCL,20150401,17.25,17.7,16.5,17.7,9993,0 CADILAHC,20150401,1755,1796.8,1737.05,1790.15,302149,0 CAIRN,20150401,213.85,215.6,211.5,213.35,841463,0 CAMLINFINE,20150401,89.5,91.4,87.5,91.1,32027,0 CANBK,20150401,366.5,383.8,365.15,381,1512605,0 CANDC,20150401,20.6,24.6,20.6,23.25,9100,0 CANFINHOME,20150401,611.1,649.95,611.1,644.7,72233,0 CANTABIL,20150401,47.6,50.5,47.6,50.25,5474,0 CAPF,20150401,398.85,427,398,421.75,224074,0 CAPLIPOINT,20150401,1020,1127.8,1020,1122.65,108731,0 CARBORUNIV,20150401,191.05,197,188.35,190,42681,0 CAREERP,20150401,151.9,156.6,149,153.25,26075,0 CARERATING,20150401,1487,1632.75,1464,1579.2,65340,0 CASTROLIND,20150401,476,476.25,465.1,467.3,185850,0 CCCL,20150401,4.2,4.7,4.2,4.65,47963,0 CCHHL,20150401,10.8,11,10.4,10.8,69325,0 CCL,20150401,178.35,185.9,176,184.3,244917,0 CEATLTD,20150401,805.25,830.8,785.75,826.7,501415,0 CEBBCO,20150401,18.3,20.25,18.1,19.85,40541,0 CELEBRITY,20150401,11.5,12.5,11.5,12.1,5169,0 CELESTIAL,20150401,59.9,61.8,59.5,60.05,128386,0 CENTENKA,20150401,152,159.9,148.2,157.1,16739,0 CENTEXT,20150401,1.5,1.5,1.2,1.25,19308,0 CENTRALBK,20150401,106,107.2,104.3,106.3,992782,0 CENTUM,20150401,756.85,805,756.8,801.9,26848,0 CENTURYPLY,20150401,234,245,234,243.45,367540,0 CENTURYTEX,20150401,633.6,682.4,631,675.35,3619413,0 CERA,20150401,2524.75,2524.75,2470,2495.3,6053,0 CEREBRAINT,20150401,15.6,16.2,14.65,14.8,348478,0 CESC,20150401,604.95,613.4,595.4,609.75,294334,0 CGCL,20150401,173,173,173,173,9,0 CHAMBLFERT,20150401,70.2,73.4,70.2,72.65,2475030,0 CHEMFALKAL,20150401,72.8,77,72,76.3,1334,0 CHENNPETRO,20150401,69,70.35,68.3,68.95,160576,0 CHESLINTEX,20150401,10.1,10.1,8.75,9.4,1668,0 CHOLAFIN,20150401,599.85,604,582.15,598.2,23125,0 CHROMATIC,20150401,3.4,4.05,3,3.3,63493,0 CIGNITITEC,20150401,433,444.95,432,440,32923,0 CIMMCO,20150401,92,94.05,91,94.05,19931,0 CINELINE,20150401,14.5,14.95,14.5,14.9,4654,0 CINEVISTA,20150401,3.3,3.3,3.3,3.3,10,0 CIPLA,20150401,714,716.5,703.85,709.6,1693796,0 CLASSIC,20150401,1.5,1.55,1.45,1.45,7770,0 CLNINDIA,20150401,824.7,837.9,819,828.8,6754,0 CLUTCHAUTO,20150401,13.75,13.75,13.6,13.6,1414,0 CMAHENDRA,20150401,9.35,9.5,8.9,9.15,1005172,0 CMC,20150401,1925.85,1925.85,1891,1907.25,153068,0 CNOVAPETRO,20150401,20,22.75,17.1,22.75,1656,0 COALINDIA,20150401,362.9,364.25,358,363,1428949,0 COLPAL,20150401,2003.4,2009.9,1990.05,2002.5,92909,0 COMPUSOFT,20150401,9.4,10.05,9,9.7,15083,0 CONCOR,20150401,1582.35,1627.3,1561,1582.85,182280,0 CONSOFINVT,20150401,36.55,40,36.5,40,439,0 CORDSCABLE,20150401,25.55,28,24.1,25.8,15651,0 COREEDUTEC,20150401,8,8.85,7.6,8.4,890455,0 COROMANDEL,20150401,268.5,271.35,266.15,268.35,42173,0 CORPBANK,20150401,52.5,55,52.05,54.1,1141752,0 COSMOFILMS,20150401,76.9,80,76.2,79.25,21020,0 COUNCODOS,20150401,1.2,1.2,1.2,1.2,2850,0 COX&KINGS,20150401,323,324.85,316.5,317.8,76998,0 CPSEETF,20150401,24.2,24.37,24.08,24.34,180315,0 CREATIVEYE,20150401,3.4,3.6,2.8,3.45,8545,0 CRISIL,20150401,2049,2052.45,2000,2030.7,3928,0 CROMPGREAV,20150401,164.85,167.4,163.2,166.1,2739478,0 CTE,20150401,18.55,18.55,16.85,17.05,8260,0 CUB,20150401,97.35,98.75,96.4,98.3,182702,0 CUMMINSIND,20150401,879,900.95,874.75,889.9,358652,0 CURATECH,20150401,10.8,11,9.75,10,755,0 CYBERTECH,20150401,28.5,33.45,28.1,33.4,103549,0 CYIENT,20150401,509.9,515,495.1,514.1,30415,0 DAAWAT,20150401,105,112.25,99.5,108.4,26689,0 DABUR,20150401,266.5,268.5,264.65,266.55,642177,0 
DALMIABHA,20150401,428.15,439.9,422.5,432.65,9751,0 DALMIASUG,20150401,17.5,17.5,16.45,17.15,12660,0 DATAMATICS,20150401,66.5,75,66,72.15,119054,0 DBCORP,20150401,378,378,362.6,369.45,8799,0 DBREALTY,20150401,67,67.15,65.8,66.3,212297,0 DBSTOCKBRO,20150401,47.6,47.65,47.45,47.55,24170,0 DCBBANK,20150401,110.95,114.95,110.15,114.45,935858,0 DCM,20150401,84.5,88.75,84.1,87,34747,0 DCMSHRIRAM,20150401,107.95,114.3,107.95,112.8,29474,0 DCW,20150401,16.75,17.2,16.65,17.15,270502,0 DECCANCE,20150401,310.05,323.9,310.05,321.55,446,0 DECOLIGHT,20150401,1.45,1.45,1.4,1.4,1100,0 DEEPAKFERT,20150401,140,144,138.25,139.95,162156,0 DEEPAKNTR,20150401,68,70.65,66.4,69.95,8349,0 DEEPIND,20150401,46.6,54.4,46.3,51.9,52130,0 DELTACORP,20150401,79.95,82.75,79.75,82.35,889247,0 DELTAMAGNT,20150401,36.6,37.45,36.6,37.45,60,0 DEN,20150401,121.45,127,121.2,122.4,59512,0 DENABANK,20150401,50.8,51.5,50.1,51.35,376680,0 DENORA,20150401,136.7,136.7,131.05,133.6,743,0 DHAMPURSUG,20150401,36.8,36.95,34.85,36.35,38083,0 DHANBANK,20150401,30.8,32.1,30.5,31.75,195779,0 DHANUKA,20150401,690,690,652,660.15,24958,0 DHARSUGAR,20150401,14.15,14.7,13.8,14.45,1748,0 DHFL,20150401,468.9,474.9,461.6,467.85,448551,0 DHUNINV,20150401,97.15,103,94.5,99.85,15275,0 DIAPOWER,20150401,44.9,45.95,43.3,45.55,126085,0 DICIND,20150401,343,347,341,341.95,7745,0 DIGJAM,20150401,8,8.15,7.75,8.05,96467,0 DISHMAN,20150401,168,172.65,164.7,171.8,778414,0 DISHTV,20150401,82.2,84.85,81.35,84.15,5845850,0 DIVISLAB,20150401,1770.1,1809,1770.1,1802.35,68003,0 DLF,20150401,157,160.9,156.2,159.7,3098216,0 DLINKINDIA,20150401,165.05,168,162.2,164.75,22444,0 DOLPHINOFF,20150401,120.8,134.4,119.5,130.2,190716,0 DONEAR,20150401,15,15.95,14.5,15.35,679,0 DPL,20150401,46.6,49,44,45.45,25444,0 DPSCLTD,20150401,17.15,17.15,16.55,16.85,916,0 DQE,20150401,24.3,24.8,22.75,23.1,57807,0 DRDATSONS,20150401,5.8,6.1,5.7,6,2191357,0 DREDGECORP,20150401,374.9,403,372.65,393.4,106853,0 DRREDDY,20150401,3541,3566.8,3501.7,3533.65,282785,0 DSKULKARNI,20150401,77.6,77.6,74,77.1,3012,0 DSSL,20150401,9.5,9.5,9.5,9.5,50,0 DTIL,20150401,206.95,231.75,205.95,219.05,1437,0 DUNCANSLTD,20150401,15.55,16.3,15.3,15.85,740,0 DWARKESH,20150401,21,21,19.85,20.7,9410,0 DYNAMATECH,20150401,3868,4233,3857.1,3920.55,59412,0 DYNATECH,20150401,2.85,3,2.85,3,3002,0 EASTSILK,20150401,1.55,1.85,1.55,1.75,9437,0 EASUNREYRL,20150401,40.05,43,40.05,42.55,21925,0 ECEIND,20150401,136,148,127,133.85,43034,0 ECLERX,20150401,1603.8,1697,1595,1600.65,123468,0 EDELWEISS,20150401,63.65,67.5,63,66.6,451255,0 EDL,20150401,23.9,25,23.9,24.4,7799,0 EDUCOMP,20150401,12.45,13.55,12.35,13.55,499009,0 EICHERMOT,20150401,15929,16196.95,15830.05,16019.5,45879,0 EIDPARRY,20150401,174.05,175.8,168.65,171.2,56813,0 EIHAHOTELS,20150401,228,232.8,225,228,85,0 EIHOTEL,20150401,107.25,110,107.25,109.5,57306,0 EIMCOELECO,20150401,399,409.5,399,409.5,184,0 EKC,20150401,9.35,11.15,9.35,11.05,350782,0 ELAND,20150401,14.3,16.45,14.3,16.25,191406,0 ELDERPHARM,20150401,90.5,91.5,89.45,91.5,23450,0 ELECON,20150401,66.5,76.2,66.25,74.45,6045416,0 ELECTCAST,20150401,19.8,20.55,18.9,19.4,1956889,0 ELECTHERM,20150401,25.9,25.9,22.2,24,14611,0 ELGIEQUIP,20150401,147.5,150.4,146.4,150,9475,0 .... ZENITH, 20150401,...
I use EdChum's code from his comment and add some clarification. I think the main problem is that d is the output dataframe built from all the *.txt files, so it cannot be looped over in a for cycle if you need one output from all of them.

import pandas as pd
import glob

def get_historical_closes(ticker, start_date, end_date):
    dfs = []
    # create an empty df for the output
    d = pd.DataFrame()
    # glob can use a path with *.txt - see http://stackoverflow.com/a/3215392/2901002
    for files in glob.glob('/home/manish/Desktop/Equity/subset/*.txt'):
        # added index_col for a multiindex df
        dfs.append(pd.read_csv(files,
                               index_col=['Date', 'Ticker', 'Close'],
                               names=['Ticker', 'Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Null'],
                               parse_dates=[1]))
    p = pd.concat(dfs)
    # d is the output from all .txt files, so it cannot be looped in a for cycle
    d = p.reset_index(['Date', 'Ticker', 'Close'])
    d = d[(d['Ticker'].isin(ticker)) & (d['Date'] > start_date) & (d['Date'] < end_date)]
    pivoted = d.pivot_table(index=['Date'], columns=['Ticker'])
    pivoted.columns = pivoted.columns.droplevel(0)
    return pivoted

# isin needs a list of values, so 'or' can be replaced by ','
# arguments changed for testing: 'HDFC' to 'AGCNET' and end_date '2014-12-31' to '2015-12-31'
closes = get_historical_closes(['LT', 'AGCNET', 'ACC'], '1999-01-01', '2015-12-31')
print(closes)
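One hedged tweak to the answer above: pivot_table with no values argument pivots every remaining numeric column (Open, High, Low, Close, Volume, Null), so after droplevel(0) each ticker name repeats once per value column. If only closing prices are wanted, restricting the pivot inside the function keeps exactly one column per ticker:

# inside get_historical_closes, instead of the two pivot lines:
pivoted = d.pivot_table(index='Date', columns='Ticker', values='Close')
return pivoted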