How to customise my CSV file using Python instead of an Excel macro

I want to store my Python code's result in a CSV file, but with my code below the result does not show up in the CSV file.
I have converted the macro VB file to Python... Any advice would be much appreciated, because I am new to this.
*Unable to enter the full code due to a site error.
Please find my code below:
import pandas as pd
import csv
import numpy as np
from vb2py.vbfunctions import *
from vb2py.vbdebug import *
def My_custom_MACRO():
    #
    # My_custom_MACRO Macro
    #
    #
    Range('A1:A2').Select()
    Range('A2').Activate()
    Columns('A:A').EntireColumn.AutoFit()
    Columns('G:G').Select()
    Selection.Insert(Shift=xlToRight, CopyOrigin=xlFormatFromLeftOrAbove)
    Columns('P:P').EntireColumn.AutoFit()
    Columns('P:P').Select()
    Selection.Cut(Destination=Columns('G:G'))
    Range('G53').Select()
    ActiveWindow.SmallScroll(Down=-45)
    Range('G1').Select()
    ActiveCell.FormulaR1C1 = 'Live Deli'
    ActiveWindow.SmallScroll(Down=-9)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=7, Criteria1='>50', Operator=xlAnd)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=4, Criteria1='>50', Operator=xlAnd)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=7, Criteria1='>50', Operator=xlAnd)
    ActiveWindow.SmallScroll(Down=0)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=4, Criteria1='>50%', Operator=xlAnd)

df_1 = pd.read_csv(r'D:\proj\project.csv', My_custom_MACRO)
df_1.to_csv(r'D:\proj\project_output.csv')

I don't see a sort in your macro, just a column move and a filter. Try this:
import pandas as pd
df = pd.read_csv(r'c:\temp\project.csv')
cols = df.columns
# cut column P and paste it at column G as 'Live Deli'
colP = df.pop(cols[14])
df.insert(14, '', '')
df.insert(6, 'Live Deli', colP)
# apply the filter to columns 4 and 7
df1 = df.loc[(df[cols[3]] > 0.5) & (df['Live Deli'] > 50)]
# save
df1.to_csv(r'c:\temp\project_output.csv', index=False)

Related

How to write a pandas DataFrame in LibreOffice Calc using Python?

I am using the following code on Ubuntu 20:
import pyoo
import os
import uno
import pandas as pd
os.system("/usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nofirststartwizard --nologo --norestore --accept='socket,host=localhost,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext'")
df = pd.DataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
desktop = pyoo.Desktop('localhost', 2002)
doc = desktop.open_spreadsheet("/home/vivek/Documents/Libre python trial/oi_data.ods")
sh1=doc.sheets['oi_data']
sh1[1,4].value=df
doc.save()
It gives all data in a single cell as a string:
'Name age0 Anil 321 Raju 342 Arun 45'
I want to write the DataFrame into LibreOffice Calc in the columns and rows of the sheet, like this:
Name age
0 Anil 32
1 Raju 34
2 Arun 45
Example code used with xlwings on Windows, just for reference (I want to achieve the same with simple code in LibreOffice Calc on Ubuntu/Linux, if possible):
import pandas as pd
import xlwings as xlw
# Connecting with excel workbook
file=xlw.Book("data.xlsx")
# connection with excel sheet
sh1=file.sheets('sheet1')
df=pd.DataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
sh1.range('A4').value=df
From the pyoo documentation, a range of values is set with a list of lists.
sheet[1:3,0:2].values = [[3, 4], [5, 6]]
To get a list of lists from a DataFrame, the following code is recommended in How to convert a Python Dataframe to List of Lists?:
lst = [df[i].tolist() for i in df.columns]
EDIT:
Write a function called insertDf() that does the two things above, calculating the required indices.
Then instead of sh1.range('A4').value=df, write insertDf(df,'A4',sh1).
Or perhaps more elegant is to create a class called CalcDataFrame that extends pandas.DataFrame to add a method called writeCells(); a sketch follows the example below.
Also, it would be easier to write location arguments as (row number, column number) instead of a combined 'column letter & row number' string.
df = CalcDataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
df.writeCells(sh1,1,4)
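A minimal sketch of such a class, assuming only the list-of-lists slice assignment shown above; writeCells() takes 0-based (row, column) indices as suggested:
import pandas as pd

class CalcDataFrame(pd.DataFrame):
    """Sketch only: a DataFrame that can write itself into a pyoo sheet."""

    def writeCells(self, sheet, row, col):
        # Header row first, then the data rows, as one list of lists
        data = [list(self.columns)] + self.values.tolist()
        # Relies on pyoo's 2D slice assignment: sheet[rows, cols].values = list_of_lists
        sheet[row:row + len(data), col:col + len(self.columns)].values = data

With this, df.writeCells(sh1, 1, 4) writes the header row at row 1, column 4 and places the data rows directly below it.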

Reading a CSV, saving it in a DataFrame, applying an if condition, not getting the expected result

I'm reading a CSV, saving it into a DataFrame, and using an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now().strftime('%m/%d/%Y')
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last" )
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
    print(stockRules)
My CSV file is below:
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
Notice that when using pd.to_datetime I am assigning the result of the operation back to the original column, overwriting whatever was in it before.
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading the data you provided in your question. It might be that you will have to specify another delimiter; for tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")

Count occurrences of a number from a specific column in Python

I am trying to do the equivalent of Excel's COUNTIF() function. I am stuck on how to tell the .count() function to read from a specific column.
I have
df = pd.read_csv('testdata.csv')
df.count('1')
but this does not work, and even if it did it is not specific enough.
I am thinking I may have to use read_csv to read specific columns individually.
Example:
Column name
4
4
3
2
4
1
The function would output that there is one '1', and I could run it again and find out that there are three '4' answers, etc.
I got it to work! Thank you
I used:
print(df.col.value_counts().loc['x'])
Here is an example of a simple 'countif' recipe you could try:
import pandas as pd
def countif(rng, criteria):
    return rng.eq(criteria).sum()
Example use:
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)
If all else fails, why not try something like this?
import numpy as np
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counters = {}
for i in range(len(df)):
    if df.iloc[i]["col1"] in counters:
        counters[df.iloc[i]["col1"]] += 1
    else:
        counters[df.iloc[i]["col1"]] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()

How to use Pandas within Python to create a Fiscal Year Column?

I have Python code that cleans up a .csv before I append it to another data set. It is missing a couple of columns, so I have been trying to figure out how to use pandas to add the columns and fill the rows.
I currently have a column DiscoveredDate in a format of 10/1/2017 12:49.
What I'm trying to do is take that column, and for anything in the date range 10/1/2016-10/1/2017, fill a FedFY column with 2017, and likewise for 2018.
Below is my current script minus a few different column cleanups.
import os
import re
import pandas as pd
import Tkinter
import numpy as np
outpath = os.path.join(os.getcwd(), "CSV Altered")
# TK asks user what file to assimilate
from Tkinter import Tk
from tkFileDialog import askopenfilename
Tk().withdraw()
filepath = askopenfilename() # show an "Open" dialog box and return the path to the selected file
#Filepath is acknowledged and disseminated with the following totally human protocols
filenames = os.path.basename(filepath)
filename = [filenames]
for f in filename:
    name = f
    df = pd.read_csv(f)
    # Make Longitude values negative if they aren't already.
    df['Longitude'] = -df['Longitude'].abs()
    # Add Federal Fiscal Year Field (FedFY)
    df['FedFY'] = df['DiscoveredDate']
    df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00,10/1/2017 1:00): "2017",df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})
I also tried this but figured I was completely fudging it up.
for rows in df['FedFY']:
    if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
        then df['FedFY'] = df['FedFY'].replace({rows : "2017"})
    elif df['FedFY'] = df['FedFY'].replace({rows : "2018"})
How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?
[Edited for clarity in title and throughout.]
OK, thanks to DyZ I am making progress; however, I figured out a much simpler way to do it that covers all years.
Building on his np.where, I did:
from datetime import datetime
df['Date'] = pd.to_datetime(df['DiscoveredDate'])
df['CalendarYear'] = df['Date'].dt.year
df['Month'] = df.Date.dt.month
c = pd.to_numeric(df['CalendarYear'])
And here is the magic line.
df['FedFY'] = np.where(df['Month'] >= 10, c+1, c)
To mop up, I added a line to convert it back from numeric via a datetime format.
df['FedFY'] = (pd.to_datetime(df['FedFY'], format = '%Y')).dt.year
This is what really got me across the bridge: Create a column based off a conditional with pandas.
Edit: I forgot to mention that you need to import datetime for the .dt stuff.
If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:
df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &\
(df.DiscoveredDate > pd.to_datetime("10/1/2016")),
2017, 2018)
Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where, as sketched below.)
Make sure that the start/end dates are correctly included or not included (change < and/or > to <= and >=, if necessary).
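A minimal sketch of that nested np.where, assuming df['DiscoveredDate'] has already been parsed with pd.to_datetime (the 2016 label is itself only correct back to 10/1/2015):
import numpy as np
import pandas as pd

# Sketch only: nest a second np.where so dates before 10/1/2016 get their own label
fy17_start = pd.to_datetime("10/1/2016")
fy18_start = pd.to_datetime("10/1/2017")
df["FedFY"] = np.where(df.DiscoveredDate < fy17_start, 2016,
                       np.where(df.DiscoveredDate < fy18_start, 2017, 2018))

For more than a couple of fiscal years, the month-based np.where from the earlier answer scales better.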

Convert pandas.TimeSeries to R.ts

I have some pandas TimeSeries with date index:
import pandas as pd
import numpy as np
pandas_ts = pd.TimeSeries(np.random.randn(100),pd.date_range(start='2000-01-01', periods=100))
I need to convert it to an R TS (like the sunspots dataset) to call some R function (stl) with my TS, which works only with time series. But I found that the pandas.rpy and rpy2 APIs only support DataFrames. Is there another way to do this?
If there is no such way, I can convert the TS to a DataFrame in Python, then convert it to an R DF and convert that to a TS in R, but I have some trouble with the last step because I'm new to R.
Any ideas or help in converting in R? =)
I am not proficient with pandas, but you can save your pandas time series to a CSV file and read it from R.
Python:
## write data
with open(PATH_CSV_FILE, "w") as file:
    pandas_ts.to_csv(file)
## read data
with open(PATH_CSV_FILE, "r") as file:
    pandas_ts.from_csv(file)
R:
library(xts)
## to read data
ts.xts <- read.zoo(PATH_CSV_FILE,index=0)
## to save data
write.zoo(ts.xts,PATH_CSV_FILE)
The easiest might just be to use the R function ts() in a call corresponding to your pandas.date_range() call.
from rpy2.robjects.packages import importr
stats = importr('stats')
from rpy2.robjects.vectors import IntVector
# The time series created in the question is:
# pd.date_range(start='2000-01-01', periods=100)
stats.ts(IntVector(range(100)), start=IntVector((2000, 1, 1)))
Inspired by the answers given here already, I created a small function for converting an existing pandas time series to an R time series. It might be useful to more of you. Feel free to further improve and edit my contribution.
def pd_ts2r_ts(pd_ts):
    '''Pandas timeseries (pd_ts) to R timeseries (r_ts) conversion
    '''
    import rpy2.robjects.packages as rpackages
    from rpy2.robjects.vectors import IntVector, FloatVector
    rstats = rpackages.importr('stats')
    r_start = IntVector((pd_ts.index[0].year, pd_ts.index[0].month, pd_ts.index[0].day))
    r_end = IntVector((pd_ts.index[-1].year, pd_ts.index[-1].month, pd_ts.index[-1].day))
    freq_pandas2r_ts = {
        # A dictionary for converting pandas.Series frequencies into R ts frequencies
        'D': 365,  # is this correct, how about leap-years?
        'M': 12,
        'Y': 1,
    }
    r_freq = freq_pandas2r_ts[pd_ts.index.freqstr]
    result = rstats.ts(FloatVector(pd_ts.values), start=r_start, end=r_end, frequency=r_freq)
    return result
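A hypothetical usage example, assuming a monthly series so that index.freqstr maps to 'M' in the dictionary above:
import numpy as np
import pandas as pd

# Hypothetical monthly series; its freqstr 'M' maps to frequency 12 above
monthly = pd.Series(np.random.randn(24),
                    index=pd.date_range(start='2000-01-01', periods=24, freq='M'))
r_ts = pd_ts2r_ts(monthly)
print(r_ts)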
