I want to store my Python code's result in a CSV file, but with the code below the result does not show up in my CSV file.
I converted the macro from a VB file to Python... Any advice would be appreciated, because I am new to this.
*Unable to enter the full code due to a site error.
Please find my code below:
import pandas as pd
import csv
import numpy as np
from vb2py.vbfunctions import *
from vb2py.vbdebug import *
def My_custom_MACRO():
    #
    # My_custom_MACRO Macro
    #
    Range('A1:A2').Select()
    Range('A2').Activate()
    Columns('A:A').EntireColumn.AutoFit()
    Columns('G:G').Select()
    Selection.Insert(Shift=xlToRight, CopyOrigin=xlFormatFromLeftOrAbove)
    Columns('P:P').EntireColumn.AutoFit()
    Columns('P:P').Select()
    Selection.Cut(Destination=Columns('G:G'))
    Range('G53').Select()
    ActiveWindow.SmallScroll(Down=-45)
    Range('G1').Select()
    ActiveCell.FormulaR1C1 = 'Live Deli'
    ActiveWindow.SmallScroll(Down=-9)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=7, Criteria1='>50', Operator=xlAnd)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=4, Criteria1='>50', Operator=xlAnd)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=7, Criteria1='>50', Operator=xlAnd)
    ActiveWindow.SmallScroll(Down=0)
    ActiveSheet.Range('$A$1:$N$201').AutoFilter(Field=4, Criteria1='>50%', Operator=xlAnd)
df_1=pd.read_csv(r'D:\proj\project.csv',My_custom_MACRO)
df_1.to_csv(r'D:\proj\project_output.csv')
I don't see a sort in your macro, just a column move and a filter. Try this:
import pandas as pd

df = pd.read_csv(r'c:\temp\project.csv')
cols = df.columns
# cut column P and paste it at column G as 'Live Deli'
colP = df.pop(cols[14])
df.insert(14, '', '')            # leave an empty placeholder where P was
df.insert(6, 'Live Deli', colP)
# apply the filters to columns 4 and 7
df1 = df.loc[(df[cols[3]] > 0.5) & (df['Live Deli'] > 50)]
# save
df1.to_csv(r'c:\temp\project_output.csv', index=False)
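To sanity-check the pop/insert pattern used above on a tiny frame (column names here are made up for illustration):

```python
import pandas as pd

# Three toy columns; pop() removes a column and returns it as a Series
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
moved = df.pop('C')
# Re-insert the popped column at the front under a new name
df.insert(0, 'C_moved', moved)
print(list(df.columns))  # ['C_moved', 'A', 'B']
```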
I am using the following code on Ubuntu 20:
import pyoo
import os
import uno
import pandas as pd
os.system("/usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nofirststartwizard --nologo --norestore --accept='socket,host=localhost,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext'")
df=pd.DataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
desktop = pyoo.Desktop('localhost', 2002)
doc = desktop.open_spreadsheet("/home/vivek/Documents/Libre python trial/oi_data.ods")
sh1=doc.sheets['oi_data']
sh1[1,4].value=df
doc.save()
It puts all the data in a single cell as one string:
'Name age0 Anil 321 Raju 342 Arun 45'
I want to write the DataFrame into LibreOffice Calc across the rows and columns of the sheet, like this:
Name age
0 Anil 32
1 Raju 34
2 Arun 45
Example code using xlwings on Windows, just for reference (I want to achieve the same with similarly simple code in LibreOffice Calc on Ubuntu/Linux, if possible):
import pandas as pd
import xlwings as xlw
# Connecting with excel workbook
file=xlw.Book("data.xlsx")
# connection with excel sheet
sh1=file.sheets('sheet1')
df=pd.DataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
sh1.range('A4').value=df
From the pyoo documentation, a range of values is set with a list of lists.
sheet[1:3,0:2].values = [[3, 4], [5, 6]]
To get a list of lists from a dataframe, the following code is recommended in "How to convert a Python Dataframe to List of Lists?":
lst = [df[i].tolist() for i in df.columns]
EDIT:
Write a function called insertDf() that does the two things above, calculating the required indices.
Then instead of sh1.range('A4').value=df, write insertDf(df,'A4',sh1).
Or perhaps more elegant is to create a class called CalcDataFrame that extends pandas.DataFrame to add a method called writeCells().
Also, it would be easier to pass the location as (row number, column number) arguments instead of a combined 'column letter & row number' string:
df = CalcDataFrame()
df['Name']=['Anil','Raju','Arun']
df['Age']=['32','34','45']
df.writeCells(sh1,1,4)
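A minimal sketch of the insertDf() idea, assuming pyoo's slice-assignment API (sheet[rows, cols].values = list_of_lists) and that a range takes a row-wise list of lists, so df.values.tolist() is used here rather than the column-wise list above. The 'A1'-style address parsing is a hypothetical helper, not part of pyoo:

```python
import pandas as pd

def insertDf(df, address, sheet):
    """Write df (header row + data rows) into a pyoo sheet starting at an
    'A1'-style address. Sketch only; insertDf is not part of pyoo itself."""
    letters = ''.join(ch for ch in address if ch.isalpha()).upper()
    row = int(''.join(ch for ch in address if ch.isdigit())) - 1
    col = 0
    for ch in letters:                  # handle multi-letter columns like 'AB'
        col = col * 26 + (ord(ch) - ord('A') + 1)
    col -= 1
    values = [list(df.columns)] + df.values.tolist()  # row-wise list of lists
    sheet[row:row + len(values), col:col + len(df.columns)].values = values
```

Then, instead of sh1.range('A4').value=df as in xlwings, you would call insertDf(df, 'A4', sh1).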
I'm reading a CSV file, saving it into a dataframe, and using an if condition, but I'm not getting the expected result.
My Python code is below:
import pandas as pd
import numpy as np
import datetime
import operator
from datetime import datetime
dt = datetime.now().strftime('%m/%d/%Y')
stockRules = pd.read_csv("C:\stock_rules.csv", dtype={"Product Currently Out of Stock": str}).drop_duplicates(subset="Product Currently Out of Stock", keep="last")
pd.to_datetime(stockRules['FROMMONTH'], format='%m/%d/%Y')
pd.to_datetime(stockRules['TOMONTH'], format='%m/%d/%Y')
if stockRules['FROMMONTH'] <= dt and stockRules['TOMONTH'] >= dt:
    print(stockRules)
My CSV file is below:
Productno FROMMONTH TOMONTH
120041 2/1/2019 5/30/2019
112940 2/1/2019 5/30/2019
121700 2/1/2019 2/1/2019
I want to read the CSV file and print only the product numbers that meet the condition.
I played around with the code a bit and simplified it somewhat, but the idea behind the selection should still work the same:
dt = datetime.now().strftime("%m/%d/%Y")
stockRules = pd.read_csv("data.csv", delimiter=";")
stockRules["FROMMONTH"] = pd.to_datetime(stockRules["FROMMONTH"], format="%m/%d/%Y")
stockRules["TOMONTH"] = pd.to_datetime(stockRules["TOMONTH"], format="%m/%d/%Y")
sub = stockRules[(stockRules["FROMMONTH"] <= dt) & (dt <= stockRules["TOMONTH"])]
print(sub["Productno"])
Notice that when using pd.to_datetime I assign the result of the operation back to the original column, overriding whatever was in it before.
Hope this helps.
EDIT:
For my tests I changed the CSV to use ; as the delimiter, since I had trouble reading in the data you provided in your question. You may have to specify another delimiter; for tabs, for example:
stockRules = pd.read_csv("data.csv", delimiter="\t")
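A self-contained version of that selection using the sample rows from the question, with a pinned date instead of datetime.now() so the output is reproducible:

```python
import pandas as pd

# Sample rows from the question's CSV
data = pd.DataFrame({
    'Productno': [120041, 112940, 121700],
    'FROMMONTH': ['2/1/2019', '2/1/2019', '2/1/2019'],
    'TOMONTH':   ['5/30/2019', '5/30/2019', '2/1/2019'],
})
data['FROMMONTH'] = pd.to_datetime(data['FROMMONTH'], format='%m/%d/%Y')
data['TOMONTH'] = pd.to_datetime(data['TOMONTH'], format='%m/%d/%Y')
today = pd.Timestamp('2019-03-15')  # pinned date; use pd.Timestamp.now() in practice
hits = data.loc[(data['FROMMONTH'] <= today) & (today <= data['TOMONTH']), 'Productno']
print(hits.tolist())  # [120041, 112940]
```

Product 121700 is excluded because its range ends on 2/1/2019, before the pinned date.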
I am trying to do the equivalent of Excel's COUNTIF() function. I am stuck on how to tell the .count() function to read from a specific column.
I have
df = pd.read_csv('testdata.csv')
df.count('1')
but this does not work, and even if it did, it would not be specific enough.
I am thinking I may have to use read_csv to read specific columns individually.
Example:
Column name
4
4
3
2
4
1
the function would output that there is one '1', and I could run it again and find that there are three '4' answers, etc.
I got it to work! Thank you
I used:
print(df.col.value_counts().loc['x'])
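A runnable version of that value_counts() approach on the sample column from the question (the column name 'col' is made up here):

```python
import pandas as pd

df = pd.DataFrame({'col': [4, 4, 3, 2, 4, 1]})
counts = df['col'].value_counts()  # Series mapping each value to its count
print(counts.loc[1])  # 1 -> there is one '1'
print(counts.loc[4])  # 3 -> there are three '4's
```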
Here is an example of a simple 'countif' recipe you could try:
import pandas as pd
def countif(rng, criteria):
    return rng.eq(criteria).sum()
Example use
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)
If all else fails, why not try something like this?
import numpy as np
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counters = {}
for i in range(len(df)):
    if df.iloc[i]["col1"] in counters:
        counters[df.iloc[i]["col1"]] += 1
    else:
        counters[df.iloc[i]["col1"]] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()
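The counting loop above is essentially what collections.Counter does; an equivalent shorter version on the question's sample values (the plotting part would be unchanged):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'col1': [4, 4, 3, 2, 4, 1]})
counters = Counter(df['col1'])  # iterating the column yields its values
print(counters[4])  # 3
print(counters[1])  # 1
```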
I have a Python script that cleans up a .csv before I append it to another data set. The data is missing a couple of columns, so I have been trying to figure out how to use pandas to add the columns and fill the rows.
I currently have a column DiscoveredDate in a format like 10/1/2017 12:49.
What I'm trying to do is take that column and, for anything in the date range 10/1/2016 to 10/1/2017, fill a FedFY column with 2017, and likewise for 2018.
Below is my current script minus a few different column cleanups.
import os
import re
import pandas as pd
import Tkinter
import numpy as np
outpath = os.path.join(os.getcwd(), "CSV Altered")
# TK asks user what file to assimilate
from Tkinter import Tk
from tkFileDialog import askopenfilename
Tk().withdraw()
filepath = askopenfilename() # show an "Open" dialog box and return the path to the selected file
#Filepath is acknowledged and disseminated with the following totally human protocols
filenames = os.path.basename(filepath)
filename = [filenames]
for f in filename:
    name = f
    df = pd.read_csv(f)
    # Make Longitude values negative if they aren't already.
    df['Longitude'] = -df['Longitude'].abs()
    # Add Federal Fiscal Year Field (FedFY)
    df['FedFY'] = df['DiscoveredDate']
    df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00): "2017", df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})
I also tried this but figured I was completely fudging it up.
for rows in df['FedFY']:
    if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
        then df['FedFY'] = df['FedFY'].replace({rows : "2017"})
    elif df['FedFY'] = df['FedFY'].replace({rows : "2018"})
How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?
[Edited for clarity in title and throughout.]
OK, thanks to DyZ I am making progress; however, I figured out a much simpler way to do it that handles all years.
Building on his np.where, I did:
from datetime import datetime
df['Date'] = pd.to_datetime(df['DiscoveredDate'])
df['CalendarYear'] = df['Date'].dt.year
df['Month'] = df.Date.dt.month
c = pd.to_numeric(df['CalendarYear'])
And here is the magic line.
df['FedFY'] = np.where(df['Month'] >= 10, c+1, c)
To mop up, I added a line to get it back into a year from the numeric value:
df['FedFY'] = (pd.to_datetime(df['FedFY'], format = '%Y')).dt.year
This is what really got me across the bridge: "Create a column based off a conditional with pandas".
Edit: Forgot to mention to import datetime for the .dt stuff.
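Putting those steps together into one self-contained snippet (the sample DiscoveredDate values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'DiscoveredDate': ['10/1/2017 12:49', '9/30/2017 8:00', '11/15/2016 10:00']})
d = pd.to_datetime(df['DiscoveredDate'])
# Federal fiscal year: October and later belong to the next calendar year
df['FedFY'] = np.where(d.dt.month >= 10, d.dt.year + 1, d.dt.year)
print(df['FedFY'].tolist())  # [2018, 2017, 2017]
```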
If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:
df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &\
(df.DiscoveredDate > pd.to_datetime("10/1/2016")),
2017, 2018)
Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where).
Make sure that the start/end dates are correctly included or not included (change < and/or > to <= and >=, if necessary).
I have a pandas TimeSeries with a date index:
import pandas as pd
import numpy as np
pandas_ts = pd.TimeSeries(np.random.randn(100),pd.date_range(start='2000-01-01', periods=100))
I need to convert it to an R ts (like the sunspots dataset) to call an R function (stl) that works only with time series. But I found that the pandas.rpy and rpy2 APIs only support DataFrames. Is there another way to do this?
If not, I can convert the TS to a DataFrame in Python, convert that to an R data frame, and then convert it to a ts in R, but I have trouble with the last step because I'm new to R.
Any ideas or help with the conversion in R? =)
I am not proficient in pandas, but you can save your pandas time series to a CSV file and read it from R.
Python:
## write data
with open(PATH_CSV_FILE, "w") as file:
    pandas_ts.to_csv(file)
## read data
with open(PATH_CSV_FILE, "r") as file:
    pandas_ts = pd.Series.from_csv(file)
R:
library(xts)
## to read data
ts.xts <- read.zoo(PATH_CSV_FILE,index=0)
## to save data
write.zoo(ts.xts,PATH_CSV_FILE)
The easiest might just be to use the R function ts() in a call corresponding to your pandas.date_range() call.
from rpy2.robjects.packages import importr
stats = importr('stats')
from rpy2.robjects.vectors import IntVector
# The time series created in the question is:
# pd.date_range(start='2000-01-01', periods=100)
stats.ts(IntVector(range(100)), start=IntVector((2000, 1, 1)))
Inspired by the answers already given here, I created a small function for converting an existing pandas time series to an R time series. It might be useful to more of you. Feel free to improve and edit my contribution.
def pd_ts2r_ts(pd_ts):
    '''Pandas timeseries (pd_ts) to R timeseries (r_ts) conversion'''
    from rpy2.robjects.vectors import IntVector, FloatVector
    import rpy2.robjects.packages as rpackages  # needed for importr below
    rstats = rpackages.importr('stats')
    r_start = IntVector((pd_ts.index[0].year, pd_ts.index[0].month, pd_ts.index[0].day))
    r_end = IntVector((pd_ts.index[-1].year, pd_ts.index[-1].month, pd_ts.index[-1].day))
    # A dictionary for converting pandas.Series frequencies into R ts frequencies
    freq_pandas2r_ts = {
        'D': 365,  # is this correct? what about leap years?
        'M': 12,
        'Y': 1,
    }
    r_freq = freq_pandas2r_ts[pd_ts.index.freqstr]
    result = rstats.ts(FloatVector(pd_ts.values), start=r_start, end=r_end, frequency=r_freq)
    return result