How to concatenate file names in Python - python

I'd like to include the month and year in my file name, based on the month submitted.
I defined variables to build the file name:
MTH = '20' + df['Month'].astype(str) + '/01'
MONTH = pd.to_datetime(MTH).dt.month
YEAR = pd.to_datetime(MTH).dt.year
Saving the file:
df.to_excel(r'C:\OUTPUT\SALES_' + MONTH + '_' + YEAR + '.xlsx', index=False)
Error:
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\generic.py", line 2284, in to_excel
formatter.write(
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\formats\excel.py", line 834, in write
writer = ExcelWriter( # type: ignore[abstract]
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\excel\_xlsxwriter.py", line 191, in __init__
super().__init__(
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\excel\_base.py", line 925, in __init__
self.handles = get_handle(
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\common.py", line 608, in get_handle
ioargs = _get_filepath_or_buffer(
File "C:\SR\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\common.py", line 395, in _get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'pandas.core.series.Series'>
PS C:\SR\Desktop\Python>

You probably mean:
df.to_excel(r'C:\OUTPUT\SALES_' + str(MONTH.item()) + '_' + str(YEAR.item()) + '.xlsx', index=False)
Or:
df.to_excel(r'C:\OUTPUT\SALES_' + str(MONTH[0]) + '_' + str(YEAR[0]) + '.xlsx', index=False)
MONTH and YEAR are Series of integers, so they must be reduced to a single value and converted with str() before joining them into the path; note that .item() works only when the Series holds exactly one element.
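As an illustration, a single-element Series can be turned into a file name with an f-string. This is a minimal sketch assuming MONTH and YEAR each hold exactly one value (the sample values are made up):

```python
import pandas as pd

# hypothetical single-row result of the pd.to_datetime steps above
MONTH = pd.Series([7])
YEAR = pd.Series([2021])

# .iloc[0] pulls the scalar out of each Series, and the f-string
# converts the ints to text; :02d zero-pads the month
filename = rf"C:\OUTPUT\SALES_{MONTH.iloc[0]:02d}_{YEAR.iloc[0]}.xlsx"
# filename == r"C:\OUTPUT\SALES_07_2021.xlsx"
```

The f-string also sidesteps the original TypeError from concatenating str and int.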

Related

Python script works but throws error - pandas.errors tokenizing data , Expected 9 fields saw 10

I am new to Python. I am trying to read a JSON response from requests and filter it with pandas to save to a CSV file. This script works and gives me all the data, but it throws the error below after execution.
I am not able to figure out why it throws this error. How can I get past it?
Error -
script.py line 42, in <module>
df = pd.read_csv("Data_script4.csv")
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 458, in _read
data = parser.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1196, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 2155, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 53, saw 10
This is my script -
import argparse
import ast
import csv
import os
import pandas as pd

if __name__ == '__main__':
    parser = argparse.ArgumentParser("gets data")
    parser.add_argument("-o", dest="org", help="org name")
    parser.add_argument("-p", dest="pat", help="pat value")
    args = parser.parse_args()
    org = args.org
    token = args.pat
    url = "https://dev.azure.com/{org_name}/_apis/git/repositories?api-version=6.0".format(org_name=org)
    data = getproject(url, token)
    data_file = open("Data_script4.csv", "w", newline='')
    val = data['value']
    csv_writer = csv.writer(data_file)
    count = 0
    for i in val:
        if count == 0:
            header = i.keys()
            csv_writer.writerow(header)
            count += 1
        csv_writer.writerow(i.values())
    pro_name = []
    time = []
    df = pd.read_csv("Data_script4.csv")
    for i in df["project"]:
        res = ast.literal_eval(i)
        pro_name.append(res['name'])
        time.append(res['lastUpdateTime'])
    del df["project"]
    df["project name"] = pro_name
    df["lastUpdateTime"] = time
    df = df[["id", "name", "url", "project name", "lastUpdateTime", "defaultBranch", "size", "remoteUrl", "sshUrl", "webUrl"]]
    df.head()
    df.to_csv("Data_Filtered.csv", index=False)
    print("\nFile Created Successfully...")
    data_file.close()
    os.remove('Data_script4.csv')
How can I resolve this issue?
Your question was answered here.
Here's the takeaway:
You need to replace:
df = pd.read_csv("Data_script4.csv")
with this:
df = pd.read_csv('Data_script4.csv', error_bad_lines=False)
(In pandas 1.3 and later, error_bad_lines is deprecated in favor of on_bad_lines='skip'.)
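A minimal sketch of the modern spelling, using an in-memory CSV where one row has an extra, unquoted comma (the same shape of problem as "Expected 9 fields ... saw 10"):

```python
import io
import pandas as pd

# the row "4,5,6,7" has one field too many for the 3-column header
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# pandas >= 1.3: skip malformed rows instead of raising ParserError
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
# df keeps the two well-formed data rows; the bad one is dropped
```

Skipping rows hides data, though; the more robust fix is to quote fields properly when the CSV is written.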

How to convert csv's to line protocol

How do I display a CSV file in line protocol format, like influxdb uses?
My CSV data as below...
time, avg_FreshOrders, p95_FreshOrders, FreshOrders
1593648000000,1479.08407079646,2589,226
1593475200000,2242.8617021276596,5622,188
1593734400000,1682.3375,2738,160
I am using Python to convert to line protocol as below
import pandas as pd

# convert csv's to line protocol
df_full = pd.read_csv("data/FreshOrders.csv")
df_full["measurement"] = ['FO' for t in range(len(df_full))]
lines = [str(df_full["measurement"][d])
         + ",type=FreshOrders"
         + " "
         + "avg_FreshOrders=" + str(df_full["avg_FreshOrders"][d]) + ","
         + "p95_FreshOrders=" + str(df_full["p95_FreshOrders"][d]) + ","
         + "FreshOrders=" + str(df_full["FreshOrders"][d])
         + " " + str(df_full["time"][d]) for d in range(len(df_full))]

# append lines to a text file with DDL & DML:
thefile = open('data/import.txt', 'a+')
for item in lines:
    thefile.write("%s\n" % item)
While running this Python code I am getting the errors below.
Traceback (most recent call last):
File "csv_to_line.py", line 6, in <module>
lines = [str(df_full["measurement"][d])
File "csv_to_line.py", line 6, in <listcomp>
lines = [str(df_full["measurement"][d])
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 997, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1004, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
Can someone please help me to resolve this issue?
If label-based lookup is the problem, you can use positional indexing (iloc) instead.
import pandas as pd

# convert csv's to line protocol
df_full = pd.read_csv("haha.csv")
df_full["measurement"] = ['FO' for t in range(len(df_full))]
lines = []
for idx in range(len(df_full)):
    temp = str(df_full.iloc[idx, -1]) + ",type=FreshOrders" + " " + "avg_FreshOrders=" + str(df_full.iloc[idx, 1]) + "," \
        + "p95_FreshOrders=" + str(df_full.iloc[idx, 2]) + "," + "FreshOrders=" + str(df_full.iloc[idx, 3]) + " " + \
        str(df_full.iloc[idx, 0])
    lines.append(temp)

# append lines to a text file with DDL & DML:
thefile = open('data/import.txt', 'a+')
for item in lines:
    thefile.write("%s\n" % item)
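An alternative sketch that avoids per-label lookups entirely: itertuples yields each row as a named tuple, so no index key is involved at all (column names taken from the sample CSV above):

```python
import pandas as pd

# sample data matching the CSV shown in the question
df_full = pd.DataFrame({
    "time": [1593648000000],
    "avg_FreshOrders": [1479.08],
    "p95_FreshOrders": [2589],
    "FreshOrders": [226],
})

# build one line-protocol string per row without touching the index
lines = [
    "FO,type=FreshOrders "
    f"avg_FreshOrders={row.avg_FreshOrders},"
    f"p95_FreshOrders={row.p95_FreshOrders},"
    f"FreshOrders={row.FreshOrders} {row.time}"
    for row in df_full.itertuples(index=False)
]
```

This also sidesteps the KeyError: 0, which occurs when the DataFrame's index does not contain the label 0.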

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x00\x00\x00\x01Bud1'

I am trying to consolidate a few Excel files and I am getting the following error.
NOTE: This code works fine on Windows; I am trying to run the same on a Mac and I'm getting this error.
Code:
import pandas as pd
import os

all_data = pd.DataFrame()
temp1 = pd.DataFrame()
temp2 = pd.DataFrame()
FL = os.listdir("Files/")
for i in FL:
    temp1 = pd.read_excel("Files/" + i)
    print("Reading file " + i)
    temp1.loc[:, "Sub Query"] = i[0:i.find(".")]
    print("working on file " + i)
    temp2 = pd.concat([temp2, temp1], sort=False)
    print("file " + i + " is over, moving on...")
writer = pd.ExcelWriter("output/Mastersheet.xlsx", engine='xlsxwriter', options={'strings_to_urls': False})
temp2.to_excel(writer, index=False)
writer.close()
Error:
File "<ipython-input-9-1e7da69236eb>", line 1, in <module>
runfile('/Users/simplify360/Desktop/TVS/Consolidate/consolidate.py', wdir='/Users/simplify360/Desktop/TVS/Consolidate')
File "/Applications/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "/Applications/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/simplify360/Desktop/TVS/Consolidate/consolidate.py", line 19, in <module>
temp1=pd.read_excel("Files/"+i)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
io = ExcelFile(io, engine=engine)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 819, in __init__
self._reader = self._engines[engine](self._io)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
super().__init__(filepath_or_buffer)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 359, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "/Applications/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 36, in load_workbook
return open_workbook(filepath_or_buffer)
File "/Applications/anaconda3/lib/python3.7/site-packages/xlrd/__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "/Applications/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "/Applications/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "/Applications/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x00\x00\x00\x01Bud1'
I have tried different things like removing few fields from the excel file, renaming the excel file but it still won't work. Please help! Thanks!
I've been having this same issue and I just solved it. The problem is that there is a hidden file called '.DS_Store' in every directory on Macs. Try changing this block of code:
for i in FL:
    temp1 = pd.read_excel("Files/" + i)
    print("Reading file " + i)
    temp1.loc[:, "Sub Query"] = i[0:i.find(".")]
    print("working on file " + i)
    temp2 = pd.concat([temp2, temp1], sort=False)
    print("file " + i + " is over, moving on...")
To this:
for i in FL:
    if i != '.DS_Store':
        temp1 = pd.read_excel("Files/" + i)
        print("Reading file " + i)
        temp1.loc[:, "Sub Query"] = i[0:i.find(".")]
        print("working on file " + i)
        temp2 = pd.concat([temp2, temp1], sort=False)
        print("file " + i + " is over, moving on...")
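A slightly more general variant of the same idea: filter the directory listing by extension, which skips .DS_Store and any other stray files in one step. A small sketch with a hypothetical listing:

```python
# hypothetical result of os.listdir("Files/") on a Mac
entries = [".DS_Store", "jan.xlsx", "feb.xlsx", "notes.txt"]

# keep only Excel files; hidden files and other formats are ignored
excel_files = [f for f in entries if f.lower().endswith((".xlsx", ".xls"))]
# excel_files == ["jan.xlsx", "feb.xlsx"]
```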

KeyException in Python Pandas

I am receiving the Key Error below. I have a large data set (around 10 million records) and I am trying to filter only the records whose 'tags' field contains a keyword. Comparing for exact matches is easy, but filtering for contained keywords seems to be quite difficult, and every method I found on SO throws an error. I am new to Pandas, so please forgive me if I am committing a cardinal sin. (I took Big Data in university and we worked mostly in Spark. I realize the code is a bit hacky right now; I'm just trying to get it to function.)
Notes: 1. The data is stored across quarterly files, so I am iterating over the files and concatenating the results (which is the reason for the index and the counter). 2. I commented out the lines that let me filter for exact matches (#is_goodwill = data_frame['tag'] == word_of_interest and #good_will_relation = data_frame[is_goodwill]).
Goal: Filter for records containing the keyword word_of_interest.
It does not have to be an exact match; the record just needs to contain the keyword. Code is below the error.
Error
Traceback (most recent call last):
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'tags'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "parsePandasSECData.py", line 64, in <module>
main()
File "parsePandasSECData.py", line 42, in main
good_will_relation = data_frame[data_frame['tags'].str.contains(word_of_interest)]
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
return self._getitem_column(key)
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
values = self._data.get(item)
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3843, in get
loc = self.items.get_loc(item)
File "C:\Users\tyler\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'tags'
Code
import pandas as pd
import os.path
import time
import csv

def main():
    start_time = time.time()
    file_path = "C:/Users/TA/Desktop/Data/"
    word_of_interest = "ShareholdersEquity"
    NUM_FILE_NAME = "number.csv"
    SUB_FILE_NAME = "subnumber.csv"
    quarterly_list = ['Q1', 'Q2', 'Q3', 'Q4']
    all_concat_data = None
    pd.set_option('display.max_row', 1000)
    for counter in range(9, 19):
        for index in range(len(quarterly_list)):
            # iterates over all file locations
            num_file_path = file_path + quarterly_list[index] + str(counter) + '/' + NUM_FILE_NAME
            sub_file_path = file_path + quarterly_list[index] + str(counter) + '/' + SUB_FILE_NAME
            if os.path.exists(num_file_path) and os.path.exists(sub_file_path):
                print('Starting ' + quarterly_list[index] + str(counter) + ' Data')
                # Load data
                data_frame = pd.read_csv(num_file_path, dtype={'adsh': str, 'tag': str, 'version coreg': str, 'ddate': int, 'qtrs': int, 'uom': str, 'value': float, 'footnote': str},
                                         header=0, delimiter='\t', low_memory=False, encoding='ISO-8859-1')
                # Comparative Data
                transaction_descriptions = pd.read_csv(sub_file_path, dtype={'adsh': str}, header=0, delimiter='\t', low_memory=False, encoding='ISO-8859-1')
                # is_goodwill = data_frame['tag'] == word_of_interest
                # good_will_relation = data_frame[is_goodwill]
                good_will_relation = data_frame[data_frame['tags'].str.contains(word_of_interest)]
                captured_data = good_will_relation.merge(transaction_descriptions, how='inner', left_on='adsh', right_on='adsh')
                if all_concat_data is not None:
                    all_concat_data = pd.concat([all_concat_data, captured_data])
                else:
                    all_concat_data = captured_data
            else:
                print(quarterly_list[index] + str(counter) + ' Does not exist...Skipping')
    print('Starting Writer operation')
    writer = pd.ExcelWriter('output.xlsx')
    all_concat_data.to_excel(writer, 'Sheet1')
    print("--- %s seconds ---" % (time.time() - start_time))

if __name__ == "__main__":
    main()
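No answer is shown above, but note that the read_csv dtype dict names the column 'tag' while the failing line asks for 'tags'. A minimal sketch of substring filtering against a 'tag' column (the sample values are made up), with na=False guarding missing entries:

```python
import pandas as pd

# toy frame standing in for the quarterly number.csv data
data_frame = pd.DataFrame({
    "tag": ["Goodwill", "StockholdersEquity", None],
    "value": [1.0, 2.0, 3.0],
})

word_of_interest = "Equity"
# na=False treats missing tags as non-matches instead of propagating NaN
mask = data_frame["tag"].str.contains(word_of_interest, na=False)
good_will_relation = data_frame[mask]
# one row survives: the StockholdersEquity record
```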

IO Error: csv file does not exist though it exists at given location specified

import pandas as pd
import os
import time
from datetime import datetime

path = "C:\WinPython-32bit-2.7.9.5\python-2.7.9\Lib\idlelib\MuditPracticals\intraQuarter\intraQuarter"

def Key_Stats(gather="Total Debt/Equity (mrq)"):
    statspath = path + '/_KeyStats'
    stock_list = [x[0] for x in os.walk(statspath)]
    df = pd.DataFrame(columns=['Date', 'Unix', 'Ticker', 'DE Ratio', 'Price', 'SP500'])
    sp500_df = pd.DataFrame.from_csv("YAHOO-INDEX_GSPC.csv")
    for each_dir in stock_list[1:25]:
        each_file = os.listdir(each_dir)
        ticker = each_dir.split("\\")[3]
        if len(each_file) > 0:
            for file in each_file:
                date_stamp = datetime.strptime(file, '%Y%m%d%H%M%S.html')
                unix_time = time.mktime(date_stamp.timetuple())
                full_file_path = each_dir + '/' + file
                source = open(full_file_path, 'r').read()
                try:
                    value = float(source.split(gather + ':</td><td class="yfnc_tabledata1">')[1].split('</td>')[0])
                    try:
                        sp500_date = datetime.fromtimestamp(unix_time).strftime('%Y-%m-%d')
                        row = sp500_df[(sp500_df.index == sp500_date)]
                        sp500_value = float(row["Adjusted Close"])
                    except:
                        sp500_date = datetime.fromtimestamp(unix_time - 259200).strftime('%Y-%m-%d')
                        row = sp500_df[(sp500_df.index == sp500_date)]
                        sp500_value = float(row["Adjusted Close"])
                    stock_price = float(source.split('</small><big><b>')[1].split('</b></big>')[0])
                    # print("stock_price:", stock_price, "ticker:", ticker)
                    df = df.append({'Date': date_stamp,
                                    'Unix': unix_time,
                                    'Ticker': ticker,
                                    'DE Ratio': value,
                                    'Price': stock_price,
                                    'SP500': sp500_value}, ignore_index=True)
                except Exception as e:
                    print "hello"
    save = gather.replace(' ', '').replace(')', '').replace('(', '').replace('/', '') + ('.csv')
    print(save)
    df.to_csv(save)

Key_Stats()
Compile Time Error In Spyder
File "<ipython-input-1-dfafbc7450e8>", line 1, in <module>
runfile('C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py', wdir='C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals')
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py", line 56, in <module>
Key_Stats()
File "C:/WinPython-32bit-2.7.9.5/python-2.7.9/Lib/idlelib/MuditPracticals/data_organisation1.py", line 13, in Key_Stats
sp500_df = pd.DataFrame.from_csv("YAHOO-INDEX_GSPC.csv")
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\core\frame.py", line 1036, in from_csv
infer_datetime_format=infer_datetime_format)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 250, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 566, in __init__
self._make_engine(self.engine)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 705, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\WinPython-32bit-2.7.9.5\python-2.7.9\lib\site-packages\pandas\io\parsers.py", line 1072, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3160)
File "pandas\parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5905)
IOError: File YAHOO-INDEX_GSPC.csv does not exist
It gives an IOError even though the file exists at that location.
The IOError occurs when the script runs.
Also, why is it that in IDLE the pandas module is not found, but in Spyder there is no pandas error?
The path to your .csv file is relative. If the file is not in your current working directory, Python will not find it.
"Though file exists at that location": that is exactly the problem with relative paths, because what counts as "that location" depends on the current working directory, not on where the script file sits.
Here is a previous answer that should resolve the issue:
Python ConfigParser cannot search .ini file correctly (Ubuntu 14, Python 3.4)
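The point about relative paths can be sketched in two lines: a bare file name is resolved against the current working directory, so anchoring it explicitly (for example to the script's own folder via os.path.dirname(os.path.abspath(__file__))) removes the ambiguity:

```python
import os

# a bare name like "YAHOO-INDEX_GSPC.csv" is resolved against the
# current working directory, not against the script's location
rel = "YAHOO-INDEX_GSPC.csv"
resolved = os.path.abspath(rel)
# resolved == os.path.join(os.getcwd(), rel)
```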
