Timestamp object has no attribute 'split' - python

Sometimes when I'm running a python code in Google Colab and it runs in the first place, turns out that in the 2nd or 3rd attempt this same chunk of code for unknown reasons gives an error, as if the code was wrong (even though nothing has been modified). As soon as I disconnect and restart the notebook, the exact same chunk of code runs normally, again without modifications. Has anyone already come across this issue and know how to fix it?
import datetime  # 1st chunk

def convert_date(x):  # 2nd chunk
    y = x.split(' ')[0]
    return datetime.datetime(int(y.split('/')[2]), int(y.split('/')[1]), int(y.split('/')[0]))

hr['Hire Date'] = hr['Hire Date'].map(lambda x: convert_date(x))  # 3rd chunk
When running the 3rd chunk it gives the error: AttributeError: 'Timestamp' object has no attribute 'split'

It's because you already applied the transformation to that column: after the first run, 'Hire Date' holds Timestamp objects, and Timestamp has no split method. Reload your data instead of restarting the kernel and it will work again.
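A minimal sketch of one way to make the conversion safe to re-run, assuming the column only ever holds the original day/month/year strings or already-converted values (the hr DataFrame and 'Hire Date' column come from the question):

import datetime

def convert_date(x):
    # Values converted on a previous run are no longer strings; pass them through.
    if not isinstance(x, str):
        return x
    day, month, year = x.split(' ')[0].split('/')
    return datetime.datetime(int(year), int(month), int(day))

hr['Hire Date'] = hr['Hire Date'].map(convert_date)

Alternatively, pd.to_datetime(hr['Hire Date'], dayfirst=True) is likely simpler still, and it tolerates a column that has already been converted, though it keeps any time-of-day part the original code discards.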


'DataFrame' object has no attribute '_internal'

I am trying to run the line of code:
pd.get_dummies(pd_df, columns = ['ethnicity'])
However, I keep getting the error 'DataFrame' object has no attribute '_internal'. It looks like it's linked to the ...pyspark/pandas/namespace.py file, so I am not too sure how to fix it.
Unfortunately, the dataframe itself is private so I can't show/describe it on Stack Overflow; however, any information about why this could be happening would be greatly appreciated!
I can make the example below work perfectly, but it won't work on my code even though it is exactly the same; I just have a different DataFrame that has been converted from PySpark to pandas:
import numpy as np
import pandas as pd

sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000, 49000, 55000, 67000, 65000, 67000],
    "region": ["East", "North", "East", "South", "West", "West", "South", "West", "West", "East", np.nan],
})
pd.get_dummies(sales_data, columns=['region'])
I had this same error. I was confusing the execution by using ps (pyspark.pandas) instead of pd (pandas).
Ensure your aliases are correct and that you haven't accidentally rebound the pandas name:
Ex.
import pyspark.pandas as pd  # this makes pd refer to pyspark.pandas, not pandas
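A minimal sketch of the conventional aliasing, plus one way to get a true pandas DataFrame out of a pandas-on-Spark one (psdf is a hypothetical pandas-on-Spark DataFrame standing in for the asker's private data):

import pandas as pd           # plain pandas
import pyspark.pandas as ps   # pandas API on Spark, conventionally aliased as ps

# If psdf is a pandas-on-Spark DataFrame, convert it before calling plain-pandas functions:
pd_df = psdf.to_pandas()
pd.get_dummies(pd_df, columns=['ethnicity'])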

OwlReady2 error after using consecutive load()

I've been using owlready2 to parse multiple input OWL ontologies. The problem is that I get an error every time I try to load the second ontology. If I only load one, everything works fine. Whenever I try to load the second, I get an error from owlready2's load() function:
SELECT x FROM transit""", (s, p, p)).fetchall(): yield x
sqlite3.OperationalError: near "WITH": syntax error
Relevant information:
On my machine, I can do as many loads as I want and it works fine.
The error only happens when porting my code to a Linux server in my department in order to get it deployed.
Any suggestions?
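One hypothesis worth testing (not a confirmed fix): the failing query uses a WITH clause (a common table expression), which SQLite only supports from version 3.8.3 onward, so an older system SQLite on the server would explain why the same code works on one machine and not the other. Comparing versions is a one-liner:

import sqlite3

# WITH (common table expressions) requires SQLite >= 3.8.3;
# compare this value on the working machine and on the server.
print(sqlite3.sqlite_version)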

String or Unicode type required for dfgui running wx with kivy python

I intended
to write code that displays a table / DataFrame in a GUI (Kivy), for which I found a solution here. Apparently it uses an unofficial package from a GitHub repo, dfgui.
The Problem
occurred when I executed it as described in the StackOverflow link; it returned the error:
wx._core.PyAssertionError: C++ assertion "!items.IsEmpty()" failed at
/usr/include/wx-3.0/wx/ctrlsub.h(154) in InsertItems(): need something
to insert
I Broke Down
the problem by selective execution in the following way:
import dfgui
import pandas as pd

xls = pd.read_excel('Res.xls')
df = pd.DataFrame(xls)  # read_excel already returns a DataFrame, so this is a no-op copy
dfgui.show(df)
# dfgui.show(xls) is apparently the same as df
which then returned
TypeError: String or Unicode type required
and led me to this link, which I couldn't understand much of.
Point me north, or perhaps a different solution would be great too.
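One hedged guess rather than a confirmed fix: dfgui hands cell values to wxPython widgets that expect strings, so non-string cells (numbers, NaN, dates) could be what trips the "String or Unicode type required" error. Converting every cell to a string first is a cheap test:

import dfgui
import pandas as pd

df = pd.read_excel('Res.xls')

# Hypothesis: wx needs string cell values; astype(str) stringifies every cell,
# including NaN and numeric types, before dfgui renders them.
dfgui.show(df.astype(str))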

PySpark serialization EOFError

I am reading in a CSV as a Spark DataFrame and performing machine learning operations upon it. I keep getting a Python serialization EOFError - any idea why? I thought it might be a memory issue - i.e. file exceeding available RAM - but drastically reducing the size of the DataFrame didn't prevent the EOF error.
Toy code and error below.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import RFormula
from pyspark.ml.classification import RandomForestClassifier

# set spark context
conf = SparkConf().setMaster("local").setAppName("MyApp")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# read in 500 MB csv as DataFrame
df = sqlContext.read.format('com.databricks.spark.csv').options(
    header='true', inferschema='true').load('myfile.csv')

# get dataframe into machine learning format
r_formula = RFormula(formula="outcome ~ .")
mldf = r_formula.fit(df).transform(df)

# fit random forest model
rf = RandomForestClassifier(numTrees=3, maxDepth=2)
model = rf.fit(mldf)
result = model.transform(mldf).head()
Running the above code with spark-submit on a single node repeatedly throws the following error, even if the size of the DataFrame is reduced prior to fitting the model (e.g. tinydf = df.sample(False, 0.00001)):
Traceback (most recent call last):
  File "/home/hduser/spark1.6/python/lib/pyspark.zip/pyspark/daemon.py", line 157, in manager
  File "/home/hduser/spark1.6/python/lib/pyspark.zip/pyspark/daemon.py", line 61, in worker
  File "/home/hduser/spark1.6/python/lib/pyspark.zip/pyspark/worker.py", line 136, in main
    if read_int(infile) == SpecialLengths.END_OF_STREAM:
  File "/home/hduser/spark1.6/python/lib/pyspark.zip/pyspark/serializers.py", line 545, in read_int
    raise EOFError
EOFError
The error appears to happen in the PySpark read_int function, the code for which is as follows (from the Spark source):
def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]
This would mean that when reading 4 bytes from the stream, if 0 bytes are read, an EOFError is raised. The Python docs are here.
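To make that failure mode concrete, here is a small self-contained demonstration of the same function against in-memory streams (not part of the original question):

import io
import struct

def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]

print(read_int(io.BytesIO(struct.pack("!i", 42))))  # prints 42

# An exhausted stream returns b'' from read(4), which triggers the error:
read_int(io.BytesIO(b""))  # raises EOFError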
I have faced the same issue and don't know how to debug it. It seems to leave the executor thread stuck, never returning anything.
Have you checked to see where in your code the EOFError is arising?
My guess would be that it's coming up as you attempt to define df, since that's the only place in your code where the file is actually being read:
df = sqlContext.read.format('com.databricks.spark.csv').options(
    header='true', inferschema='true').load('myfile.csv')
At every point after this line, your code is working with the variable df, not the file itself, so it would seem likely that this line is generating the error.
A simple way to test whether this is the case would be to comment out the rest of your code, and/or place a line like this right after the one above:
print(df.count())  # len() does not work on a Spark DataFrame; count() forces a read
Another way would be to use a try/except block, like:
try:
    df = sqlContext.read.format('com.databricks.spark.csv').options(
        header='true', inferschema='true').load('myfile.csv')
except Exception:
    print("Failed to load file into df!")
If it turns out that that line is the one generating the EOFError, then you're never getting the DataFrame in the first place, so attempting to reduce it won't make a difference.
If that is the line generating the error, two possibilities come to mind:
Your code opened the .csv file earlier on and isn't closing it prior to this line. If so, simply close it before this code runs.
There's something wrong with the .csv file itself. Try loading it outside of this code, and see if you can get it into memory properly in the first place using something like csv.reader, and manipulate it in the ways you'd expect (see the sketch below).
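A minimal sketch of that second check, assuming the myfile.csv name from the question; it simply confirms the file parses as CSV outside of Spark:

import csv

# Read the file with the standard library to rule out a corrupt or truncated CSV.
with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)              # first row should be the column names
    n_rows = sum(1 for _ in reader)    # consume the rest, counting rows

print(header)
print(n_rows, "data rows parsed without error")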

win32com Excel data input error

I'm exporting the results of my script into an Excel spreadsheet. Everything works fine and I can put big sets of data into the spreadsheet, but sometimes this error occurs:
File "C:\Python26\lib\site-packages\win32com\client\dynamic.py", line 550, in __setattr__
self._oleobj_.Invoke(entry.dispid, 0, invoke_type, 0, value)
pywintypes.com_error: (-2147352567, 'Exception.', (0, None, None, None, 0, -2146777998), None)***
I don't think it's a problem with the input data format; I've put in several different types of data (strings, ints, floats, lists) and it works fine. When I run the script a second time it works fine, with no error. What's going on?
PS. This is the code that generates the error. What's strange is that the error doesn't always occur; say 30% of runs result in an error:
import win32com.client

def Generate_Excel_Report():
    Excel = win32com.client.Dispatch("Excel.Application")
    Excel.Workbooks.Add(1)
    Cells = Excel.ActiveWorkbook.ActiveSheet.Cells
    for i in range(100):
        Row = int(35 + i)
        for j in range(10):
            Cells(int(Row), int(5 + j)).Value = "string"
    for i in range(100):
        Row = int(135 + i)
        for j in range(10):
            Cells(int(Row), int(5 + j)).Value = 32.32  # float

Generate_Excel_Report()
The strangest thing for me is that when I run the script with the same code and the same input many times, sometimes an error occurs and sometimes it doesn't.
This is most likely a synchronous COM access error. See my answer to Error while working with excel using python for details about why, and for a workaround.
I can't see why the file format/extension would make a difference. You'd be calling the same COM object either way. My experience with this error is that it's more or less random, but you can increase the chances of it happening by interacting with Excel while your script is running.
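The linked workaround isn't quoted above, but the usual mitigation for this kind of intermittent COM error is to retry the failing call after a short pause. A minimal sketch of that idea; the retry count and delay are arbitrary choices, not taken from the linked answer:

import time
import pywintypes

def com_retry(action, attempts=5, delay=0.5):
    # Retry a COM call that intermittently raises com_error.
    for attempt in range(attempts):
        try:
            return action()
        except pywintypes.com_error:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Usage: wrap a flaky cell write, e.g. for a Range object rng:
# com_retry(lambda: setattr(rng, 'Value', "string"))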
edit: It doesn't change a thing. The error still occurs, just less often: once in 10 simulations with the .xls file versus once in 3 simulations with the .xlsx file. Please help.
The problem was with the file I was opening. It was .xlsx; once I saved it as .xls the problem disappeared. So beware: don't ever use the COM interface with .xlsx or you'll get in trouble!
You should disable Excel interactivity while doing this.
import win32com.client

def Generate_Excel_Report():
    Excel = win32com.client.Dispatch("Excel.Application")
    # You won't see what happens (faster)
    Excel.ScreenUpdating = False
    # Clicks on the Excel window have no effect
    # (set back to True before closing Excel)
    Excel.Interactive = False
    Excel.Workbooks.Add(1)
    Cells = Excel.ActiveWorkbook.ActiveSheet.Cells
    for i in range(100):
        Row = int(35 + i)
        for j in range(10):
            Cells(int(Row), int(5 + j)).Value = "string"
    for i in range(100):
        Row = int(135 + i)
        for j in range(10):
            Cells(int(Row), int(5 + j)).Value = 32.32  # float
    Excel.ScreenUpdating = True
    Excel.Interactive = True

Generate_Excel_Report()
Also, you could do this to increase your code's performance:
# Construct the data block
string_line = []
for i in range(10):
    string_line.append("string")
string_block = []
for i in range(100):
    string_block.append(string_line)

# Write the data block in one call; the Range must match the
# block's shape (100 rows x 10 columns starting at row 35, column 5)
ws = Excel.ActiveWorkbook.Sheets(1)
ws.Range(
    ws.Cells(35, 5),
    ws.Cells(134, 14)
).Value = string_block
I had the same error while using xlwings to interact with Excel; xlwings also uses win32com clients in the backend.
After some debugging, I realized that this error pops up whenever the code is executed while the Excel file containing the data is not in focus. To resolve the issue, I simply select the file being processed before running the code, and that always works for me.
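If manually clicking on the workbook is what fixes it, a hedged guess at automating the same step is to activate the workbook from code before writing; Activate is a standard Excel COM method, though whether it reliably avoids the error is untested here (the file path is hypothetical):

import win32com.client

Excel = win32com.client.Dispatch("Excel.Application")
Excel.Visible = True

wb = Excel.Workbooks.Open(r"C:\path\to\data.xlsx")  # hypothetical path
wb.Activate()  # bring the workbook into focus before writing
wb.Sheets(1).Cells(1, 1).Value = "written while focused"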
