Viewing the text inside Pyspark object

I am able to load a log file using the following command:
logFile = sc.textFile("/resources/jupyterlab/labs/BD0211EN/LabData/notebook.log")
But when I try to see the contents of the log file, I cannot. I checked dir(logFile), but that does not show the contents either. When I run the code in a Jupyter cell, I only get the following:
/resources/jupyterlab/labs/BD0211EN/LabData/notebook.log MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0
Is it possible to see the contents of the log file?
Thanks

I guess what you need is the following:
logFile.collect()
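This returns the contents as a list with one string per line. For a large file, logFile.take(10) returns only the first 10 lines instead of pulling everything to the driver.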

Related

Save Excel file using openpyxl without losing data

Unfortunately, this command always creates a new file for me, so I lose the previous data:
Account.save("Ex.xlsx")
The SaveCopyAs command does not work with a workbook.
I would simply like to replicate the SaveCopyAs command in Python to save my Excel file after writing to it and updating it. Unfortunately, with the save command, I delete all the previous content.

When you execute Example=Workbook(), you are making a new file. That means when you execute Example.save("Jungle.xlsx"), you are overwriting the original file. Instead, you should use Example = load_workbook('Jungle.xlsx') to read the contents of the original so that Example.save("Jungle.xlsx") can act like an update.
See https://openpyxl.readthedocs.io/en/stable/tutorial.html#loading-from-a-file for more details.

Get text from any file type using Python

I have a file that does not have a .txt extension, but I can right-click it and open it in Notepad, and it is readable.
However, if I try to open the file in Python, I cannot retrieve or edit the text.
Here's some code to make things clearer:
path=r"C:\Users\Alon\Desktop\RUN Low.spe"
f=open(path,'r+')
f.readlines()
Output:
[]
Again, if I open this file in Notepad, there are no problems: I can read and edit the text. But I want to do it via Python. Is there a solution to this?

How to extract word from text file to use as a variable?

I have a simple script that writes the network name to a log file. It displays the following.
All User Profile : NET_NAME
What I would like to do is open that text file, extract just the NET_NAME part, use it as a variable, and also save the text file with the changes.
I have tried using the split function. It more or less works on the text itself, but it does not work when I read from the file. I have looked into regex but do not know the syntax to achieve what I want.

split can indeed be used to achieve this. If you are using Python 3 and the content is in a file named text.txt, the snippet below should do the trick:
with open("text.txt", "rb") as f:
    content = f.read().decode("utf-8")
    name = content.split(":")[1].strip()
    print(name)
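
If the log holds more than that one line, splitting the whole file on ":" can pick the wrong piece. Here is a minimal sketch that scans line by line and can also write a change back to the file. The label "All User Profile" comes from the question; the function names extract_profile_name and rename_profile are made up for illustration, and UTF-8 is an assumed encoding:

```python
from pathlib import Path


def extract_profile_name(text, label="All User Profile"):
    """Return the value after 'label :' on the first matching line, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == label:
            return value.strip()
    return None


def rename_profile(path, new_name, label="All User Profile"):
    """Replace the value on every labelled line in the file; return the old name."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")  # assumed encoding
    old_name = extract_profile_name(text, label)
    if old_name is None:
        return None  # nothing to change
    lines = []
    for line in text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == label:
            line = f"{key}: {new_name}"  # key keeps its original spacing
        lines.append(line)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return old_name


# The line from the question, used here as an in-memory sample.
name = extract_profile_name("All User Profile : NET_NAME\n")
print(name)
```

partition(":") splits only at the first colon, so a network name that itself contains a colon stays in one piece.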

How to open a .data file extension

I am working on side stuff where the data provided is in a .data file. How do I open a .data file to see what the data looks like and also how do I read from a .data file programmatically through python? I have Mac OSX
NOTE: The Data I am working with is for one of the KDD cup challenges

Try opening the file in a text editor such as Notepad or Gedit to check which delimiter it uses (many .data files are plain text). Once you have confirmed this, you can use the read_csv function from the pandas library in Python.
import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")

It depends entirely on what is in it. It could be a binary file or it could be a text file.
If it is a text file, you can open it the same way you open any file (f=open(filename,"r")).
If it is a binary file, you can just add a "b" to the mode in the open call (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a CSV reader (the csv Python module) or an XML parsing library (for example, lxml).
After the further info above and a look at the dataset page, the format is:
Data Format
The datasets use a format similar as that of the text export format from relational databases:
One header lines with the variables names
One line per instance
Separator tabulation between the values
There are missing values (consecutive tabulations)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise processing one line at a time rather than loading the whole file, but if you have the RAM, why not.
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
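
The format quoted above (one header line, tab separators, consecutive tabs for missing values) can be read one line at a time with the standard csv module. This is only a sketch: the function name iter_rows, the UTF-8 encoding, and the tiny sample data are assumptions for illustration, not taken from the KDD files themselves.

```python
import csv
import os
import tempfile


def iter_rows(path):
    """Yield one dict per data line of a tab-separated file with a header line."""
    with open(path, newline="", encoding="utf-8") as f:  # encoding is assumed
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # Consecutive tabs come back as empty strings; map them to None.
            yield {key: (value if value != "" else None) for key, value in row.items()}


# Demo on a made-up sample; the second data line has a missing value.
sample = "id\tage\tcity\n1\t34\tParis\n2\t\tRome\n"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.data")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(sample)
    rows = list(iter_rows(sample_path))

print(rows)
```

Because iter_rows is a generator, looping over it directly (instead of calling list) keeps only one line in memory at a time.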

To get a quick overview of what the file may contain, you could do this in a terminal, using strings or cat, for example:
$ strings file.data
or
$ cat -v file.data
If you forget to pass the -v option to cat and the file is binary, you could mess up your terminal and need to reset it:
$ reset
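
A similar quick look is possible from Python without risking the terminal. The sketch below reads only the first bytes of the file and uses a NUL byte as a rough sign of binary content. The function names peek and looks_binary and the 1024-byte sample size are made up for illustration, and the check is a heuristic only (UTF-16 text, for instance, also contains NUL bytes).

```python
import os
import tempfile


def peek(path, size=1024):
    """Return the first `size` bytes of the file without decoding them."""
    with open(path, "rb") as f:
        return f.read(size)


def looks_binary(chunk):
    """Rough heuristic: a NUL byte rarely appears in ordinary text files."""
    return b"\x00" in chunk


# Demo on two made-up files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.data")
    bin_path = os.path.join(tmp, "binary.data")
    with open(text_path, "wb") as f:
        f.write(b"1,2,3\n4,5,6\n")
    with open(bin_path, "wb") as f:
        f.write(b"\x00\x01\x02\xff")
    text_is_binary = looks_binary(peek(text_path))
    bin_is_binary = looks_binary(peek(bin_path))

print(text_is_binary, bin_is_binary)
```

Printing peek(path) shows a bytes literal with escape sequences, so non-printable bytes cannot disturb the terminal.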

I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended I open it using Xcode, so I tried that, but it did not work.
Next I tried opening it with a program named "Brackets". It is a text editor primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:
with open("processed-1.cleveland.data", "r") as inf:
    # iterate over the file line by line; the with block closes it afterwards
    for line in inf:
        print(line, end="")

It works for me.
import pandas as pd
# define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
In other words, just treat it as a CSV file if it is separated by ','.
Solution from @mustious.

Save reportlab PDF file without showing it

I am using reportlab to generate a PDF file.
The last statements of my script are as follows:
doc.build(story)
os.system('xxxx.pdf') # show the pdf file.
Then I leave the script and save the created PDF file where I like. But I want to know how I can save the file without showing it.
doc.save('xxxx.pdf')
: *** AttributeError: 'SimpleDocTemplate' object has no attribute 'save'
What can I do to save the file automatically?

The command
os.system('xxxx.pdf')
only launches your default PDF viewer on 'xxxx.pdf', which means the PDF has already been saved to disk under that name (doc.build(story) writes it). Just remove the os.system call, and you will find the PDF in the folder.
