Python Pandas Word Frequency Analysis - python

I have been chipping away at a side project for a while and have hit a wall (most likely of my own making). I am trying to build a CSV repository of the words used in news article titles and how often each word appears on each day, in this format:
[Image: basic chart format, with words as columns and their frequencies charted by date]
I want the table in this shape: words as the columns, dates as the row index, and frequencies as the values. I have been trying to do this with Python's pandas library, but have had minimal success updating a CSV or a DataFrame with new information. It has either overwritten the CSV with the new information and ignored the date index, or has not joined the two as I expected from the information here. I've also added screenshots of my code: page1, page2, page3.
If anyone has any advice on how to proceed, or if there is a better way to record this information, I would greatly appreciate the help. Thank you all in advance.
-Brad
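
One possible approach to the table described above, as a minimal sketch rather than a fix for the code in the screenshots (which is not visible here). The function names, the "date" index label and the simple regex tokenizer are all assumptions made for illustration. The idea is to build a one-row DataFrame per day and combine it with the existing CSV through `DataFrame.add(..., fill_value=0)`, which aligns on both the date index and the word columns instead of overwriting.

```python
from collections import Counter
from pathlib import Path
import re

import pandas as pd


def count_title_words(titles):
    """Count lowercase word occurrences across a list of headline strings."""
    counts = Counter()
    for title in titles:
        # Simple tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", title.lower()))
    return counts


def update_frequency_csv(csv_path, date, titles):
    """Add one day's word counts to the CSV, keeping earlier rows intact."""
    new_row = pd.DataFrame(
        [count_title_words(titles)],
        index=pd.Index([date], name="date"),
    )
    path = Path(csv_path)
    if path.exists():
        existing = pd.read_csv(path, index_col="date")
        # Align on the union of dates and words; counts for a date that is
        # already present are summed rather than replaced.
        combined = existing.add(new_row, fill_value=0)
    else:
        combined = new_row
    # Words that did not appear on a given day get a count of 0.
    combined = combined.fillna(0).astype(int).sort_index()
    combined.to_csv(path)
    return combined
```

Calling `update_frequency_csv("word_counts.csv", "2023-01-02", titles)` once per day (the file name is hypothetical) should grow the table by one row and by any new word columns. Dates are kept as ISO-format strings here so that sorting the index also sorts chronologically.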

Related

clustering days from csv data in python

I have CSV data that is already in a datetime format. I need to analyze the data by comparing days and finding the most similar ones. The data is organized in 10-minute intervals, which means the 144 rows for each day need to be grouped into one day. Ideally, every day would be copied into its own array that could be accessed with something like print(array_26.08.2022).
[CSV Screenshot](https://i.stack.imgur.com/bZEAR.png)
I searched online but couldn't find a solution.
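
A rough sketch of the grouping step, assuming the CSV has a datetime column and one numeric column. The column names "timestamp" and "value" and both function names are hypothetical, since the screenshot is not reproduced here. A name like array_26.08.2022 is not a valid Python identifier, so the sketch uses a dict keyed by the date string instead.

```python
import pandas as pd


def split_by_day(df, time_column="timestamp"):
    """Return a dict mapping 'DD.MM.YYYY' strings to that day's rows."""
    stamps = pd.to_datetime(df[time_column])
    day_keys = stamps.dt.strftime("%d.%m.%Y")
    return {day: group for day, group in df.groupby(day_keys)}


def daily_profiles(df, time_column="timestamp", value_column="value"):
    """Pivot to one row per day and one column per time of day."""
    stamps = pd.to_datetime(df[time_column])
    tagged = df.assign(day=stamps.dt.date, time=stamps.dt.time)
    return tagged.pivot_table(index="day", columns="time", values=value_column)
```

With this, `days = split_by_day(df)` followed by `print(days["26.08.2022"])` shows the rows for that day. The day-by-time matrix from `daily_profiles` (144 columns for 10-minute data) is one possible input for whatever similarity measure or clustering method is chosen afterwards.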

How to automate calculations in Pandas dataframe

I'm currently struggling to find good information on how to calculate differences, percentages etc. using several columns and rows in a Pandas dataframe - and how to show the output in a nice table using Python.
Short example of what I'm going for:
I'm working with NBA data and have gathered a bunch of match statistics for home and away teams during the 2019/20 season (the season finishes later this month). The first row shows the Free Throw percentage and "Regular" means regular matches with audience members and "Bubble" denotes the matches without audience members.
A short view of my Pandas dataframe:
How do I automate the calculations using Python code? Feel free to give me examples!
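
As a small sketch of column arithmetic in pandas: the DataFrame from the question is not shown, so the layout below (one row per statistic, one column each for "Regular" and "Bubble") and every number in it are placeholders made up for illustration, not real NBA statistics.

```python
import pandas as pd

# Placeholder values for illustration only, not real NBA statistics.
stats = pd.DataFrame(
    {"Regular": [0.75, 110.0], "Bubble": [0.78, 112.0]},
    index=["FT%", "Points"],
)

# Column arithmetic is vectorized: each line computes every row at once.
stats["Difference"] = stats["Bubble"] - stats["Regular"]
stats["Change (%)"] = (stats["Difference"] / stats["Regular"] * 100).round(2)

# to_string() renders the whole frame as an aligned plain-text table.
print(stats.to_string())
```

The same pattern should carry over to home versus away columns, or to any other pair of columns in the real data.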

Take dates and times from multiple columns to one datetime object with Python

I've got a dataset with multiple time values as below.
Area,Year,Month,Day of Week,Time of Day,Hour of Day
x,2016,1,6.0,108,1.0
z,2016,1,6.0,140,1.0
n,2016,1,6.0,113,1.0
p,2016,1,6.0,150,1.0
r,2016,1,6.0,158,1.0
I have been trying to transform this into a single datetime object to simplify the dataset and be able to do proper time series analysis against it.
For some reason I have been unable to get the right outcome using the datetime library from Python. Would anyone be able to point me in the right direction?
Update - Example of stats here.
https://data.pa.gov/Public-Safety/Crash-Incident-Details-CY-1997-Current-Annual-Coun/dc5b-gebx/data
I don't think there is a week column. Hmm. I wonder if I've missed something?
Any suggestions would be great. Really just looking to simplify this dataset. Maybe even create another table / sheet for the causes of the crash, as there are a lot of superfluous columns taking up a lot of space that could be labeled with simple ints.
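
A hedged sketch of one way to assemble a timestamp from the sample rows above. It rests on two assumptions that should be checked against the dataset's documentation: that "Time of Day" is an HHMM value without leading zeros (108 meaning 01:08, which matches "Hour of Day" being 1.0 in the sample), and that the day of the month is unavailable, so a placeholder day of 1 is used. `pd.to_datetime` accepts a DataFrame whose columns are named year, month, day, hour and minute.

```python
import io

import pandas as pd

raw = io.StringIO(
    "Area,Year,Month,Day of Week,Time of Day,Hour of Day\n"
    "x,2016,1,6.0,108,1.0\n"
    "z,2016,1,6.0,140,1.0\n"
    "n,2016,1,6.0,113,1.0\n"
    "p,2016,1,6.0,150,1.0\n"
    "r,2016,1,6.0,158,1.0\n"
)
df = pd.read_csv(raw)

# Assumption: "Time of Day" is HHMM without leading zeros (108 -> "0108").
hhmm = df["Time of Day"].astype(int).astype(str).str.zfill(4)

parts = pd.DataFrame(
    {
        "year": df["Year"],
        "month": df["Month"],
        "day": 1,  # placeholder: the sample has no day-of-month column
        "hour": hhmm.str[:2].astype(int),
        "minute": hhmm.str[2:].astype(int),
    }
)

# errors="coerce" turns impossible values (e.g. hour 99) into NaT.
df["timestamp"] = pd.to_datetime(parts, errors="coerce")
print(df[["Area", "timestamp"]])
```

Because the day of the month is a placeholder, the result is only suitable for analysis at month or time-of-day granularity unless a real day column can be found.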

Use ISIN to retrieve stock data in PYTHON

I am trying to retrieve the daily stock prices, over a 10-year period, of the 600 companies in the EUROSTOXX 600 index, and I'm facing some difficulties.
First question: does retrieving all of this with a single piece of code seem feasible to you?
(I'm also considering adding the main financial indicators such as ROI, ROE, EBIT, EPS and annual performance, and exporting all of this to one Excel sheet.)
I have collected all 600 ISINs. The question is: can I use them to retrieve the data from Yahoo Finance (or anything else), or should I find a way to get the 600 actual tickers defined by Yahoo?
If so, does anyone have a tip for that? I've been looking for lists, but this index apparently isn't very popular.
Thank you for reading!

How to search and print text cells from an excel column into a new excel doc?

I am currently looking for a way to automate a search for cells containing text in Excel using Python, then print the matches to a new Excel sheet.
My background in coding is very limited, but I did something similar in Python some years ago: finding text matching one cell and printing it to another sheet. This time, however, I need to find information from several cells at once in a large dataset. With my limited skill set I am unable to tell if this is possible.
pandas.read_excel can do this. Check the official pandas documentation.
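
To illustrate the suggestion, here is a minimal sketch of the filtering step once the sheet has been loaded into a DataFrame. The function name `rows_with_text`, the column name "Notes" and the sample values are hypothetical.

```python
import pandas as pd


def rows_with_text(df, column, needle):
    """Return rows whose `column` contains `needle` (case-insensitive, literal)."""
    # Cast to the string dtype so empty cells become missing values,
    # which na=False then treats as "no match".
    text = df[column].astype("string")
    mask = text.str.contains(needle, case=False, regex=False, na=False)
    return df[mask]


# Stand-in for a sheet that would normally be loaded with pd.read_excel.
sheet = pd.DataFrame({"Notes": ["Invoice overdue", "paid", None, "OVERDUE again"]})
matches = rows_with_text(sheet, "Notes", "overdue")
print(matches)
```

In practice the DataFrame would come from `pd.read_excel("input.xlsx")` and the result could be written out with `matches.to_excel("matches.xlsx", index=False)`. Both file names are placeholders, and pandas needs an Excel engine such as openpyxl installed for either call.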
