I have written some code that requests a text file from the web, reads it, and outputs the necessary data into an Excel spreadsheet. However, it is not quite in the format I need: each row is written as a single item, all in the first column. My code is below, along with an image of the output as it currently stands. I'd like a column for the date, and then one for each livestock animal with the amount processed.
import pandas as pd
import os
import urllib.request
filename = "Livestock Slaughter.txt"
os.chdir(r'S:\1WORKING\FINANCIAL ANALYST\Shawn Schreier\Commodity Dashboard')
directory = os.getcwd()
url = 'https://www.ams.usda.gov/mnreports/sj_ls710.txt'
data = urllib.request.urlretrieve(url, "Slaughter Rates.txt")
df = pd.read_csv("Slaughter Rates.txt", sep='\t', skiprows=5, nrows=3)
df.to_excel('Slaughter Data.xlsx')
As noted in the comments, you can use delim_whitespace=True to load the file and then do some post-processing to get the correct data. You can also pass the URL to pd.read_csv() directly:
import pandas as pd
# the extra leading tokens in each data row become unnamed index levels, so
# reset_index() turns them back into columns named level_0..level_2
df = pd.read_csv('https://www.ams.usda.gov/mnreports/sj_ls710.txt', delim_whitespace=True, skiprows=5, nrows=3).reset_index()
# join the three label tokens back into a single description column
df = pd.concat([df.loc[:, 'level_0':'level_2'].agg(' '.join, axis=1), df.iloc[:, 3:]], axis=1)
print(df)
df.to_csv('data.csv')
Prints:
0 CATTLE CALVES HOGS SHEEP
0 Tuesday 07/21/2020 (est 118,000 2,000 478,000 7,000
1 Week ago (est) 119,000 2,000 475,000 8,000
2 Year ago (act) 121,000 3,000 476,000 8,000
And saves the data as data.csv (screenshot from LibreOffice).
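If you also want the date isolated in its own column before writing to Excel, as the question asks, here is a minimal sketch; it assumes the date is the MM/DD/YYYY token inside the row label (the "Week ago"/"Year ago" rows carry no date and will come out as NaN):
import pandas as pd
df = pd.read_csv('https://www.ams.usda.gov/mnreports/sj_ls710.txt',
                 delim_whitespace=True, skiprows=5, nrows=3).reset_index()
# rebuild the row label, then pull out just the MM/DD/YYYY token
label = df.loc[:, 'level_0':'level_2'].agg(' '.join, axis=1)
out = pd.concat([label.rename('Date'), df.iloc[:, 3:]], axis=1)
out['Date'] = out['Date'].str.extract(r'(\d{2}/\d{2}/\d{4})')
out.to_excel('Slaughter Data.xlsx', index=False)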
I have a .csv file with 100 rows of data displayed like this:
"Jim 1234"
"Sam 1235"
"Mary 1236"
"John 1237"
What I'm trying to achieve is splitting the numbers from the names into 2 columns in Python.
Edit: using the following,
import pandas as pd
df = pd.read_csv('test.csv', sep=r'\s+')
df.to_csv('result.csv', index=False)
I managed to get it to display like this in Excel. However, the numbers still do not show up in column B as I expected.
Your data has only one column, with a tab delimiter inside the quoted values:
# quoting=1 (QUOTE_ALL) strips the surrounding quotes; squeeze=True returns a Series
pd.read_csv('test.csv', quoting=1, header=None, squeeze=True) \
    .str.split('\t', expand=True) \
    .to_csv('result.csv', index=False, header=False)
A very simple way:
data = pd.DataFrame(['Jim1234', 'Sam4546'])
data[0].str.split(r'(\d+)', expand=True)
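For the asker's actual rows, where a space separates the name from the number inside the quotes, the same idea can be written with str.extract; a small sketch on made-up sample data:
import pandas as pd
# made-up sample matching the question's quoted "Name 1234" rows
s = pd.Series(['Jim 1234', 'Sam 1235', 'Mary 1236', 'John 1237'])
# one capture group per output column: letters, then digits
df = s.str.extract(r'([A-Za-z]+)\s+(\d+)')
df.to_csv('result.csv', index=False, header=False)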
If your file resembles the picture below (csv file content), then the following code will work:
import pandas as pd
df = pd.read_csv('a.csv', header=None, delimiter=r'\s')
df
Every department completes an annual budget in Excel and submits it. The individual budgets get rolled up into a single master budget.
I've used file-linking Excel formulas in the past, but this can be very time-consuming and prone to human error.
I think this is a perfect job for Python with Pandas (and/or other libraries).
Here is a picture of sample data:
Here is what I have tried so far: (edited/cleaned-up a little from the original)
#import libraries
import pandas as pd
import glob
# import excel files
path = '*.xlsx'
files = glob.glob(path)
# loop thru
combined_files = pd.DataFrame()
for i in files:
    df = pd.read_excel(i, index_col=None, skiprows=11,
                       nrows=147, usecols='D:P')
    combined_files = combined_files.concat(df)
combined_files.to_excel('output4.xlsx', index=False)
If I run print(files) the files are listed
I've also played around with variations of the combined_files variable but no joy.
Desired output:
A spreadsheet or .csv that has the general ledger description (e.g., "Supplies") in the first column, followed by the combined amounts from all files under Jan, Feb, Mar, etc.
So if dept1 budgets $100 for supplies in January, dept2 budgets $200, and dept3 budgets $400, then the result for Supplies under January will be $700.
I will have approximately 65 different Excel files and will need to iterate over the list. Most workbooks have multiple sheets. All of the worksheets have a sheet called, "Budget" and that is where we pulled from.
I removed all supporting sheets from my three sample files so I wouldn't have to deal with that aspect yet, but I will need to add that filter back soon.
I appreciate any help you can provide!
John
Try this code in place of your loop-thru and concat:
# Budget Roll-up
# Used to roll up individual budgets into one master budget
#import libraries
import pandas as pd
import glob
# import excel files
path = '*.xlsx'
files = glob.glob(path)
# loop thru, summing aligned cells across departments
combined_files = pd.DataFrame()
for i in files:
    df = pd.read_excel(i, index_col=None,
                       skiprows=11, nrows=147, usecols='D:P')
    # give the description column a common name and index on it so rows align
    df.rename(columns={df.columns[0]: 'test'}, inplace=True)
    df.set_index('test', inplace=True)
    # element-wise sum; fill_value=0 covers rows missing from either side
    combined_files = combined_files.add(df, fill_value=0, axis=1)
# keep the index (default) so the descriptions land in the first column
combined_files.to_excel('output.xlsx')
Use pd.concat after you have read the Excel files into pandas:
combined_excels = pd.concat((df1, df2), axis=0)
if you want to concatenate them vertically.
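Applied to the loop in the question, that means collecting the frames in a list and concatenating once at the end; a minimal sketch, assuming the same skiprows/nrows/usecols layout:
import glob
import pandas as pd
frames = []
for i in glob.glob('*.xlsx'):
    frames.append(pd.read_excel(i, index_col=None, skiprows=11,
                                nrows=147, usecols='D:P'))
# one concat at the end is cheaper than growing a DataFrame inside the loop
combined_excels = pd.concat(frames, axis=0, ignore_index=True)
combined_excels.to_excel('output4.xlsx', index=False)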
I have managed to use Python with the speedtest-cli package to run a speedtest of my Internet speed. I run this every 15 minutes and append the results to a .csv file I call "speedtest.csv". I then have this .csv file emailed to me every 12 hours, which is a lot of data.
I am only interested in keeping the rows of data that report less than 13 Mbps Download speed. Using the following code, I am able to filter for this data and append it to a second .csv file I call speedtestfilteronly.csv.
import pandas as pd
df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'].map(lambda x: x < 13000000.0)]
df.to_csv(r'c:\speedtestfilteronly.csv', mode='a', header=False)
The problem now is it appends all the rows that match my filter criteria every time I run this code. So if I run this code 4 times, I receive the same 4 sets of appended data in the "speedtestfilteronly.csv" file.
I am looking to append only those rows from speedtest.csv that are not already in speedtestfilteronly.csv.
How can I achieve this?
I have got the following code to work, except that it is not filtering the results to < 13000000.0 (13 Mbps). Any other ideas?
import pandas as pd
df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'].map(lambda x: x < 13000000.0)]
history_df = pd.read_csv(r'c:\speedtest.csv')
master_df = pd.concat([history_df, df], axis=0)
new_master_df = master_df.drop_duplicates(keep="first")
new_master_df.to_csv(r'c:\emailspeedtest.csv', header=None, index=False)
There are a few different ways you could approach this. One would be to read in your filtered dataset, append the new rows in memory, and then drop duplicates, like this:
import pandas as pd
df = pd.read_csv(r'c:\speedtest.csv', header=0)
df = df[df['Download'].map(lambda x: x < 13000000.0)]
# the filtered file was written without a header, so read it with header=None
history_df = pd.read_csv(r'c:\speedtestfilteronly.csv', header=None)
history_df.columns = df.columns  # align column names so the two frames line up in concat
master_df = pd.concat([history_df, df], axis=0)
new_master_df = master_df.drop_duplicates(keep="first")
new_master_df.to_csv(r'c:\speedtestfilteronly.csv', header=None, index=False)
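If two runs can legitimately produce identical measurement rows, it is safer to deduplicate on a subset of identifying columns instead; a one-line variant of the step above, where 'Timestamp' is a hypothetical column name, use whichever column uniquely identifies a run:
new_master_df = master_df.drop_duplicates(subset=['Timestamp'], keep='first')  # 'Timestamp' is hypothetical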
I have a csv file as follows:
Date,Data
01-01-01,111
02-02-02,222
03-03-03,333
The Date has the format YEAR-MONTH-DAY. From these dates I would like to calculate the monthly average values of the data (there are far more than 3 dates in my file).
For that I wish to use the following code:
import pandas as pd
import dateutil
import datetime
import os,sys,math,time
from os import path
os.chdir("in/base/dir")
data = pd.DataFrame.from_csv("data.csv")
data['Month'] = pd.DatetimeIndex(data['Date']).month
mean_data = data.groupby('Month').mean()
with open("data_monthly.csv", "w") as f:
    print(mean_data, file=f)
For some reason this gives me the error KeyError: 'Date'.
So it seems that the header is not read by pandas. Does anyone know how to fix that?
Your Date column header is read but put into the index. You have to use:
data['Month'] = pd.DatetimeIndex(data.reset_index()['Date']).month
Another solution is to pass index_col=None while making the DataFrame from the csv:
data = pd.DataFrame.from_csv("data.csv", index_col=None)
After which your code would be fine.
The ideal solution would be to use read_csv(), since DataFrame.from_csv() is deprecated:
data = pd.read_csv("data.csv")
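With read_csv the rest of the original code then works as intended; a minimal sketch of the monthly average using the Date/Data columns shown above:
import pandas as pd
data = pd.read_csv("data.csv")  # 'Date' stays a regular column
data['Month'] = pd.DatetimeIndex(data['Date']).month
mean_data = data.groupby('Month')['Data'].mean()  # average per calendar month
mean_data.to_csv("data_monthly.csv")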
Use the read_csv method. By default it is comma-separated.
import pandas as pd
df = pd.read_csv(filename)
print(pd.to_datetime(df["Date"]))
Output:
0 2001-01-01
1 2002-02-02
2 2003-03-03
Name: Date, dtype: datetime64[ns]
I posted part of this question a couple of days ago and got a good answer, but that solved just part of my problem.
So, I have an Excel file on which some data mining needs to be done, and afterwards another Excel file needs to come out in the same .xlsx format.
The problem is that I get a strange column after I write the file, which cannot be seen before the writing when using Anaconda. That makes it harder to develop a strategy to counter its appearance. Initially I thought I had solved the problem by reducing the column width to 0, but apparently at some point the file needs to be converted to text, and then the column reappears.
For more details here is part of my code:
import os
import pandas as pd
import numpy as np
import xlsxwriter
# Retrieve current working directory (`cwd`)
cwd = os.getcwd()
cwd
# Change directory
os.chdir("/Users/s7c/Documents/partsstop")
# Assign spreadsheet filename to `file`
file = 'SC daily inventory retrieval columns for reports.xlsx'
# Load spreadsheet
xl = pd.ExcelFile(file)
# Load a sheet into a DataFrame by name: df
df = xl.parse('Sheet1')
#second file code:
#select just the columns we need and rename them:
df2 = df.iloc[:, [1, 3, 6, 9]]
df2.columns = ['Manufacturer Code', 'Part Number', 'Qty Available', 'List Price']
#then select just the rows we need:
df21 = df2[df2['Manufacturer Code'].str.contains("DRP")].copy()  #13837 entries; .copy() avoids SettingWithCopyWarning
#keep just the DRP prefix, the first 3 characters, dropping the ones after:
df21['Manufacturer Code'] = df21['Manufacturer Code'].str[:3]
#add a new column:
#in order to do that we need to convert the price column to numeric:
df21['List Price'] = pd.to_numeric(df21['List Price'], errors='coerce')
df21['Dealer Price'] = df21['List Price'].apply(lambda x: x*0.48)  #new column equals 48% of the list price
writer = pd.ExcelWriter('example2.xlsx', engine='xlsxwriter')
# Write your DataFrames to a file
df21.to_excel(writer, 'Sheet1')
writer.save()
The actual view of the problem:
Any constructive idea is appreciated. Thanks!
This column seems to be the index of your DataFrame. You can exclude it by passing index=False to to_excel().
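A minimal sketch of the write step with the index suppressed (df21 here is a stand-in for the question's DataFrame):
import pandas as pd
# stand-in for the question's df21
df21 = pd.DataFrame({'Manufacturer Code': ['DRP'], 'Part Number': ['123'],
                     'Qty Available': [4], 'List Price': [10.0]})
with pd.ExcelWriter('example2.xlsx', engine='xlsxwriter') as writer:
    df21.to_excel(writer, sheet_name='Sheet1', index=False)  # no index column written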