Create new csv based on content of row - python

I have a .csv called cleaned_data.csv formatted like so:
Date,State,Median Listing Price
1/31/2010,Alabama,169900
2/28/2010,Alabama,169900
3/31/2010,Alabama,169500
1/31/2010,Alaska,239900
2/28/2010,Alaska,241250
3/31/2010,Alaska,248000
I would like to create a new .csv file for each state, named {state}.csv, that has the Date and Median Listing Price.
So far I have this:
import pandas
csv = pandas.read_csv('cleaned_data.csv', sep='\s*,\s*', header=0, encoding='utf-8-sig')
state_list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', ...]
for state in state_list:
    csv = csv[csv['State'] == f'{state}']
    csv.to_csv(f'state_csvs/{state}.csv', index=False, sep=',')
This successfully creates 51 .csv files named after each state, but only the Alabama.csv has Date, State, and Median Listing Price data for Alabama. Every other .csv only has the following headers with no data:
Date,State,Median Listing Price
Can someone explain to me why this is happening and how to fix it or a better way to do it?
Bonus points: I don't actually need the "State" column in the new .csv files but I'm unsure how to only add Date and Median Listing Price.

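The loop reassigns csv on its first pass: after filtering for Alabama, csv holds only the Alabama rows, so every later filter matches nothing and only the header is written. Keep the full data in its own variable (df here) and filter it without reassigning. Selecting just the two columns also drops State:
for i in df['State'].unique():
    df.loc[df['State'] == i, ['Date', 'Median Listing Price']].to_csv(f'state_csvs/{i}.csv', index=False)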
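The same split can also be written with groupby, which hands back each state's rows directly. This is a minimal sketch, not the answerer's method: split_by_state is a hypothetical helper name, the sample frame stands in for cleaned_data.csv, and the files go to a temporary directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_by_state(df, out_dir):
    """Write one CSV per state containing only Date and Median Listing Price."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # groupby yields (state, sub-frame) pairs; sort=False keeps first-seen order.
    for state, group in df.groupby("State", sort=False):
        path = out_dir / f"{state}.csv"
        group[["Date", "Median Listing Price"]].to_csv(path, index=False)
        written.append(path)
    return written


# Small in-memory sample standing in for cleaned_data.csv.
sample = pd.DataFrame({
    "Date": ["1/31/2010", "2/28/2010", "1/31/2010", "2/28/2010"],
    "State": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Median Listing Price": [169900, 169900, 239900, 241250],
})

out_dir = Path(tempfile.mkdtemp()) / "state_csvs"
paths = split_by_state(sample, out_dir)
alaska = pd.read_csv(out_dir / "Alaska.csv")
```

Because the original frame is never reassigned, every state gets its own rows, and no hand-maintained state_list is needed.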

Related

How to keep one column of information while feeding another column

I have an input file with two columns:
county name    id
"York, SC"     GHCND:USW00053871
I use the id column to create a list. Then I have a loop for the id to make an online weather request.
import pandas as pd
input_file = "id_in_sample.csv"
df = pd.read_csv(input_file)
id_list = df['id'].tolist()  # renamed from "list" to avoid shadowing the built-in
Then I save the weather information along with the id as a csv file.
id                   weather
GHCND:USW00053871    5
I would also like to include the county name in the output file, so that it looks like the following table:
county name    id                   weather
"York, SC"     GHCND:USW00053871    5
Is there a way to do it within the loop when I make a request or should I try to merge/join once I have the csv file?
Read the input file (id_in_sample.csv) with pandas. For illustration, the equivalent DataFrame is built inline here:
df = pd.DataFrame({
"county name": ["York, SC"],
"id": ["GHCND:USW00053871"]
})
You said that you save the weather information along with the id as a CSV file, so the weather information lives in a second CSV file. Load that file with pandas as well (again built inline here):
df2 = pd.DataFrame({
"id": ["GHCND:USW00053871"],
"weather": [5]
})
Now merge the two DataFrames on id:
df.merge(df2, on=["id"], how="inner")
Sample result:
  county name                 id  weather
0    York, SC  GHCND:USW00053871        5
If needed, you can save the merged result back to a CSV file.
Note: instead of reading the second DataFrame (the weather info) from a file, you can compute the weather values, merge, and then write everything to a file in one go. Which is better depends on the use case.
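The other option raised in the question, carrying the county name through the request loop, could look roughly like this. It is only a sketch: fetch_weather is a hypothetical stand-in for the real online request and simply returns the sample value 5.

```python
import pandas as pd


def fetch_weather(station_id):
    """Hypothetical stand-in for the real online weather request."""
    return 5


df = pd.DataFrame({
    "county name": ["York, SC"],
    "id": ["GHCND:USW00053871"],
})

rows = []
# Iterate over both columns together so the county name travels with its id.
for county, station_id in zip(df["county name"], df["id"]):
    rows.append({
        "county name": county,
        "id": station_id,
        "weather": fetch_weather(station_id),
    })

result = pd.DataFrame(rows, columns=["county name", "id", "weather"])
```

This avoids the second file entirely. The merge is the better fit when the weather CSV already exists.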

How can I convert the column names(task, asset,name,owner) as row and store it in a new .csv file using Python?

In Python, how can I write the column names (task, asset, name, owner) as a row and store them in a new .csv file?
Data Set (sample_change.csv) :
task asset name owner
JJJ01 61869 assetdev hoskot,john (100000)
JJJ02 87390 assetprod hope, ricky (100235)
JJJ10 28403 assetprod shaw, adam (199345)
The below is the code I started to write, but couldn't think of an approach.
import pandas as pd
import csv
#reading csv file and making the data frame
dataframe = pd.read_csv(r"C:\AWSGEEKS\dataset\sample_change.csv")
columns = list(dataframe.head(0))
print(columns)
Output :
columns
task
asset
name
owner
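To write the names as a single row (index=False keeps an empty index cell out of the header line):
pd.DataFrame(columns=dataframe.columns).to_csv('header.csv', index=False)
To write them as a single column:
pd.DataFrame(dataframe.columns).to_csv('header.csv', index=False, header=['Name'])
Or build a one-column DataFrame of the names first and write that:
df = pd.DataFrame(dataframe.columns, columns=['column names'])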
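Since the question already imports csv, the same two layouts can be produced with the standard-library writer. A small sketch under stated assumptions: the empty frame below stands in for sample_change.csv, and the output goes to in-memory buffers rather than header.csv so the result is easy to inspect.

```python
import csv
import io

import pandas as pd

# Empty frame with the question's column names, standing in for the real file.
dataframe = pd.DataFrame(columns=["task", "asset", "name", "owner"])
names = list(dataframe.columns)

# As a single row: one writerow call with all names.
row_buf = io.StringIO()
csv.writer(row_buf, lineterminator="\n").writerow(names)

# As a single column: one row per name.
col_buf = io.StringIO()
csv.writer(col_buf, lineterminator="\n").writerows([name] for name in names)
```

To write real files, replace each buffer with open('header.csv', 'w', newline='').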

want to add some columns from multiple dataframe into one specific dataframe

I have downloaded data for multiple stocks and stored each one in CSV format, using a function to which I pass a list of stock names. Each stock's data has several columns (open price, close price, and so on). I want the close price column from every stock collected into a new DataFrame, with the stock names as the column headers.
1) I created a function that downloads the data for a list of stock names and stores each stock in a CSV file.
2) I then tried a for loop that reads each stock's CSV file, picks only the close column, and stores it in another, initially empty DataFrame, so that I end up with a DataFrame of close prices whose column headers are the stock names. Downloading the data worked, but part 2 fails.
stocks = ['MSFT','IBM', 'GM', 'ACN', 'GOOG']
end=datetime.datetime.now().date()
start=end-pd.Timedelta(days=365*5)
def hist_data(stocks):
    stock_df=web.DataReader(stocks,'iex',start,end)
    stock_df['Name']=stocks
    fileName=stocks+'_data.csv'
    stock_df.to_csv(fileName)
with futures.ThreadPoolExecutor(len(stocks)) as executor:
    result=executor.map(hist_data,stocks)
print('completed')
#failing in the code below
close_prices = pd.DataFrame()
for i in stocks:
    df = pd.read_csv(i + '_data.csv')
    df1 = df['close']
    close_prices.append(df1)
#so when I try to print close_prices I get blank output
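DataFrame.append returns a new DataFrame rather than modifying close_prices in place, so the result is thrown away on every pass (the method was also removed in pandas 2.0). Assign each close column to a new column named after the stock instead:
close_prices = pd.DataFrame()
for i in stocks:
    df = pd.read_csv(i + '_data.csv')
    close_prices[i] = df['close']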
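Column assignment lines rows up by position, which is fine when every file covers the same dates. If the files may differ, one option is to index each close series by date and let pd.concat align them. This is a sketch only: the two frames and their prices are made-up placeholders for the '_data.csv' files, and the "date" column name is an assumption about their layout.

```python
import pandas as pd

# Hypothetical in-memory frames standing in for the '<ticker>_data.csv' files;
# the prices are illustrative, not real market data.
frames = {
    "MSFT": pd.DataFrame({"date": ["2019-01-02", "2019-01-03"], "close": [101.1, 97.4]}),
    "IBM": pd.DataFrame({"date": ["2019-01-02", "2019-01-03"], "close": [115.2, 112.9]}),
}

# Index each close series by date, then let concat align them column-wise.
# The dict keys become the column names.
close_prices = pd.concat(
    {name: frame.set_index("date")["close"] for name, frame in frames.items()},
    axis=1,
)
```

With real files, each dict value would come from pd.read_csv(name + '_data.csv') instead.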

How to get all pages data from site and save?

The page shows currency exchange rates. I want to download all the data from 2000.01.01 to 2018.12.01. On the page I can only get the data for one day, but I want the whole period, or at least one year at a time, saved to a CSV file. How can I do this?
I have tried fetching one date and saving it to CSV. I also tried parsing the page with urllib, but I still can't get all the data I need.
import pandas as pd
data = pd.read_html('http://www.nbt.tj/ru/kurs/kurs.php?date=01.02.2016')
data = data[2]
data.to_csv('currencies.csv', index=False)
Create a date range in the format the URL expects, loop over it, read the DataFrame for each date, and write each one to the same file in append mode, writing the header only for the first DataFrame:
dates = pd.date_range('2000-01-01', '2018-12-01').strftime('%d.%m.%Y')
for i, x in enumerate(dates):
    data = pd.read_html('http://www.nbt.tj/ru/kurs/kurs.php?date={}'.format(x))[2]
    if i == 0:
        data.to_csv('currencies.csv', index=False)
    else:
        data.to_csv('currencies.csv', index=False, mode='a', header=False)
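Appending table after table loses track of which day each row belongs to unless the table itself carries a date. An alternative is to tag each table with its date, collect them in memory, and concatenate once. The sketch below assumes each fetch returns a DataFrame. collect_rates and fake_fetch are hypothetical names, and fake_fetch returns made-up rows in place of the real pd.read_html call so the example runs offline.

```python
import pandas as pd


def collect_rates(dates, fetch):
    """Fetch one table per date, tag it with the date, and stack the results."""
    frames = []
    for day in dates:
        table = fetch(day).copy()
        table.insert(0, "date", day)  # remember which day each row came from
        frames.append(table)
    return pd.concat(frames, ignore_index=True)


def fake_fetch(day):
    """Hypothetical stand-in for pd.read_html(url)[2]; returns made-up rows."""
    return pd.DataFrame({"currency": ["USD", "EUR"], "rate": [1.0, 2.0]})


dates = pd.date_range("2016-02-01", "2016-02-03").strftime("%d.%m.%Y")
all_rates = collect_rates(dates, fake_fetch)
```

The combined frame can then be written with a single all_rates.to_csv('currencies.csv', index=False).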

python pandas - how to convert date wise stock csv to stock wise csv files

I have date wise stock csv files like below.
EQ070717.CSV
EQ070716.CSV
EQ070715.CSV
[...]
They have stock data in this format:
SC_NAME,OPEN,HIGH,LOW,CLOSE
ABB,1457.70,1469.95,1443.80,1452.90,
AEGI,189.00,193.00,187.40,188.70
HDFC,1650.00,1650.00,1617.05,1629.20
[...]
How can I convert them to stock-specific csv files that can be loaded as pandas DataFrames? I could do it in .NET, but I wanted to know whether there is a straightforward way in Python/pandas.
Edit: Adding expected output
Create individual stock files based on stock name:
ABB.csv
AEGI.csv
HDFC.csv
For each stock pull in stock data from all files and add to that stock csv:
For example stock ABB, read stock data from each date wise csv, and add that info to a new line in csv ABB.csv. Date value can be picked from file name or file modified date property also.
DATE, OPEN,HIGH,LOW,CLOSE
070717, 1457.70,1469.95,1443.80,1452.90
070716, 1456.70,1461.95,1441.80,1450.90
070715, 1455.70,1456.95,1441.80,1449.90
I think you need glob to select all the files, a list comprehension to build a list of DataFrames (dfs), and then concat to combine all the CSVs into one big DataFrame:
import glob
import pandas as pd
files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
df = pd.concat(dfs, ignore_index=True)
If you need the file names in the output DataFrame:
import os
files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
# file name without directory or extension, e.g. EQ070717 (works on any OS)
keys = [os.path.splitext(os.path.basename(x))[0] for x in files]
df = pd.concat(dfs, keys=keys)
Adding to @jezrael's solution, since the user wants a separate csv file for each stock (path_to_dir is the output directory, ending with a path separator):
for stock_name in df.SC_NAME.unique():
    df[df['SC_NAME']==stock_name].to_csv(path_to_dir+stock_name+'.csv')
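Putting the pieces together with the DATE column from the expected output might look like the sketch below. It rests on assumptions: the date is taken as the file name minus its "EQ" prefix, two tiny files are created in a temporary directory to stand in for the real ones, and dtype=str is used so prices are written back exactly as read.

```python
import tempfile
from pathlib import Path

import pandas as pd

work = Path(tempfile.mkdtemp())
src, out = work / "files", work / "stocks"
src.mkdir()
out.mkdir()

# Two tiny daily files standing in for EQ070716.CSV and EQ070717.CSV.
(src / "EQ070716.CSV").write_text(
    "SC_NAME,OPEN,HIGH,LOW,CLOSE\n"
    "ABB,1456.70,1461.95,1441.80,1450.90\n"
)
(src / "EQ070717.CSV").write_text(
    "SC_NAME,OPEN,HIGH,LOW,CLOSE\n"
    "ABB,1457.70,1469.95,1443.80,1452.90\n"
    "AEGI,189.00,193.00,187.40,188.70\n"
)

frames = []
for fp in sorted(src.glob("*.CSV")):
    # dtype=str keeps prices as written; the date is the stem minus "EQ".
    frame = pd.read_csv(fp, dtype=str)
    frame.insert(0, "DATE", fp.stem[2:])
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

# One output file per stock, without the now-redundant SC_NAME column.
for stock_name, group in df.groupby("SC_NAME"):
    group.drop(columns="SC_NAME").to_csv(out / f"{stock_name}.csv", index=False)

abb_lines = (out / "ABB.csv").read_text().splitlines()
```

Rows appear in file-name order here. If the name does not sort chronologically, parse DATE into a real date and sort on it before writing.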
My approach would be to set up a sqlite database with a single table. Just three columns: market_date, symbol, and csv_string (maybe a column for the line number in the file if you want relative positions preserved). Read all the files and load the data into the table line by line. Create an index on the symbol column. Then create a cursor for "select symbol, csv_string from stock_table order by symbol, market_date". Use itertools.groupby to tell you when you have looped over all of a symbol's rows, so you can close the last file and open the next.
Of course, if you have little enough data that it all fits into memory, you can just insert tuples into a list, sort the list, and use groupby to loop over it to make your files.
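The in-memory variant can be sketched as follows. The tuple layout (symbol, market_date, csv_string) mirrors the table columns described above, the three rows are sample values from the question, and the per-symbol lines are collected in a dict instead of being written to files to keep the example short.

```python
from itertools import groupby
from operator import itemgetter

# (symbol, market_date, csv_string) tuples, as they might be collected
# while reading the date-wise files.
rows = [
    ("ABB", "070717", "1457.70,1469.95,1443.80,1452.90"),
    ("AEGI", "070717", "189.00,193.00,187.40,188.70"),
    ("ABB", "070716", "1456.70,1461.95,1441.80,1450.90"),
]

# groupby only merges adjacent items, so sort by (symbol, date) first.
rows.sort(key=itemgetter(0, 1))

per_symbol = {}
for symbol, group in groupby(rows, key=itemgetter(0)):
    # Each group holds every row for one symbol, already in date order.
    per_symbol[symbol] = [f"{date},{csv_string}" for _, date, csv_string in group]
```

The dates are sorted as plain strings, which is chronological only if the date format happens to sort that way. Otherwise parse them into real dates before sorting.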
