Adding data to an existing empty dataframe containing only column names

How can I add data to an existing empty column in a dataframe?
I have an empty dataframe whose column names are stock tickers.
I am trying to add data for each stock, that is, to populate the dataframe column by column, from left to right, based on the header name.
I am pulling the data from another CSV file, which looks like this (the CSV file name is the column name in the dataframe I'm trying to populate):
P.S. An additional issue may arise from the amount of data available for each stock: I may have a list of 10 values for the first stock, 0 for the second, and 25 for the third. I plan to save the result as a CSV, so perhaps this will not cause too big an issue.
I have tried the following idea, but without luck. Any suggestions are welcome.
import pandas as pd
import os
path = 'F:/pathToFiles'
Russell3k_Divs = 'Russel3000-Divs/'
Russell3k_Tickers = 'Russell-3000-Stock-Tickers-List.csv'
df_tickers = pd.read_csv(path + Russell3k_Tickers)
divFls = os.listdir(path + Russell3k_Divs)
for i in divFls:
    df = pd.read_csv(path + Russell3k_Divs + i)
    Div = df['Dividends']
    i = i[0].split('.')
    df_tickers[i] = df_tickers.append(Div)
    print(df_tickers)
    break

import pandas as pd
import os
from tqdm import tqdm
path = 'F:/pathToFiles'
Russell3k_Divs = 'Russel3000-Divs/'
Russell3k_Tickers = 'Russell-3000-Stock-Tickers-List.csv'
df_tickers = pd.DataFrame()
divFls = os.listdir(path + Russell3k_Divs)
for i in tqdm(divFls):
    df = pd.read_csv(path + Russell3k_Divs + i)
    i = i.split('.')[0]
    df[str(i)] = df['Date']
    df_tickers = df_tickers.join(df[str(i)], how='outer')
df_tickers.to_csv('Russell-3000-Stock-Tickers-List1.csv', encoding='utf-8', index=False)
This answer was posted as an edit to the question adding data to an existing empty dataframe containing only column names by the OP Mr.Riply under CC BY-SA 4.0.
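
On the unequal-length concern raised in the question: one alternative to joining inside the loop is to collect one Series per ticker in a dictionary and concatenate once along the columns, which pads the shorter columns with NaN. This is only a minimal sketch, not the OP's method. The tickers and values are made up and built in memory rather than read from the CSV files.

```python
import pandas as pd

# Hypothetical per-ticker dividend values standing in for the CSV files.
dividends = {
    "AAA": [0.10, 0.12, 0.15],
    "BBB": [],                 # a ticker with no data at all
    "CCC": [0.50, 0.55],
}

# One Series per ticker, keyed by ticker name.
columns = {
    ticker: pd.Series(values, dtype="float64")
    for ticker, values in dividends.items()
}

# Concatenating along axis=1 aligns on the index and fills gaps with NaN.
df_tickers = pd.concat(columns, axis=1)
print(df_tickers)
```

Building the frame once after the loop also avoids copying the growing dataframe on every iteration.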

Related

Automatic transposing Excel user data in a Pandas Dataframe

I have some big Excel files like this (note: other variables are omitted for brevity):
and I need to build a corresponding pandas DataFrame with the following structure.
I am trying to write pandas code that, at least, parses the first column and transposes the id and the full name of each user. Could you help with this?

The way I would tackle it (there are likely more efficient ways) is to import the Excel file into a dataframe and then iterate through it to grab the details you need from each line. Store that information in a dictionary, and append each dictionary to a list. This list of dictionaries can then be used to create the final dataframe.
Please note, I made the following assumptions:
Your Excel file is named 'data.xlsx' and is in the current working directory
The index next to each person increments by one EVERY time
All people have a position described in brackets next to the name
I made up the column names, as none were provided
import pandas as pd
# import the excel file into a dataframe (df)
filename = 'data.xlsx'
df = pd.read_excel(filename, names=['col1', 'col2'])
# remove blank rows
df.dropna(inplace=True)
# reset the index of df
df.reset_index(drop=True, inplace=True)
# initialise the variables
counter = 1
name_pos = ''
name = ''
pos = ''
line_dict = {}
list_of_lines = []
# iterate through the dataframe
for i in range(len(df)):
    if df['col1'][i] == counter:
        name_pos = df['col2'][i].split(' (')
        name = name_pos[0]
        pos = name_pos[1].rstrip(name_pos[1][-1])
        p_index = counter
        counter += 1
    else:
        date = df['col1'][i].strftime('%d/%m/%Y')
        amount = df['col2'][i]
        line_dict = {'p_index': p_index, 'name': name, 'position': pos, 'date':date, 'amount': amount}
        list_of_lines.append(line_dict)
final_df = pd.DataFrame(list_of_lines)

How can I export for loop result to CSV or Excel with pandas?

I have a for loop that gets data from a website, and I would like to export the results to an xlsx or csv file.
Normally, when I print the result of the loop I get the whole list, but when I export it to an xlsx file I only get the last item. Where is the problem? Can you help?
for item1 in spec:
    spec2 = item1.find_all('th')
    expl2 = item1.find_all('td')
    spec2x = spec2[a].text
    expl2x = expl2[a].text
    yazim = spec2x + ': ' + expl2x
    cumle = yazim
    patern = r"(Brand|Series|Model|Operating System|CPU|Screen|MemoryStorage|Graphics Card|Video Memory|Dimensions|Screen Size|Touchscreen|Display Type|Resolution|GPU|Video Memory|Graphic Type|SSD|Bluetooth|USB)"
    if re.search(patern, cumle):
        speclist = translator.translate(cumle, lang_tgt='tr')
        specl = speclist
        #print(specl)
        import pandas as pd
        exp = [{ 'Prospec': specl,},]
        df = pd.DataFrame(exp, columns = ['Prospec',])
        df.to_excel('output1.xlsx',)

Create an empty list and, at each iteration in your for loop, append a data frame to the list. You will end up with a list of data frames. After the loop, use pd.concat() to create a new data frame by concatenating every element of your list. You can then save the resulting df to an excel file.
Your code would look something like this:
import pandas as pd
df_list = []
for item1 in spec:
    ......
    if re.search(patern, cumle):
        ....
        df_list.append(pd.DataFrame(.....))
df = pd.concat(df_list)
df.to_excel(.....)
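
The collect-then-concatenate pattern described above can be tried out with a small self-contained sketch. The input strings and the shortened pattern are made up and stand in for the scraped and translated text. The CSV is produced as a string here; passing a path to to_csv, or calling to_excel with a path, would write a file instead.

```python
import re
import pandas as pd

# Hypothetical "spec: value" strings standing in for the scraped text.
lines = ["Brand: Acme", "Weight: 2 kg", "CPU: 8-core", "Screen: 15 inch"]
pattern = r"(Brand|CPU|Screen)"

df_list = []
for line in lines:
    if re.search(pattern, line):
        # Collect one single-row frame per match instead of exporting it.
        df_list.append(pd.DataFrame({"Prospec": [line]}))

# Build the full frame once, after the loop, and export it once.
df = pd.concat(df_list, ignore_index=True)
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because the export happens after the loop, every matching item ends up in the output rather than only the last one.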

Column appended to dataframe coming up empty

I have the following code:
import glob
import pandas as pd
import os
import csv
myList = []
path = "/home/reallymemorable/Documents/git/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports_us/*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    row = df.loc[df['Province_State'] == 'Pennsylvania']
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    fileDate = pd.DataFrame({'Date': [dateFromFilename]})
    myList.append(row.join(fileDate))
concatList = pd.concat(myList, sort=True)
print(concatList)
concatList.to_csv('/home/reallymemorable/Documents/test.csv', index=False, header=True)
It goes through a folder of CSVs and grabs a specific row and puts it all in a CSV. The files themselves have names like 10-10-2020.csv. I have some code in there that gets the filename and removes the file extension, so I am left with the date alone.
I am trying to add another column called "Date" that contains the filename for each file.
The script almost works: it gives me a CSV of all the rows I pulled out of the various CSVs, but the Date column itself is empty.
If I do print(dateFromFilename), the date/filename prints as expected (e.g. 10-10-2020).
What am I doing wrong?

join uses how='left' by default and aligns on the index. Your fileDate dataframe has index 0, while row keeps its original index from the CSV, so the indexes do not match and the Date column is filled with NaN. Instead, do an assignment:
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    row = df.loc[df['Province_State'] == 'Pennsylvania']
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    myList.append(row.assign(Date=dateFromFilename))
concatList = pd.concat(myList, sort=True)
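Another way is to store the dataframes in a dictionary keyed by date, then concat. The keys become the outer level of the resulting index, so keep the index (or call reset_index) when writing the CSV: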
myList = dict()
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    row = df.loc[df['Province_State'] == 'Pennsylvania']
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    myList[dateFromFilename] = row
concatList = pd.concat(myList, sort=True)
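
The index mismatch behind the empty Date column can be reproduced with a small in-memory frame. This is only an illustrative sketch: the data is made up, and the only column taken from the question is Province_State.

```python
import pandas as pd

# Hypothetical daily report; the Pennsylvania row sits at index label 2.
df = pd.DataFrame({
    "Province_State": ["Ohio", "New York", "Pennsylvania"],
    "Confirmed": [10, 20, 30],
})
row = df.loc[df["Province_State"] == "Pennsylvania"]
file_date = pd.DataFrame({"Date": ["10-10-2020"]})   # index label 0

# Left join on the index: label 2 has no partner in file_date, so Date is NaN.
joined = row.join(file_date)

# assign broadcasts the scalar to every row, whatever the index is.
assigned = row.assign(Date="10-10-2020")

print(joined["Date"].isna().all())    # True
print(assigned["Date"].tolist())      # ['10-10-2020']
```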

Concatenating dataframes adding additional columns

I'm trying to create a combined dataframe from a series of 12 individual CSVs (12 months to combine for the year). All the CSVs have the same format and column layout.
When I first ran it, it appeared to work and I was left with a combined dataframe with 6 columns (as expected). Upon looking at it, I found that the header row was applied as actual data in all the files, so I had some bad rows I needed to eliminate. I could manually make these changes but I'm looking to have the code take care of this automatically.
So to that end, I updated the code so that it reads the first CSV with headers and the remaining CSVs without headers, then concatenates everything together. This appears to work, BUT I end up with 12 columns instead of 6: the first 6 columns have NaNs for the rows from the other 11 CSVs, and the last 6 columns have NaNs for the rows from the first CSV, which is obviously NOT what I want (see image below).
The code is similar, I just use the header=None parameter in pd.read_csv() for the 11 CSVs after the first (and I don't use that parameter for the first CSV). Can anyone give me a hint as to why I'm getting 12 columns (with the data placement as described) when I run this code? The layout of the CSV file is shown below.
Appreciate any help.
import pandas as pd
import numpy as np
import os
# Need to include the header row only for the first csv (otherwise header row will be included
# for each read csv, which places improperly formatted rows into the combined dataframe).
totrows = 0
# Get list of csv files to read.
files = os.listdir('c:/data/datasets')
# Read the first csv file, including the header row.
dfSD = pd.read_csv('c:/data/datasets/' + files[0], skip_blank_lines=True)
# Now read the remaining csv files (without header row) and concatenate their values
# into our full Sales Data dataframe.
for file in files[1:]:
    df = pd.read_csv('c:/data/datasets/' + file, skip_blank_lines=True, header=None)
    dfSD = pd.concat([dfSD, df])
    totrows += df.shape[0]
    print(file + " == " + str(df.shape[0]) + " rows")
print()
print("TOTAL ROWS = " + str(totrows + pd.read_csv('c:/data/datasets/' + files[0]).shape[0]))

One simple solution is the following.
import pandas as pd
import numpy as np
import os
totrows = 0
files = os.listdir('c:/data/datasets')
columns = None
frames = []
for file in files:
    # read every file with its own header row
    df = pd.read_csv('c:/data/datasets/' + file, skip_blank_lines=True)
    # remember the column names of the first file and reuse them for the rest
    if columns is None:
        columns = df.columns
    df.columns = columns
    frames.append(df)
    totrows += df.shape[0]
    print(file + " == " + str(df.shape[0]) + " rows")
dfSD = pd.concat(frames, axis = 0)
dfSD = dfSD.reset_index(drop = True)
Another possibility is:
import pandas as pd
import numpy as np
import os
# Need to include the header row only for the first csv (otherwise header row will be included
# for each read csv, which places improperly formatted rows into the combined dataframe).
totrows = 0
# Get list of csv files to read.
files = os.listdir('c:/data/datasets')
# Read the first csv file, including the header row.
dfSD = pd.read_csv('c:/data/datasets/' + files[0], skip_blank_lines=True)
df_comb = [dfSD]
# Now read the remaining csv files (without header row), give them the column names
# of the first file, and collect them for a single concatenation.
for file in files[1:]:
    df = pd.read_csv('c:/data/datasets/' + file, skip_blank_lines=True, header=None)
    df.columns = dfSD.columns
    df_comb.append(df)
    totrows += df.shape[0]
    print(file + " == " + str(df.shape[0]) + " rows")
# df_comb is already a list of dataframes, so pass it to concat directly
dfSD = pd.concat(df_comb, axis = 0).reset_index(drop = True)
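
As for why 12 columns appear in the first place: pd.concat aligns on column labels, and a file read with header=None gets integer labels (0, 1, ...) that never match the named columns of the first file. The following sketch illustrates this with two made-up two-column files held in memory. The column names and values are assumptions, not the asker's data.

```python
import io
import pandas as pd

# Two hypothetical monthly files with the same layout.
jan = "Product,Qty\nWidget,1\nGadget,2\n"
feb = "Product,Qty\nWidget,3\n"

first = pd.read_csv(io.StringIO(jan))                # columns: Product, Qty
rest = pd.read_csv(io.StringIO(feb), header=None)    # columns: 0, 1

# Labels do not match, so the result has both sets of columns.
mismatched = pd.concat([first, rest])
print(list(mismatched.columns))

# Reading every file with its header keeps the labels aligned.
combined = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (jan, feb)],
    ignore_index=True,
)
print(combined.shape)
```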

Column name in Csv changes after merging

I have multiple CSV files that I need to merge. The column names are:
idSite idVisit visitIp visitorId
However, in the merged file the column 'idSite' changes to 'ï»¿idSite'.
This is the program I wrote. Everything else seems to be fine.
import pandas as pd
import os
dirListing = os.listdir("D:/Python/Test/Diku/piwik/filteredcsv/")
df=[]
siteIds = [34]
for id in siteIds:
    for item in dirListing:
        if str(id) in item:
            print(item)
            df.append(pd.read_csv(item,sep = ",",dtype='unicode'))
    df3 = pd.concat(df,axis=0, ignore_index=True)
    df3.to_csv('merged_' + str(id) + '_raw'+'.csv', sep =',')
I can't seem to figure out the problem. Is it an encoding issue?
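
The three characters in front of idSite are what the UTF-8 byte order mark (bytes EF BB BF) looks like when it is decoded as cp1252, so an encoding issue is a plausible cause. The sketch below is not taken from the original thread. It assumes the input files were saved as UTF-8 with a BOM and uses made-up file content held in memory. Python's "utf-8-sig" codec strips the mark while decoding.

```python
import io
import pandas as pd

# A hypothetical CSV saved as UTF-8 with a byte order mark.
raw = "idSite,idVisit\n34,1\n".encode("utf-8-sig")

# Decoding the same bytes as cp1252 reproduces the stray characters.
print(raw.decode("cp1252").splitlines()[0])

# Reading with "utf-8-sig" removes the BOM from the first column name.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(list(df.columns))
```

If the merged file should not carry a BOM either, writing it with encoding='utf-8' in to_csv is one option.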
