I thought this would be fairly simple, but I am stuck. I am trying to repeat a row of data based on a population field. For example, if the population is 921, the row needs to be repeated 921 times, then move on to the next row and repeat it based on its population. The CSV file does have a header; I tried removing it and ran into problems, so I put the header back.
i = 0
while i < pop:
    if pop == 'F21_64':
        break
    else:
        # writerow
        i += 1
I keep getting this error: IndexError: list index out of range.
You left a lot to assumption, but I'll try to answer. For one thing, it is not clear what you want to do with the pop value, other than that you want to do something that many times (print to screen? output to a file?). I will assume print to screen.
I imagine you are trying to do something like this (let's assume your population field is in the second column):
for row in rows[1:]:              # skip the header row
    pop = row.split(',')[1]       # isolate just the pop value
    popvalue = int(pop)           # convert to int
    for i in range(popvalue):     # repeat once per unit of population
        print(row)                # do the thing you want with the entire row
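Since the question mentions writerow, here is a slightly fuller sketch that writes the repeated rows to a new file with the csv module instead of printing them; the file names and the column position of the population value are assumptions, so adjust them to your data:

import csv

with open('input.csv', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    next(reader)                      # skip the header row
    for row in reader:
        pop = int(row[1])             # population assumed to be in the second column
        for _ in range(pop):
            writer.writerow(row)      # write the row once per unit of population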
I have a dataframe called dft of Netflix's TV shows and movies, with a column named "listed_in" whose entries are strings listing all the genres a title is classified under. Each row has a genre classification of a different length, with the genres written as strings separated by commas.
A single entry is something like, for example: 'Documentary, International TV Shows, Crime TV Shows'. Another row may have a different number of genres, some of which may be the same as genres appearing in other rows.
Now I want to create a list of the unique values in all the rows.
genres = []
for i in range(0, len(dft['listed_in'].str.split(','))):
    for j in range(0, len(dft['listed_in'].str.split(',')[i])):
        if (dft['listed_in'].str.split(',')[i][j]) not in genres:
            genres.append(dft['listed_in'].str.split(',')[i][j])
        else:
            pass
This keeps the kernel running indefinitely. But the thing is, the list is being created: if I interrupt the kernel after some time and print the list, it's there.
Then, I create a dataframe out of this list with the intention of having a column with the count of times each genre appears in the original dataframe.
data = {'Genres':genres,'count':[0 for i in range(0,len(genres))]}
gnr = pd.DataFrame(data = data)
Then to change the count column to each genre's count of occurrence:
for i in range(0, 65):
    for j in range(0, 514):
        if gnr.loc[i, 'Genres'] in (dft['listed_in'].str.split(',').index[j]):
            gnr.loc[i, 'count'] = gnr.loc[i, 'count'] + dft['listed_in'].str.split(',').value_counts()[j]
        else:
            pass
Then again this code keeps running indefinitely, but after interrupting it I saw the count for the 1st entry was updated in the gnr dataframe.
I don't know what is happening.
Are you sure that the process actually hangs? For loops with pandas are much slower than you would expect, especially with the number of iterations you are doing (65*514). If you haven't already, I'd put in a print(i) so you get some insight into which iteration you're on.
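If the loop turns out to be merely slow rather than stuck, a vectorized version avoids the per-row Python loop entirely. A minimal sketch, assuming dft['listed_in'] holds comma-separated genre strings and a pandas version with Series.explode (0.25 or newer):

# Split each genre string into a list, flatten it, strip stray spaces, then count
genre_counts = (
    dft['listed_in']
    .str.split(',')
    .explode()
    .str.strip()
    .value_counts()
)
genres = genre_counts.index.tolist()   # the unique genres
print(genre_counts.head())

This gives both the unique genre list and the per-genre counts in one pass, so the second nested loop is not needed either.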
I'm fairly new to Python and still learning the ropes, so I need help with a step-by-step program that doesn't use any functions. I understand how to count through an unknown column range and output the quantity. However, for this program, I'm trying to loop through a column, picking out the unique numbers and counting their frequency.
So I have an Excel file with random numbers down column A. I only put in 20 numbers, but let's pretend the range is unknown. How would I go about extracting the unique numbers and putting them into a separate column along with how many times they appear in the list?
I'm not really sure how to go about this. :/
unique = 1
while xw.Range((unique, 1)).value != None:
    frequency = 0
    if unique != unique: break
    quantity += 1
"end"
I presume, since you can't use functions, this may be homework... so, at a high level:
You could first go through the column and put all the values in a list.
Then take the first value from the list and go through the rest of the list: is it in there? If so, it is not unique. Remove the value where you found the duplicate from the list; keep going, and if you find another, remove that too.
Then take the second value, and so on.
You would just need list comprehension, some loops, and perhaps .pop().
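Purely as an illustration of that idea (a sketch with made-up numbers standing in for the column, not a finished solution):

values = [3, 7, 3, 9, 7, 7, 1]        # made-up data standing in for column A

counts = []                            # will hold [value, frequency] pairs
remaining = values[:]                  # work on a copy so the original survives
while remaining:
    current = remaining.pop(0)         # take the first value
    frequency = 1
    i = 0
    while i < len(remaining):          # walk the rest of the list
        if remaining[i] == current:
            frequency += 1
            remaining.pop(i)           # remove the duplicate, don't advance i
        else:
            i += 1
    counts.append([current, frequency])

print(counts)                          # [[3, 2], [7, 3], [9, 1], [1, 1]]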
Using the pandas library would be the easiest way to do this. I created a sample Excel sheet with only one column, called "Random_num".
import pandas

data = pandas.read_excel("sample.xlsx", sheet_name="Sheet1")
print(data.head())  # This would give you a sneak peek of your data
print(data['Random_num'].value_counts())  # This would solve the problem you asked for
# Make sure to pass your column name within the quotation marks,
# e.g. data['your_column'].value_counts()
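If the counts then need to land in a separate column of a workbook, one way (again just a sketch; the output file name is made up, and writing Excel files needs openpyxl or xlsxwriter installed) is to tidy the value_counts result into a small dataframe and write it out:

counts = data['Random_num'].value_counts()           # unique value -> frequency
result = counts.rename_axis('Unique_num').reset_index(name='Frequency')
result.to_excel("sample_counts.xlsx", index=False)   # assumed output file name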
Thanks
Hi, I am trying to make a program that takes any CSV file and puts it into a list to read from. The rows of the CSV file will be displayed to the user for them to choose what they would like to do. Their decision will take them to a specific row, and this keeps happening until the user ends the program. The row they are taken to needs to be decided by the CSV file, not directly by user input: in the CSV file there will be numbers at the end of each row showing which row to print based on the user's choice. For example: eat, sleep, run, walk, 1, 2, 3, 4. If they choose eat, print row 1; if they choose sleep, print row 2; and so on. But it cannot be hard-coded, because different CSV files can be used in the program.
So I'm not sure if I should take the CSV file and separate the integers from the text into two separate lists and have them reference each other?
Hello, your thinking is along the right lines.
import csv
# For the average
from statistics import mean

def calculate_averages(input_file_name, output_file_name):
    with open(input_file_name) as f:   # open the file passed in, rather than a hard-coded name
        red = csv.reader(f)
        for row in red:
            name = row[0]              # first column is a label
            aray1 = row[1:]            # remaining columns hold the numbers
            aray2 = []
            for number in aray1:
                aray2.append(int(number))
        return mean(aray2)             # mean of the numbers in the last row read
If this helped you please mark it as the answer to your question!
First, read the CSV file into a dataframe; then you'll have all the info in a list-like data structure (pandas read_csv turns the CSV into a dataframe). Use df.iloc[i] to get a row and df.iloc[i, j] for a specific column within it. This is what I understood from your question. It will go something like below:
import pandas as pd

def getRow(df, i, j):       # loop over this function to get a specific row
    print(df.iloc[i])       # whole row
    print(df.iloc[i, j])    # single cell

df = pd.read_csv("filename.csv")
while True:
    i = int(input("Row Index"))       # loop over these two lines to keep asking
    j = int(input("Column Index"))
    getRow(df, i, j)
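To tie this to the navigation idea in the question, where numbers at the end of a row say which row to jump to, a small variation could read the jump target from the chosen row itself rather than asking the user for an index. The layout assumed below (n option labels followed by n row numbers, as in the eat/sleep/run/walk example) is a guess, so adapt it to your actual CSV:

current = 0
while True:
    row = list(df.iloc[current])
    n = len(row) // 2
    options, targets = row[:n], row[n:]        # labels first, row numbers after
    print("Options:", options)
    choice = input("Pick an option (or 'q' to quit): ")
    if choice == 'q':
        break
    current = int(targets[options.index(choice)])   # next row comes from the CSV, not the user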
I am working on a project using python to select certain values from an excel file. I am using the xlrd library and openpyxl library to do this.
The way the Python program should be working is:
Grouping all the data point entries that are in a certain card task. These are marked in column E. For example, all of the entries between row 26 and row 28 are in Card Task A, and hence they should be grouped together. All entries without a "Card Task" value in column E should not be considered at all.
Next…
Looking at the value from column N (lastExecTime) in a row and comparing that time with the following row's value in column M.
If the times overlap (the column M value is less than the previous column N value), increment a variable called "count". Count stores the number of times a procedure overlaps.
Finally…
As for the output, the goal is to create a separate text file that displays which tasks are overlapping, and how many tasks overlap in a certain Card Task.
The problem that I am running into is that I cannot pair up the data belonging to one card task.
Here is a sample of the Excel data (the original post linked two screenshots here, which are not reproduced).
And here is the code that I have written that tells me if there are multiple procedures going on:
from openpyxl import load_workbook

book = load_workbook('LearnerSummaryNoFormat.xlsx')
sheet = book['Sheet1']

for row in sheet.rows:
    if (row[4].value[:9]) != 'Card Task':
        print("Is not a card task: " + str(row[1].value))
Essentially my problem is that I am not able to compare all the values from one card task with each other.
I would read through the data once, as you already do, but store all rows containing 'Card Task' in a separate list. Once you have a list of only the card task items, you can compare them.
card_task_row_object_list = []
count = 0
for row in sheet.rows:
    if row[4].value and 'Card Task' in row[4].value:   # check the cell's value, not the cell object
        card_task_row_object_list.append(row)
From here you would want to compare the time values. What do you need to check, whether two different card task times overlap?
(index 12: start, index 13: end)
def compare_times(card_task_row_object_list):
    count = 0
    for row in card_task_row_object_list:
        for comparison_row in card_task_row_object_list:
            if comparison_row is row:
                continue                                # don't compare a row with itself
            # Two tasks overlap when each one starts before the other ends
            if comparison_row[12].value <= row[13].value and comparison_row[13].value >= row[12].value:
                count += 1
    return count
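Calling it would then look something like this, assuming the list built in the first snippet:

overlaps = compare_times(card_task_row_object_list)
print("Overlapping task pairs found:", overlaps)

Note that this counts each overlapping pair twice (once in each direction), so halve the result if you want distinct pairs, and you could group the rows by their Card Task value first if overlaps should only be counted within the same card task.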
I'm attempting to load numerical data from CSV files in order to loop through the calculated data of each stock (file) separately and determine whether the calculated value is greater than a specific number (731 in this case). However, the method I am using seems to make Python repeat the list and also add quotation marks around the numbers ('500', for example), making them strings. Unfortunately, I think the final "if" statement can't handle this, and as a result it doesn't seem to function appropriately. I'm not sure what's going on or what I need to do to get this code running properly.
import csv

stocks = ['JPM','PG','GOOG','KO']

for stock in stocks:
    Data = open("%sMin.csv" % (stock), 'r')
    stockdata = []
    for row in Data:
        stockdata.extend(map(float, row.strip().split(',')))
        stockdata.append(row.strip().split(',')[0])
    if any(x > 731 for x in stockdata):
        print "%s Minimum" % (stock)
Currently you're adding all columns of each row to a list, and then adding to the end of that the first column of the row again. So are all columns significant, or just the first?
You're also loading all data from the file before the comparison but don't appear to be using it anywhere, so I guess you can shortcut earlier...
If I understand correctly, your code should be this (or amend to only compare first column).
Are you basically writing this?
import csv

STOCKS = ['JPM', 'PG', 'GOOG', 'KO']

for stock in STOCKS:
    with open('{}Min.csv'.format(stock)) as csvin:
        for row in csv.reader(csvin):
            if any(col > 731 for col in map(float, row)):
                print('{} minimum'.format(stock))
                break