Iterate string.append over rows Python - python

I have a data set like this:
The objective is to put every cell of the data frame into one list, with each cell as a separate word.
I originally tried:
string = []
for words in data:
    string.append(words)
but it gives me the wanted result for the first row only:
I have also tried iterating with iterrows, but it just produces pairs of values, which is not very useful.
How can I iterate the append function over all the rows and store the results as a single string?

I solved it with this:
string = []
for words in data.itertuples(index=False):
    string += words
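This works because += on a list extends it with the items of any iterable, so each row tuple contributes its cells one by one. A minimal sketch of the idea, using plain tuples as stand-ins for the rows that itertuples(index=False) would yield (the sample values and the names rows, cells and sentence are made up for illustration):

```python
from itertools import chain

# Hypothetical stand-in for the rows of the data frame; with pandas these
# would come from data.itertuples(index=False).
rows = [("the", "quick", "brown"), ("fox", "jumps", "over")]

# Same approach as the loop above: += extends the list with each cell.
cells = []
for row in rows:
    cells += row

# Equivalent way to flatten all rows in a single expression.
cells_chained = list(chain.from_iterable(rows))

# If a single string is wanted rather than a list, join the cells.
sentence = " ".join(cells)
```

Note that " ".join only accepts strings, so non-string cells would need to be converted with str first.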

Related

Splitting a list a specific amount of times

I have an input file containing the first names, last names and GPAs of students. I'm having a bit of trouble, as this is the first time I'm working with splitting text files. I want to split each line of the text file, store it in the Temp list, then put the first and last names in Names and the GPAs in Scores. The code below splits the file and puts it in Temp, but unfortunately it splits each line into first name, last name and GPA. Is there a way to have it split into full name and GPA instead? This is my output:
This is what I came up with so far:
def main():
    try:
        inputFile=open("input.txt", "r")
        outputFile=open("output.txt", "w")
        Names=[]
        Scores=[]
        Temp=[]
        for i in inputFile:
            splitlist=i.split()
            Temp.append(splitlist)
        print(Temp)
    except:
        print(" ")
main()
You can use rsplit and specify a maximum number of splits:
splitlist = i.rsplit(maxsplit=1)
Except for splitting from the right, rsplit() behaves like split().
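To show how this fills the Names and Scores lists from the question, here is a minimal sketch. The sample lines are made up and stand in for the contents of input.txt, and converting the GPA with float is an assumption about what you want to store:

```python
# Hypothetical sample lines; the real ones would be read from input.txt.
lines = ["John Smith 3.5\n", "Mary Ann Jones 4.0\n"]

Names = []
Scores = []
for line in lines:
    # Split once, from the right: everything before the last run of
    # whitespace is the name, the final field is the GPA.
    name, gpa = line.rsplit(maxsplit=1)
    Names.append(name)
    Scores.append(float(gpa))
```

Because only one split is made, names with more than two parts stay in one piece. A blank line in the file would make the two-name unpacking fail, so you may want to skip empty lines first.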
So, basically, there are 2 ways to achieve your desired result.
1. Splitting at the last occurrence (easy, recommended method)
In Python, strings can be split in two ways:
"Normal" splitting (str.split)
"Reverse" splitting (str.rsplit), which is like split, but splits from the end of the string
Therefore, you can do something like this:
splitlist = i.rsplit(maxsplit=1)
2. Overengineered method (not recommended, but overengineered)
You can reverse your string, use split with a maxsplit of 1, reverse the resulting list AND reverse every entry in the list
So:
splitlist = [j[::-1] for j in i[::-1].split(maxsplit=1)][::-1]
;)

When I import a data set into an array, the length of each element appears as 1

I'm making a program where I import a long list of words from a .txt file into one array called wordlist. I then want to sort them into categories based on the length of the words. However for some reason, when the words are stored in the array, the length shows up as 1 for every single one.
Here is the code:
wordlist = []
with open('words.txt', 'r') as words:
    for line in words:
        strplines = line.strip()
        list = strplines.split()
        wordlist.append(list)
        loading = loading + 1
        print(loading,'/ 113809 words loaded')
If I then do something like this
print(len(wordlist[15000]))
The output is 1 despite that word actually being 6 characters long.
I tried this in another program, where the only difference was that I manually entered a few elements into the array, and it worked. That means there's probably an issue with the way I strip the lines from the .txt file.
So wordlist is an array of arrays? If so, checking the len of one of its elements returns the number of items in that inner array, which is 1. But if you do something like
len(wordlist[1500][0])
you get the len of the first word that is stored in array at index 1500.
It looks like you do not want to append to the array (that adds the whole list as one element), but to extend it.
And please, please, even though builtins are not reserved words, avoid using their names for your own variables! So call your list lst or mylist or whatever, but not list...
The code could become:
wordlist = []
loading = 0  # the counter must exist before it is incremented
with open('words.txt', 'r') as words:
    for line in words:
        strplines = line.strip()
        lst = strplines.split()
        wordlist.extend(lst)
        loading = loading + 1
        print(loading,'/ 113809 words loaded')
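The difference between append and extend is easy to see on a tiny example. This is only a sketch with made-up words in place of words.txt, and the names appended and extended are chosen for illustration:

```python
words_per_line = ["apple", "banana", "cherry"]  # made-up sample data

appended = []
extended = []
for word in words_per_line:
    parts = word.split()      # a one-element list such as ['apple']
    appended.append(parts)    # adds the list itself -> list of lists
    extended.extend(parts)    # adds the items of the list -> flat list

length_of_nested = len(appended[1])   # counts elements in ['banana']
length_of_word = len(extended[1])     # counts characters in 'banana'
```

With append, every element of the result is itself a list holding one word, which is why len reported 1 in the question. With extend, the elements are the words themselves, so len gives the word length.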

How to create multiple files for writing multiple lists on the fly using Python?

I will try to provide an example for the question.
Let's say we have 3 lists, e.g.:
list1 = ['one', 'two', 'three']
list2 = ['a', 'b', 'c']
list3 = ['mike', 'jack', 'ram']
Or, say each line of a file holds the values of one list:
['one','two','three']
['a','b','c']
['mike','jack','ram']
Now I want to create three different files and write one list to each. The names of the files should be auto-generated, e.g.:
file001.txt
file002.txt
file003.txt
I am assuming that your data is in the console and each list is a line,
something like this:
line1 = ['one', 'two', 'three']
line2 = ['a', 'b', 'c']
line3 = ['mike', 'jack', 'ram']
I merged all the data into one list of lists:
all_data = [line1] + [line2] + [line3]
The part above is not necessary if all the list values are already in one variable, one list per line. If not, you can merge them in a similar way.
Now, write each line (list values) to the different file:
count = 1
for data in all_data:
    output = open('file' + str(count) + '.txt', 'w')
    output.write(','.join(data))
    count += 1
    output.close()
The loop keeps going until the last list, so you get one file per list. If you want to join the values inside a list with a different separator, change the ',' in ','.join to whatever you like.
Hope I helped.
You can see a detailed explanation here. But to sum it all up, you create a file object by opening a file (or creating one if it doesn't exist), and then write to it, read from it, etc.
Use enumerate and string formatting to construct the file names.
s = 'file{:03}.txt'
for n, lyst in enumerate((list1, list2, list3), 1):
    fname = s.format(n)
    with open(fname, 'w') as f:
        #f.write(','.join(lyst))
        f.write('\n'.join(lyst))
If any of the items are not strings change the write to
f.write('\n'.join(map(str, lyst)))
If the lists are so long that creating a single string to write to the file is prohibitive, change the write to
for thing in lyst:
    f.write('{}\n'.format(thing))
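Putting the pieces together, a self-contained sketch could look like the following. Writing into a temporary directory is an assumption made here so that nothing in the current directory is overwritten, and the names out_dir and written are chosen for illustration:

```python
import os
import tempfile

list1 = ['one', 'two', 'three']
list2 = ['a', 'b', 'c']
list3 = ['mike', 'jack', 'ram']

# Assumption for this sketch: write into a fresh temporary directory.
out_dir = tempfile.mkdtemp()

written = []
for n, lyst in enumerate((list1, list2, list3), 1):
    # {:03} pads the counter with zeros: file001.txt, file002.txt, ...
    path = os.path.join(out_dir, 'file{:03}.txt'.format(n))
    with open(path, 'w') as f:
        # map(str, ...) keeps this working for non-string items too.
        f.write('\n'.join(map(str, lyst)))
    written.append(os.path.basename(path))
```

Starting enumerate at 1 is what makes the first file file001.txt rather than file000.txt.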

How to check if a CSV column contains any of multiple words?

I found several similar questions online on how to find a word in a specific column in a CSV file however, I didn't find any on checking multiple words.
Here's the problem:
words = 'something'
for row in data:
    if words not in row['column header name']:
        writer.writerow(row)
The above code writes the data to a CSV file, skipping any row where row['column header name'] contains 'something'.
For example, each cell in that column contains a string such as 'something-asdas' or 'something-aaaa'. The check matches part of that string and skips the row.
Goal:
words = ('something', 'word', 'dada')
for row in data:
    if words not in row['column header name']:
        writer.writerow(row)
When I try to do that, I get TypeError: 'in <string>' requires string as left operand, not tuple.
Any ideas on how to fix that problem? I tried iterating through the tuple, but then my CSV file has duplicates.
You need the built-in any() to check whether any of the words in the tuple is present in a row:
words = ("something", "word" , "dada")
for row in data:
    # skipping rows containing any of the "words"
    if not any(word in row["column header name"] for word in words):
        writer.writerow(row)
Of course, this assumes that row is a dictionary and you've used the DictReader.
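For completeness, here is a runnable sketch of the whole round trip with DictReader and DictWriter. The CSV content, the id column, and the use of io.StringIO in place of real files are all assumptions made for illustration:

```python
import csv
import io

# Made-up CSV content standing in for the real input file.
source = io.StringIO(
    "id,column header name\n"
    "1,something-asdas\n"
    "2,keep-me\n"
    "3,dada-123\n"
    "4,another-keeper\n"
)
words = ("something", "word", "dada")

data = csv.DictReader(source)
target = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(target, fieldnames=data.fieldnames)
writer.writeheader()

kept_ids = []
for row in data:
    # Write the row only if none of the words occurs in the column.
    if not any(word in row["column header name"] for word in words):
        writer.writerow(row)
        kept_ids.append(row["id"])
```

Each row is tested once against all the words, so it is written at most once. That avoids the duplicates you get when the writerow call sits inside a loop over the tuple.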

I cannot get split to work, what am I doing wrong?

Here is the code for the program that I have written so far. I am trying to calculate the efficiency of NBA players for a class project. When I run the program on a comma-delimited file that contains all the stats, instead of splitting on each comma it creates a list entry containing the entire line of the stat file. I then get an index out of range error, or it treats each character as an index point instead of the separate fields. I am new to this, but it seems it should create a list for each line in the file, with the fields as the elements of that list, so that I get a list of lists. I hope I have made myself understood.
Here is the code:
def get_data_list (file_name):
    data_file = open(file_name, "r")
    data_list = []
    for line_str in data_file:
        # strip end-of-line, split on commas, and append items to list
        line_str.strip()
        line_str.split(',')
        print(line_str)
        data_list.append(line_str)
    print(data_list)
file_name1 = input("File name: ")
result_list = get_data_list (file_name1)
print(result_list)
I do not see how to post the data file for you to look at and try it with, but any file of numbers that are comma-delimited should work.
If there is a way to post the data file or email to you for you to help me with it I would be happy to do so.
Boliver
Strings are immutable objects, which means you can't change them in place: any operation on a string returns a new object. Now look at your code:
line_str.strip() # returns a string
line_str.split(',') # returns a list of strings
data_list.append(line_str) # appends original 'line_str' (i.e. the entire line)
You could solve this by:
stripped = line_str.strip()
data = stripped.split(',')
data_list.append(data)
Or by chaining the string operations:
data = line_str.strip().split(',')
data_list.append(data)
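Applied to the whole function, the fix could look like the sketch below. Two things here are assumptions rather than part of the original code: the function takes an already opened file object instead of a file name, and io.StringIO with made-up stats stands in for the real data file:

```python
import io

def get_data_list(data_file):
    """Return one list of fields per line of a comma-delimited file object."""
    data_list = []
    for line_str in data_file:
        # strip() and split() return new objects, so keep their results.
        fields = line_str.strip().split(',')
        data_list.append(fields)
    return data_list

# Made-up stats; io.StringIO stands in for open(file_name, "r").
sample = io.StringIO("Jordan,30.1,6.2,5.3\nBird,24.3,10.0,6.3\n")
result_list = get_data_list(sample)
```

Note the return statement: the function in the question never returns data_list, so result_list would be None there. The fields are still strings, so convert them with float or int before doing arithmetic on them.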
