cellranger: how to convert a gtf file to string - python

I am running cellranger mkref and have hit a strange Python problem with a custom GTF file:
Traceback (most recent call last):
File "/home/user/cellranger-6.0.1/lib/python/cellranger/reference.py", line 750, in validate_gtf
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
File "/home/user/cellranger-6.0.1/external/anaconda/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/home/user/cellranger-6.0.1/external/anaconda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['gtf_to_gene_index', '/home/user/cellranger-6.0.1/indexes', '/home/user/cellranger-6.0.1/indexes/tmp74f_vsxg.json']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/cellranger-6.0.1/bin/rna/mkref", line 139, in <module>
main()
File "/home/user/cellranger-6.0.1/bin/rna/mkref", line 130, in main
reference_builder.build_gex_reference()
File "/home/user/cellranger-6.0.1/lib/python/cellranger/reference.py", line 613, in build_gex_reference
self.validate_gtf()
File "/home/user/cellranger-6.0.1/lib/python/cellranger/reference.py", line 753, in validate_gtf
raise GexReferenceError("Error detected in GTF file: " + exc.output) from exc
TypeError: can only concatenate str (not "bytes") to str
I also have a similar GTF file, which cellranger accepts without problems. I compared the two files (the first one was in fact derived from the second):
file 1: text/plain; charset=us-ascii
file 2: text/plain; charset=us-ascii
I also checked both files with cat -vE, and they look the same.
How can I change the file?
Thanks in advance!
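A note on the final TypeError: it does not come from the file's encoding. subprocess.check_output returns bytes unless text mode is requested, so exc.output is bytes and cannot be concatenated with a str. That secondary error hides the validator's real message. Below is a minimal sketch of the same pattern with the output decoded first. The function name run_validator and the use of RuntimeError are illustrative assumptions, not Cell Ranger's actual code.

```python
import subprocess
import sys


def run_validator(cmd):
    """Run cmd and raise a readable error if it exits with a non-zero status.

    Hypothetical helper: it mirrors the shape of the failing code in the
    traceback, but it is not taken from Cell Ranger itself.
    """
    try:
        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        # exc.output is bytes here, so decode it before joining it to a str.
        message = exc.output.decode("utf-8", errors="replace")
        raise RuntimeError("Error detected in GTF file: " + message) from exc


# Example with a stand-in command that prints a message and fails:
# run_validator([sys.executable, "-c", "print('bad record'); raise SystemExit(1)"])
```

With the decode in place, the exception text carries whatever the validator printed, which is the information needed to find the offending GTF records.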

I encountered the same issue. The problem was duplicated IDs in my GTF file; removing the duplicates solved it. See the discussion on the Cellranger GitHub: https://github.com/10XGenomics/cellranger/issues/125
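The answer does not say which IDs were duplicated, so the following is only a rough sketch under an assumption: it reports any gene_id that appears on more than one "gene" row and any transcript_id that appears on more than one "transcript" row. The function name find_duplicate_ids and the file name genes.gtf are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value";
ATTRIBUTE_RE = re.compile(r'(\w+) "([^"]*)"')

# Which attribute identifies each feature type (an assumption of this sketch).
ID_KEY_BY_FEATURE = {"gene": "gene_id", "transcript": "transcript_id"}


def find_duplicate_ids(gtf_lines):
    """Return sorted (feature, id) pairs that occur on more than one row."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        feature = fields[2]
        id_key = ID_KEY_BY_FEATURE.get(feature)
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        if id_key and id_key in attributes:
            counts[(feature, attributes[id_key])] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Usage with a hypothetical file name:
# with open("genes.gtf") as handle:
#     for feature, identifier in find_duplicate_ids(handle):
#         print(feature, identifier)
```

If this reports nothing, the duplicates may be of another kind (for example exon-level IDs), and the decoded validator message is the better guide.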

Related

Errno22 in merger.write

merging pdf pages...
Traceback (most recent call last):
File "D:\downloader.py", line 262, in <module>
merger.write(f"{book_title}.pdf")
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyPDF2\_merger.py", line 344, in write
my_file, ret_fileobj = self.output.write(fileobj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyPDF2\_writer.py", line 932, in write
stream = FileIO(stream, "wb")
^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: 'The syntax of serial verbs in Gojri.pdf'
I tried to capture multiple web pages and merge them into one PDF, but the final write failed as shown above. How can I fix this?

py4j.protocol.Py4JError: An error occurred while calling o112.save

I'm submitting a PySpark job on a university server.
My configuration is:
--master yarn --deploy-mode cluster --num-executors 150 --executor-cores 4 --executor-memory 28g --driver-memory 28g
My first few steps run correctly:
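from pyspark.sql.functions import struct, udf
from pyspark.sql.types import StringType

# Read the CSV, allowing quoted fields that span several lines
df = spark.read.format('csv') \
    .option('header', True) \
    .option('multiLine', True) \
    .load(data_file)
df.show()
# stamp is my own Python function; it receives the whole row as a struct
udf_function = udf(stamp, StringType())
new_df = df.withColumn("column_a", udf_function(struct([df[x] for x in df.columns])))
new_df.show()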
When I try to run the following commands separately, I get two very similar errors:
Command 1:
new_df.select("column_a").distinct().show(100)
Error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "python_stamp.py", line 93, in <module>
main()
File "python_stamp.py", line 82, in main
new_df.select("planning_cluster_id").distinct().show(100)
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/pyspark.zip/pyspark/sql/dataframe.py", line 380, in show
Command 2:
new_df.write.mode("overwrite").format("csv").option("delimiter", ",").option("header", "true").save(save_path)
Error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "python_stamp.py", line 91, in <module>
main()
File "python_stamp.py", line 83, in main
new_df.write.mode("overwrite").format("csv").option("delimiter", ",").option("header", "true").save(save_path)
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/pyspark.zip/pyspark/sql/readwriter.py", line 738, in save
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o112.save
Does anyone know the reason behind this? I'm fairly confident that it is not a memory error, because the previous steps, which load and show the table, all run correctly.
Additional information: when I run all of these commands in the pyspark shell, they work perfectly well.

Dash traceback not leading to any line in my code

I am working on my first Dash app. I am getting the following error:
"Cannot set a frame with no defined index "
The issue is that the traceback doesn’t lead to any line in my code. It only goes through installed libraries:
Traceback (most recent call last):
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\frame.py", line 3540, in _ensure_valid_index
value = Series(value)
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\series.py", line 316, in __init__
data = SingleBlockManager(data, index, fastpath=True)
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\internals\managers.py", line 1516, in __init__
block = make_block(block, placement=slice(0, len(axis)), ndim=1)
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\internals\blocks.py", line 3284, in make_block
return klass(values, ndim=ndim, placement=placement)
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\internals\blocks.py", line 2792, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\internals\blocks.py", line 128, in __init__
"{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 3, placement implies 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\LIAG8802\Documents\Procurement_analytics\venv\lib\site-packages\pandas\core\frame.py", line 3543, in _ensure_valid_index
"Cannot set a frame with no defined index "
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
How can I find out which line in my code is causing the error? Note that I have since found the problem, but I want to know how I could have traced it more easily.
EDIT: note that I cannot track the issue by putting print statements at several lines, since the error shows up in the Dash debug console and not in the Anaconda console, which does not crash.
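One way to see the frames from your own code is to log the full traceback yourself before the framework handles the exception. The sketch below is a plain-Python decorator that does this; it assumes the failing code runs inside a function you control (for example a Dash callback), and the names log_exceptions and the "callbacks" logger are hypothetical.

```python
import functools
import logging
import traceback

logger = logging.getLogger("callbacks")


def log_exceptions(func):
    """Log the complete traceback of any exception raised by func, then re-raise."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() includes every frame from func downwards, so the
            # line in your own code appears above the library frames.
            logger.error("Exception in %s:\n%s", func.__name__, traceback.format_exc())
            raise

    return wrapper


# Usage sketch: place @log_exceptions directly above the function definition,
# below any framework decorator, so the framework wraps the logging version.
```

Because the exception is re-raised, the framework's own error display keeps working; the decorator only adds a log entry that names the function and the line where the failure started.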

subprocess.CalledProcessError : No such file or directory. Even though the file exists

tail: cannot open 'home/sourabh/sanju.txt' for reading: No such file or directory
Traceback (most recent call last):
File "/home/sourabh/resizeWindow.py", line 23, in <module>
line = subprocess.check_output(['tail', '-1', 'home/sourabh/sanju.txt']).split(' ')[3:]
File "/usr/lib/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['tail', '-1', 'home/sourabh/sanju.txt']' returned non-zero exit status 1
I have cross-checked that the file exists, and I even created it intentionally.
The exact line in my python code is:
line = subprocess.check_output(['tail', '-1', 'home/sourabh/sanju.txt']).split(' ')[3:]
Edit: As mentioned by @PlumnSemPy, this link solves my problem:
What is the most efficient way to get first and last line of a text file?
Try:
line = subprocess.check_output(['tail -1 home/sourabh/sanju.txt'], shell=True).split(' ')[3:]
But heed the warning here: https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
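Note that the path in the error message, 'home/sourabh/sanju.txt', has no leading slash, so it is resolved relative to the current working directory. Assuming the file really lives at /home/sourabh/sanju.txt (an assumption based on the script's location), passing an absolute path is the likely fix. The sketch below avoids the subprocess entirely and reads the last line in Python; the function names are hypothetical.

```python
import os
from collections import deque


def last_line(path):
    """Return the last line of a text file, without its trailing newline."""
    if not os.path.isabs(path):
        # A relative path depends on the current working directory,
        # which is what made the original call fail.
        raise ValueError("expected an absolute path, got %r" % path)
    with open(path, "r") as handle:
        # deque with maxlen=1 keeps only the most recent line while iterating.
        tail = deque(handle, maxlen=1)
    return tail[0].rstrip("\n") if tail else ""


def fields_after_third(path):
    """Mirror the original expression: split on spaces and drop the first three fields."""
    return last_line(path).split(" ")[3:]


# Usage, assuming the file is at this absolute location:
# line = fields_after_third("/home/sourabh/sanju.txt")
```

This also sidesteps shell=True, so the warning linked above no longer applies.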

Python - catch exception within exception?

I have this exception defined:
class ArgumentsException(Exception):
    """Exception that is raised when incorrect arguments are used."""
    pass
My test runs the program through the sh package. When the program raises this expected exception, sh sees the failed command and raises its own exception instead. Is there a way for me to check that my original exception was the one raised?
For example when I run this code (this code is expected to raise that exception):
sh.python3(
    self.main_py_path,
    self.live_cfg_path,
    self.workflow_cfg_path)
I get this exception instead:
Traceback (most recent call last):
File "/home/oerp/src/devops-tools/tests/test_main.py", line 153, in test_full_workflow_1
self.workflow_cfg_path)
File "/usr/local/lib/python3.5/dist-packages/sh.py", line 1427, in __call__
return RunningCommand(cmd, call_args, stdin, stdout, stderr)
File "/usr/local/lib/python3.5/dist-packages/sh.py", line 774, in __init__
self.wait()
File "/usr/local/lib/python3.5/dist-packages/sh.py", line 792, in wait
self.handle_command_exit_code(exit_code)
File "/usr/local/lib/python3.5/dist-packages/sh.py", line 815, in handle_command_exit_code
raise exc
sh.ErrorReturnCode_1:
RAN: /usr/bin/python3 /home/oerp/src/devops-tools/main.py /home/oerp/src/devops-tools/tests/configs/__live__.py /home/oerp/src/devops-tools/tests/configs/__workflow__.py
STDOUT:
STDERR:
Traceback (most recent call last):
File "/home/oerp/src/devops-tools/main.py", line 204, in <module>
state = _get_state(args.state, ignore_state=args.ignore_state)
File "/home/oerp/src/devops-tools/main.py", line 68, in _get_state
"__state__.py file must be provided if --ignore-state flag "
exceptions.ArgumentsException: __state__.py file must be provided if --ignore-state flag is not used.
Well, I can do something like:
self.assertTrue('ArgumentsException' in str(e.stderr))
But maybe there is a more elegant way to check for my exception?
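Since the program runs in a separate process, the exception object itself never reaches the test; only the exit status and the text on stderr do. A slightly stricter check than a substring search is to read the exception name from the last line of the child's traceback. The sketch below uses the standard subprocess module rather than sh (a swapped-in technique, so that it is self-contained), and the helper names are hypothetical.

```python
import subprocess
import sys

# Stand-in for main.py: a child program that raises the custom exception.
CHILD_SOURCE = '''
class ArgumentsException(Exception):
    """Exception that is raised when incorrect arguments are used."""

raise ArgumentsException("__state__.py file must be provided")
'''


def run_child(source):
    """Run Python source in a child interpreter and capture its output as text."""
    return subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


def raised_exception_name(stderr_text):
    """Return the class name from the final 'Name: message' line of a traceback."""
    lines = [line for line in stderr_text.splitlines() if line.strip()]
    if not lines:
        return ""
    qualified_name = lines[-1].split(":", 1)[0].strip()
    # Drop any module prefix such as 'exceptions.' or '__main__.'.
    return qualified_name.rsplit(".", 1)[-1]
```

In a test this could be used as self.assertEqual(raised_exception_name(result.stderr), 'ArgumentsException'). A sturdier design is to have main.py catch ArgumentsException and exit with a dedicated status code, so the test can compare exit codes instead of parsing text.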
