Memory issues while running Apache Spark streaming applications on Google Dataproc cluster | OutOfMemoryError Java heap space #1026

Open
Sujay39 opened this issue Jul 3, 2023 · 0 comments

Sujay39 commented Jul 3, 2023

Background:

Cluster:

A high-availability Google Dataproc cluster was created to run Apache Spark streaming applications.
The input to these applications is a Kafka topic with 12 partitions, hosted on an n-node cluster of Google Compute Engine instances. The throughput on this topic is approximately 5k events per minute.

Application:

The application processes the events and stores the data in a Google Cloud Storage bucket. 1.5 GB of memory is allocated to the driver and to the executor of the Spark application. The Spark version in use is 3.3.3.
A Google Cloud Storage location is given as the checkpoint location.
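
The issue does not include the job code; the following is a minimal sketch of the pipeline as described, assuming Scala and a Parquet sink. The broker, topic, and bucket names are placeholders, not values from the actual deployment.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the pipeline described above. Broker, topic, and
// bucket names are placeholders; the Parquet output format is an assumption.
object KafkaToGcs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-gcs")
      .getOrCreate()

    // Kafka source: the reported topic has 12 partitions at ~5k events/min.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092") // placeholder
      .option("subscribe", "events")                      // placeholder topic
      .load()

    // Sink and checkpoint both live in GCS, as in the report.
    events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("parquet")                                         // assumed format
      .option("path", "gs://example-bucket/output/")             // placeholder
      .option("checkpointLocation", "gs://example-bucket/ckpt/") // placeholder
      .start()
      .awaitTermination()
  }
}
```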

Issue:

While writing checkpoint and metadata information, the Spark application runs out of memory and crashes. Stack traces from two occurrences:

  1. Caused by: java.lang.OutOfMemoryError: Java heap space
         at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:75) ~[?:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createOutputStream(GoogleHadoopOutputStream.java:90) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:71) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:616) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.createInternal(GoogleHadoopFS.java:98) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext.create(FileContext.java:703) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.$anonfun$addNewBatchByStream$2(HDFSMetadataLog.scala:173) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$Lambda$2548/0x000000010135b840.apply$mcZ$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.addNewBatchByStream(HDFSMetadataLog.scala:171) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:116) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$18(MicroBatchExecution.scala:675) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$3985/0x00000001018e3440.apply$mcV$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:687) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:672) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:255) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$2228/0x0000000101209040.apply$mcV$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:218) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
     23/06/27 19:03:37 ERROR Utils: uncaught error in thread spark-listener-group-shared, stopping SparkContext

  2. java.lang.OutOfMemoryError: Java heap space
         at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:75) ~[?:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createOutputStream(GoogleHadoopOutputStream.java:90) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:71) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:616) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.createInternal(GoogleHadoopFS.java:98) ~[gcs-connector-hadoop3-2.2.14.jar:?]
         at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:626) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:701) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:697) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.hadoop.fs.FileContext.create(FileContext.java:703) ~[hadoop-client-api-3.3.3.jar:?]
         at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:327) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:140) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:143) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:333) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.$anonfun$addNewBatchByStream$2(HDFSMetadataLog.scala:173) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$Lambda$2473/0x000000010131a440.apply$mcZ$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.addNewBatchByStream(HDFSMetadataLog.scala:171) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:116) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$18(MicroBatchExecution.scala:675) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$3827/0x000000010184f840.apply$mcV$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:687) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:672) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:255) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$Lambda$2122/0x00000001011b1840.apply$mcV$sp(Unknown Source) ~[?:?]
         at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.14.jar:?]
         at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:375) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:373) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
         at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:218) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
     Exception in thread "stream execution thread for [id = 2be4acf0-1a9e-4dcc-9db2-addc0de7e89f, runId = 37627391-5ef0-42e0-a787-b7ce2ecb8feb]" java.lang.OutOfMemoryError: Java heap space
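
Both traces fail at the same point: the BufferedOutputStream allocated inside GoogleHadoopOutputStream when the checkpoint file manager opens an atomic output stream, which suggests this allocation is simply the last one attempted on an already-full heap. A minimal diagnostic sketch for capturing a heap dump at the moment of failure, using standard HotSpot flags; the dump paths are placeholders, and in practice the driver option must be supplied via spark-submit --conf because it has to be set before the driver JVM starts:

```scala
import org.apache.spark.sql.SparkSession

// Diagnostic sketch, not from the original report: dump the heap when the
// OOM fires so the dominant allocations can be inspected offline.
// Dump paths are placeholders.
val spark = SparkSession.builder()
  .config("spark.driver.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/driver.hprof")
  .config("spark.executor.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor.hprof")
  .getOrCreate()
```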

Troubleshooting attempted:

  1. Fine-tuned the configs given in this documentation.
  2. Tried the solutions suggested here.

However, these steps have not resolved the out-of-memory error.

Any suggestions for solving this are highly appreciated.
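
Since the trace terminates in the connector's per-stream buffer allocation, one further avenue (a sketch under stated assumptions, not a confirmed fix) would be shrinking the GCS connector's output-stream buffers so each open checkpoint/metadata stream costs less heap. fs.gs.outputstream.buffer.size and fs.gs.outputstream.upload.chunk.size are documented gcs-connector settings; the values below are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: reduce the GCS connector's per-stream buffers. Both
// properties are documented gcs-connector settings; values are examples,
// and the upload chunk size must remain a multiple of 8 MiB.
val spark = SparkSession.builder()
  .config("spark.hadoop.fs.gs.outputstream.buffer.size", "1048576")       // 1 MiB, down from the 8 MiB default
  .config("spark.hadoop.fs.gs.outputstream.upload.chunk.size", "8388608") // 8 MiB, down from the 64 MiB default
  .getOrCreate()
```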
