In the early stages of atrial fibrillation (AF), most cases are paroxysmal (pAF), so they can be identified only through continuous, prolonged monitoring. With the advent of wearables, smartwatches equipped with photoplethysmographic (PPG) sensors are well suited to continuous monitoring for pAF. Numerous studies have demonstrated successful capture of pAF events, especially using deep learning. However, deep learning requires a large amount of data and independent testing on diverse datasets to ensure that a model generalizes, and most prior studies did not meet these requirements. Moreover, most prior studies that collected PPG data from wearables were limited either to controlled environments, which minimize motion artifacts, or to short recording durations. Most importantly, frequent premature atrial and ventricular contractions (PAC/PVC) can confound most AF detection algorithms. This problem has not been well studied, largely because few datasets contain these rhythms. Recent deep learning models report 97% AF detection accuracy, whereas the sensitivity of the current state-of-the-art technique for PAC/PVC detection is only 75% on PPG data minimally corrupted by motion artifacts. Our study aims to address these limitations using data from the recently completed NIH-funded Pulsewatch clinical trial, which collected smartwatch PPG data over two weeks from 106 subjects. Our approach used multimodal data comprising 1D PPG, accelerometer, and heart rate data. We used a computationally efficient 1D bi-directional Gated Recurrent Unit (1D-Bi-GRU) deep learning model to classify three rhythm classes: normal sinus rhythm, AF, and PAC/PVC. We compared the performance of the proposed 1D-Bi-GRU model with that of two other deep learning models that have reported some of the highest performance metrics in prior work.
For three-class rhythm classification, all deep learning models were tested on data from subjects who were not included in the training data, and further evaluations were performed on two independent datasets that were not part of the training dataset. Our multimodal model achieved an unprecedented 83% sensitivity for PAC/PVC detection while maintaining a high accuracy of 97.31% for AF detection. Our model was also more computationally efficient (14 times more efficient and 2.7 times faster) and outperformed the best state-of-the-art model by 20.81% for PAC/PVC sensitivity and 2.55% for AF accuracy. The two independent PPG datasets were collected with a different smartwatch and a fingertip PPG sensor. On these datasets, our three-class classification results show high macro-averaged area under the receiver operating characteristic curve values of 96.22% and 94.17%, demonstrating better generalizability of the proposed model.
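
The two headline metrics above, per-class sensitivity and macro-averaged area under the receiver operating characteristic curve, can be illustrated with a minimal sketch. It assumes a one-vs-rest formulation over the three classes and uses made-up toy scores. The function names, class indices, and data are hypothetical and are not taken from the study's evaluation code.

```python
# Hypothetical class indices for illustration: 0 = NSR, 1 = AF, 2 = PAC/PVC.

def binary_auroc(labels, scores):
    """AUROC as the probability that a random positive sample
    outscores a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, y_prob, n_classes=3):
    """Unweighted mean of the one-vs-rest AUROC of each class."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_prob]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / n_classes


def sensitivity(y_true, y_pred, cls):
    """True positives divided by all true members of class `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn)


# Toy example (made-up numbers, not study data).
y_true = [0, 0, 1, 1, 2, 2]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.5, 0.1, 0.4],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.2, 0.1, 0.7],
    [0.5, 0.1, 0.4],
]
# Predicted class = index of the highest score in each row.
y_pred = [max(range(3), key=lambda c: row[c]) for row in y_prob]

print(macro_auroc(y_true, y_prob))
print(sensitivity(y_true, y_pred, cls=2))
```

Macro-averaging weights each class equally, so a rare class such as PAC/PVC affects the summary as much as a common one.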