Char/Misc patches for 5.2-rc1 - part 2

Here is the "real" big set of char/misc driver patches for 5.2-rc1
 
 Loads of different driver subsystem stuff in here, all over the place:
   - thunderbolt driver updates
   - habanalabs driver updates
   - nvmem driver updates
   - extcon driver updates
   - intel_th driver updates
   - mei driver updates
   - coresight driver updates
   - soundwire driver cleanups and updates
   - fastrpc driver updates
   - other minor driver updates
   - chardev minor fixups
 
 Feels like this tree is getting to be a dumping ground of "small driver
 subsystems" these days.  Which is fine with me, if it makes things
 easier for those subsystem maintainers.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iGwEABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHE2w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykvyQCYj5vSHQ88yEU+bzwGzQQLOBWYIwCgm5Iku0Y3
 f6V3MvRffg4qUp3cGbU=
 =R37j
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.2-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc update part 2 from Greg KH:
 "Here is the "real" big set of char/misc driver patches for 5.2-rc1

  Loads of different driver subsystem stuff in here, all over the place:
   - thunderbolt driver updates
   - habanalabs driver updates
   - nvmem driver updates
   - extcon driver updates
   - intel_th driver updates
   - mei driver updates
   - coresight driver updates
   - soundwire driver cleanups and updates
   - fastrpc driver updates
   - other minor driver updates
   - chardev minor fixups

  Feels like this tree is getting to be a dumping ground of "small
  driver subsystems" these days. Which is fine with me, if it makes
  things easier for those subsystem maintainers.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.2-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits)
  intel_th: msu: Add current window tracking
  intel_th: msu: Add a sysfs attribute to trigger window switch
  intel_th: msu: Correct the block wrap detection
  intel_th: Add switch triggering support
  intel_th: gth: Factor out trace start/stop
  intel_th: msu: Factor out pipeline draining
  intel_th: msu: Switch over to scatterlist
  intel_th: msu: Replace open-coded list_{first,last,next}_entry variants
  intel_th: Only report useful IRQs to subdevices
  intel_th: msu: Start handling IRQs
  intel_th: pci: Use MSI interrupt signalling
  intel_th: Communicate IRQ via resource
  intel_th: Add "rtit" source device
  intel_th: Skip subdevices if their MMIO is missing
  intel_th: Rework resource passing between glue layers and core
  intel_th: SPDX-ify the documentation
  intel_th: msu: Fix single mode with IOMMU
  coresight: funnel: Support static funnel
  dt-bindings: arm: coresight: Unify funnel DT binding
  coresight: replicator: Add new device id for static replicator
  ...
Linus Torvalds 2019-05-07 13:39:22 -07:00
commit f678d6da74
274 changed files with 10437 additions and 4362 deletions

@ -6,6 +6,8 @@ Description:
This file allows user to read/write the raw NVMEM contents.
Permissions for write to this file depends on the nvmem
provider configuration.
Note: This file is only present if CONFIG_NVMEM_SYSFS
is enabled
ex:
hexdump /sys/bus/nvmem/devices/qfprom0/nvmem

@ -30,4 +30,12 @@ Description: (RW) Configure MSC buffer size for "single" or "multi" modes.
there are no active users and tracing is not enabled) and then
allocates a new one.
What: /sys/bus/intel_th/devices/<intel_th_id>-msc<msc-id>/win_switch
Date: May 2019
KernelVersion: 5.2
Contact: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Description: (RW) Trigger window switch for the MSC's buffer, in
multi-window mode. In "multi" mode, accepts writes of "1", thereby
triggering a window switch for the buffer. Returns an error in any
other operating mode or attempts to write something other than "1".
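Since win_switch is an ordinary sysfs attribute, a window switch can be triggered from user space with a plain write of "1". A minimal C sketch of that usage follows; the "0-msc0" instance name is only an illustrative placeholder and the real name depends on the platform:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical MSC instance; look up the real name under
	 * /sys/bus/intel_th/devices/ on the target system. */
	const char *path = "/sys/bus/intel_th/devices/0-msc0/win_switch";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Only "1" is accepted, and only while the MSC is in "multi" mode. */
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}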

@ -65,3 +65,18 @@ Description: Display the ME firmware version.
<platform>:<major>.<minor>.<milestone>.<build_no>.
There can be up to three such blocks for different
FW components.
What: /sys/class/mei/meiN/dev_state
Date: Mar 2019
KernelVersion: 5.1
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display the ME device state.
The device state can have following values:
INITIALIZING
INIT_CLIENTS
ENABLED
RESETTING
DISABLED
POWER_DOWN
POWER_UP
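dev_state is a read-only text attribute, so the current state can be read like any other sysfs file. A minimal C sketch, assuming a device instance named mei0:

#include <stdio.h>

int main(void)
{
	char state[32];
	/* "mei0" is just an example instance name. */
	FILE *f = fopen("/sys/class/mei/mei0/dev_state", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	if (fgets(state, sizeof(state), f))
		printf("ME device state: %s", state);
	fclose(f);
	return 0;
}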

@ -8,7 +8,8 @@ through the intermediate links connecting the source to the currently selected
sink. Each CoreSight component device should use these properties to describe
its hardware characteristcs.
* Required properties for all components *except* non-configurable replicators:
* Required properties for all components *except* non-configurable replicators
and non-configurable funnels:
* compatible: These have to be supplemented with "arm,primecell" as
drivers are using the AMBA bus interface. Possible values include:
@ -24,8 +25,10 @@ its hardware characteristcs.
discovered at boot time when the device is probed.
"arm,coresight-tmc", "arm,primecell";
- Trace Funnel:
"arm,coresight-funnel", "arm,primecell";
- Trace Programmable Funnel:
"arm,coresight-dynamic-funnel", "arm,primecell";
"arm,coresight-funnel", "arm,primecell"; (OBSOLETE. For
backward compatibility and will be removed)
- Embedded Trace Macrocell (version 3.x) and
Program Flow Trace Macrocell:
@ -65,11 +68,17 @@ its hardware characteristcs.
"stm-stimulus-base", each corresponding to the areas defined in "reg".
* Required properties for devices that don't show up on the AMBA bus, such as
non-configurable replicators:
non-configurable replicators and non-configurable funnels:
* compatible: Currently supported value is (note the absence of the
AMBA markee):
- "arm,coresight-replicator"
- Coresight Non-configurable Replicator:
"arm,coresight-static-replicator";
"arm,coresight-replicator"; (OBSOLETE. For backward
compatibility and will be removed)
- Coresight Non-configurable Funnel:
"arm,coresight-static-funnel";
* port or ports: see "Graph bindings for Coresight" below.
@ -169,7 +178,7 @@ Example:
/* non-configurable replicators don't show up on the
* AMBA bus. As such no need to add "arm,primecell".
*/
compatible = "arm,coresight-replicator";
compatible = "arm,coresight-static-replicator";
out-ports {
#address-cells = <1>;
@ -200,8 +209,45 @@ Example:
};
};
funnel {
/*
* non-configurable funnel don't show up on the AMBA
* bus. As such no need to add "arm,primecell".
*/
compatible = "arm,coresight-static-funnel";
clocks = <&crg_ctrl HI3660_PCLK>;
clock-names = "apb_pclk";
out-ports {
port {
combo_funnel_out: endpoint {
remote-endpoint = <&top_funnel_in>;
};
};
};
in-ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
combo_funnel_in0: endpoint {
remote-endpoint = <&cluster0_etf_out>;
};
};
port@1 {
reg = <1>;
combo_funnel_in1: endpoint {
remote-endpoint = <&cluster1_etf_out>;
};
};
};
};
funnel@20040000 {
compatible = "arm,coresight-funnel", "arm,primecell";
compatible = "arm,coresight-dynamic-funnel", "arm,primecell";
reg = <0 0x20040000 0 0x1000>;
clocks = <&oscclk6a>;

@ -9,6 +9,7 @@ Required properties:
- compatible : Must be one of
"u-blox,neo-6m"
"u-blox,neo-8"
"u-blox,neo-m8"

@ -0,0 +1,47 @@
======================================================================
Device tree bindings for Aspeed AST2400/AST2500 PCI-to-AHB Bridge Control Driver
======================================================================
The bridge is available on platforms with the VGA enabled on the Aspeed device.
In this case, the host has access to a 64KiB window into all of the BMC's
memory. The BMC can disable this bridge. If the bridge is enabled, the host
has read access to all the regions of memory, however the host only has read
and write access depending on a register controlled by the BMC.
Required properties:
===================
- compatible: must be one of:
- "aspeed,ast2400-p2a-ctrl"
- "aspeed,ast2500-p2a-ctrl"
Optional properties:
===================
- memory-region: A phandle to a reserved_memory region to be used for the PCI
to AHB mapping
The p2a-control node should be the child of a syscon node with the required
property:
- compatible : Should be one of the following:
"aspeed,ast2400-scu", "syscon", "simple-mfd"
"aspeed,g4-scu", "syscon", "simple-mfd"
"aspeed,ast2500-scu", "syscon", "simple-mfd"
"aspeed,g5-scu", "syscon", "simple-mfd"
Example
===================
g4 Example
----------
syscon: scu@1e6e2000 {
compatible = "aspeed,ast2400-scu", "syscon", "simple-mfd";
reg = <0x1e6e2000 0x1a8>;
p2a: p2a-control {
compatible = "aspeed,ast2400-p2a-ctrl";
memory-region = <&reserved_memory>;
};
};

@ -8,11 +8,12 @@ Required properties:
"allwinner,sun8i-h3-sid"
"allwinner,sun50i-a64-sid"
"allwinner,sun50i-h5-sid"
"allwinner,sun50i-h6-sid"
- reg: Should contain registers location and length
= Data cells =
Are child nodes of qfprom, bindings of which as described in
Are child nodes of sunxi-sid, bindings of which as described in
bindings/nvmem/nvmem.txt
Example for sun4i:

@ -1,7 +1,8 @@
Freescale i.MX6 On-Chip OTP Controller (OCOTP) device tree bindings
This binding represents the on-chip eFuse OTP controller found on
i.MX6Q/D, i.MX6DL/S, i.MX6SL, i.MX6SX, i.MX6UL, i.MX6ULL/ULZ and i.MX6SLL SoCs.
i.MX6Q/D, i.MX6DL/S, i.MX6SL, i.MX6SX, i.MX6UL, i.MX6ULL/ULZ, i.MX6SLL,
i.MX7D/S, i.MX7ULP and i.MX8MQ SoCs.
Required properties:
- compatible: should be one of
@ -13,6 +14,7 @@ Required properties:
"fsl,imx7d-ocotp" (i.MX7D/S),
"fsl,imx6sll-ocotp" (i.MX6SLL),
"fsl,imx7ulp-ocotp" (i.MX7ULP),
"fsl,imx8mq-ocotp" (i.MX8MQ),
followed by "syscon".
- #address-cells : Should be 1
- #size-cells : Should be 1

@ -0,0 +1,31 @@
STMicroelectronics STM32 Factory-programmed data device tree bindings
This represents STM32 Factory-programmed read only non-volatile area: locked
flash, OTP, read-only HW regs... This contains various information such as:
analog calibration data for temperature sensor (e.g. TS_CAL1, TS_CAL2),
internal vref (VREFIN_CAL), unique device ID...
Required properties:
- compatible: Should be one of:
"st,stm32f4-otp"
"st,stm32mp15-bsec"
- reg: Offset and length of factory-programmed area.
- #address-cells: Should be '<1>'.
- #size-cells: Should be '<1>'.
Optional Data cells:
- Must be child nodes as described in nvmem.txt.
Example on stm32f4:
romem: nvmem@1fff7800 {
compatible = "st,stm32f4-otp";
reg = <0x1fff7800 0x400>;
#address-cells = <1>;
#size-cells = <1>;
/* Data cells: ts_cal1 at 0x1fff7a2c */
ts_cal1: calib@22c {
reg = <0x22c 0x2>;
};
...
};

@ -1,3 +1,5 @@
.. SPDX-License-Identifier: GPL-2.0
=======================
Intel(R) Trace Hub (TH)
=======================

@ -8068,6 +8068,7 @@ F: drivers/gpio/gpio-intel-mid.c
INTERCONNECT API
M: Georgi Djakov <georgi.djakov@linaro.org>
L: linux-pm@vger.kernel.org
S: Maintained
F: Documentation/interconnect/
F: Documentation/devicetree/bindings/interconnect/

@ -3121,6 +3121,7 @@ static void binder_transaction(struct binder_proc *proc,
if (target_node && target_node->txn_security_ctx) {
u32 secid;
size_t added_size;
security_task_getsecid(proc->tsk, &secid);
ret = security_secid_to_secctx(secid, &secctx, &secctx_sz);
@ -3130,7 +3131,15 @@ static void binder_transaction(struct binder_proc *proc,
return_error_line = __LINE__;
goto err_get_secctx_failed;
}
extra_buffers_size += ALIGN(secctx_sz, sizeof(u64));
added_size = ALIGN(secctx_sz, sizeof(u64));
extra_buffers_size += added_size;
if (extra_buffers_size < added_size) {
/* integer overflow of extra_buffers_size */
return_error = BR_FAILED_REPLY;
return_error_param = EINVAL;
return_error_line = __LINE__;
goto err_bad_extra_size;
}
}
trace_binder_transaction(reply, t, target_node);
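The added check works because unsigned addition wraps modulo 2^N, so an overflowed sum is always smaller than either operand; extra_buffers_size < added_size can therefore only be true after a wraparound. The same idiom in isolation, as a stand-alone sketch rather than binder code:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* After unsigned wraparound the sum is necessarily smaller than either
 * operand, which is the condition binder now tests for extra_buffers_size. */
static bool size_add_ok(size_t a, size_t b, size_t *sum)
{
	*sum = a + b;
	return *sum >= b;
}

int main(void)
{
	size_t sum;

	printf("%d\n", size_add_ok((size_t)-1, 8, &sum));	/* 0: would overflow */
	printf("%d\n", size_add_ok(100, 8, &sum));		/* 1: fits */
	return 0;
}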
@ -3480,6 +3489,7 @@ err_copy_data_failed:
t->buffer->transaction = NULL;
binder_alloc_free_buf(&target_proc->alloc, t->buffer);
err_binder_alloc_buf_failed:
err_bad_extra_size:
if (secctx)
security_release_secctx(secctx, secctx_sz);
err_get_secctx_failed:

@ -973,6 +973,8 @@ static acpi_status hpet_resources(struct acpi_resource *res, void *data)
if (ACPI_SUCCESS(status)) {
hdp->hd_phys_address = addr.address.minimum;
hdp->hd_address = ioremap(addr.address.minimum, addr.address.address_length);
if (!hdp->hd_address)
return AE_ERROR;
if (hpet_is_known(hdp)) {
iounmap(hdp->hd_address);

@ -30,7 +30,7 @@ config EXTCON_ARIZONA
config EXTCON_AXP288
tristate "X-Power AXP288 EXTCON support"
depends on MFD_AXP20X && USB_SUPPORT && X86
depends on MFD_AXP20X && USB_SUPPORT && X86 && ACPI
select USB_ROLE_SWITCH
help
Say Y here to enable support for USB peripheral detection
@ -60,6 +60,13 @@ config EXTCON_INTEL_CHT_WC
Say Y here to enable extcon support for charger detection / control
on the Intel Cherrytrail Whiskey Cove PMIC.
config EXTCON_INTEL_MRFLD
tristate "Intel Merrifield Basin Cove PMIC extcon driver"
depends on INTEL_SOC_PMIC_MRFLD
help
Say Y here to enable extcon support for charger detection / control
on the Intel Merrifield Basin Cove PMIC.
config EXTCON_MAX14577
tristate "Maxim MAX14577/77836 EXTCON Support"
depends on MFD_MAX14577

@ -11,6 +11,7 @@ obj-$(CONFIG_EXTCON_AXP288) += extcon-axp288.o
obj-$(CONFIG_EXTCON_GPIO) += extcon-gpio.o
obj-$(CONFIG_EXTCON_INTEL_INT3496) += extcon-intel-int3496.o
obj-$(CONFIG_EXTCON_INTEL_CHT_WC) += extcon-intel-cht-wc.o
obj-$(CONFIG_EXTCON_INTEL_MRFLD) += extcon-intel-mrfld.o
obj-$(CONFIG_EXTCON_MAX14577) += extcon-max14577.o
obj-$(CONFIG_EXTCON_MAX3355) += extcon-max3355.o
obj-$(CONFIG_EXTCON_MAX77693) += extcon-max77693.o

@ -205,7 +205,7 @@ EXPORT_SYMBOL(devm_extcon_register_notifier);
/**
* devm_extcon_unregister_notifier()
- Resource-managed extcon_unregister_notifier()
* - Resource-managed extcon_unregister_notifier()
* @dev: the device owning the extcon device being created
* @edev: the extcon device
* @id: the unique id among the extcon enumeration

@ -1726,6 +1726,16 @@ static int arizona_extcon_remove(struct platform_device *pdev)
struct arizona_extcon_info *info = platform_get_drvdata(pdev);
struct arizona *arizona = info->arizona;
int jack_irq_rise, jack_irq_fall;
bool change;
regmap_update_bits_check(arizona->regmap, ARIZONA_MIC_DETECT_1,
ARIZONA_MICD_ENA, 0,
&change);
if (change) {
regulator_disable(info->micvdd);
pm_runtime_put(info->dev);
}
gpiod_put(info->micd_pol_gpio);

@ -17,6 +17,8 @@
#include <linux/regmap.h>
#include <linux/slab.h>
#include "extcon-intel.h"
#define CHT_WC_PHYCTRL 0x5e07
#define CHT_WC_CHGRCTRL0 0x5e16
@ -29,7 +31,15 @@
#define CHT_WC_CHGRCTRL0_DBPOFF BIT(6)
#define CHT_WC_CHGRCTRL0_CHR_WDT_NOKICK BIT(7)
#define CHT_WC_CHGRCTRL1 0x5e17
#define CHT_WC_CHGRCTRL1 0x5e17
#define CHT_WC_CHGRCTRL1_FUSB_INLMT_100 BIT(0)
#define CHT_WC_CHGRCTRL1_FUSB_INLMT_150 BIT(1)
#define CHT_WC_CHGRCTRL1_FUSB_INLMT_500 BIT(2)
#define CHT_WC_CHGRCTRL1_FUSB_INLMT_900 BIT(3)
#define CHT_WC_CHGRCTRL1_FUSB_INLMT_1500 BIT(4)
#define CHT_WC_CHGRCTRL1_FTEMP_EVENT BIT(5)
#define CHT_WC_CHGRCTRL1_OTGMODE BIT(6)
#define CHT_WC_CHGRCTRL1_DBPEN BIT(7)
#define CHT_WC_USBSRC 0x5e29
#define CHT_WC_USBSRC_STS_MASK GENMASK(1, 0)
@ -48,6 +58,13 @@
#define CHT_WC_USBSRC_TYPE_OTHER 8
#define CHT_WC_USBSRC_TYPE_DCP_EXTPHY 9
#define CHT_WC_CHGDISCTRL 0x5e2f
#define CHT_WC_CHGDISCTRL_OUT BIT(0)
/* 0 - open drain, 1 - regular push-pull output */
#define CHT_WC_CHGDISCTRL_DRV BIT(4)
/* 0 - pin is controlled by SW, 1 - by HW */
#define CHT_WC_CHGDISCTRL_FN BIT(6)
#define CHT_WC_PWRSRC_IRQ 0x6e03
#define CHT_WC_PWRSRC_IRQ_MASK 0x6e0f
#define CHT_WC_PWRSRC_STS 0x6e1e
@ -65,15 +82,6 @@
#define CHT_WC_VBUS_GPIO_CTLO_DRV_OD BIT(4)
#define CHT_WC_VBUS_GPIO_CTLO_DIR_OUT BIT(5)
enum cht_wc_usb_id {
USB_ID_OTG,
USB_ID_GND,
USB_ID_FLOAT,
USB_RID_A,
USB_RID_B,
USB_RID_C,
};
enum cht_wc_mux_select {
MUX_SEL_PMIC = 0,
MUX_SEL_SOC,
@ -101,9 +109,9 @@ static int cht_wc_extcon_get_id(struct cht_wc_extcon_data *ext, int pwrsrc_sts)
{
switch ((pwrsrc_sts & CHT_WC_PWRSRC_USBID_MASK) >> CHT_WC_PWRSRC_USBID_SHIFT) {
case CHT_WC_PWRSRC_RID_GND:
return USB_ID_GND;
return INTEL_USB_ID_GND;
case CHT_WC_PWRSRC_RID_FLOAT:
return USB_ID_FLOAT;
return INTEL_USB_ID_FLOAT;
case CHT_WC_PWRSRC_RID_ACA:
default:
/*
@ -111,7 +119,7 @@ static int cht_wc_extcon_get_id(struct cht_wc_extcon_data *ext, int pwrsrc_sts)
* the USBID GPADC channel here and determine ACA role
* based on that.
*/
return USB_ID_FLOAT;
return INTEL_USB_ID_FLOAT;
}
}
@ -198,6 +206,30 @@ static void cht_wc_extcon_set_5v_boost(struct cht_wc_extcon_data *ext,
dev_err(ext->dev, "Error writing Vbus GPIO CTLO: %d\n", ret);
}
static void cht_wc_extcon_set_otgmode(struct cht_wc_extcon_data *ext,
bool enable)
{
unsigned int val = enable ? CHT_WC_CHGRCTRL1_OTGMODE : 0;
int ret;
ret = regmap_update_bits(ext->regmap, CHT_WC_CHGRCTRL1,
CHT_WC_CHGRCTRL1_OTGMODE, val);
if (ret)
dev_err(ext->dev, "Error updating CHGRCTRL1 reg: %d\n", ret);
}
static void cht_wc_extcon_enable_charging(struct cht_wc_extcon_data *ext,
bool enable)
{
unsigned int val = enable ? 0 : CHT_WC_CHGDISCTRL_OUT;
int ret;
ret = regmap_update_bits(ext->regmap, CHT_WC_CHGDISCTRL,
CHT_WC_CHGDISCTRL_OUT, val);
if (ret)
dev_err(ext->dev, "Error updating CHGDISCTRL reg: %d\n", ret);
}
/* Small helper to sync EXTCON_CHG_USB_SDP and EXTCON_USB state */
static void cht_wc_extcon_set_state(struct cht_wc_extcon_data *ext,
unsigned int cable, bool state)
@ -221,11 +253,17 @@ static void cht_wc_extcon_pwrsrc_event(struct cht_wc_extcon_data *ext)
}
id = cht_wc_extcon_get_id(ext, pwrsrc_sts);
if (id == USB_ID_GND) {
if (id == INTEL_USB_ID_GND) {
cht_wc_extcon_enable_charging(ext, false);
cht_wc_extcon_set_otgmode(ext, true);
/* The 5v boost causes a false VBUS / SDP detect, skip */
goto charger_det_done;
}
cht_wc_extcon_set_otgmode(ext, false);
cht_wc_extcon_enable_charging(ext, true);
/* Plugged into a host/charger or not connected? */
if (!(pwrsrc_sts & CHT_WC_PWRSRC_VBUS)) {
/* Route D+ and D- to PMIC for future charger detection */
@ -248,7 +286,7 @@ set_state:
ext->previous_cable = cable;
}
ext->usb_host = ((id == USB_ID_GND) || (id == USB_RID_A));
ext->usb_host = ((id == INTEL_USB_ID_GND) || (id == INTEL_USB_RID_A));
extcon_set_state_sync(ext->edev, EXTCON_USB_HOST, ext->usb_host);
}
@ -278,6 +316,14 @@ static int cht_wc_extcon_sw_control(struct cht_wc_extcon_data *ext, bool enable)
{
int ret, mask, val;
val = enable ? 0 : CHT_WC_CHGDISCTRL_FN;
ret = regmap_update_bits(ext->regmap, CHT_WC_CHGDISCTRL,
CHT_WC_CHGDISCTRL_FN, val);
if (ret)
dev_err(ext->dev,
"Error setting sw control for CHGDIS pin: %d\n",
ret);
mask = CHT_WC_CHGRCTRL0_SWCONTROL | CHT_WC_CHGRCTRL0_CCSM_OFF;
val = enable ? mask : 0;
ret = regmap_update_bits(ext->regmap, CHT_WC_CHGRCTRL0, mask, val);
@ -329,7 +375,10 @@ static int cht_wc_extcon_probe(struct platform_device *pdev)
/* Enable sw control */
ret = cht_wc_extcon_sw_control(ext, true);
if (ret)
return ret;
goto disable_sw_control;
/* Disable charging by external battery charger */
cht_wc_extcon_enable_charging(ext, false);
/* Register extcon device */
ret = devm_extcon_dev_register(ext->dev, ext->edev);

@ -0,0 +1,284 @@
// SPDX-License-Identifier: GPL-2.0
/*
* extcon driver for Basin Cove PMIC
*
* Copyright (c) 2019, Intel Corporation.
* Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
*/
#include <linux/extcon-provider.h>
#include <linux/interrupt.h>
#include <linux/mfd/intel_soc_pmic.h>
#include <linux/mfd/intel_soc_pmic_mrfld.h>
#include <linux/mod_devicetable.h>
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/regmap.h>
#include "extcon-intel.h"
#define BCOVE_USBIDCTRL 0x19
#define BCOVE_USBIDCTRL_ID BIT(0)
#define BCOVE_USBIDCTRL_ACA BIT(1)
#define BCOVE_USBIDCTRL_ALL (BCOVE_USBIDCTRL_ID | BCOVE_USBIDCTRL_ACA)
#define BCOVE_USBIDSTS 0x1a
#define BCOVE_USBIDSTS_GND BIT(0)
#define BCOVE_USBIDSTS_RARBRC_MASK GENMASK(2, 1)
#define BCOVE_USBIDSTS_RARBRC_SHIFT 1
#define BCOVE_USBIDSTS_NO_ACA 0
#define BCOVE_USBIDSTS_R_ID_A 1
#define BCOVE_USBIDSTS_R_ID_B 2
#define BCOVE_USBIDSTS_R_ID_C 3
#define BCOVE_USBIDSTS_FLOAT BIT(3)
#define BCOVE_USBIDSTS_SHORT BIT(4)
#define BCOVE_CHGRIRQ_ALL (BCOVE_CHGRIRQ_VBUSDET | BCOVE_CHGRIRQ_DCDET | \
BCOVE_CHGRIRQ_BATTDET | BCOVE_CHGRIRQ_USBIDDET)
#define BCOVE_CHGRCTRL0 0x4b
#define BCOVE_CHGRCTRL0_CHGRRESET BIT(0)
#define BCOVE_CHGRCTRL0_EMRGCHREN BIT(1)
#define BCOVE_CHGRCTRL0_EXTCHRDIS BIT(2)
#define BCOVE_CHGRCTRL0_SWCONTROL BIT(3)
#define BCOVE_CHGRCTRL0_TTLCK BIT(4)
#define BCOVE_CHGRCTRL0_BIT_5 BIT(5)
#define BCOVE_CHGRCTRL0_BIT_6 BIT(6)
#define BCOVE_CHGRCTRL0_CHR_WDT_NOKICK BIT(7)
struct mrfld_extcon_data {
struct device *dev;
struct regmap *regmap;
struct extcon_dev *edev;
unsigned int status;
unsigned int id;
};
static const unsigned int mrfld_extcon_cable[] = {
EXTCON_USB,
EXTCON_USB_HOST,
EXTCON_CHG_USB_SDP,
EXTCON_CHG_USB_CDP,
EXTCON_CHG_USB_DCP,
EXTCON_CHG_USB_ACA,
EXTCON_NONE,
};
static int mrfld_extcon_clear(struct mrfld_extcon_data *data, unsigned int reg,
unsigned int mask)
{
return regmap_update_bits(data->regmap, reg, mask, 0x00);
}
static int mrfld_extcon_set(struct mrfld_extcon_data *data, unsigned int reg,
unsigned int mask)
{
return regmap_update_bits(data->regmap, reg, mask, 0xff);
}
static int mrfld_extcon_sw_control(struct mrfld_extcon_data *data, bool enable)
{
unsigned int mask = BCOVE_CHGRCTRL0_SWCONTROL;
struct device *dev = data->dev;
int ret;
if (enable)
ret = mrfld_extcon_set(data, BCOVE_CHGRCTRL0, mask);
else
ret = mrfld_extcon_clear(data, BCOVE_CHGRCTRL0, mask);
if (ret)
dev_err(dev, "can't set SW control: %d\n", ret);
return ret;
}
static int mrfld_extcon_get_id(struct mrfld_extcon_data *data)
{
struct regmap *regmap = data->regmap;
unsigned int id;
bool ground;
int ret;
ret = regmap_read(regmap, BCOVE_USBIDSTS, &id);
if (ret)
return ret;
if (id & BCOVE_USBIDSTS_FLOAT)
return INTEL_USB_ID_FLOAT;
switch ((id & BCOVE_USBIDSTS_RARBRC_MASK) >> BCOVE_USBIDSTS_RARBRC_SHIFT) {
case BCOVE_USBIDSTS_R_ID_A:
return INTEL_USB_RID_A;
case BCOVE_USBIDSTS_R_ID_B:
return INTEL_USB_RID_B;
case BCOVE_USBIDSTS_R_ID_C:
return INTEL_USB_RID_C;
}
/*
* PMIC A0 reports USBIDSTS_GND = 1 for ID_GND,
* but PMIC B0 reports USBIDSTS_GND = 0 for ID_GND.
* Thus we must check this bit at last.
*/
ground = id & BCOVE_USBIDSTS_GND;
switch ('A' + BCOVE_MAJOR(data->id)) {
case 'A':
return ground ? INTEL_USB_ID_GND : INTEL_USB_ID_FLOAT;
case 'B':
return ground ? INTEL_USB_ID_FLOAT : INTEL_USB_ID_GND;
}
/* Unknown or unsupported type */
return INTEL_USB_ID_FLOAT;
}
static int mrfld_extcon_role_detect(struct mrfld_extcon_data *data)
{
unsigned int id;
bool usb_host;
int ret;
ret = mrfld_extcon_get_id(data);
if (ret < 0)
return ret;
id = ret;
usb_host = (id == INTEL_USB_ID_GND) || (id == INTEL_USB_RID_A);
extcon_set_state_sync(data->edev, EXTCON_USB_HOST, usb_host);
return 0;
}
static int mrfld_extcon_cable_detect(struct mrfld_extcon_data *data)
{
struct regmap *regmap = data->regmap;
unsigned int status, change;
int ret;
/*
* It seems SCU firmware clears the content of BCOVE_CHGRIRQ1
* and makes it useless for OS. Instead we compare a previously
* stored status to the current one, provided by BCOVE_SCHGRIRQ1.
*/
ret = regmap_read(regmap, BCOVE_SCHGRIRQ1, &status);
if (ret)
return ret;
change = status ^ data->status;
if (!change)
return -ENODATA;
if (change & BCOVE_CHGRIRQ_USBIDDET) {
ret = mrfld_extcon_role_detect(data);
if (ret)
return ret;
}
data->status = status;
return 0;
}
static irqreturn_t mrfld_extcon_interrupt(int irq, void *dev_id)
{
struct mrfld_extcon_data *data = dev_id;
int ret;
ret = mrfld_extcon_cable_detect(data);
mrfld_extcon_clear(data, BCOVE_MIRQLVL1, BCOVE_LVL1_CHGR);
return ret ? IRQ_NONE: IRQ_HANDLED;
}
static int mrfld_extcon_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
struct intel_soc_pmic *pmic = dev_get_drvdata(dev->parent);
struct regmap *regmap = pmic->regmap;
struct mrfld_extcon_data *data;
unsigned int id;
int irq, ret;
irq = platform_get_irq(pdev, 0);
if (irq < 0)
return irq;
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
data->dev = dev;
data->regmap = regmap;
data->edev = devm_extcon_dev_allocate(dev, mrfld_extcon_cable);
if (IS_ERR(data->edev))
return -ENOMEM;
ret = devm_extcon_dev_register(dev, data->edev);
if (ret < 0) {
dev_err(dev, "can't register extcon device: %d\n", ret);
return ret;
}
ret = devm_request_threaded_irq(dev, irq, NULL, mrfld_extcon_interrupt,
IRQF_ONESHOT | IRQF_SHARED, pdev->name,
data);
if (ret) {
dev_err(dev, "can't register IRQ handler: %d\n", ret);
return ret;
}
ret = regmap_read(regmap, BCOVE_ID, &id);
if (ret) {
dev_err(dev, "can't read PMIC ID: %d\n", ret);
return ret;
}
data->id = id;
ret = mrfld_extcon_sw_control(data, true);
if (ret)
return ret;
/* Get initial state */
mrfld_extcon_role_detect(data);
mrfld_extcon_clear(data, BCOVE_MIRQLVL1, BCOVE_LVL1_CHGR);
mrfld_extcon_clear(data, BCOVE_MCHGRIRQ1, BCOVE_CHGRIRQ_ALL);
mrfld_extcon_set(data, BCOVE_USBIDCTRL, BCOVE_USBIDCTRL_ALL);
platform_set_drvdata(pdev, data);
return 0;
}
static int mrfld_extcon_remove(struct platform_device *pdev)
{
struct mrfld_extcon_data *data = platform_get_drvdata(pdev);
mrfld_extcon_sw_control(data, false);
return 0;
}
static const struct platform_device_id mrfld_extcon_id_table[] = {
{ .name = "mrfld_bcove_pwrsrc" },
{}
};
MODULE_DEVICE_TABLE(platform, mrfld_extcon_id_table);
static struct platform_driver mrfld_extcon_driver = {
.driver = {
.name = "mrfld_bcove_pwrsrc",
},
.probe = mrfld_extcon_probe,
.remove = mrfld_extcon_remove,
.id_table = mrfld_extcon_id_table,
};
module_platform_driver(mrfld_extcon_driver);
MODULE_AUTHOR("Andy Shevchenko <andriy.shevchenko@linux.intel.com>");
MODULE_DESCRIPTION("extcon driver for Intel Merrifield Basin Cove PMIC");
MODULE_LICENSE("GPL v2");

@ -0,0 +1,20 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Header file for Intel extcon hardware
*
* Copyright (C) 2019 Intel Corporation. All rights reserved.
*/
#ifndef __EXTCON_INTEL_H__
#define __EXTCON_INTEL_H__
enum extcon_intel_usb_id {
INTEL_USB_ID_OTG,
INTEL_USB_ID_GND,
INTEL_USB_ID_FLOAT,
INTEL_USB_RID_A,
INTEL_USB_RID_B,
INTEL_USB_RID_C,
};
#endif /* __EXTCON_INTEL_H__ */

@ -254,7 +254,7 @@ static int vpd_section_destroy(struct vpd_section *sec)
static int vpd_sections_init(phys_addr_t physaddr)
{
struct vpd_cbmem __iomem *temp;
struct vpd_cbmem *temp;
struct vpd_cbmem header;
int ret = 0;
@ -262,7 +262,7 @@ static int vpd_sections_init(phys_addr_t physaddr)
if (!temp)
return -ENOMEM;
memcpy_fromio(&header, temp, sizeof(struct vpd_cbmem));
memcpy(&header, temp, sizeof(struct vpd_cbmem));
memunmap(temp);
if (header.magic != VPD_CBMEM_MAGIC)

@ -130,6 +130,7 @@ static void ubx_remove(struct serdev_device *serdev)
#ifdef CONFIG_OF
static const struct of_device_id ubx_of_match[] = {
{ .compatible = "u-blox,neo-6m" },
{ .compatible = "u-blox,neo-8" },
{ .compatible = "u-blox,neo-m8" },
{},

@ -75,20 +75,13 @@ config CORESIGHT_SOURCE_ETM4X
bool "CoreSight Embedded Trace Macrocell 4.x driver"
depends on ARM64
select CORESIGHT_LINKS_AND_SINKS
select PID_IN_CONTEXTIDR
help
This driver provides support for the ETM4.x tracer module, tracing the
instructions that a processor is executing. This is primarily useful
for instruction level tracing. Depending on the implemented version
data tracing may also be available.
config CORESIGHT_DYNAMIC_REPLICATOR
bool "CoreSight Programmable Replicator driver"
depends on CORESIGHT_LINKS_AND_SINKS
help
This enables support for dynamic CoreSight replicator link driver.
The programmable ATB replicator allows independent filtering of the
trace data based on the traceid.
config CORESIGHT_STM
bool "CoreSight System Trace Macrocell driver"
depends on (ARM && !(CPU_32v3 || CPU_32v4 || CPU_32v4T)) || ARM64

@ -15,7 +15,6 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM3X) += coresight-etm3x.o coresight-etm-cp14.o \
coresight-etm3x-sysfs.o
obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
coresight-etm4x-sysfs.o
obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
obj-$(CONFIG_CORESIGHT_CATU) += coresight-catu.o

@ -485,12 +485,12 @@ static int catu_disable(struct coresight_device *csdev, void *__unused)
return rc;
}
const struct coresight_ops_helper catu_helper_ops = {
static const struct coresight_ops_helper catu_helper_ops = {
.enable = catu_enable,
.disable = catu_disable,
};
const struct coresight_ops catu_ops = {
static const struct coresight_ops catu_ops = {
.helper_ops = &catu_helper_ops,
};
@ -557,8 +557,9 @@ static int catu_probe(struct amba_device *adev, const struct amba_id *id)
drvdata->csdev = coresight_register(&catu_desc);
if (IS_ERR(drvdata->csdev))
ret = PTR_ERR(drvdata->csdev);
else
pm_runtime_put(&adev->dev);
out:
pm_runtime_put(&adev->dev);
return ret;
}

@ -109,11 +109,6 @@ static inline bool coresight_is_catu_device(struct coresight_device *csdev)
return true;
}
#ifdef CONFIG_CORESIGHT_CATU
extern const struct etr_buf_operations etr_catu_buf_ops;
#else
/* Dummy declaration for the CATU ops */
static const struct etr_buf_operations etr_catu_buf_ops;
#endif
#endif

@ -1,255 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2011-2015, The Linux Foundation. All rights reserved.
*/
#include <linux/amba/bus.h>
#include <linux/clk.h>
#include <linux/coresight.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/of.h>
#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include "coresight-priv.h"
#define REPLICATOR_IDFILTER0 0x000
#define REPLICATOR_IDFILTER1 0x004
/**
* struct replicator_state - specifics associated to a replicator component
* @base: memory mapped base address for this component.
* @dev: the device entity associated with this component
* @atclk: optional clock for the core parts of the replicator.
* @csdev: component vitals needed by the framework
*/
struct replicator_state {
void __iomem *base;
struct device *dev;
struct clk *atclk;
struct coresight_device *csdev;
};
/*
* replicator_reset : Reset the replicator configuration to sane values.
*/
static void replicator_reset(struct replicator_state *drvdata)
{
CS_UNLOCK(drvdata->base);
if (!coresight_claim_device_unlocked(drvdata->base)) {
writel_relaxed(0xff, drvdata->base + REPLICATOR_IDFILTER0);
writel_relaxed(0xff, drvdata->base + REPLICATOR_IDFILTER1);
coresight_disclaim_device_unlocked(drvdata->base);
}
CS_LOCK(drvdata->base);
}
static int replicator_enable(struct coresight_device *csdev, int inport,
int outport)
{
int rc = 0;
u32 reg;
struct replicator_state *drvdata = dev_get_drvdata(csdev->dev.parent);
switch (outport) {
case 0:
reg = REPLICATOR_IDFILTER0;
break;
case 1:
reg = REPLICATOR_IDFILTER1;
break;
default:
WARN_ON(1);
return -EINVAL;
}
CS_UNLOCK(drvdata->base);
if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0xff) &&
(readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0xff))
rc = coresight_claim_device_unlocked(drvdata->base);
/* Ensure that the outport is enabled. */
if (!rc) {
writel_relaxed(0x00, drvdata->base + reg);
dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
}
CS_LOCK(drvdata->base);
return rc;
}
static void replicator_disable(struct coresight_device *csdev, int inport,
int outport)
{
u32 reg;
struct replicator_state *drvdata = dev_get_drvdata(csdev->dev.parent);
switch (outport) {
case 0:
reg = REPLICATOR_IDFILTER0;
break;
case 1:
reg = REPLICATOR_IDFILTER1;
break;
default:
WARN_ON(1);
return;
}
CS_UNLOCK(drvdata->base);
/* disable the flow of ATB data through port */
writel_relaxed(0xff, drvdata->base + reg);
if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0xff) &&
(readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0xff))
coresight_disclaim_device_unlocked(drvdata->base);
CS_LOCK(drvdata->base);
dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
}
static const struct coresight_ops_link replicator_link_ops = {
.enable = replicator_enable,
.disable = replicator_disable,
};
static const struct coresight_ops replicator_cs_ops = {
.link_ops = &replicator_link_ops,
};
#define coresight_replicator_reg(name, offset) \
coresight_simple_reg32(struct replicator_state, name, offset)
coresight_replicator_reg(idfilter0, REPLICATOR_IDFILTER0);
coresight_replicator_reg(idfilter1, REPLICATOR_IDFILTER1);
static struct attribute *replicator_mgmt_attrs[] = {
&dev_attr_idfilter0.attr,
&dev_attr_idfilter1.attr,
NULL,
};
static const struct attribute_group replicator_mgmt_group = {
.attrs = replicator_mgmt_attrs,
.name = "mgmt",
};
static const struct attribute_group *replicator_groups[] = {
&replicator_mgmt_group,
NULL,
};
static int replicator_probe(struct amba_device *adev, const struct amba_id *id)
{
int ret;
struct device *dev = &adev->dev;
struct resource *res = &adev->res;
struct coresight_platform_data *pdata = NULL;
struct replicator_state *drvdata;
struct coresight_desc desc = { 0 };
struct device_node *np = adev->dev.of_node;
void __iomem *base;
if (np) {
pdata = of_get_coresight_platform_data(dev, np);
if (IS_ERR(pdata))
return PTR_ERR(pdata);
adev->dev.platform_data = pdata;
}
drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
if (!drvdata)
return -ENOMEM;
drvdata->dev = &adev->dev;
drvdata->atclk = devm_clk_get(&adev->dev, "atclk"); /* optional */
if (!IS_ERR(drvdata->atclk)) {
ret = clk_prepare_enable(drvdata->atclk);
if (ret)
return ret;
}
/* Validity for the resource is already checked by the AMBA core */
base = devm_ioremap_resource(dev, res);
if (IS_ERR(base))
return PTR_ERR(base);
drvdata->base = base;
dev_set_drvdata(dev, drvdata);
pm_runtime_put(&adev->dev);
desc.type = CORESIGHT_DEV_TYPE_LINK;
desc.subtype.link_subtype = CORESIGHT_DEV_SUBTYPE_LINK_SPLIT;
desc.ops = &replicator_cs_ops;
desc.pdata = adev->dev.platform_data;
desc.dev = &adev->dev;
desc.groups = replicator_groups;
drvdata->csdev = coresight_register(&desc);
if (!IS_ERR(drvdata->csdev)) {
replicator_reset(drvdata);
return 0;
}
return PTR_ERR(drvdata->csdev);
}
#ifdef CONFIG_PM
static int replicator_runtime_suspend(struct device *dev)
{
struct replicator_state *drvdata = dev_get_drvdata(dev);
if (drvdata && !IS_ERR(drvdata->atclk))
clk_disable_unprepare(drvdata->atclk);
return 0;
}
static int replicator_runtime_resume(struct device *dev)
{
struct replicator_state *drvdata = dev_get_drvdata(dev);
if (drvdata && !IS_ERR(drvdata->atclk))
clk_prepare_enable(drvdata->atclk);
return 0;
}
#endif
static const struct dev_pm_ops replicator_dev_pm_ops = {
SET_RUNTIME_PM_OPS(replicator_runtime_suspend,
replicator_runtime_resume,
NULL)
};
static const struct amba_id replicator_ids[] = {
{
.id = 0x000bb909,
.mask = 0x000fffff,
},
{
/* Coresight SoC-600 */
.id = 0x000bb9ec,
.mask = 0x000fffff,
},
{ 0, 0 },
};
static struct amba_driver replicator_driver = {
.drv = {
.name = "coresight-dynamic-replicator",
.pm = &replicator_dev_pm_ops,
.suppress_bind_attrs = true,
},
.probe = replicator_probe,
.id_table = replicator_ids,
};
builtin_amba_driver(replicator_driver);

@ -5,6 +5,7 @@
* Description: CoreSight Embedded Trace Buffer driver
*/
#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/types.h>
@ -71,6 +72,8 @@
* @miscdev: specifics to handle "/dev/xyz.etb" entry.
* @spinlock: only one at a time pls.
* @reading: synchronise user space access to etb buffer.
* @pid: Process ID of the process being monitored by the session
* that is using this component.
* @buf: area of memory where ETB buffer content gets sent.
* @mode: this ETB is being used.
* @buffer_depth: size of @buf.
@ -84,6 +87,7 @@ struct etb_drvdata {
struct miscdevice miscdev;
spinlock_t spinlock;
local_t reading;
pid_t pid;
u8 *buf;
u32 mode;
u32 buffer_depth;
@ -93,17 +97,9 @@ struct etb_drvdata {
static int etb_set_buffer(struct coresight_device *csdev,
struct perf_output_handle *handle);
static unsigned int etb_get_buffer_depth(struct etb_drvdata *drvdata)
static inline unsigned int etb_get_buffer_depth(struct etb_drvdata *drvdata)
{
u32 depth = 0;
pm_runtime_get_sync(drvdata->dev);
/* RO registers don't need locking */
depth = readl_relaxed(drvdata->base + ETB_RAM_DEPTH_REG);
pm_runtime_put(drvdata->dev);
return depth;
return readl_relaxed(drvdata->base + ETB_RAM_DEPTH_REG);
}
static void __etb_enable_hw(struct etb_drvdata *drvdata)
@ -159,14 +155,15 @@ static int etb_enable_sysfs(struct coresight_device *csdev)
goto out;
}
/* Nothing to do, the tracer is already enabled. */
if (drvdata->mode == CS_MODE_SYSFS)
goto out;
if (drvdata->mode == CS_MODE_DISABLED) {
ret = etb_enable_hw(drvdata);
if (ret)
goto out;
ret = etb_enable_hw(drvdata);
if (!ret)
drvdata->mode = CS_MODE_SYSFS;
}
atomic_inc(csdev->refcnt);
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return ret;
@ -175,29 +172,52 @@ out:
static int etb_enable_perf(struct coresight_device *csdev, void *data)
{
int ret = 0;
pid_t pid;
unsigned long flags;
struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
struct perf_output_handle *handle = data;
spin_lock_irqsave(&drvdata->spinlock, flags);
/* No need to continue if the component is already in use. */
if (drvdata->mode != CS_MODE_DISABLED) {
/* No need to continue if the component is already in used by sysFS. */
if (drvdata->mode == CS_MODE_SYSFS) {
ret = -EBUSY;
goto out;
}
/* Get a handle on the pid of the process to monitor */
pid = task_pid_nr(handle->event->owner);
if (drvdata->pid != -1 && drvdata->pid != pid) {
ret = -EBUSY;
goto out;
}
/*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
if (drvdata->pid == pid) {
atomic_inc(csdev->refcnt);
goto out;
}
/*
* We don't have an internal state to clean up if we fail to setup
* the perf buffer. So we can perform the step before we turn the
* ETB on and leave without cleaning up.
*/
ret = etb_set_buffer(csdev, (struct perf_output_handle *)data);
ret = etb_set_buffer(csdev, handle);
if (ret)
goto out;
ret = etb_enable_hw(drvdata);
if (!ret)
if (!ret) {
/* Associate with monitored process. */
drvdata->pid = pid;
drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
}
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -325,27 +345,35 @@ static void etb_disable_hw(struct etb_drvdata *drvdata)
coresight_disclaim_device(drvdata->base);
}
static void etb_disable(struct coresight_device *csdev)
static int etb_disable(struct coresight_device *csdev)
{
struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
unsigned long flags;
spin_lock_irqsave(&drvdata->spinlock, flags);
/* Disable the ETB only if it needs to */
if (drvdata->mode != CS_MODE_DISABLED) {
etb_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
}
/* Complain if we (somehow) got out of sync */
WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED);
etb_disable_hw(drvdata);
/* Dissociate from monitored process. */
drvdata->pid = -1;
drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "ETB disabled\n");
return 0;
}
static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
void **pages, int nr_pages, bool overwrite)
static void *etb_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
int node;
int node, cpu = event->cpu;
struct cs_buffers *buf;
if (cpu == -1)
@ -404,7 +432,7 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev,
const u32 *barrier;
u32 read_ptr, write_ptr, capacity;
u32 status, read_data;
unsigned long offset, to_read;
unsigned long offset, to_read = 0, flags;
struct cs_buffers *buf = sink_config;
struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@ -413,6 +441,12 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev,
capacity = drvdata->buffer_depth * ETB_FRAME_SIZE_WORDS;
spin_lock_irqsave(&drvdata->spinlock, flags);
/* Don't do anything if another tracer is using this sink */
if (atomic_read(csdev->refcnt) != 1)
goto out;
__etb_disable_hw(drvdata);
CS_UNLOCK(drvdata->base);
@ -523,6 +557,8 @@ static unsigned long etb_update_buffer(struct coresight_device *csdev,
}
__etb_enable_hw(drvdata);
CS_LOCK(drvdata->base);
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read;
}
@ -720,7 +756,6 @@ static int etb_probe(struct amba_device *adev, const struct amba_id *id)
spin_lock_init(&drvdata->spinlock);
drvdata->buffer_depth = etb_get_buffer_depth(drvdata);
pm_runtime_put(&adev->dev);
if (drvdata->buffer_depth & 0x80000000)
return -EINVAL;
@ -730,6 +765,9 @@ static int etb_probe(struct amba_device *adev, const struct amba_id *id)
if (!drvdata->buf)
return -ENOMEM;
/* This device is not associated with a session */
drvdata->pid = -1;
desc.type = CORESIGHT_DEV_TYPE_SINK;
desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_BUFFER;
desc.ops = &etb_cs_ops;
@ -747,6 +785,7 @@ static int etb_probe(struct amba_device *adev, const struct amba_id *id)
if (ret)
goto err_misc_register;
pm_runtime_put(&adev->dev);
return 0;
err_misc_register:

@ -29,6 +29,7 @@ static DEFINE_PER_CPU(struct coresight_device *, csdev_src);
/* ETMv3.5/PTM's ETMCR is 'config' */
PMU_FORMAT_ATTR(cycacc, "config:" __stringify(ETM_OPT_CYCACC));
PMU_FORMAT_ATTR(contextid, "config:" __stringify(ETM_OPT_CTXTID));
PMU_FORMAT_ATTR(timestamp, "config:" __stringify(ETM_OPT_TS));
PMU_FORMAT_ATTR(retstack, "config:" __stringify(ETM_OPT_RETSTK));
/* Sink ID - same for all ETMs */
@ -36,6 +37,7 @@ PMU_FORMAT_ATTR(sinkid, "config2:0-31");
static struct attribute *etm_config_formats_attr[] = {
&format_attr_cycacc.attr,
&format_attr_contextid.attr,
&format_attr_timestamp.attr,
&format_attr_retstack.attr,
&format_attr_sinkid.attr,
@ -118,23 +120,34 @@ out:
return ret;
}
static void free_sink_buffer(struct etm_event_data *event_data)
{
int cpu;
cpumask_t *mask = &event_data->mask;
struct coresight_device *sink;
if (WARN_ON(cpumask_empty(mask)))
return;
if (!event_data->snk_config)
return;
cpu = cpumask_first(mask);
sink = coresight_get_sink(etm_event_cpu_path(event_data, cpu));
sink_ops(sink)->free_buffer(event_data->snk_config);
}
static void free_event_data(struct work_struct *work)
{
int cpu;
cpumask_t *mask;
struct etm_event_data *event_data;
struct coresight_device *sink;
event_data = container_of(work, struct etm_event_data, work);
mask = &event_data->mask;
/* Free the sink buffers, if there are any */
if (event_data->snk_config && !WARN_ON(cpumask_empty(mask))) {
cpu = cpumask_first(mask);
sink = coresight_get_sink(etm_event_cpu_path(event_data, cpu));
if (sink_ops(sink)->free_buffer)
sink_ops(sink)->free_buffer(event_data->snk_config);
}
free_sink_buffer(event_data);
for_each_cpu(cpu, mask) {
struct list_head **ppath;
@ -213,7 +226,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
sink = coresight_get_enabled_sink(true);
}
if (!sink || !sink_ops(sink)->alloc_buffer)
if (!sink)
goto err;
mask = &event_data->mask;
@ -259,9 +272,12 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
if (cpu >= nr_cpu_ids)
goto err;
if (!sink_ops(sink)->alloc_buffer || !sink_ops(sink)->free_buffer)
goto err;
/* Allocate the sink buffer for this session */
event_data->snk_config =
sink_ops(sink)->alloc_buffer(sink, cpu, pages,
sink_ops(sink)->alloc_buffer(sink, event, pages,
nr_pages, overwrite);
if (!event_data->snk_config)
goto err;
@ -566,7 +582,8 @@ static int __init etm_perf_init(void)
{
int ret;
etm_pmu.capabilities = PERF_PMU_CAP_EXCLUSIVE;
etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
PERF_PMU_CAP_ITRACE);
etm_pmu.attr_groups = etm_pmu_attr_groups;
etm_pmu.task_ctx_nr = perf_sw_context;

@ -138,8 +138,11 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
drvdata->base + TRCCNTVRn(i));
}
/* Resource selector pair 0 is always implemented and reserved */
for (i = 0; i < drvdata->nr_resource * 2; i++)
/*
* Resource selector pair 0 is always implemented and reserved. As
* such start at 2.
*/
for (i = 2; i < drvdata->nr_resource * 2; i++)
writel_relaxed(config->res_ctrl[i],
drvdata->base + TRCRSCTLRn(i));
@ -201,6 +204,91 @@ static void etm4_enable_hw_smp_call(void *info)
arg->rc = etm4_enable_hw(arg->drvdata);
}
/*
* The goal of function etm4_config_timestamp_event() is to configure a
* counter that will tell the tracer to emit a timestamp packet when it
* reaches zero. This is done in order to get a more fine grained idea
* of when instructions are executed so that they can be correlated
* with execution on other CPUs.
*
* To do this the counter itself is configured to self reload and
* TRCRSCTLR1 (always true) used to get the counter to decrement. From
* there a resource selector is configured with the counter and the
* timestamp control register to use the resource selector to trigger the
* event that will insert a timestamp packet in the stream.
*/
static int etm4_config_timestamp_event(struct etmv4_drvdata *drvdata)
{
int ctridx, ret = -EINVAL;
int counter, rselector;
u32 val = 0;
struct etmv4_config *config = &drvdata->config;
/* No point in trying if we don't have at least one counter */
if (!drvdata->nr_cntr)
goto out;
/* Find a counter that hasn't been initialised */
for (ctridx = 0; ctridx < drvdata->nr_cntr; ctridx++)
if (config->cntr_val[ctridx] == 0)
break;
/* All the counters have been configured already, bail out */
if (ctridx == drvdata->nr_cntr) {
pr_debug("%s: no available counter found\n", __func__);
ret = -ENOSPC;
goto out;
}
/*
* Searching for an available resource selector to use, starting at
* '2' since every implementation has at least 2 resource selector.
* ETMIDR4 gives the number of resource selector _pairs_,
* hence multiply by 2.
*/
for (rselector = 2; rselector < drvdata->nr_resource * 2; rselector++)
if (!config->res_ctrl[rselector])
break;
if (rselector == drvdata->nr_resource * 2) {
pr_debug("%s: no available resource selector found\n",
__func__);
ret = -ENOSPC;
goto out;
}
/* Remember what counter we used */
counter = 1 << ctridx;
/*
* Initialise original and reload counter value to the smallest
* possible value in order to get as much precision as we can.
*/
config->cntr_val[ctridx] = 1;
config->cntrldvr[ctridx] = 1;
/* Set the trace counter control register */
val = 0x1 << 16 | /* Bit 16, reload counter automatically */
0x0 << 7 | /* Select single resource selector */
0x1; /* Resource selector 1, i.e always true */
config->cntr_ctrl[ctridx] = val;
val = 0x2 << 16 | /* Group 0b0010 - Counter and sequencers */
counter << 0; /* Counter to use */
config->res_ctrl[rselector] = val;
val = 0x0 << 7 | /* Select single resource selector */
rselector; /* Resource selector */
config->ts_ctrl = val;
ret = 0;
out:
return ret;
}
static int etm4_parse_event_config(struct etmv4_drvdata *drvdata,
struct perf_event *event)
{
@ -236,9 +324,29 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata,
/* TRM: Must program this for cycacc to work */
config->ccctlr = ETM_CYC_THRESHOLD_DEFAULT;
}
if (attr->config & BIT(ETM_OPT_TS))
if (attr->config & BIT(ETM_OPT_TS)) {
/*
* Configure timestamps to be emitted at regular intervals in
* order to correlate instructions executed on different CPUs
* (CPU-wide trace scenarios).
*/
ret = etm4_config_timestamp_event(drvdata);
/*
* No need to go further if timestamp intervals can't
* be configured.
*/
if (ret)
goto out;
/* bit[11], Global timestamp tracing bit */
config->cfg |= BIT(11);
}
if (attr->config & BIT(ETM_OPT_CTXTID))
/* bit[6], Context ID tracing bit */
config->cfg |= BIT(ETM4_CFG_BIT_CTXTID);
/* return stack - enable if selected and supported */
if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
/* bit[12], Return stack enable bit */

@ -12,6 +12,8 @@
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/pm_runtime.h>
#include <linux/coresight.h>
#include <linux/amba/bus.h>
@ -43,7 +45,7 @@ struct funnel_drvdata {
unsigned long priority;
};
static int funnel_enable_hw(struct funnel_drvdata *drvdata, int port)
static int dynamic_funnel_enable_hw(struct funnel_drvdata *drvdata, int port)
{
u32 functl;
int rc = 0;
@ -71,17 +73,19 @@ done:
static int funnel_enable(struct coresight_device *csdev, int inport,
int outport)
{
int rc;
int rc = 0;
struct funnel_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
rc = funnel_enable_hw(drvdata, inport);
if (drvdata->base)
rc = dynamic_funnel_enable_hw(drvdata, inport);
if (!rc)
dev_dbg(drvdata->dev, "FUNNEL inport %d enabled\n", inport);
return rc;
}
static void funnel_disable_hw(struct funnel_drvdata *drvdata, int inport)
static void dynamic_funnel_disable_hw(struct funnel_drvdata *drvdata,
int inport)
{
u32 functl;
@ -103,7 +107,8 @@ static void funnel_disable(struct coresight_device *csdev, int inport,
{
struct funnel_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
funnel_disable_hw(drvdata, inport);
if (drvdata->base)
dynamic_funnel_disable_hw(drvdata, inport);
dev_dbg(drvdata->dev, "FUNNEL inport %d disabled\n", inport);
}
@ -177,54 +182,70 @@ static struct attribute *coresight_funnel_attrs[] = {
};
ATTRIBUTE_GROUPS(coresight_funnel);
static int funnel_probe(struct amba_device *adev, const struct amba_id *id)
static int funnel_probe(struct device *dev, struct resource *res)
{
int ret;
void __iomem *base;
struct device *dev = &adev->dev;
struct coresight_platform_data *pdata = NULL;
struct funnel_drvdata *drvdata;
struct resource *res = &adev->res;
struct coresight_desc desc = { 0 };
struct device_node *np = adev->dev.of_node;
struct device_node *np = dev->of_node;
if (np) {
pdata = of_get_coresight_platform_data(dev, np);
if (IS_ERR(pdata))
return PTR_ERR(pdata);
adev->dev.platform_data = pdata;
dev->platform_data = pdata;
}
if (of_device_is_compatible(np, "arm,coresight-funnel"))
pr_warn_once("Uses OBSOLETE CoreSight funnel binding\n");
drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
if (!drvdata)
return -ENOMEM;
drvdata->dev = &adev->dev;
drvdata->atclk = devm_clk_get(&adev->dev, "atclk"); /* optional */
drvdata->dev = dev;
drvdata->atclk = devm_clk_get(dev, "atclk"); /* optional */
if (!IS_ERR(drvdata->atclk)) {
ret = clk_prepare_enable(drvdata->atclk);
if (ret)
return ret;
}
/*
* Map the device base for dynamic-funnel, which has been
* validated by AMBA core.
*/
if (res) {
base = devm_ioremap_resource(dev, res);
if (IS_ERR(base)) {
ret = PTR_ERR(base);
goto out_disable_clk;
}
drvdata->base = base;
desc.groups = coresight_funnel_groups;
}
dev_set_drvdata(dev, drvdata);
/* Validity for the resource is already checked by the AMBA core */
base = devm_ioremap_resource(dev, res);
if (IS_ERR(base))
return PTR_ERR(base);
drvdata->base = base;
pm_runtime_put(&adev->dev);
desc.type = CORESIGHT_DEV_TYPE_LINK;
desc.subtype.link_subtype = CORESIGHT_DEV_SUBTYPE_LINK_MERG;
desc.ops = &funnel_cs_ops;
desc.pdata = pdata;
desc.dev = dev;
desc.groups = coresight_funnel_groups;
drvdata->csdev = coresight_register(&desc);
if (IS_ERR(drvdata->csdev)) {
ret = PTR_ERR(drvdata->csdev);
goto out_disable_clk;
}
return PTR_ERR_OR_ZERO(drvdata->csdev);
pm_runtime_put(dev);
out_disable_clk:
if (ret && !IS_ERR_OR_NULL(drvdata->atclk))
clk_disable_unprepare(drvdata->atclk);
return ret;
}
#ifdef CONFIG_PM
@ -253,7 +274,48 @@ static const struct dev_pm_ops funnel_dev_pm_ops = {
SET_RUNTIME_PM_OPS(funnel_runtime_suspend, funnel_runtime_resume, NULL)
};
static const struct amba_id funnel_ids[] = {
static int static_funnel_probe(struct platform_device *pdev)
{
int ret;
pm_runtime_get_noresume(&pdev->dev);
pm_runtime_set_active(&pdev->dev);
pm_runtime_enable(&pdev->dev);
/* Static funnel do not have programming base */
ret = funnel_probe(&pdev->dev, NULL);
if (ret) {
pm_runtime_put_noidle(&pdev->dev);
pm_runtime_disable(&pdev->dev);
}
return ret;
}
static const struct of_device_id static_funnel_match[] = {
{.compatible = "arm,coresight-static-funnel"},
{}
};
static struct platform_driver static_funnel_driver = {
.probe = static_funnel_probe,
.driver = {
.name = "coresight-static-funnel",
.of_match_table = static_funnel_match,
.pm = &funnel_dev_pm_ops,
.suppress_bind_attrs = true,
},
};
builtin_platform_driver(static_funnel_driver);
static int dynamic_funnel_probe(struct amba_device *adev,
const struct amba_id *id)
{
return funnel_probe(&adev->dev, &adev->res);
}
static const struct amba_id dynamic_funnel_ids[] = {
{
.id = 0x000bb908,
.mask = 0x000fffff,
@ -266,14 +328,14 @@ static const struct amba_id funnel_ids[] = {
{ 0, 0},
};
static struct amba_driver funnel_driver = {
static struct amba_driver dynamic_funnel_driver = {
.drv = {
.name = "coresight-funnel",
.name = "coresight-dynamic-funnel",
.owner = THIS_MODULE,
.pm = &funnel_dev_pm_ops,
.suppress_bind_attrs = true,
},
.probe = funnel_probe,
.id_table = funnel_ids,
.probe = dynamic_funnel_probe,
.id_table = dynamic_funnel_ids,
};
builtin_amba_driver(funnel_driver);
builtin_amba_driver(dynamic_funnel_driver);

@ -1,10 +1,11 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2011-2012, The Linux Foundation. All rights reserved.
* Copyright (c) 2011-2015, The Linux Foundation. All rights reserved.
*
* Description: CoreSight Replicator driver
*/
#include <linux/amba/bus.h>
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/platform_device.h>
@ -18,25 +19,117 @@
#include "coresight-priv.h"
#define REPLICATOR_IDFILTER0 0x000
#define REPLICATOR_IDFILTER1 0x004
/**
* struct replicator_drvdata - specifics associated to a replicator component
* @base: memory mapped base address for this component. Also indicates
* whether this one is programmable or not.
* @dev: the device entity associated with this component
* @atclk: optional clock for the core parts of the replicator.
* @csdev: component vitals needed by the framework
*/
struct replicator_drvdata {
void __iomem *base;
struct device *dev;
struct clk *atclk;
struct coresight_device *csdev;
};
static void dynamic_replicator_reset(struct replicator_drvdata *drvdata)
{
CS_UNLOCK(drvdata->base);
if (!coresight_claim_device_unlocked(drvdata->base)) {
writel_relaxed(0xff, drvdata->base + REPLICATOR_IDFILTER0);
writel_relaxed(0xff, drvdata->base + REPLICATOR_IDFILTER1);
coresight_disclaim_device_unlocked(drvdata->base);
}
CS_LOCK(drvdata->base);
}
/*
* replicator_reset : Reset the replicator configuration to sane values.
*/
static inline void replicator_reset(struct replicator_drvdata *drvdata)
{
if (drvdata->base)
dynamic_replicator_reset(drvdata);
}
static int dynamic_replicator_enable(struct replicator_drvdata *drvdata,
int inport, int outport)
{
int rc = 0;
u32 reg;
switch (outport) {
case 0:
reg = REPLICATOR_IDFILTER0;
break;
case 1:
reg = REPLICATOR_IDFILTER1;
break;
default:
WARN_ON(1);
return -EINVAL;
}
CS_UNLOCK(drvdata->base);
if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0xff) &&
(readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0xff))
rc = coresight_claim_device_unlocked(drvdata->base);
/* Ensure that the outport is enabled. */
if (!rc)
writel_relaxed(0x00, drvdata->base + reg);
CS_LOCK(drvdata->base);
return rc;
}
static int replicator_enable(struct coresight_device *csdev, int inport,
int outport)
{
int rc = 0;
struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
return 0;
if (drvdata->base)
rc = dynamic_replicator_enable(drvdata, inport, outport);
if (!rc)
dev_dbg(drvdata->dev, "REPLICATOR enabled\n");
return rc;
}
static void dynamic_replicator_disable(struct replicator_drvdata *drvdata,
int inport, int outport)
{
u32 reg;
switch (outport) {
case 0:
reg = REPLICATOR_IDFILTER0;
break;
case 1:
reg = REPLICATOR_IDFILTER1;
break;
default:
WARN_ON(1);
return;
}
CS_UNLOCK(drvdata->base);
/* disable the flow of ATB data through port */
writel_relaxed(0xff, drvdata->base + reg);
if ((readl_relaxed(drvdata->base + REPLICATOR_IDFILTER0) == 0xff) &&
(readl_relaxed(drvdata->base + REPLICATOR_IDFILTER1) == 0xff))
coresight_disclaim_device_unlocked(drvdata->base);
CS_LOCK(drvdata->base);
}
static void replicator_disable(struct coresight_device *csdev, int inport,
@ -44,6 +137,8 @@ static void replicator_disable(struct coresight_device *csdev, int inport,
{
struct replicator_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
if (drvdata->base)
dynamic_replicator_disable(drvdata, inport, outport);
dev_dbg(drvdata->dev, "REPLICATOR disabled\n");
}
@ -56,58 +151,110 @@ static const struct coresight_ops replicator_cs_ops = {
.link_ops = &replicator_link_ops,
};
static int replicator_probe(struct platform_device *pdev)
#define coresight_replicator_reg(name, offset) \
coresight_simple_reg32(struct replicator_drvdata, name, offset)
coresight_replicator_reg(idfilter0, REPLICATOR_IDFILTER0);
coresight_replicator_reg(idfilter1, REPLICATOR_IDFILTER1);
static struct attribute *replicator_mgmt_attrs[] = {
&dev_attr_idfilter0.attr,
&dev_attr_idfilter1.attr,
NULL,
};
static const struct attribute_group replicator_mgmt_group = {
.attrs = replicator_mgmt_attrs,
.name = "mgmt",
};
static const struct attribute_group *replicator_groups[] = {
&replicator_mgmt_group,
NULL,
};
static int replicator_probe(struct device *dev, struct resource *res)
{
int ret;
struct device *dev = &pdev->dev;
int ret = 0;
struct coresight_platform_data *pdata = NULL;
struct replicator_drvdata *drvdata;
struct coresight_desc desc = { 0 };
struct device_node *np = pdev->dev.of_node;
struct device_node *np = dev->of_node;
void __iomem *base;
if (np) {
pdata = of_get_coresight_platform_data(dev, np);
if (IS_ERR(pdata))
return PTR_ERR(pdata);
pdev->dev.platform_data = pdata;
dev->platform_data = pdata;
}
if (of_device_is_compatible(np, "arm,coresight-replicator"))
pr_warn_once("Uses OBSOLETE CoreSight replicator binding\n");
drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
if (!drvdata)
return -ENOMEM;
drvdata->dev = &pdev->dev;
drvdata->atclk = devm_clk_get(&pdev->dev, "atclk"); /* optional */
drvdata->dev = dev;
drvdata->atclk = devm_clk_get(dev, "atclk"); /* optional */
if (!IS_ERR(drvdata->atclk)) {
ret = clk_prepare_enable(drvdata->atclk);
if (ret)
return ret;
}
pm_runtime_get_noresume(&pdev->dev);
pm_runtime_set_active(&pdev->dev);
pm_runtime_enable(&pdev->dev);
platform_set_drvdata(pdev, drvdata);
/*
* Map the device base for dynamic-replicator, which has been
* validated by AMBA core
*/
if (res) {
base = devm_ioremap_resource(dev, res);
if (IS_ERR(base)) {
ret = PTR_ERR(base);
goto out_disable_clk;
}
drvdata->base = base;
desc.groups = replicator_groups;
}
dev_set_drvdata(dev, drvdata);
desc.type = CORESIGHT_DEV_TYPE_LINK;
desc.subtype.link_subtype = CORESIGHT_DEV_SUBTYPE_LINK_SPLIT;
desc.ops = &replicator_cs_ops;
desc.pdata = pdev->dev.platform_data;
desc.dev = &pdev->dev;
desc.pdata = dev->platform_data;
desc.dev = dev;
drvdata->csdev = coresight_register(&desc);
if (IS_ERR(drvdata->csdev)) {
ret = PTR_ERR(drvdata->csdev);
goto out_disable_pm;
goto out_disable_clk;
}
pm_runtime_put(&pdev->dev);
replicator_reset(drvdata);
pm_runtime_put(dev);
return 0;
out_disable_pm:
if (!IS_ERR(drvdata->atclk))
out_disable_clk:
if (ret && !IS_ERR_OR_NULL(drvdata->atclk))
clk_disable_unprepare(drvdata->atclk);
pm_runtime_put_noidle(&pdev->dev);
pm_runtime_disable(&pdev->dev);
return ret;
}
static int static_replicator_probe(struct platform_device *pdev)
{
int ret;
pm_runtime_get_noresume(&pdev->dev);
pm_runtime_set_active(&pdev->dev);
pm_runtime_enable(&pdev->dev);
/* Static replicators do not have programming base */
ret = replicator_probe(&pdev->dev, NULL);
if (ret) {
pm_runtime_put_noidle(&pdev->dev);
pm_runtime_disable(&pdev->dev);
}
return ret;
}
@ -139,18 +286,49 @@ static const struct dev_pm_ops replicator_dev_pm_ops = {
replicator_runtime_resume, NULL)
};
static const struct of_device_id replicator_match[] = {
static const struct of_device_id static_replicator_match[] = {
{.compatible = "arm,coresight-replicator"},
{.compatible = "arm,coresight-static-replicator"},
{}
};
static struct platform_driver replicator_driver = {
.probe = replicator_probe,
static struct platform_driver static_replicator_driver = {
.probe = static_replicator_probe,
.driver = {
.name = "coresight-replicator",
.of_match_table = replicator_match,
.name = "coresight-static-replicator",
.of_match_table = static_replicator_match,
.pm = &replicator_dev_pm_ops,
.suppress_bind_attrs = true,
},
};
builtin_platform_driver(replicator_driver);
builtin_platform_driver(static_replicator_driver);
static int dynamic_replicator_probe(struct amba_device *adev,
const struct amba_id *id)
{
return replicator_probe(&adev->dev, &adev->res);
}
static const struct amba_id dynamic_replicator_ids[] = {
{
.id = 0x000bb909,
.mask = 0x000fffff,
},
{
/* Coresight SoC-600 */
.id = 0x000bb9ec,
.mask = 0x000fffff,
},
{ 0, 0 },
};
static struct amba_driver dynamic_replicator_driver = {
.drv = {
.name = "coresight-dynamic-replicator",
.pm = &replicator_dev_pm_ops,
.suppress_bind_attrs = true,
},
.probe = dynamic_replicator_probe,
.id_table = dynamic_replicator_ids,
};
builtin_amba_driver(dynamic_replicator_driver);

View File

@ -4,6 +4,7 @@
* Author: Mathieu Poirier <mathieu.poirier@linaro.org>
*/
#include <linux/atomic.h>
#include <linux/circ_buf.h>
#include <linux/coresight.h>
#include <linux/perf_event.h>
@ -180,8 +181,10 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
* sink is already enabled no memory is needed and the HW need not be
* touched.
*/
if (drvdata->mode == CS_MODE_SYSFS)
if (drvdata->mode == CS_MODE_SYSFS) {
atomic_inc(csdev->refcnt);
goto out;
}
/*
* If drvdata::buf isn't NULL, memory was allocated for a previous
@ -200,11 +203,13 @@ static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev)
}
ret = tmc_etb_enable_hw(drvdata);
if (!ret)
if (!ret) {
drvdata->mode = CS_MODE_SYSFS;
else
atomic_inc(csdev->refcnt);
} else {
/* Free up the buffer if we failed to enable */
used = false;
}
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -218,6 +223,7 @@ out:
static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data)
{
int ret = 0;
pid_t pid;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
struct perf_output_handle *handle = data;
@ -228,19 +234,42 @@ static int tmc_enable_etf_sink_perf(struct coresight_device *csdev, void *data)
if (drvdata->reading)
break;
/*
* In Perf mode there can be only one writer per sink. There
* is also no need to continue if the ETB/ETF is already
* operated from sysFS.
* No need to continue if the ETB/ETF is already operated
* from sysFS.
*/
if (drvdata->mode != CS_MODE_DISABLED)
if (drvdata->mode == CS_MODE_SYSFS) {
ret = -EBUSY;
break;
}
/* Get a handle on the pid of the process to monitor */
pid = task_pid_nr(handle->event->owner);
if (drvdata->pid != -1 && drvdata->pid != pid) {
ret = -EBUSY;
break;
}
ret = tmc_set_etf_buffer(csdev, handle);
if (ret)
break;
/*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
if (drvdata->pid == pid) {
atomic_inc(csdev->refcnt);
break;
}
ret = tmc_etb_enable_hw(drvdata);
if (!ret)
if (!ret) {
/* Associate with monitored process. */
drvdata->pid = pid;
drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
}
} while (0);
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -273,26 +302,34 @@ static int tmc_enable_etf_sink(struct coresight_device *csdev,
return 0;
}
static void tmc_disable_etf_sink(struct coresight_device *csdev)
static int tmc_disable_etf_sink(struct coresight_device *csdev)
{
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
}
/* Disable the TMC only if it needs to */
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etb_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
}
/* Complain if we (somehow) got out of sync */
WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED);
tmc_etb_disable_hw(drvdata);
/* Dissociate from monitored process. */
drvdata->pid = -1;
drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "TMC-ETB/ETF disabled\n");
return 0;
}
static int tmc_enable_etf_link(struct coresight_device *csdev,
@ -337,10 +374,11 @@ static void tmc_disable_etf_link(struct coresight_device *csdev,
dev_dbg(drvdata->dev, "TMC-ETF disabled\n");
}
static void *tmc_alloc_etf_buffer(struct coresight_device *csdev, int cpu,
void **pages, int nr_pages, bool overwrite)
static void *tmc_alloc_etf_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
int node;
int node, cpu = event->cpu;
struct cs_buffers *buf;
if (cpu == -1)
@ -400,7 +438,7 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
u32 *buf_ptr;
u64 read_ptr, write_ptr;
u32 status;
unsigned long offset, to_read;
unsigned long offset, to_read = 0, flags;
struct cs_buffers *buf = sink_config;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
@ -411,6 +449,12 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
if (WARN_ON_ONCE(drvdata->mode != CS_MODE_PERF))
return 0;
spin_lock_irqsave(&drvdata->spinlock, flags);
/* Don't do anything if another tracer is using this sink */
if (atomic_read(csdev->refcnt) != 1)
goto out;
CS_UNLOCK(drvdata->base);
tmc_flush_and_stop(drvdata);
@ -504,6 +548,8 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
to_read = buf->nr_pages << PAGE_SHIFT;
}
CS_LOCK(drvdata->base);
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return to_read;
}
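
Taken together with the coresight core hunks further down, the ETF changes above implement a simple ownership contract for a sink: the first perf event to enable it records the owner's PID, further events from the same session only bump the reference count, and disable() keeps returning -EBUSY until the last user is gone. The following userspace toy model is only an illustration of that contract, not part of the patch:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Toy model of the sink ownership rule: one owning session (pid), counted users. */
struct sink_state {
    pid_t owner;    /* -1 when the sink is free */
    int users;
};

static int sink_enable(struct sink_state *s, pid_t pid)
{
    if (s->owner != -1 && s->owner != pid)
        return -EBUSY;      /* claimed by another session */
    if (s->owner == pid) {
        s->users++;         /* same session: no HW reprogramming needed */
        return 0;
    }
    s->owner = pid;         /* first event: claim and program the HW */
    s->users = 1;
    return 0;
}

static int sink_disable(struct sink_state *s)
{
    if (--s->users)
        return -EBUSY;      /* other events are still tracing */
    s->owner = -1;          /* last user: tear down the HW */
    return 0;
}

int main(void)
{
    struct sink_state s = { .owner = -1 };

    printf("%d %d %d\n", sink_enable(&s, 100), sink_enable(&s, 100),
           sink_enable(&s, 200));                       /* 0 0 -16 (-EBUSY) */
    printf("%d %d\n", sink_disable(&s), sink_disable(&s)); /* -16 0 */
    return 0;
}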

View File

@ -4,10 +4,15 @@
* Author: Mathieu Poirier <mathieu.poirier@linaro.org>
*/
#include <linux/atomic.h>
#include <linux/coresight.h>
#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/vmalloc.h>
#include "coresight-catu.h"
#include "coresight-etm-perf.h"
@ -23,14 +28,18 @@ struct etr_flat_buf {
/*
* etr_perf_buffer - Perf buffer used for ETR
* @drvdata - The ETR drvdata this buffer has been allocated for.
* @etr_buf - Actual buffer used by the ETR
* @pid - The PID this etr_perf_buffer belongs to.
* @snapshot - Perf session mode
* @head - handle->head at the beginning of the session.
* @nr_pages - Number of pages in the ring buffer.
* @pages - Array of Pages in the ring buffer.
*/
struct etr_perf_buffer {
struct tmc_drvdata *drvdata;
struct etr_buf *etr_buf;
pid_t pid;
bool snapshot;
unsigned long head;
int nr_pages;
@ -772,7 +781,8 @@ static inline void tmc_etr_disable_catu(struct tmc_drvdata *drvdata)
static const struct etr_buf_operations *etr_buf_ops[] = {
[ETR_MODE_FLAT] = &etr_flat_buf_ops,
[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
[ETR_MODE_CATU] = &etr_catu_buf_ops,
[ETR_MODE_CATU] = IS_ENABLED(CONFIG_CORESIGHT_CATU)
? &etr_catu_buf_ops : NULL,
};
static inline int tmc_etr_mode_alloc_buf(int mode,
@ -786,7 +796,7 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
case ETR_MODE_FLAT:
case ETR_MODE_ETR_SG:
case ETR_MODE_CATU:
if (etr_buf_ops[mode]->alloc)
if (etr_buf_ops[mode] && etr_buf_ops[mode]->alloc)
rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf,
node, pages);
if (!rc)
@ -1124,8 +1134,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
* sink is already enabled no memory is needed and the HW need not be
* touched, even if the buffer size has changed.
*/
if (drvdata->mode == CS_MODE_SYSFS)
if (drvdata->mode == CS_MODE_SYSFS) {
atomic_inc(csdev->refcnt);
goto out;
}
/*
* If we don't have a buffer or it doesn't match the requested size,
@ -1138,8 +1150,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
}
ret = tmc_etr_enable_hw(drvdata, drvdata->sysfs_buf);
if (!ret)
if (!ret) {
drvdata->mode = CS_MODE_SYSFS;
atomic_inc(csdev->refcnt);
}
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -1154,23 +1168,23 @@ out:
}
/*
* tmc_etr_setup_perf_buf: Allocate ETR buffer for use by perf.
* alloc_etr_buf: Allocate ETR buffer for use by perf.
* The size of the hardware buffer is dependent on the size configured
* via sysfs and the perf ring buffer size. We prefer to allocate the
* largest possible size, scaling down the size by half until it
* reaches a minimum limit (1M), beyond which we give up.
*/
static struct etr_perf_buffer *
tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages,
void **pages, bool snapshot)
static struct etr_buf *
alloc_etr_buf(struct tmc_drvdata *drvdata, struct perf_event *event,
int nr_pages, void **pages, bool snapshot)
{
int node, cpu = event->cpu;
struct etr_buf *etr_buf;
struct etr_perf_buffer *etr_perf;
unsigned long size;
etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node);
if (!etr_perf)
return ERR_PTR(-ENOMEM);
if (cpu == -1)
cpu = smp_processor_id();
node = cpu_to_node(cpu);
/*
* Try to match the perf ring buffer size if it is larger
@ -1195,32 +1209,160 @@ tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, int node, int nr_pages,
size /= 2;
} while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
return ERR_PTR(-ENOMEM);
done:
return etr_buf;
}
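
A worked instance of the sizing policy described in the comment above, with illustrative numbers that are not taken from the patch:

/*
 * Example: sysfs-configured ETR size = 4 MiB, perf AUX ring buffer = 16 MiB.
 * The driver first tries the larger 16 MiB; if that allocation fails it
 * falls back towards the configured size, halving on each failure:
 * 4 MiB -> 2 MiB -> 1 MiB. Below TMC_ETR_PERF_MIN_BUF_SIZE (1 MiB) it gives
 * up and returns -ENOMEM.
 */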
static struct etr_buf *
get_perf_etr_buf_cpu_wide(struct tmc_drvdata *drvdata,
struct perf_event *event, int nr_pages,
void **pages, bool snapshot)
{
int ret;
pid_t pid = task_pid_nr(event->owner);
struct etr_buf *etr_buf;
retry:
/*
* An etr_perf_buffer is associated with an event and holds a reference
* to the AUX ring buffer that was created for that event. In CPU-wide
* N:1 mode multiple events (one per CPU), each with its own AUX ring
* buffer, share a sink. As such an etr_perf_buffer is created for each
* event but a single etr_buf associated with the ETR is shared between
* them. The last event in a trace session will copy the content of the
* etr_buf to its AUX ring buffer. Ring buffers associated with other
* events are simply not used and freed as events are destroyed. We still
* need to allocate a ring buffer for each event since we don't know
* which event will be last.
*/
/*
* The first thing to do here is check if an etr_buf has already been
* allocated for this session. If so it is shared with this event,
* otherwise it is created.
*/
mutex_lock(&drvdata->idr_mutex);
etr_buf = idr_find(&drvdata->idr, pid);
if (etr_buf) {
refcount_inc(&etr_buf->refcount);
mutex_unlock(&drvdata->idr_mutex);
return etr_buf;
}
/* If we made it here no buffer has been allocated, do so now. */
mutex_unlock(&drvdata->idr_mutex);
etr_buf = alloc_etr_buf(drvdata, event, nr_pages, pages, snapshot);
if (IS_ERR(etr_buf))
return etr_buf;
refcount_set(&etr_buf->refcount, 1);
/* Now that we have a buffer, add it to the IDR. */
mutex_lock(&drvdata->idr_mutex);
ret = idr_alloc(&drvdata->idr, etr_buf, pid, pid + 1, GFP_KERNEL);
mutex_unlock(&drvdata->idr_mutex);
/* Another event with this session ID has allocated this buffer. */
if (ret == -ENOSPC) {
tmc_free_etr_buf(etr_buf);
goto retry;
}
/* The IDR can't allocate room for a new session, abandon ship. */
if (ret == -ENOMEM) {
tmc_free_etr_buf(etr_buf);
return ERR_PTR(ret);
}
return etr_buf;
}
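
The comment and code above rely on a common lookup-or-create pattern: search the IDR under the mutex, otherwise allocate outside the lock and insert with idr_alloc(), treating -ENOSPC as "someone else won the race" and retrying. Below is a condensed, hypothetical sketch of that pattern in kernel-style C, with the TMC-specific details dropped; it is an illustration, not part of the patch:

#include <linux/err.h>
#include <linux/idr.h>
#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/types.h>

struct shared_buf {
    refcount_t refcount;
    /* backing memory for the trace data would live here */
};

/* Hypothetical helper: find the per-pid buffer or create it, racing safely. */
static struct shared_buf *get_buf_for_pid(struct idr *idr, struct mutex *lock,
                                          pid_t pid)
{
    struct shared_buf *buf;
    int ret;

retry:
    mutex_lock(lock);
    buf = idr_find(idr, pid);
    if (buf) {
        refcount_inc(&buf->refcount);   /* share the existing buffer */
        mutex_unlock(lock);
        return buf;
    }
    mutex_unlock(lock);

    buf = kzalloc(sizeof(*buf), GFP_KERNEL);    /* allocate outside the lock */
    if (!buf)
        return ERR_PTR(-ENOMEM);
    refcount_set(&buf->refcount, 1);

    mutex_lock(lock);
    ret = idr_alloc(idr, buf, pid, pid + 1, GFP_KERNEL);
    mutex_unlock(lock);

    if (ret == -ENOSPC) {       /* someone else inserted this pid first */
        kfree(buf);
        goto retry;
    }
    if (ret < 0) {
        kfree(buf);
        return ERR_PTR(ret);
    }
    return buf;
}

Allocating outside the lock keeps the GFP_KERNEL allocation out of the critical section, at the cost of the retry loop when two events race for the same pid.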
static struct etr_buf *
get_perf_etr_buf_per_thread(struct tmc_drvdata *drvdata,
struct perf_event *event, int nr_pages,
void **pages, bool snapshot)
{
struct etr_buf *etr_buf;
/*
* In per-thread mode the etr_buf isn't shared, so just go ahead
* with memory allocation.
*/
etr_buf = alloc_etr_buf(drvdata, event, nr_pages, pages, snapshot);
if (IS_ERR(etr_buf))
goto out;
refcount_set(&etr_buf->refcount, 1);
out:
return etr_buf;
}
static struct etr_buf *
get_perf_etr_buf(struct tmc_drvdata *drvdata, struct perf_event *event,
int nr_pages, void **pages, bool snapshot)
{
if (event->cpu == -1)
return get_perf_etr_buf_per_thread(drvdata, event, nr_pages,
pages, snapshot);
return get_perf_etr_buf_cpu_wide(drvdata, event, nr_pages,
pages, snapshot);
}
static struct etr_perf_buffer *
tmc_etr_setup_perf_buf(struct tmc_drvdata *drvdata, struct perf_event *event,
int nr_pages, void **pages, bool snapshot)
{
int node, cpu = event->cpu;
struct etr_buf *etr_buf;
struct etr_perf_buffer *etr_perf;
if (cpu == -1)
cpu = smp_processor_id();
node = cpu_to_node(cpu);
etr_perf = kzalloc_node(sizeof(*etr_perf), GFP_KERNEL, node);
if (!etr_perf)
return ERR_PTR(-ENOMEM);
etr_buf = get_perf_etr_buf(drvdata, event, nr_pages, pages, snapshot);
if (!IS_ERR(etr_buf))
goto done;
kfree(etr_perf);
return ERR_PTR(-ENOMEM);
done:
/*
* Keep a reference to the ETR this buffer has been allocated for
* in order to have access to the IDR in tmc_free_etr_buffer().
*/
etr_perf->drvdata = drvdata;
etr_perf->etr_buf = etr_buf;
return etr_perf;
}
static void *tmc_alloc_etr_buffer(struct coresight_device *csdev,
int cpu, void **pages, int nr_pages,
bool snapshot)
struct perf_event *event, void **pages,
int nr_pages, bool snapshot)
{
struct etr_perf_buffer *etr_perf;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
if (cpu == -1)
cpu = smp_processor_id();
etr_perf = tmc_etr_setup_perf_buf(drvdata, cpu_to_node(cpu),
etr_perf = tmc_etr_setup_perf_buf(drvdata, event,
nr_pages, pages, snapshot);
if (IS_ERR(etr_perf)) {
dev_dbg(drvdata->dev, "Unable to allocate ETR buffer\n");
return NULL;
}
etr_perf->pid = task_pid_nr(event->owner);
etr_perf->snapshot = snapshot;
etr_perf->nr_pages = nr_pages;
etr_perf->pages = pages;
@ -1231,9 +1373,33 @@ static void *tmc_alloc_etr_buffer(struct coresight_device *csdev,
static void tmc_free_etr_buffer(void *config)
{
struct etr_perf_buffer *etr_perf = config;
struct tmc_drvdata *drvdata = etr_perf->drvdata;
struct etr_buf *buf, *etr_buf = etr_perf->etr_buf;
if (etr_perf->etr_buf)
tmc_free_etr_buf(etr_perf->etr_buf);
if (!etr_buf)
goto free_etr_perf_buffer;
mutex_lock(&drvdata->idr_mutex);
/* If we are not the last one to use the buffer, don't touch it. */
if (!refcount_dec_and_test(&etr_buf->refcount)) {
mutex_unlock(&drvdata->idr_mutex);
goto free_etr_perf_buffer;
}
/* We are the last one, remove from the IDR and free the buffer. */
buf = idr_remove(&drvdata->idr, etr_perf->pid);
mutex_unlock(&drvdata->idr_mutex);
/*
* Something went very wrong if the buffer associated with this ID
* is not the same in the IDR. Leak to avoid use after free.
*/
if (buf && WARN_ON(buf != etr_buf))
goto free_etr_perf_buffer;
tmc_free_etr_buf(etr_perf->etr_buf);
free_etr_perf_buffer:
kfree(etr_perf);
}
@ -1308,6 +1474,13 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
struct etr_buf *etr_buf = etr_perf->etr_buf;
spin_lock_irqsave(&drvdata->spinlock, flags);
/* Don't do anything if another tracer is using this sink */
if (atomic_read(csdev->refcnt) != 1) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
goto out;
}
if (WARN_ON(drvdata->perf_data != etr_perf)) {
lost = true;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -1347,17 +1520,15 @@ out:
static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
{
int rc = 0;
pid_t pid;
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
struct perf_output_handle *handle = data;
struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle);
spin_lock_irqsave(&drvdata->spinlock, flags);
/*
* There can be only one writer per sink in perf mode. If the sink
* is already open in SYSFS mode, we can't use it.
*/
if (drvdata->mode != CS_MODE_DISABLED || WARN_ON(drvdata->perf_data)) {
/* Don't use this sink if it is already claimed by sysFS */
if (drvdata->mode == CS_MODE_SYSFS) {
rc = -EBUSY;
goto unlock_out;
}
@ -1367,11 +1538,34 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
goto unlock_out;
}
/* Get a handle on the pid of the process to monitor */
pid = etr_perf->pid;
/* Do not proceed if this device is associated with another session */
if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;
goto unlock_out;
}
etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf);
drvdata->perf_data = etr_perf;
/*
* No HW configuration is needed if the sink is already in
* use for this session.
*/
if (drvdata->pid == pid) {
atomic_inc(csdev->refcnt);
goto unlock_out;
}
rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);
if (!rc)
if (!rc) {
/* Associate with monitored process. */
drvdata->pid = pid;
drvdata->mode = CS_MODE_PERF;
atomic_inc(csdev->refcnt);
}
unlock_out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
@ -1392,26 +1586,34 @@ static int tmc_enable_etr_sink(struct coresight_device *csdev,
return -EINVAL;
}
static void tmc_disable_etr_sink(struct coresight_device *csdev)
static int tmc_disable_etr_sink(struct coresight_device *csdev)
{
unsigned long flags;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
spin_lock_irqsave(&drvdata->spinlock, flags);
if (drvdata->reading) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return;
return -EBUSY;
}
/* Disable the TMC only if it needs to */
if (drvdata->mode != CS_MODE_DISABLED) {
tmc_etr_disable_hw(drvdata);
drvdata->mode = CS_MODE_DISABLED;
if (atomic_dec_return(csdev->refcnt)) {
spin_unlock_irqrestore(&drvdata->spinlock, flags);
return -EBUSY;
}
/* Complain if we (somehow) got out of sync */
WARN_ON_ONCE(drvdata->mode == CS_MODE_DISABLED);
tmc_etr_disable_hw(drvdata);
/* Dissociate from monitored process. */
drvdata->pid = -1;
drvdata->mode = CS_MODE_DISABLED;
spin_unlock_irqrestore(&drvdata->spinlock, flags);
dev_dbg(drvdata->dev, "TMC-ETR disabled\n");
return 0;
}
static const struct coresight_ops_sink tmc_etr_sink_ops = {

View File

@ -8,10 +8,12 @@
#include <linux/init.h>
#include <linux/types.h>
#include <linux/device.h>
#include <linux/idr.h>
#include <linux/io.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/property.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
@ -340,6 +342,8 @@ static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata)
static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
u32 devid, void *dev_caps)
{
int rc;
u32 dma_mask = 0;
/* Set the unadvertised capabilities */
@ -369,7 +373,10 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata,
dma_mask = 40;
}
return dma_set_mask_and_coherent(drvdata->dev, DMA_BIT_MASK(dma_mask));
rc = dma_set_mask_and_coherent(drvdata->dev, DMA_BIT_MASK(dma_mask));
if (rc)
dev_err(drvdata->dev, "Failed to setup DMA mask: %d\n", rc);
return rc;
}
static int tmc_probe(struct amba_device *adev, const struct amba_id *id)
@ -415,6 +422,8 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id)
devid = readl_relaxed(drvdata->base + CORESIGHT_DEVID);
drvdata->config_type = BMVAL(devid, 6, 7);
drvdata->memwidth = tmc_get_memwidth(devid);
/* This device is not associated with a session */
drvdata->pid = -1;
if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) {
if (np)
@ -427,8 +436,6 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id)
drvdata->size = readl_relaxed(drvdata->base + TMC_RSZ) * 4;
}
pm_runtime_put(&adev->dev);
desc.pdata = pdata;
desc.dev = dev;
desc.groups = coresight_tmc_groups;
@ -447,6 +454,8 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id)
coresight_get_uci_data(id));
if (ret)
goto out;
idr_init(&drvdata->idr);
mutex_init(&drvdata->idr_mutex);
break;
case TMC_CONFIG_TYPE_ETF:
desc.type = CORESIGHT_DEV_TYPE_LINKSINK;
@ -471,6 +480,8 @@ static int tmc_probe(struct amba_device *adev, const struct amba_id *id)
ret = misc_register(&drvdata->miscdev);
if (ret)
coresight_unregister(drvdata->csdev);
else
pm_runtime_put(&adev->dev);
out:
return ret;
}

View File

@ -8,7 +8,10 @@
#define _CORESIGHT_TMC_H
#include <linux/dma-mapping.h>
#include <linux/idr.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/refcount.h>
#define TMC_RSZ 0x004
#define TMC_STS 0x00c
@ -133,6 +136,7 @@ struct etr_buf_operations;
/**
* struct etr_buf - Details of the buffer used by ETR
* @refcount : Number of sources currently using this etr_buf.
* @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc.
* @full : Trace data overflow
* @size : Size of the buffer.
@ -143,6 +147,7 @@ struct etr_buf_operations;
* @private : Backend specific information for the buf
*/
struct etr_buf {
refcount_t refcount;
enum etr_mode mode;
bool full;
ssize_t size;
@ -160,6 +165,8 @@ struct etr_buf {
* @csdev: component vitals needed by the framework.
* @miscdev: specifics to handle "/dev/xyz.tmc" entry.
* @spinlock: only one at a time pls.
* @pid: Process ID of the process being monitored by the session
* that is using this component.
* @buf: Snapshot of the trace data for ETF/ETB.
* @etr_buf: details of buffer used in TMC-ETR
* @len: size of the available trace for ETF/ETB.
@ -170,6 +177,8 @@ struct etr_buf {
* @trigger_cntr: amount of words to store after a trigger.
* @etr_caps: Bitmask of capabilities of the TMC ETR, inferred from the
* device configuration register (DEVID)
* @idr: Holds etr_bufs allocated for this ETR.
* @idr_mutex: Access serialisation for idr.
* @perf_data: PERF buffer for ETR.
* @sysfs_data: SYSFS buffer for ETR.
*/
@ -179,6 +188,7 @@ struct tmc_drvdata {
struct coresight_device *csdev;
struct miscdevice miscdev;
spinlock_t spinlock;
pid_t pid;
bool reading;
union {
char *buf; /* TMC ETB */
@ -191,6 +201,8 @@ struct tmc_drvdata {
enum tmc_mem_intf_width memwidth;
u32 trigger_cntr;
u32 etr_caps;
struct idr idr;
struct mutex idr_mutex;
struct etr_buf *sysfs_buf;
void *perf_data;
};

View File

@ -5,6 +5,7 @@
* Description: CoreSight Trace Port Interface Unit driver
*/
#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/device.h>
@ -73,7 +74,7 @@ static int tpiu_enable(struct coresight_device *csdev, u32 mode, void *__unused)
struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
tpiu_enable_hw(drvdata);
atomic_inc(csdev->refcnt);
dev_dbg(drvdata->dev, "TPIU enabled\n");
return 0;
}
@ -94,13 +95,17 @@ static void tpiu_disable_hw(struct tpiu_drvdata *drvdata)
CS_LOCK(drvdata->base);
}
static void tpiu_disable(struct coresight_device *csdev)
static int tpiu_disable(struct coresight_device *csdev)
{
struct tpiu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
if (atomic_dec_return(csdev->refcnt))
return -EBUSY;
tpiu_disable_hw(drvdata);
dev_dbg(drvdata->dev, "TPIU disabled\n");
return 0;
}
static const struct coresight_ops_sink tpiu_sink_ops = {
@ -153,8 +158,6 @@ static int tpiu_probe(struct amba_device *adev, const struct amba_id *id)
/* Disable tpiu to support older devices */
tpiu_disable_hw(drvdata);
pm_runtime_put(&adev->dev);
desc.type = CORESIGHT_DEV_TYPE_SINK;
desc.subtype.sink_subtype = CORESIGHT_DEV_SUBTYPE_SINK_PORT;
desc.ops = &tpiu_cs_ops;
@ -162,7 +165,12 @@ static int tpiu_probe(struct amba_device *adev, const struct amba_id *id)
desc.dev = dev;
drvdata->csdev = coresight_register(&desc);
return PTR_ERR_OR_ZERO(drvdata->csdev);
if (!IS_ERR(drvdata->csdev)) {
pm_runtime_put(&adev->dev);
return 0;
}
return PTR_ERR(drvdata->csdev);
}
#ifdef CONFIG_PM

View File

@ -225,26 +225,28 @@ static int coresight_enable_sink(struct coresight_device *csdev,
* We need to make sure the "new" session is compatible with the
* existing "mode" of operation.
*/
if (sink_ops(csdev)->enable) {
ret = sink_ops(csdev)->enable(csdev, mode, data);
if (ret)
return ret;
csdev->enable = true;
}
if (!sink_ops(csdev)->enable)
return -EINVAL;
atomic_inc(csdev->refcnt);
ret = sink_ops(csdev)->enable(csdev, mode, data);
if (ret)
return ret;
csdev->enable = true;
return 0;
}
static void coresight_disable_sink(struct coresight_device *csdev)
{
if (atomic_dec_return(csdev->refcnt) == 0) {
if (sink_ops(csdev)->disable) {
sink_ops(csdev)->disable(csdev);
csdev->enable = false;
}
}
int ret;
if (!sink_ops(csdev)->disable)
return;
ret = sink_ops(csdev)->disable(csdev);
if (ret)
return;
csdev->enable = false;
}
static int coresight_enable_link(struct coresight_device *csdev,
@ -973,7 +975,6 @@ static void coresight_device_release(struct device *dev)
{
struct coresight_device *csdev = to_coresight_device(dev);
kfree(csdev->conns);
kfree(csdev->refcnt);
kfree(csdev);
}

View File

@ -37,15 +37,21 @@ MODULE_DEVICE_TABLE(acpi, intel_th_acpi_ids);
static int intel_th_acpi_probe(struct platform_device *pdev)
{
struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
struct resource resource[TH_MMIO_END];
const struct acpi_device_id *id;
struct intel_th *th;
int i, r;
id = acpi_match_device(intel_th_acpi_ids, &pdev->dev);
if (!id)
return -ENODEV;
th = intel_th_alloc(&pdev->dev, (void *)id->driver_data,
pdev->resource, pdev->num_resources, -1);
for (i = 0, r = 0; i < pdev->num_resources && r < TH_MMIO_END; i++)
if (pdev->resource[i].flags &
(IORESOURCE_IRQ | IORESOURCE_MEM))
resource[r++] = pdev->resource[i];
th = intel_th_alloc(&pdev->dev, (void *)id->driver_data, resource, r);
if (IS_ERR(th))
return PTR_ERR(th);

View File

@ -430,9 +430,9 @@ static const struct intel_th_subdevice {
.nres = 1,
.res = {
{
/* Handle TSCU from GTH driver */
/* Handle TSCU and CTS from GTH driver */
.start = REG_GTH_OFFSET,
.end = REG_TSCU_OFFSET + REG_TSCU_LENGTH - 1,
.end = REG_CTS_OFFSET + REG_CTS_LENGTH - 1,
.flags = IORESOURCE_MEM,
},
},
@ -491,7 +491,7 @@ static const struct intel_th_subdevice {
.flags = IORESOURCE_MEM,
},
{
.start = 1, /* use resource[1] */
.start = TH_MMIO_SW,
.end = 0,
.flags = IORESOURCE_MEM,
},
@ -500,6 +500,24 @@ static const struct intel_th_subdevice {
.name = "sth",
.type = INTEL_TH_SOURCE,
},
{
.nres = 2,
.res = {
{
.start = REG_STH_OFFSET,
.end = REG_STH_OFFSET + REG_STH_LENGTH - 1,
.flags = IORESOURCE_MEM,
},
{
.start = TH_MMIO_RTIT,
.end = 0,
.flags = IORESOURCE_MEM,
},
},
.id = -1,
.name = "rtit",
.type = INTEL_TH_SOURCE,
},
{
.nres = 1,
.res = {
@ -584,7 +602,6 @@ intel_th_subdevice_alloc(struct intel_th *th,
struct intel_th_device *thdev;
struct resource res[3];
unsigned int req = 0;
bool is64bit = false;
int r, err;
thdev = intel_th_device_alloc(th, subdev->type, subdev->name,
@ -594,18 +611,12 @@ intel_th_subdevice_alloc(struct intel_th *th,
thdev->drvdata = th->drvdata;
for (r = 0; r < th->num_resources; r++)
if (th->resource[r].flags & IORESOURCE_MEM_64) {
is64bit = true;
break;
}
memcpy(res, subdev->res,
sizeof(struct resource) * subdev->nres);
for (r = 0; r < subdev->nres; r++) {
struct resource *devres = th->resource;
int bar = 0; /* cut subdevices' MMIO from resource[0] */
int bar = TH_MMIO_CONFIG;
/*
* Take .end == 0 to mean 'take the whole bar',
@ -614,8 +625,9 @@ intel_th_subdevice_alloc(struct intel_th *th,
*/
if (!res[r].end && res[r].flags == IORESOURCE_MEM) {
bar = res[r].start;
if (is64bit)
bar *= 2;
err = -ENODEV;
if (bar >= th->num_resources)
goto fail_put_device;
res[r].start = 0;
res[r].end = resource_size(&devres[bar]) - 1;
}
@ -627,7 +639,12 @@ intel_th_subdevice_alloc(struct intel_th *th,
dev_dbg(th->dev, "%s:%d @ %pR\n",
subdev->name, r, &res[r]);
} else if (res[r].flags & IORESOURCE_IRQ) {
res[r].start = th->irq;
/*
* Only pass on the IRQ if we have useful interrupts:
* the ones that can be configured via MINTCTL.
*/
if (INTEL_TH_CAP(th, has_mintctl) && th->irq != -1)
res[r].start = th->irq;
}
}
@ -758,8 +775,13 @@ static int intel_th_populate(struct intel_th *th)
thdev = intel_th_subdevice_alloc(th, subdev);
/* note: caller should free subdevices from th::thdev[] */
if (IS_ERR(thdev))
if (IS_ERR(thdev)) {
/* ENODEV for individual subdevices is allowed */
if (PTR_ERR(thdev) == -ENODEV)
continue;
return PTR_ERR(thdev);
}
th->thdev[th->num_thdevs++] = thdev;
}
@ -809,26 +831,40 @@ static const struct file_operations intel_th_output_fops = {
.llseek = noop_llseek,
};
static irqreturn_t intel_th_irq(int irq, void *data)
{
struct intel_th *th = data;
irqreturn_t ret = IRQ_NONE;
struct intel_th_driver *d;
int i;
for (i = 0; i < th->num_thdevs; i++) {
if (th->thdev[i]->type != INTEL_TH_OUTPUT)
continue;
d = to_intel_th_driver(th->thdev[i]->dev.driver);
if (d && d->irq)
ret |= d->irq(th->thdev[i]);
}
if (ret == IRQ_NONE)
pr_warn_ratelimited("nobody cared for irq\n");
return ret;
}
/**
* intel_th_alloc() - allocate a new Intel TH device and its subdevices
* @dev: parent device
* @devres: parent's resources
* @ndevres: number of resources
* @devres: resources indexed by th_mmio_idx
* @irq: irq number
*/
struct intel_th *
intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
struct resource *devres, unsigned int ndevres, int irq)
struct resource *devres, unsigned int ndevres)
{
int err, r, nr_mmios = 0;
struct intel_th *th;
int err, r;
if (irq == -1)
for (r = 0; r < ndevres; r++)
if (devres[r].flags & IORESOURCE_IRQ) {
irq = devres[r].start;
break;
}
th = kzalloc(sizeof(*th), GFP_KERNEL);
if (!th)
@ -846,12 +882,32 @@ intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
err = th->major;
goto err_ida;
}
th->irq = -1;
th->dev = dev;
th->drvdata = drvdata;
th->resource = devres;
th->num_resources = ndevres;
th->irq = irq;
for (r = 0; r < ndevres; r++)
switch (devres[r].flags & IORESOURCE_TYPE_BITS) {
case IORESOURCE_MEM:
th->resource[nr_mmios++] = devres[r];
break;
case IORESOURCE_IRQ:
err = devm_request_irq(dev, devres[r].start,
intel_th_irq, IRQF_SHARED,
dev_name(dev), th);
if (err)
goto err_chrdev;
if (th->irq == -1)
th->irq = devres[r].start;
break;
default:
dev_warn(dev, "Unknown resource type %lx\n",
devres[r].flags);
break;
}
th->num_resources = nr_mmios;
dev_set_drvdata(dev, th);
@ -868,6 +924,10 @@ intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
return th;
err_chrdev:
__unregister_chrdev(th->major, 0, TH_POSSIBLE_OUTPUTS,
"intel_th/output");
err_ida:
ida_simple_remove(&intel_th_ida, th->id);
@ -927,6 +987,27 @@ int intel_th_trace_enable(struct intel_th_device *thdev)
}
EXPORT_SYMBOL_GPL(intel_th_trace_enable);
/**
* intel_th_trace_switch() - execute a switch sequence
* @thdev: output device that requests tracing switch
*/
int intel_th_trace_switch(struct intel_th_device *thdev)
{
struct intel_th_device *hub = to_intel_th_device(thdev->dev.parent);
struct intel_th_driver *hubdrv = to_intel_th_driver(hub->dev.driver);
if (WARN_ON_ONCE(hub->type != INTEL_TH_SWITCH))
return -EINVAL;
if (WARN_ON_ONCE(thdev->type != INTEL_TH_OUTPUT))
return -EINVAL;
hubdrv->trig_switch(hub, &thdev->output);
return 0;
}
EXPORT_SYMBOL_GPL(intel_th_trace_switch);
/**
* intel_th_trace_disable() - disable tracing for an output device
* @thdev: output device that requests tracing be disabled

View File

@ -308,6 +308,11 @@ static int intel_th_gth_reset(struct gth_device *gth)
iowrite32(0, gth->base + REG_GTH_SCR);
iowrite32(0xfc, gth->base + REG_GTH_SCR2);
/* setup CTS for single trigger */
iowrite32(CTS_EVENT_ENABLE_IF_ANYTHING, gth->base + REG_CTS_C0S0_EN);
iowrite32(CTS_ACTION_CONTROL_SET_STATE(CTS_STATE_IDLE) |
CTS_ACTION_CONTROL_TRIGGER, gth->base + REG_CTS_C0S0_ACT);
return 0;
}
@ -456,6 +461,68 @@ static int intel_th_output_attributes(struct gth_device *gth)
return sysfs_create_group(&gth->dev->kobj, &gth->output_group);
}
/**
* intel_th_gth_stop() - stop tracing to an output device
* @gth: GTH device
* @output: output device's descriptor
* @capture_done: set when no more traces will be captured
*
* This will stop tracing by deasserting the force storeEn signal and wait for
* the pipelines of the corresponding output port to drain.
*/
static void intel_th_gth_stop(struct gth_device *gth,
struct intel_th_output *output,
bool capture_done)
{
struct intel_th_device *outdev =
container_of(output, struct intel_th_device, output);
struct intel_th_driver *outdrv =
to_intel_th_driver(outdev->dev.driver);
unsigned long count;
u32 reg;
u32 scr2 = 0xfc | (capture_done ? 1 : 0);
iowrite32(0, gth->base + REG_GTH_SCR);
iowrite32(scr2, gth->base + REG_GTH_SCR2);
/* wait on pipeline empty for the given port */
for (reg = 0, count = GTH_PLE_WAITLOOP_DEPTH;
count && !(reg & BIT(output->port)); count--) {
reg = ioread32(gth->base + REG_GTH_STAT);
cpu_relax();
}
if (!count)
dev_dbg(gth->dev, "timeout waiting for GTH[%d] PLE\n",
output->port);
/* wait on output pipeline empty */
if (outdrv->wait_empty)
outdrv->wait_empty(outdev);
/* clear force capture done for next captures */
iowrite32(0xfc, gth->base + REG_GTH_SCR2);
}
/**
* intel_th_gth_start() - start tracing to an output device
* @gth: GTH device
* @output: output device's descriptor
*
* This will start tracing by asserting the force storeEn signal.
*/
static void intel_th_gth_start(struct gth_device *gth,
struct intel_th_output *output)
{
u32 scr = 0xfc0000;
if (output->multiblock)
scr |= 0xff;
iowrite32(scr, gth->base + REG_GTH_SCR);
iowrite32(0, gth->base + REG_GTH_SCR2);
}
/**
* intel_th_gth_disable() - disable tracing to an output device
* @thdev: GTH device
@ -469,7 +536,6 @@ static void intel_th_gth_disable(struct intel_th_device *thdev,
struct intel_th_output *output)
{
struct gth_device *gth = dev_get_drvdata(&thdev->dev);
unsigned long count;
int master;
u32 reg;
@ -482,22 +548,7 @@ static void intel_th_gth_disable(struct intel_th_device *thdev,
}
spin_unlock(&gth->gth_lock);
iowrite32(0, gth->base + REG_GTH_SCR);
iowrite32(0xfd, gth->base + REG_GTH_SCR2);
/* wait on pipeline empty for the given port */
for (reg = 0, count = GTH_PLE_WAITLOOP_DEPTH;
count && !(reg & BIT(output->port)); count--) {
reg = ioread32(gth->base + REG_GTH_STAT);
cpu_relax();
}
/* clear force capture done for next captures */
iowrite32(0xfc, gth->base + REG_GTH_SCR2);
if (!count)
dev_dbg(&thdev->dev, "timeout waiting for GTH[%d] PLE\n",
output->port);
intel_th_gth_stop(gth, output, true);
reg = ioread32(gth->base + REG_GTH_SCRPD0);
reg &= ~output->scratchpad;
@ -526,8 +577,8 @@ static void intel_th_gth_enable(struct intel_th_device *thdev,
{
struct gth_device *gth = dev_get_drvdata(&thdev->dev);
struct intel_th *th = to_intel_th(thdev);
u32 scr = 0xfc0000, scrpd;
int master;
u32 scrpd;
spin_lock(&gth->gth_lock);
for_each_set_bit(master, gth->output[output->port].master,
@ -535,9 +586,6 @@ static void intel_th_gth_enable(struct intel_th_device *thdev,
gth_master_set(gth, master, output->port);
}
if (output->multiblock)
scr |= 0xff;
output->active = true;
spin_unlock(&gth->gth_lock);
@ -548,8 +596,38 @@ static void intel_th_gth_enable(struct intel_th_device *thdev,
scrpd |= output->scratchpad;
iowrite32(scrpd, gth->base + REG_GTH_SCRPD0);
iowrite32(scr, gth->base + REG_GTH_SCR);
iowrite32(0, gth->base + REG_GTH_SCR2);
intel_th_gth_start(gth, output);
}
/**
* intel_th_gth_switch() - execute a switch sequence
* @thdev: GTH device
* @output: output device's descriptor
*
* This will execute a switch sequence that will trigger a switch window
* when tracing to MSC in multi-block mode.
*/
static void intel_th_gth_switch(struct intel_th_device *thdev,
struct intel_th_output *output)
{
struct gth_device *gth = dev_get_drvdata(&thdev->dev);
unsigned long count;
u32 reg;
/* trigger */
iowrite32(0, gth->base + REG_CTS_CTL);
iowrite32(CTS_CTL_SEQUENCER_ENABLE, gth->base + REG_CTS_CTL);
/* wait on trigger status */
for (reg = 0, count = CTS_TRIG_WAITLOOP_DEPTH;
count && !(reg & BIT(4)); count--) {
reg = ioread32(gth->base + REG_CTS_STAT);
cpu_relax();
}
if (!count)
dev_dbg(&thdev->dev, "timeout waiting for CTS Trigger\n");
intel_th_gth_stop(gth, output, false);
intel_th_gth_start(gth, output);
}
/**
@ -735,6 +813,7 @@ static struct intel_th_driver intel_th_gth_driver = {
.unassign = intel_th_gth_unassign,
.set_output = intel_th_gth_set_output,
.enable = intel_th_gth_enable,
.trig_switch = intel_th_gth_switch,
.disable = intel_th_gth_disable,
.driver = {
.name = "gth",

View File

@ -49,6 +49,12 @@ enum {
REG_GTH_SCRPD3 = 0xec, /* ScratchPad[3] */
REG_TSCU_TSUCTRL = 0x2000, /* TSCU control register */
REG_TSCU_TSCUSTAT = 0x2004, /* TSCU status register */
/* Common Capture Sequencer (CTS) registers */
REG_CTS_C0S0_EN = 0x30c0, /* clause_event_enable_c0s0 */
REG_CTS_C0S0_ACT = 0x3180, /* clause_action_control_c0s0 */
REG_CTS_STAT = 0x32a0, /* cts_status */
REG_CTS_CTL = 0x32a4, /* cts_control */
};
/* waiting for Pipeline Empty bit(s) to assert for GTH */
@ -57,4 +63,17 @@ enum {
#define TSUCTRL_CTCRESYNC BIT(0)
#define TSCUSTAT_CTCSYNCING BIT(1)
/* waiting for Trigger status to assert for CTS */
#define CTS_TRIG_WAITLOOP_DEPTH 10000
#define CTS_EVENT_ENABLE_IF_ANYTHING BIT(31)
#define CTS_ACTION_CONTROL_STATE_OFF 27
#define CTS_ACTION_CONTROL_SET_STATE(x) \
(((x) & 0x1f) << CTS_ACTION_CONTROL_STATE_OFF)
#define CTS_ACTION_CONTROL_TRIGGER BIT(4)
#define CTS_STATE_IDLE 0x10u
#define CTS_CTL_SEQUENCER_ENABLE BIT(0)
#endif /* __INTEL_TH_GTH_H__ */
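
For reference, the value that the GTH reset hunk earlier writes to REG_CTS_C0S0_ACT expands as follows, using the definitions in this header (plain arithmetic, shown here only as an aside):

/*
 * CTS_ACTION_CONTROL_SET_STATE(CTS_STATE_IDLE)
 *   = (0x10 & 0x1f) << 27                                  = 0x80000000
 * CTS_ACTION_CONTROL_SET_STATE(CTS_STATE_IDLE) | CTS_ACTION_CONTROL_TRIGGER
 *   = 0x80000000 | BIT(4)                                  = 0x80000010
 */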

View File

@ -8,6 +8,8 @@
#ifndef __INTEL_TH_H__
#define __INTEL_TH_H__
#include <linux/irqreturn.h>
/* intel_th_device device types */
enum {
/* Devices that generate trace data */
@ -18,6 +20,8 @@ enum {
INTEL_TH_SWITCH,
};
struct intel_th_device;
/**
* struct intel_th_output - descriptor INTEL_TH_OUTPUT type devices
* @port: output port number, assigned by the switch
@ -25,6 +29,7 @@ enum {
* @scratchpad: scratchpad bits to flag when this output is enabled
* @multiblock: true for multiblock output configuration
* @active: true when this output is enabled
* @wait_empty: wait for device pipeline to be empty
*
* Output port descriptor, used by switch driver to tell which output
* port this output device corresponds to. Filled in at output device's
@ -42,10 +47,12 @@ struct intel_th_output {
/**
* struct intel_th_drvdata - describes hardware capabilities and quirks
* @tscu_enable: device needs SW to enable time stamping unit
* @has_mintctl: device has interrupt control (MINTCTL) register
* @host_mode_only: device can only operate in 'host debugger' mode
*/
struct intel_th_drvdata {
unsigned int tscu_enable : 1,
has_mintctl : 1,
host_mode_only : 1;
};
@ -157,10 +164,13 @@ struct intel_th_driver {
struct intel_th_device *othdev);
void (*enable)(struct intel_th_device *thdev,
struct intel_th_output *output);
void (*trig_switch)(struct intel_th_device *thdev,
struct intel_th_output *output);
void (*disable)(struct intel_th_device *thdev,
struct intel_th_output *output);
/* output ops */
void (*irq)(struct intel_th_device *thdev);
irqreturn_t (*irq)(struct intel_th_device *thdev);
void (*wait_empty)(struct intel_th_device *thdev);
int (*activate)(struct intel_th_device *thdev);
void (*deactivate)(struct intel_th_device *thdev);
/* file_operations for those who want a device node */
@ -213,21 +223,23 @@ static inline struct intel_th *to_intel_th(struct intel_th_device *thdev)
struct intel_th *
intel_th_alloc(struct device *dev, struct intel_th_drvdata *drvdata,
struct resource *devres, unsigned int ndevres, int irq);
struct resource *devres, unsigned int ndevres);
void intel_th_free(struct intel_th *th);
int intel_th_driver_register(struct intel_th_driver *thdrv);
void intel_th_driver_unregister(struct intel_th_driver *thdrv);
int intel_th_trace_enable(struct intel_th_device *thdev);
int intel_th_trace_switch(struct intel_th_device *thdev);
int intel_th_trace_disable(struct intel_th_device *thdev);
int intel_th_set_output(struct intel_th_device *thdev,
unsigned int master);
int intel_th_output_enable(struct intel_th *th, unsigned int otype);
enum {
enum th_mmio_idx {
TH_MMIO_CONFIG = 0,
TH_MMIO_SW = 2,
TH_MMIO_SW = 1,
TH_MMIO_RTIT = 2,
TH_MMIO_END,
};
@ -237,6 +249,9 @@ enum {
#define TH_CONFIGURABLE_MASTERS 256
#define TH_MSC_MAX 2
/* Maximum IRQ vectors */
#define TH_NVEC_MAX 8
/**
* struct intel_th - Intel TH controller
* @dev: driver core's device
@ -244,7 +259,7 @@ enum {
* @hub: "switch" subdevice (GTH)
* @resource: resources of the entire controller
* @num_thdevs: number of devices in the @thdev array
* @num_resources: number or resources in the @resource array
* @num_resources: number of resources in the @resource array
* @irq: irq number
* @id: this Intel TH controller's device ID in the system
* @major: device node major for output devices
@ -256,7 +271,7 @@ struct intel_th {
struct intel_th_device *hub;
struct intel_th_drvdata *drvdata;
struct resource *resource;
struct resource resource[TH_MMIO_END];
int (*activate)(struct intel_th *);
void (*deactivate)(struct intel_th *);
unsigned int num_thdevs;
@ -296,6 +311,9 @@ enum {
REG_TSCU_OFFSET = 0x2000,
REG_TSCU_LENGTH = 0x1000,
REG_CTS_OFFSET = 0x3000,
REG_CTS_LENGTH = 0x1000,
/* Software Trace Hub (STH) [0x4000..0x4fff] */
REG_STH_OFFSET = 0x4000,
REG_STH_LENGTH = 0x2000,

View File

@ -28,29 +28,19 @@
#define msc_dev(x) (&(x)->thdev->dev)
/**
* struct msc_block - multiblock mode block descriptor
* @bdesc: pointer to hardware descriptor (beginning of the block)
* @addr: physical address of the block
*/
struct msc_block {
struct msc_block_desc *bdesc;
dma_addr_t addr;
};
/**
* struct msc_window - multiblock mode window descriptor
* @entry: window list linkage (msc::win_list)
* @pgoff: page offset into the buffer that this window starts at
* @nr_blocks: number of blocks (pages) in this window
* @block: array of block descriptors
* @sgt: array of block descriptors
*/
struct msc_window {
struct list_head entry;
unsigned long pgoff;
unsigned int nr_blocks;
struct msc *msc;
struct msc_block block[0];
struct sg_table sgt;
};
/**
@ -84,6 +74,8 @@ struct msc_iter {
* @reg_base: register window base address
* @thdev: intel_th_device pointer
* @win_list: list of windows in multiblock mode
* @single_sgt: single mode buffer
* @cur_win: current window
* @nr_pages: total number of pages allocated for this buffer
* @single_sz: amount of data in single mode
* @single_wrap: single mode wrap occurred
@ -101,9 +93,12 @@ struct msc_iter {
*/
struct msc {
void __iomem *reg_base;
void __iomem *msu_base;
struct intel_th_device *thdev;
struct list_head win_list;
struct sg_table single_sgt;
struct msc_window *cur_win;
unsigned long nr_pages;
unsigned long single_sz;
unsigned int single_wrap : 1;
@ -120,7 +115,8 @@ struct msc {
/* config */
unsigned int enabled : 1,
wrap : 1;
wrap : 1,
do_irq : 1;
unsigned int mode;
unsigned int burst_len;
unsigned int index;
@ -139,72 +135,22 @@ static inline bool msc_block_is_empty(struct msc_block_desc *bdesc)
return false;
}
/**
* msc_oldest_window() - locate the window with oldest data
* @msc: MSC device
*
* This should only be used in multiblock mode. Caller should hold the
* msc::user_count reference.
*
* Return: the oldest window with valid data
*/
static struct msc_window *msc_oldest_window(struct msc *msc)
static inline struct msc_block_desc *
msc_win_block(struct msc_window *win, unsigned int block)
{
struct msc_window *win;
u32 reg = ioread32(msc->reg_base + REG_MSU_MSC0NWSA);
unsigned long win_addr = (unsigned long)reg << PAGE_SHIFT;
unsigned int found = 0;
if (list_empty(&msc->win_list))
return NULL;
/*
* we might need a radix tree for this, depending on how
* many windows a typical user would allocate; ideally it's
* something like 2, in which case we're good
*/
list_for_each_entry(win, &msc->win_list, entry) {
if (win->block[0].addr == win_addr)
found++;
/* skip the empty ones */
if (msc_block_is_empty(win->block[0].bdesc))
continue;
if (found)
return win;
}
return list_entry(msc->win_list.next, struct msc_window, entry);
return sg_virt(&win->sgt.sgl[block]);
}
/**
* msc_win_oldest_block() - locate the oldest block in a given window
* @win: window to look at
*
* Return: index of the block with the oldest data
*/
static unsigned int msc_win_oldest_block(struct msc_window *win)
static inline dma_addr_t
msc_win_baddr(struct msc_window *win, unsigned int block)
{
unsigned int blk;
struct msc_block_desc *bdesc = win->block[0].bdesc;
return sg_dma_address(&win->sgt.sgl[block]);
}
/* without wrapping, first block is the oldest */
if (!msc_block_wrapped(bdesc))
return 0;
/*
* with wrapping, last written block contains both the newest and the
* oldest data for this window.
*/
for (blk = 0; blk < win->nr_blocks; blk++) {
bdesc = win->block[blk].bdesc;
if (msc_block_last_written(bdesc))
return blk;
}
return 0;
static inline unsigned long
msc_win_bpfn(struct msc_window *win, unsigned int block)
{
return msc_win_baddr(win, block) >> PAGE_SHIFT;
}
/**
@ -226,15 +172,81 @@ static inline bool msc_is_last_win(struct msc_window *win)
static struct msc_window *msc_next_window(struct msc_window *win)
{
if (msc_is_last_win(win))
return list_entry(win->msc->win_list.next, struct msc_window,
entry);
return list_first_entry(&win->msc->win_list, struct msc_window,
entry);
return list_entry(win->entry.next, struct msc_window, entry);
return list_next_entry(win, entry);
}
/**
* msc_oldest_window() - locate the window with oldest data
* @msc: MSC device
*
* This should only be used in multiblock mode. Caller should hold the
* msc::user_count reference.
*
* Return: the oldest window with valid data
*/
static struct msc_window *msc_oldest_window(struct msc *msc)
{
struct msc_window *win, *next = msc_next_window(msc->cur_win);
unsigned int found = 0;
if (list_empty(&msc->win_list))
return NULL;
/*
* we might need a radix tree for this, depending on how
* many windows a typical user would allocate; ideally it's
* something like 2, in which case we're good
*/
list_for_each_entry(win, &msc->win_list, entry) {
if (win == next)
found++;
/* skip the empty ones */
if (msc_block_is_empty(msc_win_block(win, 0)))
continue;
if (found)
return win;
}
return list_first_entry(&msc->win_list, struct msc_window, entry);
}
/**
* msc_win_oldest_block() - locate the oldest block in a given window
* @win: window to look at
*
* Return: index of the block with the oldest data
*/
static unsigned int msc_win_oldest_block(struct msc_window *win)
{
unsigned int blk;
struct msc_block_desc *bdesc = msc_win_block(win, 0);
/* without wrapping, first block is the oldest */
if (!msc_block_wrapped(bdesc))
return 0;
/*
* with wrapping, last written block contains both the newest and the
* oldest data for this window.
*/
for (blk = 0; blk < win->nr_blocks; blk++) {
bdesc = msc_win_block(win, blk);
if (msc_block_last_written(bdesc))
return blk;
}
return 0;
}
static struct msc_block_desc *msc_iter_bdesc(struct msc_iter *iter)
{
return iter->win->block[iter->block].bdesc;
return msc_win_block(iter->win, iter->block);
}
static void msc_iter_init(struct msc_iter *iter)
@ -467,13 +479,47 @@ static void msc_buffer_clear_hw_header(struct msc *msc)
offsetof(struct msc_block_desc, hw_tag);
for (blk = 0; blk < win->nr_blocks; blk++) {
struct msc_block_desc *bdesc = win->block[blk].bdesc;
struct msc_block_desc *bdesc = msc_win_block(win, blk);
memset(&bdesc->hw_tag, 0, hw_sz);
}
}
}
static int intel_th_msu_init(struct msc *msc)
{
u32 mintctl, msusts;
if (!msc->do_irq)
return 0;
mintctl = ioread32(msc->msu_base + REG_MSU_MINTCTL);
mintctl |= msc->index ? M1BLIE : M0BLIE;
iowrite32(mintctl, msc->msu_base + REG_MSU_MINTCTL);
if (mintctl != ioread32(msc->msu_base + REG_MSU_MINTCTL)) {
dev_info(msc_dev(msc), "MINTCTL ignores writes: no usable interrupts\n");
msc->do_irq = 0;
return 0;
}
msusts = ioread32(msc->msu_base + REG_MSU_MSUSTS);
iowrite32(msusts, msc->msu_base + REG_MSU_MSUSTS);
return 0;
}
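
intel_th_msu_init() above uses a write-then-read-back probe to decide whether MINTCTL is actually wired up. A generic, hypothetical sketch of that idiom follows; the register offset and enable bit are placeholders rather than intel_th definitions, and this is not part of the patch:

#include <linux/io.h>
#include <linux/types.h>

/*
 * Set the enable bit, read the register back, and treat a write that did
 * not stick as "feature not implemented" so the caller can fall back to
 * polling.
 */
static bool intr_enable_sticks(void __iomem *base, unsigned long reg, u32 enable_bit)
{
    u32 val = ioread32(base + reg) | enable_bit;

    iowrite32(val, base + reg);
    return (ioread32(base + reg) & enable_bit) != 0;
}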
static void intel_th_msu_deinit(struct msc *msc)
{
u32 mintctl;
if (!msc->do_irq)
return;
mintctl = ioread32(msc->msu_base + REG_MSU_MINTCTL);
mintctl &= msc->index ? ~M1BLIE : ~M0BLIE;
iowrite32(mintctl, msc->msu_base + REG_MSU_MINTCTL);
}
/**
* msc_configure() - set up MSC hardware
* @msc: the MSC device to configure
@ -531,23 +577,14 @@ static int msc_configure(struct msc *msc)
*/
static void msc_disable(struct msc *msc)
{
unsigned long count;
u32 reg;
lockdep_assert_held(&msc->buf_mutex);
intel_th_trace_disable(msc->thdev);
for (reg = 0, count = MSC_PLE_WAITLOOP_DEPTH;
count && !(reg & MSCSTS_PLE); count--) {
reg = ioread32(msc->reg_base + REG_MSU_MSC0STS);
cpu_relax();
}
if (!count)
dev_dbg(msc_dev(msc), "timeout waiting for MSC0 PLE\n");
if (msc->mode == MSC_MODE_SINGLE) {
reg = ioread32(msc->reg_base + REG_MSU_MSC0STS);
msc->single_wrap = !!(reg & MSCSTS_WRAPSTAT);
reg = ioread32(msc->reg_base + REG_MSU_MSC0MWP);
@ -617,22 +654,45 @@ static void intel_th_msc_deactivate(struct intel_th_device *thdev)
*/
static int msc_buffer_contig_alloc(struct msc *msc, unsigned long size)
{
unsigned long nr_pages = size >> PAGE_SHIFT;
unsigned int order = get_order(size);
struct page *page;
int ret;
if (!size)
return 0;
ret = sg_alloc_table(&msc->single_sgt, 1, GFP_KERNEL);
if (ret)
goto err_out;
ret = -ENOMEM;
page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!page)
return -ENOMEM;
goto err_free_sgt;
split_page(page, order);
msc->nr_pages = size >> PAGE_SHIFT;
sg_set_buf(msc->single_sgt.sgl, page_address(page), size);
ret = dma_map_sg(msc_dev(msc)->parent->parent, msc->single_sgt.sgl, 1,
DMA_FROM_DEVICE);
if (ret < 0)
goto err_free_pages;
msc->nr_pages = nr_pages;
msc->base = page_address(page);
msc->base_addr = page_to_phys(page);
msc->base_addr = sg_dma_address(msc->single_sgt.sgl);
return 0;
err_free_pages:
__free_pages(page, order);
err_free_sgt:
sg_free_table(&msc->single_sgt);
err_out:
return ret;
}
/**
@ -643,6 +703,10 @@ static void msc_buffer_contig_free(struct msc *msc)
{
unsigned long off;
dma_unmap_sg(msc_dev(msc)->parent->parent, msc->single_sgt.sgl,
1, DMA_FROM_DEVICE);
sg_free_table(&msc->single_sgt);
for (off = 0; off < msc->nr_pages << PAGE_SHIFT; off += PAGE_SIZE) {
struct page *page = virt_to_page(msc->base + off);
@ -669,6 +733,40 @@ static struct page *msc_buffer_contig_get_page(struct msc *msc,
return virt_to_page(msc->base + (pgoff << PAGE_SHIFT));
}
static int __msc_buffer_win_alloc(struct msc_window *win,
unsigned int nr_blocks)
{
struct scatterlist *sg_ptr;
void *block;
int i, ret;
ret = sg_alloc_table(&win->sgt, nr_blocks, GFP_KERNEL);
if (ret)
return -ENOMEM;
for_each_sg(win->sgt.sgl, sg_ptr, nr_blocks, i) {
block = dma_alloc_coherent(msc_dev(win->msc)->parent->parent,
PAGE_SIZE, &sg_dma_address(sg_ptr),
GFP_KERNEL);
if (!block)
goto err_nomem;
sg_set_buf(sg_ptr, block, PAGE_SIZE);
}
return nr_blocks;
err_nomem:
for (i--; i >= 0; i--)
dma_free_coherent(msc_dev(win->msc)->parent->parent, PAGE_SIZE,
msc_win_block(win, i),
msc_win_baddr(win, i));
sg_free_table(&win->sgt);
return -ENOMEM;
}
/**
* msc_buffer_win_alloc() - alloc a window for a multiblock mode
* @msc: MSC device
@ -682,44 +780,49 @@ static struct page *msc_buffer_contig_get_page(struct msc *msc,
static int msc_buffer_win_alloc(struct msc *msc, unsigned int nr_blocks)
{
struct msc_window *win;
unsigned long size = PAGE_SIZE;
int i, ret = -ENOMEM;
int ret = -ENOMEM, i;
if (!nr_blocks)
return 0;
win = kzalloc(offsetof(struct msc_window, block[nr_blocks]),
GFP_KERNEL);
/*
* This limitation holds as long as we need random access to the
* block. When that changes, this can go away.
*/
if (nr_blocks > SG_MAX_SINGLE_ALLOC)
return -EINVAL;
win = kzalloc(sizeof(*win), GFP_KERNEL);
if (!win)
return -ENOMEM;
if (!list_empty(&msc->win_list)) {
struct msc_window *prev = list_entry(msc->win_list.prev,
struct msc_window, entry);
win->msc = msc;
if (!list_empty(&msc->win_list)) {
struct msc_window *prev = list_last_entry(&msc->win_list,
struct msc_window,
entry);
/* This works as long as blocks are page-sized */
win->pgoff = prev->pgoff + prev->nr_blocks;
}
for (i = 0; i < nr_blocks; i++) {
win->block[i].bdesc =
dma_alloc_coherent(msc_dev(msc)->parent->parent, size,
&win->block[i].addr, GFP_KERNEL);
if (!win->block[i].bdesc)
goto err_nomem;
ret = __msc_buffer_win_alloc(win, nr_blocks);
if (ret < 0)
goto err_nomem;
#ifdef CONFIG_X86
for (i = 0; i < ret; i++)
/* Set the page as uncached */
set_memory_uc((unsigned long)win->block[i].bdesc, 1);
set_memory_uc((unsigned long)msc_win_block(win, i), 1);
#endif
}
win->msc = msc;
win->nr_blocks = nr_blocks;
win->nr_blocks = ret;
if (list_empty(&msc->win_list)) {
msc->base = win->block[0].bdesc;
msc->base_addr = win->block[0].addr;
msc->base = msc_win_block(win, 0);
msc->base_addr = msc_win_baddr(win, 0);
msc->cur_win = win;
}
list_add_tail(&win->entry, &msc->win_list);
@ -728,19 +831,25 @@ static int msc_buffer_win_alloc(struct msc *msc, unsigned int nr_blocks)
return 0;
err_nomem:
for (i--; i >= 0; i--) {
#ifdef CONFIG_X86
/* Reset the page to write-back before releasing */
set_memory_wb((unsigned long)win->block[i].bdesc, 1);
#endif
dma_free_coherent(msc_dev(msc)->parent->parent, size,
win->block[i].bdesc, win->block[i].addr);
}
kfree(win);
return ret;
}
static void __msc_buffer_win_free(struct msc *msc, struct msc_window *win)
{
int i;
for (i = 0; i < win->nr_blocks; i++) {
struct page *page = sg_page(&win->sgt.sgl[i]);
page->mapping = NULL;
dma_free_coherent(msc_dev(win->msc)->parent->parent, PAGE_SIZE,
msc_win_block(win, i), msc_win_baddr(win, i));
}
sg_free_table(&win->sgt);
}
/**
* msc_buffer_win_free() - free a window from MSC's window list
* @msc: MSC device
@ -761,17 +870,13 @@ static void msc_buffer_win_free(struct msc *msc, struct msc_window *win)
msc->base_addr = 0;
}
for (i = 0; i < win->nr_blocks; i++) {
struct page *page = virt_to_page(win->block[i].bdesc);
page->mapping = NULL;
#ifdef CONFIG_X86
/* Reset the page to write-back before releasing */
set_memory_wb((unsigned long)win->block[i].bdesc, 1);
for (i = 0; i < win->nr_blocks; i++)
/* Reset the page to write-back */
set_memory_wb((unsigned long)msc_win_block(win, i), 1);
#endif
dma_free_coherent(msc_dev(win->msc)->parent->parent, PAGE_SIZE,
win->block[i].bdesc, win->block[i].addr);
}
__msc_buffer_win_free(msc, win);
kfree(win);
}
@ -798,19 +903,18 @@ static void msc_buffer_relink(struct msc *msc)
*/
if (msc_is_last_win(win)) {
sw_tag |= MSC_SW_TAG_LASTWIN;
next_win = list_entry(msc->win_list.next,
struct msc_window, entry);
next_win = list_first_entry(&msc->win_list,
struct msc_window, entry);
} else {
next_win = list_entry(win->entry.next,
struct msc_window, entry);
next_win = list_next_entry(win, entry);
}
for (blk = 0; blk < win->nr_blocks; blk++) {
struct msc_block_desc *bdesc = win->block[blk].bdesc;
struct msc_block_desc *bdesc = msc_win_block(win, blk);
memset(bdesc, 0, sizeof(*bdesc));
bdesc->next_win = next_win->block[0].addr >> PAGE_SHIFT;
bdesc->next_win = msc_win_bpfn(next_win, 0);
/*
* Similarly to last window, last block should point
@ -818,11 +922,9 @@ static void msc_buffer_relink(struct msc *msc)
*/
if (blk == win->nr_blocks - 1) {
sw_tag |= MSC_SW_TAG_LASTBLK;
bdesc->next_blk =
win->block[0].addr >> PAGE_SHIFT;
bdesc->next_blk = msc_win_bpfn(win, 0);
} else {
bdesc->next_blk =
win->block[blk + 1].addr >> PAGE_SHIFT;
bdesc->next_blk = msc_win_bpfn(win, blk + 1);
}
bdesc->sw_tag = sw_tag;
@ -997,7 +1099,7 @@ static struct page *msc_buffer_get_page(struct msc *msc, unsigned long pgoff)
found:
pgoff -= win->pgoff;
return virt_to_page(win->block[pgoff].bdesc);
return sg_page(&win->sgt.sgl[pgoff]);
}
/**
@ -1250,6 +1352,22 @@ static const struct file_operations intel_th_msc_fops = {
.owner = THIS_MODULE,
};
static void intel_th_msc_wait_empty(struct intel_th_device *thdev)
{
struct msc *msc = dev_get_drvdata(&thdev->dev);
unsigned long count;
u32 reg;
for (reg = 0, count = MSC_PLE_WAITLOOP_DEPTH;
count && !(reg & MSCSTS_PLE); count--) {
reg = __raw_readl(msc->reg_base + REG_MSU_MSC0STS);
cpu_relax();
}
if (!count)
dev_dbg(msc_dev(msc), "timeout waiting for MSC0 PLE\n");
}
static int intel_th_msc_init(struct msc *msc)
{
atomic_set(&msc->user_count, -1);
@ -1266,6 +1384,39 @@ static int intel_th_msc_init(struct msc *msc)
return 0;
}
static void msc_win_switch(struct msc *msc)
{
struct msc_window *last, *first;
first = list_first_entry(&msc->win_list, struct msc_window, entry);
last = list_last_entry(&msc->win_list, struct msc_window, entry);
if (msc_is_last_win(msc->cur_win))
msc->cur_win = first;
else
msc->cur_win = list_next_entry(msc->cur_win, entry);
msc->base = msc_win_block(msc->cur_win, 0);
msc->base_addr = msc_win_baddr(msc->cur_win, 0);
intel_th_trace_switch(msc->thdev);
}
static irqreturn_t intel_th_msc_interrupt(struct intel_th_device *thdev)
{
struct msc *msc = dev_get_drvdata(&thdev->dev);
u32 msusts = ioread32(msc->msu_base + REG_MSU_MSUSTS);
u32 mask = msc->index ? MSUSTS_MSC1BLAST : MSUSTS_MSC0BLAST;
if (!(msusts & mask)) {
if (msc->enabled)
return IRQ_HANDLED;
return IRQ_NONE;
}
return IRQ_HANDLED;
}
static const char * const msc_mode[] = {
[MSC_MODE_SINGLE] = "single",
[MSC_MODE_MULTI] = "multi",
@ -1440,10 +1591,38 @@ free_win:
static DEVICE_ATTR_RW(nr_pages);
static ssize_t
win_switch_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t size)
{
struct msc *msc = dev_get_drvdata(dev);
unsigned long val;
int ret;
ret = kstrtoul(buf, 10, &val);
if (ret)
return ret;
if (val != 1)
return -EINVAL;
mutex_lock(&msc->buf_mutex);
if (msc->mode != MSC_MODE_MULTI)
ret = -ENOTSUPP;
else
msc_win_switch(msc);
mutex_unlock(&msc->buf_mutex);
return ret ? ret : size;
}
static DEVICE_ATTR_WO(win_switch);
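A minimal userspace sketch of triggering a switch through this attribute; the sysfs path assumes the usual intel_th output device naming (e.g. 0-msc0) and is illustrative only:

/* Hypothetical example: request a window switch on msc0 from userspace. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Path assumed from the typical intel_th sysfs layout; verify on target. */
	const char *path = "/sys/bus/intel_th/devices/0-msc0/win_switch";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Only "1" is accepted, and only while the buffer is in multi mode. */
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}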
static struct attribute *msc_output_attrs[] = {
&dev_attr_wrap.attr,
&dev_attr_mode.attr,
&dev_attr_nr_pages.attr,
&dev_attr_win_switch.attr,
NULL,
};
@ -1471,10 +1650,19 @@ static int intel_th_msc_probe(struct intel_th_device *thdev)
if (!msc)
return -ENOMEM;
res = intel_th_device_get_resource(thdev, IORESOURCE_IRQ, 1);
if (!res)
msc->do_irq = 1;
msc->index = thdev->id;
msc->thdev = thdev;
msc->reg_base = base + msc->index * 0x100;
msc->msu_base = base;
err = intel_th_msu_init(msc);
if (err)
return err;
err = intel_th_msc_init(msc);
if (err)
@ -1491,6 +1679,7 @@ static void intel_th_msc_remove(struct intel_th_device *thdev)
int ret;
intel_th_msc_deactivate(thdev);
intel_th_msu_deinit(msc);
/*
* Buffers should not be used at this point except if the
@ -1504,6 +1693,8 @@ static void intel_th_msc_remove(struct intel_th_device *thdev)
static struct intel_th_driver intel_th_msc_driver = {
.probe = intel_th_msc_probe,
.remove = intel_th_msc_remove,
.irq = intel_th_msc_interrupt,
.wait_empty = intel_th_msc_wait_empty,
.activate = intel_th_msc_activate,
.deactivate = intel_th_msc_deactivate,
.fops = &intel_th_msc_fops,

View File

@ -11,6 +11,7 @@
enum {
REG_MSU_MSUPARAMS = 0x0000,
REG_MSU_MSUSTS = 0x0008,
REG_MSU_MINTCTL = 0x0004, /* MSU-global interrupt control */
REG_MSU_MSC0CTL = 0x0100, /* MSC0 control */
REG_MSU_MSC0STS = 0x0104, /* MSC0 status */
REG_MSU_MSC0BAR = 0x0108, /* MSC0 output base address */
@ -28,6 +29,8 @@ enum {
/* MSUSTS bits */
#define MSUSTS_MSU_INT BIT(0)
#define MSUSTS_MSC0BLAST BIT(16)
#define MSUSTS_MSC1BLAST BIT(24)
/* MSCnCTL bits */
#define MSC_EN BIT(0)
@ -36,6 +39,11 @@ enum {
#define MSC_MODE (BIT(4) | BIT(5))
#define MSC_LEN (BIT(8) | BIT(9) | BIT(10))
/* MINTCTL bits */
#define MICDE BIT(0)
#define M0BLIE BIT(16)
#define M1BLIE BIT(24)
/* MSC operating modes (MSC_MODE) */
enum {
MSC_MODE_SINGLE = 0,
@ -87,7 +95,7 @@ static inline unsigned long msc_data_sz(struct msc_block_desc *bdesc)
static inline bool msc_block_wrapped(struct msc_block_desc *bdesc)
{
if (bdesc->hw_tag & MSC_HW_TAG_BLOCKWRAP)
if (bdesc->hw_tag & (MSC_HW_TAG_BLOCKWRAP | MSC_HW_TAG_WINWRAP))
return true;
return false;


@ -17,7 +17,13 @@
#define DRIVER_NAME "intel_th_pci"
#define BAR_MASK (BIT(TH_MMIO_CONFIG) | BIT(TH_MMIO_SW))
enum {
TH_PCI_CONFIG_BAR = 0,
TH_PCI_STH_SW_BAR = 2,
TH_PCI_RTIT_BAR = 4,
};
#define BAR_MASK (BIT(TH_PCI_CONFIG_BAR) | BIT(TH_PCI_STH_SW_BAR))
#define PCI_REG_NPKDSC 0x80
#define NPKDSC_TSACT BIT(5)
@ -66,8 +72,12 @@ static int intel_th_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
struct intel_th_drvdata *drvdata = (void *)id->driver_data;
struct resource resource[TH_MMIO_END + TH_NVEC_MAX] = {
[TH_MMIO_CONFIG] = pdev->resource[TH_PCI_CONFIG_BAR],
[TH_MMIO_SW] = pdev->resource[TH_PCI_STH_SW_BAR],
};
int err, r = TH_MMIO_SW + 1, i;
struct intel_th *th;
int err;
err = pcim_enable_device(pdev);
if (err)
@ -77,8 +87,19 @@ static int intel_th_pci_probe(struct pci_dev *pdev,
if (err)
return err;
th = intel_th_alloc(&pdev->dev, drvdata, pdev->resource,
DEVICE_COUNT_RESOURCE, pdev->irq);
if (pdev->resource[TH_PCI_RTIT_BAR].start) {
resource[TH_MMIO_RTIT] = pdev->resource[TH_PCI_RTIT_BAR];
r++;
}
err = pci_alloc_irq_vectors(pdev, 1, 8, PCI_IRQ_ALL_TYPES);
if (err > 0)
for (i = 0; i < err; i++, r++) {
resource[r].flags = IORESOURCE_IRQ;
resource[r].start = pci_irq_vector(pdev, i);
}
th = intel_th_alloc(&pdev->dev, drvdata, resource, r);
if (IS_ERR(th))
return PTR_ERR(th);
@ -95,10 +116,13 @@ static void intel_th_pci_remove(struct pci_dev *pdev)
struct intel_th *th = pci_get_drvdata(pdev);
intel_th_free(th);
pci_free_irq_vectors(pdev);
}
static const struct intel_th_drvdata intel_th_2x = {
.tscu_enable = 1,
.has_mintctl = 1,
};
static const struct pci_device_id intel_th_pci_id_table[] = {


@ -90,18 +90,7 @@ static int icc_summary_show(struct seq_file *s, void *data)
return 0;
}
static int icc_summary_open(struct inode *inode, struct file *file)
{
return single_open(file, icc_summary_show, inode->i_private);
}
static const struct file_operations icc_summary_fops = {
.open = icc_summary_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
DEFINE_SHOW_ATTRIBUTE(icc_summary);
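For reference, DEFINE_SHOW_ATTRIBUTE() from <linux/seq_file.h> generates roughly the boilerplate removed above; a sketch of the expansion for this case (exact layout varies across kernel versions):

/* Approximate expansion of DEFINE_SHOW_ATTRIBUTE(icc_summary). */
static int icc_summary_open(struct inode *inode, struct file *file)
{
	return single_open(file, icc_summary_show, inode->i_private);
}

static const struct file_operations icc_summary_fops = {
	.owner		= THIS_MODULE,
	.open		= icc_summary_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= single_release,
};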
static struct icc_node *node_find(const int id)
{


@ -496,6 +496,14 @@ config VEXPRESS_SYSCFG
bus. System Configuration interface is one of the possible means
of generating transactions on this bus.
config ASPEED_P2A_CTRL
depends on (ARCH_ASPEED || COMPILE_TEST) && REGMAP && MFD_SYSCON
tristate "Aspeed ast2400/2500 HOST P2A VGA MMIO to BMC bridge control"
help
Control Aspeed ast2400/2500 HOST P2A VGA MMIO to BMC mappings through
ioctl()s, the driver also provides an interface for userspace mappings to
a pre-defined region.
config ASPEED_LPC_CTRL
depends on (ARCH_ASPEED || COMPILE_TEST) && REGMAP && MFD_SYSCON
tristate "Aspeed ast2400/2500 HOST LPC to BMC bridge control"


@ -56,6 +56,7 @@ obj-$(CONFIG_VEXPRESS_SYSCFG) += vexpress-syscfg.o
obj-$(CONFIG_CXL_BASE) += cxl/
obj-$(CONFIG_ASPEED_LPC_CTRL) += aspeed-lpc-ctrl.o
obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
obj-$(CONFIG_ASPEED_P2A_CTRL) += aspeed-p2a-ctrl.o
obj-$(CONFIG_PCI_ENDPOINT_TEST) += pci_endpoint_test.o
obj-$(CONFIG_OCXL) += ocxl/
obj-y += cardreader/

View File

@ -0,0 +1,444 @@
// SPDX-License-Identifier: GPL-2.0+
/*
* Copyright 2019 Google Inc
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
* Provides a simple driver to control the ASPEED P2A interface which allows
* the host to read and write to various regions of the BMC's memory.
*/
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/mfd/syscon.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/of_address.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/regmap.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/aspeed-p2a-ctrl.h>
#define DEVICE_NAME "aspeed-p2a-ctrl"
/* SCU2C is a Misc. Control Register. */
#define SCU2C 0x2c
/* SCU180 is the PCIe Configuration Setting Control Register. */
#define SCU180 0x180
/* Bit 1 controls the P2A bridge, while bit 0 controls the entire VGA device
* on the PCI bus.
*/
#define SCU180_ENP2A BIT(1)
/* The ast2400/2500 both have six ranges. */
#define P2A_REGION_COUNT 6
struct region {
u64 min;
u64 max;
u32 bit;
};
struct aspeed_p2a_model_data {
/* min, max, bit */
struct region regions[P2A_REGION_COUNT];
};
struct aspeed_p2a_ctrl {
struct miscdevice miscdev;
struct regmap *regmap;
const struct aspeed_p2a_model_data *config;
/* Access to these needs to be locked, held via probe, mapping ioctl,
* and release, remove.
*/
struct mutex tracking;
u32 readers;
u32 readerwriters[P2A_REGION_COUNT];
phys_addr_t mem_base;
resource_size_t mem_size;
};
struct aspeed_p2a_user {
struct file *file;
struct aspeed_p2a_ctrl *parent;
/* The entire memory space is opened for reading once the bridge is
* enabled, therefore this needs only to be tracked once per user.
* If any user has it open for read, the bridge must stay enabled.
*/
u32 read;
/* Each entry of the array corresponds to a P2A Region. If the user
* opens for read or readwrite, the reference goes up here. On
* release, this array is walked and references adjusted accordingly.
*/
u32 readwrite[P2A_REGION_COUNT];
};
static void aspeed_p2a_enable_bridge(struct aspeed_p2a_ctrl *p2a_ctrl)
{
regmap_update_bits(p2a_ctrl->regmap,
SCU180, SCU180_ENP2A, SCU180_ENP2A);
}
static void aspeed_p2a_disable_bridge(struct aspeed_p2a_ctrl *p2a_ctrl)
{
regmap_update_bits(p2a_ctrl->regmap, SCU180, SCU180_ENP2A, 0);
}
static int aspeed_p2a_mmap(struct file *file, struct vm_area_struct *vma)
{
unsigned long vsize;
pgprot_t prot;
struct aspeed_p2a_user *priv = file->private_data;
struct aspeed_p2a_ctrl *ctrl = priv->parent;
if (ctrl->mem_base == 0 && ctrl->mem_size == 0)
return -EINVAL;
vsize = vma->vm_end - vma->vm_start;
prot = vma->vm_page_prot;
if (vma->vm_pgoff + vsize > ctrl->mem_base + ctrl->mem_size)
return -EINVAL;
/* ast2400/2500 AHB accesses are not cache coherent */
prot = pgprot_noncached(prot);
if (remap_pfn_range(vma, vma->vm_start,
(ctrl->mem_base >> PAGE_SHIFT) + vma->vm_pgoff,
vsize, prot))
return -EAGAIN;
return 0;
}
static bool aspeed_p2a_region_acquire(struct aspeed_p2a_user *priv,
struct aspeed_p2a_ctrl *ctrl,
struct aspeed_p2a_ctrl_mapping *map)
{
int i;
u64 base, end;
bool matched = false;
base = map->addr;
end = map->addr + (map->length - 1);
/* If the value is a legal u32, it will find a match. */
for (i = 0; i < P2A_REGION_COUNT; i++) {
const struct region *curr = &ctrl->config->regions[i];
/* If the top of this region is lower than your base, skip it.
*/
if (curr->max < base)
continue;
/* If the bottom of this region is higher than your end, bail.
*/
if (curr->min > end)
break;
/* Lock this and update it, so that if someone else is
* closing their file out, this will preserve the increment.
*/
mutex_lock(&ctrl->tracking);
ctrl->readerwriters[i] += 1;
mutex_unlock(&ctrl->tracking);
/* Track with the user, so when they close their file, we can
* decrement properly.
*/
priv->readwrite[i] += 1;
/* Enable the region as read-write. */
regmap_update_bits(ctrl->regmap, SCU2C, curr->bit, 0);
matched = true;
}
return matched;
}
static long aspeed_p2a_ioctl(struct file *file, unsigned int cmd,
unsigned long data)
{
struct aspeed_p2a_user *priv = file->private_data;
struct aspeed_p2a_ctrl *ctrl = priv->parent;
void __user *arg = (void __user *)data;
struct aspeed_p2a_ctrl_mapping map;
if (copy_from_user(&map, arg, sizeof(map)))
return -EFAULT;
switch (cmd) {
case ASPEED_P2A_CTRL_IOCTL_SET_WINDOW:
/* If they want a region to be read-only, since the entire
* region is read-only once enabled, we just need to track that
* this user wants to read from the bridge, and enable it if it
* isn't already enabled.
*/
if (map.flags == ASPEED_P2A_CTRL_READ_ONLY) {
mutex_lock(&ctrl->tracking);
ctrl->readers += 1;
mutex_unlock(&ctrl->tracking);
/* Track with the user, so when they close their file,
* we can decrement properly.
*/
priv->read += 1;
} else if (map.flags == ASPEED_P2A_CTRL_READWRITE) {
/* If we don't acquire any region return error. */
if (!aspeed_p2a_region_acquire(priv, ctrl, &map)) {
return -EINVAL;
}
} else {
/* Invalid map flags. */
return -EINVAL;
}
aspeed_p2a_enable_bridge(ctrl);
return 0;
case ASPEED_P2A_CTRL_IOCTL_GET_MEMORY_CONFIG:
/* This is a request for the memory-region and corresponding
* length that is used by the driver for mmap.
*/
map.flags = 0;
map.addr = ctrl->mem_base;
map.length = ctrl->mem_size;
return copy_to_user(arg, &map, sizeof(map)) ? -EFAULT : 0;
}
return -EINVAL;
}
/*
* When a user opens this file, we create a structure to track their mappings.
*
* A user can map a region as read-only (bridge enabled), or read-write (bit
* flipped, and bridge enabled). Either way, this tracking is used, so that when
* they release the device, the references are handled.
*
* The bridge is not enabled until a user calls an ioctl to map a region,
* simply opening the device does not enable it.
*/
static int aspeed_p2a_open(struct inode *inode, struct file *file)
{
struct aspeed_p2a_user *priv;
priv = kmalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
priv->file = file;
priv->read = 0;
memset(priv->readwrite, 0, sizeof(priv->readwrite));
/* The file's private_data is initialized to the p2a_ctrl. */
priv->parent = file->private_data;
/* Set the file's private_data to the user's data. */
file->private_data = priv;
return 0;
}
/*
* This will close the user's mappings. It will go through what they had opened
* for readwrite, and decrement those counts. If, at the end, this is the last
* user, it will close the bridge.
*/
static int aspeed_p2a_release(struct inode *inode, struct file *file)
{
int i;
u32 bits = 0;
bool open_regions = false;
struct aspeed_p2a_user *priv = file->private_data;
/* Lock others from changing these values until everything is updated
* in one pass.
*/
mutex_lock(&priv->parent->tracking);
priv->parent->readers -= priv->read;
for (i = 0; i < P2A_REGION_COUNT; i++) {
priv->parent->readerwriters[i] -= priv->readwrite[i];
if (priv->parent->readerwriters[i] > 0)
open_regions = true;
else
bits |= priv->parent->config->regions[i].bit;
}
/* Setting a bit to 1 disables the region, so let's just OR with the
* above to disable any.
*/
/* Note, if another user is trying to ioctl, they can't grab tracking,
* and therefore can't grab either register mutex.
* If another user is trying to close, they can't grab tracking either.
*/
regmap_update_bits(priv->parent->regmap, SCU2C, bits, bits);
/* If parent->readers is zero and no regions are open, disable the
* bridge.
*/
if (!open_regions && priv->parent->readers == 0)
aspeed_p2a_disable_bridge(priv->parent);
mutex_unlock(&priv->parent->tracking);
kfree(priv);
return 0;
}
static const struct file_operations aspeed_p2a_ctrl_fops = {
.owner = THIS_MODULE,
.mmap = aspeed_p2a_mmap,
.unlocked_ioctl = aspeed_p2a_ioctl,
.open = aspeed_p2a_open,
.release = aspeed_p2a_release,
};
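A minimal userspace sketch of the ioctl/mmap interface wired up above, using the structures from the uapi header this driver includes; the /dev node name is assumed from the miscdevice name and error handling is abbreviated:

/* Sketch: enable the bridge read-only and mmap the reserved region. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/aspeed-p2a-ctrl.h>

int main(void)
{
	struct aspeed_p2a_ctrl_mapping map = { 0 };
	void *p;
	int fd;

	fd = open("/dev/aspeed-p2a-ctrl", O_RDWR);	/* node name assumed */
	if (fd < 0)
		return 1;

	/* Ask the driver which reserved memory region backs mmap(). */
	if (ioctl(fd, ASPEED_P2A_CTRL_IOCTL_GET_MEMORY_CONFIG, &map) < 0)
		return 1;

	/* Enable the bridge for read-only access to BMC memory. */
	map.flags = ASPEED_P2A_CTRL_READ_ONLY;
	if (ioctl(fd, ASPEED_P2A_CTRL_IOCTL_SET_WINDOW, &map) < 0)
		return 1;

	p = mmap(NULL, map.length, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* ... read from p ... */
	munmap(p, map.length);
	close(fd);
	return 0;
}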
/* The regions are controlled by SCU2C */
static void aspeed_p2a_disable_all(struct aspeed_p2a_ctrl *p2a_ctrl)
{
int i;
u32 value = 0;
for (i = 0; i < P2A_REGION_COUNT; i++)
value |= p2a_ctrl->config->regions[i].bit;
regmap_update_bits(p2a_ctrl->regmap, SCU2C, value, value);
/* Disable the bridge. */
aspeed_p2a_disable_bridge(p2a_ctrl);
}
static int aspeed_p2a_ctrl_probe(struct platform_device *pdev)
{
struct aspeed_p2a_ctrl *misc_ctrl;
struct device *dev;
struct resource resm;
struct device_node *node;
int rc = 0;
dev = &pdev->dev;
misc_ctrl = devm_kzalloc(dev, sizeof(*misc_ctrl), GFP_KERNEL);
if (!misc_ctrl)
return -ENOMEM;
mutex_init(&misc_ctrl->tracking);
/* optional. */
node = of_parse_phandle(dev->of_node, "memory-region", 0);
if (node) {
rc = of_address_to_resource(node, 0, &resm);
of_node_put(node);
if (rc) {
dev_err(dev, "Couldn't address to resource for reserved memory\n");
return -ENODEV;
}
misc_ctrl->mem_size = resource_size(&resm);
misc_ctrl->mem_base = resm.start;
}
misc_ctrl->regmap = syscon_node_to_regmap(pdev->dev.parent->of_node);
if (IS_ERR(misc_ctrl->regmap)) {
dev_err(dev, "Couldn't get regmap\n");
return -ENODEV;
}
misc_ctrl->config = of_device_get_match_data(dev);
dev_set_drvdata(&pdev->dev, misc_ctrl);
aspeed_p2a_disable_all(misc_ctrl);
misc_ctrl->miscdev.minor = MISC_DYNAMIC_MINOR;
misc_ctrl->miscdev.name = DEVICE_NAME;
misc_ctrl->miscdev.fops = &aspeed_p2a_ctrl_fops;
misc_ctrl->miscdev.parent = dev;
rc = misc_register(&misc_ctrl->miscdev);
if (rc)
dev_err(dev, "Unable to register device\n");
return rc;
}
static int aspeed_p2a_ctrl_remove(struct platform_device *pdev)
{
struct aspeed_p2a_ctrl *p2a_ctrl = dev_get_drvdata(&pdev->dev);
misc_deregister(&p2a_ctrl->miscdev);
return 0;
}
#define SCU2C_DRAM BIT(25)
#define SCU2C_SPI BIT(24)
#define SCU2C_SOC BIT(23)
#define SCU2C_FLASH BIT(22)
static const struct aspeed_p2a_model_data ast2400_model_data = {
.regions = {
{0x00000000, 0x17FFFFFF, SCU2C_FLASH},
{0x18000000, 0x1FFFFFFF, SCU2C_SOC},
{0x20000000, 0x2FFFFFFF, SCU2C_FLASH},
{0x30000000, 0x3FFFFFFF, SCU2C_SPI},
{0x40000000, 0x5FFFFFFF, SCU2C_DRAM},
{0x60000000, 0xFFFFFFFF, SCU2C_SOC},
}
};
static const struct aspeed_p2a_model_data ast2500_model_data = {
.regions = {
{0x00000000, 0x0FFFFFFF, SCU2C_FLASH},
{0x10000000, 0x1FFFFFFF, SCU2C_SOC},
{0x20000000, 0x3FFFFFFF, SCU2C_FLASH},
{0x40000000, 0x5FFFFFFF, SCU2C_SOC},
{0x60000000, 0x7FFFFFFF, SCU2C_SPI},
{0x80000000, 0xFFFFFFFF, SCU2C_DRAM},
}
};
static const struct of_device_id aspeed_p2a_ctrl_match[] = {
{ .compatible = "aspeed,ast2400-p2a-ctrl",
.data = &ast2400_model_data },
{ .compatible = "aspeed,ast2500-p2a-ctrl",
.data = &ast2500_model_data },
{ },
};
static struct platform_driver aspeed_p2a_ctrl_driver = {
.driver = {
.name = DEVICE_NAME,
.of_match_table = aspeed_p2a_ctrl_match,
},
.probe = aspeed_p2a_ctrl_probe,
.remove = aspeed_p2a_ctrl_remove,
};
module_platform_driver(aspeed_p2a_ctrl_driver);
MODULE_DEVICE_TABLE(of, aspeed_p2a_ctrl_match);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Patrick Venture <venture@google.com>");
MODULE_DESCRIPTION("Control for aspeed 2400/2500 P2A VGA HOST to BMC mappings");


@ -456,13 +456,13 @@ static void rts5260_pwr_saving_setting(struct rtsx_pcr *pcr)
pcr_dbg(pcr, "Set parameters for L1.2.");
rtsx_pci_write_register(pcr, PWR_GLOBAL_CTRL,
0xFF, PCIE_L1_2_EN);
rtsx_pci_write_register(pcr, RTS5260_DVCC_CTRL,
rtsx_pci_write_register(pcr, RTS5260_DVCC_CTRL,
RTS5260_DVCC_OCP_EN |
RTS5260_DVCC_OCP_CL_EN,
RTS5260_DVCC_OCP_EN |
RTS5260_DVCC_OCP_CL_EN);
rtsx_pci_write_register(pcr, PWR_FE_CTL,
rtsx_pci_write_register(pcr, PWR_FE_CTL,
0xFF, PCIE_L1_2_PD_FE_EN);
} else if (lss_l1_1) {
pcr_dbg(pcr, "Set parameters for L1.1.");


@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/of_address.h>
#include <linux/of.h>
#include <linux/sort.h>
#include <linux/of_platform.h>
#include <linux/rpmsg.h>
#include <linux/scatterlist.h>
@ -31,7 +32,7 @@
#define FASTRPC_CTX_MAX (256)
#define FASTRPC_INIT_HANDLE 1
#define FASTRPC_CTXID_MASK (0xFF0)
#define INIT_FILELEN_MAX (2 * 1024 * 1024)
#define INIT_FILELEN_MAX (64 * 1024 * 1024)
#define INIT_MEMLEN_MAX (8 * 1024 * 1024)
#define FASTRPC_DEVICE_NAME "fastrpc"
@ -104,6 +105,15 @@ struct fastrpc_invoke_rsp {
int retval; /* invoke return value */
};
struct fastrpc_buf_overlap {
u64 start;
u64 end;
int raix;
u64 mstart;
u64 mend;
u64 offset;
};
struct fastrpc_buf {
struct fastrpc_user *fl;
struct dma_buf *dmabuf;
@ -149,12 +159,14 @@ struct fastrpc_invoke_ctx {
struct kref refcount;
struct list_head node; /* list of ctxs */
struct completion work;
struct work_struct put_work;
struct fastrpc_msg msg;
struct fastrpc_user *fl;
struct fastrpc_remote_arg *rpra;
struct fastrpc_map **maps;
struct fastrpc_buf *buf;
struct fastrpc_invoke_args *args;
struct fastrpc_buf_overlap *olaps;
struct fastrpc_channel_ctx *cctx;
};
@ -282,6 +294,7 @@ static void fastrpc_context_free(struct kref *ref)
{
struct fastrpc_invoke_ctx *ctx;
struct fastrpc_channel_ctx *cctx;
unsigned long flags;
int i;
ctx = container_of(ref, struct fastrpc_invoke_ctx, refcount);
@ -293,11 +306,12 @@ static void fastrpc_context_free(struct kref *ref)
if (ctx->buf)
fastrpc_buf_free(ctx->buf);
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
kfree(ctx->maps);
kfree(ctx->olaps);
kfree(ctx);
}
@ -311,12 +325,70 @@ static void fastrpc_context_put(struct fastrpc_invoke_ctx *ctx)
kref_put(&ctx->refcount, fastrpc_context_free);
}
static void fastrpc_context_put_wq(struct work_struct *work)
{
struct fastrpc_invoke_ctx *ctx =
container_of(work, struct fastrpc_invoke_ctx, put_work);
fastrpc_context_put(ctx);
}
#define CMP(aa, bb) ((aa) == (bb) ? 0 : (aa) < (bb) ? -1 : 1)
static int olaps_cmp(const void *a, const void *b)
{
struct fastrpc_buf_overlap *pa = (struct fastrpc_buf_overlap *)a;
struct fastrpc_buf_overlap *pb = (struct fastrpc_buf_overlap *)b;
/* sort with lowest starting buffer first */
int st = CMP(pa->start, pb->start);
/* sort with highest ending buffer first */
int ed = CMP(pb->end, pa->end);
return st == 0 ? ed : st;
}
static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_ctx *ctx)
{
u64 max_end = 0;
int i;
for (i = 0; i < ctx->nbufs; ++i) {
ctx->olaps[i].start = ctx->args[i].ptr;
ctx->olaps[i].end = ctx->olaps[i].start + ctx->args[i].length;
ctx->olaps[i].raix = i;
}
sort(ctx->olaps, ctx->nbufs, sizeof(*ctx->olaps), olaps_cmp, NULL);
for (i = 0; i < ctx->nbufs; ++i) {
/* Falling inside previous range */
if (ctx->olaps[i].start < max_end) {
ctx->olaps[i].mstart = max_end;
ctx->olaps[i].mend = ctx->olaps[i].end;
ctx->olaps[i].offset = max_end - ctx->olaps[i].start;
if (ctx->olaps[i].end > max_end) {
max_end = ctx->olaps[i].end;
} else {
ctx->olaps[i].mend = 0;
ctx->olaps[i].mstart = 0;
}
} else {
ctx->olaps[i].mend = ctx->olaps[i].end;
ctx->olaps[i].mstart = ctx->olaps[i].start;
ctx->olaps[i].offset = 0;
max_end = ctx->olaps[i].end;
}
}
}
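For clarity, a worked example of the overlap trimming above, using hypothetical buffer addresses rather than values from any real invocation:

/*
 * Hypothetical input:
 *   args[0]: ptr=0x1000 len=0x300  ->  start=0x1000 end=0x1300
 *   args[1]: ptr=0x1100 len=0x100  ->  start=0x1100 end=0x1200
 *   args[2]: ptr=0x1200 len=0x400  ->  start=0x1200 end=0x1600
 *
 * After sorting by start (ties: larger end first) and walking the list:
 *   olaps[0]: mstart=0x1000 mend=0x1300 offset=0      (nothing covered yet)
 *   olaps[1]: mstart=0      mend=0      offset=0x200  (fully inside args[0])
 *   olaps[2]: mstart=0x1300 mend=0x1600 offset=0x100  (front 0x100 already covered)
 *
 * The payload later reserves only (mend - mstart) bytes per entry, so these
 * three buffers need 0x600 bytes of copy space instead of 0x800.
 */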
static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
struct fastrpc_user *user, u32 kernel, u32 sc,
struct fastrpc_invoke_args *args)
{
struct fastrpc_channel_ctx *cctx = user->cctx;
struct fastrpc_invoke_ctx *ctx = NULL;
unsigned long flags;
int ret;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@ -336,7 +408,15 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
kfree(ctx);
return ERR_PTR(-ENOMEM);
}
ctx->olaps = kcalloc(ctx->nscalars,
sizeof(*ctx->olaps), GFP_KERNEL);
if (!ctx->olaps) {
kfree(ctx->maps);
kfree(ctx);
return ERR_PTR(-ENOMEM);
}
ctx->args = args;
fastrpc_get_buff_overlaps(ctx);
}
ctx->sc = sc;
@ -345,20 +425,21 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
ctx->tgid = user->tgid;
ctx->cctx = cctx;
init_completion(&ctx->work);
INIT_WORK(&ctx->put_work, fastrpc_context_put_wq);
spin_lock(&user->lock);
list_add_tail(&ctx->node, &user->pending);
spin_unlock(&user->lock);
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
ret = idr_alloc_cyclic(&cctx->ctx_idr, ctx, 1,
FASTRPC_CTX_MAX, GFP_ATOMIC);
if (ret < 0) {
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
goto err_idr;
}
ctx->ctxid = ret << 4;
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
kref_init(&ctx->refcount);
@ -368,6 +449,7 @@ err_idr:
list_del(&ctx->node);
spin_unlock(&user->lock);
kfree(ctx->maps);
kfree(ctx->olaps);
kfree(ctx);
return ERR_PTR(ret);
@ -586,8 +668,11 @@ static u64 fastrpc_get_payload_size(struct fastrpc_invoke_ctx *ctx, int metalen)
size = ALIGN(metalen, FASTRPC_ALIGN);
for (i = 0; i < ctx->nscalars; i++) {
if (ctx->args[i].fd == 0 || ctx->args[i].fd == -1) {
size = ALIGN(size, FASTRPC_ALIGN);
size += ctx->args[i].length;
if (ctx->olaps[i].offset == 0)
size = ALIGN(size, FASTRPC_ALIGN);
size += (ctx->olaps[i].mend - ctx->olaps[i].mstart);
}
}
@ -625,12 +710,12 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
struct fastrpc_remote_arg *rpra;
struct fastrpc_invoke_buf *list;
struct fastrpc_phy_page *pages;
int inbufs, i, err = 0;
u64 rlen, pkt_size;
int inbufs, i, oix, err = 0;
u64 len, rlen, pkt_size;
u64 pg_start, pg_end;
uintptr_t args;
int metalen;
inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
metalen = fastrpc_get_meta_size(ctx);
pkt_size = fastrpc_get_payload_size(ctx, metalen);
@ -653,8 +738,11 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
rlen = pkt_size - metalen;
ctx->rpra = rpra;
for (i = 0; i < ctx->nbufs; ++i) {
u64 len = ctx->args[i].length;
for (oix = 0; oix < ctx->nbufs; ++oix) {
int mlen;
i = ctx->olaps[oix].raix;
len = ctx->args[i].length;
rpra[i].pv = 0;
rpra[i].len = len;
@ -664,22 +752,45 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
if (!len)
continue;
pages[i].size = roundup(len, PAGE_SIZE);
if (ctx->maps[i]) {
struct vm_area_struct *vma = NULL;
rpra[i].pv = (u64) ctx->args[i].ptr;
pages[i].addr = ctx->maps[i]->phys;
vma = find_vma(current->mm, ctx->args[i].ptr);
if (vma)
pages[i].addr += ctx->args[i].ptr -
vma->vm_start;
pg_start = (ctx->args[i].ptr & PAGE_MASK) >> PAGE_SHIFT;
pg_end = ((ctx->args[i].ptr + len - 1) & PAGE_MASK) >>
PAGE_SHIFT;
pages[i].size = (pg_end - pg_start + 1) * PAGE_SIZE;
} else {
rlen -= ALIGN(args, FASTRPC_ALIGN) - args;
args = ALIGN(args, FASTRPC_ALIGN);
if (rlen < len)
if (ctx->olaps[oix].offset == 0) {
rlen -= ALIGN(args, FASTRPC_ALIGN) - args;
args = ALIGN(args, FASTRPC_ALIGN);
}
mlen = ctx->olaps[oix].mend - ctx->olaps[oix].mstart;
if (rlen < mlen)
goto bail;
rpra[i].pv = args;
pages[i].addr = ctx->buf->phys + (pkt_size - rlen);
rpra[i].pv = args - ctx->olaps[oix].offset;
pages[i].addr = ctx->buf->phys -
ctx->olaps[oix].offset +
(pkt_size - rlen);
pages[i].addr = pages[i].addr & PAGE_MASK;
args = args + len;
rlen -= len;
pg_start = (args & PAGE_MASK) >> PAGE_SHIFT;
pg_end = ((args + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
pages[i].size = (pg_end - pg_start + 1) * PAGE_SIZE;
args = args + mlen;
rlen -= mlen;
}
if (i < inbufs && !ctx->maps[i]) {
@ -782,6 +893,9 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
if (err)
goto bail;
}
/* make sure that all CPU memory writes are seen by DSP */
dma_wmb();
/* Send invoke buffer to remote dsp */
err = fastrpc_invoke_send(fl->sctx, ctx, kernel, handle);
if (err)
@ -798,6 +912,8 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
goto bail;
if (ctx->nscalars) {
/* make sure that all memory writes by DSP are seen by CPU */
dma_rmb();
/* populate all the output buffers with results */
err = fastrpc_put_args(ctx, kernel);
if (err)
@ -843,12 +959,12 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
if (copy_from_user(&init, argp, sizeof(init))) {
err = -EFAULT;
goto bail;
goto err;
}
if (init.filelen > INIT_FILELEN_MAX) {
err = -EINVAL;
goto bail;
goto err;
}
inbuf.pgid = fl->tgid;
@ -862,17 +978,15 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
if (init.filelen && init.filefd) {
err = fastrpc_map_create(fl, init.filefd, init.filelen, &map);
if (err)
goto bail;
goto err;
}
memlen = ALIGN(max(INIT_FILELEN_MAX, (int)init.filelen * 4),
1024 * 1024);
err = fastrpc_buf_alloc(fl, fl->sctx->dev, memlen,
&imem);
if (err) {
fastrpc_map_put(map);
goto bail;
}
if (err)
goto err_alloc;
fl->init_mem = imem;
args[0].ptr = (u64)(uintptr_t)&inbuf;
@ -908,13 +1022,24 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
sc, args);
if (err)
goto err_invoke;
if (err) {
kfree(args);
return 0;
err_invoke:
fl->init_mem = NULL;
fastrpc_buf_free(imem);
err_alloc:
if (map) {
spin_lock(&fl->lock);
list_del(&map->node);
spin_unlock(&fl->lock);
fastrpc_map_put(map);
fastrpc_buf_free(imem);
}
bail:
err:
kfree(args);
return err;
@ -924,9 +1049,10 @@ static struct fastrpc_session_ctx *fastrpc_session_alloc(
struct fastrpc_channel_ctx *cctx)
{
struct fastrpc_session_ctx *session = NULL;
unsigned long flags;
int i;
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
for (i = 0; i < cctx->sesscount; i++) {
if (!cctx->session[i].used && cctx->session[i].valid) {
cctx->session[i].used = true;
@ -934,7 +1060,7 @@ static struct fastrpc_session_ctx *fastrpc_session_alloc(
break;
}
}
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
return session;
}
@ -942,9 +1068,11 @@ static struct fastrpc_session_ctx *fastrpc_session_alloc(
static void fastrpc_session_free(struct fastrpc_channel_ctx *cctx,
struct fastrpc_session_ctx *session)
{
spin_lock(&cctx->lock);
unsigned long flags;
spin_lock_irqsave(&cctx->lock, flags);
session->used = false;
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
}
static int fastrpc_release_current_dsp_process(struct fastrpc_user *fl)
@ -970,12 +1098,13 @@ static int fastrpc_device_release(struct inode *inode, struct file *file)
struct fastrpc_channel_ctx *cctx = fl->cctx;
struct fastrpc_invoke_ctx *ctx, *n;
struct fastrpc_map *map, *m;
unsigned long flags;
fastrpc_release_current_dsp_process(fl);
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
list_del(&fl->user);
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
if (fl->init_mem)
fastrpc_buf_free(fl->init_mem);
@ -1003,6 +1132,7 @@ static int fastrpc_device_open(struct inode *inode, struct file *filp)
{
struct fastrpc_channel_ctx *cctx = miscdev_to_cctx(filp->private_data);
struct fastrpc_user *fl = NULL;
unsigned long flags;
fl = kzalloc(sizeof(*fl), GFP_KERNEL);
if (!fl)
@ -1026,9 +1156,9 @@ static int fastrpc_device_open(struct inode *inode, struct file *filp)
return -EBUSY;
}
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
list_add_tail(&fl->user, &cctx->users);
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
return 0;
}
@ -1184,6 +1314,7 @@ static int fastrpc_cb_probe(struct platform_device *pdev)
struct fastrpc_session_ctx *sess;
struct device *dev = &pdev->dev;
int i, sessions = 0;
unsigned long flags;
int rc;
cctx = dev_get_drvdata(dev->parent);
@ -1192,7 +1323,7 @@ static int fastrpc_cb_probe(struct platform_device *pdev)
of_property_read_u32(dev->of_node, "qcom,nsessions", &sessions);
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
sess = &cctx->session[cctx->sesscount];
sess->used = false;
sess->valid = true;
@ -1213,7 +1344,7 @@ static int fastrpc_cb_probe(struct platform_device *pdev)
}
}
cctx->sesscount++;
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
rc = dma_set_mask(dev, DMA_BIT_MASK(32));
if (rc) {
dev_err(dev, "32-bit DMA enable failed\n");
@ -1227,16 +1358,17 @@ static int fastrpc_cb_remove(struct platform_device *pdev)
{
struct fastrpc_channel_ctx *cctx = dev_get_drvdata(pdev->dev.parent);
struct fastrpc_session_ctx *sess = dev_get_drvdata(&pdev->dev);
unsigned long flags;
int i;
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
for (i = 1; i < FASTRPC_MAX_SESSIONS; i++) {
if (cctx->session[i].sid == sess->sid) {
cctx->session[i].valid = false;
cctx->sesscount--;
}
}
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
return 0;
}
@ -1318,11 +1450,12 @@ static void fastrpc_rpmsg_remove(struct rpmsg_device *rpdev)
{
struct fastrpc_channel_ctx *cctx = dev_get_drvdata(&rpdev->dev);
struct fastrpc_user *user;
unsigned long flags;
spin_lock(&cctx->lock);
spin_lock_irqsave(&cctx->lock, flags);
list_for_each_entry(user, &cctx->users, user)
fastrpc_notify_users(user);
spin_unlock(&cctx->lock);
spin_unlock_irqrestore(&cctx->lock, flags);
misc_deregister(&cctx->miscdev);
of_platform_depopulate(&rpdev->dev);
@ -1354,7 +1487,13 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
ctx->retval = rsp->retval;
complete(&ctx->work);
fastrpc_context_put(ctx);
/*
* The DMA buffer associated with the context cannot be freed in
* interrupt context so schedule it through a worker thread to
* avoid a kernel BUG.
*/
schedule_work(&ctx->put_work);
return 0;
}


@ -227,7 +227,7 @@ static int ddcb_info_show(struct seq_file *s, void *unused)
seq_puts(s, "DDCB QUEUE:\n");
seq_printf(s, " ddcb_max: %d\n"
" ddcb_daddr: %016llx - %016llx\n"
" ddcb_vaddr: %016llx\n"
" ddcb_vaddr: %p\n"
" ddcbs_in_flight: %u\n"
" ddcbs_max_in_flight: %u\n"
" ddcbs_completed: %u\n"
@ -237,7 +237,7 @@ static int ddcb_info_show(struct seq_file *s, void *unused)
queue->ddcb_max, (long long)queue->ddcb_daddr,
(long long)queue->ddcb_daddr +
(queue->ddcb_max * DDCB_LENGTH),
(long long)queue->ddcb_vaddr, queue->ddcbs_in_flight,
queue->ddcb_vaddr, queue->ddcbs_in_flight,
queue->ddcbs_max_in_flight, queue->ddcbs_completed,
queue->return_on_busy, queue->wait_on_busy,
cd->irqs_processed);


@ -6,7 +6,7 @@ obj-m := habanalabs.o
habanalabs-y := habanalabs_drv.o device.o context.o asid.o habanalabs_ioctl.o \
command_buffer.o hw_queue.o irq.o sysfs.o hwmon.o memory.o \
command_submission.o mmu.o
command_submission.o mmu.o firmware_if.o pci.o
habanalabs-$(CONFIG_DEBUG_FS) += debugfs.o


@ -13,7 +13,7 @@
static void cb_fini(struct hl_device *hdev, struct hl_cb *cb)
{
hdev->asic_funcs->dma_free_coherent(hdev, cb->size,
hdev->asic_funcs->asic_dma_free_coherent(hdev, cb->size,
(void *) (uintptr_t) cb->kernel_address,
cb->bus_address);
kfree(cb);
@ -66,10 +66,10 @@ static struct hl_cb *hl_cb_alloc(struct hl_device *hdev, u32 cb_size,
return NULL;
if (ctx_id == HL_KERNEL_ASID_ID)
p = hdev->asic_funcs->dma_alloc_coherent(hdev, cb_size,
p = hdev->asic_funcs->asic_dma_alloc_coherent(hdev, cb_size,
&cb->bus_address, GFP_ATOMIC);
else
p = hdev->asic_funcs->dma_alloc_coherent(hdev, cb_size,
p = hdev->asic_funcs->asic_dma_alloc_coherent(hdev, cb_size,
&cb->bus_address,
GFP_USER | __GFP_ZERO);
if (!p) {
@ -214,6 +214,13 @@ int hl_cb_ioctl(struct hl_fpriv *hpriv, void *data)
u64 handle;
int rc;
if (hl_device_disabled_or_in_reset(hdev)) {
dev_warn_ratelimited(hdev->dev,
"Device is %s. Can't execute CB IOCTL\n",
atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
return -EBUSY;
}
switch (args->in.op) {
case HL_CB_OP_CREATE:
rc = hl_cb_create(hdev, &hpriv->cb_mgr, args->in.cb_size,


@ -93,7 +93,6 @@ static int cs_parser(struct hl_fpriv *hpriv, struct hl_cs_job *job)
parser.user_cb_size = job->user_cb_size;
parser.ext_queue = job->ext_queue;
job->patched_cb = NULL;
parser.use_virt_addr = hdev->mmu_enable;
rc = hdev->asic_funcs->cs_parser(hdev, &parser);
if (job->ext_queue) {
@ -261,7 +260,8 @@ static void cs_timedout(struct work_struct *work)
ctx_asid = cs->ctx->asid;
/* TODO: add information about last signaled seq and last emitted seq */
dev_err(hdev->dev, "CS %d.%llu got stuck!\n", ctx_asid, cs->sequence);
dev_err(hdev->dev, "User %d command submission %llu got stuck!\n",
ctx_asid, cs->sequence);
cs_put(cs);
@ -600,20 +600,20 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
void __user *chunks;
u32 num_chunks;
u64 cs_seq = ULONG_MAX;
int rc, do_restore;
int rc, do_ctx_switch;
bool need_soft_reset = false;
if (hl_device_disabled_or_in_reset(hdev)) {
dev_warn(hdev->dev,
dev_warn_ratelimited(hdev->dev,
"Device is %s. Can't submit new CS\n",
atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
rc = -EBUSY;
goto out;
}
do_restore = atomic_cmpxchg(&ctx->thread_restore_token, 1, 0);
do_ctx_switch = atomic_cmpxchg(&ctx->thread_ctx_switch_token, 1, 0);
if (do_restore || (args->in.cs_flags & HL_CS_FLAGS_FORCE_RESTORE)) {
if (do_ctx_switch || (args->in.cs_flags & HL_CS_FLAGS_FORCE_RESTORE)) {
long ret;
chunks = (void __user *)(uintptr_t)args->in.chunks_restore;
@ -621,7 +621,7 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
mutex_lock(&hpriv->restore_phase_mutex);
if (do_restore) {
if (do_ctx_switch) {
rc = hdev->asic_funcs->context_switch(hdev, ctx->asid);
if (rc) {
dev_err_ratelimited(hdev->dev,
@ -677,18 +677,18 @@ int hl_cs_ioctl(struct hl_fpriv *hpriv, void *data)
}
}
ctx->thread_restore_wait_token = 1;
} else if (!ctx->thread_restore_wait_token) {
ctx->thread_ctx_switch_wait_token = 1;
} else if (!ctx->thread_ctx_switch_wait_token) {
u32 tmp;
rc = hl_poll_timeout_memory(hdev,
(u64) (uintptr_t) &ctx->thread_restore_wait_token,
(u64) (uintptr_t) &ctx->thread_ctx_switch_wait_token,
jiffies_to_usecs(hdev->timeout_jiffies),
&tmp);
if (rc || !tmp) {
dev_err(hdev->dev,
"restore phase hasn't finished in time\n");
"context switch phase didn't finish in time\n");
rc = -ETIMEDOUT;
goto out;
}


@ -106,8 +106,8 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
ctx->cs_sequence = 1;
spin_lock_init(&ctx->cs_lock);
atomic_set(&ctx->thread_restore_token, 1);
ctx->thread_restore_wait_token = 0;
atomic_set(&ctx->thread_ctx_switch_token, 1);
ctx->thread_ctx_switch_wait_token = 0;
if (is_kernel_ctx) {
ctx->asid = HL_KERNEL_ASID_ID; /* KMD gets ASID 0 */


@ -505,22 +505,97 @@ err:
return -EINVAL;
}
static int device_va_to_pa(struct hl_device *hdev, u64 virt_addr,
u64 *phys_addr)
{
struct hl_ctx *ctx = hdev->user_ctx;
u64 hop_addr, hop_pte_addr, hop_pte;
int rc = 0;
if (!ctx) {
dev_err(hdev->dev, "no ctx available\n");
return -EINVAL;
}
mutex_lock(&ctx->mmu_lock);
/* hop 0 */
hop_addr = get_hop0_addr(ctx);
hop_pte_addr = get_hop0_pte_addr(ctx, hop_addr, virt_addr);
hop_pte = hdev->asic_funcs->read_pte(hdev, hop_pte_addr);
/* hop 1 */
hop_addr = get_next_hop_addr(hop_pte);
if (hop_addr == ULLONG_MAX)
goto not_mapped;
hop_pte_addr = get_hop1_pte_addr(ctx, hop_addr, virt_addr);
hop_pte = hdev->asic_funcs->read_pte(hdev, hop_pte_addr);
/* hop 2 */
hop_addr = get_next_hop_addr(hop_pte);
if (hop_addr == ULLONG_MAX)
goto not_mapped;
hop_pte_addr = get_hop2_pte_addr(ctx, hop_addr, virt_addr);
hop_pte = hdev->asic_funcs->read_pte(hdev, hop_pte_addr);
/* hop 3 */
hop_addr = get_next_hop_addr(hop_pte);
if (hop_addr == ULLONG_MAX)
goto not_mapped;
hop_pte_addr = get_hop3_pte_addr(ctx, hop_addr, virt_addr);
hop_pte = hdev->asic_funcs->read_pte(hdev, hop_pte_addr);
if (!(hop_pte & LAST_MASK)) {
/* hop 4 */
hop_addr = get_next_hop_addr(hop_pte);
if (hop_addr == ULLONG_MAX)
goto not_mapped;
hop_pte_addr = get_hop4_pte_addr(ctx, hop_addr, virt_addr);
hop_pte = hdev->asic_funcs->read_pte(hdev, hop_pte_addr);
}
if (!(hop_pte & PAGE_PRESENT_MASK))
goto not_mapped;
*phys_addr = (hop_pte & PTE_PHYS_ADDR_MASK) | (virt_addr & OFFSET_MASK);
goto out;
not_mapped:
dev_err(hdev->dev, "virt addr 0x%llx is not mapped to phys addr\n",
virt_addr);
rc = -EINVAL;
out:
mutex_unlock(&ctx->mmu_lock);
return rc;
}
static ssize_t hl_data_read32(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
{
struct hl_dbg_device_entry *entry = file_inode(f)->i_private;
struct hl_device *hdev = entry->hdev;
struct asic_fixed_properties *prop = &hdev->asic_prop;
char tmp_buf[32];
u64 addr = entry->addr;
u32 val;
ssize_t rc;
if (*ppos)
return 0;
rc = hdev->asic_funcs->debugfs_read32(hdev, entry->addr, &val);
if (addr >= prop->va_space_dram_start_address &&
addr < prop->va_space_dram_end_address &&
hdev->mmu_enable &&
hdev->dram_supports_virtual_memory) {
rc = device_va_to_pa(hdev, entry->addr, &addr);
if (rc)
return rc;
}
rc = hdev->asic_funcs->debugfs_read32(hdev, addr, &val);
if (rc) {
dev_err(hdev->dev, "Failed to read from 0x%010llx\n",
entry->addr);
dev_err(hdev->dev, "Failed to read from 0x%010llx\n", addr);
return rc;
}
@ -536,6 +611,8 @@ static ssize_t hl_data_write32(struct file *f, const char __user *buf,
{
struct hl_dbg_device_entry *entry = file_inode(f)->i_private;
struct hl_device *hdev = entry->hdev;
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 addr = entry->addr;
u32 value;
ssize_t rc;
@ -543,10 +620,19 @@ static ssize_t hl_data_write32(struct file *f, const char __user *buf,
if (rc)
return rc;
rc = hdev->asic_funcs->debugfs_write32(hdev, entry->addr, value);
if (addr >= prop->va_space_dram_start_address &&
addr < prop->va_space_dram_end_address &&
hdev->mmu_enable &&
hdev->dram_supports_virtual_memory) {
rc = device_va_to_pa(hdev, entry->addr, &addr);
if (rc)
return rc;
}
rc = hdev->asic_funcs->debugfs_write32(hdev, addr, value);
if (rc) {
dev_err(hdev->dev, "Failed to write 0x%08x to 0x%010llx\n",
value, entry->addr);
value, addr);
return rc;
}


@ -5,11 +5,14 @@
* All Rights Reserved.
*/
#define pr_fmt(fmt) "habanalabs: " fmt
#include "habanalabs.h"
#include <linux/pci.h>
#include <linux/sched/signal.h>
#include <linux/hwmon.h>
#include <uapi/misc/habanalabs.h>
#define HL_PLDM_PENDING_RESET_PER_SEC (HL_PENDING_RESET_PER_SEC * 10)
@ -21,6 +24,20 @@ bool hl_device_disabled_or_in_reset(struct hl_device *hdev)
return false;
}
enum hl_device_status hl_device_status(struct hl_device *hdev)
{
enum hl_device_status status;
if (hdev->disabled)
status = HL_DEVICE_STATUS_MALFUNCTION;
else if (atomic_read(&hdev->in_reset))
status = HL_DEVICE_STATUS_IN_RESET;
else
status = HL_DEVICE_STATUS_OPERATIONAL;
return status;
};
static void hpriv_release(struct kref *ref)
{
struct hl_fpriv *hpriv;
@ -498,11 +515,8 @@ disable_device:
return rc;
}
static void hl_device_hard_reset_pending(struct work_struct *work)
static void device_kill_open_processes(struct hl_device *hdev)
{
struct hl_device_reset_work *device_reset_work =
container_of(work, struct hl_device_reset_work, reset_work);
struct hl_device *hdev = device_reset_work->hdev;
u16 pending_total, pending_cnt;
struct task_struct *task = NULL;
@ -537,6 +551,12 @@ static void hl_device_hard_reset_pending(struct work_struct *work)
}
}
/* We killed the open users, but because the driver cleans up after the
* user contexts are closed (e.g. mmu mappings), we need to wait again
* to make sure the cleaning phase is finished before continuing with
* the reset
*/
pending_cnt = pending_total;
while ((atomic_read(&hdev->fd_open_cnt)) && (pending_cnt)) {
@ -552,6 +572,16 @@ static void hl_device_hard_reset_pending(struct work_struct *work)
mutex_unlock(&hdev->fd_open_cnt_lock);
}
static void device_hard_reset_pending(struct work_struct *work)
{
struct hl_device_reset_work *device_reset_work =
container_of(work, struct hl_device_reset_work, reset_work);
struct hl_device *hdev = device_reset_work->hdev;
device_kill_open_processes(hdev);
hl_device_reset(hdev, true, true);
kfree(device_reset_work);
@ -613,6 +643,8 @@ again:
if ((hard_reset) && (!from_hard_reset_thread)) {
struct hl_device_reset_work *device_reset_work;
hdev->hard_reset_pending = true;
if (!hdev->pdev) {
dev_err(hdev->dev,
"Reset action is NOT supported in simulator\n");
@ -620,8 +652,6 @@ again:
goto out_err;
}
hdev->hard_reset_pending = true;
device_reset_work = kzalloc(sizeof(*device_reset_work),
GFP_ATOMIC);
if (!device_reset_work) {
@ -635,7 +665,7 @@ again:
* from a dedicated work
*/
INIT_WORK(&device_reset_work->reset_work,
hl_device_hard_reset_pending);
device_hard_reset_pending);
device_reset_work->hdev = hdev;
schedule_work(&device_reset_work->reset_work);
@ -663,17 +693,9 @@ again:
/* Go over all the queues, release all CS and their jobs */
hl_cs_rollback_all(hdev);
if (hard_reset) {
/* Release kernel context */
if (hl_ctx_put(hdev->kernel_ctx) != 1) {
dev_err(hdev->dev,
"kernel ctx is alive during hard reset\n");
rc = -EBUSY;
goto out_err;
}
/* Release kernel context */
if ((hard_reset) && (hl_ctx_put(hdev->kernel_ctx) == 1))
hdev->kernel_ctx = NULL;
}
/* Reset the H/W. It will be in idle state after this returns */
hdev->asic_funcs->hw_fini(hdev, hard_reset);
@ -688,16 +710,24 @@ again:
for (i = 0 ; i < hdev->asic_prop.completion_queues_count ; i++)
hl_cq_reset(hdev, &hdev->completion_queue[i]);
/* Make sure the setup phase for the user context will run again */
/* Make sure the context switch phase will run again */
if (hdev->user_ctx) {
atomic_set(&hdev->user_ctx->thread_restore_token, 1);
hdev->user_ctx->thread_restore_wait_token = 0;
atomic_set(&hdev->user_ctx->thread_ctx_switch_token, 1);
hdev->user_ctx->thread_ctx_switch_wait_token = 0;
}
/* Finished tear-down, starting to re-initialize */
if (hard_reset) {
hdev->device_cpu_disabled = false;
hdev->hard_reset_pending = false;
if (hdev->kernel_ctx) {
dev_crit(hdev->dev,
"kernel ctx was alive during hard reset, something is terribly wrong\n");
rc = -EBUSY;
goto out_err;
}
/* Allocate the kernel context */
hdev->kernel_ctx = kzalloc(sizeof(*hdev->kernel_ctx),
@ -752,8 +782,6 @@ again:
}
hl_set_max_power(hdev, hdev->max_power);
hdev->hard_reset_pending = false;
} else {
rc = hdev->asic_funcs->soft_reset_late_init(hdev);
if (rc) {
@ -1030,11 +1058,22 @@ void hl_device_fini(struct hl_device *hdev)
WARN(1, "Failed to remove device because reset function did not finish\n");
return;
}
};
}
/* Mark device as disabled */
hdev->disabled = true;
/*
* Flush anyone that is inside the critical section of enqueue
* jobs to the H/W
*/
hdev->asic_funcs->hw_queues_lock(hdev);
hdev->asic_funcs->hw_queues_unlock(hdev);
hdev->hard_reset_pending = true;
device_kill_open_processes(hdev);
hl_hwmon_fini(hdev);
device_late_fini(hdev);
@ -1108,7 +1147,13 @@ int hl_poll_timeout_memory(struct hl_device *hdev, u64 addr,
* either by the direct access of the device or by another core
*/
u32 *paddr = (u32 *) (uintptr_t) addr;
ktime_t timeout = ktime_add_us(ktime_get(), timeout_us);
ktime_t timeout;
/* timeout should be longer when working with simulator */
if (!hdev->pdev)
timeout_us *= 10;
timeout = ktime_add_us(ktime_get(), timeout_us);
might_sleep();


@ -0,0 +1,322 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2016-2019 HabanaLabs, Ltd.
* All Rights Reserved.
*/
#include "habanalabs.h"
#include <linux/firmware.h>
#include <linux/genalloc.h>
#include <linux/io-64-nonatomic-lo-hi.h>
/**
* hl_fw_push_fw_to_device() - Push FW code to device.
* @hdev: pointer to hl_device structure.
*
* Copy fw code from firmware file to device memory.
*
* Return: 0 on success, non-zero for failure.
*/
int hl_fw_push_fw_to_device(struct hl_device *hdev, const char *fw_name,
void __iomem *dst)
{
const struct firmware *fw;
const u64 *fw_data;
size_t fw_size, i;
int rc;
rc = request_firmware(&fw, fw_name, hdev->dev);
if (rc) {
dev_err(hdev->dev, "Failed to request %s\n", fw_name);
goto out;
}
fw_size = fw->size;
if ((fw_size % 4) != 0) {
dev_err(hdev->dev, "illegal %s firmware size %zu\n",
fw_name, fw_size);
rc = -EINVAL;
goto out;
}
dev_dbg(hdev->dev, "%s firmware size == %zu\n", fw_name, fw_size);
fw_data = (const u64 *) fw->data;
if ((fw->size % 8) != 0)
fw_size -= 8;
for (i = 0 ; i < fw_size ; i += 8, fw_data++, dst += 8) {
if (!(i & (0x80000 - 1))) {
dev_dbg(hdev->dev,
"copied so far %zu out of %zu for %s firmware",
i, fw_size, fw_name);
usleep_range(20, 100);
}
writeq(*fw_data, dst);
}
if ((fw->size % 8) != 0)
writel(*(const u32 *) fw_data, dst);
out:
release_firmware(fw);
return rc;
}
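A hypothetical caller sketch for the helper above; the firmware file name and destination are illustrative only, since the ASIC-specific code chooses both:

/* Illustrative only: an ASIC init path pushing a boot image to the device. */
static int example_push_boot_image(struct hl_device *hdev, void __iomem *dst)
{
	/* "habanalabs/example/boot-fit.itb" is a made-up firmware name. */
	return hl_fw_push_fw_to_device(hdev, "habanalabs/example/boot-fit.itb", dst);
}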
int hl_fw_send_pci_access_msg(struct hl_device *hdev, u32 opcode)
{
struct armcp_packet pkt = {};
pkt.ctl = cpu_to_le32(opcode << ARMCP_PKT_CTL_OPCODE_SHIFT);
return hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt,
sizeof(pkt), HL_DEVICE_TIMEOUT_USEC, NULL);
}
int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
u16 len, u32 timeout, long *result)
{
struct armcp_packet *pkt;
dma_addr_t pkt_dma_addr;
u32 tmp;
int rc = 0;
if (len > HL_CPU_CB_SIZE) {
dev_err(hdev->dev, "Invalid CPU message size of %d bytes\n",
len);
return -ENOMEM;
}
pkt = hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev, len,
&pkt_dma_addr);
if (!pkt) {
dev_err(hdev->dev,
"Failed to allocate DMA memory for packet to CPU\n");
return -ENOMEM;
}
memcpy(pkt, msg, len);
mutex_lock(&hdev->send_cpu_message_lock);
if (hdev->disabled)
goto out;
if (hdev->device_cpu_disabled) {
rc = -EIO;
goto out;
}
rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, len, pkt_dma_addr);
if (rc) {
dev_err(hdev->dev, "Failed to send CB on CPU PQ (%d)\n", rc);
goto out;
}
rc = hl_poll_timeout_memory(hdev, (u64) (uintptr_t) &pkt->fence,
timeout, &tmp);
hl_hw_queue_inc_ci_kernel(hdev, hw_queue_id);
if (rc == -ETIMEDOUT) {
dev_err(hdev->dev, "Timeout while waiting for device CPU\n");
hdev->device_cpu_disabled = true;
goto out;
}
if (tmp == ARMCP_PACKET_FENCE_VAL) {
u32 ctl = le32_to_cpu(pkt->ctl);
rc = (ctl & ARMCP_PKT_CTL_RC_MASK) >> ARMCP_PKT_CTL_RC_SHIFT;
if (rc) {
dev_err(hdev->dev,
"F/W ERROR %d for CPU packet %d\n",
rc, (ctl & ARMCP_PKT_CTL_OPCODE_MASK)
>> ARMCP_PKT_CTL_OPCODE_SHIFT);
rc = -EINVAL;
} else if (result) {
*result = (long) le64_to_cpu(pkt->result);
}
} else {
dev_err(hdev->dev, "CPU packet wrong fence value\n");
rc = -EINVAL;
}
out:
mutex_unlock(&hdev->send_cpu_message_lock);
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev, len, pkt);
return rc;
}
int hl_fw_test_cpu_queue(struct hl_device *hdev)
{
struct armcp_packet test_pkt = {};
long result;
int rc;
test_pkt.ctl = cpu_to_le32(ARMCP_PACKET_TEST <<
ARMCP_PKT_CTL_OPCODE_SHIFT);
test_pkt.value = cpu_to_le64(ARMCP_PACKET_FENCE_VAL);
rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &test_pkt,
sizeof(test_pkt), HL_DEVICE_TIMEOUT_USEC, &result);
if (!rc) {
if (result == ARMCP_PACKET_FENCE_VAL)
dev_info(hdev->dev,
"queue test on CPU queue succeeded\n");
else
dev_err(hdev->dev,
"CPU queue test failed (0x%08lX)\n", result);
} else {
dev_err(hdev->dev, "CPU queue test failed, error %d\n", rc);
}
return rc;
}
void *hl_fw_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
dma_addr_t *dma_handle)
{
u64 kernel_addr;
/* roundup to HL_CPU_PKT_SIZE */
size = (size + (HL_CPU_PKT_SIZE - 1)) & HL_CPU_PKT_MASK;
kernel_addr = gen_pool_alloc(hdev->cpu_accessible_dma_pool, size);
*dma_handle = hdev->cpu_accessible_dma_address +
(kernel_addr - (u64) (uintptr_t) hdev->cpu_accessible_dma_mem);
return (void *) (uintptr_t) kernel_addr;
}
void hl_fw_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
void *vaddr)
{
/* roundup to HL_CPU_PKT_SIZE */
size = (size + (HL_CPU_PKT_SIZE - 1)) & HL_CPU_PKT_MASK;
gen_pool_free(hdev->cpu_accessible_dma_pool, (u64) (uintptr_t) vaddr,
size);
}
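A note on the rounding idiom used by both pool helpers, assuming HL_CPU_PKT_SIZE is a power of two (the per-ASIC definitions removed elsewhere in this series used 32 bytes):

/*
 * (size + (HL_CPU_PKT_SIZE - 1)) & HL_CPU_PKT_MASK rounds size up to the
 * next multiple of HL_CPU_PKT_SIZE, e.g. with a 32-byte packet size:
 *   size = 20  ->  (20 + 31) & ~31 = 32
 *   size = 64  ->  (64 + 31) & ~31 = 64
 */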
int hl_fw_send_heartbeat(struct hl_device *hdev)
{
struct armcp_packet hb_pkt = {};
long result;
int rc;
hb_pkt.ctl = cpu_to_le32(ARMCP_PACKET_TEST <<
ARMCP_PKT_CTL_OPCODE_SHIFT);
hb_pkt.value = cpu_to_le64(ARMCP_PACKET_FENCE_VAL);
rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &hb_pkt,
sizeof(hb_pkt), HL_DEVICE_TIMEOUT_USEC, &result);
if ((rc) || (result != ARMCP_PACKET_FENCE_VAL))
rc = -EIO;
return rc;
}
int hl_fw_armcp_info_get(struct hl_device *hdev)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
struct armcp_packet pkt = {};
void *armcp_info_cpu_addr;
dma_addr_t armcp_info_dma_addr;
long result;
int rc;
armcp_info_cpu_addr =
hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
sizeof(struct armcp_info),
&armcp_info_dma_addr);
if (!armcp_info_cpu_addr) {
dev_err(hdev->dev,
"Failed to allocate DMA memory for ArmCP info packet\n");
return -ENOMEM;
}
memset(armcp_info_cpu_addr, 0, sizeof(struct armcp_info));
pkt.ctl = cpu_to_le32(ARMCP_PACKET_INFO_GET <<
ARMCP_PKT_CTL_OPCODE_SHIFT);
pkt.addr = cpu_to_le64(armcp_info_dma_addr);
pkt.data_max_size = cpu_to_le32(sizeof(struct armcp_info));
rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
HL_ARMCP_INFO_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
"Failed to send armcp info pkt, error %d\n", rc);
goto out;
}
memcpy(&prop->armcp_info, armcp_info_cpu_addr,
sizeof(prop->armcp_info));
rc = hl_build_hwmon_channel_info(hdev, prop->armcp_info.sensors);
if (rc) {
dev_err(hdev->dev,
"Failed to build hwmon channel info, error %d\n", rc);
rc = -EFAULT;
goto out;
}
out:
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
sizeof(struct armcp_info), armcp_info_cpu_addr);
return rc;
}
int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
{
struct armcp_packet pkt = {};
void *eeprom_info_cpu_addr;
dma_addr_t eeprom_info_dma_addr;
long result;
int rc;
eeprom_info_cpu_addr =
hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
max_size, &eeprom_info_dma_addr);
if (!eeprom_info_cpu_addr) {
dev_err(hdev->dev,
"Failed to allocate DMA memory for EEPROM info packet\n");
return -ENOMEM;
}
memset(eeprom_info_cpu_addr, 0, max_size);
pkt.ctl = cpu_to_le32(ARMCP_PACKET_EEPROM_DATA_GET <<
ARMCP_PKT_CTL_OPCODE_SHIFT);
pkt.addr = cpu_to_le64(eeprom_info_dma_addr);
pkt.data_max_size = cpu_to_le32(max_size);
rc = hdev->asic_funcs->send_cpu_message(hdev, (u32 *) &pkt, sizeof(pkt),
HL_ARMCP_EEPROM_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
"Failed to send armcp EEPROM pkt, error %d\n", rc);
goto out;
}
/* result contains the actual size */
memcpy(data, eeprom_info_cpu_addr, min((size_t)result, max_size));
out:
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev, max_size,
eeprom_info_cpu_addr);
return rc;
}


@ -1,3 +1,4 @@
subdir-ccflags-y += -I$(src)
HL_GOYA_FILES := goya/goya.o goya/goya_security.o goya/goya_hwmgr.o
HL_GOYA_FILES := goya/goya.o goya/goya_security.o goya/goya_hwmgr.o \
goya/goya_coresight.o

File diff suppressed because it is too large


@ -39,9 +39,13 @@
#error "Number of MSIX interrupts must be smaller or equal to GOYA_MSIX_ENTRIES"
#endif
#define QMAN_FENCE_TIMEOUT_USEC 10000 /* 10 ms */
#define QMAN_FENCE_TIMEOUT_USEC 10000 /* 10 ms */
#define QMAN_STOP_TIMEOUT_USEC 100000 /* 100 ms */
#define QMAN_STOP_TIMEOUT_USEC 100000 /* 100 ms */
#define CORESIGHT_TIMEOUT_USEC 100000 /* 100 ms */
#define GOYA_CPU_TIMEOUT_USEC 10000000 /* 10s */
#define TPC_ENABLED_MASK 0xFF
@ -49,19 +53,14 @@
#define MAX_POWER_DEFAULT 200000 /* 200W */
#define GOYA_ARMCP_INFO_TIMEOUT 10000000 /* 10s */
#define GOYA_ARMCP_EEPROM_TIMEOUT 10000000 /* 10s */
#define DRAM_PHYS_DEFAULT_SIZE 0x100000000ull /* 4GB */
/* DRAM Memory Map */
#define CPU_FW_IMAGE_SIZE 0x10000000 /* 256MB */
#define MMU_PAGE_TABLES_SIZE 0x0DE00000 /* 222MB */
#define MMU_PAGE_TABLES_SIZE 0x0FC00000 /* 252MB */
#define MMU_DRAM_DEFAULT_PAGE_SIZE 0x00200000 /* 2MB */
#define MMU_CACHE_MNG_SIZE 0x00001000 /* 4KB */
#define CPU_PQ_PKT_SIZE 0x00001000 /* 4KB */
#define CPU_PQ_DATA_SIZE 0x01FFE000 /* 32MB - 8KB */
#define CPU_FW_IMAGE_ADDR DRAM_PHYS_BASE
#define MMU_PAGE_TABLES_ADDR (CPU_FW_IMAGE_ADDR + CPU_FW_IMAGE_SIZE)
@ -69,13 +68,13 @@
MMU_PAGE_TABLES_SIZE)
#define MMU_CACHE_MNG_ADDR (MMU_DRAM_DEFAULT_PAGE_ADDR + \
MMU_DRAM_DEFAULT_PAGE_SIZE)
#define CPU_PQ_PKT_ADDR (MMU_CACHE_MNG_ADDR + \
#define DRAM_KMD_END_ADDR (MMU_CACHE_MNG_ADDR + \
MMU_CACHE_MNG_SIZE)
#define CPU_PQ_DATA_ADDR (CPU_PQ_PKT_ADDR + CPU_PQ_PKT_SIZE)
#define DRAM_BASE_ADDR_USER (CPU_PQ_DATA_ADDR + CPU_PQ_DATA_SIZE)
#if (DRAM_BASE_ADDR_USER != 0x20000000)
#error "KMD must reserve 512MB"
#define DRAM_BASE_ADDR_USER 0x20000000
#if (DRAM_KMD_END_ADDR > DRAM_BASE_ADDR_USER)
#error "KMD must reserve no more than 512MB"
#endif
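A quick arithmetic check of the revised KMD reservation above, assuming DRAM_PHYS_BASE is 0 (which the old fixed 0x20000000 check implied):

/*
 * DRAM_KMD_END_ADDR - DRAM_PHYS_BASE
 *   = CPU_FW_IMAGE_SIZE + MMU_PAGE_TABLES_SIZE
 *     + MMU_DRAM_DEFAULT_PAGE_SIZE + MMU_CACHE_MNG_SIZE
 *   = 0x10000000 + 0x0FC00000 + 0x00200000 + 0x00001000
 *   = 0x1FE01000
 * which stays below DRAM_BASE_ADDR_USER (0x20000000), so the #error above
 * cannot trigger with these values.
 */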
/*
@ -142,22 +141,12 @@
#define HW_CAP_GOLDEN 0x00000400
#define HW_CAP_TPC 0x00000800
#define CPU_PKT_SHIFT 5
#define CPU_PKT_SIZE (1 << CPU_PKT_SHIFT)
#define CPU_PKT_MASK (~((1 << CPU_PKT_SHIFT) - 1))
#define CPU_MAX_PKTS_IN_CB 32
#define CPU_CB_SIZE (CPU_PKT_SIZE * CPU_MAX_PKTS_IN_CB)
#define CPU_ACCESSIBLE_MEM_SIZE (HL_QUEUE_LENGTH * CPU_CB_SIZE)
enum goya_fw_component {
FW_COMP_UBOOT,
FW_COMP_PREBOOT
};
struct goya_device {
int (*test_cpu_queue)(struct hl_device *hdev);
int (*armcp_info_get)(struct hl_device *hdev);
/* TODO: remove hw_queues_lock after moving to scheduler code */
spinlock_t hw_queues_lock;
@ -170,13 +159,34 @@ struct goya_device {
u32 hw_cap_initialized;
};
void goya_get_fixed_properties(struct hl_device *hdev);
int goya_mmu_init(struct hl_device *hdev);
void goya_init_dma_qmans(struct hl_device *hdev);
void goya_init_mme_qmans(struct hl_device *hdev);
void goya_init_tpc_qmans(struct hl_device *hdev);
int goya_init_cpu_queues(struct hl_device *hdev);
void goya_init_security(struct hl_device *hdev);
int goya_late_init(struct hl_device *hdev);
void goya_late_fini(struct hl_device *hdev);
void goya_ring_doorbell(struct hl_device *hdev, u32 hw_queue_id, u32 pi);
void goya_flush_pq_write(struct hl_device *hdev, u64 *pq, u64 exp_val);
void goya_update_eq_ci(struct hl_device *hdev, u32 val);
void goya_restore_phase_topology(struct hl_device *hdev);
int goya_context_switch(struct hl_device *hdev, u32 asid);
int goya_debugfs_i2c_read(struct hl_device *hdev, u8 i2c_bus,
u8 i2c_addr, u8 i2c_reg, u32 *val);
int goya_debugfs_i2c_write(struct hl_device *hdev, u8 i2c_bus,
u8 i2c_addr, u8 i2c_reg, u32 val);
void goya_debugfs_led_set(struct hl_device *hdev, u8 led, u8 state);
int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id);
int goya_test_queues(struct hl_device *hdev);
int goya_test_cpu_queue(struct hl_device *hdev);
int goya_send_cpu_message(struct hl_device *hdev, u32 *msg, u16 len,
u32 timeout, long *result);
long goya_get_temperature(struct hl_device *hdev, int sensor_index, u32 attr);
long goya_get_voltage(struct hl_device *hdev, int sensor_index, u32 attr);
long goya_get_current(struct hl_device *hdev, int sensor_index, u32 attr);
@ -184,28 +194,35 @@ long goya_get_fan_speed(struct hl_device *hdev, int sensor_index, u32 attr);
long goya_get_pwm_info(struct hl_device *hdev, int sensor_index, u32 attr);
void goya_set_pwm_info(struct hl_device *hdev, int sensor_index, u32 attr,
long value);
void goya_debugfs_led_set(struct hl_device *hdev, u8 led, u8 state);
void goya_set_pll_profile(struct hl_device *hdev, enum hl_pll_frequency freq);
void goya_add_device_attr(struct hl_device *hdev,
struct attribute_group *dev_attr_grp);
void goya_init_security(struct hl_device *hdev);
u64 goya_get_max_power(struct hl_device *hdev);
void goya_set_max_power(struct hl_device *hdev, u64 value);
int goya_send_pci_access_msg(struct hl_device *hdev, u32 opcode);
void goya_late_fini(struct hl_device *hdev);
void goya_set_pll_profile(struct hl_device *hdev, enum hl_pll_frequency freq);
void goya_add_device_attr(struct hl_device *hdev,
struct attribute_group *dev_attr_grp);
int goya_armcp_info_get(struct hl_device *hdev);
int goya_debug_coresight(struct hl_device *hdev, void *data);
void goya_mmu_prepare(struct hl_device *hdev, u32 asid);
int goya_mmu_clear_pgt_range(struct hl_device *hdev);
int goya_mmu_set_dram_default_page(struct hl_device *hdev);
int goya_suspend(struct hl_device *hdev);
int goya_resume(struct hl_device *hdev);
void goya_flush_pq_write(struct hl_device *hdev, u64 *pq, u64 exp_val);
void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry);
void *goya_get_events_stat(struct hl_device *hdev, u32 *size);
void goya_add_end_of_cb_packets(u64 kernel_address, u32 len, u64 cq_addr,
u32 cq_val, u32 msix_vec);
int goya_cs_parser(struct hl_device *hdev, struct hl_cs_parser *parser);
void *goya_get_int_queue_base(struct hl_device *hdev, u32 queue_id,
dma_addr_t *dma_handle, u16 *queue_len);
dma_addr_t *dma_handle, u16 *queue_len);
u32 goya_get_dma_desc_list_size(struct hl_device *hdev, struct sg_table *sgt);
int goya_test_queue(struct hl_device *hdev, u32 hw_queue_id);
int goya_send_heartbeat(struct hl_device *hdev);
void *goya_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
dma_addr_t *dma_handle);
void goya_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
void *vaddr);
#endif /* GOYAP_H_ */

View File

@ -0,0 +1,628 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2016-2019 HabanaLabs, Ltd.
* All Rights Reserved.
*/
#include "goyaP.h"
#include "include/goya/goya_coresight.h"
#include "include/goya/asic_reg/goya_regs.h"
#include <uapi/misc/habanalabs.h>
#include <linux/coresight.h>
#define GOYA_PLDM_CORESIGHT_TIMEOUT_USEC (CORESIGHT_TIMEOUT_USEC * 100)
static u64 debug_stm_regs[GOYA_STM_LAST + 1] = {
[GOYA_STM_CPU] = mmCPU_STM_BASE,
[GOYA_STM_DMA_CH_0_CS] = mmDMA_CH_0_CS_STM_BASE,
[GOYA_STM_DMA_CH_1_CS] = mmDMA_CH_1_CS_STM_BASE,
[GOYA_STM_DMA_CH_2_CS] = mmDMA_CH_2_CS_STM_BASE,
[GOYA_STM_DMA_CH_3_CS] = mmDMA_CH_3_CS_STM_BASE,
[GOYA_STM_DMA_CH_4_CS] = mmDMA_CH_4_CS_STM_BASE,
[GOYA_STM_DMA_MACRO_CS] = mmDMA_MACRO_CS_STM_BASE,
[GOYA_STM_MME1_SBA] = mmMME1_SBA_STM_BASE,
[GOYA_STM_MME3_SBB] = mmMME3_SBB_STM_BASE,
[GOYA_STM_MME4_WACS2] = mmMME4_WACS2_STM_BASE,
[GOYA_STM_MME4_WACS] = mmMME4_WACS_STM_BASE,
[GOYA_STM_MMU_CS] = mmMMU_CS_STM_BASE,
[GOYA_STM_PCIE] = mmPCIE_STM_BASE,
[GOYA_STM_PSOC] = mmPSOC_STM_BASE,
[GOYA_STM_TPC0_EML] = mmTPC0_EML_STM_BASE,
[GOYA_STM_TPC1_EML] = mmTPC1_EML_STM_BASE,
[GOYA_STM_TPC2_EML] = mmTPC2_EML_STM_BASE,
[GOYA_STM_TPC3_EML] = mmTPC3_EML_STM_BASE,
[GOYA_STM_TPC4_EML] = mmTPC4_EML_STM_BASE,
[GOYA_STM_TPC5_EML] = mmTPC5_EML_STM_BASE,
[GOYA_STM_TPC6_EML] = mmTPC6_EML_STM_BASE,
[GOYA_STM_TPC7_EML] = mmTPC7_EML_STM_BASE
};
static u64 debug_etf_regs[GOYA_ETF_LAST + 1] = {
[GOYA_ETF_CPU_0] = mmCPU_ETF_0_BASE,
[GOYA_ETF_CPU_1] = mmCPU_ETF_1_BASE,
[GOYA_ETF_CPU_TRACE] = mmCPU_ETF_TRACE_BASE,
[GOYA_ETF_DMA_CH_0_CS] = mmDMA_CH_0_CS_ETF_BASE,
[GOYA_ETF_DMA_CH_1_CS] = mmDMA_CH_1_CS_ETF_BASE,
[GOYA_ETF_DMA_CH_2_CS] = mmDMA_CH_2_CS_ETF_BASE,
[GOYA_ETF_DMA_CH_3_CS] = mmDMA_CH_3_CS_ETF_BASE,
[GOYA_ETF_DMA_CH_4_CS] = mmDMA_CH_4_CS_ETF_BASE,
[GOYA_ETF_DMA_MACRO_CS] = mmDMA_MACRO_CS_ETF_BASE,
[GOYA_ETF_MME1_SBA] = mmMME1_SBA_ETF_BASE,
[GOYA_ETF_MME3_SBB] = mmMME3_SBB_ETF_BASE,
[GOYA_ETF_MME4_WACS2] = mmMME4_WACS2_ETF_BASE,
[GOYA_ETF_MME4_WACS] = mmMME4_WACS_ETF_BASE,
[GOYA_ETF_MMU_CS] = mmMMU_CS_ETF_BASE,
[GOYA_ETF_PCIE] = mmPCIE_ETF_BASE,
[GOYA_ETF_PSOC] = mmPSOC_ETF_BASE,
[GOYA_ETF_TPC0_EML] = mmTPC0_EML_ETF_BASE,
[GOYA_ETF_TPC1_EML] = mmTPC1_EML_ETF_BASE,
[GOYA_ETF_TPC2_EML] = mmTPC2_EML_ETF_BASE,
[GOYA_ETF_TPC3_EML] = mmTPC3_EML_ETF_BASE,
[GOYA_ETF_TPC4_EML] = mmTPC4_EML_ETF_BASE,
[GOYA_ETF_TPC5_EML] = mmTPC5_EML_ETF_BASE,
[GOYA_ETF_TPC6_EML] = mmTPC6_EML_ETF_BASE,
[GOYA_ETF_TPC7_EML] = mmTPC7_EML_ETF_BASE
};
static u64 debug_funnel_regs[GOYA_FUNNEL_LAST + 1] = {
[GOYA_FUNNEL_CPU] = mmCPU_FUNNEL_BASE,
[GOYA_FUNNEL_DMA_CH_6_1] = mmDMA_CH_FUNNEL_6_1_BASE,
[GOYA_FUNNEL_DMA_MACRO_3_1] = mmDMA_MACRO_FUNNEL_3_1_BASE,
[GOYA_FUNNEL_MME0_RTR] = mmMME0_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_MME1_RTR] = mmMME1_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_MME2_RTR] = mmMME2_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_MME3_RTR] = mmMME3_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_MME4_RTR] = mmMME4_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_MME5_RTR] = mmMME5_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_PCIE] = mmPCIE_FUNNEL_BASE,
[GOYA_FUNNEL_PSOC] = mmPSOC_FUNNEL_BASE,
[GOYA_FUNNEL_TPC0_EML] = mmTPC0_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC1_EML] = mmTPC1_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC1_RTR] = mmTPC1_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC2_EML] = mmTPC2_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC2_RTR] = mmTPC2_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC3_EML] = mmTPC3_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC3_RTR] = mmTPC3_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC4_EML] = mmTPC4_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC4_RTR] = mmTPC4_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC5_EML] = mmTPC5_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC5_RTR] = mmTPC5_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC6_EML] = mmTPC6_EML_FUNNEL_BASE,
[GOYA_FUNNEL_TPC6_RTR] = mmTPC6_RTR_FUNNEL_BASE,
[GOYA_FUNNEL_TPC7_EML] = mmTPC7_EML_FUNNEL_BASE
};
static u64 debug_bmon_regs[GOYA_BMON_LAST + 1] = {
[GOYA_BMON_CPU_RD] = mmCPU_RD_BMON_BASE,
[GOYA_BMON_CPU_WR] = mmCPU_WR_BMON_BASE,
[GOYA_BMON_DMA_CH_0_0] = mmDMA_CH_0_BMON_0_BASE,
[GOYA_BMON_DMA_CH_0_1] = mmDMA_CH_0_BMON_1_BASE,
[GOYA_BMON_DMA_CH_1_0] = mmDMA_CH_1_BMON_0_BASE,
[GOYA_BMON_DMA_CH_1_1] = mmDMA_CH_1_BMON_1_BASE,
[GOYA_BMON_DMA_CH_2_0] = mmDMA_CH_2_BMON_0_BASE,
[GOYA_BMON_DMA_CH_2_1] = mmDMA_CH_2_BMON_1_BASE,
[GOYA_BMON_DMA_CH_3_0] = mmDMA_CH_3_BMON_0_BASE,
[GOYA_BMON_DMA_CH_3_1] = mmDMA_CH_3_BMON_1_BASE,
[GOYA_BMON_DMA_CH_4_0] = mmDMA_CH_4_BMON_0_BASE,
[GOYA_BMON_DMA_CH_4_1] = mmDMA_CH_4_BMON_1_BASE,
[GOYA_BMON_DMA_MACRO_0] = mmDMA_MACRO_BMON_0_BASE,
[GOYA_BMON_DMA_MACRO_1] = mmDMA_MACRO_BMON_1_BASE,
[GOYA_BMON_DMA_MACRO_2] = mmDMA_MACRO_BMON_2_BASE,
[GOYA_BMON_DMA_MACRO_3] = mmDMA_MACRO_BMON_3_BASE,
[GOYA_BMON_DMA_MACRO_4] = mmDMA_MACRO_BMON_4_BASE,
[GOYA_BMON_DMA_MACRO_5] = mmDMA_MACRO_BMON_5_BASE,
[GOYA_BMON_DMA_MACRO_6] = mmDMA_MACRO_BMON_6_BASE,
[GOYA_BMON_DMA_MACRO_7] = mmDMA_MACRO_BMON_7_BASE,
[GOYA_BMON_MME1_SBA_0] = mmMME1_SBA_BMON0_BASE,
[GOYA_BMON_MME1_SBA_1] = mmMME1_SBA_BMON1_BASE,
[GOYA_BMON_MME3_SBB_0] = mmMME3_SBB_BMON0_BASE,
[GOYA_BMON_MME3_SBB_1] = mmMME3_SBB_BMON1_BASE,
[GOYA_BMON_MME4_WACS2_0] = mmMME4_WACS2_BMON0_BASE,
[GOYA_BMON_MME4_WACS2_1] = mmMME4_WACS2_BMON1_BASE,
[GOYA_BMON_MME4_WACS2_2] = mmMME4_WACS2_BMON2_BASE,
[GOYA_BMON_MME4_WACS_0] = mmMME4_WACS_BMON0_BASE,
[GOYA_BMON_MME4_WACS_1] = mmMME4_WACS_BMON1_BASE,
[GOYA_BMON_MME4_WACS_2] = mmMME4_WACS_BMON2_BASE,
[GOYA_BMON_MME4_WACS_3] = mmMME4_WACS_BMON3_BASE,
[GOYA_BMON_MME4_WACS_4] = mmMME4_WACS_BMON4_BASE,
[GOYA_BMON_MME4_WACS_5] = mmMME4_WACS_BMON5_BASE,
[GOYA_BMON_MME4_WACS_6] = mmMME4_WACS_BMON6_BASE,
[GOYA_BMON_MMU_0] = mmMMU_BMON_0_BASE,
[GOYA_BMON_MMU_1] = mmMMU_BMON_1_BASE,
[GOYA_BMON_PCIE_MSTR_RD] = mmPCIE_BMON_MSTR_RD_BASE,
[GOYA_BMON_PCIE_MSTR_WR] = mmPCIE_BMON_MSTR_WR_BASE,
[GOYA_BMON_PCIE_SLV_RD] = mmPCIE_BMON_SLV_RD_BASE,
[GOYA_BMON_PCIE_SLV_WR] = mmPCIE_BMON_SLV_WR_BASE,
[GOYA_BMON_TPC0_EML_0] = mmTPC0_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC0_EML_1] = mmTPC0_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC0_EML_2] = mmTPC0_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC0_EML_3] = mmTPC0_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC1_EML_0] = mmTPC1_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC1_EML_1] = mmTPC1_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC1_EML_2] = mmTPC1_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC1_EML_3] = mmTPC1_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC2_EML_0] = mmTPC2_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC2_EML_1] = mmTPC2_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC2_EML_2] = mmTPC2_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC2_EML_3] = mmTPC2_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC3_EML_0] = mmTPC3_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC3_EML_1] = mmTPC3_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC3_EML_2] = mmTPC3_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC3_EML_3] = mmTPC3_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC4_EML_0] = mmTPC4_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC4_EML_1] = mmTPC4_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC4_EML_2] = mmTPC4_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC4_EML_3] = mmTPC4_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC5_EML_0] = mmTPC5_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC5_EML_1] = mmTPC5_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC5_EML_2] = mmTPC5_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC5_EML_3] = mmTPC5_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC6_EML_0] = mmTPC6_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC6_EML_1] = mmTPC6_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC6_EML_2] = mmTPC6_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC6_EML_3] = mmTPC6_EML_BUSMON_3_BASE,
[GOYA_BMON_TPC7_EML_0] = mmTPC7_EML_BUSMON_0_BASE,
[GOYA_BMON_TPC7_EML_1] = mmTPC7_EML_BUSMON_1_BASE,
[GOYA_BMON_TPC7_EML_2] = mmTPC7_EML_BUSMON_2_BASE,
[GOYA_BMON_TPC7_EML_3] = mmTPC7_EML_BUSMON_3_BASE
};
static u64 debug_spmu_regs[GOYA_SPMU_LAST + 1] = {
[GOYA_SPMU_DMA_CH_0_CS] = mmDMA_CH_0_CS_SPMU_BASE,
[GOYA_SPMU_DMA_CH_1_CS] = mmDMA_CH_1_CS_SPMU_BASE,
[GOYA_SPMU_DMA_CH_2_CS] = mmDMA_CH_2_CS_SPMU_BASE,
[GOYA_SPMU_DMA_CH_3_CS] = mmDMA_CH_3_CS_SPMU_BASE,
[GOYA_SPMU_DMA_CH_4_CS] = mmDMA_CH_4_CS_SPMU_BASE,
[GOYA_SPMU_DMA_MACRO_CS] = mmDMA_MACRO_CS_SPMU_BASE,
[GOYA_SPMU_MME1_SBA] = mmMME1_SBA_SPMU_BASE,
[GOYA_SPMU_MME3_SBB] = mmMME3_SBB_SPMU_BASE,
[GOYA_SPMU_MME4_WACS2] = mmMME4_WACS2_SPMU_BASE,
[GOYA_SPMU_MME4_WACS] = mmMME4_WACS_SPMU_BASE,
[GOYA_SPMU_MMU_CS] = mmMMU_CS_SPMU_BASE,
[GOYA_SPMU_PCIE] = mmPCIE_SPMU_BASE,
[GOYA_SPMU_TPC0_EML] = mmTPC0_EML_SPMU_BASE,
[GOYA_SPMU_TPC1_EML] = mmTPC1_EML_SPMU_BASE,
[GOYA_SPMU_TPC2_EML] = mmTPC2_EML_SPMU_BASE,
[GOYA_SPMU_TPC3_EML] = mmTPC3_EML_SPMU_BASE,
[GOYA_SPMU_TPC4_EML] = mmTPC4_EML_SPMU_BASE,
[GOYA_SPMU_TPC5_EML] = mmTPC5_EML_SPMU_BASE,
[GOYA_SPMU_TPC6_EML] = mmTPC6_EML_SPMU_BASE,
[GOYA_SPMU_TPC7_EML] = mmTPC7_EML_SPMU_BASE
};
static int goya_coresight_timeout(struct hl_device *hdev, u64 addr,
int position, bool up)
{
int rc;
u32 val, timeout_usec;
if (hdev->pldm)
timeout_usec = GOYA_PLDM_CORESIGHT_TIMEOUT_USEC;
else
timeout_usec = CORESIGHT_TIMEOUT_USEC;
rc = hl_poll_timeout(
hdev,
addr,
val,
up ? val & BIT(position) : !(val & BIT(position)),
1000,
timeout_usec);
if (rc) {
dev_err(hdev->dev,
"Timeout while waiting for coresight, addr: 0x%llx, position: %d, up: %d\n",
addr, position, up);
return -EFAULT;
}
return 0;
}
static int goya_config_stm(struct hl_device *hdev,
struct hl_debug_params *params)
{
struct hl_debug_params_stm *input;
u64 base_reg = debug_stm_regs[params->reg_idx] - CFG_BASE;
int rc;
WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK);
if (params->enable) {
input = params->input;
if (!input)
return -EINVAL;
WREG32(base_reg + 0xE80, 0x80004);
WREG32(base_reg + 0xD64, 7);
WREG32(base_reg + 0xD60, 0);
WREG32(base_reg + 0xD00, lower_32_bits(input->he_mask));
WREG32(base_reg + 0xD20, lower_32_bits(input->sp_mask));
WREG32(base_reg + 0xD60, 1);
WREG32(base_reg + 0xD00, upper_32_bits(input->he_mask));
WREG32(base_reg + 0xD20, upper_32_bits(input->sp_mask));
WREG32(base_reg + 0xE70, 0x10);
WREG32(base_reg + 0xE60, 0);
WREG32(base_reg + 0xE64, 0x420000);
WREG32(base_reg + 0xE00, 0xFFFFFFFF);
WREG32(base_reg + 0xE20, 0xFFFFFFFF);
WREG32(base_reg + 0xEF4, input->id);
WREG32(base_reg + 0xDF4, 0x80);
WREG32(base_reg + 0xE8C, input->frequency);
WREG32(base_reg + 0xE90, 0x7FF);
WREG32(base_reg + 0xE80, 0x7 | (input->id << 16));
} else {
WREG32(base_reg + 0xE80, 4);
WREG32(base_reg + 0xD64, 0);
WREG32(base_reg + 0xD60, 1);
WREG32(base_reg + 0xD00, 0);
WREG32(base_reg + 0xD20, 0);
WREG32(base_reg + 0xD60, 0);
WREG32(base_reg + 0xE20, 0);
WREG32(base_reg + 0xE00, 0);
WREG32(base_reg + 0xDF4, 0x80);
WREG32(base_reg + 0xE70, 0);
WREG32(base_reg + 0xE60, 0);
WREG32(base_reg + 0xE64, 0);
WREG32(base_reg + 0xE8C, 0);
rc = goya_coresight_timeout(hdev, base_reg + 0xE80, 23, false);
if (rc) {
dev_err(hdev->dev,
"Failed to disable STM on timeout, error %d\n",
rc);
return rc;
}
WREG32(base_reg + 0xE80, 4);
}
return 0;
}
static int goya_config_etf(struct hl_device *hdev,
struct hl_debug_params *params)
{
struct hl_debug_params_etf *input;
u64 base_reg = debug_etf_regs[params->reg_idx] - CFG_BASE;
u32 val;
int rc;
WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK);
val = RREG32(base_reg + 0x304);
val |= 0x1000;
WREG32(base_reg + 0x304, val);
val |= 0x40;
WREG32(base_reg + 0x304, val);
rc = goya_coresight_timeout(hdev, base_reg + 0x304, 6, false);
if (rc) {
dev_err(hdev->dev,
"Failed to %s ETF on timeout, error %d\n",
params->enable ? "enable" : "disable", rc);
return rc;
}
rc = goya_coresight_timeout(hdev, base_reg + 0xC, 2, true);
if (rc) {
dev_err(hdev->dev,
"Failed to %s ETF on timeout, error %d\n",
params->enable ? "enable" : "disable", rc);
return rc;
}
WREG32(base_reg + 0x20, 0);
if (params->enable) {
input = params->input;
if (!input)
return -EINVAL;
WREG32(base_reg + 0x34, 0x3FFC);
WREG32(base_reg + 0x28, input->sink_mode);
WREG32(base_reg + 0x304, 0x4001);
WREG32(base_reg + 0x308, 0xA);
WREG32(base_reg + 0x20, 1);
} else {
WREG32(base_reg + 0x34, 0);
WREG32(base_reg + 0x28, 0);
WREG32(base_reg + 0x304, 0);
}
return 0;
}
static int goya_etr_validate_address(struct hl_device *hdev, u64 addr,
u32 size)
{
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 range_start, range_end;
if (hdev->mmu_enable) {
range_start = prop->va_space_dram_start_address;
range_end = prop->va_space_dram_end_address;
} else {
range_start = prop->dram_user_base_address;
range_end = prop->dram_end_address;
}
return hl_mem_area_inside_range(addr, size, range_start, range_end);
}
static int goya_config_etr(struct hl_device *hdev,
struct hl_debug_params *params)
{
struct hl_debug_params_etr *input;
u64 base_reg = mmPSOC_ETR_BASE - CFG_BASE;
u32 val;
int rc;
WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK);
val = RREG32(base_reg + 0x304);
val |= 0x1000;
WREG32(base_reg + 0x304, val);
val |= 0x40;
WREG32(base_reg + 0x304, val);
rc = goya_coresight_timeout(hdev, base_reg + 0x304, 6, false);
if (rc) {
dev_err(hdev->dev, "Failed to %s ETR on timeout, error %d\n",
params->enable ? "enable" : "disable", rc);
return rc;
}
rc = goya_coresight_timeout(hdev, base_reg + 0xC, 2, true);
if (rc) {
dev_err(hdev->dev, "Failed to %s ETR on timeout, error %d\n",
params->enable ? "enable" : "disable", rc);
return rc;
}
WREG32(base_reg + 0x20, 0);
if (params->enable) {
input = params->input;
if (!input)
return -EINVAL;
if (input->buffer_size == 0) {
dev_err(hdev->dev,
"ETR buffer size should be bigger than 0\n");
return -EINVAL;
}
if (!goya_etr_validate_address(hdev,
input->buffer_address, input->buffer_size)) {
dev_err(hdev->dev, "buffer address is not valid\n");
return -EINVAL;
}
WREG32(base_reg + 0x34, 0x3FFC);
WREG32(base_reg + 0x4, input->buffer_size);
WREG32(base_reg + 0x28, input->sink_mode);
WREG32(base_reg + 0x110, 0x700);
WREG32(base_reg + 0x118,
lower_32_bits(input->buffer_address));
WREG32(base_reg + 0x11C,
upper_32_bits(input->buffer_address));
WREG32(base_reg + 0x304, 3);
WREG32(base_reg + 0x308, 0xA);
WREG32(base_reg + 0x20, 1);
} else {
WREG32(base_reg + 0x34, 0);
WREG32(base_reg + 0x4, 0x400);
WREG32(base_reg + 0x118, 0);
WREG32(base_reg + 0x11C, 0);
WREG32(base_reg + 0x308, 0);
WREG32(base_reg + 0x28, 0);
WREG32(base_reg + 0x304, 0);
if (params->output_size >= sizeof(u32))
*(u32 *) params->output = RREG32(base_reg + 0x18);
}
return 0;
}
static int goya_config_funnel(struct hl_device *hdev,
struct hl_debug_params *params)
{
WREG32(debug_funnel_regs[params->reg_idx] - CFG_BASE + 0xFB0,
CORESIGHT_UNLOCK);
WREG32(debug_funnel_regs[params->reg_idx] - CFG_BASE,
params->enable ? 0x33F : 0);
return 0;
}
static int goya_config_bmon(struct hl_device *hdev,
struct hl_debug_params *params)
{
struct hl_debug_params_bmon *input;
u64 base_reg = debug_bmon_regs[params->reg_idx] - CFG_BASE;
u32 pcie_base = 0;
WREG32(base_reg + 0x104, 1);
if (params->enable) {
input = params->input;
if (!input)
return -EINVAL;
WREG32(base_reg + 0x200, lower_32_bits(input->start_addr0));
WREG32(base_reg + 0x204, upper_32_bits(input->start_addr0));
WREG32(base_reg + 0x208, lower_32_bits(input->addr_mask0));
WREG32(base_reg + 0x20C, upper_32_bits(input->addr_mask0));
WREG32(base_reg + 0x240, lower_32_bits(input->start_addr1));
WREG32(base_reg + 0x244, upper_32_bits(input->start_addr1));
WREG32(base_reg + 0x248, lower_32_bits(input->addr_mask1));
WREG32(base_reg + 0x24C, upper_32_bits(input->addr_mask1));
WREG32(base_reg + 0x224, 0);
WREG32(base_reg + 0x234, 0);
WREG32(base_reg + 0x30C, input->bw_win);
WREG32(base_reg + 0x308, input->win_capture);
/* PCIE IF BMON bug WA */
if (params->reg_idx != GOYA_BMON_PCIE_MSTR_RD &&
params->reg_idx != GOYA_BMON_PCIE_MSTR_WR &&
params->reg_idx != GOYA_BMON_PCIE_SLV_RD &&
params->reg_idx != GOYA_BMON_PCIE_SLV_WR)
pcie_base = 0xA000000;
WREG32(base_reg + 0x700, pcie_base | 0xB00 | (input->id << 12));
WREG32(base_reg + 0x708, pcie_base | 0xA00 | (input->id << 12));
WREG32(base_reg + 0x70C, pcie_base | 0xC00 | (input->id << 12));
WREG32(base_reg + 0x100, 0x11);
WREG32(base_reg + 0x304, 0x1);
} else {
WREG32(base_reg + 0x200, 0);
WREG32(base_reg + 0x204, 0);
WREG32(base_reg + 0x208, 0xFFFFFFFF);
WREG32(base_reg + 0x20C, 0xFFFFFFFF);
WREG32(base_reg + 0x240, 0);
WREG32(base_reg + 0x244, 0);
WREG32(base_reg + 0x248, 0xFFFFFFFF);
WREG32(base_reg + 0x24C, 0xFFFFFFFF);
WREG32(base_reg + 0x224, 0xFFFFFFFF);
WREG32(base_reg + 0x234, 0x1070F);
WREG32(base_reg + 0x30C, 0);
WREG32(base_reg + 0x308, 0xFFFF);
WREG32(base_reg + 0x700, 0xA000B00);
WREG32(base_reg + 0x708, 0xA000A00);
WREG32(base_reg + 0x70C, 0xA000C00);
WREG32(base_reg + 0x100, 1);
WREG32(base_reg + 0x304, 0);
WREG32(base_reg + 0x104, 0);
}
return 0;
}
static int goya_config_spmu(struct hl_device *hdev,
struct hl_debug_params *params)
{
u64 base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE;
struct hl_debug_params_spmu *input = params->input;
u64 *output;
u32 output_arr_len;
u32 events_num;
u32 overflow_idx;
u32 cycle_cnt_idx;
int i;
if (params->enable) {
input = params->input;
if (!input)
return -EINVAL;
if (input->event_types_num < 3) {
dev_err(hdev->dev,
"not enough values for SPMU enable\n");
return -EINVAL;
}
WREG32(base_reg + 0xE04, 0x41013046);
WREG32(base_reg + 0xE04, 0x41013040);
for (i = 0 ; i < input->event_types_num ; i++)
WREG32(base_reg + 0x400 + i * 4, input->event_types[i]);
WREG32(base_reg + 0xE04, 0x41013041);
WREG32(base_reg + 0xC00, 0x8000003F);
} else {
output = params->output;
output_arr_len = params->output_size / 8;
events_num = output_arr_len - 2;
overflow_idx = output_arr_len - 2;
cycle_cnt_idx = output_arr_len - 1;
if (!output)
return -EINVAL;
if (output_arr_len < 3) {
dev_err(hdev->dev,
"not enough values for SPMU disable\n");
return -EINVAL;
}
WREG32(base_reg + 0xE04, 0x41013040);
for (i = 0 ; i < events_num ; i++)
output[i] = RREG32(base_reg + i * 8);
output[overflow_idx] = RREG32(base_reg + 0xCC0);
output[cycle_cnt_idx] = RREG32(base_reg + 0xFC);
output[cycle_cnt_idx] <<= 32;
output[cycle_cnt_idx] |= RREG32(base_reg + 0xF8);
WREG32(base_reg + 0xCC0, 0);
}
return 0;
}
static int goya_config_timestamp(struct hl_device *hdev,
struct hl_debug_params *params)
{
WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 0);
if (params->enable) {
WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0xC, 0);
WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE + 0x8, 0);
WREG32(mmPSOC_TIMESTAMP_BASE - CFG_BASE, 1);
}
return 0;
}
int goya_debug_coresight(struct hl_device *hdev, void *data)
{
struct hl_debug_params *params = data;
u32 val;
int rc;
switch (params->op) {
case HL_DEBUG_OP_STM:
rc = goya_config_stm(hdev, params);
break;
case HL_DEBUG_OP_ETF:
rc = goya_config_etf(hdev, params);
break;
case HL_DEBUG_OP_ETR:
rc = goya_config_etr(hdev, params);
break;
case HL_DEBUG_OP_FUNNEL:
rc = goya_config_funnel(hdev, params);
break;
case HL_DEBUG_OP_BMON:
rc = goya_config_bmon(hdev, params);
break;
case HL_DEBUG_OP_SPMU:
rc = goya_config_spmu(hdev, params);
break;
case HL_DEBUG_OP_TIMESTAMP:
rc = goya_config_timestamp(hdev, params);
break;
default:
dev_err(hdev->dev, "Unknown coresight id %d\n", params->op);
return -EINVAL;
}
/* Perform read from the device to flush all configuration */
val = RREG32(mmPCIE_DBI_DEVICE_ID_VENDOR_ID_REG);
return rc;
}

View File

@ -6,6 +6,7 @@
*/
#include "goyaP.h"
#include "include/goya/asic_reg/goya_regs.h"
/*
* goya_set_block_as_protected - set the given block as protected
@ -2159,6 +2160,8 @@ static void goya_init_protection_bits(struct hl_device *hdev)
* Bits 7-11 represent the word offset inside the 128 bytes.
* Bits 2-6 represent the bit location inside the word.
*/
u32 pb_addr, mask;
u8 word_offset;
goya_pb_set_block(hdev, mmPCI_NRTR_BASE);
goya_pb_set_block(hdev, mmPCI_RD_REGULATOR_BASE);
@ -2237,6 +2240,14 @@ static void goya_init_protection_bits(struct hl_device *hdev)
goya_pb_set_block(hdev, mmPCIE_AUX_BASE);
goya_pb_set_block(hdev, mmPCIE_DB_RSV_BASE);
goya_pb_set_block(hdev, mmPCIE_PHY_BASE);
goya_pb_set_block(hdev, mmTPC0_NRTR_BASE);
goya_pb_set_block(hdev, mmTPC_PLL_BASE);
pb_addr = (mmTPC_PLL_CLK_RLX_0 & ~0xFFF) + PROT_BITS_OFFS;
word_offset = ((mmTPC_PLL_CLK_RLX_0 & PROT_BITS_OFFS) >> 7) << 2;
mask = 1 << ((mmTPC_PLL_CLK_RLX_0 & 0x7C) >> 2);
WREG32(pb_addr + word_offset, mask);
goya_init_mme_protection_bits(hdev);
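As a minimal sketch of the offset arithmetic described in the comment above (the register address below is hypothetical, and PROT_BITS_OFFS is assumed to be the 0xF80 protection-bits window inside each 4KB block):

	u32 reg      = 0x123484;                  /* hypothetical register address     */
	u32 pb_addr  = (reg & ~0xFFF) + 0xF80;    /* block base + protection-bits area */
	u32 word_off = ((reg & 0xF80) >> 7) << 2; /* bits 7-11 -> word 9 -> byte 0x24  */
	u32 mask     = 1 << ((reg & 0x7C) >> 2);  /* bits 2-6  -> bit 1 inside the word */

	WREG32(pb_addr + word_off, mask);         /* protects only this one register   */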
@ -2294,8 +2305,8 @@ void goya_init_security(struct hl_device *hdev)
u32 lbw_rng10_base = 0xFCC60000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng10_mask = 0xFFFE0000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng11_base = 0xFCE00000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng11_mask = 0xFFFFC000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng11_base = 0xFCE02000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng11_mask = 0xFFFFE000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng12_base = 0xFE484000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;
u32 lbw_rng12_mask = 0xFFFFF000 & DMA_MACRO_LBW_RANGE_BASE_R_MASK;

View File

@ -11,8 +11,6 @@
#include "include/armcp_if.h"
#include "include/qman_if.h"
#define pr_fmt(fmt) "habanalabs: " fmt
#include <linux/cdev.h>
#include <linux/iopoll.h>
#include <linux/irqreturn.h>
@ -33,6 +31,9 @@
#define HL_PLL_LOW_JOB_FREQ_USEC 5000000 /* 5 s */
#define HL_ARMCP_INFO_TIMEOUT_USEC 10000000 /* 10s */
#define HL_ARMCP_EEPROM_TIMEOUT_USEC 10000000 /* 10s */
#define HL_MAX_QUEUES 128
#define HL_MAX_JOBS_PER_CS 64
@ -48,8 +49,9 @@
/**
* struct pgt_info - MMU hop page info.
* @node: hash linked-list node for the pgts hash of pgts.
* @addr: physical address of the pgt.
* @node: hash linked-list node for the pgts shadow hash of pgts.
* @phys_addr: physical address of the pgt.
* @shadow_addr: shadow hop in the host.
* @ctx: pointer to the owner ctx.
* @num_of_ptes: indicates how many ptes are used in the pgt.
*
@ -59,10 +61,11 @@
* page, it is freed with its pgt_info structure.
*/
struct pgt_info {
struct hlist_node node;
u64 addr;
struct hl_ctx *ctx;
int num_of_ptes;
struct hlist_node node;
u64 phys_addr;
u64 shadow_addr;
struct hl_ctx *ctx;
int num_of_ptes;
};
struct hl_device;
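As a hedged illustration of the shadow hash documented for pgt_info above, a lookup sketch (the helper name is illustrative and not taken from this diff; it assumes the standard <linux/hashtable.h> helpers and only the fields shown in the structure):

static struct pgt_info *example_get_pgt_info(struct hl_ctx *ctx, u64 shadow_addr)
{
	struct pgt_info *pgt;

	/* mmu_shadow_hash maps a shadow hop address to its pgt_info */
	hash_for_each_possible(ctx->mmu_shadow_hash, pgt, node, shadow_addr)
		if (pgt->shadow_addr == shadow_addr)
			return pgt;

	return NULL;
}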
@ -132,8 +135,6 @@ enum hl_device_hw_state {
* @dram_user_base_address: DRAM physical start address for user access.
* @dram_size: DRAM total size.
* @dram_pci_bar_size: size of PCI bar towards DRAM.
* @host_phys_base_address: base physical address of host memory for
* transactions that the device generates.
* @max_power_default: max power of the device after reset
* @va_space_host_start_address: base address of virtual memory range for
* mapping host memory.
@ -145,6 +146,8 @@ enum hl_device_hw_state {
* mapping DRAM memory.
* @dram_size_for_default_page_mapping: DRAM size needed to map to avoid page
* fault.
* @pcie_dbi_base_address: Base address of the PCIE_DBI block.
* @pcie_aux_dbi_reg_addr: Address of the PCIE_AUX DBI register.
* @mmu_pgt_addr: base physical address in DRAM of MMU page tables.
* @mmu_dram_default_page_addr: DRAM default page physical address.
* @mmu_pgt_size: MMU page tables total size.
@ -179,13 +182,14 @@ struct asic_fixed_properties {
u64 dram_user_base_address;
u64 dram_size;
u64 dram_pci_bar_size;
u64 host_phys_base_address;
u64 max_power_default;
u64 va_space_host_start_address;
u64 va_space_host_end_address;
u64 va_space_dram_start_address;
u64 va_space_dram_end_address;
u64 dram_size_for_default_page_mapping;
u64 pcie_dbi_base_address;
u64 pcie_aux_dbi_reg_addr;
u64 mmu_pgt_addr;
u64 mmu_dram_default_page_addr;
u32 mmu_pgt_size;
@ -314,6 +318,18 @@ struct hl_cs_job;
#define HL_EQ_LENGTH 64
#define HL_EQ_SIZE_IN_BYTES (HL_EQ_LENGTH * HL_EQ_ENTRY_SIZE)
#define HL_CPU_PKT_SHIFT 5
#define HL_CPU_PKT_SIZE (1 << HL_CPU_PKT_SHIFT)
#define HL_CPU_PKT_MASK (~((1 << HL_CPU_PKT_SHIFT) - 1))
#define HL_CPU_MAX_PKTS_IN_CB 32
#define HL_CPU_CB_SIZE (HL_CPU_PKT_SIZE * \
HL_CPU_MAX_PKTS_IN_CB)
#define HL_CPU_CB_QUEUE_SIZE (HL_QUEUE_LENGTH * HL_CPU_CB_SIZE)
/* KMD <-> ArmCP shared memory size (EQ + PQ + CPU CB queue) */
#define HL_CPU_ACCESSIBLE_MEM_SIZE (HL_EQ_SIZE_IN_BYTES + \
HL_QUEUE_SIZE_IN_BYTES + \
HL_CPU_CB_QUEUE_SIZE)
/**
* struct hl_hw_queue - describes a H/W transport queue.
@ -381,14 +397,12 @@ struct hl_eq {
/**
* enum hl_asic_type - supported ASIC types.
* @ASIC_AUTO_DETECT: ASIC type will be automatically set.
* @ASIC_GOYA: Goya device.
* @ASIC_INVALID: Invalid ASIC type.
* @ASIC_GOYA: Goya device.
*/
enum hl_asic_type {
ASIC_AUTO_DETECT,
ASIC_GOYA,
ASIC_INVALID
ASIC_INVALID,
ASIC_GOYA
};
struct hl_cs_parser;
@ -436,19 +450,19 @@ enum hl_pll_frequency {
* @cb_mmap: maps a CB.
* @ring_doorbell: increment PI on a given QMAN.
* @flush_pq_write: flush PQ entry write if necessary, WARN if flushing failed.
* @dma_alloc_coherent: Allocate coherent DMA memory by calling
* dma_alloc_coherent(). This is ASIC function because its
* implementation is not trivial when the driver is loaded
* in simulation mode (not upstreamed).
* @dma_free_coherent: Free coherent DMA memory by calling dma_free_coherent().
* This is ASIC function because its implementation is not
* trivial when the driver is loaded in simulation mode
* (not upstreamed).
* @asic_dma_alloc_coherent: Allocate coherent DMA memory by calling
* dma_alloc_coherent(). This is ASIC function because
* its implementation is not trivial when the driver
* is loaded in simulation mode (not upstreamed).
* @asic_dma_free_coherent: Free coherent DMA memory by calling
* dma_free_coherent(). This is ASIC function because
* its implementation is not trivial when the driver
* is loaded in simulation mode (not upstreamed).
* @get_int_queue_base: get the internal queue base address.
* @test_queues: run simple test on all queues for sanity check.
* @dma_pool_zalloc: small DMA allocation of coherent memory from DMA pool.
* size of allocation is HL_DMA_POOL_BLK_SIZE.
* @dma_pool_free: free small DMA allocation from pool.
* @asic_dma_pool_zalloc: small DMA allocation of coherent memory from DMA pool.
* size of allocation is HL_DMA_POOL_BLK_SIZE.
* @asic_dma_pool_free: free small DMA allocation from pool.
* @cpu_accessible_dma_pool_alloc: allocate CPU PQ packet from DMA pool.
* @cpu_accessible_dma_pool_free: free CPU PQ packet from DMA pool.
* @hl_dma_unmap_sg: DMA unmap scatter-gather list.
@ -472,8 +486,7 @@ enum hl_pll_frequency {
* @mmu_invalidate_cache_range: flush specific MMU STLB cache lines with
* ASID-VA-size mask.
* @send_heartbeat: send is-alive packet to ArmCP and verify response.
* @enable_clock_gating: enable clock gating for reducing power consumption.
* @disable_clock_gating: disable clock for accessing registers on HBW.
* @debug_coresight: perform certain actions on Coresight for debugging.
* @is_device_idle: return true if device is idle, false otherwise.
* @soft_reset_late_init: perform certain actions needed after soft reset.
* @hw_queues_lock: acquire H/W queues lock.
@ -482,6 +495,12 @@ enum hl_pll_frequency {
* @get_eeprom_data: retrieve EEPROM data from F/W.
* @send_cpu_message: send buffer to ArmCP.
* @get_hw_state: retrieve the H/W state
* @pci_bars_map: Map PCI BARs.
* @set_dram_bar_base: Set DRAM BAR to map specific device address. Returns
* old address the bar pointed to or U64_MAX for failure
* @init_iatu: Initialize the iATU unit inside the PCI controller.
* @rreg: Read a register. Needed for simulator support.
* @wreg: Write a register. Needed for simulator support.
*/
struct hl_asic_funcs {
int (*early_init)(struct hl_device *hdev);
@ -499,27 +518,27 @@ struct hl_asic_funcs {
u64 kaddress, phys_addr_t paddress, u32 size);
void (*ring_doorbell)(struct hl_device *hdev, u32 hw_queue_id, u32 pi);
void (*flush_pq_write)(struct hl_device *hdev, u64 *pq, u64 exp_val);
void* (*dma_alloc_coherent)(struct hl_device *hdev, size_t size,
void* (*asic_dma_alloc_coherent)(struct hl_device *hdev, size_t size,
dma_addr_t *dma_handle, gfp_t flag);
void (*dma_free_coherent)(struct hl_device *hdev, size_t size,
void (*asic_dma_free_coherent)(struct hl_device *hdev, size_t size,
void *cpu_addr, dma_addr_t dma_handle);
void* (*get_int_queue_base)(struct hl_device *hdev, u32 queue_id,
dma_addr_t *dma_handle, u16 *queue_len);
int (*test_queues)(struct hl_device *hdev);
void* (*dma_pool_zalloc)(struct hl_device *hdev, size_t size,
void* (*asic_dma_pool_zalloc)(struct hl_device *hdev, size_t size,
gfp_t mem_flags, dma_addr_t *dma_handle);
void (*dma_pool_free)(struct hl_device *hdev, void *vaddr,
void (*asic_dma_pool_free)(struct hl_device *hdev, void *vaddr,
dma_addr_t dma_addr);
void* (*cpu_accessible_dma_pool_alloc)(struct hl_device *hdev,
size_t size, dma_addr_t *dma_handle);
void (*cpu_accessible_dma_pool_free)(struct hl_device *hdev,
size_t size, void *vaddr);
void (*hl_dma_unmap_sg)(struct hl_device *hdev,
struct scatterlist *sg, int nents,
struct scatterlist *sgl, int nents,
enum dma_data_direction dir);
int (*cs_parser)(struct hl_device *hdev, struct hl_cs_parser *parser);
int (*asic_dma_map_sg)(struct hl_device *hdev,
struct scatterlist *sg, int nents,
struct scatterlist *sgl, int nents,
enum dma_data_direction dir);
u32 (*get_dma_desc_list_size)(struct hl_device *hdev,
struct sg_table *sgt);
@ -543,9 +562,8 @@ struct hl_asic_funcs {
void (*mmu_invalidate_cache_range)(struct hl_device *hdev, bool is_hard,
u32 asid, u64 va, u64 size);
int (*send_heartbeat)(struct hl_device *hdev);
void (*enable_clock_gating)(struct hl_device *hdev);
void (*disable_clock_gating)(struct hl_device *hdev);
bool (*is_device_idle)(struct hl_device *hdev);
int (*debug_coresight)(struct hl_device *hdev, void *data);
bool (*is_device_idle)(struct hl_device *hdev, char *buf, size_t size);
int (*soft_reset_late_init)(struct hl_device *hdev);
void (*hw_queues_lock)(struct hl_device *hdev);
void (*hw_queues_unlock)(struct hl_device *hdev);
@ -555,6 +573,11 @@ struct hl_asic_funcs {
int (*send_cpu_message)(struct hl_device *hdev, u32 *msg,
u16 len, u32 timeout, long *result);
enum hl_device_hw_state (*get_hw_state)(struct hl_device *hdev);
int (*pci_bars_map)(struct hl_device *hdev);
u64 (*set_dram_bar_base)(struct hl_device *hdev, u64 addr);
int (*init_iatu)(struct hl_device *hdev);
u32 (*rreg)(struct hl_device *hdev, u32 reg);
void (*wreg)(struct hl_device *hdev, u32 reg, u32 val);
};
@ -582,7 +605,8 @@ struct hl_va_range {
* struct hl_ctx - user/kernel context.
* @mem_hash: holds mapping from virtual address to virtual memory area
* descriptor (hl_vm_phys_pg_list or hl_userptr).
* @mmu_hash: holds a mapping from virtual address to pgt_info structure.
* @mmu_phys_hash: holds a mapping from physical address to pgt_info structure.
* @mmu_shadow_hash: holds a mapping from shadow address to pgt_info structure.
* @hpriv: pointer to the private (KMD) data of the process (fd).
* @hdev: pointer to the device structure.
* @refcount: reference counter for the context. Context is released only when
@ -601,17 +625,19 @@ struct hl_va_range {
* DRAM mapping.
* @cs_lock: spinlock to protect cs_sequence.
* @dram_phys_mem: amount of used physical DRAM memory by this context.
* @thread_restore_token: token to prevent multiple threads of the same context
* from running the restore phase. Only one thread
* should run it.
* @thread_restore_wait_token: token to prevent the threads that didn't run
* the restore phase from moving to their execution
* phase before the restore phase has finished.
* @thread_ctx_switch_token: token to prevent multiple threads of the same
* context from running the context switch phase.
* Only a single thread should run it.
* @thread_ctx_switch_wait_token: token to prevent the threads that didn't run
* the context switch phase from moving to their
* execution phase before the context switch phase
* has finished.
* @asid: context's unique address space ID in the device's MMU.
*/
struct hl_ctx {
DECLARE_HASHTABLE(mem_hash, MEM_HASH_TABLE_BITS);
DECLARE_HASHTABLE(mmu_hash, MMU_HASH_TABLE_BITS);
DECLARE_HASHTABLE(mmu_phys_hash, MMU_HASH_TABLE_BITS);
DECLARE_HASHTABLE(mmu_shadow_hash, MMU_HASH_TABLE_BITS);
struct hl_fpriv *hpriv;
struct hl_device *hdev;
struct kref refcount;
@ -625,8 +651,8 @@ struct hl_ctx {
u64 *dram_default_hops;
spinlock_t cs_lock;
atomic64_t dram_phys_mem;
atomic_t thread_restore_token;
u32 thread_restore_wait_token;
atomic_t thread_ctx_switch_token;
u32 thread_ctx_switch_wait_token;
u32 asid;
};
@ -753,8 +779,6 @@ struct hl_cs_job {
* @patched_cb_size: the size of the CB after parsing.
* @ext_queue: whether the job is for external queue or internal queue.
* @job_id: the id of the related job inside the related CS.
* @use_virt_addr: whether to treat the addresses in the CB as virtual during
* parsing.
*/
struct hl_cs_parser {
struct hl_cb *user_cb;
@ -767,7 +791,6 @@ struct hl_cs_parser {
u32 patched_cb_size;
u8 ext_queue;
u8 job_id;
u8 use_virt_addr;
};
@ -850,6 +873,29 @@ struct hl_vm {
u8 init_done;
};
/*
* DEBUG, PROFILING STRUCTURE
*/
/**
* struct hl_debug_params - Coresight debug parameters.
* @input: pointer to component specific input parameters.
* @output: pointer to component specific output parameters.
* @output_size: size of output buffer.
* @reg_idx: relevant register ID.
* @op: component operation to execute.
* @enable: true if to enable component debugging, false otherwise.
*/
struct hl_debug_params {
void *input;
void *output;
u32 output_size;
u32 reg_idx;
u32 op;
bool enable;
};
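A minimal sketch of how this structure might be populated for an STM enable request (stm_input and rc are illustrative caller-side variables; op, reg_idx and the callback are taken from the interfaces in this diff):

	struct hl_debug_params params = {
		.input       = &stm_input,	/* struct hl_debug_params_stm, filled by the caller */
		.output      = NULL,
		.output_size = 0,
		.reg_idx     = GOYA_STM_DMA_CH_0_CS,
		.op          = HL_DEBUG_OP_STM,
		.enable      = true,
	};

	rc = hdev->asic_funcs->debug_coresight(hdev, &params);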
/*
* FILE PRIVATE STRUCTURE
*/
@ -973,13 +1019,10 @@ struct hl_dbg_device_entry {
u32 hl_rreg(struct hl_device *hdev, u32 reg);
void hl_wreg(struct hl_device *hdev, u32 reg, u32 val);
#define hl_poll_timeout(hdev, addr, val, cond, sleep_us, timeout_us) \
readl_poll_timeout(hdev->rmmio + addr, val, cond, sleep_us, timeout_us)
#define RREG32(reg) hl_rreg(hdev, (reg))
#define WREG32(reg, v) hl_wreg(hdev, (reg), (v))
#define RREG32(reg) hdev->asic_funcs->rreg(hdev, (reg))
#define WREG32(reg, v) hdev->asic_funcs->wreg(hdev, (reg), (v))
#define DREG32(reg) pr_info("REGISTER: " #reg " : 0x%08X\n", \
hl_rreg(hdev, (reg)))
hdev->asic_funcs->rreg(hdev, (reg)))
#define WREG32_P(reg, val, mask) \
do { \
@ -997,6 +1040,36 @@ void hl_wreg(struct hl_device *hdev, u32 reg, u32 val);
WREG32(mm##reg, (RREG32(mm##reg) & ~REG_FIELD_MASK(reg, field)) | \
(val) << REG_FIELD_SHIFT(reg, field))
#define hl_poll_timeout(hdev, addr, val, cond, sleep_us, timeout_us) \
({ \
ktime_t __timeout; \
/* timeout should be longer when working with simulator */ \
if (hdev->pdev) \
__timeout = ktime_add_us(ktime_get(), timeout_us); \
else \
__timeout = ktime_add_us(ktime_get(), (timeout_us * 10)); \
might_sleep_if(sleep_us); \
for (;;) { \
(val) = RREG32(addr); \
if (cond) \
break; \
if (timeout_us && ktime_compare(ktime_get(), __timeout) > 0) { \
(val) = RREG32(addr); \
break; \
} \
if (sleep_us) \
usleep_range((sleep_us >> 2) + 1, sleep_us); \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
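A short usage sketch of the macro above, mirroring how goya_coresight_timeout() polls a status bit (the register offset here is hypothetical):

	u32 status, poll_addr = 0x7FFC0000;	/* hypothetical register offset */
	int rc;

	rc = hl_poll_timeout(hdev, poll_addr, status,
				status & BIT(0),	/* condition: bit 0 is set    */
				1000,			/* sleep 1000us between reads */
				100000);		/* give up after 100ms        */
	if (rc)	/* rc is -ETIMEDOUT if the condition never became true */
		dev_err(hdev->dev, "timeout while polling status register\n");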
#define HL_ENG_BUSY(buf, size, fmt, ...) ({ \
if (buf) \
snprintf(buf, size, fmt, ##__VA_ARGS__); \
false; \
})
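And a sketch of how an ASIC's is_device_idle() callback could use HL_ENG_BUSY to report which engine is busy (the engine name and register offset are illustrative, not from this diff):

static bool example_is_device_idle(struct hl_device *hdev, char *buf, size_t size)
{
	u32 dma_status = RREG32(0x408110);	/* hypothetical DMA channel status */

	if (dma_status & BIT(0))
		/* fills buf with the reason (if provided) and evaluates to false */
		return HL_ENG_BUSY(buf, size, "DMA channel 0 is busy");

	return true;
}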
struct hwmon_chip_info;
/**
@ -1047,7 +1120,8 @@ struct hl_device_reset_work {
* @asic_specific: ASIC specific information to use only from ASIC files.
* @mmu_pgt_pool: pool of available MMU hops.
* @vm: virtual memory manager for MMU.
* @mmu_cache_lock: protects MMU cache invalidation as it can serve one context
* @mmu_cache_lock: protects MMU cache invalidation as it can serve one context.
* @mmu_shadow_hop0: shadow mapping of the MMU hop 0 zone.
* @hwmon_dev: H/W monitor device.
* @pm_mng_profile: current power management profile.
* @hl_chip_info: ASIC's sensors information.
@ -1082,6 +1156,7 @@ struct hl_device_reset_work {
* @init_done: is the initialization of the device done.
* @mmu_enable: is MMU enabled.
* @device_cpu_disabled: is the device CPU disabled (due to timeouts)
* @dma_mask: the dma mask that was set for this device
*/
struct hl_device {
struct pci_dev *pdev;
@ -1117,6 +1192,7 @@ struct hl_device {
struct gen_pool *mmu_pgt_pool;
struct hl_vm vm;
struct mutex mmu_cache_lock;
void *mmu_shadow_hop0;
struct device *hwmon_dev;
enum hl_pm_mng_profile pm_mng_profile;
struct hwmon_chip_info *hl_chip_info;
@ -1151,6 +1227,7 @@ struct hl_device {
u8 dram_default_page_mapping;
u8 init_done;
u8 device_cpu_disabled;
u8 dma_mask;
/* Parameters for bring-up */
u8 mmu_enable;
@ -1245,6 +1322,7 @@ static inline bool hl_mem_area_crosses_range(u64 address, u32 size,
int hl_device_open(struct inode *inode, struct file *filp);
bool hl_device_disabled_or_in_reset(struct hl_device *hdev);
enum hl_device_status hl_device_status(struct hl_device *hdev);
int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
enum hl_asic_type asic_type, int minor);
void destroy_hdev(struct hl_device *hdev);
@ -1351,6 +1429,32 @@ int hl_mmu_unmap(struct hl_ctx *ctx, u64 virt_addr, u32 page_size);
void hl_mmu_swap_out(struct hl_ctx *ctx);
void hl_mmu_swap_in(struct hl_ctx *ctx);
int hl_fw_push_fw_to_device(struct hl_device *hdev, const char *fw_name,
void __iomem *dst);
int hl_fw_send_pci_access_msg(struct hl_device *hdev, u32 opcode);
int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
u16 len, u32 timeout, long *result);
int hl_fw_test_cpu_queue(struct hl_device *hdev);
void *hl_fw_cpu_accessible_dma_pool_alloc(struct hl_device *hdev, size_t size,
dma_addr_t *dma_handle);
void hl_fw_cpu_accessible_dma_pool_free(struct hl_device *hdev, size_t size,
void *vaddr);
int hl_fw_send_heartbeat(struct hl_device *hdev);
int hl_fw_armcp_info_get(struct hl_device *hdev);
int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size);
int hl_pci_bars_map(struct hl_device *hdev, const char * const name[3],
bool is_wc[3]);
int hl_pci_iatu_write(struct hl_device *hdev, u32 addr, u32 data);
int hl_pci_set_dram_bar_base(struct hl_device *hdev, u8 inbound_region, u8 bar,
u64 addr);
int hl_pci_init_iatu(struct hl_device *hdev, u64 sram_base_address,
u64 dram_base_address, u64 host_phys_base_address,
u64 host_phys_size);
int hl_pci_init(struct hl_device *hdev, u8 dma_mask);
void hl_pci_fini(struct hl_device *hdev);
int hl_pci_set_dma_mask(struct hl_device *hdev, u8 dma_mask);
long hl_get_frequency(struct hl_device *hdev, u32 pll_index, bool curr);
void hl_set_frequency(struct hl_device *hdev, u32 pll_index, u64 freq);
long hl_get_temperature(struct hl_device *hdev, int sensor_index, u32 attr);

View File

@ -6,6 +6,8 @@
*
*/
#define pr_fmt(fmt) "habanalabs: " fmt
#include "habanalabs.h"
#include <linux/pci.h>
@ -218,7 +220,7 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
hdev->disabled = true;
hdev->pdev = pdev; /* can be NULL in case of simulator device */
if (asic_type == ASIC_AUTO_DETECT) {
if (pdev) {
hdev->asic_type = get_asic_type(pdev->device);
if (hdev->asic_type == ASIC_INVALID) {
dev_err(&pdev->dev, "Unsupported ASIC\n");
@ -229,6 +231,9 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
hdev->asic_type = asic_type;
}
/* Set default DMA mask to 32 bits */
hdev->dma_mask = 32;
mutex_lock(&hl_devs_idr_lock);
if (minor == -1) {
@ -334,7 +339,7 @@ static int hl_pci_probe(struct pci_dev *pdev,
" device found [%04x:%04x] (rev %x)\n",
(int)pdev->vendor, (int)pdev->device, (int)pdev->revision);
rc = create_hdev(&hdev, pdev, ASIC_AUTO_DETECT, -1);
rc = create_hdev(&hdev, pdev, ASIC_INVALID, -1);
if (rc)
return rc;

View File

@ -12,6 +12,32 @@
#include <linux/uaccess.h>
#include <linux/slab.h>
static u32 hl_debug_struct_size[HL_DEBUG_OP_TIMESTAMP + 1] = {
[HL_DEBUG_OP_ETR] = sizeof(struct hl_debug_params_etr),
[HL_DEBUG_OP_ETF] = sizeof(struct hl_debug_params_etf),
[HL_DEBUG_OP_STM] = sizeof(struct hl_debug_params_stm),
[HL_DEBUG_OP_FUNNEL] = 0,
[HL_DEBUG_OP_BMON] = sizeof(struct hl_debug_params_bmon),
[HL_DEBUG_OP_SPMU] = sizeof(struct hl_debug_params_spmu),
[HL_DEBUG_OP_TIMESTAMP] = 0
};
static int device_status_info(struct hl_device *hdev, struct hl_info_args *args)
{
struct hl_info_device_status dev_stat = {0};
u32 size = args->return_size;
void __user *out = (void __user *) (uintptr_t) args->return_pointer;
if ((!size) || (!out))
return -EINVAL;
dev_stat.status = hl_device_status(hdev);
return copy_to_user(out, &dev_stat,
min((size_t)size, sizeof(dev_stat))) ? -EFAULT : 0;
}
static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
{
struct hl_info_hw_ip_info hw_ip = {0};
@ -93,21 +119,91 @@ static int hw_idle(struct hl_device *hdev, struct hl_info_args *args)
if ((!max_size) || (!out))
return -EINVAL;
hw_idle.is_idle = hdev->asic_funcs->is_device_idle(hdev);
hw_idle.is_idle = hdev->asic_funcs->is_device_idle(hdev, NULL, 0);
return copy_to_user(out, &hw_idle,
min((size_t) max_size, sizeof(hw_idle))) ? -EFAULT : 0;
}
static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args)
{
struct hl_debug_params *params;
void *input = NULL, *output = NULL;
int rc;
params = kzalloc(sizeof(*params), GFP_KERNEL);
if (!params)
return -ENOMEM;
params->reg_idx = args->reg_idx;
params->enable = args->enable;
params->op = args->op;
if (args->input_ptr && args->input_size) {
input = memdup_user((const void __user *) args->input_ptr,
args->input_size);
if (IS_ERR(input)) {
rc = PTR_ERR(input);
input = NULL;
dev_err(hdev->dev,
"error %d when copying input debug data\n", rc);
goto out;
}
params->input = input;
}
if (args->output_ptr && args->output_size) {
output = kzalloc(args->output_size, GFP_KERNEL);
if (!output) {
rc = -ENOMEM;
goto out;
}
params->output = output;
params->output_size = args->output_size;
}
rc = hdev->asic_funcs->debug_coresight(hdev, params);
if (rc) {
dev_err(hdev->dev,
"debug coresight operation failed %d\n", rc);
goto out;
}
if (output) {
if (copy_to_user((void __user *) (uintptr_t) args->output_ptr,
output,
args->output_size)) {
dev_err(hdev->dev,
"copy to user failed in debug ioctl\n");
rc = -EFAULT;
goto out;
}
}
out:
kfree(params);
kfree(output);
kfree(input);
return rc;
}
static int hl_info_ioctl(struct hl_fpriv *hpriv, void *data)
{
struct hl_info_args *args = data;
struct hl_device *hdev = hpriv->hdev;
int rc;
/* We want to return device status even if it disabled or in reset */
if (args->op == HL_INFO_DEVICE_STATUS)
return device_status_info(hdev, args);
if (hl_device_disabled_or_in_reset(hdev)) {
dev_err(hdev->dev,
"Device is disabled or in reset. Can't execute INFO IOCTL\n");
dev_warn_ratelimited(hdev->dev,
"Device is %s. Can't execute INFO IOCTL\n",
atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
return -EBUSY;
}
@ -137,6 +233,40 @@ static int hl_info_ioctl(struct hl_fpriv *hpriv, void *data)
return rc;
}
static int hl_debug_ioctl(struct hl_fpriv *hpriv, void *data)
{
struct hl_debug_args *args = data;
struct hl_device *hdev = hpriv->hdev;
int rc = 0;
if (hl_device_disabled_or_in_reset(hdev)) {
dev_warn_ratelimited(hdev->dev,
"Device is %s. Can't execute DEBUG IOCTL\n",
atomic_read(&hdev->in_reset) ? "in_reset" : "disabled");
return -EBUSY;
}
switch (args->op) {
case HL_DEBUG_OP_ETR:
case HL_DEBUG_OP_ETF:
case HL_DEBUG_OP_STM:
case HL_DEBUG_OP_FUNNEL:
case HL_DEBUG_OP_BMON:
case HL_DEBUG_OP_SPMU:
case HL_DEBUG_OP_TIMESTAMP:
args->input_size =
min(args->input_size, hl_debug_struct_size[args->op]);
rc = debug_coresight(hdev, args);
break;
default:
dev_err(hdev->dev, "Invalid request %d\n", args->op);
rc = -ENOTTY;
break;
}
return rc;
}
#define HL_IOCTL_DEF(ioctl, _func) \
[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func}
@ -145,7 +275,8 @@ static const struct hl_ioctl_desc hl_ioctls[] = {
HL_IOCTL_DEF(HL_IOCTL_CB, hl_cb_ioctl),
HL_IOCTL_DEF(HL_IOCTL_CS, hl_cs_ioctl),
HL_IOCTL_DEF(HL_IOCTL_WAIT_CS, hl_cs_wait_ioctl),
HL_IOCTL_DEF(HL_IOCTL_MEMORY, hl_mem_ioctl)
HL_IOCTL_DEF(HL_IOCTL_MEMORY, hl_mem_ioctl),
HL_IOCTL_DEF(HL_IOCTL_DEBUG, hl_debug_ioctl)
};
#define HL_CORE_IOCTL_COUNT ARRAY_SIZE(hl_ioctls)
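A hedged sketch of how a dispatcher could index this table (this is not the driver's actual hl_ioctl() body, just an illustration of the _IOC_NR lookup; cmd, hpriv and kdata are assumed caller-side variables):

	unsigned int nr = _IOC_NR(cmd);
	const struct hl_ioctl_desc *ioctl_desc;

	if (nr >= HL_CORE_IOCTL_COUNT)
		return -ENOTTY;

	ioctl_desc = &hl_ioctls[nr];
	if (!ioctl_desc->func)
		return -ENOTTY;

	return ioctl_desc->func(hpriv, kdata);	/* kdata: kernel copy of the user args */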

View File

@ -82,7 +82,7 @@ static void ext_queue_submit_bd(struct hl_device *hdev, struct hl_hw_queue *q,
bd += hl_pi_2_offset(q->pi);
bd->ctl = __cpu_to_le32(ctl);
bd->len = __cpu_to_le32(len);
bd->ptr = __cpu_to_le64(ptr + hdev->asic_prop.host_phys_base_address);
bd->ptr = __cpu_to_le64(ptr);
q->pi = hl_queue_inc_ptr(q->pi);
hdev->asic_funcs->ring_doorbell(hdev, q->hw_queue_id, q->pi);
@ -263,9 +263,7 @@ static void ext_hw_queue_schedule_job(struct hl_cs_job *job)
* checked in hl_queue_sanity_checks
*/
cq = &hdev->completion_queue[q->hw_queue_id];
cq_addr = cq->bus_address +
hdev->asic_prop.host_phys_base_address;
cq_addr += cq->pi * sizeof(struct hl_cq_entry);
cq_addr = cq->bus_address + cq->pi * sizeof(struct hl_cq_entry);
hdev->asic_funcs->add_end_of_cb_packets(cb->kernel_address, len,
cq_addr,
@ -415,14 +413,20 @@ void hl_hw_queue_inc_ci_kernel(struct hl_device *hdev, u32 hw_queue_id)
}
static int ext_and_cpu_hw_queue_init(struct hl_device *hdev,
struct hl_hw_queue *q)
struct hl_hw_queue *q, bool is_cpu_queue)
{
void *p;
int rc;
p = hdev->asic_funcs->dma_alloc_coherent(hdev,
HL_QUEUE_SIZE_IN_BYTES,
&q->bus_address, GFP_KERNEL | __GFP_ZERO);
if (is_cpu_queue)
p = hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev,
HL_QUEUE_SIZE_IN_BYTES,
&q->bus_address);
else
p = hdev->asic_funcs->asic_dma_alloc_coherent(hdev,
HL_QUEUE_SIZE_IN_BYTES,
&q->bus_address,
GFP_KERNEL | __GFP_ZERO);
if (!p)
return -ENOMEM;
@ -446,8 +450,15 @@ static int ext_and_cpu_hw_queue_init(struct hl_device *hdev,
return 0;
free_queue:
hdev->asic_funcs->dma_free_coherent(hdev, HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address, q->bus_address);
if (is_cpu_queue)
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address);
else
hdev->asic_funcs->asic_dma_free_coherent(hdev,
HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address,
q->bus_address);
return rc;
}
@ -474,12 +485,12 @@ static int int_hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
static int cpu_hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
{
return ext_and_cpu_hw_queue_init(hdev, q);
return ext_and_cpu_hw_queue_init(hdev, q, true);
}
static int ext_hw_queue_init(struct hl_device *hdev, struct hl_hw_queue *q)
{
return ext_and_cpu_hw_queue_init(hdev, q);
return ext_and_cpu_hw_queue_init(hdev, q, false);
}
/*
@ -569,8 +580,15 @@ static void hw_queue_fini(struct hl_device *hdev, struct hl_hw_queue *q)
kfree(q->shadow_queue);
hdev->asic_funcs->dma_free_coherent(hdev, HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address, q->bus_address);
if (q->queue_type == QUEUE_TYPE_CPU)
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address);
else
hdev->asic_funcs->asic_dma_free_coherent(hdev,
HL_QUEUE_SIZE_IN_BYTES,
(void *) (uintptr_t) q->kernel_address,
q->bus_address);
}
int hl_hw_queues_create(struct hl_device *hdev)

View File

@ -32,8 +32,6 @@ struct hl_eq_entry {
#define EQ_CTL_EVENT_TYPE_SHIFT 16
#define EQ_CTL_EVENT_TYPE_MASK 0x03FF0000
#define EVENT_QUEUE_MSIX_IDX 5
enum pq_init_status {
PQ_INIT_STATUS_NA = 0,
PQ_INIT_STATUS_READY_FOR_CP,

View File

@ -188,4 +188,3 @@
#define CPU_CA53_CFG_ARM_PMU_EVENT_MASK 0x3FFFFFFF
#endif /* ASIC_REG_CPU_CA53_CFG_MASKS_H_ */

View File

@ -58,4 +58,3 @@
#define mmCPU_CA53_CFG_ARM_PMU_1 0x441214
#endif /* ASIC_REG_CPU_CA53_CFG_REGS_H_ */

View File

@ -46,4 +46,3 @@
#define mmCPU_IF_AXI_SPLIT_INTR 0x442130
#endif /* ASIC_REG_CPU_IF_REGS_H_ */

View File

@ -102,4 +102,3 @@
#define mmCPU_PLL_FREQ_CALC_EN 0x4A2440
#endif /* ASIC_REG_CPU_PLL_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define mmDMA_CH_0_MEM_INIT_BUSY 0x4011FC
#endif /* ASIC_REG_DMA_CH_0_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define mmDMA_CH_1_MEM_INIT_BUSY 0x4091FC
#endif /* ASIC_REG_DMA_CH_1_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define mmDMA_CH_2_MEM_INIT_BUSY 0x4111FC
#endif /* ASIC_REG_DMA_CH_2_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define mmDMA_CH_3_MEM_INIT_BUSY 0x4191FC
#endif /* ASIC_REG_DMA_CH_3_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define mmDMA_CH_4_MEM_INIT_BUSY 0x4211FC
#endif /* ASIC_REG_DMA_CH_4_REGS_H_ */

View File

@ -102,4 +102,3 @@
#define DMA_MACRO_RAZWI_HBW_RD_ID_R_MASK 0x1FFFFFFF
#endif /* ASIC_REG_DMA_MACRO_MASKS_H_ */

View File

@ -178,4 +178,3 @@
#define mmDMA_MACRO_RAZWI_HBW_RD_ID 0x4B0158
#endif /* ASIC_REG_DMA_MACRO_REGS_H_ */

View File

@ -206,4 +206,3 @@
#define DMA_NRTR_NON_LIN_SCRAMB_EN_MASK 0x1
#endif /* ASIC_REG_DMA_NRTR_MASKS_H_ */

View File

@ -224,4 +224,3 @@
#define mmDMA_NRTR_NON_LIN_SCRAMB 0x1C0604
#endif /* ASIC_REG_DMA_NRTR_REGS_H_ */

View File

@ -462,4 +462,3 @@
#define DMA_QM_0_CQ_BUF_RDATA_VAL_MASK 0xFFFFFFFF
#endif /* ASIC_REG_DMA_QM_0_MASKS_H_ */

View File

@ -176,4 +176,3 @@
#define mmDMA_QM_0_CQ_BUF_RDATA 0x40030C
#endif /* ASIC_REG_DMA_QM_0_REGS_H_ */

View File

@ -176,4 +176,3 @@
#define mmDMA_QM_1_CQ_BUF_RDATA 0x40830C
#endif /* ASIC_REG_DMA_QM_1_REGS_H_ */

View File

@ -176,4 +176,3 @@
#define mmDMA_QM_2_CQ_BUF_RDATA 0x41030C
#endif /* ASIC_REG_DMA_QM_2_REGS_H_ */

View File

@ -176,4 +176,3 @@
#define mmDMA_QM_3_CQ_BUF_RDATA 0x41830C
#endif /* ASIC_REG_DMA_QM_3_REGS_H_ */

View File

@ -176,4 +176,3 @@
#define mmDMA_QM_4_CQ_BUF_RDATA 0x42030C
#endif /* ASIC_REG_DMA_QM_4_REGS_H_ */

View File

@ -189,18 +189,6 @@
1 << CPU_CA53_CFG_ARM_RST_CONTROL_NL2RESET_SHIFT |\
1 << CPU_CA53_CFG_ARM_RST_CONTROL_NMBISTRESET_SHIFT)
/* PCI CONFIGURATION SPACE */
#define mmPCI_CONFIG_ELBI_ADDR 0xFF0
#define mmPCI_CONFIG_ELBI_DATA 0xFF4
#define mmPCI_CONFIG_ELBI_CTRL 0xFF8
#define PCI_CONFIG_ELBI_CTRL_WRITE (1 << 31)
#define mmPCI_CONFIG_ELBI_STS 0xFFC
#define PCI_CONFIG_ELBI_STS_ERR (1 << 30)
#define PCI_CONFIG_ELBI_STS_DONE (1 << 31)
#define PCI_CONFIG_ELBI_STS_MASK (PCI_CONFIG_ELBI_STS_ERR | \
PCI_CONFIG_ELBI_STS_DONE)
#define GOYA_IRQ_HBW_ID_MASK 0x1FFF
#define GOYA_IRQ_HBW_ID_SHIFT 0
#define GOYA_IRQ_HBW_INTERNAL_ID_MASK 0xE000

View File

@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0
*
* Copyright 2016-2018 HabanaLabs, Ltd.
* Copyright 2016-2019 HabanaLabs, Ltd.
* All Rights Reserved.
*
*/
@ -12,6 +12,7 @@
#include "stlb_regs.h"
#include "mmu_regs.h"
#include "pcie_aux_regs.h"
#include "pcie_wrap_regs.h"
#include "psoc_global_conf_regs.h"
#include "psoc_spi_regs.h"
#include "psoc_mme_pll_regs.h"

View File

@ -102,4 +102,3 @@
#define mmIC_PLL_FREQ_CALC_EN 0x4A3440
#endif /* ASIC_REG_IC_PLL_REGS_H_ */

View File

@ -102,4 +102,3 @@
#define mmMC_PLL_FREQ_CALC_EN 0x4A1440
#endif /* ASIC_REG_MC_PLL_REGS_H_ */

View File

@ -650,4 +650,3 @@
#define MME1_RTR_NON_LIN_SCRAMB_EN_MASK 0x1
#endif /* ASIC_REG_MME1_RTR_MASKS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME1_RTR_NON_LIN_SCRAMB 0x40604
#endif /* ASIC_REG_MME1_RTR_REGS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME2_RTR_NON_LIN_SCRAMB 0x80604
#endif /* ASIC_REG_MME2_RTR_REGS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME3_RTR_NON_LIN_SCRAMB 0xC0604
#endif /* ASIC_REG_MME3_RTR_REGS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME4_RTR_NON_LIN_SCRAMB 0x100604
#endif /* ASIC_REG_MME4_RTR_REGS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME5_RTR_NON_LIN_SCRAMB 0x140604
#endif /* ASIC_REG_MME5_RTR_REGS_H_ */

View File

@ -328,4 +328,3 @@
#define mmMME6_RTR_NON_LIN_SCRAMB 0x180604
#endif /* ASIC_REG_MME6_RTR_REGS_H_ */

Some files were not shown because too many files have changed in this diff